OEDP Automation Strategy
There are several steps in going from old paper encyclopedias to encyclopedia articles on Oldpedia (or other Encyclosphere sites).
- Scan the books: go from paper to image-only PDF. This has already been done for well over a hundred volumes, more than enough to keep us busy.
- OCR the PDFs: go from image-only PDF to text-recognized PDF. An ABBYY FineReader OCR Editor (AFR) step.
- Output text or HTML: go from text-recognized PDF to TXT and HTML. Another AFR step.
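
For automation purposes, it may help to track how far each volume has progressed through the steps listed so far. The following is only a minimal sketch in Python, not part of any existing OEDP tooling. It assumes a hypothetical file-naming convention (`<volume>.pdf` for the image-only scan, `<volume>_ocr.pdf` for the text-recognized PDF, and `<volume>.txt` and `<volume>.html` for the exports), and it does not invoke AFR itself; it only inspects which files already exist.

```python
from enum import IntEnum
from pathlib import Path


class Stage(IntEnum):
    """Pipeline stages for one volume, in order (names are illustrative)."""
    NOT_SCANNED = 0  # no image-only PDF yet
    SCANNED = 1      # image-only PDF exists
    OCRED = 2        # text-recognized PDF exists
    EXPORTED = 3     # both TXT and HTML exports exist


def volume_stage(folder: Path, volume: str) -> Stage:
    """Return the furthest stage a volume has reached, judged by files on disk.

    The file names below are an assumed convention, not an OEDP standard.
    """
    scan = folder / f"{volume}.pdf"
    ocr = folder / f"{volume}_ocr.pdf"
    txt = folder / f"{volume}.txt"
    html = folder / f"{volume}.html"

    if not scan.exists():
        return Stage.NOT_SCANNED
    if not ocr.exists():
        return Stage.SCANNED
    if not (txt.exists() and html.exists()):
        return Stage.OCRED
    return Stage.EXPORTED
```

Calling `volume_stage(Path("scans"), "vol01")` on a folder that contains only `vol01.pdf` would return `Stage.SCANNED`, which signals that the OCR step is the next one to run for that volume.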