OEDP Automation Strategy

From Encyclosphere Project Wiki
Revision as of 15:51, 19 February 2024 by Lsanger (talk | contribs) (Initial save)

Going from old paper encyclopedias to encyclopedia articles on Oldpedia (or other Encyclosphere sites) takes several steps:

  1. Scan the books: go from paper to image-only PDF. This has already been done for well over a hundred volumes, more than enough to keep us busy.
  2. OCR the PDFs: go from image-only PDF to text-recognized PDF. This is done in ABBYY FineReader OCR Editor (AFR).
  3. Output text and HTML: go from text-recognized PDF to TXT and HTML. This is also done in AFR.
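
To automate the steps listed above, it helps to know how far each volume has already progressed. The following is a minimal Python sketch, not an established part of the project's tooling. It assumes a purely hypothetical file-naming convention (`<stem>.pdf` for the image-only scan, `<stem>.ocr.pdf` for the text-recognized PDF, and `<stem>.txt` plus `<stem>.html` for the exported output); the function names `volume_stage` and `survey` are likewise invented for illustration. It does not call AFR at all; it only inspects which files exist.

```python
from pathlib import Path


def volume_stage(stem: str, files: set[str]) -> str:
    """Return the furthest stage a volume has reached, given the file names present.

    Assumed (hypothetical) naming convention, not taken from the wiki:
      <stem>.pdf                  image-only scan          (step 1)
      <stem>.ocr.pdf              text-recognized PDF      (step 2)
      <stem>.txt and <stem>.html  exported text and HTML   (step 3)
    """
    if f"{stem}.txt" in files and f"{stem}.html" in files:
        return "exported"
    if f"{stem}.ocr.pdf" in files:
        return "ocr"
    if f"{stem}.pdf" in files:
        return "scanned"
    return "missing"


def survey(directory: Path) -> dict[str, str]:
    """Map each volume stem found in `directory` to its current stage."""
    files = {p.name for p in directory.iterdir() if p.is_file()}
    # The stem is everything before the first dot, so stems must not contain dots.
    stems = {name.split(".")[0] for name in files}
    return {stem: volume_stage(stem, files) for stem in sorted(stems)}


if __name__ == "__main__":
    # Report the stage of every volume in the current directory.
    for stem, stage in survey(Path(".")).items():
        print(f"{stem}: {stage}")
```

A report like this shows which volumes still need the OCR step and which only need export, so each batch handed to AFR contains only the files that actually require that step.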