Old Encyclopedia Digitization Project (OEDP)
Project Info
URL: https://encyclosphere.org/the-old-encyclopedia-digitization-project
Description: A project to digitize public domain encyclopedias in the ZWI file format.
Project lead(s): Larry Sanger, Nancy Hildebrandt
Biggest challenge: Autogenerating usable ZWI files from OCR text file output.
Date Started: Summer 2022
The Old Encyclopedia Digitization Project aims to digitize old encyclopedias in the ZWI file format. We will also make the results available via Oldpedia.org.
How much knowledge is buried in old encyclopedias?
We aim to find out. Through the Old Encyclopedia Digitization Project, we also aim to make that knowledge available via modern search engines, readers, and free files distributed on the Encyclosphere.
We began with this thought: we’ve made a file format for encyclopedia articles (the ZWI file format); wouldn’t it be awesome if we digitized a bunch of old, public domain encyclopedias and put them into ZWI files?
The project's top priority is to speed up the process 100x, mainly by writing software (possibly with AI help) that automatically generates correct ZWI files from pure OCR output. This is harder than it sounds; an only partly automated software solution was written by Larry Sanger and is here. We have experimented with using an LLM API to help with the tricky parts, with some progress but no usable output yet. See OEDP Automation Strategy.
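To make the automation problem concrete, here is a minimal sketch of the two steps involved: splitting raw OCR text into articles and packaging each one. It is not the project's actual tool. It assumes that entries begin with an all-caps headword followed by a comma or period (a heuristic that real OCR output will often break) and that a ZWI file is a ZIP archive holding the article text plus a JSON metadata file. The names `split_articles` and `build_archive`, the archive member names, and the metadata fields are illustrative assumptions, not the official ZWI specification.

```python
import io
import json
import re
import zipfile

# Assumed heuristic: an entry starts at a line that begins with an all-caps
# headword followed by a comma or period, e.g. "ABEL, Karl Friedrich".
HEADWORD = re.compile(r"^([A-Z][A-Z' -]+)[,.]", re.MULTILINE)


def split_articles(ocr_text):
    """Split raw OCR text into (title, body) pairs at headword lines."""
    matches = list(HEADWORD.finditer(ocr_text))
    articles = []
    for i, match in enumerate(matches):
        # An article runs until the next headword, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(ocr_text)
        title = match.group(1).strip().title()
        body = ocr_text[match.start():end].strip()
        articles.append((title, body))
    return articles


def build_archive(title, body, source):
    """Return the bytes of a ZIP archive with the article text and metadata."""
    # Illustrative metadata fields only; the real format defines its own.
    metadata = {"Title": title, "Source": source}
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("article.txt", body)
        archive.writestr("metadata.json", json.dumps(metadata, indent=2))
    return buffer.getvalue()
```

Calling `split_articles` on a page of OCR text and passing each pair to `build_archive` yields one archive per entry. The hard part in practice is the splitting step: misread headwords, running headers, and column breaks defeat simple patterns like the one above, which is why full automation remains the open problem.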
Our ongoing priority is simply to do the hard work of preparing ZWI versions of old encyclopedia articles. This is semi-technical work, done partly in old documents and partly on the command line, so it is a niche activity.
What we are doing
Nancy Hildebrandt has done the most work preparing articles (as of 2023-24), working from an encyclopedia of musical biography. Larry has made some ZWI files from an old encyclopedia of Ohio. They both use subscriptions to ABBYY FineReader, which we can make available to anyone who is interested in this sort of work. Please let us know.
John Hampson has been busy (in 2023-24) grading the scan qualities of old encyclopedia PDF files; if that work would help anyone, let us know.
More introductory info (from late 2022) can be found at encyclosphere.org/the-old-encyclopedia-digitization-project.