<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.encyclosphere.org/index.php?action=history&amp;feed=atom&amp;title=OEDP_Automation_Strategy</id>
	<title>OEDP Automation Strategy - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.encyclosphere.org/index.php?action=history&amp;feed=atom&amp;title=OEDP_Automation_Strategy"/>
	<link rel="alternate" type="text/html" href="https://wiki.encyclosphere.org/index.php?title=OEDP_Automation_Strategy&amp;action=history"/>
	<updated>2026-04-25T08:56:01Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.41.0</generator>
	<entry>
		<id>https://wiki.encyclosphere.org/index.php?title=OEDP_Automation_Strategy&amp;diff=490&amp;oldid=prev</id>
		<title>Lsanger: /* How to automatically ZWI-ify simpler encyclopedias */</title>
		<link rel="alternate" type="text/html" href="https://wiki.encyclosphere.org/index.php?title=OEDP_Automation_Strategy&amp;diff=490&amp;oldid=prev"/>
		<updated>2024-02-22T20:52:25Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;How to automatically ZWI-ify simpler encyclopedias&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 16:52, 22 February 2024&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l16&quot;&gt;Line 16:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 16:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Completed and partially completed tasks:&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Completed and partially completed tasks:&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* &amp;lt;code&amp;gt;text_cleaner.rb&amp;lt;/code&amp;gt; inputs &amp;lt;code&amp;gt;foo.txt&amp;lt;/code&amp;gt; and outputs &amp;lt;code&amp;gt;foo-cu.txt&amp;lt;/code&amp;gt; (for &quot;cleaned-up&quot;). This takes a TXT file (''not'' HTML) and fixes some common problems with OCR output, such as spaces between quotation marks and the words they govern. It also creates a new &amp;lt;code&amp;gt;/foo&amp;lt;/code&amp;gt; directory if one does not yet exist. Since it is plain text only, it does not have any italics, bold, or small caps.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* &amp;lt;code&amp;gt;text_cleaner.rb&amp;lt;/code&amp;gt; inputs &amp;lt;code&amp;gt;foo.txt&amp;lt;/code&amp;gt; and outputs &amp;lt;code&amp;gt;foo-cu.txt&amp;lt;/code&amp;gt; (for &quot;cleaned-up&quot;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;) in the &amp;lt;code&amp;gt;foo/&amp;lt;/code&amp;gt; directory (where 'foo' is the metadata's Publisher code&lt;/ins&gt;). This takes a TXT file (''not'' HTML) and fixes some common problems with OCR output, such as spaces between quotation marks and the words they govern. It also creates a new &amp;lt;code&amp;gt;/foo&amp;lt;/code&amp;gt; directory if one does not yet exist. Since it is plain text only, it does not have any italics, bold, or small caps.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* &amp;lt;code&amp;gt;firstline_finder.rb&amp;lt;/code&amp;gt; inputs a &amp;lt;code&amp;gt;foo-cu.txt&amp;lt;/code&amp;gt; file and outputs &amp;lt;code&amp;gt;foo-mu.txt&amp;lt;/code&amp;gt; (for &quot;marked-up&quot;). This takes the cleaned-up TXT file and adds &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;both &lt;/del&gt;&amp;lt;code&amp;gt;&amp;lt;article&amp;gt;&amp;lt;/code&amp;gt; and &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;(soon) &lt;/del&gt;&amp;lt;code&amp;gt;&amp;lt;nowiki&amp;gt;&amp;lt;span class=&quot;title&quot;&amp;gt;&amp;lt;/nowiki&amp;gt;&amp;lt;/code&amp;gt; tags.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* &amp;lt;code&amp;gt;firstline_finder.rb&amp;lt;/code&amp;gt; inputs a &amp;lt;code&amp;gt;foo-cu.txt&amp;lt;/code&amp;gt; file and outputs &amp;lt;code&amp;gt;foo-mu.txt&amp;lt;/code&amp;gt; (for &quot;marked-up&quot;)&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;, also in the &amp;lt;code&amp;gt;foo/&amp;lt;/code&amp;gt; directory&lt;/ins&gt;. 
This takes the cleaned-up TXT file and adds &amp;lt;code&amp;gt;&amp;lt;article&amp;gt;&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;&amp;lt;nowiki&amp;gt;&amp;lt;span class=&quot;title&quot;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;gt;&amp;lt;/nowiki&amp;gt;&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;&amp;lt;nowiki&amp;gt;&amp;lt;p&lt;/ins&gt;&amp;gt;&amp;lt;/nowiki&amp;gt;&amp;lt;/code&amp;gt; tags&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* &amp;lt;code&amp;gt;htmlizer.rb&amp;lt;/code&amp;gt; takes the &amp;lt;code&amp;gt;foo-mu.txt&amp;lt;/code&amp;gt; (marked-up) file and splits it into individual HTML files, placing them in a new &amp;lt;code&amp;gt;foo/html/&amp;lt;/code&amp;gt; directory. For example, one of the 3000+ HTML files autogenerated this way was saved at &amp;lt;code&amp;gt;ccrk/html/Aachen.html&amp;lt;/code&amp;gt;.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* &amp;lt;code&amp;gt;zwify.rb&amp;lt;/code&amp;gt; iterates over the contents of &amp;lt;code&amp;gt;foo/html/&amp;lt;/code&amp;gt;, prepares &amp;lt;code&amp;gt;article.html&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;article.txt&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;media.json&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;metadata.json&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;signature.json&amp;lt;/code&amp;gt; for each individual article, putting the results in a subdirectory of &amp;lt;code&amp;gt;foo/zwicontent/&amp;lt;/code&amp;gt;; for example, &amp;lt;code&amp;gt;ccrk/zwicontent/Aachen/&amp;lt;/code&amp;gt;. It then iterates over the &amp;lt;code&amp;gt;zwicontent/&amp;lt;/code&amp;gt; subdirectories and produces ZWI files out of them, placing the files in &amp;lt;code&amp;gt;foo/zwi/&amp;lt;/code&amp;gt;; for example, &amp;lt;code&amp;gt;ccrk/zwi/Aachen.zwi&amp;lt;/code&amp;gt;&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Tasks to do:&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Tasks to do:&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Actually do &lt;/del&gt;the &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;code&amp;gt;&amp;lt;nowiki&amp;gt;&amp;lt;span class=&quot;title&quot;&amp;gt;&amp;lt;/nowiki&amp;gt;&amp;lt;/code&amp;gt; markup. I seem &lt;/del&gt;to &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;have made an excellent set of rules for this purpose&lt;/del&gt;, &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;so I'm all but ready to add this to &amp;lt;code&amp;gt;firstline_finder.rb&amp;lt;/code&amp;gt;&lt;/del&gt;.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Improve &lt;/ins&gt;the &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;accuracy of heuristics used &lt;/ins&gt;to &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;individuate articles&lt;/ins&gt;, &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;and other small things&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Do a test run through the entire ccrk-mu.txt output, make necessary changes &lt;/del&gt;to &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;ccrk.txt, and see if the changes made by hand have the preferred effect&lt;/del&gt;.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Actually push &lt;/ins&gt;to &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Oldpedia&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* Examine ZWIFormat again and decide whether&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Lsanger</name></author>
	</entry>
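<!--
A minimal sketch of the htmlizer.rb step described in the revision above, which splits the marked-up foo-mu.txt into individual files under foo/html/. This is not the actual AutoZWI code. It assumes each article is wrapped in <article>...</article> and its headword in <span class="title">...</span>, and that a headword such as "AACHEN" maps to a file name such as Aachen.html. The helper names split_articles, file_name_for, and write_articles are hypothetical. Ruby is used because the AutoZWI scripts are Ruby.

```ruby
require "fileutils"

# Assumed markup produced by the earlier firstline-finding step.
ARTICLE = %r{<article>(.*?)</article>}m
TITLE = %r{<span class="title">(.*?)</span>}

# Hypothetical naming rule: "AACHEN" becomes "Aachen.html",
# "NEW YORK" becomes "New_York.html".
def file_name_for(title)
  title.split.map(&:capitalize).join("_") + ".html"
end

# Returns [file_name, html] pairs, one per article block that has a title.
def split_articles(marked_up)
  marked_up.scan(ARTICLE).map(&:first).filter_map do |body|
    title = body[TITLE, 1]
    next if title.nil? # no title markup found; skip rather than guess

    # The title is copied verbatim from the markup, so it is not re-escaped.
    html = "<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\">" \
           "<title>#{title}</title></head>\n" \
           "<body><article>#{body}</article></body></html>\n"
    [file_name_for(title), html]
  end
end

# Writes each page into <publisher_dir>/html/, creating the directory if needed.
def write_articles(publisher_dir, pages)
  out_dir = File.join(publisher_dir, "html")
  FileUtils.mkdir_p(out_dir)
  pages.each { |name, html| File.write(File.join(out_dir, name), html) }
end
```

A run over the ccrk example might look like write_articles("ccrk", split_articles(File.read("ccrk/ccrk-mu.txt"))). Blocks with no marked-up title are skipped rather than guessed at, and two articles with the same headword would overwrite each other, so a real run would need a rule for duplicates.
-->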
	<entry>
		<id>https://wiki.encyclosphere.org/index.php?title=OEDP_Automation_Strategy&amp;diff=477&amp;oldid=prev</id>
		<title>Lsanger: /* How to automatically ZWI-ify simpler encyclopedias */</title>
		<link rel="alternate" type="text/html" href="https://wiki.encyclosphere.org/index.php?title=OEDP_Automation_Strategy&amp;diff=477&amp;oldid=prev"/>
		<updated>2024-02-19T21:23:38Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;How to automatically ZWI-ify simpler encyclopedias&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 17:23, 19 February 2024&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l18&quot;&gt;Line 18:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 18:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* &amp;lt;code&amp;gt;text_cleaner.rb&amp;lt;/code&amp;gt; inputs &amp;lt;code&amp;gt;foo.txt&amp;lt;/code&amp;gt; and outputs &amp;lt;code&amp;gt;foo-cu.txt&amp;lt;/code&amp;gt; (for &amp;quot;cleaned-up&amp;quot;). This takes a TXT file (''not'' HTML) and fixes some common problems with OCR output, such as spaces between quotation marks and the words they govern. It also creates a new &amp;lt;code&amp;gt;/foo&amp;lt;/code&amp;gt; directory if one does not yet exist. Since it is plain text only, it does not have any italics, bold, or small caps.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* &amp;lt;code&amp;gt;text_cleaner.rb&amp;lt;/code&amp;gt; inputs &amp;lt;code&amp;gt;foo.txt&amp;lt;/code&amp;gt; and outputs &amp;lt;code&amp;gt;foo-cu.txt&amp;lt;/code&amp;gt; (for &amp;quot;cleaned-up&amp;quot;). This takes a TXT file (''not'' HTML) and fixes some common problems with OCR output, such as spaces between quotation marks and the words they govern. It also creates a new &amp;lt;code&amp;gt;/foo&amp;lt;/code&amp;gt; directory if one does not yet exist. Since it is plain text only, it does not have any italics, bold, or small caps.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* &amp;lt;code&amp;gt;firstline_finder.rb&amp;lt;/code&amp;gt; inputs a &amp;lt;code&amp;gt;foo-cu.txt&amp;lt;/code&amp;gt; file and outputs &amp;lt;code&amp;gt;foo-mu.txt&amp;lt;/code&amp;gt; (for &amp;quot;marked-up&amp;quot;). This takes the cleaned-up TXT file and adds both &amp;lt;code&amp;gt;&amp;lt;article&amp;gt;&amp;lt;/code&amp;gt; and (soon) &amp;lt;code&amp;gt;&amp;lt;nowiki&amp;gt;&amp;lt;span class=&amp;quot;title&amp;quot;&amp;gt;&amp;lt;/nowiki&amp;gt;&amp;lt;/code&amp;gt; tags.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* &amp;lt;code&amp;gt;firstline_finder.rb&amp;lt;/code&amp;gt; inputs a &amp;lt;code&amp;gt;foo-cu.txt&amp;lt;/code&amp;gt; file and outputs &amp;lt;code&amp;gt;foo-mu.txt&amp;lt;/code&amp;gt; (for &amp;quot;marked-up&amp;quot;). This takes the cleaned-up TXT file and adds both &amp;lt;code&amp;gt;&amp;lt;article&amp;gt;&amp;lt;/code&amp;gt; and (soon) &amp;lt;code&amp;gt;&amp;lt;nowiki&amp;gt;&amp;lt;span class=&amp;quot;title&amp;quot;&amp;gt;&amp;lt;/nowiki&amp;gt;&amp;lt;/code&amp;gt; tags.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Tasks to do:&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* Actually do the &amp;lt;code&amp;gt;&amp;lt;nowiki&amp;gt;&amp;lt;span class=&quot;title&quot;&amp;gt;&amp;lt;/nowiki&amp;gt;&amp;lt;/code&amp;gt; markup. I seem to have made an excellent set of rules for this purpose, so I'm all but ready to add this to &amp;lt;code&amp;gt;firstline_finder.rb&amp;lt;/code&amp;gt;.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* Do a test run through the entire ccrk-mu.txt output, make necessary changes to ccrk.txt, and see if the changes made by hand have the preferred effect.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* Examine ZWIFormat again and decide whether&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Lsanger</name></author>
	</entry>
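<!--
The to-do list in this revision mentions a set of rules for adding <span class="title"> markup in firstline_finder.rb, but those rules are not given on this page. The sketch below substitutes one hypothetical rule, purely to show the shape of the step: a line that begins with an all-capitals headword of at least two letters, immediately followed by a comma (as in "AACHEN, a city ..."), starts a new article. The HEADWORD pattern and the mark_up method are assumptions, not AutoZWI code.

```ruby
# Hypothetical rule: an article begins with an all-capitals headword of at
# least two letters, immediately followed by a comma (checked by lookahead).
HEADWORD = /\A([A-Z][A-Z' -]*[A-Z])(?=,)/

# Wraps each detected article in <article> tags and its headword in
# <span class="title">; lines before the first headword pass through unchanged.
def mark_up(text)
  out = []
  in_article = false
  text.each_line(chomp: true) do |line|
    if (m = HEADWORD.match(line))
      out << "</article>" if in_article # close the previous article
      out << "<article>"
      # post_match starts with the comma, so the original punctuation survives.
      out << %(<span class="title">#{m[1]}</span>#{m.post_match})
      in_article = true
    else
      out << line
    end
  end
  out << "</article>" if in_article
  out.join("\n") + "\n"
end
```

Applied to the lines "AACHEN, a city.", "More text.", and "AAR, a river.", this yields two article blocks, with "More text." kept inside the first. A real rule set would also need to reject false positives, such as capitalized words that open ordinary paragraphs.
-->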
	<entry>
		<id>https://wiki.encyclosphere.org/index.php?title=OEDP_Automation_Strategy&amp;diff=476&amp;oldid=prev</id>
		<title>Lsanger: Where I'm at with AutoZWI</title>
		<link rel="alternate" type="text/html" href="https://wiki.encyclosphere.org/index.php?title=OEDP_Automation_Strategy&amp;diff=476&amp;oldid=prev"/>
		<updated>2024-02-19T20:59:46Z</updated>

		<summary type="html">&lt;p&gt;Where I&amp;#039;m at with AutoZWI&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 16:59, 19 February 2024&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l1&quot;&gt;Line 1:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 1:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;== The steps to turn old encyclopedias into ZWI-driven web pages ==&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;There are five steps to go from old paper encyclopedias to encyclopedia articles on Oldpedia (or other [[Encyclosphere]] sites).&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;There are five steps to go from old paper encyclopedias to encyclopedia articles on Oldpedia (or other [[Encyclosphere]] sites).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l9&quot;&gt;Line 9:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 10:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The only type of encyclopedia that is a candidate for full automation is the sort with relatively short articles and relatively few images (although fewer than 50 or 100 plates could probably be processed by hand). It is these encyclopedias that the rest of this page will deal with.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The only type of encyclopedia that is a candidate for full automation is the sort with relatively short articles and relatively few images (although fewer than 50 or 100 plates could probably be processed by hand). It is these encyclopedias that the rest of this page will deal with.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;== How to automatically ZWI-ify simpler encyclopedias ==&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;With [[AutoZWI]], there are several tasks complete, several to do, and several open questions.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Completed and partially completed tasks:&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* &amp;lt;code&amp;gt;text_cleaner.rb&amp;lt;/code&amp;gt; inputs &amp;lt;code&amp;gt;foo.txt&amp;lt;/code&amp;gt; and outputs &amp;lt;code&amp;gt;foo-cu.txt&amp;lt;/code&amp;gt; (for &quot;cleaned-up&quot;). This takes a TXT file (''not'' HTML) and fixes some common problems with OCR output, such as spaces between quotation marks and the words they govern. It also creates a new &amp;lt;code&amp;gt;/foo&amp;lt;/code&amp;gt; directory if one does not yet exist. Since it is plain text only, it does not have any italics, bold, or small caps.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* &amp;lt;code&amp;gt;firstline_finder.rb&amp;lt;/code&amp;gt; inputs a &amp;lt;code&amp;gt;foo-cu.txt&amp;lt;/code&amp;gt; file and outputs &amp;lt;code&amp;gt;foo-mu.txt&amp;lt;/code&amp;gt; (for &quot;marked-up&quot;). This takes the cleaned-up TXT file and adds both &amp;lt;code&amp;gt;&amp;lt;article&amp;gt;&amp;lt;/code&amp;gt; and (soon) &amp;lt;code&amp;gt;&amp;lt;nowiki&amp;gt;&amp;lt;span class=&quot;title&quot;&amp;gt;&amp;lt;/nowiki&amp;gt;&amp;lt;/code&amp;gt; tags.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Lsanger</name></author>
	</entry>
	<entry>
		<id>https://wiki.encyclosphere.org/index.php?title=OEDP_Automation_Strategy&amp;diff=475&amp;oldid=prev</id>
		<title>Lsanger: Draft save</title>
		<link rel="alternate" type="text/html" href="https://wiki.encyclosphere.org/index.php?title=OEDP_Automation_Strategy&amp;diff=475&amp;oldid=prev"/>
		<updated>2024-02-19T20:16:23Z</updated>

		<summary type="html">&lt;p&gt;Draft save&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 16:16, 19 February 2024&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l1&quot;&gt;Line 1:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 1:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;There are &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;''n'' &lt;/del&gt;steps to &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;going &lt;/del&gt;from old paper encyclopedias to encyclopedia articles on Oldpedia (or other [[Encyclosphere]] sites).&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;There are &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;five &lt;/ins&gt;steps to &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;go &lt;/ins&gt;from old paper encyclopedias to encyclopedia articles on Oldpedia (or other [[Encyclosphere]] sites).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Scan the books: go from paper to image-only PDF. This has already been done for well over a hundred volumes, more than enough to keep us busy.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Scan the books: go from paper to image-only PDF. This has already been done for well over a hundred volumes, more than enough to keep us busy.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l5&quot;&gt;Line 5:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 5:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Output text or HTML: go from text-recognized PDF to TXT and HTML. Another AFR step.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Output text or HTML: go from text-recognized PDF to TXT and HTML. Another AFR step.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;#Markup: go from either TXT or HTML to a file marked up with &amp;lt;code&amp;gt;&amp;lt;article&amp;gt;&amp;lt;/code&amp;gt; as well as markup for title. This step is performed by a combination of hand and code (in the old [[ZWIFormat]] system) or, for one encyclopedia so far, by code (in the new [[AutoZWI]] system).&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;#Markup: go from either TXT or HTML to a file marked up with &amp;lt;code&amp;gt;&amp;lt;article&amp;gt;&amp;lt;/code&amp;gt; as well as markup for title. This step is performed by a combination of hand and code (in the old [[ZWIFormat]] system) or, for one encyclopedia so far, by code (in the new [[AutoZWI]] system).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;#Make ZWI: go from marked up file to ZWI file. This step is performed by a combination of hand and code (in the old [[ZWIFormat]] system) or, for one encyclopedia so far, by code (in the new [[AutoZWI]] system).&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;#Make ZWI &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;(and upload)&lt;/ins&gt;: go from marked up file to ZWI file. This step is performed by a combination of hand and code (in the old [[ZWIFormat]] system) or, for one encyclopedia so far, by code (in the new [[AutoZWI]] system)&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;While a fully automated system certainly sounds better, steps 1-3 cannot yet be automated, and encyclopedias with longer articles, especially those with complex features such as elaborate styling, footnotes, tables, and pictures, cannot be handled fully automatically.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt; &lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;The only type of encyclopedia that is a candidate for full automation is the sort with relatively short articles and relatively few images (although it seems that up to 50 or 100 plates could be processed by hand). It is these encyclopedias that the rest of this page will deal with&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Lsanger</name></author>
	</entry>
	<entry>
		<id>https://wiki.encyclosphere.org/index.php?title=OEDP_Automation_Strategy&amp;diff=474&amp;oldid=prev</id>
		<title>Lsanger: Finished describing five steps</title>
		<link rel="alternate" type="text/html" href="https://wiki.encyclosphere.org/index.php?title=OEDP_Automation_Strategy&amp;diff=474&amp;oldid=prev"/>
		<updated>2024-02-19T20:08:19Z</updated>

		<summary type="html">&lt;p&gt;Finished describing five steps&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 16:08, 19 February 2024&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l4&quot;&gt;Line 4:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 4:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# OCR the PDFs: go from image-only PDF to text-recognized PDF. An ABBYY FineReader OCR Editor (AFR) step.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# OCR the PDFs: go from image-only PDF to text-recognized PDF. An ABBYY FineReader OCR Editor (AFR) step.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Output text or HTML: go from text-recognized PDF to TXT and HTML. Another AFR step.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Output text or HTML: go from text-recognized PDF to TXT and HTML. Another AFR step.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;#&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;#&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Markup: go from either TXT or HTML to a file marked up with &amp;lt;code&amp;gt;&amp;lt;article&amp;gt;&amp;lt;/code&amp;gt; as well as markup for title. This step is performed by a combination of hand and code (in the old [[ZWIFormat]] system) or, for one encyclopedia so far, by code (in the new [[AutoZWI]] system).&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;#Make ZWI: go from marked up file to ZWI file. This step is performed by a combination of hand and code (in the old [[ZWIFormat]] system) or, for one encyclopedia so far, by code (in the new [[AutoZWI]] system).&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Lsanger</name></author>
	</entry>
	<entry>
		<id>https://wiki.encyclosphere.org/index.php?title=OEDP_Automation_Strategy&amp;diff=473&amp;oldid=prev</id>
		<title>Lsanger: Initial save</title>
		<link rel="alternate" type="text/html" href="https://wiki.encyclosphere.org/index.php?title=OEDP_Automation_Strategy&amp;diff=473&amp;oldid=prev"/>
		<updated>2024-02-19T19:51:56Z</updated>

		<summary type="html">&lt;p&gt;Initial save&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;There are ''n'' steps to going from old paper encyclopedias to encyclopedia articles on Oldpedia (or other [[Encyclosphere]] sites).&lt;br /&gt;
&lt;br /&gt;
# Scan the books: go from paper to image-only PDF. This has already been done for well over a hundred volumes, more than enough to keep us busy.&lt;br /&gt;
# OCR the PDFs: go from image-only PDF to text-recognized PDF. An ABBYY FineReader OCR Editor (AFR) step.&lt;br /&gt;
# Output text or HTML: go from text-recognized PDF to TXT and HTML. Another AFR step.&lt;br /&gt;
#&lt;/div&gt;</summary>
		<author><name>Lsanger</name></author>
	</entry>
</feed>