ZWI file format
The ZWI file format stores encyclopedia articles (originally, wiki articles) together with their associated data. It was developed to facilitate the exchange of articles between Encyclosphere aggregators (originally conceived of as wikis). A ZWI file contains, at a minimum, an article's HTML file and a plaintext file. It often also contains the Wikitext of the most recent revision of the article, the names of contributors, old revisions, and embedded media that work together with the HTML file, as well as other (optional) file formats derived from the original Wikitext file if there is one, or from the HTML otherwise. ZWI files are compact (zipped) and can be shared over the network. They have the extension *.zwi and can be unzipped like any other zip archive.
The first encyclopedia to publish to the Encyclosphere network using the ZWI file format was the HandWiki online encyclopedia.<ref>S.V.Chekanov, HandWiki encyclopedia. https://handwiki.org/ 2021</ref> Using a plugin for either MediaWiki (which runs Wikipedia) or the also-popular DokuWiki, some registered users can use a “ZWI export” button (above the editor area) to download the wiki page or to push a version to the EncycloReader aggregator.
ZWI file structure
A ZWI file is a ZIP archive, so it can be manipulated using standard zip compression tools. A typical ZWI file has the following structure:
- article.wikitext - Wikitext of the latest revision of the article, in MediaWiki syntax. It is the main source from which HTML, XHTML and other formats are derived.
- article.html - HTML file to view in a browser (with all headers). It is a secondary format (derived from article.wikitext).
- article.xhtml - HTML portion with the article content (without headers, navigation etc.) (optional)
- article.tex - article in the LaTeX file format (optional)
- article.dokuwiki - article in the DokuWiki file format (optional)
- metadata.json - a JSON file with information about the article (editors, revisions, namespaces, abstract etc.)
- signature.json - a JSON file with the signature of the publisher (optional)
- plugins.json - a JSON file with the information about plugins used by software that creates this file (used for a consistency check) (optional)
- media.json - a JSON file with the list of linked media files (images)
- data/media/[namespace]/ - directory with images associated with the article (only if they are available from the local server)
- data/attic/[namespace]/ - directory with older revisions of article.wikitext. Each file has the name:
<jcode> [article name].[timestamp].wikitext </jcode>
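Because a ZWI file is an ordinary ZIP archive, its layout can be inspected with any ZIP library. The following is a minimal sketch in Python using only the standard library. The helper names (`list_zwi_entries`, `build_sample_zwi`) and the content of the sample archive are hypothetical and serve only as illustration; they are not part of the ZWI specification.

```python
import io
import json
import zipfile


def list_zwi_entries(zwi_file):
    """Return the sorted names of all entries in a ZWI (ZIP) archive.

    zwi_file may be a path to a *.zwi file or a binary file-like object.
    """
    with zipfile.ZipFile(zwi_file) as archive:
        return sorted(archive.namelist())


def build_sample_zwi():
    """Build a tiny in-memory archive for illustration (made-up content)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("article.wikitext", "'''Example''' article.")
        archive.writestr("article.html", "<html><body><b>Example</b> article.</body></html>")
        archive.writestr(
            "metadata.json",
            json.dumps({"Title": "Example", "Primary": "article.wikitext"}),
        )
        # The real layout of media.json is not described here; an empty
        # JSON object is used as a placeholder.
        archive.writestr("media.json", "{}")
    buffer.seek(0)
    return buffer


# Print the entries of the sample archive, one per line.
for name in list_zwi_entries(build_sample_zwi()):
    print(name)
```

For a file on disk, the same function can be called with a path, for example `list_zwi_entries("article.zwi")`, where the file name is a placeholder.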
The most important file is metadata.json, which describes the ZWI file itself. It gives the version of the ZWI format specification and states which file is the primary source of derivations (article.wikitext for the MediaWiki software). All other files, such as article.html, article.tex and article.dokuwiki, are secondary conversions, since they are obtained by running converters on the original article.wikitext file.
A typical example of the metadata.json file of a Wikipedia article is given here:
<source lang="javascript"> {
  "ZWIversion": 1.3,
  "Title": "Ben Davidson (rugby league)",
  "ShortTitle": "Ben Davidson",
  "Topics": [
    "Rugby league",
    "New Zealand rugby league footballer"
  ],
  "Lang": "en",
  "Content": {
    "article.html": "65c821ccc989721f2fcbeeb69f6b6bed3e32b3de",
    "article.wikitext": "0783919ee5352969be3445d3c9e7a17a79392ea2",
    "article.txt": "3a1c5b32ce22d8e8d5fed9d057adcf04b1019988"
  },
  "Primary": "article.wikitext",
  "Revisions": [],
  "Publisher": "wikipedia",
  "CreatorNames": [],
  "ContributorNames": [],
  "LastModified": "1657454530",
  "TimeCreated": "1657454530",
  "PublicationDate": "2022-08-18",
  "Categories": [
    "1902 births",
    "1961 deaths"
  ],
  "Rating": [0, 0],
  "Description": "Benjamin Alfred Davidson (1902 \u2013 1961) was a New Zealand rugby league footballer who represented New Zealand. ",
  "Comment": "",
  "License": "CC BY-SA 3.0",
  "GeneratorName": "MediaWiki",
  "SourceURL": "https://en.wikipedia.org/wiki/Ben_Davidson_(rugby_league)"
} </source>
Note that PublicationDate is used for historic articles and has the format "yyyy-mm-dd". The TimeCreated and LastModified fields, in contrast, hold Unix timestamps (in seconds since 1970) that correspond to the creation and modification times of the ZWI file itself.
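The two date conventions can be converted to native date objects as in the sketch below. It assumes that the timestamps are stored as strings of decimal digits, as in the example above, and interprets them as UTC. The function name `parse_zwi_dates` is hypothetical.

```python
from datetime import date, datetime, timezone


def parse_zwi_dates(metadata):
    """Convert the date fields of a metadata.json dictionary to Python objects."""
    return {
        # Seconds since 1970, stored as strings in the example metadata.
        "TimeCreated": datetime.fromtimestamp(int(metadata["TimeCreated"]), tz=timezone.utc),
        "LastModified": datetime.fromtimestamp(int(metadata["LastModified"]), tz=timezone.utc),
        # Calendar date in the "yyyy-mm-dd" format.
        "PublicationDate": date.fromisoformat(metadata["PublicationDate"]),
    }


# Values copied from the example metadata.json above.
metadata = {
    "LastModified": "1657454530",
    "TimeCreated": "1657454530",
    "PublicationDate": "2022-08-18",
}
parsed = parse_zwi_dates(metadata)
print(parsed["LastModified"].isoformat())
print(parsed["PublicationDate"].isoformat())
```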
The field "Rating" consists of two numbers: the total score (defined by the publisher) and the number of hits. By default, both numbers are 0. The ZWI file thus carries its own rating.
The file "signature.json" contains a token signed with a private key. This token is generated using metadata.json and media.json as inputs. Therefore, any attempt to modify the data described by metadata.json (such as the rating, the publisher, the text files or the images) breaks the signature, and such a file can no longer be verified.
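The signature cannot be checked without knowing the token format and the publisher's key, neither of which is described here. A simpler consistency check can be sketched with the "Content" field of metadata.json. The sketch below assumes that each value is the SHA-1 hex digest of the raw bytes of the named file; the 40-character length of the values in the example is consistent with SHA-1, but this text does not state the algorithm. The helper names are hypothetical.

```python
import hashlib
import io
import json
import zipfile


def find_content_mismatches(zwi_file):
    """Return names listed under "Content" that are missing or have a different digest."""
    mismatches = []
    with zipfile.ZipFile(zwi_file) as archive:
        metadata = json.loads(archive.read("metadata.json"))
        names = set(archive.namelist())
        for name, expected in metadata.get("Content", {}).items():
            if name not in names:
                mismatches.append(name)
                continue
            # Assumption: the digest is SHA-1 over the raw file bytes.
            actual = hashlib.sha1(archive.read(name)).hexdigest()
            if actual != expected:
                mismatches.append(name)
    return mismatches


def make_zwi(files, content):
    """Build an in-memory archive with the given files and "Content" mapping."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in files.items():
            archive.writestr(name, data)
        archive.writestr("metadata.json", json.dumps({"Content": content}))
    buffer.seek(0)
    return buffer


html = b"<p>Example</p>"
digest = hashlib.sha1(html).hexdigest()
intact = make_zwi({"article.html": html}, {"article.html": digest})
tampered = make_zwi({"article.html": b"<p>Changed</p>"}, {"article.html": digest})

print(find_content_mismatches(intact))
print(find_content_mismatches(tampered))
```

Such a check only detects accidental or naive changes to the listed files. It does not replace signature verification, because the digests in metadata.json can be rewritten together with the files.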
If a ZWI file is created using DokuWiki software, it is likely that the primary file is article.dokuwiki while article.wikitext is a result of internal conversion. This should be stated in metadata.json.
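A reader of ZWI files therefore should not hard-code article.wikitext as the source, but look up the "Primary" field first. A minimal sketch follows; it assumes that the article files are UTF-8 encoded, which is not stated in this text, and the function name and sample content are hypothetical.

```python
import io
import json
import zipfile


def read_primary_source(zwi_file):
    """Return (file name, text) of the file named by "Primary" in metadata.json."""
    with zipfile.ZipFile(zwi_file) as archive:
        metadata = json.loads(archive.read("metadata.json"))
        primary = metadata["Primary"]
        # Assumption: article files are stored as UTF-8 text.
        return primary, archive.read(primary).decode("utf-8")


# Made-up archive in which the DokuWiki file is the primary source.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("article.dokuwiki", "**Example** article.")
    archive.writestr("article.wikitext", "'''Example''' article.")
    archive.writestr("metadata.json", json.dumps({"Primary": "article.dokuwiki"}))
buffer.seek(0)

name, text = read_primary_source(buffer)
print(name)
print(text)
```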
Generally, all article revisions should be stored. In some cases (like for HandWiki), only the first revision is stored.
The ZWI file can include the images linked in the article. They are stored in the directory "data/media/[namespace]/". The images are included only if they are located on the local server (i.e. where the wiki with the article is installed). The ZWI export mechanism does not attempt to extract images that are linked from Wikimedia Commons. However, the ZWI creation mechanism attempts to identify cached images.
If there are no other (older) revisions of the article, the directory data/attic/[namespace]/ is not created.
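Since both directories are optional, code that reads a ZWI file should treat their absence as a normal case. The sketch below collects media files and older revisions by path prefix and returns empty lists when a directory is missing. The function name, namespaces and file names in the sample are hypothetical.

```python
import io
import zipfile


def list_media_and_revisions(zwi_file):
    """Return (media files, old revision files) found in a ZWI archive."""
    with zipfile.ZipFile(zwi_file) as archive:
        # Skip explicit directory entries, which end with a slash.
        names = [n for n in archive.namelist() if not n.endswith("/")]
    media = sorted(n for n in names if n.startswith("data/media/"))
    revisions = sorted(n for n in names if n.startswith("data/attic/"))
    return media, revisions


# Made-up archive with one image and no older revisions.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("article.wikitext", "'''Example''' article.")
    archive.writestr("data/media/images/photo.png", b"placeholder bytes")
buffer.seek(0)

media, revisions = list_media_and_revisions(buffer)
print(media)
print(revisions)
```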
The ZWI file format was initially implemented for the SandBox of the HandWiki encyclopedia in March 2021. A proof of principle for the creation and insertion of ZWI files was demonstrated using the DokuWiki wiki software.<ref>S.V.Chekanov. EncycloED editor. A wiki editor based on DokuWiki with ZWI file export and import. (retrieved May 2021)</ref> In April 2021, ZWI file export was deployed as a standard feature of the HandWiki encyclopedia. In October 2021, ZWI file production was launched by the EncycloReader project.<ref>Encycloreader. Search and read online encyclopedias [1] (retrieved Oct 2021). KSF </ref>
Documentation
- ZWI File Format documentation[2]