ZWIBook: Difference between revisions

From Encyclosphere Project Wiki
'''ZWIBook''' is a project by [[Larry Sanger]] to put all of Project Gutenberg books, in ZWI format, on a flash drive with a bespoke reader (available for Windows, Mac, and Linux). The project will be [https://encyclosphere.org/product/zwibook-flash-drive available as a thank-you gift] for those who donate $50 or more to the KSF. As of the end of July, the project is essentially ready to launch.
'''ZWIBook''' is a project by [[Larry Sanger]], with help from many (not least ChatGPT), to put [https://gutenberg.org all Project Gutenberg books (as of June 2024)], in ZWI format, on a flash drive with a bespoke reader (available for Windows, Mac, and Linux). The project is [https://encyclosphere.org/product/zwibook-flash-drive available for purchase] for $50, or $100 for the numbered and signed drives. We thank [[Henry Sanger]] for help with CSS and design, and we also thank Project DARA's Louie and Dorian for preparing the ZWI files of Project Gutenberg books used in version 1.0, which are available independently [https://gutenberg.dara.global here].
 
== Version 1.0 features ==
All drives numbered 1 to 179, and a larger number of non-numbered drives sold before early October, 2024, are version 1.0. These include the following features:
 
* 60,020 book files
* Linux, Windows, and Mac versions included
* Title and author search
* Topic (Library of Congress category) browser
* Bookshelf, including books saved and books opened
* Book reader, including:
** Zoom, font chooser, find on page
** Bookmarks
** Highlighting
** Notes
* Exporting (saving) and importing (loading) bookshelves and notes
* Printing, exporting in PDF, EPUB, and ZWI (including HTML and TXT as available) formats
 
== Known issues with version 1.0 ==
# Several issues with the original collection of ZWI book files in the <code>/book_zwis</code> directory came to light near the beginning of October 2024. Most users will not encounter these issues, as they affect fewer than 0.2% of the files. Certain files are to be removed or repaired (details below) on ZWIBook flash drives from October 7, 2024 onward; the original version of ZWIBook drives is hereby dubbed version 1.0, and the new version will be called version 1.1. When our work on this is done (as it should be by approximately October 10), we will make available a ZIP file containing all changed files, with instructions; this will let users update their copies of the ZWIBook files, making their local copies equivalent to version 1.1. Here are the issues:
#* At the beginning of August about 85 titles were [https://lists.pglaf.org/archives/list/[email protected]/message/UNF3QARLPJXXSQUDZ5PDUMPEVYULFT2O/ deleted from the PG database] for potential copyright reasons. This announcement was made on a small Project Gutenberg mailing list, but as soon as we became aware of the issue, we put a temporary pause in our marketing and distribution. These ~85 files should be removed from version 1.1.
#* About 871 more items will be deleted, because they were copyrighted; these items were part of the PG collection, and according to the license, the 20% royalty is given to PG as a donation, which PG waived. Nevertheless, after PG's August announcement and after discussing more with PG, we have decided to delete these items as well, out of an abundance of caution. They will be made up for with items added to PG between Dec. 2023 and sometime in fall 2024.
#* ''The Gutenberg Webster's Unabridged Dictionary,'' for complicated reasons, causes the software to appear to hang, i.e., to freeze and become inaccessible. (In fact, it will eventually load, but only a blank page, and only after several minutes.) Worse, the software's internal data file recording the latest-viewed book stays on that page, so the book is reopened each time. This appears to render the software unusable. The issue can be fixed as follows: (1) Go to your home directory, or wherever the software data is standardly saved in your system. (2) Look (if necessary showing "hidden files") for <code>.zwibook</code>. (3) Inside that directory you will find <code>latest.txt</code>: simply delete that file and the problem will be fixed. v1.1 drives will omit the offending title.
#* A couple of ZWI files are completely missing either the HTML or the TXT file for the book. We will reconstruct the ZWI files for these items in v1.1.
#* A small number of files (15-20?) are "RTF-only", i.e., they do not have HTML or TXT files, but only RTF files; moreover, the RTF files are not part of the ZWI files. In v1.1, we will be removing these because the vast majority are copyrighted.
#* Various other issues were found with very small ZWI files; many are music-only, or even software-only, files, and not books at all. Obviously these should not be counted as books, and we will be removing (and replacing) them in v1.1.
#* There are 34 titles that are present in <code>/book_zwis</code> but not contained in the ZWIBook search database; therefore, while the earlier versions of ZWIBook do have a copy, such copies are not accessible from ZWIBook. This is a known bug. The following Project Gutenberg IDs are not included as of October 7: <code>8776, 13395, 7366, 7766, 7844, 8555, 9304, 13320, 14398, 15083, 15258, 15891, 16187, 16188, 16189, 16190, 16883, 17073, 20144, 22662, 23057, 23063, 23326, 25105, 25107, 25109, 25726, 28876, 2953, 44800, 4978, 54963, 6036, 6672</code>. We apologize for the issue and will soon supply new builds of the ZWIBook software (v1.1), with installation instructions, below.
# Some non-Latin character sets were not fully supported in the book title and author search. This has been fixed in v1.1.
# Highlighting and bookmarking do not work in the table of contents and other front matter of some books. This is a known limitation, due to the complexity of assigning IDs to complex and unpredictable HTML. This will ''not'' be fixed in version 1.1. Most of the body of the text in all books is, as far as we know, bookmarkable and highlightable. There may be a few exceptions.
# There are some remaining minor styling issues (imperfectly centered text, for example). If you want to collect those issues, we might fix them in a later version. (There will be no change on this in v1.1, because we don't know of any significant issues.)
# Very rarely, some internal links might be broken. We have thoroughly checked many instances of linking from tables of contents and to and from footnotes. Some of these are broken in the original files; some of these were actually fixed by our code. We have encountered very few of these and are aware that not all are fixed. 
# It is possible that there are books that are essentially ''unreadable'' because of styling issues. We do not currently know of any such books. We went through hundreds of books and resolved all issues we saw, but we did not test all the files in the <code>/zwi_books</code> folder. For such books, you can still read the HTML by exporting the ZWI (see under <code>File</code>), unzipping the ZWI file, and clicking on the HTML (or, if text-only, TXT) file.
# Unusually large books can take many seconds (up to 20 or so) to load. These are rare, however.
# One very large book is known to have issues with highlighting. This is a very rare bug.
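
The <code>latest.txt</code> workaround described above for the Webster's dictionary hang can also be scripted. The following is a minimal sketch, not part of ZWIBook itself: it assumes the data directory is <code>.zwibook</code> in your home directory (pass a different path if your system stores it elsewhere), and the function name <code>reset_latest_book</code> is purely illustrative.

```python
from pathlib import Path
from typing import Optional


def reset_latest_book(data_dir: Optional[Path] = None) -> bool:
    """Delete latest.txt from the ZWIBook data directory.

    Returns True if a file was removed, False if there was nothing to remove.
    The default location (~/.zwibook) is an assumption based on the manual
    steps above; pass data_dir explicitly if your data lives elsewhere.
    """
    if data_dir is None:
        data_dir = Path.home() / ".zwibook"
    latest = data_dir / "latest.txt"
    if latest.is_file():
        latest.unlink()  # remove only the "latest-viewed book" record
        return True
    return False
```

Calling <code>reset_latest_book()</code> with no arguments targets the default location; close the ZWIBook reader before running it. Only <code>latest.txt</code> is touched, so other files in the directory are left alone.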
 
== Version 1.1 features ==
Version 1.1 began shipping November 4, 2024. This represents an incremental improvement. Most users in their daily use would not detect any difference between versions 1.0 and 1.1. Here are the things that are fixed or changed in v1.1:
 
* '''EPUB export''' is now built-in, so that you can save books to read on your phone or tablet even if you're not connected to the internet. Here is an example output file: [https://nc.encyclosphere.org/s/MMjywB5Df79X5is Augustine's ''Confessions'']. (Version 1.0 does have direct links to the Project Gutenberg EPUB files, but this requires an internet connection. While the PG files are a bit better insofar as they have )
* '''Certain books were removed:''' While there are still 69,020 books in v1.1, the collection now excludes all copyrighted (but free-to-read on Project Gutenberg) books. In their place are all the public domain books added from December 2023 through July 2, 2024.
* '''Certain books were added or repaired:'''
** Project Gutenberg IDs 8776, 13395, 7366, 7766, 7844, 8555, 9304, 13320, 14398, 15083, 15258, 15891, 16187, 16188, 16189, 16190, 16883, 17073, 20144, 22662, 23057, 23063, 23326, 25105, 25107, 25109, 25726, 28876, 44800, 4978, 54963, 6036, 6672 did not include metadata in the original files, so they were unreadable. We added it, and these titles now appear in the collection; this includes Bastiat's ''The Law.'' Similarly, some of the original ZWI files were missing both HTML and TXT. We found and added these, as available (including at least IDs 34228 and 34419).
** Unusually large books can take many seconds (up to 20 or so) to load. These are rare, however. ''War and Peace,'' for example, requires 7 seconds, but once it is loaded, it should behave quickly (as far as we know).
** One enormous book was removed because it caused the software to hang.
* '''The search database was updated''' to include the new books, as were the catalog pages.
* '''Title/author search now supports non-Latin character sets.''' Previously, the search engine did not handle Chinese and Russian (etc.) titles, or diacritical marks (e.g., accents). Now it does.
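
To illustrate what diacritic- and script-tolerant title matching involves, here is a minimal sketch using Python's standard <code>unicodedata</code> module. This is one common approach, shown under the assumption that search is a simple substring match; it is not known to be how ZWIBook's search is actually implemented, and the names <code>search_key</code> and <code>matches</code> are hypothetical.

```python
import unicodedata


def search_key(text: str) -> str:
    """Reduce text to a comparison key: decompose, drop accents, fold case."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks so "É" compares equal to "E".
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a Unicode-aware lowercase suitable for comparisons.
    return stripped.casefold()


def matches(query: str, title: str) -> bool:
    """Return True if the query occurs in the title after normalization."""
    return search_key(query) in search_key(title)
```

Because both the query and the title pass through the same <code>search_key</code>, a user can type <code>emile zola</code> and still find "Émile Zola", while Chinese and Cyrillic titles are compared as ordinary Unicode text. One trade-off of this sketch is that letters built from a combining mark (for example Cyrillic "й") are folded to their base letter on both sides of the comparison.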
 
== Known issues with version 1.1 ==
 
# Highlighting and bookmarking do not work in the table of contents and other front matter of some books. This is a known limitation, due to the complexity of assigning IDs to complex and unpredictable HTML. Most of the body of the text in all books is, as far as we know, bookmarkable and highlightable. There may be a few exceptions, but none we know of.
# There are some remaining minor styling issues (rarely, imperfectly centered text, for example). If you want to collect those issues, we might fix them in a later version.
# Very rarely, some internal links might be broken. We have thoroughly checked many instances of linking from tables of contents and to and from footnotes. Some of these are broken in the original files; some of these were actually fixed by our code. We have encountered very few of these and are aware that not all are fixed. 
# It is possible that there are books that are essentially ''unreadable'' because of styling issues. We do not currently know of any such books. We went through hundreds of books and resolved all issues we saw, but we did not test all the files in the <code>/zwi_books</code> folder. For such books, you can still read the HTML by exporting the ZWI (see under <code>File</code>), unzipping the ZWI file, and clicking on the HTML (or, if text-only, TXT) file.
# Unusually large books can take many seconds (up to 20 or so) to load. These are rare, however. ''War and Peace,'' for example, requires 7 seconds, but once it is loaded, it should behave quickly (as far as we know).
# One very large book is known to have issues with highlighting. This is a very rare bug.
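
The fallback described above for unreadable books (export the ZWI, unzip it, open the HTML or TXT file) can be sketched in Python. This relies only on the fact, stated above, that a ZWI file can be unzipped; the function name <code>extract_readable_files</code> is hypothetical, and the sketch selects members by file extension rather than assuming any particular file names inside the archive.

```python
import zipfile
from pathlib import Path


def extract_readable_files(zwi_path, out_dir):
    """Extract the HTML and TXT members of a ZWI archive into out_dir.

    Returns the list of extracted file paths. Members are chosen by
    extension only; other files in the archive are left unextracted.
    """
    out = Path(out_dir)
    extracted = []
    with zipfile.ZipFile(zwi_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".html", ".htm", ".txt")):
                # extract() returns the path of the file it wrote
                extracted.append(Path(archive.extract(name, out)))
    return extracted
```

For example, <code>extract_readable_files("exported_book.zwi", "exported_book")</code> would write any HTML and TXT files into an <code>exported_book</code> folder, from which the HTML file can be opened in a web browser. The file name <code>exported_book.zwi</code> here is only a placeholder for whatever you exported from the <code>File</code> menu.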


== Downloads ==
 
=== Version 1.1.1 ===
'''''Note on a highlighting issue:''' a run of 21 drives, mailed November 4, is now called "Version 1.1.0". These drives might have difficulties with highlighting. The following builds should <u>not</u> have similar bugs. If you have one of these drives, simply move the problem folder(s) off your drive, or wherever you have the software copied, and unzip the appropriate build below in their place. The new builds differ by only about three lines of code, but those lines fix the highlighting issue.''
 
[https://nc.encyclosphere.org/s/LREnKdtZXbeAB9o Linux v1.1.1] - zip file
 
[https://nc.encyclosphere.org/s/Y833xmg35BtjLwW Mac v1.1.1] - zip file
 
[https://nc.encyclosphere.org/s/giCEEjHkzK6D2fa Windows v1.1.1] - zip file

Latest revision as of 15:05, 3 May 2025
