Thousands of Texts

This is a gateway to some of mankind's greatest intellectual achievements, as collected and distributed by Project Gutenberg. I am particularly interested in allowing people to browse through this library, since I did not have to choose which works to interpret; in a sense the world chose them by itself. Project Gutenberg's volunteers were motivated enough to scan and proofread each text and give it freely back to the world, showing a profound commitment to communication and the free exchange of ideas. It is their fine and generous work that provided the basis for TextArc's development, and it inspired me to give my own small bit freely back to the world. A portion of the proceeds from the sale of PG-related fine print editions on the commercial TextArc site is being donated to them, too.

Not all of the files are good candidates for TextArc; the English files and the smaller ones work best. I only have an English "stemmer" (code that brings together words sharing a root), and the lists of common, and therefore omittable, words for German and French are largely guesswork so far. You can search by the number of lines in the text: start at about 2000 (enter "1000-2000" in the Line Count field) and go higher if your machine isn't straining. We're working on a filter that will screen out completely inappropriate files such as MIDI files, movies, or files too big for memory. It should be installed next week.
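
For readers unfamiliar with stemming and common-word lists, here is a minimal sketch of both ideas in Python. It is not TextArc's actual code: the stop list, the suffix list, and the function names are all invented for illustration, and a real English stemmer handles far more cases than this crude suffix stripper.

```python
from collections import defaultdict

# Hypothetical list of common, omittable words (not TextArc's real list).
STOP_WORDS = {"the", "a", "an", "and", "are", "of", "to", "in", "is", "it"}

# Hypothetical suffixes for a deliberately crude English stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip one common suffix, keeping at least three letters of the root."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def group_by_stem(text):
    """Map each stem to the set of non-stop words in the text that share it."""
    groups = defaultdict(set)
    for raw in text.lower().split():
        word = raw.strip(".,;:!?\"'()")  # drop surrounding punctuation
        if word and word not in STOP_WORDS:
            groups[naive_stem(word)].add(word)
    return dict(groups)


groups = group_by_stem("The rabbit jumped, and the rabbits are jumping.")
```

In this sketch "rabbit" and "rabbits" end up under one root, as do "jumped" and "jumping", while "the", "and", and "are" are dropped. A language without a tuned suffix list or stop list would leave such words scattered or cluttered, which is roughly why the non-English texts are weaker candidates.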

The interface is pretty undesigned at this point. Enter search terms and drag a text to one of the three areas at the bottom. More to come.

Note that this program is new research, and will not work on all machines. It has been tested to run on a machine of the following description:

  • 600 MHz Pentium III or faster, or a recent Mac
  • 1024 x 768 pixel screen or higher resolution, 16 bit color
  • Windows NT, Windows 2000, Windows XP operating systems, or Mac OS X
  • 256 MB of RAM
  • A fast internet connection
  • No other memory-intensive programs running
  • Microsoft Internet Explorer (5 or later) or Netscape (6.2 or later)

TextArc will often stop working the second time it is run in the same browser session. This is caused by the browser remembering parts of the program after it should have exited. The simplest way to run it multiple times is to close all browser windows, then come back and run it again. If you are running Netscape, you can clear the browser's memory manually by opening the Java Console (under the Tasks:Tools menu, available in the menu bar at the top of the program) and pressing "x" on the keyboard, so that it "clears the classloader cache." If you do this after each session with TextArc, you should not have to close all browser windows. (If some brilliant Java/browser hacker knows how to get Netscape and MSIE to clear the classloader cache under program control, PLEASE contact b r a d @ d i d i . c o m !)

Note: TextArc will open a window that covers your primary monitor's desktop. If you prefer to stay in a familiar window, uncheck the checkbox labelled "Start TextArc covering full screen" at the bottom of the LibraryList interface, or just look at the still screen shots.


Click here to run the TextArc Library

(Or click this one if you're on a Macintosh
and using something besides Microsoft Internet Explorer)