I got permission to take four L.M. Montgomery texts from A Celebration of Women Writers (Rainbow Valley, Kilmeny of the Orchard, The Story Girl, and Further Chronicles of Avonlea) and submit them to Project Gutenberg. I’ll have to reformat them (they’re in HTML), of course, but that’s not a big problem. First I need to finish reformatting The Ball and the Cross and submit it. Then I’ll do these Montgomery texts, and finally I’ll finish the Icelandic primer. It’s tempting to do a lot more of this sort of work: there are plenty of other etexts out there, getting permission is easier than I thought it would be, and reformatting texts is certainly much easier than scanning or typing them in. But I really ought to be preparing for my mission instead. After these etext projects I’ll be done for the summer.
Oh, today Jim Tinsley of Project Gutenberg e-mailed me some output from different OCR packages (Abbyy, DocMorph, and gocr). The difference is amazing. Abbyy is incredible. Almost perfect, in fact. DocMorph didn’t do so well on the image he used, but when I used it, it was almost flawless. There’s no hope for gocr. If only I could get Clara OCR to work so I could see how it compares…
Got the four Montgomery books from the library. I’ll need to compare them to the existing etexts to make sure they’re the same editions (that’s part of copyright clearance). And I need to either scan in the title page and verso or else photocopy them. I prefer scanning, since it takes less time.