Mormon Digitization Project, resurrected

I’m resurrecting the Mormon Digitization Project, which I blogged about nine months ago and then abandoned while I went and got married. (I feel justified. ;))

Project page: Mormon Digitization Project

Brief recap: the goal is to find pre-1923 Mormon books (out of copyright), scan them, OCR them, clean up the OCRed text, and release the plain text files on Project Gutenberg (along with ePub editions, possibly PDFs, and possibly Lulu editions as well).

I’m starting with John A. Widtsoe’s book Joseph Smith As Scientist and will go from there. If you have any suggestions/requests, leave them in the comments (or email them to me). If I get enough people helping out, we’ll be able to tackle a few books at a time.

Process-wise, I’m thinking about trying Bite-Size Edits for at least part of the cleanup. There’s also a remote possibility I’ll use PGDP (Distributed Proofreaders), but I really, really don’t like their interface. Right now I’m planning to track things using email and a Google Spreadsheet. (If I had more time I’d write a web app to manage it all for me, but Beyond is getting the bulk of my coding time.)

Yes, this will be kind of similar to the Mormon Documentation Project, but they don’t seem to be working on the kinds of books we’ll be doing. (I did use their text for the Standard Works web app and for this D&C reader’s edition I’m still working on, though. Good stuff.)

Want to help out? Email me (ben dot crowder at gmail) and I’ll add you to the list.

Comments

Kathy
Mar 8, 2010 at 1:50 pm

Will either of the cleanup-helper options you suggested allow displaying the scanned image that corresponds with the OCR’d text, to help people discern what text was originally intended?

Ben
Mar 8, 2010 at 3:17 pm

PGDP does, yes. Bite-Size Edits doesn’t, which is why if I do use it, it’ll be for a later stage of cleanup (a second round).

For now, I think I’ll be farming out a page per person at a time, sending them the page image and the page text. They can then open up the image in an image viewer and open up the text in a text editor. Simple.

Hugh
Mar 23, 2010 at 1:41 pm

Hey Ben,

We’ve been talking to the Internet Archive about matching Bite-Size snippets with the corresponding images from their scanned texts, but we haven’t made any progress on that yet. We’d still love to do it, though.

Ben
Mar 23, 2010 at 1:57 pm

Ooh, that’d be awesome. Seriously awesome. (And I love the bite-size idea as opposed to full pages — smaller chunks are so much more doable.)