Ancient Greek OCR — Blog

This is cool:

“Ancient Greek OCR is free software to accurately convert scans of printed Ancient Greek into unicode text and PDF files, which can be easily searched, copied, archived, and transformed. It uses the excellent Tesseract OCR engine, tailored for Ancient Greek typography, syntax and vocabulary.”

I haven’t used Tesseract in 10+ years, and back then it wasn’t very good. According to their website, however: “Between 1995 and 2006 it had little work done on it, but since then it has been improved extensively by Google.” That’s encouraging. (I wonder if that’s what they’re using behind the scenes for Google Books, Google Drive, and their other products.)
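
For anyone who wants to try it, here is a rough sketch of what driving Tesseract from Python might look like. I haven't run this against the Ancient Greek OCR training data myself, so treat it as an assumption-laden starting point: it assumes the `tesseract` command-line tool is installed, that Ancient Greek language data is available under the code `grc`, and the helper names (`build_tesseract_command`, `ocr_scan`) are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image, output_base, lang="grc"):
    """Build a tesseract CLI call that writes both text and PDF output.

    Assumption: "grc" is the installed language code for Ancient Greek.
    "txt" and "pdf" are tesseract's stock output configs.
    """
    return ["tesseract", str(image), str(output_base), "-l", lang, "txt", "pdf"]


def ocr_scan(image, output_base, lang="grc"):
    """Run tesseract on one scanned page and return the output paths."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")
    # check=True raises CalledProcessError if tesseract exits with an error.
    subprocess.run(build_tesseract_command(image, output_base, lang), check=True)
    # tesseract appends the extensions to the output base name itself.
    return Path(f"{output_base}.txt"), Path(f"{output_base}.pdf")
```

Calling `ocr_scan("page.png", "page")` (a hypothetical file name) should leave `page.txt` and `page.pdf` next to the scan, which covers the "unicode text and PDF files" part of the description above.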