Ben Crowder / Languages

An Icelandic Primer

Henry Sweet

The making of the ebook

I found the image files for this book at Sean Crist’s site (which has many other excellent resources, I might add) and downloaded them all. I then uploaded them one at a time to DocMorph to OCR them. The English text came out fairly well, but the Old Icelandic parts were unrecognizable.

For a while I used the ctrl-v u sequence to enter the Unicode characters (ctrl-v u 00f0 to put in an eth, for example), but then I found that Vim’s digraph support for Unicode was much nicer. (Instead of typing ctrl-v u 00f0, I could just type ctrl-k d -.) That saved a lot of time.
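Since eth and thorn are easy to mix up, it can help to check which code point belongs to which letter before typing a long run of them. The following is a minimal sketch using Python's standard unicodedata module; the helper name describe is hypothetical and not part of the original workflow.

```python
import unicodedata

# Code points as they would be typed after ctrl-v u in Vim.
ETH = chr(0x00F0)    # small eth
THORN = chr(0x00FE)  # small thorn


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character (hypothetical helper)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    for ch in (ETH, THORN):
        print(ch, describe(ch))
```

Running it prints each letter next to its code point and official Unicode name, which makes a quick cross-check against the special characters table.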

After cleaning the text files up and comparing them against the images, I collated them into one big text file and ran a proofing check, making sure all the paradigm tables were the same. Then I converted the text to HTML, using PHP to automate parts (the paradigm tables and the glossary, mainly). Finally, I proofed the entire etext against the images. After I finished the HTML version, I converted the etext to Omega (a Unicode-aware extension of TeX, used here with LaTeX) for output to a nice PDF version.
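The original automation was done in PHP and its scripts are not shown here, so the following is only a rough sketch, in Python, of how a paradigm table might be turned into HTML. The function name rows_to_html_table, the list-of-rows input format, and the sample rows are all assumptions made for illustration; they are not taken from the original scripts or from the book's source files.

```python
from html import escape


def rows_to_html_table(rows):
    """Render rows (first row is the header) as an HTML table string.

    Hypothetical helper: the input format is an assumption, not the
    format used in the original PHP conversion.
    """
    header, *body = rows
    lines = ["<table>"]
    lines.append(
        "  <tr>" + "".join(f"<th>{escape(c)}</th>" for c in header) + "</tr>"
    )
    for row in body:
        lines.append(
            "  <tr>" + "".join(f"<td>{escape(c)}</td>" for c in row) + "</tr>"
        )
    lines.append("</table>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative sample rows only.
    sample = [
        ["Case", "Singular"],
        ["Nom.", "armr"],
        ["Acc.", "arm"],
    ]
    print(rows_to_html_table(sample))
```

Escaping each cell with html.escape keeps characters such as < and & from breaking the markup, while letters like þ and ð pass through unchanged as long as the page is served as UTF-8.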

Special characters

Character Description
¯ macron (U+00AF)
´ acute accent (U+00B4)
̄ [E.g. œ̄] combining macron (U+0304)
̈ [E.g. ǫ̈] combining diaeresis (U+0308)
þ small thorn (U+00FE)
Þ capital thorn (U+00DE)
ð small eth (U+00F0)
Ð capital eth (U+00D0)
æ small ae (U+00E6)
ǣ small ae with macron (U+01E3)
œ small oe (U+0153)
ā small a with macron (U+0101)
ē small e with macron (U+0113)
ī small i with macron (U+012B)
ō small o with macron (U+014D)
ū small u with macron (U+016B)
ȳ small y with macron (U+0233)
Ā capital a with macron (U+0100)
Ē capital e with macron (U+0112)
Ī capital i with macron (U+012A)
Ō capital o with macron (U+014C)
Ū capital u with macron (U+016A)
ę small e with ogonek (U+0119)
ǫ small o with ogonek (U+01EB)
ø small o with stroke (U+00F8)
ö small o with diaeresis (U+00F6)
é small e with acute accent (U+00E9)
§ section sign (U+00A7)

Note that the combining diaeresis and macron may not display correctly in every font (they should appear over the characters they follow).
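One reason the combining marks are needed at all is that some letter and accent pairs have no single precomposed code point. The sketch below, which assumes only Python's standard unicodedata module, illustrates the difference; the helper name nfc_length is hypothetical.

```python
import unicodedata


def nfc_length(text: str) -> int:
    """Number of code points left after NFC normalization (hypothetical helper)."""
    return len(unicodedata.normalize("NFC", text))


# a + combining macron has a precomposed form (U+0101), so NFC merges the pair.
a_macron = "a\u0304"

# oe + combining macron and o-ogonek + combining diaeresis have no precomposed
# form, so the combining mark stays a separate code point.
oe_macron = "\u0153\u0304"
o_ogonek_diaeresis = "\u01eb\u0308"

if __name__ == "__main__":
    for label, text in [
        ("a + macron", a_macron),
        ("oe + macron", oe_macron),
        ("o-ogonek + diaeresis", o_ogonek_diaeresis),
    ]:
        print(label, len(text), "->", nfc_length(text))
```

Sequences that stay at two code points depend on the font and renderer to stack the mark over the preceding letter, which is where display problems tend to appear.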
