Ben Crowder / Blog

Blog: #languages


On digital Greek and Latin texts

A good blog post by Gregory Crane (editor-in-chief of the Perseus Digital Library at Tufts) back in February about the Digital Loeb Classical Library and the digitization of Greek and Latin texts:

We need transcriptions of public domain print editions to provide a starting point for work. These editions do not have to be the most up-to-date and they do not even have to be error free (99% may be good enough rather than 99.95%). If the community has the ability to correct and augment and to add features such as are described above and to receive recognition for that work, then the editions will evolve rapidly and outperform closed editions. If no community emerges to improve the editions, then the edition is good enough for current purposes. This model moves away from treating the community as a set of consumers and towards viewing members of the community as citizens with an obligation to contribute as well as to use.

The post has links to some fascinating projects I didn’t know about, like the Open Philology Project and the Homer Multitext Project.

Reply via email or office hours

Ancient Greek OCR

This is cool:

Ancient Greek OCR is free software to accurately convert scans of printed Ancient Greek into unicode text and PDF files, which can be easily searched, copied, archived, and transformed. It uses the excellent Tesseract OCR engine, tailored for Ancient Greek typography, syntax and vocabulary.

I haven’t used Tesseract in 10+ years, but back then it wasn’t too great. According to their website, however: “Between 1995 and 2006 it had little work done on it, but since then it has been improved extensively by Google.” That’s encouraging. (I wonder if that’s what they’re using behind the scenes for Google Books and Google Drive and their other things.)


Hebrew similar characters chart

A simple chart to hopefully make it easier for beginners to tell Hebrew characters apart (since several look a lot alike).


Even more etymologies


First off, Waelisc (“Welsh”) is the Anglo-Saxon word for those pesky Roman and British foreigners. Wales (the country) has the same root, meaning “the foreigners’ land.”

As it turns out, the “wal” in walnut is from that same root, making it the “foreign nut.” Or the Roman nut, more specifically. Down in Rome, they used the generic word for nut (nux) to refer to the walnut. (Other nuts got qualifiers — nux amara meant bitter almond, for example.)

Another reason for adding the wal- prefix was to distinguish the foreign walnut from England’s native hazelnut.


According to the OED, otter (the animal) is “a suffixed form of the Indo-European base of water.” The sound similarity is not just a coincidence.


Bread is a Germanic word that originally meant “bit, piece, or morsel.” Loaf (Old English hlaf) was originally the word for bread, but over time loaf took on its modern meaning (“a portion of bread baked in one mass”), and bread came to mean the food itself rather than just a piece of it.


Sauce comes from the French word sauce, which comes from the Latin word salsa, “salted.” (Yes, this is the salsa of chips and salsa.) And salsa comes from the Latin sal, “salt.”

Our word salad also comes from sal, via the Latin infinitive salare (“to salt”) and then the past participle salata (“having been salted”), through Old French salade.

Salami also comes from Latin salare. The definition: “An Italian variety of sausage, highly salted and flavoured.”

And sausage itself is yet another descendant of this prolific root word. It comes from Old Northern French saussiche, from Latin salsicia, from salsus (also “salted,” same word as salsa but with a different ending).

There’s more, and this word nowadays has nothing to do with food (beyond putting it on the table). Salary comes from Latin salarium, “originally money allowed to Roman soldiers for the purchase of salt.”


Cabinet originally meant a “little cabin” or “small room,” which evolved into the sense of a place to store things. “Small room” also meant “private room,” as in a place for advisors to discuss matters, and then the meaning shifted to its current political sense of the group of advisors themselves.


Biscuit comes from Latin biscoctum, meaning “twice baked.” (The “coctum” part is the perfect passive participle of Latin coquo, “to cook.” And yes, that’s where our word “cook” comes from.)

Incidentally, from the 1500s to the 1700s biscuit was spelled “bisket” in English, but apparently the French spelling was more alluring and eventually took over.


Cloud comes from clod. No, really. Someone looked up and thought the clouds looked an awful lot like rocks in the sky, and started calling them clods. A vowel shift later and you have our modern “cloud.”


Ogham alphabet chart

Apparently I’m in a language-chart-making mood. This time, though, the nerdiness quotient jumps dramatically, with an Ogham alphabet chart. Ogham is a medieval alphabet used for Primitive Irish and Old Irish and a few other languages. Very obscure, but also very cool, as you can see:

It can be written both vertically and horizontally. The red letters are the transliteration (according to manuscript tradition), the grey letters in brackets are the pronunciation, and the italicized words are the names of the letters. Some of the forfeda (the last group) changed meanings over the course of time, so I’ve included both. (I haven’t included pronunciations for the forfeda, though, mostly because none of my source materials did and I didn’t want to assign incorrect values.)
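For anyone who wants to poke at the letters outside the chart, Ogham also has its own Unicode block. What follows is a minimal Python sketch, not something taken from the chart: it assumes the letters occupy U+1681 through U+169A (with the forfeda as the last six) and simply prints each character next to its Unicode name. The helper name `ogham_letters` is made up for illustration, and the Unicode names use modern spellings that may not match the letter names on the chart.

```python
import unicodedata


def ogham_letters():
    """Return (character, Unicode name) pairs for the Ogham letters."""
    # Assumed range: U+1681 (BEITH) through U+169A (PEITH), inclusive.
    return [(chr(cp), unicodedata.name(chr(cp))) for cp in range(0x1681, 0x169B)]


if __name__ == "__main__":
    for char, name in ogham_letters():
        print(char, name)
```

Whether the characters actually render depends on having a font with Ogham glyphs installed.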


Greek alphabet chart

Per Dan Hanks’ request, here’s a Greek alphabet chart (in PDF):

(Classical Greek, that is, not modern Greek.)


Welsh mutations chart

Continuing with the language chart nerdiness, here’s a chart of Welsh mutations (in Welsh, the initial consonant of a word can change based on what comes before it):

Thanks to Kjerste Christensen for feedback on the chart.
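To make the idea of an initial consonant changing a bit more concrete, here is a rough Python sketch of just one of the mutations, the soft mutation. It is a simplification under stated assumptions: the function name `soft_mutate` is invented for illustration, it only looks at the first letter (or digraph) of a lowercase word, and it ignores the nasal and aspirate mutations as well as all the grammar that decides when a mutation actually applies.

```python
# Soft mutation only: which initial consonants change, and to what.
# "g" simply drops away, so it maps to the empty string.
SOFT = {
    "p": "b", "t": "d", "c": "g",
    "b": "f", "d": "dd", "g": "",
    "m": "f", "ll": "l", "rh": "r",
}

# Digraphs that count as single letters and do not change under soft mutation.
UNCHANGED_DIGRAPHS = ("ch", "dd", "ff", "ng", "ph", "th")


def soft_mutate(word: str) -> str:
    """Apply the soft mutation to the start of a (lowercase) Welsh word."""
    lower = word.lower()
    # Check digraphs first so that "ch" is not mistaken for a mutable "c".
    if lower.startswith(UNCHANGED_DIGRAPHS):
        return word
    for initial in ("ll", "rh"):
        if lower.startswith(initial):
            return SOFT[initial] + word[len(initial):]
    first = lower[:1]
    if first in SOFT:
        return SOFT[first] + word[1:]
    return word  # vowels and other consonants are left alone


if __name__ == "__main__":
    for w in ["cath", "pen", "tad", "gardd", "llyfr", "chwaer"]:
        print(w, "->", soft_mutate(w))
```

So cath (“cat”) becomes gath, as in ei gath (“his cat”), while chwaer is left untouched.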


More etymologies

Time for some more etymological fun.


Algorithm comes to us via Old French augorisme, from the medieval Latin algorismus. (The Spanish word guarismo “digit, cipher” is also related.)

And medieval Latin got it from the name of the Persian mathematician Muḥammad ibn Mūsā al-Khwārizmī, who gave us Arabic numerals and algebra (which comes from al-jabr, from jabara, “to reunite, to restore,” and we got it via the Italian word algèbra).

I should also point out that the ibn in al-Khwārizmī’s name, which means “son,” is related to the Hebrew word ben, whence I get my name — Benjamin means “son of the right hand.”


Nowadays maudlin means something is shallow and sentimental, but originally it meant “given to tears.” Not too hard to see how it got there. The interesting thing, though, is that it came from Magdalene (via some Middle English variants, whence the spelling and pronunciation difference), and the OED says it was “in allusion to depictions of Mary Magdalene weeping.”


Wardrobe comes from the Old French warderobe, a northeastern variant of garderobe. And that meant a locked-up chamber that guards your robes, basically. Which makes sense.


Sur- “above” comes from the Latin super, which also means “above.”

Name is an old word that’s cognate in most of the Indo-European languages (seriously, it’s everywhere: namo in Gothic and Old Saxon, nama in Old Frisian, nōmen in Latin, ὄνομα in Greek, ainm in Old Irish, etc.).

Put them together, and you get surname, which means “additional name” — something added to your first name, whether it be a name (occupational, locational, patronymic, what have you) or a title or epithet, as was more common back in the day (Richard the Lionheart, Alexander the Great, etc.).


Title etymologies

I thought it’d be fun to go through some titles and look at the histories behind the words (since I seem to be on an etymology kick lately). We’ll start with royalty and then do the Mr./Mrs./etc. group.


King is a common Germanic word, from the Old English form cyning. Not much is known beyond that, other than that it’s somehow related to kin “race, tribe” (Old English cynn), so a king was, by extension, a “leader of a tribe.” That connection is dubious but it’s the main theory right now.

Oh, and king is of course related to modern German König.


Queen is another Germanic word, seen in Old English as cwen along with a bunch of other forms over the years (cwene, kuen, quyene, qwyn, quewne, queine, and quin being just a few, all from the Middle English period when people exhibited the most creativity in spelling I’ve ever seen).

Cwen came from Proto-Indo-European *gwen- (“woman, wife”), akin to Greek γυνή (“woman, wife”), whence we get words like gynecology and misogyny.


Lord comes from the Old English word hláford, which came from hláfweard. That’s a compound word made up of hláf (“loaf, bread”) and weard (“keeper”, related to our modern warden and, through French, guard). So it basically meant the breadwinner. That meaning extended to mean a master or ruler, and thence to mean God. (But in Old English they usually used Drihten where we would use Lord.)

Sometime during the Middle English period the word simplified from hláford to just lord.


Lady comes from the Old English word hlǣfdige, a merger of hláf (“loaf”) and *dīge (“kneader”). The latter isn’t attested elsewhere.

As I was poking around the entries on lady, I found an interesting bit about the word ladybug, by the way. Back in Old English, Hlǣfdige (“Lady”) referred to the Virgin Mary, and its genitive singular form hlæfdigan was often combined with names of other things (plants, etc.) to create “Our Lady’s [plant, etc.]” forms. And thus ladybug means “Our Lady’s bug.” In German they call the ladybug der Marienkäfer (Mary, obviously), and in the U.K. and elsewhere they refer to it as ladybird.


Duke comes from Middle English duc, which is from the Latin word dux (“leader, commander”) via French. Dux comes from duco (“to lead, draw”), and that’s where we get words like deduce, produce, conduct, and seduce (“to lead aside”), not to mention educate and the ever-awesome duct tape.


Duchess comes from French duchesse, from Latin ducissa (and thence from dux, as with duke, with the feminine -issa ending which we’ll see again in a moment with mistress).


Earl comes from the Old English word eorl (“brave man, warrior, leader, chief”). The opposite is ceorl (“churl”), which originally just meant a man without rank.


Baron comes from Early Middle English barun, from Old French barun, from late Latin baro, baronem, which originally just meant “man.”


Vizier comes from Turkish vezīr, which came from the Arabic word wazīr (وزير, originally “porter,” and from there it came to mean “one who bears the burden of government”). And wazīr came from wazara (وزر, third person singular past tense, “he carried”).

Thanks to Andrew Heiss for the Arabic script here.

It’s fascinating, by the way, how most of these titles evolved from words with comparatively low origins.


Mr. is a shortened form of master and, later, mister. The usual plural is Messrs. (since Mrs. would be confusing to say the least), from French messieurs (plural of monsieur).

Whence master? Latin magister (“master”), with some influence by French words like maître.

Magister comes from the root mag (“great”), where we get the Magna Carta and, also through French (I’m sensing a theme here), Charlemagne. Mag also gives us words like major (“greater”) and majesty.

And of course there’s a Greek equivalent: mag is related to the Greek prefix μεγα-, also meaning “great” or “big,” and that’s where we get words like megabyte and megalomania.

For the heck of it I looked up magic, by the way, and found that it’s from Latin magicus from the Greek μάγος (“member of the Median caste of priests, Magus,” where we get the three Magi). It’s originally from the Old Persian word Maguš, also referring to the Median priests. I think I need to learn more about these priests.


Mrs. is a shortened version of mistress. The plural is Mmes. (from the French mesdames, plural of madame).

Mistress came from master with the -ess suffix (which came from Latin -issa and from Greek -ισσα).


Miss is also a shortened version of mistress. When it was first used in the early 1600s, it meant “a kept woman, a concubine.” Towards the end of the 1600s it took on its current, more pleasant meaning, as a title for an unmarried woman or girl.


Ms. didn’t pop up till 1901. As you could have guessed, it’s a merger of Mrs. and Miss as a way to refer to a woman without having to specify her marital status.


Sir is a shortened form of sire, with “the shortening being due to the absence of stress before the following name or appellation.” “Sir Lancelot” rolls off the tongue more easily than “Sire Lancelot,” basically.

Sire comes from Latin senior (“older,” the comparative form of senex, “old”). We also get senile and senator from senex. I’ll let you draw your own conclusions from that.


Ma’am is shortened from madam, from the Old French madame (ma “my” + dame “lady”, similar to Italian madonna and Latin mea domina).

Speaking of domina and its masculine form dominus, they come from the root dam, dom, which means “to tame, subdue.” Domain and dominion come from this root.

With that meaning of “tame,” you’d think that domesticate was another grandchild word, but it’s actually from another root, dam, dom (yes, it looks the same), which means “to build.” That’s where we get domicile (via Latin domus “house”).


Cadets and cephaloids

I was browsing through the OED the other day and came across the entry for cadet. Turns out it originally meant a younger son or brother (particularly the youngest son). It also came to mean “a gentleman who entered the army without a commission, to learn the military profession and find a career for himself (as was regularly done by the younger sons of the French nobility before the Revolution),” and, finally, “a student in a military or naval college,” which is how it’s mainly used today. (I should add that in New Zealand it’s used to refer to a young man learning sheep-farming on a sheep-station. Not quite military but still cool.)


Cadet comes from the French cadet (surprising, I know), which comes from the Provençal word capdet, which itself is from the diminutive of the Latin caput (“head”). So it meant “little chief,” or the “inferior head of a family.”

As for the history of caput, it’s related to the Greek κεφαλή (also meaning “head”), which is where we get words like hypocephalus (as in the Book of Abraham, “under the head”) and encephalitis (ἐγκέφαλος means “brain,” with the -itis “inflammation” suffix).

You also get cool words like bicephalous (“two-headed”), cebocephalic (“monkey-headed”), cephalalgy (fancy word for “headache”), cynocephalus (“one of a fabled race of men with dogs’ heads”), ophiocephale (“serpent-headed”), and pachycephalic (“having a very thick skull,” and yes, it also means “thick-headed” and “stupid”). And pachycephalic may remind you of pachyderm (“thick-skinned”), which we use to refer to animals like elephants, rhinos, and hippos.

Getting back to caput, its Latin relatives include capillaris (“of or pertaining to the hair”), Capitolium (the Roman Capitol), praeceps (“headlong, steep,” whence we get precipice), and biceps (“two-headed” or “divided into two parts”).
