Ben Crowder / Blog

Blog: #ipython

Genealogy notebook proof of concept

Ever since seeing the IPython notebook, I’ve been thinking about how its notebook idea would be great for genealogical research. So I put together a (very rough) proof of concept:

Some notes:

  1. A text-based query interface (the queries are in purple) to get information from a database. I don’t know that the best query language would be natural language like this — it’s more just to get the idea across — but the important thing is being able to easily do these types of queries against the genealogical information you’ve stored, enabling all sorts of analysis. “Who in my tree is born more than nine months after their father died?” “Who got married when they were younger than 12 years old?” “Who are all the children who died when they were younger than eight years old?” And so on.
  2. Interleaving queries, their results, and other text (using Markdown, of course). This is the core notebook idea. Or you can look at it as an annotated transcript of a query session. Rather than just having a list of queries and their results, you sort of embed them in between your writing about the research. (Similar to literate programming.) For me, writing things out helps me to see what I think and wrap my head around the research.
  3. There’s a lot of room for data visualization here. I’ve only shown pedigree charts and some basic tables, but it’d be easy enough to have a query return any other type of chart — bar, pie, fan, you name it. Including interactive things like the family analysis tool.
  4. I didn’t include this in the mockup (because I, uh, just barely thought of it), but you could extend the query language to support hypotheticals. “Show me what William Crowder’s family would look like if he and Sarah got married in 1820” or “From now on, pretend Samuel Shinn was born in 1860.” And then following queries would act as if that were true. It’s nice to be able to establish a few suppositions and then see what the ramifications would be, whether they’re plausible, etc. Or to compare two different hypotheses.
  5. Which brings us to the purpose of all this: to have a good place to do the rough, messy side of genealogical research. Thinking through things, seeing what comes of it, and keeping a record of the journey, basically.
  6. Since the information in the database would be subject to change (as you do more research), it would probably be good to cache the results. Or maybe highlight the queries whose results have changed when you reload a notebook. Personally, I lean more towards caching, so that the notebook accurately represents the database at the time it was written (making it a valuable historical record of your research), but you could also look at it as a living document that updates automatically when the background data is updated.

Anyway, the proof of concept is just a static HTML page. I’m not planning to do anything more with it, but I wanted to get the idea out there.


Reply via email or office hours