
Blogging is low on the priority list at the moment, thanks to school. The preliminary classes for the master’s degree are going well. I’m writing assembly for my computer systems class, and I have to say, I really like assembly. (No sarcasm.) It’s beautiful and simple in a way I didn’t expect. I don’t see myself using it much, but it’s a good tool for the belt.

As for Press, I realized a few days ago that it’s a good candidate for the first implementation of Low Ink (a JSON-based page description language that compiles to PDF), so I’ll be re-architecting that part of Press to use Low Ink. I’m also hoping to finish the text part of Press (HarfBuzz, etc.) soon so that it’s usable for more than just basic drawing. (I’m dealing with font subsetting and encoding at the moment.)

After a break of several months, I’m getting back to working on Press. Status is pretty much the same as last time I posted about it. (It’s actually even a little more behind than that, since I had HarfBuzz Python bindings working then, but now — after upgrading to macOS Sierra — I’m running into issues with PyGObject’s introspection module. I may end up having to write my own HarfBuzz bindings with CFFI. We’ll see.)

The high-level roadmap right now: get font embedding to work correctly, add support for embedding images (which should be fairly easy, I think), integrate ICU for language analysis and HarfBuzz for shaping, and add color space support.

As of now, I plan to use Press for making language charts (which I’ve been using PlotDevice for) and picture books. Once it’s to the point where I can do that, I’ll start on Ink (a low-level typesetting engine intended for books, plus a higher-level rule-based engine that makes the low-level one easier to work with).

Making PDFs by hand

I’ve been hand-coding PDFs in Vim, reading the PDF spec to learn how things work. It’s fascinating. My first, extremely simple PDF:

%PDF-1.4
1 0 obj << /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj << /Type /Pages /Kids [3 0 R] /Count 1 >>
endobj
3 0 obj << /Type /Page /Parent 2 0 R /Resources 4 0 R /MediaBox [0 0 500 800] /Contents 6 0 R >>
endobj
4 0 obj << /Font << /F1 5 0 R >> >>
endobj
5 0 obj << /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>
endobj
6 0 obj
<< /Length 44 >>
stream
BT /F1 24 Tf 175 720 Td (Hello World!) Tj ET
endstream
endobj
xref
0 7
0000000000 65535 f
0000000010 00000 n
0000000059 00000 n
0000000116 00000 n
0000000220 00000 n
0000000263 00000 n
0000000333 00000 n
trailer << /Size 7 /Root 1 0 R >>
startxref
427
%%EOF

It’s not as bad as it looks, I promise. (I’m doing PDF 1.4 because CreateSpace doesn’t seem to support higher versions of the spec.)
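
The fiddly parts of a hand-written file like this are the byte offsets in the xref table and the /Length of the stream, since both change whenever anything above them is edited. Here is a minimal sketch of how Python could compute them instead. The names `build_pdf` and `stream_object` are made up for illustration, and the code assumes every object is generation 0 and numbered in order starting at 1.

```python
def stream_object(data):
    """Wrap raw stream bytes in a dictionary whose /Length matches them."""
    return b"<< /Length %d >>\nstream\n" % len(data) + data + b"\nendstream"


def build_pdf(objects):
    """Serialize object bodies and write an xref table from real offsets.

    `objects` is a list of bytes; object numbers are assigned 1..n.
    """
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" begins
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    xref_pos = len(out)
    size = len(objects) + 1  # +1 for the free entry for object 0
    out += b"xref\n0 %d\n" % size
    out += b"0000000000 65535 f \n"  # every xref entry is exactly 20 bytes
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % size
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


content = b"BT /F1 24 Tf 175 720 Td (Hello World!) Tj ET"
pdf = build_pdf([
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    b"<< /Type /Page /Parent 2 0 R /Resources 4 0 R"
    b" /MediaBox [0 0 500 800] /Contents 6 0 R >>",
    b"<< /Font << /F1 5 0 R >> >>",
    b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    stream_object(content),
])
```

Writing `pdf` to a file opened in `'wb'` mode should give a file equivalent to the hand-typed one, with the offsets recalculated on every run.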

Anyway, I’ve been reading through chapter 5 of the spec, learning how text works in PDF. I’ve learned how to modify character spacing with Tc, word spacing with Tw, leading with TL, and individual glyph positions with TJ (I’m not sure yet whether I can change vertical positioning). I’ve also learned how to change the text color. It’s all been fairly straightforward.
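
To make those operators concrete, here is a rough sketch that assembles a text object as a string. The helper names (`esc`, `text_block`, `tj_array`) and the default values are invented for illustration. For what it’s worth, the same chapter of the spec also lists a text rise operator, Ts, for vertical offsets; TJ adjustments act along the writing direction and are given in thousandths of a text-space unit, with positive numbers pulling the next glyph back.

```python
def esc(s):
    """Escape backslashes and parentheses for a PDF literal string."""
    return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def text_block(lines, x, y, font="F1", size=24, leading=30,
               char_space=0, word_space=0, rgb=(0, 0, 0)):
    """Build one BT...ET text object that sets several lines with T*."""
    ops = [
        "BT",
        "/%s %g Tf" % (font, size),  # select font resource and size
        "%g %g %g rg" % rgb,         # fill color, used for text by default
        "%g Tc" % char_space,        # extra space after every glyph
        "%g Tw" % word_space,        # extra space after every space character
        "%g TL" % leading,           # distance T* moves down per line
        "%g %g Td" % (x, y),         # position of the first baseline
    ]
    for i, line in enumerate(lines):
        if i:
            ops.append("T*")         # move to the start of the next line
        ops.append("(%s) Tj" % esc(line))
    ops.append("ET")
    return "\n".join(ops)


def tj_array(parts):
    """Format a TJ operand: strings interleaved with numeric adjustments."""
    items = ["(%s)" % esc(p) if isinstance(p, str) else "%g" % p
             for p in parts]
    return "[%s] TJ" % " ".join(items)
```

For example, `tj_array(["A", 120, "W"])` produces `[(A) 120 (W)] TJ`, which tightens the gap between the two letters by 120 thousandths of the font size.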

As part of this, I’ve used Hex Fiend (an OS X hex editor) to pry apart some simple PDFs I made with PlotDevice, to see how things were encoded. The streams themselves are generally compressed with Flate (opposite of deflate, har har; it’s really the same zlib/deflate format, applied through the FlateDecode filter), and I found this script to easily decode the streams:

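#!/usr/bin/env python

# Inflate a raw Flate (zlib) stream that was copied out of a PDF.
# Arguments: path to the compressed input, path for the decoded output.

import sys
import zlib

in_path = sys.argv[1]
out_path = sys.argv[2]

# Read the compressed stream as raw bytes.
with open(in_path, 'rb') as f:
    compressed = f.read()

decoded = zlib.decompress(compressed)

# zlib returns bytes, so the output file must be opened in binary mode.
with open(out_path, 'wb') as f:
    f.write(decoded)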

I copied each stream in hex from Hex Fiend, pasted it into a file, and ran the Python script on it to write the decoded text to a new file.
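
The copy-and-paste step could probably be skipped by pulling the streams straight out of the PDF file. The following is only a sketch: `STREAM_RE` and `inflate_streams` are names invented here, and the regular expression is a shortcut that assumes the bytes `endstream` never occur inside the stream data. A real parser would read each stream dictionary’s /Length and /Filter entries instead.

```python
import re
import zlib

# Everything between the "stream" keyword's line ending and "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf_bytes):
    """Return the stream bodies found in raw PDF bytes, inflated if possible."""
    decoded = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj ignores bytes after the end of the zlib data,
            # such as the line ending that precedes "endstream".
            decoded.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            # Not zlib data (uncompressed, or another filter): keep as-is.
            decoded.append(raw)
    return decoded
```

Calling `inflate_streams(open(path, 'rb').read())` on a PDF should then return one bytes object per stream, in file order.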

Things I don’t know/understand yet, which are legion:

  • How to encode Unicode (I haven’t reached this part of the spec yet, but I believe it involves CID fonts and using cmaps to map glyph codes, or something like that).
  • How to take a font name and, in a cross-platform way, get the path to the font file so I can embed it and also use it with HarfBuzz.
  • How to take the output of HarfBuzz (a list of glyphs with position coordinates for each) and use that in positioning the glyphs in the PDF. I believe HarfBuzz will handle parsing the OpenType features of the font, but I’m not positive on that. I did get HarfBuzz Python bindings working, though, and I plan to play around with it soon.
  • Whether I need to use FreeType at all. I might need it for font metrics, but HarfBuzz might give me everything I need there.
  • Whether, when typesetting multiple lines, it’s best to use PDF’s built-in support (T* and TL and such) or to set each line manually as its own text object. The built-in support seems better, though I don’t know if that limits what’s possible.
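
On the glyph positioning question, one possible approach is to compare each shaped advance with the default width the PDF font declares and put the difference into a TJ array. The sketch below works under several assumptions: the input is a hypothetical list of (glyph id, x advance) pairs in font units rather than a real HarfBuzz binding’s output, the font is embedded so that two-byte codes map directly to glyph ids (for example a composite font with Identity-H encoding), and x/y offsets are ignored. `glyphs_to_tj` is an invented name.

```python
def glyphs_to_tj(glyphs, widths, units_per_em=1000):
    """Turn (glyph_id, x_advance) pairs into a TJ operator string.

    glyphs: hypothetical shaper output, advances in font units.
    widths: default advance per glyph id (font units), as the PDF font
            dictionary would declare it.
    """
    scale = 1000.0 / units_per_em  # TJ numbers are thousandths of an em
    parts = []
    run = ""  # hex codes of consecutive glyphs that need no adjustment
    for gid, advance in glyphs:
        run += "%04X" % gid
        # Positive TJ values move the next glyph back, so a shaped advance
        # smaller than the default width yields a positive adjustment.
        adjust = round((widths[gid] - advance) * scale)
        if adjust:
            parts.append("<%s>" % run)
            parts.append("%d" % adjust)
            run = ""
    if run:
        parts.append("<%s>" % run)
    return "[%s] TJ" % " ".join(parts)
```

With `widths = {36: 667, 58: 900, 72: 500}` and shaped advances of 600, 900, and 500, the result is `[<0024> 67 <003A0048>] TJ`: only the first glyph was kerned, so only it gets an adjustment.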

At some point soon — I think when I start embedding fonts — doing this by hand in Vim will stop being as feasible, and at that point I’ll start writing Python to manage the PDF creation process for me. For now, though, it’s easier to just edit the PDF manually.