
Blog: #ink

Update on Press (the PDF compiler). I haven’t worked on it at all lately, but I wanted to document its current state for history’s sake, and as part of working in public. (I’ve also been sitting on this post for over a year.)

Back in 2017 I did end up re-architecting Press to use Low Ink as an intermediate page description language. In the process, Low Ink changed from a JSON-based idea to this:

:page 11x8.5in
:bleedbox x=0.125in y=0.125in w=5.75in h=8.75in
:fontmap family=helv weight=regular style=normal standard=Helvetica
:yinvert
:push
:translate x=72 y=72

# ascender
:push
:translate x=0 y=1040
:strokecolor hex=#999
:linewidth 0.25pt
:line x1=0 y1=0 x2=1080 y2=0
:stroke
:push
:fillcolor hex=#999
:font family=helv size=14pt
:text x=1085 y=-3 text="ascender"
:pop
:pop

# filled glyph
:push
:translate x=1320 y=240
:fillcolor hex=#000
:moveto x=0 y=0
:pathto x=400 y=300 cx1=120 cy1=300 cx2=140 cy2=300
:pathto x=320 y=200 cx1=540 cy1=300 cx2=320 cy2=180
:lineto x=350 y=350
:lineto x=450 y=250
:lineto x=150 y=0
:moveto x=200 y=200
:lineto x=200 y=250
:lineto x=250 y=250
:lineto x=250 y=200
:lineto x=200 y=200
:fill
:pop

It was intended to be a fairly low-level wrapper on the PDF format, with the idea being that other libraries/apps would provide more ergonomic abstractions on top of it.
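To show what “fairly low-level wrapper” means in practice, here is a rough sketch of how a handful of Low Ink commands could map onto PDF content-stream operators: q/Q for push/pop, cm for translate, RG/rg for colors, m/l/c for paths, S/f for painting. This is not Press’s actual code. The function names are hypothetical, bare numbers are assumed to be points, `cx1`/`cy1` is assumed to be the first Bézier control point, and text, fonts, and page setup are left out entirely.

```python
import shlex


def _num(text):
    """Parse '72' or '0.25pt' as points (this sketch ignores other units)."""
    return float(text[:-2]) if text.endswith("pt") else float(text)


def _fmt(value):
    """Format numbers compactly for PDF: 72.0 -> '72', 0.6 -> '0.6'."""
    return f"{value:.4f}".rstrip("0").rstrip(".")


def _rgb(hex_value):
    """Turn '#999' or '#999999' into three 0..1 components, as RG/rg expect."""
    digits = hex_value.lstrip("#")
    if len(digits) == 3:
        digits = "".join(ch * 2 for ch in digits)
    return " ".join(_fmt(int(digits[i:i + 2], 16) / 255) for i in (0, 2, 4))


def low_ink_to_pdf(source):
    """Translate a subset of Low Ink into PDF content-stream operators."""
    ops = []
    for raw in source.splitlines():
        line = raw.strip()
        if not line.startswith(":"):
            continue  # skip blank lines and '#' comments
        cmd, *rest = shlex.split(line[1:])
        kw = dict(part.split("=", 1) for part in rest if "=" in part)
        pos = [part for part in rest if "=" not in part]

        def n(key):
            return _fmt(_num(kw[key]))

        if cmd == "push":
            ops.append("q")  # save graphics state
        elif cmd == "pop":
            ops.append("Q")  # restore graphics state
        elif cmd == "translate":
            ops.append(f"1 0 0 1 {n('x')} {n('y')} cm")
        elif cmd == "linewidth":
            ops.append(f"{_fmt(_num(pos[0]))} w")
        elif cmd == "strokecolor":
            ops.append(f"{_rgb(kw['hex'])} RG")
        elif cmd == "fillcolor":
            ops.append(f"{_rgb(kw['hex'])} rg")
        elif cmd == "line":
            ops.append(f"{n('x1')} {n('y1')} m {n('x2')} {n('y2')} l")
        elif cmd == "moveto":
            ops.append(f"{n('x')} {n('y')} m")
        elif cmd == "lineto":
            ops.append(f"{n('x')} {n('y')} l")
        elif cmd == "pathto":
            # Cubic Bezier: two control points, then the end point
            ops.append(f"{n('cx1')} {n('cy1')} {n('cx2')} {n('cy2')} {n('x')} {n('y')} c")
        elif cmd == "stroke":
            ops.append("S")
        elif cmd == "fill":
            ops.append("f")
        else:
            ops.append(f"% not handled in this sketch: {cmd}")
    return "\n".join(ops)
```

Unsupported commands come out as PDF comments (lines starting with %), so the output stays a valid content stream while making the gaps obvious.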

I initially used Python because Press started out as a library, but with the pivot to a compiler model, I think Go or Rust would probably end up being a better choice (Rust would make integrating HarfBuzz a bit easier, at any rate).

Potential improvements

To my 2021 eyes, the language design isn’t particularly elegant. I like that the parameters are named (clarity), but most commands don’t actually take many parameters, because many of the settings that would normally be parameters are separate commands. Where a parameter’s meaning is already unambiguous, the names hurt readability more than they help. For example, I think something like this might be better:

:line 0,0 to 1080,0
:fillcolor #345

I’ve also thought that push and pop could potentially be clearer as curly braces, and that the initial colons aren’t really necessary:

{
  translate 0,1040
  strokecolor #999
  linewidth 0.25pt
  line 0,0 to 1080,0
  stroke

  {
    fillcolor #999
    font 14pt helv
    text 1085,-3 "ascender"
  }
}
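Since the braces just stand in for push and pop, the proposed syntax could be lowered to the current one mechanically. A minimal sketch, using a hypothetical function name and assuming each brace sits on its own line as in the example above. It only handles the structure and leaves the positional arguments untouched:

```python
def braces_to_low_ink(source):
    """Lower the proposed brace syntax to colon-prefixed Low Ink.

    Braces become :push/:pop; every other line just gets its colon back.
    Assumes each brace is on its own line.
    """
    out = []
    depth = 0
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "{":
            out.append(":push")
            depth += 1
        elif line == "}":
            if depth == 0:
                raise ValueError("unmatched '}'")
            out.append(":pop")
            depth -= 1
        else:
            out.append(":" + line)
    if depth:
        raise ValueError("unclosed '{'")
    return "\n".join(out)
```

One side benefit of braces: a parser can catch an unbalanced push/pop at compile time, which is easy to miss with separate commands.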

The future

My initial reason for building Press was to have an easy, programmable cross-platform way to create language chart PDFs (so I could move away from PlotDevice/DrawBot), and what I’ve realized (acknowledging that I haven’t really been making language charts in recent years) is that there are some other, better options now.

One that seems decent is SVG, converted to PDF by way of Inkscape. Initial tests suggest it would probably work fine.

Another promising option that I admittedly haven’t looked into very much yet is Paged.js. HTML and CSS are already great for declarative typesetting, and the more I’ve thought about programmatic typesetting, the more this model seems to be the future I’d want to work with (and not just because of parity with web, though that makes it much more compelling).

tl;dr I don’t see myself continuing on with Press, so this post may as well serve as its post-mortem.


Reply via email or office hours

Some quick thoughts about the project space I see myself working in (meaning personal coding projects that aren’t the productivity tools I mentioned before), both now and for the foreseeable future. To be honest, it’s mostly a roadmap for myself, posted here as part of working in public.

Bookmaking tools

One of the areas in the project space is bookmaking tools: tools that help with making either print books or ebooks. Here’s what I’ve worked on in that area (some of these are still in progress or yet to come):

  • Press — low-level typesetting (PDF compiler)
  • Ink — higher-level typesetting
  • Curves — programmatic type design
  • Typlate — type design templates
  • md2epub/Caxton — ebook compiler
  • epubdiff — ebook differ
  • Fledge — text processing shell
  • Storybook — writing tool (covered under the productivity tools, yes, but I feel it fits in here)

Creativity tools

The next area, somewhat related, is creativity tools: tools for making art, music, etc. I do realize that there’s a bit of overlap between the two areas — art can be used in books, for example. This is not a rigorous taxonomy.

What I’ve worked on:

  • Trill — music composition REPL
  • Grain — command-line tool for texturing art

While I haven’t done much in this area so far, the intersection of software and art has been calling to me more lately. I expect creativity tools to become much more of a focus for me, probably even more so than the bookmaking tools.

Human-Computer Interaction

Last but not least, HCI. My master’s thesis is in this area, and much of my other work also touches on it in limited ways. (What I mean by that, I think, is that with projects like Trill, Curves, and Press, the parts that have most interested me are the interfaces. Also, those interfaces have been textual in these particular cases, but I’m also interested in other kinds of UIs.) So I plan to start building more proofs of concept and interface experiments — like the spatial interface ideas I mentioned several weeks ago.



Blogging is low on the priority list at the moment, thanks to school. The preliminary classes for the master’s degree are going well. I’m writing assembly for my computer systems class, and I have to say, I really like assembly. (No sarcasm.) It’s beautiful and simple in a way I didn’t expect. I don’t see myself using it much, but it’s a good tool to have on my belt.

Oh, with Press, I realized a few days ago that it’s a good candidate for the first implementation of Low Ink (a JSON-based page description language that compiles to PDF). I’ll be re-architecting that part of Press so that it uses Low Ink. Also hoping to finish up the text part of Press (HarfBuzz, etc.) soon so that it’s usable for more than just basic drawing. (I’m dealing with font subsetting and encoding stuff at the moment.)



Rather than starting work on Ink with the low-level typesetting engine, I’m thinking it’ll be worthwhile instead to start with a processor that goes through Ink rules and outputs TeX and/or SILE code. More to come later.



Rule-based typesetting with Ink

The plan for Ink took a bit of a turn a few nights ago. Erlang’s pattern matching was on my mind (having read about it earlier that evening) when I came across a passage from Mitchell’s Book Typography on house rules:

The following are examples of the authors’ own house rules:

  • Speech to be indicated by single quotation marks (‘quote’ not “quote”)
  • Circa shortened to italic c. with no word-space (c.1895 not c. 1895)
  • Use multiplication symbol, not ‘x’ for dimensions (24 × 36 not 24 x 36)
  • Letter-space strings of capital letters (A B C D not ABCD)

The two ideas came together and I saw that declarative typesetting (rule-based typesetting) could be a much nicer way to typeset.

For example: if you want to use the multiplication symbol for dimensions (× instead of x), you usually have to edit your source file (InDesign, TeX, etc.), find matching instances, and change them. It’s a one-time thing, a permanent transformation.

If, on the other hand, you had a rule that said “find any ‘x’ characters between numbers and transform them to ‘×’”, then you could leave your source file alone and let the rule do the work for you instead. Rules like this (textual and stylistic transformations applied at compile time) seem far more reusable, more shareable, and easier to use.
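As a rough illustration of a compile-time rule, here is the multiplication-sign rule written as a Python regular expression. The names `TIMES_RULE` and `apply_rules` are made up for this sketch, and it assumes at most one space on either side of the x:

```python
import re

# Hypothetical rule: an 'x' between two digits becomes a multiplication sign.
# The optional single spaces on either side are captured and kept.
TIMES_RULE = (re.compile(r"(?<=\d)(\s?)x(\s?)(?=\d)"), r"\1×\2")


def apply_rules(text, rules):
    """Apply (pattern, replacement) rules in order; the source text is untouched."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


print(apply_rules("Trim size 6x9; poster 24 x 36.", [TIMES_RULE]))
# → Trim size 6×9; poster 24 × 36.
```

A rule this blunt would also turn something like 0x1F into 0×1F, which is exactly the kind of case where an exception is needed.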

From there, the Ink language morphed almost completely from how I was envisioning it earlier (TeX with nicer syntax, basically) to this new thing, inspired by Erlang, XSLT/XPath, CSS, Inform, and more.

A few quick notes before we get to the examples:

  • I’m leaning very much towards a template/data separation, like in Django and Mustache and other template engines popular in web frameworks.
  • For this rule-based thing to work well, you have to be able to set general rules but also fix specific cases where the rule doesn’t apply, or where it doesn’t make sense to write a general rule. At the moment I’m leaning towards having those specific fixes be rules as well, rather than tagging the source file. See the second rule listed in Exceptions below for an initial stab at this idea.
  • Ink will be three languages — High Ink (or just Ink), which is the rule-based language shown below; Medium Ink, a tagged version of the source text with all the rules applied; and Low Ink (har har), a page description language that gets compiled to PDF.
  • Splitting it up like this allows for extra flexibility — it would be relatively easy, for example, to write a compiler that takes Medium Ink and outputs HTML/CSS or EPUB or what have you. I don’t know that that would actually be a good idea, but it’s more possible this way. Splitting it up also makes it more manageable.
  • Rephrased, the Ink-to-Medium-Ink compilation involves applying the rules intelligently. Medium-Ink-to-Low-Ink compilation involves the typesetting itself — line breaks, page breaks, etc. Low-Ink-to-PDF compilation will be easiest, translating the Low Ink code to PDF code.
  • This morning I came across Jon Gold’s post on declarative design tools, with somewhat similar ideas. I like the direction he’s gone in with the combinations — it’s a nice workflow. We could do something in that vein here, with syntax to output a bunch of variations (typefaces, sizes, leading, etc.) with minimal effort.
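To make the Medium-Ink-to-Low-Ink step a bit more concrete, here is a sketch of the simplest possible version of it: greedy (first-fit) line breaking that emits Low Ink `:text` commands. This is not the Knuth-Plass algorithm TeX uses. The function names and the measuring callback are hypothetical, and it assumes y grows upward, as in PDF’s default coordinate system.

```python
def break_lines(words, max_width, measure):
    """Greedy first-fit line breaking: add words until the next one won't fit."""
    lines, current = [], []
    for word in words:
        candidate = current + [word]
        if current and measure(" ".join(candidate)) > max_width:
            lines.append(" ".join(current))
            current = [word]
        else:
            current = candidate
    if current:
        lines.append(" ".join(current))
    return lines


def to_low_ink(lines, x, y, leading):
    """Emit one Low Ink :text command per line, moving down by the leading."""
    return [f':text x={x} y={y - i * leading} text="{line}"'
            for i, line in enumerate(lines)]


def monospace_6pt(text):
    """Stand-in measurement: pretend every character is 6pt wide."""
    return 6 * len(text)


lines = break_lines("the quick brown fox jumps over".split(), 60, monospace_6pt)
for command in to_low_ink(lines, 72, 720, 13):
    print(command)
```

A real measure function would come from the font (via HarfBuzz), and a real breaker would weigh the whole paragraph rather than one line at a time. This just shows where that stage sits between the two languages.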

Examples (gist)

Note: these are all first-draft thoughts on how to do this kind of thing. The syntax is not at all set in stone: rule/endrule vs. rule { }, selector syntax, whether to use regular expressions or something simpler, etc.

rule
    size 6x9";
    # alternates
    size letter;
    size 210x297mm;
    size a4;

    font Arno Pro, 10/13pt;

    margin 1";
    inner-margin .75"; # overrides earlier margin value
endrule

# Named rule (for use later)
rule @times
    find (\d)\s*x\s*(\d) : replace \1 × \2;
endrule

rule @year-labels
    # Find "a.d." and turn on the smcp OpenType feature
    find a.d. : feature smcp;
    # alternate way
    find [b.c. | b.c.e. | c.e. | a.d.] : feature smcp;
endrule

# Exceptions
rule
    # If a.d. is found in heading style text, don't run @year-labels on it
    find a.d. (style=heading) : ignore @year-labels;

    # Find a specific "m.a.d." and don't run @year-labels on it
    # This selector language needs a lot of work
    find /chapter:4/paragraph:2/word:[m.a.d.] : ignore @year-labels;

    # Usually, though, you'd want to revise the general rule like this
    # Only run the rule if it's by itself (word boundaries) and not a heading
    find \ba\.d\.(?!\w) (style!=heading) : feature smcp;
endrule

rule @paragraph-indents
    # Indent first line of all paragraphs 1.25em;
    find %paragraph : initial-indent 1.25em;

    # Override for first paragraph of a section/chapter
    find %paragraph:nth(1) : initial-indent 0;

    # Paragraphs following tables aren't indented
    find %table + %paragraph : initial-indent 0;

    # Hanging indents for paragraphs with hanging tag
    find %paragraph.hanging : hanging-indent .125";
endrule

# Tracking/kerning
rule
    # Find runs of two or more uppercase letters and bump tracking up
    find [A-Z]{2,} : tracking 50;

    # Find V followed by a and kern -25
    find Va : kern -25;
endrule

# Coptic (named Unicode range)
range @coptic
    u+2c80 .. u+2cff;
    u+03e2 .. u+03ee;
    u+03ef; # separated for demo purposes
endrange

rule @coptic-font
    # Any characters in the Coptic range should be set in Antinoou
    find range @coptic : font Antinoou, 24/28pt, dlig;
endrule

# Unicode properties
rule @numbers
    # Replace any numbers with old-style figures
    find [ unicode.Nd | unicode.No ] : feature onum;
endrule

# Masters
master @a
    frame ...; # incomplete, but this part would have text frames
endmaster

# Apply masters
rule
    # All pages get master @a by default
    page * : master @a;

    # Remove master for pages i-iv
    page i-iv : master none;

    # Ignore @paragraph-indents rule on page 9
    page 9 : ignore @paragraph-indents;

    # Set data to be put into masters
    data @main, @index;

    # Variable used for running heads
    $title War and Peace;

    # Set running heads
    # (@inside-header etc. are frames in the master)
    page.odd : @inside-header $pagenum;
    page.odd : @outside-header $section; # set in data
    page.even : @center-header $title;
endrule

# Data
data @main
    transform @ingest;
    include preface.txt;
    include chapters/chapter*.txt;
enddata

data @index
    include index.json;
enddata

# Transform
# Used for an initial transform if necessary
transform @ingest (text)
    // JavaScript or other scripting language
    // These aren't great examples, though
    var response = text.replace(/CHAPTER/g, "\n\nChapter");
    response = response.replace(/\bteh\b/g, "the");
    return response;
endtransform

# Styles
# I don't have an example yet of how to use this, but imagine
# something à la InDesign or CSS
style @heading1
    font Warnock Pro;
    size 18/24pt;
    onum;
    -smcp; # turn off small caps
    space-before 1.24em;
    space-after 1.24em;
endstyle
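The exception rules are the interesting part, so here is a rough sketch of how a named rule and an `ignore` exception might interact once the text has been split into styled spans (roughly what Medium Ink would hold). Everything here (`Span`, `year_labels`, the `ignored` mapping) is hypothetical, and it only models the style-based exception, not the `/chapter:4/...` path selector:

```python
import re
from dataclasses import dataclass, field


@dataclass
class Span:
    """A run of text with a style name and OpenType features (Medium-Ink-ish)."""
    text: str
    style: str = "body"
    features: frozenset = field(default_factory=frozenset)


# Hypothetical version of @year-labels; longer labels are listed first.
YEAR_LABELS = re.compile(r"\b(?:b\.c\.e\.|b\.c\.|c\.e\.|a\.d\.)", re.IGNORECASE)


def year_labels(span, ignored):
    """Give era labels smcp unless an exception ignores this rule for the style."""
    if "@year-labels" in ignored.get(span.style, ()):
        return [span]
    out, pos = [], 0
    for m in YEAR_LABELS.finditer(span.text):
        if m.start() > pos:
            out.append(Span(span.text[pos:m.start()], span.style, span.features))
        out.append(Span(m.group(), span.style, span.features | {"smcp"}))
        pos = m.end()
    if pos < len(span.text):
        out.append(Span(span.text[pos:], span.style, span.features))
    return out


# Exception: don't run @year-labels on heading text.
ignored = {"heading": {"@year-labels"}}
for s in year_labels(Span("Rome, 410 a.d. onward"), ignored):
    print(s)
```

The word-boundary check still matches the a.d. inside m.a.d., which is why a way to target one specific instance is still needed.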

Going forward

I’m open to feedback on all of this, of course, so feel free to comment or get in touch with me.



I’ve renamed inkpdf to Press (as in printing press).

I’ve reached the point where creating PDFs manually is no longer feasible, so I’ve been building out Press far enough that I can implement PDF generation. The basic structure is in place, sans the PDF part. (That’s next.)

Here’s what a Press script looks like right now:

from press import Press

p = Press('output.pdf', width=6*Press.INCH, height=11*Press.INCH,
          margin=1*Press.INCH)

# Horizontal borders at top and bottom of page
p.stroke('#000')
p.pen(1.0)
p.line(p.page_min_x, p.page_min_y, p.page_max_x, p.page_min_y)
p.line(p.page_min_x, p.page_max_y, p.page_max_x, p.page_max_y)

# Page 2
p.page(2)
p.layer('base')
p.stroke(rgb=(1, 0, 0))
p.line(150, 150, 300, 300)
p.layer('fg')
p.stroke(hsl=(0, 0.5, 0.8))
p.line(300, 300, 450, 150)

# Go back and add another line to page 1
p.page(1)
p.stroke('#025')
p.line(p.page_min_x, p.page_min_y, p.page_min_x, p.page_max_y)

p.save() # this doesn't work yet

You can also do something like this:

with Press('output2.pdf', size=Press.LETTER,
           margin=(0.5*Press.INCH, 1.0*Press.INCH),
           inner_margin=0.5*Press.INCH,
           outer_margin=1.25*Press.INCH,
           bleed=.125*Press.INCH) as p:
    p.line(50, 50, 250, 50)
    # And so on

(Context manager, inner/outer margin, bleed, built-in paper sizes.)
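For the context-manager form, the main design choice is what happens on exit. Here is a minimal sketch using a hypothetical `MiniPress` stand-in, not Press itself. It saves automatically when the `with` block finishes cleanly and skips saving if an exception escapes:

```python
INCH = 72  # PDF's default user space unit is 1/72 inch
LETTER = (8.5 * INCH, 11 * INCH)


class MiniPress:
    """Hypothetical stand-in for a Press-like object, showing only the `with` behavior."""

    def __init__(self, path, size=LETTER, **options):
        self.path = path
        self.width, self.height = size
        self.options = options  # margins, bleed, etc. are just stored here
        self.ops = []
        self.saved = False

    def line(self, x1, y1, x2, y2):
        self.ops.append(f"{x1} {y1} m {x2} {y2} l S")

    def save(self):
        # A real implementation would serialize self.ops to self.path here.
        self.saved = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.save()  # only save if the block finished without an error
        return False  # don't swallow exceptions


with MiniPress("output2.pdf", size=LETTER, bleed=0.125 * INCH) as p:
    p.line(50, 50, 250, 50)
```

Not saving on error means a script that crashes halfway doesn’t leave a half-drawn PDF behind that looks finished.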

Up next: adding more primitives, designing the font selection mechanism, getting it to generate an actual PDF, embedding fonts, using arbitrary Unicode code points, integrating HarfBuzz, etc.



Making PDFs by hand

I’ve been hand-coding PDFs in Vim, reading the PDF spec to learn how things work. It’s fascinating. My first, extremely simple PDF:

%PDF-1.4
1 0 obj << /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj << /Type /Pages /Kids [3 0 R] /Count 1 >>
endobj
3 0 obj << /Type /Page /Parent 2 0 R /Resources 4 0 R /MediaBox [0 0 500 800] /Contents 6 0 R >>
endobj
4 0 obj << /Font << /F1 5 0 R >> >>
endobj
5 0 obj << /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>
endobj
6 0 obj
<< /Length 44 >>
stream
BT /F1 24 Tf 175 720 Td (Hello World!) Tj ET
endstream
endobj
xref
0 7
0000000000 65535 f
0000000010 00000 n
0000000059 00000 n
0000000116 00000 n
0000000220 00000 n
0000000263 00000 n
0000000333 00000 n
trailer << /Size 7 /Root 1 0 R >>
startxref
427
%%EOF

It’s not as bad as it looks, I promise. (I’m doing PDF 1.4 because CreateSpace doesn’t seem to support higher versions of the spec.)
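The fiddly part of writing a PDF by hand is the xref table, since every entry is a byte offset. Here is a sketch of a script that computes those offsets instead. `build_pdf` is a hypothetical helper, and its layout (a line break after `obj`) differs slightly from the hand-written file above, so the offsets won’t match it byte for byte:

```python
def build_pdf(objects):
    """Serialize objects 1..n and compute the xref byte offsets for them."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" begins
        out += f"{number} 0 obj\n".encode("latin-1") + body + b"\nendobj\n"
    xref_at = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode()
    out += b"0000000000 65535 f \n"  # every entry is exactly 20 bytes
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n".encode()
    out += (f"trailer << /Size {len(objects) + 1} /Root 1 0 R >>\n"
            f"startxref\n{xref_at}\n%%EOF\n").encode()
    return bytes(out)


content = b"BT /F1 24 Tf 175 720 Td (Hello World!) Tj ET"
objects = [
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    b"<< /Type /Page /Parent 2 0 R /Resources 4 0 R /MediaBox [0 0 500 800] /Contents 6 0 R >>",
    b"<< /Font << /F1 5 0 R >> >>",
    b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
]
pdf = build_pdf(objects)
```

Each xref entry has to be exactly 20 bytes, which is why the entries end in a space followed by a newline. Writing `pdf` to a file opened with `"wb"` gives the same Hello World page.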

Anyway, I’ve been reading through chapter 5 of the spec, learning how text works in PDF. I’ve learned how to modify character spacing with Tc, word spacing with Tw, leading with TL, and individual glyph positions with TJ (not sure yet if I can change vertical positioning or not). I’ve also learned how to change the text color. It’s all been fairly straightforward.
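Here is a sketch of those text operators in a generated content stream. `text_block` and `tj` are hypothetical helpers, and the defaults (12pt type, 14.4pt leading) are arbitrary. In a TJ array, a positive number is subtracted from the horizontal position, so it pulls the next glyph left. TJ only adjusts along the writing direction. For vertical shifts, the Ts (text rise) operator is the usual tool.

```python
def escape_pdf_string(text):
    """Escape backslashes and parentheses for a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def text_block(lines, font="/F1", size=12, x=72, y=720, leading=14.4,
               char_spacing=0, word_spacing=0):
    """One text object: set font, leading (TL), spacing (Tc, Tw), then lines via T*."""
    ops = ["BT", f"{font} {size} Tf", f"{leading} TL",
           f"{char_spacing} Tc", f"{word_spacing} Tw", f"{x} {y} Td"]
    for i, line in enumerate(lines):
        if i:
            ops.append("T*")  # next line: move down by the current leading (TL)
        ops.append(f"({escape_pdf_string(line)}) Tj")
    ops.append("ET")
    return "\n".join(ops)


def tj(parts):
    """Build a TJ array from strings and numbers (thousandths of text space).

    A positive number is subtracted from the horizontal position, so it moves
    the following glyph left (tighter); a negative number moves it right.
    """
    items = [f"({escape_pdf_string(p)})" if isinstance(p, str) else str(p)
             for p in parts]
    return f"[{' '.join(items)}] TJ"


print(text_block(["Hello", "World"]))
print(tj(["A", 80, "V"]))
```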

As part of this, I’ve used Hex Fiend (an OS X hex editor) to pry apart some simple PDFs I made with PlotDevice, to see how things were encoded. The streams themselves are generally compressed with Flate compression (opposite of deflate, har har), and I found this script for decoding them:

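#!/usr/bin/env python3
# Inflate one Flate-compressed PDF stream that was saved to a file as raw bytes.
# Usage: python3 inflate.py stream.bin decoded.txt

import sys
import zlib

input_path = sys.argv[1]
output_path = sys.argv[2]

with open(input_path, 'rb') as f:
    buffer = f.read()

# FlateDecode streams are zlib-wrapped deflate data
decomp = zlib.decompress(buffer)

# Write the bytes back out in binary mode (text mode 'w' fails on bytes in Python 3)
with open(output_path, 'wb') as f:
    f.write(decomp)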

I copied each stream in hex from Hex Fiend, pasted it into a file, ran the Python script on it, and it would output decoded text to a new file.
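Copying hex out of Hex Fiend works, but it gets tedious. Here is a sketch of a script that finds every stream in a PDF and tries to inflate it directly. The script is hypothetical: it looks for the stream/endstream keywords rather than reading /Length, and it assumes no other filters (ASCII85 and the like) are layered on top, so it’s fine for poking around but isn’t a real parser.

```python
#!/usr/bin/env python3
"""Find each stream in a PDF and try to inflate it (sketch, not a real parser)."""
import re
import sys
import zlib

# Match the 'stream' keyword (not the tail of 'endstream'), its EOL, then the data.
# The data keeps any trailing EOL; decompressobj treats it as unused bytes.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf_bytes):
    """Return decoded bytes for each stream, or None where zlib can't decode it."""
    results = []
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            results.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            results.append(None)  # uncompressed, or some other filter
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        streams = inflate_streams(f.read())
    for i, data in enumerate(streams):
        if data is not None:
            print(f"--- stream {i} ---")
            print(data.decode("latin-1"))
```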

Things I don’t know/understand yet, which are legion:

  • How to encode Unicode (I haven’t reached that part of the spec yet, but I believe it involves CID fonts and using CMaps to map glyph codes or something like that).
  • How to take a font name and, in a cross-platform way, get the path to the font file so I can embed it and also use it with HarfBuzz.
  • How to take the output of HarfBuzz (a list of glyphs with position coordinates for each) and use that in positioning the glyphs in the PDF. I believe HarfBuzz will handle parsing the OpenType features of the font, but I’m not positive on that. I did get HarfBuzz Python bindings working, though, and I plan to play around with it soon.
  • Whether I need to use FreeType at all. I might need it for font metrics, but HarfBuzz might give me everything I need there.
  • When typesetting multiple lines, I don’t know whether it’s best to use the PDF built-in support (T* and TL and such), or to set each line manually as its own text object. The built-in support seems better, though I don’t know if that limits what’s possible.
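On the HarfBuzz-to-TJ question, the arithmetic at least is straightforward. A PDF viewer advances by each glyph’s width from the font dictionary (the /W array for a CID font), in thousandths of an em, and then subtracts any TJ number that follows. That makes the adjustment the font width minus the advance HarfBuzz wants, both in thousandths. Here is a sketch under some assumptions: glyph IDs and x advances have already been pulled out of the shaping output as plain (id, advance) pairs in font units, the font is embedded as a CID font with Identity-H encoding and an identity CID-to-GID mapping (so glyph IDs go into the string as 2-byte hex), and character and word spacing are zero. The function names are hypothetical.

```python
def tj_adjustments(glyphs, widths, units_per_em):
    """Interleave glyph IDs with TJ adjustments so PDF advances match shaping.

    glyphs: (glyph_id, x_advance) pairs in font units, taken from shaping output.
    widths: glyph_id -> width in thousandths of an em, as written to the PDF font.
    """
    items = []
    for gid, x_advance in glyphs:
        items.append(gid)
        wanted = x_advance * 1000 / units_per_em  # shaped advance, in 1/1000 em
        adjust = widths[gid] - wanted  # TJ numbers are subtracted from the advance
        if abs(adjust) > 1e-6:
            items.append(round(adjust, 3))
    return items


def to_tj(items):
    """Format glyph IDs as 2-byte hex (Identity-H) and adjustments as numbers."""
    parts = [f"<{item:04X}>" if isinstance(item, int) else f"{item:g}"
             for item in items]
    return f"[{' '.join(parts)}] TJ"


# Made-up numbers: glyph 36 was shaped 60 units narrower than its default width.
items = tj_adjustments([(36, 607), (57, 667)], {36: 667, 57: 667}, 1000)
print(to_tj(items))
```

HarfBuzz’s y offsets (for marks, for example) don’t fit into TJ. Those glyphs would need their own positioning, such as a Td move or a change in text rise.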

At some point soon — I think when I start embedding fonts — doing this by hand in Vim will stop being as feasible, and at that point I’ll start writing Python to manage the PDF creation process for me. For now, though, it’s easier to just edit the PDF manually.



Ink

As mentioned on Twitter, I’ve decided to write my own typesetting engine, called Ink. Apparently I’m crazy.

The details are still very much in the air, but here are some quick notes:

  • Written in Rust (for speed)
  • Programmatic (sort of like TeX)
  • Scripting language for extensibility (JavaScript or Lua or Python)
  • Intended for use in typesetting book interiors, covers, and charts
  • Possibly some kind of template/data division
  • Full OpenType feature support (shaping via HarfBuzz)
  • Custom PDF generation library (inkpdf)

Reasons for doing this insane thing:

  • PlotDevice only runs on OS X and I want the source of my language charts to be usable on other platforms.
  • I’d like to open source the books I typeset, so InDesign isn’t a great solution.
  • TeX is powerful and well-seasoned and all, but it’s not exactly pleasant or easy to work with, especially for the kind of stuff I do.
  • I’ll learn a lot and have fun while I’m at it.

The initial roadmap, not necessarily in order:

  • Write inkpdf in Python (which I think will be a better fit for the charts anyway)
  • Get familiar with HarfBuzz
  • Learn Rust
  • Port inkpdf to Rust
  • Plan out the Ink language (I’ve started on this and it’s looking promising)
  • Figure out how scripting is going to work and embed the interpreter

I’ll document the process on this blog, of course. First steps: reading the PDF spec and figuring out how to make PDFs by hand.

(For those who’ve been reading for a while, Ink was also the name of my static blog engine. That’s now ink-static, and at some point I’ll either retire it completely or change the name to something unrelated.)

