[XeTeX] Ligatures and searching in PDFs

Andy Lin kiryen at gmail.com
Tue Jun 8 03:39:39 CEST 2010


It seems I misunderstood what the TECkit mapping does. All it does is
change the input as instructed. The other "features" I had attributed
to TECkit (copy/paste and search compatibility) actually come from the
PDF reader (in my case, Adobe Reader).

So, when Adobe Reader encounters an f-ligature, it knows to treat it
as 'f' plus another character. The f-ligatures have their own standard
Unicode code points, so any program can decompose them if it needs
to. However, the 'ch' and 'Th' ligatures in Linux Libertine are in the
Private Use Area, which is, by definition, non-standard, so a PDF
reader cannot anticipate them.
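
To illustrate the difference, here is a minimal Python sketch using the
standard unicodedata module. It only shows what the Unicode character
database says about each code point; it makes no claim about how Adobe
Reader works internally. The helper name describe is made up for this
example, and U+E03B is the PUA code point from the mapping quoted at the
end of this message.

```python
import unicodedata

def describe(ch):
    """Report what a generic Unicode-aware program can learn about one character."""
    return {
        "codepoint": f"U+{ord(ch):04X}",
        "category": unicodedata.category(ch),            # "Co" means private use
        "decomposition": unicodedata.decomposition(ch),  # empty string if none
        "nfkd": unicodedata.normalize("NFKD", ch),       # compatibility decomposition
    }

fi_ligature = describe("\ufb01")  # LATIN SMALL LIGATURE FI, a standard code point
pua_glyph = describe("\ue03b")    # private-use code point used for the 'ch' ligature

print(fi_ligature)
print(pua_glyph)
```

The standard ligature carries a compatibility decomposition to 'f' and
'i', while the private-use code point has none, so a program has nothing
to fall back on.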

Now, I'm assuming it's possible to make these ligatures copyable and
searchable, just as it's possible to make small caps searchable
(although Charis SIL is the only font I've found that manages it),
but TECkit is not the way to do it. All TECkit does is take the
input, modify it based on the mapping, and pass the result to the
font/type engine without any additional information.
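
As a rough model of that behaviour, the Python sketch below treats the
mapping as a plain text substitution. This is not TECkit's API or its
actual implementation; MAPPING and apply_mapping are hypothetical names,
and the two rules are the ones from the mapping quoted at the end of
this message.

```python
# Hypothetical stand-in for the two mapping rules; not TECkit's API.
MAPPING = {
    "ch": "\ue03b",  # ch -> ch ligature (Private Use Area)
    "Th": "\ue049",  # Th -> Th ligature (Private Use Area)
}

def apply_mapping(text, mapping=MAPPING):
    """Replace each source sequence with its target code point."""
    for source, target in mapping.items():
        text = text.replace(source, target)
    return text

mapped = apply_mapping("The church")

# The original letters are gone from the text that reaches the font engine,
# and nothing in the result records what the PUA characters stood for.
print([f"U+{ord(c):04X}" for c in mapped])
print("ch" in mapped)
```

Under this model, a search for "ch" in the output finds nothing, because
the only thing left is the private-use code point.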

The reason the TECkit mapping worked for the fonts I mentioned in my
previous post is that they had the ligatures at both the standard
Unicode code point and in the PUA but, for whatever reason, had their
ligature tables point to the PUA glyph. At least, I think that's what
was happening.

If I am mistaken, please correct me.

-Andy Lin

> I had noticed that the ligatures 'ch' and 'Th' are not searchable in
> Linux Libertine. I added the following mappings:
> U+0063 U+0068   <>      U+E03B  ; ch -> ch ligature
> U+0054 U+0068   <>      U+E049  ; Th -> Th ligature
> But these do not make it possible to search or copy/paste as uncompiled.
> The .tec file is compiled correctly and XeTeX finds it. Any thoughts?


