[XeTeX] Ligatures and searching in PDFs
Diederick C. Niehorster
dcnieho at gmail.com
Tue Jun 1 20:12:06 CEST 2010
Hi Andy,
Thanks a lot for your post, this is very useful!
One thing I'm wondering about: not all of the fonts I use have all of
those ligatures. From what I understand from your post (I can't check
right now), characters will be replaced using the mapping regardless of
whether the font actually has the ligature glyphs, which would lead to
missing glyphs in the document when they aren't available.
Would it therefore make more sense to put these mappings in a separate
file and load that mapping as well when required? Can multiple
mappings be loaded?
I can't try this out for a while, and I guess I'm not the only one who
thinks of this, so I'm just posting the question to the list.
Thanks a lot again for your post!
Best,
Dee
On Wed, Jun 2, 2010 at 1:19 AM, Andy Lin <kiryen at gmail.com> wrote:
> Sorry to revive this topic, but I think I've found a solution.
>
> The original post described a problem when using the rare ligatures
> (e.g. "fty") in the Junicode font, in that the strings could not be
> found by their decomposed characters. At the time, it was suggested
> the /ActualText PDF feature would be useful, but no implementation was
> given.
>
> I'll save the details for how I stumbled onto the solution for another
> time, but here's the result:
>
> There are two ways to go about this: font encoding and text mapping. If you
> have any Adobe OpenType fonts, you might have noticed that the ffi and
> ffl ligatures can be copied from a PDF intact, but the fi and fl
> ligatures will show up as ??. On the other hand, if you use Latin
> Modern, you will not encounter any problem of the sort. This is
> because the font tables in LM were done properly.
>
> If your font does not have the proper tables, you can supplement them
> with a TECkit mapping; such mappings are quite powerful. (I posted in Sept '09
> about using them for Inuktitut syllabary-romanization conversion, and
> I've also used them for Persian script-transliteration conversion.)
> You've probably used Mapping=tex-text at some point, and the solution
> I'm proposing requires you to just add a couple of lines to the
> tex-text.map file and compile it (you may wish to make a copy and make
> changes to that).
>
> When you open the tex-text.map file (in \fonts\misc\xetex\fontmapping
> for MiKTeX Portable), you'll see mappings from sequences of individual
> characters to single composed Unicode characters, for example:
> ; ligatures from Knuth's original CMR fonts
> U+002D U+002D <> U+2013 ; -- -> en dash
> U+002D U+002D U+002D <> U+2014 ; --- -> em dash
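To make the rule format concrete: each line gives the input sequence as U+ code points, then `<>` (in TECkit, a rule that applies in both directions), then the output, with `;` starting a comment. Below is a minimal Python sketch of a reader for just that subset. `parse_rules` and `parse_codes` are hypothetical helpers, not part of TECkit, and real .map files allow much more syntax (headers, passes, classes) that this simply skips.

```python
import re

# Matches only the simple rule form shown above, e.g.
#   U+002D U+002D <> U+2013 ; -- -> en dash
# Lines using any other TECkit syntax are ignored.
RULE = re.compile(
    r"^\s*((?:U\+[0-9A-Fa-f]{4,6}\s+)+)<>\s*((?:U\+[0-9A-Fa-f]{4,6}\s*)+)(?:;(.*))?$"
)

def parse_codes(field: str) -> str:
    """Turn a field such as 'U+0066 U+0069' into the string 'fi'."""
    return "".join(chr(int(tok[2:], 16)) for tok in field.split())

def parse_rules(text: str) -> list[tuple[str, str, str]]:
    """Return (input, output, comment) for every simple '<>' rule in text."""
    rules = []
    for line in text.splitlines():
        m = RULE.match(line)
        if m is None:
            continue  # comment-only lines and unsupported syntax are skipped
        lhs, rhs, comment = m.groups()
        rules.append((parse_codes(lhs), parse_codes(rhs), (comment or "").strip()))
    return rules
```

Fed the three excerpt lines above, this skips the comment line and returns the en dash and em dash rules as plain strings.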
>
> In order to make the common f/ff ligatures searchable in PDFs, add the
> following lines and compile the map file with teckit_compile (should
> be in the bin folder):
> U+0066 U+0066 <> U+FB00 ; ff -> ff ligature
> U+0066 U+0069 <> U+FB01 ; fi -> fi ligature
> U+0066 U+006C <> U+FB02 ; fl -> fl ligature
> U+0066 U+0066 U+0069 <> U+FB03 ; ffi -> ffi ligature
> U+0066 U+0066 U+006C <> U+FB04 ; ffl -> ffl ligature
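Because "ffi" also begins with "ff", the order in which these rules are tried matters. The Python sketch below assumes longest-match-first, which is how I understand TECkit resolves overlapping rules, so "office" becomes o + U+FB03 + ce rather than o + U+FB00 + ice. `apply_ligatures` is a hypothetical stand-in that imitates the compiled mapping; it is not TECkit itself.

```python
import unicodedata

# The five rules above: plain-letter sequence -> Unicode ligature character.
LIGATURES = {
    "ff": "\uFB00",
    "fi": "\uFB01",
    "fl": "\uFB02",
    "ffi": "\uFB03",
    "ffl": "\uFB04",
}
LONGEST = max(len(seq) for seq in LIGATURES)

def apply_ligatures(text: str) -> str:
    """Replace f-ligature sequences, preferring the longest match at each position."""
    out = []
    i = 0
    while i < len(text):
        for n in range(LONGEST, 1, -1):  # no rule is shorter than two letters
            chunk = text[i:i + n]
            if chunk in LIGATURES:
                out.append(LIGATURES[chunk])
                i += len(chunk)
                break
        else:
            out.append(text[i])  # no rule matched: copy the character unchanged
            i += 1
    return "".join(out)

if __name__ == "__main__":
    lig = apply_ligatures("office")
    print([f"U+{ord(c):04X}" for c in lig])
    # NFKC normalization folds the ligature characters back to plain letters.
    print(unicodedata.normalize("NFKC", lig))
```

The last line illustrates why these characters can stay searchable: U+FB00 to U+FB04 carry Unicode compatibility decompositions back to the plain letters, which NFKC exposes. Whether a particular PDF viewer applies that normalization when searching is an assumption here.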
>
> I've attached such a map file and the resulting tec file for those who
> aren't interested in the nitty-gritty. Simply drop these into the
> fonts\misc\xetex\fontmapping folder and run texhash/mktexlsr.
>
> BTW, when you use this teckit mapping for ligatures, it bypasses the
> OpenType ligature setting, i.e. you can't turn them off unless you use
> a different mapping. And it won't check to see if your font has the
> required glyphs. However, it does allow you to easily access ligatures
> in fonts that don't have an OT ligature table (e.g. Times New Roman
> and Georgia, which is why I made the map file in the first place).
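Regarding the question above about fonts that lack some of these ligatures: since the mapping doesn't check for glyphs, one option is to check the font beforehand and keep only the rules it can actually display. The Python sketch below does this against the font's Unicode cmap. Using fontTools (`TTFont`, `getBestCmap`) is my own assumption, not something from this thread, and `missing_ligatures`, `rule_lines` and `font_cmap` are hypothetical helper names. The generated lines follow the format of the rules above and would still need to go into a copy of tex-text.map and be compiled with teckit_compile.

```python
import sys

# Code points produced by the five f-ligature rules, with the letters they replace.
F_LIGATURES = {
    0xFB00: "ff",
    0xFB01: "fi",
    0xFB02: "fl",
    0xFB03: "ffi",
    0xFB04: "ffl",
}

def missing_ligatures(cmap: dict[int, str]) -> list[str]:
    """Letter sequences whose ligature code point is absent from the font's cmap."""
    return [seq for cp, seq in sorted(F_LIGATURES.items()) if cp not in cmap]

def rule_lines(cmap: dict[int, str]) -> list[str]:
    """TECkit-style rule lines, keeping only ligatures the font can display."""
    lines = []
    for cp, seq in sorted(F_LIGATURES.items()):
        if cp in cmap:
            lhs = " ".join(f"U+{ord(ch):04X}" for ch in seq)
            lines.append(f"{lhs} <> U+{cp:04X} ; {seq} -> {seq} ligature")
    return lines

def font_cmap(path: str) -> dict[int, str]:
    """Read a font's best Unicode cmap using fontTools (pip install fonttools)."""
    from fontTools.ttLib import TTFont  # imported lazily; the helpers above need no extra packages
    font = TTFont(path)
    try:
        return font.getBestCmap() or {}
    finally:
        font.close()

if __name__ == "__main__" and len(sys.argv) > 1:
    cmap = font_cmap(sys.argv[1])
    print("missing:", ", ".join(missing_ligatures(cmap)) or "none")
    print("\n".join(rule_lines(cmap)))
```

Run as `python check_ligatures.py SomeFont.ttf` (hypothetical file names), it reports which f-ligatures the font lacks and prints rule lines for the rest.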
>
> Hope someone will find this useful.
>
> -Andy Lin
>
>
>
> --------------------------------------------------
> Subscriptions, Archive, and List information, etc.:
> http://tug.org/mailman/listinfo/xetex
>
>