[XeTeX] Ligatures and searching in PDFs

Joel C. Salomon joelcsalomon at gmail.com
Sun May 16 21:02:17 CEST 2010


On 05/10/2010 03:36 AM, Janusz S. Bień wrote:
> On Mon, 10 May 2010 Paul Foley <paul at mises.com> wrote:
>> Try the following:
>>
>> \documentclass{article}
>> \usepackage{xltxtra}
>> \setmainfont[Mapping=tex-text,Numbers=OldStyle,Ligatures={Required,Common,Rare}]{Junicode}
>>
>> \begin{document}
>> Fifty afflicted fjords.
>> \end{document}
>>
>> Load the PDF, and search for any of the words.
>>
>> The "fty", "ct" and "fj" ligatures aren't in Unicode, and the PDF viewer
>> can't decompose the private-use characters they are mapped to.  The same
>> problem will occur for variant letter shapes, old-style digits,
>> etc.
> 
> The proper solution would be to use /ActualText feature of the PDF
> specification.

IIRC, the proper solution is for the font to have an OpenType table that
links each ligature glyph to the character string it represents
(ligature decomposition).  If a ligature (e.g. “fty”) has been
(improperly) encoded in the Unicode PUA, that will make this solution harder.
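
For comparison, the /ActualText route can be tried from the document side
without touching the font.  The following is only a rough sketch: it
assumes Heiko Oberdiek's accsupp package and its support for the
XeTeX/xdvipdfmx driver, and \searchable is a hypothetical helper name, not
part of any package.  Each word is wrapped in a marked-content span whose
/ActualText is the plain spelling, so a viewer that honours /ActualText
should search and copy that string instead of the PUA glyphs.  Not every
viewer does, and an argument containing a comma or "=" would confuse the
key=value parsing.

\documentclass{article}
\usepackage{xltxtra}
\usepackage{accsupp}
\setmainfont[Mapping=tex-text,Numbers=OldStyle,Ligatures={Required,Common,Rare}]{Junicode}

% Hypothetical helper: typeset #1 normally (ligatures still form inside
% the span), but tag it with /ActualText so the PDF carries the plain
% spelling.  method=escape escapes the string for the PDF literal.
\newcommand*{\searchable}[1]{%
  \BeginAccSupp{method=escape,ActualText=#1}#1\EndAccSupp{}}

\begin{document}
\searchable{Fifty} \searchable{afflicted} \searchable{fjords}.
\end{document}

Marking up words by hand like this clearly doesn't scale, which is an
argument for fixing it in the font (or in the engine) rather than per
document.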

—Joel Salomon

