[XeTeX] Ligatures and searching in PDFs

Janusz S. Bień jsbien at mimuw.edu.pl
Mon May 10 09:36:33 CEST 2010


On Mon, 10 May 2010  Paul Foley <paul at mises.com> wrote:

> Try the following:
>
> \documentclass{article}
> \usepackage{xltxtra}
> \setmainfont[Mapping=tex-text,Numbers=OldStyle,Ligatures={Required,Common,Rare}]{Junicode}
>
> \begin{document}
> Fifty afflicted fjords.
> \end{document}
>
> Load the PDF, and search for any of the words.
>
> The "fty", "ct" and "fj" ligatures aren't in Unicode, and the private-use
> characters obviously can't be decomposed by the PDF viewer.  The same
> problem will obviously occur for variant letter shapes, old-style digits,
> etc.
>
> But scanned documents in PDF often have an invisible text layer attached
> which can be searched, etc.; is it possible to use the same technique to put
> the decomposed letters over the visible private-use characters, so that
> documents remain searchable (and copy/paste-able)?

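The proper solution would be to use the /ActualText feature of the PDF
specification, which attaches replacement text to a span of glyphs so
that a conforming viewer can search and copy the plain letters instead
of the private-use code points.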
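
A minimal sketch of how this might look for your example, assuming the
accsupp package is installed and selects its dvipdfm-style driver under
XeTeX. The \searchable macro is a hypothetical helper of mine, not part
of any package, and I have not checked the result with Junicode:

\documentclass{article}
\usepackage{xltxtra}
\usepackage{accsupp}
\setmainfont[Mapping=tex-text,Numbers=OldStyle,Ligatures={Required,Common,Rare}]{Junicode}

% Hypothetical helper: typeset #1 and record the same plain letters
% as /ActualText for the enclosed glyphs. The markers sit outside the
% word, so ligatures inside it should still be formed.
\newcommand*{\searchable}[1]{%
  \leavevmode
  \BeginAccSupp{method=escape,ActualText={#1}}#1\EndAccSupp{}}

\begin{document}
\searchable{Fifty} \searchable{afflicted} \searchable{fjords}.
\end{document}

Whether searching and copying then work depends on the viewer, since
not every PDF viewer honours /ActualText. Tagging whole words is only
the simplest choice; one could instead wrap each ligature separately.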

Best regards

Janusz

-- 
dr hab. Janusz S. Bień, prof. UW -  Uniwersytet Warszawski (Katedra Lingwistyki Formalnej)
Prof. Janusz S. Bien - Warsaw University (Department of Formal Linguistics)
jsbien at uw.edu.pl, jsbien at mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/

