[XeTeX] Ligatures and searching in PDFs

David J. Perry hospes02 at scholarsfonts.net
Thu Jun 10 02:17:18 CEST 2010


Scripsit Gareth:

> What is more, I do a lot of work with Syriac, a cursive script for which
> most joined shapes are encoded in the PUA or somewhere that's going
> spare. This means that my XeTeX PDFs aren't searchable or copyable in
> Syriac. Only one or two Syriac letters per word can be searched or copied.
I am curious; are you using standard Unicode Syriac fonts?  In such fonts, 
there is no need for, nor should there be, PUA assignments for the joined 
shapes.  (And any font whose maker puts joined shapes "somewhere that's 
going spare" needs to go back to Unicode 101 and learn some good 
practices.  There is no such place in Unicode and putting one's private 
characters in codepoints marked reserved or used for other scripts is really 
bad.)  I just looked at the Estrangelo Edessa font and it (correctly) assigns 
codepoints only to the isolated shapes, with no PUA assignments for the 
joined ones.  (If you are using older 
fonts, created before Syriac was supported in Unicode, of course there will 
be all sorts of nonstandard things.  But we can't use those to judge whether 
XeTeX is doing the right thing.)
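
One rough way to tell which situation a given PDF is in would be to copy the 
Syriac text out of the PDF and count how many of the extracted characters 
fall in the Unicode Syriac block (U+0700..U+074F) versus the Private Use 
areas.  The Python sketch below does that; the function name 
classify_codepoints and the sample string are only illustrative assumptions, 
not part of XeTeX or any PDF tool.

```python
import unicodedata

# Unicode Syriac block: U+0700..U+074F (a few codepoints inside are unassigned).
SYRIAC_BLOCK = range(0x0700, 0x0750)


def classify_codepoints(text):
    """Count Syriac-block, Private Use, and other characters in text.

    Hypothetical helper for illustration: text is assumed to be a string
    copied or extracted from a PDF by whatever means is available.
    """
    counts = {"syriac": 0, "private_use": 0, "other": 0}
    for ch in text:
        if ord(ch) in SYRIAC_BLOCK:
            counts["syriac"] += 1
        elif unicodedata.category(ch) == "Co":
            # General category "Co" covers the BMP Private Use Area
            # (U+E000..U+F8FF) and the private use planes 15 and 16.
            counts["private_use"] += 1
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample: two Syriac letters, one PUA codepoint, one space.
    sample = "\u0710\u0712\ue000 "
    print(classify_codepoints(sample))
```

If most of the extracted characters land in "private_use", the font is 
probably one of the older nonstandard ones; if they are in the Syriac block 
but searching still fails, the problem more likely lies elsewhere.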

Another fundamental question is whether Adobe even claims that RTL or 
mixed-directional text can be searched or copied correctly from a PDF.  I did some 
googling on RTL support in PDFs and didn't really find an answer.  But the 
overall support for RTL in PDF seems pretty spotty, which is perhaps not 
surprising given Adobe's track record with RTL in other products such as 
InDesign.  So the non-searchable PDFs may not be the fault of XeTeX.  If you 
or anyone else knows the answer, please let us know--I agree with you 
completely that it is an important issue.

David 


