[XeTeX] how to do (better) searchable PDFs in xelatex?

Peter Baker psb6m at virginia.edu
Mon Oct 15 16:19:41 CEST 2012

Here's an example file:

%&encoding=UTF-8 Unicode
\noindent You can search for these:

\noindent first flat office afflict\\

\noindent But you cannot search for these:

\noindent after fifty front\\

\noindent You can search for these words because small caps have been
moved out of the PUA in recent versions of Junicode:

\noindent\textsc{first flat office afflict after fifty front}

Here's a link to an uncompressed (using pdftk) PDF:


I honestly have no idea what I'm looking at when I open that in Emacs. 
Here is info about the Junicode ligatures that can't be searched:

glyph name f_t, encoding U+EECB
glyph name f_t_y, encoding U+EED0
glyph name f_r, encoding U+EECA

Small caps are named like "a.sc" and they are unencoded. The font is 
generated by FontForge. The PDF is generated by XeTeX (XeLaTeX 
actually). I don't know if another program (e.g. LuaTeX) would yield 
different results.


On 10/14/12 10:56 PM, Ross Moore wrote:
> Any chance of providing example PDFs of this? (preferably using 
> uncompressed streams, to more easily examine the raw PDF content) Do 
> the documents also have CMap resources for the fonts, or is the sole 
> means of identifying the meaning of the ligature characters coming 
> from their names only? Have these difficulties been reported to Adobe 
> recently? If not, would you mind me doing so? 
