[XeTeX] how to do (better) searchable PDFs in xelatex?

Peter Baker psb6m at virginia.edu
Mon Oct 15 16:19:41 CEST 2012


Here's an example file:

%&program=xelatex
%&encoding=UTF-8 Unicode
\documentclass{book}
\usepackage[silent]{fontspec}
\usepackage{xltxtra}
\setromanfont{Junicode}
\begin{document}
\noindent You can search for these:

\noindent first flat office afflict

\noindent But you cannot search for these:

\noindent after fifty front

\noindent You can search for these words because small caps have been moved out of the Unicode Private Use Area (PUA) in recent versions of Junicode:

\noindent\textsc{first flat office afflict after fifty front}
\end{document}

Here's a link to a PDF of it, uncompressed with pdftk:

https://dl.dropbox.com/u/35611549/test_uncompressed.pdf
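Ross's question (quoted below) about whether the fonts have CMap resources can be checked mechanically instead of by reading the file in an editor. The following is a minimal Python sketch, not a real PDF parser: it assumes the streams are uncompressed (as in the pdftk output above), looks only for `/ToUnicode` references and `beginbfchar`/`endbfchar` sections, and ignores `bfrange` sections. The function names are hypothetical.

```python
import re
import sys

# "/ToUnicode 12 0 R" inside a font dictionary: an indirect reference to the
# CMap stream that says which Unicode text each character code stands for.
TOUNICODE_REF = re.compile(rb"/ToUnicode\s+(\d+)\s+(\d+)\s+R")
# One "beginbfchar ... endbfchar" section of a CMap stream.
BFCHAR_BLOCK = re.compile(rb"beginbfchar(.*?)endbfchar", re.S)
# One "<source code> <UTF-16BE text>" pair inside such a section.
BFCHAR_PAIR = re.compile(rb"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")


def tounicode_refs(pdf: bytes) -> list:
    """Return (object number, generation) for every /ToUnicode reference."""
    return [(int(num), int(gen)) for num, gen in TOUNICODE_REF.findall(pdf)]


def bfchar_mappings(pdf: bytes) -> dict:
    """Map each source code (upper-case hex) to the text the CMap assigns it."""
    mappings = {}
    for block in BFCHAR_BLOCK.findall(pdf):
        for src, dst in BFCHAR_PAIR.findall(block):
            # The destination is UTF-16BE, so a ligature such as "ft"
            # appears as several code units: <00660074>.
            raw = bytes.fromhex(dst.decode("ascii"))
            mappings[src.decode("ascii").upper()] = raw.decode("utf-16-be", errors="replace")
    return mappings


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        data = f.read()
    print("ToUnicode references:", tounicode_refs(data))
    for code, text in sorted(bfchar_mappings(data).items()):
        print(code, "->", repr(text))
```

It could be run as, e.g., `python find_cmaps.py test_uncompressed.pdf` (script name hypothetical). If the unsearchable ligatures turn out to have no entry, or an entry pointing at a PUA code point such as U+EECB rather than at "ft", that would be consistent with the searches failing.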

I honestly have no idea what I'm looking at when I open that in Emacs. 
Here is info about the Junicode ligatures that can't be searched:

glyph name f_t, encoding U+EECB
glyph name f_t_y, encoding U+EED0
glyph name f_r, encoding U+EECA

Small caps are named like "a.sc" and are unencoded. The font was generated with FontForge, and the PDF by XeTeX (XeLaTeX, actually). I don't know whether another program (e.g. LuaTeX) would yield different results.
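Ross also asks whether the only clue to the ligatures' meaning is their glyph names. Names like f_t, f_t_y and a.sc follow Adobe's glyph-naming convention: a suffix after the first period (such as .sc) marks a variant, and underscores separate the components of a ligature, so the text can in principle be recovered from the name alone. The Python sketch below applies those rules from the Adobe Glyph List Specification, under two assumptions: it uses a tiny hand-picked subset of the Adobe Glyph List instead of the full list, and it presumes the glyph names are still available to whatever does the text extraction, which may not hold once the font is subset and embedded in the PDF.

```python
import re
import string

# ASSUMPTION: a tiny hand-picked subset of the Adobe Glyph List; a real tool
# would load the full list. Single Latin letters map to themselves.
AGL_SUBSET = {c: c for c in string.ascii_letters}
AGL_SUBSET.update({"space": " ", "hyphen": "-", "period": "."})

UNI_NAME = re.compile(r"uni((?:[0-9A-F]{4})+)$")  # e.g. uni0066, uni00660074
U_NAME = re.compile(r"u([0-9A-F]{4,6})$")         # e.g. u0066, u1F600


def component_to_text(comp: str) -> str:
    """Map one underscore-separated component of a glyph name to text."""
    if comp in AGL_SUBSET:
        return AGL_SUBSET[comp]
    m = UNI_NAME.match(comp)
    if m:
        hexes = m.group(1)
        codes = [int(hexes[i:i + 4], 16) for i in range(0, len(hexes), 4)]
        # Surrogate code points make the whole component invalid.
        if any(0xD800 <= c <= 0xDFFF for c in codes):
            return ""
        return "".join(chr(c) for c in codes)
    m = U_NAME.match(comp)
    if m:
        c = int(m.group(1), 16)
        if c <= 0x10FFFF and not 0xD800 <= c <= 0xDFFF:
            return chr(c)
    return ""  # unknown components contribute no text


def glyph_name_to_text(name: str) -> str:
    """Derive the text a glyph name stands for, following the AGL rules."""
    base = name.split(".", 1)[0]  # drop variant suffixes such as ".sc"
    return "".join(component_to_text(c) for c in base.split("_"))
```

With these rules f_t becomes "ft", f_t_y becomes "fty", f_r becomes "fr", and a.sc becomes "a", which is the text a reader would actually type into a search box.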

Peter

On 10/14/12 10:56 PM, Ross Moore wrote:
> Any chance of providing example PDFs of this? (preferably using 
> uncompressed streams, to more easily examine the raw PDF content) Do 
> the documents also have CMap resources for the fonts, or is the sole 
> means of identifying the meaning of the ligature characters coming 
> from their names only? Have these difficulties been reported to Adobe 
> recently? If not, would you mind me doing so? 

