[XeTeX] how to do (better) searchable PDFs in xelatex?
Peter Baker
psb6m at virginia.edu
Mon Oct 15 16:19:41 CEST 2012
Here's an example file:
%&program=xelatex
%&encoding=UTF-8 Unicode
\documentclass{book}
\usepackage[silent]{fontspec}
\usepackage{xltxtra}
\setromanfont{Junicode}
\begin{document}
\noindent You can search for these:
\noindent first flat office afflict\\
\noindent But you cannot search for these:
\noindent after fifty front\\
\noindent You can search for these words because small caps have been moved out of the PUA in recent versions of Junicode:
\noindent\textsc{first flat office afflict after fifty front}
\end{document}
Here's a link to the PDF produced from it, with its streams uncompressed using pdftk:
https://dl.dropbox.com/u/35611549/test_uncompressed.pdf
I honestly have no idea what I'm looking at when I open that in Emacs.
Here is info about the Junicode ligatures that can't be searched:
glyph name f_t, encoding U+EECB
glyph name f_t_y, encoding U+EED0
glyph name f_r, encoding U+EECA
Small caps are named like "a.sc" and they are unencoded. The font is
generated by FontForge. The PDF is generated by XeTeX (XeLaTeX
actually). I don't know if another program (e.g. LuaTeX) would yield
different results.
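To make the difference between the two cases concrete, here is a small Python sketch. It is my own illustration, not anything XeTeX or Junicode provides. It checks that the three ligature code points listed above fall in the Unicode Private Use Area, so if the PDF's text mapping is taken from the font's encoding, extraction yields PUA characters instead of "ft", "fty" and "fr". It also applies a simplified form of the Adobe Glyph List naming rule (drop any ".suffix", split at underscores), which is one plausible way a tool could recover "a" from an unencoded glyph like "a.sc". Whether xdvipdfmx or a given PDF viewer actually falls back to glyph names this way is an assumption, and the function name `text_from_glyph_name` is hypothetical.

```python
import unicodedata

# Junicode glyph names and encodings for the ligatures that can't be
# searched (values copied from the list above).
PUA_LIGATURES = {"f_t": 0xEECB, "f_t_y": 0xEED0, "f_r": 0xEECA}


def is_private_use(cp: int) -> bool:
    """True if the code point is in a Private Use Area (category 'Co')."""
    return unicodedata.category(chr(cp)) == "Co"


def text_from_glyph_name(name: str) -> str:
    """Hypothetical, simplified Adobe Glyph List rule (not XeTeX code):
    drop everything after the first period, split the rest at
    underscores, and map each component to a character. Only single
    ASCII letters are handled; real AGL processing covers many more."""
    base = name.split(".", 1)[0]
    chars = []
    for part in base.split("_"):
        if len(part) == 1 and part.isascii() and part.isalpha():
            chars.append(part)
        else:
            raise ValueError(f"component {part!r} not handled in this sketch")
    return "".join(chars)


if __name__ == "__main__":
    # Compare what the encoding says with what the glyph name implies.
    for name, cp in PUA_LIGATURES.items():
        print(f"{name}: U+{cp:04X} private use={is_private_use(cp)}, "
              f"from name={text_from_glyph_name(name)!r}")
    print("a.sc ->", repr(text_from_glyph_name("a.sc")))
```

The sketch only illustrates the two situations (PUA-encoded versus unencoded but well-named glyphs); it says nothing definite about what Acrobat or any other viewer does.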
Peter
On 10/14/12 10:56 PM, Ross Moore wrote:
> Any chance of providing example PDFs of this? (preferably using
> uncompressed streams, to more easily examine the raw PDF content) Do
> the documents also have CMap resources for the fonts, or is the sole
> means of identifying the meaning of the ligature characters coming
> from their names only? Have these difficulties been reported to Adobe
> recently? If not, would you mind me doing so?