[XeTeX] Res: small caps not searcheable

Peter Dyballa Peter_Dyballa at Web.DE
Tue Aug 4 17:46:28 CEST 2009

On 04.08.2009 at 13:14, Flavio Costa wrote:

> Do you know why Computer Modern works as expected?

Because it's not Unicode. Not anywhere.

> "If you want to be able to search small capitals, then use only the  
> faulty ones. Or add a CMAP that maps them into the Basic Latin (or  
> some other appropriate) block."
> What do you mean by "use only the faulty ones"?

Artificial small caps, i.e. scaled-down capitals (like those from MS
Word).

> From what I've been reading yesterday, new Adobe font do not encode  
> small caps in the PUA anymore, they make it accessible only via  
> OpenType Layout features.

This should be the standard now with OT fonts. (Besides, the glyphs
have to be stored somewhere, and the PUA is a perfect physical space
for everything not [yet?] standardised.) And a CMAP table *inside*
the PDF output should allow the normal and the small caps glyphs to
be presented (and searched) as the same characters.

> Since Minion Pro have its small caps in the PUA, adding a cmap may  
> be a good option. Unfortunately I don't know how to do it...

Me neither. I once received a message from Akira Kakuto (2008-08-13)
on this list; a test file there uses this in the preamble:

	\AtBeginShipoutFirst{\special{pdf:tounicode UTF8-UCS2}}
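
A minimal sketch of where that line could sit in a XeLaTeX file. Only
the \special line comes from the message above; everything else is an
assumption: the atbegshi package (which provides \AtBeginShipoutFirst),
fontspec, and an installed Minion Pro. Whether this changes how the
small caps are searched has not been checked here.

	\documentclass{article}
	\usepackage{fontspec}
	\usepackage{atbegshi}     % assumed source of \AtBeginShipoutFirst
	\setmainfont{Minion Pro}  % assumption: the font from this thread
	% The line from the test file, passed through to the PDF driver:
	\AtBeginShipoutFirst{\special{pdf:tounicode UTF8-UCS2}}
	\begin{document}
	Regular text and \textsc{Small Caps}.
	\end{document}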

> I just found the cmap package:
> http://tug.ctan.org/tex-archive/macros/latex/contrib/cmap/

These certainly don't work in XeTeX; they're meant for pdfTeX
(\usepackage{cmap}). They map the arbitrary positions of characters
in the many TeX font encodings to Unicode, so that when you copy from
a PDF file created directly by pdfTeX, the visible accented and maths
characters are pasted and not exotic particles from inside TeX.
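
For comparison, a minimal pdfLaTeX sketch using that package. The T1
encoding, the lmodern fonts, and the sample words are assumptions made
for illustration; as far as I know cmap should be loaded early, before
fontenc. This is for pdfTeX only, not XeTeX.

	\documentclass{article}
	\usepackage{cmap}         % load first, before fontenc
	\usepackage[T1]{fontenc}  % assumption: T1 as the font encoding
	\usepackage{lmodern}      % assumption: outline fonts in T1
	\begin{document}
	na\"ive caf\'e            % should copy as accented letters
	\end{document}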

BTW, which XeTeX are you using? The one from MacTeX 2008? Or some
Linux variant of TeX Live 200x?



Time is an illusion. Lunchtime, doubly so.
