LaTeX and \l and generated pdf and suppressing chars

Ulrike Fischer news3 at nililand.de
Thu Jun 20 10:14:02 CEST 2024


Am Thu, 20 Jun 2024 10:37:45 +0900 schrieb Norbert Preining:


> when I do a simple latex file, no preamble, and just simple text:
> 	W\l{}odzimierz
> the resulting PDF is ... strange. 

Well OT1 encoding is ... strange.

> It produces the following entry
> in the PDF:
> 	(W\040)278(lo)-28(dzimierz)
> and text extraction using pdfminer gives me:
> 	W(cid:32)lodzimierz
> mark the (cid:32) which comes from the space character \040 which
> in OT1 encoding contains the small /
> 
> Looking into the included encoding in the PDF file I see:
> ```
> /Encoding 256 array
> 0 1 255 {1 index exch /.notdef put} for
> ...
> dup 32 /suppress put
> ```
> 
> Which seems to suggest that the space characters should be "suppress"ed
> on some actions ...? And indeed, copy/paste from the PDF does give the
> 	Wlodzimierz
> without the /.
> 
> So it seems to be standardized somehow, is this documented somewhere?
> 

I doubt that this is "standardized"; the font predates Unicode.
Someone invented a name at some point and it has been used ever since.

/suppress does not seem to have a mapping in glyphtounicode.tex, so
copy&paste simply uses its "char number" 32 and so gives a space, but
you could add a mapping with e.g. 

\pdfglyphtounicode{suppress}{0020} 

but I don't know a better number besides the space you already get. 
(pdfx uses \pdfglyphtounicode{suppress}{EB61}, but that looks wrong:
EB61 is in the Private Use Area.) 
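
For context, here is a minimal sketch of where such a mapping would sit
in a complete pdflatex document. It assumes the default OT1 encoding and
that mapping /suppress to U+0020 is acceptable; recent LaTeX formats may
already load glyphtounicode.tex and enable \pdfgentounicode, in which case
only the \pdfglyphtounicode line adds anything.

```latex
% Minimal sketch for pdflatex; OT1 is the default encoding here.
\documentclass{article}
\input{glyphtounicode}              % standard glyph-name -> Unicode table
\pdfglyphtounicode{suppress}{0020}  % assumption: map /suppress to U+0020 (space)
\pdfgentounicode=1                  % ask pdfTeX to write ToUnicode CMaps
\begin{document}
W\l{}odzimierz
\end{document}
```

What a given extraction tool does with the resulting ToUnicode entry
may still differ from tool to tool.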


> Just for completeness, using T1 fontenc would alleviate this problem, as
> would using xelatex - but this is not something I can decide/do. 

But it would be the right action. If you care about correct
copy&paste and have things like umlauts or \l in your document, OT1
is the wrong encoding. The only way to more or less repair the
copy&paste (in viewers that support that) would be to surround the
affected glyphs with an ActualText, but that would affect kerning and
hyphenation. 
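
A rough sketch of the ActualText workaround, for cases where switching
to \usepackage[T1]{fontenc} (in T1 the l with stroke is a single glyph)
really is not an option. It assumes the accsupp package is available;
the macro name \lwithtext is hypothetical and only for illustration.

```latex
\documentclass{article}
\usepackage{accsupp}
% Hypothetical helper: typeset the OT1 \l, but tell the viewer that the
% enclosed glyphs stand for U+0142 (LATIN SMALL LETTER L WITH STROKE).
\newcommand*{\lwithtext}{%
  \BeginAccSupp{method=hex,unicode,ActualText=0142}%
  \l
  \EndAccSupp{}%
}
\begin{document}
W\lwithtext{}odzimierz
\end{document}
```

As noted above, the wrapper sits between the letters, so kerning and
hyphenation around it are likely to suffer, and only viewers that honour
ActualText will return the intended character.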

-- 
Ulrike Fischer 
http://www.troubleshooting-tex.de/


