LaTeX and \l and generated pdf and suppressing chars
Ulrike Fischer
news3 at nililand.de
Thu Jun 20 10:14:02 CEST 2024
On Thu, 20 Jun 2024 10:37:45 +0900, Norbert Preining wrote:
> when I do a simple latex file, no preamble, and just simple text:
> W\l{}odzimierz
> the resulting PDF is ... strange.
Well, OT1 encoding is ... strange.
> It produces the following entry
> in the PDF:
> (W\040)278(lo)-28(dzimierz)
> and text extraction using pdfminer gives me:
> W(cid:32)lodzimierz
> Note the (cid:32): it comes from the character \040 (the space
> position), which in OT1 encoding holds the small /
>
> Looking into the included encoding in the PDF file I see:
> ```
> /Encoding 256 array
> 0 1 255 {1 index exch /.notdef put} for
> ...
> dup 32 /suppress put
> ```
>
> Which seems to suggest that the space characters should be "suppress"ed
> on some actions ...? And indeed, copy/paste from the PDF does give the
> Wlodzimierz
> without the /.
>
> So it seems to be standardized somehow, is this documented somewhere?
>
I doubt that this is "standardized"; the font predates Unicode.
Someone invented the name at some point and it has been used ever since.
/suppress does not seem to have a mapping in glyphtounicode.tex, so
copy&paste simply falls back to its "char number" 32 and gives a space, but
you could add a mapping with e.g.
\pdfglyphtounicode{suppress}{0020}
but I don't know a better code point than the space you already get.
(pdfx uses \pdfglyphtounicode{suppress}{EB61}, but that looks wrong:
EB61 is in the Private Use Area.)
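
To show where such a mapping would go, here is a minimal sketch. It
assumes pdflatex with the default OT1 encoding; the \input and
\pdfgentounicode lines are only there to make the setup explicit and may
be redundant in a current LaTeX. Mapping to 0020 merely spells out the
space that extraction already falls back to, so it does not give you a
real "ł".

```latex
\documentclass{article}
% Assumption: compiled with pdflatex, fonts left in the default OT1 encoding.
\input{glyphtounicode}              % standard glyph-name -> Unicode table
\pdfgentounicode=1                  % have pdfTeX write ToUnicode CMaps
\pdfglyphtounicode{suppress}{0020}  % explicit mapping for /suppress (slot 32)
\begin{document}
W\l{}odzimierz
\end{document}
```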
> Just for completeness: using T1 fontenc would alleviate this problem, as
> would using xelatex, but that is not something I can decide.
But it would be the right thing to do. If you care about correct
copy&paste and have things like umlauts or \l in your document, OT1
is the wrong encoding. The only way to more or less repair the
copy&paste (in viewers that support that) would be to wrap the affected
characters in an /ActualText, but that would affect kerning and hyphenation.
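
As a minimal sketch of the T1 route: the only change is loading fontenc.
Loading lmodern is my assumption here, to make sure a Type 1 font in T1
encoding is used; any other T1-encoded Type 1 font should do as well.

```latex
\documentclass{article}
\usepackage[T1]{fontenc} % in T1, \l selects a single ł glyph
\usepackage{lmodern}     % assumption: Latin Modern as the T1 Type 1 font
\begin{document}
W\l{}odzimierz
\end{document}
```

And a sketch of the /ActualText workaround for documents that must stay
in OT1, using the accsupp package. The macro name \copyablel is
hypothetical and only for illustration; 0142 is the Unicode code point
of "ł". The PDF literals inserted around the glyph are what disturb
kerning and hyphenation.

```latex
\documentclass{article}
\usepackage{accsupp}
% Hypothetical helper: typeset the OT1 \l, but tag it with ActualText U+0142.
\newcommand*{\copyablel}{%
  \BeginAccSupp{method=hex,unicode,ActualText=0142}%
  \l
  \EndAccSupp{}%
}
\begin{document}
W\copyablel{}odzimierz
\end{document}
```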
--
Ulrike Fischer
http://www.troubleshooting-tex.de/