search accents in pdf generated by TeX

Ulrike Fischer news3 at
Fri Jan 28 09:11:17 CET 2022

On Thu, 27 Jan 2022 21:55:32 -0800, William F Hammond wrote:

> First, I don't know what the statement "accented letters are
> not recognized by the pdf" means.  If we're talking about
> typesetting with pdftex, then I think that the PDF output is
> UTF-8 encoded. 


> If one runs the program "pdftotext", which
> is part of an Ubuntu package called poppler-utils on my
> Ubuntu platform, the output text is UTF-8 encoded.  I think
> that TeX's algorithmic accents are implemented using
> Unicode combining characters. 

No, not with pdftex. If you compile a document containing

    ä ö ü é è

with the default (OT1) font encoding and then copy and paste the text
from the PDF, you get

    ¨a ¨o ¨u ´e `e

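that is,
    U+00A8a U+00A8o U+00A8u U+00B4e U+0060e

(U+00A8 is DIAERESIS, U+00B4 is ACUTE ACCENT and U+0060 is GRAVE
ACCENT: spacing characters placed before the base letter, not combined
with it.)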

So no combining accents are involved. PDF viewers typically can't find
these accented characters in a search, and text copied from the PDF is
wrong when pasted elsewhere.

If you add \usepackage[T1]{fontenc}, and so use a font that has the
needed precomposed glyphs, you get the correct Unicode code points and
a searchable PDF:

    ä ö ü é è

that is,
    U+00E4 U+00F6 U+00FC U+00E9 U+00E8
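
A minimal test file for comparing both cases could look like the
following sketch. It assumes compilation with pdflatex and a LaTeX
release from 2018 or later, where UTF-8 is already the default input
encoding; the article class is an arbitrary choice.

    \documentclass{article}
    % Assumes LaTeX 2018-04-01 or later: UTF-8 input is the default,
    % so \usepackage[utf8]{inputenc} is not needed.
    % T1 fonts contain precomposed accented glyphs such as ä and é;
    % comment out the next line to get the default OT1 output.
    \usepackage[T1]{fontenc}
    \begin{document}
    ä ö ü é è
    \end{document}

Running pdftotext on the result, e.g. "pdftotext test.pdf -" (test.pdf
is a hypothetical file name; "-" sends the text to standard output),
shows which code points actually end up in the text layer.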

Ulrike Fischer

More information about the texhax mailing list.