[texhax] extracting math from pdf file
Benjamin Sambale
bsambale at gmx.de
Mon Dec 6 08:47:48 CET 2010
On 06.12.2010 00:56, Heiko Oberdiek wrote:
> On Sun, Dec 05, 2010 at 06:38:45PM -0400, Jim Diamond wrote:
>> On Sun, Dec 5, 2010 at 22:46 (+0100), Benjamin Sambale wrote:
>>> \documentclass{minimal}
>>> \begin{document}
>>> $\ne$
>>> \end{document}
>>> I compiled this code using pdflatex (TeX Live 2010). If I try to copy
>>> the \ne-symbol in the corresponding pdf-file with the mouse cursor, I
>>> get an equals sign (=) instead. I only tried this with evince as
>>> pdf viewer, but I suspect that the behavior is similar for other
>>> viewers. I also tried to use something like
>>> \pdfglyphtounicode{notequal}{...}
>>> without success. I'm very grateful for any ideas.
>> A quick peek in plain.tex shows that, at least there, \ne is an
>> over-struck combination of two characters:
>>
>> \def\neq{\not=} \let\ne=\neq
>>
>> If LaTeX does the same thing, then there is no single "not equal" glyph.
> It depends on the used fonts and packages.
>
> If the font does not contain U+2260 (notequals), then
> at least the ActualText feature of the PDF format could be
> used (see PDF spec.):
>
> \documentclass{minimal}
> \pagestyle{empty}
> \usepackage{accsupp}
> \CheckCommand*{\ne}{\not=}
> \renewcommand*{\ne}{%
> \BeginAccSupp{method=hex,unicode,ActualText=2260}%
> \not=%
> \EndAccSupp{}%
> }
> \begin{document}
> $\ne$
> \end{document}
>
> Yours sincerely
> Heiko Oberdiek
Thanks to all who replied. To answer Philip Taylor's question: I do not
have a reasonable application for this copy procedure. I discovered
these things while converting my PhD thesis to the PDF/A format in order
to satisfy the library specifications. I found out that the commands
\pdfglyphtounicode{multicloseright}{22CA}
\pdfgentounicode=1
allow me to copy $\rtimes$ (from amssymb) as the corresponding Unicode
character. So I wondered whether the same is possible with $\ne$.
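For reference, a minimal sketch of a complete test file built around these
two commands could look like the following. It assumes pdflatex is used;
the formula G = N \rtimes H is only a made-up test case, and whether the
copied text comes out as U+22CA may still depend on the PDF viewer.

```latex
% Sketch only: compile with pdflatex, then copy the symbol from the PDF.
\documentclass{minimal}
\usepackage{amssymb}                      % provides \rtimes
% Map the font's glyph name for \rtimes to U+22CA.
\pdfglyphtounicode{multicloseright}{22CA}
% Ask pdfTeX to write ToUnicode maps for the fonts it embeds.
\pdfgentounicode=1
\begin{document}
$G = N \rtimes H$                         % hypothetical test formula
\end{document}
```

This only helps when the symbol is a single glyph in the font. A symbol
built from two overstruck glyphs, such as \not=, has no glyph name to map.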
Heiko Oberdiek's approach works perfectly. I should add that I do not
actually need this for my thesis, since the PDF/A-1b format will
hopefully suffice (rather than the stricter PDF/A-1a format).
Thank you again,
Benjamin