[Tugindia] counting words in a TEX document

Kapil Hari Paranjape kapil at imsc.res.in
Sun Jan 25 11:09:03 CET 2009


On Sun, 25 Jan 2009, Asha G wrote:
> When I did pdftotext and then did wc on the text I get the following
> message.
> wc: /home/proj/08/cesasha/Nature/JaneliaProposal.txt:56: Invalid or
> incomplete multibyte or wide character
>    60  2511 15147 /home/proj/08/cesasha/Nature/JaneliaProposal.txt
> can you please explain what I am doing wrong?

This is not really about TeX but since your original question was ...

The default output encoding used by "pdftotext" is "Latin1", which "wc"
cannot parse as multibyte text if your locale expects UTF-8. It is
probably better to use "pdftotext -enc UTF-8".
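
To see why the encoding matters, here is a small Python sketch. It
assumes the locale expects UTF-8 (the thread does not say which locale
is in use); the sample text and the helper names are made up for
illustration and have nothing to do with the actual proposal file.

```python
# Illustrative only: the sample text and function names are assumptions.
sample = "naïve café résumé"

latin1_bytes = sample.encode("latin-1")  # what a Latin-1 text file would hold
utf8_bytes = sample.encode("utf-8")      # what a UTF-8 text file would hold


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes form valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def count_words(data: bytes, encoding: str) -> int:
    """Decode with the given encoding, then count whitespace-separated words."""
    return len(data.decode(encoding).split())


print(decodes_as_utf8(latin1_bytes))         # False: accented bytes are not valid UTF-8
print(decodes_as_utf8(utf8_bytes))           # True
print(count_words(latin1_bytes, "latin-1"))  # 3, once the right encoding is named
```

The same bytes count fine once the reader and the writer agree on the
encoding, which is roughly what asking pdftotext for UTF-8 output
achieves.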

As someone on the list said: If you have the TeX input file then it
is probably better to run "untex" on that file rather than convert
the PDF output of pdftex to text.
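
For a rough idea of what counting words from the TeX source involves,
here is a minimal Python sketch. This is not "untex" and makes no
attempt to match its behaviour; the function name and the regular
expressions are assumptions chosen for illustration, and the result is
only an approximation (for example, it also counts the arguments of
commands such as \begin or \label as words).

```python
import re


def rough_tex_word_count(tex: str) -> int:
    """Crude word count for TeX source; a hypothetical helper, not untex."""
    # Drop comments: an unescaped % up to the end of the line.
    text = re.sub(r"(?<!\\)%.*", "", tex)
    # Drop control words (\section, \emph, ...) and control symbols (\%, \\, ...).
    text = re.sub(r"\\[A-Za-z]+\*?|\\.", " ", text)
    # Drop grouping and alignment characters, keeping the text inside them.
    text = re.sub(r"[{}$&~\[\]]", " ", text)
    return len(text.split())


sample = r"""
\section{Introduction} % a comment
This is a \emph{short} example.
"""

print(rough_tex_word_count(sample))  # prints 6
```

A dedicated tool handles far more cases (math, macros, included
files), so treat the number from a sketch like this as an estimate.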


