[Tugindia] counting words in a TEX document
Kapil Hari Paranjape
kapil at imsc.res.in
Sun Jan 25 11:09:03 CET 2009
On Sun, 25 Jan 2009, Asha G wrote:
> When I did pdftotext and then did wc on the text I get the following
> wc: /home/proj/08/cesasha/Nature/JaneliaProposal.txt:56: Invalid or incomplete multibyte or wide character
> 60 2511 15147 /home/proj/08/cesasha/Nature/JaneliaProposal.txt
> can you please explain what I am doing wrong?
This is not really about TeX, but since your original question was ...
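The default output encoding used by "pdftotext" is "Latin1". Your locale
is presumably UTF-8, so "wc" cannot decode the non-ASCII Latin-1 bytes
(on line 56 of your file, according to the message) and complains. It is
probably better to use "pdftotext -enc UTF-8".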
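To see why the encoding matters, here is a minimal Python sketch. It only illustrates the idea and is not how wc itself works. The sample text and the helper name `count_words` are hypothetical. Text written as Latin-1 is not valid UTF-8, while the same text re-encoded as UTF-8 decodes cleanly, which is what "-enc UTF-8" gives you directly.

```python
def count_words(data: bytes, encoding: str) -> int:
    """Count whitespace-separated words, roughly like `wc -w`.

    `count_words` is a hypothetical helper; decoding with the wrong codec
    raises UnicodeDecodeError, the Python analogue of wc's
    "Invalid or incomplete multibyte" warning.
    """
    text = data.decode(encoding)
    return len(text.split())


# Hypothetical sample: non-ASCII text as a Latin-1 encoder would write it.
latin1_bytes = "naïve café".encode("latin-1")

try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err.reason)

# Decoding with the codec that actually produced the bytes works.
print(count_words(latin1_bytes, "latin-1"))  # prints 2

# Re-encoding as UTF-8 yields bytes that a UTF-8 reader accepts.
utf8_bytes = latin1_bytes.decode("latin-1").encode("utf-8")
print(count_words(utf8_bytes, "utf-8"))  # prints 2
```

Note that Python's `str.split()` treats Unicode whitespace as separators, so counts can differ slightly from wc on unusual input.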
As someone on the list said, if you have the TeX source file, it is
probably better to run "untex" on it than to convert the PDF output of
pdfTeX to text.
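For a rough idea of what counting words in the TeX source involves, here is a crude Python sketch. It is NOT untex and makes no claim about how untex works; the names `strip_tex` and `count_tex_words` are hypothetical. It drops comments, control sequences and common markup characters, then counts what is left.

```python
import re

# Crude approximation only; this is not untex.
COMMENT = re.compile(r"(?<!\\)%.*")            # unescaped % up to end of line
COMMAND = re.compile(r"\\(?:[A-Za-z]+\*?|.)")   # control words and control symbols
MARKUP = re.compile(r"[{}\[\]$&~^_]")           # grouping, math and alignment characters


def strip_tex(source: str) -> str:
    """Remove comments, control sequences and common markup characters."""
    text = COMMENT.sub("", source)
    text = COMMAND.sub(" ", text)
    return MARKUP.sub(" ", text)


def count_tex_words(source: str) -> int:
    """Count whitespace-separated words left after stripping markup."""
    return len(strip_tex(source).split())


sample = r"""\section{Introduction}
We propose a \emph{new} method. % TODO: cite
"""
print(count_tex_words(sample))  # prints 6
```

Because only the command names are removed, the argument of `\begin{itemize}` and similar environments is still counted as a word, so this sketch tends to overcount; a dedicated tool handles such cases more carefully.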