[pdftex] Tweaking pdf outputs to produce one box per word.

Ross Moore ross.moore at mq.edu.au
Sat Jul 3 02:57:33 CEST 2010


Hello CFP,

On 02/07/2010, at 4:48 PM, CFP wrote:

> Hello everyone!
> (I think this is where I should be asking, but I’m not totally sure… Forgive
> me if I'm mistaken :))
> 
> I’m trying to tweak the output of the pdflatex command to make it produce
> one box per word. Consider this example:
>    \documentclass{minimal}
> 
>    \begin{document}
>        This is an example sentence.
>    \end{document}
> 
> When opened in a PDF editor after compilation, this sample will appear as
> one text box containing the sentence "This is an example sentence.". This is
> fine for most full-featured PDF readers. Yet on my Sony e-reader, selection
> of words is based on boxes; therefore my PDF reader will select the full
> sentence, hence failing to find a definition for the word I clicked.

Not sure what your definition of "boxes" is here.

What I think is happening is that, normally, there are no space
characters in the output that pdfLaTeX generates; the gaps between
words are made by shifting the position of the next glyph. Thus your
e-reader sees just a single word consisting of all the characters on
a single line of the PDF, unless there are punctuation symbols, which
may then be treated as word boundaries.
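
If that theory is right, one way to check it from the TeX side is to
ask pdfTeX itself to write space characters between words. The sketch
below assumes a pdfTeX release that provides the \pdfinterwordspaceon
primitive, which is newer than the versions current at the time of
this thread, and it assumes that the "dummy-space" font the primitive
relies on is already known to your TeX distribution. It is only an
illustration of the idea, not how the attached PDF was produced.

    \documentclass{minimal}

    \begin{document}
        % Assumption: this pdfTeX supports \pdfinterwordspaceon.
        % From here on, pdfTeX writes a space character between
        % words, so that text extraction can see word boundaries.
        \pdfinterwordspaceon
        This is an example sentence.
    \end{document}

On some installations the dummy-space font may first need to be made
available through a font map entry; check the pdfTeX manual for your
version before relying on this.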

To test this theory, please try the attached PDF which *does* 
have space characters to define word-boundaries. Indeed it has
full PDF tagging of structure, including the mathematics.
It would be very interesting to see what your e-reader does
with it.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: sphere_volume-2-readaloud.pdf
Type: application/pdf
Size: 64502 bytes
Desc: not available
URL: <http://tug.org/pipermail/pdftex/attachments/20100702/30db0400/attachment-0001.pdf>
-------------- next part --------------



> 
> I noticed that pdflatex stops at punctuation marks. How can I proceed to
> make it create one box per word? In the output, I would then have one box
> for "This", one for "is", one for "an", and so on.

Try my attached PDF and tell me how well it works for you.

> 
> Thanks a lot!
> CFP.


Cheers,

	Ross

------------------------------------------------------------------------
Ross Moore                                       ross.moore at mq.edu.au 
Mathematics Department                           office: E7A-419      
Macquarie University                             tel: +61 (0)2 9850 8955
Sydney, Australia  2109                          fax: +61 (0)2 9850 8114
------------------------------------------------------------------------




