[pdftex] missing space chars at font switches

Ross Moore ross.moore at mq.edu.au
Sun Jan 5 00:25:07 CET 2025


Hi Ulrike, Karl, Thanh and others.

Happy New Year.


Hmm. I missed seeing this message months back.

I agree with Ulrike: (fake) space characters are now missing at line-breaks within a paragraph.
This affects text extraction, in particular when deriving HTML, as consecutive words get concatenated.
It’s an effect which is ‘liveable-with’ but virtually impossible to detect and fix in any automatic way.

Previously (going back roughly 12+ years, to when \pdffakespace was first introduced) spaces
*were* included at line-breaks. (Even after hyphenations, but these are detectable automatically.)
So at some point (within the last 1-2 years?) the algorithm must have changed,
or some parameter must have been given a different value.
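
One way to check this might be a small plain pdfTeX file that forces line breaks
in a narrow column, with compression off so the page stream stays readable.
This is only a sketch: the \hsize value, the \tolerance setting and the sample
words are arbitrary choices for illustration, not taken from any earlier report.

\pdfcompresslevel0
\pdfobjcompresslevel0
\pdfinterwordspaceon
% narrow column (arbitrary width) so the paragraph breaks into several lines
\hsize=4cm
\parindent=0pt
\tolerance=10000

one two three four five six seven eight nine ten eleven twelve

\bye

In the uncompressed PDF, a ( )Tj operator after the last word of each line
would suggest that the space is still being written at line-breaks.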

Can we please revisit this?

All the best.

  Ross


On 24 Jul 2024, at 7:23 pm, Ulrike Fischer <news3 at nililand.de> wrote:

If one changes the font, there are no space chars at the border, e.g.
with

\pdfcompresslevel0
\pdfobjcompresslevel0
\font\test=cmss10
\pdfinterwordspaceon

text text {\test cmss cmss} text text

\bye

there is a space char between the two words of "text text" and of "cmss cmss":

[(text)]TJ/F51 9.9626 Tf( )Tj/F1 9.9626 Tf 20.756 0 Td [(text)]
[(cmss)]TJ/F51 9.9626 Tf( )Tj/F20 9.9626 Tf 23.302 0 Td [(cmss)]

But there is nothing at the font switches, between "text" and "cmss" or between "cmss" and "text":

[(text)]TJ/F20 9.9626 Tf 20.755 0 Td [(cmss)]
[(cmss)]TJ/F1 9.9626 Tf 23.301 0 Td [(text)]

One can insert the missing chars manually with \pdffakespace but
perhaps an automatic solution is possible?
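
A minimal sketch of that manual workaround, reusing the test file above.
The exact placement of \pdffakespace (right after the interword space on
each side of the font switch) is an assumption, not something confirmed here:

\pdfcompresslevel0
\pdfobjcompresslevel0
\font\test=cmss10
\pdfinterwordspaceon

% \pdffakespace added by hand at both font borders (placement is an assumption)
text text \pdffakespace{\test cmss cmss} \pdffakespace text text

\bye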

--
Ulrike Fischer
http://www.troubleshooting-tex.de/
