[pdftex] missing space chars at font switches
Ross Moore
ross.moore at mq.edu.au
Sun Jan 5 00:25:07 CET 2025
Hi Ulrike, Karl, Thanh and others.
Happy New Year.
Hmm. I missed seeing this message months back.
I agree with Ulrike: (fake) space characters are now missing at line-breaks within a paragraph.
This affects text extraction, in particular when deriving HTML, as consecutive words get concatenated.
It’s an effect which is ‘liveable-with’ but virtually impossible to detect and fix in any automatic way.
Previously, going back roughly 12+ years to when \pdffakespace was first introduced, spaces
*were* included at line-breaks. (Even after hyphenations, but these are detectable automatically.)
So at some point (last 1-2 years?) the algorithm must have changed,
or some parameter has been given a different value.
Can we please revisit this?
All the best.
Ross
On 24 Jul 2024, at 7:23 pm, Ulrike Fischer <news3 at nililand.de> wrote:
If one changes the font there are no space chars at the border. E.g.
with
\pdfcompresslevel0
\pdfobjcompresslevel0
\font\test=cmss10
\pdfinterwordspaceon
text text {\test cmss cmss} text text
\bye
there is a space char inside "text text" and inside "cmss cmss":
[(text)]TJ/F51 9.9626 Tf( )Tj/F1 9.9626 Tf 20.756 0 Td [(text)]
[(cmss)]TJ/F51 9.9626 Tf( )Tj/F20 9.9626 Tf 23.302 0 Td [(cmss)]
But there is nothing at the font switches, inside "text cmss" and "cmss text":
[(text)]TJ/F20 9.9626 Tf 20.755 0 Td [(cmss)]
[(cmss)]TJ/F1 9.9626 Tf 23.301 0 Td [(text)]
One can insert the missing chars manually with \pdffakespace but
perhaps an automatic solution is possible?
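
A minimal sketch of the manual workaround, reusing the test file from above. The placement of \pdffakespace (just after the ordinary interword space at each font switch) is an assumption for illustration and has not been checked against the resulting content stream:

```tex
\pdfcompresslevel0
\pdfobjcompresslevel0
\font\test=cmss10
\pdfinterwordspaceon
% Assumption: a fake space is added by hand at each of the two font
% switches, where the automatic mechanism leaves nothing.
text text \pdffakespace{\test cmss cmss} \pdffakespace text text
\bye
```

With compression off, the uncompressed page stream can then be inspected for a ( )Tj at the two font boundaries.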
--
Ulrike Fischer
http://www.troubleshooting-tex.de/