[pdftex] LaTeX source and PDF for ligature issue.

Ross Moore ross.moore at mq.edu.au
Tue Sep 11 00:28:45 CEST 2018


Hi Leif, Boris, and others

On 11 Sep 2018, at 1:40 am, Leif Andersen <leif at leifandersen.net> wrote:

As requested in github issue:
https://github.com/borisveytsman/acmart/issues/309#issuecomment-419690461

Here is an example of a PDF where `first` gets read as `rst`.

Also note that I'm using the latest version of acmart from the ACM's
webpage: https://www.acm.org/publications/proceedings-template

~Leif Andersen
<test.pdf><test.tex>



Here is your original post suggesting a problem:

If you have the word ‘first’ in a document, a screen reader only sees ‘rst’. You also see this if you try to copy/paste the word ‘first’ from a PDF to a text file.

This seems to be unique to acmart, as all of the other document classes I've tried seem to produce the text ‘first’ properly.

I tried your example PDF, as attached above, without any change or recompilation.
Copy/paste of the complete text gave ‘first’ as expected,
since the /ToUnicode resource correctly maps the fi ligature to the pair of letters ‘f i’.
So there is no error in that regard.
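For readers unfamiliar with where that /ToUnicode mapping comes from: with pdfTeX and Type 1 fonts it is usually requested by the two preamble lines below. `glyphtounicode.tex` (shipped with pdfTeX) supplies glyph-name-to-Unicode mappings, which should include one for the `fi` glyph name, and the primitive `\pdfgentounicode=1` tells pdfTeX to write them into the PDF. This is a minimal sketch only: `article` plus the `libertine` package stand in for acmart's font setup, and whether acmart already issues these commands is not assumed here.

```latex
\documentclass{article}
% Ask pdfTeX to emit /ToUnicode CMaps, so ligature glyphs such as "fi"
% extract as the separate letters f and i (sketch; not acmart's own code).
\input glyphtounicode
\pdfgentounicode=1
\usepackage[T1]{fontenc}
\usepackage{libertine}   % Type 1 Linux Libertine, as in the test PDF
\begin{document}
The first test of the fi ligature.
\end{document}
```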

I did the Copy/Paste using 3 different PDF viewers on a Mac: Adobe Acrobat Pro, Apple’s Preview and TeXShop’s Preview.
(The latter two should give the same results, and they did.)

So I have to ask: what software were you using for the Copy/Paste, and on what platform?

As for screen readers…

… there is no uniformity in how they get the stream to be read aloud.
You can try using the accsupp package to set /ActualText and/or /Alt for the ligature, or for the whole word.
(The whole word is probably better, as this would have less effect on hyphenation.)
But even then, don’t expect a uniform result.
In my experience, you need a fully tagged PDF, that is, one tagged for both structure and content, to have
much effect on what a screen reader sees. Even then, the result differs from one program to another.
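A minimal sketch of the whole-word approach with accsupp, assuming a plain ASCII /ActualText value; the sentence is made up for illustration, and `method=plain` passes the string through unchanged. As said above, how (or whether) a given screen reader honours /ActualText is not guaranteed.

```latex
\documentclass{article}
\usepackage{accsupp}
\begin{document}
% Wrap the whole word, not just the ligature, in an /ActualText span:
% a consumer that respects /ActualText should report "first" here.
The \BeginAccSupp{method=plain,ActualText=first}first\EndAccSupp{} step.
\end{document}
```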


For me, Adobe’s Read Out Loud did a fine job, apart from the small print in the footnotes.
Presumably the spacing there is too narrow to be treated as a word space, so most of it is spelt out.
(Hence the need for \pdfinterwordspaceon!)
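Enabling it is a one-line preamble change. A sketch, assuming a pdfTeX recent enough to provide the `\pdfinterwordspaceon` primitive; the body is only a placeholder with a footnote, where the narrow spacing mentioned above tends to occur.

```latex
\documentclass{article}
% pdfTeX primitive: write real space characters between words in the
% page stream, so text extraction and read-aloud tools can find word
% boundaries even where the typeset gap is small (e.g. footnotes).
\pdfinterwordspaceon
\begin{document}
Main text.\footnote{Some small print in a footnote.}
\end{document}
```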

Apple’s VoiceOver was OK, but split the word into ‘firs t’, saying “firs tee”.
So I cannot reproduce the problem you described.
I know of no way to affect what VoiceOver actually reads in such a case of a single-syllable word,
despite all the options that its Utility provides (e.g. read numbers as words or digits; NB. Adam).


Interestingly, there is a small problem in the PDF with regard to fonts.
There are 2 different subsets sharing the same name: RBUZNK+LinLibertineT

[inline screenshot]

Acrobat Pro’s Preflight lists this as an error, though not a critical one.
It doesn’t affect the visual layout, nor should it affect any text extraction, as the subsets are given
different PostScript names within the page content stream.
Nevertheless, the subsets should be given different 6-letter subset prefixes, so this is technically
an error by pdfTeX, which is why this message is being copied to pdftex at tug.org.

So, Leif, I must also ask: what version of TeX Live, or pdfTeX, are you using?
Including the .log file would have provided this information.
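If attaching the whole .log file is inconvenient, the engine version can also be echoed from within the document. A sketch using the pdfTeX primitives `\pdftexbanner`, `\pdftexversion` and `\pdftexrevision`; the banner is the same "This is pdfTeX, Version …" line that opens the .log file. `\typeout` writes to both the terminal and the log.

```latex
\documentclass{article}
% Report the engine and its version on the terminal and in the .log.
\typeout{Engine: \pdftexbanner}
\typeout{pdfTeX version: \the\pdftexversion.\pdftexrevision}
\begin{document}
Version check.
\end{document}
```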


Next, I renamed the file and compiled it with TeXShop, using:

This is pdfTeX, Version 3.14159265-2.6-1.40.18 (TeX Live 2017) (preloaded format=pdflatex)

Again, the same subset prefix was shared by two font dictionaries.
I’ve not updated to TeX Live 2018, so I don’t know whether this is already fixed there.


Hope this helps.

   Ross


Dr Ross Moore

Mathematics Dept | 12 Wally’s Walk, 734
Macquarie University, NSW 2109, Australia

T: +61 2 9850 8955  |  F: +61 2 9850 8114
M: +61 407 288 255  |  E: ross.moore at mq.edu.au

http://www.maths.mq.edu.au

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://tug.org/pipermail/pdftex/attachments/20180910/46a63ee9/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2018-09-11 at 7.45.11 am.png
Type: image/png
Size: 341309 bytes
Desc: Screen Shot 2018-09-11 at 7.45.11 am.png
URL: <https://tug.org/pipermail/pdftex/attachments/20180910/46a63ee9/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 4605 bytes
Desc: image001.png
URL: <https://tug.org/pipermail/pdftex/attachments/20180910/46a63ee9/attachment-0003.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.tex
Type: application/octet-stream
Size: 94 bytes
Desc: test.tex
URL: <https://tug.org/pipermail/pdftex/attachments/20180910/46a63ee9/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.pdf
Type: application/pdf
Size: 213973 bytes
Desc: test.pdf
URL: <https://tug.org/pipermail/pdftex/attachments/20180910/46a63ee9/attachment-0001.pdf>

