Hi Leif, Boris, and others

On 11 Sep 2018, at 1:40 am, Leif Andersen wrote:

As requested in github issue:

Here is an example of a PDF where `first` gets read as `rst`.

Also note that I'm using the latest version of acmart from the ACM's
webpage: https://www.acm.org/publications/proceedings-template

~Leif Andersen

Here is your original post suggesting a problem:

If you have the word first in a document, a screen reader only sees rst. You also see this if you try to copy/paste the word first from a pdf to a text file.

This seems to be unique to acmart, as all of the other document classes I've tried seem to properly have the text first.

I tried your example PDF, with the attachments as included above, without any change or recompilation.
Copy/paste of the complete text gave ‘first’ as expected,
since the /ToUnicode resource correctly maps the fi ligature to the pair of letters ‘f’ and ‘i’.
So there is no error in that regard.
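
For reference, with pdfTeX a /ToUnicode map of this kind is normally produced by loading the standard glyph-name table and switching on CMap generation. The following is only a minimal sketch, assuming Type 1 fonts with standard glyph names and a plain article document; acmart may well set this up itself already.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
% Load the table that maps glyph names to Unicode code points,
% then ask pdfTeX to write /ToUnicode CMaps for the embedded fonts.
\input{glyphtounicode}
\pdfgentounicode=1
\begin{document}
The first office.
\end{document}
```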

I did the copy/paste using 3 different PDF viewers on a Mac: Adobe Acrobat Pro, Apple's Preview and TeXShop’s Preview.
(The latter 2 should give the same results, as they did.)

So I have to ask you what software you were using for the copy/paste, and on what platform?

As for screen readers…

… there is no uniformity in how they get the stream to be read aloud.
You can try using the accsupp package to set /ActualText and/or /Alt for the ligature, or for the whole word.
(The whole word is probably better, as this has less effect on hyphenation.)
But even then, don’t expect a uniform result.
In my experience, you need a fully tagged PDF, that is, one tagged for both structure and content, to have
much effect on what a screen reader sees. Even then, the result differs between software.
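
As a rough illustration of the whole-word approach, here is a minimal sketch using accsupp. The macro name \spokenword is hypothetical, introduced only for this example, and it is meant for plain ASCII words.

```latex
\documentclass{article}
\usepackage{accsupp}
% Hypothetical helper: typesets #1 and records the same text as
% /ActualText for the whole word. Intended for plain ASCII words only.
\newcommand*{\spokenword}[1]{%
  \BeginAccSupp{method=escape,ActualText={#1}}#1\EndAccSupp{}%
}
\begin{document}
This is the \spokenword{first} example.
\end{document}
```

Whether a given viewer or screen reader honours the /ActualText entry is another matter, as noted above.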

For me, Adobe’s Read Out Loud did a fine job, apart from the small print in the footnotes.
Presumably the spacing there is too narrow to be treated as a word space, so most of it is spelt out.
(Hence the need for \pdfinterwordspaceon!)
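
A minimal sketch of how that primitive might be used, assuming pdfTeX 1.40.15 or later in PDF mode, and that the dummy-space font distributed with pdfTeX can be found through the font map:

```latex
\documentclass{article}
\begin{document}
% Ask pdfTeX to insert faked interword spaces into the page content,
% so that text extraction does not depend on the visual gap width.
\pdfinterwordspaceon
Body text\footnote{Small print with narrow word spacing.} continues here.
\end{document}
```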

Apple’s VoiceOver was OK, but it splits the word into ‘firs t’, saying "firs tee".
So I cannot reproduce the problem you described.
I know of no way to affect what VoiceOver actually reads in such a case of a single-syllable word,
despite all the options that its Utility provides (e.g. read numbers as words or digits; NB. Adam).

Interestingly, there is a small problem in the PDF with regard to fonts.
There are 2 different subsets sharing the same name: RBUZNK+LinLibertineT

Acrobat Pro’s Preflight lists this as an error — though not a critical one.
It doesn’t affect the visual layout, nor should it affect any text extraction, as the subsets are given
different PostScript names within the page content stream.
Nevertheless the subsets should be given different 6-letter subset prefixes, so this is technically
an error by pdfTeX, which is why this message is being copied to pdftex at tug.org.

So Leif, I must also ask: what version of TeX Live, or pdfTeX, are you using?
Including the .log file would have provided this information.

Next, I renamed and compiled with TeXShop, using:

This is pdfTeX, Version 3.14159265-2.6-1.40.18 (TeX Live 2017) (preloaded format=pdflatex)

Again the same subsetting prefix was shared by two font dictionaries.
I’ve not updated to TeX Live 2018, so I don’t know whether this is already fixed there.

Hope this helps.


