[pdftex] Latex source and pdf for ligature issue.

Leif Andersen leif at leifandersen.net
Wed Sep 12 20:51:28 CEST 2018


Very weird. I have this problem with OS X's Preview, as well as Skim
(which I believe uses Preview's rendering engine).

When I open the PDF in Firefox, copy/paste works just fine, and so does
the screen reader.

I should mention that I am using OS X 10.11.

I think other LaTeX styles get away with this by having only the visible
text use the ligature, while copy/paste gets the ASCII `fi` rather than the
Unicode `ﬁ` ligature character.
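
A minimal sketch of that idea, assuming pdfLaTeX: the ligature glyph stays on
the page, while a /ToUnicode mapping in the PDF tells viewers what text to
extract. `\pdfgentounicode` and `glyphtounicode.tex` are standard pdfTeX
facilities; whether adding them changes anything for acmart, which sets up its
own fonts, is an assumption that has not been checked here.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
% Load pdfTeX's glyph-name-to-Unicode tables and ask pdfTeX to write
% /ToUnicode CMaps for the embedded fonts.
\input{glyphtounicode}
\pdfgentounicode=1
\begin{document}
The first ligature test: find, office, flow.
\end{document}
```

Which text a given ligature maps to depends on the entries in those tables and
on the glyph names in the font, so the result is worth checking by copy/paste
in more than one viewer.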


~Leif Andersen

On Mon, Sep 10, 2018 at 6:28 PM, Ross Moore <ross.moore at mq.edu.au> wrote:

> Hi Leif, Boris, and others
>
> On 11 Sep 2018, at 1:40 am, Leif Andersen <leif at leifandersen.net> wrote:
>
> As requested in github issue:
> https://github.com/borisveytsman/acmart/issues/309#issuecomment-419690461
>
> Here is an example of a PDF where `first` gets read as `rst`.
>
> Also note that I'm using the latest version of acmart from the ACM's
> webpage: https://www.acm.org/publications/proceedings-template
>
> ~Leif Andersen
> <test.pdf><test.tex>
>
>
>
>
> Here is your original post suggesting a problem:
>
> If you have the word first in a document, a screen reader only sees rst.
> You also see this if you try to copy/paste the word first from a pdf to a
> text file.
>
> This seems to be unique to acmart, as all of the other document classes
> I've tried seem to properly have the text first.
>
>
> I tried your example PDF, with the attachments as included above, without
> any change or recompilation.
> Copy/paste of the complete text gave ‘first’ as expected,
> since the /ToUnicode resource correctly maps the fi ligature to the pair
> of letters ‘f i’.
> So there is no error in that regard.
>
> I did the Copy/Paste using 3 different PDF viewers on a Mac: Adobe
> Acrobat Pro, Apple's Preview and TeXShop’s Preview.
> (The latter 2 should give the same results, as they did.)
>
> So I have to ask you what software you were using for the Copy/Paste? On
> what platform?
>
> As for screen readers…
>
> … there is no uniformity in how they get the stream to be read aloud.
> You can try using the accsupp package to set /ActualText and/or /Alt
> for the ligature, or the whole word.
> (Whole word is probably better, as this would have less effect on
> hyphenation.)
> But even then, don’t expect a uniform result.
> In my experience, you need a fully tagged PDF, that is, one tagged for
> both structure and content, to have much effect on what a screen reader
> sees. Even then, the result differs from one piece of software to
> another.
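>
> A minimal sketch of the whole-word approach, assuming pdfLaTeX and the
> accsupp package. The macro name \readas is hypothetical, introduced only
> for this illustration; how any particular viewer or screen reader treats
> /ActualText is not guaranteed.
>
> ```latex
> \documentclass{article}
> \usepackage{accsupp}
> % \readas{<replacement text>}{<typeset text>}: hypothetical helper that
> % wraps the typeset word in an /ActualText span.
> \newcommand*{\readas}[2]{%
>   \BeginAccSupp{method=plain,ActualText={#1}}#2\EndAccSupp{}}
> \begin{document}
> This is the \readas{first}{first} example.
> \end{document}
> ```
>
> The visible output is unchanged; only the text attached to the marked
> span differs, which is why wrapping the whole word leaves the word
> itself as one unit rather than splitting it around the ligature.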
>
>
> For me, Adobe’s Read Out Loud did a fine job, apart from the small print
> in the footnotes.
> Presumably the spacing is too narrow to be treated as a word-space, so
> most of it is spelt out.
> (Hence the need for \pdfinterwordspaceon!)
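>
> A sketch of how that switch might be turned on in a pdfLaTeX preamble.
> \pdfinterwordspaceon is a pdfTeX primitive in recent releases; it relies
> on a dummy space font being reachable through the font map files, which
> is assumed here to be the case in a standard TeX Live installation.
>
> ```latex
> \documentclass{article}
> % Ask pdfTeX to write real space characters between words in the PDF,
> % instead of relying on viewers to infer spaces from glyph positions.
> \pdfinterwordspaceon
> \begin{document}
> Body text.\footnote{Tightly spaced words in small type.}
> \end{document}
> ```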
>
> Apple’s VoiceOver was OK, but splits the word into 'firs t', saying "firs
> tee".
> So I cannot reproduce the problem you described.
> I know of no way to affect what VoiceOver actually reads, in such a case
> of a single-syllable word,
> despite all the options that its Utility provides (e.g. read numbers as
> words or digits; NB. Adam).
>
>
> Interestingly there is a small problem in the PDF, with regard to fonts.
> There are 2 different subsets sharing the same name:   RBUZNK+LinLibertineT
>
>
> Acrobat Pro’s Preflight lists this as an error — though not a critical
> one.
> It doesn’t affect the visual layout, nor should it affect any text
> extraction, as the subsets are given
> different PostScript names within the page content stream.
> Nevertheless the subsets should be given different  6-letter subset
> prefixes, so this is technically
> an error by  pdfTeX – which is why this message is being copied to
> pdftex at tug.org .
>
> So Leif, I must also ask what version of TeX Live, or pdfTeX, you are using.
> Including the .log file would have provided this information.
>
>
> Next, I renamed and compiled with TeXShop, using:
>
> This is pdfTeX, Version 3.14159265-2.6-1.40.18 (TeX Live 2017) (preloaded
> format=pdflatex)
>
> Again the same subsetting prefix was shared by two font dictionaries.
> I’ve not updated to TeX Live 2018, so I don’t know if this is fixed there
> already.
>
>
> Hope this helps.
>
>    Ross
>
>
> Dr Ross Moore
>
> Mathematics Dept | 12 Wally’s Walk, 734
> Macquarie University, NSW 2109, Australia
>
> T: +61 2 9850 8955 | F: +61 2 9850 8114
> M: +61 407 288 255 | E: ross.moore at mq.edu.au
>
> http://www.maths.mq.edu.au