[pdftex] Latex source and pdf for ligature issue.

Leif Andersen leif at leifandersen.net
Wed Sep 12 21:04:51 CEST 2018


Okay, one more thing. It certainly is an oddity in the PDF file
specifically. When you look at the generated PostScript, you'll see
something like:


so there clearly is an `rst` in there specifically. So I bet these newer
pdf readers are doing something clever to try and reconstruct the `fi` at
the start of the document.



~Leif Andersen

On Wed, Sep 12, 2018 at 2:58 PM, Leif Andersen <leif at leifandersen.net>
wrote:

> Okay, I just had a colleague try the document on macOS 10.13, and it looks
> like it works fine there. So there is certainly some interaction between
> Preview (on 10.11) and acmart. What it is, though, I'm not yet sure.
>
>
> ~Leif Andersen
>
> On Wed, Sep 12, 2018 at 2:51 PM, Leif Andersen <leif at leifandersen.net>
> wrote:
>
>> Very weird. I have this problem with OS X's Preview, as well as Skim
>> (which I believe uses Preview's rendering engine).
>>
>> When I open the PDF in Firefox I am able to copy/paste just fine, and the
>> screen reader works as well.
>>
>> I should mention that I am using OS X 10.11.
>>
>> I think other LaTeX styles get away with this by having only the visible
>> text use the ligature, while copy/paste yields the ASCII `fi` rather than
>> the Unicode `ﬁ` ligature character.
>>
>>
>> ~Leif Andersen
>>
>> On Mon, Sep 10, 2018 at 6:28 PM, Ross Moore <ross.moore at mq.edu.au> wrote:
>>
>>> Hi Leif, Boris, and others
>>>
>>> On 11 Sep 2018, at 1:40 am, Leif Andersen <leif at leifandersen.net> wrote:
>>>
>>> As requested in the GitHub issue:
>>> https://github.com/borisveytsman/acmart/issues/309#issuecomment-419690461
>>>
>>> Here is an example of a PDF where `first` gets read as `rst`.
>>>
>>> Also note that I'm using the latest version of acmart from the ACM's
>>> webpage: https://www.acm.org/publications/proceedings-template
>>>
>>> ~Leif Andersen
>>> <test.pdf><test.tex>
>>>
>>>
>>>
>>>
>>> Here is your original post suggesting a problem:
>>>
>>> If you have the word first in a document, a screen reader only sees rst.
>>> You also see this if you try to copy/paste the word first from a pdf to
>>> a text file.
>>>
>>> This seems to be unique to acmart, as all of the other document classes
>>> I've tried seem to properly have the text first.
>>>
>>>
>>> I tried your example PDF, with the attachments as included above,
>>> without any change or recompilation.
>>> Copy/paste of the complete text gave ‘first’ as expected,
>>> since the /ToUnicode resource correctly maps the fi ligature to the pair
>>> of letters ‘f i’.
>>> So there is no error in that regard.
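>>>
>>> As a minimal sketch of where such a mapping can come from in a plain
>>> pdfLaTeX document: pdfTeX can generate /ToUnicode entries from a
>>> glyph-name table. This is an illustration only, assuming Type 1 fonts
>>> and the standard article class; acmart may set this up differently.
>>>
>>> ```latex
>>> % Sketch only: ask pdfTeX to emit /ToUnicode CMaps for Type 1 fonts.
>>> \documentclass{article}
>>> \input{glyphtounicode}  % glyph-name to Unicode table shipped with pdfTeX
>>> \pdfgentounicode=1      % generate /ToUnicode entries from that table
>>> \begin{document}
>>> Words containing the fi ligature: first, find, office.
>>> \end{document}
>>> ```
>>>
>>> Copy/paste from the resulting PDF is then a quick check of whether a
>>> given viewer honours the mapping.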
>>>
>>> I did the Copy/Paste using 3 different PDF viewers on a Mac: Adobe
>>> Acrobat Pro, Apple's Preview and TeXShop’s Preview.
>>> (The latter 2 should give the same results, as they did.)
>>>
>>> So I have to ask you what software you were using for the Copy/Paste?
>>> On what platform?
>>>
>>> As for screen readers…
>>>
>>> … there is no uniformity in how they get the stream to be read aloud.
>>> You can try using the accsupp package to set /ActualText and/or /Alt
>>> for the ligature, or the whole word.
>>> (Whole word is probably better, as this would have less effect on
>>> hyphenation.)
>>> But even then, don’t expect a uniform result.
>>> In my experience, you need a fully tagged PDF; that is, tagged for both
>>> structure and content, to have
>>> much effect on what a screen-reader sees. Even then, it is different
>>> with different software.
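>>>
>>> A minimal sketch of the whole-word approach with accsupp might look as
>>> follows. The macro name \spokenword is hypothetical (it is not part of
>>> accsupp), and as noted above the effect will vary between viewers and
>>> screen readers.
>>>
>>> ```latex
>>> \documentclass{article}
>>> \usepackage{accsupp}
>>> % Hypothetical helper, not provided by accsupp: typeset a word and
>>> % attach the same word as /ActualText for text extraction.
>>> \newcommand*{\spokenword}[1]{%
>>>   \BeginAccSupp{ActualText={#1}}#1\EndAccSupp{}%
>>> }
>>> \begin{document}
>>> This is the \spokenword{first} example.
>>> \end{document}
>>> ```
>>>
>>> Using Alt={...} in place of ActualText={...} sets /Alt instead.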
>>>
>>>
>>> For me, Adobe’s  Read Out Loud  did a fine job, apart from the small
>>> print in the footnotes.
>>> Presumably the spacing is too narrow to be treated as a word-space, so
>>> most of it is spelt out.
>>> (Hence the need for  \pdfinterwordspaceon !)
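>>>
>>> As a rough sketch, assuming a pdfTeX recent enough to provide the
>>> primitive and an installation where its dummy-space font can be found
>>> (some setups may need an extra font map entry for it), the switch goes
>>> in the preamble:
>>>
>>> ```latex
>>> \documentclass{article}
>>> % Assumption: this pdfTeX provides \pdfinterwordspaceon and can locate
>>> % the dummy-space font it uses for the inserted spaces.
>>> \pdfinterwordspaceon  % insert real space characters between words
>>> \begin{document}
>>> Body text.\footnote{Small print with tight word spacing.}
>>> \end{document}
>>> ```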
>>>
>>> Apple’s VoiceOver was OK, but splits the word into 'firs t', saying
>>> "firs tee".
>>> So I cannot reproduce the problem you described.
>>> I know of no way to affect what VoiceOver actually reads, in such a
>>> case of a single syllable word,
>>> despite all the options that its Utility provides (e.g. read numbers as
>>> words or digits — NB. Adam).
>>>
>>>
>>> Interestingly there is a small problem in the PDF, with regard to fonts.
>>> There are 2 different subsets sharing the same name:
>>> RBUZNK+LinLibertineT
>>>
>>>
>>> Acrobat Pro’s Preflight lists this as an error — though not a critical
>>> one.
>>> It doesn’t affect the visual layout, nor should it affect any text
>>> extraction, as the subsets are given
>>> different PostScript names within the page content stream.
>>> Nevertheless the subsets should be given different  6-letter subset
>>> prefixes, so this is technically
>>> an error by  pdfTeX – which is why this message is being copied to
>>> pdftex at tug.org .
>>>
>>> So Leif, I must ask also what version of TeXLive, or pdfTeX, are you
>>> using?
>>> Including the  .log  file would have provided this information.
>>>
>>>
>>> Next, I renamed and compiled with TeXShop, using:
>>>
>>> This is pdfTeX, Version 3.14159265-2.6-1.40.18 (TeX Live 2017)
>>> (preloaded format=pdflatex)
>>>
>>> Again the same subsetting prefix was shared by two font dictionaries.
>>> I’ve not updated to TeXLive 2018, so don’t know if this is fixed there
>>> already.
>>>
>>>
>>> Hope this helps.
>>>
>>>    Ross
>>>
>>>
>>> Dr Ross Moore
>>>
>>> Mathematics Dept | 12 Wally’s Walk, 734
>>> Macquarie University, NSW 2109, Australia
>>>
>>> T: +61 2 9850 8955 | F: +61 2 9850 8114
>>> M: +61 407 288 255 | E: ross.moore at mq.edu.au
>>>
>>> http://www.maths.mq.edu.au
>>>
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 5530 bytes
Desc: not available
URL: <https://tug.org/pipermail/pdftex/attachments/20180912/1b678f13/attachment-0001.png>

