[XeTeX] potential new feature: \XeTeXgenerateactualtext
ShreeDevi Kumar
shreeshrii at gmail.com
Thu Feb 25 06:11:27 CET 2016
Jonathan,
This is a really useful feature and I look forward to using it once it is
released in TL 2016.
Since how well search and copy-paste work may also depend on the font,
I would like to test more Unicode Devanagari PDFs created with this new
feature using other fonts. I usually use the Siddhanta and Sanskrit2003
fonts.
I would appreciate it if you or other members who have this feature
installed could provide a few more sample PDFs in Devanagari for testing.
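To compare what each viewer returns on copy or text export, a small script along these lines might help. It is only a sketch: the function names are my own and not part of any tool, and the expected string is assumed to be the word as typed in the source file. The second string reproduces the glyph-order result seen with one of the viewers below.

```python
import unicodedata

# Expected word in logical Unicode order (assumed to match the source text).
EXPECTED = "\u092f\u0942\u0928\u093f\u0915\u094b\u0921"

# Copied text in glyph order with a stray space, as one viewer returned it.
GLYPH_ORDER = "\u092f \u0942\u093f\u0928\u0915\u094b\u0921"


def codepoints(text):
    """List the code points of text in stored order, e.g. 'U+092F'."""
    return ["U+%04X" % ord(ch) for ch in text]


def matches_expected(extracted, expected):
    """True if both strings are equal after NFC normalization and trimming."""
    def norm(s):
        return unicodedata.normalize("NFC", s).strip()
    return norm(extracted) == norm(expected)


def first_mismatch(extracted, expected):
    """Index of the first differing character, or -1 if the strings are equal."""
    for i, (a, b) in enumerate(zip(extracted, expected)):
        if a != b:
            return i
    if len(extracted) == len(expected):
        return -1
    return min(len(extracted), len(expected))


if __name__ == "__main__":
    print(matches_expected(GLYPH_ORDER, EXPECTED))
    print(first_mismatch(GLYPH_ORDER, EXPECTED))
    print(codepoints(GLYPH_ORDER))
```

Pasting the copied text from each viewer into such a check would show exactly where the character order diverges, instead of judging by eye.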
Thanks!
- sent from my phone. excuse the brevity.
On 24-Feb-2016 3:37 pm, "Jonathan Kew" <jfkthame at gmail.com> wrote:
> On 24/2/16 09:22, ShreeDevi Kumar wrote:
>
>> Testing dev-actualtext.pdf sent by JK
>>
>> * Adobe Acrobat Reader XI on Windows 10
>> o Does not highlight text fully
>> o SEARCH finds words and word parts correctly but usually
>> highlights only beginning of the word containing the letter
>> o COPY paste to NOTEPAD++, OPENOFFICE WRITER works correctly,
>> o Save as TXT file does not work correctly - only saves ... in it,
>> not the actual unicode text which can be copied
>>
>
> So it looks like Acrobat makes use of the ActualText for Search and Copy,
> but sadly its "Save as Text" doesn't support Unicode.
>
> I'm pleasantly surprised to see the Gmail previewer also handles it.
>
> The others (Foxit, Edge) sound like they're just working from the glyph
> stream, which is basically doomed to failure.
>
> For a further data point, I tried Evince (Document Viewer) on Ubuntu
> 15.10, and found that Copy and Search work well; it looks like it is using
> the ActualText correctly. This is thanks to the poppler library, I believe.
> The (poppler-based) "pdftotext" tool was also able to extract the Unicode
> text correctly from the PDF, although "pdftohtml" didn't do so well.
>
> One issue with Evince is that drag-selecting text to highlight it (as for
> Copy/Paste) looks bad: the highlighting completely obscures the selected
> text, although it will end up being copied correctly. Interestingly, its
> highlighting of search results doesn't suffer from this problem, and it
> even makes a fair attempt (not completely accurate) at highlighting
> specific letters within a word, not just entire words.
>
> JK
>
>
>> * Foxit Reader 7.3 on Windows 10
>> o Highlights text fully,
>> o smallest highlight unit is word,
>> o COPY paste to notepad++ as well as SEARCH does NOT work
>> correctly as Unicode text is not fully correct.
>>
>> ूय
>>
>> िनकोड क्या ह ? ै
>>
>> o Save as TXT file does not work correctly - saves the unicode
>> text with same problems as in copy and paste
>>
>> * Microsoft Edge Viewer on Windows 10
>> o Highlights text fully,
>> o COPY paste to notepad++ as well as SEARCH does NOT work
>> correctly as Unicode text is not fully correct.
>>
>> य ूिनकोड क्या है?
>>
>> * Previewing from within gmail in Chrome on Windows 10 -
>> o Highlights text fully,
>> o smallest highlight unit is word,
>> o COPY paste to NOTEPAD++, OPENOFFICE WRITER works correctly,
>> o (highlights only first letter of first word in
>> paragraph यू rather than full word यूनिकोड)
>> o there is NO SEARCH feature
>> o there is no save as TXT file feature
>> * Same as above while Previewing from within gmail in Internet
>> Explorer on Windows 10
>>
>>
>> ShreeDevi
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Tue, Feb 23, 2016 at 11:30 PM, Jonathan Kew <jfkthame at gmail.com
>> <mailto:jfkthame at gmail.com>> wrote:
>>
>> On 23/2/16 17:39, Philip Taylor wrote:
>>
>> Using Akira-san's "actest.pdf" as sample, Adobe Acrobat Pro 7.1
>> allows
>> me to select only half of the text whereas Adobe Reader DC
>> allows me to
>> select it all; neither allows me to select individual kanji.
>>
>>
>> Ah, right... as there are no spaces between the kanji, they'll end
>> up in the same text object. That's a shortcoming of how the current
>> implementation works, for scripts that don't use inter-word spaces.
>>
>> In either case, copy&paste actually gives you the whole text, even
>> though AAPro only highlights half of it, I guess?
>>
>> JK
>>
>>
>>
>>
>> --------------------------------------------------
>> Subscriptions, Archive, and List information, etc.:
>> http://tug.org/mailman/listinfo/xetex