[tex-live] enquiry regarding Ukrainian "i"

Zdenek Wagner zdenek.wagner at gmail.com
Tue Dec 3 11:11:22 CET 2013


2013/12/3 Semen Trygubenko / Семен Тригубенко <semen at trygub.com>:
>> >> I'm having the following problem with inputenc package:
>> >> Ukrainian letter "i" is substituted with Latin "i" during compilation.
>> >> Letters look identical so it is not a problem for displaying text, but
>> >> it makes it hard to search inside the generated pdf using conventional
>> >> means.
>> >
>> > the inputenc mechanism provides glyphs to represent the incoming
>> > character stream.  given that the t2a fonts don't have an "i" in the
>> > Cyrillic half of their glyph table, they use the "i" in the Latin
>> > half of the table.
>
> Ah … Thank you very much for this insight!
>
> At this point in my life I'd prefer it to bomb rather than produce munged output
> that looks OK but is not searchable or copiable … :-)
>
> So, presumably, once the substitution has occurred (Ukrainian "і" and Latin "i" both
> mapped to t2a's glyph for i) there's not much we can do? Is there space in the Cyrillic half of the glyph table
> to include a clone of t2a's glyph for i so that Ukrainian "і" and Latin "i"
> could be distinguished further down the line?
>
> As far as I can see, this is the only problem with LaTeX handling
> Ukrainian documents; otherwise it is very usable (in conjunction with cmap
> for search and copy) …
>
>
>> > (my guess is that the cmap package doesn't help here, but i can't
>> > convince myself...)
>> >
>> The cmap package could help; it depends on whether it contains a ToUnicode
>> map for the T2A encoding.
>
> I had no success with cmap. However, I didn't look for long as I thought the problem occurs earlier on
> in the pipeline somewhere, hence in the example file I didn't include any of my cmap
> sorties. If there were indeed a way to get cmap to work somehow, would it be able
> to handle multilingual LaTeX source, e.g. English and Ukrainian?
>
The source of the problem seems to be in
texmf-dist/tex/latex/cmap/t2a.cmap: I do not see the mapping for "i"
in it. Maybe it should be reported to the maintainer of the cmap
package. The cmap package adds the maps per encoding, so if you type
English with a T1- or OT1-encoded font and Ukrainian with a T2A-encoded
font, it should work.
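
A minimal sketch of such a two-encoding setup for pdfLaTeX, assuming a
UTF-8 source file and the babel language names "ukrainian" and "english"
(the sample sentences are made up for illustration). It only shows how the
encodings are separated per language; whether the Ukrainian "і" then
survives copy and search still depends on what t2a.cmap contains, as noted
above.

```latex
\documentclass{article}
\usepackage{cmap}                      % load before fontenc so the maps are attached to the fonts
\usepackage[T1,T2A]{fontenc}           % T1 for Latin text, T2A for Cyrillic text
\usepackage[utf8]{inputenc}            % assumption: the source file is UTF-8
\usepackage[english,ukrainian]{babel}  % last option (ukrainian) is the main language

\begin{document}
% Ukrainian text is typeset with the T2A-encoded font.
Український текст із літерою і.

% babel is expected to switch to the Latin encoding for English text.
\foreignlanguage{english}{English text with the letter i.}
\end{document}
```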
>
>> > if i am right, you need to go with xetex or luatex, and a font that
>> > covers Cyrillic as unicode to get the "correct" output.
>
> I'm trying to get myself up to speed with xetex or luatex.
> Right now it seems that the LaTeX source I've got is not immediately
> compilable by either, so I think some reading up is in order.
> I will let you know the results when I get somewhere, but, meanwhile, if
> someone could think of a cheap way to get LaTeX to work I would be very much obliged.
>
Strange. Unless I use something engine-specific, my LaTeX files work
with all of them. Of course, in XeLaTeX you do not use inputenc (the
input must be in Unicode) and you do not use fontenc. Fonts are
selected by the fontspec package, and polyglossia should be preferred to
babel. The change is a matter of a few lines in the preamble.
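
A minimal sketch of what the changed preamble might look like, to be
compiled with xelatex. The font name "DejaVu Serif" is only an assumption;
any installed OpenType or TrueType font that covers Cyrillic should do.
Depending on the font and the polyglossia version, a separate
\cyrillicfont may also have to be declared with \newfontfamily.

```latex
% Compile with xelatex; the source file must be UTF-8.
\documentclass{article}
% No inputenc and no fontenc here.
\usepackage{fontspec}
\usepackage{polyglossia}
\setmainlanguage{ukrainian}
\setotherlanguage{english}
\setmainfont{DejaVu Serif}   % assumption: a font with Cyrillic coverage

\begin{document}
Український текст із літерою і.

\textenglish{English text with the letter i.}
\end{document}
```

Since the font is accessed by Unicode code points, the Ukrainian "і" and
the Latin "i" remain distinct characters in the output.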
>
> --
> Семен Тригубенко http://trygub.com



-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz


