Semen Trygubenko / Семен Тригубенко <semen at trygub.com> wrote:

> > >> I'm having the following problem with inputenc package:
> > >> Ukrainian letter "i" is substituted with Latin "i" during compilation.
> > >> Letters look identical so it is not a problem for displaying text, but
> > >> it makes it hard to search inside the generated pdf using conventional
> > >> means.
> > >
> > > the inputenc mechanism provides glyphs to represent the incoming
> > > character stream.  given that the t2a fonts don't have an "i" in the
> > > Cyrillic half of their glyph table, they use the "i" in the Latin
> > > half of the table.
> Ah … Thank you very much for this insight!
> At this point in my life I'd prefer it to bomb, rather than produce a munged output
> that looks OK but is not searchable or copiable … :-)

maybe so, but when the t2* encodings were designed, there was no chance
of making the t1 encodings work, in that sense.

you need to switch to utf-8 _and_ to a utf-8 engine, to proceed.

> So, presumably, once the substitution has occurred (Ukrainian "і" and Latin "i" with the
> t2a's glyph for i) there's not much we can do? Is there space in Cyrillic half of the glyph table
> to include a clone of t2a's glyph for i so that Ukrainian "і" and Latin "i"
> could be distinguished further down the line?


> As far as I can see, this is the only problem with Latex handling
> Ukrainian documents, otherwise it is very usable (in conjunction with cmap
> for search and copy) …

that's good to know, i suppose.

> > > (my guess is that the cmap package doesn't help here, but i can't
> > > convince myself...)
> >
> > The cmap package could help, it depends whether it contains toUnicode
> > map for the T2A encoding.

such a map does exist, but as mentioned above, it doesn't help, since
the t2a encoding has no slot of its own for the ukrainian "i".

> I had no success with cmap. However, I didn't look for long as I
> thought the problem occurs earlier on in the pipeline somewhere, hence
> in the example file I didn't include any of my cmap sorties. If there
> was indeed a way to get cmap to work somehow, would it be able to
> handle multilingual Latex source, e.g. English and Ukrainian?

but since the information is already lost when you launch yourself on
t2a encoding, _nothing_ in tex 3 is going to be able to help.

> > > if i am right, you need to go with xetex or luatex, and a font that
> > > covers Cyrillic as unicode to get the "correct" output.
> I'm trying to get myself up to speed with xetex or luatex.
> Right now it seems that the Latex source I've got is not immediately
> compilable by either, so I think some reading up is in order.
> I will let you know the results when I get somewhere, but, meanwhile, if
> someone could think of a cheap way to get Latex to work I would be very much obliged.

i assume you're actually using xelatex/lualatex, rather than plain
xetex/luatex.  if not, try them (but i expect you worked that out already).

utf-8 input, appropriate opentype or truetype font (with fontspec
package) and no inputenc or fontenc, and no cmap stuff.  i would expect
it to be easier than typing in ukrainian (since you're no longer swimming
against the 8-bit tide).
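
a minimal sketch of that setup, for what it's worth.  the font name
(DejaVu Serif) is my assumption, not something from your example file;
any installed opentype or truetype font that covers both latin and
cyrillic ought to do.  hyphenation and language switching aren't set up
here, so treat it as a starting point rather than a finished preamble.

```latex
% compile with xelatex or lualatex; save this file as utf-8
\documentclass{article}

% fontspec replaces inputenc/fontenc/cmap on the unicode engines
\usepackage{fontspec}

% assumption: DejaVu Serif is installed; substitute any font
% that has both latin and cyrillic glyphs
\setmainfont{DejaVu Serif}

\begin{document}
% the first "і" is cyrillic (U+0456), the second "i" is latin (U+0069);
% they stay distinct characters all the way through to the pdf
Україна і Ukraine: і / i
\end{document}
```

searching the resulting pdf for the cyrillic "і" should then find only
the cyrillic occurrences, since the engine never maps it onto the
latin slot.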

