[tex-live] A fix for (pdf)LaTeX support for Romanian

Vasile Gaburici vgaburici at gmail.com
Wed Aug 20 14:08:08 CEST 2008

0) No single LaTeX (8-bit) encoding has all the characters required
for Romanian. T1 doesn't have S and T with comma below (only with
cedilla), while QX doesn't have A breve. LY1 doesn't have any of
these. Using only visual composites (overlapping boxes) is
unacceptable in PDF documents because searches don't work properly.

1) Latin 10 (latin10.def) disambiguated comma-below from cedilla
by providing new LICRs for the comma-below glyphs, using \textcommabelow
as the diacritic command. More importantly, these LICRs are also used by
utf8x for the U+0218 -- U+021B range. Both latin10 and utf8x input
encodings are thus correctly configured for Romanian. But these new
LICRs aren't used in the QX font encoding, the only one that provides
the comma-below glyphs. The following lines need to be added to
qxenc.def:

\DeclareTextComposite{\textcommabelow}{QX}{S}{147} % /Scommaaccent
\DeclareTextComposite{\textcommabelow}{QX}{T}{149} % /Tcommaaccent
\DeclareTextComposite{\textcommabelow}{QX}{s}{179} % /scommaaccent
\DeclareTextComposite{\textcommabelow}{QX}{t}{181} % /tcommaaccent

The comments to the right of each line show the corresponding PS name in
qx.enc. All 4 are already properly implemented by Latin Modern and the
TeX Gyre fonts.

After the above additions one can type (using utf8x since latin10 is
obsolete as input method):
\Huge Romanian diacritics: {\fontencoding{QX}\selectfont
ȘȚ}{\fontencoding{T1}\selectfont ĂÎÂ!}

All diacritics, except for T with comma below, which is affected by
yet another bug (see next point), are now searchable in the output of
pdflatex. So please add those 4 lines to qxenc.def. Note that
qxenc.def already has some mappings from the cedilla variants, i.e. \c
[STst] to those slots. I suggest you delete them. If you're worried
about backwards compatibility, you can leave those alone, even though
they are typographically and Unicode-wise incorrect; obviously no
Turks used the QX encoding, or you'd have heard about it ;)
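
To check the result, a minimal test file along these lines should do.
This is only a sketch: it assumes qxenc.def already contains the 4
lines above, that the ucs package (for utf8x) and Latin Modern are
installed, and that the file is saved as UTF-8.

```latex
\documentclass{article}
% QX and T1 are both loaded; the last option (T1) becomes the default
\usepackage[QX,T1]{fontenc}
\usepackage{ucs}
\usepackage[utf8x]{inputenc}
\usepackage{lmodern} % Latin Modern provides all 4 comma-below glyphs
\begin{document}
Romanian diacritics:
{\fontencoding{QX}\selectfont ȘȚșț}% comma-below glyphs from QX
{\fontencoding{T1}\selectfont ĂÎÂăîâ}% breve and circumflex from T1
\end{document}
```

Run it through pdflatex and search the resulting PDF for each
character.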

2) When one searches the PDF for [Tt] with comma below, these glyphs
cannot be found as U+021[AB], but can be found as U+016[23]. There's a
general snafu with how LaTeX's utf8x input mode and the pdf output
drivers (dvipdfmx) and engines (pdftex) interact. This isn't
restricted just to Romanian, so I'm sending a separate email, with
more addresses in the CC field for that matter. Stay tuned...
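
In the meantime, for pdftex only, one thing that may be worth trying is
overriding the glyph-name-to-Unicode mapping by hand. This is a sketch
under an assumption not established above, namely that the wrong code
points come from the /Tcommaaccent and /tcommaaccent glyph names being
mapped to U+0162 and U+0163 when the PDF is written.
\pdfgentounicode and \pdfglyphtounicode are pdftex primitives (1.40 and
later); this does nothing for dvipdfmx.

```latex
% preamble, pdftex only
\pdfgentounicode=1 % have pdftex write ToUnicode maps for Type 1 fonts
% assumption: these two glyph names are what ends up as U+0162/U+0163
\pdfglyphtounicode{Tcommaaccent}{021A}
\pdfglyphtounicode{tcommaaccent}{021B}
```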

3) Having to change font encodings every few characters would drive
anyone crazy, so the sane thing to do is to select only the commands
from T1 and QX encodings that provide proper composites. I had to dig
through ltoutenc's documentation to figure it out. In the end, the
preamble commands are pretty simple:

% get a composite A breve from T1
\DeclareTextCommandDefault{\u}[1]{\fontencoding{T1}\selectfont\u #1}
% get composite comma-below S and T from QX
% get Euro from QX
% Cedillas are not used in Romanian, except perhaps for Turkish names
% get the composite S and T cedilla from T1 just in case
\DeclareTextCommandDefault{\c}[1]{\fontencoding{T1}\selectfont\c #1}

Using these, the default encoding can be either T1 or QX. The gotcha
for which I needed to read ltoutenc's documentation is that simply
using \DeclareTextAccentDefault{\u}{T1} does not result in proper
composites being used. The reason is that only the accent is
rendered in T1 encoding; the letter is still from QX if that's the
default encoding used. I've read in the Companion that this is the
intended behavior, even though it doesn't seem the most useful to me
since it will never use any composites that way.
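
To make the difference concrete, the two behaviours can be written out
by hand with QX as the current encoding. This is only an illustrative
sketch, assuming both QX and T1 have been loaded with fontenc;
\UseTextAccent is the ltoutenc command on which
\DeclareTextAccentDefault is built.

```latex
% body text, current encoding QX
{\fontencoding{QX}\selectfont
  % accent taken from T1, base letter A still set in QX: no composite
  \UseTextAccent{T1}{\u}{A}
  % whole construct set in T1: the A breve composite glyph is used
  {\fontencoding{T1}\selectfont\u{A}}}
```

Only the second form should give a searchable A breve in the PDF.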

This stuff is too simple to be worth a CTAN package, but I'll drop the
info on a web page somewhere...

