[tex-hyphen] Newest GitHub additions into CTAN?

Stojan Trajanovski stojan.trajanovski at gmail.com
Thu Dec 31 00:45:33 CET 2020


Thanks Mojca and Arthur,

The community of Macedonian TeX/LaTeX users has been growing over the
years, and some of those people occasionally typeset documents in Macedonian.

Here are some examples I am aware of:

1) A friend of mine recently re-typeset a university physics textbook in
Macedonian, and he mentioned that he got it working only after some hassle
with the hyphenation.
2) This year, the leader of the Macedonian team for an international
mathematics competition typeset the student exam paper in TeX for the first
time: https://artofproblemsolving.com/contests/cmc (In the past, many of
them either opted for MS Word or used Serbian/Bulgarian babel support.)
I cc-ed the former for reference.

Their default choice is (still) an 8-bit engine. I think hyphenation
patterns for 8-bit engines in the proposed form will definitely be useful
until they start using Unicode-aware engines (LuaTeX, XeTeX etc.).

Regards,
Stojan

On Wed, 30 Dec 2020 at 22:52, Mojca Miklavec <mojca.miklavec.lists at gmail.com>
wrote:

> On Wed, 30 Dec 2020 at 22:48, Arthur Reutenauer wrote:
> > It does sound like T2A is the best choice for Macedonian,
>
> While I agree that it might be the least problematic 8-bit encoding, I
> don't agree that it's the best choice in 2021.
>
> By supporting T2A you are actively educating the users to stick with
> T2A for NEW documents without them noticing tons of issues:
> - accents on those two letters will likely be misplaced (and as a
> consequence uglier)
> - you don't get any kerning around those characters
> - words containing any of those two letters won't be hyphenated at all
> ... along with an extremely limited choice of available fonts, extreme
> difficulties to mix the document with other languages, with all other
> limitations of the 8-bit engines, ...
>
> I'm saying this because I've seen tons of hardcover books on
> bookshelves in my native language that you can easily recognise as
> being typeset using the wrong (OT1) encoding, with the caron on ccaron
> "heavily" misplaced, esp. when bold is used. (I was among those who
> kept using the wrong encoding for many years as well, and after I
> learned about it, I probably haven't seen anyone using the correct
> encoding in the sources either.)
>
> The biggest problem is when things work just enough to make an
> impression of being ok, and then users don't feel the need to go one
> extra mile and learn how to make them perfect, even though they would
> prefer the second option if they were ever aware of it. Are users
> actually requesting 8-bit support, or is the addition some kind of a
> personal wish to satisfy everyone?
>
> That said, if there is real urgency to support 8-bit encodings ...
> we'll do it, of course.
>
> > Mojca will try to make an upload to CTAN by tomorrow
> > (Thursday) evening, otherwise we’ll work on it some time next week.
>
> This was the initial plan, but I can no longer promise to stick with
> it after noticing that what's in the repository at the moment is
> actually wrong.
>
> The initial patterns from Vasil that we found online were claiming to
> be using the T2A encoding, but the actual encoding was different.
> Arthur "reverse-engineered" the contents (with the help of the
> comments) by creating his own custom mapping that he provisionally
> called "Macedonian" and using it to convert the original into the
> final UTF-8 file. We were assuming that the patterns were actually
> used in the original form, but unless there was a custom font
> available somewhere, they were likely not working correctly with T2A.
> I looked at those original patterns again and it seems that the
> original patterns were using the cp-1251 encoding, so that they looked
> OK inside a text editor on Windows. Or at least those 7 letters that I
> checked seem to match cp-1251 exactly.
>
> If we really do need to support 8-bit encodings, I would keep the
> original UTF-8 patterns intact and just remove the incompatible
> patterns from the 8-bit version. But that 8-bit version needs to be
> done from scratch (using the existing scripts). The file currently in
> the master branch is apparently using cp-1251 and would not work as
> desired.
>
> Mojca
>
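
The cp-1251 check Mojca describes above can be approximated with a few
lines of Python. This is only a sketch under stated assumptions: the choice
of the seven lowercase letters (those Macedonian uses beyond the Russian
alphabet) is a guess at which letters were checked, and the helper names
cp1251_bytes and letters_found_as_cp1251 are made up for illustration. It
does not reproduce the actual tex-hyphen conversion scripts.

```python
# Assumption: these are the seven letters worth checking, i.e. the
# lowercase letters Macedonian uses that are not in the Russian alphabet.
MACEDONIAN_LETTERS = "ѓѕјљњќџ"


def cp1251_bytes(text):
    """Map each character to its single-byte value in cp-1251."""
    return {ch: ch.encode("cp1251")[0] for ch in text}


def letters_found_as_cp1251(raw, expected=MACEDONIAN_LETTERS):
    """Decode raw bytes as cp-1251 and return the expected letters that occur.

    Undefined byte values are replaced rather than raising an error, so the
    function can be run on a file whose encoding is still unknown.
    """
    decoded = raw.decode("cp1251", errors="replace")
    return {ch for ch in expected if ch in decoded}


# Print the byte value each letter would have in a cp-1251 pattern file.
table = cp1251_bytes(MACEDONIAN_LETTERS)
for ch, byte in table.items():
    print(f"{ch}  0x{byte:02X}")
```

Comparing these byte values against the bytes found in the original pattern
file (and against the T2A slot assignments) would show which encoding the
file really uses.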