[tex-hyphen] Serbian (Serbo-Croatian) hyphenation patterns.

Mojca Miklavec mojca.miklavec.lists at gmail.com
Mon Jun 16 09:11:26 CEST 2008


On Sun, Jun 15, 2008 at 10:09 PM, Dejan Muhamedagic wrote:
>
>> OK.
>> But we're really asking about the licence permission in the first
>> place.
>
> I agree to change the license to the LPPL.

Huraaay! Thanks a lot!

>> Any other modifications to the text can be added at any time.
>> Also, please take a quick look at the Cyrillic patterns, just to
>> check if the conversion was OK. I hope it was, but it's always good to
>> have a second opinion.
>
> Just looked at my sources, from which it is easy to produce the
> patterns for both the Latin and Cyrillic alphabets, since they
> contain single characters for all sounds (i.e. there are no
> digraphs). Hence, it is much easier for me to just produce the
> new patterns from sources rather than wade through those patterns.
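
A minimal sketch of why a digraph-free source makes both outputs easy,
assuming (hypothetically) that the source uses one Serbian Cyrillic letter
per sound; the actual format of Dejan's sources is not described in this
thread. The Latin version then follows from a per-character mapping, and
pattern digits pass through unchanged:

```python
# Serbian Cyrillic -> Latin, one source character per sound.
# Only lj, nj and dž become digraphs on the Latin side.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}

_TABLE = str.maketrans(CYR_TO_LAT)


def to_latin(text: str) -> str:
    """Map each Cyrillic letter to its Latin equivalent.

    Characters outside the table (digits, dots, whitespace) are kept as-is.
    """
    return text.translate(_TABLE)


if __name__ == "__main__":
    print(to_latin("љубав"))
```

Going the other way, from Latin text with digraphs back to Cyrillic, is
ambiguous without extra rules, which is why starting from single-character
sources is the simpler route.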

One question: are these sources a secret? Maybe you can use a
stricter licence for those, but it would be nice to store the sources
somewhere. Unless you really want to hide them from possible
commercial users (that's the case with the Slovenian word list).

Also, if anyone wants to improve your work later ...

>> > Therefore, I think that the prefix
>> > of the patterns file should remain "sh". Last time I looked,
>> > there has been proliferation of new names such as srhyph or
>> > hrhyph (though I believe that hrhyph was a different pattern set
>> > altogether).
>>
>> Yes, hrhyph are different patterns, and srhyph are the Cyrillic ones
>> (loaded by default together with Latin labels!!!).
>>
>> But we're modifying + renaming all of them now, so shhyph would not be
>> used at all, so it makes no sense to try to modify & include it now.
>> Let's focus on new patterns instead.
>>
>> See http://www.tug.org/svn/texhyphen/trunk/tex/patterns/utf8/ for the
>
> That link doesn't work.

Sorry, Karl has asked me to prepare TDS-compliant structure. At the moment:

http://www.tug.org/svn/texhyphen/trunk/hyph-utf8/tex/generic/hyph-utf8/patterns/

>> new patterns. I did not commit yours yet since I'm waiting for your
>> approval to modify the licence, but the patterns would have been named
>> hyph-sr-latn.tex.
>
> I still disagree with the name. As I said, the patterns are not
> specific to Serbian. Nor to Croatian or Bosnian or whatever new
> language appears tomorrow.

So - there has been no Montenegrin language registered officially yet?
:) :) :) :)

>> I think that if Bosnians decide to use them one day,
>> they can still add an entry to language.dat or load the patterns
>> inside Bosnian pattern loader.
>>
>> Unless you're willing to add support for Bosnian to Babel as well.
>> Even though I understand the Serbo-Croatian language(s), I don't hear
>> any difference between them (I do not distinguish them), so I cannot
>> be of any help here.
>>
>> I'm pretty sure that if we call the patterns Serbocroatian now, some
>> people will pop up at some time complaining that the language doesn't
>> exist any more and they will try to convince Karl to rename them. A
>> similar situation with "Norwegian".
>
> It is up to you to decide and I'm not going to try to enforce
> particular names. I'll just upload the new version of the patterns
> to CTAN with the new license along with a Cyrillic version.

I have a small request in this particular case. I would much prefer it
if you could send me the two files, and I will put them into
the svn repository (they need to be UTF-8 encoded, with no TeX control
sequences, no catcode changes, no grouping, no messages ... only
comments, \patterns{...} and \hyphenation{...}).
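
The constraints above can be checked mechanically. What follows is only a
rough sketch, not a tool from the repository: the function name
check_pattern_file is made up for illustration, and it assumes that a
comment is anything from % to the end of the line and that the pattern
blocks contain no nested braces.

```python
import re

# The only control sequences the file format allows.
ALLOWED = {"patterns", "hyphenation"}


def strip_comments(text: str) -> str:
    """Drop everything from '%' to the end of each line."""
    return "\n".join(line.split("%", 1)[0] for line in text.splitlines())


def check_pattern_file(data: bytes) -> list:
    """Return a list of problems; an empty list means the file looks fine."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return ["not valid UTF-8: " + str(exc)]

    problems = []
    body = strip_comments(text)

    # Any control word or control symbol other than the two allowed ones.
    for name in re.findall(r"\\([A-Za-z]+|.)", body, flags=re.DOTALL):
        if name not in ALLOWED:
            problems.append("unexpected control sequence: \\" + name)

    # After removing the allowed blocks, only whitespace may remain.
    outside = re.sub(r"\\(?:patterns|hyphenation)\s*\{[^{}]*\}", "", body)
    if outside.strip():
        problems.append("content outside \\patterns{...} / \\hyphenation{...}")

    return problems


if __name__ == "__main__":
    # '.a1b' and 'ab-cd' are placeholder entries, not real Serbian patterns.
    sample = "% comment\n\\patterns{\n.a1b\n}\n\\hyphenation{ab-cd}\n"
    print(check_pattern_file(sample.encode("utf-8")))
```

A file that still sets catcodes, opens groups or prints messages would be
reported here, since all of those need control sequences or braces outside
the two allowed blocks.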

The plan is to eventually get rid of the old files in distributions
(maybe not so soon), and to use proper Unicode files instead, which can
be understood by all the TeX engines, including the cutting-edge XeTeX
and LuaTeX.

The files from the repository at
    http://www.tug.org/svn/texhyphen/trunk/hyph-utf8/tex/generic/hyph-utf8/patterns/
or actually the whole
    http://www.tug.org/svn/texhyphen/trunk/hyph-utf8
will be put on CTAN and used in TeX Live 2008, so it makes much more
sense to keep these new files up to date. If you upload a new file to
CTAN, it probably won't end up in TeX Live, and I would need to convert it
again if you don't provide it in the proper format.

>> A really nice thing to do would be adding support for Cyrillic script
>> for Serbian to Babel though.
>
> I'll take a look at it, though I believe that somebody already did
> that before.

That's quite possible, but there are no files in TeX Live, and I guess not
even on CTAN.
I've seen some conversations in forums saying something like: "take
this file from here and do this and that ..."
I forgot the exact details, but people were complaining that
distributions do not support Cyrillic script.

Thanks a lot,
     Mojca

