[tex-hyphen] Serbian (Serbo-Croatian) hyphenation patterns.

Dejan Muhamedagic dejan at hello-penguin.com
Mon Jun 16 13:14:17 CEST 2008


On Mon, Jun 16, 2008 at 09:11:26AM +0200, Mojca Miklavec wrote:
> On Sun, Jun 15, 2008 at 10:09 PM, Dejan Muhamedagic wrote:
> >
> >> OK.
> >> But we're really asking about the licence permission in the first
> >> place.
> >
> > I agree to change the license to the LPPL.
> 
> Huraaay! Thanks a lot!
> 
> >> Any other modifications to the text can be added at any time.
> >> Also, please take a quick look at the Cyrillic patterns, just to
> >> check if the conversion was OK. I hope it was, but it's always good to
> >> have a second opinion.
> >
> > Just looked at my sources, from which it is easy to produce the
> > patterns for both the Latin and Cyrillic alphabets, since they
> > contain single characters for all sounds (i.e. there are no
> > digraphs). Hence, it is much easier for me to just produce the
> > new patterns from the sources rather than wade through those patterns.
> 
> One question: are these sources a secret? Maybe you can make some
> stricter licence for those, but it would be nice to store the sources
> somewhere. Unless you really want to hide them from possible
> commercial users (that's the case with Slovenian list of words).

No, they're not a secret, though they have never been published.
It's a bunch of C preprocessor (cpp) and awk files, plus some really
strange-looking files. And they are almost 20 years old. So I guess that
I'd have to clean them up a bit for publication.
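
To illustrate the single-character idea (a minimal sketch only, not the
actual cpp/awk sources): assume a hypothetical source alphabet in which
each sound is one symbol, with the uppercase letters L, N and D standing
for the sounds that the Latin script writes as lj, nj and dž. The symbols,
the table names and the render() helper are all made up for this example.
One pattern can then be rendered into either script by a plain
character-for-character substitution, with digits and dots passing
through untouched:

```python
# Hypothetical one-symbol-per-sound source alphabet (an assumption for
# illustration). Uppercase L, N, D stand for the sounds written as the
# digraphs lj, nj, dž in the Latin script.
TO_LATIN = {"L": "lj", "N": "nj", "D": "dž"}

TO_CYRILLIC = {
    "a": "а", "b": "б", "c": "ц", "č": "ч", "ć": "ћ", "d": "д",
    "đ": "ђ", "e": "е", "f": "ф", "g": "г", "h": "х", "i": "и",
    "j": "ј", "k": "к", "l": "л", "m": "м", "n": "н", "o": "о",
    "p": "п", "r": "р", "s": "с", "š": "ш", "t": "т", "u": "у",
    "v": "в", "z": "з", "ž": "ж", "L": "љ", "N": "њ", "D": "џ",
}


def render(pattern, table):
    """Map each source symbol through `table`.

    Anything not in the table (digits, dots, and for TO_LATIN every
    letter that is already plain Latin) is copied unchanged.
    """
    return "".join(table.get(ch, ch) for ch in pattern)
```

With these tables, render("a1Le", TO_LATIN) gives "a1lje", and the same
call with TO_CYRILLIC gives the Cyrillic counterpart. Going the other way,
from Latin patterns with digraphs back to Cyrillic, is harder because a
hyphenation digit may sit between the two letters of a digraph.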

> Also, if anyone wants to improve your work later ...

Definitely.

> >> > Therefore, I think that the prefix
> >> > of the patterns file should remain "sh". Last time I looked,
> >> > there had been a proliferation of new names such as srhyph or
> >> > hrhyph (though I believe that hrhyph was a different pattern set
> >> > altogether).
> >>
> >> Yes, hrhyph are different patterns, and srhyph are Cyrillic ones
> >> (loaded by default together with Latin labels!!!).
> >>
> >> But we're modifying + renaming all of them now, so shhyph would not be
> >> used at all, so it makes no sense to try to modify & include it now.
> >> Let's focus on new patterns instead.
> >>
> >> See http://www.tug.org/svn/texhyphen/trunk/tex/patterns/utf8/ for the
> >
> > That link doesn't work.
> 
> Sorry, Karl has asked me to prepare a TDS-compliant structure. At the moment:
> 
> http://www.tug.org/svn/texhyphen/trunk/hyph-utf8/tex/generic/hyph-utf8/patterns/
> 
> >> new patterns. I did not commit yours yet since I'm waiting for your
> >> approval to modify the licence, but the patterns would have been named
> >> hyph-sr-latn.tex.
> >
> > I still disagree with the name. As I said, the patterns are not
> > specific to Serbian. Nor to Croatian or Bosnian or whatever new
> > language appears tomorrow.
> 
> So - there has been no Montenegrin language registered officially yet?
> :) :) :) :)

I really don't know, but if not, it's probably in the works :)

> >> I think that if Bosnians decide to use them one day,
> >> they can still add an entry to language.dat or load the patterns
> >> inside Bosnian pattern loader.
> >>
> >> Unless you're willing to add support for Bosnian to Babel as well.
> >> Even though I understand the Serbo-Croatian language(s), I don't hear
> >> any difference between them (I do not distinguish them), so I cannot
> >> be of any help here.
> >>
> >> I'm pretty sure that if we call the patterns Serbocroatian now, some
> >> people will pop up at some time complaining that the language doesn't
> >> exist any more and they will try to convince Karl to rename them. A
> >> similar situation with "Norwegian".
> >
> > It is up to you to decide and I'm not going to try to enforce
> > particular names. I'll just upload the new version of the patterns
> > to CTAN with the new license along with a cyrillics version.
> 
> I have a little request in this particular case. I would prefer it
> much more if you could send me the two files, and I will put them into
> the svn repository (they need to be UTF-8 encoded, no tex control
> sequences, no catcode changes, no grouping, no messages ... only
> comments, \patterns{...} and \hyphenation{...}).

OK. There used to be a checksum field. Is that still needed?
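
The format constraints quoted above (UTF-8, nothing but comments,
\patterns{...} and \hyphenation{...}) could be checked mechanically
before a file is sent. A rough sketch, not an official tool of the
hyph-utf8 repository: the function names are hypothetical, and it only
looks at control words (a backslash followed by letters), so control
symbols and stray grouping are not detected:

```python
import re

# The only two commands the request permits in a pattern file.
ALLOWED = {"patterns", "hyphenation"}


def strip_comments(text):
    """Remove TeX comments: an unescaped % and the rest of that line."""
    return "\n".join(
        re.sub(r"(?<!\\)%.*", "", line) for line in text.splitlines()
    )


def disallowed_commands(data):
    """Return the sorted control words in `data` other than the allowed two.

    `data` is the raw file content as bytes; decoding raises
    UnicodeDecodeError if the file is not valid UTF-8.
    """
    text = data.decode("utf-8")
    names = re.findall(r"\\([A-Za-z]+)", strip_comments(text))
    return sorted(set(names) - ALLOWED)
```

Reading a file such as hyph-sr-latn.tex in binary mode and passing its
content to disallowed_commands() should return an empty list if the file
sticks to the two permitted commands.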

> The plan is to eventually get rid of the old files in distributions
> (maybe not so soon), and to use proper Unicode files instead, that can
> be understood by all the TeX engines, including the cutting-edge XeTeX
> and LuaTeX.
> 
> The files from the repository at
>     http://www.tug.org/svn/texhyphen/trunk/hyph-utf8/tex/generic/hyph-utf8/patterns/
> or actually the whole
>     http://www.tug.org/svn/texhyphen/trunk/hyph-utf8
> will be put on CTAN and used in TeX Live 2008, so it makes much more
> sense to keep these new files up to date. If you upload a new file to
> CTAN, it probably won't end up in TeX Live, and I would need to convert
> it again if you don't provide a proper format for it.
> 
> >> A really nice thing to do would be adding support for Cyrillic script
> >> for Serbian to Babel though.
> >
> > I'll take a look at it, though I believe that somebody already did
> > that before.
> 
> That's quite possible, but there are no such files in TeX Live, and I
> guess not even on CTAN.
> I've seen some conversations in forums saying something like: "take
> this file from here and do this and that ..."
> I forgot the exact details, but people were complaining that
> distributions do not support cyrillic script.

Right. I'll see what I can do. I suppose that it shouldn't be a
biggie.

> Thanks a lot,
>      Mojca

Cheers,

Dejan

