<div>It seems that I may have reinvented the wheel (and created an inferior model).<br></div><div><br></div><div>For a PDF explanation of Lao syllabification, see this link:</div><div><br></div><div><a href="http://www.tcllab.org/events/uploads/valaxay-lao.pdf">http://www.tcllab.org/events/uploads/valaxay-lao.pdf</a></div>
<div><br></div><div>Thank you,</div><div><br></div><div>Brian Wilson</div><div><br></div><div class="gmail_quote">On Wed, Apr 28, 2010 at 5:00 PM, <span dir="ltr"><<a href="mailto:tex-hyphen-request@tug.org">tex-hyphen-request@tug.org</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Send tex-hyphen mailing list submissions to<br>
<a href="mailto:tex-hyphen@tug.org">tex-hyphen@tug.org</a><br>
<br>
To subscribe or unsubscribe via the World Wide Web, visit<br>
<a href="http://tug.org/mailman/listinfo/tex-hyphen" target="_blank">http://tug.org/mailman/listinfo/tex-hyphen</a><br>
or, via email, send a message with subject or body 'help' to<br>
<a href="mailto:tex-hyphen-request@tug.org">tex-hyphen-request@tug.org</a><br>
<br>
You can reach the person managing the list at<br>
<a href="mailto:tex-hyphen-owner@tug.org">tex-hyphen-owner@tug.org</a><br>
<br>
When replying, please edit your Subject line so it is more specific<br>
than "Re: Contents of tex-hyphen digest..."<br>
<br>
<br>
Today's Topics:<br>
<br>
1. Re: tex patterns as lua files (Mojca Miklavec)<br>
2. Re: tex patterns as lua files (Karl Berry)<br>
3. Lao Word wrap (Brian Wilson)<br>
<br>
<br>
----------------------------------------------------------------------<br>
<br>
Message: 1<br>
Date: Tue, 27 Apr 2010 15:04:21 +0200<br>
From: Mojca Miklavec <<a href="mailto:mojca.miklavec.lists@gmail.com">mojca.miklavec.lists@gmail.com</a>><br>
To: "About TeX hyphenation patterns." <<a href="mailto:tex-hyphen@tug.org">tex-hyphen@tug.org</a>><br>
Subject: Re: [tex-hyphen] tex patterns as lua files<br>
Message-ID:<br>
<<a href="mailto:o2k6faad9f01004270604vf6c93ff7na8cbdb8ac6df808f@mail.gmail.com">o2k6faad9f01004270604vf6c93ff7na8cbdb8ac6df808f@mail.gmail.com</a>><br>
Content-Type: text/plain; charset=UTF-8<br>
<br>
On Tue, Apr 27, 2010 at 13:06, Manuel Pégourié-Gonnard wrote:<br>
> On 27/04/2010 12:26, Mojca Miklavec wrote:<br>
><br>
>> What I would really like to know before doing the change is:<br>
>><br>
>> 1.) Which patterns should be default for any other program<br>
>> (javascript, perl etc.) outside of TeX?<br>
><br>
> I guess it's usenglishmax. The Knuthian version matters mainly in the<br>
> nearly-frozen part of the TeX world.<br>
<br>
OK. If others agree ...<br>
<br>
>> 2.) Do you need/want (two questions) Knuth's hyphen.tex patterns in<br>
>> "plain" format as well?<br>
>><br>
> One could always special-case english since we're going to do it at some<br>
> point anyway, but it would be a bit easier for us if everything is<br>
> uniform.<br>
><br>
> While we're at it, there are also a few other hyphenation files that are not<br>
> in the normal form hyph-XX.tex + loadhyph-XX.tex + all the nice txt files<br>
> you kindly prepared for us. Some are from hyphen-base, namely dumyhyph.tex<br>
> and zerohyph.tex. Again, we can special-case them in our code, or you can<br>
> provide .txt versions of them (with an entry in language.dat.lua), which would<br>
> make a bit more work for you but would result in cleaner code for loading.<br>
><br>
> It's mainly up to you to evaluate if you think those files belong to<br>
> texhyphen or not. I don't mind doing the little additional Lua & TeX coding<br>
> to treat them specially if needed. (Actually, I already know how I would do<br>
> it for hyphen.cfg, and I didn't look too closely at etex.src yet but I know<br>
> it's possible too.)<br>
<br>
As far as dummy and zero are concerned, what do you think about the<br>
idea of creating a separate folder with appropriate txt files for<br>
those two languages? LuaTeX won't care about location and others that<br>
might be willing to use the repository won't have to create special<br>
cases for dummy/zero files in that folder.<br>
<br>
Of course the entry for those two can be added to language.dat.lua.<br>
<br>
As far as<br>
<br>
> (There are also other files that end up being mentioned in TL's full<br>
> language.dat but are coming from other sources. We (meaning Élie and I) need<br>
> to do something about that, but I propose postponing the discussion about<br>
> them, since we're already discussing a lot of things at the same time).<br>
<br>
- If you mean arabic and others, it's no problem to add an entry to<br>
that lua file.<br>
- If you mean ibycus, you probably don't want to support it in LuaTeX.<br>
- If you mean the Germans with their timestamped patterns, we may<br>
postpone the discussion; in LuaTeX you would probably want to go for a<br>
completely different route than the current approach anyway.<br>
- There are also Javier's ideas about different subsets of patterns in<br>
LuaTeX that we might want to consider.<br>
- And there are some languages that have zillions of versions of<br>
patterns (like Russians etc.).<br>
<br>
Anything else?<br>
<br>
>>> or it can also be done on LaTeX's side, I can<br>
>>> modify the table accordingly. What would be the best?<br>
>><br>
>> I'll respond once I know the answer to the two questions above. The<br>
>> table will be modified in either case and will include USenglish<br>
>> synonym. The question is only whether we should duplicate hyphen.tex<br>
>> in our repository and if yes, which patterns should take precedence<br>
>> (of having no -x-something extension). The lua table will be modified<br>
>> accordingly from languages.rb database.<br>
>><br>
> IMO, for the rest of the world, usenglishmax is the canonical version for US<br>
> english. I guess you want to reflect that in the code/filename by making it<br>
> en-us, and Knuth patterns en-US-x-knuth-original.<br>
><br>
> What is sure is, the logical name "english" *must* be the knuthian patterns<br>
> (= hyphen.tex = en-US-x-knuth-original), usenglish, USenglish and american<br>
> have to be synonyms of this one, and the logical name "usenglishmax" needs<br>
> to be ushyphmax.tex (aka en-US in the new codes if you follow my suggestion).<br>
><br>
> With current language.dat.lua, "english" points to en-US which is formerly<br>
> ushyphmax, which means not Knuthian patterns, and that needs to be changed,<br>
> regardless of what you decide for the rest.<br>
<br>
I fully agree with that. All I wanted to know was how to change that.<br>
<br>
Mojca<br>
<br>
<br>
<br>
------------------------------<br>
<br>
Message: 2<br>
Date: Tue, 27 Apr 2010 22:38:41 GMT<br>
From: <a href="mailto:karl@freefriends.org">karl@freefriends.org</a> (Karl Berry)<br>
To: <a href="mailto:tex-hyphen@tug.org">tex-hyphen@tug.org</a><br>
Subject: Re: [tex-hyphen] tex patterns as lua files<br>
Message-ID: <<a href="mailto:201004272238.o3RMcfOk027053@f7.net">201004272238.o3RMcfOk027053@f7.net</a>><br>
<br>
patterns/txt (or data or plaintext or raw or ...)<br>
<br>
txt seems like a nice choice here.<br>
<br>
> 1.) Which patterns should be default for any other program<br>
> (javascript, perl etc.) outside of TeX?<br>
I guess it's usenglishmax.<br>
<br>
I don't disagree exactly, but what "other programs" are we talking<br>
about? Or are you talking about use of our patterns in completely<br>
different programs (e.g., FOP)?<br>
<br>
The question is only whether we should duplicate hyphen.tex<br>
<br>
Whether you duplicate hyphen.tex in your repository is a matter for your<br>
convenience. In TeX Live, I think hyphen.tex should remain as part of<br>
hyphen-base. So if you include it, we'll just remove it when importing<br>
into TL (which is no problem to do).<br>
<br>
So again something for Karl: what's the best place for the following file?<br>
<a href="http://tug.org/svn/texhyphen/branches/luatex/TL/texmf/tex/generic/config/language.dat.lua" target="_blank">http://tug.org/svn/texhyphen/branches/luatex/TL/texmf/tex/generic/config/language.dat.lua</a><br>
<br>
Since the filenames are unique (....lua) it doesn't seem to matter much.<br>
tex/generic/hyph-utf8/luatex/* maybe? Manuel?<br>
<br>
<br>
------------------------------<br>
<br>
Message: 3<br>
Date: Wed, 28 Apr 2010 16:29:54 +0700<br>
From: Brian Wilson <<a href="mailto:bountonw@gmail.com">bountonw@gmail.com</a>><br>
To: <a href="mailto:tex-hyphen@tug.org">tex-hyphen@tug.org</a>, <a href="mailto:xetex@tug.org">xetex@tug.org</a><br>
Subject: [tex-hyphen] Lao Word wrap<br>
Message-ID:<br>
<<a href="mailto:x2vee9be39b1004280229q9b9cf4a3s2150e686988c6192@mail.gmail.com">x2vee9be39b1004280229q9b9cf4a3s2150e686988c6192@mail.gmail.com</a>><br>
Content-Type: text/plain; charset="iso-8859-1"<br>
<br>
Attached is a humble attempt at Lao syllabication rules in the hopes for Lao<br>
integration with TeX.<br>
<br>
I am sending this to the tex-hyphen list, and CCing the xetex list as a<br>
lengthy discussion regarding this subject occurred there during the last<br>
couple of weeks.<br>
<br>
I will be happy to work with the group in tweaking this and running tests.<br>
<br>
Thank you,<br>
<br>
--<br>
Brian Wilson, Director<br>
Asia-Pacific International University Translation Center<br>
_____________<br>
<br>
I have a new blog!! <a href="http://tc4asia.org/wpblog" target="_blank">http://tc4asia.org/wpblog</a><br>
<br>
"He hath shewed thee, O man, what is good; and what doth the LORD require of<br>
thee , but to do justly, and to love mercy, and to walk humbly with thy<br>
God." Micah 6:8<br>
-------------- next part --------------<br>
An HTML attachment was scrubbed...<br>
URL: <<a href="http://tug.org/pipermail/tex-hyphen/attachments/20100428/3b37e1cd/attachment-0001.html" target="_blank">http://tug.org/pipermail/tex-hyphen/attachments/20100428/3b37e1cd/attachment-0001.html</a>><br>
-------------- next part --------------<br>
The following is a brief sketch of the syllabification rules in Lao. My apologies for not using standard conventions. Feel free to edit.<br>
<br>
On the most basic level of word-wrapping, syllables should never be split.<br>
<br>
Lao syllables consist of<br>
1. Beginning Consonant (bC) [required]<br>
2. Secondary Beginning Consonant (sbC) [for consonant clusters]<br>
3. Vowel (V) [required]<br>
4. Tone Mark (T) [The order of 3 and 4 can be reversed]<br>
5. Final Consonant (fC)<br>
6. Extra Final Consonant (efC)<br>
7. galan (g)<br>
<br>
##########<br>
##########<br>
Consonants and consonant clusters that can begin a syllable.<br>
1. ກ 0E81<br>
(1) ກຣ 0E81 + 0EA3 [uncommon]<br>
(2) ກລ 0E81 + 0EA5 [uncommon]<br>
(3) ກວ 0E81 + 0EA7<br>
(4) ກຼ 0E81 + 0EBC [uncommon]<br>
<br>
2. ຂ 0E82<br>
(1) ຂຣ 0E82 + 0EA3 [uncommon]<br>
(2) ຂລ 0E82 + 0EA5 [uncommon]<br>
(3) ຂວ 0E82 + 0EA7<br>
(4) ຂຼ 0E82 + 0EBC [uncommon]<br>
<br>
3. ຄ 0E84<br>
(1) ຄຣ 0E84 + 0EA3 [uncommon]<br>
(2) ຄລ 0E84 + 0EA5 [uncommon]<br>
(3) ຄວ 0E84 + 0EA7<br>
(4) ຄຼ 0E84 + 0EBC [uncommon]<br>
<br>
4. ງ 0E87<br>
<br>
5. ຈ 0E88<br>
<br>
6. ຊ 0E8A<br>
<br>
7. ຍ 0E8D<br>
<br>
8. ດ 0E94<br>
(1) ດຣ 0E94 + 0EA3 [uncommon]<br>
<br>
9. ຕ 0E95<br>
(1) ຕຣ 0E95 + 0EA3 [uncommon]<br>
<br>
10. ຖ 0E96<br>
<br>
11. ທ 0E97<br>
<br>
12. ນ 0E99<br>
<br>
13. ບ 0E9A<br>
(1) ບຣ 0E9A + 0EA3 [uncommon]<br>
(2) ບລ 0E9A + 0EA5 [uncommon]<br>
(3) ບຼ 0E9A + 0EBC [uncommon]<br>
<br>
14. ປ 0E9B<br>
(1) ປຣ 0E9B + 0EA3 [uncommon]<br>
(2) ປລ 0E9B + 0EA5 [uncommon]<br>
(3) ປຼ 0E9B + 0EBC [uncommon]<br>
<br>
15. ຜ 0E9C<br>
<br>
16. ຝ 0E9D<br>
(1) ຝຣ 0E9D + 0EA3<br>
(2) ຝຼ 0E9D + 0EBC<br>
<br>
17. ພ 0E9E<br>
<br>
18. ຟ 0E9F<br>
<br>
19. ມ 0EA1<br>
<br>
20. ຢ 0EA2<br>
<br>
21. ຣ 0EA3<br>
<br>
22. ລ 0EA5<br>
<br>
23. ວ 0EA7<br>
<br>
24. ສ 0EAA<br>
(1) ສຣ 0EAA + 0EA3 [uncommon]<br>
(2) ສລ 0EAA + 0EA5 [uncommon]<br>
(3) ສວ 0EAA + 0EA7<br>
(4) ສຼ 0EAA + 0EBC [uncommon]<br>
<br>
25. ຫ 0EAB<br>
(1) ຫງ 0EAB + 0E87<br>
(2) ຫນ 0EAB + 0E99 [This is uncommon as it has its own character, see below]<br>
(3) ຫຍ 0EAB + 0E8D<br>
(4) ຫມ 0EAB + 0EA1 [This is uncommon as it has its own character, see below]<br>
(5) ຫຣ 0EAB + 0EA3 [uncommon]<br>
(6) ຫລ 0EAB + 0EA5<br>
(7) ຫວ 0EAB + 0EA7<br>
(8) ຫຼ 0EAB + 0EBC<br>
<br>
26. ອ 0EAD<br>
<br>
27. ຮ 0EAE [my mac is rendering this the same as 0EA3, shame on it]<br>
<br>
28. ໜ 0EDC<br>
<br>
29. ໝ 0EDD<br>
<br>
############<br>
############<br>
Consonants that commonly end a syllable<br>
1. ກ 0E81<br>
2. ງ 0E87<br>
3. ຍ 0E8D [This is a /y/ and acts as a semivowel in certain constructions that will be explained later]<br>
4. ດ 0E94<br>
5. ມ 0EA1<br>
6. ນ 0E99<br>
7. ບ 0E9A<br>
8. ວ 0EA7 [This is a /w/ and acts as a semivowel in certain constructions that will be explained later]<br>
<br>
############<br>
############<br>
Consonants that could conceivably end a syllable in rare occasions when transcribing certain foreign words.<br>
<br>
1. ຂ 0E82<br>
2. ຄ 0E84<br>
3. ຈ 0E88<br>
4. ຊ 0E8A<br>
5. ດ 0E94<br>
6. ຕ 0E95<br>
7. ຖ 0E96<br>
8. ທ 0E97<br>
9. ປ 0E9B<br>
10. ຜ 0E9C<br>
11. ຝ 0E9D<br>
12. ພ 0E9E<br>
13. ຟ 0E9F<br>
14. ມ 0EA1<br>
15. ຣ 0EA3<br>
16. ລ 0EA5<br>
17. ສ 0EAA<br>
<br>
############<br>
############<br>
Consonants that can never end a syllable [unless followed immediately by the silencer 0ECC]<br>
1. ຫ 0EAB<br>
2. ຢ 0EA2<br>
3. ອ 0EAD<br>
4. ຮ 0EAE<br>
5. ◌ຼ 0EBC<br>
6. ໜ 0EDC<br>
7. ໝ 0EDD<br>
############<br>
############<br>
Extra final consonant<br>
In order to type foreign words, Lao adds 0ECC to extra final consonants.<br>
Every consonant but<br>
1. ◌ຼ 0EBC<br>
2. ໜ 0EDC<br>
3. ໝ 0EDD<br>
are theoretically possible with some more common than others.<br>
<br>
############<br>
############<br>
Vowels that are written before the beginning consonant [syllable breaks ALWAYS occur before these characters and NEVER occur after these characters]<br>
1. ເ 0EC0<br>
2. ແ 0EC1<br>
3. ໂ 0EC2<br>
4. ໃ 0EC3<br>
5. ໄ 0EC4<br>
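<br>
A minimal Python sketch of this one rule, assuming the text is already a string of Unicode code points. The names PREPOSED_VOWELS and breaks_before_preposed_vowels are hypothetical and not part of the original notes, and the function only reports the breaks this rule forces; it is not a full syllabifier.<br>
<pre>
```python
# Hypothetical helper illustrating only the "vowels written before the
# beginning consonant" rule: a syllable break always falls before them.
PREPOSED_VOWELS = {"\u0EC0", "\u0EC1", "\u0EC2", "\u0EC3", "\u0EC4"}


def breaks_before_preposed_vowels(text):
    """Return the indexes i such that a syllable break falls before text[i]."""
    # Index 0 is skipped: the start of the text is not a break position.
    return [i for i, ch in enumerate(text) if i != 0 and ch in PREPOSED_VOWELS]


if __name__ == "__main__":
    # Code points 0EC4 0E9B 0EC2 0EAE 0E87 0EC0 0EAE 0EB7 0EAD
    sample = "\u0EC4\u0E9B\u0EC2\u0EAE\u0E87\u0EC0\u0EAE\u0EB7\u0EAD"
    print(breaks_before_preposed_vowels(sample))
```
</pre>
Because these five vowels always open a syllable, a simple membership test is enough here; the other rules need more context.<br>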
<br>
############<br>
############<br>
Vowels that are written after the beginning consonant [syllable breaks NEVER occur before these characters. Some vowels in this section and the proceeding section can be stacked. I can specify if necessary.]<br>
1. ະ 0EB0<br>
2. າ 0EB2<br>
3. ຳ 0EB3 [can also be written as 0ECD followed by 0EB2]<br>
4. ◌ິ 0EB4<br>
5. ◌ີ 0EB5<br>
6. ◌ຶ 0EB6<br>
7. ◌ື 0EB7<br>
8. ◌ຸ 0EB8<br>
9. ◌ູ 0EB9<br>
10. ◌ໍ 0ECD<br>
<br>
############<br>
############<br>
Vowels that are written between two consonants [syllable breaks NEVER occur before or after these characters]<br>
<br>
1. ◌ັ 0EB1 [The following character must be a consonant or the 0EBD semi-vowel]<br>
2. ◌ົ 0EBB [The following character must be (after an optional T marker) 1. a consonant, or 2. the າ 0EB2 vowel when used in the /ow/ diphthong ( <0EC0> <bC> <(sbC)> <0EBB> <(T)> <0EB2>), or 3. the ວ 0EA7 semi-vowel when used in the /ua/ diphthong (Note that the ວ may be followed by ະ 0EB0 for the shortened version of this diphthong: <bC> <(sbC)> <0EBB> <(T)> <0EA7> <(0EB0)>)]<br>
<br>
############<br>
############<br>
Vowels that can't take a final consonant<br>
<br>
1. ະ 0EB0 [a syllable break ALWAYS occurs after this character]<br>
2. ◌ໍ 0ECD [a syllable break ALWAYS occurs after this character or after the optional tone mark immediately following it]<br>
<br>
<br>
############<br>
############<br>
The /ia/ vowel and, in old orthography, the /y/ that can replace the final ຍ 0E8D (see above)<br>
<br>
1. ຽ 0EBD [A break can NEVER occur before it. If it is a final /y/, a break can occur after it]<br>
<br>
############<br>
############<br>
Tones. There are four tone marks that can sit on top of the initial consonant or on ◌ິ ◌ີ ◌ຶ ◌ື ◌ໍ 0EB4 - 0EB5 - 0EB6 - 0EB7 - 0ECD (Note that 0EB5 and 0EB7 are also part of diphthongs; see below.) Breaks can NEVER occur before these.<br>
<br>
1. ◌່ 0EC8<br>
2. ◌້ 0EC9<br>
3. ◌໊ 0ECA<br>
4. ◌໋ 0ECB<br>
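<br>
The NEVER rules for vowels and tone marks stated so far can be combined into a simple filter, sketched here in Python. This is only an illustration under assumptions: the names NO_BREAK_BEFORE, NO_BREAK_AFTER and break_allowed are hypothetical, the /ia/ vowel is taken to be 0EBD, and a True result means only that none of these particular rules forbids the break, not that a syllable boundary is actually there.<br>
<pre>
```python
# Characters a break may never precede, per the rules listed so far.
NO_BREAK_BEFORE = set(
    "\u0EB0\u0EB2\u0EB3\u0EB4\u0EB5\u0EB6\u0EB7\u0EB8\u0EB9\u0ECD"  # vowels after the consonant
    "\u0EB1\u0EBB"  # vowels written between two consonants
    "\u0EBD"  # /ia/ vowel (assumed to be 0EBD)
    "\u0EC8\u0EC9\u0ECA\u0ECB"  # tone marks
)

# Characters a break may never follow.
NO_BREAK_AFTER = set(
    "\u0EC0\u0EC1\u0EC2\u0EC3\u0EC4"  # vowels before the beginning consonant
    "\u0EB1\u0EBB"  # vowels written between two consonants
)


def break_allowed(text, i):
    """Return True if no listed rule forbids a break between text[i-1] and text[i]."""
    if i in (0, len(text)):
        return False  # text edges are not break positions
    return text[i] not in NO_BREAK_BEFORE and text[i - 1] not in NO_BREAK_AFTER


if __name__ == "__main__":
    # Code points 0E81 0EB4 0E99 0EC0 0E82 0EBB 0EC9 0EB2
    sample = "\u0E81\u0EB4\u0E99\u0EC0\u0E82\u0EBB\u0EC9\u0EB2"
    print([i for i in range(len(sample) + 1) if break_allowed(sample, i)])
```
</pre>
A filter like this only narrows the candidates; choosing the real boundary among them still needs the consonant rules from the earlier sections.<br>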
<br>
############<br>
############<br>
The silencer: a mark placed on a consonant rendering it silent. Only used to write foreign words. Usually placed on the last letter of a syllable, although it can occur in the middle of a syllable when placed on a ຣ 0EA3 or ລ 0EA5. A break can NEVER occur before the consonant upon which this character sits, as a consonant carrying this character (galan) cannot begin a syllable.<br>
<br>
1. ◌໌ 0ECC<br>
<br>
############<br>
############<br>
The following punctuation marks can never begin a new line. Also note that English and French punctuation symbols and rules apply. (Lao tends to add a space around punctuation as in French, but not always.) Quotes can be written with " " or << >><br>
1. ໆ 0EC6<br>
2. 0EAF [Sorry, I can't find this on my unmarked mac keyboard]<br>
<br>
############<br>
############<br>
Vowel Diphthongs. Here is where it gets hairy, as three consonant semi-vowels are involved. [See my explanation at the beginning of this document. Parentheses refer to optional characters.]<br>
1. <0EC0> <bC> <(sbC)> <0EB6 or 0EB7> <(T)> <0EAD> <(fC)> [eua vowel. Note that the beginning consonant is in the middle]<br>
<br>
[Well, that wasn't so bad. I think that the other diphthongs are taken care of in previous rules and notes.]<br>
<br>
############<br>
############<br>
Consonants used as vowels between consonants.<br>
<br>
1. ວ 0EA7<br>
2. ອ 0EAD<br>
<br>
[If ວ|ອ is preceded by a consonant (note optional tone mark) and followed immediately by a consonant that is not followed by a vowel or tone mark then consider C(T)ວ|ອC to be a syllable.]<br>
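<br>
The bracketed rule above can be approximated with a regular expression, sketched here in Python. The character classes are assumptions made for illustration: CONSONANT uses the code point range 0E81-0EAE plus 0EDC and 0EDD (which also covers a few code points that are not consonants in the lists above), and "vowel or tone mark" is taken to mean the range 0EB0-0EBD, the tone marks 0EC8-0ECB and 0ECD. The names are hypothetical.<br>
<pre>
```python
import re

CONSONANT = "[\u0E81-\u0EAE\u0EDC\u0EDD]"  # assumed consonant range
TONE = "[\u0EC8-\u0ECB]"  # the four tone marks
VOWEL_OR_TONE = "[\u0EB0-\u0EBD\u0EC8-\u0ECB\u0ECD]"  # assumed lookahead class

# C (T) 0EA7|0EAD C, where the last C is not followed by a vowel or tone mark.
SEMIVOWEL_SYLLABLE = re.compile(
    CONSONANT + TONE + "?" + "[\u0EA7\u0EAD]" + CONSONANT
    + "(?!" + VOWEL_OR_TONE + ")"
)


def find_semivowel_syllables(text):
    """Return (start, end) spans of C(T)0EA7|0EAD C syllables found in text."""
    return [m.span() for m in SEMIVOWEL_SYLLABLE.finditer(text)]


if __name__ == "__main__":
    print(find_semivowel_syllables("\u0EAA\u0EA7\u0E99"))  # 0EAA 0EA7 0E99
    print(find_semivowel_syllables("\u0EAA\u0EA7\u0E99\u0EB2"))  # final C takes a vowel
```
</pre>
The negative lookahead expresses the "not followed by a vowel or tone mark" condition without consuming the next character, so the following syllable can still be matched separately.<br>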
<br>
############<br>
############<br>
Yeah. The end.<br>
<br>
------------------------------<br>
<br>
<br>
<br>
End of tex-hyphen Digest, Vol 21, Issue 10<br>
******************************************<br>
</blockquote></div><br><br clear="all"><br>-- <br>Brian Wilson, Director<br>Asia-Pacific International University Translation Center<br>_____________<br><br>I have a new blog!! <a href="http://tc4asia.org/wpblog">http://tc4asia.org/wpblog</a><br>
<br>"He hath shewed thee, O man, what is good; and what doth the LORD require of thee , but to do justly, and to love mercy, and to walk humbly with thy God." Micah 6:8<br>