[tex-hyphen] Lao Word wrap
Brian Wilson
bountonw at gmail.com
Wed Apr 28 11:29:54 CEST 2010
Attached is a humble attempt at Lao syllabification rules, in the hope of
integrating Lao with TeX.
I am sending this to the tex-hyphen list, and CCing the xetex list as a
lengthy discussion regarding this subject occurred there during the last
couple of weeks.
I will be happy to work with the group in tweaking this and running tests.
Thank you,
--
Brian Wilson, Director
Asia-Pacific International University Translation Center
_____________
I have a new blog!! http://tc4asia.org/wpblog
"He hath shewed thee, O man, what is good; and what doth the LORD require of
thee, but to do justly, and to love mercy, and to walk humbly with thy
God." Micah 6:8
-------------- next part --------------
The following is a brief sketch of the syllabification rules in Lao. My apologies for not using standard conventions. Feel free to edit.
On the most basic level of word-wrapping, syllables should never be split.
Lao syllables consist of
1. Beginning Consonant (bC) [required]
2. Secondary Beginning Consonant (sbC) [for consonant clusters]
3. Vowel (V) [required]
4. Tone Mark (T) [The order of 3 and 4 can be reversed]
5. Final Consonant (fC)
6. Extra Final Consonant (efC)
7. galan (g) [the silencer 0ECC, described below]
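A minimal Python sketch of the slot order above, assuming each part of a syllable has already been labelled. The names `Slot`, `RANK` and `follows_slot_order` are hypothetical. It checks logical order only: pre-posed vowels such as 0EC0 are stored before bC, so this is not a check on code-point order.

```python
from enum import Enum


class Slot(Enum):
    """Logical parts of a Lao syllable, in the order listed above."""
    BC = "beginning consonant"
    SBC = "secondary beginning consonant"
    V = "vowel"
    T = "tone mark"
    FC = "final consonant"
    EFC = "extra final consonant"
    G = "galan"


# V and T share a rank because their order can be reversed.
RANK = {Slot.BC: 1, Slot.SBC: 2, Slot.V: 3, Slot.T: 3,
        Slot.FC: 5, Slot.EFC: 6, Slot.G: 7}


def follows_slot_order(slots: list) -> bool:
    """True if bC and V are present, no slot repeats, and ranks never decrease."""
    if Slot.BC not in slots or Slot.V not in slots:
        return False  # bC and V are required
    if len(set(slots)) != len(slots):
        return False  # each slot may appear at most once
    ranks = [RANK[s] for s in slots]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))
```

For example, `[Slot.BC, Slot.T, Slot.V]` passes because tone and vowel may swap, while `[Slot.BC, Slot.FC]` fails for lack of a vowel.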
##########
##########
Consonants and consonant clusters that can begin a syllable.
1. ກ 0E81
(1) ກຣ 0E81 + 0EA3 [uncommon]
(2) ກລ 0E81 + 0EA5 [uncommon]
(3) ກວ 0E81 + 0EA7
(4) ກຼ 0E81 + 0EBC [uncommon]
2. ຂ 0E82
(1) ຂຣ 0E82 + 0EA3 [uncommon]
(2) ຂລ 0E82 + 0EA5 [uncommon]
(3) ຂວ 0E82 + 0EA7
(4) ຂຼ 0E82 + 0EBC [uncommon]
3. ຄ 0E84
(1) ຄຣ 0E84 + 0EA3 [uncommon]
(2) ຄລ 0E84 + 0EA5 [uncommon]
(3) ຄວ 0E84 + 0EA7
(4) ຄຼ 0E84 + 0EBC [uncommon]
4. ງ 0E87
5. ຈ 0E88
6. ຊ 0E8A
7. ຍ 0E8D
8. ດ 0E94
(1) ດຣ 0E94 + 0EA3 [uncommon]
9. ຕ 0E95
(1) ຕຣ 0E95 + 0EA3 [uncommon]
10. ຖ 0E96
11. ທ 0E97
12. ນ 0E99
13. ບ 0E9A
(1) ບຣ 0E9A + 0EA3 [uncommon]
(2) ບລ 0E9A + 0EA5 [uncommon]
(3) ບຼ 0E9A + 0EBC [uncommon]
14. ປ 0E9B
(1) ປຣ 0E9B + 0EA3 [uncommon]
(2) ປລ 0E9B + 0EA5 [uncommon]
(3) ປຼ 0E9B + 0EBC [uncommon]
15. ຜ 0E9C
16. ຝ 0E9D
(1) ຝຣ 0E9D + 0EA3
(2) ຝຼ 0E9D + 0EBC
17. ພ 0E9E
18. ຟ 0E9F
19. ມ 0EA1
20. ຢ 0EA2
21. ຣ 0EA3
22. ລ 0EA5
23. ວ 0EA7
24. ສ 0EAA
(1) ສຣ 0EAA + 0EA3 [uncommon]
(2) ສລ 0EAA + 0EA5 [uncommon]
(3) ສວ 0EAA + 0EA7
(4) ສຼ 0EAA + 0EBC [uncommon]
25. ຫ 0EAB
(1) ຫງ 0EAB + 0E87
(2) ຫນ 0EAB + 0E99 [uncommon, as this cluster has its own character, ໜ 0EDC; see below]
(3) ຫຍ 0EAB + 0E8D
(4) ຫມ 0EAB + 0EA1 [uncommon, as this cluster has its own character, ໝ 0EDD; see below]
(5) ຫຣ 0EAB + 0EA3 [uncommon]
(6) ຫລ 0EAB + 0EA5
(7) ຫວ 0EAB + 0EA7
(8) ຫຼ 0EAB + 0EBC
26. ອ 0EAD
27. ຮ 0EAE [my Mac renders this the same as 0EA3, shame on it]
28. ໜ 0EDC
29. ໝ 0EDD
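The initial-consonant list above can be turned into a small lookup table. This is a sketch, not a definitive onset parser: it assumes items 6 and 7 are ຊ 0E8A and ຍ 0E8D, and that the clusters under ສ are 0EAA plus the second consonant. `SINGLE_INITIALS`, `CLUSTER_SECONDS` and `onset_length` are hypothetical names.

```python
# Single initial consonants (items 1-29 above) as code points.
SINGLE_INITIALS = {
    0x0E81, 0x0E82, 0x0E84, 0x0E87, 0x0E88, 0x0E8A, 0x0E8D, 0x0E94,
    0x0E95, 0x0E96, 0x0E97, 0x0E99, 0x0E9A, 0x0E9B, 0x0E9C, 0x0E9D,
    0x0E9E, 0x0E9F, 0x0EA1, 0x0EA2, 0x0EA3, 0x0EA5, 0x0EA7, 0x0EAA,
    0x0EAB, 0x0EAD, 0x0EAE, 0x0EDC, 0x0EDD,
}

# First consonant -> second consonants that may follow it in a cluster.
CLUSTER_SECONDS = {
    0x0E81: (0x0EA3, 0x0EA5, 0x0EA7, 0x0EBC),
    0x0E82: (0x0EA3, 0x0EA5, 0x0EA7, 0x0EBC),
    0x0E84: (0x0EA3, 0x0EA5, 0x0EA7, 0x0EBC),
    0x0E94: (0x0EA3,),
    0x0E95: (0x0EA3,),
    0x0E9A: (0x0EA3, 0x0EA5, 0x0EBC),
    0x0E9B: (0x0EA3, 0x0EA5, 0x0EBC),
    0x0E9D: (0x0EA3, 0x0EBC),
    0x0EAA: (0x0EA3, 0x0EA5, 0x0EA7, 0x0EBC),
    0x0EAB: (0x0E87, 0x0E99, 0x0E8D, 0x0EA1, 0x0EA3, 0x0EA5, 0x0EA7, 0x0EBC),
}


def onset_length(text: str, i: int) -> int:
    """Return 2 for a listed cluster at text[i], 1 for a single initial, else 0."""
    if i >= len(text):
        return 0
    first = ord(text[i])
    if i + 1 < len(text) and ord(text[i + 1]) in CLUSTER_SECONDS.get(first, ()):
        return 2
    return 1 if first in SINGLE_INITIALS else 0
```

Longest match is only a starting point: it cannot tell the cluster ກວ from ກ followed by ວ used as a vowel, which the rule on consonants used as vowels later in this document has to settle.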
############
############
Consonants that commonly end a syllable
1. ກ 0E81
2. ງ 0E87
3. ຍ 0E8D [This is a /y/ and acts as a semivowel in certain constructions that will be explained later]
4. ດ 0E94
5. ມ 0EA1
6. ນ 0E99
7. ບ 0E9A
8. ວ 0EA7 [This is a /w/ and acts as a semivowel in certain constructions that will be explained later]
############
############
Consonants that could conceivably end a syllable on rare occasions when transcribing certain foreign words.
1. ຂ 0E82
2. ຄ 0E84
3. ຈ 0E88
4. ຊ 0E8A
5. ດ 0E94
6. ຕ 0E95
7. ຖ 0E96
8. ທ 0E97
9. ປ 0E9B
10. ຜ 0E9C
11. ຝ 0E9D
12. ພ 0E9E
13. ຟ 0E9F
14. ມ 0EA1
15. ຣ 0EA3
16. ລ 0EA5
17. ສ 0EAA
############
############
Consonants that can never end a syllable [unless followed immediately by the silencer 0ECC]
1. ຫ 0EAB
2. ຢ 0EA2
3. ອ 0EAD
4. ຮ 0EAE
5. ຼ 0EBC
6. ໜ 0EDC
7. ໝ 0EDD
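The three final-consonant lists can be folded into one classifier. A sketch under stated assumptions: the rare list's 0E89 is read as ຊ 0E8A, its duplicates of common finals (0E94, 0EA1) are left to the common set, and a never-final consonant counts as a final only when the silencer 0ECC follows it. `final_class` and the set names are hypothetical.

```python
SILENCER = 0x0ECC  # galan

COMMON_FINALS = {0x0E81, 0x0E87, 0x0E8D, 0x0E94, 0x0EA1, 0x0E99, 0x0E9A, 0x0EA7}
RARE_FINALS = {0x0E82, 0x0E84, 0x0E88, 0x0E8A, 0x0E95, 0x0E96, 0x0E97, 0x0E9B,
               0x0E9C, 0x0E9D, 0x0E9E, 0x0E9F, 0x0EA3, 0x0EA5, 0x0EAA}
NEVER_FINALS = {0x0EAB, 0x0EA2, 0x0EAD, 0x0EAE, 0x0EBC, 0x0EDC, 0x0EDD}


def final_class(text: str, i: int) -> str:
    """Classify text[i] as a possible syllable-final consonant."""
    c = ord(text[i])
    silenced = i + 1 < len(text) and ord(text[i + 1]) == SILENCER
    if c in COMMON_FINALS:
        return "common"
    if c in RARE_FINALS:
        return "rare"  # foreign transcriptions only
    if c in NEVER_FINALS:
        return "silenced" if silenced else "never"
    return "other"  # not a consonant from these lists
```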
############
############
Extra final consonant
To write foreign words, Lao adds 0ECC to extra final consonants.
Every consonant except
1. ຼ 0EBC
2. ໜ 0EDC
3. ໝ 0EDD
is theoretically possible, some more commonly than others.
############
############
Vowels that are written before the beginning consonant [syllable breaks ALWAYS occur before these characters and NEVER occur after these characters]
1. ເ 0EC0
2. ແ 0EC1
3. ໂ 0EC2
4. ໃ 0EC3
5. ໄ 0EC4
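The pre-posed vowel rule is easy to state in code: a break may go before any of these five characters and never right after one. A minimal sketch; `prevowel_breaks` and `break_after_prevowel_forbidden` are hypothetical names, and a position `i` means "between `text[i-1]` and `text[i]`".

```python
PRE_VOWELS = {0x0EC0, 0x0EC1, 0x0EC2, 0x0EC3, 0x0EC4}


def prevowel_breaks(text: str) -> list:
    """Positions i > 0 where a break is allowed because text[i] is a pre-posed vowel."""
    return [i for i in range(1, len(text)) if ord(text[i]) in PRE_VOWELS]


def break_after_prevowel_forbidden(text: str, i: int) -> bool:
    """True if a break at position i would fall right after a pre-posed vowel."""
    return 0 < i <= len(text) and ord(text[i - 1]) in PRE_VOWELS
```

For the sample string ໄປເຮືອນ, `prevowel_breaks` returns `[2]`: the opening ໄ is at the start of the text, so only the break before ເ is reported.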
############
############
Vowels that are written after the beginning consonant [syllable breaks NEVER occur before these characters. Some vowels in this section and the following section can be stacked. I can specify if necessary.]
1. ະ 0EB0
2. າ 0EB2
3. ຳ 0EB3 [can also be written as 0ECD followed by 0EB2]
4. ິ 0EB4
5. ີ 0EB5
6. ຶ 0EB6
7. ື 0EB7
8. ຸ 0EB8
9. ູ 0EB9
10. ໍ 0ECD
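A sketch of the "never break before" rule for these vowels. Because both 0ECD and 0EB2 are in the set, the two spellings of ຳ (0EB3, or 0ECD + 0EB2) are blocked alike. `blocked_by_following_vowel` is a hypothetical name.

```python
FOLLOWING_VOWELS = {0x0EB0, 0x0EB2, 0x0EB3, 0x0EB4, 0x0EB5,
                    0x0EB6, 0x0EB7, 0x0EB8, 0x0EB9, 0x0ECD}


def blocked_by_following_vowel(text: str, i: int) -> bool:
    """True if a break between text[i-1] and text[i] is forbidden
    because text[i] is one of the vowels listed above."""
    return 0 < i < len(text) and ord(text[i]) in FOLLOWING_VOWELS
```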
############
############
Vowels that are written between two consonants [syllable breaks NEVER occur before or after these characters]
1. ັ 0EB1 [The following character must be a consonant or the 0EBD semivowel]
2. ົ 0EBB [The following character (after an optional tone mark T) must be one of:
(1) a consonant;
(2) the vowel າ 0EB2, when used in the /ow/ diphthong: <0EC0> <bC> <(sbC)> <0EBB> <(T)> <0EB2>;
(3) the semivowel ວ 0EA7, when used in the /ua/ diphthong: <bC> <(sbC)> <0EBB> <(T)> <0EA7> <(0EB0)>. The ວ may be followed by ະ 0EB0 for the shortened version of this diphthong.]
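A sketch of the "no break before or after" rule for ັ and ົ. The one assumption beyond the text is that the "after" side looks past an optional tone mark, since the notes above allow a tone mark between these vowels and the next character. `blocked_by_medial_vowel` is a hypothetical name.

```python
MEDIAL_VOWELS = {0x0EB1, 0x0EBB}            # mai kan, mai kon
TONE_MARKS = {0x0EC8, 0x0EC9, 0x0ECA, 0x0ECB}


def blocked_by_medial_vowel(text: str, i: int) -> bool:
    """True if a break between text[i-1] and text[i] touches ັ or ົ."""
    if not 0 < i < len(text):
        return False
    if ord(text[i]) in MEDIAL_VOWELS:
        return True  # never break before the vowel
    j = i - 1
    if ord(text[j]) in TONE_MARKS and j > 0:
        j -= 1  # look past an optional tone mark on the vowel
    return ord(text[j]) in MEDIAL_VOWELS  # never break after the vowel
```

On ກັ່ນ (0E81 0EB1 0EC8 0E99) every internal position is blocked: before ັ by this rule, before ່ by the tone-mark rule, and before ນ because ັ precedes the tone mark.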
############
############
Vowels that can't take a final consonant
1. ະ 0EB0 [syllable break ALWAYS occurs after this character]
2. ໍ 0ECD [syllable break ALWAYS occurs after this character, or after the optional tone mark immediately following it]
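A sketch of the forced break after these two vowels. One assumption goes beyond the text: when 0ECD is followed by 0EB2 it is read as the alternative spelling of ຳ (see the note on 0EB3 above), so no break is forced there. `forced_break_before` is a hypothetical name.

```python
SARA_A = 0x0EB0      # ະ
NIGGAHITA = 0x0ECD   # ໍ
AA = 0x0EB2          # າ; 0ECD + 0EB2 spells ຳ
TONE_MARKS = {0x0EC8, 0x0EC9, 0x0ECA, 0x0ECB}


def forced_break_before(text: str, i: int) -> bool:
    """True if a syllable must end between text[i-1] and text[i]."""
    if not 0 < i < len(text):
        return False
    prev, cur = ord(text[i - 1]), ord(text[i])
    if prev == SARA_A:
        return True
    if prev == NIGGAHITA:
        # Wait for an optional tone mark; 0EB2 makes this ຳ instead.
        return cur not in TONE_MARKS and cur != AA
    if prev in TONE_MARKS and i >= 2 and ord(text[i - 2]) == NIGGAHITA:
        return cur != AA
    return False
```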
############
############
/ia/ vowel, which in old orthography also serves as the /y/ that can replace the final ຍ 0E8D (see above)
1. ຽ 0EBD [can NEVER break before. If it is a final /y/, then a break can occur after it]
############
############
Tones. There are four tone marks that can sit on top of the initial consonant or on ິ ີ ຶ ື ໍ (0EB4, 0EB5, 0EB6, 0EB7, 0ECD). (Note that 0EB5 and 0EB7 are also part of diphthongs; see below.) Breaks can NEVER occur before these.
1. ່ 0EC8
2. ້ 0EC9
3. ໊ 0ECA
4. ໋ 0ECB
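A sketch of the tone-mark rule, plus an observation from Python's standard `unicodedata` module: all four tone marks have Unicode general category Mn (nonspacing mark), so a generic "never break before an Mn character" test would catch them too. `blocked_by_tone` and `is_nonspacing` are hypothetical names.

```python
import unicodedata

TONE_MARKS = {0x0EC8, 0x0EC9, 0x0ECA, 0x0ECB}


def blocked_by_tone(text: str, i: int) -> bool:
    """True if a break between text[i-1] and text[i] would precede a tone mark."""
    return 0 < i < len(text) and ord(text[i]) in TONE_MARKS


def is_nonspacing(ch: str) -> bool:
    """True for combining marks that render on the preceding base character."""
    return unicodedata.category(ch) == "Mn"
```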
############
############
The silencer: a mark placed on a consonant to render it silent. It is used only to write foreign words. It is usually placed on the last letter of a syllable, although it can occur in the middle of a syllable when placed on ຣ 0EA3 or ລ 0EA5. A break can NEVER occur before the consonant that carries this character, since a consonant bearing it (galan) cannot begin a syllable.
1. ໌ 0ECC
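A sketch of the silencer rule: a break is refused both before the silencer itself and before the consonant that carries it, that is, the character immediately preceding 0ECC. `blocked_by_silencer` is a hypothetical name.

```python
SILENCER = 0x0ECC


def blocked_by_silencer(text: str, i: int) -> bool:
    """True if a break between text[i-1] and text[i] is forbidden by the galan."""
    if not 0 < i < len(text):
        return False
    if ord(text[i]) == SILENCER:
        return True  # the mark stays with its consonant
    # text[i] carries the silencer, so it cannot begin a syllable.
    return i + 1 < len(text) and ord(text[i + 1]) == SILENCER
```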
############
############
The following punctuation marks can never begin a new line. Also note that English and French punctuation symbols and rules apply. (Lao tends to add a space around punctuation as in French, but not always.) Quotes can be written with " " or << >>.
1. ໆ 0EC6
2. ຯ 0EAF
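Because Lao often puts a space before punctuation, a break at that space would still leave ໆ or ຯ at the start of the next line. This sketch therefore skips leading spaces before checking; treating only U+0020 as a space is an assumption. `can_start_line` is a hypothetical name.

```python
NO_LINE_START = {0x0EC6, 0x0EAF}  # ໆ and ຯ


def can_start_line(text: str, i: int) -> bool:
    """False if a line beginning at text[i] would start with ໆ or ຯ
    once the spaces dropped at the break are skipped."""
    j = i
    while j < len(text) and text[j] == " ":
        j += 1
    return j >= len(text) or ord(text[j]) not in NO_LINE_START
```

For ດີ ໆ, both position 2 (the space) and position 3 are rejected, so the repetition mark stays on the same line as the word it repeats.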
############
############
Vowel Diphthongs. Here is where it gets hairy, as three consonant semivowels are involved. [See my explanation at the beginning of this document. Parentheses mark optional characters.]
1. <0EC0> <bC> <(sbC)> <0EB6 or 0EB7> <(T)> <0EAD> <(fC)> [the eua vowel, as in ເມືອງ. Note that the beginning consonant sits in the middle.]
[Well, that wasn't so bad. I think that the other diphthongs are taken care of in previous rules and notes.]
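Pattern 1 above maps directly onto a regular expression (Python's standard `re` module). Two assumptions: consonants are approximated by the broad range 0E81-0EAE plus 0EDC/0EDD, and, borrowing the condition from the rule on consonants used as vowels, a consonant counts as fC only if no vowel or tone mark follows it, so the next syllable's initial is not swallowed. `EUA` and the helper names are hypothetical.

```python
import re

CONSONANT = "[\u0E81-\u0EAE\u0EDC\u0EDD]"            # broad approximation
VOWEL_OR_TONE = "[\u0EB0-\u0EBC\u0EC8-\u0ECB\u0ECD]"

EUA = re.compile(
    "\u0EC0"                                   # pre-posed vowel 0EC0
    + "(?P<bC>" + CONSONANT + ")"              # beginning consonant
    + "(?P<sbC>[\u0E81-\u0EAE\u0EBC])?"        # optional second consonant
    + "[\u0EB6\u0EB7]"                         # 0EB6 or 0EB7
    + "[\u0EC8-\u0ECB]?"                       # optional tone mark
    + "\u0EAD"                                 # 0EAD
    + "(?P<fC>" + CONSONANT + "(?!" + VOWEL_OR_TONE + "))?"  # optional final
)
```

On ເມືອງ the match covers all five characters with ງ as fC; on the sample ເຮືອມາ the match stops after ອ, because ມ is followed by a vowel and so begins the next syllable.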
############
############
Consonants used as vowels between consonants.
1. ວ 0EA7
2. ອ 0EAD
[If ວ or ອ is preceded by a consonant (which may carry a tone mark) and followed immediately by a consonant that is not itself followed by a vowel or tone mark, then treat C(T)ວC or C(T)ອC as one syllable.]
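The bracketed rule above can be written as one regular expression with a negative lookahead for the "not followed by a vowel or tone mark" condition. As before, the consonant range is a broad approximation and `WO_O_SYLLABLE` is a hypothetical name.

```python
import re

CONSONANT = "[\u0E81-\u0EAE\u0EDC\u0EDD]"            # broad approximation
VOWEL_OR_TONE = "[\u0EB0-\u0EBC\u0EC8-\u0ECB\u0ECD]"

WO_O_SYLLABLE = re.compile(
    CONSONANT                        # C
    + "[\u0EC8-\u0ECB]?"             # optional tone mark (T)
    + "[\u0EA7\u0EAD]"               # ວ or ອ acting as a vowel
    + CONSONANT                      # closing consonant
    + "(?!" + VOWEL_OR_TONE + ")"    # ...not followed by a vowel or tone mark
)
```

ກວດ and ບ່ອນ match as whole syllables, while ຄວາມ does not, because ວ is followed by the vowel າ rather than a consonant.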
############
############
Yeah. The end.