[XeTeX] Lao Word wrap

Brian Wilson bountonw at gmail.com
Wed Apr 28 11:29:54 CEST 2010


Attached is a humble attempt at Lao syllabification rules, in the hope of
integrating Lao with TeX.

I am sending this to the tex-hyphen list and CCing the XeTeX list, since a
lengthy discussion of this subject took place there over the last couple of
weeks.

I will be happy to work with the group in tweaking this and running tests.

Thank you,

-- 
Brian Wilson, Director
Asia-Pacific International University Translation Center
_____________

I have a new blog!! http://tc4asia.org/wpblog

"He hath shewed thee, O man, what is good; and what doth the LORD require of
thee, but to do justly, and to love mercy, and to walk humbly with thy
God."  Micah 6:8
-------------- next part --------------
The following is a brief sketch of the syllabification rules in Lao. My apologies for not using standard conventions.  Feel free to edit.

On the most basic level of word-wrapping, syllables should never be split.

Lao syllables consist of:
	1. Beginning Consonant (bC) [required]
	2. Secondary Beginning Consonant (sbC) [for consonant clusters]
	3. Vowel (V) [required]
	4. Tone Mark (T) [the order of 3 and 4 can be reversed]
	5. Final Consonant (fC)
	6. Extra Final Consonant (efC)
	7. Galan (g) [the silencer 0ECC, described below]

##########
##########
Consonants and consonant clusters that can begin a syllable. 
	1. ກ 0E81
		(1) ກຣ 0E81 + 0EA3 [uncommon]
		(2) ກລ 0E81 + 0EA5 [uncommon]
		(3) ກວ 0E81 + 0EA7
		(4) ກຼ 0E81 + 0EBC [uncommon]

	2. ຂ 0E82
		(1) ຂຣ 0E82 + 0EA3 [uncommon]
		(2) ຂລ 0E82 + 0EA5 [uncommon]
		(3) ຂວ 0E82 + 0EA7
		(4) ຂຼ 0E82 + 0EBC [uncommon]

	3. ຄ 0E84
		(1) ຄຣ 0E84 + 0EA3 [uncommon]
		(2) ຄລ 0E84 + 0EA5 [uncommon]
		(3) ຄວ 0E84 + 0EA7
		(4) ຄຼ 0E84 + 0EBC [uncommon]
		
	4. ງ 0E87

	5. ຈ 0E88

	6. ຊ 0E8A

	7. ຍ 0E8D
	
	8. ດ 0E94
		(1) ດຣ 0E94 + 0EA3 [uncommon]

	9. ຕ 0E95
		(1) ຕຣ 0E95 + 0EA3 [uncommon]

	10. ຖ 0E96

	11. ທ 0E97

	12. ນ 0E99

	13. ບ 0E9A
		(1) ບຣ 0E9A + 0EA3 [uncommon]
		(2) ບລ 0E9A + 0EA5 [uncommon]
		(3) ບຼ 0E9A + 0EBC [uncommon]

	14. ປ 0E9B
		(1) ປຣ 0E9B + 0EA3 [uncommon]
		(2) ປລ 0E9B + 0EA5 [uncommon]
		(3) ປຼ 0E9B + 0EBC [uncommon]

	15. ຜ 0E9C

	16. ຝ 0E9D
		(1) ຝຣ 0E9D + 0EA3
		(2) ຝຼ 0E9D + 0EBC

	17. ພ 0E9E

	18. ຟ 0E9F

	19. ມ 0EA1

	20. ຢ 0EA2

	21. ຣ 0EA3

	22. ລ 0EA5

	23. ວ 0EA7
	
	24. ສ 0EAA
		(1) ສຣ 0EAA + 0EA3 [uncommon]
		(2) ສລ 0EAA + 0EA5 [uncommon]
		(3) ສວ 0EAA + 0EA7
		(4) ສຼ 0EAA + 0EBC [uncommon]
	
	25. ຫ 0EAB
		(1) ຫງ 0EAB + 0E87
		(2) ຫນ 0EAB + 0E99 [uncommon, since it has its own character ໜ 0EDC; see below]
		(3) ຫຍ 0EAB + 0E8D
		(4) ຫມ 0EAB + 0EA1 [uncommon, since it has its own character ໝ 0EDD; see below]
		(5) ຫຣ 0EAB + 0EA3 [uncommon]
		(6) ຫລ 0EAB + 0EA5
		(7) ຫວ 0EAB + 0EA7
		(8) ຫຼ 0EAB + 0EBC

	26. ອ 0EAD

	27. ຮ 0EAE [my Mac renders this the same as 0EA3, shame on it]

	28. ໜ 0EDC

	29. ໝ 0EDD
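To make the onset list easier to reuse (for example in a pattern generator or a line-breaking filter), here is a minimal Python sketch. The names `INITIALS`, `CLUSTERS` and `onset_length` are hypothetical, not part of TeX, XeTeX or any existing package. It assumes items 6 and 7 are ຊ 0E8A and ຍ 0E8D, and that the clusters under item 24 start with ສ 0EAA. Matching is greedy, so a second consonant such as ວ is always taken as a cluster member, even where it actually acts as a vowel.

```python
# Single consonants that may begin a syllable (items 1 to 29 above).
INITIALS = {
    0x0E81, 0x0E82, 0x0E84, 0x0E87, 0x0E88, 0x0E8A, 0x0E8D, 0x0E94,
    0x0E95, 0x0E96, 0x0E97, 0x0E99, 0x0E9A, 0x0E9B, 0x0E9C, 0x0E9D,
    0x0E9E, 0x0E9F, 0x0EA1, 0x0EA2, 0x0EA3, 0x0EA5, 0x0EA7, 0x0EAA,
    0x0EAB, 0x0EAD, 0x0EAE, 0x0EDC, 0x0EDD,
}

# Allowed second members for each first consonant, common and uncommon.
LIQUID_OR_W = {0x0EA3, 0x0EA5, 0x0EA7, 0x0EBC}  # ຣ ລ ວ ຼ
CLUSTERS = {
    0x0E81: LIQUID_OR_W, 0x0E82: LIQUID_OR_W,
    0x0E84: LIQUID_OR_W, 0x0EAA: LIQUID_OR_W,
    0x0E94: {0x0EA3}, 0x0E95: {0x0EA3},
    0x0E9A: {0x0EA3, 0x0EA5, 0x0EBC}, 0x0E9B: {0x0EA3, 0x0EA5, 0x0EBC},
    0x0E9D: {0x0EA3, 0x0EBC},
    0x0EAB: {0x0E87, 0x0E99, 0x0E8D, 0x0EA1, 0x0EA3, 0x0EA5, 0x0EA7, 0x0EBC},
}


def onset_length(text: str, i: int = 0) -> int:
    """Return 2 for a listed cluster at text[i], 1 for a lone initial,
    and 0 if no syllable onset starts there. Greedy: clusters win."""
    if i >= len(text) or ord(text[i]) not in INITIALS:
        return 0
    second = ord(text[i + 1]) if i + 1 < len(text) else None
    if second in CLUSTERS.get(ord(text[i]), set()):
        return 2
    return 1
```

For instance, `onset_length("ກວ")` returns 2, while `onset_length("ງວ")` returns 1 because ງ has no listed clusters.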
	
############
############	
Consonants that commonly end a syllable
	1. ກ 0E81
	2. ງ 0E87
	3. ຍ 0E8D [This is a /y/ and acts as a semivowel in certain constructions that will be explained later]
	4. ດ 0E94
	5. ມ 0EA1
	6. ນ 0E99
	7. ບ 0E9A
	8. ວ 0EA7 [This is a /w/ and acts as a semivowel in certain constructions that will be explained later]
	
############
############	
Consonants that could conceivably end a syllable on rare occasions, when transcribing certain foreign words.

	1. ຂ 0E82
	2. ຄ 0E84
	3. ຈ 0E88
	4. ຊ 0E8A
	5. ດ 0E94
	6. ຕ 0E95
	7. ຖ 0E96
	8. ທ 0E97
	9. ປ 0E9B
	10. ຜ 0E9C
	11. ຝ 0E9D
	12. ພ 0E9E
	13. ຟ 0E9F
	14. ມ 0EA1
	15. ຣ 0EA3
	16. ລ 0EA5
	17. ສ 0EAA

############
############	
Consonants that can never end a syllable [unless followed immediately by the silencer 0ECC]
	1. ຫ 0EAB
	2. ຢ 0EA2
	3. ອ 0EAD
	4. ຮ 0EAE
	5. ຼ 0EBC
	6. ໜ 0EDC
	7. ໝ 0EDD
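The list above, together with its silencer exception, could be checked like this. It is a hedged Python sketch: `NEVER_FINAL` and `may_close_syllable` are hypothetical names, and only this list is consulted (the common and rare final lists are not).

```python
# Consonants that may not close a syllable unless the galan
# (silencer, U+0ECC) follows immediately.
GALAN = "\u0ECC"
NEVER_FINAL = set("\u0EAB\u0EA2\u0EAD\u0EAE\u0EBC\u0EDC\u0EDD")  # ຫ ຢ ອ ຮ ຼ ໜ ໝ


def may_close_syllable(text: str, i: int) -> bool:
    """Return False if text[i] is on the never-final list and is not
    immediately followed by the galan; otherwise return True."""
    if text[i] not in NEVER_FINAL:
        return True
    return i + 1 < len(text) and text[i + 1] == GALAN


# ຫ after a vowel cannot close the syllable on its own.
print(may_close_syllable("\u0E81\u0EB2\u0EAB", 2))        # False
print(may_close_syllable("\u0E81\u0EB2\u0EAB\u0ECC", 2))  # True
```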
############
############	
Extra final consonant
To write foreign words, Lao adds 0ECC to extra final consonants.
Every consonant except
		1. ຼ 0EBC
		2. ໜ 0EDC
		3. ໝ 0EDD
is theoretically possible, some more common than others.
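A possible check for the extra-final rule, again only a sketch: `CONSONANTS`, `EXTRA_FINAL_OK` and `is_extra_final` are hypothetical names, and the consonant inventory is assumed to be the 29 letters of the onset list (with ຊ 0E8A and ຍ 0E8D) plus ຼ 0EBC.

```python
# Extra final consonant: any consonant except ຼ, ໜ and ໝ,
# followed immediately by the galan (U+0ECC).
GALAN = "\u0ECC"
CONSONANTS = set(
    "\u0E81\u0E82\u0E84\u0E87\u0E88\u0E8A\u0E8D\u0E94\u0E95\u0E96"
    "\u0E97\u0E99\u0E9A\u0E9B\u0E9C\u0E9D\u0E9E\u0E9F\u0EA1\u0EA2"
    "\u0EA3\u0EA5\u0EA7\u0EAA\u0EAB\u0EAD\u0EAE\u0EBC\u0EDC\u0EDD"
)
EXTRA_FINAL_OK = CONSONANTS - set("\u0EBC\u0EDC\u0EDD")


def is_extra_final(text: str, i: int) -> bool:
    """True if text[i] is an allowed consonant carrying the galan."""
    return (
        text[i] in EXTRA_FINAL_OK
        and i + 1 < len(text)
        and text[i + 1] == GALAN
    )
```

Here `is_extra_final("ດ໌", 0)` is true, while `is_extra_final("ໜ໌", 0)` is false because ໜ is excluded.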

############
############	
Vowels that are written before the beginning consonant [syllable breaks ALWAYS occur before these characters and NEVER occur after these characters]
	1. ເ 0EC0
	2. ແ 0EC1
	3. ໂ 0EC2
	4. ໃ 0EC3
	5. ໄ 0EC4
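Because the pre-posed vowels give the one rule here that forces a break, they make a natural first filter. In this hypothetical Python sketch, `pre_vowel_rule` returns `True` (break), `False` (no break) or `None` (rule silent) for the gap before `text[j]`. Letting "never after" win over "always before" is an assumption, covering the unusual case of two adjacent pre-posed vowels.

```python
from typing import Optional

# Pre-posed vowels ເ ແ ໂ ໃ ໄ (U+0EC0 to U+0EC4).
PRE_VOWELS = set("\u0EC0\u0EC1\u0EC2\u0EC3\u0EC4")


def pre_vowel_rule(text: str, j: int) -> Optional[bool]:
    """Judge the gap between text[j-1] and text[j]:
    True = break, False = no break, None = rule does not apply."""
    if j <= 0 or j >= len(text):
        return None
    if text[j - 1] in PRE_VOWELS:
        return False  # never break after a pre-posed vowel
    if text[j] in PRE_VOWELS:
        return True  # always break before a pre-posed vowel
    return None


s = "\u0EC4\u0E9B\u0EC0\u0EAE\u0EB7\u0EAD\u0E99"  # ໄປເຮືອນ
breaks = [j for j in range(1, len(s)) if pre_vowel_rule(s, j)]
print(breaks)  # [2]
```

On ໄປເຮືອນ the only forced break is at index 2, between ໄປ and ເຮືອນ.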

############
############	
Vowels that are written after the beginning consonant [syllable breaks NEVER occur before these characters. Some vowels in this section and the preceding section can be stacked. I can specify if necessary.]
	1. ະ 0EB0
	2. າ 0EB2
	3. ຳ 0EB3 [can also be written as 0ECD followed by 0EB2]
	4. ິ 0EB4
	5. ີ 0EB5
	6. ຶ 0EB6
	7. ື 0EB7
	8. ຸ 0EB8
	9. ູ 0EB9
	10. ໍ 0ECD
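The post-consonant vowels only ever forbid a break. The hypothetical sketch below encodes that, and also shows one way to make the two spellings of ຳ compare equal. Unicode gives U+0EB3 a compatibility decomposition to U+0ECD U+0EB2, so `unicodedata.normalize("NFKD", s)` would have the same effect on that character, though it also rewrites other compatibility characters.

```python
# Vowels written after the initial consonant (U+0EB0 to U+0EB9 minus 0EB1, plus 0ECD).
POST_VOWELS = set("\u0EB0\u0EB2\u0EB3\u0EB4\u0EB5\u0EB6\u0EB7\u0EB8\u0EB9\u0ECD")


def break_forbidden_before(text: str, j: int) -> bool:
    """True if a break between text[j-1] and text[j] is ruled out
    because text[j] is a post-consonant vowel."""
    return 1 <= j < len(text) and text[j] in POST_VOWELS


def normalise_am(text: str) -> str:
    """Spell ຳ (0EB3) as ໍ + າ (0ECD 0EB2) so both spellings compare equal."""
    return text.replace("\u0EB3", "\u0ECD\u0EB2")
```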

############
############	
Vowels that are written between two consonants [syllable breaks NEVER occur before or after these characters]

	1. ັ 0EB1 [The following character must be a consonant or the ຽ 0EBD semivowel]
	2. ົ 0EBB [The following character (after an optional tone mark) must be: 1. a consonant; 2. the າ 0EB2 vowel, when used in the /ow/ diphthong (<0EC0> <bC> <(sbC)> <0EBB> <(T)> <0EB2>); or 3. the ວ 0EA7 semivowel, when used in the /ua/ diphthong. (Note that ວ may be followed by ະ 0EB0 for the shortened version of this diphthong: <bC> <(sbC)> <0EBB> <(T)> <0EA7> <(0EB0)>.)]
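The constraints on what may follow ັ and ົ can be sketched as a small validator. This is hypothetical code: `between_vowel_ok` is an invented name, the consonant set assumes ຊ 0E8A and ຍ 0E8D, and allowing an optional tone mark after ັ (as the rule for ົ does) is an added assumption that a tone mark may be stored between ັ and the next consonant.

```python
# What may follow ັ (0EB1) or ົ (0EBB).
TONES = set("\u0EC8\u0EC9\u0ECA\u0ECB")
CONSONANTS = set(
    "\u0E81\u0E82\u0E84\u0E87\u0E88\u0E8A\u0E8D\u0E94\u0E95\u0E96"
    "\u0E97\u0E99\u0E9A\u0E9B\u0E9C\u0E9D\u0E9E\u0E9F\u0EA1\u0EA2"
    "\u0EA3\u0EA5\u0EA7\u0EAA\u0EAB\u0EAD\u0EAE\u0EDC\u0EDD"
)


def between_vowel_ok(text: str, i: int) -> bool:
    """Validate the character after the between-consonant vowel at text[i]."""
    j = i + 1
    if j < len(text) and text[j] in TONES:
        j += 1  # skip one optional tone mark (assumed for ັ as well as ົ)
    if j >= len(text):
        return False  # a between-consonant vowel needs a following character
    nxt = text[j]
    if text[i] == "\u0EB1":  # ັ: consonant or the ຽ semivowel
        return nxt in CONSONANTS or nxt == "\u0EBD"
    if text[i] == "\u0EBB":  # ົ: consonant (incl. ວ for /ua/) or າ for /ow/
        return nxt in CONSONANTS or nxt == "\u0EB2"
    raise ValueError("text[i] is not ັ or ົ")
```

So `between_vowel_ok("ເກົາ", 2)` accepts the /ow/ spelling, while `between_vowel_ok("ກັາ", 1)` rejects ັ followed directly by a vowel.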
	
############
############	
Vowels that cannot take a final consonant

	1. ະ 0EB0 [syllable break ALWAYS occurs after this character]
	2. ໍ 0ECD [syllable break ALWAYS occurs after this character, or after the optional tone mark immediately following it]
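A sketch of where the forced break falls after these two vowels. `forced_break_after` is a hypothetical name; treating ໍ followed by າ as a decomposed ຳ (and therefore not a syllable end) is an added assumption, made to stay consistent with the note on 0EB3 above.

```python
from typing import Optional

TONES = set("\u0EC8\u0EC9\u0ECA\u0ECB")  # ່ ້ ໊ ໋


def forced_break_after(text: str, i: int) -> Optional[int]:
    """If text[i] is ະ (0EB0) or ໍ (0ECD), return the index at which the
    next syllable must start; otherwise return None."""
    ch = text[i]
    if ch == "\u0EB0":
        return i + 1
    if ch == "\u0ECD":
        j = i + 1
        if j < len(text) and text[j] in TONES:
            j += 1  # the break moves past an optional tone mark
        if j < len(text) and text[j] == "\u0EB2":
            return None  # assumption: ໍ + າ spells ຳ, not a syllable end
        return j
    return None
```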
	
	
############
############	
/ia/ vowel, which in old orthography also serves as a /y/ that can replace the final ຍ 0E8D (see above)

1. ຽ 0EBD [can NEVER break before. If it is a final /y/, a break can occur after it]

############
############	
Tones. There are four tone marks, which can sit on top of the initial consonant or on ິ ີ ຶ ື ໍ (0EB4, 0EB5, 0EB6, 0EB7, 0ECD). (Note that 0EB5 and 0EB7 are also part of diphthongs; see below.) Breaks can NEVER occur before these.

	1. ່ 0EC8
	2. ້ 0EC9
	3. ໊ 0ECA
	4. ໋ 0ECB
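Since no break may precede a tone mark, a candidate break position can simply be pushed past any tone marks. This is a hypothetical Python helper; it applies only this rule, so the result may still sit before a vowel that forbids a break.

```python
TONE_MARKS = set("\u0EC8\u0EC9\u0ECA\u0ECB")  # ່ ້ ໊ ໋ (U+0EC8 to U+0ECB)


def push_past_tone_marks(text: str, j: int) -> int:
    """Move a candidate break (the gap before text[j]) rightwards until
    text[j] is no longer a tone mark."""
    while j < len(text) and text[j] in TONE_MARKS:
        j += 1
    return j


print(push_past_tone_marks("\u0E81\u0EC8\u0EB2", 1))  # 2
```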
	
############
############	
The silencer (galan): a mark placed on a consonant to render it silent. It is only used to write foreign words. It is usually placed on the last letter of a syllable, although it can occur in the middle of a syllable when placed on ຣ 0EA3 or ລ 0EA5. A break can NEVER occur before the consonant on which this mark sits, because a consonant carrying the galan cannot begin a syllable.

	1. ໌ 0ECC
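In terms of candidate break positions, the galan rule excludes two gaps: the one before the galan itself and the one before the consonant that carries it. A hypothetical sketch:

```python
GALAN = "\u0ECC"  # ໌ silencer


def galan_blocks_break(text: str, j: int) -> bool:
    """True if no break may fall in the gap before text[j] because text[j]
    is the galan itself or the consonant that carries it."""
    if j >= len(text):
        return False
    if text[j] == GALAN:
        return True
    return j + 1 < len(text) and text[j + 1] == GALAN


word = "\u0E81\u0EB2\u0E94\u0ECC"  # illustrative letter sequence ກາດ໌
print([j for j in range(1, len(word)) if galan_blocks_break(word, j)])  # [2, 3]
```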
	
############
############	
The following punctuation marks can never begin a new line. Also note that English and French punctuation symbols and rules apply. (Lao tends to add a space around punctuation as in French, but not always.) Quotation marks can be either " " or « ».
	1. ໆ 0EC6
	2. ຯ 0EAF
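A hypothetical check that a new line does not open with ໆ or ຯ. Skipping leading spaces is an assumption, meant to cover the French-style spacing mentioned above, where a break at the space would otherwise leave the mark at the start of the next line.

```python
NO_LINE_START = set("\u0EC6\u0EAF")  # ໆ (repetition mark) and ຯ (ellipsis)


def may_start_line(text: str, j: int) -> bool:
    """False if a line beginning at text[j] would open with ໆ or ຯ.
    Leading whitespace is skipped on the assumption that it is dropped
    at the line break."""
    k = j
    while k < len(text) and text[k].isspace():
        k += 1
    return k >= len(text) or text[k] not in NO_LINE_START
```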

############
############	
Vowel Diphthongs. Here is where it gets hairy, as three consonant semivowels are involved. [See my explanation at the beginning of this document. Parentheses mark optional characters.]
	1. <0EC0> <bC> <(sbC)> <0EB6 or 0EB7> <(T)> <0EAD> <(fC)> [eua vowel. Note that the beginning consonant is in the middle]

[Well, that wasn't so bad. I think that the other diphthongs are taken care of in previous rules and notes.]
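The eua pattern translates fairly directly into a regular expression. This Python sketch is illustrative only: the character classes are loose assumptions (any letter from U+0E81 to U+0EAE counts as a consonant, only ຣ ລ ວ ຼ are allowed as second members, and the final is limited to the common finals), and the optional final is matched greedily, so it may swallow the initial of a following syllable.

```python
import re

# Loose, assumed character classes.
CONS = "[\u0E81-\u0EAE\u0EDC\u0EDD]"                          # any Lao consonant letter
SECOND = "[\u0EA3\u0EA5\u0EA7\u0EBC]"                         # ຣ ລ ວ ຼ as second member
TONE = "[\u0EC8-\u0ECB]"                                       # ່ ້ ໊ ໋
FINAL = "[\u0E81\u0E87\u0E8D\u0E94\u0E99\u0E9A\u0EA1\u0EA7]"  # common finals

# <0EC0> <bC> <(sbC)> <0EB6 or 0EB7> <(T)> <0EAD> <(fC)>
EUA = re.compile(
    "\u0EC0" + CONS + SECOND + "?"
    + "[\u0EB6\u0EB7]" + TONE + "?"
    + "\u0EAD" + FINAL + "?"
)

for word in ["\u0EC0\u0EAE\u0EB7\u0EAD\u0E99",   # ເຮືອນ
             "\u0EC0\u0E8A\u0EB7\u0EC8\u0EAD",   # ເຊື່ອ
             "\u0EC0\u0E81\u0EB7"]:              # ເກື (incomplete)
    print(bool(EUA.fullmatch(word)))
# prints True, True, False
```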

############
############	
Consonants used as vowels between consonants.

	1. ວ 0EA7
	2. ອ 0EAD

[If ວ|ອ is preceded by a consonant (with an optional tone mark in between) and followed immediately by a consonant that is not itself followed by a vowel or tone mark, then treat C(T)ວ|ອC as one syllable.]
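The consonant-as-vowel rule can likewise be written as a regular expression, with a negative lookahead for the "not followed by a vowel or tone mark" condition. This is a hypothetical sketch; the consonant range is loose, and the set of vowel and tone characters in `ATTACHED` is an assumption drawn from the lists above.

```python
import re

CONS = "[\u0E81-\u0EAE\u0EDC\u0EDD]"   # any Lao consonant letter (loose range)
TONE = "[\u0EC8-\u0ECB]"                # optional tone mark
# Vowels and tone marks that would attach to the last consonant (assumed set).
ATTACHED = "[\u0EB0-\u0EB9\u0EBB\u0EBD\u0EC8-\u0ECB\u0ECD]"

# C (T) ວ|ອ C, where the last C is not followed by a vowel or tone mark.
VOWEL_CONSONANT_SYLLABLE = re.compile(
    CONS + TONE + "?" + "[\u0EA7\u0EAD]" + CONS + "(?!" + ATTACHED + ")"
)

print(bool(VOWEL_CONSONANT_SYLLABLE.match("\u0E84\u0EA7\u0E99")))        # ຄວນ: True
print(bool(VOWEL_CONSONANT_SYLLABLE.match("\u0EAA\u0EC8\u0EA7\u0E99")))  # ສ່ວນ: True
print(bool(VOWEL_CONSONANT_SYLLABLE.match("\u0E84\u0EA7\u0E99\u0EB2")))  # ຄວນາ: False
```

In the last, illustrative sequence the ນ carries a vowel, so it starts a new syllable and the rule does not apply.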

############
############	
Yeah. The end.

