[tex-hyphen] Lao Word Wrapping

Brian Wilson bountonw at gmail.com
Thu Apr 29 06:53:14 CEST 2010


It seems that I may have reinvented the wheel (and created an inferior
model).

For a PDF explanation of Lao syllabification, see this link:

http://www.tcllab.org/events/uploads/valaxay-lao.pdf

Thank you,

Brian Wilson

On Wed, Apr 28, 2010 at 5:00 PM, <tex-hyphen-request at tug.org> wrote:

> Today's Topics:
>
>   1. Re: tex patterns as lua files (Mojca Miklavec)
>   2. Re: tex patterns as lua files (Karl Berry)
>   3. Lao Word wrap (Brian Wilson)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 27 Apr 2010 15:04:21 +0200
> From: Mojca Miklavec <mojca.miklavec.lists at gmail.com>
> To: "About TeX hyphenation patterns." <tex-hyphen at tug.org>
> Subject: Re: [tex-hyphen] tex patterns as lua files
> Message-ID:
>        <o2k6faad9f01004270604vf6c93ff7na8cbdb8ac6df808f at mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> On Tue, Apr 27, 2010 at 13:06, Manuel Pégourié-Gonnard wrote:
> > On 27/04/2010 12:26, Mojca Miklavec wrote:
> >
> >> What I would really like to know before doing the change is:
> >>
> >> 1.) Which patterns should be default for any other program
> >> (javascript, perl etc.) outside of TeX?
> >
> > I guess it's usenglishmax. The Knuthian version matters mainly in the
> > nearly-frozen part of the TeX world.
>
> OK. If others agree ...
>
> >> 2.) Do you need/want (two questions) Knuth's hyphen.tex patterns in
> >> "plain" format as well?
> >>
> > One could always special-case English since we're going to do it at some
> > point anyway, but it would be a bit easier for us if everything were
> > uniform.
> >
> > While we're at it, there are also a few other hyphenation files that are
> > not in the normal form hyph-XX.tex + loadhyph-XX.tex + all the nice txt
> > files you kindly prepared for us. Some are from hyphen-base, namely
> > dumyhyph.tex and zerohyph.tex. Again, we can special-case them in our
> > code, or you can provide .txt versions of them (with an entry in
> > language.dat.lua), which would make a bit more work for you but would
> > result in cleaner loading code.
> >
> > It's mainly up to you to evaluate whether those files belong in
> > texhyphen or not. I don't mind doing the little additional Lua & TeX
> > coding to treat them specially if needed. (Actually, I already know how
> > I would do it for hyphen.cfg, and I didn't look too closely at etex.src
> > yet, but I know it's possible too.)
>
> As far as dummy and zero are concerned, what do you think about the
> idea of creating a separate folder with appropriate txt files for
> those two languages? LuaTeX won't care about location and others that
> might be willing to use the repository won't have to create special
> cases for dummy/zero files in that folder.
>
> Of course the entry for those two can be added to language.dat.lua.
>
> As far as
>
> > (There are also other files that end up being mentioned in TL's full
> > language.dat but are coming from other sources. We (meaning Élie and I)
> > need to do something about that, but I propose postponing the discussion
> > about them, since we're already discussing a lot of things at the same
> > time.)
>
> - If you mean arabic and others, it's no problem to add an entry to
> that lua file.
> - If you mean ibycus, you probably don't want to support it in LuaTeX.
> - If you mean the Germans with their timestamped patterns, we may
> postpone the discussion; in LuaTeX you would probably want to go for a
> completely different route than the current approach anyway.
> - There are also Javier's ideas about different subsets of patterns in
> LuaTeX that we might want to consider.
> - And there are some languages that have zillions of versions of
> patterns (like Russians etc.).
>
> Anything else?
>
> >>> or it can also be done on LaTeX's side, I can
> >>> modify the table accordingly. What would be the best?
> >>
> >> I'll respond once I know the answer to the two questions above. The
> >> table will be modified in either case and will include USenglish
> >> synonym. The question is only whether we should duplicate hyphen.tex
> >> in our repository and if yes, which patterns should take precedence
> >> (of having no -x-something extension). The lua table will be modified
> >> accordingly from languages.rb database.
> >>
> > IMO, for the rest of the world, usenglishmax is the canonical version for
> > US English. I guess you want to reflect that in the code/filename by
> > making it en-us, and Knuth patterns en-US-x-knuth-original.
> >
> > What is sure is, the logical name "english" *must* be the Knuthian
> > patterns (= hyphen.tex = en-US-x-knuth-original); usenglish, USenglish
> > and american have to be synonyms of this one, and the logical name
> > "usenglishmax" needs to be ushyphmax.tex (aka en-US in the new codes if
> > you follow my suggestion).
> >
> > With the current language.dat.lua, "english" points to en-US, which is
> > formerly ushyphmax, which means not Knuthian patterns, and that needs to
> > be changed, regardless of what you decide for the rest.
>
> I fully agree with that. All I wanted to know was how to change that.
>
> Mojca
>
>
>
> ------------------------------
>
> Message: 2
> Date: Tue, 27 Apr 2010 22:38:41 GMT
> From: karl at freefriends.org (Karl Berry)
> To: tex-hyphen at tug.org
> Subject: Re: [tex-hyphen] tex patterns as lua files
> Message-ID: <201004272238.o3RMcfOk027053 at f7.net>
>
>    patterns/txt  (or data or plaintext or raw or ...)
>
> txt seems like a nice choice here.
>
>    > 1.) Which patterns should be default for any other program
>    > (javascript, perl etc.) outside of TeX?
>    I guess it's usenglishmax.
>
> I don't disagree exactly, but what "other programs" are we talking
> about?  Or are you talking about use of our patterns in completely
> different programs (e.g., FOP)?
>
>    The question is only whether we should duplicate hyphen.tex
>
> Whether you duplicate hyphen.tex in your repository is a matter for your
> convenience.  In TeX Live, I think hyphen.tex should remain as part of
> hyphen-base.  So if you include it, we'll just remove it when importing
> into TL (which is no problem to do).
>
>    So again something for Karl: what's the best place for the following file?
>
> http://tug.org/svn/texhyphen/branches/luatex/TL/texmf/tex/generic/config/language.dat.lua
>
> Since the filenames are unique (....lua) it doesn't seem to matter much.
> tex/generic/hyph-utf8/luatex/* maybe?  Manuel?
>
>
> ------------------------------
>
> Message: 3
> Date: Wed, 28 Apr 2010 16:29:54 +0700
> From: Brian Wilson <bountonw at gmail.com>
> To: tex-hyphen at tug.org, xetex at tug.org
> Subject: [tex-hyphen] Lao Word wrap
> Message-ID:
>        <x2vee9be39b1004280229q9b9cf4a3s2150e686988c6192 at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Attached is a humble attempt at Lao syllabication rules in the hopes for
> Lao integration with TeX.
>
> I am sending this to the tex-hyphen list, and CCing the xetex list as a
> lengthy discussion regarding this subject occurred there during the last
> couple of weeks.
>
> I will be happy to work with the group in tweaking this and running tests.
>
> Thank you,
>
> --
> Brian Wilson, Director
> Asia-Pacific International University Translation Center
> _____________
>
> I have a new blog!! http://tc4asia.org/wpblog
>
> "He hath shewed thee, O man, what is good; and what doth the LORD require
> of
> thee , but to do justly, and to love mercy, and to walk humbly with thy
> God."  Micah 6:8
> -------------- next part --------------
> The following is a brief sketch of the syllabification rules in Lao. My
> apologies for not using standard conventions.  Feel free to edit.
>
> On the most basic level of word-wrapping, syllables should never be split.
>
> Lao syllables consist of
>        1. Beginning Consonant (bC) [required]
>        2. Secondary Beginning Consonant (sbC) [for consonant clusters]
>        3. Vowel (V) [required]
>        4. Tone Mark (T) [The order of 3 and 4 can be reversed]
>        5. Final Consonant (fC)
>        6. Extra Final Consonant (efC)
>        7. galan (g)
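
A minimal sketch of the seven slots above as a Python data structure. The class
name, field names and the slots() helper are hypothetical and only mirror the
abbreviations in the list; the sketch records which slots are filled and does
not model the written order, which differs when a vowel is written before the
beginning consonant.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LaoSyllable:
    """One syllable, split into the slots listed above (names are illustrative)."""
    bC: str                     # beginning consonant, required
    V: str                      # vowel sign(s), required
    sbC: Optional[str] = None   # secondary beginning consonant
    T: Optional[str] = None     # tone mark
    fC: Optional[str] = None    # final consonant
    efC: Optional[str] = None   # extra final consonant
    g: Optional[str] = None     # galan (silencer), U+0ECC

    def __post_init__(self) -> None:
        # Only bC and V are marked [required] in the list.
        if not self.bC or not self.V:
            raise ValueError("bC and V are required")

    def slots(self) -> List[str]:
        """Names of the filled slots, in the order of the list above."""
        order = ["bC", "sbC", "V", "T", "fC", "efC", "g"]
        return [name for name in order if getattr(self, name)]
```

For example, LaoSyllable(bC="\u0e81", V="\u0eb2", T="\u0ec8").slots() lists
bC, V and T.
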
>
> ##########
> ##########
> Consonants and consonant clusters that can begin a syllable.
>        1. ກ 0E81
>                (1) ກຣ 0E81 + 0EA3 [uncommon]
>                (2) ກລ 0E81 + 0EA5 [uncommon]
>                (3) ກວ 0E81 + 0EA7
>                (4) ກຼ 0E81 + 0EBC [uncommon]
>
>        2. ຂ 0E82
>                (1) ຂຣ 0E82 + 0EA3 [uncommon]
>                (2) ຂລ 0E82 + 0EA5 [uncommon]
>                (3) ຂວ 0E82 + 0EA7
>                (4) ຂຼ 0E82 + 0EBC [uncommon]
>
>        3. ຄ 0E84
>                (1) ຄຣ 0E84 + 0EA3 [uncommon]
>                (2) ຄລ 0E84 + 0EA5 [uncommon]
>                (3) ຄວ 0E84 + 0EA7
>                (4) ຄຼ 0E84 + 0EBC [uncommon]
>
>        4. ງ 0E87
>
>        5. ຈ 0E88
>
>        6. ຊ 0E8A
>
>        7. ຍ 0E8D
>
>        8. ດ 0E94
>                (1) ດຣ 0E94 + 0EA3 [uncommon]
>
>        9. ຕ 0E95
>                (1) ຕຣ 0E95 + 0EA3 [uncommon]
>
>        10. ຖ 0E96
>
>        11. ທ 0E97
>
>        12. ນ 0E99
>
>        13. ບ 0E9A
>                (1) ບຣ 0E9A + 0EA3 [uncommon]
>                (2) ບລ 0E9A + 0EA5 [uncommon]
>                (3) ບຼ 0E9A + 0EBC [uncommon]
>
>        14. ປ 0E9B
>                (1) ປຣ 0E9B + 0EA3 [uncommon]
>                (2) ປລ 0E9B + 0EA5 [uncommon]
>                (3) ປຼ 0E9B + 0EBC [uncommon]
>
>        15. ຜ 0E9C
>
>        16. ຝ 0E9D
>                (1) ຝຣ 0E9D + 0EA3
>                (2) ຝຼ 0E9D + 0EBC
>
>        17. ພ 0E9E
>
>        18. ຟ 0E9F
>
>        19. ມ 0EA1
>
>        20. ຢ 0EA2
>
>        21. ຣ 0EA3
>
>        22. ລ 0EA5
>
>        23. ວ 0EA7
>
>        24. ສ 0EAA
>                (1) ສຣ 0EAA + 0EA3 [uncommon]
>                (2) ສລ 0EAA + 0EA5 [uncommon]
>                (3) ສວ 0EAA + 0EA7
>                (4) ສຼ 0EAA + 0EBC [uncommon]
>
>        25. ຫ 0EAB
>                (1) ຫງ 0EAB + 0E87
>                (2) ຫນ 0EAB + 0E99 [This is uncommon as it has its own
> character, see below]
>                (3) ຫຍ 0EAB + 0E8D
>                (4) ຫມ 0EAB + 0EA1 [This is uncommon as it has its own
> character, see below]
>                (5) ຫຣ 0EAB + 0EA3 [uncommon]
>                (6) ຫລ 0EAB + 0EA5
>                (7) ຫວ 0EAB + 0EA7
>                (8) ຫຼ 0EAB + 0EBC
>
>        26. ອ 0EAD
>
>        27. ຮ 0EAE [my mac is rendering this the same as 0EA3, shame on it]
>
>        28. ໜ 0EDC
>
>        29. ໝ 0EDD
>
> ############
> ############
> Consonants that commonly end a syllable
>        1. ກ 0E81
>        2. ງ 0E87
>        3. ຍ 0E8D [This is a /y/ and acts as a semivowel in certain
> constructions that will be explained later]
>        4. ດ 0E94
>        5. ມ 0EA1
>        6. ນ 0E99
>        7. ບ 0E9A
>        8. ວ 0EA7 [This is a /w/ and acts as a semivowel in certain
> constructions that will be explained later]
>
> ############
> ############
> Consonants that could conceivably end a syllable in rare occasions when
> transcribing certain foreign words.
>
>        1. ຂ 0E82
>        2. ຄ 0E84
>        3. ຈ 0E88
>        4. ຊ 0E8A
>        5. ດ 0E94
>        6. ຕ 0E95
>        7. ຖ 0E96
>        8. ທ 0E97
>        9. ປ 0E9B
>        10. ຜ 0E9C
>        11. ຝ 0E9D
>        12. ພ 0E9E
>        13. ຟ 0E9F
>        14. ມ 0EA1
>        15. ຣ 0EA3
>        16. ລ 0EA5
>        17. ສ 0EAA
>
> ############
> ############
> Consonants that can never end a syllable [unless followed immediately by
> the silencer 0ECC]
>        1. ຫ 0EAB
>        2. ຢ 0EA2
>        3. ອ 0EAD
>        4. ຮ 0EAE
>        5. ◌ຼ 0EBC
>        6. ໜ 0EDC
>        7. ໝ 0EDD
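
A minimal sketch of the rule above: a consonant from this list may close a
syllable only when the silencer 0ECC follows it immediately. The function name
is hypothetical. The check covers this one list only and treats 0EAD purely in
its consonant role; it does not cover 0EAD used as part of a vowel.

```python
# Consonants listed above as never ending a syllable on their own.
NEVER_FINAL = frozenset("\u0eab\u0ea2\u0ead\u0eae\u0ebc\u0edc\u0edd")
SILENCER = "\u0ecc"


def may_end_syllable(text: str, i: int) -> bool:
    """Return False only when text[i] is in the list above and is not
    immediately followed by the silencer 0ECC."""
    if text[i] not in NEVER_FINAL:
        return True
    # Slicing avoids an IndexError when text[i] is the last character.
    return text[i + 1:i + 2] == SILENCER
```
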
> ############
> ############
> Extra final consonant
> In order to type foreign words, Lao adds 0ECC to extra final consonants.
> Every consonant except
>                1. ◌ຼ 0EBC
>                2. ໜ 0EDC
>                3. ໝ 0EDD
> is theoretically possible, with some more common than others.
>
> ############
> ############
> Vowels that are written before the beginning consonant [syllable breaks
> ALWAYS occur before these characters and NEVER occur after these characters]
>        1. ເ 0EC0
>        2. ແ 0EC1
>        3. ໂ 0EC2
>        4. ໃ 0EC3
>        5. ໄ 0EC4
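
A minimal sketch of the rule above, assuming the input is a plain string of
Lao characters. It reports only the breaks this one rule forces (before each of
the five vowels); it does not find every syllable boundary. The function name
is hypothetical.

```python
# Vowels written before the beginning consonant: 0EC0 to 0EC4.
PREVOWELS = frozenset("\u0ec0\u0ec1\u0ec2\u0ec3\u0ec4")


def breaks_before_prevowels(text: str) -> list:
    """Return each index i where a syllable break falls between
    text[i - 1] and text[i] because text[i] is one of the vowels above."""
    return [i for i, ch in enumerate(text) if i > 0 and ch in PREVOWELS]
```

For the string 0EC4 0E9B 0EC2 0EAE 0E87 0EAE 0EBD 0E99 this returns [2], the
position just before 0EC2.
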
>
> ############
> ############
> Vowels that are written after the beginning consonant [syllable breaks
> NEVER occur before these characters. Some vowels in this section and the
> proceeding section can be stacked. I can specify if necessary.]
>        1. ະ 0EB0
>        2. າ 0EB2
>        3. ຳ 0EB3 [can also be written as 0ECD followed by 0EB2]
>        4. ◌ິ 0EB4
>        5. ◌ີ 0EB5
>        6. ◌ຶ 0EB6
>        7. ◌ື 0EB7
>        8. ◌ຸ 0EB8
>        9. ◌ູ 0EB9
>        10. ◌ໍ 0ECD
>
> ############
> ############
> Vowels that are written between two consonants [syllable breaks NEVER occur
> before or after these characters]
>
>        1. ◌ັ 0EB1 [The following character must be a consonant or the ຽ 0EBD
> semi-vowel]
>        2. ◌ົ 0EBB [The following character, after an optional T marker,
> must be one of:
>                (1) a consonant;
>                (2) the າ 0EB2 vowel, when used in the /ow/ diphthong
> (<0EC0> <bC> <(sbC)> <0EBB> <(T)> <0EB2>);
>                (3) the ວ 0EA7 semi-vowel, when used in the /ua/ diphthong
> (<bC> <(sbC)> <0EBB> <(T)> <0EA7> <(0EB0)>). Note that the ວ may be
> followed by ະ 0EB0 for the shortened version of this diphthong.]
>
> ############
> ############
> Vowels that can't take a final consonant
>
>        1. ະ 0EB0 [syllable break ALWAYS occurs after this character]
>        2. ◌ໍ 0ECD [syllable break ALWAYS occurs after this character or
> the optional tone mark immediately following it.]
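
A minimal sketch of the two rules above. One assumption goes beyond this
section: because 0EB3 can also be written as 0ECD followed by 0EB2 (see the
earlier vowel list), the sketch does not force a break when 0EB2 follows 0ECD
or its tone mark. The function name is hypothetical.

```python
TONES = frozenset("\u0ec8\u0ec9\u0eca\u0ecb")
SARA_A = "\u0eb0"     # 0EB0
NIGGAHITA = "\u0ecd"  # 0ECD
SARA_AA = "\u0eb2"    # 0EB2


def breaks_after_open_vowels(text: str) -> list:
    """Return each index j where a break is forced between text[j - 1] and text[j]."""
    breaks = []
    for i, ch in enumerate(text):
        if ch == SARA_A:
            j = i + 1
        elif ch == NIGGAHITA:
            j = i + 1
            if j < len(text) and text[j] in TONES:
                j += 1  # the break goes after the optional tone mark
            if j < len(text) and text[j] == SARA_AA:
                continue  # assumption: 0ECD + 0EB2 is a decomposed 0EB3
        else:
            continue
        if j < len(text):  # no break is reported at the very end of the text
            breaks.append(j)
    return breaks
```
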
>
>
> ############
> ############
> The /ia/ vowel, which in the old orthography is also a /y/ that can replace
> the final ຍ 0E8D (see above)
>
> 1. ຽ 0EBD [can NEVER break before. If it is a final /y/, then can break after]
>
> ############
> ############
> Tones. There are four tone marks that can sit on top of the initial
> consonant or on ◌ິ ◌ີ ◌ຶ ◌ື ◌ໍ (0EB4, 0EB5, 0EB6, 0EB7, 0ECD). (Note
> that 0EB5 and 0EB7 are also part of diphthongs; see below.) Breaks can
> NEVER occur before these.
>
>        1. ◌່ 0EC8
>        2. ◌້ 0EC9
>        3. ◌໊ 0ECA
>        4. ◌໋ 0ECB
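
A minimal sketch of how the "never break before a tone mark" rule could prune
a list of candidate break positions produced by other rules. The function name
and the candidate-list interface are hypothetical.

```python
TONE_MARKS = frozenset("\u0ec8\u0ec9\u0eca\u0ecb")


def drop_breaks_before_tones(text: str, candidates: list) -> list:
    """Keep only the candidate indices i for which text[i] is not a tone mark.

    A candidate i stands for a break between text[i - 1] and text[i].
    """
    return [
        i for i in candidates
        if i >= len(text) or text[i] not in TONE_MARKS
    ]
```
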
>
> ############
> ############
> The silencer: a mark placed on a consonant, rendering it silent. It is only
> used to write foreign words. It is usually placed on the last letter of a
> syllable, although it can occur in the middle of a syllable when placed on
> a ຣ 0EA3 or ລ 0EA5. A break can NEVER occur before the consonant upon which
> this character sits, as a consonant carrying this character (galan) cannot
> begin a syllable.
>
>        1. ◌໌ 0ECC
>
> ############
> ############
> The following punctuation marks can never begin a new line. Also note that
> English and French punctuation symbols and rules apply. (Lao tends to add a
> space around punctuation as in French, but not always.) Quotes can be
> written with " " or << >>.
>        1. ໆ 0EC6
>        2. 0EAF [Sorry, I can't find this on my unmarked mac keyboard]
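
A minimal sketch of the line-start restriction above, assuming a wrapping
routine that asks whether a new line may begin at a given index. It checks this
one rule only; the syllable rules and the English and French punctuation rules
are not modelled. The function name is hypothetical.

```python
# Marks that can never begin a new line: 0EC6 and 0EAF.
NO_LINE_START = frozenset("\u0ec6\u0eaf")


def can_wrap_before(text: str, i: int) -> bool:
    """True if a new line may start at text[i] as far as this rule goes."""
    return 0 < i < len(text) and text[i] not in NO_LINE_START
```
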
>
> ############
> ############
> Vowel Diphthongs. Here is where it gets hairy as three consonant
> semi-vowels are involved. [See my explanation at the beginning of this
> document. Parentheses mark optional characters.]
>        1. <0EC0> <bC> <(sbC)> <0EB6 or 0EB7> <(T)> <0EAD> <(fC)> [eua
> vowel. Note that the beginning consonant is in the middle]
>
> [Well, that wasn't so bad. I think that the other diphthongs are taken care
> of in previous rules and notes.]
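
A minimal sketch of the eua pattern as a regular expression. Several parts are
assumptions: the consonant class is approximated by the range 0E81 to 0EAE plus
0EDC and 0EDD, the secondary consonant by 0EA3, 0EA5, 0EA7 and 0EBC, and the
optional final consonant is left out, since deciding whether a following
consonant closes this syllable or opens the next one needs the other rules.

```python
import re

CONSONANT = "[\u0e81-\u0eae\u0edc\u0edd]"  # approximate bC class (assumption)
SECOND = "[\u0ea3\u0ea5\u0ea7\u0ebc]"      # approximate sbC class (assumption)
TONE = "[\u0ec8-\u0ecb]"

# <0EC0> <bC> <(sbC)> <0EB6 or 0EB7> <(T)> <0EAD>, without the optional <(fC)>
EUA_CORE = re.compile(
    "\u0ec0" + CONSONANT + SECOND + "?" + "[\u0eb6\u0eb7]" + TONE + "?" + "\u0ead"
)


def find_eua_cores(text: str) -> list:
    """Return (start, end) spans of each eua core found in text."""
    return [m.span() for m in EUA_CORE.finditer(text)]
```
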
>
> ############
> ############
> Consonants used as vowels between consonants.
>
>        1. ວ 0EA7
>        2. ອ 0EAD
>
> [If ວ|ອ is preceded by a consonant (note optional tone mark) and followed
> immediately by a consonant that is not followed by a vowel or tone mark then
> consider C(T)ວ|ອC to be a syllable.]
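
A minimal sketch of the bracketed rule above. The character classes are
assumptions: consonants are taken as 0E81 to 0EAE plus 0EDC and 0EDD, and "a
vowel" after the closing consonant is taken to mean the signs 0EB0 to 0EBD and
0ECD, since a vowel written before a consonant would belong to the next
syllable. Function names are hypothetical.

```python
TONES = frozenset("\u0ec8\u0ec9\u0eca\u0ecb")
MEDIALS = frozenset("\u0ea7\u0ead")  # 0EA7 and 0EAD


def is_consonant(ch: str) -> bool:
    # Approximate consonant class (assumption).
    return "\u0e81" <= ch <= "\u0eae" or ch in ("\u0edc", "\u0edd")


def is_following_vowel(ch: str) -> bool:
    # Vowel signs that attach to or follow a consonant (assumption).
    return "\u0eb0" <= ch <= "\u0ebd" or ch == "\u0ecd"


def is_medial_vowel(text: str, i: int) -> bool:
    """True if text[i] is 0EA7 or 0EAD acting as the vowel of a C(T)VC syllable."""
    if text[i] not in MEDIALS:
        return False
    j = i - 1
    if j >= 0 and text[j] in TONES:
        j -= 1  # skip the optional tone mark
    if j < 0 or not is_consonant(text[j]):
        return False
    k = i + 1
    if k >= len(text) or not is_consonant(text[k]):
        return False
    nxt = text[k + 1] if k + 1 < len(text) else ""
    return not (is_following_vowel(nxt) or nxt in TONES)
```
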
>
> ############
> ############
> Yeah. The end.
>


