[tex-hyphen] descriptions of hyphen-language

Mojca Miklavec mojca.miklavec.lists at gmail.com
Mon Jun 20 10:14:31 CEST 2011


Hello,

The attached set of descriptions (apart from Greek and Ancient Greek,
which still need some minor modifications) will probably go into the
hyphen-language.tlpsrc files in TeX Live. Any comments are welcome.

In particular I would like to ask for corrections/suggestions about
the English description (package contains just ushyphmax + british
patterns):

=== english ===
shortdesc English hyphenation patterns.
longdesc Additional hyphenation patterns for American and British English in ASCII encoding.
longdesc American English patterns (usenglishmax) extend Knuth's standard patterns,
longdesc trying to properly hyphenate a larger set of words by sacrificing full backward compatibility.
longdesc British English patterns are based on a dictionary provided by Oxford University Press.


And I have a question as a non-native English speaker: does it sound
better to keep or omit the word "language" as shown in the following
two descriptions? (I wanted to avoid exact duplication of the title, which
is why the wording in longdesc is slightly different.)

=== spanish ===
shortdesc Spanish hyphenation patterns.
longdesc Hyphenation patterns for Spanish language in T1/EC and UTF-8 encoding.

=== swedish ===
shortdesc Swedish hyphenation patterns.
longdesc Hyphenation patterns for Swedish in T1/EC and UTF-8 encoding.


We will upload at least one more version of hyph-utf8 with updated
Hungarian patterns (I just noticed them yesterday) and fixed typos in
the documentation. (It can be done tonight, as soon as mistakes in
the descriptions are fixed or some improvements are made.)
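
In case anyone wants to check the attached file mechanically, here is a
minimal Python sketch that splits it into entries. It assumes the file
looks exactly like the attachment ("=== name ===" headers followed by
shortdesc/longdesc lines). The name parse_descriptions is hypothetical
and is not part of hyph-utf8 or TeX Live.

```python
import re

# Matches the "=== name ===" lines that start each entry in the attachment.
HEADER = re.compile(r"^=== (\S+) ===$")


def parse_descriptions(text):
    """Return a list of (name, shortdesc, longdesc) tuples in file order.

    A list is used instead of a dict because a name may occur more than
    once in the file (as "greek" does in the attachment).
    """
    entries = []
    current = None
    for line in text.splitlines():
        line = line.rstrip()
        match = HEADER.match(line)
        if match:
            current = {"name": match.group(1), "shortdesc": "", "longdesc": []}
            entries.append(current)
        elif current is not None and line.startswith("shortdesc "):
            current["shortdesc"] = line[len("shortdesc "):]
        elif current is not None and line.startswith("longdesc "):
            # Consecutive longdesc lines form one paragraph.
            current["longdesc"].append(line[len("longdesc "):])
    return [
        (e["name"], e["shortdesc"], " ".join(e["longdesc"])) for e in entries
    ]


sample = """\
=== spanish ===
shortdesc Spanish hyphenation patterns.
longdesc Hyphenation patterns for Spanish language in T1/EC and UTF-8 encoding.

=== swedish ===
shortdesc Swedish hyphenation patterns.
longdesc Hyphenation patterns for Swedish in T1/EC and UTF-8 encoding.
"""

for name, short, long_text in parse_descriptions(sample):
    print(name, "|", short, "|", long_text)
```

With the entries in this form it is easy to list, for example, every
longdesc that contains the word "language" and compare the two wordings
side by side.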

Mojca
-------------- next part --------------
=== afrikaans ===
shortdesc Afrikaans hyphenation patterns.
longdesc Hyphenation patterns for Afrikaans language in T1/EC and UTF-8 encoding.
longdesc (OpenOffice includes older patterns created by a different author,
longdesc but the patterns packaged with TeX are considered superior in quality.)
longdesc The word list used to generate the patterns with opatgen might be released in the future.

=== ancientgreek ===
shortdesc Ancient Greek hyphenation patterns.
longdesc Hyphenation patterns for Ancient Greek.
longdesc The pattern file used for 8-bit TeX engines is grahyph5.tex, in Babel's LGR encoding,
longdesc which is not part of hyph-utf8.
longdesc Patterns in UTF-8 use two code positions for each of the vowels with acute accent
longdesc (a.k.a. tonos, oxia), e.g., U+03AE, U+1F75 for eta.

=== arabic ===
shortdesc (No) Arabic hyphenation patterns.
longdesc Prevent hyphenation in Arabic.

=== armenian ===
shortdesc Armenian hyphenation patterns.
longdesc Hyphenation patterns for Armenian for Unicode engines.

=== basque ===
shortdesc Basque hyphenation patterns.
longdesc Hyphenation patterns for Basque language in T1/EC and UTF-8 encoding.

=== bulgarian ===
shortdesc Bulgarian hyphenation patterns.
longdesc Hyphenation patterns for Bulgarian language in T2A and UTF-8 encoding.

=== catalan ===
shortdesc Catalan hyphenation patterns.
longdesc Hyphenation patterns for Catalan language in T1/EC and UTF-8 encoding.

=== chinese ===
shortdesc Chinese pinyin hyphenation patterns.
longdesc Hyphenation patterns for transliterated Mandarin Chinese (pinyin) in T1/EC and UTF-8 encoding, unaccented.

=== coptic ===
shortdesc Coptic hyphenation patterns.
longdesc Hyphenation patterns for Coptic language in UTF-8 encoding
longdesc as well as in ASCII-based encoding for 8-bit engines.
longdesc The latter can only be used with special Coptic fonts (like CBcoptic).
longdesc The patterns are considered experimental.

=== croatian ===
shortdesc Croatian hyphenation patterns.
longdesc Hyphenation patterns for Croatian language in T1/EC and UTF-8 encoding.

=== czech ===
shortdesc Czech hyphenation patterns.
longdesc Hyphenation patterns for Czech language in T1/EC and UTF-8 encoding.
longdesc The original patterns 'czhyphen' are still distributed in the 'csplain' package
longdesc and loaded with ISO Latin 2 encoding (IL2).

=== danish ===
shortdesc Danish hyphenation patterns.
longdesc Hyphenation patterns for Danish language in T1/EC and UTF-8 encoding.

=== dutch ===
shortdesc Dutch hyphenation patterns.
longdesc Hyphenation patterns for Dutch in T1/EC and UTF-8 encoding.
longdesc \lefthyphenmin and \righthyphenmin must both be > 1.
longdesc These patterns don't handle cases like menuutje > menu-tje, and don't hyphenate words
longdesc that have different hyphenations according to their meaning.

=== english ===
shortdesc English hyphenation patterns.
longdesc Additional hyphenation patterns for American and British English in ASCII encoding.
longdesc American English patterns (usenglishmax) extend Knuth's standard patterns,
longdesc trying to properly hyphenate a larger set of words by sacrificing full backward compatibility.
longdesc British English patterns are based on a dictionary provided by Oxford University Press.

=== esperanto ===
shortdesc Esperanto hyphenation patterns.
longdesc Hyphenation patterns for Esperanto language in UTF-8 and ISO Latin 3 (IL3) encoding.
longdesc Note that TeX distributions usually don't ship any suitable fonts in that encoding,
longdesc so unless you create your own font support or want to use MlTeX,
longdesc using native UTF-8 engines is highly recommended.

=== estonian ===
shortdesc Estonian hyphenation patterns.
longdesc Hyphenation patterns for Estonian language in T1/EC and UTF-8 encoding.

=== ethiopic ===
shortdesc Hyphenation patterns for Ethiopic scripts.
longdesc Hyphenation patterns for languages written using the Ethiopic script, in UTF-8.
longdesc They are not supposed to be linguistically relevant in all cases
longdesc and should, for proper typography, be replaced by files tailored
longdesc to individual languages.

=== farsi ===
shortdesc (No) Persian hyphenation patterns.
longdesc Prevent hyphenation in Persian.

=== finnish ===
shortdesc Finnish hyphenation patterns.
longdesc Hyphenation patterns for Finnish in T1/EC and UTF-8.

=== french ===
shortdesc French hyphenation patterns.
longdesc Hyphenation patterns for French in T1/EC and UTF-8 encoding.

=== galician ===
shortdesc Galician hyphenation patterns.
longdesc Hyphenation patterns for Galician in T1/EC and UTF-8 encoding.
longdesc Generated automatically by the mkpattern utility.

=== german ===
shortdesc German hyphenation patterns.
longdesc Hyphenation patterns for German in T1/EC and UTF-8 encoding,
longdesc for traditional and reformed spelling, including Swiss German.
longdesc The package includes the latest patterns from dehyph-exptl
longdesc (known to TeX under the names 'german', 'ngerman' and 'swissgerman');
longdesc however, 8-bit engines still load old versions of the patterns
longdesc for 'german' and 'ngerman' for backward-compatibility reasons.
longdesc Swiss German patterns are suitable for Standard German (Hochdeutsch),
longdesc not the Alemannic dialects spoken in Switzerland (Schwyzerdütsch).
longdesc There are no patterns for written Schwyzerdütsch.

=== greek ===
shortdesc Monotonic Modern Greek hyphenation patterns.
longdesc Hyphenation patterns for Modern Greek in monotonic spelling.
longdesc The pattern file used for 8-bit TeX engines is grmhyph5.tex, in Babel's LGR encoding,
longdesc which is not part of hyph-utf8.
longdesc Patterns in UTF-8 use two code positions for each of the vowels with acute accent
longdesc (a.k.a. tonos, oxia), e.g., U+03AD, U+1F73 for epsilon.

=== greek ===
shortdesc Polytonic Modern Greek hyphenation patterns.
longdesc Hyphenation patterns for Modern Greek in polytonic spelling.
longdesc The pattern file used for 8-bit TeX engines is grphyph5.tex, which is not part of hyph-utf8.
longdesc Patterns in UTF-8 use two code positions for each of the vowels with acute accent
longdesc (a.k.a. tonos, oxia), e.g., U+03AC, U+1F71 for alpha.

=== hungarian ===
shortdesc Hungarian hyphenation patterns.
longdesc Hyphenation patterns for Hungarian in T1/EC and UTF-8 encoding.
longdesc From https://github.com/nagybence/huhyphn/.

=== icelandic ===
shortdesc Icelandic hyphenation patterns.
longdesc Hyphenation patterns for Icelandic in T1/EC and UTF-8 encoding.

=== indic ===
shortdesc Indic hyphenation patterns.
longdesc Hyphenation patterns for Assamese, Bengali, Gujarati, Hindi,
longdesc Kannada, Malayalam, Marathi, Oriya, Panjabi, Tamil and Telugu
longdesc for Unicode engines.

=== indonesian ===
shortdesc Indonesian hyphenation patterns.
longdesc Hyphenation patterns for Indonesian (Bahasa Indonesia) in ASCII encoding.
longdesc They are probably also usable for Malay (Bahasa Melayu).

=== interlingua ===
shortdesc Interlingua hyphenation patterns.
longdesc Hyphenation patterns for Interlingua in ASCII encoding.

=== irish ===
shortdesc Irish hyphenation patterns.
longdesc Hyphenation patterns for Irish (Gaeilge).
longdesc Visit http://borel.slu.edu/fleiscin/index.html for more information.

=== italian ===
shortdesc Italian hyphenation patterns.
longdesc Hyphenation patterns for Italian in ASCII encoding.
longdesc Implements Recommendation UNI 6461 of the Italian Standards Institution (Ente Nazionale di Unificazione UNI).

=== kurmanji ===
shortdesc Kurmanji hyphenation patterns.
longdesc Hyphenation patterns for Kurmanji (Northern Kurdish) in T1/EC and UTF-8 encoding
longdesc (as spoken in Turkey and by the Kurdish diaspora in Europe).

=== lao ===
shortdesc Lao hyphenation patterns.
longdesc Hyphenation patterns for Lao language for Unicode engines.
longdesc The current version is experimental and gives bad results.
longdesc Please wait for a new version.

=== latin ===
shortdesc Latin hyphenation patterns.
longdesc Hyphenation patterns for Latin language in T1/EC and UTF-8 encoding,
longdesc mainly in modern spelling (u when u is needed and v when v is needed).
longdesc Medieval spelling with the ligatures \ae and \oe and the (uncial)
longdesc lowercase 'v' written as a 'u' is also supported. Apparently
longdesc there is no conflict between the patterns of modern Latin and
longdesc those of medieval Latin.

=== latvian ===
shortdesc Latvian hyphenation patterns.
longdesc Hyphenation patterns for Latvian in L7X and UTF-8 encoding.

=== lithuanian ===
shortdesc Lithuanian hyphenation patterns.
longdesc Hyphenation patterns for Lithuanian in L7X and UTF-8 encoding.
longdesc Designed for \lefthyphenmin and \righthyphenmin set to 2.

=== mongolian ===
shortdesc Mongolian hyphenation patterns in Cyrillic script.
longdesc Hyphenation patterns for Mongolian language in T2A, LMC and UTF-8 encoding.
longdesc LMC encoding is used in MonTeX. The package includes two sets of patterns
longdesc that will hopefully be merged in the future.

=== norwegian ===
shortdesc Norwegian Bokmal and Nynorsk hyphenation patterns.
longdesc Hyphenation patterns for Norwegian Bokmal and Nynorsk in T1/EC and UTF-8 encoding.

=== polish ===
shortdesc Polish hyphenation patterns.
longdesc Hyphenation patterns for Polish in QX and UTF-8 encoding.
longdesc These patterns are used by the standard formats as well as MeX and LaMeX.

=== portuguese ===
shortdesc Portuguese hyphenation patterns.
longdesc Hyphenation patterns for Portuguese in T1/EC and UTF-8 encoding.

=== romanian ===
shortdesc Romanian hyphenation patterns.
longdesc Hyphenation patterns for Romanian in T1/EC and UTF-8 encoding.
longdesc The UTF-8 patterns use U+0219 and U+021B for the characters s with comma accent
longdesc and t with comma accent respectively, but we may consider using U+015F and U+0163
longdesc as well in the future.

=== russian ===
shortdesc Russian hyphenation patterns.
longdesc Hyphenation patterns for Russian in T2A and UTF-8 encoding.
longdesc For 8-bit TeX engines, the ruhyphen package provides a number of different pattern sets,
longdesc as well as different (8-bit) encodings, which can be chosen at format-generation time.
longdesc The UTF-8 version only provides the default pattern set.  A mechanism similar
longdesc to the one used for 8-bit patterns may be implemented in the future.

=== sanskrit ===
shortdesc Sanskrit hyphenation patterns.
longdesc Hyphenation patterns for Sanskrit and Prakrit in transliteration,
longdesc and in Devanagari, Bengali, Kannada, Malayalam and Telugu scripts
longdesc for Unicode engines.

=== serbian ===
shortdesc Serbian hyphenation patterns.
longdesc Hyphenation patterns for Serbian language in T1/EC, T2A and UTF-8 encoding.
longdesc For 8-bit TeX the patterns are available separately as 'serbian' in
longdesc T1/EC encoding for Latin script and 'serbianc' in T2A encoding for
longdesc Cyrillic script. Unicode engines should only use 'serbian'
longdesc which has patterns in both scripts combined.

=== slovak ===
shortdesc Slovak hyphenation patterns.
longdesc Hyphenation patterns for Slovak language in T1/EC and UTF-8 encoding.
longdesc The original patterns 'skhyphen' are still distributed in the 'csplain' package
longdesc and loaded with ISO Latin 2 encoding (IL2).

=== slovenian ===
shortdesc Slovenian hyphenation patterns.
longdesc Hyphenation patterns for Slovenian language in T1/EC and UTF-8 encoding.

=== spanish ===
shortdesc Spanish hyphenation patterns.
longdesc Hyphenation patterns for Spanish language in T1/EC and UTF-8 encoding.

=== swedish ===
shortdesc Swedish hyphenation patterns.
longdesc Hyphenation patterns for Swedish in T1/EC and UTF-8 encoding.

=== turkish ===
shortdesc Turkish hyphenation patterns.
longdesc Hyphenation patterns for Turkish in T1/EC and UTF-8 encoding.
longdesc Auto-generated by a script included in the distribution.
longdesc The patterns for Turkish were first produced for the Ottoman Texts Project in 1987
longdesc and were suitable for both Modern Turkish and Ottoman Turkish in Latin script;
longdesc however, the required character set didn't fit into EC encoding,
longdesc so support for Ottoman Turkish had to be dropped for compatibility with 8-bit engines.

=== turkmen ===
shortdesc Turkmen hyphenation patterns.
longdesc Hyphenation patterns for Turkmen language in T1/EC and UTF-8 encoding.

=== ukrainian ===
shortdesc Ukrainian hyphenation patterns.
longdesc Hyphenation patterns for Ukrainian in T2A and UTF-8 encoding.
longdesc For 8-bit TeX engines, the ukrhyph package provides a number of different pattern sets,
longdesc as well as different (8-bit) encodings, which can be chosen at format-generation time.
longdesc The UTF-8 version only provides the default pattern set.  A mechanism similar
longdesc to the one used for 8-bit patterns may be implemented in the future.

=== uppersorbian ===
shortdesc Upper Sorbian hyphenation patterns.
longdesc Hyphenation patterns for Upper Sorbian in T1/EC and UTF-8 encoding.

=== welsh ===
shortdesc Welsh hyphenation patterns.
longdesc Hyphenation patterns for Welsh language in T1/EC and UTF-8 encoding.
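
A note on the code positions quoted in the Greek and Romanian entries
above: the following Python sketch uses only the standard unicodedata
module to print the character names and to check how the pairs relate
under NFC normalization. It only illustrates the Unicode data; it says
nothing about how the pattern files themselves are generated.

```python
import unicodedata

# (tonos, oxia) code point pairs quoted in the Greek descriptions.
greek_pairs = [("\u03ae", "\u1f75"), ("\u03ad", "\u1f73"), ("\u03ac", "\u1f71")]

# (comma below, cedilla) code point pairs quoted in the Romanian description.
romanian_pairs = [("\u0219", "\u015f"), ("\u021b", "\u0163")]


def describe(ch):
    """Format a character as 'U+XXXX NAME'."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch))


for tonos, oxia in greek_pairs:
    # The oxia code points are canonically equivalent to the tonos ones,
    # so NFC normalization maps the former to the latter.
    same = unicodedata.normalize("NFC", oxia) == tonos
    print(describe(tonos), "/", describe(oxia), "/ NFC-equal:", same)

for comma, cedilla in romanian_pairs:
    # Comma-below and cedilla letters are distinct characters;
    # normalization does not unify them.
    same = unicodedata.normalize("NFC", cedilla) == comma
    print(describe(comma), "/", describe(cedilla), "/ NFC-equal:", same)
```

The Greek pairs differ only in encoding, so input that has not been
normalized can contain either form. The Romanian pairs are separate
characters, so covering U+015F and U+0163 would need explicit patterns.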


