Introduction
This is the central place for hyphenation patterns in TeX. They are all bundled in a single package called hyph-utf8.
For pattern authors
If you are a pattern author and wish to update your patterns, please contact the hyph-utf8 package maintainers through the tex-hyphen mailing list.
Documentation
Algorithm
Papers
- Documentation (needs improvement)
- Documentation for the Lua(La)TeX part of the package
- TUG 2008 paper
Slides
- The TeX hyphenation applied to HTML (Mathias Nater, BachoTeX 2010)
Related Packages
- Babel (pdf; 1659 kb) – for pdfTeX and other 8-bit TeX engines
- Polyglossia (pdf; 169 kb) – for XeTeX
Links
Collaboration
- Mozilla
- FOP XML Hyphenation Patterns (Simon Pepping)
- TeX-Hyphen-Pattern (Perl implementation on CPAN) (Roland van Ipenburg)
- Hyphenator.js (Client-side implementation of hyphenation in HTML documents) (Mathias Nater)
OpenOffice.org
- Test TeX/OpenOffice hyphenation algorithm online (based on hunspell)
- Using TeX hyphenation patterns in OpenOffice.org (explains how to properly convert TeX patterns into OpenOffice-friendly form)
- Hunspell (library)
- Open Office language extensions
- text-hyphen (rubyforge)
- TeX Hyphenator in Java
- Indic languages:
- An article about soft hyphen
Other external links
Languages
The package contains patterns for the following languages:
(If patterns for another language exist but are missing below, please let us know.)
| name | synonyms | code (link to file) | (left,right)-hyphenmin | 8-bit encoding |
|---|---|---|---|---|
| Afrikaans | afrikaans | af | (1,2) | EC |
| Ancientgreek | ancientgreek | grc | (1,1) | |
| | ibycus | grc-x-ibycus | (2,2) | |
| Arabic | arabic | ar | (,) | |
| Armenian | armenian | hy | (1,2) | |
| Assamese | assamese | as | (1,1) | |
| Basque | basque | eu | (2,2) | EC |
| Bengali | bengali | bn | (1,1) | |
| Bulgarian | bulgarian | bg | (2,2) | T2A |
| Catalan | catalan | ca | (2,2) | EC |
| Chinese | pinyin | zh-latn-pinyin | (1,1) | EC |
| Coptic | coptic | cop | (1,1) | |
| Croatian | croatian | hr | (2,2) | EC |
| Czech | czech | cs | (2,3) | EC |
| Danish | danish | da | (2,2) | EC |
| Dutch | dutch | nl | (2,2) | EC |
| English | english, usenglish, USenglish, american | (default) | (2,3) | ASCII |
| | ukenglish, british, UKenglish | en-gb | (2,3) | ASCII |
| | usenglishmax | en-us | (2,3) | ASCII |
| Esperanto | esperanto | eo | (2,2) | IL3 |
| Estonian | estonian | et | (2,3) | EC |
| Ethiopic | ethiopic, amharic, geez | mul-ethi | (1,1) | |
| Farsi | farsi, persian | fa | (,) | |
| Finnish | finnish | fi | (2,2) | EC |
| French | french, patois, francais | fr | (2,3) | EC |
| Friulan | friulan | fur | (2,2) | EC |
| Galician | galician | gl | (2,2) | EC |
| German | german | de-1901 | (2,2) | EC |
| | ngerman | de-1996 | (2,2) | EC |
| | swissgerman | de-ch-1901 | (2,2) | EC |
| Greek | monogreek | el-monoton | (1,1) | |
| | greek, polygreek | el-polyton | (1,1) | |
| Gujarati | gujarati | gu | (1,1) | |
| Hindi | hindi | hi | (1,1) | |
| Hungarian | hungarian | hu | (2,2) | EC |
| Icelandic | icelandic | is | (2,2) | EC |
| Indonesian | indonesian | id | (2,2) | ASCII |
| Interlingua | interlingua | ia | (2,2) | ASCII |
| Irish | irish | ga | (2,3) | EC |
| Italian | italian | it | (2,2) | ASCII |
| Kannada | kannada | kn | (1,1) | |
| Kurmanji | kurmanji | kmr | (2,2) | EC |
| Latin | latin | la | (2,2) | EC |
| Latvian | latvian | lv | (2,2) | L7X |
| Lithuanian | lithuanian | lt | (2,2) | L7X |
| Malayalam | malayalam | ml | (1,1) | |
| Marathi | marathi | mr | (1,1) | |
| Mongolian | mongolian | mn-cyrl | (2,2) | T2A |
| | mongolianlmc | mn-cyrl-x-lmc | (2,2) | LMC |
| Norwegian | bokmal, norwegian, norsk | nb | (2,2) | EC |
| | nynorsk | nn | (2,2) | EC |
| Oriya | oriya | or | (1,1) | |
| Panjabi | panjabi | pa | (1,1) | |
| Polish | polish | pl | (2,2) | QX |
| Portuguese | portuguese, portuges | pt | (2,3) | EC |
| Romanian | romanian | ro | (2,2) | EC |
| Romansh | romansh | rm | (2,2) | ASCII |
| Russian | russian | ru | (2,2) | T2A |
| Sanskrit | sanskrit | sa | (1,3) | |
| Serbian | serbian | sr-latn | (2,2) | EC |
| Serbianc | serbianc | sh-cyrl | (2,2) | T2A |
| Slovak | slovak | sk | (2,3) | EC |
| Slovenian | slovenian, slovene | sl | (2,2) | EC |
| Spanish | spanish, espanol | es | (2,2) | EC |
| Swedish | swedish | sv | (2,2) | EC |
| Tamil | tamil | ta | (1,1) | |
| Telugu | telugu | te | (1,1) | |
| Turkish | turkish | tr | (2,2) | EC |
| Turkmen | turkmen | tk | (2,2) | EC |
| Ukrainian | ukrainian | uk | (2,2) | T2A |
| Uppersorbian | uppersorbian | hsb | (2,2) | EC |
| Welsh | welsh | cy | (2,3) | EC |
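
The (left,right)-hyphenmin column gives the minimum number of letters that must remain before and after a break point in a word. The following is a minimal sketch of how these values map to the TeX parameters `\lefthyphenmin` and `\righthyphenmin`. It assumes a standard LaTeX installation with the default English patterns loaded. The test word is an arbitrary choice for illustration and is not taken from the package.

```latex
\documentclass{article}
\begin{document}
% Values listed in the table for English: (2,3).
% At least 2 letters stay before a break and at least 3 after it.
\lefthyphenmin=2
\righthyphenmin=3
\showhyphens{hyphenation}

% Looser limits, like the (1,1) listed for several other languages,
% applied here to the English patterns purely for comparison.
\lefthyphenmin=1
\righthyphenmin=1
\showhyphens{hyphenation}
\end{document}
```

`\showhyphens` writes the permitted break points to the terminal and the log file, so the two runs can be compared there. With babel or polyglossia the hyphenmin values are normally set when a language is selected, so manual assignments like these are only needed to override them.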
