[latex3-commits] [git/LaTeX3-latex3-babel] docs: Extensive revision of non-standard hyphenation. (7536199)

Javier email at dante.de
Fri Apr 7 19:28:54 CEST 2023


Repository : https://github.com/latex3/babel
On branch  : docs
Link       : https://github.com/latex3/babel/commit/7536199aa35e29e021644abdd604c283299b7310

>---------------------------------------------------------------

commit 7536199aa35e29e021644abdd604c283299b7310
Author: Javier <email at localhost>
Date:   Fri Apr 7 19:28:54 2023 +0200

    Extensive revision of non-standard hyphenation.


>---------------------------------------------------------------

7536199aa35e29e021644abdd604c283299b7310
 docs/guides/locale-norwegian.md                    |  11 +-
 .../guides/non-standard-hyphenation-with-luatex.md | 167 +++++++++++----------
 2 files changed, 97 insertions(+), 81 deletions(-)

diff --git a/docs/guides/locale-norwegian.md b/docs/guides/locale-norwegian.md
index 83816b2..ad96fd0 100644
--- a/docs/guides/locale-norwegian.md
+++ b/docs/guides/locale-norwegian.md
@@ -41,9 +41,11 @@ added to the exception list with:
 ```
 Here, the group `{ll-}{l}{ll}` is equivalent to a `\discretionary`.
 Remember you must add every word (like, say, ‘volleyballandslaget’).
+These discretionaries can be weighted, too, as the `luatex` manual
+explains (with `\exceptionpenalty`).
 
-Alternatively, you may define a transform, which is more flexible, but
-less efficient. For example:
+Alternatively, with `babel` you may define a transform, which is more
+flexible, but less efficient. For example:
 ```tex
 \babelposthyphenation{norsk}{ vol|ley|ba()l()lands|la }{
   { no = l, pre = ll- }
@@ -53,9 +55,8 @@ This rule matches the sequence of characters and hyphenation points in
 the second argument, which explains why the ending ‘g’ has not been
 included —without it, ‘volleyballandslaget’ is also matched.
 
-Furthermore, with transforms hyphenation points can by weighted with
-different penalties, too, as explained in [Non-standard hyphenation
-with
+With transforms, hyphenation points can be weighted with arbitrary
+penalties, as explained in [Non-standard hyphenation with
 luatex](https://latex3.github.io/babel/guides/non-standard-hyphenation-with-luatex.html).
 
 Feel free to contribute a list of words requiring this feature.
diff --git a/docs/guides/non-standard-hyphenation-with-luatex.md b/docs/guides/non-standard-hyphenation-with-luatex.md
index d68bee7..2e6ce9f 100644
--- a/docs/guides/non-standard-hyphenation-with-luatex.md
+++ b/docs/guides/non-standard-hyphenation-with-luatex.md
@@ -11,7 +11,12 @@ including changes in letters and weighted hypenation points.
 frequent cases, too. Please, refer to its manual for further
 information.)
 
-Here is a simple example of a declaration, which tell LaTeX to change
+The basic syntax is explained in the `babel` manual. This article
+complements it with an explanation of the second and third arguments of
+`\babelposthyphenation`, which also apply to `\babelprehyphenation`
+with the differences explained in the manual.
+
+Here is a simple example of a declaration, which tells LaTeX to change
 the group ‘ck’ to ‘kk’ with an optional hyphenation point inside this
 group (it’s not meant as a full or realistic rule for German, but just a
 starting point).
@@ -23,7 +28,8 @@ starting point).
 ```
 It consists of:
 * the language the transformation is applied to (here `german`);
-* a pattern with the string to be handled (here `ck`);
+* a pattern with the string to be handled (here `ck`), which is based
+  on lua patterns (please refer to the lua site linked below);
 * a replacement with a list containing exactly the same number of
   elements as the pattern (except if there are inserted elements, as
   explained below).
@@ -34,16 +40,13 @@ first item in the list, the second letter with the second item and so
 on. (This is not strictly true, because the replace list is filled with
 nil's if shorter.)
 
-## Rules
-
-‘Regular’ hyphenation points, as inserted automatically by the hyphenation
-algorithm, are entered in the pattern as vertical bars (`|`). Explicit
-hyphens are entered as `=`. Spaces are allowed for clarity, and they
-are discarded.
+## Replacement list
 
-The items in the replacement list are of four kinds:
+The items in the replacement list are of the following kinds:
 
-1. An empty group `{}` leaves the corresponding item **untouched**.
+1. An empty group `{}` leaves the corresponding item **untouched**. For
+example, in the rule above the ‘k’ in the pattern (the second element) just provides the
+context, because in the replacement list the second item is `{}`.
 2. A list like `{ no = c, pre = k-, post = }` replaces the letter by
 the corresponding **discretionary**. Only one of the keys is necessary,
 and the rest defaults to empty. By default the penalty is
@@ -51,42 +54,78 @@ and the rest defaults to empty. By default the penalty is
 different value can be set with the key `penalty`. A further field is
 `data` - automatic hyphens contain no information about the font and
 the like, and with this key you can set which element in the list (as
-captured) they will the taken from.
+captured) they will be taken from. In the rule above the ‘c’ is
+replaced by a discretionary, but no `data` is required because the
+item to be replaced is a character, which already contains the required
+data. (Remember discretionaries are not allowed in
+`\babelprehyphenation`.)
 3. The key `string` replaces the character with the string. If empty,
-the char node is removed; to insert chars, just use a multi-character
-string. The nodes created are literal copies of the original, but with
-the new characters.
-4. With `remove` the node is, well, removed (ie, it's like and empty
-`string=`).
+the item (in TeX jargon, the node) is removed; to insert chars, just
+use a multi-character string. The nodes created are literal copies of
+the original (the same font, language, and so on), but with the new
+characters.
+4. With `remove` the node is, well, removed. A synonym is `string=`.
 5. **Spaces** are declared with something like `space =.2
 .1 0`. The values are in em units, and they are the natural width, the
 `plus`, and the `minus`. Here, you may need `data`, too. With
 `spacefactor` the unit is the font size of the current font (if the
 node is a glyph; you may need a `data=` pointing to a specific glyph).
+6. **Penalties** are declared with `penalty`.
 
-A few keys can be used in conjunction with `insert`, which must be the
-very first one in the replacement. 
+A further key is `kashida`, for Arabic justification. See [What's new in
+babel 3.59](whats-new-in-babel-3.59.md). 
 
-The pattern is matched with lua empty captures, which are automatically
-added before and after the string. You may set different empty captures,
-to reduce the number of items in the replacement list:
+Some keys can be used in conjunction with `insert`, which must be the
+very first one in the replacement. With it, the new item is inserted
+instead of replacing the matched one. The following rule inserts a
+penalty in the middle of the group ‘ff’:
 ```tex
-\babelposthyphenation{ngerman}{very()long()pattern}{
-  string = L,
-  string = OOO,
-  string = N,
-  string = G
+\babelposthyphenation{nil}{ ff }
+  { {},
+    {insert, penalty = 10},
+    {}
+  }
+```
+
+`babel` traverses the strings to be processed with the help of a
+pointer. Another key available in the replacements is `step = <num>`,
+which moves this pointer forward (if positive) or backwards (if
+negative). By default it's, of course, `0`, which leaves the pointer
+just after the last replacement. It can be set in any non-empty
+replacement (eg, `{ string = a, step = -1 }`).
+
+In the replacement list, there is an extended syntax which allows you to
+**map the captured characters**. For example, `{2|ΐΰῒῢ|ίύὶὺ}` means: if
+the second captured char is ΐ replace it with ί, ΰ with ύ, and so on.
+This feature is particularly useful when a letter changes if there is a
+hyphen, and also when transliterating. Here is a partial example of the
+latter (the full example is [here](https://latex3.github.io/babel/news/whats-new-in-babel-3.44.html),
+with digraphs and trigraphs):
+```tex
+\babelprehyphenation{transrussian}
+  {([ABVGDEËZIJKLMNOPRSTUFHÈY"abvgdeëzijklmnoprstufhèy'])}{
+  string = {1|ABVGDEËZIJKLMNOPRSTUFHÈY"abvgdeëzijklmnoprstufhèy'%
+             |АБВГДЕЁЗИЙКЛМНОПРСТУФХЭЫЬабвгдеёзийклмнопрстуфхэыь}
 }
 ```
 
-Dots, characters classes (with %) and char-sets (with `[]`, including
+## Patterns
+
+‘Regular’ hyphenation points, as inserted automatically by the
+hyphenation algorithm, are entered in the pattern as vertical bars
+(`|`), as the short examples below show. Explicit hyphens are entered
+as `=`. Spaces are allowed for clarity, and they are discarded. If you
+are not sure where the hyphenation points fall, use `\showhyphens`.
+(Also, remember `|` in `\babelprehyphenation` is a space.)
+
+Lua patterns with dots, character classes (with `%`, but see below for
+an alternative TeX-friendly syntax) and char-sets (with `[]`, including
 complementing and ranges) are allowed, too. When using the dot, be
-aware it matches `|` and `=`, too. A matched `|` or `=` can be
-replaced with the hex value (at least 4 digits): `{007C}` and `{003D}`.
-`+`, `-`, `?` and `*` are allowed outside the `()`...`()` block, but
-not inside. So, `{a}|?()Á()` is a letter followed optionally by a
-discretionary, but only Á is actually transformed (in these cases, you
-may wanto to go back with the key `step`).
+aware it matches `|` and `=`, too. `+`, `-`, `?` and `*` are allowed
+outside the `()`...`()` block, but not inside. So, `{a}|?()Á()` is a
+letter followed optionally by a discretionary, but only Á is actually
+transformed (in these cases, you may want to go back with the key
+`step`).
 
 Ordinary captures are allowed _inside_ the empty captures (they must
 resolve to exactly one character). In the pattern, **the syntax `{n}`**
@@ -94,63 +133,39 @@ is a backreference matching the _n_-th capture inside the empty
 captures. This syntax can be used in the replacement strings, with the
 corresponding capture:
 ```tex
-\babelposthyphenation{ngerman}{([fmtrp]) | {1}}{
+\babelposthyphenation{german}{([fmtrp]) | {1}}{
   { no = {1}, pre = {1}{1}- },
   remove,
   {}
 }
-\babelposthyphenation{ngerman}{ ([cC]) ([kK]) }{
+\babelposthyphenation{german}{ ([cC]) ([kK]) }{
   { no = {1}, pre = {2}- },
   {}
 }
 ```
+No attempt has been made in this example to follow the full German
+rules. For a more realistic example, in Norwegian, see [the guide for
+this
+language](https://latex3.github.io/babel/guides/locale-norwegian.html#hyphenation).
 
 Since the percent sign has a quite different meaning in lua and tex, as
 a convenience the {} syntax can be used to enter **character classes**
 in the pattern, too (ie, `{d}` becomes `%d`, but note `{1}` is not
 internally the same as `%1`).
 
-And here is a complete example (again, no attempt is done to follow the
-full rules):
-```tex
-\documentclass{article}
-
-\usepackage[german]{babel}
-
-\babelposthyphenation{ngerman}{([fmtrp]) | {1}}{
-  { no = {1}, pre = {1}{1}- },
-  remove,
-  {}
-}
-
-\begin{document}
-
-\rightskip5cm
-
-Auffrisierende Auffrisierendem Auffrisierenden Auffrisierender
-Auffrisierendes Auffrisierst Auffrisiert Auffrisierte Auffrisiertem
-Auffrisierten Auffrisierter Auffrisiertes Auffrisiertest Auffrisiertet
-Auffrisst Auffuhr Aufführbar Aufführbare Aufführbarem Aufführbaren
-Aufführbarer Aufführbares Aufführe Auffuhren Aufführen Aufführend
-Aufführende Aufführendem Aufführenden Aufführender Aufführendes
-
-\end{document}
-```
-
-In the replacement list, there is an extended syntax which allows to
-**map the captured characters**. For example, `{2|ΐΰῒῢ|ίύὶὺ}` means: if
-the second captured char is ΐ replace it with ί, ύ with ύ, and so on.
-This feature is particularly useful when a letter changes if there is a
-hyphen, and also when transliterating. Here is a partial example of the
-latter (the full example is [here](../news/whats-new-in-babel-3.44.md),
-with digraphs and trigraphs):
+The pattern is matched with lua empty captures, which are automatically
+added before and after the string. You may set different empty captures,
+to reduce the number of items in the replacement list:
 ```tex
-\babelprehyphenation{transrussian}
-  {([ABVGDEËZIJKLMNOPRSTUFHÈY"abvgdeëzijklmnoprstufhèy'])}{
-  string = {1|ABVGDEËZIJKLMNOPRSTUFHÈY"abvgdeëzijklmnoprstufhèy'%
-             |АБВГДЕЁЗИЙКЛМНОПРСТУФХЭЫЬабвгдеёзийклмнопрстуфхэыь}
+\babelprehyphenation{english}{very()long()pattern}{
+  string = L,
+  string = OOO,
+  string = N,
+  string = G
 }
 ```
+With this rule, the string ‘verylongpattern’ is replaced with
+‘veryLOOONGpattern’.
 
 ## Short examples
 
@@ -202,13 +217,13 @@ In cases like this, you may want to use maps as described above.
 }
 ```
 With `{A}*` we consider the possibility of leading characters like `(`
-or `“`, because `{A}` it's the same as `%A` in lua. This part is placed
+or `“`, because `{A}` is the same as `%A` in lua. This part is placed
 before that to be processed, which is enclosed between `() ()`.
 
 * Here is an example showing how to group two similar rules. The
   pattern means ‘either < or > repeated’. Then, the first replacement
-  selects the character based on the captured one. The result is |<<|
-  and |>>| get replaced by |“| and |”|, respectively:
+  selects the character based on the captured one. As a result, `<<`
+  and `>>` are replaced by `“` and `”`, respectively:
 ```tex
 \babelprehyphenation{english}{ ([<>]){1} }{
   string = {1|<>|“”},

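To try the doubled-consonant rule from the diff above, a minimal test document might look like the following sketch. It is not part of the commit. It assumes LuaLaTeX and the `german` babel option, and it reuses the rule as revised in this commit (with `german` as the language name) together with a few sample words from the complete example that the commit removes from the guide.

```tex
% Sketch only: compile with lualatex (transforms require the LuaTeX engine).
\documentclass{article}

\usepackage[german]{babel}

% Pattern: one of f, m, t, r, p; a regular hyphenation point; the same letter.
% Replacement: the first letter becomes a discretionary (unchanged when the
% line is not broken, doubled plus a hyphen when it is), the hyphenation
% point is removed, and the last letter is left untouched.
\babelposthyphenation{german}{([fmtrp]) | {1}}{
  { no = {1}, pre = {1}{1}- },
  remove,
  {}
}

\begin{document}

% A narrow measure forces more line breaks, so the rule is more likely to apply.
\rightskip5cm

Auffrisierende Auffrisierendem Auffrisierenden Auffrisierender
Auffrisierendes Auffrisierst Auffrisiert Auffrisierte Auffrisiertem
Auffrisierten Auffrisierter Auffrisiertes Auffrisiertest Auffrisiertet

\end{document}
```

As the guide itself notes, no attempt is made here to follow the full German rules. The sketch is only a way to see where the transform takes effect.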



