[latex3-commits] [git/LaTeX3-latex3-babel] docs: Extensive revision of non-standard hyphenation. (7536199)
Javier
email at dante.de
Fri Apr 7 19:28:54 CEST 2023
Repository : https://github.com/latex3/babel
On branch : docs
Link : https://github.com/latex3/babel/commit/7536199aa35e29e021644abdd604c283299b7310
>---------------------------------------------------------------
commit 7536199aa35e29e021644abdd604c283299b7310
Author: Javier <email at localhost>
Date: Fri Apr 7 19:28:54 2023 +0200
Extensive revision of non-standard hyphenation.
>---------------------------------------------------------------
7536199aa35e29e021644abdd604c283299b7310
docs/guides/locale-norwegian.md | 11 +-
.../guides/non-standard-hyphenation-with-luatex.md | 167 +++++++++++----------
2 files changed, 97 insertions(+), 81 deletions(-)
diff --git a/docs/guides/locale-norwegian.md b/docs/guides/locale-norwegian.md
index 83816b2..ad96fd0 100644
--- a/docs/guides/locale-norwegian.md
+++ b/docs/guides/locale-norwegian.md
@@ -41,9 +41,11 @@ added to the exception list with:
```
Here, the group `{ll-}{l}{ll}` is equivalent to a `\discretionary`.
Remember you must add every word (like, say, ‘volleyballandslaget’).
+These discretionaries can be weighted, too, as the `luatex` manual
+explains (with `\exceptionpenalty`).
-Alternatively, you may define a transform, which is more flexible, but
-less efficient. For example:
+Alternatively, with `babel` you may define a transform, which is more
+flexible, but less efficient. For example:
```tex
\babelposthyphenation{norsk}{ vol|ley|ba()l()lands|la }{
{ no = l, pre = ll- }
@@ -53,9 +55,8 @@ This rule matches the sequence of characters and hyphenation points in
the second argument, which explains why the ending ‘g’ has not been
included —without it, ‘volleyballandslaget’ is also matched.
-Furthermore, with transforms hyphenation points can by weighted with
-different penalties, too, as explained in [Non-standard hyphenation
-with
+With transforms, hyphenation points can be weighted with arbitrary penalties,
+as explained in [Non-standard hyphenation with
luatex](https://latex3.github.io/babel/guides/non-standard-hyphenation-with-luatex.html).
Feel free to contribute a list of words requiring this feature.
diff --git a/docs/guides/non-standard-hyphenation-with-luatex.md b/docs/guides/non-standard-hyphenation-with-luatex.md
index d68bee7..2e6ce9f 100644
--- a/docs/guides/non-standard-hyphenation-with-luatex.md
+++ b/docs/guides/non-standard-hyphenation-with-luatex.md
@@ -11,7 +11,12 @@ including changes in letters and weighted hyphenation points.
frequent cases, too. Please, refer to its manual for further
information.)
-Here is a simple example of a declaration, which tell LaTeX to change
+The basic syntax is explained in the `babel` manual. This article
+complements it with an explanation of the second and third arguments of
+`\babelposthyphenation`, which also apply to `\babelprehyphenation`
+with the differences explained in the manual.
+
+Here is a simple example of a declaration, which tells LaTeX to change
the group ‘ck’ to ‘kk’ with an optional hyphenation point inside this
group (it’s not meant as a full or realistic rule for German, but just a
starting point).
@@ -23,7 +28,8 @@ starting point).
```
It consists of:
* the language the transformation is applied to (here `german`);
-* a pattern with the string to be handled (here `ck`);
+* a pattern with the string to be handled (here `ck`), which is based
+ on lua patterns (please refer to the lua site linked below);
* a replacement with a list containing exactly the same number of
elements as the pattern (except if there are inserted elements, as
explained below).
@@ -34,16 +40,13 @@ first item in the list, the second letter with the second item and so
on. (This is not strictly true, because the replace list is filled with
nil's if shorter.)
-## Rules
-
-‘Regular’ hyphenation points, as inserted automatically by the hyphenation
-algorithm, are entered in the pattern as vertical bars (`|`). Explicit
-hyphens are entered as `=`. Spaces are allowed for clarity, and they
-are discarded.
+## Replacement list
-The items in the replacement list are of four kinds:
+The items in the replacement list are of the following kinds:
-1. An empty group `{}` leaves the corresponding item **untouched**.
+1. An empty group `{}` leaves the corresponding item **untouched**. For
+example, in the rule above the ‘k’ in the pattern (the second element) just provides the
+context, because in the replacement list the second item is `{}`.
2. A list like `{ no = c, pre = k-, post = }` replaces the letter by
the corresponding **discretionary**. Only one of the keys is necessary,
and the rest defaults to empty. By default the penalty is
@@ -51,42 +54,78 @@ and the rest defaults to empty. By default the penalty is
different value can be set with the key `penalty`. A further field is
`data` - automatic hyphens contain no information about the font and
the like, and with this key you can set which element in the list (as
-captured) they will the taken from.
+captured) they will be taken from. In the rule above the ‘c’ is
+replaced by a discretionary, but no `data` is required because the
+item to be replaced is a character, which already contains the required
+data. (Remember discretionaries are not allowed in
+`\babelprehyphenation`.)
3. The key `string` replaces the character with the string. If empty,
-the char node is removed; to insert chars, just use a multi-character
-string. The nodes created are literal copies of the original, but with
-the new characters.
-4. With `remove` the node is, well, removed (ie, it's like and empty
-`string=`).
+the item (in TeX jargon, the node) is removed; to insert chars, just
+use a multi-character string. The nodes created are literal copies of
+the original (the same font, language, and so on), but with the new
+characters.
+4. With `remove` the node is, well, removed. A synonym is `string=`.
5. **Spaces** are declared with something like `space =.2
.1 0`. The values are in em units, and they are the natural width, the
`plus`, and the `minus`. Here, you may need `data`, too. With
`spacefactor` the unit is the font size of the current font (if the
node is a glyph; you may need a `data=` pointing to a specific glyph).
+6. **Penalties** are declared with `penalty`.
-A few keys can be used in conjunction with `insert`, which must be the
-very first one in the replacement.
+A further key is `kashida`, for Arabic justification. See [What's new in
+babel 3.59](whats-new-in-babel-3.59.md).
-The pattern is matched with lua empty captures, which are automatically
-added before and after the string. You may set different empty captures,
-to reduce the number of items in the replacement list:
+Some keys can be used in conjunction with `insert`, which must be the
+very first one in the replacement. With it the item is not replaced,
+but inserted. The following rule inserts a penalty in the middle of the
+group ‘ff’:
```tex
-\babelposthyphenation{ngerman}{very()long()pattern}{
- string = L,
- string = OOO,
- string = N,
- string = G
+\babelposthyphenation{nil}{ ff }
+ { {},
+ {insert, penalty = 10},
+ {}
+ }
+```
+
+`babel` traverses the strings to be processed with the help of a
+pointer. Another key available in the replacements is `step = <num>`,
+which moves this pointer forward (if positive) or backwards (if
+negative). By default it's, of course, `0`, which leaves the pointer
+just after the last replacement. It can be set in any non-empty
+replacement (eg, `{ string = a, step = -1 }`).
+
+In the replacement list, there is an extended syntax which allows you to
+**map the captured characters**. For example, `{2|ΐΰῒῢ|ίύὶὺ}` means: if
+the second captured char is ΐ replace it with ί, ΰ with ύ, and so on.
+This feature is particularly useful when a letter changes if there is a
+hyphen, and also when transliterating. Here is a partial example of the
+latter (the full example is [here](https://latex3.github.io/babel/news/whats-new-in-babel-3.44.html),
+with digraphs and trigraphs):
+```tex
+\babelprehyphenation{transrussian}
+ {([ABVGDEËZIJKLMNOPRSTUFHÈY"abvgdeëzijklmnoprstufhèy'])}{
+ string = {1|ABVGDEËZIJKLMNOPRSTUFHÈY"abvgdeëzijklmnoprstufhèy'%
+ |АБВГДЕЁЗИЙКЛМНОПРСТУФХЭЫЬабвгдеёзийклмнопрстуфхэыь}
}
```
-Dots, characters classes (with %) and char-sets (with `[]`, including
+## Patterns
+
+‘Regular’ hyphenation points, as inserted automatically by the
+hyphenation algorithm, are entered in the pattern as vertical bars
+(`|`), as the short examples below show. Explicit hyphens are entered
+as `=`. Spaces are allowed for clarity, and they are discarded. If you
+are not sure where the hyphenation points fall, use `\showhyphens`.
+(Also, remember `|` in `\babelprehyphenation` is a space.)
+
+Lua patterns with dots, character classes (with `%`, but see below for
+an alternative TeX-friendly syntax) and char-sets (with `[]`, including
complementing and ranges) are allowed, too. When using the dot, be
-aware it matches `|` and `=`, too. A matched `|` or `=` can be
-replaced with the hex value (at least 4 digits): `{007C}` and `{003D}`.
-`+`, `-`, `?` and `*` are allowed outside the `()`...`()` block, but
-not inside. So, `{a}|?()Á()` is a letter followed optionally by a
-discretionary, but only Á is actually transformed (in these cases, you
-may wanto to go back with the key `step`).
+aware it matches `|` and `=`, too. `+`, `-`, `?` and `*` are allowed
+outside the `()`...`()` block, but not inside. So, `{a}|?()Á()` is a
+letter followed optionally by a discretionary, but only Á is actually
+transformed (in these cases, you may want to go back with the key
+`step`).
Ordinary captures are allowed _inside_ the empty captures (they must
resolve to exactly one character). In the pattern, **the syntax `{n}`**
@@ -94,63 +133,39 @@ is a backreference matching the _n_-th capture inside the empty
captures. This syntax can be used in the replacement strings, with the
corresponding capture:
```tex
-\babelposthyphenation{ngerman}{([fmtrp]) | {1}}{
+\babelposthyphenation{german}{([fmtrp]) | {1}}{
{ no = {1}, pre = {1}{1}- },
remove,
{}
}
-\babelposthyphenation{ngerman}{ ([cC]) ([kK]) }{
+\babelposthyphenation{german}{ ([cC]) ([kK]) }{
{ no = {1}, pre = {2}- },
{}
}
```
+No attempt has been made in this example to follow the full German
+rules. For a more realistic example, in Norwegian, see [the guide for
+this
+language](https://latex3.github.io/babel/guides/locale-norwegian.html#hyphenation).
Since the percent sign has a quite different meaning in lua and tex, as
a convenience the {} syntax can be used to enter **character classes**
in the pattern, too (ie, `{d}` becomes `%d`, but note `{1}` is not
internally the same as `%1`).
-And here is a complete example (again, no attempt is done to follow the
-full rules):
-```tex
-\documentclass{article}
-
-\usepackage[german]{babel}
-
-\babelposthyphenation{ngerman}{([fmtrp]) | {1}}{
- { no = {1}, pre = {1}{1}- },
- remove,
- {}
-}
-
-\begin{document}
-
-\rightskip5cm
-
-Auffrisierende Auffrisierendem Auffrisierenden Auffrisierender
-Auffrisierendes Auffrisierst Auffrisiert Auffrisierte Auffrisiertem
-Auffrisierten Auffrisierter Auffrisiertes Auffrisiertest Auffrisiertet
-Auffrisst Auffuhr Aufführbar Aufführbare Aufführbarem Aufführbaren
-Aufführbarer Aufführbares Aufführe Auffuhren Aufführen Aufführend
-Aufführende Aufführendem Aufführenden Aufführender Aufführendes
-
-\end{document}
-```
-
-In the replacement list, there is an extended syntax which allows to
-**map the captured characters**. For example, `{2|ΐΰῒῢ|ίύὶὺ}` means: if
-the second captured char is ΐ replace it with ί, ύ with ύ, and so on.
-This feature is particularly useful when a letter changes if there is a
-hyphen, and also when transliterating. Here is a partial example of the
-latter (the full example is [here](../news/whats-new-in-babel-3.44.md),
-with digraphs and trigraphs):
+The pattern is matched with lua empty captures, which are automatically
+added before and after the string. You may set different empty captures,
+to reduce the number of items in the replacement list:
```tex
-\babelprehyphenation{transrussian}
- {([ABVGDEËZIJKLMNOPRSTUFHÈY"abvgdeëzijklmnoprstufhèy'])}{
- string = {1|ABVGDEËZIJKLMNOPRSTUFHÈY"abvgdeëzijklmnoprstufhèy'%
- |АБВГДЕЁЗИЙКЛМНОПРСТУФХЭЫЬабвгдеёзийклмнопрстуфхэыь}
+\babelprehyphenation{english}{very()long()pattern}{
+ string = L,
+ string = OOO,
+ string = N,
+ string = G
}
```
+With this rule, the string ‘verylongpattern’ is replaced with
+‘veryLOOONGpattern’.
## Short examples
@@ -202,13 +217,13 @@ In cases like this, you may want to use maps as described above.
}
```
With `{A}*` we consider the possibility of leading characters like `(`
-or `“`, because `{A}` it's the same as `%A` in lua. This part is placed
+or `“`, because `{A}` is the same as `%A` in lua. This part is placed
before that to be processed, which is enclosed between `() ()`.
* Here is an example showing how to group two similar rules. The
pattern means ‘either < or > repeated’. Then, the first replacement
- selects the character based on the captured one. The result is |<<|
- and |>>| get replaced by |“| and |”|, respectively:
+ selects the character based on the captured one. The result is that `<<`
+ and `>>` get replaced by `“` and `”`, respectively:
```tex
\babelprehyphenation{english}{ ([<>]){1} }{
string = {1|<>|“”},