texlive[53072] Master/texmf-dist: babel (9dec19)
commits+karl at tug.org
Mon Dec 9 22:54:07 CET 2019
Revision: 53072
http://tug.org/svn/texlive?view=revision&revision=53072
Author: karl
Date: 2019-12-09 22:54:06 +0100 (Mon, 09 Dec 2019)
Log Message:
-----------
babel (9dec19)
Modified Paths:
--------------
trunk/Master/texmf-dist/doc/latex/babel/README.md
trunk/Master/texmf-dist/doc/latex/babel/babel.pdf
trunk/Master/texmf-dist/source/latex/babel/babel.dtx
trunk/Master/texmf-dist/source/latex/babel/babel.ins
trunk/Master/texmf-dist/source/latex/babel/bbcompat.dtx
trunk/Master/texmf-dist/source/latex/babel/locale.zip
trunk/Master/texmf-dist/tex/generic/babel/babel.def
trunk/Master/texmf-dist/tex/generic/babel/babel.sty
trunk/Master/texmf-dist/tex/generic/babel/hyphen.cfg
trunk/Master/texmf-dist/tex/generic/babel/luababel.def
trunk/Master/texmf-dist/tex/generic/babel/nil.ldf
trunk/Master/texmf-dist/tex/generic/babel/switch.def
trunk/Master/texmf-dist/tex/generic/babel/txtbabel.def
trunk/Master/texmf-dist/tex/generic/babel/xebabel.def
Added Paths:
-----------
trunk/Master/texmf-dist/tex/generic/babel/test-hyphen-post-wiki.tex
Modified: trunk/Master/texmf-dist/doc/latex/babel/README.md
===================================================================
--- trunk/Master/texmf-dist/doc/latex/babel/README.md 2019-12-09 21:53:18 UTC (rev 53071)
+++ trunk/Master/texmf-dist/doc/latex/babel/README.md 2019-12-09 21:54:06 UTC (rev 53072)
@@ -1,4 +1,4 @@
-## Babel 3.36
+## Babel 3.37
This package manages culturally-determined typographical (and other)
rules, and hyphenation patterns for a wide range of languages. Many
@@ -51,14 +51,22 @@
### Latest changes
```
+3.37 2019-12-08
+ - Preliminary code for non-standard hyphenation, like ff ->
+ ff-f (lua).
+ - \babelprovide now can be used to add or modify values for the
+ keys in ini files.
+ - Line breaking in South East Asian and CJK scripts is assimilated to
+ hyphenation, and it is activated even without 'import' (lua).
+
3.36 2019-11-14
- New - \babeladjust, with options: bidi.text, bidi.mirroring,
bidi.mapdigits, layout.tabular, layout.lists, linebreak.sea,
- linebreak.cjk. There are still some limitations.
+ linebreak.cjk. There are still some limitations (lua).
- New - ini for Polytonic Greek, thanks to Claudio Beccari.
- Fix - Language and script set for Chinese Traditional and
Chinese Simplified.
-
+
3.35 2019-10-15
- \markboth and \markright made robust with a recent LaTeX.
- Shorthands work in bibs and refs even with safe=none.
@@ -114,31 +122,6 @@
necessary.
- Minor improvements in babel-vi.ini.
-3.26 2018-10-16
- - Fix for 3.25 - \babelprovide raised an error with xetex.
-
-3.25 2018-10-03
- - Fixes for 3.23 - mapfont=direction could raise an error.
- - Language and Script were not always defined correctly.
- - Improved tentative support for Thai, Lao and Khmer in both
- luatex and xetex.
-
-3.24 2018-09-26
- - Preliminary support for Thai interword spacing with luatex.
-
-3.23 2018-09-02
- - After extensive tests and fixing some issues, bidi=basic is
- not experimental any longer.
- - import in \babelprovide does not require a language code if
- the language name is a recognized one.
- - New macro: \ifbabelshorthand.
- - TS1, T3 and TS3 have been added to the non-ascii list, to
- avoid problems in case no ASCII-savvy encoding is requested.
- - Define Language and Script if fontspec does not know them (eg,
- the Japanese script).
- - Set the \thepage bidi behavior in foots/heads.
- - Fix - Undefined \bbl@stripslash in Plain.
-
```
Javier Bezos
Modified: trunk/Master/texmf-dist/doc/latex/babel/babel.pdf
===================================================================
(Binary files differ)
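
A minimal sketch for trying the non-standard hyphenation entry from the 3.37 changelog above, assuming lualatex and babel 3.37 as committed in this revision. The rule and the activation switch are copied from the manual text added by this commit; the surrounding document and the test word are illustrative assumptions, and whether the word actually breaks at that point depends on the hyphenation patterns loaded.

```latex
% Compile with lualatex (the feature is marked "lua" in the changelog).
\documentclass{article}
\usepackage[ngerman]{babel}

% Per the manual text in this revision, the feature must be switched on explicitly.
\babeladjust{ hyphenation.extra = on }

% Rule taken verbatim from the manual: at a break after one of f, m, t, r, p
% followed by the same letter, restore the dropped consonant (ff -> ff-f).
\babelposthyphenation{ngerman}{([fmtrp]) | {1}}
{
  { no = {1}, pre = {1}{1}-}, % Replace first char with disc
  remove,                     % Remove automatic disc
  {}                          % Keep last char, untouched
}

\begin{document}
% Narrow column to provoke line breaks; "Schiffahrt" is an assumed test word.
\parbox{1cm}{Schiffahrt Schiffahrt Schiffahrt}
\end{document}
```

The changelog calls this code preliminary, so the syntax shown here may not carry over to later releases.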
Modified: trunk/Master/texmf-dist/source/latex/babel/babel.dtx
===================================================================
--- trunk/Master/texmf-dist/source/latex/babel/babel.dtx 2019-12-09 21:53:18 UTC (rev 53071)
+++ trunk/Master/texmf-dist/source/latex/babel/babel.dtx 2019-12-09 21:54:06 UTC (rev 53072)
@@ -31,7 +31,7 @@
%
% \iffalse
%<*filedriver>
-\ProvidesFile{babel.dtx}[2019/11/14 v3.36 The Babel package]
+\ProvidesFile{babel.dtx}[2019/12/08 v3.37 The Babel package]
\documentclass{ltxdoc}
\GetFileInfo{babel.dtx}
\usepackage{fontspec}
@@ -202,29 +202,14 @@
\vspace{2cm}
\leftskip5mm
\begin{minipage}{10cm}
-\large\setlength\parskip{3mm}\raggedright
- The standard distribution of \LaTeX\ contains a number of document
- classes that are meant to be used, but also serve as examples for
- other users to create their own document classes. These document
- classes have become very popular among \LaTeX\ users. But it should
- be kept in mind that they were designed for American tastes and
- typography. At one time they even contained a number of hard-wired
- texts.
-
- This manual describes \babel{}, a package that makes use of the
- capabilities of \TeX, \xetex{} and \luatex{} to provide an
- environment in which documents can be typeset in a language other
- than US English, or in more than one language or script.
-
- Current development is focused on Unicode engines (Xe\TeX{} and
- Lua\TeX) and the so-called \textit{complex scripts}. New features
- related to font selection, bidi writing, line breaking and so on are
- being added incrementally.
-
- \Babel{} provides support (total or partial) for about 200 languages,
- either as a “classical” package option or as an |ini| file.
- Furthermore, new languages can be created from scratch easily.
-
+\fontsize{35}{45}\selectfont
+\setlength\parskip{3mm}\raggedright
+Localization and internationalization\\[1cm]
+\TeX\\
+pdf\TeX\\
+Lua\TeX\\
+LuaHB\TeX\\
+Xe\TeX
\vspace{20cm}
\end{minipage}
\end{tabular}
@@ -258,7 +243,7 @@
\item The first sections describe the traditional way of loading a
language (with |ldf| files). The alternative way based on |ini|
- files, which complements the previous one (it will \textit{not}
+ files, which complements the previous one (it does \textit{not}
replace it), is described below.
\end{itemize}
@@ -271,6 +256,14 @@
purpose, namely, passing that language as an optional argument. In
addition, you may want to set the font and input encodings.
+Many languages are compatible with \textsf{xetex} and \textsf{luatex}.
+With them you can use \babel{} to localize the documents. When these
+engines are used, the Latin script is covered by default in current
+\LaTeX{} (provided the document encoding is UTF-8), because the font
+loader is preloaded and the font is switched to |lmroman|. Other
+scripts require loading \textsf{fontspec}. You may want to set the font
+attributes with \textsf{fontspec}, too.
+
\begin{example}
Here is a simple full example for “traditional” \TeX{} engines
(see below for \xetex{} and \luatex{}). The packages |fontenc| and
@@ -293,14 +286,45 @@
\end{document}
\end{verbatim}
\end{example}
+
+\begin{example}
+And now a simple monolingual document in Russian (text from
+Wikipedia) with \xetex{} or \luatex{}. Note that neither \textsf{fontenc}
+nor \textsf{inputenc} is necessary, but the document should be encoded
+in UTF-8 and a so-called Unicode font must be loaded (in this example
+|\babelfont| is used, described below).
+
+\begin{verbatim}
+\documentclass{article}
+
+_\usepackage[russian]{babel}_
+
+\babelfont{rm}{DejaVu Serif}
+
+\begin{document}
+
+Россия, находящаяся на пересечении множества культур, а также
+с учётом многонационального характера её населения, — отличается
+высокой степенью этнокультурного многообразия и способностью к
+межкультурному диалогу.
+
+\end{document}
+\end{verbatim}
+\end{example}
+
\begin{troubleshooting}
\trouble{Paragraph ended before \textbackslash UTFviii@three@octets
was complete}
A common source of trouble is a wrong setting of the input encoding.
-Very often you will get the following somewhat cryptic error:
+Depending on the \LaTeX{} version you could get the following somewhat
+cryptic error:
\begin{verbatim}
! Paragraph ended before \UTFviii@three@octets was complete.
\end{verbatim}
+Or the more explanatory:
+\begin{verbatim}
+! Package inputenc Error: Invalid UTF-8 byte ...
+\end{verbatim}
Make sure you set the encoding actually used by your editor.
\end{troubleshooting}
@@ -430,39 +454,11 @@
\end{verbatim}
\end{example}
-\subsection{Modifiers}
-
-\New{3.9c} The basic behavior of some languages can be modified when
-loading \babel{} by means of \textit{modifiers}. They are set after
-the language name, and are prefixed with a dot (only when the language
-is set as package option -- neither global options nor the |main| key
-accepts them). An example is (spaces are not significant and they can
-be added or removed):\footnote{No predefined ``axis'' for modifiers
-are provided because languages and their scripts have quite different
-needs.}
-\begin{verbatim}
-\usepackage[latin_.medieval_, spanish_.notilde.lcroman_, danish]{babel}
-\end{verbatim}
-
-Attributes (described below) are considered modifiers, ie, you can
-set an attribute by including it in the list of modifiers. However,
-modifiers are a more general mechanism.
-
-\subsection{\textsf{xelatex} and \textsf{lualatex}}
-
-Many languages are compatible with \textsf{xetex} and \textsf{luatex}.
-With them you can use \babel{} to localize the documents.
-
-The Latin script is covered by default in current \LaTeX{} (provided
-the document encoding is UTF-8), because the font loader is preloaded
-and the font is switched to |lmroman|. Other scripts require loading
-\textsf{fontspec}. You may want to set the font attributes with
-\textsf{fontspec}, too.
-
\begin{example}
- The following bilingual, single script document in UTF-8 encoding
- just prints a couple of ‘captions’ and |\today| in Danish and
- Vietnamese. No additional packages are required.
+ With \xetex{} and \luatex, the following bilingual, single script
+ document in UTF-8 encoding just prints a couple of ‘captions’ and
+ |\today| in Danish and Vietnamese. No additional packages are
+ required.
\begin{verbatim}
\documentclass{article}
@@ -480,31 +476,24 @@
\end{verbatim}
\end{example}
-\begin{example}
-Here is a simple monolingual document in Russian (text from the
-Wikipedia). Note neither \textsf{fontenc} nor \textsf{inputenc} are
-necessary, but the document should be encoded in UTF-8 and a
-so-called Unicode font must be loaded (in this example |\babelfont| is
-used, described below).
+\subsection{Modifiers}
+\New{3.9c} The basic behavior of some languages can be modified when
+loading \babel{} by means of \textit{modifiers}. They are set after
+the language name, and are prefixed with a dot (only when the language
+is set as package option -- neither global options nor the |main| key
+accepts them). An example is (spaces are not significant and they can
+be added or removed):\footnote{No predefined ``axis'' for modifiers
+are provided because languages and their scripts have quite different
+needs.}
\begin{verbatim}
-\documentclass{article}
+\usepackage[latin_.medieval_, spanish_.notilde.lcroman_, danish]{babel}
+\end{verbatim}
-_\usepackage[russian]{babel}_
+Attributes (described below) are considered modifiers, ie, you can
+set an attribute by including it in the list of modifiers. However,
+modifiers are a more general mechanism.
-\babelfont{rm}{DejaVu Serif}
-
-\begin{document}
-
-Россия, находящаяся на пересечении множества культур, а также
-с учётом многонационального характера её населения, — отличается
-высокой степенью этнокультурного многообразия и способностью к
-межкультурному диалогу.
-
-\end{document}
-\end{verbatim}
-\end{example}
-
\subsection{Troubleshooting}
\begin{itemize}
@@ -725,7 +714,7 @@
can add further macros with the key |include| in the optional argument
(without commas). Macros not to be modified are listed in
|exclude|. You can also enforce a font encoding with
-|fontenc|.\footnote{With it encoded string may not work as expected.}
+|fontenc|.\footnote{With it, encoded strings may not work as expected.}
A couple of examples:
\begin{verbatim}
\babelensure[include=\Today]{spanish}
@@ -775,18 +764,20 @@
char and on the same level are ignored.
\item Since they are active, a shorthand cannot contain the same
character in its definition (except if it is deactivated with, eg,
- |string|).
+ |\string|).
\end{enumerate}
\end{note}
-A typical error when using shorthands is the following:
+\begin{troubleshooting}
\trouble{Argument of \textbackslash language@active@arg"
has an extra \textbraceright}
+A typical error when using shorthands is the following:
\begin{verbatim}
! Argument of \language@active@arg" has an extra }.
\end{verbatim}
It means there is a closing brace just after a shorthand, which is not
allowed (eg,~|"}|). Just add |{}| after (eg,~|"{}}|).
+\end{troubleshooting}
\Describe{\shorthandon}{\marg{shorthands-list}}
\DescribeOther{\shorthandoff}{%
@@ -819,6 +810,10 @@
those when the shorthands are defined, usually when language files are
loaded.
+If you do not need shorthands, or prefer an alternative approach of
+your own, you may want to switch them off with the package option
+|shorthands=off|, as described below.
+
\Describe{\useshorthands}{%
\colorbox{thegrey}{\ttfamily\hskip-.2em*\hskip-.2em}%
\marg{char}}
@@ -881,36 +876,6 @@
is that expected in each context.
\end{example}
-\Describe{\aliasshorthand}{\marg{original}\marg{alias}}
-
-The command |\aliasshorthand| can be used to let another
-character perform the same functions as the default shorthand
-character. If one prefers for example to use the character |/|
-over |"| in typing Polish texts, this can be achieved by entering
-|\aliasshorthand{"}{/}|.
-
-\begin{note}
- The substitute character must \textit{not} have been declared before
- as shorthand (in such a case, |\aliasshorthand| is ignored).
-\end{note}
-
-\begin{example}
- The following example shows how to replace a shorthand by another
-\begin{verbatim}
-\aliasshorthand{~}{^}
-\AtBeginDocument{\shorthandoff*{~}}
-\end{verbatim}
-\end{example}
-
-\begin{warning}
- Shorthands somehow remember the original character, and the fallback
- value is that of the latter. So, in this example, if no shorthand is
- found, |^| expands to a non-breaking space, because this is the
- value of |~| (internally, |^| still calls |\active@char~| or
- |\normal@char~|). Furthermore, if you change the |system| value of
- |^| with |\defineshorthand| nothing happens.
-\end{warning}
-
\Describe{\languageshorthands}{\marg{language}} The command
|\languageshorthands| can be used to switch the shorthands on the
language level. It takes one argument, the name of a language or
@@ -943,6 +908,8 @@
\verb|\babelshorthand{:}|. (You can conveniently define your own
macros, or even your own user shorthands provided they do not overlap.)
+\bigskip
+
For your records, here is a list of shorthands, but you must double
check them, as they may change:\footnote{Thanks to Enrico Gregorio}
@@ -969,7 +936,7 @@
\item[Kurmanji] |^|
\item[Latin] |" ^ =|
\item[Slovak] |" ^ ' -|
-\item[Spanish] |" . < > '|
+\item[Spanish] |" . < > ' ~|
\item[Turkish] |: ! =|
\end{description}
In addition, the \babel{} core declares |~| as a one-char shorthand
@@ -981,6 +948,37 @@
\New{3.23} Tests if a character has been made a shorthand.
+\Describe{\aliasshorthand}{\marg{original}\marg{alias}}
+
+The command |\aliasshorthand| can be used to let another character
+perform the same functions as the default shorthand character. If one
+prefers for example to use the character |/| over |"| in typing Polish
+texts, this can be achieved by entering |\aliasshorthand{"}{/}|. For
+the reasons in the warning below, usage of this macro is not
+recommended.
+
+\begin{note}
+ The substitute character must \textit{not} have been declared before
+ as shorthand (in such a case, |\aliasshorthand| is ignored).
+\end{note}
+
+\begin{example}
+ The following example shows how to replace a shorthand by another
+\begin{verbatim}
+\aliasshorthand{~}{^}
+\AtBeginDocument{\shorthandoff*{~}}
+\end{verbatim}
+\end{example}
+
+\begin{warning}
+ Shorthands somehow remember the original character, and the fallback
+ value is that of the latter. So, in this example, if no shorthand is
+ found, |^| expands to a non-breaking space, because this is the
+ value of |~| (internally, |^| still calls |\active@char~| or
+ |\normal@char~|). Furthermore, if you change the |system| value of
+ |^| with |\defineshorthand| nothing happens.
+\end{warning}
+
\subsection{Package options}
\New{3.9a}
@@ -1172,8 +1170,8 @@
(something to keep in mind if backward compatibility is important). The
following section shows how to make use of them currently (by means of
|\babelprovide|), but a higher-level interface, based on package options, is
-under development (in other words, |\babelprovide| is mainly intended
-for auxiliary tasks).
+under study. In other words, |\babelprovide| is mainly meant
+for auxiliary tasks.
\begin{example}
Although Georgian has its own \texttt{ldf} file, here is how to
@@ -1221,28 +1219,32 @@
\newfontscript{Devanagari}{deva}
\end{verbatim}
Other Indic scripts are still under development in \luatex{}. On the
- other hand, \xetex{} is better.
+ other hand, \xetex{} is better. The upcoming \textsf{lualatex} will
+ be based on \textsf{luahbtex}, so Indic scripts will be rendered
+ correctly with the option |Renderer=Harfbuzz| in \textsf{fontspec}.
\item[Southeast scripts] Thai works in both \luatex{} and \xetex{}, but
line breaking differs (rules can be modified in \luatex; they are
- hardcoded in \xetex). Lao seems to work, too, but there are no
- patterns for the latter in \luatex{}. Some quick patterns could help,
- with something similar to:
+ hard-coded in \xetex). Lao seems to work, too, but there are no
+ patterns for the latter in \luatex{}. Khmer clusters are rendered
+ wrongly. The comment about Indic scripts and \textsf{lualatex} also
+ applies here. Some quick patterns could help, with something similar
+ to:
\begingroup
\setmonofont[Script=Lao,Scale=MatchLowercase]{DejaVu Sans Mono}
\begin{verbatim}
\babelprovide[import,hyphenrules=+]{lao}
-\babelpatterns[lao]{1ດ 1ມ 1ແ 1ອ 1ງ 1ກ 1າ} % Random
+\babelpatterns[lao]{1ດ 1ມ 1ອ 1ງ 1ກ 1າ} % Random
\end{verbatim}
\endgroup
- Khmer clusters are rendered wrongly.
-\item[East Asia scripts] Settings for either Simplified of Tradicional
+\item[East Asia scripts] Settings for either Simplified or Traditional
should work out of the box. \luatex{} does basic line breaking, but
currently \xetex{} does not (you may load \textsf{zhspacing}). Although
for a few words and short texts the |ini| files should be fine, CJK
texts are best set with a dedicated framework (\textsf{CJK},
-\textsf{luatexja}, \textsf{kotex}, \textsf{CTeX}, etc.). Actually, this
-is what the |ldf| does in |japanese| with \luatex, because the
-following piece of code loads \textsf{luatexja}:
+\textsf{luatexja}, \textsf{kotex}, \textsf{CTeX}, etc.). This is what
+the class |ltjbook| does with \luatex, which can be used in conjunction
+with the |ldf| for |japanese|, because the following piece of code
+loads \textsf{luatexja}:
\begin{verbatim}
\documentclass{ltjbook}
\usepackage[japanese]{babel}
@@ -1871,6 +1873,13 @@
\Describe\babelfont{\oarg{language-list}\marg{font-family}%
\oarg{font-options}\marg{font-name}}
+The main purpose of |\babelfont| is to define at once in a multilingual
+document the fonts required by the different languages, with their
+corresponding language systems (script and language). So, if you load,
+say, 4 languages, |\babelfont{rm}{FreeSerif}| defines 4 fonts (with their
+variants, of course), which are switched with the language by \babel.
+It is a tool to make things easier and transparent to the user.
+
Here \textit{font-family} is |rm|, |sf| or |tt| (or newly defined
ones, as explained below), and \textit{font-name} is the same as in
\textsf{fontspec} and the like.
@@ -2007,6 +2016,7 @@
font 'FONT' with script 'SCRIPT' 'Default' language used instead'}
\textit{Package fontspec Warning: 'Language 'LANG' not available for
font 'FONT' with script 'SCRIPT' 'Default' language used instead'}.
+\textbf{This is \textit{not} an error.}
This warning is shown by \textsf{fontspec}, not by \babel. It could be
irrelevant for English, but not for many other languages, including
Urdu and Turkish. This is a useful and harmless warning, and if
@@ -2014,6 +2024,27 @@
to ignore it altogether.
\end{troubleshooting}
+\begin{troubleshooting}
+\trouble{Package babel Info: The following fonts are not babel standard families}
+\textit{Package babel Info: The following fonts are not babel
+standard families}.
+
+\textbf{This is \textit{not} an error.} \babel{} assumes that if you
+are using |\babelfont| for a family, very likely you want to define the
+rest of them. If you don't, you may find some inconsistencies between
+families. This check is done at the beginning of the document, at a
+point where we cannot know which families will be used.
+
+Actually, there is no real need to use |\babelfont| in a monolingual
+document, if you set the language system in |\setmainfont| (or not,
+depending on what you want).
+
+As the message explains, \textit{there is nothing intrinsically wrong}
+with not defining all the families. In fact, there is nothing
+intrinsically wrong with not using |\babelfont| at all. But you must be
+aware that this may lead to some problems.
+\end{troubleshooting}
+
\subsection{Modifying a language}
Modifying the behavior of a language (say, the chapter “caption”), is
@@ -2085,10 +2116,11 @@
set to 2 and 3. In either case, caption, date and language system are
not defined.
-If no |ini| file is imported with |import|, \m{language-name} is
-relevant because in such a case the hyphenation rules are based on it
-as provided in the |ini| file corresponding to that name; the same
-applies to OpenType language and script.
+If no |ini| file is imported with |import|, \m{language-name} is still
+relevant because in such a case the hyphenation and line breaking rules
+(including those for South East Asian and CJK) are based on it as
+provided in the |ini| file corresponding to that name; the same applies
+to OpenType language and script.
Conveniently, some options allow to fill the language, and \babel{}
warns you about what to do if there is a missing string. Very likely
@@ -2124,7 +2156,7 @@
\New{3.13} Imports data from an |ini| file, including captions, date,
and hyphenmins. For example:
\begin{verbatim}
-\babelprovide[import=hu]{hungarian}
+\babelprovide[_import=hu_]{hungarian}
\end{verbatim}
Unicode engines load the UTF-8 variants, while 8-bit engines load the
LICR (ie, with macros like |\'| or |\ss|) ones.
@@ -2135,7 +2167,7 @@
the list of recognized languages above. So, the previous example could
be written:
\begin{verbatim}
-\babelprovide[import]{hungarian}
+\babelprovide[_import_]{hungarian}
\end{verbatim}
There are about 200 |ini| files, with data taken from the |ldf| files
@@ -2153,7 +2185,7 @@
\Describe{captions=}{\meta{language-tag}}
Loads only the strings. For example:
\begin{verbatim}
-\babelprovide[captions=hu]{hungarian}
+\babelprovide[_captions=hu_]{hungarian}
\end{verbatim}
\Describe{hyphenrules=}{\meta{language-list}} With this option, with a
@@ -2160,7 +2192,7 @@
space-separated list of hyphenation rules, \babel{} assigns to the
language the first valid hyphenation rules in the list. For example:
\begin{verbatim}
-\babelprovide[hyphenrules=chavacano spanish italian]{chavacano}
+\babelprovide[_hyphenrules=chavacano spanish italian_]{chavacano}
\end{verbatim}
If none of the listed hyphenrules exist, the default behavior
applies. Note in this example we set |chavacano| as first option --
@@ -2205,14 +2237,13 @@
\Describe{mapfont=}{\texttt{direction}}
Assigns the font for the writing direction of this language (only with
-|bidi=basic|).\footnote{There will be another value, \texttt{language},
-not yet implemented.} More precisely, what |mapfont=direction| means
-is, ‘when a character has the same direction as the script for the
-“provided” language, then change its font to that set for this
-language’. There are 3 directions, following the bidi Unicode
-algorithm, namely, Arabic-like, Hebrew-like and left to
-right.\footnote{In future releases a new value (\texttt{script}) will
-be added.} So, there should be at most 3 directives of this kind.
+|bidi=basic|). More precisely, what |mapfont=direction| means is, ‘when
+a character has the same direction as the script for the “provided”
+language, then change its font to that set for this language’. There
+are 3 directions, following the bidi Unicode algorithm, namely,
+Arabic-like, Hebrew-like and left to right.\footnote{In future releases
+a couple of values (\texttt{language} and \texttt{script}) will be
+added.} So, there should be at most 3 directives of this kind.
\Describe{intraspace=}{\meta{base} \meta{shrink} \meta{stretch}}
Sets the interword space for the writing system of the language, in em
@@ -2219,12 +2250,12 @@
units (so, |0 .1 0| is |0em plus .1em|). Like |\spaceskip|, the em unit
applied is that of the current text (more precisely, the previous
glyph). Currently used only in Southeast Asian scripts, like Thai, and
-CJK. Requires |import|.
+CJK.
\Describe{intrapenalty=}{\meta{penalty}}
Sets the interword penalty for the writing system of this language.
Currently used only in Southeast Asian scripts, like Thai. Ignored if 0
-(which is the default value). Requires |import|.
+(which is the default value).
\begin{note}
(1) If you need shorthands, you can define them with |\useshorthands|
@@ -2252,8 +2283,8 @@
% \babelprovide[import, maparabic]{telugu}
\babelfont{rm}{Gautami}
\begin{document}
-\telugudigits{1234}
-\telugucounter{section}
+_\telugudigits{1234}_
+_\telugucounter{section}_
\end{document}
\end{verbatim}
@@ -2519,11 +2550,12 @@
tentative, but it mostly works. For RL documents use the former, and
for LR ones use the latter.
-\New{3.32} There is some experimental support for \textsf{harftex}.
-Since it is based on \luatex, the option |basic| mostly works. You may
-need to deactivate the |rtlm| or the |rtla| font features (besides
-loading \textsf{harfload} before \babel and activating |mode=harf|;
-there is a sample in the GitHub repository).
+\New{3.37} There is some experimental support for \textsf{luahbtex}
+(with |lualatex-dev|) and the latest releases of \textsf{luaotfload}
+(3.11), with |Renderer = Harfbuzz| in \textsf{fontspec}. Since it is
+based on \luatex, the option |basic| mostly works. (You may need to
+deactivate the |rtlm| or the |rtla| font features, or alternatively
+deactivate mirroring in \babel{} with |\babeladjust|.)
There are samples on GitHub, under \texttt{/required/babel/samples}.
See particularly |lua-bidibasic.tex| and |lua-secenum.tex|.
@@ -3149,8 +3181,43 @@
version of |\foreignlanguage|).
\medskip
-\textbf{Old stuff}
+\textbf{Modifying, and adding, values of |ini| files}
+\New{3.37} There is a way to modify the values of |ini| files when they
+get loaded with |\babelprovide|. To set, say, |digits.native| in the
+|numbers| section, use something like
+|numbers..digits.native=abcdefghij| (note the double dot between the
+section and the key name). New keys may be added, too.
+
+\medskip
+\textbf{Non-standard hyphenation}
+
+\New{3.37} With \luatex{} it is now possible to define non-standard
+hyphenation rules, like |f-f| $\to$ |ff-f|. No rules are currently
+provided by default, but they can be defined as shown in the following
+example:
+\begin{verbatim}
+\babelposthyphenation{ngerman}{([fmtrp]) | {1}}
+{
+ { no = {1}, pre = {1}{1}-}, % Replace first char with disc
+ remove, % Remove automatic disc
+ {} % Keep last char, untouched
+}
+\end{verbatim}
+
+This feature must be explicitly activated with:
+\begin{verbatim}
+\babeladjust{ hyphenation.extra = on }
+\end{verbatim}
+
+See the \babel{} wiki for a more detailed description and some examples:
+\begin{verbatim}
+https://github.com/latex3/babel/wiki
+\end{verbatim}
+
+\medskip
+\textbf{Old and deprecated stuff}
+
A couple of tentative macros were provided by \babel{} ($\ge$3.9g) with
a partial solution for ``Unicode'' fonts. These macros are now
deprecated --- use |\babelfont|. A short description follows, for
@@ -3171,16 +3238,6 @@
\babelFSfeatures{turkish}{Language=Turkish}
\end{verbatim}
-\medskip
-\textbf{Modifying values of |ini| files}
-
-\New{3.36} There is a way to modify the values of |ini| files when they
-get loaded with |\babelprovide|. To set, say, |digits.native| in the
-|numbers| section, use something like
-|numbers..digits.native=abcdefghij| (note the double dot between the
-section and the key name). The syntax may change, and currently it only
-redefines existing keys.
-
\section{Loading languages with \file{language.dat}}
\TeX{} and most engines based on it (pdf\TeX, \xetex, $\epsilon$-\TeX,
@@ -3968,29 +4025,25 @@
\begin{thebibliography}{9}
\bibitem{AT} Huda Smitshuijzen Abifares, \textit{Arabic Typography},
Saqi, 2001.
- \bibitem{DEK} Donald E. Knuth,
- \emph{The \TeX book}, Addison-Wesley, 1986.
- \bibitem{LLbook} Leslie Lamport,
- \emph{\LaTeX, A document preparation System}, Addison-Wesley,
- 1986.
- \bibitem{treebus} K.F. Treebus.
- \emph{Tekstwijzer, een gids voor het grafisch verwerken van
- tekst},
- SDU Uitgeverij ('s-Gravenhage, 1988).
- \bibitem{HP} Hubert Partl,
- \emph{German \TeX},
- \emph{TUGboat} 9 (1988) \#1, p.~70--72.
- \bibitem{LLth} Leslie Lamport,
- in: \TeX hax Digest, Volume 89, \#13, 17 February 1989.
\bibitem{BEP} Johannes Braams, Victor Eijkhout and Nico Poppelier,
\emph{The development of national \LaTeX\ styles},
\emph{TUGboat} 10 (1989) \#3, p.~401--406.
\bibitem{FE} Yannis Haralambous,
\emph{Fonts \& Encodings}, O'Reilly, 2007.
+ \bibitem{DEK} Donald E. Knuth,
+ \emph{The \TeX book}, Addison-Wesley, 1986.
\bibitem{UE} Jukka K. Korpela,
\textit{Unicode Explained}, O'Reilly, 2006.
+ \bibitem{LLbook} Leslie Lamport,
+ \emph{\LaTeX, A document preparation System}, Addison-Wesley,
+ 1986.
+ \bibitem{LLth} Leslie Lamport,
+ in: \TeX hax Digest, Volume 89, \#13, 17 February 1989.
\bibitem{CJKV} Ken Lunde,
\textit{CJKV Information Processing}, O'Reilly, 2nd ed., 2009.
+ \bibitem{HP} Hubert Partl,
+ \emph{German \TeX},
+ \emph{TUGboat} 9 (1988) \#1, p.~70--72.
\bibitem{ilatex} Joachim Schrod,
\emph{International \LaTeX\ is ready to use},
\emph{TUGboat} 11 (1990) \#1, p.~87--90.
@@ -3998,6 +4051,10 @@
Sofroniu,
\emph{Digital typography using \LaTeX},
Springer, 2002, p.~301--373.
+ \bibitem{treebus} K.F. Treebus.
+ \emph{Tekstwijzer, een gids voor het grafisch verwerken van
+ tekst},
+ SDU Uitgeverij ('s-Gravenhage, 1988).
\end{thebibliography}
\end{document}
%</filedriver>
@@ -4103,8 +4160,8 @@
% \section{Tools}
%
% \begin{macrocode}
-%<<version=3.36>>
-%<<date=2019/11/14>>
+%<<version=3.37>>
+%<<date=2019/12/08>>
% \end{macrocode}
%
% \textbf{Do not use the following macros in \texttt{ldf} files. They
@@ -4639,9 +4696,6 @@
if Babel.numbers and Babel.digits_mapped then
head = Babel.numbers(head)
end
- if Babel.fixboxdirs then % Temporary!
- head = Babel.fixboxdirs(head)
- end
if Babel.bidi_enabled then
head = Babel.bidi(head, false, dir)
end
@@ -5680,7 +5734,6 @@
\def\bbl@main@language{#1}%
\let\languagename\bbl@main@language
\bbl@id@assign
- \chardef\localeid\@nameuse{bbl@id@@\languagename}%
\bbl@patterns{\languagename}}
% \end{macrocode}
%
@@ -8088,6 +8141,8 @@
% intraspace.}
% \changes{babel~3.34}{2019/09/20}{Fix - with main the language must not
% be restored.}
+% \changes{babel~3.37}{2019/12/07}{SEA and CJK linebreaking activated
+% by default.}
%
% \begin{macrocode}
\bbl@trace{Creating languages and reading ini files}
@@ -8097,7 +8152,6 @@
% Set name and locale id
\def\languagename{#2}%
\bbl@id@assign
- \chardef\localeid\@nameuse{bbl@id@@\languagename}%
\let\bbl@KVP@captions\@nil
\let\bbl@KVP@import\@nil
\let\bbl@KVP@main\@nil
@@ -8159,8 +8213,8 @@
\bbl@read@ini{##1}{basic data}%
\bbl@exportkey{chrng}{characters.ranges}{}%
\bbl@exportkey{dgnat}{numbers.digits.native}{}%
- % \bbl@exportkey{hyphr}{typography.hyphenrules}{}%
- % \bbl@exportkey{intsp}{typography.intraspace}{}%
+ \bbl@exportkey{hyphr}{typography.hyphenrules}{}%
+ \bbl@exportkey{intsp}{typography.intraspace}{}%
\endgroup}% boxed, to avoid extra spaces:
{\setbox\z@\hbox{\InputIfFileExists{babel-#2.tex}{}{}}}}%
{}%
@@ -8204,65 +8258,7 @@
\ifx\bbl at KVP@intraspace\@nil\else % We can override the ini or set
\bbl at csarg\edef{intsp@#2}{\bbl at KVP@intraspace}%
\fi
- \ifcase\bbl at engine\or % lua
- \bbl at ifunset{bbl at intsp@\languagename}{}%
- {\expandafter\ifx\csname bbl at intsp@\languagename\endcsname\@empty\else
- \bbl at xin@{\bbl at cs{sbcp@\languagename}}{Hant,Hans,Jpan,Kore,Kana}%
- \ifin@ % cjk
- \bbl at cjkintraspace
- \directlua{
- Babel = Babel or {}
- Babel.locale_props = Babel.locale_props or {}
- Babel.locale_props[\the\localeid].linebreak = 'c'
- }%
- \bbl at exp{\\\bbl at intraspace\bbl at cs{intsp@\languagename}\\\@@}%
- \ifx\bbl at KVP@intrapenalty\@nil
- \bbl at intrapenalty0\@@
- \fi
- \else % sea
- \bbl at seaintraspace
- \bbl at exp{\\\bbl at intraspace\bbl at cs{intsp@\languagename}\\\@@}%
- \directlua{
- Babel = Babel or {}
- Babel.sea_ranges = Babel.sea_ranges or {}
- Babel.set_chranges('\bbl at cs{sbcp@\languagename}',
- '\bbl at cs{chrng@\languagename}')
- }%
- \ifx\bbl at KVP@intrapenalty\@nil
- \bbl at intrapenalty0\@@
- \fi
- \fi
- \fi
- \ifx\bbl at KVP@intrapenalty\@nil\else
- \expandafter\bbl at intrapenalty\bbl at KVP@intrapenalty\@@
- \fi}%
- \or % xe
- \bbl at xin@{\bbl at cs{sbcp@\languagename}}{Thai,Laoo,Khmr}%
- \ifin@ % sea (currently ckj not handled)
- \bbl at ifunset{bbl at intsp@\languagename}{}%
- {\expandafter\ifx\csname bbl at intsp@\languagename\endcsname\@empty\else
- \ifx\bbl at KVP@intraspace\@nil
- \bbl at exp{%
- \\\bbl at intraspace\bbl at cs{intsp@\languagename}\\\@@}%
- \fi
- \ifx\bbl at KVP@intrapenalty\@nil
- \bbl at intrapenalty0\@@
- \fi
- \fi
- \ifx\bbl at KVP@intraspace\@nil\else % We may override the ini
- \expandafter\bbl at intraspace\bbl at KVP@intraspace\@@
- \fi
- \ifx\bbl at KVP@intrapenalty\@nil\else
- \expandafter\bbl at intrapenalty\bbl at KVP@intrapenalty\@@
- \fi
- \ifx\bbl at ispacesize\@undefined
- \AtBeginDocument{%
- \expandafter\bbl at add
- \csname selectfont \endcsname{\bbl at ispacesize}}%
- \def\bbl at ispacesize{\bbl at cs{xeisp@\bbl at cs{sbcp@\languagename}}}%
- \fi}%
- \fi
- \fi
+ \bbl at provide@intraspace
% == maparabic ==
% Native digits, if provided in ini (TeX level, xe and lua)
\ifcase\bbl at engine\else
@@ -8477,8 +8473,11 @@
%
% The reader of |ini| files. There are 3 possible cases: a section name
% (in the form |[...]|), a comment (starting with |;|) and a
-% key/value pair. \textit{TODO - Work in progress.}
+% key/value pair.
%
+% \changes{babel~3.37}{2019/12/07}{Allow defining key/values
+% (added \cs{bbl@renewlist}).}
+%
% \begin{macrocode}
\def\bbl at read@ini#1#2{%
\openin1=babel-#1.ini % FIXME - number must not be hardcoded
@@ -8511,6 +8510,14 @@
\expandafter\bbl at iniline\bbl at line\bbl at iniline
\fi
\repeat
+ \bbl at foreach\bbl at renewlist{%
+ \bbl at ifunset{bbl at renew@##1}{}{\bbl at inisec[##1]\@@}}%
+ \global\let\bbl at renewlist\@empty
+ % Ends last section. See \bbl at inisec
+ \def\bbl at elt##1##2{\bbl at inireader##1=##2\@@}%
+ \@nameuse{bbl at renew@\bbl at section}%
+ \global\bbl at csarg\let{renew@\bbl at section}\relax
+ \@nameuse{bbl at secpost@\bbl at section}%
\fi}
\def\bbl at iniline#1\bbl at iniline{%
\@ifnextchar[\bbl at inisec{\@ifnextchar;\bbl at iniskip\bbl at inipreread}#1\@@}% ]
@@ -8519,7 +8526,10 @@
% The special cases for comment lines and sections are handled by the
 % two following commands. In sections, we provide the possibility to
% take extra actions at the end or at the start (TODO - but note the last
-% section is not ended). By default, key=val pairs are ignored.
+% section is not ended). By default, key=val pairs are ignored. The
+% |secpost| ``hook'' is used only by `identification', while |secpre|
+% only by |date.gregorian.licr|.
+%
%
% \begin{macrocode}
\def\bbl at iniskip#1\@@{}% if starts with ;
@@ -8528,15 +8538,19 @@
\@nameuse{bbl at renew@\bbl at section}%
\global\bbl at csarg\let{renew@\bbl at section}\relax
\@nameuse{bbl at secpost@\bbl at section}% ends previous section
- \def\bbl at section{#1}%
+ \def\bbl at section{#1}% starts current section
\def\bbl at elt##1##2{%
\@namedef{bbl at KVP@#1..##1}{}}%
\@nameuse{bbl at renew@#1}%
- \@nameuse{bbl at secpre@#1}% starts current section
+ \@nameuse{bbl at secpre@#1}% pre-section `hook'
\bbl at ifunset{bbl at inikv@#1}%
{\let\bbl at inireader\bbl at iniskip}%
{\bbl at exp{\let\\\bbl at inireader\<bbl at inikv@#1>}}}
+\let\bbl at renewlist\@empty
\def\bbl at renewinikey#1..#2\@@#3{%
+ \bbl at ifunset{bbl at renew@#1}%
+ {\bbl at add@list\bbl at renewlist{#1}}%
+ {}%
\bbl at csarg\bbl at add{renew@#1}{\bbl at elt{#2}{#3}}}
% \end{macrocode}
%
@@ -8565,8 +8579,10 @@
% \end{macrocode}
%
% Key-value pairs are treated differently depending on the section in
-% the |ini| file. The following macros are the readers for
-% |identification| and |typography|.
+% the |ini| file. The following macros are the readers for
+% |identification| and |typography|. Note |\bbl@secpost@identification|
+% is always called (via |\bbl@inisec|), while |\bbl@after@ini| must be
+% called explicitly after |\bbl@read@ini| if necessary.
%
% \changes{babel~3.36}{2019/10/30}{New fields for CJK, because OpenType
% and the CLDR follow different models.}
@@ -8841,6 +8857,10 @@
\bbl at adjust@layout{\let\list\bbl at NL@list}}
\@namedef{bbl at ADJ@layout.lists at on}{%
\bbl at adjust@layout{\let\list\bbl at OL@list}}
+\@namedef{bbl at ADJ@hyphenation.extra at on}{%
+ \directlua{
+ Babel.linebreaking.add_after(Babel.post_hyphenate_replace)
+ }}
% \end{macrocode}
%
% \section{The kernel of Babel (\texttt{babel.def} for \LaTeX only)}
@@ -10177,7 +10197,8 @@
Babel.locale_props[\bbl at id@last] = {}
}%
\fi}%
- {}}
+ {}%
+ \chardef\localeid\@nameuse{bbl at id@@\languagename}}
% \end{macrocode}
%
% The unprotected part of |\selectlanguage|.
@@ -10342,7 +10363,6 @@
\languageshorthands{none}%
% set the locale id
\bbl at id@assign
- \chardef\localeid\@nameuse{bbl at id@@\languagename}%
% switch captions, date
\ifcase\bbl at select@type
\ifhmode
@@ -10786,10 +10806,16 @@
% so we can safely use its error handling interface. Otherwise
% we'll have to `keep it simple'.
%
+% Infos are not written to the console, but on the other hand many
+% people think warnings are errors, so a further message type is
+% defined: an important info which is sent to the console.
+%
% \changes{babel~3.9a}{2012/07/30}{\cs{newcommand}s replaced by
% \cs{def}'s, so that the file can be loaded twice}
% \changes{babel~3.9a}{2013/01/26}{Define generic variants instead of
% duplicating each predefined message}
+% \changes{babel~3.37}{2019/12/07}{New message type: an info written to
+% the console.}
%
% \begin{macrocode}
\edef\bbl at nulllanguage{\string\language=0}
@@ -10806,6 +10832,7 @@
\def\\{^^J(babel) }%
\message{\\#1}%
\endgroup}
+ \let\bbl at infowarn\bbl at warning
\def\bbl at info#1{%
\begingroup
\newlinechar=`\^^J
@@ -10823,6 +10850,13 @@
\def\\{\MessageBreak}%
\PackageWarning{babel}{#1}%
\endgroup}
+ \def\bbl at infowarn#1{%
+ \begingroup
+ \def\\{\MessageBreak}%
+ \GenericWarning
+ {(babel) \@spaces\@spaces\@spaces}%
+ {Package babel Info: #1}%
+ \endgroup}
\def\bbl at info#1{%
\begingroup
\def\\{\MessageBreak}%
@@ -10831,6 +10865,7 @@
\fi
\@ifpackagewith{babel}{silent}
{\let\bbl at info\@gobble
+ \let\bbl at infowarn\@gobble
\let\bbl at warning\@gobble}
{}
\def\bbl at nocaption{\protect\bbl at nocaption@i}
@@ -11419,7 +11454,7 @@
\def\bbl at nostdfont#1{%
\bbl at ifunset{bbl at WFF@\f at family}%
{\bbl at csarg\gdef{WFF@\f at family}{}% Flag, to avoid dupl warns
- \bbl at warning{The current font is not a babel standard family:\\%
+ \bbl at infowarn{The current font is not a babel standard family:\\%
#1%
\fontname\font\\%
There is nothing intrinsically wrong with this warning, and\\%
@@ -11482,7 +11517,7 @@
\expandafter\xdef\csname ##1default\endcsname{\f at family}}%
{}}%
\ifx\bbl at tempa\@empty\else
- \bbl at warning{The following fonts are not babel standard families:\\%
+ \bbl at infowarn{The following fonts are not babel standard families:\\%
\bbl at tempa
There is nothing intrinsically wrong with it, but\\%
'babel' will no set Script and Language. Consider\\%
@@ -11736,6 +11771,32 @@
\def\bbl at intrapenalty#1\@@{%
\bbl at csarg\gdef{xeipn@\bbl at cs{sbcp@\languagename}}%
{\XeTeXlinebreakpenalty #1\relax}}
+\def\bbl at provide@intraspace{%
+ \bbl at xin@{\bbl at cs{sbcp@\languagename}}{Thai,Laoo,Khmr}%
+ \ifin@ % sea (currently cjk not handled)
+ \bbl at ifunset{bbl at intsp@\languagename}{}%
+ {\expandafter\ifx\csname bbl at intsp@\languagename\endcsname\@empty\else
+ \ifx\bbl at KVP@intraspace\@nil
+ \bbl at exp{%
+ \\\bbl at intraspace\bbl at cs{intsp@\languagename}\\\@@}%
+ \fi
+ \ifx\bbl at KVP@intrapenalty\@nil
+ \bbl at intrapenalty0\@@
+ \fi
+ \fi
+ \ifx\bbl at KVP@intraspace\@nil\else % We may override the ini
+ \expandafter\bbl at intraspace\bbl at KVP@intraspace\@@
+ \fi
+ \ifx\bbl at KVP@intrapenalty\@nil\else
+ \expandafter\bbl at intrapenalty\bbl at KVP@intrapenalty\@@
+ \fi
+ \ifx\bbl at ispacesize\@undefined
+ \AtBeginDocument{%
+ \expandafter\bbl at add
+ \csname selectfont \endcsname{\bbl at ispacesize}}%
+ \def\bbl at ispacesize{\bbl at cs{xeisp@\bbl at cs{sbcp@\languagename}}}%
+ \fi}%
+ \fi}
\AddBabelHook{xetex}{loadkernel}{%
<@Restore Unicode catcodes before loading patterns@>}
\ifx\DisableBabelHook\@undefined\endinput\fi
@@ -11771,6 +11832,7 @@
%
% \begin{macrocode}
%<*texxet>
+\providecommand\bbl at provide@intraspace{}
\bbl at trace{Redefinitions for bidi layout}
\def\bbl at sspre@caption{%
\bbl at exp{\everyhbox{\\\bbl at textdir\bbl at cs{wdir@\bbl at main@language}}}}
@@ -12226,6 +12288,8 @@
% \changes{babel~3.24}{2018/09/24}{Lua code for interword spacing
% in Southeast Asian scripts.}
% \changes{babel~3.32}{2019/05/25}{Don't break with CJK if nohyphenation.}
+% \changes{babel~3.37}{2019/12/07}{Added code for non-standard
+% hyphenation.}
%
% \textit{In progress.} Replace regular (ie, implicit) discretionaries
% by spaceskips, based on the previous glyph (which I think makes
@@ -12236,6 +12300,21 @@
% Unicode UAX 14).
%
% \begin{macrocode}
+\directlua{
+ Babel = Babel or {}
+ Babel.linebreaking = Babel.linebreaking or {}
+ Babel.linebreaking.before = {}
+ Babel.linebreaking.after = {}
+ Babel.locale = {} % Free to use, indexed with \localeid
+ function Babel.linebreaking.add_before(func)
+ tex.print([[\noexpand\csname bbl at luahyphenate\endcsname]])
+ table.insert(Babel.linebreaking.before , func)
+ end
+ function Babel.linebreaking.add_after(func)
+ tex.print([[\noexpand\csname bbl at luahyphenate\endcsname]])
+ table.insert(Babel.linebreaking.after, func)
+ end
+}
\def\bbl at intraspace#1 #2 #3\@@{%
\directlua{
Babel = Babel or {}
@@ -12375,7 +12454,17 @@
if Babel.cjk_enabled then
Babel.cjk_linebreak(head)
end
+ if Babel.linebreaking.before then
+ for k, func in ipairs(Babel.linebreaking.before) do
+ func(head)
+ end
+ end
lang.hyphenate(head)
+ if Babel.linebreaking.after then
+ for k, func in ipairs(Babel.linebreaking.after) do
+ func(head)
+ end
+ end
if Babel.sea_enabled then
Babel.sea_disc_to_space(head)
end
@@ -12384,6 +12473,38 @@
}
}
\endgroup
+\def\bbl at provide@intraspace{%
+ \bbl at ifunset{bbl at intsp@\languagename}{}%
+ {\expandafter\ifx\csname bbl at intsp@\languagename\endcsname\@empty\else
+ \bbl at xin@{\bbl at cs{sbcp@\languagename}}{Hant,Hans,Jpan,Kore,Kana}%
+ \ifin@ % cjk
+ \bbl at cjkintraspace
+ \directlua{
+ Babel = Babel or {}
+ Babel.locale_props = Babel.locale_props or {}
+ Babel.locale_props[\the\localeid].linebreak = 'c'
+ }%
+ \bbl at exp{\\\bbl at intraspace\bbl at cs{intsp@\languagename}\\\@@}%
+ \ifx\bbl at KVP@intrapenalty\@nil
+ \bbl at intrapenalty0\@@
+ \fi
+ \else % sea
+ \bbl at seaintraspace
+ \bbl at exp{\\\bbl at intraspace\bbl at cs{intsp@\languagename}\\\@@}%
+ \directlua{
+ Babel = Babel or {}
+ Babel.sea_ranges = Babel.sea_ranges or {}
+ Babel.set_chranges('\bbl at cs{sbcp@\languagename}',
+ '\bbl at cs{chrng@\languagename}')
+ }%
+ \ifx\bbl at KVP@intrapenalty\@nil
+ \bbl at intrapenalty0\@@
+ \fi
+ \fi
+ \fi
+ \ifx\bbl at KVP@intrapenalty\@nil\else
+ \expandafter\bbl at intrapenalty\bbl at KVP@intrapenalty\@@
+ \fi}}
% \end{macrocode}
%
% \subsection{CJK line breaking}
@@ -12415,33 +12536,6 @@
<@Font selection@>
% \end{macrocode}
%
-% \textbf{Temporary} fix for luatex $<$1.10, which sometimes inserted a
-% spurious closing dir node with a |\textdir| within |\hbox|es. This
-% will be eventually removed.
-%
-% \begin{macrocode}
-\def\bbl at luafixboxdir{%
- \setbox\z@\hbox{\textdir TLT}%
- \directlua{
- function Babel.first_dir(head)
- for item in node.traverse_id(node.id'dir', head) do
- return item
- end
- return nil
- end
- if Babel.first_dir(tex.box[0].head) then
- function Babel.fixboxdirs(head)
- local fd = Babel.first_dir(head)
- if fd and fd.dir:sub(1,1) == '-' then
- head = node.remove(head, fd)
- end
- return head
- end
- end
- }}
-\AtBeginDocument{\bbl at luafixboxdir}
-% \end{macrocode}
-%
% \changes{babel~3.32}{2019/05/23}{New - \cs{babelcharproperty}.}
%
% The code for |\babelcharproperty| is straightforward. Just note the
@@ -12489,6 +12583,255 @@
\let\bbl at chprop@lb\bbl at chprop@linebreak
% \end{macrocode}
%
+% Post-handling hyphenation patterns for non-standard rules, like |ff|
+% to |ff-f|. There are still some issues with speed (not very slow, but
+% still slow).
+%
+% After declaring the table containing the patterns with their
+% replacements, we define some auxiliary functions: |str_to_nodes|
+% converts the string returned by a function to a node list, taking the
+% node at |base| as a model (font, language, etc.); |fetch_word|
+% fetches a series of glyphs and discretionaries, which |pattern| is
+% matched against (if there is a match, it is called again before
+% trying other patterns, and this is very likely the main bottleneck).
+%
+% |post_hyphenate_replace| is the callback applied after
+% |lang.hyphenate|. This means the automatic hyphenation points are
+% known. As empty captures return a byte position (as explained in the
+% \luatex{} manual), we must convert it to a utf8 position. With
+% |first|, the last byte can be the leading byte in a utf8 sequence,
+% so we just remove it and add 1 to the resulting length. With |last|
+% we must take into account that the capture position points to the
+% next character. Here |word_head| points to the starting node of the
+% text to be matched.
+%
+% \begin{macrocode}
+\begingroup
+\catcode`\#=12
+\catcode`\%=12
+\catcode`\&=14
+\directlua{
+ Babel.linebreaking.replacements = {}
+
+ function Babel.str_to_nodes(fn, matches, base)
+ local n, head, last
+ if fn == nil then return nil end
+ for s in string.utfvalues(fn(matches)) do
+ if base.id == 7 then
+ base = base.replace
+ end
+ n = node.copy(base)
+ n.char = s
+ if not head then
+ head = n
+ else
+ last.next = n
+ end
+ last = n
+ end
+ return head
+ end
+
+ function Babel.fetch_word(head, funct)
+ local word_string = ''
+ local word_nodes = {}
+ local lang
+ local item = head
+
+ while item do
+
+ if item.id == 29
+ and not(item.char == 124) &% ie, not |
+ and not(item.char == 61) &% ie, not =
+ and (item.lang == lang or lang == nil) then
+ lang = lang or item.lang
+ word_string = word_string .. unicode.utf8.char(item.char)
+ word_nodes[#word_nodes+1] = item
+
+ elseif item.id == 7 and item.subtype == 2 then
+ word_string = word_string .. '='
+ word_nodes[#word_nodes+1] = item
+
+ elseif item.id == 7 and item.subtype == 3 then
+ word_string = word_string .. '|'
+ word_nodes[#word_nodes+1] = item
+
+ elseif word_string == '' then
+ &% pass
+
+ else
+ return word_string, word_nodes, item, lang
+ end
+
+ item = item.next
+ end
+ end
+
+ function Babel.post_hyphenate_replace(head)
+ local u = unicode.utf8
+ local lbkr = Babel.linebreaking.replacements
+ local word_head = head
+
+ while true do
+ local w, wn, nw, lang = Babel.fetch_word(word_head)
+ if not lang then return head end
+
+ if not lbkr[lang] then
+ break
+ end
+
+ for k=1, #lbkr[lang] do
+ local p = lbkr[lang][k].pattern
+ local r = lbkr[lang][k].replace
+
+ while true do
+ local matches = { u.match(w, p) }
+ if #matches < 2 then break end
+
+ local first = table.remove(matches, 1)
+ local last = table.remove(matches, #matches)
+
+ &% Fix offsets, from bytes to unicode.
+ first = u.len(w:sub(1, first-1)) + 1
+ last = u.len(w:sub(1, last-1))
+
+ local new &% used when inserting and removing nodes
+ local changed = 0
+
+ &% This loop traverses the replace list and takes the
+ &% corresponding actions
+ for q = first, last do
+ local crep = r[q-first+1]
+ local char_node = wn[q]
+ local char_base = char_node
+
+ if crep and crep.data then
+ char_base = wn[crep.data+first-1]
+ end
+
+ if crep == {} then
+ break
+ elseif crep == nil then
+ changed = changed + 1
+ node.remove(head, char_node)
+ elseif crep and (crep.pre or crep.no or crep.post) then
+ changed = changed + 1
+ d = node.new(7, 0) &% (disc, discretionary)
+ d.pre = Babel.str_to_nodes(crep.pre, matches, char_base)
+ d.post = Babel.str_to_nodes(crep.post, matches, char_base)
+ d.replace = Babel.str_to_nodes(crep.no, matches, char_base)
+ d.attr = char_base.attr
+ if crep.pre == nil then &% TeXbook p96
+ d.penalty = crep.penalty or tex.hyphenpenalty
+ else
+ d.penalty = crep.penalty or tex.exhyphenpenalty
+ end
+ head, new = node.insert_before(head, char_node, d)
+ node.remove(head, char_node)
+ if q == 1 then
+ word_head = new
+ end
+ elseif crep and crep.string then
+ changed = changed + 1
+ local str = crep.string(matches)
+ if str == '' then
+ if q == 1 then
+ word_head = char_node.next
+ end
+ head, new = node.remove(head, char_node)
+ elseif char_node.id == 29 and u.len(str) == 1 then
+ char_node.char = string.utfvalue(str)
+ else
+ local n
+ for s in string.utfvalues(str) do
+ if char_node.id == 7 then
+ log('Automatic hyphens cannot be replaced, just removed.')
+ else
+ n = node.copy(char_base)
+ end
+ n.char = s
+ if q == 1 then
+ head, new = node.insert_before(head, char_node, n)
+ word_head = new
+ else
+ node.insert_before(head, char_node, n)
+ end
+ end
+
+ node.remove(head, char_node)
+ end &% string length
+ end &% if char and char.string
+ end &% for char in match
+ if changed > 20 then
+ texio.write('Too many changes. Ignoring the rest.')
+ elseif changed > 0 then
+ w, wn, nw = Babel.fetch_word(word_head)
+ end
+
+ end &% for match
+ end &% for patterns
+ word_head = nw
+ end &% for words
+ return head
+ end
+
+ function Babel.capture_func(key, cap)
+ local ret = "[[" .. cap:gsub('{([0-9])}', "]]..m[%1]..[[") .. "]]"
+ ret = ret:gsub("%[%[%]%]%.%.", '')
+ ret = ret:gsub("%.%.%[%[%]%]", '')
+ return key .. [[=function(m) return ]] .. ret .. [[ end]]
+ end
+}
+% \end{macrocode}
+%
+% Now the \TeX{} high level interface, which requires the function
+% defined above for converting strings to functions returning a string.
+% These functions handle the |{|\textit{n}|}| syntax. For example,
+% |pre={1}{1}-| becomes |function(m) return m[1]..m[1]..'-' end|, where
+% |m| are the matches returned after applying the pattern. The way it
+% is done is somewhat tricky, but the effect is not dissimilar to Lua's
+% |load|: save the code as a string in a \TeX{} macro, and expand this
+% macro at the appropriate place. As |\directlua| does not take into
+% account the current catcode of |@|, we just avoid this character in
+% macro names (which explains the internal group, too).
+%
+% \begin{macrocode}
+\catcode`\#=6
+\gdef\babelposthyphenation#1#2#3{&%
+ \begingroup
+ \def\babeltempa{\bbl at add@list\babeltempb}&%
+ \let\babeltempb\@empty
+ \bbl at foreach{#3}{&%
+ \bbl at ifsamestring{##1}{remove}&%
+ {\bbl at add@list\babeltempb{nil}}&%
+ {\directlua{
+ local rep = [[##1]]
+ rep = rep:gsub( '(no)%s*=%s*([^%s,]*)', Babel.capture_func)
+ rep = rep:gsub( '(pre)%s*=%s*([^%s,]*)', Babel.capture_func)
+ rep = rep:gsub( '(post)%s*=%s*([^%s,]*)', Babel.capture_func)
+ rep = rep:gsub('(string)%s*=%s*([^%s,]*)', Babel.capture_func)
+ tex.print([[\string\babeltempa{{]] .. rep .. [[}}]])
+ }}}&%
+ \directlua{
+ local lbkr = Babel.linebreaking.replacements
+ local u = unicode.utf8
+ &% Convert pattern:
+ local patt = string.gsub([[#2]], '%s', '')
+ if not u.find(patt, '()', nil, true) then
+ patt = '()' .. patt .. '()'
+ end
+ patt = u.gsub(patt, '{(.)}',
+ function (n)
+ return '%' .. (tonumber(n) and (tonumber(n)+1) or n)
+ end)
+ lbkr[\the\csname l@#1\endcsname] = lbkr[\the\csname l@#1\endcsname] or {}
+ table.insert(lbkr[\the\csname l@#1\endcsname],
+ { pattern = patt, replace = { \babeltempb } })
+ }&%
+ \endgroup}
+\endgroup
+% \end{macrocode}
+%
% \subsection{Layout}
%
% \textbf{Work in progress}.
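A usage sketch for the \babelposthyphenation command introduced by the hunk above, assuming LuaLaTeX and that hyphenation rules named german are available. The pattern and replacement list follow the ff to ff-f case mentioned in the comments. The \babeladjust line is an assumption inferred from the hyphenation.extra adjust key added in this revision, which is the only place where Babel.post_hyphenate_replace gets registered. The sample word is purely illustrative.

```latex
% Compile with lualatex; the post-hyphenation code is Lua-only.
\documentclass{article}
\usepackage[german]{babel}

% Assumption: this switch installs the post-hyphenation callback.
\babeladjust{hyphenation.extra=on}

% {1} is the first capture; | stands for an automatic discretionary.
\babelposthyphenation{german}{([fmtrp]) | {1}}
{
  { no = {1}, pre = {1}{1}- }, % first char becomes a discretionary
  remove,                      % drop the automatic discretionary
  {}                           % keep the last char untouched
}

\begin{document}
Schiffahrt
\end{document}
```

Each entry of the third argument corresponds to one position of the matched text: a braced key list builds a discretionary from the captures, remove deletes the node, and an empty pair of braces leaves it alone.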
Modified: trunk/Master/texmf-dist/source/latex/babel/babel.ins
===================================================================
--- trunk/Master/texmf-dist/source/latex/babel/babel.ins 2019-12-09 21:53:18 UTC (rev 53071)
+++ trunk/Master/texmf-dist/source/latex/babel/babel.ins 2019-12-09 21:54:06 UTC (rev 53072)
@@ -26,7 +26,7 @@
%% and covered by LPPL is defined by the unpacking scripts (with
%% extension .ins) which are part of the distribution.
%%
-\def\filedate{2019/11/14}
+\def\filedate{2019/12/08}
\def\batchfile{babel.ins}
\input docstrip.tex
Modified: trunk/Master/texmf-dist/source/latex/babel/bbcompat.dtx
===================================================================
--- trunk/Master/texmf-dist/source/latex/babel/bbcompat.dtx 2019-12-09 21:53:18 UTC (rev 53071)
+++ trunk/Master/texmf-dist/source/latex/babel/bbcompat.dtx 2019-12-09 21:54:06 UTC (rev 53072)
@@ -30,7 +30,7 @@
%
% \iffalse
%<*dtx>
-\ProvidesFile{bbcompat.dtx}[2019/11/14 v3.36]
+\ProvidesFile{bbcompat.dtx}[2019/12/08 v3.37]
%</dtx>
%
%% File 'bbcompat.dtx'
Modified: trunk/Master/texmf-dist/source/latex/babel/locale.zip
===================================================================
(Binary files differ)
Modified: trunk/Master/texmf-dist/tex/generic/babel/babel.def
===================================================================
--- trunk/Master/texmf-dist/tex/generic/babel/babel.def 2019-12-09 21:53:18 UTC (rev 53071)
+++ trunk/Master/texmf-dist/tex/generic/babel/babel.def 2019-12-09 21:54:06 UTC (rev 53072)
@@ -41,7 +41,7 @@
\wlog{File: #1 #4 #3 <#2>}%
\let\ProvidesFile\@undefined}
\fi
-\ProvidesFile{babel.def}[2019/11/14 3.36 Babel common definitions]
+\ProvidesFile{babel.def}[2019/12/08 3.37 Babel common definitions]
\ifx\AtBeginDocument\@undefined
\input plain.def\relax
\fi
@@ -416,7 +416,6 @@
\def\bbl at main@language{#1}%
\let\languagename\bbl at main@language
\bbl at id@assign
- \chardef\localeid\@nameuse{bbl at id@@\languagename}%
\bbl at patterns{\languagename}}
\def\bbl at beforestart{%
\bbl at usehooks{beforestart}{}%
@@ -1376,7 +1375,6 @@
% Set name and locale id
\def\languagename{#2}%
\bbl at id@assign
- \chardef\localeid\@nameuse{bbl at id@@\languagename}%
\let\bbl at KVP@captions\@nil
\let\bbl at KVP@import\@nil
\let\bbl at KVP@main\@nil
@@ -1438,8 +1436,8 @@
\bbl at read@ini{##1}{basic data}%
\bbl at exportkey{chrng}{characters.ranges}{}%
\bbl at exportkey{dgnat}{numbers.digits.native}{}%
- % \bbl at exportkey{hyphr}{typography.hyphenrules}{}%
- % \bbl at exportkey{intsp}{typography.intraspace}{}%
+ \bbl at exportkey{hyphr}{typography.hyphenrules}{}%
+ \bbl at exportkey{intsp}{typography.intraspace}{}%
\endgroup}% boxed, to avoid extra spaces:
{\setbox\z@\hbox{\InputIfFileExists{babel-#2.tex}{}{}}}}%
{}%
@@ -1483,65 +1481,7 @@
\ifx\bbl at KVP@intraspace\@nil\else % We can override the ini or set
\bbl at csarg\edef{intsp@#2}{\bbl at KVP@intraspace}%
\fi
- \ifcase\bbl at engine\or % lua
- \bbl at ifunset{bbl at intsp@\languagename}{}%
- {\expandafter\ifx\csname bbl at intsp@\languagename\endcsname\@empty\else
- \bbl at xin@{\bbl at cs{sbcp@\languagename}}{Hant,Hans,Jpan,Kore,Kana}%
- \ifin@ % cjk
- \bbl at cjkintraspace
- \directlua{
- Babel = Babel or {}
- Babel.locale_props = Babel.locale_props or {}
- Babel.locale_props[\the\localeid].linebreak = 'c'
- }%
- \bbl at exp{\\\bbl at intraspace\bbl at cs{intsp@\languagename}\\\@@}%
- \ifx\bbl at KVP@intrapenalty\@nil
- \bbl at intrapenalty0\@@
- \fi
- \else % sea
- \bbl at seaintraspace
- \bbl at exp{\\\bbl at intraspace\bbl at cs{intsp@\languagename}\\\@@}%
- \directlua{
- Babel = Babel or {}
- Babel.sea_ranges = Babel.sea_ranges or {}
- Babel.set_chranges('\bbl at cs{sbcp@\languagename}',
- '\bbl at cs{chrng@\languagename}')
- }%
- \ifx\bbl at KVP@intrapenalty\@nil
- \bbl at intrapenalty0\@@
- \fi
- \fi
- \fi
- \ifx\bbl at KVP@intrapenalty\@nil\else
- \expandafter\bbl at intrapenalty\bbl at KVP@intrapenalty\@@
- \fi}%
- \or % xe
- \bbl at xin@{\bbl at cs{sbcp@\languagename}}{Thai,Laoo,Khmr}%
- \ifin@ % sea (currently ckj not handled)
- \bbl at ifunset{bbl at intsp@\languagename}{}%
- {\expandafter\ifx\csname bbl at intsp@\languagename\endcsname\@empty\else
- \ifx\bbl at KVP@intraspace\@nil
- \bbl at exp{%
- \\\bbl at intraspace\bbl at cs{intsp@\languagename}\\\@@}%
- \fi
- \ifx\bbl at KVP@intrapenalty\@nil
- \bbl at intrapenalty0\@@
- \fi
- \fi
- \ifx\bbl at KVP@intraspace\@nil\else % We may override the ini
- \expandafter\bbl at intraspace\bbl at KVP@intraspace\@@
- \fi
- \ifx\bbl at KVP@intrapenalty\@nil\else
- \expandafter\bbl at intrapenalty\bbl at KVP@intrapenalty\@@
- \fi
- \ifx\bbl at ispacesize\@undefined
- \AtBeginDocument{%
- \expandafter\bbl at add
- \csname selectfont \endcsname{\bbl at ispacesize}}%
- \def\bbl at ispacesize{\bbl at cs{xeisp@\bbl at cs{sbcp@\languagename}}}%
- \fi}%
- \fi
- \fi
+ \bbl at provide@intraspace
% == maparabic ==
% Native digits, if provided in ini (TeX level, xe and lua)
\ifcase\bbl at engine\else
@@ -1761,6 +1701,14 @@
\expandafter\bbl at iniline\bbl at line\bbl at iniline
\fi
\repeat
+ \bbl at foreach\bbl at renewlist{%
+ \bbl at ifunset{bbl at renew@##1}{}{\bbl at inisec[##1]\@@}}%
+ \global\let\bbl at renewlist\@empty
+ % Ends last section. See \bbl at inisec
+ \def\bbl at elt##1##2{\bbl at inireader##1=##2\@@}%
+ \@nameuse{bbl at renew@\bbl at section}%
+ \global\bbl at csarg\let{renew@\bbl at section}\relax
+ \@nameuse{bbl at secpost@\bbl at section}%
\fi}
\def\bbl at iniline#1\bbl at iniline{%
\@ifnextchar[\bbl at inisec{\@ifnextchar;\bbl at iniskip\bbl at inipreread}#1\@@}% ]
@@ -1770,15 +1718,19 @@
\@nameuse{bbl at renew@\bbl at section}%
\global\bbl at csarg\let{renew@\bbl at section}\relax
\@nameuse{bbl at secpost@\bbl at section}% ends previous section
- \def\bbl at section{#1}%
+ \def\bbl at section{#1}% starts current section
\def\bbl at elt##1##2{%
\@namedef{bbl at KVP@#1..##1}{}}%
\@nameuse{bbl at renew@#1}%
- \@nameuse{bbl at secpre@#1}% starts current section
+ \@nameuse{bbl at secpre@#1}% pre-section `hook'
\bbl at ifunset{bbl at inikv@#1}%
{\let\bbl at inireader\bbl at iniskip}%
{\bbl at exp{\let\\\bbl at inireader\<bbl at inikv@#1>}}}
+\let\bbl at renewlist\@empty
\def\bbl at renewinikey#1..#2\@@#3{%
+ \bbl at ifunset{bbl at renew@#1}%
+ {\bbl at add@list\bbl at renewlist{#1}}%
+ {}%
\bbl at csarg\bbl at add{renew@#1}{\bbl at elt{#2}{#3}}}
\def\bbl at inikv#1=#2\@@{% key=value
\bbl at trim@def\bbl at tempa{#1}%
@@ -1993,6 +1945,10 @@
\bbl at adjust@layout{\let\list\bbl at NL@list}}
\@namedef{bbl at ADJ@layout.lists at on}{%
\bbl at adjust@layout{\let\list\bbl at OL@list}}
+\@namedef{bbl at ADJ@hyphenation.extra at on}{%
+ \directlua{
+ Babel.linebreaking.add_after(Babel.post_hyphenate_replace)
+ }}
{\def\format{lplain}
\ifx\fmtname\format
\else
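The hyphenation.extra key above hooks Babel.post_hyphenate_replace in through Babel.linebreaking.add_after. As a rough sketch, assuming LuaLaTeX with babel loaded and assuming that user code may call the same registration function (the diff does not document it as a public interface), a custom function could be attached in the same way. The counting function is hypothetical; node.id, node.traverse_id and texio.write_nl are standard LuaTeX calls.

```latex
\documentclass{article}
\usepackage[german]{babel}

% Lua comments (--) are unsafe here because \directlua sees one long
% line, so notes are written as TeX comments instead.
\directlua{
  local glyph_id = node.id('glyph')
  % Hypothetical hook: count glyph nodes once lang.hyphenate has run.
  Babel.linebreaking.add_after(function (head)
    local count = 0
    for item in node.traverse_id(glyph_id, head) do
      count = count + 1
    end
    texio.write_nl('log', 'glyphs after hyphenation: ' .. count)
  end)
}

\begin{document}
Donaudampfschiffahrt
\end{document}
```

In this revision the loop over Babel.linebreaking.after discards whatever the function returns, so a hook of this kind can inspect the list or change nodes in place but cannot hand back a new head.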
Modified: trunk/Master/texmf-dist/tex/generic/babel/babel.sty
===================================================================
--- trunk/Master/texmf-dist/tex/generic/babel/babel.sty 2019-12-09 21:53:18 UTC (rev 53071)
+++ trunk/Master/texmf-dist/tex/generic/babel/babel.sty 2019-12-09 21:54:06 UTC (rev 53072)
@@ -33,7 +33,7 @@
%%
\NeedsTeXFormat{LaTeX2e}[2005/12/01]
-\ProvidesPackage{babel}[2019/11/14 3.36 The Babel package]
+\ProvidesPackage{babel}[2019/12/08 3.37 The Babel package]
\@ifpackagewith{babel}{debug}
{\providecommand\bbl at trace[1]{\message{^^J[ #1 ]}}%
\let\bbl at debug\@firstofone}
@@ -235,9 +235,6 @@
if Babel.numbers and Babel.digits_mapped then
head = Babel.numbers(head)
end
- if Babel.fixboxdirs then % Temporary!
- head = Babel.fixboxdirs(head)
- end
if Babel.bidi_enabled then
head = Babel.bidi(head, false, dir)
end
Modified: trunk/Master/texmf-dist/tex/generic/babel/hyphen.cfg
===================================================================
--- trunk/Master/texmf-dist/tex/generic/babel/hyphen.cfg 2019-12-09 21:53:18 UTC (rev 53071)
+++ trunk/Master/texmf-dist/tex/generic/babel/hyphen.cfg 2019-12-09 21:54:06 UTC (rev 53072)
@@ -37,7 +37,7 @@
\wlog{File: #1 #4 #3 <#2>}%
\let\ProvidesFile\@undefined}
\fi
-\ProvidesFile{hyphen.cfg}[2019/11/14 3.36 Babel hyphens]
+\ProvidesFile{hyphen.cfg}[2019/12/08 3.37 Babel hyphens]
\xdef\bbl at format{\jobname}
\ifx\AtBeginDocument\@undefined
\def\@empty{}
Modified: trunk/Master/texmf-dist/tex/generic/babel/luababel.def
===================================================================
--- trunk/Master/texmf-dist/tex/generic/babel/luababel.def 2019-12-09 21:53:18 UTC (rev 53071)
+++ trunk/Master/texmf-dist/tex/generic/babel/luababel.def 2019-12-09 21:54:06 UTC (rev 53072)
@@ -311,6 +311,21 @@
{\csname bbl at patterns@\bbl at tempa\endcsname\space}%
#2}}}%
\fi}}
+\directlua{
+ Babel = Babel or {}
+ Babel.linebreaking = Babel.linebreaking or {}
+ Babel.linebreaking.before = {}
+ Babel.linebreaking.after = {}
+ Babel.locale = {} % Free to use, indexed with \localeid
+ function Babel.linebreaking.add_before(func)
+ tex.print([[\noexpand\csname bbl at luahyphenate\endcsname]])
+ table.insert(Babel.linebreaking.before , func)
+ end
+ function Babel.linebreaking.add_after(func)
+ tex.print([[\noexpand\csname bbl at luahyphenate\endcsname]])
+ table.insert(Babel.linebreaking.after, func)
+ end
+}
\def\bbl at intraspace#1 #2 #3\@@{%
\directlua{
Babel = Babel or {}
@@ -450,7 +465,17 @@
if Babel.cjk_enabled then
Babel.cjk_linebreak(head)
end
+ if Babel.linebreaking.before then
+ for k, func in ipairs(Babel.linebreaking.before) do
+ func(head)
+ end
+ end
lang.hyphenate(head)
+ if Babel.linebreaking.after then
+ for k, func in ipairs(Babel.linebreaking.after) do
+ func(head)
+ end
+ end
if Babel.sea_enabled then
Babel.sea_disc_to_space(head)
end
@@ -459,6 +484,38 @@
}
}
\endgroup
+\def\bbl at provide@intraspace{%
+ \bbl at ifunset{bbl at intsp@\languagename}{}%
+ {\expandafter\ifx\csname bbl at intsp@\languagename\endcsname\@empty\else
+ \bbl at xin@{\bbl at cs{sbcp@\languagename}}{Hant,Hans,Jpan,Kore,Kana}%
+ \ifin@ % cjk
+ \bbl at cjkintraspace
+ \directlua{
+ Babel = Babel or {}
+ Babel.locale_props = Babel.locale_props or {}
+ Babel.locale_props[\the\localeid].linebreak = 'c'
+ }%
+ \bbl at exp{\\\bbl at intraspace\bbl at cs{intsp@\languagename}\\\@@}%
+ \ifx\bbl at KVP@intrapenalty\@nil
+ \bbl at intrapenalty0\@@
+ \fi
+ \else % sea
+ \bbl at seaintraspace
+ \bbl at exp{\\\bbl at intraspace\bbl at cs{intsp@\languagename}\\\@@}%
+ \directlua{
+ Babel = Babel or {}
+ Babel.sea_ranges = Babel.sea_ranges or {}
+ Babel.set_chranges('\bbl at cs{sbcp@\languagename}',
+ '\bbl at cs{chrng@\languagename}')
+ }%
+ \ifx\bbl at KVP@intrapenalty\@nil
+ \bbl at intrapenalty0\@@
+ \fi
+ \fi
+ \fi
+ \ifx\bbl at KVP@intrapenalty\@nil\else
+ \expandafter\bbl at intrapenalty\bbl at KVP@intrapenalty\@@
+ \fi}}
\AddBabelHook{luatex}{loadkernel}{%
\begingroup
% Reset chars "80-"C0 to category "other", no case mapping:
@@ -529,7 +586,7 @@
\def\bbl at nostdfont#1{%
\bbl at ifunset{bbl at WFF@\f at family}%
{\bbl at csarg\gdef{WFF@\f at family}{}% Flag, to avoid dupl warns
- \bbl at warning{The current font is not a babel standard family:\\%
+ \bbl at infowarn{The current font is not a babel standard family:\\%
#1%
\fontname\font\\%
There is nothing intrinsically wrong with this warning, and\\%
@@ -586,7 +643,7 @@
\expandafter\xdef\csname ##1default\endcsname{\f at family}}%
{}}%
\ifx\bbl at tempa\@empty\else
- \bbl at warning{The following fonts are not babel standard families:\\%
+ \bbl at infowarn{The following fonts are not babel standard families:\\%
\bbl at tempa
There is nothing intrinsically wrong with it, but\\%
'babel' will no set Script and Language. Consider\\%
@@ -670,26 +727,6 @@
\expandafter\addto\csname extras#1\endcsname{%
\babel at save\bbl at langfeatures
\edef\bbl at langfeatures{#2,}}}
-\def\bbl at luafixboxdir{%
- \setbox\z@\hbox{\textdir TLT}%
- \directlua{
- function Babel.first_dir(head)
- for item in node.traverse_id(node.id'dir', head) do
- return item
- end
- return nil
- end
- if Babel.first_dir(tex.box[0].head) then
- function Babel.fixboxdirs(head)
- local fd = Babel.first_dir(head)
- if fd and fd.dir:sub(1,1) == '-' then
- head = node.remove(head, fd)
- end
- return head
- end
- end
- }}
-\AtBeginDocument{\bbl at luafixboxdir}
\newcommand\babelcharproperty[1]{%
\count@=#1\relax
\ifvmode
@@ -729,6 +766,216 @@
Babel.Babel.cjk_characters[\the\count@]['c'] = '#1'
}}
\let\bbl at chprop@lb\bbl at chprop@linebreak
+\begingroup
+\catcode`\#=12
+\catcode`\%=12
+\catcode`\&=14
+\directlua{
+ Babel.linebreaking.replacements = {}
+
+ function Babel.str_to_nodes(fn, matches, base)
+ local n, head, last
+ if fn == nil then return nil end
+ for s in string.utfvalues(fn(matches)) do
+ if base.id == 7 then
+ base = base.replace
+ end
+ n = node.copy(base)
+ n.char = s
+ if not head then
+ head = n
+ else
+ last.next = n
+ end
+ last = n
+ end
+ return head
+ end
+
+ function Babel.fetch_word(head, funct)
+ local word_string = ''
+ local word_nodes = {}
+ local lang
+ local item = head
+
+ while item do
+
+ if item.id == 29
+ and not(item.char == 124) &% ie, not |
+ and not(item.char == 61) &% ie, not =
+ and (item.lang == lang or lang == nil) then
+ lang = lang or item.lang
+ word_string = word_string .. unicode.utf8.char(item.char)
+ word_nodes[#word_nodes+1] = item
+
+ elseif item.id == 7 and item.subtype == 2 then
+ word_string = word_string .. '='
+ word_nodes[#word_nodes+1] = item
+
+ elseif item.id == 7 and item.subtype == 3 then
+ word_string = word_string .. '|'
+ word_nodes[#word_nodes+1] = item
+
+ elseif word_string == '' then
+ &% pass
+
+ else
+ return word_string, word_nodes, item, lang
+ end
+
+ item = item.next
+ end
+ end
+
+ function Babel.post_hyphenate_replace(head)
+ local u = unicode.utf8
+ local lbkr = Babel.linebreaking.replacements
+ local word_head = head
+
+ while true do
+ local w, wn, nw, lang = Babel.fetch_word(word_head)
+ if not lang then return head end
+
+ if not lbkr[lang] then
+ break
+ end
+
+ for k=1, #lbkr[lang] do
+ local p = lbkr[lang][k].pattern
+ local r = lbkr[lang][k].replace
+
+ while true do
+ local matches = { u.match(w, p) }
+ if #matches < 2 then break end
+
+ local first = table.remove(matches, 1)
+ local last = table.remove(matches, #matches)
+
+ &% Fix offsets, from bytes to unicode.
+ first = u.len(w:sub(1, first-1)) + 1
+ last = u.len(w:sub(1, last-1))
+
+ local new &% used when inserting and removing nodes
+ local changed = 0
+
+ &% This loop traverses the replace list and takes the
+ &% corresponding actions
+ for q = first, last do
+ local crep = r[q-first+1]
+ local char_node = wn[q]
+ local char_base = char_node
+
+ if crep and crep.data then
+ char_base = wn[crep.data+first-1]
+ end
+
+ if crep == {} then
+ break
+ elseif crep == nil then
+ changed = changed + 1
+ node.remove(head, char_node)
+ elseif crep and (crep.pre or crep.no or crep.post) then
+ changed = changed + 1
+ d = node.new(7, 0) &% (disc, discretionary)
+ d.pre = Babel.str_to_nodes(crep.pre, matches, char_base)
+ d.post = Babel.str_to_nodes(crep.post, matches, char_base)
+ d.replace = Babel.str_to_nodes(crep.no, matches, char_base)
+ d.attr = char_base.attr
+ if crep.pre == nil then &% TeXbook p96
+ d.penalty = crep.penalty or tex.hyphenpenalty
+ else
+ d.penalty = crep.penalty or tex.exhyphenpenalty
+ end
+ head, new = node.insert_before(head, char_node, d)
+ node.remove(head, char_node)
+ if q == 1 then
+ word_head = new
+ end
+ elseif crep and crep.string then
+ changed = changed + 1
+ local str = crep.string(matches)
+ if str == '' then
+ if q == 1 then
+ word_head = char_node.next
+ end
+ head, new = node.remove(head, char_node)
+ elseif char_node.id == 29 and u.len(str) == 1 then
+ char_node.char = string.utfvalue(str)
+ else
+ local n
+ for s in string.utfvalues(str) do
+ if char_node.id == 7 then
+ log('Automatic hyphens cannot be replaced, just removed.')
+ else
+ n = node.copy(char_base)
+ end
+ n.char = s
+ if q == 1 then
+ head, new = node.insert_before(head, char_node, n)
+ word_head = new
+ else
+ node.insert_before(head, char_node, n)
+ end
+ end
+
+ node.remove(head, char_node)
+ end &% string length
+ end &% if char and char.string
+ end &% for char in match
+ if changed > 20 then
+ texio.write('Too many changes. Ignoring the rest.')
+ elseif changed > 0 then
+ w, wn, nw = Babel.fetch_word(word_head)
+ end
+
+ end &% for match
+ end &% for patterns
+ word_head = nw
+ end &% for words
+ return head
+ end
+
+ function Babel.capture_func(key, cap)
+ local ret = "[[" .. cap:gsub('{([0-9])}', "]]..m[%1]..[[") .. "]]"
+ ret = ret:gsub("%[%[%]%]%.%.", '')
+ ret = ret:gsub("%.%.%[%[%]%]", '')
+ return key .. [[=function(m) return ]] .. ret .. [[ end]]
+ end
+}
+\catcode`\#=6
+\gdef\babelposthyphenation#1#2#3{&%
+ \begingroup
+ \def\babeltempa{\bbl at add@list\babeltempb}&%
+ \let\babeltempb\@empty
+ \bbl at foreach{#3}{&%
+ \bbl at ifsamestring{##1}{remove}&%
+ {\bbl at add@list\babeltempb{nil}}&%
+ {\directlua{
+ local rep = [[##1]]
+ rep = rep:gsub( '(no)%s*=%s*([^%s,]*)', Babel.capture_func)
+ rep = rep:gsub( '(pre)%s*=%s*([^%s,]*)', Babel.capture_func)
+ rep = rep:gsub( '(post)%s*=%s*([^%s,]*)', Babel.capture_func)
+ rep = rep:gsub('(string)%s*=%s*([^%s,]*)', Babel.capture_func)
+ tex.print([[\string\babeltempa{{]] .. rep .. [[}}]])
+ }}}&%
+ \directlua{
+ local lbkr = Babel.linebreaking.replacements
+ local u = unicode.utf8
+ &% Convert pattern:
+ local patt = string.gsub([[#2]], '%s', '')
+ if not u.find(patt, '()', nil, true) then
+ patt = '()' .. patt .. '()'
+ end
+ patt = u.gsub(patt, '{(.)}',
+ function (n)
+ return '%' .. (tonumber(n) and (tonumber(n)+1) or n)
+ end)
+ lbkr[\the\csname l@#1\endcsname] = lbkr[\the\csname l@#1\endcsname] or {}
+ table.insert(lbkr[\the\csname l@#1\endcsname],
+ { pattern = patt, replace = { \babeltempb } })
+ }&%
+ \endgroup}
+\endgroup
\bbl at trace{Redefinitions for bidi layout}
\ifx\@eqnnum\@undefined\else
\ifx\bbl at attr@dir\@undefined\else
Modified: trunk/Master/texmf-dist/tex/generic/babel/nil.ldf
===================================================================
--- trunk/Master/texmf-dist/tex/generic/babel/nil.ldf 2019-12-09 21:53:18 UTC (rev 53071)
+++ trunk/Master/texmf-dist/tex/generic/babel/nil.ldf 2019-12-09 21:54:06 UTC (rev 53072)
@@ -32,7 +32,7 @@
%% extension |.ins|) which are part of the distribution.
%%
-\ProvidesLanguage{nil}[2019/11/14 3.36 Nil language]
+\ProvidesLanguage{nil}[2019/12/08 3.37 Nil language]
\LdfInit{nil}{datenil}
\ifx\l at nil\@undefined
\newlanguage\l at nil
Modified: trunk/Master/texmf-dist/tex/generic/babel/switch.def
===================================================================
--- trunk/Master/texmf-dist/tex/generic/babel/switch.def 2019-12-09 21:53:18 UTC (rev 53071)
+++ trunk/Master/texmf-dist/tex/generic/babel/switch.def 2019-12-09 21:54:06 UTC (rev 53072)
@@ -37,7 +37,7 @@
\wlog{File: #1 #4 #3 <#2>}%
\let\ProvidesFile\@undefined}
\fi
-\ProvidesFile{switch.def}[2019/11/14 3.36 Babel switching mechanism]
+\ProvidesFile{switch.def}[2019/12/08 3.37 Babel switching mechanism]
\ifx\AtBeginDocument\@undefined
\input plain.def\relax
\fi
@@ -58,8 +58,8 @@
\countdef\last at language=19
\def\addlanguage{\alloc at 9\language\chardef\@cclvi}
\fi
-\def\bbl at version{3.36}
-\def\bbl at date{2019/11/14}
+\def\bbl at version{3.37}
+\def\bbl at date{2019/12/08}
\def\adddialect#1#2{%
\global\chardef#1#2\relax
\bbl at usehooks{adddialect}{{#1}{#2}}%
@@ -133,7 +133,8 @@
Babel.locale_props[\bbl at id@last] = {}
}%
\fi}%
- {}}
+ {}%
+ \chardef\localeid\@nameuse{bbl at id@@\languagename}}
\expandafter\def\csname selectlanguage \endcsname#1{%
\ifnum\bbl at hymapsel=\@cclv\let\bbl at hymapsel\tw@\fi
\bbl at push@language
@@ -204,7 +205,6 @@
\languageshorthands{none}%
% set the locale id
\bbl at id@assign
- \chardef\localeid\@nameuse{bbl at id@@\languagename}%
% switch captions, date
\ifcase\bbl at select@type
\ifhmode
@@ -408,6 +408,7 @@
\def\\{^^J(babel) }%
\message{\\#1}%
\endgroup}
+ \let\bbl at infowarn\bbl at warning
\def\bbl at info#1{%
\begingroup
\newlinechar=`\^^J
@@ -425,6 +426,13 @@
\def\\{\MessageBreak}%
\PackageWarning{babel}{#1}%
\endgroup}
+ \def\bbl at infowarn#1{%
+ \begingroup
+ \def\\{\MessageBreak}%
+ \GenericWarning
+ {(babel) \@spaces\@spaces\@spaces}%
+ {Package babel Info: #1}%
+ \endgroup}
\def\bbl at info#1{%
\begingroup
\def\\{\MessageBreak}%
@@ -433,6 +441,7 @@
\fi
\@ifpackagewith{babel}{silent}
{\let\bbl at info\@gobble
+ \let\bbl at infowarn\@gobble
\let\bbl at warning\@gobble}
{}
\def\bbl at nocaption{\protect\bbl at nocaption@i}
Added: trunk/Master/texmf-dist/tex/generic/babel/test-hyphen-post-wiki.tex
===================================================================
--- trunk/Master/texmf-dist/tex/generic/babel/test-hyphen-post-wiki.tex (rev 0)
+++ trunk/Master/texmf-dist/tex/generic/babel/test-hyphen-post-wiki.tex 2019-12-09 21:54:06 UTC (rev 53072)
@@ -0,0 +1,31 @@
+\documentclass{article}
+
+\usepackage[ngerman]{babel}
+
+\directlua{
+
+Babel.linebreaking.add_after(Babel.post_hyphenate_replace)
+
+Babel.linebreaking.add_replacement(
+ 'ngerman',
+ '([fmtrp])|{1}',
+ {
+ nil,
+ { no = '{1}', pre = '{1}{1}-', post = '', penalty = 150, data = 1 },
+ {}
+ })
+
+}
+
+\begin{document}
+
+\rightskip5cm
+
+Auffrisierende Auffrisierendem Auffrisierenden Auffrisierender
+Auffrisierendes Auffrisierst Auffrisiert Auffrisierte Auffrisiertem
+Auffrisierten Auffrisierter Auffrisiertes Auffrisiertest Auffrisiertet
+Auffrisst Auffuhr Aufführbar Aufführbare Aufführbarem Aufführbaren
+Aufführbarer Aufführbares Aufführe Auffuhren Aufführen Aufführend
+Aufführende Aufführendem Aufführenden Aufführender Aufführendes
+
+\end{document}
\ No newline at end of file
Property changes on: trunk/Master/texmf-dist/tex/generic/babel/test-hyphen-post-wiki.tex
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Modified: trunk/Master/texmf-dist/tex/generic/babel/txtbabel.def
===================================================================
--- trunk/Master/texmf-dist/tex/generic/babel/txtbabel.def 2019-12-09 21:53:18 UTC (rev 53071)
+++ trunk/Master/texmf-dist/tex/generic/babel/txtbabel.def 2019-12-09 21:54:06 UTC (rev 53072)
@@ -32,6 +32,7 @@
%% extension |.ins|) which are part of the distribution.
%%
+\providecommand\bbl at provide@intraspace{}
\bbl at trace{Redefinitions for bidi layout}
\def\bbl at sspre@caption{%
\bbl at exp{\everyhbox{\\\bbl at textdir\bbl at cs{wdir@\bbl at main@language}}}}
Modified: trunk/Master/texmf-dist/tex/generic/babel/xebabel.def
===================================================================
--- trunk/Master/texmf-dist/tex/generic/babel/xebabel.def 2019-12-09 21:53:18 UTC (rev 53071)
+++ trunk/Master/texmf-dist/tex/generic/babel/xebabel.def 2019-12-09 21:54:06 UTC (rev 53072)
@@ -51,6 +51,32 @@
\def\bbl at intrapenalty#1\@@{%
\bbl at csarg\gdef{xeipn@\bbl at cs{sbcp@\languagename}}%
{\XeTeXlinebreakpenalty #1\relax}}
+\def\bbl at provide@intraspace{%
+ \bbl at xin@{\bbl at cs{sbcp@\languagename}}{Thai,Laoo,Khmr}%
+ \ifin@ % sea (currently ckj not handled)
+ \bbl at ifunset{bbl at intsp@\languagename}{}%
+ {\expandafter\ifx\csname bbl at intsp@\languagename\endcsname\@empty\else
+ \ifx\bbl at KVP@intraspace\@nil
+ \bbl at exp{%
+ \\\bbl at intraspace\bbl at cs{intsp@\languagename}\\\@@}%
+ \fi
+ \ifx\bbl at KVP@intrapenalty\@nil
+ \bbl at intrapenalty0\@@
+ \fi
+ \fi
+ \ifx\bbl at KVP@intraspace\@nil\else % We may override the ini
+ \expandafter\bbl at intraspace\bbl at KVP@intraspace\@@
+ \fi
+ \ifx\bbl at KVP@intrapenalty\@nil\else
+ \expandafter\bbl at intrapenalty\bbl at KVP@intrapenalty\@@
+ \fi
+ \ifx\bbl at ispacesize\@undefined
+ \AtBeginDocument{%
+ \expandafter\bbl at add
+ \csname selectfont \endcsname{\bbl at ispacesize}}%
+ \def\bbl at ispacesize{\bbl at cs{xeisp@\bbl at cs{sbcp@\languagename}}}%
+ \fi}%
+ \fi}
\AddBabelHook{xetex}{loadkernel}{%
\begingroup
% Reset chars "80-"C0 to category "other", no case mapping:
@@ -121,7 +147,7 @@
\def\bbl at nostdfont#1{%
\bbl at ifunset{bbl at WFF@\f at family}%
{\bbl at csarg\gdef{WFF@\f at family}{}% Flag, to avoid dupl warns
- \bbl at warning{The current font is not a babel standard family:\\%
+ \bbl at infowarn{The current font is not a babel standard family:\\%
#1%
\fontname\font\\%
There is nothing intrinsically wrong with this warning, and\\%
@@ -178,7 +204,7 @@
\expandafter\xdef\csname ##1default\endcsname{\f at family}}%
{}}%
\ifx\bbl at tempa\@empty\else
- \bbl at warning{The following fonts are not babel standard families:\\%
+ \bbl at infowarn{The following fonts are not babel standard families:\\%
\bbl at tempa
There is nothing intrinsically wrong with it, but\\%
'babel' will no set Script and Language. Consider\\%