[latex3-commits] [git/LaTeX3-latex3-babel] master: Hyphens: integrated with the patterns mechanism. Use tatweel. (2f8c11f)
Javier
email at dante.de
Sun Apr 4 09:14:36 CEST 2021
Repository : https://github.com/latex3/babel
On branch : master
Link : https://github.com/latex3/babel/commit/2f8c11f7332b784846de1f4443cdeafd82918e1c
>---------------------------------------------------------------
commit 2f8c11f7332b784846de1f4443cdeafd82918e1c
Author: Javier <email at localhost>
Date: Sun Apr 4 09:14:36 2021 +0200
Hyphens: integrated with the patterns mechanism. Use tatweel.
>---------------------------------------------------------------
2f8c11f7332b784846de1f4443cdeafd82918e1c
README.md | 6 +-
babel.dtx | 10 +--
babel.ins | 2 +-
babel.pdf | Bin 825200 -> 825254 bytes
bbcompat.dtx | 2 +-
locale/ug/babel-uyghur.tex | 94 +++++++++++++---------------
news-guides/media/uyghur-hyphenation.png | Bin 0 -> 99219 bytes
news-guides/news/whats-new-in-babel-3.57.md | 57 ++++++++++++++++-
8 files changed, 111 insertions(+), 60 deletions(-)
diff --git a/README.md b/README.md
index ccf4073..94073a8 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-## Babel 3.56.2330
+## Babel 3.56.2332
This package manages culturally-determined typographical (and other)
rules, and hyphenation patterns for a wide range of languages. Many
@@ -46,7 +46,7 @@ respective authors.
### Summary of Latest changes
```
-3.57 2021-04-15??
+3.57 2021-04-08??
* Transforms:
- Arabic: transliteration.dad
- Croatian: digraphs.ligatures
@@ -54,7 +54,7 @@ respective authors.
- Hindi: transliteration.hk
- Hungarian: digraphs.hyphen
* {xxxx} syntax also in string=.
- * Experimental code for Uyghur hyphenation (lua).
+ * Preliminary code for Uyghur hyphenation (lua).
3.56 2021-03-24
* Transforms (\babelprehyphenation, \babelposthyphenation)
diff --git a/babel.dtx b/babel.dtx
index f78477c..94fe9b1 100644
--- a/babel.dtx
+++ b/babel.dtx
@@ -31,7 +31,7 @@
%
% \iffalse
%<*filedriver>
-\ProvidesFile{babel.dtx}[2021/04/02 v3.56.2330 The Babel package]
+\ProvidesFile{babel.dtx}[2021/04/04 v3.56.2332 The Babel package]
\documentclass{ltxdoc}
\GetFileInfo{babel.dtx}
\usepackage{fontspec}
@@ -4897,8 +4897,8 @@ help from Bernd Raichle, for which I am grateful.
% \section{Tools}
%
% \begin{macrocode}
-%<<version=3.56.2330>>
-%<<date=2021/04/02>>
+%<<version=3.56.2332>>
+%<<date=2021/04/04>>
% \end{macrocode}
%
% \textbf{Do not use the following macros in \texttt{ldf} files. They
@@ -10828,7 +10828,7 @@ help from Bernd Raichle, for which I am grateful.
Babel.loc_to_scr[\the\localeid] =
Babel.script_blocks['\bbl@cl{sbcp}']
end}%
- \ifx\bbl@mapselect\@undefined
+ \ifx\bbl@mapselect\@undefined % TODO. almost the same as mapfont
\AtBeginDocument{%
\expandafter\bbl@add\csname selectfont \endcsname{{\bbl@mapselect}}%
{\selectfont}}%
@@ -10856,7 +10856,7 @@ help from Bernd Raichle, for which I am grateful.
{See the manual for details.}}}%
\bbl@ifunset{bbl@lsys@\languagename}{\bbl@provide@lsys{\languagename}}{}%
\bbl@ifunset{bbl@wdir@\languagename}{\bbl@provide@dirs{\languagename}}{}%
- \ifx\bbl@mapselect\@undefined
+ \ifx\bbl@mapselect\@undefined % TODO. See onchar
\AtBeginDocument{%
\expandafter\bbl@add\csname selectfont \endcsname{{\bbl@mapselect}}%
{\selectfont}}%
diff --git a/babel.ins b/babel.ins
index 243b918..94faf75 100644
--- a/babel.ins
+++ b/babel.ins
@@ -26,7 +26,7 @@
%% and covered by LPPL is defined by the unpacking scripts (with
%% extension .ins) which are part of the distribution.
%%
-\def\filedate{2021/04/02}
+\def\filedate{2021/04/04}
\def\batchfile{babel.ins}
\input docstrip.tex
diff --git a/babel.pdf b/babel.pdf
index 47b38f3..f8b6d5c 100644
Binary files a/babel.pdf and b/babel.pdf differ
diff --git a/bbcompat.dtx b/bbcompat.dtx
index 292d05d..2d699ea 100644
--- a/bbcompat.dtx
+++ b/bbcompat.dtx
@@ -30,7 +30,7 @@
%
% \iffalse
%<*dtx>
-\ProvidesFile{bbcompat.dtx}[2021/04/02 v3.56.2330]
+\ProvidesFile{bbcompat.dtx}[2021/04/04 v3.56.2332]
%</dtx>
%
%% File 'bbcompat.dtx'
diff --git a/locale/ug/babel-uyghur.tex b/locale/ug/babel-uyghur.tex
index 188d383..d3df2d2 100644
--- a/locale/ug/babel-uyghur.tex
+++ b/locale/ug/babel-uyghur.tex
@@ -10,66 +10,62 @@
\BabelBeforeIni{ug}{%
}
+\newattribute\bblug@disc
+\bblug@disc=0
+
+\bbl@luahyphenate
+
\directlua{
-Babel.ug_conson = {
-[0x0628] = true, [0x067E] = true, [0x062A] = true, [0x062C] = true,
-[0x0686] = true, [0x062E] = true, [0x062F] = true, [0x0631] = true,
-[0x0632] = true, [0x0698] = true, [0x0633] = true, [0x0634] = true,
-[0x0641] = true, [0x063A] = true, [0x0642] = true, [0x0643] = true,
-[0x06AF] = true, [0x06AD] = true, [0x0644] = true, [0x0645] = true,
-[0x0646] = true, [0x0647] = true, [0x064A] = true, [0x06CB] = true
-}
+Babel.uyghur = Babel.uyghur or {}
+
+function Babel.uyghur.posthyphen(head)
+ local UGDISC = luatexbase.registernumber'bblug@disc'
+ for item in node.traverse(head) do
+ if item.id == 7 and item.subtype == 3 and
+ item.next and item.next.id == 29 and
+ item.next.lang == \the\l@uyghur\space then
+ node.set_attribute(item.next, UGDISC, 1)
+ node.remove(head, item)
+ end
+ end
+end
+
+Babel.uyghur.hyphen_sep = .09 % in em units
+% Note it can be a string, with several characters:
+Babel.uyghur.hyphen = unicode.utf8.char(0x0640)
-function Babel.ug_hyphenate(head)
- if not Babel.ug_toisol then return end
- local d, pre, post
+Babel.linebreaking.add_after(Babel.uyghur.posthyphen)
+
+function Babel.uyghur.hyphenate(head)
+ local d, k
+ local quad = 655360
+ local UGDISC = luatexbase.registernumber'bblug@disc'
for item in node.traverse(head) do
- if item.id == 29 and item.prev and item.prev.id == 29
- and item.next and item.next.id == 29 then
- pre = Babel.ug_toisol[item.char] or item.char
- post = Babel.ug_toisol[item.next.char] or item.next.char
- if Babel.ug_conson[pre] and not Babel.ug_conson[post] then
+ if item.id == 29 and item.lang == \the\l@uyghur\space then
+ local ugdisc = node.get_attribute(item, UGDISC)
+ if ugdisc > 0 then
+ quad = font.getfont(item.font).size or quad
+ k = node.new(13, 1) % (kern, userkern)
+ k.kern = Babel.uyghur.hyphen_sep * quad
d = node.new(7, 3) % (disc, regular)
- d.pre = Babel.str_to_nodes(
- function() return '-' end,
+ d.pre = Babel.str_to_nodes(
+ function() return Babel.uyghur.hyphen end,
nil, item)
- d.penalty = 0 % Must be tex.(ex)hyphenpenalty
- head, new = node.insert_before(head, item, d)
+ d.pre = node.insert_before(d.pre, d.pre, k)
+ d.penalty = 50 % Must be tex.(ex)hyphenpenalty
+ head = node.insert_before(head, item, d)
end
end
end
return head
end
-}
-\gdef\UyghurSetupHyph{%
- \directlua{
- Babel.ug_toisol = {}
- luatexbase.add_to_callback("pre_linebreak_filter",
- Babel.ug_hyphenate, "Babel.ug_hyphenate")
- luatexbase.add_to_callback("hpack_filter",
- Babel.ug_hyphenate, "Babel.ug_hyphenate")
- }%
- % It must be done for each font, and stored separately.
- % Locale must be taken into account too.
- \bbl at foreach{%
- 0628,067E,062A,062C,0686,062E,062F,0631,0632,%
- 0698,0633,0634,0641,063A,0642,0643,06AF,06AD,%
- 0644,0645,0646,0647,064A,06CB}{%
- \setbox\z@\hbox{\char"##1=\char"##1^^^^200d=%
- ^^^^200d\char"##1^^^^200d=^^^^200d\char"##1}%
- \directlua{
- local chars = {}
- for item in node.traverse(tex.box[0].head) do
- if item.id == node.id'glyph' and item.char > 128 and
- not (item.char == 0x200D) then
- table.insert(chars, item.char)
- end
- end
- Babel.ug_toisol[chars[2]] = chars[1]
- Babel.ug_toisol[chars[3]] = chars[1]
- Babel.ug_toisol[chars[4]] = chars[1]
- }}}
+luatexbase.add_to_callback("pre_linebreak_filter",
+ Babel.uyghur.hyphenate, "Babel.uyghur.hyphenate")
+luatexbase.add_to_callback("hpack_filter",
+ Babel.uyghur.hyphenate, "Babel.uyghur.hyphenate")
+
+}
\endinput
\ No newline at end of file
diff --git a/news-guides/media/uyghur-hyphenation.png b/news-guides/media/uyghur-hyphenation.png
new file mode 100644
index 0000000..1d8ddb4
Binary files /dev/null and b/news-guides/media/uyghur-hyphenation.png differ
diff --git a/news-guides/news/whats-new-in-babel-3.57.md b/news-guides/news/whats-new-in-babel-3.57.md
index bc4c59c..f5704ea 100644
--- a/news-guides/news/whats-new-in-babel-3.57.md
+++ b/news-guides/news/whats-new-in-babel-3.57.md
@@ -6,7 +6,6 @@
*Some of them are still experimental or incomplete.*
-
* **Arabic** `transliteration.dad` ▸ Applies the transliteration system
devised by Yannis Haralambous for \textsf{dad}. Not yet complete, but
sufficient for many texts.
@@ -29,4 +28,60 @@ Devanagari.
*ssz*, *tty* and *zzs* as *cs-cs*,
*dz-dz*, etc.
+## Uyghur hyphenation (lua)
+
+Some tentative code has been added to the Uyghur locale for the words
+to be hyphenated correctly, preserving the joining forms. See
+https://www.w3.org/TR/css-text-3/#word-break-shaping . It assumes the
+basic forms (initial, medial, final).
+
+Here is an example (text copied from
+https://github.com/azmat21/Syllabification-for-Uyghur ).
+```
+\documentclass{article}
+
+\usepackage[bidi=basic]{babel}
+
+\usepackage{multicol}
+
+\babelprovide[hyphenrules=+, main, import]{uyghur}
+
+\babelfont{rm}
+ [Renderer=Harfbuzz]
+ % {Amiri}
+ % {Arial}
+ % {Arabic Typesetting}
+ % {Scheherazade}
+ {FreeSerif}
+ % {Calibri}
+
+\begin{document}
+
+% A few basic patterns, with a somewhat crude rule.
+\patterns{
+3^^^^06284 3^^^^062a4 3^^^^062b4 3^^^^062c4 3^^^^062d4 3^^^^062e4
+3^^^^062f4 3^^^^06314 3^^^^06324 3^^^^06333 3^^^^06334 3^^^^06344
+3^^^^06354 3^^^^06364 3^^^^06374 3^^^^06384 3^^^^06394 3^^^^063a4
+3^^^^06414 3^^^^06424 3^^^^06434 3^^^^06444 3^^^^06454 3^^^^06464
+3^^^^06474 3^^^^064a4 3^^^^06864 3^^^^06ad4 3^^^^06af4 3^^^^06cb4
+}
+
+\begin{multicols}{3}
+ % \hsize1pt
+
+ ھەممىمىزگە مەلۇم بولغىنىدەك ئۇيغۇر تىلى يىزىق تۈرۈك يىزىقىنىڭ شەرقى
+ تارماق قىسمىغا تەۋە بولۇپ ، ئىپادىلەش شەكلى جەھەتتىن ئەرەب يىزىقى
+ ئاساسىدىكى ئۇيغۇر يىزىقى ، لاتىن يىزىقى ئاساسىدىكى ئۇيغۇر يىزىقى ۋە
+ سيلىرىك يىزىقى ئاساسىدىكى ئۇيغۇر يىزىقى دەپ ئۈچ تۈرگە بۆلىنىدۇ ،
+ بۇلارنىڭ ھەرىپ شەكىلىرى مەنبە1، 2، 3لەردىن كۆرۇلسە بولىدۇ. تۆۋەندىكى
+ پىروگىراممىسا ئاساسلىق لاتىن يىزىقى ئاساسىدىكى سۆزلەرنى بوغۇمغا ئايىرش
+ سۆزلىنىدۇ، ئەگەر قىزىققۇچىلار بولسا ئەسىلى كود چۈشۈرۇپ باشقا شەكىلدى
+ ئۇيغۇر يىزىقىغا ماشلاشتۇرۇپ ئىشلەتسە بولىدۇ .
+\end{multicols}
+
+\end{document}
+```
+
+![Uyghur](../media/uyghur-hyphenation.png)
+