[latex3-commits] [git/LaTeX3-latex3-babel] master: Hyphens: integrated with the patterns mechanism. Use tatweel. (2f8c11f)

Javier email at dante.de
Sun Apr 4 09:14:36 CEST 2021


Repository : https://github.com/latex3/babel
On branch  : master
Link       : https://github.com/latex3/babel/commit/2f8c11f7332b784846de1f4443cdeafd82918e1c

>---------------------------------------------------------------

commit 2f8c11f7332b784846de1f4443cdeafd82918e1c
Author: Javier <email at localhost>
Date:   Sun Apr 4 09:14:36 2021 +0200

    Hyphens: integrated with the patterns mechanism. Use tatweel.


>---------------------------------------------------------------

2f8c11f7332b784846de1f4443cdeafd82918e1c
 README.md                                   |   6 +-
 babel.dtx                                   |  10 +--
 babel.ins                                   |   2 +-
 babel.pdf                                   | Bin 825200 -> 825254 bytes
 bbcompat.dtx                                |   2 +-
 locale/ug/babel-uyghur.tex                  |  94 +++++++++++++---------------
 news-guides/media/uyghur-hyphenation.png    | Bin 0 -> 99219 bytes
 news-guides/news/whats-new-in-babel-3.57.md |  57 ++++++++++++++++-
 8 files changed, 111 insertions(+), 60 deletions(-)

diff --git a/README.md b/README.md
index ccf4073..94073a8 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-## Babel 3.56.2330
+## Babel 3.56.2332
 
 This package manages culturally-determined typographical (and other)
 rules, and hyphenation patterns for a wide range of languages. Many
@@ -46,7 +46,7 @@ respective authors.
 
 ### Summary of Latest changes
 ```
-3.57   2021-04-15??
+3.57   2021-04-08??
        * Transforms:
          - Arabic:     transliteration.dad
          - Croatian:   digraphs.ligatures
@@ -54,7 +54,7 @@ respective authors.
          - Hindi:      transliteration.hk
          - Hungarian:  digraphs.hyphen
        * {xxxx} syntax also in string=.
-       * Experimental code for Uyghur hyphenation (lua).
+       * Preliminary code for Uyghur hyphenation (lua).
          
 3.56   2021-03-24
        * Transforms (\babelprehyphenation, \babelposthyphenation)
diff --git a/babel.dtx b/babel.dtx
index f78477c..94fe9b1 100644
--- a/babel.dtx
+++ b/babel.dtx
@@ -31,7 +31,7 @@
 %
 % \iffalse
 %<*filedriver>
-\ProvidesFile{babel.dtx}[2021/04/02 v3.56.2330 The Babel package]
+\ProvidesFile{babel.dtx}[2021/04/04 v3.56.2332 The Babel package]
 \documentclass{ltxdoc}
 \GetFileInfo{babel.dtx}
 \usepackage{fontspec}
@@ -4897,8 +4897,8 @@ help from Bernd Raichle, for which I am grateful.
 % \section{Tools}
 %
 %    \begin{macrocode}
-%<<version=3.56.2330>>
-%<<date=2021/04/02>>
+%<<version=3.56.2332>>
+%<<date=2021/04/04>>
 %    \end{macrocode}
 %
 % \textbf{Do not use the following macros in \texttt{ldf} files. They
@@ -10828,7 +10828,7 @@ help from Bernd Raichle, for which I am grateful.
           Babel.loc_to_scr[\the\localeid] =
             Babel.script_blocks['\bbl@cl{sbcp}']
         end}%
-      \ifx\bbl@mapselect\@undefined
+      \ifx\bbl@mapselect\@undefined  % TODO. almost the same as mapfont
         \AtBeginDocument{%
           \expandafter\bbl@add\csname selectfont \endcsname{{\bbl@mapselect}}%
           {\selectfont}}%
@@ -10856,7 +10856,7 @@ help from Bernd Raichle, for which I am grateful.
                  {See the manual for details.}}}%
     \bbl@ifunset{bbl@lsys@\languagename}{\bbl@provide@lsys{\languagename}}{}%
     \bbl@ifunset{bbl@wdir@\languagename}{\bbl@provide@dirs{\languagename}}{}%
-    \ifx\bbl@mapselect\@undefined
+    \ifx\bbl@mapselect\@undefined % TODO. See onchar
       \AtBeginDocument{%
         \expandafter\bbl@add\csname selectfont \endcsname{{\bbl@mapselect}}%
         {\selectfont}}%
diff --git a/babel.ins b/babel.ins
index 243b918..94faf75 100644
--- a/babel.ins
+++ b/babel.ins
@@ -26,7 +26,7 @@
 %% and covered by LPPL is defined by the unpacking scripts (with
 %% extension .ins) which are part of the distribution.
 %%
-\def\filedate{2021/04/02}
+\def\filedate{2021/04/04}
 \def\batchfile{babel.ins}
 \input docstrip.tex
 
diff --git a/babel.pdf b/babel.pdf
index 47b38f3..f8b6d5c 100644
Binary files a/babel.pdf and b/babel.pdf differ
diff --git a/bbcompat.dtx b/bbcompat.dtx
index 292d05d..2d699ea 100644
--- a/bbcompat.dtx
+++ b/bbcompat.dtx
@@ -30,7 +30,7 @@
 %
 % \iffalse
 %<*dtx>
-\ProvidesFile{bbcompat.dtx}[2021/04/02 v3.56.2330]
+\ProvidesFile{bbcompat.dtx}[2021/04/04 v3.56.2332]
 %</dtx>
 %
 %% File 'bbcompat.dtx'
diff --git a/locale/ug/babel-uyghur.tex b/locale/ug/babel-uyghur.tex
index 188d383..d3df2d2 100644
--- a/locale/ug/babel-uyghur.tex
+++ b/locale/ug/babel-uyghur.tex
@@ -10,66 +10,62 @@
 \BabelBeforeIni{ug}{%
 }
 
+\newattribute\bblug@disc
+\bblug@disc=0
+
+\bbl@luahyphenate 
+
 \directlua{
 
-Babel.ug_conson = {
-[0x0628] = true, [0x067E] = true, [0x062A] = true, [0x062C] = true,
-[0x0686] = true, [0x062E] = true, [0x062F] = true, [0x0631] = true,
-[0x0632] = true, [0x0698] = true, [0x0633] = true, [0x0634] = true,
-[0x0641] = true, [0x063A] = true, [0x0642] = true, [0x0643] = true,
-[0x06AF] = true, [0x06AD] = true, [0x0644] = true, [0x0645] = true,
-[0x0646] = true, [0x0647] = true, [0x064A] = true, [0x06CB] = true
-}
+Babel.uyghur = Babel.uyghur or {}
+
+function Babel.uyghur.posthyphen(head)
+  local UGDISC = luatexbase.registernumber'bblug@disc'
+  for item in node.traverse(head) do
+    if item.id == 7 and item.subtype == 3 and
+        item.next and item.next.id == 29 and
+        item.next.lang == \the\l@uyghur\space then 
+      node.set_attribute(item.next, UGDISC, 1)
+      node.remove(head, item)
+    end
+  end
+end
+
+Babel.uyghur.hyphen_sep = .09   % in em units
+% Note it can be a string, with several characters:
+Babel.uyghur.hyphen = unicode.utf8.char(0x0640)
 
-function Babel.ug_hyphenate(head) 
-  if not Babel.ug_toisol then return end
-  local d, pre, post
+Babel.linebreaking.add_after(Babel.uyghur.posthyphen)
+
+function Babel.uyghur.hyphenate(head) 
+  local d, k
+  local quad = 655360
+  local UGDISC = luatexbase.registernumber'bblug@disc'
   for item in node.traverse(head) do
-    if item.id == 29 and item.prev and item.prev.id == 29
-       and item.next and item.next.id == 29 then
-      pre =  Babel.ug_toisol[item.char] or item.char
-      post = Babel.ug_toisol[item.next.char] or item.next.char
-      if Babel.ug_conson[pre] and not Babel.ug_conson[post] then
+    if item.id == 29 and item.lang == \the\l@uyghur\space then
+      local ugdisc = node.get_attribute(item, UGDISC)
+      if ugdisc > 0 then    
+        quad = font.getfont(item.font).size or quad
+        k = node.new(13, 1)  % (kern, userkern)
+        k.kern = Babel.uyghur.hyphen_sep * quad
         d = node.new(7, 3)   % (disc, regular)
-        d.pre     = Babel.str_to_nodes(
-                      function() return '-' end, 
+        d.pre = Babel.str_to_nodes(
+                      function() return Babel.uyghur.hyphen end, 
                       nil, item)
-        d.penalty = 0 % Must be tex.(ex)hyphenpenalty
-        head, new = node.insert_before(head, item, d)
+        d.pre = node.insert_before(d.pre, d.pre, k)
+        d.penalty = 50 % Must be tex.(ex)hyphenpenalty
+        head = node.insert_before(head, item, d)
       end
     end
   end
   return head
 end
-}
 
-\gdef\UyghurSetupHyph{%
-  \directlua{
-      Babel.ug_toisol   = {}
-      luatexbase.add_to_callback("pre_linebreak_filter",
-        Babel.ug_hyphenate, "Babel.ug_hyphenate")
-      luatexbase.add_to_callback("hpack_filter",
-        Babel.ug_hyphenate, "Babel.ug_hyphenate")
-  }% 
-  % It must be done for each font, and stored separately.
-  % Locale must be taken into account too.
-  \bbl@foreach{%
-      0628,067E,062A,062C,0686,062E,062F,0631,0632,%
-      0698,0633,0634,0641,063A,0642,0643,06AF,06AD,%
-      0644,0645,0646,0647,064A,06CB}{%
-    \setbox\z@\hbox{\char"##1=\char"##1^^^^200d=%
-      ^^^^200d\char"##1^^^^200d=^^^^200d\char"##1}%
-    \directlua{
-      local chars = {}
-      for item in node.traverse(tex.box[0].head) do
-        if item.id == node.id'glyph' and item.char > 128 and
-             not (item.char == 0x200D) then
-          table.insert(chars, item.char)
-        end
-      end
-      Babel.ug_toisol[chars[2]] = chars[1]
-      Babel.ug_toisol[chars[3]] = chars[1]
-      Babel.ug_toisol[chars[4]] = chars[1]
-    }}}
+luatexbase.add_to_callback("pre_linebreak_filter",
+  Babel.uyghur.hyphenate, "Babel.uyghur.hyphenate")
+luatexbase.add_to_callback("hpack_filter",
+  Babel.uyghur.hyphenate, "Babel.uyghur.hyphenate")
+  
+}
 
 \endinput
\ No newline at end of file
diff --git a/news-guides/media/uyghur-hyphenation.png b/news-guides/media/uyghur-hyphenation.png
new file mode 100644
index 0000000..1d8ddb4
Binary files /dev/null and b/news-guides/media/uyghur-hyphenation.png differ
diff --git a/news-guides/news/whats-new-in-babel-3.57.md b/news-guides/news/whats-new-in-babel-3.57.md
index bc4c59c..f5704ea 100644
--- a/news-guides/news/whats-new-in-babel-3.57.md
+++ b/news-guides/news/whats-new-in-babel-3.57.md
@@ -6,7 +6,6 @@
 
 *Some of them are still experimental or incomplete.*
 
-
 * **Arabic** `transliteration.dad` ▸ Applies the transliteration system
 devised by Yannis Haralambous for \textsf{dad}. Not yet complete, but
 sufficient for many texts.
@@ -29,4 +28,60 @@ Devanagari.
 *ssz*, *tty* and *zzs* as *cs-cs*,
 *dz-dz*, etc.
 
+## Uyghur hyphenation (lua)
+
+Some tentative code has been added to the Uyghur locale for the words
+to be hyphenated correctly, preserving the joining forms. See
+https://www.w3.org/TR/css-text-3/#word-break-shaping . It assumes the
+basic forms (initial, medial, final). 
+
+Here is an example (text copied from
+https://github.com/azmat21/Syllabification-for-Uyghur ).
+```
+\documentclass{article}
+
+\usepackage[bidi=basic]{babel}
+
+\usepackage{multicol}
+
+\babelprovide[hyphenrules=+, main, import]{uyghur}
+
+\babelfont{rm}
+  [Renderer=Harfbuzz]
+   % {Amiri}
+   % {Arial}
+   % {Arabic Typesetting}
+   % {Scheherazade}
+   {FreeSerif}
+   % {Calibri}
+
+\begin{document}
+
+% A few basic patterns, with a somewhat crude rule.
+\patterns{
+3^^^^06284 3^^^^062a4 3^^^^062b4 3^^^^062c4 3^^^^062d4 3^^^^062e4
+3^^^^062f4 3^^^^06314 3^^^^06324 3^^^^06333 3^^^^06334 3^^^^06344
+3^^^^06354 3^^^^06364 3^^^^06374 3^^^^06384 3^^^^06394 3^^^^063a4
+3^^^^06414 3^^^^06424 3^^^^06434 3^^^^06444 3^^^^06454 3^^^^06464
+3^^^^06474 3^^^^064a4 3^^^^06864 3^^^^06ad4 3^^^^06af4 3^^^^06cb4
+}
+
+\begin{multicols}{3}
+  % \hsize1pt
+
+  ھەممىمىزگە مەلۇم بولغىنىدەك ئۇيغۇر تىلى يىزىق تۈرۈك يىزىقىنىڭ شەرقى
+  تارماق قىسمىغا تەۋە بولۇپ ،  ئىپادىلەش شەكلى جەھەتتىن ئەرەب يىزىقى
+  ئاساسىدىكى ئۇيغۇر يىزىقى ،  لاتىن يىزىقى ئاساسىدىكى ئۇيغۇر يىزىقى ۋە
+  سيلىرىك يىزىقى ئاساسىدىكى ئ‍ۇيغۇر يىزىقى دەپ ئۈچ تۈرگە بۆلىنىدۇ ، 
+  بۇلارنىڭ ھەرىپ شەكىلىرى مەنبە1، 2، 3لەردىن كۆرۇلسە بولىدۇ. تۆۋەندىكى
+  پىروگىراممىسا ئاساسلىق لاتىن يىزىقى ئاساسىدىكى سۆزلەرنى بوغۇمغا ئايىرش
+  سۆزلىنىدۇ،  ئەگەر قىزىققۇچىلار بولسا ئەسىلى كود چۈشۈرۇپ باشقا شەكىلدى
+  ئ‍ۇيغۇر يىزىقىغا ماشلاشتۇرۇپ ئىشلەتسە بولىدۇ .
+\end{multicols}
+
+\end{document}
+```
+
+![Uyghur](../media/uyghur-hyphenation.png)
+
 
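Note on the `\patterns` block in the news example above: each entry has the form `3C4`, where `C` is a consonant. In TeX's pattern scheme the highest digit at each inter-letter position wins, an odd value allows a break, and an even value forbids it. The net effect of these entries should therefore be "break before a consonant unless the preceding letter is also a consonant". The following is a minimal standalone sketch of that rule, not babel code. It assumes Lua 5.3 or later for the built-in `utf8` library. The names `consonants` and `break_points` are hypothetical, the consonant table is only a subset of the listed letters, and `\lefthyphenmin`/`\righthyphenmin` are ignored.

```lua
-- Standalone Lua 5.3+ sketch; illustrative only, not part of babel.

-- A subset of the consonants that appear in the patterns (code points).
local consonants = {
  [0x0628] = true, [0x062A] = true, [0x062F] = true, [0x0631] = true,
  [0x0644] = true, [0x0645] = true, [0x0646] = true,
}

-- Return the positions i (a break between letter i and letter i+1)
-- allowed by a pattern "3C4" for every consonant C in `set`.
local function break_points(word, set)
  local cps = {}
  for _, cp in utf8.codes(word) do cps[#cps + 1] = cp end

  -- levels[i] holds the highest pattern digit between letters i and i+1;
  -- levels[0] and levels[#cps] are the word edges and are never reported.
  local levels = {}
  for i = 0, #cps do levels[i] = 0 end

  for i, cp in ipairs(cps) do
    if set[cp] then
      levels[i - 1] = math.max(levels[i - 1], 3)  -- the "3" before C
      levels[i]     = math.max(levels[i], 4)      -- the "4" after C
    end
  end

  local breaks = {}
  for i = 1, #cps - 1 do
    if levels[i] % 2 == 1 then breaks[#breaks + 1] = i end  -- odd: allowed
  end
  return breaks
end

-- Letters: alef, beh, alef, teh, lam, alef  (V C V C C V)
local word = utf8.char(0x0627, 0x0628, 0x0627, 0x062A, 0x0644, 0x0627)
print(table.concat(break_points(word, consonants), " "))  --> 1 3
```

For the V C V C C V sample the sketch reports breaks after letters 1 and 3, that is V-CV-CCV. The consonant cluster stays together because the `4` after the first consonant outranks the `3` before the second, which is consistent with the example's own description of the patterns as "a somewhat crude rule".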




