texlive[60523] Master/texmf-dist: lua-uca (16sep21)

commits+karl at tug.org
Thu Sep 16 22:18:13 CEST 2021


Revision: 60523
          http://tug.org/svn/texlive?view=revision&revision=60523
Author:   karl
Date:     2021-09-16 22:18:12 +0200 (Thu, 16 Sep 2021)
Log Message:
-----------
lua-uca (16sep21)

Modified Paths:
--------------
    trunk/Master/texmf-dist/doc/support/lua-uca/CHANGELOG.md
    trunk/Master/texmf-dist/doc/support/lua-uca/README.md
    trunk/Master/texmf-dist/doc/support/lua-uca/lua-uca-doc.pdf
    trunk/Master/texmf-dist/doc/support/lua-uca/lua-uca-doc.tex
    trunk/Master/texmf-dist/scripts/lua-uca/lua-uca-ducet.lua
    trunk/Master/texmf-dist/scripts/lua-uca/lua-uca-languages.lua
    trunk/Master/texmf-dist/scripts/lua-uca/lua-uca-reordering-table.lua

Added Paths:
-----------
    trunk/Master/texmf-dist/doc/support/lua-uca/HACKING.md
    trunk/Master/texmf-dist/source/support/lua-uca/
    trunk/Master/texmf-dist/source/support/lua-uca/Makefile

Modified: trunk/Master/texmf-dist/doc/support/lua-uca/CHANGELOG.md
===================================================================
--- trunk/Master/texmf-dist/doc/support/lua-uca/CHANGELOG.md	2021-09-16 20:17:39 UTC (rev 60522)
+++ trunk/Master/texmf-dist/doc/support/lua-uca/CHANGELOG.md	2021-09-16 20:18:12 UTC (rev 60523)
@@ -1,5 +1,14 @@
 # Changelog
 
+2021-09-16
+
+  - added sorting rules for all languages contained in CLDR collation files.
+
+2020-06-09
+
+  - moved development information that depends on files not distributed on CTAN to `HACKING.md`. 
+  - extended documentation.
+
 2020-03-24
   
   - version `0.1` released

Added: trunk/Master/texmf-dist/doc/support/lua-uca/HACKING.md
===================================================================
--- trunk/Master/texmf-dist/doc/support/lua-uca/HACKING.md	                        (rev 0)
+++ trunk/Master/texmf-dist/doc/support/lua-uca/HACKING.md	2021-09-16 20:18:12 UTC (rev 60523)
@@ -0,0 +1,98 @@
+# Lua-UCA hacking
+
+You need the full installation from
+[Github](https://github.com/michal-h21/lua-uca) to follow the steps described
+in this section. The package distributed on CTAN doesn't contain all necessary
+files.
+
+## Install 
+
+The package needs to download Unicode collation data and convert it to a Lua
+table. It depends on `wget` and `unzip` utilities. All files can be downloaded
+using Make:
+
+    make
+
+To install the package in the local TEXMF tree, run:
+
+    make install
+
+## New language support
+
+To add a new language, add a new function to the `src/lua-uca/lua-uca-languages.lua`
+file. The function name should be the short language code. Example function for
+the Russian language:
+
+    languages.ru = function(collator_obj)
+      collator_obj:reorder{ "cyrillic" }
+      return collator_obj
+    end
+
+The language function takes the Collator object as a parameter. Methods shown
+in the *Change sorting rules* section can be used with this object.
+
+The `data/common/collation/` directory in the source repository contains files from the `CLDR` project.
+They contain rules for many languages. The files need to be normalized to the
+[NFC form](https://en.wikipedia.org/wiki/Unicode_equivalence), for example using:
+
+    uconv -x any-nfc -o cs.nfc.xml cs.xml
+
+The `uconv` utility is a part of the [ICU Project](http://userguide.icu-project.org/).
+
+Sorting rules for a language are placed in the `<collation>` element. Multiple
+`<collation>` elements may be present in the XML file. It is usually best to choose the one with the attribute 
+`type="standard"`.
+
+The following example contains code from `da.xml`:
+
+
+    [caseFirst upper]
+    &D<<đ<<<Đ<<ð<<<Ð
+    &th<<<þ
+    &TH<<<Þ
+    &Y<<ü<<<Ü<<ű<<<Ű
+    &[before 1]ǀ<æ<<<Æ<<ä<<<Ä<ø<<<Ø<<ö<<<Ö<<ő<<<Ő<å<<<Å<<<aa<<<Aa<<<AA
+    &oe<<œ<<<Œ
+
+This is translated to Lua code in `lua-uca-languages.lua` in the following way:
+
+
+    languages.da = function(collator_obj)
+      -- helper function for more readable tailoring definition
+      local tailoring = function(s) collator_obj:tailor_string(s) end
+      collator_obj:uppercase_first()
+      tailoring("&D<<đ<<<Đ<<ð<<<Ð")
+      tailoring("&th<<<þ")
+      tailoring("&TH<<<Þ")
+      tailoring("&Y<<ü<<<Ü<<ű<<<Ű")
+      tailoring("&ǀ<æ<<<Æ<<ä<<<Ä<ø<<<Ø<<ö<<<Ö<<ő<<<Ő<å<<<Å<<<aa<<<Aa<<<AA")
+      tailoring("&oe<<œ<<<Œ")
+      return collator_obj
+    end
+
+
+
+
+Pull requests with new language support are highly appreciated.
+
+## Support files in the source distribution
+
+The `xindex` directory contains some example configurations for `Xindex`, a Lua-based indexing system.
+Run the `make xindex` command to compile them.
+
+`Xindex` has built-in support for Lua-UCA since version `0.23`; it can be requested using the `-u` option.
+
+The `tools/indexing-sample.lua` file provides a simple indexing processor, independent of any other tool.  
+
+## Testing
+
+You can run unit tests using the following command:
+
+    make test
+
+Testing requires the [Busted](https://olivinelabs.com/busted/) testing framework to be installed on your system.
+Tests are placed in the `spec` directory and they provide more examples of the package usage.
+
+
+
+


Property changes on: trunk/Master/texmf-dist/doc/support/lua-uca/HACKING.md
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Modified: trunk/Master/texmf-dist/doc/support/lua-uca/README.md
===================================================================
--- trunk/Master/texmf-dist/doc/support/lua-uca/README.md	2021-09-16 20:17:39 UTC (rev 60522)
+++ trunk/Master/texmf-dist/doc/support/lua-uca/README.md	2021-09-16 20:18:12 UTC (rev 60523)
@@ -1,16 +1,10 @@
+\iffalse
 # The `Lua-UCA` package
+\fi
 
 This package adds support for the [Unicode collation algorithm](https://unicode.org/reports/tr10/) for Lua 5.3. 
 
-## Install 
 
-The package needs to download Unicode collation data and convert it to a Lua table. It depends on `wget` and `unzip` utitilities.
-
-To install the package in the local TEXMF tree, run:
-
-    make
-    make install
-
 ## Usage
 
 To sort a table using Czech collation rules:
@@ -43,19 +37,18 @@
 > chochol
 > jasan
 
-More samples of use can be found in the `spec` directory.
-`tools/indexing-sample.lua` is a simple indexing processor. 
+More samples of the library usage can be found in the source repository of this package on [Github](https://github.com/michal-h21/lua-uca).
+% See `HACKING.md` file in the repo for more information.
 
 ## Use with Xindex processor
 
 [Xindex](https://www.ctan.org/pkg/xindex) is flexible index processor written
-in Lua by Herbert Voß. It supports Lua configuration files, which enables use
-of Lua-UCA for sorting of the index entries, as shown in [this
-example](https://tex.stackexchange.com/a/524014/2891) for Norwegian text.
+in Lua by Herbert Voß. It has built-in `Lua-UCA` support starting with version
+`0.23`. The support can be requested using the `-u` option:
 
-The `xindex` directory in the [source repository](https://github.com/michal-h21/lua-uca/tree/master/xindex) contains more advanced version of such configuration
-file together with several examples. Run `make xindex` command to compile them.
+     xindex -u -l no -c norsk filename.idx
 
+
 ## Change sorting rules
 
 The simplest way to change the default sorting order is to use the
@@ -90,7 +83,7 @@
     tailoring "&Ö=Oe"
     tailoring "&ö=oe"
 
-Some languages, like Norwegian sort uppercase letters before lowercase. This
+Some languages, like Norwegian, sort uppercase letters before lowercase. This
 can be enabled using `collator_obj:uppercase_first()` function:
 
     local tailoring = function(s) collator_obj:tailor_string(s) end
@@ -102,15 +95,9 @@
     tailoring("&ǀ<æ<<<Æ<<ä<<<Ä<ø<<<Ø<<ö<<<Ö<<ő<<<Ő<å<<<Å<<<aa<<<Aa<<<AA")
     tailoring("&oe<<œ<<<Œ")
 
-The `data/common/collation/` directory contains files from the `CLDR` project.
-They contain rules for many languages. The files needs to be normalized to the
-[NFC form](https://en.wikipedia.org/wiki/Unicode_equivalence), for example
-using:
+% More information on a new language support is in the `HACKING.md`
+% document in the [`Lua-UCA` Github repo](https://github.com/michal-h21/lua-uca/blob/master/HACKING.md).
 
-    cat cs.xml | uconv -x any-nfc -o cs.xml
-
-The `uconv` utility is a part of the [ICU Project](http://userguide.icu-project.org/).
-
 ### Script reordering
 
 Many languages sort different scripts after the script this language uses. As
@@ -117,7 +104,7 @@
 Latin based scripts are sorted first, it is necessary to reorder scripts in
 such cases.
 
-The `collator_obj:reorder` function takes table with scripts that need to be reorderd. 
+The `collator_obj:reorder` function takes table with scripts that need to be reordered. 
 For example Cyrillic can be sorted before Latin using:
 
     collator_obj:reorder {"cyrillic"}
@@ -132,6 +119,5 @@
 
 # What is missing
 
-- Tailorings for most languages.
 - Algorithm for setting implicit sort weights of characters that are not explicitly listed in DUCET.
 - Special handling of CJK scripts.

Modified: trunk/Master/texmf-dist/doc/support/lua-uca/lua-uca-doc.pdf
===================================================================
(Binary files differ)

Modified: trunk/Master/texmf-dist/doc/support/lua-uca/lua-uca-doc.tex
===================================================================
--- trunk/Master/texmf-dist/doc/support/lua-uca/lua-uca-doc.tex	2021-09-16 20:17:39 UTC (rev 60522)
+++ trunk/Master/texmf-dist/doc/support/lua-uca/lua-uca-doc.tex	2021-09-16 20:18:12 UTC (rev 60523)
@@ -26,10 +26,12 @@
 \begin{document}
 \maketitle
 \tableofcontents
+\section{Introduction}
 \markdownInput{README.md}
 
-\section{Available languages}
+\section{Available Languages}
 
+\begin{raggedright}
 The \texttt{lua-uca-languages} library provides the following languages:
 \bgroup\ttfamily
 \begin{luacode*}
@@ -44,7 +46,14 @@
 tex.print(table.concat(l, ", "))
 \end{luacode*}
 \egroup
+\end{raggedright}
 
+If you want to request a language not listed here, or if you have
+created support code for one, please contact the package author by mail or using
+the issue tracker on the package's Github profile.
+
+\markdownInput{HACKING.md}
+
 \section{License}
 \markdownInput{LICENSE}
 \markdownInput{CHANGELOG.md}

Modified: trunk/Master/texmf-dist/scripts/lua-uca/lua-uca-ducet.lua
===================================================================
--- trunk/Master/texmf-dist/scripts/lua-uca/lua-uca-ducet.lua	2021-09-16 20:17:39 UTC (rev 60522)
+++ trunk/Master/texmf-dist/scripts/lua-uca/lua-uca-ducet.lua	2021-09-16 20:18:12 UTC (rev 60523)
@@ -1 +1 @@

@@ Diff output truncated at 1234567 characters. @@

