texlive[60523] Master/texmf-dist: lua-uca (16sep21)
commits+karl at tug.org
Thu Sep 16 22:18:13 CEST 2021
Revision: 60523
http://tug.org/svn/texlive?view=revision&revision=60523
Author: karl
Date: 2021-09-16 22:18:12 +0200 (Thu, 16 Sep 2021)
Log Message:
-----------
lua-uca (16sep21)
Modified Paths:
--------------
trunk/Master/texmf-dist/doc/support/lua-uca/CHANGELOG.md
trunk/Master/texmf-dist/doc/support/lua-uca/README.md
trunk/Master/texmf-dist/doc/support/lua-uca/lua-uca-doc.pdf
trunk/Master/texmf-dist/doc/support/lua-uca/lua-uca-doc.tex
trunk/Master/texmf-dist/scripts/lua-uca/lua-uca-ducet.lua
trunk/Master/texmf-dist/scripts/lua-uca/lua-uca-languages.lua
trunk/Master/texmf-dist/scripts/lua-uca/lua-uca-reordering-table.lua
Added Paths:
-----------
trunk/Master/texmf-dist/doc/support/lua-uca/HACKING.md
trunk/Master/texmf-dist/source/support/lua-uca/
trunk/Master/texmf-dist/source/support/lua-uca/Makefile
Modified: trunk/Master/texmf-dist/doc/support/lua-uca/CHANGELOG.md
===================================================================
--- trunk/Master/texmf-dist/doc/support/lua-uca/CHANGELOG.md 2021-09-16 20:17:39 UTC (rev 60522)
+++ trunk/Master/texmf-dist/doc/support/lua-uca/CHANGELOG.md 2021-09-16 20:18:12 UTC (rev 60523)
@@ -1,5 +1,14 @@
# Changelog
+2021-09-16
+
+ - added sorting rules for all languages contained in CLDR collation files.
+
+2020-06-09
+
+ - moved development information that depends on files not distributed on CTAN to `HACKING.md`.
+ - extended documentation.
+
2020-03-24
- version `0.1` released
Added: trunk/Master/texmf-dist/doc/support/lua-uca/HACKING.md
===================================================================
--- trunk/Master/texmf-dist/doc/support/lua-uca/HACKING.md (rev 0)
+++ trunk/Master/texmf-dist/doc/support/lua-uca/HACKING.md 2021-09-16 20:18:12 UTC (rev 60523)
@@ -0,0 +1,98 @@
+# Lua-UCA hacking
+
+You need the full installation from
+[Github](https://github.com/michal-h21/lua-uca) in order to do stuff described
+in this section. The package distributed on CTAN doesn't contain all necessary
+files.
+
+## Install
+
+The package needs to download Unicode collation data and convert it to a Lua
+table. It depends on `wget` and `unzip` utilities. All files can be downloaded
+using Make:
+
+ make
+
+To install the package in the local TEXMF tree, run:
+
+ make install
+
+## New language support
+
+To add a new language, add a new function to the `src/lua-uca/lua-uca-languages.lua`
+file. The function name should be the short language code. Example function for
+the Russian language:
+
+ languages.ru = function(collator_obj)
+ collator_obj:reorder{ "cyrillic" }
+ return collator_obj
+ end
+
+The language function takes the Collator object as a parameter. Methods shown
+in the *Change sorting rules* section can be used with this object.
+
+The `data/common/collation/` directory in the source repository contains files from the `CLDR` project.
+They contain rules for many languages. The files need to be normalized to the
+[NFC form](https://en.wikipedia.org/wiki/Unicode_equivalence), for example using:
+
+ cat cs.xml | uconv -x any-nfc -o cs.xml
+
+The `uconv` utility is a part of the [ICU Project](http://userguide.icu-project.org/).
+
+Sorting rules for a language are placed in the `<collation>` element. Multiple
+`<collation>` elements may be present in the XML file. It is usually best to choose the one with attribute
+`type="standard"`.
+
+The following example contains code from `da.xml`:
+
+
+ [caseFirst upper]
+ &D<<đ<<<Đ<<ð<<<Ð
+ &th<<<þ
+ &TH<<<Þ
+ &Y<<ü<<<Ü<<ű<<<Ű
+ &[before 1]ǀ<æ<<<Æ<<ä<<<Ä<ø<<<Ø<<ö<<<Ö<<ő<<<Ő<å<<<Å<<<aa<<<Aa<<<AA
+ &oe<<œ<<<Œ
+
+This is translated to Lua code in `lua-uca-languages.lua` in the following way:
+
+
+ languages.da = function(collator_obj)
+ -- helper function for more readable tailoring definition
+ local tailoring = function(s) collator_obj:tailor_string(s) end
+ collator_obj:uppercase_first()
+ tailoring("&D<<đ<<<Đ<<ð<<<Ð")
+ tailoring("&th<<<þ")
+ tailoring("&TH<<<Þ")
+ tailoring("&Y<<ü<<<Ü<<ű<<<Ű")
+ tailoring("&ǀ<æ<<<Æ<<ä<<<Ä<ø<<<Ø<<ö<<<Ö<<ő<<<Ő<å<<<Å<<<aa<<<Aa<<<AA")
+ tailoring("&oe<<œ<<<Œ")
+ return collator_obj
+ end
+
+
+
+
+Pull requests with new language support are highly appreciated.
+
+## Support files in the source distribution
+
+The `xindex` directory contains some examples for configuration of `Xindex`, a Lua-based indexing system.
+Run `make xindex` command to compile them.
+
+`Xindex` has built-in support for Lua-UCA since version `0.23`; it can be requested using the `-u` option.
+
+The `tools/indexing-sample.lua` file provides a simple indexing processor, independent of any other tool.
+
+## Testing
+
+You can run unit tests using the following command:
+
+ make test
+
+Testing requires the [Busted](https://olivinelabs.com/busted/) testing framework to be installed on your system.
+Tests are placed in the `spec` directory and they provide more examples of the package usage.
+
+
+
+
Property changes on: trunk/Master/texmf-dist/doc/support/lua-uca/HACKING.md
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Modified: trunk/Master/texmf-dist/doc/support/lua-uca/README.md
===================================================================
--- trunk/Master/texmf-dist/doc/support/lua-uca/README.md 2021-09-16 20:17:39 UTC (rev 60522)
+++ trunk/Master/texmf-dist/doc/support/lua-uca/README.md 2021-09-16 20:18:12 UTC (rev 60523)
@@ -1,16 +1,10 @@
+\iffalse
# The `Lua-UCA` package
+\fi
This package adds support for the [Unicode collation algorithm](https://unicode.org/reports/tr10/) for Lua 5.3.
-## Install
-The package needs to download Unicode collation data and convert it to a Lua table. It depends on `wget` and `unzip` utitilities.
-
-To install the package in the local TEXMF tree, run:
-
- make
- make install
-
## Usage
To sort a table using Czech collation rules:
@@ -43,19 +37,18 @@
> chochol
> jasan
-More samples of use can be found in the `spec` directory.
-`tools/indexing-sample.lua` is a simple indexing processor.
+More samples of the library usage can be found in the source repository of this package on [Github](https://github.com/michal-h21/lua-uca).
+% See `HACKING.md` file in the repo for more information.
## Use with Xindex processor
[Xindex](https://www.ctan.org/pkg/xindex) is flexible index processor written
-in Lua by Herbert Voß. It supports Lua configuration files, which enables use
-of Lua-UCA for sorting of the index entries, as shown in [this
-example](https://tex.stackexchange.com/a/524014/2891) for Norwegian text.
+in Lua by Herbert Voß. It has built-in `Lua-UCA` support starting with version
+`0.23`. The support can be requested using the `-u` option:
-The `xindex` directory in the [source repository](https://github.com/michal-h21/lua-uca/tree/master/xindex) contains more advanced version of such configuration
-file together with several examples. Run `make xindex` command to compile them.
+ xindex -u -l no -c norsk filename.idx
+
## Change sorting rules
The simplest way to change the default sorting order is to use the
@@ -90,7 +83,7 @@
tailoring "&Ö=Oe"
tailoring "&ö=oe"
-Some languages, like Norwegian sort uppercase letters before lowercase. This
+Some languages, like Norwegian, sort uppercase letters before lowercase. This
can be enabled using `collator_obj:uppercase_first()` function:
local tailoring = function(s) collator_obj:tailor_string(s) end
@@ -102,15 +95,9 @@
tailoring("&ǀ<æ<<<Æ<<ä<<<Ä<ø<<<Ø<<ö<<<Ö<<ő<<<Ő<å<<<Å<<<aa<<<Aa<<<AA")
tailoring("&oe<<œ<<<Œ")
-The `data/common/collation/` directory contains files from the `CLDR` project.
-They contain rules for many languages. The files needs to be normalized to the
-[NFC form](https://en.wikipedia.org/wiki/Unicode_equivalence), for example
-using:
+% More information on a new language support is in the `HACKING.md`
+% document in the [`Lua-UCA` Github repo](https://github.com/michal-h21/lua-uca/blob/master/HACKING.md).
- cat cs.xml | uconv -x any-nfc -o cs.xml
-
-The `uconv` utility is a part of the [ICU Project](http://userguide.icu-project.org/).
-
### Script reordering
Many languages sort different scripts after the script this language uses. As
@@ -117,7 +104,7 @@
Latin based scripts are sorted first, it is necessary to reorder scripts in
such cases.
-The `collator_obj:reorder` function takes table with scripts that need to be reorderd.
+The `collator_obj:reorder` function takes table with scripts that need to be reordered.
For example Cyrillic can be sorted before Latin using:
collator_obj:reorder {"cyrillic"}
@@ -132,6 +119,5 @@
# What is missing
-- Tailorings for most languages.
- Algorithm for setting implicit sort weights of characters that are not explicitly listed in DUCET.
- Special handling of CJK scripts.
Modified: trunk/Master/texmf-dist/doc/support/lua-uca/lua-uca-doc.pdf
===================================================================
(Binary files differ)
Modified: trunk/Master/texmf-dist/doc/support/lua-uca/lua-uca-doc.tex
===================================================================
--- trunk/Master/texmf-dist/doc/support/lua-uca/lua-uca-doc.tex 2021-09-16 20:17:39 UTC (rev 60522)
+++ trunk/Master/texmf-dist/doc/support/lua-uca/lua-uca-doc.tex 2021-09-16 20:18:12 UTC (rev 60523)
@@ -26,10 +26,12 @@
\begin{document}
\maketitle
\tableofcontents
+\section{Introduction}
\markdownInput{README.md}
-\section{Available languages}
+\section{Available Languages}
+\begin{raggedright}
The \texttt{lua-uca-languages} library provides the following languages:
\bgroup\ttfamily
\begin{luacode*}
@@ -44,7 +46,14 @@
tex.print(table.concat(l, ", "))
\end{luacode*}
\egroup
+\end{raggedright}
+If you want to request a language not listed here, or if you have
+created support code for one, please contact the package author by mail or using
+the issue tracker on the package's Github profile.
+
+\markdownInput{HACKING.md}
+
\section{License}
\markdownInput{LICENSE}
\markdownInput{CHANGELOG.md}
Modified: trunk/Master/texmf-dist/scripts/lua-uca/lua-uca-ducet.lua
===================================================================
--- trunk/Master/texmf-dist/scripts/lua-uca/lua-uca-ducet.lua 2021-09-16 20:17:39 UTC (rev 60522)
+++ trunk/Master/texmf-dist/scripts/lua-uca/lua-uca-ducet.lua 2021-09-16 20:18:12 UTC (rev 60523)
@@ -1 +1 @@
@@ Diff output truncated at 1234567 characters. @@