[tex4ht] macro containing a Unicode character

Michal Hoftich michal.h21 at gmail.com
Sat Jul 23 23:56:07 CEST 2016


Dear Alex,

> 
> I would like to produce an ODT document from my XeLaTeX document (using MacTeX
> 2016).
> 
> The necessary code to include Unicode characters (including in Greek and Arabic
> script) was kindly provided by CV Radhakrishnan and Michal Hoftich back in
> February 2013. But I am running into a new difficulty: converting a document
> that defines LaTeX macros that have Unicode characters in them. (The reason I
> want this is to enable me to use macros within a Right-to-Left script, Arabic.
> Mixing up RTL and LTR scripts in a text editor, especially when punctuation --
> or braces {} -- is involved, tends to make the source file unreadable.)
> 
> I am attaching a MWE in two files:
> 
> 1. `main.tex`: standalone file that includes macro definition
> 2. `utf2ent.pl`: the Perl script devised by CVR to keep Unicode in the new
> document
> 
> The script I run to compile this is:
> 
>      # CVR's script to preserve Unicode characters
>      perl utf2ent.pl main.tex > main-ent.tex
>      
>      # tex4ht
>      mk4ht oolatex main-ent "xhtml, charset=utf-8"  -utf8
> 

There are three problems:

1. Macros with Unicode names are supported only by Unicode engines, i.e.
XeTeX and LuaTeX. mk4ht oolatex uses 8-bit pdflatex, so it cannot
support them.

2. utf2ent converts all Unicode characters to entities, including the
one in your command name, so you end up with something like
'\\entity{1589}' in your code.

3. $\langle$ and $\rangle$ produce incorrect MathML code, see

https://puszcza.gnu.org.ua/bugs/?278

The ODT format uses MathML, so the resulting file may be invalid.
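
Problem 2 can be illustrated with a small Python sketch. It is not the
actual utf2ent.pl (which is only attached to the original message); it
merely assumes, based on the example above, that every non-ASCII
character is replaced by \entity{N}, where N is the decimal code point.
The macro name used here (the Arabic letter with code point 1589) and
the function name to_entities are hypothetical.

```python
def to_entities(source: str) -> str:
    r"""Replace every non-ASCII character with \entity{<decimal code point>}.

    Assumption: this mimics what a blanket Unicode-to-entity filter does;
    it does not try to tell command names apart from ordinary text.
    """
    return "".join(
        ch if ord(ch) < 128 else "\\entity{%d}" % ord(ch)
        for ch in source
    )


# Hypothetical macro call: a backslash followed by U+0635 (decimal 1589).
line = "\\\u0635{text}"
converted = to_entities(line)
print(converted)  # prints \\entity{1589}{text}
```

The backslash that started the command name is now followed by the
backslash of \entity, so TeX no longer sees the original macro at all.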

Now what can be done:

You need to use a Unicode engine. For now that means LuaTeX, because
XeTeX support in tex4ht is currently broken. Fortunately, you can still
use XeTeX to produce the PDF and modify only some macros for tex4ht.

With LuaTeX, it is possible to keep Unicode characters without calling
external scripts to convert them to Unicode entities. See

http://michal-h21.github.io/samples/helpers4ht/fontspec.html

for more details. I've modified your file to use alternative4ht and to
fix the problem with angles. Two new macros are introduced: \textlangle
and \textrangle, which are redefined in the config file to use XML
entities directly, instead of math mode.

I've also found that the angles are swapped in the ODT and HTML output.
This is probably because those formats apply the BIDI algorithm and
don't expect that the user has swapped them already (you use
\rangle#1\langle). In the config file I've redefined each angle command
to produce the opposite bracket from the one its name suggests, so that
they are rendered correctly.

The last problem is that mk4ht doesn't support LuaTeX, so you need a
different way to compile the document. You can use:

make4ht -ulm draft -c hello.cfg main.tex "xhtml,ooffice" "ooffice/! -cmozhtf -utf8" " -cooxtpipes -coo"

(It might be best to save this as a script, as the command is not
really human-friendly :)
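
One possible way to wrap the call in a script is sketched below in
Python. The function names make4ht_command and build_odt are made up
for this example; the arguments are copied from the command above, and
the comments on the three quoted strings follow the usual make4ht
argument order as I understand it, so treat them as an assumption.

```python
import subprocess


def make4ht_command(tex_file="main.tex", config="hello.cfg"):
    """Return the argument list for the make4ht call quoted above."""
    return [
        "make4ht", "-ulm", "draft", "-c", config, tex_file,
        "xhtml,ooffice",             # assumed: options for tex4ht.sty
        "ooffice/! -cmozhtf -utf8",  # assumed: options for tex4ht
        " -cooxtpipes -coo",         # assumed: options for t4ht
    ]


def build_odt(tex_file="main.tex", config="hello.cfg"):
    """Run make4ht; raises CalledProcessError if the build fails."""
    subprocess.run(make4ht_command(tex_file, config), check=True)
```

Calling build_odt() from the directory that contains main.tex and
hello.cfg then runs the same command without retyping the quoted
option strings.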

Modified main.tex and hello.cfg are attached. main.tex can be compiled
with xelatex to PDF, all needed changes for tex4ht are in the hello.cfg
file.

Best regards,
Michal
-------------- next part --------------
A non-text attachment was scrubbed...
Name: main.tex
Type: application/postscript
Size: 1403 bytes
Desc: not available
URL: <http://tug.org/pipermail/tex4ht/attachments/20160723/e7b20a50/attachment.ai>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hello.cfg
Type: text/x-tex
Size: 376 bytes
Desc: not available
URL: <http://tug.org/pipermail/tex4ht/attachments/20160723/e7b20a50/attachment.bin>

