[tex4ht] htxelatex support for unicode and multiple scripts

Alexandre Roberts alexandre.roberts at gmail.com
Sat Feb 16 23:55:40 CET 2013


Dear tex4ht list members,

I am about to begin drafting the first chapter of my dissertation in
Byzantine and Middle Eastern history. This is the moment when I will commit
to the format I will use for writing my entire dissertation. I want it to
be XeLaTeX/BibLaTeX, but unless I can come up with a simple workflow for
converting the content of my documents to Word format -- the only format
that publishers in my field accept -- I will have to give this up and turn
to Word/EndNote or Mellel/Bookends for the next three years.

I am using MacTeX 2012 on Mac OS X 10.7.5.


*Goal*

*What I need:*
1. Footnotes and support for BibTeX. This is why I chose tex4ht; the
ability to format TeX footnotes as Word footnotes is key.
2. Full support for Unicode. This includes French, Italian and German
accents as well as diacritics in the Latin script which I use to represent
Arabic (e.g. *wa-laʿanahu wa-laʿana madhhabahu*) and Syriac (e.g. *ṭubhaw
l-gabhrā dhabh-ʾurḥā dh-ʿawāle lā hallekh*) in the body and footnotes of my
text as well as in BibTeX entries. This also includes full support for
Greek Unicode (e.g. Ἰγνάτιός τε ὁ ἐν τῇ περιοικίδι Μελιτηνῆς καὶ Ζαχάκιος ὁ
Ἄρκης καὶ ὁ ἀπὸ Μεσοποταμίας Μωϋσῆς).

*What I would love to have:*
3. Support for Arabic and Syriac scripts (arabxetex, xesyriac).

Without #3, I think I could still commit to LaTeX and leave out the
right-to-left scripts in publications if I must. But without #1 and #2, I
would be a fool to take the plunge: I recently had to publish a paper which
I wrote, idealistically, in LaTeX, and the conversion process was messy,
error-prone, and far too time-consuming to repeat with a longer work.


*How far I have gotten*

But I still have hope. I have prepared a barebones sample of the kind of
document I would like to convert:

%!TEX TS-program = xelatex
%!TEX encoding = UTF-8 Unicode
\documentclass[12pt]{memoir}
\usepackage{fontspec}
\usepackage{xunicode}
% Choose roman font (the tex-text mapping turns ``, '', -- etc. into
% typographic quotes and dashes).
\setromanfont[Mapping=tex-text]{Palatino}
% Greek (normally, use first two lines; to make simple file for export to
% Word, use 3rd line only)
%\newfontfamily{\gr}{New Athena Unicode}
%\newcommand{\greek}[1]{{\gr #1}}
\newcommand{\greek}[1]{#1}
% Arabic
%\usepackage[novoc,fdf2alif]{arabxetex}
%\newfontfamily\arabicfont[Script=Arabic,Scale=1.2,WordSpace=2]{USAMA NASKH}
%\usepackage{bidi}
% Placeholder: takes the optional mode argument and discards the Arabic text.
\newcommand{\textarab}[2][utf]{[[INSERT ARABIC QUOTE HERE]]}
% Bibliography etc
\usepackage[american]{babel}
\usepackage{csquotes}
\usepackage[style=historian, babel=hyphen, mincrossrefs = 1,
usetranslator=true, printnoterefs=false, backend=biber]{biblatex}
\addbibresource{/Users/alexandre/Dropbox/bib-dbs/alexhistory.bib}

\begin{document}
…One recension reached to the beginning of al-Qāhir's caliphate
(320--2/932--4), the very year when he ``was made patriarch of Alexandria
(\emph{ṣuyyira… baṭriyarkan ʿalā l-Iskandarīya})." Others contain
``additions (\emph{ziyādāt})" not in the original, which Yaḥyā knows
because ``I saw the copy of the original itself, as well as other,
different copies of the book, and the end of its contents is up to the
caliphate of al-Rāḍī (322--9/934--40)."\footnote{(citation)
\textarab[utf]{ورايت نسخة الاصل نفسها ونسخ اخر للكتاب غيرها ونهاية ما فيها
الى خلافة الراضي}.}
… But don't tell me…\footnote{\greek{ἀλλὰ μὴ εἶπέ μοι...}}
\printbibliography
\end{document}


As you can see, at the moment I am not even trying to keep the Arabic
script. (If you have any ideas about how I might do that, I'd love to hear
them too!) When I execute the command

htxelatex word.converter.barebones.tex "xhtml,charset=utf-8" -utf8


I get errors of the form "! LaTeX Error: Command `\acute' already defined
in `'." More importantly, the resulting HTML file is essentially blank:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!--http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd-->
<html xmlns="http://www.w3.org/1999/xhtml"
>
<head><title></title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="generator" content="TeX4ht (http://www.cse.ohio-state.edu/~gurari/TeX4ht/)" />
<meta name="originator" content="TeX4ht (http://www.cse.ohio-state.edu/~gurari/TeX4ht/)" />
<!-- xhtml,charset=utf-8,html -->
<meta name="src" content="word.converter.barebones.tex" />
<meta name="date" content="2013-02-16 14:38:00" />
<link rel="stylesheet" type="text/css" href="word.converter.barebones.css" />
</head><body
>
<!--l. 31--><p class="noindent" >

<span class="footnote-mark"><a
href="word.converter.barebones2.html#fn1x0"><sup class="textsuperscript">
</sup></a></span><a
 id="x1-2f1"></a>
</p><!--l. 33--><p class="indent" >   <span class="footnote-mark"><a
href="word.converter.barebones3.html#fn2x0"><sup class="textsuperscript">
</sup></a></span><a
 id="x1-3f2"></a>
</p>

</body></html>


(The separate HTML files representing footnotes are likewise blank.)

I can get this whole process to work and output HTML or ODT, as long as I
don't insist on using fontspec, which seems to be the key to being able to
include diacritical marks, accents, and Greek. But of course, that's
precisely what I need!
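
One possible way to keep both setups in a single file, sketched under the
assumption that tex4ht defines \HCode before the preamble is read (the
usual test for a tex4ht run) and that the fontspec-free branch is enough
for conversion. The branch contents are illustrative only:

\documentclass[12pt]{memoir}
\ifdefined\HCode
  % tex4ht run (e.g. htxelatex): do not load fontspec or xunicode
\else
  % ordinary xelatex run: full font setup for the PDF
  \usepackage{fontspec}
  \setromanfont[Mapping=tex-text]{Palatino}
\fi
\begin{document}
al-Qāhir, ṣuyyira, ἀλλὰ μὴ εἶπέ μοι
\end{document}

Compiled with plain xelatex this takes the fontspec branch; under
htxelatex it takes the other. This only sidesteps fontspec during
conversion. It does not explain why the output is blank when fontspec is
loaded.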


*Appeal to the tex4ht list*

I feel like I'm close, but since I am only really an amateur and don't have
much of a sense of how the underbelly of TeX, tex4ht etc. works, I hope that
my appeal will reach the ears and screens of those who do!

I would be most grateful for any help you may be able to offer!

Alex

--
Alexandre M. Roberts
Department of History
UC Berkeley
-------------- next part --------------
A non-text attachment was scrubbed...
Name: word.converter.barebones.log
Type: application/octet-stream
Size: 64193 bytes
Desc: not available
URL: <http://tug.org/pipermail/tex4ht/attachments/20130216/59cfb746/attachment-0001.obj>

