[tex4ht] htxelatex support for unicode and multiple scripts

Alexandre Roberts alexandre.roberts at gmail.com
Mon Feb 18 18:34:32 CET 2013

Hello Michal,

My hearty thanks to you too! I didn't see your message before I sent my
reply. Radhakrishnan's Perl script still seems ideal for my purposes,
since (as far as I understand) it converts all Unicode characters, but
you have anticipated my "minor issue of convenience" #1. I can now modify
my workflow to:

1. Paste all the TeX source into word.converter.tex.
2. Run:

    perl utf2ent.pl word.converter.tex > word.converter-ent.tex   # CVR's Perl script
    mk4ht oolatex word.converter-ent "xhtml, charset=utf-8" -utf8
    biber word.converter-ent
    mk4ht oolatex word.converter-ent "xhtml, charset=utf-8" -utf8

3. Open word.converter-ent.odt in OpenOffice.
4. Save As *word.converter-ent.doc*, which is the final result.
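The command-line part of the four steps above can also be scripted. This is a minimal Python sketch, assuming perl, mk4ht and biber are on the PATH, utf2ent.pl sits in the working directory, and the document uses biblatex with the biber backend. The function names build_steps and run are hypothetical, and the OpenOffice Save As step stays manual. By default it only prints the commands; pass --run to execute them.

```python
from __future__ import annotations

import shlex
import subprocess
import sys
from pathlib import Path

# tex4ht options exactly as used in the workflow above.
OPTIONS = "xhtml, charset=utf-8"


def build_steps(source: str) -> list[tuple[list[str], str | None]]:
    """Return (argv, stdout_file) pairs for the command-line part of the workflow.

    stdout_file is only set for the utf2ent.pl step, whose output the
    original commands redirect into a new .tex file with `>`.
    """
    ent = Path(source).stem + "-ent"  # word.converter.tex -> word.converter-ent
    tex4ht = ["mk4ht", "oolatex", ent, OPTIONS, "-utf8"]
    return [
        (["perl", "utf2ent.pl", source], ent + ".tex"),  # Unicode -> ASCII entities
        (tex4ht, None),          # first pass (biblatex writes the .bcf for biber)
        (["biber", ent], None),  # process the bibliography into a .bbl
        (tex4ht, None),          # second pass, now including the bibliography
    ]


def run(steps: list[tuple[list[str], str | None]], execute: bool = False) -> None:
    """Print each command; actually run it only when execute is True."""
    for argv, out in steps:
        print(shlex.join(argv) + (f" > {out}" if out else ""))
        if not execute:
            continue
        if out:
            # Equivalent of the shell redirection `> out`.
            with open(out, "wb") as fh:
                subprocess.run(argv, stdout=fh, check=True)
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    args = [a for a in sys.argv[1:] if a != "--run"]
    run(build_steps(args[0] if args else "word.converter.tex"),
        execute="--run" in sys.argv)
```

Run it from the directory that holds the .tex file, e.g. python convert.py word.converter.tex --run (the script name is hypothetical). check=True stops the sequence at the first command that fails, so a broken biber run is not silently followed by a second mk4ht pass.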

The BibTeX issue remains; I hope there is a simple fix!

All best,

On Mon, Feb 18, 2013 at 9:17 AM, Michal Hoftich <michal.h21 at gmail.com> wrote:

> Hello,
> you can also use the \DeclareUnicodeCharacter macro from the `inputenc`
> package. It takes two parameters: the first is the hexadecimal Unicode
> code point of the character, the second is the macro to be used, in this
> case \entity{decimal code point of the character}.
> I created a simple package, `greek-arabic-4ht.sty`, which covers the full
> Unicode range for Greek, Arabic and Latin Extended-A, but not all the
> characters used in your document are covered! You can simply add them to
> the package whenever you come across an undeclared character.
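Declarations for a whole Unicode block can be generated rather than typed. This is a minimal Python sketch, assuming the form Michal describes (hexadecimal code point first, \entity{decimal code point} second); the definition of \entity is not shown in this thread, and the function names declarations and package_body are hypothetical. The ranges are the standard Unicode blocks Latin Extended-A (U+0100 to U+017F), Greek and Coptic (U+0370 to U+03FF) and Arabic (U+0600 to U+06FF). Unassigned code points are skipped according to Python's own Unicode database.

```python
import unicodedata

# Standard Unicode block ranges (inclusive) for the scripts named above.
BLOCKS = {
    "Latin Extended-A": (0x0100, 0x017F),
    "Greek and Coptic": (0x0370, 0x03FF),
    "Arabic": (0x0600, 0x06FF),
}


def declarations(first: int, last: int) -> list[str]:
    """One \\DeclareUnicodeCharacter line per assigned code point in [first, last]."""
    lines = []
    for cp in range(first, last + 1):
        if unicodedata.category(chr(cp)) == "Cn":  # skip unassigned code points
            continue
        # inputenc takes the code point in hex; \entity gets it in decimal.
        lines.append(f"\\DeclareUnicodeCharacter{{{cp:04X}}}{{\\entity{{{cp}}}}}")
    return lines


def package_body() -> str:
    """All declarations, each block preceded by a LaTeX comment naming it."""
    parts = []
    for name, (first, last) in BLOCKS.items():
        parts.append(f"% {name}")
        parts.extend(declarations(first, last))
    return "\n".join(parts) + "\n"


if __name__ == "__main__":
    print(package_body(), end="")
```

The output can be redirected into a file and pasted into your own package; characters outside these three blocks still have to be added by hand.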
> You can edit your preamble so that the packages needed when tex4ht is
> running are loaded only in that case, like this:
> --------------------
> \documentclass[12pt]{memoir}
> \makeatletter
> \@ifpackageloaded{tex4ht}{%
> \newcommand{\greek}[1]{#1}
> \usepackage{greek-arabic-4ht}
> }{%
> \usepackage{fontspec}
> \usepackage{xunicode}
> % Choose roman font (Mapping=tex-text turns ``, '' into curly quotes, -- into an en dash, etc.).
> \setromanfont[Mapping=tex-text]{Palatino}
> % Greek (normally, use first two lines; to make simple file for export to
> % Word, use 3rd line only)
> \newfontfamily{\gr}{New Athena Unicode}
> \newcommand{\greek}[1]{{\gr #1}}
> \newfontfamily\arabicfont[Script=Arabic,Scale=1.2,WordSpace=2]{USAMA NASKH}
> \usepackage{bidi}
> }
> \makeatother
> ------------------
> If your target format is Word, you can convert your document with the
> command:
>     mk4ht oolatex alex "xhtml, charset=utf-8" -utf8
> This will produce a file in OpenOffice format, which can easily be
> converted to Word. A sample is also included in the attachment.
> Regards,
> Michal
> 2013/2/18 Radhakrishnan CV <cvr at river-valley.org>
>> On Sun, Feb 17, 2013 at 4:25 AM, Alexandre Roberts <alexandre.roberts at gmail.com> wrote:
>>> Dear tex4ht list members,
>>> I am about to begin drafting the first chapter of my dissertation in
>>> Byzantine and Middle Eastern history. This is the moment when I will commit
>>> to the format I will use for writing my entire dissertation. I want it to
>>> be XeLaTeX/BibLaTeX, but unless I can come up with a simple workflow for
>>> converting the content of my documents to Word format -- the only format
>>> that publishers in my field accept -- I will have to give this up and turn
>>> to Word/Endnote or Mellel/Bookends for the next three years.
>> As far as I understand, TeX4ht won't support fontspec or XeLaTeX's use of
>> system fonts that do not have *.tfm files. In effect, by adopting TeX4ht,
>> one is likely to lose the features brought in by XeTeX.
>> However, here is another approach.
>>    1. We translate all the Unicode characters in the document into their
>>    Unicode code points written in 7-bit ASCII, which TeX4ht digests without
>>    trouble. A simple Perl script, utf2ent.pl, in the attached archive
>>    does the job.
>>    2. We run TeX4ht on the output of step 1.
>>    3. Open the resulting *.html file in a browser; I believe you will get
>>    what you wanted. See the attached screenshot of how it appeared in
>>    Firefox on my Linux box.
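Step 1 amounts to a character-by-character substitution. The thread does not show what utf2ent.pl actually emits, so this Python sketch is a stand-in, not CVR's script: it assumes the same \entity{decimal code point} form Michal uses, which still needs a matching \entity definition in the preamble. The function name to_entities is hypothetical, and the sketch does not treat verbatim text, comments or label keys specially.

```python
import sys


def to_entities(text: str) -> str:
    """Replace every non-ASCII character with \\entity{<decimal code point>}.

    ASCII characters (code points 0-127) pass through unchanged, so the
    result is plain 7-bit ASCII.
    """
    return "".join(ch if ord(ch) < 128 else f"\\entity{{{ord(ch)}}}" for ch in text)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python to_entities.py alex.tex > alex-ent.tex
    with open(sys.argv[1], encoding="utf-8") as fh:
        sys.stdout.write(to_entities(fh.read()))
```

Python strings iterate over code points, so characters outside the Basic Multilingual Plane also become a single \entity{...} each.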
>> Here is what I did with your specimen document.
>>    1. commented out the lines related to the fontspec package in your
>>    source, which I saved as alex.tex.
>>    2. added four lines of macro code to digest the converted TeX sources.
>>    3. ran the command: perl utf2ent.pl alex.tex > alex-ent.tex
>>    4. ran the command: htlatex alex-ent "xhtml,charset=utf-8,fn-in" -utf8
>>    (the fn-in option keeps the footnotes in the same document). I used a
>>    local bib file, mn.bib, as I didn't have your bib database; biber was
>>    also run in between to process the bibliography database.
>>    5. opened the output, alex-ent.html, in a browser. The result is shown
>>    in the attached alex.png.
>> Hope this might help you.
>> Best regards
>> --
>> Radhakrishnan
>> River Valley