[tex4ht] htxelatex support for unicode and multiple scripts

Alexandre Roberts alexandre.roberts at gmail.com
Mon Feb 18 18:21:53 CET 2013

Thank you! From my perspective, this is nothing short of miraculous, and I
wish I had consulted you back when I converted my article to .doc format!
It would have saved me much work and countless errors.

Based on what you sent me, I have been able to convert longer
documents. This convinces me that I will be able to compose in the TeX
world and still speak to the Word world. I have only a few small questions
about how to develop a workflow that smooths out this communication
even further.

*Current workflow*
1. paste all tex into word.converter.tex
2. run CVR’s perl script: perl utf2ent.pl word.converter.tex >
3. run SimpleTeX4ht (ODT option) on word.converter-ent.tex
4. biber word.converter-ent
5. run SimpleTeX4ht (ODT option) on word.converter-ent.tex (again)
6. open word.converter-ent.odt in OpenOffice
7. Save As *word.converter-ent.doc*, which is the final result
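Steps 2 to 5 could be scripted. The Python sketch below reuses the command lines from CVR's message quoted further down (utf2ent.pl, htlatex with "xhtml,charset=utf-8,fn-in" -utf8, then biber) and by default only prints them. The odt=True variant assumes that your TeX4ht installation provides `mk4ht oolatex` for OpenDocument output; that has not been tried in this thread. Steps 1, 6 and 7 stay manual here.

```python
import subprocess
import sys

# Options copied from CVR's htlatex command line.
HTLATEX_ARGS = ["xhtml,charset=utf-8,fn-in", "-utf8"]


def build_steps(stem, odt=False):
    """Return steps 2-5 for <stem>.tex as (argv, stdout_file) pairs.

    odt=True swaps htlatex for `mk4ht oolatex`, an ASSUMPTION about the
    local TeX4ht install rather than something tested in this thread.
    """
    ent = stem + "-ent"
    if odt:
        convert = ["mk4ht", "oolatex", ent]
    else:
        convert = ["htlatex", ent] + HTLATEX_ARGS
    return [
        (["perl", "utf2ent.pl", stem + ".tex"], ent + ".tex"),  # step 2
        (convert, None),                                       # step 3
        (["biber", ent], None),                                # step 4
        (convert, None),                                       # step 5
    ]


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv, out_name in steps:
        print(" ".join(argv) + (" > " + out_name if out_name else ""))
        if dry_run:
            continue
        if out_name:
            # utf2ent.pl writes to stdout, so redirect it into the -ent file
            with open(out_name, "wb") as out:
                subprocess.run(argv, stdout=out, check=True)
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    args = sys.argv[1:]
    if args:
        # python build.py word.converter         -> print the commands
        # python build.py word.converter --run   -> run them
        run_steps(build_steps(args[0], odt="--odt" in args),
                  dry_run="--run" not in args)
```

Printing first (the default) makes it easy to compare the commands with what SimpleTeX4ht does before letting the script run them.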

*Remaining issue*

Unicode in my BibTeX entries: while Unicode in my main text and footnotes
now shows up perfectly, the Unicode characters in my BibTeX entries, which
biber processes in step 4, still don't make it through. Is there a way to
get them through as well without modifying my .bib file?
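One possible route that leaves mn.bib untouched is to convert the .bbl file that biber writes in step 4 before step 5 reads it. Running utf2ent.pl itself on the .bbl may be the simplest version of this. The Python sketch below does the same kind of conversion, but its output form, a `\uni{XXXX}` macro, is HYPOTHETICAL; change TEMPLATE to whatever utf2ent.pl emits and CVR's macro lines expect.

```python
import sys
from pathlib import Path

# HYPOTHETICAL target form, e.g. \uni{03B1}. Replace with the form that
# utf2ent.pl produces and that the macro lines in your preamble digest.
TEMPLATE = "\\uni{{{cp:04X}}}"


def asciify_bbl(bbl_path, template=TEMPLATE):
    """Rewrite a biber-generated .bbl in place so that every non-ASCII
    character becomes template.format(cp=code_point). The .bib database is
    never opened. Returns the number of characters replaced."""
    path = Path(bbl_path)
    text = path.read_text(encoding="utf-8")  # assumes biber wrote UTF-8
    pieces, replaced = [], 0
    for ch in text:
        if ord(ch) > 0x7F:
            pieces.append(template.format(cp=ord(ch)))
            replaced += 1
        else:
            pieces.append(ch)
    path.write_text("".join(pieces), encoding="ascii")
    return replaced


if __name__ == "__main__" and len(sys.argv) > 1:
    # Run between step 4 (biber) and step 5, e.g.:
    #   python asciify_bbl.py word.converter-ent.bbl
    n = asciify_bbl(sys.argv[1])
    print(f"{sys.argv[1]}: {n} non-ASCII characters converted", file=sys.stderr)
```

Running biber again regenerates the .bbl, so the conversion has to be repeated after every biber run.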

*Two additional minor issues of convenience*

1. Since my goal is to produce Word documents from my tex source, I would
like tex4ht to produce ODT documents rather than HTML. Right now I manage
this with SimpleTeX4ht (steps 3 and 5); is there a simpler way to do it
from the command line using htlatex?

2. Is there a way to automate step 1? I write the various sections of my
chapter in separate tex files and pull them into the file I compile with
\input{}. I imagine it should be straightforward to have the perl script
begin by replacing each "\input{file}" with the contents of "file.tex" in
the output, but I don't know how to do that.
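Step 1 amounts to inlining every \input{} recursively before utf2ent.pl runs. Here is a minimal Python sketch of that, written as a separate pre-pass rather than a change to CVR's perl script. It assumes only the braced \input{name} form is used (not \input name or \include) and resolves names the way LaTeX does, trying name.tex first and then name, relative to the directory of the main file.

```python
import re
import sys
from pathlib import Path

# Matches \input{name}; \input name (no braces) and \include are not handled.
INPUT_RE = re.compile(r"\\input\{([^}]+)\}")
# The first % not preceded by a backslash starts a comment (\\% is not special-cased).
COMMENT_RE = re.compile(r"(?<!\\)%")


def _resolve(name, base_dir):
    """Mimic LaTeX: try name.tex first, then name as given."""
    for cand in (base_dir / (name + ".tex"), base_dir / name):
        if cand.is_file():
            return cand.resolve()
    raise FileNotFoundError(f"\\input{{{name}}}: no such file in {base_dir}")


def expand_inputs(path, base_dir=None, _stack=()):
    """Return the text of `path` with every \\input{...} replaced by the
    contents of the named file, recursively. Commented-out \\input calls are
    left alone. Names resolve against base_dir (default: the directory of the
    top-level file), since LaTeX resolves them against the directory it runs
    in, not against the including file."""
    path = Path(path).resolve()
    base_dir = Path(base_dir) if base_dir is not None else path.parent
    if path in _stack:
        raise ValueError(f"circular \\input of {path}")
    out = []
    for line in path.read_text(encoding="utf-8").splitlines(keepends=True):
        m = COMMENT_RE.search(line)
        cut = m.start() if m else len(line)
        code, comment = line[:cut], line[cut:]
        code = INPUT_RE.sub(
            lambda mm: expand_inputs(_resolve(mm.group(1).strip(), base_dir),
                                     base_dir, _stack + (path,)),
            code)
        out.append(code + comment)
    return "".join(out)


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python expand_inputs.py chapter1.tex > word.converter.tex
    sys.stdout.write(expand_inputs(sys.argv[1]))
```

The flattened output can then go straight into step 2 in place of the pasted word.converter.tex.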

I would be most grateful for your help with these questions as well. I have
attached a sample of what my ordinary xelatex output looks like
(sample-xelatex.pdf), as well as the same code converted to Word format
using my revision of your workflow (word.converter-ent.*). I also included
a bib file with the entries I cite in my sample (mn.bib).

Best wishes,

On Sun, Feb 17, 2013 at 9:41 PM, Radhakrishnan CV <cvr at river-valley.org> wrote:

> On Sun, Feb 17, 2013 at 4:25 AM, Alexandre Roberts <
> alexandre.roberts at gmail.com> wrote:
>> Dear tex4ht list members,
>> I am about to begin drafting the first chapter of my dissertation in
>> Byzantine and Middle Eastern history. This is the moment when I will commit
>> to the format I will use for writing my entire dissertation. I want it to
>> be XeLaTeX/BibLaTeX, but unless I can come up with a simple workflow for
>> converting the content of my documents to Word format -- the only format
>> that publishers in my field accept -- I will have to give this up and turn
>> to Word/Endnote or Mellel/Bookends for the next three years.
> As far as I understand, TeX4ht won't support fontspec or XeLaTeX's
> technique of using system fonts that have no *.tfm files. In effect, by
> adopting TeX4ht, one is likely to lose the features brought in by XeTeX.
> However, here is another approach.
>    1. We translate all the Unicode characters in the document to their
>    Unicode code points, written in 7-bit ASCII, which TeX4ht digests
>    without trouble. A simple perl script, utf2ent.pl, in the attached
>    archive does the job.
>    2. We run TeX4ht on the output of step 1.
>    3. Open the *.html output in a browser; I believe we get what you
>    wanted. See the attached screenshot of how it appeared in Firefox on my
>    Linux box.
> Here is what I did with your specimen document.
>    1. commented out the lines related to the fontspec package in your
>    source file, alex.tex.
>    2. added four lines of macro code to digest the converted TeX sources
>    3. ran the command: perl utf2ent.pl alex.tex > alex-ent.tex
>    4. ran the command: htlatex alex-ent "xhtml,charset=utf-8,fn-in" -utf8
>    (the fn-in option keeps the footnotes in the same document). I used a
>    local bib file, mn.bib, as I didn't have your bib database; biber was
>    also run along the way to process the bibliography database.
>    5. opened the output, alex-ent.html, in a browser; the result is shown
>    in the attached alex.png.
> Hope this might help you.
> Best regards
> --
> Radhakrishnan
> River Valley
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mn.bib
Type: application/octet-stream
Size: 1403 bytes
Desc: not available
URL: <http://tug.org/pipermail/tex4ht/attachments/20130218/03399f20/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sample-xelatex.pdf
Type: application/pdf
Size: 30268 bytes
Desc: not available
URL: <http://tug.org/pipermail/tex4ht/attachments/20130218/03399f20/attachment-0002.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: word.converter.tex
Type: application/x-tex
Size: 1043 bytes
Desc: not available
URL: <http://tug.org/pipermail/tex4ht/attachments/20130218/03399f20/attachment-0001.tex>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: word.converter-ent.pdf
Type: application/pdf
Size: 85784 bytes
Desc: not available
URL: <http://tug.org/pipermail/tex4ht/attachments/20130218/03399f20/attachment-0003.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: word.converter-ent.doc
Type: application/msword
Size: 21504 bytes
Desc: not available
URL: <http://tug.org/pipermail/tex4ht/attachments/20130218/03399f20/attachment-0001.doc>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: word.converter-ent.odt
Type: application/vnd.oasis.opendocument.text
Size: 8012 bytes
Desc: not available
URL: <http://tug.org/pipermail/tex4ht/attachments/20130218/03399f20/attachment-0001.odt>
