[pdftex] writing XML from tex source
thierry.bouche at ujf-grenoble.fr
Thu Mar 30 13:29:00 CEST 2006
I know I'm slightly off topic, but most of the knowledgeable
people are around here, so... I'll take the risk!
I have developed a simple system that writes XML metadata for a journal
article into an auxiliary file from its TeX source.
The mechanism is extremely simple, as I am no guru: I overload LaTeX
commands so that their argument is stored in a token register, then I
\write this token register literally into an XML tag.
This is enough to get something working, provided a postprocessor
picks up the fields where some TeX code might remain and produces
a valid XML file. But I think it should be feasible, in most cases, to
produce a valid XML file in the first place, using techniques such as
those demonstrated by hyperref's PDF bookmarks mechanism.
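The postprocessing route can be sketched in Python. This is only a minimal sketch under stated assumptions: the auxiliary file is assumed to hold one `<tag>content</tag>` record per line, and the `article` root element and the function names `strip_tex` and `postprocess` are hypothetical. The TeX cleanup is deliberately crude; the point is that only fields still containing TeX code get touched, and that building the output with `xml.etree.ElementTree` takes care of escaping `&`, `<` and `>`.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical aux-file layout: one "<tag>content</tag>" record per line,
# where the content may still contain raw TeX code.
RECORD = re.compile(r"^<(\w+)>(.*)</\1>$")

# A control word (\relax, \emph, \section*) or a control symbol (\\, \&, \').
TEX_CMD = re.compile(r"\\(?:[A-Za-z]+\*?|.)")


def strip_tex(text):
    """Crude cleanup: drop TeX commands (\\ becomes a space) and group braces."""
    text = TEX_CMD.sub(lambda m: " " if m.group() == "\\\\" else "", text)
    text = text.replace("{", "").replace("}", "")
    return " ".join(text.split())  # collapse runs of whitespace


def postprocess(aux_lines, root_tag="article"):
    root = ET.Element(root_tag)
    for line in aux_lines:
        m = RECORD.match(line.strip())
        if not m:
            continue  # ignore anything that is not a record
        tag, content = m.groups()
        if "\\" in content or "{" in content:
            content = strip_tex(content)  # only fields with leftover TeX are cleaned
        ET.SubElement(root, tag).text = content
    # ElementTree escapes &, < and > in text nodes, so the result is well-formed.
    return ET.tostring(root, encoding="unicode")
```

For example, `postprocess(["<title>Sets & \\emph{maps}</title>"])` should give `<article><title>Sets &amp; maps</title></article>`. Note that the cleanup simply discards accent commands rather than converting them, which is exactly the gap the question is about.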
My problem is that I'm not enough of a wizard to extract the hyperref
code into a standalone macro that would convert a given TeX
string stored in a token into a purely textual string, dropping any
meaningless TeX code such as \\, \relax, etc., ideally converting
all text characters to UTF-8, and dealing with XML reserved characters.
Does anybody have any hints about that?
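Until a TeX-side solution exists (presumably built on hyperref's `\pdfstringdef`, which is what the bookmarks code uses), the conversion asked for here could at least be done in the postprocessor. The following is a minimal Python sketch, not hyperref's algorithm: the function name `tex_to_xml_text` and the small accent table are hypothetical, only a handful of LaTeX accent commands on plain ASCII letters are handled, and any other control word is simply dropped.

```python
import re
import unicodedata
from xml.sax.saxutils import escape

# Combining marks for a few LaTeX accent commands (illustrative, not exhaustive).
ACCENTS = {"'": "\u0301", "`": "\u0300", "^": "\u0302",
           '"': "\u0308", "~": "\u0303", "c": "\u0327"}

# Matches \'e, \'{e}, \"o, \c{c}, \c c ... but not \cite.
ACCENT_RE = re.compile(r"""\\(['`^"~]|c(?![A-Za-z]))\s*(?:\{([A-Za-z])\}|([A-Za-z]))""")


def _accent(m):
    base = m.group(2) or m.group(3)
    # Base letter + combining mark, composed into one UTF-8 character (NFC).
    return unicodedata.normalize("NFC", base + ACCENTS[m.group(1)])


def tex_to_xml_text(tex):
    s = ACCENT_RE.sub(_accent, tex)       # accented letters -> Unicode
    s = s.replace("\\\\", " ")            # \\ line breaks -> spaces
    s = re.sub(r"\\[A-Za-z]+\*?", "", s)  # drop other control words (\relax, \emph)
    s = re.sub(r"\\(.)", r"\1", s)        # \& \% \_ \$ -> the literal character
    s = s.replace("{", "").replace("}", "")
    s = " ".join(s.split())               # collapse whitespace
    s = s.replace("~", "\u00a0")          # TeX tie -> no-break space
    return escape(s)                      # escape &, < and > for XML
```

With these rules, `tex_to_xml_text(r"Fran\c{c}ois")` yields `François`, and `R\&D` becomes `R&amp;D`. Anything beyond simple accents (math, `\i`, user macros) would need either more rules or the TeX-level expansion that hyperref performs.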