[pdftex] writing XML from tex source

Thierry Bouche thierry.bouche at ujf-grenoble.fr
Thu Mar 30 13:29:00 CEST 2006

Hi pdftex,

 I know this is slightly off topic, but the most knowledgeable people
 are around here, so... I'll take the risk!

 I have developed a simple system to write XML metadata for a journal
 article into an auxiliary file from its TeX source.

 The approach is extremely simple, as I am no guru: I overload LaTeX
 commands so that their argument is stored in a token list, then I
 \write this token list literally inside an XML tag.
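
 A minimal sketch of this kind of overloading, for \title only, could
 look like the following. It is an illustration under assumptions, not
 the actual code: the names \xmlout, \xmltitle and \origtitle, and the
 choice of \jobname.xml as the auxiliary file, are all hypothetical.

```latex
\documentclass{article}

% Hypothetical names, chosen for this sketch only:
% \xmlout (output stream), \xmltitle (token register), \origtitle.
\newwrite\xmlout
\immediate\openout\xmlout=\jobname.xml
\newtoks\xmltitle

% Overload \title: store the argument unexpanded in a token register,
% pass it on to the original \title, then write it inside an XML tag.
\let\origtitle\title
\renewcommand{\title}[1]{%
  \xmltitle={#1}%
  \origtitle{#1}%
  % \the\xmltitle is expanded only one level inside \write, so the
  % stored tokens are written out literally.
  \immediate\write\xmlout{<title>\the\xmltitle</title>}%
}

\title{A \emph{simple} example}
\author{A. N. Author}

\begin{document}
\maketitle
\immediate\closeout\xmlout
\end{document}
```

 Because the token list is written without further expansion, any
 markup in the argument (here \emph) ends up verbatim in the XML file.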

 This is enough to get something working, provided a postprocessor
 picks up the fields where some TeX code might remain and produces
 a valid XML file. But I think it should be feasible to produce a valid
 XML file in the first place in most cases, using techniques such as
 those demonstrated by hyperref's PDF bookmarks mechanism.

 My problem is that I'm not enough of a wizard to extract the hyperref
 code into a standalone macro that would convert a given TeX string
 stored in a token list into a pure text string, dropping any
 meaningless TeX code such as \\, \relax, etc., and ideally converting
 all text characters to UTF-8 and escaping XML reserved characters such
 as <.

 Does anybody have a hint about this?

