[l2h] Generating XHTML

Fri Nov 5 00:17:54 CET 2004

Hi Fred,

On 05/11/2004, at 8:24 AM, Fred L. Drake, Jr. wrote:

> I know this has come up before, but I haven't seen it discussed recently,
> so perhaps the answers have changed.  ;-)
>
> Is there any way to get LaTeX2HTML to generate XHTML instead of classic
> SGML-based HTML?  I'd really like to move the Python documentation into
> the "new world" as much as possible.

No, I've not done any work on this yet.

But HTML Tidy can be used for such a conversion, run as a
post-processor after the LaTeX2HTML job.

The intro page at http://www.w3.org/People/Raggett/tidy/
describes the boolean option:

output-xhtml: bool
If set to yes, Tidy will generate the pretty printed output, writing it
as extensible HTML. The default is no. This option causes Tidy to set
the doctype and default namespace as appropriate to XHTML. If a
doctype or namespace is given they will be checked for consistency with
the content of the document. In the case of an inconsistency, the
corrected values will appear in the output. For XHTML, entities can be
written as named or numeric entities according to the value of the
"numeric-entities" property. The tags and attributes will be output in
the case used in the input document, regardless of other options.

>
> If there's not a way to do this with LaTeX2HTML, pointers to some other
> LaTeX-to-XML tool would be appreciated.  (Especially if it doesn't
> involve TeXML!)

You can configure LaTeX2HTML to run Tidy automatically on every page,
after all other processing has been completed.

There are 3 places in LaTeX2HTML where you could install such an extra
post-processing step, by defining your own Perl subroutine:

   &post_post_process

   &document_post_post_process

These two are Perl subroutines that will be called (if defined) to act on
the contents of the  $_  variable, before it is written to the  .html  files.
( &document_post_post_process  acts a little later than  &post_post_process,
*after* the <ADDRESS> tags have been added.)
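
As a minimal sketch of such a hook (not code from LaTeX2HTML itself), the
following could go in a  .latex2html-init  file. It assumes only what is
described above: the page text is in  $_  when the subroutine is called.
The choice of tags to rewrite is purely illustrative, and this is no
substitute for a full Tidy run.

```perl
# Hypothetical hook for a .latex2html-init file.
# Assumption: the page text is in $_ when this is called, as described above.
sub post_post_process {
    # Rewrite empty elements such as <BR> or <HR WIDTH="50%"> in the
    # XHTML style <br /> or <hr WIDTH="50%" />.  Only the tag name is
    # lower-cased; attributes are passed through unchanged.
    s{<(br|hr)\b([^<>]*?)\s*/?>}{'<' . lc($1) . $2 . ' />'}gie;
}

# Keep the usual final "1;" line at the end of the init file.
```

Defining  &document_post_post_process  in the same way would apply the
change after the <ADDRESS> tags have been added.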

   &html_validate

This subroutine is called subject to the values of certain variables.
Currently it is defined to act on the completed HTML pages; viz.

sub html_validate {
     my($extn) = $EXTN;
     if (!($EXTN =~ /^\.html?$/i)) {
         $extn =~ s/^[^\.]*(\.html?)$/$1/;
     }
     print "\n *** Validating ***\n";
     system("$HTML_VALIDATOR *$extn");
}

As you can see, this makes a system call to the program named in
$HTML_VALIDATOR, which acts on all files with the right extension that
happen to live in the $DESTDIR directory, where the web pages are being built.

You can easily write an alternative subroutine to act instead,
placing it in a  .latex2html-init  file.
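
For example, here is a rough sketch of a replacement that runs Tidy over
the finished pages. It is an untested outline, not part of LaTeX2HTML.
It assumes that a  tidy  executable is on your PATH, that the current
directory is the one holding the generated pages (as the  *$extn  glob
in the existing version implies), and that the variables which control
whether  &html_validate  gets called are set as before.

```perl
# Hypothetical replacement for &html_validate, for a .latex2html-init file.
# Assumptions: `tidy` is on the PATH, and the current directory holds the
# generated pages, as the glob in the original subroutine implies.
sub html_validate {
    my $extn = $EXTN;
    # Same extension handling as the original subroutine.
    $extn =~ s/^[^\.]*(\.html?)$/$1/ unless $EXTN =~ /^\.html?$/i;

    print "\n *** Converting to XHTML with Tidy ***\n";
    foreach my $file (glob("*$extn")) {
        # -q: quiet, -asxhtml: write XHTML, -m: modify the file in place
        my $status = system('tidy', '-q', '-asxhtml', '-m', $file);
        if ($status == -1) {
            print " could not run tidy: $!\n";
            last;
        }
        # Tidy exits with 1 when it only has warnings, 2 when it found errors.
        print " tidy reported errors in $file\n" if ($status >> 8) >= 2;
    }
}

# Keep the usual final "1;" line at the end of the init file.
```

Calling Tidy once per file, rather than once with a shell glob, makes it
possible to report which page failed.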

>
> Thanks!
>

Hope this helps,

	Ross

>
>   -Fred
>
> -- 
> Fred L. Drake, Jr.  <fdrake at acm.org>