[tex4ht] validity of tex4ht HTML-code output

Thu Jul 28 10:24:04 CEST 2011

On Thu, Jul 28, 2011 at 12:56 AM, Ulrike Fischer <news3 at nililand.de> wrote:

> Am Thu, 28 Jul 2011 00:19:57 -0700 schrieb Johannes Wilm:
>
>
> > What I wonder though is what the state of the HTML that is being output
> > really is. It seems to me specifically that:
> >
> > a. almost none of the <p>-tags are closed
>
> Use the xhtml-Option I mentioned earlier.
>

Ah, yes, that's probably what I should have done to start with. Now when I
switch html for xhtml, it somehow breaks my SVG-fixing script. I didn't know
that the html option would produce partially invalid HTML, as it seems to.

>
> > b. an element that is used a lot are "tspans" which the W3C validation
> > claims to not have heard about.
>
> Make a small, complete example that shows how you got this
> element.
>

I used the options:

\usepackage[html,fn-in,png,charset=utf-8]{tex4ht}

But given that things start breaking again when I change to xhtml, I think I
will leave all that alone.

What worked for me is another quick-fix, including the following two lines
in my script:

rpl tspan span *html
tidy -utf8 -m -c --drop-proprietary-attributes true *.html
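
For anyone without rpl at hand, here is a minimal sketch of what the tspan
replacement step amounts to, written in Python. It is not the script mentioned
in this mail; the names rename_tspan and fix_files are hypothetical, and unlike
a plain string replacement it only renames the tag itself, on the assumption
that the word "tspan" in running text should be left alone.

```python
import re
from pathlib import Path

# Matches only the tag name in opening (<tspan ...>) and closing (</tspan>) tags.
_TSPAN_TAG = re.compile(r"<(/?)tspan\b")


def rename_tspan(html: str) -> str:
    """Rename <tspan> elements to <span>, leaving all other text untouched."""
    return _TSPAN_TAG.sub(r"<\1span", html)


def fix_files(directory: str, pattern: str = "*html") -> int:
    """Rewrite matching files in place; return how many files were changed."""
    changed = 0
    for path in Path(directory).glob(pattern):
        if not path.is_file():
            continue
        original = path.read_text(encoding="utf-8")
        fixed = rename_tspan(original)
        if fixed != original:
            path.write_text(fixed, encoding="utf-8")
            changed += 1
    return changed
```

Calling fix_files(".") would process the same files as the *html glob in the
current directory; tidy can then be run on the result as before.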

Unless I have overlooked something, it all seems to work out now. And the
epub files that Calibre can produce from it seem to work as well.

I guess everything is currently changing, like

bibtex -> biblatex
pdftex -> luatex
8 bit -> utf-8
PDF -> EPUB
PNG -> SVG
Kile -> LyX

Maybe that's why everything looks like it is in a semi-broken state right
now. I haven't been around long enough to know whether things always are
like this. I hope that over time I will be able to remove more and more
components of my monster-script, as the native applications manage to get
rid of their internal issues. I'll be happy to mail the script to anyone,
although it's really just a large and confusing piece of spaghetti-code to
anyone other than me.

-- 
Johannes Wilm
http://www.johanneswilm.org
tel: +1 (520) 399 8880