[l2h] Converting emdashs and endashs?

Ross Moore ross at ics.mq.edu.au
Tue Aug 12 09:08:57 CEST 2003


Hello James,

On Mon, 11 Aug 2003, James Howison wrote:

> Now I have curly quotes happening (yay!) I am wondering about the other
> special characters.  I realize that this will break backwards
> compatibility but that is not an issue for my needs.
>
> I would like "---" to be converted to "—" as defined in the
> unicode.pl file at 799 - but this doesn't seem to happen - instead it
> is converted to "--".  This is also what happens if I change --- to
> {---}.

That is definitely a lot harder; particularly since -- and --- are
rarely used correctly in LaTeX manuscripts.
So general rules may easily result in something that the author
never intended.
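
To make that risk concrete, here is a minimal sketch of such a general
rule. It is written in Python purely for illustration; it is not
LaTeX2HTML's Perl code, and the function name convert_dashes is
hypothetical. It replaces --- before -- so that the longer ligature is
not split, and it will just as readily rewrite dashes the author meant
literally, such as a command-line option.

```python
def convert_dashes(text: str) -> str:
    """Naive rule (illustrative assumption, not LaTeX2HTML behaviour)."""
    # Longest match first, so '---' is not consumed as '--' followed by '-'.
    text = text.replace("---", "&#8212;")  # em-dash
    text = text.replace("--", "&#8211;")   # en-dash
    return text


print(convert_dashes("pages 3--7"))        # intended en-dash
print(convert_dashes("tough---perhaps"))   # intended em-dash
print(convert_dashes("run with --help"))   # literal hyphens, converted anyway
```

The last call shows the problem: the rule cannot tell a typographic
dash from two hyphens that were meant to stay as they are.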

> I'm not sure why some of the conversions in the unicode.pl file happen,
> while others do not.  I can't find an equivalent of the
> $USE_CURLY_QUOTES in the source code that seems relevant to mdash ...
>
> Any ideas on how to get a maximal set of the conversions in unicode.pl
> actually happening?  I notice that there is no do_cmd_textemdash in
> unicode.pl - is that why?
>
> Also I see from the source that converting single quotes is
> tough---perhaps I'm naive but it would seem to me that this sequence
> would work...
>
> s/``/“/og
> s/`/‘/og     # once the `` is gone then the ` is only used for
>              # open single quote, right?

Not at all.  \` is used as an accent, and in some language variants
the ` is made active so that the \ is not needed.
With this active character, overloading can occur, generating
other special characters or ligatures.
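
As a rough illustration of the accent case only, the sketch below (in
Python, not the Perl used by LaTeX2HTML; both function names are
hypothetical) compares the proposed two-step replacement with a variant
that skips any ` preceded by a backslash. The guarded variant is an
assumption about one possible mitigation; it does nothing for documents
where ` has been made an active character.

```python
import re


def naive_open_quotes(text: str) -> str:
    """The proposed sequence: doubles first, then every remaining backtick."""
    text = text.replace("``", "&#8220;")
    text = text.replace("`", "&#8216;")
    return text


def guarded_open_quotes(text: str) -> str:
    """Same order, but leave a backtick alone when a backslash precedes it."""
    text = re.sub(r"(?<!\\)``", "&#8220;", text)
    text = re.sub(r"(?<!\\)`", "&#8216;", text)
    return text


sample = r"``tr\`es bien''"
print(naive_open_quotes(sample))    # the \` accent command is destroyed
print(guarded_open_quotes(sample))  # the \` accent command survives
```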


> s/''/”/og
> s/'/’/og     # Will also replace apostrophes with close curly
>              # single quotes - not a bad thing.

Sorry; I cannot agree.
Every Latin-based charset encoding has an apostrophe character.
A curly-quote is most definitely *not* logically an apostrophe, even
though it may look like one.

Try cut/paste from a web-page into LaTeX source.
Finding every curly quote that has to be turned back into an apostrophe
is *very* tedious indeed. When quotes occur in pairs you at least expect
to have to do something with the environment delimiters; it should *not*
be necessary to search/replace apostrophes as well.
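
A minimal sketch of the distinction, again in Python for illustration
only and with a hypothetical function name: convert the paired
delimiters `` and '' and leave every single ' exactly as typed, so that
apostrophes remain apostrophes.

```python
def convert_double_quotes_only(text: str) -> str:
    """Convert paired LaTeX double quotes; never touch a lone apostrophe."""
    text = text.replace("``", "&#8220;")  # opening double quote
    text = text.replace("''", "&#8221;")  # closing double quote
    return text


print(convert_double_quotes_only("``It's fine,'' she said."))
```

The apostrophe in "It's" passes through unchanged, which is the
behaviour argued for above.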

> i.e. ensure that one does the singles after the doubles ...
>
> But there is probably a better algorithm in the source code for 'quoter'
>
> http://www.dwheeler.com/quoter/

The aim of an HTML translation should not be appearance.
It should be ensuring that meaning is preserved, and that no symbol
is rendered with the 'missing character' glyph.


Hope this helps,

	Ross Moore

>
> Thanks,
> James
>
>
> On Saturday, August 9, 2003, at 02:53  am, Ross Moore wrote:
>
> > On Sat, 9 Aug 2003, James Howison wrote:
> >
> >> Hi all,
> >>
> >> I'd really like to convert the LaTeX quotation marks, `` and '', to the
> >> recommended HTML curly quotes, &#8220; instead of `` and &#8221; instead
> >> of '' - standard codes that render the curly quotes beautifully.
> >
> > Set
> >  $USE_CURLY_QUOTES = 1;
> > in an initialisation file.
> >
> > This is not the default, because not all browsers actually render
> > these characters. (At least, that was the situation 3-4 years ago when
> > the LaTeX2HTML coding was written.)
> >
> >
> > Hope this helps,
> >
> > 	Ross Moore
> >
> >
> >>
> >> I'm sure that this is possible through latex2html - the codes are
> >> listed around unicode.pl:722 - but either I can't find the magic
> >> incantation to have latex2html do the conversion or there is a bug
> >> preventing this from working in my version (1.70) or set-up.
> >>
> >> I've tried:
> >>
> >> latex2html -html_version 4.0,unicode test.tex
> >>
> >> What is strange is that this does work for, say, \v{Z}, which converts
> >> to the code &#381; (and that is definitely happening through unicode.pl:
> >> I changed the translation and it worked fine).
> >>
> >> So why doesn't the translation for `` (which is correctly listed in
> >> the unicode.pl as \`\`) and '' (which is correctly listed as \'\') work?
> >>
> >> I've had a good hunt around for this - but I can't see why the other
> >> codes are converted but not the quotes.
> >>
> >> Cheers,
> >> James
> >>
> >> ps.  minimal test.tex follows
> >>
> >> ----------
> >>
> >> \documentclass[11pt]{article}
> >> \begin{document}
> >> ``Why are these quotes not converted to unicode''  (they are in the
> >> unicode.pl file)
> >> While this symbol (also in the unicode.pl file) is? - \v{Z}
> >> \end{document}
> >>
> >> _______________________________________________
> >> latex2html mailing list
> >> latex2html at tug.org
> >> http://tug.org/mailman/listinfo/latex2html
> >>
> >
>

