[Fwd: Re: [l2h] Converting emdashs and endashs?]
Daniel Taupin
taupind at wanadoo.fr
Tue Aug 12 20:35:45 CEST 2003
-------- Original Message --------
Subject: Re: [l2h] Converting emdashs and endashs?
Date: Tue, 12 Aug 2003 19:35:11 +0200
From: Daniel Taupin <taupind at wanadoo.fr>
Reply-To: taupind at wanadoo.fr
To: James Howison <jhowison at syr.edu>
References: <E5F1FFBB-CCE1-11D7-BFAA-00306579408C at syr.edu>
Please do not confuse the shape of quotes (single, double), which is a character
problem, with the handling of -- and ---. The latter are standard ligatures in
TeX fonts, while the former are a matter of typographic taste.
Therefore, since this is a TeX/LaTeX standard, I ask for a standard conversion
(except in math mode) of -- to "endash", and a FURTHER conversion of an "endash"
followed by a - to "emdash".
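A minimal Python sketch of the two-pass rule above, for illustration only: it is not latex2html code, the function name convert_dashes is hypothetical, and math mode is approximated by splitting on $ (so \(...\), display environments and escaped \$ are not handled). Output uses the numeric entities &#8211; (en dash) and &#8212; (em dash).

```python
ENDASH = "&#8211;"
EMDASH = "&#8212;"


def convert_dashes(text: str) -> str:
    """Hypothetical sketch: '--' -> endash, then endash + '-' -> emdash.

    Assumption: math mode is approximated by $...$ segments only;
    \\( \\), display math environments and escaped \\$ are NOT handled.
    """
    # Splitting on '$' puts the (assumed) math segments at odd indices.
    parts = text.split("$")
    for i in range(0, len(parts), 2):  # even indices lie outside math mode
        seg = parts[i].replace("--", ENDASH)     # pass 1: -- becomes endash
        seg = seg.replace(ENDASH + "-", EMDASH)  # pass 2: endash + - becomes emdash
        parts[i] = seg
    return "$".join(parts)


print(convert_dashes("pages 10--20, a---b, $x--y$"))
```

Because the en-dash pass runs first, --- becomes an en dash followed by -, which the second pass turns into an em dash. This chains the replacements the same way the TeX font ligatures are chained.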
On the other hand, I would disagree with a change in the behaviour of double
quotes, mainly because it would be tricky for people copying and pasting from
latex2html-generated screens.
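To illustrate the objection raised in the quoted thread below against blindly substituting quotes, here is a hypothetical Python sketch (not latex2html code). naive_quotes applies the proposed substitution order as written, while guarded_quotes skips any quote preceded by a backslash so that accent commands such as \` and \' survive. Assumptions: math mode is not handled, languages where ` is an active character are not handled, and, like the proposal, every remaining apostrophe still becomes a closing curly quote.

```python
import re

LDQUO, RDQUO = "&#8220;", "&#8221;"
LSQUO, RSQUO = "&#8216;", "&#8217;"


def naive_quotes(text: str) -> str:
    """The proposed order, applied blindly: also rewrites accent commands."""
    text = text.replace("``", LDQUO).replace("`", LSQUO)
    return text.replace("''", RDQUO).replace("'", RSQUO)


def guarded_quotes(text: str) -> str:
    """Same order, but a quote preceded by a backslash is left untouched.

    Assumption: only backslash accents (\\` and \\') are protected.
    """
    text = re.sub(r"(?<!\\)``", LDQUO, text)
    text = re.sub(r"(?<!\\)`", LSQUO, text)
    text = re.sub(r"(?<!\\)''", RDQUO, text)
    return re.sub(r"(?<!\\)'", RSQUO, text)


sample = "``Hi'' said \\`{a}"
print(naive_quotes(sample))    # the \` accent is destroyed
print(guarded_quotes(sample))  # the \` accent survives
```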
James Howison wrote:
> On Monday, August 11, 2003, at 11:08 pm, Ross Moore wrote:
>
>> Hello James,
>>
>> On Mon, 11 Aug 2003, James Howison wrote:
>>
>>> Now that I have curly quotes happening (yay!), I am wondering about the
>>> other special characters. I realize that this will break backwards
>>> compatibility, but that is not an issue for my needs.
>>>
>>> I would like "---" to be converted to "—", as defined at line 799 of
>>> the unicode.pl file, but this doesn't seem to happen; instead it is
>>> converted to "--". The same happens if I change --- to {---}.
>>
>>
>> That is definitely a lot harder; particularly since -- and --- are
>> rarely used correctly in LaTeX manuscripts.
>> So general rules may easily result in something that the author
>> never intended.
>
>
> I use -- and --- often.
>
> I'm still wondering, though, how to tell which conversions specified in
> the unicode.pl file actually happen and which do not---and how those
> are controlled ... I guess I'll spend some more time with the source ;)
>
>>> Also I see from the source that converting single quotes is
>>> tough---perhaps I'm naive but it would seem to me that this sequence
>>> would work...
>>>
>>> s/``/“/og
>>> s/`/‘/og # once `` is gone, ` is only used for the open single quote, right?
>>
>>
>> Not at all. \` is used as an accent, and in some language variants,
>> the ` is made active to remove the need to use the \ .
>> With this active character, overloading can occur for generating
>> other special characters or ligatures.
>
>
> Right, well I see the difficulty now. It is quite an important
> distinction, language compatibility being very important. The use of `
> rather than \ is not something that I'm familiar with. Out of interest,
> why is this done? Is it because the \ character is not easily accessible
> on the keyboard?
>
> Perhaps this would work if these conversions were done _after_ the
> latex->unicode conversions (i.e. the international characters would
> already have been converted to their unicode expressions ...).
>
>>> s/''/”/og
>>> s/'/’/og # Will also replace apostrophes with close curly single, not a bad thing.
>>
>>
>> Sorry; I cannot agree.
>> Every Latin-based charset encoding has an apostrophe character.
>> A curly-quote is most definitely *not* logically an apostrophe, even
>> though it may look like one.
>
>
> I acknowledge that this is a matter of style---but the Unicode standard
> discusses this and generally prefers the use of the curly single
> (’) to the straight mark (').
>
> http://www.unicode.org/unicode/reports/tr8/#Apostrophe%20Semantics%20Errata
>
> <snip>
>
>> The aim of an HTML translation should not be appearance.
>> It should be ensuring that meaning is preserved, and that no symbol
>> is rendered with the 'missing character' glyph.
>
>
> I think one might reasonably disagree that appearance is not
> important---HTML is, intentions notwithstanding, a format used for
> presentation. Your point about taking care over the 'missing character'
> glyph is well taken; the warnings are very useful for this.
>
> The 'div' request for CSS in Hakan's email also reflects the use of
> HTML as an appearance format.
>
> Thanks,
> James
>
>> Hope this helps,
>>
>> Ross Moore
>>
>>>
>>> Thanks,
>>> James
>>>
>>>
>>> On Saturday, August 9, 2003, at 02:53 am, Ross Moore wrote:
>>>
>>>> On Sat, 9 Aug 2003, James Howison wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'd really like to convert the latex quotation marks, `` and '', to
>>>>> the recommended HTML curly quotes: “ instead of `` and ” instead
>>>>> of '', standard codes that render the curly quotes beautifully.
>>>>
>>>>
>>>> set
>>>> $USE_CURLY_QUOTES =1;
>>>> in an initialisation file.
>>>>
>>>> This is not the default, because not all browsers actually render
>>>> these characters. (At least, that was the situation 3-4 years ago when
>>>> the LaTeX2HTML code was written.)
>>>>
>>>>
>>>> Hope this helps,
>>>>
>>>> Ross Moore
>>>>
>>>>
>>>>>
>>>>> I'm sure that this is possible through latex2html - the codes are
>>>>> listed around unicode.pl:722 - but either I can't find the magic
>>>>> incantation to have latex2html do the conversion or there is a bug
>>>>> preventing this from working in my version (1.70) or set-up.
>>>>>
>>>>> I've tried:
>>>>>
>>>>> latex2html -html_version 4.0,unicode test.tex
>>>>>
>>>>> What is strange is that this does work for, say, \v{Z}, which converts
>>>>> to the code Ž (and that is definitely happening through unicode.pl:
>>>>> I changed the translation and it worked fine).
>>>>>
>>>>> So why doesn't the translation work for `` (which is correctly listed
>>>>> in unicode.pl as \`\`) and '' (which is correctly listed as \'\')?
>>>>>
>>>>> I've had a good hunt around for this - but I can't see why the other
>>>>> codes are converted but not the quotes.
>>>>>
>>>>> Cheers,
>>>>> James
>>>>>
>>>>> ps. minimal test.tex follows
>>>>>
>>>>> ----------
>>>>>
>>>>> \documentclass[11pt]{article}
>>>>> \begin{document}
>>>>> ``Why are these quotes not converted to unicode'' (they are in the
>>>>> unicode.pl file)
>>>>> While this symbol (also in the unicode.pl file) is? - \v{Z}
>>>>> \end{document}
>>>>>
>>>>> _______________________________________________
>>>>> latex2html mailing list
>>>>> latex2html at tug.org
>>>>> http://tug.org/mailman/listinfo/latex2html
>>>>>
>>>>
>>>
>>>
>>
>
>
--
------------------------------------------------------------------------
Daniel Taupin, 91400 ORSAY - France
E-mail= mailto:taupind at wanadoo.fr
Home/fax: (33)1.60.10.26.44. Rep.: (33)1.60.10.04.13, fax (work)
(33)1.69.15.60.86