[Fwd: Re: [l2h] Converting emdashs and endashs?]

Daniel Taupin taupind at wanadoo.fr
Tue Aug 12 20:35:45 CEST 2003



-------- Original Message --------
Subject: Re: [l2h] Converting emdashs and endashs?
Date: Tue, 12 Aug 2003 19:35:11 +0200
From: Daniel Taupin <taupind at wanadoo.fr>
Reply-To: taupind at wanadoo.fr
To: James Howison <jhowison at syr.edu>
References: <E5F1FFBB-CCE1-11D7-BFAA-00306579408C at syr.edu>

Please do not confuse the shapes of quotes (single, double), which are a character
problem, with the handling of -- and ---. The latter are standard ligatures
in TeX fonts, while the former are a matter of typing taste.

Therefore, since this is a TeX/LaTeX standard, I ask for a standard conversion
(except in math mode) from -- to "endash" and a FURTHER conversion of "endash"
followed by a - to "emdash".
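
As a rough illustration of that rule, here is a minimal sketch in Python (latex2html
itself is Perl, so this is not its code). The names convert_dashes, MATH and DASHES
are hypothetical, and the math detection is an assumption that only recognises
$...$, $$...$$, \(...\) and \[...\]; it does not handle escaped \$ or math
environments. Trying --- before -- in a single pass gives the same result as
converting -- first and then "endash" followed by -.

```python
import re

# Hypothetical names for illustration only; not part of latex2html.
EN_DASH = "&#8211;"
EM_DASH = "&#8212;"

# Assumed math delimiters: $$...$$, $...$, \(...\), \[...\]
MATH = re.compile(r"(\$\$.*?\$\$|\$.*?\$|\\\(.*?\\\)|\\\[.*?\\\])", re.DOTALL)

# Longest ligature first, so that --- is never split into -- plus -
DASHES = re.compile(r"---|--")


def convert_dashes(text):
    """Replace -- and --- outside math mode with HTML numeric entities."""
    # With one capturing group, re.split alternates: text, math, text, ...
    parts = MATH.split(text)
    for i in range(0, len(parts), 2):  # even indices are non-math text
        parts[i] = DASHES.sub(
            lambda m: EM_DASH if m.group() == "---" else EN_DASH,
            parts[i],
        )
    return "".join(parts)


if __name__ == "__main__":
    print(convert_dashes("pages 3--5 --- see $a--b$"))
```

Single hyphens and anything inside the recognised math delimiters are left as typed.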

On the other hand, I would disagree with a change in the behaviour of double
quotes, mainly because it would be tricky for people copying and pasting from
latex2html-generated pages.

James Howison wrote:

> On Monday, August 11, 2003, at 11:08  pm, Ross Moore wrote:
> 
>> Hello James,
>>
>> On Mon, 11 Aug 2003, James Howison wrote:
>>
>>> Now that I have curly quotes happening (yay!) I am wondering about the other
>>> special characters.  I realize that this will break backwards
>>> compatibility but that is not an issue for my needs.
>>>
>>> I would like "---" to be converted to "&#8212;" as defined in the
>>> unicode.pl file at 799 - but this doesn't seem to happen - instead it
>>> is converted to "--".  This is also what happens if I change --- to
>>> {---}.
>>
>>
>> That is definitely a lot harder; particularly since -- and --- are
>> rarely used correctly in LaTeX manuscripts.
>> So general rules may easily result in something that the author
>> never intended.
> 
> 
> I use -- and --- often.
> 
> I'm still wondering, though, how to tell which conversions specified in  
> the unicode.pl file actually happen and which do not---and how those  
> are controlled ... I guess I'll spend some more time with the source ;)
> 
>>> Also I see from the source that converting single quotes is
>>> tough---perhaps I'm naive but it would seem to me that this sequence
>>> would work...
>>>
>>> s/``/&#8220;/og
>>> s/`/&#8216;/og     # once the `` is gone then the ` is only used for open single quote, right?
>>
>>
>> Not at all.  \`  is used as an accent, and in some language variants,
>> the ` is made active to remove the need to use the \ .
>> With this active character, overloading can occur for generating
>> other special characters or ligatures.
> 
> 
> Right - well I see the difficulty now.  Quite an important distinction  
> - language compatibility being very important.  The use of ` rather  
> than \ is not something that I'm familiar with - out of interest why is  
> this done - is it because the \ character is not easily accessible on  
> the keyboard?
> 
> Perhaps if these conversions were done _after_ the conversions from
> latex->unicode, this would work (i.e. the international
> characters would already be converted to their unicode expressions ...).
> 
>>> s/''/&#8221;/og
>>> s/'/&#8217;/og     # Will also replace apostrophes with close curly single - not a bad thing.
>>
>>
>> Sorry; I cannot agree.
>> Every Latin-based charset encoding has an apostrophe character.
>> A curly-quote is most definitely *not* logically an apostrophe, even
>> though it may look like one.
> 
> 
> I acknowledge that this is a matter of style---but the unicode standard
> discusses this and generally prefers the use of the curly single
> (U+2019) to the straight mark (U+0027).
> 
> http://www.unicode.org/unicode/reports/tr8/#Apostrophe%20Semantics%20Errata
> 
> <snip>
> 
>> The aim of an HTML translation should not be appearance.
>> It should be ensuring that meaning is preserved, and that no symbol
>> is rendered with the 'missing character' glyph.
> 
> 
> I think one might reasonably disagree that appearance is not  
> important---HTML is, intentions notwithstanding, a format used for  
> presentation.  Your point about the 'missing character'
> glyph is well taken; the warnings are very useful for this.
> 
> The 'div' request for CSS in Hakan's email also reflects the use of  
> HTML as an appearance format.
> 
> Thanks,
> James
> 
>> Hope this helps,
>>
>>     Ross Moore
>>
>>>
>>> Thanks,
>>> James
>>>
>>>
>>> On Saturday, August 9, 2003, at 02:53  am, Ross Moore wrote:
>>>
>>>> On Sat, 9 Aug 2003, James Howison wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'd really like to convert the latex quotation marks, `` and '', to the
>>>>> recommended HTML curly quotes, &#8220; instead of `` and &#8221; instead
>>>>> of '' - standard codes that render the curly quotes beautifully.
>>>>
>>>>
>>>> set
>>>>  $USE_CURLY_QUOTES =1;
>>>> in an initialisation file.
>>>>
>>>> This is not the default, because not all browsers actually render
>>>> these characters. (At least, that was the situation 3-4 years ago when
>>>> the LaTeX2HTML coding was written.)
>>>>
>>>>
>>>> Hope this helps,
>>>>
>>>>     Ross Moore
>>>>
>>>>
>>>>>
>>>>> I'm sure that this is possible through latex2html - the codes are
>>>>> listed around unicode.pl:722 - but either I can't find the magic
>>>>> incantation to have latex2html do the conversion or there is a bug
>>>>> preventing this from working in my version (1.70) or set-up.
>>>>>
>>>>> I've tried:
>>>>>
>>>>> latex2html -html_version 4.0,unicode test.tex
>>>>>
>>>>> What is strange is that this does work for, say, \v{Z}, which converts to
>>>>> the code &#381; (and that is definitely happening through unicode.pl: I
>>>>> changed the translation and it worked fine).
>>>>>
>>>>> So why doesn't the translation for `` (which is correctly listed in the
>>>>> unicode.pl as \`\`) and '' (which is correctly listed as \'\') work?
>>>>>
>>>>> I've had a good hunt around for this - but I can't see why the other
>>>>> codes are converted but not the quotes.
>>>>>
>>>>> Cheers,
>>>>> James
>>>>>
>>>>> ps.  minimal test.tex follows
>>>>>
>>>>> ----------
>>>>>
>>>>> \documentclass[11pt]{article}
>>>>> \begin{document}
>>>>> ``Why are these quotes not converted to unicode''  (they are in the
>>>>> unicode.pl file)
>>>>> While this symbol (also in the unicode.pl file) is? - \v{Z}
>>>>> \end{document}
>>>>>
>>>>> _______________________________________________
>>>>> latex2html mailing list
>>>>> latex2html at tug.org
>>>>> http://tug.org/mailman/listinfo/latex2html
>>>>>
>>>>
>>>
>>
> 

-- 
  ------------------------------------------------------------------------
   Daniel Taupin, 91400 ORSAY - France
   E-mail= mailto:taupind at wanadoo.fr
   Home/fax: (33)1.60.10.26.44. Rep.: (33)1.60.10.04.13, fax (work)
(33)1.69.15.60.86












