[l2h] Converting emdashs and endashs?

James Howison jhowison at syr.edu
Tue Aug 12 18:27:42 CEST 2003


On Monday, August 11, 2003, at 11:08  pm, Ross Moore wrote:

> Hello James,
>
> On Mon, 11 Aug 2003, James Howison wrote:
>
>> Now I have curly quotes happening (yay!) I am wondering about the other
>> special characters.  I realize that this will break backwards
>> compatibility but that is not an issue for my needs.
>>
>> I would like "---" to be converted to "—" as defined in the
>> unicode.pl file at 799 - but this doesn't seem to happen - instead it
>> is converted to "--".  This is also what happens if I change --- to
>> {---}.
>
> That is definitely a lot harder; particularly since -- and --- are
> rarely used correctly in LaTeX manuscripts.
> So general rules may easily result in something that the author
> never intended.

I use -- and --- often.
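
To make the substitution I am after concrete, here is a minimal sketch in Python (not latex2html's own Perl, and not a claim about how unicode.pl works). The function name convert_dashes is made up for illustration; &#8212; and &#8211; are the standard numeric references for the em dash and en dash. The longer sequence has to be handled first, otherwise --- would be consumed as -- followed by a stray hyphen.

```python
def convert_dashes(text):
    """Replace TeX dash ligatures with HTML numeric character references."""
    # Longest match first: handle '---' before '--'.
    text = text.replace("---", "&#8212;")  # em dash, U+2014
    text = text.replace("--", "&#8211;")   # en dash, U+2013
    return text


print(convert_dashes("pages 10--12 are missing---sadly"))
```

A blind replacement like this would also touch verbatim text, math and comments, which is presumably the kind of unintended result Ross is warning about.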

I'm still wondering, though, how to tell which conversions specified in  
the unicode.pl file actually happen and which do not---and how those  
are controlled ... I guess I'll spend some more time with the source ;)

>> Also I see from the source that converting single quotes is
>> tough---perhaps I'm naive but it would seem to me that this sequence
>> would work...
>>
>> s/``/“/og
>> s/`/‘/og     # once the `` is gone, ` is only used for the open single quote, right?
>
> Not at all.  \`  is used as an accent, and in some language variants,
> the ` is made active to remove the need to use the \ .
> With this active character, overloading can occur for generating
> other special characters or ligatures.

Right, I see the difficulty now.  That is quite an important
distinction, language compatibility being very important.  Using a
bare ` rather than \` is not something that I'm familiar with.  Out of
interest, why is this done?  Is it because the \ character is not
easily accessible on the keyboard?

If these conversions were done _after_ the conversions from
latex->unicode, then perhaps this would work (i.e. the international
characters would already be converted to their unicode expressions ...).
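
A minimal sketch of that ordering, written in Python rather than latex2html's Perl and not taken from its source. The ACCENTS table and the function names are hypothetical and cover only three accented letters; the real unicode.pl covers far more. Accent macros are converted first, so that the quote rules only ever see the ` and ' characters that are left over.

```python
import re

# Hypothetical, tiny accent table used only for this illustration.
ACCENTS = {
    ("`", "a"): "&#224;",  # a with grave
    ("`", "e"): "&#232;",  # e with grave
    ("'", "e"): "&#233;",  # e with acute
}

# Matches grave/acute accent macros written as \`{e} or \`e.
ACCENT_RE = re.compile(r"\\([`'])(?:\{([A-Za-z])\}|([A-Za-z]))")


def convert_accents(text):
    def repl(match):
        letter = match.group(2) or match.group(3)
        # Leave combinations missing from the table untouched.
        return ACCENTS.get((match.group(1), letter), match.group(0))
    return ACCENT_RE.sub(repl, text)


def convert_quotes(text):
    # Doubled marks first, so the single-mark rules only see leftovers.
    for src, dst in (("``", "&#8220;"), ("''", "&#8221;"),
                     ("`", "&#8216;"), ("'", "&#8217;")):
        text = text.replace(src, dst)
    return text


def convert(text):
    # Accents before quotes: this ordering is the whole point of the sketch.
    return convert_quotes(convert_accents(text))


print(convert(r"``caf\'e'' is `tr\`es' chic"))
```

This does nothing for the case Ross describes, where ` is made active and used without the backslash, and any accent missing from the table would still be mangled by the quote step.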

>> s/''/”/og
>> s/'/’/og     # will also replace apostrophes with the close curly single quote - not a bad thing.
>
> Sorry; I cannot agree.
> Every Latin-based charset encoding has an apostrophe character.
> A curly-quote is most definitely *not* logically an apostrophe, even
> though it may look like one.

I acknowledge that this is a matter of style---but the Unicode standard
discusses this and generally prefers the curly single quote
(U+2019, &#8217;) to the straight mark (U+0027, &#39;) for the apostrophe:

http://www.unicode.org/unicode/reports/tr8/#Apostrophe%20Semantics%20Errata
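
Since the code points are usually quoted in hexadecimal while HTML numeric references are most often written in decimal, the two are easy to mix up. A small Python sketch (the helper name char_refs is made up for illustration) that prints both forms for the two characters in question:

```python
def char_refs(ch):
    """Return the decimal and hexadecimal HTML references for one character."""
    cp = ord(ch)
    return f"&#{cp};", f"&#x{cp:X};"


for ch in ("\u2019", "'"):
    decimal, hexadecimal = char_refs(ch)
    print(f"U+{ord(ch):04X}  {decimal}  {hexadecimal}")
```

So the curly single quote U+2019 is &#8217; in decimal, and the straight mark U+0027 is &#39;.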

<snip>

> The aim of an HTML translation should not be appearance.
> It should be ensuring that meaning is preserved, and that no symbol
> is rendered with the 'missing character' glyph.

I think one might reasonably disagree that appearance is not
important---HTML is, intentions notwithstanding, a format used for
presentation.  Your point about, and care over, the 'missing character'
glyph is well taken; the warnings are very useful for this.

The 'div' request for CSS in Hakan's email also reflects the use of  
HTML as an appearance format.

Thanks,
James

> Hope this helps,
>
> 	Ross Moore
>
>>
>> Thanks,
>> James
>>
>>
>> On Saturday, August 9, 2003, at 02:53  am, Ross Moore wrote:
>>
>>> On Sat, 9 Aug 2003, James Howison wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'd really like to convert the latex quotation marks, `` and '', to
>>>> the recommended HTML curly quotes, &#8220; instead of `` and &#8221;
>>>> instead of '' - standard codes that render the curly quotes beautifully.
>>>
>>> set
>>>  $USE_CURLY_QUOTES =1;
>>> in an initialisation file.
>>>
>>> This is not the default, because not all browsers actually render
>>> these characters. (At least, that was the situation 3-4 years ago
>>> when the LaTeX2HTML coding was written.)
>>>
>>>
>>> Hope this helps,
>>>
>>> 	Ross Moore
>>>
>>>
>>>>
>>>> I'm sure that this is possible through latex2html - the codes are
>>>> listed around unicode.pl:722 - but either I can't find the magic
>>>> incantation to have latex2html do the conversion or there is a bug
>>>> preventing this from working in my version (1.70) or set-up.
>>>>
>>>> I've tried:
>>>>
>>>> latex2html -html_version 4.0,unicode test.tex
>>>>
>>>> What is strange is that this does work for, say, \v{Z}, which converts
>>>> to the code &#381; (and that is definitely happening through unicode.pl:
>>>> I changed the translation and it worked fine).
>>>>
>>>> So why doesn't the translation for `` (which is correctly listed in
>>>> the unicode.pl as \`\`) and '' (which is correctly listed as \'\') work?
>>>>
>>>> I've had a good hunt around for this - but I can't see why the other
>>>> codes are converted but not the quotes.
>>>>
>>>> Cheers,
>>>> James
>>>>
>>>> ps.  minimal test.tex follows
>>>>
>>>> ----------
>>>>
>>>> \documentclass[11pt]{article}
>>>> \begin{document}
>>>> ``Why are these quotes not converted to unicode''  (they are in the
>>>> unicode.pl file)
>>>> While this symbol (also in the unicode.pl file) is? - \v{Z}
>>>> \end{document}
>>>>
>>>> _______________________________________________
>>>> latex2html mailing list
>>>> latex2html at tug.org
>>>> http://tug.org/mailman/listinfo/latex2html
>>>>
>>>
>>
>


