[XeTeX] Polytonic greek and XeLaTeX
Ross Moore
ross at ics.mq.edu.au
Thu Jan 3 02:59:52 CET 2008
Hi Mojca,
On 03/01/2008, at 11:11 AM, Mojca Miklavec wrote:
> On Jan 3, 2008 12:35 AM, Peter Dyballa wrote:
>>
>> Am 02.01.2008 um 22:57 schrieb Mojca Miklavec:
>>
>>> If I write Ux63 (c) + Ux030C (caron): do I get the ccaron (in a badly
>>> designed font with ccaron included, but with no smart features) or a
>>> caron placed over c?
>>
>> I think what you get is from an expanded version of the TeX command
>> \v{}. By expanded I mean that it is contained in xunicode.sty. And
>> the product might look bad and it probably won't be found when you
>> search for haček or Dvořák or Čapek ... Bože!
With xunicode all of the following should give the same result:
ä \"{a} \char"00E4
similarly with
ă \u{a} \char"0103
and
ǎ \v{a} \char"01CE .
Just what that result is may well depend upon the font being used.
>
> Hmmm ... ConTeXt actually replaces \v c by č automatically if that's
> the case (but I cannot believe that it really works that way, using
> \v).
xunicode.sty does the same; more precisely, if the encoding declares
that an accented character exists separately, as it does in the
default Unicode encoding for \v c (as č), then that is what is used.
But if you ask for \v d which doesn't have a Unicode code-point,
then you should get Ux64 (d) + Ux030C (caron), with a result
that depends upon how the particular font handles this combination.
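One way to check whether a given base letter plus combining accent has a
precomposed code point is to ask a Unicode library to compose the pair.
The following is a minimal sketch in Python using the standard unicodedata
module. It is not how xunicode decides (xunicode relies on its own
declarations), and the helper name precomposed and the choice of q as a
letter without a precomposed caron form are assumptions made here for
illustration only.

```python
import unicodedata

CARON = "\u030C"  # COMBINING CARON


def precomposed(base, combining):
    """Return the single precomposed character for base + combining,
    or None when NFC composition leaves the pair as separate code points."""
    composed = unicodedata.normalize("NFC", base + combining)
    return composed if len(composed) == 1 else None


if __name__ == "__main__":
    # c + caron composes to one code point; q + caron does not.
    for letter in "cq":
        result = precomposed(letter, CARON)
        if result is None:
            print("%s + U+030C: no precomposed form, stays combining" % letter)
        else:
            print("%s + U+030C: U+%04X" % (letter, ord(result)))
```

When no precomposed form exists, the visible result is left to the font,
which has to position the combining mark over the base glyph itself.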
> If I run "pdftotext -enc UTF-8" I also get a proper č or š, but ǎ
> remains a+combining caron, and most other characters retain a
> combining accent behind them as well (in contrast to pdfTeX where one
> can get just anything out of the (non-)combining accents).
I tried this with the triples of characters listed above, and
got 3 copies each of ä and ă and ǎ without using
any combining characters.
The font I used in the PDF was Lucida Grande.
>
> (Preview.app doesn't handle accents by copy-pasting anyway, but
> searching works OK, even searching for ǎ works.)
Not sure what you mean by this.
Copying from a TeXShop window view of the PDF (so essentially Preview)
and Pasting into TextEdit, then saving as text-only, did the following:
ä was converted to a + Ux0308
ă was converted to a + Ux0306
ǎ was converted to a + Ux030C
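These three results match the canonical decomposition (Unicode NFD) of
each character, which suggests, without proving it, that some step in the
chain normalises the text. A minimal sketch in Python that reproduces the
same decompositions with the standard unicodedata module; the helper name
decompose is hypothetical, and the code says nothing about which
application actually performed the conversion.

```python
import unicodedata


def decompose(text):
    """Return the NFD (canonical decomposition) of text
    as a list of 'U+XXXX' code point labels."""
    return ["U+%04X" % ord(ch) for ch in unicodedata.normalize("NFD", text)]


if __name__ == "__main__":
    # The three precomposed characters used in the test PDF.
    for ch in ("\u00E4", "\u0103", "\u01CE"):
        print("U+%04X -> %s" % (ord(ch), " ".join(decompose(ch))))
```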
It is not at all clear to me at which point the conversion was made.
In copying from Preview?
In pasting into or saving from TextEdit?
Does pdftotext make any normalising assumptions?
Surely the result of copying is determined by the strings
contained in the /ToUnicode resource of the font.
In fact, I tried this twice, and on the first occasion ä
stayed as a single character in 5 of 6 instances, with the
6th being converted. On the 2nd try all 6 instances were
converted to use the combining character.
>
> Mojca
Hope this helps,
Ross
------------------------------------------------------------------------
Ross Moore ross at maths.mq.edu.au
Mathematics Department office: E7A-419
Macquarie University tel: +61 +2 9850 8955
Sydney, Australia 2109 fax: +61 +2 9850 8114
------------------------------------------------------------------------