[XeTeX] Polytonic greek and XeLaTeX

Thu Jan 3 02:59:52 CET 2008

Hi Mojca,

On 03/01/2008, at 11:11 AM, Mojca Miklavec wrote:

> On Jan 3, 2008 12:35 AM, Peter Dyballa wrote:
>>
>> On 02.01.2008, at 22:57, Mojca Miklavec wrote:
>>
>>> If I write Ux63 (c) + Ux030C (caron): do I get the ccaron (in a badly
>>> designed font with ccaron included, but with no smart features) or a
>>> caron placed over c?
>>
>> I think what you get is from an expanded version of the TeX command
>> \v{}. By expanded I mean that it is contained in xunicode.sty. And
>> the product might look bad and it probably won't be found when you
>> search for haček or Dvořák or Čapek ... Bože!

With xunicode all of the following should give the same result:
      ä \"{a} \char"00E4
similarly with
      ă \u{a} \char"0103
and
      ǎ \v{a} \char"01CE .

Just what that result is may well depend upon the font being used.
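
A minimal sketch of a test file for the three triples above. The
preamble is my assumption (xunicode loaded on top of fontspec, default
font unless you uncomment the \setmainfont line); compile it with
xelatex and compare the three items on each line.

```latex
% Compile with xelatex, not pdflatex.
\documentclass{article}
\usepackage{fontspec}
\usepackage{xunicode}   % maps accent commands such as \" \u \v to Unicode
% \setmainfont{Lucida Grande}  % assumption: pick any installed font here
\begin{document}
% Each line: literal UTF-8 character, accent command, explicit code point.
ä \"{a} \char"00E4

ă \u{a} \char"0103

ǎ \v{a} \char"01CE
\end{document}
```

If the font lacks one of these code points, the log should report a
missing character rather than a substituted glyph.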

>
> Hmmm ... ConTeXt actually replaces \v c by č automatically if that's
> the case (but I cannot believe that it really works that way, using
> \v).

xunicode.sty  does the same; more precisely, if the encoding declares
that an accented character exists separately, as it does in the
default Unicode encoding for \v c (as č ), then that is what is used.

But if you ask for a combination such as  \v q  which doesn't have its
own Unicode code-point, then you should get  Ux71 (q) + Ux030C (caron) ,
with a result that depends upon how the particular font handles this
combination.
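
To see what a given font does in the two cases, one could set a
precomposed character beside a base letter followed by the combining
caron. This is only a sketch: the choice of q (a letter for which, as
far as I know, Unicode has no precomposed caron form) and the use of
the default font are my assumptions.

```latex
% Compile with xelatex.
\documentclass{article}
\usepackage{fontspec}
\usepackage{xunicode}
\begin{document}
% Precomposed form exists: accent command, code point U+010D,
% then c followed by the combining caron U+030C.
\v{c} \quad \char"010D \quad c\char"030C

% No precomposed form assumed: both should give q + U+030C,
% placed however the font positions combining marks.
\v{q} \quad q\char"030C
\end{document}
```

On the first line the first two items should look identical; the third
item, and the whole second line, show how well the font places a
combining accent.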

> If I run "pdftotext -enc UTF-8" I also get a proper č or š, but ǎ
> remains a+combining caron, and most other characters retain a
> combining accent behind them as well (in contrast to pdfTeX where one
> can get just anything out of the (non-)combining accents).

I tried this with the triples of characters listed above, and
got 3 copies each of  ä  and  ă  and  ǎ  without using
any combining characters.
The font I used in the PDF was  Lucida Grande .

>
> (Preview.app doesn't handle accents by copy-pasting anyway, but
> searching works OK, even searching for ǎ works.)

Not sure what you mean by this.
Copying from a TeXShop window view of the PDF (so essentially Preview)
and pasting into TextEdit, then saving as text-only, gave the following:
    ä  was converted to  a + Ux0308
    ă  was converted to  a + Ux0306
    ǎ  was converted to  a + Ux030C

It is not at all clear to me at which point the conversion was made.
In copying from Preview?
In pasting into or saving from TextEdit?
Does  pdftotext  make any normalising assumptions?

Surely the result of copying is determined by the strings
contained in the  /ToUnicode  resource of the font.

In fact, I tried this twice, and on the first occasion  ä
stayed as a single character in 5 of 6 instances, with the
6th being converted. On the 2nd try all 6 instances were
converted to use the combining character.

>
> Mojca

Hope this helps,

	Ross

------------------------------------------------------------------------
Ross Moore                                         ross at maths.mq.edu.au
Mathematics Department                             office: E7A-419
Macquarie University                               tel: +61 +2 9850 8955
Sydney, Australia  2109                            fax: +61 +2 9850 8114
------------------------------------------------------------------------