[XeTeX] Bug fixes and new features related to Unicode character codes, surrogates, etc

Ross Moore ross.moore at mq.edu.au
Thu May 7 03:07:07 CEST 2015


Hi David,

On 07/05/2015, at 9:26 AM, David Carlisle wrote:

>> The character itself, as bytes that is, is not wrong and users should be able to create these.
>> But preferably through macros that ensure that they come correctly paired.
> 
> Placing two character tokens representing a surrogate pair should not,
> though, magically turn them into a single character.

Agreed.
You don't know whether you want a single character until 
you know what kind of output is being generated.
That need not be known on input.

> The UTF-8 or ^^^^ encoding should refer to the Unicode code point, not
> to the UTF-16 encoding,

No disagreement to this.

> 
> In the current versions ^^^^d835^^^^dc00 is two characters in LuaTeX
> and one character in XeTeX, as the implementation detail that XeTeX's
> underlying storage is mostly UTF-16 is exposed.

This seems premature on XeTeX's part, then.
It is making an assumption about how those bytes
will ultimately be used.
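
For readers following the numbers in this thread: the pair d835/dc00 and the code point 1d400 are related by the standard UTF-16 arithmetic. A minimal Python sketch of that mapping follows. It is not XeTeX or LuaTeX code, and the function names are made up for illustration.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Map a UTF-16 surrogate pair to the code point it encodes."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"not a high surrogate: {high:#06x}")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"not a low surrogate: {low:#06x}")
    # Each surrogate carries 10 bits of the offset above U+10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def split_code_point(cp: int) -> tuple[int, int]:
    """Inverse mapping: a supplementary-plane code point to its pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"not a supplementary code point: {cp:#x}")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


if __name__ == "__main__":
    print(hex(combine_surrogates(0xD835, 0xDC00)))
    print([hex(u) for u in split_code_point(0x1D400)])
```

Whether this combination should happen automatically on input is exactly the point under discussion; the sketch only shows what "combining" means numerically.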

> If it is not possible to prevent ^^^^ or UTF-8 encoded surrogate pairs
> combining, then it is better to prevent them being formed.

Hmm. 
What if you have an entirely different purpose in mind for those bytes?
You still need to be able to create them and do further processing with them.

Maybe there should be a primitive that sets a flag controlling what
happens to surrogates' bytes on input?
It may well be that XeTeX's current behaviour is best for putting
content into PDF pages; but not best in other situations. So a macro
programmer should have a means to change this, when needed.
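
As a rough illustration of what such a flag could mean, here is a Python sketch of an input stage whose treatment of surrogates is switched by a boolean. Everything in it is hypothetical: `read_code_units` and `combine_pairs` are invented names, and no existing XeTeX primitive is implied.

```python
from typing import Iterable, List


def read_code_units(units: Iterable[int], combine_pairs: bool) -> List[int]:
    """Return the character codes seen by later processing.

    With combine_pairs=False every input value stays a separate character.
    With combine_pairs=True a high surrogate immediately followed by a low
    surrogate is merged into one code point; unpaired surrogates pass
    through unchanged.
    """
    out: List[int] = []
    pending = None  # high surrogate waiting for a possible partner
    for u in units:
        if combine_pairs and pending is not None:
            if 0xDC00 <= u <= 0xDFFF:
                out.append(0x10000 + ((pending - 0xD800) << 10) + (u - 0xDC00))
                pending = None
                continue
            out.append(pending)  # no partner arrived: keep it as-is
            pending = None
        if combine_pairs and 0xD800 <= u <= 0xDBFF:
            pending = u
        else:
            out.append(u)
    if pending is not None:
        out.append(pending)  # trailing unpaired high surrogate
    return out


if __name__ == "__main__":
    pair = [0xD835, 0xDC00]
    print([hex(c) for c in read_code_units(pair, combine_pairs=False)])
    print([hex(c) for c in read_code_units(pair, combine_pairs=True)])
```

With the flag off the pair stays as two characters; with it on the pair becomes the single code point 1d400. The macro programmer, not the engine, chooses between the two.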

> 
> this is no different to XML, where &#xd835;&#xdc00; always refers to
> two (invalid) characters, not to &#x1d400;

Seems fine to me.
If application software wants/needs to combine them, it can do so.

> 
> David


Cheers,

	Ross


Ross Moore

Senior Lecturer
Mathematics Department  |   Level 2, E7A 
Macquarie University, NSW 2109, Australia
T: +61 2 9850 8955   |  F: +61 2 9850 8114
M: +61 407 288 255  |  http://www.maths.mq.edu.au/
