[omega] Question about the paper published in EuroTeX 2005

Wed Mar 30 15:08:47 CEST 2005

Yannis
> And there are still many intermediate levels between "mandatory" and 
> "aesthetic".

Yes, indeed.

> We all agree that "fi" is an aesthetic one (although there is a Unicode
> character for it...) and that "æ" is a mandatory one (it is even
> considered as a "letter", and of course also as a character). But what
> about Arabic lam-alif? It is mandatory in every Arabic writing style,
> and it is the only ligature introduced in all Arabic grammars. But still
> it is not a "character".
>

Yes, I was not trying to claim that the analysis is simple.  When you
get down to these types of considerations you start to ask `what is a
character?'  (Or even, if you are me or Joachim: what is text?)

For the present I think it is good enough to say that a character is
almost (but with some exceptions in both directions) a `populated
Unicode slot'.  Text is then a sequence of such `characters' together,
importantly, with a property (which I call, for want of a better word,
its `language') that tells us how to give a meaning to the sequence
(`meaning', in this message, is undefined).
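
A minimal Python sketch of this working definition, offered only as an
illustration: the names `is_populated_slot` and `Text` are hypothetical, and
"populated" is approximated here by "not in Unicode general category Cn
(unassigned)", which ignores the exceptions mentioned above (private-use
slots, surrogates and so on).

```python
import unicodedata
from dataclasses import dataclass


def is_populated_slot(ch: str) -> bool:
    """Approximate 'populated Unicode slot' as 'assigned code point'.

    Assumption: general category 'Cn' marks unassigned code points in the
    Unicode data shipped with the running Python.
    """
    return unicodedata.category(ch) != "Cn"


@dataclass(frozen=True)
class Text:
    """A sequence of characters plus the property that says how to read it."""

    characters: str
    language: str  # hypothetical label, e.g. "ar" or "tr"

    def all_slots_populated(self) -> bool:
        return all(is_populated_slot(ch) for ch in self.characters)


# Lam followed by alif: two characters, even though they are written as one ligature.
lam_alif = Text("\u0644\u0627", language="ar")
```

Note that the `language` property says nothing by itself about meaning; it
only records which set of rules should be consulted.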

> I disagree with ligatures resulting only from the glyph-choice
> procedure. They may be aesthetic but there are still rules about
> avoiding them: in Turkish you *must* avoid the "fi" ligature, in German
> you *must* avoid it between word components.

I agree that another word is needed for this non-free choice, but
these can be viewed (for computing purposes) as rules that restrict
the glyph choice when typesetting a particular language.  Benjamin B
will recognise this as part of a more general (and very long) LaTeX3
workshop where we tried to separate (not very successfully) `language
properties' from other `cultural, typographic and aesthetic'
conventions in typesetting. 
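
To make the `rules that restrict the glyph choice' view concrete, here is a
rough Python sketch.  Everything in it is an assumption made for
illustration: the rule table `NO_FI_LIGATURE`, the function `choose_glyphs`,
the idea that a word arrives already split into its components, and the use
of the Unicode slot U+FB01 as a stand-in for the ligature glyph.

```python
# Hypothetical rule table: languages in which the "fi" ligature is never formed.
NO_FI_LIGATURE = {"tr"}

# U+FB01 LATIN SMALL LIGATURE FI, used here only as a stand-in for a glyph.
FI_LIGATURE = "\ufb01"


def choose_glyphs(word_parts: list[str], language: str) -> str:
    """Pick glyphs for one word, given as a list of its components.

    The language restricts the otherwise free (aesthetic) choice:
    - in a language listed in NO_FI_LIGATURE the ligature is never used;
    - otherwise it is used inside a component but never across the
      boundary between two components.
    """
    if language in NO_FI_LIGATURE:
        return "".join(word_parts)
    # Replacing inside each part separately cannot join an "f" that ends
    # one component with an "i" that starts the next.
    return "".join(part.replace("fi", FI_LIGATURE) for part in word_parts)


german_simple = choose_glyphs(["finden"], "de")
german_compound = choose_glyphs(["Kauf", "interesse"], "de")
turkish = choose_glyphs(["fikir"], "tr")
```

Here the simple German word gets the ligature, the compound keeps a plain
`f' followed by `i' at the component boundary, and the Turkish word is left
untouched.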

> Unicode gives a solution to this: the ZWNJ character, but IMHO this is
> the wrong way to deal with this: it is not logical to introduce an
> extra character for avoiding a phenomenon on the glyph level.

That bit of Unicode is a collection of such ad hoc kludges!
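
A small Python illustration of why the extra character is awkward (the
helper name `strip_zwnj` and the example word are just for illustration):
once ZWNJ is inserted, the text itself is a different sequence of
characters, so any comparison or search has to know to remove it again.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

plain = "Kaufinteresse"
marked = "Kauf" + ZWNJ + "interesse"  # ligature suppressed at the boundary


def strip_zwnj(s: str) -> str:
    """Remove the glyph-level hint so that the underlying text can be compared."""
    return s.replace(ZWNJ, "")


zwnj_name = unicodedata.name(ZWNJ)
same_as_stored = plain == marked
same_after_strip = plain == strip_zwnj(marked)
```

The two strings look the same when rendered but compare as unequal until
the ZWNJ has been stripped.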

Recent versions of Unicode also support the concept of a `language
label' for text (not sure if that is the word they used: it is what
Frank and I called it when we explained its importance to them).

> You are returning to an idea we were discussing (w/ John) when you were
> in Brest: to act on words and to use a server of word visualisations.
> This is not incompatible with textemes: the latter will describe the
> structure of the former, and we can even imagine a link to the "word
> visualisation" as a texteme property.
> 

Agreed, but if the software model has this idea of `word', that may
affect how much and what information you put into each texteme.

> We had a similar idea about the Quran: a server providing authenticated
> segments of Quranic text, every word of which would be a link to the
> server. This can be brought down to the texteme level and every Quranic
> texteme in a document can be considered as an instance of the "abstract
> class" residing on the server. It is even compatible with religion: every
> printed Quran is an instance of the abstract ONE, and this goes down to
> every single letter.
> 

`In the beginning was The Word ...' (that is a kind of anti-pun:-).

> > 3a. When using a font resource whose rendering engine must be accessed
> > via `sequences of Unicode slot numbers' there will be an extra step
> > needed in order to deliver the correct sequence to the font resource
> > (and to ensure that the right settings (for use of ligatures etc etc)
> > are used when the font resource interprets that sequence).  This seems
> > to me to be a peculiarity of this week's technology, so I am not sure
> > that this step should be part of a good general model of
> > characters/glyphs/@@@emes.
> 
> You refer to re-ordering as done by Uniscribe/ATSUI/Pango. This should
> be feasible through an OTP, or maybe at a more global level so that we
> can apply bidi across buffer boundaries. These are problems we are
> researching.
> 

More to the general idea that, as I understand it, one is not meant to
access glyphs in an OpenType-style font resource 

> > 5.  Some of this may be relevant to what the `objects' used by the
> > model (and implementation) are called.  Another thing that may be
> > relevant to choosing a name is that similar objects (ie with extensible
> > property lists) and structures (eg `network graphs') will be needed at
> > all levels: characters/glyphs, words, lines, paragraphs (in their many
> > forms), columns, table entries, tables, pages, spreads, ...
> 
> ... and the Universe can be seen in Jodie Foster's eye (as in the
> movie Contact) :-)
> 
> Joke apart, I agree of course. But concerning omega, let us not be
> tempted (as others in previous years have been) to rebuild the World.
> Our current experiments are as simple as possible, and they will
> become more complex whenever there is a (concrete) need for it. Let us
> first solve the issues of micro-typography; only when this is done will
> we (or others) examine higher-level structures.
> 

Again, that is fine, but in that case do not spend too much time on
finding a word for a particular class when it may eventually turn out to be
simply a specialisation of some much bigger and more useful class
and, in that wider context, its `true name' may become obvious.

> 
> I noticed you call our *** concept @@@emes; don't you like the word
> "textemes"?
> 

No, I think @s are friendly characters...so I did not use ***emes:-)

I was merely trying to avoid that discussion, although the ending
`eme' as in `meme' is one I am quite fond of.

But if you want my analysis: the problem with `texteme' is that the `eme'
ending does not convey the atomicity that I think is an important
property of these objects; to me it sounds more like a bit of text
that has some wholeness (like a word or a phrase).

But here I am contradicting my injunction to not spend time on
getting the `true name' since that will become apparent when we really
understand the idea.

So I shall stop.

chris