[XeTeX] handling malformed UTF-8 input

Ross Moore ross at ics.mq.edu.au
Sat Feb 23 00:13:36 CET 2008


Hi Bruno,

On 23/02/2008, at 3:35 AM, Bruno Voisin wrote:

> On 22 Feb 08, at 13:25, Andreas Matthias wrote:
>
>> I just had the impression that your posting before last was
>> focusing on the question of error/warning. IMHO an error would
>> be better but a warning is not harmful either.
>
> Just to add a "usability" side to the issue: from experience, when
> people in my department ask for help debugging (La)TeX documents that
> don't typeset, I can tell that most people don't care about warnings:
> they don't read them, and in a GUI they hide or close the console
> without reading it. They just don't care: provided a document typesets
> to the end and returns output (I mean a PDF file), they don't care at all
> about overfull or underfull boxes, equations too big to fit on a line,
> unwanted font switches, etc. I don't approve of this attitude, but in my
> (mechanics) department, among people writing scientific texts, proposals
> and the like, that's what I've seen with most (La)TeX users.

This agrees completely with my own experience.
Trying to work efficiently within such an environment
is precisely the problem that I'm trying to solve.

>
> I can see Ross'es (is that how you write this in English? -- I'm not

    Ross'  is the best way.   :-)
>
> sure) point regarding automated treatment. But if you want to direct
> the user's attention to a problem, I think a warning is doomed to fail
> in most cases, and an error is required: it's only when the typeset
> fails and is interrupted that the user will care.

In my situation the user is not in a position to do
anything about the typesetting.
All they are doing is providing data for collection.
Yes, ultimately this data will need to be typeset, as part
of a Program or Abstract Book or List of Participants, etc.


The problem is to verify the correctness of entered data,
and to identify whether there are any problems.
This is best done at the time of data-entry.
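Since this thread is about malformed UTF-8, one such check can run as soon as a field arrives. The sketch below is in Python (an assumption; the post names no implementation language), and the function name `find_malformed_utf8` is hypothetical. It reports the byte offsets at which a submitted field stops being well-formed UTF-8, so the problem can be flagged before any (Xe)TeX run.

```python
def find_malformed_utf8(data: bytes) -> list[int]:
    """Return the byte offsets at which malformed UTF-8 sequences start."""
    offsets: list[int] = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")  # the rest is well-formed: done
            break
        except UnicodeDecodeError as exc:
            bad = pos + exc.start        # exc.start is relative to data[pos:]
            offsets.append(bad)
            pos = pos + exc.end          # resume after the offending bytes
    return offsets
```

For example, `find_malformed_utf8("Montréal".encode("latin-1"))` returns `[5]`, the position of the single Latin-1 byte for "é", which a form handler could use to highlight the suspect character back to the submitter.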

Submission of the data need not guarantee that it has been
accepted. The submitter should "Confirm" that it is OK,
after having been presented with an onscreen rendering.
(e.g. a .png image of a small PDF created by typesetting
the name/address/title/abstract etc.)
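The preview step could be driven by two external programs. The sketch below only builds the command lines; it assumes TeX Live's `xelatex` and Poppler's `pdftoppm` are installed (any PDF-to-PNG converter would serve), and `preview_commands` is a hypothetical helper name. It uses `-interaction=nonstopmode` so that the run never waits for input.

```python
from pathlib import Path


def preview_commands(tex_file: Path, dpi: int = 100) -> list[list[str]]:
    """Build the command sequence for a one-page PNG preview of tex_file."""
    pdf_file = tex_file.with_suffix(".pdf")
    png_prefix = tex_file.with_suffix("")  # pdftoppm adds ".png" itself
    return [
        # Typeset without ever stopping to ask the (absent) user about errors.
        ["xelatex", "-interaction=nonstopmode", tex_file.name],
        # Rasterise page 1 only, into a single file named <prefix>.png.
        ["pdftoppm", "-png", "-r", str(dpi), "-f", "1", "-l", "1",
         "-singlefile", pdf_file.name, png_prefix.name],
    ]
```

Each command would then be run in the file's own directory, e.g. with `subprocess.run(cmd, cwd=tex_file.parent)`, and the resulting `.png` shown to the submitter.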

It may even be that normalising transformations are
performed on some parts of the input data;
e.g., the user inputs 'Caltech', which the system replaces
with "California Institute of Technology".
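A normalising transformation of this kind can be as simple as a lookup table. In this Python sketch the table `INSTITUTION_ALIASES` and the function `normalise_institution` are hypothetical; only the 'Caltech' entry comes from the example above.

```python
# Hypothetical alias table; real entries would be maintained by the organisers.
INSTITUTION_ALIASES = {
    "caltech": "California Institute of Technology",
}


def normalise_institution(raw: str) -> str:
    """Collapse whitespace, then expand a known abbreviation if there is one."""
    cleaned = " ".join(raw.split())
    return INSTITUTION_ALIASES.get(cleaned.casefold(), cleaned)
```

Because the normalised value is what gets typeset, the substitution is visible in the preview, and a submitter who disagrees with it can say so when confirming.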


Confirmation should not be just a Yes/No choice.
Options could include:
   *  yes, it looks perfect;
   *  there are funny box-like characters instead of accented letters;
   *  everything is garbled after some <specified> point;
   *  sorry, I input everything in capitals;
   *  the institution showing is not what I submitted;
   *  there are poor line-breaks in the displayed abstract;
   *  I'll go back and re-enter some data fields;
   ...
An extra text field would be provided so that the submitter
can give extra comments about what seems to be wrong,
giving the experts a chance to determine whether the problem
is caused by the system or by bad data, and allowing
for a second representation of the data that was problematic.

In all cases an email (or other warning mechanism) will
be sent to the experts, who can then decide what further
action may need to be taken.
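The confirmation choices, the free-text comment and the notification to the experts might be modelled as below. This is a Python sketch under assumptions: the enum member names and the `notify_experts` helper are hypothetical, the option texts paraphrase the list above (which is open-ended, so more members could be added), and the message is only built with the standard library's `email.message.EmailMessage`, not sent.

```python
from dataclasses import dataclass
from email.message import EmailMessage
from enum import Enum


class Confirmation(Enum):
    LOOKS_PERFECT = "yes, it looks perfect"
    BOX_CHARACTERS = "box-like characters instead of accented letters"
    GARBLED_AFTER_POINT = "everything is garbled after some point"
    ALL_CAPITALS = "everything was input in capitals"
    WRONG_INSTITUTION = "institution shown is not what was submitted"
    POOR_LINE_BREAKS = "poor line-breaks in the displayed abstract"
    WILL_REENTER = "will go back and re-enter some data fields"


@dataclass
class ConfirmationResponse:
    submission_id: str
    choice: Confirmation
    comment: str = ""  # the submitter's own description of what looks wrong


def notify_experts(resp: ConfirmationResponse, experts: list[str]) -> EmailMessage:
    """Build (but do not send) the message that goes to the experts in every case."""
    msg = EmailMessage()
    msg["To"] = ", ".join(experts)
    msg["Subject"] = f"[{resp.submission_id}] {resp.choice.name}"
    body = f"Submitter's choice: {resp.choice.value}\n"
    if resp.comment:
        body += f"Comment: {resp.comment}\n"
    msg.set_content(body)
    return msg
```

Sending could then be done with `smtplib`, or replaced by whatever other warning mechanism is in use.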


In short, the submitter sees what he/she enters, and receives
a nicely-presented visual representation of that data.
They must confirm that the latter properly represents
the former; if not, they should try to fix obvious things (such
as spelling errors, inappropriate capitalisations, etc.)
or provide relevant feedback about anything else that
looks wrong. Failure to confirm would also be handled
in an appropriate way.


It's pretty clear that with such a system, the (Xe)TeX
processing must be completely hidden from the submitter.
It must produce a reliable result on all the valid data,
and must give a clear visual indication of any data
which is problematic.
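For malformed UTF-8 specifically, one way to get a reliable result together with a visible indication is to substitute U+FFFD REPLACEMENT CHARACTER for each bad sequence before typesetting, rather than stopping. The Python sketch below uses the standard `errors="replace"` decoding handler; `decode_for_typesetting` is a hypothetical name, and the approach assumes the font used in the preview has a glyph for U+FFFD.

```python
def decode_for_typesetting(data: bytes) -> tuple[str, bool]:
    """Decode UTF-8 for the TeX run; return (text, input_was_valid)."""
    try:
        return data.decode("utf-8"), True
    except UnicodeDecodeError:
        # Each malformed sequence becomes U+FFFD, so the rendering still
        # completes and the bad spot is visible in the preview.
        return data.decode("utf-8", errors="replace"), False
```

The boolean flag gives the hidden processing a cheap signal for when to alert the experts, while the submitter simply sees a marked character in the rendered output.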

Stopping for errors is just not on.

Using \nonstopmode is the obvious choice. But even this produces
a lot of useless output from error messages that are not handled
interactively, whereas warnings can be much more useful.
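After a non-stop run, the errors and warnings can be separated by scanning the `.log` file. This Python sketch relies on two log conventions: TeX error messages begin with "! " at the start of a line, and LaTeX, class and package warnings contain "... Warning:". It captures only the first line of multi-line messages, and `summarise_log` is a hypothetical name.

```python
import re

# TeX errors start with "! "; LaTeX warnings look like "LaTeX Warning: ...",
# "LaTeX Font Warning: ...", "Package <name> Warning: ..." or "Class <name> Warning: ...".
ERROR_RE = re.compile(r"^! (.*)$", re.MULTILINE)
WARNING_RE = re.compile(
    r"^(?:LaTeX(?: \w+)?|Package \S+|Class \S+) Warning: (.*)$", re.MULTILINE
)


def summarise_log(log_text: str) -> dict[str, list[str]]:
    """Collect the first line of every error and warning in a TeX log."""
    return {
        "errors": ERROR_RE.findall(log_text),
        "warnings": WARNING_RE.findall(log_text),
    }
```

Only the `warnings` list (and perhaps a count of errors) would then need to go to the experts, instead of the full transcript.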

>
> Bruno Voisin


Cheers,

	Ross

------------------------------------------------------------------------
Ross Moore                                         ross at maths.mq.edu.au
Mathematics Department                             office: E7A-419
Macquarie University                               tel: +61 +2 9850 8955
Sydney, Australia  2109                            fax: +61 +2 9850 8114
------------------------------------------------------------------------



