[XeTeX] handling malformed UTF-8 input

Mike Maxwell maxwell at umiacs.umd.edu
Sat Feb 23 03:25:35 CET 2008


Bruno Voisin wrote:
> Just to add a "usability" side to the issue: from experience when  
> people in my department ask for help debugging (La)TeX documents which  
> don't typeset, I can tell that most people don't care about warnings,  
> don't read them, in a GUI hide or close the console without reading.  
> They just don't care: provided a document typesets to the end and  
> returns output (I mean a PDF file), they absolutely don't care at all  
> about overfull or underfull boxes, equations too big to fit on a line...

Perhaps they're expecting more from LaTeX than it can provide at 
present?  That is, when these kinds of problems arise, are they maybe 
expecting LaTeX to do something smart--like wrap the equation at a 
"reasonable" place, or automatically either float a table to the next 
page or break the table and continue it on the next page (like I was 
hoping for in an earlier thread:-)), etc.?

I can imagine that a decade or so ago, these kinds of smart formatting 
options were not possible due to the constraints of computer memory and 
processing power.  But I think they could be provided now.  In fact, one 
could imagine the program stopping and asking what to do, or maybe 
better doing something smart (perhaps taking into account some 
user-defined or style sheet-defined preferences, like 'floating a table 
smaller than X% of the page size is preferable to breaking it over a 
page boundary').  This seems to be what Microsoft Word does with 
tables that don't fit in their current position, and it seems to do it 
rather well.

Maybe it's time to "wrap" LaTeX inside some other program that would 
provide smart error (warning) handling.  It can still emit warnings 
(e.g. "The table on pg 16 didn't fit on pg 15, so I decided to float it; 
let me know if that's not the right thing to do.").  Sort of like DWIM 
("do what I mean") programming was supposed to work, except that in this 
case I think it's easier, because the semantics of what the user is 
trying to do (output stuff on a page of certain dimensions) are clearer 
than in the case of a general-purpose programming language.
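
As a rough illustration of the first thing such a wrapper would need to 
do (notice the warnings that people ignore), here is a minimal Python 
sketch that scans the text of a TeX log for overfull/underfull box 
warnings.  The function name and the returned fields are hypothetical, 
and the regular expression assumes the usual one-line log wording; it 
would miss warnings formatted differently.

```python
import re

# Matches the box warnings TeX writes to its log, for example:
#   Overfull \hbox (15.0pt too wide) in paragraph at lines 10--12
# The trailing "at line(s) N--M" part is optional, since some warnings
# (e.g. those raised while \output is active) do not carry line numbers.
BOX_WARNING = re.compile(
    r"^(Overfull|Underfull) \\([hv])box \(([^)]*)\)"
    r"(?:.*? at lines? (\d+)(?:--(\d+))?)?",
    re.MULTILINE,
)


def summarize_box_warnings(log_text):
    """Return one dict per box warning found in log_text (hypothetical helper)."""
    warnings = []
    for match in BOX_WARNING.finditer(log_text):
        kind, axis, detail, first, last = match.groups()
        first_line = int(first) if first else None
        last_line = int(last) if last else first_line
        warnings.append({
            "kind": kind.lower(),       # "overfull" or "underfull"
            "box": axis + "box",        # "hbox" or "vbox"
            "detail": detail,           # e.g. "15.0pt too wide"
            "first_line": first_line,   # source line range, if reported
            "last_line": last_line,
        })
    return warnings
```

A wrapper could then decide, per warning, whether to ask the user or 
apply a preference such as the table-floating rule mentioned above; 
that decision logic is not sketched here.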

(And I do realize that this is the XeTeX mailing list, not a general 
text formatting list, but the topic came up here, so...)
-- 
    Mike Maxwell
    What good is a universe without somebody around to look at it?
    --Robert Dicke, Princeton physicist

