[XeTeX] Line-breaking algorithms in XeTeX

Pander pander at users.sourceforge.net
Mon Apr 27 15:08:07 CEST 2009


Here is the script; example output is below, where F=font, R=right
margin, L=left margin, B=bottom margin, T=top margin, Over=overfull,
Under=underfull, Pages=page count and HyphenExcept=hyphenation exceptions
as a percentage (with the underlying ratio in parentheses). The error is
calculated as:
  math.sqrt((over * over) + (under * under) + (hyphen_percent * hyphen_percent) + (pages * pages))

I think I will omit the page count from the error calculation in the next version.
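As a check, the formula reproduces the first row of the output. A minimal Python sketch (the function name layout_error and the include_pages switch are hypothetical, not taken from the script), assuming HyphenExcept is the percentage 100 * 674/8191, which rounds to the printed 8.229:

```python
import math


def layout_error(over, under, hyphen_percent, pages, include_pages=True):
    """Euclidean error from the message:
    sqrt(over^2 + under^2 + hyphen_percent^2 + pages^2).

    include_pages=False is a hypothetical switch that drops the page term.
    """
    terms = [over, under, hyphen_percent]
    if include_pages:
        terms.append(pages)
    return math.sqrt(sum(t * t for t in terms))


# First row of the output: FreeSerif, R=0.700 L=0.467 B=0.700 T=0.467.
# Assumption: HyphenExcept 8.229 is 674/8191 expressed as a percentage.
hyphen_percent = 100 * 674 / 8191
print(round(layout_error(3, 31, hyphen_percent, 98), 3))  # 103.159
print(round(layout_error(3, 31, hyphen_percent, 98, include_pages=False), 3))
```

Since every row has the same HyphenExcept value, the ranking is driven by Over, Under and, while it is included, Pages; with roughly 100 to 134 pages per run, the page term is the largest contribution in most rows.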

Please test the script and let me know what to improve.

Font                R      L      B      T      Over  Under  Pages  HyphenExcept      Error
FreeSerif           0.700  0.467  0.700  0.467     3     31     98  8.229 (674/8191)  103.159
Gentium Basic       0.700  0.467  0.700  0.467    18     33    100  8.229 (674/8191)  107.148
Gentium             0.700  0.467  0.700  0.467    18     33    100  8.229 (674/8191)  107.148
Gentium Book Basic  0.700  0.467  0.700  0.467    27     31    100  8.229 (674/8191)  108.433
FreeSerif           0.700  0.467  1.000  0.667     3     41    102  8.229 (674/8191)  110.280
Gentium Basic       0.700  0.467  1.000  0.667    18     42    102  8.229 (674/8191)  112.070
Gentium             0.700  0.467  1.000  0.667    18     42    102  8.229 (674/8191)  112.070
FreeSerif           1.000  0.667  0.700  0.467    17     40    104  8.229 (674/8191)  113.016
Gentium             1.000  0.667  0.700  0.467    31     39    104  8.229 (674/8191)  115.610
Gentium Basic       1.000  0.667  0.700  0.467    36     38    104  8.229 (674/8191)  116.721
Gentium Book Basic  0.700  0.467  1.000  0.667    27     47    110  8.229 (674/8191)  122.905
Gentium Book Basic  1.000  0.667  0.700  0.467    46     39    108  8.229 (674/8191)  123.971
FreeSerif           1.000  0.667  1.000  0.667    17     46    114  8.229 (674/8191)  124.373
FreeSerif           0.700  0.467  1.300  0.867     3     46    116  8.229 (674/8191)  125.095
FreeSerif           1.300  0.867  0.700  0.467    29     44    114  8.229 (674/8191)  125.860
Gentium             1.000  0.667  1.000  0.667    31     45    114  8.229 (674/8191)  126.687
Gentium             0.700  0.467  1.300  0.867    18     47    118  8.229 (674/8191)  128.548
Gentium Basic       0.700  0.467  1.300  0.867    18     48    118  8.229 (674/8191)  128.917
Gentium Basic       1.000  0.667  1.000  0.667    36     46    116  8.229 (674/8191)  130.137
FreeSerif           1.000  0.667  1.300  0.867    17     47    122  8.229 (674/8191)  132.097
Gentium Book Basic  0.700  0.467  1.300  0.867    27     48    120  8.229 (674/8191)  132.290
FreeSerif           1.300  0.867  1.000  0.667    29     47    122  8.229 (674/8191)  134.170
Gentium             1.000  0.667  1.300  0.867    31     48    122  8.229 (674/8191)  134.969
Gentium Book Basic  1.000  0.667  1.000  0.667    46     46    120  8.229 (674/8191)  136.747
Gentium Basic       1.000  0.667  1.300  0.867    36     46    124  8.229 (674/8191)  137.316
Gentium Book Basic  1.000  0.667  1.300  0.867    46     51    124  8.229 (674/8191)  141.988
FreeSerif           1.300  0.867  1.300  0.867    29     56    130  8.229 (674/8191)  144.723
Gentium             1.300  0.867  0.700  0.467   104     47    116  8.229 (674/8191)  162.938
Gentium Basic       1.300  0.867  0.700  0.467   107     48    116  8.229 (674/8191)  165.157
Gentium             1.300  0.867  1.000  0.667   104     48    124  8.229 (674/8191)  169.008
Gentium Basic       1.300  0.867  1.000  0.667   107     50    124  8.229 (674/8191)  171.443
Gentium             1.300  0.867  1.300  0.867   104     54    130  8.229 (674/8191)  175.213
Gentium Basic       1.300  0.867  1.300  0.867   107     55    130  8.229 (674/8191)  177.318
Gentium Book Basic  1.300  0.867  0.700  0.467   136     48    118  8.229 (674/8191)  186.525
Gentium Book Basic  1.300  0.867  1.000  0.667   136     51    124  8.229 (674/8191)  191.156
Gentium Book Basic  1.300  0.867  1.300  0.867   136     60    134  8.229 (674/8191)  200.299
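The script was attached rather than included inline, so how it obtains the Over, Under and Pages figures is not shown here. A minimal sketch of one way to do it, assuming the counts are read from the XeTeX .log file: TeX starts each such warning on a line beginning "Overfull \hbox" or "Underfull \hbox" (or \vbox), and reports the page count in an "Output written on ..." line. The function name count_log_warnings is hypothetical, and whether the original script also counts \vbox warnings is not known.

```python
import re

# Assumption: warnings are taken from standard TeX log lines such as
#   Overfull \hbox (1.2pt too wide) in paragraph at lines 10--12
#   Underfull \vbox (badness 10000) has occurred while \output is active
OVERFULL = re.compile(r"^Overfull \\[hv]box")
UNDERFULL = re.compile(r"^Underfull \\[hv]box")
PAGES = re.compile(r"^Output written on .*\((\d+) pages?")


def count_log_warnings(log_text):
    """Return (overfull, underfull, pages) counted from TeX log text.

    pages is None if no "Output written on" line is found. TeX wraps long
    log lines, so a very long output file name could split that line;
    this sketch does not handle that case.
    """
    over = under = 0
    pages = None
    for line in log_text.splitlines():
        if OVERFULL.match(line):
            over += 1
        elif UNDERFULL.match(line):
            under += 1
        else:
            m = PAGES.match(line)
            if m:
                pages = int(m.group(1))
    return over, under, pages


sample = """\
Overfull \\hbox (1.2pt too wide) in paragraph at lines 10--12
Underfull \\hbox (badness 10000) in paragraph at lines 20--21
Underfull \\vbox (badness 10000) has occurred while \\output is active
Output written on review.pdf (98 pages).
"""
print(count_log_warnings(sample))  # (1, 2, 98)
```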



Pander wrote:
> John Was wrote:
>> Dear All
>>  
>> Since starting to use (plain) XeTeX I've noticed something strange with
>> the paragraphing/line-breaking mechanism which has never happened during
>> the ten years or so during which I have used traditional TeX.  It is
>> cropping up in the fourth issue of a periodical that I have set with
>> XeTeX, so I'm pretty sure that it's not a random fluke.
>>  
>> (1) I sometimes get an overfull rule (i.e. rectangular box) at the
>> right-hand side which will disappear when I either (a) attach the word
>> causing the problem to the next word with ~, forcing it over (I
>> sometimes have to put the word in an \hbox{} as well); or (b) when I
>> increase the line-count by giving \looseness1 for the paragraph.  In the
>> past, plain TeX would always make such decisions for itself and never
>> generate an overfull rule when it could find a way to justify the
>> paragraph without doing so.  This happens most frequently in the reviews
>> section of the periodical, where  \looseness is set to -1 by default to
>> save as much space as possible:  but until I started to use XeTeX, it
>> was always the case that if the paragraph could not lose a line, then
>> the negative looseness was ignored and the paragraph was set
>> successfully with normal looseness  (i.e. \looseness = 0).  It was never
>> (I think) the case that a tight looseness which generated an overfull
>> box would get through and need manual intervention from me.  So has
>> something altered in the way XeTeX is handling the line-breaks, giving
>> priority to the looseness command even at the expense of generating an
>> overfull rule, and even when zero looseness would cause that error to
>> disappear?
>>  
>> (2) This is even more puzzling (and more of a nuisance).  For the
>> purpose of sending contributors proofs of their reviews I start each
>> review on a new page so that they don't also receive the tops and tails
>> of adjacent reviews, but while initially typesetting I have the reviews
>> running on consecutively, as they will do in the final published
>> version.  There is a switch at the end of each review which generates a
>> \vfill \eject when \ifseparatereviews is true, otherwise it just
>> produces a \vskip: there is no other difference.  Yet I sometimes get
>> overfull rules showing up (at random points) when the reviews are
>> separated out, even though the same paragraph typeset without error
>> while the reviews were set to run on continuously.  The problem almost
>> (but not entirely) disappears if I double the \hfuzz when the
>> \ifseparatereviews switch is true, but that is no more than a quick fix
>> to prevent authors receiving proofs with worrying blobs at the
>> right-hand side.  This seems incomprehensible, but as it has happened
>> with four out of four periodical issues I can't be imagining it - and
>> the commands are precisely the same as the ones I used when the
>> periodical was typeset using traditional plain TeX, with no new
>> parameters such as alteration to \spaceskip or anything else that might
>> cause this to happen.
>>  
>> (1) and (2) seem likely to be part of the same problem (though not
>> necessarily so).  Any ideas, or at least insight into what XeTeX is
>> doing that old plain TeX didn't?
>>  
>> Thanks
>>  
>>  
>> John
> 
> Hi all,
> 
> Slightly related is something I have made. Sometimes you have some
> freedom of choice in font and in the dimensions of the margins of the
> work you are about to make. Each selection will have a different amount of:
> - Overfull
> - Underfull
> - hyphenation exceptions
> 
> I have made a Python script that, by exhaustive enumeration, finds the
> settings that minimize the number of occurrences of the items listed
> above. Using those optimal settings could be a smarter starting point
> for fixing widows, orphans and hyphenation exceptions.
> 
> If anyone is interested in this script, please contact me.
> 
> Regards,
> 
> Pander
> 
>>  
>>  
>>
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> XeTeX mailing list
>> postmaster at tug.org
>> http://tug.org/mailman/listinfo/xetex

-------------- next part --------------
A non-text attachment was scrubbed...
Name: find-optimum.py
Type: text/x-python
Size: 5191 bytes
Desc: not available
Url : http://tug.org/pipermail/xetex/attachments/20090427/a71ed58b/attachment.py 
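The find-optimum.py attachment itself is not preserved in the archive. The exhaustive enumeration described in the quoted message can be sketched as follows; this is not the original script. Two assumptions are inferred only from the printed output: the left and top margins are two thirds of the right and bottom margins (0.467 is about 2/3 of 0.700), and the candidate values are 0.7, 1.0 and 1.3 (units not stated). The measure callback, which would typeset one variant with XeTeX and return (over, under, hyphen_percent, pages), is hypothetical.

```python
import itertools
import math

# Candidate values seen in the printed output (units not stated there).
FONTS = ["FreeSerif", "Gentium", "Gentium Basic", "Gentium Book Basic"]
OUTER_MARGINS = [0.7, 1.0, 1.3]  # right (R) and bottom (B) margins


def enumerate_layouts(fonts, outer_margins):
    """Yield every (font, R, L, B, T) combination.

    Assumption inferred from the output: L = 2/3 * R and T = 2/3 * B.
    """
    for font, r, b in itertools.product(fonts, outer_margins, outer_margins):
        yield font, r, 2 * r / 3, b, 2 * b / 3


def rank_layouts(measure, fonts=FONTS, outer_margins=OUTER_MARGINS):
    """Measure every layout and return (error, layout) pairs, best first.

    measure(font, r, l, b, t) is a hypothetical callback that typesets one
    variant and returns (over, under, hyphen_percent, pages).
    """
    results = []
    for layout in enumerate_layouts(fonts, outer_margins):
        over, under, hyphen_percent, pages = measure(*layout)
        # Same Euclidean error as in the message; math.hypot with more
        # than two arguments needs Python 3.8+.
        error = math.hypot(over, under, hyphen_percent, pages)
        results.append((error, layout))
    results.sort(key=lambda item: item[0])  # stable: ties keep enumeration order
    return results


if __name__ == "__main__":
    # Toy stand-in for a real XeTeX run: wider margins simply mean more pages.
    def fake_measure(font, r, l, b, t):
        return 0, 0, 0.0, round(80 + 20 * (r + b))

    for error, layout in rank_layouts(fake_measure)[:3]:
        print(f"{error:8.3f}  {layout}")
```

Because every combination is typeset, the cost grows as fonts times the square of the number of margin candidates: 4 x 3 x 3 = 36 runs for the output in this message.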

