[texhax] best way to revise a large existing text

W.J. Metzger wes at hef.ru.nl
Fri Oct 30 13:16:08 CET 2009


Frederik,   Thanks for looking at this.  So it appears that if I upgraded
to the newest perl, I would get a warning message instead of a
segmentation fault.  Unfortunately, that is tricky, since I work on several
computers for which I am not the administrator, and the administrator
prefers to wait for an rpm from Scientific Linux.  Anyway, with the addition
of the comment I avoid the problem for now.  Please inform me when you have
a new version of latexdiff.  And thanks for your interest.
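
For anyone who wants to look for similar spots in their own files, here is a
minimal sketch in Python that lists comment lines whose { and } counts
differ, which is the pattern the extra comment on line 635 compensates for.
It is not part of latexdiff and does not reproduce its parsing; the function
name unbalanced_comment_braces is made up for illustration, and treating a %
as a comment start unless it is directly preceded by a backslash is a
simplifying assumption (it ignores verbatim environments and \\%).  Since the
small test file did not crash, a hit here is only a candidate, not a
diagnosis.

```python
import re


def unbalanced_comment_braces(tex_source):
    """Return (line_number, comment_text, net_braces) for every comment
    whose count of '{' differs from its count of '}'.

    Hypothetical helper for illustration only; it is not latexdiff code.
    """
    findings = []
    for number, line in enumerate(tex_source.splitlines(), start=1):
        # Assumption: a % starts a comment unless written as \% .
        match = re.search(r'(?<!\\)%', line)
        if match is None:
            continue
        comment = line[match.start():]
        # Positive: more opening braces; negative: more closing braces.
        net = comment.count('{') - comment.count('}')
        if net != 0:
            findings.append((number, comment, net))
    return findings
```

Calling unbalanced_comment_braces(open('pap7.tex').read()) would then give
the line numbers to inspect by hand.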

Regards, Wes


On Thu, 29 Oct 2009, Frederik Tilmann wrote:

> Wes
>
> apologies for not having read your previous email carefully enough. My perl
> still does not crash but I get a warning message that gives a clue why you
> get a segfault:
> "Complex regular subexpression recursion limit (32766) exceeded at
> /home/tilmann/bin/latexdiff line 1159."
> and as you say this occurs during the initial parsing (see below).  Even with
> that error I still get a reasonable result out (same as before) but of course
> this could be different once differences are introduced. This bug is not very
> straightforward to fix but I have been thinking for some time about parsing
> the comments in a completely different way which would also solve this
> problem, and your problem is an impetus to do this with high priority.  Even
> so unfortunately it will be some time before I get round to this, and in the
> meantime I can only recommend upgrading your perl version which at least
> allows approximate parsing of the troublesome documents.
>
> Regards
> Frederik
>
>
>
>
>>  ~/tmp 85> latexdiff -V pap7.tex pap7.tex > pap7-diff.tex
>>  This is LATEXDIFF 0.5  (Algorithm::Diff 1.15 so)
>>    (c) 2004-2007 F J Tilmann
>>  Preamble Internal Type UNDERLINE
>>  Preamble Internal Type SAFE
>>  Preamble Internal Type FLOATSAFE
>>  Differencing preamble.
>>  amsmath package detected.
>>  Preprocessing body.  (0.11 s)
>>  Splitting into latex tokens
>>  Parsing pap7.tex
>>  Complex regular subexpression recursion limit (32766) exceeded at
>>  /home/tilmann/bin/latexdiff line 1159.
>>
>>  WARNING: Inconsistency in length of input string and parsed string:
>>       This often indicates faulty or non-standard latex code.
>>       In many cases you can ignore this and the following warning messages.
>>  Note that character numbers in the following are counted beginning after
>>  \begin{document} and are only approximate.DEBUG Original length 109458
>>  Parsed length 109456
>>  Complex regular subexpression recursion limit (32766) exceeded at
>>  /home/tilmann/bin/latexdiff line 1191.
>>
>>   in terms of $Q$ the \taumodel\ provides a good description, much bette
>>                                ^^^^^^^^^^^
>>  Missing characters near word 7199 character index: 101834-101845 Length: 9
>>  Match: |provides | (expected match marked above).
>>  Parsing pap7.tex
>>  Complex regular subexpression recursion limit (32766) exceeded at
>>  /home/tilmann/bin/latexdiff line 1159.
>>
>>  WARNING: Inconsistency in length of input string and parsed string:
>>       This often indicates faulty or non-standard latex code.
>>       In many cases you can ignore this and the following warning messages.
>>  Note that character numbers in the following are counted beginning after
>>  \begin{document} and are only approximate.DEBUG Original length 109458
>>  Parsed length 109456
>>  Complex regular subexpression recursion limit (32766) exceeded at
>>  /home/tilmann/bin/latexdiff line 1191.
>>
>>   in terms of $Q$ the \taumodel\ provides a good description, much bette
>>                                ^^^^^^^^^^^
>>  Missing characters near word 7199 character index: 101834-101845 Length: 9
>>  Match: |provides | (expected match marked above).
>>  (0.65 s)
>>  Pass 1: Expanding text commands and merging isolated identities with
>>  changed blocks
>>    7848 matching  tokens in 0 blocks.
>>    0 discarded tokens in 0 blocks.
>>    0 appended  tokens in 0 blocks.
>>  (0.08 s)
>>  Pass 2: inserting DIF tokens and mark up.
>>    7848 matching  tokens.
>>    0 discarded tokens in 0 blocks.
>>    0 appended  tokens in 0 blocks.
>>  (0.11 s)
>>  Postprocessing body.
>>  (0.03 s)
>>  Done.
>
> On 28/10/09 11:28, W.J. Metzger wrote:
>>  Dear Frederik,
>>
>>  This is what I get too for latexdiff pap7.tex pap7.tex > pap7-diff.tex
>>  But did you try it removing line 635? That is the line
>>  %%%%%%%% } ADDING THIS LINE PREVENTS segmentation fault in latexdiff
>>  If I remove that line I get the segmentation fault.
>>
>>  Cheers, Wes
>>
>>  On Tue, 27 Oct 2009, Frederik Tilmann wrote:
>>
>> >  Dear Wes,
>> >
>> >  I can't reproduce the error: see the following transcript.
>> >
>> > >  ~/tmp 56> latexdiff pap7.tex pap7.tex > pap7-diff.tex
>> > >
>> > >  WARNING: Inconsistency in length of input string and parsed string:
>> > >  This often indicates faulty or non-standard latex code.
>> > >  In many cases you can ignore this and the following warning messages.
>> > >  Note that character numbers in the following are counted beginning after
>> > >  \begin{document} and are only approximate.DEBUG Original length 109535
>> > >  Parsed length 109533
>> > >
>> > >  in terms of $Q$ the \taumodel\ provides a good description, much bette
>> > >  ^^^^^^^^^^^
>> > >  Missing characters near word 7149 character index: 101911-101922
>> > >  Length: 9
>> > >  Match: |provides | (expected match marked above).
>> > >
>> > >  WARNING: Inconsistency in length of input string and parsed string:
>> > >  This often indicates faulty or non-standard latex code.
>> > >  In many cases you can ignore this and the following warning messages.
>> > >  Note that character numbers in the following are counted beginning after
>> > >  \begin{document} and are only approximate.DEBUG Original length 109535
>> > >  Parsed length 109533
>> > >
>> > >  in terms of $Q$ the \taumodel\ provides a good description, much bette
>> > >  ^^^^^^^^^^^
>> > >  Missing characters near word 7149 character index: 101911-101922
>> > >  Length: 9
>> > >  Match: |provides | (expected match marked above).
>> > >  ~/tmp 57> latexdiff --version
>> > >  This is LATEXDIFF 0.5 (Algorithm::Diff 1.15 so)
>> > >  (c) 2004-2007 F J Tilmann
>> > >  ~/tmp 58> perl --version
>> > >
>> > >  This is perl, v5.10.0 built for i386-linux-thread-multi
>> > >
>> > >  Copyright 1987-2007, Larry Wall
>> > >
>> > >  Perl may be copied only under the terms of either the Artistic License
>> > >  or the GNU General Public License, which may be found in the Perl 5
>> > >  source kit.
>> > >
>> > >  Complete documentation for Perl, including FAQ lists, should be found
>> > >  on this system using "man perl" or "perldoc perl". If you have access
>> > >  to the Internet, point your browser at http://www.perl.org/, the Perl
>> > >  Home Page.
>> > >
>> >
>> >  pap7-diff.tex seems to contain reasonable output. The only change to
>> >  pap7.tex is that some newlines get removed (particularly before
>> >  comments, or where there are multiple newlines).
>> >
>> >  It is not the perl version either; I ran the same sequence on another
>> >  machine, which has perl v5.8.0, and get the same output as above. I also
>> >  get the same result with latexdiff-so 0.5 and latexdiff-fast 0.42.
>> >
>> >  If anyone else is reading this thread, can someone else reproduce?
>> >
>> >  Your other reported bug (ignores "\ ") is a real shortcoming leading to
>> >  the warnings, and I will try to address this in the next version.
>> >
>> >  Frederik
>> >
>> >
>> >
>> >
>> >
>> >
>> >  On 27/10/09 16:20, W.J. Metzger wrote:
>> > >  On Mon, 26 Oct 2009, Frederik Tilmann wrote:
>> > >
>> > > >  Dear Wes
>> > > >
>> > > >  I have never had any reports of segfaults, and I know some people
>> > > >  have used it on their PhD theses, so length should not really be
>> > > >  an issue. It should really bail with a Perl error if there was
>> > > >  anything wrong with the latexdiff code.
>> > > >  What's your system and perl version? Did you try latexdiff-fast,
>> > > >  which might be more robust if there is a memory problem with perl?
>> > > >
>> > > >  Frederik
>> > >
>> > >  Dear Frederik,
>> > >
>> > >  I run on Scientific Linux 5.3, which is a clone of Red Hat Enterprise
>> > >  Linux 5.
>> > >  The perl version is v5.8.8 built for i386-linux-thread-multi.
>> > >  latexdiff, latexdiff-fast and latexdiff-so all gave the segmentation
>> > >  fault.
>> > >
>> > >  I also tried it on another machine with a slightly older version of
>> > >  perl, v5.8.5, but with twice the memory. It also gave the
>> > >  segmentation fault.
>> > >
>> > >  I've played around with the tex file and found that the segmentation
>> > >  fault could be avoided by adding a comment line -- line 635 of the
>> > >  attached file. If that line is removed, I get the segmentation fault.
>> > >
>> > >  The segmentation fault occurs very quickly, almost immediately. So I
>> > >  think that latexdiff has not started looking for the differences yet.
>> > >
>> > >  I thought that the problem might be misinterpreting a { that was in a
>> > >  comment, since adding a comment with a } got rid of the segmentation
>> > >  fault. I attempted to isolate the problem in a small test file
>> > >  containing only the \begin{figure} - \end{figure} in which line 635
>> > >  occurs. But I did not get a segmentation fault with or without line
>> > >  635. So the problem is more complicated than just the { in a comment.
>> > >
>> > >
>> > >  Another problem, but only a slightly annoying one, is an apparent
>> > >  misparsing of a line ending in a \
>> > >  e.g.
>> > >  use \ell\
>> > >  rather than l to avoid confusion with 1
>> > >  Apparently the blank after \ell\ is not seen, which results in warning
>> > >  messages.
>> > >
>> > >  Further, differences in equations sometimes lead to incorrect math
>> > >  mode in the difference file, resulting in latex needing to insert a $.
>> > >
>> > >  All in all, latexdiff seems to work well for text, but has some
>> > >  problems when things get complicated.
>> > >
>> > >
>> > > >  W.J. Metzger wrote:
>> > > > >  On Fri, 23 Oct 2009, martin f. krafft wrote:
>> > > > >
>> > > > > >  also sprach Boris Veytsman <borisv at lk.net> [2009.09.22.1700
>> > > > > >  +0200]:
>> > > > > > >  Try latexdiff,
>> > > > > > >  http://www.ctan.org/tex-archive/support/latexdiff/
>> > > > > >
>> > > > > >  That was a marvelous suggestion. Thanks.
>> > > > >
>> > > > >  It sounded good to me too. So I downloaded it and tried it. It
>> > > > >  works fine on small tex files, but when I tried it on 'real' files
>> > > > >  it resulted in a segmentation fault. Do others also have this
>> > > > >  experience?
>> > >
>> > >  Cheers, Wes
>> > >  --
>> > >
>> > >  Dr. W. J. Metzger Experimental High Energy Physics Group
>> > >  tel. +31-24-3653127 Faculty of Science
>> > >  +31-24-3652099 (secr.) Radboud University Nijmegen
>> > >  fax. +31-24-3652191 Heyendaalseweg 135
>> > >  6525 AJ Nijmegen, The Netherlands
>> > >  e-mail: wes at hef.ru.nl or Wesley.Metzger at cern.ch
>> > >  http://home.cern.ch/metzger/ or http://www.hef.ru.nl/~wes
>> >
>> >
>> >  --
>> >  Frederik Tilmann
>> >  Bullard Laboratories Tel. +44 1223 765545
>> >  Department of Earth Sciences Fax. +44 1223 360779
>> >  University of Cambridge email: tilmann at esc.cam.ac.uk
>> >  Madingley Road http://bullard.esc.cam.ac.uk/~tilmann
>> >  Cambridge CB3 0EZ
>> >  UK
>> >
>
>
>

--
Dr. W. J. Metzger            Experimental High Energy Physics Group
tel. +31-24-3653127          Faculty of Science
      +31-24-3652099 (secr.)  Radboud University Nijmegen (**)
fax. +31-24-3652191          Heyendaalseweg 135
                              6525 AJ  Nijmegen,  The Netherlands
e-mail:  wes at hef.ru.nl       or   Wesley.Metzger at cern.ch
http://home.cern.ch/metzger/ or   http://www.hef.ru.nl/~wes
   (**)  On 1 Sept. 2004 the University of Nijmegen changed its name

