[tlbuild] dvibook failure on Cygwin

Ken Brown kbrow1i at gmail.com
Fri Feb 24 19:13:14 CET 2012


On 2/24/2012 9:20 AM, Ken Brown wrote:
> On 2/24/2012 8:02 AM, Ken Brown wrote:
>> On 2/24/2012 6:15 AM, Peter Breitenlohner wrote:
>>> On Thu, 23 Feb 2012, Ken Brown wrote:
>>>
>>>> I just did a test build of the trunk (rev 25482) on Cygwin. The build
>>>> went smoothly, but the dvibook test failed with a segfault:
>>>>
>>>> ...
>>>>
>>>> This may well be a Cygwin problem. But before digging further, I'd
>>>> like to know whether anyone is seeing problems on other platforms.
>>>
>>> Hi Ken,
>>>
>>> these tests were introduced after the TL11 release, so it is quite
>>> possible that the Cygwin versions of dvibook and dvitodvi never worked.
>>> Can you test that?
>>
>> Hi Peter,
>>
>> dvibook from TL11 fails in the same way, but dvitodvi works fine, both
>> in TL11 and in my build from the current trunk.
>>
>>> Dvibook and dvitodvi use SeekFile() from seek.c containing Cygwin
>>> specific code and it is quite possible that this code doesn't work as
>>> it does on Windows. Please have a look.
>>
>> All of that Cygwin specific code looks wrong to me, but removing it
>> doesn't solve the problem: dvibook still segfaults (and dvitodvi still
>> works). I'll try running dvibook under gdb to see if I can figure out
>> what's going on.
> 
> This is really strange. The segfault is occurring early in the program, 
> in the processing of the argument -s4 to dvibook. Here's the gdb session:
> 
> $ gdb ./dvibook
> [...]
> (gdb) b main
> Breakpoint 1 at 0x402036: file ../../../texk/seetexk/dvibook.c, line 371.
> (gdb) b atoi
> Breakpoint 2 at 0x405410
> (gdb) r -s4 playsel.dvi playbook.dvi
> Starting program: /home/kbrown/src/texlive/Build/test/Work/texk/seetexk/dvibook -s4 playsel.dvi playbook.dvi
> [New Thread 1288.0xc2c]
> [New Thread 1288.0x19d8]
> 
> Breakpoint 1, main (argc=4, argv=0x28cc30)
> at ../../../texk/seetexk/dvibook.c:371
> 371 char *outname = NULL;
> (gdb) n
> 373 Signature = 0;
> (gdb) n
> 375 ProgName = *argv;
> (gdb) n
> 376 setbuf(stderr, serrbuf);
> (gdb) n
> 378 while ((c = getopt(argc, argv, "i:o:s:q")) != EOF) {
> (gdb) n
> 379 switch (c) {
> (gdb) n
> 398 if (Signature != 0)
> (gdb) n
> 400 Signature = atoi(optarg);
> (gdb) p optarg
> $1 = 0x28cca9 "4"
> (gdb) n
> 
> Breakpoint 2, atoi (s=0x0) at ../../../../../src/newlib/libc/stdlib/atoi.c:70
> 70 return (int) strtol (s, NULL, 10);
> (gdb) n
> 10 [main] dvibook 1288 exception::handle: Exception: STATUS_ACCESS_VIOLATION
> 1141 [main] dvibook 1288 open_stackdumpfile: Dumping stack trace to dvibook.exe.stackdump
> 
> Program received signal SIGSEGV, Segmentation fault.
> _strtol_r (rptr=0x40010006, nptr=0x0, endptr=0x2, base=2670504)
> at ../../../../../src/newlib/libc/stdlib/strtol.c:152
> 152 c = *s++;
> 
> 
> Notice that s is null in the call to atoi. So somehow optarg got changed 
> to a null pointer.
> 
> I'll keep digging.

Something is seriously wrong with getopt (from kpathsea) on Cygwin.  For testing, I made the following trivial change to dvibook.c:

--- ../../../../source/texk/seetexk/dvibook.c   2012-02-21 18:55:45.150466800 -0500
+++ ../../../texk/seetexk/dvibook.c     2012-02-24 12:55:37.093278700 -0500
@@ -369,7 +369,7 @@
        register int c;
        register char *s;
        char *outname = NULL;
-
+       char *svalue = NULL;
        Signature = 0;

        ProgName = *argv;
@@ -395,9 +395,10 @@
                        break;

                case 's':
+                        svalue = optarg;
                        if (Signature != 0)
                                goto usage;
-                       Signature = atoi(optarg);
+                       Signature = atoi(svalue);
                        if (Signature <= 0)
                           error(1, -1, "-s parameter must be > 0");
                        if (Signature % 4 != 0)

Running this under gdb, svalue is null after the assignment `svalue = optarg', while optarg points to the string "4", as in the gdb session above.
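
Independent of the root cause, the crash itself comes from handing a null pointer to atoi. Below is a minimal sketch of a defensive parse for the -s argument. The name parse_signature is hypothetical and not part of dvibook.c, and treating "not a multiple of 4" as an error is an assumption based on the check visible at the end of the diff. It would not fix the getopt problem, only turn the segfault into a clean error.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper, not in dvibook.c: parse the argument of -s.
   Returns the signature size, or -1 if arg is NULL, not a decimal
   number, not positive, or not a multiple of 4 (the last rule is an
   assumption drawn from the checks visible in the diff). */
int parse_signature(const char *arg)
{
    char *end;
    long value;

    if (arg == NULL)            /* the case that crashed inside atoi() */
        return -1;

    errno = 0;
    value = strtol(arg, &end, 10);
    if (errno != 0 || end == arg || *end != '\0')
        return -1;              /* overflow, no digits, or trailing junk */
    if (value <= 0 || value > INT_MAX)
        return -1;
    if (value % 4 != 0)
        return -1;
    return (int) value;
}
```

In the `case 's':` branch this could be used as `Signature = parse_signature(svalue);` followed by a single error check on a negative result.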

Needless to say, Cygwin's own getopt doesn't exhibit this weird behavior.  For example, I compiled the sample program from

http://www.gnu.org/software/libc/manual/html_node/Example-of-Getopt.html#Example-of-Getopt

and it worked as expected.

By the way, the texlive build log is full of warnings like the following:

/home/kbrown/src/texlive/Build/test/texk/kpathsea/getopt.h:53:22: warning: 'optarg' redeclared without dllimport attribute: previous dllimport ignored

I've always assumed this was a harmless warning, but could it possibly be related to the present problem?  Otherwise, I'm stumped.  Is it necessary to use the kpathsea version of getopt rather than the system version?

Ken

