[tex4ht] [bug #611] Random SIGSEGV of tex4ht due to invalid memory accesses

Oliver Freyermuth puszcza-hackers at gnu.org.ua
Thu Oct 5 21:31:55 CEST 2023


                 Summary: Random SIGSEGV of tex4ht due to invalid memory accesses
                 Project: tex4ht
            Submitted by: olifre
            Submitted on: Thu Oct  5 19:31:55 2023
                Category: None
                Priority: 5 - Normal
                Severity: 7 - Important
                  Status: None
                 Privacy: Public
             Assigned to: None
        Originator Email: 
             Open/Closed: Open
         Discussion Lock: Any



I've been haunted by "Illegal storage address" errors for large documents with
many fonts for quite a while (they come and go with tex4ht releases and even
change depending on the shell environment), and unrelated changes to the
document (e.g. reducing the number of fonts) often make them go away.

I believe I have finally tracked the underlying issue down to an invalid memory
access in tex4ht. It is reproducible with an MWE (which does not itself crash)
and visible with valgrind / gdb.


1) Create foo.tex with content:


2) Run:
   make4ht --utf8 --output-dir html foo

3) Re-run the tex4ht step with valgrind:
   valgrind tex4ht -cmozhtf -utf8 foo.dvi

This reveals:
==4487== Conditional jump or move depends on uninitialised value(s)
==4487==    at 0x10EE14: main (tex4ht.c:8099)
==4487== (action on error) vgdb me ... 
==4487== Continuing ...
==4487== Conditional jump or move depends on uninitialised value(s)
==4487==    at 0x10EE16: main (tex4ht.c:8108)
==4487== (action on error) vgdb me ... 
==4487== Continuing ...
==4487== Invalid read of size 4
==4487==    at 0x10E794: main (tex4ht.c:8741)
==4487==  Address 0x8beb1f8 is 8 bytes before a block of size 2 alloc'd
==4487==    at 0x48407C4: malloc (vg_replace_malloc.c:431)
==4487==    by 0x11851B: malloc_chk (tex4ht.c:1481)
==4487==    by 0x10EC30: main (tex4ht.c:7104)

The invalid read is the most worrisome one; it originates from these source
lines:

8740    if( span_on && !in_span_ch  && !ignore_chs && !in_accenting
8741                && (default_font != font_tbl[cur_fnt].num) ){

It is caused by the part:
   (default_font != font_tbl[cur_fnt].num)
being evaluated while the index cur_fnt is negative:

(gdb) p default_font
$12 = -1
(gdb) p cur_fnt 
$13 = -1

This yields an out-of-bounds read: with cur_fnt == -1, font_tbl[cur_fnt]
addresses memory before the start of the font_tbl array. If the document has
many (many!) fonts, memory that opendir/closedir dynamically allocated and then
freed while looking for the htf files ends up right before the font_tbl array,
and depending on page boundaries, this negative-index read may then hit an
invalid address and raise SIGSEGV.
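To make the failure mode concrete, here is a minimal standalone sketch of the bounds check that the condition at tex4ht.c:8741 lacks. It is not a proposed fix: struct font_entry and font_differs() are hypothetical stand-ins (only the num field and the variable names come from the report), and treating an out-of-range cur_fnt as "no font difference" is an assumption; whether that is the right behaviour depends on tex4ht logic not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an entry of tex4ht's font_tbl; only the
   `num` field mentioned in the report is modelled. */
struct font_entry {
    int num;
};

/* Guarded version of the check at tex4ht.c:8741,
       default_font != font_tbl[cur_fnt].num
   The index is validated first, so an unset font (cur_fnt == -1)
   never reaches the subscript. ASSUMPTION: an out-of-range cur_fnt
   is treated as "no font difference" (returns 0). */
static int font_differs(int default_font,
                        const struct font_entry *font_tbl,
                        size_t n_fonts, int cur_fnt)
{
    if (font_tbl == NULL || cur_fnt < 0 || (size_t)cur_fnt >= n_fonts)
        return 0;
    return default_font != font_tbl[cur_fnt].num;
}
```

With the state from the gdb session (default_font == -1, cur_fnt == -1), this helper returns 0 without ever forming font_tbl[-1], whereas the original expression reads memory outside the array.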

Since I don't understand the full logic of the code, I'm not fit to propose a
(good) fix. 

Judging from reports of "Illegal storage address" on TeX Stack Exchange, some of
which were "fixed" by unrelated document changes, this may be affecting other
users, too.

Nota bene:
The two "Conditional jump or move depends on uninitialised value(s)" are from
the lines:
  if( value == htf_4hf[mid].ch ){
  } else if( value < htf_4hf[mid].ch ){
since htf_4hf seems to be used (in some cases) before being initialized. This
does not lead to a crash, though, since it's not an invalid read. 
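For the uninitialised-value warnings, a common pattern is to bound the binary search by the number of entries actually filled in, so that unset slots are never compared. The sketch below assumes a hypothetical struct htf_table with an explicit count and hypothetical htf_table_* helpers; tex4ht's real htf_4hf storage may be organised differently, and the sketch does not explain why the lookup runs before the table is filled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical entry type; the report only shows the `ch` field. */
struct htf_entry {
    int ch;
};

/* Hypothetical table that tracks how many slots are filled. */
struct htf_table {
    struct htf_entry *entries;
    size_t capacity;
    size_t count; /* entries[0..count) are filled, sorted by ch */
};

/* calloc zero-fills the slots, so even a stray read is defined. */
static int htf_table_init(struct htf_table *t, size_t capacity)
{
    t->entries = calloc(capacity, sizeof *t->entries);
    t->capacity = capacity;
    t->count = 0;
    return (t->entries != NULL || capacity == 0) ? 0 : -1;
}

/* Appends one entry; the caller must append in ascending ch order. */
static int htf_table_append(struct htf_table *t, int ch)
{
    if (t->count >= t->capacity)
        return -1;
    t->entries[t->count++].ch = ch;
    return 0;
}

/* Binary search over the filled prefix only, mirroring the
   `value == ... .ch` / `value < ... .ch` comparisons from the report.
   Returns the index of `value`, or -1 if it is absent. */
static long htf_table_find(const struct htf_table *t, int value)
{
    size_t lo = 0, hi = t->count; /* half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (value == t->entries[mid].ch)
            return (long)mid;
        else if (value < t->entries[mid].ch)
            hi = mid;
        else
            lo = mid + 1;
    }
    return -1;
}

static void htf_table_free(struct htf_table *t)
{
    free(t->entries);
    t->entries = NULL;
    t->capacity = t->count = 0;
}
```

Zero-filling on its own would only silence valgrind; bounding the search by count is what keeps unset slots out of the comparison, so a lookup on an empty table simply returns -1.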



  Message sent via/by Puszcza
