<div dir="ltr"><div>Hello Nasser,</div><div><br></div><div>You don't give us much to go on, but it does provoke my curiosity.</div><div><br></div><div>I assume that you are able to build the 57,000-page PDF from the TeX source that you want to process with tex4ht.</div><div><br></div><div>Is HTML output the final tex4ht target? I'm assuming it is.<br></div><div><br></div><div>You say:</div><div><br></div><div style="margin-left:40px">[INFO] make4ht-lib: parse_lg process file: reportsubsection1100.htm<br>
[WARNING] domfilter: DOM parsing of reportsubsection1100.htm failed:<br>
[WARNING] domfilter:<br>
...ive/2023/texmf-dist/tex/luatex/luaxml/luaxml-mod-xml.lua:175: Incomplete<br>
XML Document [char=33675]</div><div style="margin-left:40px"></div><div><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div>From this I deduce that the 57,000 page document is being written in HTML pieces by tex4ht, "reportsubsection1100.htm" is one of those pieces, and perhaps not all expected pieces have been generated.<br></div><div><br></div><div>Have you checked whether "reportsubsection1100.htm" is well-formed XML using, say, the tool "xmlwf" found in the expat distribution?<br></div><div><br></div><div> -- Bill</div><div><br></div><div><br></div>William F Hammond<br>Email: <a href="mailto:gellmu@gmail.com" target="_blank">gellmu@gmail.com</a><br><a href="https://www.facebook.com/william.f.hammond" target="_blank">https://www.facebook.com/william.f.hammond</a><br><a href="http://www.albany.edu/~hammond/" target="_blank">http://www.albany.edu/~hammond/</a><div><br></div><div>𝑻𝒉𝒆 𝒕𝒊𝒎𝒆 𝒕𝒐 𝒔𝒂𝒗𝒆 𝒂 𝒅𝒆𝒎𝒐𝒄𝒓𝒂𝒄𝒚 𝒊𝒔 𝒃𝒆𝒇𝒐𝒓𝒆 𝒊𝒕 𝒊𝒔 𝒍𝒐𝒔𝒕. -- 𝐊𝐞𝐧 𝐁𝐮𝐫𝐧𝐬<br></div><div><br><br></div></div></div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Dec 11, 2023 at 5:04 PM Nasser M. Abbasi <<a href="mailto:puszcza-hackers@gnu.org.ua">puszcza-hackers@gnu.org.ua</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">URL:<br>
<<a href="http://puszcza.gnu.org.ua/bugs/?618" rel="noreferrer" target="_blank">http://puszcza.gnu.org.ua/bugs/?618</a>><br>
<br>
Summary: Incomplete XML Document, domfilter error, truncated<br>
build on large file. <br>
Project: tex4ht<br>
Submitted by: nma123<br>
Submitted on: Tue Dec 12 01:04:12 2023<br>
Category: None<br>
Priority: 5 - Normal<br>
Severity: 7 - Important<br>
Status: None<br>
Privacy: Public<br>
Assigned to: None<br>
Originator Email: <br>
Open/Closed: Open<br>
Discussion Lock: Any<br>
<br>
_______________________________________________________<br>
<br>
Details:<br>
<br>
I have been working with Michal on this via private email, but thought I would enter<br>
a bug report for tracking and documentation.<br>
<br>
I have one large file (57,000 PDF pages). Compiling it with tex4ht takes<br>
14 hrs, and at about 10% of the way through generating the final HTML pages it hits an XML<br>
error and stops. <br>
<br>
That is, the remaining 90% of the sections are missing from the final web pages. <br>
<br>
-------------------------------------------------------<br>
<br>
[INFO] make4ht-lib: parse_lg process file: reportsubsection1100.htm<br>
[WARNING] domfilter: DOM parsing of reportsubsection1100.htm failed:<br>
[WARNING] domfilter:<br>
...ive/2023/texmf-dist/tex/luatex/luaxml/luaxml-mod-xml.lua:175: Incomplete<br>
XML Document [char=33675]<br>
<br>
[INFO] make4ht-lib: parse_lg process file: reportsubsection1100.htm<br>
[WARNING] domfilter: DOM parsing of reportsubsection1100.htm failed:<br>
[WARNING] domfilter:<br>
...ive/2023/texmf-dist/tex/luatex/luaxml/luaxml-mod-xml.lua:175: Incomplete<br>
XML Document [char=33675]<br>
<br>
[INFO] make4ht-lib: parse_lg process file: reportsubsection1100.htm<br>
<br>
----------------------------------<br>
<br>
I've just sent Michal a link to a complete, self-contained ZIP file (450 MB) with<br>
instructions on how to run it standalone in order to see these errors on his end.<br>
<br>
<br>
I tried this with the latest TeX Live 2023 on a new Linux installation.<br>
<br>
I will work with Michal to provide any additional information he needs from<br>
me, to hopefully find the cause of this problem. <br>
<br>
This happens only with this file. I think it may be due to the large size, since<br>
the LaTeX code is all generated by the same program and only this file gives this<br>
error.<br>
<br>
--Nasser<br>
<br>
</blockquote></div>
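<div><br></div>
<div>If xmlwf is not at hand, the well-formedness check suggested above can be approximated with Python's standard xml.parsers.expat module, which wraps the same expat parser. What follows is only a minimal sketch: the function name check_well_formed is made up for illustration and is not part of tex4ht, make4ht, or luaxml. Note also that tex4ht output that is plain HTML rather than XHTML may be rejected for reasons unrelated to truncation.</div>
<pre>
```python
import sys
import xml.parsers.expat


def check_well_formed(path):
    """Return None if the file parses as well-formed XML, else an error string."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # expat works on bytes, so open the file in binary mode
        with open(path, "rb") as handle:
            parser.ParseFile(handle)
    except xml.parsers.expat.ExpatError as err:
        # err.lineno is 1-based; err.offset is the 0-based column in that line
        reason = xml.parsers.expat.ErrorString(err.code)
        return "{}: line {}, column {}: {}".format(
            path, err.lineno, err.offset, reason
        )
    return None


if __name__ == "__main__":
    # Check every file named on the command line and report one line per file
    for name in sys.argv[1:]:
        problem = check_well_formed(name)
        print(problem if problem else name + ": well-formed")
```
</pre>
<div>Saved under a name of your choosing (say check_xml.py, a hypothetical name), it could be run as "python3 check_xml.py reportsubsection1100.htm". An error reported at or near the very end of the file would suggest that the file was cut off while being written, which would fit the "Incomplete XML Document" message.</div>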