<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, 21 Feb 2021 at 11:47, Ross Moore <<a href="mailto:ross.moore@mq.edu.au">ross.moore@mq.edu.au</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">



<div style="overflow-wrap: break-word;">
Hi David.<br>
<div><br>
<blockquote type="cite">
<div>On 21 Feb 2021, at 10:12 pm, David Carlisle <<a href="mailto:d.p.carlisle@gmail.com" target="_blank">d.p.carlisle@gmail.com</a>> wrote:</div>
<br>
<div>
<div dir="ltr">
<div>I think that should be taken up with the xstring maintainers.</div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>Is xstring intended for use with XeTeX?</div>
<div>I suspect not. </div>
<div>But anyway, there are still issues with this.</div>
<div><br>
</div>
<div>(BTW, I wrote this before Jonathan Kew’s response.)</div>
<br>
<blockquote type="cite">
<div>
<div dir="ltr">
<div><br>
</div>
<div>I don't think there is any reasonable way to say you can comment out parts of a file in a different encoding.</div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>I’m not convinced that this ought to be correct for TeX-based software.</div>
<div><br>
</div>
<div>TeX (not necessarily XeTeX) has always operated as a finite-state machine.</div>
<div>It *should* be possible to say that this part is encoded as such-and-such,</div>
<div>and a later part encoded differently.</div>
<div><br>
</div>
<div>I fully understand that editor software external to TeX might well have difficulties </div>
<div>with files that mix encodings this way, but TeX itself has always been byte-based </div>
<div>and should remain that way.</div>
<div><br>
</div>
<div>A comment character is meant to be viewed as saying that:</div>
<div> *everything else on this line is to be ignored*</div>
<div>– that’s the impression given by TeX documentation.</div></div></div></blockquote><div><br></div><div><br></div><div>But you only know it is a comment character if you can interpret the incoming byte stream.<br></div><div>If there are encoding errors in that byte stream then everything else is guesswork.</div><div><br></div><div>In this particular case, with mostly ASCII text and a few Latin-1 characters, it may be that you can guess that</div><div>the invalid UTF-8 is in fact valid Latin-1 and interpret it that way, and the guess would be right for this file.</div><div>But what if the non-UTF-8 file were UTF-16 or Latin-2 or ...? Guessing the encoding (which means guessing where the line, and so the comment, ends)</div><div>is just guesswork.<br></div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;"><div>
<div><br>
</div>
<div>If it is the documentation that is incorrect, then it should certainly be clarified.</div>
<div><br>
</div>
<div>For XeTeX and this particular example, it’s probably just a matter of checking </div>
<div>that the non-UTF-8 characters occur *after* a UTF-8 ‘%’, and not issuing </div>
<div>an error message under these conditions. </div>
<div>A warning, maybe, but not an error.</div></div></div></blockquote><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;"><div>
<br>
<blockquote type="cite">
<div>
<div dir="ltr">
<div><br>
</div>
<div>The file encoding specifies the byte stream interpretation before any TeX tokenization.</div>
<div>If the file can not be interpreted as utf-8 then it can't be interpreted at all.
</div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>Why not?</div>
<div>Why can you not have a macro, presumably best on a single line by itself,</div></div></div></blockquote><div> </div><div>There is a XeTeX primitive that switches the encoding, as Jonathan showed, but guessing a different encoding</div><div>when a file fails to decode properly against the specified encoding is a dangerous game to play.</div><div>So I don't think such a switch should happen automatically just to avoid reporting encoding errors.<br></div><div><br></div><div>I reported the issue to xstring here:</div><div><a href="https://framagit.org/unbonpetit/xstring/-/issues/4">https://framagit.org/unbonpetit/xstring/-/issues/4</a></div><div><br></div><div><br></div><div>David</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;"><div>
<div>that says what follows next is to be interpreted in a different way to what came previously?</div>
<div>Until the next switch that returns to UTF-8 or whatever?</div>
<div><br>
</div>
<div><br>
</div>
<div>If XeTeX is based on eTeX, then this should be possible in that setting.</div>
<div><br>
</div>
<div><br>
</div>
<blockquote type="cite">
<div>
<div dir="ltr">
<div>Even replacing by U+FFFD <br>
</div>
<div>is being lenient.</div>
<div><br>
</div>
<div>David</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Sun, 21 Feb 2021 at 11:04, jfbu <<a href="mailto:jfbu@free.fr" target="_blank">jfbu@free.fr</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Hi,<br>
<br>
consider this<br>
<br>
\documentclass{article}<br>
\usepackage{xstring}<br>
\begin{document}<br>
\end{document}<br>
<br>
and call it xexstring.tex<br>
<br>
Then xelatex xexstring triggers 136 warnings of the type<br>
<br>
Invalid UTF-8 byte or sequence at line 35 replaced by U+FFFD.<br>
<br>
Looking at file<br>
<br>
/usr/local/texlive/2020/texmf-dist/tex/generic/xstring/xstring.tex<br>
<br>
I see that this matches with use of latin-1 encoded characters in comments.<br>
<br>
Notice that it is not a user decision here to use a latin-1<br>
encoded file.<br>
<br>
In fact I encountered this in a file I was given where<br>
xstring package was loaded by another package.<br>
<br>
Regards,<br>
<br>
Jean-François<br>
</blockquote>
</div>
</div>
</blockquote>
<br>
</div>
<div><br>
</div>
<div>Cheers.</div>
<div><br>
</div>
<div><span style="white-space:pre-wrap"></span>Ross</div>
<br>
<div><br>
Dr Ross Moore<br>
Department of Mathematics and Statistics 
<div>12 Wally’s Walk, Level 7, Room 734<br>
Macquarie University, NSW 2109, Australia<br>
T: +61 2 9850 8955  |  F: +61 2 9850 8114<br>
M:+61 407 288 255  |  E: <a href="mailto:ross.moore@mq.edu.au" target="_blank">ross.moore@mq.edu.au</a><br>
<a href="http://www.maths.mq.edu.au" target="_blank">http://www.maths.mq.edu.au</a></div>
</div>
<br>
</div>

</blockquote></div></div>
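<div><br></div>
<div>The two positions in this thread can be illustrated with a minimal sketch, written in Python rather than TeX. It is not how XeTeX reads files, and the names split_comment, is_valid_utf8 and classify are hypothetical. It assumes that an unescaped byte 0x25 starts a comment, which ignores catcode changes and only holds for ASCII-compatible encodings. The classify function follows Ross's suggestion (invalid bytes after a ‘%’ give a warning, not an error); the last two assertions illustrate David's objection that the same bytes mean different things under different guessed encodings.</div>
<pre>
```python
# Illustrative sketch only, not how XeTeX reads files. All function names are
# hypothetical. Assumption: an unescaped byte 0x25 ("%") starts a comment,
# which ignores catcode changes and only makes sense for ASCII-compatible
# encodings such as UTF-8 or Latin-1.


def split_comment(line):
    """Split a line of bytes at the first unescaped percent sign."""
    skip = False
    for i, byte in enumerate(line):
        if skip:
            skip = False          # this byte was escaped by a backslash
        elif byte == 0x5C:        # backslash: the next byte is escaped
            skip = True
        elif byte == 0x25:        # percent sign: the comment starts here
            return line[:i], line[i:]
    return line, b""


def is_valid_utf8(raw):
    """Return True if the bytes decode as strict UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def classify(line):
    """Return 'ok', 'warning' (bad bytes only in the comment) or 'error'."""
    code, comment = split_comment(line)
    if not is_valid_utf8(code):
        return "error"
    if not is_valid_utf8(comment):
        return "warning"
    return "ok"


# A Latin-1 e-acute (0xE9) inside a comment would only produce a warning.
assert classify(b"\\def\\x{a}% d\xe9finition") == "warning"
# The same byte before the comment character is still an error.
assert classify(b"caf\xe9 % ok") == "error"
# Lenient decoding replaces the bad byte with U+FFFD.
assert b"d\xe9finition".decode("utf-8", errors="replace") == "d\ufffdfinition"
# Guessing is ambiguous: 0xB1 is a different character in Latin-1 and Latin-2.
assert b"\xb1".decode("latin-1") != b"\xb1".decode("iso-8859-2")
# In UTF-16-LE, U+2500 contains a 0x25 byte that is not a percent sign,
# so a byte-level scan finds a comment start that is not there.
assert split_comment("\u2500".encode("utf-16-le")) == (b"\x00", b"%")
```
</pre>
<div>The byte-level scan is safe for UTF-8 and Latin-1 because neither uses the bytes 0x25 or 0x5C for anything other than ‘%’ and the backslash. The UTF-16 case shows why that assumption cannot be made once the encoding itself is in doubt.</div>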