[latex3-commits] [l3svn] r7357 - Reorganize list of future l3regex improvements a bit, add some

Thu Jul 13 14:24:48 CEST 2017

Author: bruno
Date: 2017-07-13 14:24:48 +0200 (Thu, 13 Jul 2017)
New Revision: 7357

Modified:
   trunk/l3kernel/l3regex.dtx
Log:
Reorganize list of future l3regex improvements a bit, add some


Modified: trunk/l3kernel/l3regex.dtx
===================================================================

--- trunk/l3kernel/l3regex.dtx	2017-07-13 11:58:54 UTC (rev 7356)
+++ trunk/l3kernel/l3regex.dtx	2017-07-13 12:24:48 UTC (rev 7357)
@@ -666,15 +666,13 @@
 %
 % The following need to be done now.
 % \begin{itemize}
-%   \item Change user function names!
-%   \item Clean up the use of messages.
 %   \item Rewrite the documentation in a more ordered way, perhaps add a
 %     \textsc{bnf}?
 % \end{itemize}
 %
 % Additional error-checking to come.
 % \begin{itemize}
-%   \item Currently, |a{\x34}| is recognized as |a{4}|.
+%   \item Clean up the use of messages.
 %   \item Cleaner error reporting in the replacement phase.
 %   \item Add tracing information.
 %   \item Detect attempts to use back-references and other
@@ -738,13 +736,24 @@
 %   \item Unicode properties: |\p{..}| and |\P{..}|;
 %     |\X| which should match any \enquote{extended} Unicode sequence.
 %     This requires to manipulate a lot of data, probably using tree-boxes.
+%   \item Provide a syntax such as |\ur{l_my_regex}| to use an
+%     already-compiled regex inside a larger regex.  This would make
+%     regexes easier to compose.
+%   \item Allow |\u{l_my_tl}| in more places, for instance as the
+%     number of repetitions in a quantifier.
 % \end{itemize}
 %
 % The following features of \textsc{pcre} or Perl may or may not be
 % implemented.
 % \begin{itemize}
-%   \item |\ddd|, matching the character with octal code \texttt{ddd};
-%   \item Callout with |(?C...)|;
+%   \item Callout with |(?C...)| or some other syntax: some internal
+%     code changes make this possible, and it could be useful, for
+%     instance in the replacement code, to stop a regex replacement once
+%     some marker has been found.  This raises the question of a
+%     potential |\regex_break:|, and of how it would interact with
+%     \cs{tl_map_break:} called from code inside a regex.  It also raises
+%     the question of nested calls to the regex machinery, which is
+%     problematic since \tn{fontdimen} assignments are global.
 %   \item Conditional subpatterns (other than with a look-ahead or
 %     look-behind condition): this is non-regular, isn't it?
 %   \item Named subpatterns: \TeX{} programmers have lived so far
@@ -754,21 +763,25 @@
 % The following features of \textsc{pcre} or Perl will definitely not be
 % implemented.
 % \begin{itemize}
-%   \item |\cx|, similar to \TeX{}'s own |\^^x|;
-%   \item Comments: \TeX{} already has its own system for comments.
-%   \item |\Q...\E| escaping: this would require to read the argument
-%     verbatim, which is not in the scope of this module.
+%   \item Back-references: a non-regular feature that requires
+%     backtracking, which is prohibitively slow.
+%   \item Recursion: this is a non-regular feature.
 %   \item Atomic grouping, possessive quantifiers: those tools, mostly
 %     meant to fix catastrophic backtracking, are unnecessary in a
 %     non-backtracking algorithm, and difficult to implement.
 %   \item Subroutine calls: this syntactic sugar is difficult to include
 %     in a non-backtracking algorithm, in particular because the
 %     corresponding group should be treated as atomic.
-%   \item Recursion: this is a non-regular feature.
-%   \item Back-references: non-regular feature, this requires
-%     backtracking, which is prohibitively slow.
 %   \item Backtracking control verbs: intrinsically tied to
 %     backtracking.
+%   \item |\ddd|, matching the character with octal code \texttt{ddd}:
+%     we already have |\x{...}| and the syntax is confusingly close to
+%     what we could have used for back-references (|\1|, |\2|, \ldots{}),
+%     making it harder to produce useful error messages.
+%   \item |\cx|, similar to \TeX{}'s own |\^^x|.
+%   \item Comments: \TeX{} already has its own system for comments.
+%   \item |\Q...\E| escaping: this would require reading the argument
+%     verbatim, which is outside the scope of this module.
 %   \item |\C| single byte in UTF-8 mode: Xe\TeX{} and Lua\TeX{} serve
 %     us characters directly, and splitting those into bytes is tricky,
 %     encoding dependent, and most likely not useful anyways.