[latex3-commits] [l3svn] r7357 - Reorganize list of future l3regex improvements a bit, add some
noreply at latex-project.org
Thu Jul 13 14:24:48 CEST 2017
Author: bruno
Date: 2017-07-13 14:24:48 +0200 (Thu, 13 Jul 2017)
New Revision: 7357
Modified:
trunk/l3kernel/l3regex.dtx
Log:
Reorganize list of future l3regex improvements a bit, add some
Modified: trunk/l3kernel/l3regex.dtx
===================================================================
--- trunk/l3kernel/l3regex.dtx 2017-07-13 11:58:54 UTC (rev 7356)
+++ trunk/l3kernel/l3regex.dtx 2017-07-13 12:24:48 UTC (rev 7357)
@@ -666,15 +666,13 @@
%
% The following need to be done now.
% \begin{itemize}
-% \item Change user function names!
-% \item Clean up the use of messages.
% \item Rewrite the documentation in a more ordered way, perhaps add a
% \textsc{bnf}?
% \end{itemize}
%
% Additional error-checking to come.
% \begin{itemize}
-% \item Currently, |a{\x34}| is recognized as |a{4}|.
+% \item Clean up the use of messages.
% \item Cleaner error reporting in the replacement phase.
% \item Add tracing information.
% \item Detect attempts to use back-references and other
@@ -738,13 +736,24 @@
% \item Unicode properties: |\p{..}| and |\P{..}|;
% |\X| which should match any \enquote{extended} Unicode sequence.
% This requires to manipulate a lot of data, probably using tree-boxes.
+% \item Provide a syntax such as |\ur{l_my_regex}| to use an
+% already-compiled regex in a more complicated regex. This makes
+% regexes more easily composable.
+% \item Allowing |\u{l_my_tl}| in more places, for instance as the
+% number of repetitions in a quantifier.
% \end{itemize}
%
% The following features of \textsc{pcre} or Perl may or may not be
% implemented.
% \begin{itemize}
-% \item |\ddd|, matching the character with octal code \texttt{ddd};
-% \item Callout with |(?C...)|;
+% \item Callout with |(?C...)| or other syntax: some internal code
+% changes make that possible, and it can be useful for instance in
+% the replacement code to stop a regex replacement when some marker
+% has been found; this raises the question of a potential
+% |\regex_break:| and then of playing well with \cs{tl_map_break:}
+% called from within the code in a regex. It also raises the
+% question of nested calls to the regex machinery, which is a
+% problem since \tn{fontdimen} are global.
% \item Conditional subpatterns (other than with a look-ahead or
% look-behind condition): this is non-regular, isn't it?
% \item Named subpatterns: \TeX{} programmers have lived so far
@@ -754,21 +763,25 @@
% The following features of \textsc{pcre} or Perl will definitely not be
% implemented.
% \begin{itemize}
-% \item |\cx|, similar to \TeX{}'s own |\^^x|;
-% \item Comments: \TeX{} already has its own system for comments.
-% \item |\Q...\E| escaping: this would require to read the argument
-% verbatim, which is not in the scope of this module.
+% \item Back-references: non-regular feature, this requires
+% backtracking, which is prohibitively slow.
+% \item Recursion: this is a non-regular feature.
% \item Atomic grouping, possessive quantifiers: those tools, mostly
% meant to fix catastrophic backtracking, are unnecessary in a
% non-backtracking algorithm, and difficult to implement.
% \item Subroutine calls: this syntactic sugar is difficult to include
% in a non-backtracking algorithm, in particular because the
% corresponding group should be treated as atomic.
-% \item Recursion: this is a non-regular feature.
-% \item Back-references: non-regular feature, this requires
-% backtracking, which is prohibitively slow.
% \item Backtracking control verbs: intrinsically tied to
% backtracking.
+% \item |\ddd|, matching the character with octal code \texttt{ddd}:
+% we already have |\x{...}| and the syntax is confusingly close to
+% what we could have used for backreferences (|\1|, |\2|, \ldots{}),
+% making it harder to produce useful error messages.
+% \item |\cx|, similar to \TeX{}'s own |\^^x|.
+% \item Comments: \TeX{} already has its own system for comments.
+% \item |\Q...\E| escaping: this would require to read the argument
+% verbatim, which is not in the scope of this module.
% \item |\C| single byte in UTF-8 mode: Xe\TeX{} and Lua\TeX{} serve
% us characters directly, and splitting those into bytes is tricky,
% encoding dependent, and most likely not useful anyways.
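
For context, the two escapes this list relies on, |\u{...}| and |\x{...}|, can be tried out as below. This is only a minimal sketch, assuming a LaTeX installation in which l3regex is loaded as part of expl3. The variable name \l_my_tl follows the example in the diff; the test strings and the terminal messages are made up for illustration. The proposed |\ur{...}| syntax and the use of |\u{...}| as a repetition count are future ideas from the list above and are not shown.

```latex
\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
% Hypothetical variable, named after the example in the diff.
\tl_new:N \l_my_tl
\tl_set:Nn \l_my_tl { abc }
% \u{l_my_tl} matches the exact contents of \l_my_tl
% as they are when the regex is compiled.
\regex_match:nnTF { x \u{l_my_tl} y } { xabcy }
  { \iow_term:n { u~escape~matched } }
  { \iow_term:n { u~escape~did~not~match } }
% \x{41} matches the character with hexadecimal code 41, i.e. "A";
% this is the existing alternative to an octal \ddd escape.
\regex_match:nnTF { \x{41} } { A }
  { \iow_term:n { x~escape~matched } }
  { \iow_term:n { x~escape~did~not~match } }
\ExplSyntaxOff
\begin{document}
\end{document}
```

Inside \ExplSyntaxOn spaces are ignored both in the regex and in the token list being searched, which is why the arguments can be written with padding.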