texlive[48146] Master: tagpdf (5jul18)
commits+karl at tug.org
Thu Jul 5 23:45:47 CEST 2018
Revision: 48146
http://tug.org/svn/texlive?view=revision&revision=48146
Author: karl
Date: 2018-07-05 23:45:46 +0200 (Thu, 05 Jul 2018)
Log Message:
-----------
tagpdf (5jul18)
Modified Paths:
--------------
trunk/Master/tlpkg/bin/tlpkg-ctan-check
trunk/Master/tlpkg/libexec/ctan2tds
trunk/Master/tlpkg/tlpsrc/collection-latexextra.tlpsrc
Added Paths:
-----------
trunk/Master/texmf-dist/doc/latex/tagpdf/
trunk/Master/texmf-dist/doc/latex/tagpdf/README.md
trunk/Master/texmf-dist/doc/latex/tagpdf/ex-mc-manual-para-split.pdf
trunk/Master/texmf-dist/doc/latex/tagpdf/ex-mc-manual-para-split.tex
trunk/Master/texmf-dist/doc/latex/tagpdf/ex-structure.pdf
trunk/Master/texmf-dist/doc/latex/tagpdf/ex-structure.tex
trunk/Master/texmf-dist/doc/latex/tagpdf/ex-tagpdf-template.pdf
trunk/Master/texmf-dist/doc/latex/tagpdf/ex-tagpdf-template.tex
trunk/Master/texmf-dist/doc/latex/tagpdf/tagpdf.bib
trunk/Master/texmf-dist/doc/latex/tagpdf/tagpdf.pdf
trunk/Master/texmf-dist/doc/latex/tagpdf/tagpdf.tex
trunk/Master/texmf-dist/tex/latex/tagpdf/
trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-checks-code.sty
trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-luatex.def
trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-mc-code-generic.sty
trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-mc-code-lua.sty
trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-mc-code-shared.sty
trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-pdftex.def
trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-roles-code.sty
trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-struct-code.sty
trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-tree-code.sty
trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-user.sty
trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf.lua
trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf.sty
trunk/Master/tlpkg/tlpsrc/tagpdf.tlpsrc
Added: trunk/Master/texmf-dist/doc/latex/tagpdf/README.md
===================================================================
--- trunk/Master/texmf-dist/doc/latex/tagpdf/README.md (rev 0)
+++ trunk/Master/texmf-dist/doc/latex/tagpdf/README.md 2018-07-05 21:45:46 UTC (rev 48146)
@@ -0,0 +1,28 @@
+# tagpdf
+
+A package to experiment with tagging and other requirements of accessible pdfs with pdflatex and lualatex
+
+
+## Structure
+
+- source
+ - examples
+ - texmf (the code of the package)
+
+ - tagpdf.tex, pdf, bib (the documentation)
+
+- testfiles contains tests for the l3build system.
+
+## Rules for contributions
+
+Comments, feedback, examples are welcome.
+
+Use the issue tracker, send me a mail or make a pull request.
+
+## Licence
+
+The tagpdf package may be modified and distributed under the terms and conditions of the
+[LaTeX Project Public License](https://www.latex-project.org/lppl/), version 1.3c or greater.
+
+
+
Property changes on: trunk/Master/texmf-dist/doc/latex/tagpdf/README.md
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: trunk/Master/texmf-dist/doc/latex/tagpdf/ex-mc-manual-para-split.pdf
===================================================================
(Binary files differ)
Index: trunk/Master/texmf-dist/doc/latex/tagpdf/ex-mc-manual-para-split.pdf
===================================================================
--- trunk/Master/texmf-dist/doc/latex/tagpdf/ex-mc-manual-para-split.pdf 2018-07-05 21:20:15 UTC (rev 48145)
+++ trunk/Master/texmf-dist/doc/latex/tagpdf/ex-mc-manual-para-split.pdf 2018-07-05 21:45:46 UTC (rev 48146)
Property changes on: trunk/Master/texmf-dist/doc/latex/tagpdf/ex-mc-manual-para-split.pdf
___________________________________________________________________
Added: svn:mime-type
## -0,0 +1 ##
+application/pdf
\ No newline at end of property
Added: trunk/Master/texmf-dist/doc/latex/tagpdf/ex-mc-manual-para-split.tex
===================================================================
--- trunk/Master/texmf-dist/doc/latex/tagpdf/ex-mc-manual-para-split.tex (rev 0)
+++ trunk/Master/texmf-dist/doc/latex/tagpdf/ex-mc-manual-para-split.tex 2018-07-05 21:45:46 UTC (rev 48146)
@@ -0,0 +1,56 @@
+% !Mode:: "TeX:DE:UTF-8:Main"
+\documentclass{book}
+\usepackage[english,ngerman]{babel}
+\usepackage{tagpdf}
+\usepackage{amsmath}
+\usepackage{graphicx}
+\tagpdfifpdftexT
+ {
+ \usepackage[utf8]{inputenc}
+ \usepackage[T1]{fontenc}
+ }
+
+\tagpdfifluatexT
+ {
+ \usepackage{fontspec}
+ \usepackage{luacode}
+ }
+
+
+\tagpdfsetup{tabsorder=structure,
+ activate-all,
+ uncompress
+ }
+\newsavebox\mybox
+
+\usepackage{lipsum}\textheight3cm
+\usepackage{fancyhdr}
+\pagestyle{fancy}
+\fancyhf{}
+
+\cfoot{\tagmcbegin{artifact=pagination}\thepage\tagmcend}
+\begin{document}
+\tagmcbegin{tag=P}
+Cras egestas ipsum a nisl. Vivamus varius dolor
+ut dolor. Fusce vel enim. Pellentesque accumsan ligula et eros. Cras
+id lacus non tortor facilisis facilisis. Etiam nisl elit, cursus
+sed, fringilla in, congue nec, urna. Cum sociis natoque penatibus et
+magnis dis parturient montes, nascetur ridiculus mus. Integer at
+turpis. Cum sociis natoque penatibus et magnis dis parturient
+montes, nascetur ridiculus mus. Duis fringilla, ligula sed porta
+fringilla, ligula wisi commodo felis, ut adipiscing felis dui in
+enim. Suspendisse malesuada ultrices ante.%
+\vadjust{\tagmcend\pagebreak\tagmcbegin{tag=P}}
+Pellentesque scelerisque
+augue sit amet urna. Nulla volutpat aliquet tortor. Cras aliquam,
+tellus at aliquet pellentesque, justo sapien commodo leo, id rhoncus
+sapien quam at erat. Nulla commodo, wisi eget sollicitudin pretium,
+orci orci aliquam orci, ut cursus turpis justo et lacus. Nulla vel
+tortor. Quisque erat elit, viverra sit amet, sagittis eget, porta
+sit amet, lacus.\tagmcend
+
+\end{document}
+
+Comment:
+
+manual tagging at a pagebreak
\ No newline at end of file
Property changes on: trunk/Master/texmf-dist/doc/latex/tagpdf/ex-mc-manual-para-split.tex
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: trunk/Master/texmf-dist/doc/latex/tagpdf/ex-structure.pdf
===================================================================
(Binary files differ)
Index: trunk/Master/texmf-dist/doc/latex/tagpdf/ex-structure.pdf
===================================================================
--- trunk/Master/texmf-dist/doc/latex/tagpdf/ex-structure.pdf 2018-07-05 21:20:15 UTC (rev 48145)
+++ trunk/Master/texmf-dist/doc/latex/tagpdf/ex-structure.pdf 2018-07-05 21:45:46 UTC (rev 48146)
Property changes on: trunk/Master/texmf-dist/doc/latex/tagpdf/ex-structure.pdf
___________________________________________________________________
Added: svn:mime-type
## -0,0 +1 ##
+application/pdf
\ No newline at end of property
Added: trunk/Master/texmf-dist/doc/latex/tagpdf/ex-structure.tex
===================================================================
--- trunk/Master/texmf-dist/doc/latex/tagpdf/ex-structure.tex (rev 0)
+++ trunk/Master/texmf-dist/doc/latex/tagpdf/ex-structure.tex 2018-07-05 21:45:46 UTC (rev 48146)
@@ -0,0 +1,77 @@
+% !Mode:: "TeX:DE:UTF-8:Main"
+\documentclass{book}
+\usepackage[english,ngerman]{babel}
+\usepackage{tagpdf}
+\usepackage{amsmath}
+\usepackage{graphicx}
+\tagpdfifpdftexT
+ {
+ \usepackage[utf8]{inputenc}
+ \usepackage[T1]{fontenc}
+ }
+
+\tagpdfifluatexT
+ {
+ \usepackage{fontspec}
+ \usepackage{luacode}
+ }
+
+
+\tagpdfsetup{tabsorder=structure,
+ activate-all,
+ uncompress
+ }
+\newsavebox\mybox
+
+\usepackage{lipsum}%\textheight3cm
+\usepackage{fancyhdr}
+\pagestyle{fancy}
+\fancyhf{}
+
+\cfoot{\tagmcbegin{artifact=pagination}\thepage\tagmcend}
+\begin{document}
+\tagstructbegin{tag=Document}
+
+ \tagstructbegin{tag=Sect}
+ \tagstructbegin{tag=H}
+ \tagmcbegin{tag=H}
+ \section{Section}
+ \tagmcend
+ \tagstructend
+ \tagstructbegin{tag=P}
+ \tagmcbegin{tag=P,raw=/Alt (x)}
+ a paragraph\par x
+ \tagmcend
+ \tagstructend
+
+ \tagstructbegin{tag=L} %List
+ \tagstructbegin{tag=LI}
+ \tagstructbegin{tag=Lbl}
+ \tagmcbegin{tag=Lbl}
+ 1.
+ \tagmcend
+ \tagstructend
+ \tagstructbegin{tag=LBody}
+ \tagmcbegin{tag=P}
+ List item body
+ \tagmcend
+ \tagstructend %lbody
+ \tagstructend %Li
+
+ \tagstructbegin{tag=LI}
+ \tagstructbegin{tag=Lbl}
+ \tagmcbegin{tag=Lbl}
+ 2.
+ \tagmcend
+ \tagstructend
+ \tagstructbegin{tag=LBody}
+ \tagmcbegin{tag=P}
+ another List item body
+ \tagmcend
+ \tagstructend %lbody
+ \tagstructend %Li
+ \tagstructend %L
+
+ \tagstructend %Sect
+\tagstructend %Document
+\end{document}
\ No newline at end of file
Property changes on: trunk/Master/texmf-dist/doc/latex/tagpdf/ex-structure.tex
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: trunk/Master/texmf-dist/doc/latex/tagpdf/ex-tagpdf-template.pdf
===================================================================
(Binary files differ)
Index: trunk/Master/texmf-dist/doc/latex/tagpdf/ex-tagpdf-template.pdf
===================================================================
--- trunk/Master/texmf-dist/doc/latex/tagpdf/ex-tagpdf-template.pdf 2018-07-05 21:20:15 UTC (rev 48145)
+++ trunk/Master/texmf-dist/doc/latex/tagpdf/ex-tagpdf-template.pdf 2018-07-05 21:45:46 UTC (rev 48146)
Property changes on: trunk/Master/texmf-dist/doc/latex/tagpdf/ex-tagpdf-template.pdf
___________________________________________________________________
Added: svn:mime-type
## -0,0 +1 ##
+application/pdf
\ No newline at end of property
Added: trunk/Master/texmf-dist/doc/latex/tagpdf/ex-tagpdf-template.tex
===================================================================
--- trunk/Master/texmf-dist/doc/latex/tagpdf/ex-tagpdf-template.tex (rev 0)
+++ trunk/Master/texmf-dist/doc/latex/tagpdf/ex-tagpdf-template.tex 2018-07-05 21:45:46 UTC (rev 48146)
@@ -0,0 +1,27 @@
+% !Mode:: "TeX:DE:UTF-8:Main"
+
+\documentclass{scrartcl}
+\usepackage[english]{babel}
+\usepackage{tagpdf}
+
+\tagpdfifpdftexT
+ {
+ \usepackage[utf8]{inputenc}
+ \usepackage[T1]{fontenc}
+ }
+
+\tagpdfifluatexT
+ {
+ \usepackage{fontspec}
+ }
+
+\tagpdfsetup
+ {
+ activate-all,
+ uncompress
+ }
+\begin{document}
+
+blbl
+
+\end{document}
\ No newline at end of file
Property changes on: trunk/Master/texmf-dist/doc/latex/tagpdf/ex-tagpdf-template.tex
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: trunk/Master/texmf-dist/doc/latex/tagpdf/tagpdf.bib
===================================================================
--- trunk/Master/texmf-dist/doc/latex/tagpdf/tagpdf.bib (rev 0)
+++ trunk/Master/texmf-dist/doc/latex/tagpdf/tagpdf.bib 2018-07-05 21:45:46 UTC (rev 48146)
@@ -0,0 +1,25 @@
+@online{pdfreference,
+title= {PDF Reference, sixth edition},
+author={{Adobe Systems Incorporated}},
+url = {https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf},
+year = {2006}
+ }
+
+@online{pac3,
+title = {PDF Accessibility Checker (PAC 3)},
+author= {{Zugang für alle -- Schweizerische Stiftung zur behindertengerechten Technologienutzung}},
+url = {http://www.access-for-all.ch/ch/pdf-werkstatt/pdf-accessibility-checker-pac.html},
+urldate ={2018-07-05}}
+
+@online{verapdf,
+title = {veraPDF},
+author={{veraPDF consortium}},
+url = {http://verapdf.org/}
+}
+
+
+@online{tugaccess,
+title={PDF accessibility and PDF standards},
+author= {{TeX User Group}},
+url= {https://tug.org/twg/accessibility/}
+}
Property changes on: trunk/Master/texmf-dist/doc/latex/tagpdf/tagpdf.bib
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: trunk/Master/texmf-dist/doc/latex/tagpdf/tagpdf.pdf
===================================================================
(Binary files differ)
Index: trunk/Master/texmf-dist/doc/latex/tagpdf/tagpdf.pdf
===================================================================
--- trunk/Master/texmf-dist/doc/latex/tagpdf/tagpdf.pdf 2018-07-05 21:20:15 UTC (rev 48145)
+++ trunk/Master/texmf-dist/doc/latex/tagpdf/tagpdf.pdf 2018-07-05 21:45:46 UTC (rev 48146)
Property changes on: trunk/Master/texmf-dist/doc/latex/tagpdf/tagpdf.pdf
___________________________________________________________________
Added: svn:mime-type
## -0,0 +1 ##
+application/pdf
\ No newline at end of property
Added: trunk/Master/texmf-dist/doc/latex/tagpdf/tagpdf.tex
===================================================================
--- trunk/Master/texmf-dist/doc/latex/tagpdf/tagpdf.tex (rev 0)
+++ trunk/Master/texmf-dist/doc/latex/tagpdf/tagpdf.tex 2018-07-05 21:45:46 UTC (rev 48146)
@@ -0,0 +1,802 @@
+% !Mode:: "TeX:DE:UTF-8:Main"
+% $UFDate: 2017-12-03 18:32:01 +0100 -- Commit: 77304d9 (HEAD, tag: v1.4, master) -- master$
+\makeatletter
+\def\UlrikeFischer@package@version{0.1}
+\makeatother
+\documentclass[DIV=12,parskip=half-,bibliography=totoc]{scrartcl}
+\usepackage[utf8]{inputenc}
+\usepackage[T1]{fontenc}
+\usepackage[english]{babel}
+\usepackage[autostyle]{csquotes}
+\usepackage{microtype}
+\DisableLigatures{encoding = T1, family = tt* }
+\usepackage[style=numeric,hyperref=false]{biblatex}
+\addbibresource{tagpdf.bib}
+\usepackage{fourier}
+\renewcommand\ttdefault{lmtt}
+\usepackage{tcolorbox}
+\usepackage{ydoc-desc}
+\usepackage{tabularx, marginnote}
+\reversemarginpar
+\usepackage{tikz}
+\usetikzlibrary{positioning}
+
+\tikzset{arg/.style = {font=\footnotesize\ttfamily, anchor=base,draw, rounded corners,node distance=2mm and 2mm}}
+\tikzset{operator/.style = {font=\footnotesize\ttfamily, anchor=base,draw, rounded corners,node distance=4mm and 4mm}}
+\usepackage{enumitem,needspace}
+\makeatletter
+\enitkv@key{enumitem}{compactsep}[true]{%
+ \divide\partopsep by 2\relax
+ \divide\topsep by 2\relax
+ \divide\itemsep by 2\relax
+ \divide\parsep by 2\relax}
+\makeatother
+\title{The \pkg{tagpdf} package, v\csname UlrikeFischer@package@version\endcsname}
+\author{Ulrike Fischer\thanks{fischer at troubleshooting-tex.de}}
+\usepackage{listings}
+\lstset{basicstyle=\ttfamily, columns=fullflexible,language=[LaTeX]TeX}
+\usepackage{tagpdf}
+
+
+\usepackage{hyperref}
+\begin{document}
+\maketitle
+
+\begin{tcolorbox}[colframe=red]
+This package is not meant for normal document production.
+
+You need a current expl3 version to use it.
+
+This package is incomplete, experimental and quite probably contains bugs.
+
+You need some knowledge about \TeX, pdf and perhaps even lua to use it.
+
+\medskip
+
+Issues, comments, suggestions should be added as issues to the github tracker:
+
+\medskip
+\centering \textbf{https://github.com/u-fischer/tagpdf}
+
+\end{tcolorbox}
+
+\tableofcontents
+
+\section{Introduction}
+
+For many years the creation of accessible pdf files with \LaTeX\ which conform to the PDF/UA standard has been on the agenda of \TeX\ meetings. Many people agree that this is important, and Ross Moore has done a lot of work on it. There is also a TUG mailing list and a webpage \parencite{tugaccess} dedicated to this topic.
+
+But in my opinion what is missing are means to \emph{experiment} with tagging and accessibility: means to try out how difficult it is to tag some structures, means to try out how much tagging is really needed (standards and validators don't need to be right \ldots), means to test what else is needed so that a pdf works e.g. with a screen reader. Without such experiments it is, in my opinion, quite difficult to get a feeling for what has to be done, which kernel changes are needed and how packages should be adapted.
+
+This package tries to close this gap by offering \emph{core} commands to tag a pdf.\footnote{In case you don't know what this means: there will be some explanations later on.}
+
+My hope is that the knowledge gained by the use of this package will in the end make it possible to decide if and how code for tagging should be part of the \LaTeX\ kernel.
+
+The package does not patch commands from other packages. It is also not an aim of the package to develop such patches. While in the end changes to various commands in many classes and packages will be needed to get tagged pdf files -- and the examples accompanying the package try (or will try) to show various strategies -- these changes should in my opinion be done by the class, package and document writers themselves, using a sensible API provided by the kernel, and not by some external package that adds patches everywhere and would need constant maintenance. One only needs to look at packages like tex4ht, bidi or hyperref to see how difficult and sometimes fragile this is.
+
+So this package deliberately concentrates on the basics -- and this is already quite a lot: there are many more details involved than I expected when I started.
+
+I'm sure that it has bugs. Bug reports, suggestions and comments can be added to the issue tracker on github: \url{https://github.com/u-fischer/tagpdf}.
+
+Please also check the github site for new examples and improvements.
+
+
+\subsection{Tagging and accessibility}
+
+While the package is named \texttt{tagpdf} the goal is actually \emph{accessible} pdf-files. Tagging is \emph{one} requirement for accessibility but there are others. I will mention some later on in this documentation, and -- if sensible -- I will also try to add code, keys or tips for them.
+
+So the name of the package is a bit wrong. As excuse I can only say that it is shorter and easier to pronounce.
+
+
+\subsection{Engines and modes}
+
+The package works currently with pdflatex and lualatex.
+
+The package has two modes: the \emph{generic mode} which should work in theory with every engine and the \emph{lua mode} which works only with lualatex.
+
+I implemented the generic mode first. Mostly because my tex skills are much better than my lua skills and I wanted to get the tex side right before starting to fight with attributes and node traversing.
+
+While the generic mode is not bad and I spent quite some time getting it working, I nevertheless think that the lua mode is the future and the only one that will be usable for larger documents. pdf is a page-oriented format, and so the ability of luatex to manipulate pages and nodes after the \TeX-processing is really useful here. Also, with luatex characters are normally already given as unicode. The main problem with luatex is how to insert \enquote{fake spaces} between words.
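+
+The accompanying examples load engine-specific packages with the conditionals \verb+\tagpdfifpdftexT+ and \verb+\tagpdfifluatexT+, which, as their names suggest, use their argument only with the respective engine. The following is a minimal preamble sketch modelled on these examples; the class and the font setup are only illustrative choices:
+
+\begin{lstlisting}
+\documentclass{article}
+\usepackage{tagpdf}
+\tagpdfifpdftexT
+ {% used only with pdflatex
+  \usepackage[utf8]{inputenc}
+  \usepackage[T1]{fontenc}
+ }
+\tagpdfifluatexT
+ {% used only with lualatex
+  \usepackage{fontspec}
+ }
+\end{lstlisting}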
+
+
+
+\subsection{References}
+
+My main reference was the free reference for pdf 1.7 \parencite{pdfreference}. This document is from 2006.
+
+In the meantime pdf 2.0 has been released. I know that it also contains changes relevant for accessibility. But the specification is not available for free, and currently, in my opinion, neither pdftex nor luatex actually targets the creation of pdf 2.0. So I'm ignoring it for the moment.
+
+
+\subsection{Validation}
+
+pdfs created with the commands of this package must be validated:
+
+\begin{itemize}
+\item One must check that the pdf is \emph{syntactically} correct. It is rather easy to create a broken pdf, e.g. if a chunk is opened on one page but closed on the next page.
+\item One must check how well the requirements of the PDF/UA standard are followed \emph{formally}.
+\item One must check how good the accessibility is \emph{practically}.
+\end{itemize}
+
+Syntax validation and formal standard validation can be done with the preflight tool of the (non-free) Adobe Acrobat.
+It can also be done with the free PDF Accessibility Checker (PAC~3) \parencite{pac3}.
+There is also the validator veraPDF \parencite{verapdf}, but I haven't tried it yet and have no idea if it is useful here.
+
+Practical validation is naturally the more complicated part. It needs screen readers and users who actually know how to handle them, who can test documents and who can report where a pdf has real accessibility problems.
+
+\minisec{Preflight woes}
+
+Sadly, validators cannot always be trusted. As an example, for a reason that I don't understand, the Adobe preflight doesn't like the list structure \texttt{L}.
+It is also possible that validators contradict each other: one says everything is okay while the other complains.
+
+\subsection{Examples wanted!}
+To make the package usable, examples are needed: examples that demonstrate how various structures can be tagged and which patches are needed, examples for the test suite, and examples that demonstrate problems.
+
+\begin{tcolorbox}
+Feedback, contributions and corrections are welcome!
+\end{tcolorbox}
+
+All examples should use the tagpdfsetup key \PrintKeyName{uncompress} described in the next section, so that uncompressed pdfs are created and the internal objects and structures can be inspected and -- hopefully soon -- be compared by the l3build checks.
+
+
+\section{Setup}
+
+\minisec{Activation needed!}
+When the package is loaded it will -- apart from loading more packages and defining a lot of things -- not do anything. You will have to activate it with \verb+\tagpdfsetup+, see below.
+
+\subsection{Modes and package options}
+
+The package has two different modes: the \textbf{generic mode} should in theory work with all engines (currently only pdftex and luatex are supported), the \textbf{lua mode} works only with luatex. The differences between the two modes will be described later. The mode can be set with package options:
+
+\DescribeKey{luamode}
+
+This is the default mode. It will use the generic mode if the document is processed with pdflatex and the lua mode with lualatex.
+
+\DescribeKey{genericmode}
+
+This will force the generic mode for all engines.
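+
+As a short sketch, the mode is selected when the package is loaded. With the last line below the generic mode is used even when the document is compiled with lualatex, which can be handy, for example, to compare the output of both modes:
+
+\begin{lstlisting}
+% default, equivalent to \usepackage[luamode]{tagpdf}
+%\usepackage{tagpdf}
+% force the generic mode for all engines:
+\usepackage[genericmode]{tagpdf}
+\end{lstlisting}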
+
+\subsection{Setup and activation}\label{ssec:setup}
+
+The following command sets up the general behaviour of the package.
+The command should be normally used only in the preamble (for a few keys it could also make sense to change them in the document).
+
+\DescribeMacro\tagpdfsetup{<key-val-list>}
+
+
+The key-val list understands the following keys:
+\begin{description}
+\item[\PrintKeyName{activate-mc} ] Boolean, initially false. Activates the code related to marked content.
+\item[\PrintKeyName{activate-struct}] Boolean, initially false. Activates the code related to structures. Should be used only if \PrintKeyName{activate-mc} has been used too.
+\item[\PrintKeyName{activate-tree}] Boolean, initially false. Activates the code related to trees. Should be used only if the two other keys have been used too.
+\item[\PrintKeyName{activate-all}] Boolean, initially false. Activates everything; this is normally the sensible thing to do.
+\item[\PrintKeyName{add-new-tag}] See section \ref{sec:new-tag} for a description.
+\item[\PrintKeyName{check-tags}] Boolean, initially true. Activates some safety checks.
+\item[\PrintKeyName{compresslevel}] Value is an integer between 0 and 9. It sets both the pdfcompresslevel and the pdfobjcompresslevel.
+\item[\PrintKeyName{log}] Choice key, possible values \PrintKeyName{none}, \PrintKeyName{v}, \PrintKeyName{vv}, \PrintKeyName{vvv}, \PrintKeyName{all}. Sets the log level. Currently, changing the value mostly affects the lua mode: \enquote{higher} values give more messages in the log. The current levels and messages have been set up in a quite ad-hoc manner and will need improvement.
+\item[\PrintKeyName{tabsorder}] Choice key, possible values are \PrintKeyName{row}, \PrintKeyName{column}, \PrintKeyName{structure}, \PrintKeyName{none}. This decides if a \verb+/Tabs+ value is written to the dictionary of the page objects. Not really needed for tagging itself, but one of the things you probably need for accessibility checks. So I added it. Currently the tabsorder is the same for all pages. Perhaps this should be changed \ldots.
+\item[\PrintKeyName{tagunmarked}] Boolean,\marginnote{luamode} initially true. When this boolean is true, the lua code will try to mark everything that has not been marked yet as an artifact. The benefit is that one doesn't have to mark up every deco rule oneself. The danger is that it perhaps marks things that shouldn't be marked -- it hasn't been tested yet with complicated documents containing annotations etc.
+\item[\PrintKeyName{uncompress}] Equivalent to using \texttt{compresslevel=0}.
+
+\end{description}
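+
+The keys can be combined in one call. The following is only an illustrative sketch of a preamble setup; the chosen values are arbitrary examples, not recommendations:
+
+\begin{lstlisting}
+\tagpdfsetup
+ {
+  activate-all,        % mc, structure and tree code
+  tabsorder=structure, % /Tabs entry, the same for all pages
+  log=vv,              % more messages, mostly in lua mode
+  tagunmarked=false,   % lua mode: don't mark unmarked content as artifact
+  uncompress           % same as compresslevel=0
+ }
+\end{lstlisting}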
+
+\section{Tagging}
+
+pdf is a page-oriented graphic format. It simply puts ink and glyphs at various coordinates on a page. A simple stream of a page can look like this\footnote{The appendix contains some remarks about the syntax of a pdf file.}:
+
+\begin{lstlisting}[columns=fixed]
+stream
+ BT
+ /F27 14.3462 Tf %select font
+ 89.291 746.742 Td %move point
+ [(1)-574(Intro)-32(duction)]TJ %print text
+ /F24 10.9091 Tf %select font
+ 0 -24.35 Td %move point
+ [(Let's)-331(start)]TJ %print text
+ 205.635 -605.688 Td %move point
+ [(1)]TJ %print text
+ ET
+endstream
+\end{lstlisting}
+
+From this stream one can extract the characters and their placement on the page but not their semantic meaning (the first line is actually a section heading, the last one the page number). And while in the example the order is correct, there is actually no guarantee that the stream contains the text in the order in which it should be read.
+
+Tagging means to enrich the pdf with information about the \emph{semantic} meaning and the \emph{reading order}. (Tagging can do more, one can also store all sorts of layout information like font properties and indentation with tags. But as I already wrote this package concentrates on the part of tagging that is needed to improve accessibility.)
+
+
+
+\subsection{Three tasks}
+To tag a pdf three tasks must be carried out:
+
+\begin{enumerate}
+\item \textbf{The mark-content-task}:\marginnote{mc-task} The document must add \enquote{labels} to the page stream which make it possible to identify and reference the various chunks of text and other content. This is the most difficult part of tagging -- both for the document writer and for the package code. First, there can be quite many chunks, as every one is a leaf node of the structure and so often a rather small unit. Second, the chunks must be defined page-wise -- and this is not easy when you don't know where the page breaks are. Last, some text is created automatically, e.g. the toc, references, citations, list numbers etc., and it is not always easy to mark it correctly.
+
+\item \textbf{The structure-task}:\marginnote{struct-task} The document must declare the structure. This means marking the start and end of semantically connected portions of the document (correctly nested as a tree). This too means some work for the document writer, but less than for the mc-task: first, quite often the mc-task and the structure-task can be combined, e.g. when you mark up a list number, a tabular cell or a section header; second, one doesn't have to worry about page breaks, so quite often one can patch standard environments to declare the structure. On the other hand, a number of structures end in \LaTeX\ only implicitly -- e.g. an item ends at the next item -- so getting the pdf structure right still means that additional markup must be added.
+
+\item \textbf{The tree management}:\marginnote{tree-task} Finally, the structure must be written into the pdf. For every structure an object of type \texttt{StructElem} must be created and flushed with keys for the parents and the kids. A parenttree must be created to get a reference from the mc-chunks to the parent structure. A rolemap must be written, and a number of dictionary entries must be added. All this is hopefully done automatically and correctly by the package \ldots.
+\end{enumerate}
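+
+The following sketch, modelled on the accompanying example \texttt{ex-structure.tex}, shows how the three tasks map onto the user commands of the package: \verb+\tagmcbegin+/\verb+\tagmcend+ mark the chunks (mc-task), \verb+\tagstructbegin+/\verb+\tagstructend+ declare the structure (struct-task), and the tree is written by the package itself once it has been activated, e.g. with \PrintKeyName{activate-all}. It resembles the schematic figure of this section, but uses only one chunk per structure:
+
+\begin{lstlisting}
+\tagstructbegin{tag=Document}
+ \tagstructbegin{tag=Sect}
+  \tagstructbegin{tag=H}  % struct-task: start header
+   \tagmcbegin{tag=H}     % mc-task: start chunk
+   Heading text
+   \tagmcend              % mc-task: end chunk
+  \tagstructend           % struct-task: end header
+  \tagstructbegin{tag=P}
+   \tagmcbegin{tag=P}
+   Text of a paragraph.
+   \tagmcend
+  \tagstructend
+ \tagstructend
+\tagstructend
+% tree-task: nothing to do here; the StructElem objects,
+% the parenttree and the rolemap are written by the package
+\end{lstlisting}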
+
+\begin{figure}[t!]
+\begin{tcolorbox}
+
+\minisec{Page stream with marked content}
+
+\begin{tikzpicture}[baseline=(a.north),node distance=2pt,remember picture]
+\node(start){\ldots~\ldots~\ldots};
+\node[draw,base right = of start](a) {mc-chunk 1};
+\node[draw,base right = of a](b) {mc-chunk 2};
+\node[draw,base right = of b](c) {mc-chunk 3};
+\node[draw,base right = of c](d) {mc-chunk 4};
+\node[base right = of d] {\ldots~\ldots};
+\end{tikzpicture}
+
+
+
+\minisec{Structure}
+
+\newlength\ydistance\setlength\ydistance{-0.8cm}
+\begin{tikzpicture}[remember picture,baseline=(root.north)]
+
+\node[draw,anchor=base west] (root) at (0,0) {Sect (start section)};
+\node[draw,anchor=base west] at (0.3,\ydistance) {H (header section)};
+\node[draw,anchor=base west](aref) at (0.6,2\ydistance){mc-chunk 1};
+\node[draw,anchor=base west](bref) at (0.6,3\ydistance){mc-chunk 2};
+\node[draw,anchor=base west] at (0.3,4\ydistance){/H (end header)};
+\node[draw,anchor=base west] at (0.3,5\ydistance){P (start paragraph)};
+\node[draw,anchor=base west](cref) at (0.6,6\ydistance){mc-chunk 3};
+\node[draw,anchor=base west](dref) at (0.6,7\ydistance){mc-chunk 4};
+\node[draw,anchor=base west] at (0.3,8\ydistance){/P (end paragraph)};
+\node[draw,anchor=base west] at (0,9\ydistance){/Sect (end section)};
+\end{tikzpicture}
+
+\begin{tikzpicture}[remember picture, overlay]
+\draw[->,red](aref)-|(a);
+\draw[->,red](bref)-|(b);
+\draw[->,red](cref)-|(c);
+\draw[->,red](dref)-|(d);
+
+\end{tikzpicture}
+\end{tcolorbox}
+\caption{Schematic description of the relation between marked content in the page stream and the structure}
+\end{figure}
+
+
+\subsection{Task 1: Marking the chunks: the mark-content-step}
+
+To be able to refer to parts of the text in the structure, the text in the page stream must get \enquote{labels}. In the pdf reference they are called \enquote{marked content}. The three main variants needed here are:
+
+\begin{description}
+\item[Artifacts] They are marked with a pair of keywords, \texttt{BMC} and \texttt{EMC}, which surround the text. \texttt{BMC} has a single prefix argument, the fixed tag name \texttt{/Artifact}. Artifacts should be used for irrelevant text and page content that should be ignored in the structure. Sadly, it is often not possible to simply leave such text unmarked -- the accessibility tests in Acrobat and other validators complain.
+
+\begin{lstlisting}
+/Artifact BMC
+ text to be marked
+EMC
+\end{lstlisting}
+
+\item[Artifacts with a type] They are marked with a pair of keywords, \texttt{BDC} and \texttt{EMC}, which surround the text. \texttt{BDC} has two arguments: again the tag name \texttt{/Artifact}, and a following dictionary which allows one to specify more information about the artifact. Text in the header and footer can e.g. be declared as pagination like this:
+
+\begin{lstlisting}
+/Artifact <</Type /Pagination>> BDC
+ text to be marked
+EMC
+\end{lstlisting}
+
+\item[Content] Content is also marked with a pair of keywords, \texttt{BDC} and \texttt{EMC}. The first argument of \texttt{BDC} is a tag name which describes the structural type of the text.\footnote{There is quite some redundancy in the specification here. The structural type is also set in the structure tree. One wonders if it wouldn't be enough to always use \texttt{/Span} here.} Examples are \texttt{/P} (paragraph), \texttt{/H2} (header), \texttt{/TD} (table cell). The reference mentions a number of standard types, but it is possible to add more or to use different names.
+
+
+ In the second argument of \texttt{BDC} -- in the property dictionary -- more data can be stored. \emph{Required} is an \texttt{/MCID}-key which takes an integer as a value:
+
+\begin{lstlisting}
+/H <</MCID 3>> BDC
+ text to be marked
+EMC
+\end{lstlisting}
+
+This integer is used to identify the chunk when building the structure tree. The chunks are numbered per page, starting with 0. As the numbers are also used as an index into an array, there shouldn't be \enquote{holes} in the numbering\footnote{It is perhaps possible to handle a numbering scheme that doesn't start at 0 and has holes, but it would enlarge the pdf, as one would need dummy objects.}.
+
+It is possible to add more entries to the property dictionary, e.g. a title, alternative text or a local language setting.
+\end{description}
+
+The needed markers can be added with low level code e.g. like this (in pdftex syntax):
+
+\begin{lstlisting}
+\pdfliteral page {/H <</MCID 3>> BDC}%
+ text to be marked
+\pdfliteral page {EMC}%
+\end{lstlisting}
+
+This sounds easy. But there are quite a number of traps.
+
+\begin{enumerate}[beginpenalty=10000]
+ \item Pdf is a page-oriented format, which means that the start marker \texttt{BDC}/\texttt{BMC} and the corresponding end marker \texttt{EMC} must be on the same page.
+ So marking e.g. a section title like in the following example won't always work, as the literal before the section could end up on the previous page:
+
+\begin{lstlisting}
+\pdfliteral page {/H <</MCID 3>> BDC} %problem: possible pagebreak here
+ \section{mysection}
+\pdfliteral page {EMC}%
+\end{lstlisting}
+ Using the literals \emph{inside} the section argument is better, but then one has to take care that they don't wander into the header and the toc.
+
+ \item Literals are \enquote{whatsit} nodes and can change spacing as well as page and line breaking. The literal \emph{after} the section in the previous example could e.g. lead to a lonely section title at the end of the page.
+
+ \item The \texttt{/MCID} numbers must be unique on a page. So you can't use the literal in a saved box that you reuse in various places. This is e.\,g. a problem with \texttt{longtable} as it saves the table header and footer in a box.
+
+ \item The \texttt{/MCID}-chunks are leaf nodes in the structure tree, so they shouldn't be nested.
+
+ \item Often text in a document is created automatically or moved around: entries in the table of contents, index, bibliography and more. To mark these text chunks correctly one has to analyze the code creating such content to find suitable places to inject the literals.
+
+ \item The literals are inserted directly and not at shipout. This means that due to the asynchronous page breaking of \TeX\ the MCID number can be wrong even if the counter is reset on every page. (In generic mode this package uses a label/ref system to get around this problem. Sadly this means that three compilations are needed until everything has settled down.)
+
+ \item There exist environments that process their content more than once, e.g. \texttt{align} and \texttt{tabularx}.
+ So one has to check for duplicates and holes in the numbering.
+
+ \item Pdf is a page-oriented format, and this means that the start and the end marker must be on the same page \ldots\ \emph{so what to do with normal paragraphs that split over pages?} This question is handled in subsection~\ref{sec:splitpara}.
+\end{enumerate}
+
+\subsubsection{Generic mode versus lua mode in the mc-task}
+
+In generic mode the commands insert the literals directly and so have all the problems described above. The lua mode works quite differently: the tagging commands don't insert literals but set some \emph{attributes} which are attached to all following nodes. When the page is shipped out, some lua code is called which walks through the shipout box and injects the literals at the places where the attributes change.
+
+This means that quite a number of problems mentioned above are not relevant for the lua mode:
+
+\begin{enumerate}
+\item Pagebreaks between the start and end of the marker are \emph{not} a problem, so you can mark a complete paragraph. If a pagebreak occurs directly after a start marker or before an end marker, this can lead to empty chunks and so bloat the pdf a bit, but imho this is not really a problem (compared to the size increase caused by the rest of the tagging).
+\item The commands don't insert literals directly and so affect line and page breaking much less.
+\item The MCIDs are numbered at shipout, so no label/ref system is needed.
+\item The code can do some marking automatically. Currently everything that has not been marked up by the document is marked as artifact. This can probably be extended and improved.
+\end{enumerate}
+
+\subsubsection{Commands to mark content and chunks}
+
+It\marginnote{Generic mode only} is vital that the end command is executed on the same page as the begin command. So think carefully how to place them.
+For strategies how to handle paragraphs that split over pages see subsection~\ref{sec:splitpara}.
+
+\ExplSyntaxOn
+\DescribeMacro\tagmcbegin{<key-val-list>}
+\DescribeMacro\uftag_mc_begin:n{<key-val-list>}
+\ExplSyntaxOff
+
+These commands insert the start of the marked content code into the pdf. They don't start a paragraph. The user command additionally issues an \verb+\ignorespaces+ to suppress spaces after itself.
+Such markers should not be nested. The command will warn you if this happens.
+
+The key-val list understands the following keys:
+\begin{description}
+ \item[\PrintKeyName{tag}] This is required, unless you use the \PrintKeyName{artifact} key. The value of the key is normally one of the standard types listed in section \ref{sec:new-tag}. It is possible to set up new tags, see the same section.
+ \item[\PrintKeyName{artifact}] This will setup the marked content as an artifact. The key should be used for content that should be ignored. The key can take one of the values \PrintKeyName{pagination}, \PrintKeyName{layout}, \PrintKeyName{page}, \PrintKeyName{background} and \PrintKeyName{notype} (this is the default). Text in the header and footer should be marked with \PrintKeyName{artifact=pagination}.
+
+ It is not quite clear if rules and other decorative graphical objects need to be marked up as artifacts. Acrobat seems not to mind if they aren't, but PAC~3 complained.
+
+ The validators complain if some text is not marked up, but it is not quite clear if this is a serious problem.
+
+ The\marginnote{lua mode only} lua mode will mark up everything unmarked as \texttt{artifact=notype}. You can suppress this behaviour by setting the tagpdfsetup key \texttt{tagunmarked} to false. See section \ref{ssec:setup}.
+
+ \item[\PrintKeyName{stash}] Normally marked content will be stored in the \enquote{current} structure. This may not be what you want. As an example, you may want to put a margin note after or before the paragraph in which it appears in the \TeX\ code. With this boolean key the content is marked but not added to the kids of the current structure.
+
+ \item[\PrintKeyName{label}] This key sets a label by which you can call the marked content later in another structure (if it has been stashed with the previous key). Internally the label name will start with \texttt{tagpdf-}.
+
+ \item[\PrintKeyName{raw}] This key allows you to add more entries to the properties dictionary. The value must be correct, low-level pdf. E.g. \verb+raw=/Alt (Hello)+ will insert an alternative text. (I will probably add keys for \texttt{/Alt} and \texttt{/ActualText} later, but I haven't made up my mind regarding the encoding yet.)
+\end{description}
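+A short sketch of the keys in use; the texts are placeholders. In generic mode each chunk must of course begin and end on the same page:
+
+\begin{lstlisting}
+% page number in the footer: an artifact, not content
+\tagmcbegin{artifact=pagination}\thepage\tagmcend
+% a paragraph chunk with an alternative text added via raw
+\tagmcbegin{tag=P,raw=/Alt (Hello)}Hello world\tagmcend
+\end{lstlisting}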
+
+\ExplSyntaxOn
+\DescribeMacro\tagmcend
+\DescribeMacro\uftag_mc_end:
+\ExplSyntaxOff
+
+These commands insert the end code of the marked content. The user command first issues an \verb+\unskip+. Both commands check if there has been a begin marker and issue a warning if not.
+
+\ExplSyntaxOn
+\DescribeMacro\tagmcuse{<label name>}
+\DescribeMacro\uftag_mc_use:n {<label name>}
+\ExplSyntaxOff
+
+These commands allow you to record a marked content that you stashed away into the current structure. Be aware that a marked content can be used only once -- the command will warn you if you try to use it a second time.
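+A possible use of \PrintKeyName{stash}, \PrintKeyName{label} and \verb+\tagmcuse+, sketched for the margin note mentioned above. The label name \texttt{note1} is made up, \texttt{Note} is one of the standard types, and as the labels are resolved over compilations (the package may ask for a rerun), more than one run can be needed:
+
+\begin{lstlisting}
+% mark the margin note, but keep it out of the current structure
+\marginpar{\tagmcbegin{tag=P,stash,label=note1}A margin note\tagmcend}
+...
+% later: add the stashed chunk to the structure it belongs to
+\tagstructbegin{tag=Note}
+\tagmcuse{note1}
+\tagstructend
+\end{lstlisting}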
+
+\ExplSyntaxOn
+\DescribeMacro\tagmcifinTF{<true code>}{<false>}
+\DescribeMacro\_uftag_mc_if_in:TF{<true code>}{<false>}
+\ExplSyntaxOff
+
+These commands check whether a marked content is currently open; this allows you e.g. to add the end marker only if one is open.
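+For example, to avoid the warning about nested markers when it is not known whether a chunk is still open, one could write something like this (a sketch):
+
+\begin{lstlisting}
+\tagmcifinTF{\tagmcend}{}% close an open chunk, if there is one
+\tagmcbegin{tag=P}
+\end{lstlisting}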
+
+\subsubsection{Tips}
+
+\begin{itemize}
+\item Mark commands inside floats should work fine (but need perhaps some compilation rounds in generic mode).
+\item In case you want to use it inside a \verb+\savebox+ (or some command that saves the text internally in a box): if the box is used directly, there is probably no problem. If it is used later, stash the marked content and add the needed \verb+\tagmcuse+ directly before or after the box when you use it.
+\item Don't use a saved box with markers twice.
+\item If boxes are unboxed you will have to analyze the pdf to check if everything is ok.
+\item If you use complicated structures and commands (breakable boxes like the one from tcolorbox, multicol, many footnotes) you will have to check the pdf.
+ \end{itemize}
+
+\subsubsection{Math}
+
+Math is a problem. I have seen an example where \emph{every single symbol} has been marked up with tags from MathML along with an \texttt{/ActualText} entry and an entry with alternate text which describes how to read the symbol.
+The pdf then looked like this
+
+\begin{lstlisting}
+/mn <</MCID 6 /ActualText<FEFF0034>/Alt( : open bracket: four )>>BDC
+...
+/mn <</MCID 7 /ActualText<FEFF0033>/Alt( third s )>>BDC
+...
+/mo <</MCID 8 /ActualText<FEFF2062>/Alt( times )>>BDC
+\end{lstlisting}
+
+
+If this is really the way to go, one would need some script to add the mark-up, as doing it manually is too much work and would make the source unreadable, at least with pdflatex and the generic mode. In lua mode it is probably possible to hook into the \texttt{mlist\_to\_hlist} callback and add the markers automatically.
+
+But I'm not sure that this is the best way to do math. It looks rather odd that a document should have to tell a screen reader in such detail how to read an equation. It would be much more efficient, sensible and flexible if a complete representation of the equation in MathML could be stored in the pdf and the task of reading it aloud delegated to the screen reader. More investigations are needed here.
+
+\subsubsection{Split paragraphs}\label{sec:splitpara}
+
+A\marginnote{Generic mode only} problem is posed by paragraphs with page breaks. As already mentioned, the end marker \texttt{EMC} must be added on the same page as the begin marker. But with pdflatex it is \emph{very} difficult to inject something at the page break automatically. One can manipulate the shipout box to some extent in the output routine, but this is not easy, and it gets even more difficult if inserts like footnotes and floats are involved: the end of the paragraph is then somewhere in the middle of the box.
+
+So with pdflatex in generic mode one currently has to do the splitting manually.
+
+The example \texttt{ex-mc-manual-para-split} demonstrates how this can be done. The general idea is to use \verb+\vadjust+ in the right place:
+
+\begin{lstlisting}
+\tagmcbegin{tag=P}
+...
+fringilla, ligula wisi commodo felis, ut adipiscing felis dui in
+enim. Suspendisse malesuada ultrices ante.% page break
+\vadjust{\tagmcend\pagebreak\tagmcbegin{tag=P}}
+Pellentesque scelerisque
+...
+sit amet, lacus.\tagmcend
+\end{lstlisting}
+
+
+\subsection{Task 2: Marking the structure}
+The structure is represented in the pdf by a number of objects of type \texttt{StructElem} which build a tree: each of these objects points back to its parent and normally has a number of kid elements, which are either again structure elements or, as leaves of the tree, the marked content chunks marked up with the \verb+tagmc+-commands. The root of the tree is the \texttt{StructTreeRoot}.
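+For illustration, a paragraph structure element could look roughly like this in the pdf. The object numbers are made up; \texttt{/S} holds the tag, \texttt{/P} the parent, \texttt{/Pg} the page, and the integers in \texttt{/K} are the MCIDs of the marked content kids on that page:
+
+\begin{lstlisting}
+12 0 obj % hypothetical object numbers
+<<
+ /Type /StructElem
+ /S /P % the tag
+ /P 11 0 R % the parent structure element
+ /Pg 2 0 R % the page the kids are on
+ /K [0 1] % kids: the chunks with MCID 0 and 1
+>>
+endobj
+\end{lstlisting}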
+
+\subsubsection{Structure types}
+The tree should reflect the \emph{semantic} meaning of the text. That means that the text should be marked as section, list, table head, table cell and so on. A number of standard structure types is predefined (see section \ref{sec:new-tag}), but it is allowed to create more. If you want to use types of your own, you must declare them. E.g. this declares two new types \texttt{TAB} and \texttt{FIG} and bases them on \texttt{P}:
+
+\begin{lstlisting}
+\tagpdfsetup{
+ add-new-tag = TAB/P,
+ add-new-tag = FIG/P,
+ }
+\end{lstlisting}
+
+\subsubsection{Sectioning}
+The sectioning units can be structured in two ways: a flat, html-like and a more xml-like version.
+The flat version creates a structure like this:
+
+\begin{lstlisting}
+<H1>section header</H1>
+<P> text</P>
+<H2>subsection header</H2>
+...
+\end{lstlisting}
+
+So here the headers are marked according to their level with \texttt{H1}, \texttt{H2}, etc.
+
+In the xml-like tree the complete text of a sectioning unit is surrounded with the \texttt{Sect} tag, and all headers
+with the tag \texttt{H}. Here the nesting defines the level of a sectioning header.
+
+\begin{lstlisting}
+<Sect>
+ <H>section header</H>
+ <P> text</P>
+ <Sect>
+ <H>subsection header</H>
+ ...
+ </Sect>
+</Sect>
+\end{lstlisting}
+
+The flat version is more \LaTeX-like, and it is rather straightforward to patch \verb+\chapter+, \verb+\section+ and so on to insert the appropriate \texttt{H\ldots} start and end markers. The xml-like tree is more difficult to automate. If such a tree is wanted, I would recommend using explicit commands to start and end a sectioning unit, as the Con\TeX t format does.
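+A minimal sketch of such explicit commands. The names \verb+\startsect+ and \verb+\stopsect+ are made up for this example and are not provided by the package. Putting the mc markers inside the mandatory argument and passing the plain title as optional argument keeps the markers out of the toc and the running header:
+
+\begin{lstlisting}
+% hypothetical user commands, not part of tagpdf
+\newcommand\startsect[1]{%
+ \tagstructbegin{tag=Sect}%
+ \tagstructbegin{tag=H}%
+ \section[#1]{\tagmcbegin{tag=H}#1\tagmcend}%
+ \tagstructend}% end of H, Sect stays open
+\newcommand\stopsect{\tagstructend}% end of Sect
+
+\startsect{Introduction}
+ ... paragraphs, nested \startsect ... \stopsect ...
+\stopsect
+\end{lstlisting}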
+
+\subsubsection{Commands to define the structure}
+The following commands can be used to define the tree structure:
+
+\ExplSyntaxOn
+\DescribeMacro\tagstructbegin{key-val-list}
+\DescribeMacro\uftag_struct_begin:n {key-val-list}
+\ExplSyntaxOff
+
+These commands start a new structure.
+
+The key-val list understands the following keys:
+\begin{description}
+ \item[\PrintKeyName{tag}] This is required. The value of the key is normally one of the standard types listed in section \ref{sec:new-tag}. It is possible to set up new tags/types, see the same section.
+ \item[\PrintKeyName{stash}] Normally a new structure inserts itself as a kid into the currently active structure. This key prohibits this. The structure is nevertheless from now on \enquote{the current active structure} and parent for following marked content and structures.
+ \item[\PrintKeyName{label}] This key sets a label by which you can use the structure later in another structure. Internally the label name will start with \texttt{tagpdfstruct-}.
+ \item[\PrintKeyName{title},\PrintKeyName{alttext},\PrintKeyName{actualtext}] These keys allow you to set the dictionary entries \texttt{/Title}, \texttt{/Alt} and \texttt{/ActualText}. But I haven't yet decided which format is suitable for the values, so currently you must ensure yourself that the values lead to valid pdf content.
+ \end{description}
+
+
+\ExplSyntaxOn
+\DescribeMacro\tagstructend
+\DescribeMacro\uftag_struct_end:
+\ExplSyntaxOff
+
+This ends a structure.
+
+\ExplSyntaxOn
+\DescribeMacro\tagstructuse{<label>}
+\DescribeMacro\uftag_struct_use:n {<label>}
+\ExplSyntaxOff
+
+These commands insert a structure previously stashed away as a kid into the currently active structure. A structure should be used only once; if the structure already has a parent, you will get a warning.
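+A sketch of the mechanism, e.g. to attach a note structure to the paragraph that refers to it although the note is created elsewhere. The label \texttt{fn1} is made up, and as the labels are resolved over compilations, another run may be needed:
+
+\begin{lstlisting}
+% create the note structure, but don't add it to the current structure
+\tagstructbegin{tag=Note,stash,label=fn1}
+ \tagmcbegin{tag=P}the note text\tagmcend
+\tagstructend
+...
+% insert it as kid of the paragraph structure
+\tagstructbegin{tag=P}
+ \tagmcbegin{tag=P}text that refers to the note\tagmcend
+ \tagstructuse{fn1}
+\tagstructend
+\end{lstlisting}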
+
+\subsubsection{Root structure}
+
+A document should have at least one structure which contains the whole document. A suitable tag is \texttt{Document} or \texttt{Article}. I'm considering inserting it automatically.
+
+
+\subsection{Task 3: Tree management}
+When all the document content has been correctly marked and the data for the trees has been collected they must be flushed to the pdf. This is done automatically (if the package has been activated) with the following command in \verb+\AfterEndDocument+:
+
+\ExplSyntaxOn
+\DescribeMacro\uftag_finish_structure:
+\ExplSyntaxOff
+
+This will hopefully write all the needed objects and values to the pdf. (Besides the already mentioned \texttt{StructTreeRoot} and \texttt{StructElem} objects, a so-called \texttt{ParentTree}, which records the parents of all the marked content chunks, a \texttt{RoleMap} and a few more values and dictionaries are needed.)
+
+I'm not quite sure whether this shouldn't rather be a purely internal command.
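+Roughly, the central objects written at the end could look like the following hand-written sketch. The object numbers are made up and the role map entries are those of the \texttt{TAB}/\texttt{FIG} example; the actual output of the package may differ in details:
+
+\begin{lstlisting}
+% catalog: points to the tree root and flags the pdf as tagged
+1 0 obj
+<< /Type /Catalog /Pages 4 0 R
+ /StructTreeRoot 5 0 R /MarkInfo << /Marked true >> >>
+endobj
+% root of the structure tree
+5 0 obj
+<< /Type /StructTreeRoot
+ /K 11 0 R % the Document structure element
+ /ParentTree 6 0 R
+ /RoleMap << /TAB /P /FIG /P >> >>
+endobj
+% every page gets its key in the parent tree
+2 0 obj
+<< /Type /Page /StructParents 0 ... >>
+endobj
+\end{lstlisting}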
+
+
+\subsection{A fully marked up document body}
+The following shows the marking needed for a section, a sentence and a list with two items. It is obvious that one wouldn't want to do this by hand in real documents. If tagging is to be usable, the commands must be hidden as much as possible inside suitable \LaTeX\ commands and environments.
+
+\begin{lstlisting}
+\begin{document}
+
+\tagstructbegin{tag=Document}
+
+ \tagstructbegin{tag=Sect}
+ \tagstructbegin{tag=H}
+ \tagmcbegin{tag=H} %avoid page break!
+ \section{Section}
+ \tagmcend
+ \tagstructend
+ \tagstructbegin{tag=P}
+ \tagmcbegin{tag=P,raw=/Alt (x)}
+ a paragraph\par x
+ \tagmcend
+ \tagstructend
+
+ \tagstructbegin{tag=L} %List
+ \tagstructbegin{tag=LI}
+ \tagstructbegin{tag=Lbl}
+ \tagmcbegin{tag=Lbl}
+ 1.
+ \tagmcend
+ \tagstructend
+ \tagstructbegin{tag=LBody}
+ \tagmcbegin{tag=P}
+ List item body
+ \tagmcend
+ \tagstructend %lbody
+ \tagstructend %Li
+
+ \tagstructbegin{tag=LI}
+ \tagstructbegin{tag=Lbl}
+ \tagmcbegin{tag=Lbl}
+ 2.
+ \tagmcend
+ \tagstructend
+ \tagstructbegin{tag=LBody}
+ \tagmcbegin{tag=P}
+ another List item body
+ \tagmcend
+ \tagstructend %lbody
+ \tagstructend %Li
+ \tagstructend %L
+
+ \tagstructend %Sect
+\tagstructend %Document
+\tagfinish
+\end{document}
+\end{lstlisting}
+
+
+\section{Standard types and new tags}\label{sec:new-tag}
+
+The pdf reference mentions a number of standard types:
+\ExplSyntaxOn
+\clist_use:Nn \c__uftag_role_sttags_clist {,\c_space_tl}
+
+\ExplSyntaxOff
+
+Their meaning can be looked up in the pdf reference\footnote{\url{https://wwwimages2.adobe.com/content/dam/acom/en/devnet/pdf/pdf_reference_archive/pdf_reference_1-7.pdf}}.
+
+New tags can be defined in the setup command with the key \texttt{add-new-tag}. It takes a value consisting of two names separated by a slash. The first is the new name, the second a known (e.g. a standard) tag it should be mapped to. Example:
+
+\Macro\tagpdfsetup{add-new-tag = section/H1}
+
+
+
+\section{Accessibility is not only tagging}
+
+ A tagged pdf is needed for accessibility but this is not enough. As already mentioned there are more requirements:
+ \begin{itemize}
+ \item The language must be declared by adding \texttt{/Lang (xx-XX)} to the pdf catalog or, if the language changes for a part of the text, to the structure element or the marked content. This can be done rather easily with existing packages. %%UF mention some code
+ \item All characters must have a Unicode representation or a suitable alternative text.
+ With lualatex and OpenType (Unicode) fonts this is normally not a problem. With pdflatex you may need
+ \begin{verbatim}
+ \input{glyphtounicode}
+ \pdfgentounicode=1
+ \end{verbatim}
+ and perhaps some \verb+\pdfglyphtounicode+ commands.
+ \item Hard and soft hyphens must be distinct.
+ \item Spaces between words should be space glyphs and not only a horizontal movement.
+ \item Various small pieces of information must be present in the catalog dictionary, the info dictionary and the page dictionaries.
+ \end{itemize}
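+For the language, a sketch: with pdflatex the catalog entry can be added with the \verb+\pdfcatalog+ primitive (with lualatex \verb+\pdfextension catalog+), and a language change inside a marked content chunk can be expressed with the \PrintKeyName{raw} key of \verb+\tagmcbegin+. The language codes are only examples:
+
+\begin{lstlisting}
+% pdflatex: document language in the catalog
+\pdfcatalog{/Lang (en-US)}
+% lualatex: the same with the luatex primitive
+% \pdfextension catalog{/Lang (en-US)}
+
+% a chunk in another language
+\tagmcbegin{tag=P,raw=/Lang (de-DE)}Ein deutscher Satz.\tagmcend
+\end{lstlisting}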
+
+ If suitable, I will add code for these tasks to this package. But some of them can already be done with existing packages like hyperref, hyperxmp or pdfx.
+
+
+
+\section{To-do}
+\begin{itemize}
+\item Add commands and keys to enable/disable the checks.
+\item Check/extend the code for language tags.
+\item Think about math.
+\item Think about Links/Annotations
+\item Keys for alternative and actualtext. How to define the input encoding? Like in Accsupp?
+\item Check twocolumn documents
+\item Examples
+\item Write more Tests
+\item \enquote{Fake spaces}
+\item Unicode
+\item Hyphenation char
+\item Think about included (tagged) pdf. Can one handle them?
+\item Improve the documentation
+\item Tag as proof of concept the documentation
+\item Document the code better
+\item Create dtx
+\item Find someone to check and improve the lua code
+\item Move more things to lua in the luamode
+\item Find someone to check and improve the rest of the code
+\item bidi?
+\end{itemize}
+
+\printbibliography
+
+\appendix
+
+\section{Some remarks about the pdf syntax}
+
+This is not meant as a full reference, only as background to make the examples and remarks easier to understand.
+
+\begin{description}
+\item[postfix notation] In various places pdf uses postfix notation. This means that the operator comes after its arguments:
+
+\begin{tikzpicture}[baseline=(c.base)]
+\node[arg](a1) {18};
+\node[arg,right=of a1.east](a2) {0};
+\node[operator,right= of a2.east](c) {obj};
+\draw[->] (c.south) --++(0,-2mm) -| (a1);
+\draw[->] (c.south) --++(0,-2mm) -| (a2);
+\end{tikzpicture}
+
+\begin{tikzpicture}[baseline=(c.base)]
+\node[arg](a1) {18};
+\node[arg,right=of a1.east](a2) {0};
+\node[operator,right= of a2.east](c) {R};
+\draw[->] (c.south) --++(0,-2mm) -| (a1);
+\draw[->] (c.south) --++(0,-2mm) -| (a2);
+\end{tikzpicture} (a reference, with the operator \texttt{R}, to an object)
+
+
+\begin{tikzpicture}[baseline=(c.base)]
+\node[arg](a1) {1};
+\node[arg,right = of a1.east](a2) {0};
+\node[arg,right = of a2.east](a3) {0};
+\node[arg,right = of a3.east](a4) {1};
+\node[arg,right = of a4.east](a5) {100.2};
+\node[arg,right = of a5.east](a6) {742};
+\node[operator,right = of a6.east](c) {Tm};
+\draw[->] (c.south) --++(0,-2mm) -| (a6);
+\draw[->] (c.south) --++(0,-2mm) -| (a5);
+\draw[->] (c.south) --++(0,-2mm) -|(a4);
+\draw[->] (c.south) --++(0,-2mm) -|(a3);
+\draw[->] (c.south) --++(0,-2mm) -| (a2);
+\draw[->] (c.south) --++(0,-2mm) -|(a1);
+\end{tikzpicture}
+
+\begin{tikzpicture}[baseline=(c.base)]
+\node[arg](a1) {/P};
+\node[arg,right = of a1.east](a2) {<</MCID 0>>};
+\node[operator,right = of a2.east](c) {BDC};
+\draw[->] (c.south) --++(0,-2mm) -| (a1);
+\draw[->] (c.south) --++(0,-2mm) -| (a2);
+\end{tikzpicture}
+
+\item[Names] pdf knows a sort of variable called a \enquote{name}. Names start with a slash and may include any regular characters, but not delimiter or white-space characters. Uppercase and lowercase letters are considered distinct: \texttt{/A} and \texttt{/a} are different names. \verb+/.notdef+ and \verb+/Adobe#20Green+ are valid names.
+
+ Quite a number of the options of \texttt{tagpdf} actually define such a name, which is later added to the pdf. I \emph{strongly} recommend not using spaces and exotic characters in such names. While it is possible to escape them, it is rather a pain to move escaped names through the various lists and commands, and quite probably I forgot some place where escaping is needed.
+
+\item[Strings] There are two types of strings: \emph{Literal strings} are enclosed in round parentheses. They normally contain a mix of ASCII characters and octal escapes:
+
+ \verb+(gr\374\377ehello[]\050\051)+.
+
+ \emph{Hexadecimal strings} are enclosed in angle brackets. They can represent all characters of the whole Unicode range. This is the default output of lualatex.
+
+ \texttt{<003B00600243013D0032>}.
+
+\item[Arrays] Arrays are enclosed by square brackets. They can contain all sorts of objects, including more arrays. As an example, here is an array which contains five objects: a number, an object reference, a string, a dictionary and another array. Be aware that despite the spaces \texttt{15 0 R} is \emph{one} element of the array.
+
+ \mbox{\texttt{[0 15 0 R (hello) <</Type /X>> [1 2 3]]}}
+
+ \begin{tikzpicture}[baseline=(c.base)]
+ \node[arg](a1) {0};
+ \node[arg,right = of a1.east](a2) {15 0 R};
+ \node[arg,right = of a2.east](a3) {(hello)};
+ \node[arg,right = of a3.east](a4) {<</Type /X>>};
+ \node[arg,right = of a4.east](a5) {[1 2 3]};
+ \end{tikzpicture}
+
+
+\item[Dictionaries] Dictionaries are enclosed by double angle brackets. They contain key-value pairs. The key is always a name. The value can be any sort of object, including more dictionaries. The order of the keys doesn't matter.
+
+ Dictionaries can be written all in one line:\\ \texttt{<</Type/Page/Contents 3 0 R/Resources 1 0 R/Parent 5 0 R>>}
+ but at least for examples a layout with line breaks and indentation is more readable:
+
+ \begin{verbatim}
+ <<
+ /Type /Page
+ /Contents 3 0 R
+ /Resources 1 0 R
+ /MediaBox [0 0 595.276 841.89]
+ /Parent 5 0 R
+ >>
+ \end{verbatim}
+
+
+ \item[(indirect) objects] These are enclosed by the keywords \texttt{obj} (which has two numbers as prefix arguments) and \texttt{endobj}. The first argument is the object number, the second a generation number: if a pdf is edited, objects with a larger generation number can be added. As pdflatex/lualatex always create a new pdf, we can safely assume that the generation number is always 0. Objects can be referenced in other places with the \texttt{R} operator. The content of an object can be all sorts of things.
+
+ \item[streams] A stream is a sequence of bytes. It can be long and is used for the real content of the pdf: text, fonts, content of graphics.
+ A stream starts with a dictionary which at least sets the \texttt{/Length} key to the length of the stream. It is followed by the stream content, enclosed by the keywords \texttt{stream} and \texttt{endstream}.
+
+ Here is an example of a stream, an object definition and a reference. In object 2 (a page object) the \texttt{/Contents} key references object 3, which contains the text of the page in a stream. \texttt{Tf}, \texttt{Tm} and \texttt{TJ} are (postfix) operators: the first chooses the font with the name \texttt{/F15} at the size 10.9, the second sets the text matrix and so moves the reference point on the page, and the third inserts the text.
+
+ \begin{verbatim}
+ % a page object (shortened)
+ 2 0 obj
+ <<
+ /Type/Page
+ /Contents 3 0 R
+ /Resources 1 0 R
+ ...
+ >>
+ endobj
+
+ %the /Contents object (/Length value is wrong)
+ 3 0 obj
+ <</Length 153 >>
+ stream
+ BT
+ /F15 10.9 Tf 1 0 0 1 100.2 746.742 Tm [(hello)]TJ
+ ET
+ endstream
+ endobj
+ \end{verbatim}
+
+ In such a stream the \texttt{BT}--\texttt{ET} pair encloses text, while drawings and graphics are outside of such pairs.
+
+\item[Number tree] This is a more complex data structure that is meant to index objects by numbers. At its core is an array of number-value pairs. A simple version of a number tree with the keys 0 and 3 is
+\begin{verbatim}
+6 0 obj
+<<
+ /Nums [
+ 0 [ 20 0 R 22 0 R]
+ 3 21 0 R
+ ]
+>>
+endobj
+\end{verbatim}
+
+This maps 0 to an array and 3 to the object reference \texttt{21 0 R}. Number trees can be split over various nodes: root, intermediate and leaf nodes. We will need such a tree for the \emph{parent tree}.
+
+\end{description}
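+Applied to the parent tree, a hand-written sketch with made-up object numbers: the key is the value of the \texttt{/StructParents} entry of a page, and the array lists the parent structure element of every MCID on that page in MCID order. This is why holes in the MCID numbering would need dummy entries:
+
+\begin{lstlisting}
+6 0 obj % the parent tree
+<<
+ /Nums [
+ 0 [ 12 0 R 12 0 R 14 0 R ] % page with /StructParents 0:
+ % MCID 0 and 1 -> 12 0 R, MCID 2 -> 14 0 R
+ ]
+>>
+endobj
+\end{lstlisting}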
+
+\end{document}
+%http://msf.mathmlcloud.org/file_formats/8 %sample pdf for math
+
Property changes on: trunk/Master/texmf-dist/doc/latex/tagpdf/tagpdf.tex
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-checks-code.sty
===================================================================
--- trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-checks-code.sty (rev 0)
+++ trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-checks-code.sty 2018-07-05 21:45:46 UTC (rev 48146)
@@ -0,0 +1,177 @@
+\ProvidesExplPackage {tagpdf-checks-code} {2018/07/04} {0.1}
+ {part of tagpdf - code related to checks and messages}
+
+
+%messages
+
+% mc
+\msg_new:nnn { tagpdf } { mc-nested } { nested~marked~content~found~-~mcid~#1 }
+\msg_new:nnn { tagpdf } { mc-tag-missing } { required~tag~missing~-~mcid~#1 }
+\msg_new:nnn { tagpdf } { mc-label-unknown }{ label~#1~unknown~-~rerun }
+\msg_new:nnn { tagpdf } { mc-used-twice } { mc~#1~has~been~already~used }
+\msg_new:nnn { tagpdf } { mc-not-open } { there~is~no~mc~to~end~at~#1 }
+
+% structures
+\msg_new:nnn { tagpdf } { struct-no-objnum } { objnum~missing~for~structure~#1 }
+\msg_new:nnn { tagpdf } { struct-faulty-nesting } { there~is~no~open~structure~on~the~stack }
+\msg_new:nnn { tagpdf } { struct-missing-tag } { a~structure~must~have~a~tag! }
+\msg_new:nnn { tagpdf } { struct-show-closing } { closing~structure~#1~tagged~\prop_item:cn{g__uftag_struct_#1_prop}{S} }
+\msg_new:nnn { tagpdf } { struct-used-twice} { structure~with~label~#1~has~already~been~used}
+\msg_new:nnn { tagpdf } { struct-label-unknown} { structure~with~label~#1~is~unknown~rerun}
+
+
+%Roles
+\msg_new:nnn { tagpdf } { role-missing } { tag~#1~has~no~role~assigned }
+\msg_new:nnn { tagpdf } { role-unknown } { role~#1~is~not~known }
+\msg_new:nnn { tagpdf } { role-unknown-tag } { tag~#1~is~not~known }
+\msg_new:nnn { tagpdf } { role-new-tag } { adding~new~tag~#1~mapped~to~role~#2 }
+
+
+% trees
+\msg_new:nnn { tagpdf } {tree-mcid-index-wrong } {something~is~wrong~with~the~mcid}
+
+% obj
+\msg_new:nnn { tagpdf } {obj-write-num } {write~obj~#1~to~pdf}
+
+
+%checks
+%structures
+
+\cs_new:Nn \__uftag_check_structure_has_tag:n %#1 struct num
+ {
+ \prop_if_in:cnF { g__uftag_struct_#1_prop }
+ {S}
+ {
+ \msg_error:nn { tagpdf }{ struct-missing-tag }
+ }
+ }
+
+\cs_new:Nn \__uftag_check_info_closing_struct:n %#1 struct num
+ {
+ \msg_info:nnn {tagpdf}{struct-show-closing}{#1}
+ }
+
+\cs_generate_variant:Nn \__uftag_check_info_closing_struct:n {o,x}
+
+\cs_new:Nn \__uftag_check_no_open_struck:
+ {
+ \msg_error:nn {tagpdf}{ struct-faulty-nesting }
+ }
+
+\cs_new:Nn \__uftag_check_struct_used:n %#1 label
+ {
+ \prop_get:cnNT
+ {g__uftag_struct_\zref@extractdefault{tagpdfstruct-#1}{tagstruct}{unknown}_prop}
+ {P}
+ \l_tmpa_tl
+ {\msg_warning:nnn {tagpdf}{struct-used-twice}{#1}}
+ }
+
+%roles
+
+\cs_new:Nn \__uftag_check_add_tag_role:nn %#1 tag, #2 role
+ {
+ \tl_if_empty:nTF { #2 }
+ {
+ \msg_warning:nnn { tagpdf }{ role-missing } { #1 }
+ }
+ {
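+ % known role: info message, unknown role: warning
+ % (a single F branch would run the info code unconditionally)
+ \prop_if_in:NnTF \g__uftag_role_tags_prop { #2 }
+ {
+ \msg_info:nnnn { tagpdf }{ role-new-tag } { #1 }{ #2 }
+ }
+ {
+ \msg_warning:nnn { tagpdf }{ role-unknown } { #2 }
+ }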
+ }
+ }
+
+%mc
+\cs_new:Nn \__uftag_check_mc_if_nested:
+ {
+ \_uftag_mc_if_in:T
+ {
+ \msg_warning:nnx {tagpdf}{mc-nested}{ \__uftag_get_mc_abs_cnt: }
+ }
+ }
+
+\cs_new:Nn \__uftag_check_mc_if_open:
+ {
+ \_uftag_mc_if_in:F
+ {
+ \msg_warning:nnx {tagpdf}{mc-not-open}{ \__uftag_get_mc_abs_cnt: }
+ }
+ }
+
+\cs_new:Nn \__uftag_check_mc_tag:N
+ {
+ \tl_if_empty:NT #1
+ {
+ \msg_error:nnx {tagpdf}{mc-tag-missing}{ \__uftag_get_mc_abs_cnt: }
+ }
+ \prop_if_in:NoF \g__uftag_role_tags_prop { #1 }
+ {
+ \msg_warning:nnx {tagpdf}{role-unknown-tag} { #1 }
+ }
+ }
+
+\seq_new:N \g__uftag_check_mc_used_seq
+\cs_new:Nn \__uftag_check_mc_used:n
+ {
+ \seq_if_in:NnTF \g__uftag_check_mc_used_seq { #1 }
+ {
+ \msg_warning:nnn {tagpdf}{mc-used-twice}{ #1 }
+ }
+ {
+ \seq_gput_right:Nx\g__uftag_check_mc_used_seq { #1 }
+ }
+ }
+
+
+
+\cs_new:Nn \__uftag_check_show_MCID_by_page:
+ {
+ \tl_set:Nx \l__uftag_tmpa_tl
+ {
+ \zref@extractdefault
+ {LastPage} {abspage} {-1}
+ }
+ \int_step_inline:nnnn {1}{1}
+ {
+ \l__uftag_tmpa_tl
+ }
+ {
+ \seq_clear:N \l_tmpa_seq
+ \int_step_inline:nnnn {1}{1}
+ {
+ \zref@extractdefault
+ {LastPage} {tagmcabs} {-1}
+ }
+ {
+ \int_compare:nT
+ {
+ \zref@extractdefault
+ {mcid-####1} {tagabspage} {-1}
+ = ##1
+ }
+ {
+ \seq_put_right:Nx \l_tmpa_seq % local seq: local assignment
+ {
+ Page##1-####1-\zref@extractdefault
+ {mcid-####1} {tagmcid} {-1}
+ }
+ }
+ }
+ \seq_show:N \l_tmpa_seq
+ }
+ }
+
+\cs_new:Nn\__uftag_check_record_pdfobj_num:n
+ {
+ \int_compare:nT {\l__uftag_loglevel_int >= 3 }
+ {
+ \msg_info:nnx { tagpdf }{obj-write-num}{#1}
+ }
+ }
+
+\endinput
Property changes on: trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-checks-code.sty
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-luatex.def
===================================================================
--- trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-luatex.def (rev 0)
+++ trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-luatex.def 2018-07-05 21:45:46 UTC (rev 48146)
@@ -0,0 +1,184 @@
+\ProvidesExplFile {tagpdf-luatex.def} {2018/07/04} {0.1}
+ {tagpdf driver for luatex}
+
+\newattribute \g__uftag_mc_type_attr %the value represent the type
+\newattribute \g__uftag_mc_cnt_attr %will hold the \c@g__uftag_MCID_abs_int value
+
+% The lua code
+\directlua { tagpdf=require('tagpdf.lua') }
+
+%%%% driver (lualatex) commands
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\cs_new:Nn \__uftag_pdfliteral_page:n {\__uftag_tex_pdfextension:D literal~page {#1}}
+\cs_new:Nn \__uftag_pdfcatalog:n {\__uftag_tex_pdfextension:D catalog {#1}}
+
+%reserve an object num
+\cs_new:Nn \__uftag_pdfreserveobjnum:N
+ {
+ % #1 = macro name to be populated with object number
+ \__uftag_tex_pdfextension:D~obj~reserveobjnum
+ \tl_set:Nx #1 { \__uftag_tex_pdffeedback:D lastobj }%
+ }
+
+% use an object num
+\cs_new:Nn \__uftag_pdfuseobjnum:Nn
+ {
+ % #1 = macro with object number to be populated
+ % #2 = object contents, as valid PDF
+ % should #2 be filtered through \pdfstringdef ???
+ \__uftag_tex_immediate:D \__uftag_tex_pdfextension:D~obj~useobjnum~#1~{ #2 }%
+ }
+
+\cs_new:Nn \__uftag_pdfuseobjnum:nn
+ {
+ % #1 = a number
+ % #2 = object contents, as valid PDF
+ % should #2 be filtered through \pdfstringdef ???
+ \__uftag_tex_immediate:D \__uftag_tex_pdfextension:D~obj~useobjnum~#1~{ #2 }%
+ }
+
+% obj num of the dictionary for a page:
+% the page count starts at 1
+% pages can be referenced before they are actually created
+% no error if the page later doesn't exist
+
+\cs_new:Nn \__uftag_store_pdfpageref:Nn
+ {
+ % #1 = macro name to be populated with current page object number
+ % #2 = number or counter identifying the required page
+ \tl_set:Nx #1 { \__uftag_tex_pdffeedback:D pageref~#2 }%
+ }
+
+% a global version of the command
+\cs_new:Nn \__uftag_gstore_pdfpageref:Nn
+ {
+ \tl_gset:Nx #1 { \__uftag_tex_pdffeedback:D pageref~#2 }%
+ }
+
+\cs_new:Nn \__uftag_pdfobj:Nn
+ {
+ % #1 = macro name to refer to this object
+ % #2 = object contents, as valid PDF
+ % should #2 be filtered through \pdfstringdef ???
+ \__uftag_tex_immediate:D \__uftag_tex_pdfextension:D~obj~{ #2 }
+ \tl_set:Nx #1 { \__uftag_tex_pdffeedback:D~lastobj }%
+ }
+
+% pdfpage**s**attr: for all pages
+%% is global needed? Yes. Without it the setting is lost if issued in a group
+%
+\cs_new:Nn \__uftag_gset_pdfpagesattr:n
+ {
+ \__uftag_tex_global:D \__uftag_tex_pdfvariable:D~pagesattr { #1 }
+ }
+
+\cs_new:Nn \__uftag_gadd_pdfpagesattr:n
+ {
+ \exp_args:No \__uftag_gset_pdfpagesattr:n { \__uftag_tex_the:D \__uftag_tex_pdfvariable:D~pagesattr #1}
+ }
+
+% pdfpageattr for one page
+% do we need an immediate version??
+\cs_new:Nn \__uftag_gset_pdfpageattr:n
+ {
+ \__uftag_tex_global:D \__uftag_tex_pdfvariable:D~pageattr { #1 }
+ }
+
+\cs_new:Nn \__uftag_gadd_pdfpageattr:n
+ {
+ \exp_args:No \__uftag_gset_pdfpageattr:n { \__uftag_tex_the:D \__uftag_tex_pdfvariable:D~pageattr #1}
+ }
+
+\cs_new:Nn \__uftag_get_pdfpageattr:N
+ {
+ \tl_set:No #1 { \__uftag_tex_the:D \__uftag_tex_pdfvariable:D~pageattr }
+ }
+
+
+\cs_new:Nn \__uftag_pdfcompresslevel:n
+ {
+ \__uftag_tex_pdfvariable:D compresslevel #1
+ }
+
+\cs_new:Nn \__uftag_pdfobjcompresslevel:n
+ {
+ \__uftag_tex_pdfvariable:D objcompresslevel #1
+ }
+
+% I probably also want lua tables.
+% I put them in the uftag.tables namespace.
+% The tables are named like the variables but without the backslash.
+% To access such a table with a dynamic name, create a string and then use
+% uftag.tables[string]
+% Old code; I'm not quite sure if this was a good idea. Now I have a mix of tables in
+% uftag.tables and uftag.mc/struct, and a lot of it is probably not needed.
+
+\cs_new:Nn \__uftag_luatex_get_table_name:Nn
+ {
+ \tl_set_rescan:Nnn #1 { \char_set_catcode_ignore:N \\ } { #2 }
+ }
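+
+% Sketch (illustration only, not used by the package): following the naming
+% rule described above, a call like
+%   \__uftag_luatex_get_table_name:Nn \l_tmpa_tl { \g__uftag_role_tags_prop }
+% should leave the string g__uftag_role_tags_prop in \l_tmpa_tl, so that
+%   \directlua { uftag.tables.\l_tmpa_tl = {} }
+% creates the Lua table uftag.tables.g__uftag_role_tags_prop. This is what
+% \__uftag_prop_new:N below does for each new property list.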
+
+
+\cs_new:Nn \__uftag_prop_new:N
+ {
+ \prop_new:N #1
+ \__uftag_luatex_get_table_name:Nn \l_tmpa_tl { #1 }
+ \directlua { uftag.tables.\l_tmpa_tl = {} }
+ }
+
+
+\cs_new:Nn \__uftag_seq_new:N
+ {
+ \seq_new:N #1
+ \__uftag_luatex_get_table_name:Nn \l_tmpa_tl { #1 }
+ \directlua { uftag.tables.\l_tmpa_tl = {} }
+ }
+
+
+\cs_new:Nn \__uftag_prop_gput:Nnn
+ {
+ \prop_gput:Nnn #1 { #2 } { #3 }
+ \__uftag_luatex_get_table_name:Nn \l_tmpa_tl { #1 }
+ \directlua { uftag.tables.\l_tmpa_tl["#2"] = "#3" }
+ }
+
+
+\cs_new:Nn \__uftag_seq_gput_right:Nn
+ {
+ \seq_gput_right:Nn #1 { #2 }
+ \__uftag_luatex_get_table_name:Nn \l_tmpa_tl { #1 }
+ \directlua { table.insert(uftag.tables.\l_tmpa_tl, "#2") }
+ }
+
+%Hm not quite sure about the naming
+
+\cs_new:Npn \__uftag_seq_item:cn #1 #2
+ {
+ \directlua { tex.print(uftag.tables.#1[#2]) }
+ }
+
+\cs_new:Npn \__uftag_prop_item:cn #1 #2
+ {
+ \directlua { tex.print(uftag.tables.#1["#2"]) }
+ }
+
+%for debugging commands that show both the seq/prop and the lua tables
+\cs_new:Nn \__uftag_seq_show:N
+ {
+ \seq_show:N #1
+ \__uftag_luatex_get_table_name:Nn \l_tmpa_tl { #1 }
+ \directlua { uftag.trace.log ("lua~sequence~array~\l_tmpa_tl",1) }
+ \directlua { uftag.trace.show_seq (uftag.tables.\l_tmpa_tl) }
+ }
+
+\cs_new:Nn \__uftag_prop_show:N
+ {
+ \prop_show:N #1
+ \__uftag_luatex_get_table_name:Nn \l_tmpa_tl { #1 }
+ \directlua {uftag.trace.log ("lua~property~table~\l_tmpa_tl",1) }
+ \directlua {uftag.trace.show_prop (uftag.tables.\l_tmpa_tl) }
+ }
+
+
+\endinput
Property changes on: trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-luatex.def
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-mc-code-generic.sty
===================================================================
--- trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-mc-code-generic.sty (rev 0)
+++ trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-mc-code-generic.sty 2018-07-05 21:45:46 UTC (rev 48146)
@@ -0,0 +1,223 @@
+\ProvidesExplPackage {tagpdf-mc-code-generic} {2018/07/04} {0.1}
+ {part of tagpdf - code related to marking chunks - generic mode}
+
+% for the label system:
+% tagmcid is the id which should also appear in the pdf;
+% it is (hopefully) reset on every page
+
+\int_new:N \g__uftag_MCID_tmp_bypage_int
+\zref@newprop {tagmcid} [0] { \int_use:N \g__uftag_MCID_tmp_bypage_int }
+\zref@addprop {tagpdf} {tagmcid}
+
+% will hold the next free mcid on a page (the current maximum + 1)
+% it will contain key-value pairs of type "abspagenum = next mcid on this page"
+\__uftag_prop_new:N \g__uftag_MCID_byabspage_prop
+
+
+% These are the low-level mc commands.
+% They insert literals and so are specific to generic mode.
+% Checking whether the type is defined is done elsewhere.
+% #1 is the type/tag
+\cs_new:Nn \__uftag_mc_bmc:n
+ {
+ \__uftag_pdf_escape_name:Nn \l__uftag_tmpa_tl { #1 }
+ \__uftag_pdfliteral_page:n
+ {
+ /\l__uftag_tmpa_tl\c_space_tl BMC
+ }
+ }
+
+\cs_new:Nn \__uftag_mc_emc:
+ {
+ \__uftag_pdfliteral_page:n
+ {
+ EMC
+ }
+ }
+
+% #1 tag, #2 properties
+% The name is escaped here, but escaping the dictionary content
+% must imho be done at a higher level
+\cs_new:Nn \__uftag_mc_bdc:nn
+ {
+ \__uftag_pdf_escape_name:Nn \l__uftag_tmpa_tl { #1 }
+ \__uftag_pdfliteral_page:n
+ {
+ /\l__uftag_tmpa_tl\c_space_tl<<#2>>~BDC
+ }
+ }
+
+
+% bdc with /MCID + more properties
+% we need a ref-label system to ensure that the cnt restarts at 0 on a new page
+
+\tl_new:N \l__uftag_mc_ref_abspage_tl %will store the abspage value of label
+\tl_new:N \l__uftag_mc_tmp_tl
+
+\cs_new:Nn \__uftag_mc_bdc_mcid:nn
+ {
+ \int_gincr:N \c@g__uftag_MCID_abs_int
+ \tl_set:Nx \l__uftag_mc_ref_abspage_tl
+ {
+ \zref@extractdefault %3 args
+ {
+ mcid-\int_use:N \c@g__uftag_MCID_abs_int
+ }
+ {tagabspage}
+ {-1}
+ }
+ \prop_get:NoNTF
+ \g__uftag_MCID_byabspage_prop
+ {
+ \l__uftag_mc_ref_abspage_tl
+ }
+ \l__uftag_mc_tmp_tl
+ {
+ %key already present, use value for MCID and add 1 for the next
+ \int_gset:Nn \g__uftag_MCID_tmp_bypage_int { \l__uftag_mc_tmp_tl }
+ \__uftag_prop_gput:Nxx
+ \g__uftag_MCID_byabspage_prop
+ { \l__uftag_mc_ref_abspage_tl }
+ { \int_eval:n {\l__uftag_mc_tmp_tl +1} }
+ }
+ {
+ %key not present, set MCID to 0 and insert 1
+ \int_gzero:N \g__uftag_MCID_tmp_bypage_int
+ \__uftag_prop_gput:Nxx
+ \g__uftag_MCID_byabspage_prop
+ { \l__uftag_mc_ref_abspage_tl }
+ {1}
+ }
+ \zref@labelbylist
+ {
+ mcid-\int_use:N \c@g__uftag_MCID_abs_int
+ }
+ {tagpdf}
+ \__uftag_mc_bdc:nn
+ { #1 }
+ { /MCID~\int_eval:n { \g__uftag_MCID_tmp_bypage_int }~#2 }
+ }
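+
+% Worked trace (a sketch, assuming the mcid-<n> labels are known from a
+% previous run): chunks 1 and 2 on page 1 and chunk 3 on page 2 give
+%   chunk 1: key 1 absent   -> /MCID 0, then prop {1}{1}
+%   chunk 2: key 1 holds 1  -> /MCID 1, then prop {1}{2}
+%   chunk 3: key 2 absent   -> /MCID 0, then prop {2}{1}
+% On a first run the labels are unknown, every chunk falls back to the key
+% -1 and is numbered consecutively, so a second run is needed.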
+
+% only /MCID
+\cs_new:Nn \__uftag_mc_bdc_mcid:n
+ {
+ \__uftag_mc_bdc_mcid:nn { #1 } {}
+ }
+
+%artifact without type
+\cs_new:Nn \__uftag_mc_bmc_artifact:
+ {
+ \__uftag_mc_bmc:n {Artifact}
+ }
+
+%artifact with a type:
+\cs_new:Nn \__uftag_mc_bmc_artifact:n
+ {
+ \__uftag_mc_bdc:nn {Artifact}{/Type\c_space_tl/#1}
+ }
+
+% perhaps later: more properties for artifacts
+
+
+% keyval definitions for the user commands:
+
+\tl_new:N \l__uftag_mc_key_tag_tl
+
+%Attention! definitions are different in luamode.
+\keys_define:nn { tagpdf / mc }
+{
+ tag .code:n = % the name (H, P, Span, etc.)
+ {
+ \__uftag_pdf_escape_name:Nn \l__uftag_mc_key_tag_tl { #1 }
+ },
+ raw .tl_set:N = \l__uftag_mc_key_properties_tl,
+ label .tl_set:N = \l__uftag_mc_key_label_tl,
+ artifact .meta:n = { artifact-bool, artifact-type=#1 },
+ artifact .default:n = {notype}
+}
+
+\cs_new:Nn \__uftag_mc_handle_artifact:N %#1 contains the artifact type
+ {
+ \tl_if_empty:NTF #1
+ { \__uftag_mc_bmc_artifact: }
+ { \exp_args:No\__uftag_mc_bmc_artifact:n { #1 } }
+ }
+
+\cs_new:Nn \__uftag_mc_handle_mcid:nn %#1 tag, #2 properties
+ {
+ \__uftag_mc_bdc_mcid:nn { #1 }{ #2 }
+ }
+
+
+% puts the absolute number of an mcid in the current structure
+\cs_new:Nn \__uftag_mc_handle_stash:n %1 mcidnum
+ {
+ \__uftag_check_mc_used:n { #1 }
+ \__uftag_struct_kid_mc_gput_right:nn
+ { \g__uftag_struct_stack_current_tl }
+ { #1 }
+ \prop_gput:Nxx \g__uftag_mc_parenttree_prop
+ { #1 }
+ { \g__uftag_struct_stack_current_tl }
+ }
+
+\cs_new:Nn \uftag_mc_begin:n
+ {
+ \group_begin:
+ \__uftag_check_mc_if_nested:
+ \bool_gset_true:N \g__uftag_in_mc_bool
+ \keys_set:nn { tagpdf / mc }{ #1 }
+ \bool_if:NTF \l__uftag_mc_artifact_bool
+ { %handle artifact
+ \__uftag_mc_handle_artifact:N \l__uftag_mc_artifact_type_tl
+ }
+ { %handle mcid type
+ \__uftag_check_mc_tag:N \l__uftag_mc_key_tag_tl
+ \__uftag_mc_handle_mcid:nn { \l__uftag_mc_key_tag_tl }{\l__uftag_mc_key_properties_tl}
+ \tl_if_empty:NF \l__uftag_mc_key_label_tl
+ {
+ \__uftag_mc_handle_mc_label:n { \l__uftag_mc_key_label_tl }
+ }
+ \bool_if:NF \l__uftag_mc_key_stash_bool
+ {
+ \__uftag_mc_handle_stash:n { \int_use:N \c@g__uftag_MCID_abs_int }
+ }
+ }
+ \group_end:
+ }
+
+\cs_new:Nn\uftag_mc_end:
+ {
+ \__uftag_check_mc_if_open:
+ \bool_gset_false:N \g__uftag_in_mc_bool
+ \__uftag_mc_emc:
+ }
+
+\cs_new:Nn\uftag_mc_use:n %#1: label name
+ {
+ \tl_set:Nx \l_tmpa_tl { \zref@extractdefault{tagpdf-#1}{tagmcabs}{} }
+ \tl_if_empty:NTF\l_tmpa_tl
+ {
+ \msg_warning:nnn {tagpdf} {mc-label-unknown} { #1 }
+ }
+ {
+ \prop_gput:Nxx
+ \g__uftag_mc_parenttree_prop
+ {
+ \zref@extractdefault{tagpdf-#1}{tagmcabs}{}
+ }
+ {
+ \g__uftag_struct_stack_current_tl
+ }
+ \__uftag_struct_kid_mc_gput_right:nn
+ {
+ \g__uftag_struct_stack_current_tl
+ }
+ {
+ \zref@extractdefault{tagpdf-#1}{tagmcabs}{}
+ }
+ }
+ }
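+
+% Usage sketch (assumption: \mymcbegin, \mymcend and \mymcuse are hypothetical
+% document-level aliases; only the uftag functions come from this file):
+%   \ExplSyntaxOn
+%   \cs_new_eq:NN \mymcbegin \uftag_mc_begin:n
+%   \cs_new_eq:NN \mymcend   \uftag_mc_end:
+%   \cs_new_eq:NN \mymcuse   \uftag_mc_use:n
+%   \ExplSyntaxOff
+%   \mymcbegin{tag=P}Some text\mymcend  % roughly /P <</MCID n>> BDC ... EMC
+%   \mymcbegin{artifact=pagination}\thepage\mymcend  % /Artifact <</Type /Pagination>> BDC ... EMC
+%   \mymcbegin{tag=Span,stash,label=later}More text\mymcend  % not added to the current structure
+%   \mymcuse{later}  % adds the stashed chunk to the structure current at this point
+% The tagpdf-later label is only available after a previous run, so the
+% first run warns with mc-label-unknown.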
+
+
+\endinput
Property changes on: trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-mc-code-generic.sty
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-mc-code-lua.sty
===================================================================
--- trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-mc-code-lua.sty (rev 0)
+++ trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-mc-code-lua.sty 2018-07-05 21:45:46 UTC (rev 48146)
@@ -0,0 +1,172 @@
+\ProvidesExplFile {tagpdf-mc-code-lua.sty} {2018/07/04} {0.1}
+ {tagpdf - mc code only for the luamode }
+
+%The two attributes are defined in the driver file.
+%It also loads the lua code (as it can also contain functions needed by generic mode).
+%\newattribute \g__uftag_mc_type_attr %the value represents the type
+%\newattribute \g__uftag_mc_cnt_attr %will hold the \c@g__uftag_MCID_abs_int value
+
+%An attribute for the current structure probably doesn't make sense as mc chunks can be used later.
+%\newattribute \g__uftag_struct_type_attr %represent the current structure type. Not sure if needed
+%\newattribute \g__uftag_struct_cnt_attr %will hold \c@g__uftag_struct_abs_int as a cnt
+
+% Handling attributes needs a different system to number the page-wise mcids:
+% a tagmcbegin ... tagmcend pair no longer surrounds exactly one mc chunk, as it can be split
+% at page breaks. We know the included mcid(s) only after shipout. So for the struct -> mcid mapping we
+% need to record struct -> mc-cnt (in \g__uftag_mc_parenttree_prop and/or a lua table),
+% at shipout record mc-cnt -> {mcid, mcid, ...} (in a table?),
+% and connect both when building the trees.
+
+% key definitions are overwritten for luatex to store that data in tables
+% the data for the mc are in uftag.mc[absnum]
+% the fields of the table are
+% tag : the type (a string)
+% raw : more properties (string)
+% label: a string. Do we need a way to retrieve the num from the label from lua??
+% artifact: the presence indicates an artifact, the value (string) is the type.
+% kids: an array of tables {1={kid=num1,page=pagenum1}, 2={kid=num2,page=pagenum2},...},
+% this describes the chunks the mc has been split to by the traversing code
+% parent: the number of the structure it is in. Needed to build the parent tree.
+
+% The main function which wanders through the shipout box to inject the literals.
+\AtBeginDocument
+{
+ \bool_if:NT\g_uftag_active_mc_bool
+ {
+ \AtBeginShipout
+ {
+ \directlua{uftag.func.mark_shipout ()}
+ }
+ }
+}
+
+% the keys
+\tl_new:N \l__uftag_mc_key_tag_tl
+\tl_new:N \l__uftag_mc_key_label_tl
+\tl_new:N \l__uftag_mc_key_properties_tl
+
+\keys_define:nn { tagpdf / mc }
+{
+ tag .code:n = %
+ {
+ \__uftag_pdf_escape_name:Nn \l__uftag_mc_key_tag_tl { #1 }
+ \directlua
+ {
+ uftag.func.store_mc_data(\__uftag_get_mc_abs_cnt:,"tag","#1")
+ }
+ },
+ raw .code:n =
+ {
+ \tl_set:Nn\l__uftag_mc_key_properties_tl { #1 }
+ \directlua
+ {
+ uftag.func.store_mc_data(\__uftag_get_mc_abs_cnt:,"raw","#1")
+ }
+ },
+ label .code:n =
+ {
+ \tl_set:Nn\l__uftag_mc_key_label_tl { #1 }
+ \directlua {uftag.func.store_mc_data(\__uftag_get_mc_abs_cnt:,"label","#1")}
+ },
+ __artifact-store .code:n =
+ {
+ \directlua {uftag.func.store_mc_data(\__uftag_get_mc_abs_cnt:,"artifact","#1")}
+ },
+ artifact .meta:n = { artifact-bool, artifact-type=#1,__artifact-store=#1, tag=Artifact },
+ artifact .default:n = { notype }
+}
+
+
+% attributes
+% set the mc from a tag name
+
+\cs_new:Nn \__uftag_mc_lua_gset_mc_type_attr:n % #1 is a tag name
+ {
+ \global\setattribute \g__uftag_mc_type_attr { \directlua {uftag.func.output_num_from ("#1") } }
+ \global\setattribute \g__uftag_mc_cnt_attr { \__uftag_get_mc_abs_cnt: }
+ }
+
+\cs_generate_variant:Nn\__uftag_mc_lua_gset_mc_type_attr:n { o }
+
+\cs_new:Nn \__uftag_mc_lua_gunset_mc_type_attr:
+ {
+ \global\unsetattribute \g__uftag_mc_type_attr
+ \global\unsetattribute \g__uftag_mc_cnt_attr
+ }
+
+%In the finish code, this command replaces the placeholder for an mc by the real mcid kids
+\cs_new:Nn \__uftag_mc_insert_mcid_kids:n
+ {
+ \directlua {uftag.func.mc_insert_kids (#1) }
+ }
+
+
+% puts an mcid absolute number in the current structure
+\cs_new:Nn \__uftag_mc_handle_stash:n %1 mcidnum
+ {
+ \__uftag_check_mc_used:n { #1 }
+ \seq_gput_right:cn % kernel command on purpose: the item contains a command, so it must not go into a lua table
+ { g__uftag_struct_kids_\g__uftag_struct_stack_current_tl _seq }
+ {
+ \__uftag_mc_insert_mcid_kids:n {#1}
+ }
+ \directlua
+ {
+ uftag.func.store_struct_mcabs(\g__uftag_struct_stack_current_tl,#1)
+ }
+ \prop_gput:Nxx
+ \g__uftag_mc_parenttree_prop
+ { #1 }
+ { \g__uftag_struct_stack_current_tl }
+ }
+
+\cs_generate_variant:Nn \__uftag_mc_handle_stash:n { o }
+
+\cs_new:Nn \uftag_mc_begin:n
+ {
+ \group_begin:
+ \__uftag_check_mc_if_nested:
+ \bool_gset_true:N \g__uftag_in_mc_bool
+ \int_gincr:N \c@g__uftag_MCID_abs_int
+ \keys_set:nn { tagpdf / mc }{ label={}, #1 }
+ %check that a tag or artifact has been used
+ \__uftag_check_mc_tag:N \l__uftag_mc_key_tag_tl
+ %set the attributes (is done globally!):
+ \__uftag_mc_lua_gset_mc_type_attr:o { \l__uftag_mc_key_tag_tl }
+ \bool_if:NF \l__uftag_mc_artifact_bool
+ { % store the absolute num name in a label:
+ \tl_if_empty:NF \l__uftag_mc_key_label_tl
+ {
+ \__uftag_mc_handle_mc_label:n { \l__uftag_mc_key_label_tl }
+ }
+ % if not stashed record the absolute number
+ \bool_if:NF \l__uftag_mc_key_stash_bool
+ {
+ \exp_args:Nx \__uftag_mc_handle_stash:n { \__uftag_get_mc_abs_cnt: }
+ }
+ }
+ \group_end:
+ }
+
+\cs_new:Nn\uftag_mc_end:
+ {
+ \__uftag_check_mc_if_open:
+ \bool_gset_false:N \g__uftag_in_mc_bool
+ \__uftag_mc_lua_gunset_mc_type_attr:
+ }
+
+\cs_new:Nn\uftag_mc_use:n %#1: label name
+ {
+ \tl_set:Nx \l_tmpa_tl { \zref@extractdefault{tagpdf-#1}{tagmcabs}{} }
+ \tl_if_empty:NTF\l_tmpa_tl
+ {
+ \msg_warning:nnn {tagpdf} {mc-label-unknown} { #1 }
+ }
+ {
+ \__uftag_mc_handle_stash:o { \l_tmpa_tl }
+ }
+ }
+
+
+
+\endinput
Property changes on: trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-mc-code-lua.sty
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-mc-code-shared.sty
===================================================================
--- trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-mc-code-shared.sty (rev 0)
+++ trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-mc-code-shared.sty 2018-07-05 21:45:46 UTC (rev 48146)
@@ -0,0 +1,57 @@
+\ProvidesExplPackage {tagpdf-mc-code-shared} {2018/07/04} {0.1}
+ {part of tagpdf - code related to marking chunks - code shared by generic and luamode }
+
+% I use a latex counter for the absolute count, so that it is added to
+% \cl@@ckpt and restored, e.g., in tabulars and align.
+% \int_new:N \c@g__uftag_MCID_int and
+% \tl_put_right:Nn\cl@@ckpt{\@elt{g_uf_test_int}}
+% would work too, but as the name is then not expl3 either, why bother?
+% The absolute counter can be used for labels and to check if the page
+% counter needs a reset.
+
+\newcounter { g__uftag_MCID_abs_int }
+\cs_new:Nn \__uftag_get_mc_abs_cnt: { \int_use:N \c@g__uftag_MCID_abs_int }
+
+% tagmcabs is the label property holding the absolute count, which is used to identify the chunk
+\zref@newprop {tagmcabs} [0] { \int_use:N \c@g__uftag_MCID_abs_int }
+\zref@addprop {tagpdf} {tagmcabs}
+\zref@addprop {LastPage} {tagmcabs}
+
+%stores labels of mcid.
+\cs_new:Nn \__uftag_mc_handle_mc_label:n
+ {
+ \zref@labelbylist{tagpdf-#1}{tagpdf}
+ }
+
+% will hold the structure numbers for the parenttree
+% key: absolute number of the mc (tagmcabs)
+% value: the structure number the mc is in
+\__uftag_prop_new:N \g__uftag_mc_parenttree_prop
+
+%to test nesting mc:
+\bool_new:N \g__uftag_in_mc_bool
+
+\prg_new_conditional:Nnn \_uftag_mc_if_in: {p,T,F,TF}
+ {
+ \bool_if:NTF \g__uftag_in_mc_bool
+ { \prg_return_true: }
+ { \prg_return_false: }
+ }
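+
+% Sketch (illustration only): the conditional lets code react to an open
+% chunk, e.g. close it before starting a new one:
+%   \_uftag_mc_if_in:T { \uftag_mc_end: }
+%   \uftag_mc_begin:n { tag = Span }
+% or, as a predicate inside a boolean expression:
+%   \bool_if:nF { \_uftag_mc_if_in_p: } { \uftag_mc_begin:n { tag = P } }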
+
+%shared keys
+%the rest are in the mode-specific (generic/lua) code
+\tl_new:N \l__uftag_mc_artifact_type_tl
+
+\keys_define:nn { tagpdf / mc }
+ {
+ stash .bool_set:N = \l__uftag_mc_key_stash_bool,
+ artifact-bool .bool_set:N = \l__uftag_mc_artifact_bool,
+ artifact-type .choice:,
+ artifact-type / pagination .code:n = {\tl_set:Nn \l__uftag_mc_artifact_type_tl { Pagination }},
+ artifact-type / layout .code:n = {\tl_set:Nn \l__uftag_mc_artifact_type_tl { Layout }},
+ artifact-type / page .code:n = {\tl_set:Nn \l__uftag_mc_artifact_type_tl { Page }},
+ artifact-type / background .code:n = {\tl_set:Nn \l__uftag_mc_artifact_type_tl { Background }},
+ artifact-type / notype .code:n = {\tl_set:Nn \l__uftag_mc_artifact_type_tl {}},
+ }
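+
+% Sketch: the choice maps the user-facing value to the PDF artifact subtype,
+% e.g.
+%   \keys_set:nn { tagpdf / mc } { artifact-type = layout }
+%   \tl_show:N \l__uftag_mc_artifact_type_tl  % should show Layout
+% notype leaves the type empty, and a value outside the five choices
+% raises the usual l3keys invalid-choice error.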
+
+\endinput
Property changes on: trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-mc-code-shared.sty
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-pdftex.def
===================================================================
--- trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-pdftex.def (rev 0)
+++ trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-pdftex.def 2018-07-05 21:45:46 UTC (rev 48146)
@@ -0,0 +1,125 @@
+\ProvidesExplFile {tagpdf-pdftex.def} {2018/07/04} {0.1}
+ {tagpdf driver for pdftex}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%% driver (pdflatex) commands
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+%literal
+\cs_new:Nn \__uftag_pdfliteral_page:n { \__uftag_tex_pdfliteral:D page { #1} }
+\cs_new:Nn \__uftag_pdfcatalog:n { \__uftag_tex_pdfcatalog:D { #1 } }
+
+% reserve an object num and store the number
+\cs_new:Nn \__uftag_pdfreserveobjnum:N % #1 = macro name to be populated with object number
+ {
+ \__uftag_tex_pdfobj:D reserveobjnum
+ \tl_set:Nx #1 { \int_use:N \__uftag_tex_pdflastobj:D }%
+ }
+
+% add content to an object num stored in a macro
+\cs_new:Nn \__uftag_pdfuseobjnum:Nn
+ {
+ % #1 = macro containing the object number
+ % #2 = object contents, as valid PDF
+ % should #2 be filtered through \pdfstringdef ???
+ \__uftag_tex_immediate:D \__uftag_tex_pdfobj:D useobjnum~#1~{#2}%
+ }
+
+\cs_new:Nn \__uftag_pdfuseobjnum:nn
+ {
+ % #1 = the object number (a number)
+ % #2 = object contents, as valid PDF
+ % should #2 be filtered through \pdfstringdef ???
+ \__uftag_check_record_pdfobj_num:n {#1}
+ \__uftag_tex_immediate:D \__uftag_tex_pdfobj:D useobjnum~#1~{#2}%
+ }
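+
+% Sketch (the variable \l_my_obj_tl and the object content are hypothetical):
+% an object number can be reserved first, used in forward references, and
+% filled in later:
+%   \tl_new:N \l_my_obj_tl
+%   \__uftag_pdfreserveobjnum:N \l_my_obj_tl
+%   % ... "\l_my_obj_tl\c_space_tl 0\c_space_tl R" can now be written elsewhere
+%   \__uftag_pdfuseobjnum:Nn \l_my_obj_tl { <</Type~/Example>> }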
+
+% obj num of the dictionary for a page:
+% the page count starts at 1
+% pages can be referenced before they are actually created
+% no error if the page later doesn't exist
+\cs_new:Nn \__uftag_store_pdfpageref:Nn
+ {
+ % #1 = macro name to be populated with current page object number
+ % #2 = number or counter identifying the required page
+ \tl_set:Nx #1 { \__uftag_tex_pdfpageref:D #2}%
+ }
+
+% a global version of the command
+\cs_new:Nn \__uftag_gstore_pdfpageref:Nn
+ {
+ \tl_gset:Nx #1 { \__uftag_tex_pdfpageref:D #2}%
+ }
+
+\cs_new:Nn \__uftag_pdfobj:Nn
+ {
+ % #1 = macro name to refer to this object
+ % #2 = object contents, as valid PDF
+ % should #2 be filtered through \pdfstringdef ???
+ \__uftag_tex_immediate:D \__uftag_tex_pdfobj:D { #2 }
+ \tl_set:Nx #1 {\int_use:N \__uftag_tex_pdflastobj:D }%
+ }
+
+% pdfpage**s**attr: for all pages
+%% is global needed? Yes. Without it the setting is lost if issued in a group
+%
+\cs_new:Nn \__uftag_gset_pdfpagesattr:n
+ {
+ \__uftag_tex_global:D \__uftag_tex_pdfpagesattr:D { #1 }
+ }
+
+\cs_new:Nn \__uftag_gadd_pdfpagesattr:n
+ {
+ \exp_args:No \__uftag_gset_pdfpagesattr:n { \__uftag_tex_the:D \__uftag_tex_pdfpagesattr:D #1 }
+ }
+
+% pdfpageattr for one page
+% I must avoid overwriting other entries,
+% and I must avoid adding more and more /StructParents
+% do we need an immediate version??
+\cs_new:Nn \__uftag_gset_pdfpageattr:n
+ {
+ \__uftag_tex_global:D \__uftag_tex_pdfpageattr:D { #1 }
+ }
+
+\cs_new:Nn \__uftag_gadd_pdfpageattr:n
+ {
+ \exp_args:No \__uftag_gset_pdfpageattr:n { \__uftag_tex_the:D \__uftag_tex_pdfpageattr:D #1}
+ }
+
+\cs_new:Nn \__uftag_get_pdfpageattr:N
+ {
+ \tl_set:No #1 { \__uftag_tex_the:D \__uftag_tex_pdfpageattr:D }
+ }
+\cs_new:Nn \__uftag_pdfcompresslevel:n
+ {
+ \__uftag_tex_pdfcompresslevel:D #1
+ }
+
+\cs_new:Nn \__uftag_pdfobjcompresslevel:n
+ {
+ \__uftag_tex_pdfobjcompresslevel:D #1
+ }
+
+
+% These commands are only aliases for pdflatex but are defined differently with luatex
+% to get also lua tables.
+\cs_set_eq:NN \__uftag_prop_new:N \prop_new:N
+
+\cs_set_eq:NN \__uftag_seq_new:N \seq_new:N
+
+\cs_set_eq:NN \__uftag_prop_gput:Nnn \prop_gput:Nnn
+
+\cs_set_eq:NN \__uftag_seq_gput_right:Nn \seq_gput_right:Nn
+
+\cs_set_eq:NN \__uftag_seq_item:cn \seq_item:cn
+
+\cs_set_eq:NN \__uftag_prop_item:cn \prop_item:cn
+
+\cs_set_eq:NN \__uftag_seq_show:N \seq_show:N
+
+\cs_set_eq:NN \__uftag_prop_show:N \prop_show:N
+
+
+
+\endinput
Property changes on: trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-pdftex.def
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-roles-code.sty
===================================================================
--- trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-roles-code.sty (rev 0)
+++ trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-roles-code.sty 2018-07-05 21:45:46 UTC (rev 48146)
@@ -0,0 +1,126 @@
+\ProvidesExplPackage {tagpdf-roles-code} {2018/07/04} {0.1}
+ {part of tagpdf - code related to roles and structure names}
+
+\__uftag_seq_new:N \g__uftag_role_tags_seq %to get names from numbers
+\__uftag_prop_new:N \g__uftag_role_tags_prop %to get numbers from names
+
+%The list of standard Adobe tags.
+\clist_const:Nn \c__uftag_role_sttags_clist
+ {%possible root elements
+ Document, %A complete document. This is the root element of any structure tree containing
+ %multiple parts or multiple articles.
+ Part, %A large-scale division of a document.
+ Art, %A relatively self-contained body of text constituting a single narrative or exposition
+ %subelements
+ Sect, %A container for grouping related content elements.
+ Div, %A generic block-level element or group of elements
+ BlockQuote, %A portion of text consisting of one or more paragraphs attributed to someone other
+ %than the author of the surrounding text.
+ Caption, %A brief portion of text describing a table or figure.
+ TOC, %A list made up of table of contents item entries (structure tag TOCI; see below)
+ %and/or other nested table of contents entries
+ TOCI, %An individual member of a table of contents. This entry's children can be any of
+ %the following structure tags:
+ %Lbl,Reference,NonStruct,P,TOC
+ Index,
+ NonStruct, %probably not needed
+ H,
+ H1,
+ H2,
+ H3,
+ H4,
+ H5,
+ H6,
+ P,
+ L, %list
+ LI, %list item (around label and list item body)
+ Lbl, %list label
+ Lbody, %list item body
+ Table,
+ TR, %table row
+ TH, %table header cell
+ TD, %table data cell
+ THead, %table header (n rows)
+ TBody, %table rows
+ TFoot, %table footer
+ Span, %generic inline marker
+ Quote, %inline quote
+ Note, % footnote, endnote. Lbl can be child
+ Reference, % A citation to content elsewhere in the document.
+ BibEntry, %bibentry
+ Code, %
+ Link, %
+ Annot,
+ Figure,
+ Formula,
+ Form,
+ Artifact
+ }
+
+% get tag name from number: \seq_item:Nn \g__uftag_role_tags_seq { n }
+%\seq_gset_from_clist:NN \g__uftag_role_tags_seq \c__uftag_role_tags_clist
+
+\clist_map_inline:Nn \c__uftag_role_sttags_clist
+ {
+ \__uftag_seq_gput_right:Nn \g__uftag_role_tags_seq { #1 }
+ }
+
+
+% get tag number from name: \prop_item:Nn \g__uftag_role_tags_prop { name }
+\int_step_inline:nnnn { 1 }{ 1 }{ \seq_count:N \g__uftag_role_tags_seq }
+ {
+ \__uftag_prop_gput:Nxn \g__uftag_role_tags_prop
+ {
+ \seq_item:Nn \g__uftag_role_tags_seq { #1 }
+ }
+ { #1 }
+ }
+
+\cs_new:Nn \__uftag_role_get_tag_from_index:nn
+ {
+ \__uftag_seq_item:cn { #1_seq } { #2 }
+ }
+
+\cs_new:Nn \__uftag_role_get_index_from_tag:nn
+ {
+ \__uftag_prop_item:cn { #1_prop } { #2 }
+ }
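+
+% Sketch: with the standard list above (Document has index 1), both
+% drivers should give
+%   \__uftag_role_get_tag_from_index:nn { g__uftag_role_tags } { 3 }  % Art
+%   \__uftag_role_get_index_from_tag:nn { g__uftag_role_tags } { P }  % 19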
+
+% new tags and the rolemap
+
+\__uftag_prop_new:N \g__uftag_role_rolemap_prop
+
+\cs_new:Nn \__uftag_role_add_tag:nn %new name, reference to old
+ {
+ \__uftag_seq_gput_right:Nn \g__uftag_role_tags_seq { #1 }
+ \__uftag_prop_gput:Nnx \g__uftag_role_tags_prop { #1 }
+ {
+ \seq_count:N \g__uftag_role_tags_seq
+ }
+ \__uftag_check_add_tag_role:nn {#1}{#2}
+ \tl_if_empty:nF { #2 }
+ {
+ \__uftag_prop_gput:Nnn \g__uftag_role_rolemap_prop
+ { #1 } { #2 }
+ }
+ }
+
+\cs_generate_variant:Nn \__uftag_role_add_tag:nn {xx}
+
+\keys_define:nn { tagpdf-setup }
+ {
+ add-new-tag .code:n =
+ {
+ \seq_set_split:Nnn \l_tmpa_seq { / } {#1/}
+ \__uftag_role_add_tag:xx
+ {
+ \seq_item:Nn \l_tmpa_seq {1}
+ }
+ {
+ \seq_item:Nn \l_tmpa_seq {2}
+ }
+ }
+ }
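+
+% Sketch (Chapter is a hypothetical tag name): the value is split at "/"
+% into a new tag name and an optional role, e.g.
+%   \keys_set:nn { tagpdf-setup } { add-new-tag = Chapter/Sect }
+% appends Chapter to \g__uftag_role_tags_seq and records Chapter -> Sect in
+% \g__uftag_role_rolemap_prop, while add-new-tag = Chapter only adds the
+% tag, because the empty second item skips the RoleMap entry.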
+
+
+\endinput
Property changes on: trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-roles-code.sty
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-struct-code.sty
===================================================================
--- trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-struct-code.sty (rev 0)
+++ trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-struct-code.sty 2018-07-05 21:45:46 UTC (rev 48146)
@@ -0,0 +1,398 @@
+\ProvidesExplPackage {tagpdf-struct-code} {2018/07/04} {0.1}
+ {part of tagpdf - code related to storing structure}
+
+% I will use a latex counter for the structure count
+% to have a chance to avoid double structures in align etc
+
+\newcounter { g__uftag_struct_abs_int }
+\int_gzero:N \c@g__uftag_struct_abs_int
+
+
+\zref@newprop {tagstruct} [0] { \int_use:N \c@g__uftag_struct_abs_int }
+\zref@newlist {tagpdfstruct}
+\zref@addprop {tagpdfstruct}{tagstruct}
+
+% a sequence stores structnum -> the obj numbers
+% to allow easy mapping over the structures
+
+\__uftag_seq_new:N \g__uftag_struct_objR_seq
+
+% a sequence for the structure stack. When a structure is opened, its number is pushed onto the stack.
+\seq_new:N \g__uftag_struct_stack_seq
+\seq_gpush:Nn \g__uftag_struct_stack_seq {0}
+
+%these variables hold the top entry and the parent on the stack
+\tl_new:N \l__uftag_struct_stack_parent_tmp_tl
+\tl_new:N \g__uftag_struct_stack_current_tl
+
+% I need at least one structure: the StructTreeRoot
+% normally it should have only one kid, e.g. the document element.
+
+% The data of the StructTreeRoot and the StructElem are in properties:
+% \g__uftag_struct_0_prop for the root
+% \g__uftag_struct_N_prop, N >=1
+% they have all the keys
+% objnum - number,
+% Type - StructTreeRoot or StructElem
+% num - number (identical to the num in the name, or 0 for the root)
+% and the keys from the two following lists
+% (the root has a special set of properties).
+% the values of the prop should already be escaped properly
+% when the entries are created (title, lang, alt, E, actualtext)
+
+
+\seq_new:N \c__uftag_struct_StructTreeRoot_entries_seq
+\seq_set_from_clist:Nn \c__uftag_struct_StructTreeRoot_entries_seq
+ {%p. 857/858
+ Type, % always /StructTreeRoot
+ K, % kid, dictionary or array of dictionaries
+ IDTree, % currently unused
+ ParentTree, % required,obj ref to the parent tree
+ ParentTreeNextKey, %optional
+ RoleMap,
+ ClassMap
+ }
+
+\seq_new:N \c__uftag_struct_StructElem_entries_seq
+\seq_set_from_clist:Nn \c__uftag_struct_StructElem_entries_seq
+ {%p 858 f
+ Type, %always /StructElem
+ S, %tag/type
+ P, %parent
+ ID, %optional
+ Pg, %obj num of starting page, optional
+ K, %kids
+ A, %attributes, probably unused
+ C, %class ""
+ %R,
+ T, %title, value in () or <>
+ Lang, %language
+ Alt, % value in () or <>
+ E, %abbreviation
+ ActualText
+ }
+
+% I need an output handler for each prop, to get expandable output
+% see https://tex.stackexchange.com/questions/424208/expandable-version-of-a-expl3-command/424213#424213
+
+\cs_new:Nn \__uftag_struct_output_prop_aux:nn %#1 num, #2 key
+ {
+ \prop_if_in:cnT
+ { g__uftag_struct_#1_prop }
+ { #2 }
+ {
+ \c_space_tl/#2~ \prop_item:cn{ g__uftag_struct_#1_prop } { #2 }
+ }
+ }
+
+\cs_new:Nn \__uftag_new_output_prop_handler:n
+ {
+ \cs_new:cn { __uftag_struct_output_prop_#1:n }
+ {
+ \__uftag_struct_output_prop_aux:nn {#1}{##1}
+ }
+ }
+
+
+% the first one, the StructTreeRoot is special, so
+% created manually:
+\__uftag_prop_new:c { g__uftag_struct_0_prop }
+\__uftag_new_output_prop_handler:n {0}
+\tl_gset:Nn \g__uftag_struct_stack_current_tl {0}
+
+\__uftag_seq_new:c { g__uftag_struct_kids_0_seq}
+
+\__uftag_prop_gput:cno
+ { g__uftag_struct_0_prop }
+ { objnum}
+ { \c_uftag_tree_obj_structtreeroot_tl }
+
+\__uftag_prop_gput:cno
+ { g__uftag_struct_0_prop }
+ { Type }
+ { /StructTreeRoot }
+
+% the constants are defined in the tree code.
+\__uftag_prop_gput:cnx
+ { g__uftag_struct_0_prop }
+ { ParentTree }
+ { \c__uftag_tree_obj_parenttree_tl\c_space_tl 0\c_space_tl R }
+
+\__uftag_prop_gput:cnx
+ { g__uftag_struct_0_prop }
+ { RoleMap }
+ { \c__uftag_tree_obj_rolemap_tl\c_space_tl 0\c_space_tl R }
+
+\__uftag_prop_gput:cno
+ { g__uftag_struct_0_prop }
+ { entries }
+ { StructTreeRoot }
+
+\__uftag_prop_gput:cno
+ { g__uftag_struct_0_prop }
+ { num }
+ { 0 }
+
+% commands to store the kids
+% I don't compare the page object numbers yet but always add the /Pg key; perhaps later
+
+\cs_new:Nn \__uftag_struct_kid_mc_gput_right:nn %#1 structure num, #2 MCID absnum%
+ {
+ \__uftag_store_pdfpageref:Nn \l_tmpa_tl { \zref@extractdefault{mcid-#2}{tagabspage}{1} }
+ \__uftag_seq_gput_right:cx
+ { g__uftag_struct_kids_#1_seq }
+ { <<
+ /Type\c_space_tl/MCR\c_space_tl
+ /Pg\c_space_tl\l_tmpa_tl\c_space_tl0\c_space_tl R\c_space_tl
+ /MCID\c_space_tl\zref@extractdefault{mcid-#2}{tagmcid}{1}
+ >>
+ }
+ }
+
+\cs_new:Nn\__uftag_struct_kid_struct_gput_right:nn %#1 num of parent struct, #2 kid struct
+ {
+ \__uftag_seq_gput_right:cx
+ { g__uftag_struct_kids_#1_seq }
+ {
+ \prop_item:cn
+ { g__uftag_struct_#2_prop }
+ { objnum }
+ \c_space_tl 0 \c_space_tl R
+ }
+ }
+
+\cs_new:Nn \__uftag_struct_fill_kid_key:n %#1 is the struct num
+ {
+ \int_case:nnF
+ {
+ \seq_count:c
+ {
+ g__uftag_struct_kids_\prop_item:cn{ g__uftag_struct_#1_prop }{num}_seq
+ }
+ }
+ {
+ { 0 }
+ { } %no kids, do nothing
+ { 1 } % 1 kid, insert
+ {
+ \__uftag_prop_gput:cnx { g__uftag_struct_#1_prop } {K}
+ {
+ \seq_item:cn
+ {
+ g__uftag_struct_kids_\prop_item:cn{ g__uftag_struct_#1_prop }{num}_seq
+ }{1}
+ }
+ } %
+ }
+ { %many kids, use an array
+ \__uftag_prop_gput:cnx { g__uftag_struct_#1_prop } {K}
+ {
+ [
+ \seq_use:cn
+ {
+ g__uftag_struct_kids_\prop_item:cn{ g__uftag_struct_#1_prop }{num}_seq
+ }
+ {\c_space_tl}
+ ]
+ }
+ }
+ }
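+% Illustration (object numbers are made up): if the kids sequence of a
+% structure holds the single entry "7 0 R", the code above sets the K
+% property to 7 0 R; with the two entries "7 0 R" and "8 0 R" it sets it
+% to [7 0 R 8 0 R]; with no entries no K property is set at all.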
+
+% this command can be used for roots and structure elements
+% #1 is a num
+
+\tl_new:N \l_uftag_struct_dict_content_tl
+
+\cs_new:Nn \__uftag_struct_get_dict_content:n
+ {
+ \tl_set:Nn \l_uftag_struct_dict_content_tl {<<}
+ \seq_map_inline:cn
+ {
+ c__uftag_struct_\prop_item:cn{ g__uftag_struct_#1_prop }{entries}_entries_seq
+ }
+ {
+ \tl_put_right:Nx
+ \l_uftag_struct_dict_content_tl
+ {
+ \prop_if_in:cnT
+ { g__uftag_struct_#1_prop }
+ { ##1 }
+ {
+ \c_space_tl/##1~\prop_item:cn{ g__uftag_struct_#1_prop } { ##1 }
+ }
+ }
+ }
+ \tl_put_right:Nn \l_uftag_struct_dict_content_tl { >> }
+ }
+
+
+% #1 is the struct num
+\cs_new:Nn \__uftag_struct_write_obj:n
+ {
+ \prop_if_in:cnTF
+ { g__uftag_struct_#1_prop }
+ { objnum }
+ {
+ \__uftag_struct_fill_kid_key:n { #1 }
+ %\prop_show:c { g__uftag_struct_#1_prop }
+ \__uftag_struct_get_dict_content:n { #1 }
+ \__uftag_pdfuseobjnum:xx
+ { \prop_item:cn { g__uftag_struct_#1_prop } {objnum} }
+ {
+ \l_uftag_struct_dict_content_tl
+ }
+ }
+ {
+ \msg_error:nnn { tagpdf } { struct-no-objnum } { #1}
+ }
+ }
+
+% keys for the user commands
+% why did I call the submodule elem instead of struct?
+\keys_define:nn { tagpdf / elem }
+ {
+ label .tl_set:N = \l__uftag_struct_key_label_tl,
+ stash .bool_set:N = \l__uftag_struct_elem_stash_bool,
+ tag .code:n = % S property
+ {
+ \__uftag_pdf_escape_name:Nn \l__uftag_tmpa_tl { #1 }
+ \__uftag_prop_gput:cnx
+ { g__uftag_struct_\int_eval:n {\c@g__uftag_struct_abs_int}_prop }
+ { S }
+ { /\exp_not:V\l__uftag_tmpa_tl }
+ },
+ title .code:n = % T property
+ {
+ \__uftag_pdf_escape_string:Nn \l__uftag_tmpa_tl { #1 }
+ \tl_put_left:Nn \l__uftag_tmpa_tl {(^^fe^^ff}
+ \__uftag_prop_gput:cno
+ { g__uftag_struct_\int_eval:n {\c@g__uftag_struct_abs_int}_prop }
+ { T }
+ { \l__uftag_tmpa_tl) }
+ },
+ alttext .code:n = % Alt property
+ {
+ \__uftag_pdf_escape_string:Nn \l__uftag_tmpa_tl { #1 }
+ \tl_put_left:Nn \l__uftag_tmpa_tl {(^^fe^^ff}
+ \__uftag_prop_gput:cno
+ { g__uftag_struct_\int_eval:n {\c@g__uftag_struct_abs_int}_prop }
+ { Alt }
+ { \l__uftag_tmpa_tl) }
+ },
+ actualtext .code:n = % ActualText property
+ {
+ \__uftag_pdf_escape_hex:Nn \l__uftag_tmpa_tl {#1 }
+ \tl_put_left:Nn \l__uftag_tmpa_tl {<FEFF }
+ \__uftag_prop_gput:cno
+ { g__uftag_struct_\int_eval:n {\c@g__uftag_struct_abs_int}_prop }
+ { ActualText }
+ { \l__uftag_tmpa_tl>}
+ },
+}
+
+
+\cs_new:Nn \uftag_struct_begin:n
+ {
+ \group_begin:
+ \int_gincr:N \c@g__uftag_struct_abs_int
+ \__uftag_prop_new:c { g__uftag_struct_\int_eval:n { \c@g__uftag_struct_abs_int }_prop }
+ \__uftag_new_output_prop_handler:n {\int_eval:n { \c@g__uftag_struct_abs_int }}
+ \__uftag_seq_new:c { g__uftag_struct_kids_\int_eval:n { \c@g__uftag_struct_abs_int }_seq}
+ \__uftag_pdfreserveobjnum:N \l_tmpa_tl
+ \__uftag_prop_gput:cno
+ { g__uftag_struct_\int_eval:n { \c@g__uftag_struct_abs_int }_prop }
+ { objnum}
+ { \l_tmpa_tl }
+ \__uftag_prop_gput:cnx
+ { g__uftag_struct_\int_eval:n { \c@g__uftag_struct_abs_int }_prop }
+ { num}
+ { \int_eval:n { \c@g__uftag_struct_abs_int } }
+ \__uftag_prop_gput:cno
+ { g__uftag_struct_\int_eval:n { \c@g__uftag_struct_abs_int }_prop }
+ { Type }
+ { /StructElem }
+ \__uftag_prop_gput:cno
+ { g__uftag_struct_\int_eval:n { \c@g__uftag_struct_abs_int }_prop }
+ { entries }
+ { StructElem }
+ \keys_set:nn {tagpdf / elem} { #1 }
+ \__uftag_check_structure_has_tag:n { \int_eval:n {\c@g__uftag_struct_abs_int} }
+ \tl_if_empty:NF
+ {\l__uftag_struct_key_label_tl}
+ {
+ \zref@labelbylist{tagpdfstruct-\l__uftag_struct_key_label_tl}{tagpdfstruct}
+ }
+ %get the potential parent from the stack:
+ \seq_get:NNF
+ \g__uftag_struct_stack_seq
+ \l__uftag_struct_stack_parent_tmp_tl
+ {
+ \msg_error:nn { tagpdf } { struct-faulty-nesting }
+ }
+ \seq_gpush:NV \g__uftag_struct_stack_seq \c@g__uftag_struct_abs_int
+ \tl_gset:NV \g__uftag_struct_stack_current_tl \c@g__uftag_struct_abs_int
+ %\seq_show:N \g__uftag_struct_stack_seq
+ \bool_if:NF
+ \l__uftag_struct_elem_stash_bool
+ {%set the parent
+ \__uftag_prop_gput:cnx
+ { g__uftag_struct_\int_eval:n {\c@g__uftag_struct_abs_int}_prop }
+ { P }
+ {
+ \prop_item:cn { g__uftag_struct_\l__uftag_struct_stack_parent_tmp_tl _prop} { objnum }~0~R
+ }
+ %record this structure as kid:
+ %\tl_show:N \g__uftag_struct_stack_current_tl
+ %\tl_show:N \l__uftag_struct_stack_parent_tmp_tl
+ \__uftag_struct_kid_struct_gput_right:nn
+ { \l__uftag_struct_stack_parent_tmp_tl }
+ { \g__uftag_struct_stack_current_tl }
+ %\prop_show:c { g__uftag_struct_\g__uftag_struct_stack_current_tl _prop }
+ %\seq_show:c {g__uftag_struct_kids_\l__uftag_struct_stack_parent_tmp_tl _seq}
+ }
+ %\prop_show:c { g__uftag_struct_\g__uftag_struct_stack_current_tl _prop }
+ %\seq_show:c {g__uftag_struct_kids_\l__uftag_struct_stack_parent_tmp_tl _seq}
+ \group_end:
+ }
+
+\cs_new:Nn \uftag_struct_end:
+ {%take the current structure num from the stack:
+ %the objects are written later, lua mode doesn't have all the needed info yet
+ %\seq_show:N \g__uftag_struct_stack_seq
+ \seq_gpop:NNTF \g__uftag_struct_stack_seq \l_tmpa_tl
+ {
+ \__uftag_check_info_closing_struct:o { \g__uftag_struct_stack_current_tl }
+ }
+ { \__uftag_check_no_open_struck: }
+ % get the previous one, shouldn't be empty as the root should be there
+ \seq_get:NNTF \g__uftag_struct_stack_seq \l_tmpa_tl
+ {
+ \tl_gset:NV \g__uftag_struct_stack_current_tl \l_tmpa_tl
+ }
+ {
+ \__uftag_check_no_open_struck:
+ }
+ }
+
+\cs_new:Nn \uftag_struct_use:n %#1 is the label
+ {
+ \prop_if_exist:cTF
+ { g__uftag_struct_\zref@extractdefault{tagpdfstruct-#1}{tagstruct}{unknown}_prop }
+ {
+ \__uftag_check_struct_used:n {#1}
+ %add the label structure as kid to the current structure (can be the root)
+ \__uftag_struct_kid_struct_gput_right:nn
+ { \g__uftag_struct_stack_current_tl }
+ { \zref@extractdefault{tagpdfstruct-#1}{tagstruct}{0} }
+ %add the current structure to the labeled one as parent
+ \__uftag_prop_gput:cnx
+ { g__uftag_struct_\zref@extractdefault{tagpdfstruct-#1}{tagstruct}{0}_prop }
+ { P }
+ {
+ \prop_item:cn { g__uftag_struct_\g__uftag_struct_stack_current_tl _prop} { objnum }~0~R
+ }
+ }
+ {\msg_warning:nnn{tagpdf}{struct-label-unknown}{#1}}
+ }
+
+\endinput
Property changes on: trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-struct-code.sty
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-tree-code.sty
===================================================================
--- trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-tree-code.sty (rev 0)
+++ trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-tree-code.sty 2018-07-05 21:45:46 UTC (rev 48146)
@@ -0,0 +1,147 @@
+\ProvidesExplPackage {tagpdf-tree-code} {2018/07/04} {0.1}
+ {part of tagpdf - code related to writing trees and dictionaries to the pdf}
+
+%this does the actual finishing:
+\AtBeginDocument
+{
+ \bool_if:NT \g_uftag_active_tree_bool
+ {
+ \AfterEndDocument { \uftag_finish_structure: }
+ }
+}
+
+
+% the StructTreeRoot
+% we better get the object number in any case:
+\__uftag_pdfreserveobjnum:N \l_tmpa_tl
+\tl_const:Nx \c_uftag_tree_obj_structtreeroot_tl { \l_tmpa_tl }
+
+
+\cs_new:Nn \__uftag_tree_write_structtreeroot:
+{
+ \__uftag_pdfcatalog:n { /StructTreeRoot~\c_uftag_tree_obj_structtreeroot_tl\c_space_tl0~R }
+ \__uftag_struct_write_obj:n { 0 }
+}
+
+\cs_new:Nn \__uftag_tree_write_structelements:
+ {
+ \int_step_inline:nnnn {1}{1}{\c@g__uftag_struct_abs_int}
+ {
+ %\prop_show:c {g__uftag_struct_##1_prop}
+ \__uftag_struct_write_obj:n { ##1 } %write the object
+ }
+ }
+
+
+%the ParentTree
+\__uftag_pdfreserveobjnum:N \l_tmpa_tl
+\tl_const:Nx \c__uftag_tree_obj_parenttree_tl { \l_tmpa_tl }
+\tl_new:N \l__uftag_parenttree_content_tl
+
+\cs_new:Nn \__uftag_tree_fill_parenttree:
+{
+ \int_step_inline:nnnn{1}{1}{\zref@extractdefault{LastPage}{abspage}{-1}} %not quite clear if labels are needed. See lua code
+ { %page ##1\par
+ \prop_clear:N \l_tmpa_prop
+ \int_step_inline:nnnn{1}{1}{\zref@extractdefault{LastPage}{tagmcabs}{-1}}
+ {
+ %mcid####1
+ \int_compare:nT%F
+ {\zref@extractdefault{mcid-####1}{tagabspage}{-1}=##1} %mcid is on current page
+ {% yes\par
+ \prop_put:Nxx \l_tmpa_prop
+ {\zref@extractdefault{mcid-####1}{tagmcid}{-1}}
+ {\prop_item:Nn \g__uftag_mc_parenttree_prop {####1}}
+ }
+ }
+ %\prop_show:N \l_tmpa_prop
+ \tl_put_right:Nx\l__uftag_parenttree_content_tl
+ {\int_eval:n {##1-1}\c_space_tl[\c_space_tl} %%bracket
+ \int_step_inline:nnnn {0}{1}{ \prop_map_function:NN \l_tmpa_prop\__uftag_prop_count:nn -1 }
+ {
+ \prop_get:NnNTF \l_tmpa_prop {####1} \l_tmpb_tl
+ {% page#1:mcid##1:\l_tmpb_tl :content
+ \tl_put_right:Nx \l__uftag_parenttree_content_tl
+ {
+ \prop_item:cn { g__uftag_struct_\l_tmpb_tl _prop } {objnum}~0~R~
+ }
+ }
+ {\msg_warning:nn {tagpdf} {tree-mcid-index-wrong} }
+ }
+ \tl_put_right:Nn\l__uftag_parenttree_content_tl{]^^J} %
+ }
+}
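+% Illustration (numbers are made up): if on page 1 the mcids 0 and 1 belong
+% to structures whose objnum is 12 and 13, the loops above append the line
+% 0 [ 12 0 R 13 0 R ]
+% to \l__uftag_parenttree_content_tl (the page key starts at 0).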
+
+%lua mode must/can do it differently
+\cs_new:Nn \__uftag_tree_lua_fill_parenttree:
+ {
+ \tl_set:Nn\l__uftag_parenttree_content_tl
+ {
+ \directlua{uftag.func.output_parenttree(\int_use:N\g__uftag_abspage_int)}
+ }
+ }
+
+
+
+\cs_new:Nn \__uftag_tree_write_parenttree:
+ {
+ \bool_if:NTF \g__uftag_mode_lua_bool
+ {
+ \__uftag_tree_lua_fill_parenttree:
+ }
+ {
+ \__uftag_tree_fill_parenttree:
+ }
+ \__uftag_pdfuseobjnum:xx { \c__uftag_tree_obj_parenttree_tl }
+ {
+ <<\c_space_tl/Nums\c_space_tl [\l__uftag_parenttree_content_tl] \c_space_tl >>
+ }
+ }
+
+%the Rolemap tree
+\__uftag_pdfreserveobjnum:N \l_tmpa_tl
+\tl_const:Nx \c__uftag_tree_obj_rolemap_tl { \l_tmpa_tl }
+\tl_new:N \l__uftag_rolemap_content_tl
+
+\cs_new:Nn \__uftag_tree_fill_rolemap:
+ {
+ \prop_map_inline:Nn \g__uftag_role_rolemap_prop
+ {
+ \tl_put_right:Nx \l__uftag_rolemap_content_tl
+ {
+ /##1\c_space_tl/##2^^J
+ }
+ }
+ }
+
+\cs_new:Nn \__uftag_tree_write_rolemap:
+ {
+ \__uftag_tree_fill_rolemap:
+ \__uftag_pdfuseobjnum:xx { \c__uftag_tree_obj_rolemap_tl }
+ {
+ <<\l__uftag_rolemap_content_tl >>
+ }
+ }
+
+\cs_new:Nn \uftag_finish_structure:
+ {
+ \__uftag_pdfcatalog:n {^^J/MarkInfo\c_space_tl<</Marked\c_space_tl true>> }
+ \__uftag_tree_write_parenttree:
+ \__uftag_tree_write_rolemap:
+ \__uftag_tree_write_structelements:
+ \__uftag_tree_write_structtreeroot:
+ }
+
+
+%StructParents + tabs order. Tabs order should probably be changeable per page.
+
+\cs_new:Nn \__uftag_tree_write_pageattr:
+{
+ \__uftag_get_pdfpageattr:N \l_tmpa_tl
+ \regex_replace_once:nnN {/StructParents\s*\d+} {}\l_tmpa_tl
+ \regex_replace_once:nnN {/Tabs\s*/[SCR]} {}\l_tmpa_tl
+ \__uftag_gset_pdfpageattr:x
+ {\l_tmpa_tl/StructParents\c_space_tl \int_eval:n{\g__uftag_abspage_int }\l__uftag_tree_tabs_order_tl}
+}
+
+\endinput
Property changes on: trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-tree-code.sty
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-user.sty
===================================================================
--- trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-user.sty (rev 0)
+++ trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-user.sty 2018-07-05 21:45:46 UTC (rev 48146)
@@ -0,0 +1,88 @@
+\ProvidesExplFile {tagpdf-user.sty} {2018/07/04} {0.1}
+ {tagpdf - user commands}
+
+\NewDocumentCommand \tagpdfsetup { m }
+ {
+ \keys_set:nn { tagpdf-setup } { #1 }
+ }
+
+\cs_set_eq:NN\tagpdfifluatexTF \sys_if_engine_luatex:TF
+\cs_set_eq:NN\tagpdfifluatexT \sys_if_engine_luatex:T
+\cs_set_eq:NN\tagpdfifpdftexT \sys_if_engine_pdftex:T
+
+%%%% mc related user commands
+\NewDocumentCommand \tagmcifinTF { m m }
+ {
+ \_uftag_mc_if_in:TF { #1 } { #2 }
+ }
+
+\NewDocumentCommand \tagmcbegin { m }
+ {
+ \uftag_mc_begin:n {#1}\ignorespaces
+ }
+
+
+\NewDocumentCommand \tagmcend { }
+ {
+ \unskip\uftag_mc_end:
+ }
+
+\NewDocumentCommand \tagmcuse { m }
+ {
+ \uftag_mc_use:n {#1}
+ }
+
+
+%%%% structure related commands
+
+\NewDocumentCommand \tagstructbegin { m }
+ {
+ \uftag_struct_begin:n {#1}
+ }
+
+\NewDocumentCommand \tagstructend { }
+ {
+ \uftag_struct_end:
+ }
+
+\NewDocumentCommand \tagstructuse { m }
+ {
+ \uftag_struct_use:n {#1}
+ }
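+% Usage sketch (illustrative; the tag and label names are arbitrary, and as
+% the label is resolved through zref a second compilation is probably needed):
+%   \tagstructbegin{tag=Note,stash=true,label=mynote}
+%   ... content ...
+%   \tagstructend
+%   ...
+%   \tagstructuse{mynote} % attach the stashed structure to the current one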
+
+
+
+%%%% debug/show commands
+\NewDocumentCommand\showtagpdfmcdata { O {\__uftag_get_mc_abs_cnt:} }
+{
+ \bool_if:NT \g__uftag_mode_lua_bool
+ {
+ \sys_if_engine_luatex:T
+ {
+ \directlua{uftag.trace.show_all_mc_data(#1)}
+ }
+ }
+}
+
+\NewDocumentCommand\showtagpdfattributes { }
+{
+ \bool_if:NT \g__uftag_mode_lua_bool
+ {
+ \sys_if_engine_luatex:T
+ {
+ \directlua
+ {
+ uftag.trace.log(
+ "showtagpdfattributes: MC=>abscnt=\__uftag_get_mc_abs_cnt:=>attr=\the\g__uftag_mc_cnt_attr=>tag=" ..
+ tostring(uftag.func.get_tag_from (\the\g__uftag_mc_type_attr)) ..
+ "=\the\g__uftag_mc_type_attr",0
+ )
+ }
+ \ignorespaces
+ }
+ }
+}
+
+
+
+\endinput
Property changes on: trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf-user.sty
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf.lua
===================================================================
--- trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf.lua (rev 0)
+++ trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf.lua 2018-07-05 21:45:46 UTC (rev 48146)
@@ -0,0 +1,478 @@
+-- Packageversion: 0.1
+-- Packagedate: 2018/07/04
+-- tagpdf.lua
+-- Ulrike Fischer
+
+--[[
+The code quite probably has a number of problems
+ - more variables should be local instead of global
+ - the naming is not always consistent due to the development of the code
+ - the traversing of the shipout box must be tested with more complicated setups
+ - it should probably handle more node types
+ -
+--]]
+
+
+
+--[[
+the main table is named uftag. It contains the functions and also the data
+collected during the compilation.
+
+uftag.mc will contain mc connected data.
+uftag.struct will contain structure related data.
+uftag.page will contain page data
+uftag.tables also contains data from mc and struct (from older code). This needs cleaning up.
+ There are certainly duplicates, but I don't dare remove them yet ...
+uftag.func will contain (public) functions.
+uftag.trace will contain tracing/logging functions.
+local functions start with __
+
+functions
+ uftag.func.get_num_from (tag): takes a tag (string) and returns the id number
+ uftag.func.output_num_from (tag): takes a tag (string) and prints (to tex) the id number
+ uftag.func.get_tag_from (num): takes a num and returns the tag
+ uftag.func.output_tag_from (num): takes a num and prints (to tex) the tag
+ uftag.func.store_mc_data (num,key,data): stores key=data in uftag.mc[num]
+ uftag.func.store_mc_label (label,num): stores label=num in uftag.mc.labels
+ uftag.func.store_mc_kid (mcnum,kid,page): stores the mc-kids of mcnum on page page
+ uftag.func.store_mc_in_page(mcnum,mcpagecnt,page): stores in the page table the number of mcnum on this page
+ uftag.func.store_struct_mcabs (structnum,mcnum): stores relations structnum<->mcnum (abs)
+ uftag.func.mc_insert_kids (mcnum): inserts the /K entries for mcnum by wandering through the [kids] table
+ uftag.func.mark_page_elements(box,mcpagecnt,mccntprev,mcopen,name,mctypeprev) : the main function
+ uftag.func.mark_shipout (): a wrapper around the core function which inserts the last EMC
+ uftag.func.fill_parent_tree_line (page): returns the entry of the parenttree for this page
+ uftag.func.output_parenttree (abspage): outputs the content of the parenttree
+ uftag.trace.show_mc_data (num): shows uftag.mc[num]
+ uftag.trace.show_all_mc_data (max): shows the data of all mc's from 1 to max
+ uftag.trace.show_seq: shows a sequence (array)
+ uftag.trace.show_struct_data (num): shows data of structure num
+ uftag.trace.show_prop: shows a prop
+ uftag.trace.log
+--]]
+
+local mctypeattributeid = luatexbase.registernumber ("g__uftag_mc_type_attr")
+local mccntattributeid = luatexbase.registernumber ("g__uftag_mc_cnt_attr")
+
+local catlatex = luatexbase.registernumber("catcodetable@latex")
+local tagunmarkedbool= token.create("g__uftag_tagunmarked_bool")
+local truebool = token.create("c_true_bool")
+
+local tableinsert = table.insert
+
+-- not all needed, copied from lua-visual-debug.
+local nodeid = node.id
+local nodecopy = node.copy
+local nodegetattribute = node.get_attribute
+local nodenew = node.new
+local nodetail = node.tail
+local nodeslide = node.slide
+local noderemove = node.remove
+local nodetraverseid = node.traverse_id
+local nodeinsertafter = node.insert_after
+local nodeinsertbefore = node.insert_before
+local pdfpageref = pdf.pageref
+
+local HLIST = node.id("hlist")
+local VLIST = node.id("vlist")
+local RULE = node.id("rule")
+local DISC = node.id("disc")
+local GLUE = node.id("glue")
+local KERN = node.id("kern")
+local PENALTY = node.id("penalty")
+local LOCAL_PAR = node.id("local_par")
+local MATH = node.id("math")
+
+local function __uftag_get_mathsubtype (mathnode)
+ local subtype
+ if mathnode.subtype == 0 then
+ subtype = "beginmath"
+ else
+ subtype = "endmath"
+ end
+ return subtype
+end
+
+
+
+uftag = uftag or { }
+uftag.mc = uftag.mc or { } -- mc data
+uftag.struct = uftag.struct or { } -- struct data
+uftag.tables = uftag.tables or { } -- tables created with new prop and new seq.
+ -- wasn't such a great idea ...
+uftag.page = uftag.page or { } -- page data, currently only i->{0->mcnum,1->mcnum,...}
+uftag.trace = uftag.trace or { } -- show commands
+uftag.func = uftag.func or { } -- functions
+uftag.conf = uftag.conf or { } -- configuration variables
+
+local __uftag_log =
+ function (message,loglevel)
+ if (loglevel or 3) <= tex.count["l__uftag_loglevel_int"] then
+ texio.write_nl("tagpdf: ".. message)
+ end
+ end
+
+uftag.trace.log = __uftag_log
+
+
+local __uftag_get_mc_cnt_type_tag = function (n)
+ local mccnt = nodegetattribute(n,mccntattributeid) or -1
+ local mctype = nodegetattribute(n,mctypeattributeid) or -1
+ local tag = uftag.func.get_tag_from(mctype)
+ return mccnt,mctype,tag
+end
+
+
+local function __uftag_insert_emc_node (head,current)
+ local emcnode = nodenew("whatsit","pdf_literal")
+ emcnode.data = "EMC"
+ emcnode.mode=1
+ head = node.insert_before(head,current,emcnode)
+ return head
+end
+
+
+local function __uftag_insert_bmc_node (head,current,tag)
+ local bmcnode = nodenew("whatsit","pdf_literal")
+ bmcnode.data = "/"..tag.." BMC"
+ bmcnode.mode=1
+ head = node.insert_before(head,current,bmcnode)
+ return head
+end
+
+local function __uftag_insert_bdc_node (head,current,tag,dict)
+ local bdcnode = nodenew("whatsit","pdf_literal")
+ bdcnode.data = "/"..tag.."<<"..dict..">> BDC"
+ bdcnode.mode=1
+ head = node.insert_before(head,current,bdcnode)
+ return head
+end
+
+
+--[[
+ Now follows the core function
+ It wades through the shipout box and checks the attributes
+ ARGUMENTS
+ box: is a box,
+ mcpagecnt: num, the current page cnt of mc (should start at -1 in shipout box), needed for recursion
+ mccntprev: num, the attribute cnt of the previous node/whatever - if different we have a chunk border
+ mcopen: num, records if some bdc/emc is open
+ These arguments are only needed for log messages; if not present, they are replaced by fixed strings:
+ name: string to describe the box
+ mctypeprev: num, the type attribute of the previous node/whatever
+
+ there are lots of logging messages currently. Should be cleaned up in due course.
+ One should also find ways to make the function shorter.
+--]]
+
+function uftag.func.mark_page_elements (box,mcpagecnt,mccntprev,mcopen,name,mctypeprev)
+ local name = name or ("SOMEBOX")
+ local mctypeprev = mctypeprev or -1
+ local abspage = tex.count["g__uftag_abspage_int"] --["c@abspage"]
+ uftag.trace.log ("PAGE " .. abspage,3)
+ uftag.trace.log ("FUNC ARGS: pagecnt".. mcpagecnt.." prev "..mccntprev .. " type prev "..mctypeprev,4)
+ uftag.trace.log ("TRAVERSING BOX ".. tostring(name).." TYPE ".. node.type(node.getid(box)),3)
+ local head = box.head -- AtBeginShipoutBox is a vlist?
+ if head then
+ mccnthead, mctypehead,taghead = __uftag_get_mc_cnt_type_tag (head)
+ uftag.trace.log ("HEAD " .. node.type(node.getid(head)).. " MC"..tostring(mccnthead).." => TAG "..tostring(mctypehead).." => "..tostring(taghead),3)
+ else
+ uftag.trace.log ("HEAD is ".. tostring(head),3)
+ end
+ for n in node.traverse(head) do
+ local mccnt, mctype, tag = __uftag_get_mc_cnt_type_tag (n)
+ uftag.trace.log ("NODE ".. node.type(node.getid(n)).." MC"..tostring(mccnt).." => TAG "..tostring(mctype).." => " .. tostring(tag),3)
+ if n.id == HLIST
+ then -- enter the hlist
+ mcopen,mcpagecnt,mccntprev,mctypeprev=
+ uftag.func.mark_page_elements (n,mcpagecnt,mccntprev,mcopen,"INTERNAL HLIST",mctypeprev)
+ elseif n.id == VLIST then -- enter the vlist
+ mcopen,mcpagecnt,mccntprev,mctypeprev=
+ uftag.func.mark_page_elements (n,mcpagecnt,mccntprev,mcopen,"INTERNAL VLIST",mctypeprev)
+ elseif n.id == GLUE then -- glue is ignored
+ elseif n.id == LOCAL_PAR then -- local_par is ignored
+ elseif n.id == PENALTY then -- penalty is ignored
+ elseif n.id == KERN then -- kern is ignored
+ else
+ -- math is currently only logged.
+ -- we could mark the whole as math
+ -- for inner processing the mlist_to_hlist callback is probably needed.
+ if n.id == MATH then
+ uftag.trace.log("NODE "..node.type(node.getid(n)).." "..__uftag_get_mathsubtype(n),3)
+ end
+ -- endmath
+ uftag.trace.log("CURRENT "..mccnt.." PREV "..mccntprev,3)
+ if mccnt~=mccntprev then -- a new mc chunk
+ uftag.trace.log ("NODE ".. node.type(node.getid(n)).." MC"..tostring(mccnt).." <=> PREVIOUS "..tostring(mccntprev),3)
+ if mcopen~=0 then -- there is a chunk open, close it (hopefully there is only one ...)
+ box.list=__uftag_insert_emc_node (box.list,n)
+ mcopen = mcopen - 1
+ uftag.trace.log ("INSERT EMC " .. mcpagecnt .. " MCOPEN = " .. mcopen,2)
+ if mcopen ~=0 then
+ uftag.trace.log ("!WARNING! open mc" .. " MCOPEN = " .. mcopen,1)
+ end
+ end
+ if uftag.mc[mccnt] then
+ if uftag.mc[mccnt]["artifact"] then
+ uftag.trace.log("THIS IS AN ARTIFACT of type "..tostring(uftag.mc[mccnt]["artifact"]),3)
+ if uftag.mc[mccnt]["artifact"] == "notype" then
+ box.list = __uftag_insert_bmc_node (box.list,n,"Artifact")
+ else
+ box.list = __uftag_insert_bdc_node (box.list,n,"Artifact", "/Type /"..uftag.mc[mccnt]["artifact"])
+ end
+ else
+ uftag.trace.log("THIS IS A TAG "..tostring(tag),3)
+ mcpagecnt = mcpagecnt +1
+ uftag.trace.log ("INSERT BDC "..mcpagecnt,2)
+ box.list = __uftag_insert_bdc_node (box.list,n,tag, "/MCID "..mcpagecnt)
+ uftag.func.store_mc_kid (mccnt,mcpagecnt,abspage)
+ uftag.func.store_mc_in_page(mccnt,mcpagecnt,abspage)
+ uftag.trace.show_mc_data (mccnt)
+ end
+ mcopen = mcopen + 1
+ else
+ uftag.trace.log("THIS HAS NOT BEEN TAGGED",1)
+ -- perhaps code that tags an artifact can be added ...
+ if tagunmarkedbool.mode == truebool.mode then
+ box.list = __uftag_insert_bmc_node (box.list,n,"Artifact")
+ end
+ mcopen = mcopen + 1
+ end
+ mccntprev = mccnt
+ end
+ end -- end if
+ end -- end for
+ if head then
+ mccnthead, mctypehead,taghead = __uftag_get_mc_cnt_type_tag (head)
+ uftag.trace.log ("ENDHEAD " .. node.type(node.getid(head)).. " MC"..tostring(mccnthead).." => TAG "..tostring(mctypehead).." => "..tostring(taghead),3)
+ else
+ uftag.trace.log ("ENDHEAD is ".. tostring(head),3)
+ end
+ uftag.trace.log ("QUITTING TRAVERSING BOX ".. tostring(name).." TYPE ".. node.type(node.getid(box)),3)
+ return mcopen,mcpagecnt,mccntprev,mctypeprev
+end
+
+function uftag.func.mark_shipout ()
+ mcopen = uftag.func.mark_page_elements (tex.box["AtBeginShipoutBox"],-1,-100,0,"Shipout",-1)
+ if mcopen~=0 then -- there is a chunk open, close it (hopefully there is only one ...)
+ local emcnode = nodenew("whatsit","pdf_literal")
+ local box = tex.box["AtBeginShipoutBox"].list
+ emcnode.data = "EMC"
+ emcnode.mode=1
+ if box then
+ box = node.insert_after (box,node.tail(box),emcnode)
+ mcopen = mcopen - 1
+ uftag.trace.log ("INSERT LAST EMC, MCOPEN = " .. mcopen,2)
+ else
+ uftag.trace.log ("UPS ",1)
+ end
+ if mcopen ~=0 then
+ uftag.trace.log ("!WARNING! open mc" .. " MCOPEN = " .. mcopen,1)
+ end
+ end
+end
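+-- Illustration (assumed input): if the shipout box contains two text chunks
+-- tagged P, carrying the mc attribute values 3 and 4, with mc data stored for
+-- both and neither being an artifact, the page stream ends up roughly as
+--   /P<</MCID 0>> BDC ...content of mc 3... EMC
+--   /P<</MCID 1>> BDC ...content of mc 4... EMC
+-- where the final EMC is the one appended by mark_shipout; glue, kern and
+-- penalty nodes do not open new chunks. The page table then records
+--   uftag.page[abspage] = { [0] = 3, [1] = 4 }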
+
+function uftag.trace.show_seq (seq)
+ if (type(seq) == "table") then
+ for i,v in ipairs(seq) do
+ __uftag_log ("[" .. i .. "] => " .. tostring(v),1)
+ end
+ else
+ __uftag_log ("sequence " .. tostring(seq) .. " not found",1)
+ end
+end
+
+local __uftag_pairs_prop =
+ function (prop)
+ local a = {}
+ for n in pairs(prop) do tableinsert(a, n) end
+ table.sort(a)
+ local i = 0 -- iterator variable
+ local iter = function () -- iterator function
+ i = i + 1
+ if a[i] == nil then return nil
+ else return a[i], prop[a[i]]
+ end
+ end
+ return iter
+ end
+
+
+function uftag.trace.show_prop (prop)
+ if (type(prop) == "table") then
+ for i,v in __uftag_pairs_prop (prop) do
+ __uftag_log ("[" .. i .. "] => " .. tostring(v),1)
+ end
+ else
+ __uftag_log ("prop " .. tostring(prop) .. " not found or not a table",1)
+ end
+ end
+
+
+local __uftag_get_num_from =
+ function (tag)
+ local a
+ if uftag.tables["g__uftag_role_tags_prop"][tag] then
+ a= uftag.tables["g__uftag_role_tags_prop"][tag]
+ else
+ a= -1
+ end
+ return a
+ end
+
+uftag.func.get_num_from = __uftag_get_num_from
+
+function uftag.func.output_num_from (tag)
+ local num = __uftag_get_num_from (tag)
+ tex.sprint(catlatex,num)
+ if num == -1 then
+ __uftag_log ("Unknown tag "..tag.." used")
+ end
+end
+
+local __uftag_get_tag_from =
+ function (num)
+ local a
+ if uftag.tables["g__uftag_role_tags_seq"][num] then
+ a = uftag.tables["g__uftag_role_tags_seq"][num]
+ else
+ a= "UNKNOWN"
+ end
+ return a
+end
+
+uftag.func.get_tag_from = __uftag_get_tag_from
+
+function uftag.func.output_tag_from (num)
+ tex.sprint(catlatex,__uftag_get_tag_from (num))
+end
+
+function uftag.func.store_mc_data (num,key,data)
+ uftag.mc[num] = uftag.mc[num] or { }
+ uftag.mc[num][key] = data
+ __uftag_log ("storing mc"..num..": "..tostring(key).."=>"..tostring(data))
+end
+
+function uftag.trace.show_mc_data (num)
+ if uftag and uftag.mc and uftag.mc[num] then
+ for k,v in pairs(uftag.mc[num]) do
+ __uftag_log ("mc"..num..": "..tostring(k).."=>"..tostring(v),3)
+ end
+ if uftag.mc[num]["kids"] then
+ __uftag_log ("mc" .. num .. " has " .. #uftag.mc[num]["kids"] .. " kids",3)
+ for k,v in ipairs(uftag.mc[num]["kids"]) do
+ __uftag_log ("mc ".. num .. " kid "..k.." =>" .. v.kid.." on page " ..v.page,3)
+ end
+ end
+ else
+ __uftag_log ("mc"..num.." not found",3)
+ end
+end
+
+function uftag.trace.show_all_mc_data (max)
+ for i = 1, max do
+ uftag.trace.show_mc_data (i)
+ end
+end
+
+
+function uftag.func.store_mc_label (label,num)
+ uftag.mc["labels"] = uftag.mc["labels"] or { }
+ uftag.mc.labels[label] = num
+end
+
+function uftag.func.store_mc_kid (mcnum,kid,page)
+ uftag.trace.log("MC"..mcnum.." STORING KID" .. kid.." on page " .. page,3)
+ uftag.mc[mcnum]["kids"] = uftag.mc[mcnum]["kids"] or { }
+ local kidtable = {kid=kid,page=page}
+ tableinsert(uftag.mc[mcnum]["kids"], kidtable )
+end
+
+function uftag.func.mc_insert_kids (mcnum)
+ if uftag.mc[mcnum] then
+ uftag.trace.log("MC-KIDS test " .. mcnum,4)
+ if uftag.mc[mcnum]["kids"] then
+ for i,kidstable in ipairs( uftag.mc[mcnum]["kids"] ) do
+ local kidnum = kidstable["kid"]
+ local kidpage = kidstable["page"]
+ local kidpageobjnum = pdfpageref(kidpage)
+ uftag.trace.log("MC" .. mcnum .. " insert KID " ..i.. " with num " .. kidnum .. " on page " .. kidpage.."/"..kidpageobjnum,3)
+ tex.sprint(catlatex,"<</Type /MCR /Pg "..kidpageobjnum .. " 0 R /MCID "..kidnum.. ">> " )
+ end
+ else
+ uftag.trace.log("WARN! MC"..mcnum.." has no kids",0)
+ end
+ else
+ uftag.trace.log("WARN! MC"..mcnum.." doesn't exist",0)
+ end
+end
+
+
+function uftag.func.store_struct_mcabs (structnum,mcnum)
+ uftag.struct[structnum]=uftag.struct[structnum] or { }
+ uftag.struct[structnum]["mc"]=uftag.struct[structnum]["mc"] or { }
+ -- a structure can contain more than one mc chunk; the content should be ordered
+ tableinsert(uftag.struct[structnum]["mc"],mcnum)
+ -- but every mc can only be in one structure
+ uftag.mc[mcnum]= uftag.mc[mcnum] or { }
+ uftag.mc[mcnum]["parent"] = structnum
+end
+
+function uftag.trace.show_struct_data (num)
+ if uftag and uftag.struct and uftag.struct[num] then
+ for k,v in ipairs(uftag.struct[num]) do
+ __uftag_log ("struct "..num..": "..tostring(k).."=>"..tostring(v))
+ end
+ else
+ __uftag_log ("struct "..num.." not found ")
+ end
+end
+
+-- pay attention: lua counts arrays from 1 and tex counts pages from 1,
+-- while mcids and arrays in pdf count from 0.
+function uftag.func.store_mc_in_page (mcnum,mcpagecnt,page)
+ uftag.page[page] = uftag.page[page] or {}
+ uftag.page[page][mcpagecnt] = mcnum
+ uftag.trace.log("PAGE " .. page .. ": inserting MCID " .. mcpagecnt .. " => " .. mcnum,3)
+end
+
+function uftag.func.fill_parent_tree_line (page)
+ -- we need to get page-> i=kid -> mcnum -> structnum
+ -- pay attention: the kid numbers and the page number in the parent tree start with 0!
+ local numsentry
+ local pdfpage = page-1
+ if uftag.page[page] and uftag.page[page][0] then
+ local mcchunks=#uftag.page[page]
+ uftag.trace.log("PAGETREE PAGE "..page.." has "..mcchunks.."+1 Elements ",3)
+ for i=0,mcchunks do
+ uftag.trace.log("PAGETREE CHUNKS "..uftag.page[page][i],0)
+ end
+ if mcchunks == 0 then
+ -- only one chunk so no need for an array
+ local mcnum = uftag.page[page][0]
+ local structnum = uftag.mc[mcnum]["parent"]
+ local propname = "g__uftag_struct_"..structnum.."_prop"
+ local objnum = uftag.tables[propname]["objnum"] or "XXXX"
+ texio.write_nl("=====>"..tostring(objnum))
+ numsentry = pdfpage .. " ".. objnum .. " 0 R"
+ uftag.trace.log("PAGETREE PAGE" .. page.. " NUM ENTRY = ".. numsentry,3)
+ else
+ numsentry = pdfpage .. " ["
+ for i=0,mcchunks do
+ local mcnum = uftag.page[page][i]
+ local structnum = uftag.mc[mcnum]["parent"]
+ local propname = "g__uftag_struct_"..structnum.."_prop"
+ local objnum = uftag.tables[propname]["objnum"] or "XXXX"
+ numsentry = numsentry .. " ".. objnum .. " 0 R"
+ end
+ numsentry = numsentry .. "] "
+ uftag.trace.log("PAGETREE PAGE" .. page.. " NUM ENTRY = ".. numsentry,3)
+ end
+ else
+ uftag.trace.log ("PAGETREE: NO DATA FOR PAGE "..page,3)
+ end
+ return numsentry
+end
+
+function uftag.func.output_parenttree (abspage)
+ for i=1,abspage do
+ local numsentry = uftag.func.fill_parent_tree_line (i)
+ -- pages without marked content return nil; skip them
+ if numsentry then
+ tex.sprint(catlatex,numsentry .. "^^J")
+ end
+ end
+end
Property changes on: trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf.lua
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
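The index bookkeeping in store_mc_in_page and fill_parent_tree_line (1-based TeX pages, 0-based MCIDs and 0-based page keys in the parent tree) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not tagpdf's code: the dictionaries pages, mc_parent and struct_obj are hypothetical stand-ins for uftag.page, uftag.mc and the structure property tables, and the MCIDs on a page are assumed to run contiguously from 0. Unlike the single-chunk shortcut in the Lua function, the sketch always writes an array indexed by MCID, which is how the PDF specification describes parent-tree values for page content streams.

```python
# Sketch: build ParentTree /Nums entries from per-page MCID tables.
# Hypothetical data layout (loosely modelled on the Lua tables):
#   pages[page][mcid] -> mc number   (TeX pages count from 1, MCIDs from 0)
#   mc_parent[mc]     -> structure element number
#   struct_obj[s]     -> PDF object number of that structure element
from typing import Dict, Optional


def parent_tree_entry(page: int,
                      pages: Dict[int, Dict[int, int]],
                      mc_parent: Dict[int, int],
                      struct_obj: Dict[int, int]) -> Optional[str]:
    """Return the /Nums entry for a 1-based TeX page, or None if it has no MCIDs."""
    kids = pages.get(page)
    if not kids or 0 not in kids:
        return None
    pdf_page = page - 1  # page keys in the parent tree count from 0
    refs = []
    # Array index n must correspond to MCID n, so walk 0, 1, 2, ... in order
    # (assumes the MCIDs on a page are contiguous).
    for mcid in range(len(kids)):
        struct_num = mc_parent[kids[mcid]]
        refs.append(f"{struct_obj[struct_num]} 0 R")
    return f"{pdf_page} [{' '.join(refs)}]"


def parent_tree_nums(last_page: int,
                     pages: Dict[int, Dict[int, int]],
                     mc_parent: Dict[int, int],
                     struct_obj: Dict[int, int]) -> str:
    """Join the entries for pages 1..last_page, skipping pages without data."""
    entries = (parent_tree_entry(p, pages, mc_parent, struct_obj)
               for p in range(1, last_page + 1))
    return "\n".join(e for e in entries if e is not None)
```

For example, with pages = {1: {0: 1, 1: 2}}, mc_parent = {1: 10, 2: 11} and struct_obj = {10: 25, 11: 26}, parent_tree_entry(1, pages, mc_parent, struct_obj) gives "0 [25 0 R 26 0 R]": TeX page 1 becomes key 0, and MCIDs 0 and 1 map to array slots 0 and 1.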
Added: trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf.sty
===================================================================
--- trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf.sty (rev 0)
+++ trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf.sty 2018-07-05 21:45:46 UTC (rev 48146)
@@ -0,0 +1,166 @@
+\RequirePackage{expl3}[2018/06/14]
+
+\ProvidesExplPackage {tagpdf} {2018/07/04} {0.1}
+ {A package to experiment with pdf tagging}
+
+% store the internal primitive names in the package's own namespace:
+\cs_set_eq:NN \__uftag_tex_pdffeedback:D \tex_pdffeedback:D
+\cs_set_eq:NN \__uftag_tex_pdfextension:D \tex_pdfextension:D
+\cs_set_eq:NN \__uftag_tex_immediate:D \tex_immediate:D
+\cs_set_eq:NN \__uftag_tex_pdfvariable:D \tex_pdfvariable:D
+\cs_set_eq:NN \__uftag_tex_global:D \tex_global:D
+\cs_set_eq:NN \__uftag_tex_the:D \tex_the:D
+\cs_set_eq:NN \__uftag_tex_pdfliteral:D \tex_pdfliteral:D
+\cs_set_eq:NN \__uftag_tex_pdfcatalog:D \tex_pdfcatalog:D
+\cs_set_eq:NN \__uftag_tex_pdflastobj:D \tex_pdflastobj:D
+\cs_set_eq:NN \__uftag_tex_pdfobj:D \tex_pdfobj:D
+\cs_set_eq:NN \__uftag_tex_pdfpageattr:D \tex_pdfpageattr:D
+\cs_set_eq:NN \__uftag_tex_pdfpagesattr:D \tex_pdfpagesattr:D
+\cs_set_eq:NN \__uftag_tex_pdfpageref:D \tex_pdfpageref:D
+\cs_set_eq:NN \__uftag_tex_pdfcompresslevel:D \tex_pdfcompresslevel:D
+\cs_set_eq:NN \__uftag_tex_pdfobjcompresslevel:D \tex_pdfobjcompresslevel:D
+
+%%% package options
+\bool_new:N\g__uftag_mode_lua_bool
+
+\DeclareOption{luamode} { \sys_if_engine_luatex:T { \bool_gset_true:N \g__uftag_mode_lua_bool } }
+\DeclareOption{genericmode}{ \bool_gset_false:N\g__uftag_mode_lua_bool }
+\ExecuteOptions{luamode}
+\ProcessOptions
+
+%%% some packages
+\RequirePackage{xparse}
+\RequirePackage{atbegshi}
+\RequirePackage{zref-base,zref-lastpage}
+\RequirePackage{etoolbox}
+\RequirePackage{pdfescape}
+
+%%% absolute page numbers
+\int_new:N \g__uftag_abspage_int
+\int_gset:Nn \g__uftag_abspage_int { 0 }
+\zref@newlist {tagpdf}
+\zref@newprop*{tagabspage} [0] { \int_use:N \g__uftag_abspage_int }
+\zref@addprop {tagpdf} {tagabspage}
+
+
+%%% tagpdfsetup
+%%% TODO: checks need to be improved
+\int_new:N \l__uftag_loglevel_int
+\tl_new:N \l__uftag_tree_tabs_order_tl
+
+\keys_define:nn { tagpdf-setup }
+{
+ activate-mc .bool_gset:N = \g_uftag_active_mc_bool,
+ activate-tree .bool_gset:N = \g_uftag_active_tree_bool,
+ activate-struct .bool_gset:N = \g_uftag_active_struct_bool,
+ activate-all .meta:n ={activate-mc,activate-tree,activate-struct},
+ check-tags .bool_gset:N = \g__uftag_check_tags_bool,
+ check-tags .initial:n = true,
+ log .choice:,
+ log / none .code:n = {\int_set:Nn \l__uftag_loglevel_int { 0 }},
+ log / v .code:n = {\int_set:Nn \l__uftag_loglevel_int { 1 }},
+ log / vv .code:n = {\int_set:Nn \l__uftag_loglevel_int { 2 }},
+ log / vvv .code:n = {\int_set:Nn \l__uftag_loglevel_int { 3 }},
+ log / all .code:n = {\int_set:Nn \l__uftag_loglevel_int { 10 }},
+ tagunmarked .bool_gset:N = \g__uftag_tagunmarked_bool,
+ tagunmarked .initial:n = true,
+ tabsorder .choice:,
+ tabsorder / row .code:n = {\tl_set:Nn \l__uftag_tree_tabs_order_tl {/Tabs/R}},
+ tabsorder / column .code:n = {\tl_set:Nn \l__uftag_tree_tabs_order_tl {/Tabs/C}},
+ tabsorder / structure .code:n = {\tl_set:Nn \l__uftag_tree_tabs_order_tl {/Tabs/S}},
+ tabsorder / none .code:n = {\tl_set:Nn \l__uftag_tree_tabs_order_tl {}},
+ compresslevel .choices:nn = {0,1,2,3,4,5,6,7,8,9}
+ {
+ \__uftag_pdfcompresslevel:n {#1}
+ \__uftag_pdfobjcompresslevel:n {#1}
+ },
+ compresslevel .value_required:n = true,
+ uncompress .meta:n = { compresslevel = 0 },
+}
+
+
+% commands to escape strings so that they can be used safely in the pdf.
+% currently they are little used, but they will be needed later, when alt and actualtext are added.
+% we probably need equivalents of \pdfescapestring, \pdfescapename and \pdfescapehex.
+% the pdfescape commands add an additional layer to allow for babel shorthands.
+\cs_set_eq:NN \__uftag_pdf_escape_string:Nn \EdefEscapeString
+\cs_set_eq:NN \__uftag_pdf_escape_name:Nn \EdefEscapeName
+\cs_set_eq:NN \__uftag_pdf_escape_hex:Nn \EdefEscapeHex
+
+% a hook for later code, and the absolute page counter.
+% should be executed before counters are reset.
+% is it used?
+\cs_new:Nn \__uftag_finish_page_hook: { }
+
+\AtBeginShipout
+ {
+ \__uftag_finish_page_hook:
+ \__uftag_tree_write_pageattr:
+ \int_gincr:N \g__uftag_abspage_int
+ }
+
+
+%testing the engines and loading the driver files
+\sys_if_engine_xetex:T
+ {
+ \PackageError { tagpdf } { xelatex~is~not~supported~-~aborting } {}
+ \tex_endinput:D
+ }
+
+\sys_if_engine_luatex:T
+ {
+ \file_input:n {tagpdf-luatex.def}
+ }
+
+\sys_if_engine_pdftex:T
+ {
+ \file_input:n {tagpdf-pdftex.def}
+ }
+
+\sys_if_output_dvi:T
+ {
+ \PackageError { tagpdf } { dvi~output~is~not~supported~-~aborting }{}
+ \tex_endinput:D
+ }
+
+\cs_generate_variant:Nn \__uftag_prop_gput:Nnn { Nxn , Nxx, Nnx , cnn, cxn, cnx, cno}
+\cs_generate_variant:Nn \__uftag_seq_gput_right:Nn { Nx , No, cn, cx }
+\cs_generate_variant:Nn \__uftag_prop_new:N { c }
+\cs_generate_variant:Nn \__uftag_seq_new:N { c }
+\cs_generate_variant:Nn \__uftag_seq_show:N { c }
+\cs_generate_variant:Nn \__uftag_prop_show:N { c }
+\cs_generate_variant:Nn \prop_gput:Nnn {Nxx}
+\cs_generate_variant:Nn \prop_put:Nnn {Nxx}
+\cs_generate_variant:Nn \__uftag_pdfuseobjnum:Nn {Nx}
+\cs_generate_variant:Nn \__uftag_pdfuseobjnum:nn {nx,xx}
+\cs_generate_variant:Nn \__uftag_gset_pdfpageattr:n {x}
+
+% a few temporary token lists
+\tl_new:N \l__uftag_tmpa_tl
+\tl_new:N \l__uftag_tmpb_tl
+\tl_new:N \l__uftag_tmpc_tl
+\tl_new:N \l__uftag_tmpd_tl
+\tl_new:N \l__uftag_tmpe_tl
+
+% helper function to get the propcount.
+% use as \prop_map_function:NN PROP { \__uftag_prop_count:nn -1 }
+\cs_new:Nn\__uftag_prop_count:nn { + 1 }
+
+%% Loading the tagpdf sub packages
+\RequirePackage { tagpdf-checks-code }
+\RequirePackage { tagpdf-user }
+\RequirePackage { tagpdf-tree-code }
+\RequirePackage { tagpdf-roles-code }
+% mc-code is split:
+\RequirePackage { tagpdf-mc-code-shared }
+\bool_if:NTF \g__uftag_mode_lua_bool
+ {
+ \RequirePackage {tagpdf-mc-code-lua}
+ }
+ {
+ \RequirePackage { tagpdf-mc-code-generic } %
+ }
+
+\RequirePackage { tagpdf-struct-code }
+
+\endinput
Property changes on: trunk/Master/texmf-dist/tex/latex/tagpdf/tagpdf.sty
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
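For the escaping commands that tagpdf.sty maps to pdfescape, a rough Python sketch of what each of the three operations does to a byte string may help. It assumes pdfTeX-like rules (backslash before parentheses and backslashes, octal \ddd for non-printable bytes in literal strings, #XX for bytes that may not appear verbatim in a name, uppercase hex pairs for hex strings). The function names are hypothetical, and this is not the implementation of pdfescape or of the pdfTeX primitives.

```python
# Rough sketch in the spirit of pdfTeX's \pdfescapestring, \pdfescapename
# and \pdfescapehex; function names are hypothetical, not a real API.

PDF_DELIMITERS = b"()<>[]{}/%"


def escape_string(data: bytes) -> str:
    """Escape bytes for use inside a PDF literal string ( ... )."""
    out = []
    for b in data:
        ch = chr(b)
        if ch in "()\\":
            out.append("\\" + ch)      # backslash-escape parens and backslash
        elif 0x20 <= b <= 0x7E:
            out.append(ch)             # printable ASCII is kept as-is
        else:
            out.append(f"\\{b:03o}")   # everything else as \ddd octal
    return "".join(out)


def escape_name(data: bytes) -> str:
    """Escape bytes for a PDF name object (without the leading slash)."""
    out = []
    for b in data:
        if 0x21 <= b <= 0x7E and b != ord("#") and b not in PDF_DELIMITERS:
            out.append(chr(b))         # regular character, kept verbatim
        else:
            out.append(f"#{b:02X}")    # #XX hex escape (PDF 1.2 name syntax)
    return "".join(out)


def escape_hex(data: bytes) -> str:
    """Encode bytes as the body of a hex string < ... >."""
    return data.hex().upper()
```

For instance, escape_name(b"A B#") yields "A#20B#23", and escape_hex(b"Hi") yields "4869".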
Modified: trunk/Master/tlpkg/bin/tlpkg-ctan-check
===================================================================
--- trunk/Master/tlpkg/bin/tlpkg-ctan-check 2018-07-05 21:20:15 UTC (rev 48145)
+++ trunk/Master/tlpkg/bin/tlpkg-ctan-check 2018-07-05 21:45:46 UTC (rev 48146)
@@ -613,7 +613,7 @@
tablists tablor tabls
tabriz-thesis tabstackengine tabto-generic tabto-ltx
tabu tabularborder tabularcalc tabularew
- tabulars-e tabulary tabvar tagging tagpair talk tamefloats
+ tabulars-e tabulary tabvar tagging tagpair tagpdf talk tamefloats
tamethebeast tap tapir tasks tcldoc tcolorbox tdclock tdsfrmath
technics ted templates-fenn templates-sommer templatetools tempora
tengwarscript
Modified: trunk/Master/tlpkg/libexec/ctan2tds
===================================================================
--- trunk/Master/tlpkg/libexec/ctan2tds 2018-07-05 21:20:15 UTC (rev 48145)
+++ trunk/Master/tlpkg/libexec/ctan2tds 2018-07-05 21:45:46 UTC (rev 48146)
@@ -1871,6 +1871,7 @@
'systeme', '^systeme\.tex$|' . $standardtex,
'tabls', '^[^m].*\.sty', # not miscdoc.sty
'tabto-generic','tabto.tex',
+ 'tagpdf', '\.lua|' . $standardtex,
'tap', ,'tap.tex',
'taylor', 'diagrams.tex',
'termmenu', '\.tex',
Modified: trunk/Master/tlpkg/tlpsrc/collection-latexextra.tlpsrc
===================================================================
--- trunk/Master/tlpkg/tlpsrc/collection-latexextra.tlpsrc 2018-07-05 21:20:15 UTC (rev 48145)
+++ trunk/Master/tlpkg/tlpsrc/collection-latexextra.tlpsrc 2018-07-05 21:45:46 UTC (rev 48146)
@@ -1050,6 +1050,7 @@
depend tabulary
depend tagging
depend tagpair
+depend tagpdf
depend talk
depend tamefloats
depend tasks
Added: trunk/Master/tlpkg/tlpsrc/tagpdf.tlpsrc
===================================================================