[luatex] Tagging of parts of document
Саша Козловский
k.sasha1994 at yandex.ru
Sun Sep 29 19:56:43 CEST 2019
Hello everybody. As some of you know, I want to write a package that
tags PDF documents automatically. I am trying to solve the problem of
tagging paragraphs together with chapters, sections, subsections,
subsubsections (and deeper sub...sections), and paragraphs with
subparagraphs. I read the example in the tagpdf package that tags
chapters, sections, sub...sections and so on. First, it only works for
the titles of these parts of the document, and the body text of the
document is not seen in the PDF. Second, it only works for the scrbook
class, whereas I would like it, if possible, to work for any document
class (I do not know whether it is possible to check the document class
from latex/lualatex). I tried to use \everypar, but as far as I
understand from reading my log, this command seems to be called only
n+1 times, where n is the number of sections and sub...sections. In the
attachment you will find a document that shows the problem (it must be
compiled with lualatex only). Thank you all very much for the help.
-------------- next part --------------
\documentclass[12pt]{scrbook}
\usepackage{tagpdf}
\tagpdfsetup{interwordspace=true,activate-all,uncompress}
\usepackage{amsmath,amssymb}
\title{test document}
\author{AlexanderKozlovskiy}
\date{\today}
%\maketitle (why does this not work?)
%Marking the toc entries
%around the whole entry so only structure:
\newcommand\tagscrtocentry[1]{\tagstructbegin{tag=TOCI}#1\tagstructend}
%leaf so structure and mc:
\newcommand\tagscrtocpagenumber[1]{%
\tagstructbegin{tag=Reference}%
\tagmcbegin{tag=Reference}%
#1%
\tagmcend
\tagstructend}
\DeclareTOCStyleEntry[
entryformat=\tagscrtocentry,
pagenumberformat=\tagscrtocpagenumber]{tocline}{chapter}
\DeclareTOCStyleEntry[
entryformat=\tagscrtocentry,
pagenumberformat=\tagscrtocpagenumber]{tocline}{section}
\DeclareTOCStyleEntry[
entryformat=\tagscrtocentry,
pagenumberformat=\tagscrtocpagenumber]{tocline}{subsection}
\DeclareTOCStyleEntry[
entryformat=\tagscrtocentry,
pagenumberformat=\tagscrtocpagenumber]{tocline}{subsubsection}
\DeclareTOCStyleEntry[
entryformat=\tagscrtocentry,
pagenumberformat=\tagscrtocpagenumber]{tocline}{paragraph}
\renewcommand{\addtocentrydefault}[3]{%
\ifstr{#3}{}{}
{%\
\ifstr{#2}{}
{%
\addcontentsline{toc}{#1}
{%
\protect\nonumberline
\tagstructbegin{tag=P}%
\tagmcbegin{tag=P}%
#3%
\tagmcend
\tagstructend
}%
}%
{%
\addcontentsline{toc}{#1}{%
\tagstructbegin{tag=Lbl}%
\tagmcbegin{tag=Lbl}%
\protect\numberline{#2}%
\tagmcend\tagstructend
\tagstructbegin{tag=P}%
\tagmcbegin{tag=P}%
#3%
\tagmcend
\tagstructend
}%
}%
}}%
% the dots must be marked too
\makeatletter
\renewcommand*{\TOCLineLeaderFill}[1][.]{%
\leaders\hbox{$\m@th
\mkern \@dotsep mu\hbox{\tagmcbegin{artifact}#1\tagmcend}\mkern \@dotsep
mu$}\hfill
}
%%%%%%%%%
% Sectioning commands
%%%%%%%%
\ExplSyntaxOn
\prop_new:N \g_tag_section_level_prop
\prop_gput:Nnn \g_tag_section_level_prop {chapter}{H1}
\prop_gput:Nnn \g_tag_section_level_prop {section}{H2}
\prop_gput:Nnn \g_tag_section_level_prop {subsection}{H3}
\prop_gput:Nnn \g_tag_section_level_prop {subsubsection}{H4}
\prop_gput:Nnn \g_tag_section_level_prop {paragraph}{H5}
%new 0.6, as attributes are local we have to put \tagmcbegin everywhere.
\renewcommand{\chapterlinesformat}[3]
{
\@hangfrom
{
\tagstructbegin{tag=\prop_item:Nn\g_tag_section_level_prop{chapter}}
\tl_if_empty:nF{#2}
{
\tagmcbegin {tag=\prop_item:Nn\g_tag_section_level_prop{chapter}}
#2
\tagmcend
}
}
{\tagmcbegin {tag=\prop_item:Nn\g_tag_section_level_prop{chapter}}
#3\tagmcend\tagstructend}%
}
%unnumbered sections level give an empty mc, need to think about it.
\renewcommand{\sectionlinesformat}[4]
{
\@hangfrom
{\hskip #2
\tagstructbegin{tag=\prop_item:Nn\g_tag_section_level_prop{#1}}
\tl_if_empty:nF{#3}
{
\tagmcbegin {tag=\prop_item:Nn\g_tag_section_level_prop{#1}}
#3
\tagmcend
}
}
{\tagmcbegin {tag=\prop_item:Nn\g_tag_section_level_prop{#1}}
#4
\tagmcend\tagstructend}%
}
\ExplSyntaxOff
\AfterTOCHead{\tagstructbegin{tag=TOC}}
\AfterStartingTOC{\tagstructend} %end TOC
\begin{document}
\tagstructbegin{tag=Document}
%do tagging of paragraphs
\ExplSyntaxOn
\everypar{
\message{the_size_of_stack_of_structure_elements_is_\seq_count:N \g__uftag_struct_stack_seq} % I don't know why spaces are ignored when I write something to the log, so I use _ instead of the space character.
\int_case:nn {\seq_count:N \g__uftag_struct_stack_seq}
{
{2}{\tagstructbegin{tag=P}\tagmcbegin{tag=P}}
{4}{\tagstructend \tagstructbegin{tag=P}\tagmcbegin{tag=P}}}}
\ExplSyntaxOff
\begin{centering}
test tagging of parts of documents\\
\end{centering}
\newpage
\tableofcontents
\newpage
\chapter{first chapter}
start testing of tagging of paragraphs
\section{test of section}
{\tiny
this is test document,which allow to do tests of tagging sections and paragraphs
\subsection{subsection 1}
test
again test
\begin{description}
\item[1] lemon
\item[2] orange
again testing of tagging parts of document
\item[3] red
\item[4] green
\end{description}}
\newpage
\subsection{new test}
end of test of tagging of document.
\ExplSyntaxOn
\int_step_inline:nnn{2}{\seq_count:N \g__uftag_struct_stack_seq }
{\tagstructend}
\ExplSyntaxOff
\end{document}