[luatex] Tagging of parts of document

Саша Козловский k.sasha1994 at yandex.ru
Sun Sep 29 19:56:43 CEST 2019


Hello everybody. As some of you know, I want to write a package that
tags PDF documents automatically. I am trying to solve the problem of
tagging paragraphs in documents that have chapters, sections,
subsections, subsubsections and so on, as well as paragraphs and
subparagraphs. I have read the example from the tagpdf package that tags
chapters, sections, subsections, etc. First, it only works for the
titles of these parts of the document, and the body text does not show
up in the PDF. Second, it only works for the scrbook class, whereas I
would like it, if possible, to work for any document class (I do not
know whether it is possible to check the document class from
LaTeX/LuaLaTeX). I tried to use \everypar, but as far as I understand
from my log, it seems to be called only n+1 times, where n is the number
of sections and subsections. In the attachment you will find a document
that shows the problem (it must be compiled with LuaLaTeX). Thank you
all very much for your help.
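
A possible starting point for the class check, as an untested sketch
only: the LaTeX kernel provides \@ifclassloaded, which can be used in
the preamble after \documentclass. The macro name \myclassname below is
hypothetical and is there only for illustration.

    \makeatletter
    % \@ifclassloaded{<class>}{<true code>}{<false code>}
    \@ifclassloaded{scrbook}
      {\newcommand\myclassname{scrbook}}% KOMA-specific hooks could go here
      {\newcommand\myclassname{other}}%   fallback for any other class
    \makeatother

This only tests for one class by name, so a package that should work
with every class would still need a generic fallback branch.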

-------------- next part --------------
\documentclass[12pt]{scrbook}
\usepackage{tagpdf}
\tagpdfsetup{interwordspace=true,activate-all,uncompress}
\usepackage{amsmath,amssymb}
\title{test document}
\author{AlexanderKozlovskiy}
\date{\today}
%\maketitle (why does this not work?)
%Marking the toc entries
%around the whole entry so only structure:
\newcommand\tagscrtocentry[1]{\tagstructbegin{tag=TOCI}#1\tagstructend}

%leaf so structure and mc:
\newcommand\tagscrtocpagenumber[1]{%
 \tagstructbegin{tag=Reference}%
 \tagmcbegin{tag=Reference}%
 #1%
 \tagmcend
 \tagstructend}

\DeclareTOCStyleEntry[
   entryformat=\tagscrtocentry,
   pagenumberformat=\tagscrtocpagenumber]{tocline}{chapter}
\DeclareTOCStyleEntry[
   entryformat=\tagscrtocentry,
   pagenumberformat=\tagscrtocpagenumber]{tocline}{section}
\DeclareTOCStyleEntry[
   entryformat=\tagscrtocentry,
   pagenumberformat=\tagscrtocpagenumber]{tocline}{subsection}
\DeclareTOCStyleEntry[
   entryformat=\tagscrtocentry,
   pagenumberformat=\tagscrtocpagenumber]{tocline}{subsubsection}
\DeclareTOCStyleEntry[
   entryformat=\tagscrtocentry,
   pagenumberformat=\tagscrtocpagenumber]{tocline}{paragraph}

\renewcommand{\addtocentrydefault}[3]{%
 \ifstr{#3}{}{}
   {%
   \ifstr{#2}{}
    {%
     \addcontentsline{toc}{#1}
      {%
       \protect\nonumberline
       \tagstructbegin{tag=P}%
       \tagmcbegin{tag=P}%
        #3%
       \tagmcend
       \tagstructend
      }%
    }%
    {%
    \addcontentsline{toc}{#1}{%
     \tagstructbegin{tag=Lbl}%
     \tagmcbegin{tag=Lbl}%
     \protect\numberline{#2}%
     \tagmcend\tagstructend
     \tagstructbegin{tag=P}%
     \tagmcbegin{tag=P}%
      #3%
     \tagmcend
     \tagstructend
     }%
    }%
   }}%

% the dots must be marked too
\makeatletter
\renewcommand*{\TOCLineLeaderFill}[1][.]{%
  \leaders\hbox{$\m@th
    \mkern \@dotsep mu\hbox{\tagmcbegin{artifact}#1\tagmcend}\mkern \@dotsep
    mu$}\hfill
}

%%%%%%%%%
% Sectioning commands
%%%%%%%%

\ExplSyntaxOn
\prop_new:N   \g_tag_section_level_prop
\prop_gput:Nnn \g_tag_section_level_prop {chapter}{H1}
\prop_gput:Nnn \g_tag_section_level_prop {section}{H2}
\prop_gput:Nnn \g_tag_section_level_prop {subsection}{H3}
\prop_gput:Nnn \g_tag_section_level_prop {subsubsection}{H4}
\prop_gput:Nnn \g_tag_section_level_prop {paragraph}{H5}

%new 0.6, as attributes are local we have to put \tagmcbegin everywhere.
\renewcommand{\chapterlinesformat}[3]
 {
  \@hangfrom
   {
    \tagstructbegin{tag=\prop_item:Nn\g_tag_section_level_prop{chapter}}
    \tl_if_empty:nF{#2}
     {
      \tagmcbegin    {tag=\prop_item:Nn\g_tag_section_level_prop{chapter}}
      #2
      \tagmcend
     }
   }
   {\tagmcbegin    {tag=\prop_item:Nn\g_tag_section_level_prop{chapter}}
    #3\tagmcend\tagstructend}%
}

%unnumbered section levels give an empty mc, need to think about it.
\renewcommand{\sectionlinesformat}[4]
 {
  \@hangfrom
   {\hskip #2
    \tagstructbegin{tag=\prop_item:Nn\g_tag_section_level_prop{#1}}
    \tl_if_empty:nF{#3}
    {
     \tagmcbegin    {tag=\prop_item:Nn\g_tag_section_level_prop{#1}}
     #3
     \tagmcend
    }
   }
   {\tagmcbegin    {tag=\prop_item:Nn\g_tag_section_level_prop{#1}}
    #4
    \tagmcend\tagstructend}%
 }
\ExplSyntaxOff
\makeatother
\AfterTOCHead{\tagstructbegin{tag=TOC}}
\AfterStartingTOC{\tagstructend} %end TOC

\begin{document}
\tagstructbegin{tag=Document}
%tag the paragraphs
\ExplSyntaxOn
\everypar{
% I do not know why spaces are ignored when I write something to the log,
% so I use _ instead of the space character.
\message{the_size_of_stack_of_structure_elements_is_\seq_count:N \g__uftag_struct_stack_seq}
\int_case:nn {\seq_count:N \g__uftag_struct_stack_seq}
  {
   {2}{\tagstructbegin{tag=P}\tagmcbegin{tag=P}}
{4}{\tagstructend \tagstructbegin{tag=P}\tagmcbegin{tag=P}}}}
\ExplSyntaxOff
\begin{centering}
test tagging of parts of documents\\
\end{centering}

\newpage

\tableofcontents

\newpage

\chapter{first chapter}

start testing of tagging of paragraphs

\section{test of section}

{\tiny

this is test document,which allow to do tests of tagging sections and paragraphs

\subsection{subsection 1}

test

again test

\begin{description}

\item[1] lemon

\item[2] orange

again testing of tagging parts of document

\item[3] red

\item[4] green

\end{description}}

\newpage

\subsection{new test}

end of test of tagging of document.

\ExplSyntaxOn
\int_step_inline:nnn{2}{\seq_count:N \g__uftag_struct_stack_seq }
{\tagstructend}
\ExplSyntaxOff
\end{document}

