[pdftex] experiment with tagged PDF

The Thanh Han hanthethanh at gmail.com
Fri May 2 12:49:05 CEST 2008


After some examples of how to create tagged PDF manually, I
think we can start with a small step to ease the
process. I am thinking of providing a pair of primitives
that allow marking a piece of the document and mapping it
to a structure type.

Syntax:

\startstructelem <parent> <general text>
    start a structure element with parent <parent>

\endstructelem
    end the current structure element

Example:

,--------
| % mark a section heading with structure type H (heading)
| \startstructelem <parent-ID> {H}
| \section{A section}
| \endstructelem
| <some text>
| ...
| 
| % mark a subsection heading with structure type H1
| \startstructelem <parent-ID> {H1}
| \subsection{A subsection}
| \endstructelem
`--------

However, it is not clear to me how to specify the parent of a
struct element (<parent-ID> in the above example).

If we follow the "strongly structured" paradigm, then we
need to manage nested struct elements: each struct element
must link to its parent, and therefore must have an ID that
its children can use for the link.
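
The bookkeeping this implies can be pictured with a macro-level
sketch in plain TeX. It is not the proposed primitive: here
\startstructelem takes only the structure type, and the parent is
inferred from nesting instead of being passed as <parent>. The
counters \structid and \structparent are hypothetical names, and
the sketch only logs each element rather than writing anything to
the PDF.

,--------
| % hypothetical stand-ins for the proposed primitives
| \newcount\structid       % last allocated element ID
| \newcount\structparent   % ID of the open element (0 = root)
| 
| \def\startstructelem#1{%
|   \global\advance\structid by 1
|   \begingroup            % the group saves the outer parent
|   \message{[struct \the\structid: type #1, parent \the\structparent]}%
|   \structparent=\structid   % children opened inside see this ID
| }
| \def\endstructelem{\endgroup}   % restores the previous parent
| 
| \startstructelem{Sect}
|   \startstructelem{H}A section\endstructelem
|   \startstructelem{P}Some text.\endstructelem
| \endstructelem
| \bye
`--------

In this sketch the H and P elements both report parent 1 (the Sect
element), which itself reports parent 0, standing for the root.
Note that the group also makes any other local assignments inside
the element local, which a real primitive would have to avoid.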

If we follow the "weakly structured" paradigm, then all
struct elements will have a single parent, namely the root
document. Then there is no need to manage the parent ID at
all.
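
A macro-level sketch of the weakly structured case in plain TeX
shows how little state is needed. It is a stand-in, not the
proposed primitive: \startstructelem takes only the structure
type, the counter \structid is a hypothetical name, and each
element is merely logged. Every element gets the root as parent;
the hierarchy survives only in the heading levels.

,--------
| % hypothetical stand-ins for the proposed primitives
| \newcount\structid   % last allocated element ID
| 
| \def\startstructelem#1{%
|   \global\advance\structid by 1
|   \message{[struct \the\structid: type #1, parent root]}%
| }
| \def\endstructelem{}   % nothing to restore: there is no nesting
| 
| \startstructelem{H1}A section\endstructelem
| \startstructelem{P}Some text.\endstructelem
| \startstructelem{H2}A subsection\endstructelem
| \bye
`--------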

Perhaps it's best to support both; however, for the first
step I prefer to start with the simpler option, i.e. weakly
structured.


Comments welcome.

Thanh

PS: From the PDF spec:

,--------
| "Strongly structured. The grouping elements nest to as many levels as
| necessary to reflect the organization of the material into articles,
| sections, subsections, and so on. At each level, the children of the
| grouping element consist of a heading (H), one or more paragraphs (P) for
| content at that level, and perhaps one or more additional grouping elements
| for nested subsections.
| 
| "Weakly structured. The document is relatively flat, having perhaps only one
| or two levels of grouping elements, with all the headings, paragraphs, and
| other BLSEs as their immediate children. In this case, the organization of
| the material is not reflected in the logical structure; however, it can be
| expressed by the use of headings with specific levels (H1-H6)."
`--------

