[pdftex] experiment with tagged PDF

The Thanh Han hanthethanh at gmail.com
Sun May 4 23:38:25 CEST 2008


I updated the patch and examples at
http://sarovar.org/tracker/index.php?func=detail&aid=945&group_id=106&atid=495

What's new:
- added the primitives \pdfstartstructelem and
  \pdfendstructelem as proposed below
- added an example of how to use them. The example passed
  PDF/A-1a verification by Acrobat 8.

Thanh

On Fri, May 02, 2008 at 12:49:05PM +0200, The Thanh Han wrote:
> After some examples of how to create tagged PDF manually, I
> think we can start with a small step to ease the
> process. I am thinking of providing a pair of primitives
> that allow marking a certain piece of the document and mapping it
> to a structure type.
> 
> Syntax:
> 
> \startstructelem <parent> <general text>
>     start a structure element with parent <parent>
> 
> \endstructelem
>     end the current structure element
> 
> Example:
> 
> ,--------
> | % mark a section heading with structure type H (heading);
> | \startstructelem <parent-ID> {H}
> | \section{A section}
> | \endstructelem
> | <some text>
> | ...
> | 
> | % mark a subsection heading with structure type H1
> | \startstructelem <parent-ID> {H1}
> | \subsection{A subsection}
> | \endstructelem
> `--------
> 
> However, it is not clear to me how to specify the parent of a
> struct element (<parent-ID> in the above example).
> 
> If we follow the "strongly structured" paradigm, then we
> need to manage nested struct elements, so each struct element
> must link to its parent. So each struct element must have an
> ID, so that its children can use the ID for the link.
> 
> If we follow the "weakly structured" paradigm, then all
> struct elements will have a single parent, namely the root
> document. Then there is no need to manage the parent ID at
> all.
> 
> Perhaps it's best to support both; however, for the first
> step I prefer to start with the simpler case, i.e. weakly
> structured.
> 
> 
> Comments welcome.
> 
> Thanh
> 
> PS: From pdf spec:
> 
> ,--------
> | "Strongly structured. The grouping elements nest to as many levels as
> | necessary to reflect the organization of the material into articles,
> | sections, subsections, and so on. At each level, the children of the
> | grouping element consist of a heading (H), one or more paragraphs (P)
> | for content at that level, and perhaps one or more additional grouping
> | elements for nested subsections.
> | 
> | "Weakly structured. The document is relatively flat, having perhaps only
> | one or two levels of grouping elements, with all the headings,
> | paragraphs, and other BLSEs as their immediate children. In this case,
> | the organization of the material is not reflected in the logical
> | structure; however, it can be expressed by the use of headings with
> | specific levels (H1-H6)."
> `--------
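
To make the difference between the two paradigms quoted above concrete, here is a minimal Python sketch of the bookkeeping each one would need. It is only an illustration of the data structure, not how pdfTeX or the patch implements it: StructElem, StructTree, add and depth are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class StructElem:
    """Hypothetical model of one structure element (not a pdfTeX type)."""
    elem_id: int                  # ID that children would use to link to it
    struct_type: str              # e.g. "Document", "Sect", "H", "H1", "P"
    parent: Optional["StructElem"] = field(default=None, repr=False)
    kids: List["StructElem"] = field(default_factory=list, repr=False)


class StructTree:
    """Hypothetical structure tree with a single root element."""

    def __init__(self) -> None:
        self._next_id = 0
        self.root = self._new("Document", None)

    def _new(self, struct_type: str, parent: Optional[StructElem]) -> StructElem:
        elem = StructElem(self._next_id, struct_type, parent)
        self._next_id += 1
        if parent is not None:
            parent.kids.append(elem)
        return elem

    def add(self, struct_type: str, parent: Optional[StructElem] = None) -> StructElem:
        """Add an element; if no parent is given, it hangs off the root."""
        return self._new(struct_type, parent if parent is not None else self.root)


def depth(elem: StructElem) -> int:
    """Count the parent links between an element and the root."""
    d = 0
    while elem.parent is not None:
        elem = elem.parent
        d += 1
    return d


# Weakly structured: every element is a direct child of the root,
# so the caller never has to name a parent.
weak = StructTree()
weak_h1 = weak.add("H1")
weak_p = weak.add("P")
weak_h2 = weak.add("H2")

# Strongly structured: grouping elements nest, so each child must
# refer to its parent explicitly.
strong = StructTree()
sect = strong.add("Sect")
h = strong.add("H", parent=sect)
subsect = strong.add("Sect", parent=sect)
sub_h = strong.add("H", parent=subsect)
```

In the weakly structured case the parent argument is never needed, which is why that case is the simpler one to start with; in the strongly structured case every nested element has to carry a reference to its parent.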

