[pdftex] a wrapper for using pdftex as a library call

Wed Jan 31 20:27:51 CET 2007

[The mailing list handler didn't like me using GPG signing.  Resend to
this list only.]

Thanh Han The wrote:
>> I am working on a wrapper in C so that application can call pdftex
>> via a library call. Can you please have a look at the proposed API
>> and comment on it if you find a potential problem?

Trying to do something like this cross-platform is very difficult
because not all systems have the necessary underlying functionality to
do it right.

For instance, the init_pdftex_data interface in your proposal has a
working directory parameter as a string.  That's a security problem.
Since it's not guaranteed that there is a reference (open file
descriptor etc.) to the directory, somebody might change the directory
(rename an existing one) and the TeX run overwrites other files.  Or,
more likely, a part of the path name is changed (symlink attack).

The only way to guarantee that the directory the caller intends to use
is indeed used is by passing in a file descriptor.  In the POSIX world
this is no problem.  The file descriptor is inherited through a fork()
call and before the exec() call to pdftex you call fchdir(fd).  Here is
where you'll find problems since not all systems can implement this.
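
A minimal sketch of that fork()/fchdir()/exec() sequence, assuming a
POSIX system.  The name run_in_dir is made up for illustration and is
not part of the proposed API; the caller is assumed to have opened the
directory itself (e.g. with open(path, O_RDONLY | O_DIRECTORY)).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* open(), O_DIRECTORY for the caller */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not part of the proposed API: run argv[0] with
   its working directory set to the directory that dirfd refers to.
   Returns the child's exit status, or -1 on failure. */
int run_in_dir(int dirfd, char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: dirfd was inherited across fork().  fchdir() enters
           the directory the caller opened, whatever its path is now. */
        if (fchdir(dirfd) != 0)
            _exit(127);
        execvp(argv[0], argv);
        _exit(127);             /* only reached if exec failed */
    }

    /* Parent: wait for the child, retrying if a signal interrupts. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the descriptor pins the directory itself, renaming the
directory or swapping a symlink in its path after the open() does not
redirect the child.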

I assume your 'run_pdftex' interface is synchronous.  IMO an
asynchronous version is needed as well, at the least.  I.e., a version
where you initiate the start and then later independently query and if
necessary wait for the result.  The reason is obvious: the program can
do work on its own while TeX is running.  Parallelism is extremely
important going forward.
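
What such an asynchronous interface could look like, as a rough sketch
only.  The names pdftexlib_job_t, pdftexlib_start, pdftexlib_poll and
pdftexlib_wait are assumptions of mine, not part of the proposal, and
the sketch assumes POSIX fork()/waitpid().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical job handle; a real library would keep this opaque. */
typedef struct {
    pid_t pid;
    int   done;     /* nonzero once the child has been reaped */
    int   status;   /* exit status, or -1 if it did not exit normally */
} pdftexlib_job_t;

/* Start the run and return immediately.  0 on success, -1 on error. */
int pdftexlib_start(pdftexlib_job_t *job, char *const argv[])
{
    assert(job != NULL && argv != NULL && argv[0] != NULL);
    job->done = 0;
    job->status = -1;
    job->pid = fork();
    if (job->pid < 0)
        return -1;
    if (job->pid == 0) {
        execvp(argv[0], argv);
        _exit(127);             /* only reached if exec failed */
    }
    return 0;
}

/* Shared by poll and wait: 1 = finished, 0 = still running, -1 = error. */
static int pdftexlib_reap(pdftexlib_job_t *job, int flags)
{
    int st;
    pid_t r;

    if (job->done)
        return 1;
    do {
        r = waitpid(job->pid, &st, flags);
    } while (r < 0 && errno == EINTR);
    if (r == 0)
        return 0;               /* WNOHANG and child not finished yet */
    if (r < 0)
        return -1;
    job->done = 1;
    job->status = WIFEXITED(st) ? WEXITSTATUS(st) : -1;
    return 1;
}

/* Query without blocking. */
int pdftexlib_poll(pdftexlib_job_t *job)
{
    return pdftexlib_reap(job, WNOHANG);
}

/* Block until the run has finished; returns its exit status or -1. */
int pdftexlib_wait(pdftexlib_job_t *job)
{
    return pdftexlib_reap(job, 0) == 1 ? job->status : -1;
}
```

The caller starts the run, does its own work, polls now and then, and
calls the wait function only when it actually needs the result.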

And an implementation detail: _never_ expose data structures unless it
is really, *REALLY* needed.  I'm talking here about the
pdftex_data_struct, of course.  Direct access to any of its members in
the user code is in no way performance critical.  The initiated TeX runs
are quite expensive in terms of execution time so that any memory
allocation performed is completely negligible.

So I propose to make the structure completely opaque.  I.e., in the
public header only have

  typedef struct pdftex_data_struct pdftex_data_t;

(I renamed the struct as well, _t is often used to indicate type names).

Then change the init_pdftex_data() function to take a pdftex_data_t**
parameter.  The function will itself allocate the memory for the
structure.  If allocation fails the pointer variable pointed to by the
parameter is set to NULL, otherwise to the newly allocated memory.
Error handling when returning from init_pdftex_data() has to handle this
case.  (BTW: why not return an error code from the functions instead of
just success/failure information?  Then you don't have to pass a pointer
to the tmp variable to pds_print_error.)

Anyway, if you make this change the information about the struct is
completely encapsulated in your code.  This is important for
maintainability since it gives you the opportunity to change the
implementation as much as you want as long as the function interfaces
remain the same.
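
A small sketch of the opaque-type arrangement, with header and
implementation shown in one file for brevity.  The error codes, the
dirfd member, the accessor and free_pdftex_data are assumptions for
illustration; I don't know what the real structure contains.

```c
#include <assert.h>
#include <stdlib.h>

/* --- what the public header would contain --- */
typedef struct pdftex_data_struct pdftex_data_t;

/* Hypothetical error codes. */
enum { PDFTEX_OK = 0, PDFTEX_ENOMEM, PDFTEX_EINVAL };

int  init_pdftex_data(pdftex_data_t **out, int dirfd);
int  pdftex_data_dirfd(const pdftex_data_t *d);
void free_pdftex_data(pdftex_data_t *d);

/* --- private to the library; user code never sees this --- */
struct pdftex_data_struct {
    int dirfd;      /* assumed member: working directory descriptor */
};

/* Allocates the structure.  On any failure *out is set to NULL and an
   error code is returned. */
int init_pdftex_data(pdftex_data_t **out, int dirfd)
{
    if (out == NULL)
        return PDFTEX_EINVAL;
    *out = NULL;
    if (dirfd < 0)
        return PDFTEX_EINVAL;

    pdftex_data_t *d = calloc(1, sizeof *d);
    if (d == NULL)
        return PDFTEX_ENOMEM;
    d->dirfd = dirfd;
    *out = d;
    return PDFTEX_OK;
}

/* Members are reachable only through functions like this one. */
int pdftex_data_dirfd(const pdftex_data_t *d)
{
    assert(d != NULL);
    return d->dirfd;
}

void free_pdftex_data(pdftex_data_t *d)
{
    free(d);
}
```

Adding, removing or reordering members later changes nothing for
callers, since they only ever hold a pointer.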

About pds_print_error_and_exit: such an interface is usually not useful
except in tiny little programs.  Assume you write a graphical shell for
TeX.  You don't want to terminate the program after a failed run, the
user should be able to fix problems and rerun.  What is needed, though,
is the ability to show an error string.  So what is probably needed is
a function which returns an error string which can be printed in the
appropriate way (on terminal, in dialog box, whatever).
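
Something along the lines of strerror() would do.  A sketch, where the
error codes, the messages and the name pdftexlib_strerror are all
invented for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error codes returned by the library functions. */
enum pdftexlib_err {
    PDFTEXLIB_OK = 0,
    PDFTEXLIB_ENOMEM,
    PDFTEXLIB_EINVAL,
    PDFTEXLIB_ERUN,
    PDFTEXLIB_ERR_COUNT     /* number of codes, keep last */
};

/* Returns a static, read-only message for code.  The caller decides
   where it goes: stderr, a dialog box, a log file. */
const char *pdftexlib_strerror(int code)
{
    static const char *const messages[] = {
        "success",
        "out of memory",
        "invalid argument",
        "pdftex run failed"
    };

    /* Catch a code being added without a matching message. */
    assert(sizeof messages / sizeof messages[0]
           == (size_t)PDFTEXLIB_ERR_COUNT);

    if (code < 0 || code >= PDFTEXLIB_ERR_COUNT)
        return "unknown error";
    return messages[code];
}
```

The library never prints and never exits; a graphical shell can show
the string and let the user fix the input and rerun.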

About the interface naming: C's flat namespace is crowded.  To minimize
the risk of conflicts you should standardize on a common prefix for all
function and type names and stick with it.  E.g.,

  pdftex_data_struct       ->       pdftexlib_data_struct
  init_pdftex_data         ->       pdftexlib_data_init
  pds_print_error_and_exit ->       pdftexlib_error
  run_pdftex               ->       pdftexlib_run

You get the idea.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖