[tex-live] texdoc in luatex

Mon Jul 2 15:43:33 CEST 2007

Robin Fairbairns <Robin.Fairbairns at cl.cam.ac.uk> wrote:

> in the ones not marked, we rely on a default attribute.

Well, I can assume that every document that has no 'language' attribute
is written in English.

>> Do you have some way to specify that one of the files is an entry point
>> for the whole documentation? I think that would be nice.
>
> no.

"No, we don't have that", or "No, that wouldn't be nice"? :)

> i spend an inordinate amount of my (otherwise) free time scanning for
> these things; if i find a *.doc file that's actually plain text, i
> simply rename it -- the author's original wishes are probably irrelevant
> in today's world, where "microcruft knows best".

I see. Then, at least until a specific attribute is added to the DTD, I
believe we can more or less rely on the extension to identify the type
of a file with the following algorithm:
  - for known extensions (pdf, ps, dvi, doc, each optionally followed by
    .gz, .bz2 or .lzma), this is straightforward[1];
  - anything else ending in .gz, .bz2 or .lzma is assumed to be
    compressed plain text;
  - anything else is assumed to be raw plain text.
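
A minimal sketch of the algorithm above in Python, assuming that the
compression suffix is optional for the known extensions and that matching
is case-insensitive. The names guess_doc_type, KNOWN_TYPES and
COMPRESSIONS are hypothetical, chosen only for illustration:

```python
import os

# Extensions whose type is known directly (from the list above).
KNOWN_TYPES = {".pdf": "pdf", ".ps": "ps", ".dvi": "dvi", ".doc": "doc"}
# Compression suffixes that may wrap any file.
COMPRESSIONS = {".gz": "gzip", ".bz2": "bzip2", ".lzma": "lzma"}


def guess_doc_type(filename):
    """Return (type, compression) guessed from the file name alone."""
    base = os.path.basename(filename).lower()
    stem, ext = os.path.splitext(base)
    compression = COMPRESSIONS.get(ext)
    if compression is not None:
        # Strip the compression suffix and inspect the extension underneath.
        stem, ext = os.path.splitext(stem)
    # Anything that is not a known extension is treated as plain text.
    return KNOWN_TYPES.get(ext, "text"), compression
```

With this sketch, README.en and hvmath.txt both come out as uncompressed
plain text, which is the fallback behaviour described in the last item.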

> the problem is that unless one forces such things, m* is going to fall
> in a little heap on the floor, twitching quietly.  a text attribute
> would be good (though it would of course also require an encoding
> attribute -- there are lots of non-ascii iso-646 national variants, as
> well as myriad m$ code pages, in use on the archive, and to first order,
> there's no way of working out what these things are, other than simply
> guessing.

Yes, this is non-trivial. If you add an encoding attribute, I'll try to
make use of it on Unix through the locales mechanism; otherwise I
believe leaving this unaddressed is acceptable as a first approximation
(the majority of text files are encoded in ASCII).

>> Given the objection raised by README.en, I think a "type" attribute
>> that is used at least for text files would be desirable. Looking at
>> 
>>   <documentation href='ctan:/fonts/micropress/hvmath/hvmath.txt'/>
>> 
>> from hvmath-fonts.xml, it seems you don't already do that.
>
> we don't have such a thing in the catalogue dtd, so it can't be done.
> yet.

OK, that would be nice to have, but is not a must, since the algorithm
described above should be able to determine the correct type in most
cases, I think.

>> Another thing I'm missing is a way to map paths such as:
>> 
>>   ctan:/macros/latex/contrib/hyperref/README
>> 
>> to paths in the installed TL distro. But I think Norbert has the answer
>> to this question or can put it in the infrastructure scripts.
>
> i thought that's what david kastrup's script does.  (since my suggestion
> of writing another different target for catalogue html generation
> received no support at all, i assumed that was the reason.)

If your target will reliably write the path of each doc file as
installed in TL, this is exactly what I'd like to have (with a way to
map each installed file to the CTAN package it belongs to).

David Kastrup's script, AFAIU, is a "cheap" way to correct in *many or
most* cases the HTML files we have now, but it uses a heuristic
approach. David will correct me if I'm wrong, but basically, if it finds
a broken link to a file, it will look for a file in the installed TL
tree with the same name and the longest matching suffix possible (this
is to avoid confusing hyperref/manual.pdf with foobar/manual.pdf).
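
To make the heuristic concrete, here is a small Python sketch of
"same name, longest matching suffix" as I understand it. This is not
David's script; the function names and the example paths in the usage
note are assumptions made for illustration:

```python
def common_suffix_len(a, b):
    """Count the trailing path components shared by lists a and b."""
    n = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        n += 1
    return n


def resolve_broken_link(link_path, installed_paths):
    """Return the installed path sharing the longest suffix with link_path.

    A candidate must at least have the same file name (suffix length 1).
    Returns None when no installed file has that name; on a tie, the
    first candidate encountered wins.
    """
    target = link_path.strip("/").split("/")
    best, best_len = None, 0
    for candidate in installed_paths:
        parts = candidate.strip("/").split("/")
        n = common_suffix_len(target, parts)
        if n > best_len:
            best, best_len = candidate, n
    return best
```

Given an installed tree containing both doc/latex/hyperref/manual.pdf
and doc/latex/foobar/manual.pdf, a broken link ending in
hyperref/manual.pdf resolves to the first one, because it matches two
trailing components instead of one. Ties remain ambiguous, which is
exactly the weakness of a heuristic approach.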

Though having this script is clearly better than not having it, for the
long term I prefer a deterministic approach where the installation path
of each file is determined reliably by whatever tool performs the
installation.

Moreover, it would be extremely ugly and painful for my program to have
to parse the catalogue HTML files in order to find the installation path
for each doc file and the corresponding package. I'd very much prefer to be
provided with a simple file that tells me, for each CTAN package, the
path of each doc file as installed in TL. And I think Norbert can
provide me with that.

Even better would be if this file could carry with it the metadata about
each doc file, so that I could easily take advantage of the 'language'
attribute and possible extensions such as 'type' and 'index' (the latter
being the one that tells "this file is an entry point for the package
documentation").

> there is no problem in principle.  author-provided metadata is a
> medium-term aim; in practice, this is taking somewhat longer than we had
> hoped.  (jim hefferon is giving a paper at tug 2007 about his work in
> the related area of automating uploads still further.)

OK, that's good.

> i would concur with joachim that adding still further metadata to
> packages, elsewhere, is a route to chaos.  however, there is no doubting
> that the catalogue is a slow-moving object: it takes me more than a
> month to complete a scan of the catalogue to check on a particular
> issue.  which is not to say that we don't welcome suggestions; however,
> given the work-load involved, we simply don't act on any suggestion
> that's not thoroughly thought-through, and completely documented.

OK, I only asked for 'index' and 'type', since 'language' is already
there, but as explained in this mail, I could live without them (I can
guess 'type'; for 'index', the user will simply not have the feature
available, but this is not very important).

> (fwiw, i'm currently thinking of a way that the catalogue can be used to
> generate the by-topic index, automatically.  the basic idea is trivial,
> but the details are really rather complicated.)

If we implement the tags stuff, then you could have a topic (tag) for
fonts, one for maths, etc., and they wouldn't be mutually exclusive: for
instance, a package, such as fourier, providing a math font, could be
found under *both* topics.
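
As a rough illustration of non-exclusive tags, here is a Python sketch
that inverts a package-to-tags table into a by-topic index. The table
contents other than fourier, and all the names used, are hypothetical;
the catalogue has no such data today:

```python
from collections import defaultdict

# Hypothetical tag assignments; only the fourier example comes from
# the discussion above.
package_tags = {
    "fourier": {"fonts", "maths"},
    "hyperref": {"hyperlinks"},
}


def build_topic_index(package_tags):
    """Map each tag to the sorted list of packages carrying it."""
    index = defaultdict(list)
    for package, tags in package_tags.items():
        for tag in tags:
            index[tag].append(package)
    return {tag: sorted(packages) for tag, packages in index.items()}
```

Here fourier ends up listed under both "fonts" and "maths", since a
package is simply appended to every topic it is tagged with.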

Regards,

  [1] No, this doesn't mean I encourage compressing PDF files, just that
      it's easy to add simple support for them. Yes, such simple support
      will break with PDF-to-PDF links, that's the price to pay for
      people compressing their PDF files.

-- 
Florent