[tex-live] datestamp in texcat

Alexander Cherepanov cherepan at mccme.ru
Sat Sep 19 11:49:09 CEST 2009


Hi Robin!
On Fri, 18 Sep 2009 10:05:45 +0100, Robin Fairbairns <Robin.Fairbairns at cl.cam.ac.uk> wrote:

>> > the omitted ":" has meant that checkin hasn't recorded the "author"
>> > involved (in our case, the last editor of the xml).
>> 
>> modifier attribute is probably never used outside texcat so it's 
>> entirely up to you to decide on significance of this. I wrote about it 
>> just in case.

> it provides a trace of who did what.  (like "who on earth wrote this
> rubbish description")

But maybe you don't care about it...

>> > i (think) i've tidied up the problems you detected, even the ones
>> > (spurious line breaks) that don't bother anyone (white space inside xml
>> > elements tends to be ignored).
>> 
>> Thanks. Spurious line breaks are gone and irregular spacing before $ is 
>> up in the air. But hepunits is still with modifier='$Authors$' in the 
>> public repository.

> aha!  must have failed to check in the change (i certainly did it last
> night).  i've redone it now, and checked in (so when i get home tonight
> i'll get a version clash when i update my repo copy).

It's still there...

>> Another idea: just pipe entries through some xml normalizer before 
>> committing them. This will take care of spurious line breaks, 
>> different attributes order (66 entries have unusual order of 
>> attributes in entry element; yes, I know, it doesn't bother anyone) 
>> etc.

> i've never used a normaliser, but this may be worth looking at.
> however, this catalogue is a source of data for generating html pages,
> not a source of stuff to read; normalising things may not be useful.

AFAIU this catalogue is used at least to populate texlive.tlpdb. And 
from your presence on the texlive mailing list I deduced that you care 
about it. I could be wrong on both points though.

Either way is fine with me -- I was just trying to figure out where to 
report "irregularities".
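
A minimal sketch of the normaliser idea quoted above, assuming Python
3.8 or later and its standard xml.etree.ElementTree.canonicalize
(C14N 2.0). It is not something the catalogue is known to use; the
function name and the sample entries are made up for illustration.
Canonical form sorts attributes by name and drops line breaks between
attributes, so two entries that differ only in those respects come out
identical.

```python
import xml.etree.ElementTree as ET


def normalise(xml_text):
    """Return the canonical (C14N 2.0) form of an XML document.

    Attributes are emitted in sorted order with one quoting style, and
    white space between attributes inside a tag is not preserved.
    """
    return ET.canonicalize(xml_text)


# Two hypothetical entries: same data, different attribute order,
# quoting and line breaks inside the start tag.
a = "<entry id='hepunits' datestamp='x' modifier='y'/>"
b = '<entry modifier="y" id="hepunits"\n       datestamp="x"/>'

print(normalise(a) == normalise(b))
```

White space in element text is kept as it is by default, so this alone
does not touch line breaks inside descriptions.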

>> >> Are you interested in other irregularities?
>> 
>> > if you spot significant things, like missing author attributions, or
>> > failure to record dates, yes.
>> 
>> Ok, I see.

> the real point is, if an xml file generates correct output, and is
> readable by someone using a general text editor to make changes, then
> it's ok.
> 
> for example, i occasionally spot some "irregular" order while editing:
> when that happens i usually reorder things.  but i don't care enough
> about the issue to go seeking out such irregularities (and my checking
> script doesn't even have the wherewithal to look).

Sorry, I don't get it. You mainly talk about the xml level of things. 
But you don't need any scripts to check that -- just validate the xml 
against the dtd with any of the numerous xml validators.
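
As one possible illustration, assuming xmllint (from libxml2) is the
validator at hand: the sketch below wraps its --noout and --dtdvalid
options. The function names are invented here, and nothing in this
thread says which validator was actually used.

```python
import shutil
import subprocess


def xmllint_command(dtd_path, xml_path):
    """Build the xmllint call that validates xml_path against dtd_path."""
    # --noout: do not print the parsed document, only diagnostics.
    # --dtdvalid: validate against this DTD rather than one named in the file.
    return ["xmllint", "--noout", "--dtdvalid", dtd_path, xml_path]


def validate(dtd_path, xml_path):
    """Return (ok, messages); ok is None when xmllint is not installed."""
    if shutil.which("xmllint") is None:
        return None, "xmllint not found on PATH"
    result = subprocess.run(
        xmllint_command(dtd_path, xml_path),
        capture_output=True,
        text=True,
    )
    # xmllint exits with a non-zero status when validation fails.
    return result.returncode == 0, result.stderr
```

It would be called as validate("catalogue.dtd", "arrow.xml"), where
both file locations are placeholders.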

OTOH to spot a wrong authorref one needs to check it manually against 
real sources. So the only thing worth checking by scripts is the 
contents of xml attributes and elements that have some structure (like 
entry/@datestamp, documentation/@href, ctan/@path etc.).
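
A rough sketch of such a content check, using only the Python standard
library. The three patterns are guesses made for this example (an
expanded $Date: ... $ keyword, a non-empty href without white space, a
path starting with "/"), the check assumes each attribute is required,
and the sample entry is invented; the real rules would have to come
from whoever maintains the catalogue.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rules: (element, attribute, pattern the whole value must match).
RULES = [
    ("entry", "datestamp", re.compile(r"\$Date: \S.* \$")),
    ("documentation", "href", re.compile(r"\S+")),
    ("ctan", "path", re.compile(r"/\S+")),
]


def check_entry(xml_text):
    """Return a list of complaints about structured attribute values."""
    root = ET.fromstring(xml_text)
    problems = []
    for tag, attr, pattern in RULES:
        # iter() also visits the root element itself when its tag matches.
        for elem in root.iter(tag):
            value = elem.get(attr)
            if value is None:
                problems.append(f"element '{tag}': attribute '{attr}' missing")
            elif not pattern.fullmatch(value):
                problems.append(f"element '{tag}': odd {attr} value {value!r}")
    return problems


# Invented entry with an unexpanded keyword in datestamp.
sample = """<entry id='hepunits' datestamp='$Date$'>
  <documentation details='Readme' href='ctan:/some/path/README'/>
  <ctan path='/some/path'/>
</entry>"""

for line in check_entry(sample):
    print(line)
```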

Hm, taking a look at the actual catalogue.dtd, it seems to be fairly 
outdated. I've somewhat updated it (attached). It's quick and dirty: 
all IDREFS are changed to CDATA (so it will not check that references 
to authors are valid etc.), and the fixed order of elements is thrown 
away (too many xml files have the wrong order of elements; as a 
result it will not check that there is at least one authorref in 
every file). And beware: I've never touched any dtd before.

Here are some results of validation:

  ams2bib.xml
    element 'documentation': attribute 'Details' instead of 'details'
  arrow.xml
    element 'documentation': attribute 'detals' instead of 'details'
  barkom.xml, colortbl.xml, koma-script-examples.xml, miktex_update.xml, 
  pdfcomment.xml
    element 'documentation': attribute 'lang' instead of 'language'
  bibunits.xml
    element 'license': attribute 'user' instead of 'username'
  breakurl.xml, geometry.xml
    attributes for element 'license' ended in element 'version'
  cjk.xml, fancybox.xml, makor2.xml
    element 'documentation': attribute 'description' instead of 'details'
  classicthesis.xml
    element 'license': spurious attribute 'number'
  cspsfonts.xml
    element 'texlivbe' instead of 'texlive'
  diagramf.xml
    element 'vesion' instead of 'version'
  doipubmed.xml
    element 'copyright': attribute 'owener' instead of 'owner'
  fig4latex.xml
    element 'tt' instead of 'xref':
      figures with graphics created by <tt refid='xfig'>XFig</tt>
  fihyph.xml
    element 't' instead of 'tt':
      accented letters to work with LaTeX2e, adding some <t>\catcode</t>,
  floatrow.xml
    element 'documentation': attribute 'lanuguage' instead of 'language'
  here.xml
    element 'fptex'?
  mmap.xml
    element 'version': attribute 'file'?
  musixtex.xml, ntsfaq.xml
    element 'xref': superfluous 'tt'?
      (<xref refid='musixflx'><tt>musixflx</tt></xref>) to generate
  ucthesis.xml, umich-thesis.xml, units.xml, upquote.xml
    element 'license': attribute 'location' instead of 'file'
  versions.xml, vmargin.xml, warning.xml
    element 'license': attribute 'date' instead of 'checked'
    element 'license': attribute 'location' instead of 'file'
  wadalab.xml
    element 'documentation': attribute 'deltails' instead of 'details'
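
Most of the findings above are misspelt attribute names. The sketch
below shows one way a script could flag them and suggest the intended
name, using difflib from the Python standard library. The table of
known names contains only the attributes mentioned in this message,
not the full DTD, and the function name and sample input are invented
for the example.

```python
import difflib
import xml.etree.ElementTree as ET

# Attribute names mentioned in this message; deliberately incomplete.
KNOWN = {
    "documentation": {"details", "language", "href"},
    "license": {"username", "file", "checked"},
    "copyright": {"owner"},
}


def unknown_attributes(xml_text):
    """Report attributes not in KNOWN, with a close match when one exists."""
    root = ET.fromstring(xml_text)
    reports = []
    for elem in root.iter():
        allowed = KNOWN.get(elem.tag)
        if allowed is None:
            continue  # element not covered by the table
        for name in elem.attrib:
            if name in allowed:
                continue
            # Compare in lower case so 'Details' still suggests 'details'.
            guess = difflib.get_close_matches(name.lower(), sorted(allowed), n=1)
            if guess:
                reports.append(
                    f"element '{elem.tag}': attribute '{name}' "
                    f"instead of '{guess[0]}'"
                )
            else:
                reports.append(
                    f"element '{elem.tag}': spurious attribute '{name}'"
                )
    return reports


sample = (
    "<entry><documentation detals='x' href='y'/>"
    "<copyright owener='z'/></entry>"
)
for line in unknown_attributes(sample):
    print(line)
```

Names that are wrong but not similar to the right one (such as
'location' for 'file') would only be reported as spurious, so the
suggestion is a convenience, not a replacement for the DTD.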

Don't feel obliged to fix it just because I've reported it. If it's 
insignificant for you then I don't care either.

Alexander Cherepanov

-------------- next part --------------
A non-text attachment was scrubbed...
Name: catalogue.dtd.gz
Type: application/unknown
Size: 2813 bytes
Desc: not available
URL: <http://tug.org/pipermail/tex-live/attachments/20090919/f9f84dc2/attachment.bin>

