cleaning up bibtex files.

Mike Marchywka marchywka at
Sun Sep 22 19:25:27 CEST 2019

On Sun, Sep 22, 2019 at 06:07:11PM +0100, Peter Flynn wrote:
> On 22/09/2019 17:50, Mike Marchywka wrote:
> [...]
> > I guess I was curious if there is some preferred format as long as I
> > have gone to this much effort. My biggest concern was making sure
> > everything I got off the web was indicated as such with a url and I
> > wanted to preserve download time to retrace what I was doing. I put
> > all of this in "comments" but curious if putting it into the bibtex
> > would hurt anything.
> I don't think so, but this is one of many reasons I store all my
> references in DocBook XML, and run an XSLT script to create BibTeX files
> when I need to. I try to pick "EndNote XML" export format from web site
> and biblio applications wherever possible because I have a script to
> handle that.

Maybe I just hate XML-like stuff, but right now it would just be another
level of translation. The bibtex format looks perfectly general
and I can use it effectively as a primary source; I just wanted
to check on conventions and details for normal usage.
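
As a rough sketch of keeping the provenance in the entry itself rather
than in comments: the snippet below adds the source URL and the download
time as ordinary fields. The `url` and `urldate` names follow biblatex
usage; the `downloaded` field and both function names are made up for
illustration. As far as I know, standard BibTeX styles simply skip fields
they do not recognize, so the extra fields should be harmless.

```python
from datetime import datetime, timezone


def add_provenance(fields: dict, url: str, fetched: datetime) -> dict:
    """Return a copy of `fields` with the source URL and download time added."""
    out = dict(fields)
    out["url"] = url
    # Date only, in the ISO form biblatex expects for urldate.
    out["urldate"] = fetched.date().isoformat()
    # Hypothetical extra field: full timestamp, to retrace the download.
    out["downloaded"] = fetched.isoformat(timespec="seconds")
    return out


def format_entry(entry_type: str, key: str, fields: dict) -> str:
    """Write one entry with every value braced."""
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


when = datetime(2019, 9, 22, 17, 25, 27, tzinfo=timezone.utc)
entry = add_provenance({"title": "Some Page"}, "http://example.org/page", when)
print(format_entry("misc", "example2019", entry))
```

Keeping the full timestamp in its own field leaves `urldate` in the plain
date form that formatters are most likely to accept.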

This effort also let me write some parsing logic (although probably
anyone would normally write syntax diagrams and generate the parsing code too)
to see what is going on.

> That way I *know* everything is named and labelled correctly, and if
> biblatex formatters change how they use fields, I can easily modify the
> script and just regenerate the files. If I had to generate old BibTeX
> files, I could also do that.

My solution here was to preserve the download URLs, as I can just
take all of them and refetch stuff, although many as-received did need
to be cleaned up (I've had a problem with URL encoding of links etc., and
that stupid percent thing caused some issues; if XML does not care
about that, some other char would make a mess LOL).
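
One possible way to do the URL cleanup, as a sketch only: percent-encode
whatever is not legal in a URL while leaving existing `%XX` escapes alone,
so a link that is already encoded does not get encoded twice. A second
helper escapes `%` for the case where the URL ends up in plain TeX text,
where `%` starts a comment. The function names and the exact set of
"safe" characters are assumptions, not taken from any existing tool.

```python
import re
from urllib.parse import quote

# Characters left untouched: URL delimiters plus "%" itself, so that
# escapes already present (e.g. %20) are not turned into %2520.
SAFE = "!#$%&'()*+,/:;=?@[]~"


def clean_url(url: str) -> str:
    """Percent-encode illegal characters such as spaces; keep existing escapes."""
    return quote(url.strip(), safe=SAFE)


def escape_percent_for_tex(text: str) -> str:
    """Put a backslash before every "%" that does not already have one."""
    return re.sub(r"(?<!\\)%", r"\\%", text)


raw = "http://example.org/a b%20c?q=x y"
print(clean_url(raw))
print(escape_percent_for_tex(clean_url(raw)))
```

Running `clean_url` twice gives the same result as running it once, which
makes it safe to apply to a mix of raw and already-cleaned links.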

 I don't think there is any reason to have more confidence in XML
than in the bib format once you have set everything up.

> > In particular, the field values seem to randomly be quoted or braced
> > and I just made them all braced. Does this lose something?
> Not as far as I know. The rule is, if the field value is an integer (eg
> a year) it doesn't need braces, otherwise it does.
> Peter
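
For reference, the quoted-to-braced conversion can be sketched roughly as
below. It assumes one field per line and leaves alone anything it is not
sure about: bare integers, macro names, and `#` concatenations such as
`"Sep" # " 22"`, where swapping the outer quotes for braces would change
the meaning. The function names are made up for illustration, and this is
not a full BibTeX parser.

```python
import re

# Matches `name = "value"` on a single line, with an optional trailing comma.
QUOTED_FIELD = re.compile(r'^(\s*[A-Za-z][\w-]*\s*=\s*)"(.*)"(\s*,?\s*)$')


def safe_to_brace(value: str) -> bool:
    """True if braces balance and no double quote sits outside braces."""
    depth = 0
    for ch in value:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
        elif ch == '"' and depth == 0:
            # A top-level quote means this is really a concatenation.
            return False
    return depth == 0


def brace_field(line: str) -> str:
    """Rewrite `name = "value"` as `name = {value}`; otherwise return the line as is."""
    m = QUOTED_FIELD.match(line)
    if m is None:
        return line
    prefix, value, suffix = m.groups()
    if not safe_to_brace(value):
        return line
    return f"{prefix}{{{value}}}{suffix}"


for sample in ['  title = "A {B}ook",', "  year = 2019,", '  month = "Sep" # " 22",']:
    print(brace_field(sample))
```

Only the first sample line is changed; the integer and the concatenation
pass through untouched.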


mike marchywka
306 charles cox
canton GA 30115
USA, Earth 
marchywka at
ORCID: 0000-0001-9237-455X

More information about the texhax mailing list