cleaning up bibtex files.

Sun Sep 22 23:41:54 CEST 2019

On 22/09/2019 18:25, Mike Marchywka wrote:
[...]
> Maybe I just hate XML-like stuff but right now it would just be
> another level of translation.

Many people do, and yes it is. That's a price I'm prepared to pay for a
file format whose syntax can be verified independently. Plus I'm using
this data for more than just biblatex.
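
As a rough illustration of what "verified independently" can mean: any
XML parser will report whether a file is well-formed without knowing
anything about bibliographies. A minimal sketch in Python, where the
element names (bib, entry, title) are made up for the example and are
not a real schema:

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None

# Hypothetical sample data: the second string has an unclosed <title>.
good = "<bib><entry id='x1'><title>Example Title</title></entry></bib>"
bad = "<bib><entry id='x1'><title>Example Title</entry></bib>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

This only checks well-formedness. Validating against a schema or DTD
would need a validating parser, which the standard library module used
here is not.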

> The bibtex format looks perfectly general and I can use it as a
> primary source effectively but just wanted to check on conventions
> and details for normal usage.

These should be in any book on LaTeX that covers BibTeX/biblatex.

> This effort also let me write some parsing logic (although probably 
> anyone would normally write syntax diagrams and generate the
> parsing code too) to see what is going on.

Always a good exercise. I'm just lazy.

> My solution here was to preserve the download urls as I can just
> take all of them and refetch stuff although many as-received did need
> to be cleaned up (I've had a problem with url encoding of links etc and
> that stupid percent thing caused some issues and if xml does not care
> about that some other char would make a mess LOL).

Browsers are very forgiving of URL-encoding errors. Other systems are
not. XML doesn't care unless you try to resolve the link or specify the
datatype. Nor does biblatex/BibTeX AFAIK, but maybe some formatters
check it.
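
If you want to catch the worst of the percent problems before a
stricter system does, one possible check is to look for a "%" that is
not followed by two hex digits. This is only a sketch: the function
name is invented for the example, and it tests a single rule rather
than full URL validity.

```python
import re

# A "%" that does not start a valid escape such as %20 or %2F.
_BAD_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")

def has_bad_percent(url):
    """Return True if url contains a malformed percent escape."""
    return _BAD_PERCENT.search(url) is not None

for url in ("http://example.org/a%20b",
            "http://example.org/100%",
            "http://example.org/a%2Gb"):
    print(url, has_bad_percent(url))
```

A URL that passes may still be wrong in other ways (unencoded spaces,
for instance), so treat this as a first filter only.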

> I don't think there is more reason to have more confidence in XML
> than the bib format once you have set everything up.

Not really, unless you're using the data for another purpose, like
running queries on it, or formatting outputs other than LaTeX (e.g.
Markdown, HTML, etc.).
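
To give a flavour of the "queries and other outputs" case, here is a
small sketch that selects entries by year and writes the titles as a
Markdown list. The markup and the year attribute are assumptions made
up for the example, not the format I actually use.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data with invented titles.
SAMPLE = """<bib>
  <entry id="a1" year="2018"><title>First Example</title></entry>
  <entry id="b2" year="2019"><title>Second Example</title></entry>
  <entry id="c3" year="2019"><title>Third Example</title></entry>
</bib>"""

def titles_for_year(xml_text, year):
    """Return the titles of all entries whose year attribute matches."""
    root = ET.fromstring(xml_text)
    # ElementTree supports a limited XPath subset, including [@attr='value'].
    matches = root.findall(f"./entry[@year='{year}']")
    return [entry.findtext("title") for entry in matches]

def as_markdown(titles):
    """Format titles as a Markdown bullet list with emphasised titles."""
    return "\n".join(f"- *{title}*" for title in titles)

print(as_markdown(titles_for_year(SAMPLE, "2019")))
```

The same query could feed an HTML or LaTeX formatter instead; only the
last function would change.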

Peter