cleaning up bibtex files.

Mike Marchywka marchywka at
Sun Sep 22 18:50:34 CEST 2019

I had various versions of, and bugs in, my BibTeX downloading scripts,
so I wrote some code to go through the files and clean them up.
While accumulating them, however, I noticed a lot of variation in
how publishers supply them.

Having gone to this much effort, I was curious whether there is a
preferred format. My biggest concern was making sure that everything I got
off the web was marked as such with a URL, and I wanted to preserve the
download time so I can retrace what I was doing. I put all of this in
comments, but I am curious whether putting it into the BibTeX entries
themselves would hurt anything.
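
Standard BibTeX styles ignore fields they do not use, so keeping the
provenance inside the entry should not change the formatted bibliography.
Below is a minimal Python sketch of writing it as fields. The field names
`srcurl` and `fetched` and the helper `format_entry` are hypothetical, not
part of any existing tool. If you use biblatex, its standard `urldate` field
is another option for the access date.

```python
from datetime import datetime, timezone


def format_entry(entry_type, key, fields, srcurl=None, fetched=None):
    """Render a BibTeX entry with hypothetical provenance fields appended.

    `fields` is a list of (name, value) pairs, so the original field order
    is kept. `srcurl` and `fetched` are illustrative, non-standard field
    names; standard styles simply ignore fields they do not use.
    """
    out = list(fields)
    if srcurl:
        out.append(("srcurl", srcurl))
    if fetched:
        # e.g. 2019-02-06 01:02:49 UTC for a timezone-aware datetime
        out.append(("fetched", fetched.strftime("%Y-%m-%d %H:%M:%S %Z")))
    body = ",\n".join(f"    {name} = {{{value}}}" for name, value in out)
    return f"@{entry_type}{{{key},\n{body}\n}}\n"


if __name__ == "__main__":
    print(format_entry(
        "article", "nikoh2014",
        [("year", "2014"), ("volume", "111")],
        srcurl="http://example.org/source",
        fetched=datetime(2019, 2, 6, 1, 2, 49, tzinfo=timezone.utc),
    ))
```

Because the fields are an ordered list of pairs, the provenance fields land
after the publisher's fields without disturbing their order.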

In particular, the field values seem to be randomly quoted or braced, and
I just made them all braced. Does this lose anything?
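
Swapping the delimiters should not lose anything by itself. BibTeX accepts
a field value in either double quotes or braces, and braces inside the value
must be balanced in both cases. The one subtlety is that a `"` inside a
quoted value must sit inside braces (for example `{\"o}`), or it ends the
value. What can change meaning is bracing a bare macro: in the standard
styles `month = jun` expands to "June", while `month = {jun}` is the literal
string "jun". Here is a minimal Python sketch of converting one quoted value.
`quoted_to_braced` is a hypothetical helper, and it does not handle `#`
concatenation.

```python
def quoted_to_braced(text, start):
    """Convert the double-quoted BibTeX value starting at text[start] to braces.

    Returns (braced_value, index_just_after_closing_quote). A '"' only ends
    the value at brace depth 0, so quotes nested in braces (e.g. {"}) are
    kept. Raises ValueError on unbalanced braces or a missing closing quote.
    """
    if start >= len(text) or text[start] != '"':
        raise ValueError("value does not start with a double quote")
    depth = 0
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced '}' in quoted value")
        elif ch == '"' and depth == 0:
            # Same text, new delimiters: nothing inside is changed.
            return "{" + text[start + 1:i] + "}", i + 1
        i += 1
    raise ValueError("unterminated quoted value")
```

Since the inner text is copied unchanged, the conversion is lossless whenever
the function returns. The error cases are exactly the inputs BibTeX itself
would also choke on.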

For example, I output entries like this:

% programmatically fixed probably by toobib
% loaded from test2.bib written on 2019-09-22:11:25:04

% srcurl:
% citeurl:
% med2bib comment:  handledoi
% date  Wed Feb 6 01:02:49 UTC 2019

    doi = {10.1073/pnas.1409284111},
    url = {\%2Fpnas.1409284111},
    year = {2014},
    month = {jun},
    publisher = {Proceedings of the National Academy of Sciences},
    volume = {111},
    number = {28},
    pages = {10257--10262},
    author = {N. Nikoh and T. Hosokawa and M. Moriyama and K. Oshima and M. Hattori and T. Fukatsu},
    title = {Evolutionary origin of insect-Wolbachia nutritional mutualism},
    journal = {Proceedings of the National Academy of Sciences}

    title = {Wolbachia, doxycycline and macrocyclic lactones: New prospects in the treatment of canine heartworm disease},
    journal = {Veterinary Parasitology},
    volume = {254},
    pages = {95 - 97},
    year = {2018},
    issn = {0304-4017},
    doi = {},
    url = {},
    author = {L. Kramer and S. Crosara and G. Gnudi and M. Genchi and C. Mangia and A. Viglietti and C. Quintavalla},
    keywords = {, , Doxycycline, Macrocyclic lactones}

The diff output (excerpt below) shows almost all of the input as changed,
usually just because of the quote-to-brace change. (And I went to a lot of
effort to preserve the original field order instead of sorting alphabetically, lol.)

diff -b test2.bib check2.bib

< title = "Wolbachia, doxycycline and macrocyclic lactones: New prospects in the treatment of canine heartworm disease",
< journal = "Veterinary Parasitology",
< volume = "254",
< pages = "95 - 97",
< year = "2018",
< issn = "0304-4017",
< doi = "",
< url = "",
< author = "L. Kramer and S. Crosara and G. Gnudi and M. Genchi and C. Mangia and A. Viglietti and C. Quintavalla",
< keywords = ", , Doxycycline, Macrocyclic lactones"
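
`diff -b` compares raw text, so a delimiter change alone makes every field
line differ. One way to see only real changes is to strip the delimiters
before comparing. The Python sketch below assumes each field sits on a single
line, as in the excerpt above. Multi-line values would need a proper `.bib`
parser. `parse_fields` and `field_changes` are hypothetical helpers.

```python
import re

# One field per line: name = "value" or name = {value}, optional trailing comma.
FIELD_RE = re.compile(r'^\s*(\w+)\s*=\s*(?:"(.*)"|\{(.*)\})\s*,?\s*$')


def parse_fields(lines):
    """Return {field_name: value} with delimiters stripped (dicts keep order)."""
    fields = {}
    for line in lines:
        m = FIELD_RE.match(line)
        if m:
            value = m.group(2) if m.group(2) is not None else m.group(3)
            fields[m.group(1).lower()] = value
    return fields


def field_changes(old_lines, new_lines):
    """List (name, old_value, new_value) only for fields whose values differ."""
    old, new = parse_fields(old_lines), parse_fields(new_lines)
    names = list(old) + [n for n in new if n not in old]
    return [(n, old.get(n), new.get(n)) for n in names if old.get(n) != new.get(n)]
```

With this comparison, `doi = ""` and `doi = {}` count as the same value, so
only edits to the content itself are reported.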

I imagine there are other utilities that clean these files up, but I also
wanted to structure the comments in a custom way, although I could just
include the srcurl as a BibTeX entry field.
I have started to write an interactive command-line fixer, and if there are
other common problems I could handle them all now.
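
The two entries above already show a few candidates for such a fixer:
genuinely empty fields, empty items in `keywords`, and a spaced page range
(`95 - 97`) where BibTeX convention is `95--97`. Below is a minimal Python
sketch of these rules. The rule set and the name `tidy_fields` are
hypothetical, suggested only by those examples.

```python
import re


def tidy_fields(fields):
    """Apply a few hypothetical cleanup rules to an ordered list of (name, value).

    - drop fields whose value is empty (e.g. doi = {})
    - drop empty items from comma-separated keywords (", , Doxycycline")
    - normalise a numeric page range like "95 - 97" to "95--97"
    Field order is preserved.
    """
    out = []
    for name, value in fields:
        value = value.strip()
        if not value:
            continue
        if name == "keywords":
            items = [k.strip() for k in value.split(",") if k.strip()]
            if not items:
                continue
            value = ", ".join(items)
        elif name == "pages":
            value = re.sub(r"^(\d+)\s*-+\s*(\d+)$", r"\1--\2", value)
        out.append((name, value))
    return out
```

In an interactive fixer, each rule could report what it would change and
ask before applying it, rather than rewriting silently.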



mike marchywka
306 charles cox
canton GA 30115
USA, Earth 
marchywka at
ORCID: 0000-0001-9237-455X
