cleaning up bibtex files.

Mike Marchywka marchywka at hotmail.com
Sun Sep 22 18:50:34 CEST 2019


I had various versions of, and bugs in, my bibtex downloading scripts,
so I wrote some code to go through the resulting files and clean them up.
In the process of accumulating the entries, however, it became clear that
there is a lot of variation in how publishers supply them.

Since I have gone to this much effort, I am curious whether there is some
preferred format. My biggest concern was making sure everything
I got off the web was marked as such with a url, and I wanted to preserve the
download time so I could retrace what I was doing. I put all of this in "comments"
but am curious whether putting it into the bibtex entries themselves would hurt anything.

In particular, the field values seem to randomly be quoted or braced and
I just made them all braced. Does this lose something?
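
A minimal sketch of the quote-to-brace rewrite described above, in Python. This is not the actual toobib code; the names quote_to_brace and safe_to_brace are hypothetical, and the sketch assumes one complete field per line. As far as BibTeX itself is concerned, "..." and {...} delimit a field value the same way, so the rewrite should be lossless as long as the value has balanced braces and contains no double quote outside braces. Unquoted macros (for example month = jun) and "a" # "b" concatenations are a different matter, so the sketch leaves them alone.

```python
import re

# Matches a single-line field such as:   title = "Some text",
# Assumption: the whole field sits on one line.
FIELD_RE = re.compile(r'^(\s*[A-Za-z][\w-]*\s*=\s*)"(.*)"(\s*,?\s*)$')


def safe_to_brace(value):
    """True if value has balanced braces and no double quote at brace depth 0."""
    depth = 0
    for ch in value:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
        elif ch == '"' and depth == 0:
            # A bare quote means this was not one simple quoted string,
            # e.g. a concatenation like "a" # "b".
            return False
    return depth == 0


def quote_to_brace(line):
    """Rewrite name = "value" as name = {value}; return other lines unchanged."""
    m = FIELD_RE.match(line)
    if m is None:
        return line
    head, value, tail = m.groups()
    if not safe_to_brace(value):
        return line
    return head + "{" + value + "}" + tail


print(quote_to_brace('    volume = "254",'))
```

Lines that do not match, such as entry headers, comments, and already braced fields, pass through untouched, so the function can be mapped over a whole file.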

For example, I output stuff like this,

% programmatically fixed probably bu toobib
% loaded from test2.bib written on 2019-09-22:11:25:04



% srcurl:  https://www.pnas.org/content/pnas/111/28/10257.full.pdf
% citeurl:  http://api.crossref.org/works/10.1073/pnas.1409284111/transform/application/x-bibtex
% med2bib comment:  handledoi
% date  Wed Feb 6 01:02:49 UTC 2019


@article{Nikoh_2014,
    doi = {10.1073/pnas.1409284111},
    url = {https://doi.org/10.1073\%2Fpnas.1409284111},
    year = {2014},
    month = {jun},
    publisher = {Proceedings of the National Academy of Sciences},
    volume = {111},
    number = {28},
    pages = {10257--10262},
    author = {N. Nikoh and T. Hosokawa and M. Moriyama and K. Oshima and M. Hattori and T. Fukatsu},
    title = {Evolutionary origin of insect-Wolbachia nutritional mutualism},
    journal = {Proceedings of the National Academy of Sciences}
}


@article{KRAMER201895,
    title = {Wolbachia, doxycycline and macrocyclic lactones: New prospects in the treatment of canine heartworm disease},
    journal = {Veterinary Parasitology},
    volume = {254},
    pages = {95 - 97},
    year = {2018},
    issn = {0304-4017},
    doi = {https://doi.org/10.1016/j.vetpar.2018.03.005},
    url = {http://www.sciencedirect.com/science/article/pii/S0304401718301055},
    author = {L. Kramer and S. Crosara and G. Gnudi and M. Genchi and C. Mangia and A. Viglietti and C. Quintavalla},
    keywords = {, , Doxycycline, Macrocyclic lactones}
}


An excerpt of the diff output is below. The diff reports almost all of the
input as changed, often due only to a quote-to-brace change
(and I went to a lot of effort to preserve the field order instead of alphabetizing, lol).

diff -b test2.bib check2.bib


< title = "Wolbachia, doxycycline and macrocyclic lactones: New prospects in the treatment of canine heartworm disease",
< journal = "Veterinary Parasitology",
< volume = "254",
< pages = "95 - 97",
< year = "2018",
< issn = "0304-4017",
< doi = "https://doi.org/10.1016/j.vetpar.2018.03.005",
< url = "http://www.sciencedirect.com/science/article/pii/S0304401718301055",
< author = "L. Kramer and S. Crosara and G. Gnudi and M. Genchi and C. Mangia and A. Viglietti and C. Quintavalla",
< keywords = ", , Doxycycline, Macrocyclic lactones"
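
One way to keep the delimiter change from swamping the diff is to normalize both files to the same delimiter before comparing, so that only real content changes show up. The following is a rough Python sketch of that idea using the standard difflib module. The names normalize and content_diff are hypothetical, and it assumes one field per line and values without embedded double quotes.

```python
import difflib
import re

# Simple quoted field on one line; values containing '"' are left as they are.
QUOTED_RE = re.compile(r'^(\s*[\w-]+\s*=\s*)"([^"]*)"(\s*,?\s*)$')


def normalize(line):
    """Strip surrounding whitespace and turn name = "v" into name = {v}."""
    line = line.strip()
    m = QUOTED_RE.match(line)
    if m is None:
        return line
    return m.group(1) + "{" + m.group(2) + "}" + m.group(3)


def content_diff(old_lines, new_lines):
    """Unified diff of two line lists after delimiter normalization."""
    a = [normalize(line) for line in old_lines]
    b = [normalize(line) for line in new_lines]
    return list(difflib.unified_diff(a, b, lineterm=""))


old = ['volume = "254",', 'issn = "0304-4017",']
new = ['    volume = {254},', '    issn = {0304-4017},']
print(content_diff(old, new))  # empty list: only delimiters and indentation differ
```

Stripping whitespace here plays roughly the role that -b plays in the diff command above; any line that still differs after normalization reflects a change in the field content itself.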


I would imagine there are other utilities that clean these up, but I also wanted to
structure the comments in a custom way, although I could just include the
srcurl as a bibtex entry field.
I have started to write an interactive command-line fixer, and if there are other
common problems I could handle all of them now.
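
For the option of carrying the srcurl inside the entry instead of in a comment, here is a small Python sketch that appends extra fields just before an entry's closing brace. The function name add_provenance and the field name srcdate are hypothetical; srcurl is taken from the comment block shown earlier. The standard BibTeX styles skip fields they do not recognize, so extra fields like these should be harmless there, though other tools and styles may behave differently.

```python
def add_provenance(entry, fields):
    """Append name = {value} fields before the final closing brace of one entry.

    Assumes `entry` is the text of a single BibTeX entry ending in '}'.
    """
    body = entry.rstrip()
    if not body.endswith("}"):
        raise ValueError("entry does not end with a closing brace")
    # Drop the entry's closing brace and any whitespace before it.
    body = body[:-1].rstrip()
    # The last existing field needs a trailing comma before more are added.
    if not body.endswith(","):
        body += ","
    new_lines = ["    " + name + " = {" + value + "}" for name, value in fields.items()]
    return body + "\n" + ",\n".join(new_lines) + "\n}\n"


entry = "@article{Nikoh_2014,\n    year = {2014}\n}\n"
print(add_provenance(entry, {
    "srcurl": "https://www.pnas.org/content/pnas/111/28/10257.full.pdf",
    "srcdate": "Wed Feb 6 01:02:49 UTC 2019",
}))
```

The new fields go at the end so that the existing field order is preserved.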

Thanks. 



-- 

mike marchywka
306 charles cox
canton GA 30115
USA, Earth 
marchywka at hotmail.com
404-788-1216
ORCID: 0000-0001-9237-455X


