bibtex from webpage [was Re: Tex utilities - related to ampersand cr error thread]

Mike Marchywka marchywka at
Mon Jul 8 11:04:51 CEST 2019

On Mon, Jul 08, 2019 at 04:15:15AM +0000, Schneider, Thomas (NIH/NCI) [E] wrote:
> Mike:
> > just trying to extract bibtex from webpages is a huge task in
> > itself...
> I've pretty much solved this for biology.  See my yvp page:
> It turns out that the year, volume and page are often sufficient to
> identify a unique paper in PubMed.  (If not, one can tack on an extra
> key word or select from the list provided by PubMed)  The yvp script
> does this.

For this case, I guess I would just use eutils instead of opening a browser.
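For example, a year/volume/page lookup against eutils could be sketched roughly like this. This is a minimal sketch, not the yvp script itself: the esearch endpoint and the [dp]/[vi]/[pg] PubMed field tags are real, but the wrapper function and example values are my own illustration, and no network call is made here.

```python
# Sketch: build an NCBI E-utilities esearch URL that looks up a PubMed
# record by year, volume, and page -- the same y/v/p triple the yvp
# script keys on.  Only the URL is constructed; nothing is fetched.
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def yvp_esearch_url(year, volume, page):
    # PubMed field tags: [dp] publication date, [vi] volume, [pg] first page
    term = f"{year}[dp] AND {volume}[vi] AND {page}[pg]"
    return ESEARCH + "?" + urlencode({"db": "pubmed",
                                      "term": term,
                                      "retmode": "json"})

# hypothetical y/v/p triple, for illustration only
print(yvp_esearch_url(2019, 47, 3862))
```

Fetching that URL returns a list of matching PMIDs, which efetch (rettype=medline) can then turn into a record suitable for conversion to bibtex.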

and create bibtex from that. However, the usual situation is that I'm browsing
and find an interesting article at a publisher's site. Normally you then
have to find a link to the bibtex and do some work to obtain it.
I ended up writing a script that takes a URL from the clipboard and tries to
find bibtex for the article the link describes, through either a
site-specific derived link, scraping the page for bibtex, or extracting
a DOI and using crossref, etc. For PubMed, I extract the PMC ID or PMID and
get a result in MEDLINE format, which can be parsed into bibtex.
Many publishers appear to use a few canned solutions, so it is easy
to find the link on their pages, but sometimes I have to scrape PDF files
for a DOI and can end up with the wrong article. The script
I now have seems pretty good at getting bibtex, but I do
need to check results, and it seems awfully complicated: I probably
have 100 domains with special handlers and no assurance the webpages
are stable.
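To illustrate the DOI route: a regex close to Crossref's recommended DOI pattern pulls candidates out of scraped page or PDF text, and DOI content negotiation (an Accept: application/x-bibtex header against doi.org) returns bibtex directly. A rough sketch, assuming that approach; the helper names and the 10.1000 example DOI are hypothetical, and the request is built but not sent:

```python
# Sketch: extract a DOI from scraped text and build the content-
# negotiation request that would return bibtex for it.
import re
from urllib.request import Request

# Pattern close to Crossref's suggested DOI regex; trailing
# punctuation picked up from surrounding prose is trimmed afterwards.
DOI_RE = re.compile(r'\b(10\.\d{4,9}/[-._;()/:A-Za-z0-9]+)')

def extract_doi(text):
    m = DOI_RE.search(text)
    return m.group(1).rstrip('.,;)') if m else None

def bibtex_request(doi):
    # doi.org honors content negotiation; Crossref and DataCite
    # both serve application/x-bibtex for their DOIs.
    return Request("https://doi.org/" + doi,
                   headers={"Accept": "application/x-bibtex"})

doi = extract_doi("see doi:10.1000/xyz123. for details")
print(doi)   # -> 10.1000/xyz123
```

Passing the request to urllib.request.urlopen (following redirects) would then yield the bibtex entry, which avoids per-publisher page scraping whenever a DOI can be found.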

I guess there are browser plugins or citation managers for this,
but a script decoupled from the browser integrates better with my workflow.


mike marchywka
306 charles cox
canton GA 30115
USA, Earth 
marchywka at
ORCID: 0000-0001-9237-455X

More information about the texhax mailing list