Generating bibtex entry from a URL: zotero, zbib and TooBib.

Mike Marchywka marchywka at hotmail.com
Thu Jun 3 19:06:47 CEST 2021


AFAICT, Zotero consistently produces only skeleton bibtex for news sites,
but the few articles I checked from Google News carry a lot of useful info in their ld+json blocks.
For example,

https://www.washingtonpost.com/us-policy/2021/06/03/biden-infrastructure-tax-republicans/


@article{noauthor_biden_nodate,
	title = {Biden offers tax concession in infrastructure talks with key {Republican}},
	issn = {0190-8286},
	url = {https://www.washingtonpost.com/us-policy/2021/06/03/biden-infrastructure-tax-republicans/},
	abstract = {President Biden signaled at a private meeting on Wednesday that he would further narrow his infrastructure package to win Republican support, outlining a plan for about \$1 trillion in new spending financed through  tax reforms that would leave the top corporate rate intact.},
	language = {en-US},
	urldate = {2021-06-03},
	journal = {Washington Post},
}

But all the traditional things like author name are in an "application/ld+json" block, which often
also includes things like video clips. Right now I am just making skeleton bibtex without
translating the field names, but you can see that there is all kinds of useful stuff there, largely
just in need of name translation:

@NewsArticle{default,
GLOBAL_alternativeHeadline = {Biden offers major change in tax proposal in effort to lure Republicans on infrastructure plan},
GLOBAL_author_name = {Seung Min Kim, Tony Romm},
GLOBAL_author_type = {Person},
GLOBAL_context = {http://schema.org},
GLOBAL_dateModified = {2021-06-03T16:42:44.751Z},
GLOBAL_datePublished = {2021-06-03T14:45:48.519Z},
GLOBAL_description = {President Biden signaled at a private meeting on Wednesday that he would further narrow his infrastructure package to win Republican support, outlining a plan for about $1 trillion in new spending financed through tax reforms that would leave the top corporate rate intact.},
GLOBAL_headline = {Biden offers tax concession in infrastructure talks with key Republican},
GLOBAL_image_type = {ImageObject},
GLOBAL_image_url = {https://www.washingtonpost.com/wp-apps/imrs.php?src=https://arc-anglerfish-washpost-prod-washpost.s3.amazonaws.com/public/XHNTMWGBNYI6XCNEW6XCFKQZHY.jpg&w=1200},
GLOBAL_isAccessibleForFree = {false},
GLOBAL_isPartOf_brand_name = {The Washington Post},
GLOBAL_isPartOf_brand_type = {brand},
GLOBAL_isPartOf_description = {Breaking news and analysis on politics, business, world, national news, entertainment and more. In-depth DC, Virginia, Maryland news coverage including traffic, weather, crime, education, restaurant reviews and more.},
GLOBAL_isPartOf_image = {https://www.washingtonpost.com/resizer/2CjPNwqvXHPS_2RpuRTKY-p3eVo=/1484x0/www.washingtonpost.com/pb/resources/img/twp-social-share.png},
GLOBAL_isPartOf_name = {The Washington Post},
GLOBAL_isPartOf_offers_type = {offer},
GLOBAL_isPartOf_offers_url = {https://subscribe.washingtonpost.com/acquisition?promo=o26},
GLOBAL_isPartOf_productID = {washingtonpost.com:basic},
GLOBAL_isPartOf_sku = {https://subscribe.washingtonpost.com},
GLOBAL_isPartOf_type_type = {CreativeWorkProduct},
GLOBAL_mainEntityOfPage_id = {https://www.washingtonpost.com/us-policy/2021/06/03/biden-infrastructure-tax-republicans/},
GLOBAL_mainEntityOfPage_type = {WebPage},
GLOBAL_publisher_id = {washingtonpost.com},
GLOBAL_publisher_logo_height = {60},
GLOBAL_publisher_logo_type = {ImageObject},
GLOBAL_publisher_logo_url = {https://www.washingtonpost.com/wp-stat/img/wplogo_344x60_blk.png},
GLOBAL_publisher_logo_width = {344},
GLOBAL_publisher_name = {The Washington Post},
GLOBAL_publisher_type = {NewsMediaOrganization},
GLOBAL_type = {NewsArticle}
}
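
As a rough illustration of the idea (not TooBib's actual code, which is C++ driving
bash utilities), here is a minimal Python sketch that pulls every "application/ld+json"
block out of a page, flattens the nested JSON into GLOBAL_-prefixed names like the ones
above, and optionally translates a few of them into conventional bibtex fields. The
names LdJsonParser, flatten, extract_fields, to_bibtex and the FIELD_MAP table are all
hypothetical, and the flattening rule (strip "@", join nested keys with "_", merge list
items and drop duplicates) is only my guess at what would give similar output.

```python
import json
from html.parser import HTMLParser


class LdJsonParser(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_ld = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_ld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_ld:
            self._in_ld = False
            self.blocks.append("".join(self._buf))


def flatten(node, prefix, out):
    """Flatten nested JSON into prefix_key_subkey -> list of unique strings."""
    if isinstance(node, dict):
        for key, value in node.items():
            flatten(value, f"{prefix}_{key.lstrip('@')}", out)
    elif isinstance(node, list):
        for item in node:  # list items share one name, e.g. several authors
            flatten(item, prefix, out)
    else:
        text = node if isinstance(node, str) else json.dumps(node)
        values = out.setdefault(prefix, [])
        if text not in values:
            values.append(text)


# Hypothetical name translation table (an assumption); extend as needed.
FIELD_MAP = {
    "GLOBAL_headline": "title",
    "GLOBAL_author_name": "author",
    "GLOBAL_datePublished": "date",
    "GLOBAL_publisher_name": "journal",
    "GLOBAL_mainEntityOfPage_id": "url",
    "GLOBAL_description": "abstract",
}


def extract_fields(html):
    """Return one flattened field dict per top-level ld+json object."""
    parser = LdJsonParser()
    parser.feed(html)
    results = []
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than fail
        for item in data if isinstance(data, list) else [data]:
            fields = {}
            flatten(item, "GLOBAL", fields)
            results.append(fields)
    return results


def to_bibtex(fields, key="default", translate=False):
    """Render raw GLOBAL_* fields, or only the translated ones if requested."""
    entry_type = "article" if translate else fields.get("GLOBAL_type", ["misc"])[0]
    lines = []
    for name in sorted(fields):
        sep = ", "
        if translate:
            if name not in FIELD_MAP:
                continue
            name_out = FIELD_MAP[name]
            if name_out == "author":
                sep = " and "  # bibtex separates authors with "and"
        else:
            name_out = name
        lines.append(f"{name_out} = {{{sep.join(fields[name])}}}")
    return "@%s{%s,\n%s\n}" % (entry_type, key, ",\n".join(lines))
```

Calling to_bibtex(extract_fields(page_html)[0]) on a saved page should give a raw entry
in the style shown above, and translate=True keeps only the mapped fields. The sketch
does no bibtex escaping (for instance of "$"), so treat its output as a starting point.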


I was originally doing this to get person profiles from LinkedIn and BillOfMaterials entries
from Amazon, but both have had problems with naive headless Chrome (and I hate nodejs
for puppeteer lol). But if you write a lot of popular articles and want to cite news sources, this
should appeal to a big academic and political audience (as if rambling political compositions are
scarce on the internet lol).

If you are talking to Zotero or want to look at alternatives or just want to scrape pages,
I think an open source collection of hacks like my TooBib has some place :)









________________________________________
From: Jonathan Fine <jfine2358 at gmail.com>
Sent: Wednesday, May 5, 2021 5:33 AM
To: Mike Marchywka
Cc: TeXhax
Subject: Generating bibtex entry from a URL: zotero, zbib and TooBib.

Was: Re: crediting contributions to a work- people or things like photographs. What tools do people use?

Hi Mike

You wrote:

I have also been working on a script, now a c++ program called TooBib that invokes mostly bash utilities, to generate a bibtex entry from just about any url (right now this just works with journal articles and similar).

It seems that https://www.zotero.org/, via https://zbib.org/, already provides this or similar functionality (and much more besides).

For example I pasted https://arxiv.org/abs/2104.12015 into https://zbib.org/. In Export I then chose Download BibTeX from the dropdown menu. This gave me

@article{jones_kleins_2021,
title = {Klein's ten planar dessins of degree 11, and beyond},
url = {http://arxiv.org/abs/2104.12015},
abstract = {We reinterpret ideas in Klein's paper on transformations [snip] arise as permutation groups and monodromy groups of degree \$p\$ (an open problem in group theory).},
urldate = {2021-05-05},
journal = {arXiv:2104.12015 [math]},
author = {Jones, Gareth A. and Zvonkin, Alexander K.},
month = apr,
year = {2021},
note = {arXiv: 2104.12015},
keywords = {Mathematics - Group Theory, Mathematics - Algebraic Geometry, Mathematics - Number Theory, 05C10, 11G32, 11N13, 11N32, 14H57, 20B20, 20B25},
}

It also gave me, via Link to this version, a permanent link to this bibliography item: https://zbib.org/a310f7062faa4b4bba050ede5de85bf8

By the way, I found your TooBib technical report on https://independent.academia.edu/mikemarchywka (registration required to download).


with best regards

Jonathan




