bibtex etc from ieee, interesting thing called "xplglobal.document.metadata"

Mike Marchywka marchywka at hotmail.com
Mon Aug 2 14:22:14 CEST 2021



I have been continuing to use and fix my BibTeX extraction tool,
"TooBib", and recently encountered an IEEE link. I had not yet
imported the IEEE handler from my old bash script, so I copied it
over. It appears to work the same as Zotero, which likely requires a
specialized handler for IEEE pages too. (Just to note in passing,
I found one publisher that was temporarily blocking headless Chrome,
but somehow Zotero still worked, although AFAICT Zotero doesn't
handle CiteSeer PDF files...)
IEEE pages have an article number you need to extract, and then there
is a way to query for a BibTeX entry.
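
A minimal sketch of the article-number step, assuming the number is
the run of digits right after "/document/" in an IEEE Xplore URL. The
function name article_number is hypothetical and is not taken from
TooBib; the follow-up BibTeX query is not shown.

```python
import re


def article_number(url):
    """Return the IEEE article number found in an Xplore URL, or None."""
    # Assumption: the article number is the digits following "/document/".
    match = re.search(r"/document/(\d+)", url)
    return match.group(1) if match else None


print(article_number("https://ieeexplore.ieee.org/abstract/document/9406427"))
# prints 9406427
```
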
But anyway, looking at this link,

wget -O xxx.html -S -v "https://ieeexplore.ieee.org/abstract/document/9406427"

the page has additional, really good info in JSON format, similar to
link-data JSON but indicated with something called
"xplGlobal.document.metadata".
I have no interest in learning web technologies beyond what I need,
but I was curious about this. Searching for "xplglobal json" turned up
only a few Google hits; the first was this:

https://www.programmersought.com/article/73898067506/

about crawling IEEE web pages. I guess they have a popular citation
database.  
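
A rough sketch of pulling that embedded JSON out of a saved page,
assuming it appears in the HTML as an assignment of the form
xplGlobal.document.metadata={...}; inside a script tag. The function
name extract_metadata, the sample HTML, and the key names "title" and
"articleNumber" are made up for illustration and may not match the
real page exactly.

```python
import json
import re

MARKER = "xplGlobal.document.metadata"


def extract_metadata(html):
    """Return the JSON object assigned to MARKER in html, or None."""
    # Find the marker followed by "=", allowing optional whitespace.
    match = re.search(re.escape(MARKER) + r"\s*=\s*", html)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (the trailing ";" and the rest of the page).
    obj, _end = json.JSONDecoder().raw_decode(html, match.end())
    return obj


# Made-up sample standing in for the page saved by wget.
sample = (
    "<script>xplGlobal.document.metadata="
    '{"title": "An Example", "articleNumber": "9406427"};</script>'
)
meta = extract_metadata(sample)
print(meta["articleNumber"])
# prints 9406427
```

For a real page, the text of the saved xxx.html file would be passed
in place of the sample string. Using raw_decode avoids having to guess
where the JSON ends with a regular expression.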

I'm kind of curious whether anyone knows anything about this.
I'm generalizing my JSON code to just make a BibTeX entry
with all the fields it finds, translating some field names.
IIRC the BibTeX docs suggested it was OK to add extra fields,
but this can be a bit extreme :) I did not notice the
Zotero web form doing this by default, but in case
I need a mass correction, being able to go back to the original
source web page or look at all the fields helps.
I'm not sure if Zotero just includes whatever URL
is entered by the user or whatever is in the BibTeX
source it returns. I've also started to add meta fields to
the BibTeX entry describing what TooBib did to the original...
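
The "dump every field, translate a few names" idea could be sketched
roughly as below. This is not TooBib's actual code: the function name
to_bibtex, the rename table, and the sample keys and values are all
hypothetical, nested values are simply skipped, and braces inside
values are not escaped.

```python
def to_bibtex(meta, key, entry_type="article", rename=None):
    """Build a BibTeX entry string from a flat dict of metadata."""
    rename = rename or {}
    lines = ["@%s{%s," % (entry_type, key)]
    for name, value in meta.items():
        if isinstance(value, (dict, list)):
            # Nested values (author lists and so on) need their own
            # handling; this sketch leaves them out.
            continue
        # Translate the field name if a mapping exists, else keep it.
        field = rename.get(name, name).lower()
        lines.append("  %s = {%s}," % (field, value))
    lines.append("}")
    return "\n".join(lines)


# Hypothetical metadata and translation table, for illustration only.
meta = {
    "title": "An Example",
    "publicationTitle": "Some Journal",
    "articleNumber": "9406427",
    "authors": [{"name": "A. Author"}],
}
print(to_bibtex(meta, "example2021", rename={"publicationTitle": "journal"}))
```

Keeping untranslated names as extra fields means nothing from the
source is lost, at the cost of a rather long entry.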

 
The results are still not "pretty" unless the original
bibtex from the source is clean but I'm working on 
generalized clean up :)

Eventually I will probably have to learn Puppeteer or
the headless Chrome JavaScript evaluation facility...

Thanks.

-- 

mike marchywka
306 charles cox
canton GA 30115
USA, Earth 
marchywka at hotmail.com
404-788-1216
ORCID: 0000-0001-9237-455X

