problem with foreign letters in names apparently from crossref,

Tue Feb 8 02:01:48 CET 2022

On Mon, Feb 07, 2022 at 05:43:59PM -0600, Don Hosek wrote:
>      Thanks, I would not have clicked on the JATS/XML link even if I had found it :)
> 
>    I guess I can add the capability to look for JATS now, but it will take some
>    time, and then I need to deal with the non-ASCII consistently. It looks
>    like most new stuff is going to JSON, and dealing with XML conversions
>    may be a throwback but is important to have…
> 
>    In general, you’ll find that most if not all text on the internet these days will be Unicode, and in particular UTF-8. Most
>    modern toolchains handle it fine. What are you doing that keeps the characters from coming out correctly after
>    processing?

I have not bothered to look into this as almost all of the stuff I care
about is ASCII.
I just brought this up since it seemed wrong coming from crossref.

A lot of the code calls Linux command-line utilities or some simple parsers
for HTML and JSON. While I have worked with these, I have not bothered
to look at anything beyond 7-bit ASCII and am not sure how, or if, they handle
multi-byte characters.
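
For what it is worth, byte-oriented code often survives UTF-8 unchanged,
because every byte of a multi-byte UTF-8 sequence has its high bit set and
so can never be mistaken for an ASCII delimiter. A minimal sketch of that
property follows; the function names is_ascii and utf8_length are
hypothetical, and well-formed UTF-8 input is assumed.

```cpp
#include <cstddef>
#include <string>

// True if every byte is 7-bit ASCII (high bit clear).
bool is_ascii(const std::string& s) {
    for (unsigned char c : s) {
        if (c & 0x80) return false;
    }
    return true;
}

// Count code points in a UTF-8 string by skipping continuation
// bytes, which always look like 10xxxxxx. Assumes well-formed input;
// no validation is done here.
std::size_t utf8_length(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

With the name "Erd\xC5\x91s" (the two bytes C5 91 encode U+0151),
is_ascii returns false, size() reports 6 bytes, and utf8_length reports
5 characters.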

My C++ code uses a lot of typedef'd strings that I guess could easily be
set to use wide characters, and I have a char class parser that is perfectly
general up to at least int-size chars, probably. However, I still use 8-bit char
for characters in places, routinely test against ASCII values, etc.
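
One possible shape for the wide-character route is sketched below: keep
raw UTF-8 in a byte string typedef and decode it into a string with one
code point per element. The typedef names ByteStr and WideStr and the
functions decode_utf8 and is_ascii_alpha are hypothetical, not taken
from the code described above. The decoder substitutes U+FFFD for bytes
it cannot decode and, as an assumption to keep it short, does not reject
overlong encodings or surrogates.

```cpp
#include <cstddef>
#include <string>

// Hypothetical typedefs: raw UTF-8 bytes vs. one code point per element.
typedef std::string ByteStr;
typedef std::u32string WideStr;

// Decode UTF-8 into code points. A byte that cannot start or complete
// a sequence becomes U+FFFD and decoding resumes at the next byte.
WideStr decode_utf8(const ByteStr& in) {
    WideStr out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (c < 0x80)                { cp = c;        extra = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; }

        bool ok = (i + extra < in.size());  // false if input is truncated
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (b & 0x3F);
        }
        if (!ok) { out.push_back(0xFFFD); ++i; continue; }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}

// An ASCII-based test still works on code points because it checks
// explicit ranges rather than indexing a 128- or 256-entry table.
bool is_ascii_alpha(char32_t cp) {
    return (cp >= U'a' && cp <= U'z') || (cp >= U'A' && cp <= U'Z');
}
```

Decoding "Erd\xC5\x91s" this way gives five elements, the fourth being
0x151, and is_ascii_alpha is true for the 'E' but false for that fourth
element.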

I'm not sure what would be involved in making this work uniformly
with larger char sizes. That said, I have been pretty impressed
by how easy it can be to convert floating point code to use
rationals with the GMP library by changing a few C++ typedefs :)
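
To illustrate the kind of swap meant here: numeric code written against
a single type name can move from double to an exact rational without
touching the arithmetic. The sketch below uses a small stand-in struct
named Rational so that it compiles without GMP; with GMP installed, the
mpq_class type from <gmpxx.h> would play that role. Rational, gcd_ll and
sum_of_tenths are hypothetical names, and the stand-in does no overflow
or zero-denominator checking.

```cpp
// Greatest common divisor of the absolute values.
static long long gcd_ll(long long a, long long b) {
    if (a < 0) a = -a;
    if (b < 0) b = -b;
    while (b != 0) { long long t = a % b; a = b; b = t; }
    return a;
}

// Stand-in for an exact rational type such as GMP's mpq_class.
// Always stored in lowest terms with a positive denominator.
struct Rational {
    long long num, den;
    Rational(long long n = 0, long long d = 1) : num(n), den(d) {
        if (den < 0) { num = -num; den = -den; }
        long long g = gcd_ll(num, den);
        if (g != 0) { num /= g; den /= g; }
    }
};

Rational operator+(Rational a, Rational b) {
    return Rational(a.num * b.den + b.num * a.den, a.den * b.den);
}
Rational operator/(Rational a, Rational b) {
    return Rational(a.num * b.den, a.den * b.num);
}
bool operator==(Rational a, Rational b) {
    return a.num == b.num && a.den == b.den;
}

// The arithmetic is written once against Num; a typedef such as
// "typedef Rational Num;" would serve the same purpose as the
// template parameter.
template <typename Num>
Num sum_of_tenths(int n) {
    Num total = Num(0);
    for (int i = 0; i < n; ++i) total = total + Num(1) / Num(10);
    return total;
}
```

Here sum_of_tenths<Rational>(10) is exactly Rational(1), whereas the
double version accumulates the usual rounding error from 0.1 not being
representable in binary.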

>    -dh

-- 

mike marchywka
306 charles cox
canton GA 30115
USA, Earth 
marchywka at hotmail.com
404-788-1216
ORCID: 0000-0001-9237-455X