problem with foreign letters in names apparently from crossref,
Mike Marchywka
marchywka at hotmail.com
Tue Feb 8 02:01:48 CET 2022
On Mon, Feb 07, 2022 at 05:43:59PM -0600, Don Hosek wrote:
> Thanks, I would not have clicked on the jats/xml link even if I had found it :)
>
> I guess I can add the capability to look for JATS now, but it will take some
> time, and then I need to deal with non-ASCII consistently. It looks
> like most new stuff is moving to JSON, and dealing with XML conversions
> may be a throwback, but it's important to have…
>
> In general, you’ll find that most if not all text on the internet these days will be Unicode, and in particular UTF-8. Most
> modern toolchains handle it fine, what are you doing that you’re not getting the characters coming out correctly after
> processing?
I have not bothered to look into this, since almost everything I care
about is ASCII.
I only brought it up because it seemed wrong coming from Crossref.
A lot of the code calls Linux command-line utilities or some simple parsers
for HTML and JSON. While I have worked with these, I have never looked
at anything beyond 7-bit ASCII, and I'm not sure how, or whether, they handle
wider characters.
My C++ code uses a lot of typedef'd strings that I suppose could easily be
switched to wide characters, and I have a character-class parser that should be
general up to at least int-sized characters. However, I still use 8-bit char
for characters in places and routinely test against ASCII ranges, etc.
I'm not sure what would be involved in making this work uniformly
with larger character sizes, although I have been pretty impressed
by how easy it can be to convert floating-point code to
rationals with the GMP library just by changing a few C++ typedefs :)
> -dh
--
mike marchywka
306 charles cox
canton GA 30115
USA, Earth
marchywka at hotmail.com
404-788-1216
ORCID: 0000-0001-9237-455X
More information about the texhax mailing list.