[tex-hyphen] Google Books corpus
Stephan Hennig
mailing_list at arcor.de
Wed Jul 13 15:08:22 CEST 2011
Aleks Kleyn wrote:
> I agree with you. However, we do not have the code behind it. We do not
> know how many rules the Google team uses in its code. Also, German has
> one set of rules and French has another, and a single book can contain
> several languages at the same time. But we, as human beings, can fix
> errors when we read the text.
Sure. I did not intend to blame Google for the corpus quality. I'm
sure they put a lot of effort into achieving high OCR quality, even for
books with poor print quality. That said, I was astonished to see
those typical errors, which are relatively low-hanging fruit.
> Because of his business, he relies on PC applications; however, due to
> his lack of knowledge, he does not notice when the PC gives a wrong answer.
But that's the future, I fear. Even more than today. :-(
Best regards,
Stephan Hennig