[tex-hyphen] Google Books corpus

Stephan Hennig mailing_list at arcor.de
Wed Jul 13 15:08:22 CEST 2011


Aleks Kleyn wrote:

> I agree with you. However, we do not have the code behind it. We do not
> know how many rules the Google team uses in its code. Also, German has
> one set of rules, French has another. A book can contain several
> languages at the same time. But we, as human beings, can fix the text
> when we look at it.

Sure.  I did not intend to blame Google for the corpus quality.  I'm
sure they put a lot of effort into achieving high OCR quality even for
books that had bad print quality.  That said, I was astonished to see
those typical errors, which are relatively low-hanging fruit.


> Because of his business he relies on a PC application, but because of
> his lack of knowledge he does not see when the PC gives a wrong answer.

But that's the future, I fear.  Even more than today. :-(

Best regards,
Stephan Hennig

