From hartmut.niemann at siemens.com Tue Jun 11 12:06:41 2024
From: hartmut.niemann at siemens.com (Niemann, Hartmut)
Date: Tue, 11 Jun 2024 10:06:41 +0000
Subject: [XeTeX] Typesetting arabic and european mix encoded in utf8
Message-ID:

Hello!

In my current project I use XeLaTeX to typeset PDF files from texts in
different languages held in a separate database.
(This is done with a generator that is language-unaware, generating lines like

\long\def\msgtext{??? ?? ??????? ??????????? GS}

into a .inc file, and a manually written, language-dependent frame
document that defines \msgtext{}.)

I typeset a (mostly) Arabic document using XeLaTeX and
\usepackage[utf]{arabxetex}

arabxetex supports encoding Arabic in ASCII, and this interferes with the
Latin characters in our texts, such as English abbreviations, location IDs
and the like.
The documented solution is to enclose the Latin characters that are to be
typeset verbatim in \textLR{...}, which is rather hard if the text comes
from a database.

Does anybody know how to switch off arabxetex's ASCII-to-Arabic conversion
completely?

Or is there a package that supports Arabic (with Arabic typographic
conventions) but is made for pure Unicode sources?

With best regards

Hartmut Niemann

From zdenek.wagner at gmail.com Tue Jun 11 12:22:19 2024
From: zdenek.wagner at gmail.com (Zdenek Wagner)
Date: Tue, 11 Jun 2024 12:22:19 +0200
Subject: [XeTeX] Typesetting arabic and european mix encoded in utf8
In-Reply-To:
References:
Message-ID:

Hello,

I have not used Arabic, but I have used Urdu, which uses a modified Arabic
script. I have a book written in Czech with just small parts in Hindi and
Urdu, and I typeset it in XeLaTeX with the polyglossia package. A very
small sample of the book is here:
http://icebearsoft.euweb.cz/bharat.php
The page contains a link to a presentation about typesetting the book.
The slides are in Czech because it was a national conference, but slide #10
shows that the line break in the Urdu text is correct although the main
language of the paragraph is Czech.

Zdeněk Wagner
https://www.zdenek-wagner.eu/

From hartmut.niemann at siemens.com Tue Jun 11 16:36:31 2024
From: hartmut.niemann at siemens.com (Niemann, Hartmut)
Date: Tue, 11 Jun 2024 14:36:31 +0000
Subject: [XeTeX] Typesetting arabic and european mix encoded in utf8
In-Reply-To:
References:
Message-ID:

Hello Zdeněk,

thank you for your hints. What a wonderful book!
I'll take the TeX code of your tools package as a start and experiment with it.
Maybe I will need to adapt it to mark the Latin characters and not rely on
automatic switching between Latin LTR and Arabic RTL.

Hartmut

From jbakker at jbakker.de Tue Jun 11 16:46:43 2024
From: jbakker at jbakker.de (Jens Bakker)
Date: Tue, 11 Jun 2024 16:46:43 +0200
Subject: [XeTeX] Typesetting arabic and european mix encoded in utf8
In-Reply-To:
References:
Message-ID:

Hello Hartmut Niemann,

it may be that the XeLaTeX package polyglossia would serve your purposes
much better. You can use many languages in one document, including Arabic
and other RTL text.

Best wishes and best regards,
Jens Bakker

From hartmut.niemann at siemens.com Mon Jun 24 10:19:20 2024
From: hartmut.niemann at siemens.com (Niemann, Hartmut)
Date: Mon, 24 Jun 2024 08:19:20 +0000
Subject: [XeTeX] Typesetting arabic and european mix encoded in utf8
In-Reply-To:
References:
Message-ID:

Hello Jens,

yes, it does. I have switched from arabxetex to polyglossia, needed to fix
a few font specifications, and now the results look correct (says our
Egyptian intern).
The automatic RTL-LTR switching based on the Unicode script information
works perfectly, with the special case that we will use non-breaking
spaces (U+00A0) in some places to keep the sequence of non-Arabic text
fragments as needed.

Thank you for your help!

Hartmut

From jbakker at jbakker.de Mon Jun 24 21:44:24 2024
From: jbakker at jbakker.de (Jens Bakker)
Date: Mon, 24 Jun 2024 21:44:24 +0200
Subject: [XeTeX] Typesetting arabic and european mix encoded in utf8
In-Reply-To:
References:
Message-ID: <7F5126C3-4114-4F3A-ACC0-E4E183C08627@jbakker.de>

Hello Hartmut,

thank you very much for your kind reply. I am happy that this very modest
hint was useful for you.

With best wishes and best regards,
Jens
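
The setup the thread ends up with (polyglossia instead of arabxetex, pure
Unicode input, U+00A0 between Latin fragments) might look roughly like the
sketch below. Several things in it are assumptions and do not come from
the thread: the font (Amiri), the file name messages.inc, and the Arabic
sample words, which are placeholder text because the original string is
not recoverable from the archive. ^^^^00a0 is XeTeX's input notation for
the character U+00A0; it is written out only so that the non-breaking
space is visible in the source.

```latex
% Compile with xelatex. A sketch under the assumptions named above,
% not the actual frame document from the thread.
\documentclass{article}
\usepackage{polyglossia}
\setmainlanguage{arabic}   % right-to-left paragraphs, Arabic conventions

% Amiri is a placeholder; any installed font that covers both the
% Arabic and the Latin characters of the messages could be used.
\newfontfamily\arabicfont[Script=Arabic]{Amiri}

% In the real project this line is written by the generator into a
% .inc file and read with something like \input{messages.inc}
% (hypothetical file name). The Arabic words are placeholder text.
% ^^^^00a0 is U+00A0 and keeps "GS" and "12" together as one fragment.
\long\def\msgtext{مرحبا بالعالم GS^^^^00a012}

\begin{document}
\msgtext
\end{document}
```

With an ordinary space between "GS" and "12", each Latin fragment would be
placed on its own in the right-to-left line, so their visual order could
come out reversed; that is presumably why the U+00A0 is needed in some
places.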