[pdftex] typesetting Hàn Thế Thành's name in pdfTeX

Thu Mar 26 22:59:59 CET 2009

On 27/03/2009, at 4:37 AM, Paulo Ney de Souza wrote:

> Is it possible to typeset Hàn Thế Thành's name using full UTF-8
> input in pdfTeX, that is, without gimmicks in the accent on the
> letter "e"?

Certainly.
Try this:

>> \documentclass[11pt]{article}
>> \usepackage[utf8]{inputenc}
>> \usepackage[vietnam,USenglish]{babel}
>> \begin{document}
>>
>> \begin{otherlanguage}{vietnam}
>> \subsubsection{Vietnamese : }
>>  Hàn Thế Thành
>> \end{otherlanguage}
>> \end{document}

>
> I can see that his name is not properly accented in some of the  
> original documentation written by himself:
>
>     http://sarovar.org/docman/view.php/106/66/pdftex-s.pdf
>
> and that some other documentation:
>
>      http://sarovar.org/docman/view.php/106/64/pdftex-a.pdf.pdf
>    barely accentuates it properly:
>
>     \def\THANH{H\`an Th\^e\llap{\raise 0.5ex\hbox{\'{}}} Th\`anh}
>
> The rules of accentuation in Vietnamese say the marks are supposed  
> to be vertically centered, on top of each other - which is  
> definitely not the case in the macro above...
>
>     http://pjm.math.berkeley.edu/users/paulo/Snap1.gif

Using UTF-8 the position of the accent may depend upon what the
font supports.  At least this would be the situation using XeTeX
which just passes the bytes to your OS's font-rendering engine.
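
For comparison, a minimal sketch of the XeTeX route. It assumes the file
is compiled with xelatex and that the default fontspec font (Latin Modern)
contains the Vietnamese glyphs; neither assumption comes from the example
above, and another font may place the accents differently.

```latex
% Compile with xelatex, not pdflatex.
\documentclass[11pt]{article}
\usepackage{fontspec} % assumption: the default Latin Modern font covers Vietnamese
\begin{document}
Hàn Thế Thành
\end{document}
```

Here no inputenc or font-encoding macros are involved, so the result
depends entirely on the glyphs the chosen font provides.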

With pdfTeX and Babel (as above), you would need to do detailed
tracing to see just what happens, since it depends on the properties
of the font encoding being used. With my simple example above,
here's a snippet of the trace for the letter "ế":

\UTFviii@three@octets #1#2#3->\expandafter \UTFviii@defined \csname u8:#1\string #2\string #3\endcsname
#1<-?
#2<-?
#3<-?
{\expandafter}
{\csname}
{\string}
{\string}

   (Thus we start with 3 bytes...)

...

\@text@composite #1#2#3\@text@composite ->\expandafter \@text@composite@x \csname \string #1-\string #2\endcsname
#1<-\T5\'
#2<-\ecircumflex
#3<-\@empty
{\expandafter}
{\csname}
{\string}
{\string}

\@text@composite@x #1->\ifx #1\relax \expandafter \@secondoftwo \else \expandafter \@firstoftwo \fi #1
#1<-\\T5\'-\ecircumflex
{\ifx}
{false}
{\expandafter}
{\fi}

\@firstoftwo #1#2->#1
#1<-\\T5\'-\ecircumflex
#2<-\add@accent {1}{\ecircumflex }

\\T5\'-\ecircumflex ->?
{blank space  }
{end-group character }}

   ... and finish up with a single character.
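
A trace of this kind can be reproduced along the following lines. This is
a minimal sketch, not necessarily the exact setup used for the snippet
above: it loads the T5 font encoding directly instead of going through
Babel, and it assumes the T5 fonts (for example from the vntex bundle)
are installed. The tracing parameters are standard TeX primitives, and
the group keeps them local to the one letter of interest.

```latex
\documentclass[11pt]{article}
\usepackage[T5]{fontenc}    % assumption: T5-encoded fonts are installed
\usepackage[utf8]{inputenc}
\begin{document}
% Trace only the three-byte character; the braces restore the old
% tracing values afterwards, so the log stays short.
Hàn Th{\tracingmacros=2 \tracingcommands=2 \tracingonline=1 ế} Thành
\end{document}
```

With \tracingonline=1 the macro expansions are shown on the terminal as
well as in the .log file; drop that setting to write them to the log only.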

>
> Paulo Ney de Souza
> Mathematical Sciences Publishers

Hope this helps,

	Ross

------------------------------------------------------------------------
Ross Moore                                       ross at maths.mq.edu.au
Mathematics Department                           office: E7A-419
Macquarie University                             tel: +61 (0)2 9850 8955
Sydney, Australia  2109                          fax: +61 (0)2 9850 8114
------------------------------------------------------------------------