[XeTeX] Urdu script problems

maxwell maxwell at umiacs.umd.edu
Tue May 26 19:40:44 CEST 2009


We're using XeTeX to typeset an Urdu grammar.  The grammar itself is
written in DocBook XML, then converted to XeTeX using the dblatex program
(dblatex.sourceforge.net).  We're using the Nafees Nastaleeq font, version
1.02 (www.crulp.org/software/localization/Fonts/nafeesNastaleeq.html) for
the Urdu script.

We've encountered problems with a few characters.  We checked with Dr.
Sarmad Hussain, the head of CRULP (Center for Research in Urdu Language
Processing), where the Nafees Nastaleeq font was developed.  When he runs
some of our problematic words through Microsoft Word to produce a PDF, the
characters come out fine.  So my guess is that XeTeX is doing something
wrong, perhaps choosing the wrong glyph for some of the characters.  Or
perhaps I don't understand the options to \newfontface (for example, how to
tell it that the language is Urdu, if that matters).
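One way to separate XeTeX itself from fontspec, bidi and dblatex is to load
the font with XeTeX's own \font primitive in a plain-XeTeX file.  The sketch
below relies on XeTeX's bracketed file-name syntax and its script=/language=
font-feature keys, as I understand them; the font path is a hypothetical
placeholder.  It sets each problem word twice: once with only the Arabic
script tag, and once also requesting the OpenType Urdu language system
(URD).  If the font has no URD entry, the two columns should presumably look
identical.  If the glyphs are still wrong here, that would point to XeTeX's
OpenType layout (or the font) rather than to the LaTeX packages.

```latex
% plain-test.tex: load Nafees Nastaleeq with XeTeX's own \font primitive,
% bypassing fontspec, bidi and dblatex.  Run with:  xetex plain-test.tex
% HYPOTHETICAL path: replace /path/to/your/fonts/ with your font directory.
\font\urdu="[/path/to/your/fonts/NafeesNastaleeq_v1_02.ttf]:script=arab" at 16pt
% Same font, additionally requesting the OpenType Urdu language system (URD).
\font\urduURD="[/path/to/your/fonts/NafeesNastaleeq_v1_02.ttf]:script=arab;language=URD" at 16pt
\TeXXeTstate=1     % enable \beginR ... \endR for right-to-left runs
\baselineskip=36pt % Nastaliq needs far more line space than plain's 12pt

% Left: script=arab only.  Right: script=arab plus language=URD.
\noindent{\urdu\beginR سجانا\endR}\quad{\urduURD\beginR سجانا\endR}\par
\noindent{\urdu\beginR مساجد میں\endR}\quad{\urduURD\beginR مساجد میں\endR}\par
\noindent{\urdu\beginR حقوق\endR}\quad{\urduURD\beginR حقوق\endR}\par
\noindent{\urdu\beginR سلیم\endR}\quad{\urduURD\beginR سلیم\endR}\par
\bye
```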

Anyway, I'm attaching a semi-minimal document that illustrates the problems.
(It is semi-minimal in that you'll need to tell fontspec where you keep the
Nafees Nastaleeq font; my 'ExternalLocation' probably won't work for you.)
It shows several of the problems we have had; there are a few others, but
if I can fix these, perhaps the others will clear up as well.  I'm also
attaching PDFs showing the wrong and the right output: SampleProblems.pdf is
our output from running xelatex on the attached SampleProblems.xelatex (it
shows the wrong alef, etc.), while CorrectOutput.pdf is a PDF we received
from Dr. Hussain.  (It's not quite the same document, but it does show the
correct forms of the Urdu words that come out wrong in our output.  I can't
reproduce the correct output in Word myself, for reasons I don't quite
understand; possibly a difference in Word versions.)
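Since the font location is the one thing others have to change, a small
guard in the preamble can make a wrong path obvious before fontspec fails.
This is a sketch, assuming the font file sits in a directory outside the TeX
tree: /path/to/your/fonts/ is a hypothetical placeholder and must match the
ExternalLocation value (including the trailing slash).  \IfFileExists and
\PackageWarningNoLine are standard LaTeX2e commands; the name 'urdu-test' in
the warning is made up for illustration.

```latex
% HYPOTHETICAL path: replace /path/to/your/fonts/ in both places below.
% Warn early if the font file is not where we expect it to be.
\IfFileExists{/path/to/your/fonts/NafeesNastaleeq_v1_02.ttf}{}{%
  \PackageWarningNoLine{urdu-test}%
    {NafeesNastaleeq_v1_02.ttf not found; check ExternalLocation}}
\newfontface\urdufont[Script=Arabic,ExternalLocation=/path/to/your/fonts/]%
  {NafeesNastaleeq_v1_02.ttf}
```

These lines would replace the existing \newfontface line in the preamble,
after \usepackage{fontspec}.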

I should also say that we're still running xe(la)tex version
3.141592-0.996.  We downloaded the newer version (TeX Live 2008), but
haven't yet worked around some installation glitches (particularly the
hyphenation problem: I can't recall where the 8-bit hyphenation files are,
so I haven't been able to convert them to UTF-8).  If upgrading would solve
our problem, that would be good to know.
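To make comparisons between the old and new installations (or with other
people's output) easier, the versions in use can be written into the log and
into the PDF.  A minimal sketch: \XeTeXversion and \XeTeXrevision are XeTeX
primitives, and \ver@<package>.sty is where LaTeX stores the date and
version that a package declares with \ProvidesPackage.  If a package does
not declare one, its line prints nothing in the PDF (and \relax in the log).

```latex
% Report engine and package versions (place after \begin{document}).
% In the terminal and log file:
\typeout{XeTeX version: \the\XeTeXversion\XeTeXrevision}
\typeout{fontspec: \csname ver@fontspec.sty\endcsname}
\typeout{bidi: \csname ver@bidi.sty\endcsname}
% And in the PDF itself:
\noindent XeTeX \the\XeTeXversion\XeTeXrevision\par
\noindent fontspec \csname ver@fontspec.sty\endcsname\par
\noindent bidi \csname ver@bidi.sty\endcsname\par
```

For the XeTeX 3.141592-0.996 mentioned above, the first line should read
0.996.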

   Mike Maxwell
   CASL/ U MD
-------------- next part --------------
\documentclass[12pt,letterpaper]{report}
\usepackage{fontspec}
\setmainfont{Charis SIL}
\usepackage{bidi}

% Urdu script:
% The font is CRULP's Nafees Nastaleeq, version 1.02, downloaded from
% http://www.crulp.org/software/localization/Fonts/nafeesNastaleeq.html
% Change ExternalLocation to the directory where you keep the font file.
\newfontface\urdufont[Script=Arabic,ExternalLocation=/groups/tools/fonts/]{NafeesNastaleeq_v1_02.ttf}
% Specifying 'Language=Urdu' doesn't work: the Nafees font doesn't list that
% language (even though it is designed specifically for Urdu).
% \urdu{...} typesets its argument right-to-left in the Urdu font:
\newcommand{\urdu}[1]{{\RL{\urdufont #1}}}

\begin{document}
\urdu{سجانا} /sajānā/ (The aleph has an odd hook)
  
\urdu{مساجد میں} /masājid mēṁ/ (aleph hook again)

\urdu{حقوق} /huqūq/ (the vav is not connecting correctly)

\urdu{سلیم} (the meem is not connecting well)
\end{document}
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CorrectOutput.pdf
Type: application/pdf
Size: 55602 bytes
Desc: not available
URL: <http://tug.org/pipermail/xetex/attachments/20090526/ddff47b5/attachment-0002.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SampleProblems.pdf
Type: application/pdf
Size: 25587 bytes
Desc: not available
URL: <http://tug.org/pipermail/xetex/attachments/20090526/ddff47b5/attachment-0003.pdf>

