[metapost] Glyph names in METATYPE1

Mon Mar 4 17:05:53 CET 2013

On Sat, 2 Mar 2013, Boguslaw Jackowski wrote:
> As we wrote some time ago, we'd be more than happy if you could
> derail a few more discussions personally at a BachoTeX meeting...

I haven't forgotten!  Unfortunately, my life is kind of a mess right now -
my job search isn't going well - and I don't know what my availability for
travel will be in the next few months.  But I'll be keeping an eye out for
the EuroBachoTeX call for papers, and without making any promises at this
point, if things go well it might happen that I'd be able to submit
something and come out to the meeting.

> > I think it might work to insert an underscore before every character,
> > turning "uni4E00" into "_u_n_i_4_E_0_0".
>
> We came to nearly the same conclusion, although slightly different
> in details.
>
> Metatype1 provides a macro `assign_name' that makes life a little bit
> easier, although one may wonder if it is the best solution.

When I wrote my last message I hadn't looked much at this part of the
existing code, and I hadn't attempted to implement the "_u_n_i_4_E_0_0"
idea.  What I've been doing for a long time in my project has been to just
allow MT1 and Metapost to mangle the names, and then rename the glyphs in
a postprocessing step, from an externally-created list of "this code point
should have that name" pairs.  That approach works for me; my concern with
it was simply that changes in Metapost's parsing could cause it to fail in
the future.  Solving it in external postprocessing worked for me because I
was already doing a lot of other postprocessing, and I needed that code
point/name list for other reasons anyway.  My impression is that in your
project you hope to be able to do as much within Metapost as possible and
to have the output of MT1 be really usable Postscript fonts without
external postprocessing; doing all the processing in Metapost (or the AWK
script) is not so important for me.
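For comparison, the external renaming step described above might look roughly like the sketch below. It is a minimal Python sketch, not the actual Tsukurimashou code. The list format (one "HEXCODEPOINT glyphname" pair per line) and the names parse_name_list() and rename_plan() are hypothetical. The sketch only works out which names to change; it does not edit a font file.

```python
def parse_name_list(text: str) -> dict[int, str]:
    """Parse a hypothetical "HEXCODEPOINT glyphname" list, one pair per line."""
    wanted: dict[int, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and comments
            continue
        cp_hex, name = line.split()
        wanted[int(cp_hex, 16)] = name
    return wanted


def rename_plan(emitted: dict[int, str], wanted: dict[int, str]) -> dict[str, str]:
    """Map each name the font currently uses to the name it should have.

    emitted: code point -> glyph name as written by MT1/Metapost (possibly mangled).
    wanted:  code point -> desired glyph name from the external list.
    """
    plan: dict[str, str] = {}
    for cp, old in emitted.items():
        new = wanted.get(cp)
        if new is not None and new != old:
            plan[old] = new
    # Refuse any plan in which two glyphs would end up with the same final name.
    final = [plan.get(old, old) for old in emitted.values()]
    if len(final) != len(set(final)):
        raise ValueError("rename plan would produce duplicate glyph names")
    return plan


wanted = parse_name_list("4E00 uni4E00\n# comment\n0041 A\n")
# "uni4E0" stands in for a hypothetical mangled name emitted by Metapost.
print(rename_plan({0x4E00: "uni4E0", 0x41: "A"}, wanted))  # {'uni4E0': 'uni4E00'}
```

Keying the list by code point rather than by the mangled name means the postprocessing does not need to know how Metapost mangled anything. That is also why it would keep working after changes in Metapost's parsing, as long as the glyphs still end up in the expected slots.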

Subsequent to writing my last message, I took another look at that code.  I
discovered the assign_name mechanism, but I haven't attempted to use it.
I did change uni_name() (and several macros that make assumptions about
how uni_name works) to insert more underscores, but I also changed
beginglyph() to save the string it was given into a separate variable, and
to make sure that this saved string, rather than something derived from the suffix, is
written into the Postscript file.  The changes I've made so far are shown
here:
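The idea of keeping the original string alongside the suffix can be modelled in a few lines. This Python sketch is only an analogue of the beginglyph() change and is not MT1 code. GlyphTable, begin_glyph() and output_name() are hypothetical names. The internal key deliberately drops periods, mimicking the `str mu1.alt' behaviour, to show that the name written out stays exact even when the key cannot be converted back.

```python
def mangle(name: str) -> str:
    """Insert an underscore before every character: "uni4E00" -> "_u_n_i_4_E_0_0"."""
    return "".join("_" + ch for ch in name)


class GlyphTable:
    """Hypothetical model of the beginglyph() change: the suffix is only a key,
    and the string originally passed in is what gets written to the font."""

    def __init__(self) -> None:
        self._original: dict[str, str] = {}  # internal key -> original name

    def begin_glyph(self, name: str) -> str:
        # Assumed lossy step: periods vanish from the key, as with str mu1.alt.
        key = mangle(name).replace(".", "")
        previous = self._original.setdefault(key, name)
        if previous != name:
            raise ValueError(f"{name!r} collides with {previous!r}")
        return key

    def output_name(self, key: str) -> str:
        # Write the saved string, never a name reconstructed from the key.
        return self._original[key]


table = GlyphTable()
key = table.begin_glyph("mu1.alt")
print(key, table.output_name(key))  # _m_u_1__a_l_t mu1.alt
```

Saving the original string removes the need for the key to be reversible. It does not remove the need for distinct names to get distinct keys, which is why begin_glyph() still checks for collisions.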

   http://sourceforge.jp/projects/tsukurimashou/svn/view/trunk/mp/fntbase.mp?root=tsukurimashou&r1=342&r2=386

This is not a clean patch, and I'll probably modify and improve it in the
future.  I suspect that the existing ps_name variable might serve the same
purpose as my new original_glyph_name and so maybe it's not necessary to
add a new variable.  Also, these changes break the ability of beginglyph()
to take a suffix - it now accepts only a string.  That aspect might be easy
to change, but I found it easier in my own code to just always pass it a
string.  I suspect that some of MT1's internal data structures may end up
containing invalid data the way I'm using them, and I
haven't been observing errors just because I already removed a lot of code
for features I wasn't using (like hinting).  This works for me, but the
old code basically worked for me too.  I changed it only to improve
robustness to future Metapost changes.  I'll write another message in
response to Taco's, with my thoughts on those future Metapost changes.

> In our fonts, we prefer to use meaningful glyph names (suffixes)
> rather than names stuffed with underscores.

Right.  I'd certainly object to having the multi-underscore names written
out to the Postscript files.  However, in my project I don't need to refer
to glyphs by name in METAFONT-language code at all except when I call
beginglyph().  So I don't really care what suffixes MT1 uses internally as
long as they don't cause interpreter errors or name collisions.  This may
relate to the fact that I've already thrown out all the METAFONT-style proof
generation, which might have used those suffixes to label things in
diagrams; I am generating my proofs with external Perl scripts that read
other data files I create for the purpose.

> As you rightly noted, periods should be actually treated like digits.
> We trod onto this trap quite recently, because of names like `mu1.alt'
> (str mu1.alt yields, obviously, "mu1alt"). We will try to fix this as
> soon as possible (but not sooner).

If the period gets deleted in a suffix like "_m_u_1_._a_l_t", that's
probably not a big problem as long as periods are the only things that get
deleted - there's no other glyph name that could translate to
"_m_u_1__a_l_t".  We have three separate issues:  1. Can every string be
converted into something that parses as a valid suffix?  2. Can every
distinct string be converted into something that parses as a distinct
suffix?  3. Can every string, once converted into a suffix, be converted
back into the original string?  I'd be content with just #1 and #2.
Property #3 isn't so important because we can easily enough save the
original string and write that into the Postscript file.  But #1 and #2 are
important because we can't have the interpreter dying on a valid glyph
name, nor can we risk whatever strange effects might occur if two different
glyph names collide within data structures that are indexed by suffix.
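Property #2 can be checked mechanically at the string level. The Python sketch below assumes the underscore-insertion scheme quoted earlier. It also assumes, following the `mu1.alt' example, that periods are the only characters lost. It does not model Metapost's tokenizer, so it says nothing about property #1. The functions lossy_form() and find_collisions() are hypothetical helpers.

```python
from itertools import product


def mangle(name: str) -> str:
    """Insert an underscore before every character: "uni4E00" -> "_u_n_i_4_E_0_0"."""
    return "".join("_" + ch for ch in name)


def lossy_form(name: str) -> str:
    """Assumed model of the failure discussed above: only periods are lost."""
    return mangle(name).replace(".", "")


def find_collisions(alphabet: str, max_len: int) -> dict[str, list[str]]:
    """Group all names up to max_len characters by lossy form; keep groups of 2+."""
    groups: dict[str, list[str]] = {}
    for n in range(1, max_len + 1):
        for chars in product(alphabet, repeat=n):
            name = "".join(chars)
            groups.setdefault(lossy_form(name), []).append(name)
    return {form: names for form, names in groups.items() if len(names) > 1}


print(lossy_form("mu1.alt"), lossy_form("mu1alt"))  # _m_u_1__a_l_t _m_u_1_a_l_t
print(find_collisions("ab1.", 4))                    # {}
print(lossy_form("a._") == lossy_form("a_."))        # True
```

For names made of letters, digits and periods the search finds nothing, which agrees with the claim above. In that case, a lost period always leaves behind an underscore followed by another underscore or by the end of the string, which a letter or digit never does. Once underscores may appear in the input, as in ligature names like "f_i", pairs such as "a._" and "a_." do collide under this model. So property #2 depends on which characters the glyph names may contain.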

-- 
Matthew Skala
mskala at ansuz.sooke.bc.ca                 People before principles.
http://ansuz.sooke.bc.ca/