[XeTeX] traditional to simplified Chinese character conversion utility or data base

Andy Lin kiryen at gmail.com
Wed Oct 19 04:05:46 CEST 2011


You can try digging into the source code of Tong Wen Tang (a Firefox
extension), or email its developers. They should have a map and
additional notes on the conversion.

On Tue, Oct 18, 2011 at 18:50, Daniel Greenhoe <dgreenhoe at gmail.com> wrote:
> Hi Zdenek, Thank you for your suggestions.
>
> On Tue, Oct 18, 2011 at 2:53 PM, Zdenek Wagner <zdenek.wagner at gmail.com> wrote:
>> you can just use tr, ... if you supply the map.
>
> I don't know what "tr" is, but this brings me back to one of my
> original problems: I don't have a map. Does anyone know of a
> publicly available map? Such a map very likely exists. For example,
> Google Translate can translate from traditional to simplified, but
> even if it uses a map for this service, that map may be proprietary.
>
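One publicly documented source of such a map is the Unicode Consortium's Unihan database: its file Unihan_Variants.txt lists a kSimplifiedVariant field for many traditional characters, one tab-separated record per line. The sketch below builds a character table from those records. The tab-separated `U+XXXX<TAB>field<TAB>value` layout is an assumption about the file format, the sample lines only mimic that format, and keeping only the first listed variant is a simplification, because some characters list more than one simplified form.

```python
# Sketch: build a traditional -> simplified table from Unihan_Variants.txt.
# Assumption: records look like "U+99AC<TAB>kSimplifiedVariant<TAB>U+9A6C".
import io


def parse_codepoint(token: str) -> str:
    """Turn a token such as 'U+9A6C' into the character it names."""
    # Some Unihan fields append a source tag after '<'; drop it defensively.
    token = token.split("<", 1)[0]
    return chr(int(token[2:], 16))


def load_simplified_map(lines) -> dict[str, str]:
    """Map each traditional character to its first listed simplified variant."""
    mapping: dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        source, field, value = line.split("\t", 2)
        if field != "kSimplifiedVariant":
            continue  # ignore all other variant fields
        char = parse_codepoint(source)
        target = parse_codepoint(value.split()[0])  # keep only the first variant
        if target != char:  # skip characters listed as their own variant
            mapping[char] = target
    return mapping


# Sample lines imitating the file's format (not copied from the real file).
sample = io.StringIO(
    "# comment line\n"
    "U+99AC\tkSimplifiedVariant\tU+9A6C\n"
    "U+9A6C\tkTraditionalVariant\tU+99AC\n"
)
print(load_simplified_map(sample))  # {'馬': '马'}
```

A real run would pass an open file instead of the sample, e.g. `load_simplified_map(open("Unihan_Variants.txt", encoding="utf-8"))`, with the path being a placeholder.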
>> If you wish to do it on the fly in XeTeX, you can write a TECkit map.
>> Having the TECkit map you can also run txtconv from the command line.
>
> I like these solutions. However, again, I would still need a map. SIL
> has a collection of maps available here:
>  http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&cat_id=ConversionMaps
> But I didn't see a Chinese traditional-->simplified character map.
>
> Dan
>
> On Tue, Oct 18, 2011 at 2:53 PM, Zdenek Wagner <zdenek.wagner at gmail.com> wrote:
>> 2011/10/17 Daniel Greenhoe <dgreenhoe at gmail.com>:
>>> I know that this is not really the right mailing list for this
>>> question, but I have so far not found the answer by any other means
>>> ...
>>>
>>> I would like to find or write a utility that would take a
>>> Unicode-encoded file and map traditional Chinese characters to
>>> simplified ones, while leaving all other code points (such as those in
>>> the Latin and IPA blocks) untouched. For example, the traditional
>>> character for horse (馬) is at U+99AC, the simplified one (马) is at
>>> U+9A6C, and the Latin letter "A" is at U+0041. So I want a utility
>>> that would change 99AC to 9A6C but leave 0041 unchanged.
>>>
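The conversion described above is a per-code-point substitution, which Python's `str.translate` performs directly: every code point found in the table is replaced and every other code point (Latin, IPA, punctuation) passes through unchanged. This is a minimal sketch assuming a plain 1:1 character table; `TRAD_TO_SIMP` is a hypothetical two-entry excerpt standing in for a complete map.

```python
# Sketch: convert traditional Chinese characters to simplified ones one code
# point at a time, leaving every unmapped code point as it is.
# Assumption: TRAD_TO_SIMP is a hypothetical excerpt of a complete table.
from typing import BinaryIO

TRAD_TO_SIMP = {
    "\u99ac": "\u9a6c",  # 馬 -> 马 (horse)
    "\u9f8d": "\u9f99",  # 龍 -> 龙 (dragon)
}

# str.maketrans converts the character keys into the code-point keys
# that str.translate expects.
TABLE = str.maketrans(TRAD_TO_SIMP)


def to_simplified(text: str) -> str:
    """Replace every mapped code point; unmapped ones pass through unchanged."""
    return text.translate(TABLE)


def convert_stream(src: BinaryIO, dst: BinaryIO) -> None:
    """Read UTF-8 bytes from src and write the converted UTF-8 text to dst."""
    dst.write(to_simplified(src.read().decode("utf-8")).encode("utf-8"))
```

For example, `to_simplified("A\u99ac")` returns `"A\u9a6c"` (A马): the horse character changes and "A" does not. Used as a command-line filter, `convert_stream(sys.stdin.buffer, sys.stdout.buffer)` converts a whole UTF-8 file.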
>> If it really is a simple 1:1 mapping, you can just use tr; it does
>> exactly that if you supply the map. If you wish to do it on the fly in
>> XeTeX, you can write a TECkit map. With the TECkit map you can also
>> run txtconv from the command line.
>>
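One caveat on tr: GNU tr, at least, fully supports only single-byte characters, so it cannot map multibyte UTF-8 sequences such as 馬 directly. The TECkit route avoids that problem. The sketch below generates a TECkit mapping source file from a character table. The rule syntax (a `pass(Unicode)` section with one-way `U+XXXX > U+YYYY` rules and `;` comments) is modeled on XeTeX's own tex-text.map; the `LHSName`/`RHSName` values and the two-entry table are hypothetical. The resulting `.map` file would be compiled with `teckit_compile`, then used either through XeTeX's `mapping=` font option or with `txtconv`.

```python
# Sketch: write a TECkit mapping source (.map) from a character table.
# Assumption: syntax follows tex-text.map (Unicode pass, one-way '>' rules);
# the names and table entries below are hypothetical.

TRAD_TO_SIMP = {
    "\u99ac": "\u9a6c",  # 馬 -> 马 (horse)
    "\u9f8d": "\u9f99",  # 龍 -> 龙 (dragon)
}


def ucode(ch: str) -> str:
    """Format a character as a U+XXXX code point (at least 4 hex digits)."""
    return f"U+{ord(ch):04X}"


def teckit_map_source(table: dict[str, str]) -> str:
    """Return TECkit mapping source text with one rule per table entry."""
    lines = [
        "; traditional -> simplified Chinese (generated)",
        'LHSName "TC"',
        'RHSName "SC"',
        "",
        "pass(Unicode)",
    ]
    for trad, simp in sorted(table.items()):
        # One-way rule; characters without a rule pass through unchanged.
        lines.append(f"{ucode(trad)} > {ucode(simp)}")
    return "\n".join(lines) + "\n"


print(teckit_map_source(TRAD_TO_SIMP), end="")
```

Run as written, this prints the header followed by the rules `U+99AC > U+9A6C` and `U+9F8D > U+9F99`. The rule lines are kept ASCII-only so the generated file does not depend on how the compiler treats non-ASCII comments.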
>>> Does anyone know of such a utility? Does anyone know of any database
>>> with a traditional-to-simplified character mapping, so that I could
>>> perhaps write the utility myself?
>>>
>>> Many thanks in advance,
>>> Dan
>>>
>>>
>>>
>>> --------------------------------------------------
>>> Subscriptions, Archive, and List information, etc.:
>>>  http://tug.org/mailman/listinfo/xetex
>>>
>>
>>
>>
>> --
>> Zdeněk Wagner
>> http://hroch486.icpf.cas.cz/wagner/
>> http://icebearsoft.euweb.cz
>
