Re: [idn] draft-ietf-idn-tsconv-01.txt
I agree with your assessment of the proposal. We cannot exclude
characters using an exclusion map for Hanja and Kanji. I think
the authors intended to say that, for a rushed deployment
of IDN, these characters can be blocked out until a solution is
found later.
> For the DNS as a
> whole, however, I do not believe that this approach can survive the
> information loss of the Unicode CJK unification.
CJK unification must happen. The information is not lost,
but is included in the characters themselves. How we get that
information out is a question we have to deal with here.
Liana
On Thu, 15 Nov 2001 13:03:10 -0800 Ted Hardie <Ted.Hardie@nominum.com>
writes:
> I've just finished a review of draft-ietf-idn-tsconv-01.txt,
> and I would like to make a few comments on the conversion problem
> they have tackled and the solution they have presented.
>
> The most important starting point here is certainly to acknowledge
> both the work of those striving to find a reasonable approach to
> this problem and the problem they wish to address: there are groups
> of users for whom the perceived equivalence of specific Han
> ideographs is strong enough to cause confusion where those
> ideographs are treated differently. The authors of this draft (and
> many other people) believe that, for Internet applications to
> function correctly for those users, it must be possible to
> associate the ideographs in a way that allows simplified
> characters, complex characters, or a mixture of the two to be
> treated as equivalent when a user would see them as equivalent. The
> authors further recommend a method for retaining (as much as
> possible) the data required to display the characters in the form
> originally seen by the user, so that a check of the mapping among
> characters does not seem to involve a transformation. As goals for
> application design for that user community, these seem appropriate.
>
> It is not so clear, however, that these goals should or can be
> met using the DNS infrastructure as described. Probably one of the
> most important issues raised by the draft is in this note:
>
> [Editor's note: As Chinese character's in common use by CJK
> people, so such table may be modified after making consensus with
> language experts of CJK area.]
>
> As a non-native Chinese speaker with an even more limited
> knowledge of the use of Han characters in kanji and hanja, I am not
> qualified to serve as one of the language experts described in the
> note. With even my limited knowledge, however, it is clear that the
> overlap of characters creates an enormous problem for this approach.
> While it might be possible to create a mapping system that fits the
> Chinese user community to some reasonable degree, there are a large
> number of characters for which that same mapping would not fit
> either the Japanese or the Korean user community. The authors
> apparently feel that this could be managed with exclusion lists. I
> believe that a reasonable list of such exclusions would run into
> several thousand and that some of the most common characters would
> fall into that list. I think that the use of an exclusion list of
> that size is likely to diminish the effectiveness of this approach
> to the point of unusability. If the user community cannot know
> whether two characters map to equivalence without knowledge of an
> extensive exclusion list, they are considerably worse off than if
> they were dealing with just complex and simplified character sets.
>
> As a trivial example, the character "guo" used in "zhongguo"
> (China) is also used in some form by kanji and hanja. As I said, I
> am not qualified to say which forms would be seen as equivalent by
> the Japanese or Korean language communities. Given the kanji use,
> though, it seems possible that it could fall into the excluded
> category. If it did, one of the most basic characters for the
> Chinese community would remain variable between complex and
> simplified. With exclusions of that type possible, attempting the
> mappings described within the DNS simply does not seem to me the
> best approach.
>
> In previous meetings of the IDN working group, I put forward
> the comment that the complex-to-simplified mapping was a one-way
> transformation, and that without context there was simply no way to
> recover some of the mappings. The authors of this draft have done
> their best to provide mappings and restore what can be restored.
> Their efforts have, unfortunately, been thwarted by the character
> unification of Chinese, Japanese, and Korean inside Unicode
> standards. The Unicode folks clearly had engineering reasons to
> avoid replicating the characters several times. A consequence of
> that engineering trade-off is that the user community which must be
> considered for this type of mapping is global.
>
> It may be possible to create a more limited context within
> specific applications or even registries, based on the user
> communities for those applications or registries. For the DNS as a
> whole, however, I do not believe that this approach can survive the
> information loss of the Unicode CJK unification.
>
> best regards,
> Ted Hardie