
[idn] Traditional-simplified, yet again



At 3:50 PM +0800 10/29/01, Erin Chen wrote:
>If we consider TC/SC are not the names similar but just like
>Uppercase/Lowercase English letter.

It has been shown that this is *not* just like case in English. That 
would only be true if Han characters could be converted 1:1, if there 
were a stable matching system that did not need human context, and if 
everyone in all countries that use Han agreed to the matching system. 
None of those three things is true.
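
To make the first of those requirements concrete, here is a minimal 
Python sketch. CASE_FOLD is plain ASCII case folding; TC_TO_SC is a 
tiny hand-picked sample, not a proposed or standard table, and the 
helper names (invert, is_one_to_one) are made up for illustration.

```python
import string

# ASCII case folding: a fixed table, one lowercase letter per uppercase letter.
CASE_FOLD = dict(zip(string.ascii_uppercase, string.ascii_lowercase))

# Illustrative sample only (an assumption, not an agreed-on mapping):
# two different traditional characters share one simplified form.
TC_TO_SC = {
    "發": "发",  # "to emit"
    "髮": "发",  # "hair"
    "國": "国",  # "country"
}

def invert(table):
    """Return a dict from each target to the sorted list of its sources."""
    inverse = {}
    for src, dst in table.items():
        inverse.setdefault(dst, []).append(src)
    return {dst: sorted(srcs) for dst, srcs in inverse.items()}

def is_one_to_one(table):
    """True if no two sources map to the same target."""
    return all(len(srcs) == 1 for srcs in invert(table).values())
```

With these tables, is_one_to_one(CASE_FOLD) is true, while 
is_one_to_one(TC_TO_SC) is false, because both 發 and 髮 land on 发. 
Going back from 发 therefore needs context that a lookup table does 
not have.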

>If you think of TC/SC requirement is non-technical human means,
>then hou about Uppercase/Lowercase English letter deal in DNS?

English case folding meets that set of requirements, and more. Note 
that, even though it is pretty much universally accepted, it still 
has its problems. Capitalization is very important to some people's 
names, and the case folding smashes their names in an unpleasant way.

At 5:56 PM +0800 10/29/01, Erin Chen wrote:
>As mentioned in draft TSCONV, there are mainly 3 categories of TC/SC
>conversion. They are 1-1 , 1-n , n-1
>
>In the case of 1-1 , do not to care about the meaning. the matching of TC/SC
>can be achieved by a matching function or a finite matching table. That is to
>say matching of TC/SC can be done as Uppercase/Lowercase English letter
>(EX: A-a).

Maybe this is a false analogy. From my understanding, some of the 
*characters* that would be mapped in a 1-to-1 mapping are part of 
*words* and *names* that would instead be mapped n-to-1, yes? If so, 
doing the 1-to-1 mapping would make the DNS label unreadable.

Further, and more important, please remember that many of the 
characters that would be mapped in the 1-to-1 table are used in 
languages other than Chinese. It is, of course, completely 
unacceptable to do a traditional-to-simplified mapping in a Japanese 
or Korean name, yet it is impossible for a system that is doing the 
mapping to tell the language in the label.
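
A rough Python sketch of why this is a problem, assuming a purely 
table-driven mapper. The table TC_TO_SC_SAMPLE and the function 
fold_label are hypothetical names, and the single table entry is only 
an example. The point is that the function receives code points and 
nothing else; there is no language tag in a DNS label for it to check.

```python
# Hypothetical one-entry sample table (illustration only).
TC_TO_SC_SAMPLE = {"東": "东"}

def fold_label(label, table=TC_TO_SC_SAMPLE):
    """Map each character through the table; leave unknown characters alone.

    Note what is missing: there is no language parameter, because a DNS
    label carries none.
    """
    return "".join(table.get(ch, ch) for ch in label)

# The Japanese name for Tokyo uses the same code points a Chinese label
# could use, so the mapper rewrites it all the same.
folded = fold_label("東京")
```

Here folded is "东京": the first character has been replaced with a 
simplified Chinese form that is not used in Japanese, and nothing in 
the input could have told the mapper to leave it alone.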



At 5:55 PM +0800 10/29/01, xiang deng wrote:
>On Monday, October 29, 2001 1:27 PM, Paul Hoffman / IMC wrote:
>
>>  Why not, indeed? No one here is saying that it should not be done. We
>>  are saying that it should not be mandated in a technical
>>  specification. There are many types of non-technical specifications
>>  that would be very useful here. We have been told that the JET group
>>  is coming out with one Real Soon Now, so that might address the
>>  problem in a suitable fashion.
>
>I'm not sure what's your meaning.  I always hear two voices:
>1. go to UTC.
>the requirements are:
>(1) preserve the characteristic of TC /SC.
>(2) compare without care of TC/SC.
>TC and SC are different character, there needn't mapping from
>TC(SC) to SC (TC), it will make a character disappear.
>
>2.two records solution, one is SC form and the other is TC fom.
>it's not applicable.

Well, I'm sure you hear more than just two voices on this! :-) I have 
never talked about a "two record solution" because we know that there 
will need to be more than two for many Chinese names. What I am 
saying is that it seems likely that there can be good advice given to 
registries and registrants about equivalence for Han names that might 
be used when registering (or even after registering), and that such 
advice might be appropriate for this WG to work on.

The people who have said "go to the UTC" have been responding to the 
assertion that a technical mapping of traditional-simplified is 
possible today. Other people have countered that there is no 
well-agreed-on table for the mapping; still others have said that 
such a table is impossible because of the need for context; and still 
others have said that this is all moot because 
traditional-to-simplified mapping must not be done on Japanese or 
Korean names, but there is no way to prevent that without context.

Assertions that "we can do this" are OK, but if it hasn't been 
approved by an international body that is respected, it probably 
won't be approved here. So, "go to the UTC" might mean "until we hear 
from such a body that this is finished, stable, and sensible, we 
cannot wait for it". Fortunately, we don't need to: 
traditional-simplified and other similar mappings can help DNS users 
of the world without having to be part of the IDN protocol.

Where JET and/or CDNC can definitely help here is in clearly 
articulating the problem for Han characters and a proposed solution 
for people registering names, and for the agencies doing the 
registering. We have not seen such a document yet, so the WG cannot 
decide if that would be something we want to work on. Even if it is 
not done in the IETF, such a document (and similar documents for the 
many other languages with similar problems) will be very useful in 
the world.



At 3:26 PM +0800 10/29/01, L.M.Tseng wrote:
>          I think  the cost assigned by the registry is the service charge.

This has nothing to do with the IDN specification. A registry can 
charge as much or as little (including zero) as it wants. The 
overhead for assigning three additional names to one person who asks 
for one is so small that it would make sense if the cost went away. 
This is analogous to Internet Service Providers allowing one person 
to have five mailboxes for the same price as one mailbox. Some ISPs 
do this, and they attract customers because of it; others don't do it.

>          Some user register one domain name and setup a DNS to extend his
>domain service for organizations' host name. To handle the multiple TC/SC
>name registry is an overhead to treat them, the size of storage and the time
>for careful treatment is an inherent cost of  complexity. The complexity
>come from a neglection to reduce them in a proper level .  That complexity
>and cost will let the system manager try to not provide TC/SC multple
>records in his managed zone file, then the stability and trust to use CJK
>domain name will be very low. Finally , it come back to reflect it is not a
>workable approach.

It is unfortunate that some people keep talking about this as if it 
were only a Chinese issue. It is *not*. The very same issue appears 
for every language where there are words and names with nearly 
identical meaning, and the issue appears in every language where 
there are words and names with nearly identical spellings.

This discussion has an undercurrent of Chinese vs. non-Chinese, and 
that is really ugly. The problems faced in traditional-simplified 
conversion (that the conversions are 1:1, 1:n, and n:1; that there is 
not a single, stable standard; that it cannot be applied to Japanese 
or Korean characters) are similar to those faced throughout the 
world. Making this a Chinese issue is deeply disrespectful to all the 
other people throughout the world who have exactly the same problem 
as the Chinese do. Nor is this an issue of the number of speakers of 
the language: some of these issues arise in Indian and Arabic names.

Once again: the DNS is pretty bad at dealing with these kinds of 
issues for names. This was proven conclusively first for US English, 
then for all languages that can use Latin characters. IDN will simply 
make these problems apparent for many scripts throughout the world. 
The people who say "the IETF cannot move forwards with IDN until it 
is better for my language than for other people's language" are doing 
a disservice to everyone, including the people who use their own 
language.

--Paul Hoffman, Director
--Internet Mail Consortium