
Re: Layer 2 and "idn identities" (was: Re: [idn] what are the IDN identifiers?)



Hi John,

(For the record, I think you have got me wrong. The motivation behind my
previous post was merely to point out the inappropriate comparison of TC-SC
equivalence to Japanese character equivalence; it was not intended to justify
(or otherwise) the inclusion of language-dependent canonicalization processes
such as TC-SC in the DNS layer.)

John C Klensin wrote:
> The second question is whether that set of mappings/ conversions/
> translations ought to be incorporated into IDN or the DNS.   And
> it is _there_ that we differ.  Once again, the DNS incorporates
> strings of individual characters, not names.  The further we get
> away from doing bit-string-level matching, the more trouble we
> get ourselves into, and TC-SC mappings are pretty far from
> bit-string matching.
>

No, I don't think we differ much on this (see my first paragraph in this
mail).

I agree that language-dependent canonicalization may be too complex to be
included in the DNS layer. Doing it at Layer 2 and above seems like a more
plausible solution, since more locale-specific parameters (or facets, as you
call them) can be gathered at those layers, reducing the risk of misjudging
equivalences that arises when only bit-string matching is used.
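
To make the distinction concrete, here is a minimal Python sketch. It is only
an illustration under stated assumptions: the two-entry TC_TO_SC table, the
function names and the locale tags are all hypothetical, and a real TC-SC
mapping would be far larger and not always one-to-one.

```python
# Hypothetical, deliberately tiny Traditional -> Simplified table.
# A real table would be much larger and not always one-to-one.
TC_TO_SC = {
    "雞": "鸡",  # "chicken"
    "國": "国",  # "country"
}


def bitstring_equal(a: str, b: str) -> bool:
    """Bit-string-style comparison: the encoded bytes must match exactly."""
    return a.encode("utf-8") == b.encode("utf-8")


def fold_tc_to_sc(label: str, locale: str) -> str:
    """Fold TC to SC, but only when the locale says the text is Chinese.

    The locale is the extra parameter that a layer above the DNS can
    gather and that a bit-string comparison never sees.
    """
    if not locale.startswith("zh"):
        return label
    return "".join(TC_TO_SC.get(ch, ch) for ch in label)


def layer2_equal(a: str, b: str, locale: str) -> bool:
    """Compare two labels after locale-dependent folding."""
    return fold_tc_to_sc(a, locale) == fold_tc_to_sc(b, locale)


tc_egg = "雞蛋"  # "egg" written with Traditional characters
sc_egg = "鸡蛋"  # "egg" written with Simplified characters

print(bitstring_equal(tc_egg, sc_egg))        # different bytes
print(layer2_equal(tc_egg, sc_egg, "zh-TW"))  # folded under a Chinese locale
print(layer2_equal(tc_egg, sc_egg, "ja"))     # no folding outside Chinese
```

The point of the sketch is only that the equivalence depends on a parameter
(the locale) that exact matching has no access to.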

But the million-dollar question is this:

Given that:

Average Internet users probably do not care where the i18n is done anyway, as
long as they can access resources with names in their own native language,

and,

i18n of names can be achieved at layers other than the DNS, then:

Why internationalize the DNS in the first place, and why continue with the
work on the IDN WG, if the DNS only serves to provide an identifier-based (and
not names-based) lookup service for Internet hosts and services?

IMHO, if we have decided to go ahead with IDN and we have come this far,
perhaps we should aim to provide a solution as comprehensive as possible, lest
users be bewildered when they cannot resolve their hostnames due to
non-equivalence of characters.

But if the comprehensive solution cannot be done at the DNS layer, then we
should not have to live with a half-baked solution, and resources spent on
this WG should be diverted to IRNSS instead.

> The logic "I should be able to communicate with my Taiwanese
> friends, even though we don't use the same characters to write
> the same words" doesn't work, at least in this DNS context, any
> more than "I should be able to communicate with my
> Arabic-speaking friends, even though I can't read their language"
> does.   If I said that about Arabic, I would be justly criticized
> for expecting the DNS to compensate for my ignorance.
>

Now, I might be justifiably considered ignorant if I were to expect a computer
to be able to help me understand written Arabic when I only understand the
spoken form.

However, the inability to perform mental TC-SC equivalence matching is common
amongst many Chinese speakers/writers, largely due to educational policy
differences. Many Taiwanese do not read Simplified characters. Most mainland
Chinese do not use Traditional characters anymore (although they can read
them). Construing this inability as ignorance is thus almost a form of
culturo-linguistic arrogance (albeit with a geo-political dimension to it).

Perhaps I might not expect the DNS to do it, but that is because I know what
the DNS is and what it was originally constructed to do, having a better
appreciation of its (limited) responsibilities after reading your dns-role
draft.

The average Internet user probably does not, though. And these users,
especially those with a monolithic view of how computer systems work, expect a
certain level of "assistance" (for lack of a better term) to be rendered to
them by the system (which encompasses all layers, not just the DNS layer).

I think there is an explicit requirement from the users for this "assistance"
to be rendered, regardless of what layer it is done at.

> > Compare this with Japanese; I do not think you can find two
> > Japanese-speaking individuals, one having knowledge of "egg" in
> > Kanji ONLY and one having knowledge of "egg" in Hiragana ONLY.
> >
> > Chances are most Japanese-speakers know both equivalent forms.
> > Chinese-speakers may not.
>
> At least in some styles of teaching Japanese outside of Japan,
> Hiragana is taught first. So finding someone who cannot recognize
> a given Kanji character, even when the Hiragana for the word is
> known or can be guessed, merely requires finding someone young
> enough or new enough to the language.
>

Yes, I agree that in this case a Japanese speaker might not recognize a Kanji
character and yet recognize its Hiragana equivalent.

(After all, it is not uncommon to find Furigana hints alongside Kanji
characters in Japanese textbooks to aid in pronunciation for not-so-adept
Japanese speakers.)

However, you are making a comparison here between Japanese speakers based on
their level of language mastery (or fluency); a comparison between young
Japanese speakers and mature ones is akin to comparing unripened apples to
fully-ripe apples.

The difference between TC and SC would be better appreciated as one between
apples and pears, though; i.e., the respective trees probably belong to the
same genus/species, but the fruits are distinct enough for us to name them
differently (I am no botanist, so I would not vouch for the accuracy of this
statement, but assume it is true for the purposes of this analogy).

Let me rephrase my earlier statement for greater clarity: A _fluent_ Japanese
speaker/writer would probably know both forms of "egg" in Japanese, but a
_fluent_ Chinese speaker/writer may not know both TC and SC forms of "egg" in
Chinese.

Thus, the argument for including Japanese character equivalences in
naming/identification engines is probably less compelling than the one for
Chinese.

One should not assume that all forms of CJK equivalence are equally necessary
to include in a canonicalization scheme, because they are not.

regards,
maynard