
Re: [idn] I-D ACTION:draft-ietf-idn-vidn-00.txt



Please see below for my responses to your comments.

Sung

----- Original Message -----
From: Mark Davis <mark@macchiato.com>
To: DualName - ShimSungJae <shimsungjae@dualname.com>; <idn@ops.ietf.org>;
Paul Hoffman / IMC <phoffman@imc.org>
Sent: Sunday, November 19, 2000 9:14 PM
Subject: Re: [idn] I-D ACTION:draft-ietf-idn-vidn-00.txt


> However it is done, what you are talking about is mapping the letters of
> each and every language on Earth into [a-z,0-9,and hyphen]. Korean Hangul
> has the marked advantage of being relatively rather simple to map to and
> from Latin letters without ambiguity. There are a huge number of problems
> with this approach in general:
>

Sung: Yes, VIDN maps the letters of non-English languages into the
[a-z, 0-9, and hyphen] characters that already exist in the DNS, in the same
way that people in regions where English is not widely spoken currently
create their domain names in English.

Sung: No, I do not think that Korean Hangul is relatively rather simple to
map to and from Latin letters without ambiguity. Please take a look at the
example cited in email from Paul Hoffman and its corresponding sets of
characters in English. When we transliterate a person's name in Korean
Hangul into English, the possibilities may include:

gimgyeongseog
gimgyeongseok
kimkyungsuk
kimkyoungsuk
kimkyungsok
gimkyungsuk
kimkyongsuk
kimkyungseok
gimkyoungsuk
gimkyungsok
kimkyoungsok
kimkyungsuck
kimkyeongsuk
kimkyoungseok
gimkyongsuk
kimkyongsok
gimkyungseok
gimkyoungsok
kimkyoungsuck
gimkyungsuck
kimkyongseok
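
To make the size of this one-to-many mapping concrete, here is a minimal
sketch in Python. It is not the VIDN implementation; it only combines the
per-syllable spellings that appear in the list above. The table name
SYLLABLE_VARIANTS and the function candidates() are hypothetical names
chosen for illustration.

```python
from itertools import product

# Per-syllable spellings taken from the list above. This is an assumption
# for illustration, not an exhaustive romanization table.
SYLLABLE_VARIANTS = [
    ["gim", "kim"],
    ["gyeong", "kyung", "kyoung", "kyong", "kyeong"],
    ["seog", "seok", "suk", "sok", "suck"],
]

def candidates(variants):
    """Return every concatenation of one spelling per syllable."""
    return ["".join(parts) for parts in product(*variants)]

names = candidates(SYLLABLE_VARIANTS)
print(len(names))  # 2 * 5 * 5 = 50 candidate spellings
```

Even with only the spellings shown above, three syllables already yield 50
possible names in English, of which the list above is a subset.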

Sung: The focus of VIDN is not on the many possible sets of English
characters but on the single set of Korean characters. Users do not care
about IP addresses, because they can use domain names in English
unambiguously. In the same way, users of local languages will not care
about the underlying domain names in English if they can use domain names
in their local languages unambiguously.

Sung: The huge number of problems you mention seems to stem from a
misunderstanding of VIDN. Please see below.

> a. It is unacceptable for the worlds' cultures to not be able to represent
> the native characters for their languages. But this requires an unambiguous
> round-trip mapping.
>

Sung: It would be nice for those who speak local languages to have their
native characters represented in domain names. But this does not necessarily
mean that domain names should be created and put in their local languages.
VIDN allows using domain names in their local languages virtually, not
actually creating and registering them.

Sung: VIDN does not need round-trip mapping, although it may be possible to
convert characters from English back to local languages. What is the use of
this reversed conversion? Is it for those who speak English, so that they
can use English to go to domain names registered in local languages? In
VIDN, there are no domain names created and registered in local languages.
Those who speak English do not need domain names in local languages. Please
do not forget that domain names in local languages are for those who do not
speak English, not for those who speak English. Again, VIDN does not create
and register domain names in local languages; it needs only the domain
names in English that actually exist in the current DNS.

> b. There are no transliteration standards for many, many languages.
>

Sung: Without such standards, people speaking local languages have been
creating and registering domain names in English. The most common way to
create domain names in English is to transliterate the characters in local
languages into the characters in English that have the same or proximate
sounds. VIDN uses the knowledge of this transliteration based upon the sound
or phonemic systems of the respective local language and English. Please
take a look at how those domain names in English have been created in
regions where English is not widely spoken, without such standards.

> c. Where standards exist, there are often many conflicting choices.
>

Sung: The same character or set of characters in a local language may
correspond to many characters or sets of characters in English, as shown in
the above example. This one-to-many mapping is described in detail in the
Internet-Draft. Further, VIDN includes a coding scheme for one-to-one
mapping, which is also described in detail in the Internet-Draft.
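
One possible reading of the one-to-many case can be sketched as follows.
This is only an illustration under stated assumptions, not the procedure
defined in the Internet-Draft: the function resolve_virtual(), the
in-memory set standing in for a DNS existence check, and the ".example"
names are all hypothetical.

```python
def resolve_virtual(candidates, exists):
    """Keep the candidate names for which exists(name) is true, in order."""
    return [name for name in candidates if exists(name)]

# Hypothetical stand-in for "this domain name actually exists in the DNS".
registered = {"kimkyungsuk.example"}

# A few candidate names in English for one name entered in Korean.
candidates = [
    "gimgyeongseog.example",
    "kimkyungsuk.example",
    "kimkyoungsuk.example",
]

matches = resolve_virtual(candidates, lambda name: name in registered)
print(matches)
```

Only names that already exist in English survive the filter, which matches
the point that nothing is created or registered in the local language.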

> d. Where standards for transliteration exist, such as the ISO standards,
> they are often defective -- they do not provide for lossless round-tripping
> back from the romanized text.
>

Sung: Again, VIDN does not need a round-trip mapping from English to local
languages.

> e. Where standards exist, they often map from original characters to
> accented roman characters. There are generally no conventions for
> representing those characters strictly with [a-z, 0-9, and hyphen].
>

Sung: Please take a look at how those accented characters have been actually
transliterated or approximated into [a-z, 0-9, and hyphen] in domain names
in English in regions where those accented characters are used, without such
conventions.

> f. If the standards are phonetic, it is often impossible to round-trip back
> to the original characters. Consider Japanese, for example. There is no
> accepted way to convert *unambiguously* from roman letters back to a mixture
> of Kanji, Hiragana, and Katakana. Companies spend hundreds of millions of
> dollars to try to get the best input methods, which are doing essentially
> this -- yet all of them require human intervention.
>

Sung: Again, VIDN does not need a round-trip mapping from English into local
languages. In fact, VIDN for Japanese-English conversion is under
development, and initial testing results show that the conversion from
Japanese into English is more straightforward and less ambiguous than that
from Korean into English.

> g. If you are talking about a phonemic representation, that is typically
> done with IPA. There are hundreds of characters and accents in IPA. There is
> no mechanism for straightforward, unambiguous representation of those
> characters with the limited set [a-z, 0-9, and hyphen]. Even if there were,
> it would probably not be particularly readable.
>

Sung: Please take a look at how those characters have been actually
transliterated or approximated into the limited set [a-z, 0-9, and hyphen]
in domain names in English in regions where those characters are used,
without any mechanism for straightforward, unambiguous representation of
those characters with the limited set [a-z, 0-9, and hyphen]. They may not
be that readable, as we notice in current domain names in English. In fact,
that is one of the reasons why we need to allow using domain names in local
languages.

> h. This would also require all languages written currently with accented
> characters (French, German, Swedish, Slovak, Polish, Lithuanian, etc.) to
> have conventions for expressing those accents strictly with [a-z, 0-9, and
> hyphen]. Otherwise information is lost. These standards don't exist, and
> again, probably would not be particularly readable, except for a few cases
> (such as German, where there is an established tradition of using "e" for
> representing umlauts).
>

Sung: Again, please take a look at how those accented characters have been
actually transliterated or approximated into the limited set [a-z, 0-9, and
hyphen] in domain names in English in regions where those characters are
used, without such standards. VIDN uses the knowledge of this
transliteration process.
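
As a rough illustration of this kind of approximation, the sketch below
folds accented Latin letters into [a-z, 0-9, and hyphen] using Python's
standard unicodedata module. It is an assumption about what registrants
typically do by hand, not the VIDN knowledge base; the names to_ldh,
GERMAN_STYLE, and ALLOWED are hypothetical.

```python
import string
import unicodedata

# The limited set [a-z, 0-9, and hyphen].
ALLOWED = set(string.ascii_lowercase + string.digits + "-")

# German-style convention of writing umlauts with a following "e".
GERMAN_STYLE = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}

def to_ldh(label, table=GERMAN_STYLE):
    """Approximate a label with letters, digits, and hyphen only."""
    folded = []
    for ch in unicodedata.normalize("NFC", label).lower():
        ch = table.get(ch, ch)  # apply the convention table first
        decomposed = unicodedata.normalize("NFD", ch)
        # Drop combining accents, keeping the base letter.
        folded.append("".join(c for c in decomposed
                              if not unicodedata.combining(c)))
    # Anything still outside the allowed set is dropped (information loss).
    return "".join(c for c in "".join(folded) if c in ALLOWED)

print(to_ldh("Müller"))
print(to_ldh("café"))
```

The mapping is one-way and lossy: letters with no entry in the table and no
decomposition are simply dropped.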

> i. Even if this were all possible in any reasonable amount of time, it would
> be quite expensive, and extremely difficult to administer and validate. As
> you say "The knowledge base includes not only the general principles of
> transliteration but also common usages, idiomatic expressions, and possible
> variations that may occur in transliteration." So every web browser and
> mailer would require such a knowledge base for converting the romanizations
> back to native characters for every language that they need to handle.
>

Sung: VIDN provides a more immediate and less costly solution to
internationalized domain names than other methods. For example, developing
the testing version of VIDN for Korean-English conversion has taken only a
few months. A couple of programmers have developed it in consultation with
several experts in Korean and English phonemics and linguistics. With more
resources, the development process can be expedited significantly.

Sung: Because of its small size (e.g., the testing version of VIDN for
Korean-English conversion is about 800KB and the actual DLL file used for
the conversion is about 250KB), VIDN can be easily embedded into user
programs that use domain names, such as web browsers and email clients.
Alternatively, the knowledge base of conversion and the logic to
process it can be embedded into operating systems as a library, so that
client software such as web browsers and email clients can share them. The
user will need only the module for conversion of his or her preferred local
language into English. Again, there is no need to convert the romanizations
back to native characters for every language.

> j. The process also lends itself to becoming completely and utterly
> politicized.
>

Sung: VIDN does not need much of "the process," so I do not see any
politicized issues arising in VIDN.

> k. Before even thinking about this, one would need to see a proof of
> concept -- a round-trip mapping for a large percentage of Unicode letters to
> and from [a-z, 0-9, and hyphen]; and for a large number of languages, not
> simply Korean.
>

Sung: VIDN does not need a round-trip mapping to use domain names in
local languages virtually. But I would be more than happy to demonstrate how
VIDN works.

> This is an interesting idea, but unfortunately simply won't work. If the
> infrastructure is put into place to allow Unicode/ISO 10646 characters in
> IDNs, then there is room for tools that transliterate arbitrary characters
> into some readable representation in the user's native characters. But such
> tools are optional, and do not need to have anything like the degree of
> precision (and the language coverage) required by your proposal.
>

Sung: The VIDN method is not only interesting but also working. But creating
domain names in local languages and changing the DNS infrastructure, as
proposed in other methods, will require a costly and lengthy implementation
process and cause more disputes and squatting problems over domain names in
local languages.

> Mark
> ----- Original Message -----
> From: "DualName - ShimSungJae" <shimsungjae@dualname.com>
> To: "Mark Davis" <mark@macchiato.com>; <idn@ops.ietf.org>; "Paul Hoffman /
> IMC" <phoffman@imc.org>
> Sent: Sunday, November 19, 2000 15:49
> Subject: Re: [idn] I-D ACTION:draft-ietf-idn-vidn-00.txt
>
>
> > Mark,
> >
> > Thank you for your comments. Please see below for my responses to your
> > comments.
> >
> > Sung
> >
> > P.S. I have already responded to the comments provided by Mr. Paul Hoffman.
> >
> > ----- Original Message -----
> > From: Mark Davis <mark@macchiato.com>
> > To: <idn@ops.ietf.org>; Paul Hoffman / IMC <phoffman@imc.org>
> > Sent: Friday, November 17, 2000 12:37 AM
> > Subject: Re: [idn] I-D ACTION:draft-ietf-idn-vidn-00.txt
> >
> >
> > > I agree completely.
> > >
> > > a. There is no accepted set of rules for romanizations of all languages.
> >
> > Sung: That is one of the reasons why VIDN uses the phonemes as a medium of
> > the transliteration. Phonemes are very universal, being applicable to any
> > language. In fact, most transliteration schemes are based upon the
> > systems of sounds of the respective two languages and the units of such
> > systems are phonemes.
>
>
> >
> > > b. Moreover, to be useful according to the proposal, the romanization would
> > > have to provide a "round-trip" mapping.
> >
> > Sung: Again, since VIDN uses the phonemes as a medium of the
> > transliteration, a "two-way" mapping is possible. That is, VIDN
> > transliterates between two languages using the phonemes that have the same
> > or very proximate sounds.
> >
> > > c. Furthermore, the romanizations will be subject to accidental collisions
> > > between different scripts.
> >
> > Sung: Such collisions between different scripts may occur when the different
> > scripts are actually used and registered as internationalized domain
> > names. Please note that VIDN does NOT create and register any
> > internationalized domain name, BUT it allows using internationalized domain
> > names virtually. Thus, as long as the characters in different scripts
> > represent the phonemes that have the same or very proximate sounds, VIDN
> > returns the same characters in English.
> >
> > Sung: Also, since domain names in English already exist, conversion from one
> > local language into another local language can be done via the English
> > language. For example, a virtual domain name entered in Korean can be
> > converted into the corresponding domain name in English, which can also be
> > converted from another virtual domain name entered in Japanese. Using domain
> > names in English as a liaison between virtual domain names in two local
> > languages can minimize the possibilities for such collisions between the two
> > local languages.
> >
> > > d. And finally, the mechanisms for doing romanization need to be fairly
> > > sophisticated. Look at ICU's, for example:
> > >
> > > http://oss.software.ibm.com/icu/userguide/Transliteration.html
> > >
> >
> > Sung: "Sophisticated" does not necessarily mean "impossible." In fact, VIDN
> > uses the knowledge base of transliteration, which is very comprehensive, if
> > not complete. For example, several experts in Korean and English phonemics
> > and linguistics have consulted in constructing the knowledge base of VIDN
> > for Korean-English conversion. The knowledge base includes not only the
> > general principles of transliteration but also common usages, idiomatic
> > expressions, and possible variations that may occur in transliteration.
> >
> > > Mark
> > >
> > > (I'm replying to a slightly broader list on this message).
> >
> >
>
>