
Re: [idn] Charter, refocus



Hi, Dan:

I like your RFC list.  The normalization should deal 
with the whole of UCS.  While Latin has case and 
full-width forms, CJK has equivalent sets, and 
Latin, Cyrillic, and Armenian can have equivalent "A"s.  I'd like
to note that I am not ignoring Arabic and the Indic scripts.  It is 
just that I don't know what the main normalization problems 
are for these scripts.

This can all be taken care of by the normalization process 
and reflected in [nameprep].  We need agreement on 
which characters in UCS are allowed in an IDN identifier.  
In other words, which set of characters is used 
for UCS comparison under "case-insensitive matching" 
(also width-insensitive, TC/SC-insensitive, 
diacritic-composition-insensitive)?  I believe this is 
the point that John has been concerned about, and we are 
touching the "fine line".
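
To make the kinds of insensitivity listed above concrete, here is a minimal sketch in Python. The names match_key and labels_match are hypothetical, and the recipe (Unicode NFKC followed by case folding) is only an assumption about what a comparison rule might look like, not what [nameprep] specifies. It covers case, width, and diacritic composition; it does not unify TC/SC variants or look-alike letters across scripts, which would need separate mapping tables.

```python
import unicodedata


def match_key(label):
    """Hypothetical comparison key: NFKC, then case folding, then NFKC again."""
    folded = unicodedata.normalize("NFKC", label).casefold()
    # Case folding can leave a string unnormalized, so normalize once more.
    return unicodedata.normalize("NFKC", folded)


def labels_match(a, b):
    """Return True if two labels are equal under the hypothetical key."""
    return match_key(a) == match_key(b)


if __name__ == "__main__":
    print(labels_match("Example", "example"))         # differs only in case
    print(labels_match("\uff21\uff22\uff23", "abc"))  # full-width ABC vs abc
    print(labels_match("e\u0301", "\u00e9"))          # e + combining acute vs precomposed
    print(labels_match("\u570b", "\u56fd"))           # TC vs SC form of "country"
    print(labels_match("a", "\u0430"))                # Latin a vs Cyrillic a
```

The first three comparisons match and the last two do not, which is why the TC/SC and cross-script cases need an explicit decision rather than falling out of normalization.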

Liana

On Wed, 21 Nov 2001 13:51:38 +0100 Dan Oscarsson <Dan.Oscarsson@trab.se>
writes:
> There has been a lot of talk about domain names and the charter
> recently, and the charter has many milestones for drafts whose
> meaning I do not know.
> 
> I will go back here to what I think DNS stands for, how it relates
> to how people expect things to work, and what we need to do.
> 
> The foundation of DNS is a binding of a "domain name" to one or more
> objects. The RFCs directly or indirectly state that the "domain name"
> is a printable text string. There is an RFC for binary labels with
> EDNS, but the standard label is a text label. While a label is allowed
> to contain "binary", the normal usage everywhere is a printable
> character string. The normal expected usage, and what the RFCs
> define, is that text labels (domain names) are matched
> case-insensitively. As a result, within a domain you cannot define
> two labels that differ only in case.
> 
> So the expected usage of DNS is "a text string" (domain name) bound
> to one or more objects. The same domain name can be bound to both a
> "host" and to something else. Because of this, DNS does not have
> special rules for "host names". It only has rules for "domain names".
> The same format and matching rules apply to all domain names: a
> sequence of text labels, each 1-63 characters long, at most 255
> characters for the complete domain name, with each label matched
> case-insensitively. And DNS has so far only handled characters in
> the ASCII range.
> 
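The rules in the paragraph above can be sketched in a few lines of Python. This is only an illustration: parse_name and same_name are hypothetical helpers, and counting the 255 limit as wire-format octets (one length octet per label plus the root) is an assumption about how the limit is applied.

```python
MAX_LABEL = 63   # characters per label
MAX_NAME = 255   # limit for the complete name, counted here in wire-format octets


def parse_name(name):
    """Split a dotted ASCII name into labels and check the length limits."""
    if not name.isascii():
        raise ValueError("only ASCII names are handled in this sketch")
    if name.endswith("."):
        name = name[:-1]          # drop the trailing root dot, if present
    labels = name.split(".")
    for label in labels:
        if not 1 <= len(label) <= MAX_LABEL:
            raise ValueError("label length %d outside 1..%d" % (len(label), MAX_LABEL))
    # Assumption: each label costs one length octet, plus one octet for the root.
    wire_length = sum(len(label) + 1 for label in labels) + 1
    if wire_length > MAX_NAME:
        raise ValueError("name too long: %d octets" % wire_length)
    return labels                 # stored case is preserved


def same_name(a, b):
    """Compare two names label by label, ignoring ASCII case."""
    return ([l.lower() for l in parse_name(a)] ==
            [l.lower() for l in parse_name(b)])


if __name__ == "__main__":
    print(parse_name("WWW.Example.COM"))
    print(same_name("WWW.Example.COM", "www.example.com."))
```

Note that the labels keep their stored case and only the comparison ignores it, which is the behaviour described for current DNS.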
> Current DNS also defines that stored case shall be returned, if
> possible. This means
> that a host name returned in a PTR
> 
> There is something called "host name" (or maybe it is "ARPANET host
> name") whose definition has been changed several times. I expect the
> restriction on allowed characters was meant to simplify things for
> programmers more than for users. DNS does allow a host name to
> contain any printable character.
> 
> That is the world today.
> 
> Now we are defining how DNS should handle domain names when they
> include non-ASCII characters. This means that the "domain name" can
> now include any printable character in UCS.
> 
> Following the standard workings of DNS, which users also expect, this
> means that DNS should still be a database with "domain names" bound to
> one or more objects. "Domain names" should still be printable text
> strings, and they should still match case-insensitively. And because a
> "domain name" can be bound to both a "host" and something else, DNS
> must treat all domain names the same way.
> 
> Now when I look at the charter I see that we should produce a
> domain name normalisation draft. What is this? I have not seen any
> such thing yet.
> 
> Unicode and W3C use the word "normalisation" of a text to
> mean: encode each character in a single standard way.
> This means that, for example, the Angstrom sign is replaced
> with "A with a ring above" and "double width" Latin letters are
> replaced by standard width. It does NOT mean to convert to
> lower case.
> This is needed because UCS allows the same character to
> be encoded in many ways, which makes software quite complex
> when matching strings.
> As we want to use UCS as the single character set for domain names,
> all domain names must be normalised. Otherwise they cannot be
> compared easily. A suitable starting point is Unicode normalisation
> form KC.
> 
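The examples in the paragraph above can be checked directly with Python's standard unicodedata module. The helper name normalise is hypothetical; the sketch only shows what form KC does to these particular characters, including that it leaves case alone.

```python
import unicodedata


def normalise(name):
    """Apply Unicode normalisation form KC (NFKC) to a string."""
    return unicodedata.normalize("NFKC", name)


if __name__ == "__main__":
    # Angstrom sign (U+212B) becomes A with ring above (U+00C5).
    print(normalise("\u212b") == "\u00c5")
    # A followed by a combining ring (U+030A) becomes the same single character.
    print(normalise("A\u030a") == "\u00c5")
    # Full-width Latin capital A (U+FF21) becomes the standard-width A.
    print(normalise("\uff21") == "A")
    # Normalisation is not case conversion: upper case stays upper case.
    print(normalise("ABC") == "ABC")
```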
> So what I see we need for DNS is:
> 
> 1) An RFC stating the standard normalisation of domain names.
> 
> 2) An RFC defining how domain names must be matched. As a minimum
>    it must require case-insensitive matching for characters with case.
> 
> When that is done, we can add:
> 3) An RFC for ACE so we have a standard way to encode domain names
>    in ASCII, complying with the legacy "host name" character
>    restriction.
> 
> 4) An RFC updating what applications should accept as "host names".
> 
> 5) An RFC on how to layer ACE on top of the current DNS protocol to
>    allow legacy software to handle new domain names (for example IDNA).
> 
> 6) An RFC defining how DNS itself can handle UCS based domain names
>    using UTF-8.
> 
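As an illustration of item 3, here is a minimal sketch of what an ASCII-compatible encoding looks like in practice. It uses Python's built-in "idna" codec, which implements the scheme later published as RFC 3490; that codec is only a stand-in here and is an assumption, not the ACE this list has agreed on. The helper names to_ace and from_ace are hypothetical.

```python
def to_ace(name):
    """Encode a Unicode domain name into an ASCII-compatible form."""
    # Stand-in: Python's built-in "idna" codec (RFC 3490 style).
    return name.encode("idna").decode("ascii")


def from_ace(name):
    """Decode an ASCII-compatible name back into Unicode."""
    return name.encode("ascii").decode("idna")


if __name__ == "__main__":
    ace = to_ace("b\u00fccher.example")
    print(ace)
    print(from_ace(ace) == "b\u00fccher.example")
```

The encoded form uses only letters, digits, and hyphens, so it fits the legacy "host name" character restriction and can pass through software that knows nothing about UCS.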
> Now in the charter it says:
> - produce a draft on normalisation of domain name identifiers.
>   Do we have a draft on this? I have not seen one. The existing
>   nameprep draft does have parts of normalisation but
>   includes much outside normalisation.
> 
> - produce a draft for architecture on many things.
>    Do we have this?
> 
> - produce an IDN protocol draft.
>   What is this? IDN protocol? Is this the IDNA layer on top of DNS?
> 
> 
> Can we not start with the basic needs of DNS instead of jumping at
> "host names"?
> 
>       Dan
> 
> 
> 
>