
Re: [idn] questions: unassigned code points in nameprep



--On 01-10-23 22.13 +0900 Soobok Lee <lsb@postel.co.kr> wrote:

> I found somewhat obscure and unclear terminologies in the section 6 
> of stringprep. That may come from my lack of knowledge for 
> previous discussions, but for all  late joiners including me, 
> please reply  with clear answers for  the following   
> **asterisked** interposed comments. 

Ok.

First of all, what we try to explain is the robustness principle which is
quoted in many RFCs:

                "Be liberal in what you accept, and
                 conservative in what you send"

With this, we mean that if you generate a string, you should not use what
you believe is an unassigned codepoint. But when receiving/reading a string
which someone else has generated, you should be prepared for unassigned
codepoints to be used.

For example, if you have an older version of stringprep than I have, then I
might send codepoints to you which you believe are unassigned.

Now, given the principle the Unicode Consortium has, to not change
normalization tables for assigned codepoints, the only things that can
happen between two versions of the Unicode tables are:

 - A previously unassigned codepoint is assigned.
 - A previously unassigned codepoint is assigned, together with a
   normalization rule which normalizes it to another newly assigned
   codepoint.
 - A previously unassigned codepoint is assigned, together with a
   normalization rule which normalizes it to another previously assigned
   codepoint.

This means that if you only create strings with codepoints you know how to
handle, and for the others you compare codepoint by codepoint (on the
normalized string), you will still get a predictable result.

Because of this, no versioning is needed.
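
A minimal sketch of that argument, in Python. The two mapping tables and
the prep() helper are toy stand-ins invented for this illustration, not
the real stringprep tables: V2_MAP only adds codepoints to V1_MAP and
changes nothing that V1_MAP already knows. The sender runs the newer
version; the receiver runs the older one and passes codepoints it
considers unassigned through untouched.

```python
# Toy tables for illustration only (assumption): real stringprep mapping
# and normalization tables are far larger than this.
V1_MAP = {"A": "a", "a": "a"}                            # older version
V2_MAP = {**V1_MAP, "\u03a9": "\u03c9", "\u03c9": "\u03c9"}  # newer: adds, never changes


class UnassignedCodePoint(ValueError):
    """Raised when a string that must be prepared has an unknown codepoint."""


def prep(text, table, allow_unassigned):
    """Map each codepoint through `table`.

    Codepoints missing from `table` are "unassigned" for this version:
    they are passed through unchanged if allowed, otherwise rejected.
    """
    out = []
    for ch in text:
        if ch in table:
            out.append(table[ch])
        elif allow_unassigned:
            out.append(ch)
        else:
            raise UnassignedCodePoint(f"U+{ord(ch):04X}")
    return "".join(out)


# The sender has the newer tables and generates a fully prepared string.
sent = prep("A\u03a9", V2_MAP, allow_unassigned=False)

# The receiver has the older tables. It is liberal in what it accepts and
# compares the codepoints it does not know one by one.
received = prep(sent, V1_MAP, allow_unassigned=True)

print(received == sent)
```

In this toy setup the older receiver ends up with the same string the
newer sender produced, because nothing the older version already knew was
changed by the newer one.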

That said, I will try to explain what you ask.
 
> http://www.ietf.org/internet-drafts/draft-hoffman-stringprep-00.txt 
>   
> 6. Unassigned Code Points in Stringprep Profiles
> 
> This section describes two different types of strings in typical
> protocols where internationalized strings are used: "stored strings" and
> "queries". Of course, different Internet protocols use strings very
> differently, so these terms cannot be used exactly in every protocol
> that needs to use stringprep. 
>   
>    ** "CANNOT BE USED EXACTLY in EVERY PROTOCOL" should  be more 
>    specific regarding to every other protocols  not mentioned in this
> paragraph    in order to   eliminate confusions of implementors. 

Do you have a suggestion, given my explanation above?

The key is that an application can get a string in basically two different
ways:

 - Via some kind of interface where it is known that the string is
   not nameprepped at all, for example a user interface or somewhere
   where non-nameprepped text is used. We call this "stored".
 - Via some application protocol where the string is already
   nameprepped, for example a DNS server which receives a query. We
   call this "queries".

One can say "client" and "server", but that is not true either. The key
thing is whether the application is responsible for doing nameprep or not.

Suggest other words!

> In general, "stored strings" are strings 
>   
>   ** "IN GENERAL" ?   ANY Exceptional cases  worth to research ? 

See above. There are no exceptional cases; it is only possible to use other
words for the case when the application is responsible for doing nameprep.

> that are used in protocol identifiers and named entities, such as names
> in digital certificates, DNS domain name parts, and names of SNMP
> objects.    
>    ** "DNS DOMAIN NAME PARTS"  point to what ?  maybe those in any zone
> files.     all the host names in any webpages and 
>     all the email addresses in electronic address books is "queries" or
> "stored strings"?    

We explicitly don't talk about hostnames. We talk about the individual
labels. It is very important that we talk about labels and not hostnames.

Regarding the rest of this, see above about need for normalization.

> "Queries" are strings that are used to match against strings that are
> stored identifiers, such as user-entered names for digital certificate
> authorities, DNS lookups, and SNMP requests. 
>   
>     ** Email composing windows accept TO: addresee  before you send. 
>    Are  both sender/recipient addresses  "stored strings" or "queries"
> for email clients?     DNS lookups do not occur in email
> composing/account-setup window. Recipient's     IDN email address will be
> nameprep/ACE encoded in email clients without     any DNS lookup. 

If they are to be nameprepped, then they are "stored strings".

                "Be liberal in what you accept, and
                 conservative in what you send"
  
>    ** What occurs when TAGALOG email addresses are specified in the TO:
> addressee     in the email clients composing window  based on current
> nameprep without TAGALOG     supports ? 

If the address comes from, for example, a user interface where the
application believes it has to do nameprep, an address with an unassigned
codepoint should be rejected. If the address comes from an address book
where the email application knows that the address is nameprepped already,
it can accept the address even though it contains unassigned codepoints.

It all boils down to the question of whether nameprep is needed or not.
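
As a rough sketch of that decision, here is how it could look in Python.
Some assumptions: accept_label() and has_unassigned() are hypothetical
helpers made up for this example, and the unassigned check relies on the
stringprep module in Python's standard library, which carries the tables
of the later published RFC 3454 (Unicode 3.2) rather than those of the
draft discussed here. U+0221 is used as the sample because it is
unassigned in that table.

```python
import stringprep  # standard library: RFC 3454 tables (Unicode 3.2)


def has_unassigned(label):
    """True if any codepoint in the label is in table A.1 (unassigned)."""
    return any(stringprep.in_table_a1(ch) for ch in label)


def accept_label(label, must_nameprep):
    """Hypothetical helper: decide whether to accept a single label.

    must_nameprep=True  -> the application has to do nameprep itself
                           (e.g. text typed into a user interface), so
                           unassigned codepoints are rejected.
    must_nameprep=False -> the label is known to be nameprepped already
                           (e.g. it comes from an address book), so
                           unassigned codepoints are tolerated.
    """
    return not (must_nameprep and has_unassigned(label))


label = "ab\u0221"  # U+0221 is unassigned in Unicode 3.2

print(accept_label(label, must_nameprep=True))
print(accept_label(label, must_nameprep=False))
```

Whether a rejected label leads to a hard error or only to a warning for
the user is left to the application in this sketch.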
 
>    ** SMTP servers receive sender/recipient addresses from email clients
> by SMTP protocol.  Then,Does SMTP server regard the incoming  email
> addresses  as "stored ones" or "queries"?    it is true that most SMTP
> servers will simply pass through those ACE hostnames  without any
> nameprep validity check.    

As queries. No nameprep is needed, as the addresses should be nameprepped
already.

>    ** What if  sender's ACEed Tagalog email address which is unassigned
> and invalid  yet in the  current  version of  nameprep?    should the
> recipient be warned   for security reasons ? 

See above. It depends on whether the email application believes it has to do
nameprep or not. If it does, unassigned codepoints cannot be accepted.
Warning the user is one way of handling the case of the user wanting to use
some newly assigned codepoints before the email application is upgraded to
include the new codepoints in its nameprep algorithm(s).

> All code points not assigned in the character repertoire named in the
> stringprep profile are called "unassigned code points". Stored strings
> using the profile MUST NOT contain any unassigned code points. Queries
> for matching strings MAY contain unassigned code points. Note that this
> is the only part of this document where the requirements for queries
> differs from the requirements for stored strings.
> 
>     ** Does this mean that  old nameprep/ACE email applications can't
> send       emails to  recipients with future TAGALOG IDN email addresses
>       without upgrading to new version of nameprep?   Or allowed with
> warnings?    

My take is that it is up to the application. The MUST NOT should probably
be SHOULD NOT instead, to give an email client (which is a specific case)
the ability to allow such use. We were thinking of DNS zone files, which
are a more dangerous case, where unassigned codepoints must not appear.

   paf