
Re: Just send UTF-8 with nameprep (was: RE: [idn] Reality Check)



Hello Patrik,

At 11:06 01/07/17 +0200, Patrik Fältström wrote:
>--On 01-07-17 16.48 +0900 Martin Duerst <duerst@w3.org> wrote:
>
> > It is very important to understand that in most contexts, the chance
> > that a label is changed (except for lowercasing, which was part of
> > the UTF-8 proposal from the start) by nameprep when somebody takes a
> > proper domain name from paper and inputs it, is *very* small.
>
>We will have problems when domainnames are registered.
>
>Say that someone send in request for registering domain A which is
>correctly nameprepped. If someone else is sending in domain B for
>registration which would be nameprepped into A, it is extremely important
>that B is not registered, but instead this second person get to know that
>B, converted to A, is already registered.

I agree that it's extremely important for registration. But
registration happens once, and then it's over. I was answering
to your claim that applications that now just use UTF-8 need to
be updated to use nameprep. In the limit, compiling a zone file
into a domain name server is the only place we really need
nameprep. If that is done 100% consistently, everything else
can be left to quality of implementation (I don't think we
should leave it at that, but I'm saying it could, without
any major consequences).
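
To make the registration case concrete, here is a minimal sketch in
Python. It is only an illustration under stated assumptions: the
Registry class and the prep() helper are hypothetical names, and prep()
uses NFKC plus case folding as a rough stand-in for nameprep, not the
actual nameprep profile. The point is that the check happens once, at
the place where names enter the registry.

```python
import unicodedata


def prep(label):
    """Rough stand-in for nameprep (assumption: NFKC + case folding only)."""
    folded = unicodedata.normalize("NFKC", label).casefold()
    # Case folding can undo normalization, so normalize once more.
    return unicodedata.normalize("NFKC", folded)


class Registry:
    """Hypothetical registry that stores only prepared labels."""

    def __init__(self):
        self._owners = {}

    def register(self, label, owner):
        key = prep(label)
        if key in self._owners:
            # Domain B, converted to A, is already registered.
            raise ValueError(
                f"{label!r} maps to {key!r}, which is already registered"
            )
        self._owners[key] = owner
        return key
```

With this, registering "example" and then a variant spelled with a
fullwidth "E" raises an error for the second request instead of
creating a second entry.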


>It is enough with only _one_ mistake (regardless of how small the risk is
>that this happens) and we will have some definition of chaos.

Well, yes, and the registrar who registered it will get some
*very* serious blame.


>I also ask myself this the last couple of days:
>
>- This wg is about what is used in the DNS protocol.

Yes, but it should also look at the overall consequences,
for backwards compatibility but even more for how we want
the future of the Internet and Internet internationalization
to work.

>- Many people want UTF-8 to be used in DNS.

For obvious reasons. It's the most straightforward solution,
given that getting people who use ASCII to abandon it is
unrealistic and given that the Internet is an 8-bit network.


>- Application protocols do not use UTF-8 but ASCII (most of them).

Well, in some way. FTP already uses UTF-8 for file names.
NNTP is working on it, just to give a few examples.
And many others are perfectly capable
of doing more than ASCII.

The Internet is 8-bit end-to-end. The application protocols should
be no exception.


>- Someone (in Applications Area) have to write documents, one per protocol
>   on how to pass Unicode characters in the protocols, how to handle
>   downgrading to something else if needed (like 8BITMIME in SMTP) and
>   for each protocol what the downgrade is to.

If internationalization hasn't been done, it indeed should be done.
Are you suggesting that ACE will make it unnecessary to internationalize
protocols? I very much hope not.

For most protocols, in particular point-to-point protocols,
the downgrade will be failure if implemented tightly (i.e. checking
for ASCII) and success if implemented 8-bit transparent.
FTP internationalization is a typical case. People didn't find
it necessary to negotiate or downgrade (except for some provisions
for legacy encodings).
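
The difference between the two implementation styles can be sketched
as follows. This is a toy illustration, not code from any real
protocol implementation; both function names are made up for the
example.

```python
def strict_ascii_accept(name: bytes) -> bytes:
    """'Tight' implementation: refuses any octet outside 7-bit ASCII."""
    if any(octet >= 0x80 for octet in name):
        raise ValueError("non-ASCII octet rejected")
    return name


def transparent_accept(name: bytes) -> bytes:
    """8-bit transparent implementation: passes octets through untouched."""
    return name


# A UTF-8 encoded name containing non-ASCII characters.
utf8_name = "r\u00e4ksm\u00f6rg\u00e5s".encode("utf-8")
```

Passing utf8_name to strict_ascii_accept() fails, while
transparent_accept() returns the same octets, which still decode as
UTF-8 on the other side.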

SMTP indeed needs some serious work, because its application is
so important, because way less than could have been done has been
done until now, and because of the pushing and semi-point-to-point
characteristics of email. For most other protocols, the situation
is much simpler.


>Conclusion:
>- As many application protocols (like SMTP) can not use UTF-8 for a while
>   should not DNS use some more efficient encoding than UTF-8 in the
>   packets?
>
>I claim that the actual encoding which is to be used in the DNS protocol is
>something this wg can not come up with, but instead the DNSEXT wg in the
>DNS is the correct group which is to come up with an encoding which is as
>efficient as possible, given DNSSEC and other new things which they are
>working with.

Are you saying we couldn't decide to use UTF-8, even if everybody
and her grandmother wanted that? Or are you saying that if DNS
needs some generic compression, this WG shouldn't do it?


>I further guess that when people in this wg talk about "we want UTF-8" they
>really talk about what they want to use in the Application Layer protocols,
>but defining that is really out of scope for this wg.

Yes indeed. But choosing a solution that fits with the rest
should very much be within the scope of the wg, I hope.


>So be careful what you ask for!
>
>Either this wg come up with something which is backward compatible with
>other existing standards, or it comes up with a "framework" and give work
>items for other wg's and leave it to them for standarization.

So are you saying that this WG is allowed to do crude hacks
to smuggle characters around, but not to do serious work in the open?


Regards,   Martin.