[idn] San Diego Meeting Notes

To: <idn@ops.ietf.org>
Subject: [idn] San Diego Meeting Notes
From: "James Seng/Personal" <James@Seng.cc>
Date: Sun, 14 Jan 2001 13:51:55 +0800
Cc: "Marc Blanchet" <Marc.Blanchet@viagenie.qc.ca>
Delivery-date: Sat, 13 Jan 2001 21:54:19 -0800
Envelope-to: idn-data@psg.com

Dear all,

Here are the minutes of the IDN WG Meeting at the San Diego IETF. Thanks to David
(again) for scribing 43 KB of text notes :P

Some important highlights (feel free to object if you think I am wrong):

1. Requirements I-D is almost ready for last call with some minor changes. It is
   likely we will do one more version. Any last comments, please do so ASAP.

2. General feeling that Nameprep sub-team is working in the right direction.

3. Strawpoll agrees with the Protocol sub-team recommendation that we will focus
   on "ACE on Application" now but do not reject the possibility of doing something
   else longer term. What the longer term solution is is left for future discussion.

-James Seng

IDN WG Meeting San Diego

<RFC 2026 statement>

* Working group update, Marc Blanchet

20 new documents since Pittsburgh
most are going to be presented this week
not: UDNS, BRACE, SACE, CJK (last presented in Pittsburgh).

Status of documents
- requirements: last rev and go WG last call after San Diego

WG ACE prefix registry, if needed by the WG, can be done on a temporary basis

* More radical solutions: directories and basic DNS changes, John Klensin

Internationalized access to Domain Names: Some Radical Solutions

Worried for some months about too narrow a focus.  We're looking for quick 
solutions. Problem with quick solutions is that they are usually not as 
quick as they're supposed to be, especially in deployment.  As a result, we 
have to get rid of them.  This issue is very important since its effects 
are very extensive -- affecting past, present and future.

Started exploring a different question -- what should we have done if we 
started in the beginning?  This is a different question than how do we 
start from where we are today.

Short term question: how to get i18n domain names now, i.e., quick 
deployment.  Steps taken for short term are with us forever.

Design for long term: the problem is figuring out how to get there while 
preserving the installed base

Designing for i18n:
- not fitting i18n into the network
- not just a DNS problem
-- applications need to work and make sense
-- need to support presentation of non-ASCII characters everywhere users 
would expect
-- extreme use case is a user whose keyboard doesn't have ASCII
- rethinking several assumptions

IDN may require a presentation layer
- IETF tradition usually
-- keeps users close to protocol interface
-- often done because we are lazy and it is easy
-- or to keep things simple (may be the same thing)
- internationalization may not permit this
- difference between
-- unique strings stored in Internet-wide databases
-- appearance of names to users
-- selection and choice mechanisms

two hypotheses / strawmen
- not a DNS problem
-- we are looking in the wrong place
-- very restricted protocol element, not ASCII
-- still the right choice
-- solve UI problems at another level
- DNS should be fully international
-- content is names, not protocol elements
-- names should reflect international usage

DNS is being pushed in a direction where it doesn't belong
- symptoms showing up in many places, not just i18n
- DNS doesn't do
-- search
-- ambiguity
-- nearest server

directories
- not a technology or protocol but a class of them
-- typically good at handling problems for which the DNS is weak
--- multiple forms of lookup
--- attributes with values
--- tolerance for ambiguity
- IETF has never succeeded in widespread deployment of a directory protocol
-- but maybe this issue is important enough

Directory protocol modules
- applications don't call DNS with user supplied names
-- call directory with names or keywords
-- involve user in evaluation if needed
-- get results and call DNS
-- DNS labels remain "protocol elements"

For many applications, replace gethostbyname()
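
A minimal Python sketch of the flow outlined above, assuming a purely
hypothetical in-memory directory (the DIRECTORY table, its entries, and the
function names are illustrative and do not come from the drafts). The
application searches the directory with a user-supplied keyword, lets the
user choose among the candidates, and only then hands an ASCII
"protocol element" name to the DNS:

```python
import socket

# Hypothetical directory: keyword -> list of (description, ASCII DNS name).
DIRECTORY = {
    "bücher": [
        ("Bookshop, Berlin", "buecher-berlin.example"),
        ("Publisher, Vienna", "buecher-wien.example"),
    ],
}

def directory_lookup(keyword):
    """Return candidate entries for a user-supplied keyword (may be ambiguous)."""
    return DIRECTORY.get(keyword.casefold(), [])

def resolve(keyword, choose=lambda candidates: candidates[0]):
    """Search the directory, let `choose` pick a candidate, return its DNS name."""
    candidates = directory_lookup(keyword)
    if not candidates:
        raise LookupError(f"no directory entry for {keyword!r}")
    _description, dns_name = choose(candidates)
    return dns_name

def connect_address(keyword):
    """What an application might call instead of gethostbyname() on raw user input."""
    return socket.gethostbyname(resolve(keyword))
```

The `choose` callback is where the user would be involved in evaluating an
ambiguous result; the DNS itself only ever sees the ASCII name.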

Directory advantages
- well adapted to user selection of target from a list
- use with keywords and descriptors not just rigid names
- attributes can identify
- types of use and business location (may help with trademarks)
- multiple server

Disadvantages
- the Internet has never deployed one
- reaching an agreement on protocol and schema
- too many options

Legacy application code:
- don't call directory, hence don't see internationalized names
- can use only a subset of Internet until upgraded
- transition similar to host table to DNS

Other hypothesis
- assume it should have contained non-ASCII (names not protocol elements) 
from the beginning
- would look like IS 10646 everywhere
- Mixing old and new conventions causes problems
- A new class?  Obsolete "class=IN"
Everything going forward goes into the new class

Changing classes
- in principle, obsoletes everything
-- no requirement to share server tree, delegations, etc. with class=IN
- Some advantages of preserving models

Transition issues
- copy existing records or search
- pointers from the new to old class
- political nightmares
- opportunity to not carry rubbish forward

Legacy applications and Code
- Don't know about new class so "see" only Class=IN
- non-ASCII names are invisible until they are upgraded

Shared problems
- DNS ASCII is convenient
-- case independence and matching rules are easy
-- very small number of characters
- Almost any international character set will require canonicalization and 
matching rules
-- and may require localization
- ISO 10646 requires non-obvious coding decisions; UTF-8 may not be the 
right choice

Documents
- Directory
-- draft-klensin-dns-role-00.txt
- New Class
-- draft-klensin-i18n-class-00.txt

Eliot Lear:  the problems I face: we don't have a successful hierarchical 
directory.  I don't think we have the luxury of time and we don't have the 
technology.  I don't see anything that can scale as well as the DNS.

JK: let me lay part of that question aside.  I can argue either side.  The 
question of time is an interesting one.  Our hope was that a solution would be 
converged upon fast enough that junk wouldn't be deployed.  We lost.  But 
problems with the trash are becoming obvious.  Our only opportunity is to 
do it right.  Half-right and quick won't do it.  Taking the extra time to 
do it right is the right thing to do as it will help get rid of the 
trash.  Some better than trash, some worse, average is trash.

Michael Mealling: you said there were components to directory service.  Have 
you given any thought to what the components might be and whether we 
already have some of the components?

JK: If I say "yes" and duck, would that be an appropriate 
answer?  Directories are complicated.  Talking about that as a design 
principle is better than getting bogged down in "whois++/LDAP is better" 
arguments.  After posting my draft I received two groups of 
comments.  First group: first attempt to view it as an architectural 
issue.  Second group: my schema is better than yours, LDAP is better than ....  
If the WG comes to a decision that a directory approach is the best way 
forward, then we sit down and focus.

Keith Mitchell: in response to the new class proposal, it seems that what 
you're doing is partitioning the applications in the Internet to those who 
can deal with I18n and those that can't.  The on the wire protocol doesn't 
matter, it is what the applications do with the protocol.   The other thing 
is if we use this to de-cruft the DNS, trying to clean up things gets out 
of hand.

JK: No intent to change protocols.  The rest is a bar discussion

Hideyo Imazu: Directory based solution is mostly for WWW.  ACE based IDN 
solution can be applied to email.  Among countries like Japan, China, 
Korea, internationalization of entire email address should be considered

JK: Directory works as well for email as for the web.  Doesn't work for 
non-interactive, non-online applications.  Architecturally, it feels like 
the right solution.

David Chadwick:  Support the directory notion, users need the search 
facility.  Once we have multiple character sets, we must have a way to search.

James Seng:  John's proposal is a very theoretical proposal.  Fairly close 
to NGDNS.  Internationalization of NGDNS is probably one of the problems 
that can be fixed.  Some of the comments are very true.  Good starting 
point for NGDNS.  Not talking about long term deployment of NGDNS, looking 
at RACE for quick deployment.  I'm afraid the chaos during transition to 
NGDNS would be disastrous.

JK:  I believe it will take 10 years to deploy anything this WG comes out with.

JS: Yes and no.  To do a directory would require a comprehensive 
re-architecture.  Removing stuff from current DNS has not really been 
discussed.

Ted Hardie:  Eliot was wrong, you aren't being radical enough.  We have 
never had a presentation layer.  Putting a directory layer between 
application and DNS is a presentation layer.  Perhaps coming up with a 
presentation layer.

Bob Hinden:  Both of these choices have plusses and minuses.  There are 
harder issues and dealing with them scares me.  Solve the important 
problems first.

JK:  I think this is the most interesting and hardest problem since 
deploying IPv4.  We need to take this very seriously.

* IDNA, Patrik Faltstrom

ACE from the Applications

Not so nervous about changing applications.  Really nervous about backwards 
compatibility.

This proposal is to change the presentation layer in applications.  Not 
change the application layer protocols, the DNS protocols, or the DNS 
servers.  We only change the presentation and input mechanisms.

When a user inputs a domain name into the application, the information will 
be stored in whatever local script the application is using.  The 
application must do a transformation anyway.  Many applications must be 
better regardless of what we choose.  All applications must transform from 
the local character set into something that is entirely backwards compatible.

Changes to applications:
- host names must be nameprep'd
- an ACE is applied
- sends encoded name to resolver
Display of host names
+++
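
A rough sketch of the application-side steps listed above, using Python's
built-in "idna" codec as a stand-in. That codec implements the nameprep and
ACE that were standardized later (Punycode with the "xn--" prefix), not any
of the candidate ACEs under discussion at this meeting; the function names
are illustrative:

```python
def to_resolver(hostname: str) -> bytes:
    """Nameprep each non-ASCII label and apply the ACE before calling the resolver."""
    return hostname.encode("idna")

def to_display(wire_name: bytes) -> str:
    """Decode an ACE name back to its native form for presentation to the user."""
    return wire_name.decode("idna")

ace = to_resolver("Bücher.example")
print(ace)               # b'xn--bcher-kva.example'
print(to_display(ace))   # bücher.example
```

An application that has not been updated would show the ACE form as-is,
which is the leakage listed under the known side effects.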

Known side effects
- un-updated applications will display obscure ACE format (leakage)
- moving names from updated programs to un-updated programs might cause more 
leakage
- Non-IDN names that use the ACE prefix or suffix will either be considered 
illegal or will appear as nonsense characters
- The IDN WG must choose an ACE
- Doesn't internationalize text records in the DNS zone files

Updating name servers
- administrative interfaces for DNS servers must all check IDN names
- probably done with automated scripts converting from and to preferred 
native format (which will be different for different users)
- Will probably be important to check all names with nameprep

No major comments (Yet!) to -00 draft
No planned revision

Pete Resnick: work in a protocol that has lots of layer violations.  email 
embeds DNS names.  Do we need sendmail to change headers to ace encoded things?

PF: one of the good things about this proposal is that it is an application 
layer issue.
Paul Hoffman: No.  What we're talking about is a presentation layer change.  If 
you've changed your application to handle IDNA, then all applications 
should handle IDNA.  We don't want to touch things that IDNA will be 
looking at layer.  You must change your paste function.

PR: I update Eudora to do IDNA, I encode for presentation, decode for 
transmission.

PF: Yes.

Yoneya:  It should apply to any application protocol

PH: Applies to any protocol that has a presentation layer.

Rick Wesson:  encoding and decoding leakage was experienced in the VGRS 
testbed.  Broke lots of things.  Leaks happen.

PF: leakage is a bad thing.  If leakage occurs and if you know the ACE 
version, you can still figure out where things should go.

Hideo Imazu: Considering the fact that there are very few fonts that have 
all Unicode characters in it, even if a platform has full support for ACE 
encoding, in some cases, it can't encode/decode an ACE sequence for some 
characters.  Even on a good implementation, leakage can happen.

PF: Yes.  It might be possible for user to look at ACE encoded name.

PH: Decoding will be possible, but display may not be possible.

JS: another leakage not considered: user assuming application is 
internationalized and it's not.

PF: Yes.

* Virtually internationalized domain names, Shim

Allows the use of i18n domain names, but without creating i18n names.

** Description

Most domain names in regions where English is not widely spoken are 
created by transliterating the characters of the local language into 
those of the English language.
VIDN formalizes and uses this knowledge of transliteration
At one end, we have local characters, at the other end, we have regular DNS 
characters
1. local characters changed to phonemes
2. each phoneme is matched with English phoneme
3. English phonemes are united to form the transliterated name

provides one to many mappings

For one-to-one reversible mapping
a) each server is pre-assigned a unique code (e.g., Unicode)
b) the code is also generated by VIDN on the client whenever the 
corresponding virtual domain name is typed.
c) the code is compared to the code retrieved from the server
d) VIDN matches the code

** Implementation and Administration

Development of VIDN software for each local language may be done by a local 
standard body
- the codes for one-to-one mapping may be administered by international 
standard body such as IANA
or the codes may be administered by a local standards body

** Key features

VIDN does not require creating and registering additional domain names in 
local scripts

VIDN does not make any change to the current DNS infrastructure

VIDN does not need separate name server/resolver

** Testing results

Keywords and ACE require conversion, VIDN already there.

Web browser add-on:
- Korean-English conversion
- 800 KB

Comments/ Suggestions
- Comments and responses on the IETF IDN discussion mail archive
- Contact me

Itashiro: different ideographic script can have same pronunciation, how do 
you convert?

Shim: include one-to-one mapping scheme

Paul Hoffman:  this can be used today, but only when a code isn't 
needed.  If a code is needed, they'd be non-obvious to users, so your 
assertion that this is easier/faster to deploy isn't true.  Might be true 
for Korean or maybe Japanese, so same as ACE or IDNA.

Shim:  30%-40% will need the codes.  ACE/IDNA will be 100%

Maynard Kang: if you add the code, then it is the same as ACE.

* Internationalized PTR RR, Hong Bo Shi

Why not PTR:
- Current PTR and its mapping method can't support IDNS as well as 
traditional ASCII domain names
- It is impossible to let client to choose which it wants from those PTR 
records without any additional selection
- It makes no sense to return an unreadable domain name
- But no PTR record is worse.

Why IPTR:
- IPTR can give a language selection to client/end-user
- according to the language tag, the client/end-user can get what they want 
from a list of corresponding values

IPTR Format
4.3.1.1.in-addr.arpa. IPTR "LANGUAGE" "name-in-utf8"
- the LANGUAGE field should be treated in a case insensitive manner and 
must follow the conventions defined in 1866
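
A small Python sketch of how a client might pick names out of a set of IPTR
records by language tag, matching the tag case-insensitively as described
above. The IPTRRecord type, the function name, and the sample records are
assumptions made for illustration; the draft's actual wire format is not
modelled here:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IPTRRecord:
    owner: str      # reverse-lookup name, e.g. "4.3.1.1.in-addr.arpa."
    language: str   # language tag
    name: str       # host name (carried as UTF-8 on the wire)

def select_by_language(records, wanted):
    """Return the names whose language tag matches `wanted`, ignoring case."""
    wanted = wanted.lower()
    return [r.name for r in records if r.language.lower() == wanted]

# Made-up records for a single address.
RECORDS = [
    IPTRRecord("4.3.1.1.in-addr.arpa.", "JA", "ホスト.example"),
    IPTRRecord("4.3.1.1.in-addr.arpa.", "en", "host.example"),
]

print(select_by_language(RECORDS, "ja"))   # ['ホスト.example']
```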

EDNS0 is required to implement IPTR
IPTR format
Language: 2 octets, an argument for IPTR to define the kind of language used in 
the following IDN label

IPTR query/response
If qtype=IPTR then all of the corresponding IPTR RRs should be returned in 
one response
Transport: use UDP first; if the TC bit is set, then use TCP
in future, EDNS0 is required.
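
The transport rule above (UDP first, TCP if the response is truncated) can be
sketched in Python as follows. The TC bit is bit 0x02 of the third octet of
the standard DNS header; the send_udp and send_tcp callables are hypothetical
placeholders for whatever actually puts the message on the wire:

```python
def truncated(response: bytes) -> bool:
    """True if the TC bit is set in a DNS response header (bit 0x02 of byte 2)."""
    return len(response) >= 12 and bool(response[2] & 0x02)

def query(message: bytes, send_udp, send_tcp) -> bytes:
    """Try UDP first; retry over TCP only if the UDP answer was truncated."""
    response = send_udp(message)
    if truncated(response):
        response = send_tcp(message)
    return response
```

Returning all IPTR RRs in one response makes truncation more likely, which is
why the fallback matters for this proposal.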

PTR extension FOR IDNS ONLY case
it is very difficult to avoid IDN ONLY
IDN ONLY means a host only has its IDN but not any traditional ASCII domain 
name
- in above case PTR RR must not be null in a response message
- PTR RR must contain a domain name in ACE to co-op with IDN unaware systems
-- else a syntax error message should be sent back with an administrator 
configures zone

Open issues:
- do you think we should return all corresponding IPTR records to a resolver
- nameserver should only send back one IDN in each language

- this kind of function has already been implemented, for example in BIND-9.x.x

To be or not to be
- it is said that the proposal that returns all the corresponding IPTR 
records increase the complexity to implement a resolver
- according to the suggestion it is better to let a server just feed  back

The above suggestion is in the belief that the IPTR design is to introduce 
a language tag. it should be used in queries from client.

Demchenko: it is not enough just language, need character set + language

Jiang: intention was just language, not character set.  the character set 
is supposed to be UTF-8

Hideo Imazu: specifying language is not enough in some cases.

Jiang: mixture of languages?

HI:  some languages have two scripts

J: right.  we're just trying to capture language, not script.

* Protocol Design Team Report, David Lawrence:

Primary task: categorize protocol proposals and make recommendation to WG
Members: DL, Olafur Gudmundsson, Dan Oscarsson, Paul Hoffman
Observers/commentators: Marc Blanchet, Harald Alvestrand, John Klensin, Rob 
Austein, James Seng

Output of the design team
- report to the mailing list (didn't make it before this meeting, will have 
it soon afterwards)
- comparison of protocols
- possibly other reports or recommendations
-- updating 2425 to take a much deeper look
-- looking at different ACEs

Categories of protocols, based on expected implementation
- Do a long term architecturally clean fix that requires upgrading to the 
whole naming infrastructure
- Change only the applications in the short term and possibly the 
application protocols, but change none of the DNS infrastructure
- change the applications for short-term gain, but transition in the long 
term to a clearer solution that requires upgrades to the whole naming 
infrastructure (two-phase approach)

IDNA- application
IDNE- infrastructure

Big picture
- we cannot simply put binary characters into the current DNS without 
breaking many applications and some DNS servers
- none of the solutions at this point is a comprehensive solution that 
considers all the effects of the changes proposed
- the design team has not looked at all impacts on applications

Where the solutions fit
- long term solutions mostly involve changes to the current DNS 
infrastructure, although there is also a proposal for using a directory 
layer with internationalized input to find resources
- short term solutions are based on using ASCII compatible encoding in 
applications
- Two phase solutions are a mixture of the two

Infrastructure solutions
- long term proposals require that all DNS servers in a request path be 
updated before the user can get a correct answer to a query for an 
internationalized host name
- different proposals and different legacy DNS servers will cause different 
error messages to get back to the user if their query traverses a server 
that was not upgraded
- a user can get different results for the same query
- maximum breakage of applications

ACE solutions, positive
- they are easier to implement than the long term solutions, but they are 
not without problems
- the obvious advantage is that updating just applications will go faster 
than updating applications and the entire naming infrastructure
- they work on the presentation layer which means that they don't even 
require changes to any application protocols.

ACE solutions, negative
- ASCII-type ugly name leakage in non-updated applications
- non-IDN names that use the ACE prefix or suffix will either be considered 
illegal or will appear as nonsense characters
- the ACE solutions only internationalize host names, not textual material 
that appears in some types of DNS records
- IDN must choose an ACE
- versioning is harder in ACEs than in other proposals
- there are probably other ACE-specific implications that we haven't 
thought about yet

Two phase solution
- request that the applications be updated to handle an ACE encoding in the 
case the query using the long-term solution fails
- every application must work correctly with the short term solution and 
the long term part of the solution has all of the problems listed earlier
- it is likely that only the short term solution would be implemented 
unless the long term solution had other notable advantages

ACE considerations
- ACE must be designed to have one-to-one mapping and versioning
- they must continue to use 63 octet max for name parts, while other 
proposals could extend the length of the name parts
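
To illustrate the 63-octet constraint, a small Python sketch that assumes the
standard "punycode" codec and the "xn--" prefix as stand-ins. Both come from
the ACE that was standardized later, not from any proposal in these notes,
and the sketch skips nameprep:

```python
MAX_LABEL_OCTETS = 63      # DNS limit on a single label
ASSUMED_PREFIX = "xn--"    # assumption: prefix of the later-standardized ACE

def ace_length(label: str) -> int:
    """Octets the label would occupy once ACE-encoded (prefix included)."""
    return len(ASSUMED_PREFIX) + len(label.encode("punycode"))

def fits_in_label(label: str) -> bool:
    """True if the ACE form still fits in one DNS label."""
    return ace_length(label) <= MAX_LABEL_OCTETS

print(ace_length("bücher"))   # 13
```

Because the ACE form is longer than the native form, a name that looks short
to the user can still exceed the limit.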

Changes to applications
- all scenarios will require that all applications that use the new names 
be updated.  RFC 2825 lists many protocols for which i18n may be very difficult.

Go with DNS infrastructure Change
- the decision about which of the solutions is chosen should be made by 
people in the DNS community and Application area with internationalization 
community

Directory infrastructure
- WG needs to work closely with the directory community for both protocol 
and schema interoperability
- no successful operational experience in an Internet wide directory service

Suggestion: go with application-only solution,
- focusing on the negative attributes of each solution, the design team 
considers the short-term "use ACE from the application"

Looking at costs:
- we note that the more arch. desirable infrastructure solutions are very 
costly in terms of new protocol work needed and upgrading deployed name servers
- predicting when any of the solutions might actually be useful is 
impossible, making them very difficult to sell to the Internet community


An ACE solution does not prevent an infrastructure solution
- fortunately choosing the ACE solution now does not preclude the IETF

What's next:
- design team finishes its report and sends it to the WG
- design team finishes the comparison document
- somebody should study the impacts of the chosen solution more in-depth
- WG decides

J Kim: forgot one negative -- two sided business card problem.  Amazon.com 
in English, Amazon.com in ACE is a tour site.  That is a serious problem to 
solve.

DL: which aspect of the two-sided business card are you talking about?  This 
is a registry issue.

JK: but the problems arise from a technical basis

OG: the ace encoded name will be long and ugly

RW: Verisign has posted their resolution proposal: UTF-8 on the wire, how 
did the design team feel about it.

OG: no comment.
PH: UTF-8 on the wire was a non-starter
DL: we're avoiding recommending a two phase solution

EL: operational infrastructure change is not mutually exclusive from 
presentation change.  Refers back to John's proposal.  Two problems: 
representing domain names and transliteration.  there is value add.

RB: DNS protocol supports 8 bit binary.  Does not break DNS but might break 
other protocols.
Design Team: agree

MM: directory approach not feasible due to schema is a red herring as DNS 
RRs define schema.  If you constrain your space, makes the problem more 
easily solved.

DL: Yup.

* Proposal for a determining process of ACE identifier, Naomasa Murayama

Requirements of IDN technology:
- unified root
- interoperability
- compatibility to BIND

RACE: row based ACE
- prefix bq--
Brace: bimode row based ACE

Why ACE?

1. permitting 8-bit domain names and modifying name server software to be 
8-bit clean
2. partitioning the current domain name space to accommodate MDNs using 
some kind of ACE

problems for approach 1
- difficulty in inter-operability
- needs a change of DNS protocol

problem for 2
- needs a consensus among all domain name registries in partitioning the DN 
space

How can we negotiate for a partitioning of the DN space

- how we can select ace identifiers
  ACE identifiers ::= ACE prefixes | ACE suffixes

What is happening in the testbed of MDN in .com,.net,.org

registration started from Nov. 10 but <nihongo>.com is encoded and taken by 
ns.bulkregister.com on Nov. 2

ACE identifier candidates
- prefixes: AA--, AB--, ..., 99--
- suffixes: --AA, --AB, ..., --99

Relevant domain names:
aa--a.com, aa-b.org, ..99--zzzz.net, aa--x.co.jp, etc.
a-aa.com, b--aa.org, ..., zzzzz--99.or.kr, etc.
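
A minimal Python sketch of the candidate space and of a check for "relevant"
labels, assuming a candidate is two alphanumerics plus a double hyphen as in
the lists above. The names PREFIXES, SUFFIXES and is_relevant are
illustrative, and the check looks at a single label rather than a full
domain name:

```python
import re
from itertools import product
from string import ascii_lowercase, digits

ALNUM = ascii_lowercase + digits

# aa--, ab--, ..., 99--  and  --aa, --ab, ..., --99
PREFIXES = [a + b + "--" for a, b in product(ALNUM, repeat=2)]
SUFFIXES = ["--" + a + b for a, b in product(ALNUM, repeat=2)]

_PREFIX_RE = re.compile(r"^[a-z0-9]{2}--.")
_SUFFIX_RE = re.compile(r".--[a-z0-9]{2}$")

def is_relevant(label: str) -> bool:
    """True if the label starts or ends with a candidate ACE identifier."""
    label = label.lower()
    return bool(_PREFIX_RE.search(label) or _SUFFIX_RE.search(label))

print(len(PREFIXES))   # 1296
```

A registry survey along the lines of step 2 of the proposal would amount to
running such a check over the existing registrations.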

Proposal
step 1: tentative suspension of registering relevant domain names for ACE 
identifier candidates
step 2: conduct a survey of relevant domain names already registered
step 3: select about 10 to 20 identifiers one of which is for test and 
others for real use, based on the survey
step 4: permanent blocking of registrations of domain names relevant to the 
selected identifiers (except for registrations compliant to MDN semantics).

when writing an ACE proposal
author should either
- describe the ACE identifier as "to be decided"
(which must be decided by the IDN WG or other organ when it is published as 
an Internet draft)
or use an ACE identifier

When a proposal becomes an Internet standard
- when a specific ACE proposal is accepted as an Internet standard, the 
experimental ACE identifier should be replaced by one for real use 
(hopefully decided by IANA)

Important change from -00 to -01

excluded suffixes of one hyphen followed by the alpha numerics from the 
candidates

Among 227,852 registrations of .JP domain names, 23,921 were relevant to 
these suffixes

Need cooperation of IETF, ICANN, and domain name registries.

* Handling revisions of IDN, Marc Blanchet

Problem statement: Unicode is going to have revisions because of 
characters, languages, scripts that change

Nameprep is going to have revisions.  Nameprep should not necessarily be 
sync'd with Unicode revisions.    We might fix bugs in nameprep, etc.

Protocol will have revisions.  We should include versioning in the 
requirement document

Patrik Faltstrom:  I think the requirement should be able to handle changes 
in Unicode.  I'm not sure we need versioning.  I have some ideas on how to 
handle this in ACE which would not affect nameprep.  Not fully 
baked.  Might not need versioning.  Don't want to have versioning as the 
requirement.  Unicode and nameprep will change.

MB: will work together on how to specify the wording for the requirements

Versioning with ACEs and IDNA-like approach: there is no protocol.  One 
way: have a different prefix for each version.  But: no negotiation is 
handled.  Needs the same domain registered with different prefixes.

Versioning with DNS extensions

Version numbers: simple, increment by 1.  More complex major.minor.  minor 
changes table lookup, major being changes in lookup algorithm.

Table format simplified Unicode table.

Conclusion:  we need versioning in IDN.  We can do it in different ways and 
need to think about it.

Harald Alvestrand:  versioning in the ACE means you'll either see every 
i18n domain name disappear each time you upgrade client or you'll need to 
do queries for each version.  Having different labels is too broken for words.

Mark Welter: what would drive the versioning is characters forbidden at the 
application layer.  Can push this to the registration layer.

MB: I'd prefer it if we don't need it at all.  But we should think about it 
before doing the protocol.

Randy Prezen:  You only care what happens after the ACE transform is applied.

MB: Just think about it.  If we don't need it, good.

* Japanese characters in multilingual domain name label, Yoneya

Definition of characters to be used as JP characters in MDN label
Definition of JP characters to be normalized.

Def. of JP characters:
- idntabjp10.txt
-- does not include NAMEPREP prohibited characters
- usual characters for JP names
- selection of chars is based on JIS
- table consists of code points in Unicode and corresponding JIS code
-- does not mean specifying chars

6531 characters in table
Kanji: 6355
Hiragana: 83
Katakana: 86
Graphic: 7

Definition of normalization:
table of compatible characters to be canonicalized
idntabjpcanon10.txt
- one character must be added, will correct in next version
compatible characters prohibited in NAMEPREP but widely used in PCs, PDA, etc.
- half width katakana
- full width alphabets, numerics, hyphen
- table consists from code points in Unicode to be canonicalized
-- half width katakana and full width katakana must be canonicalized to the 
same thing

Def. of normalization (cont)
Table of characters to be composed
idntabjpcomp10.txt
composition of kana and voiced sound mark varieties
- the table consists of code point sequences in Unicode to be composed
-- ka-tenten -> ga

Definition of normalization rules:
1. canonicalize compatible characters
-- adopt idntabjpcanon conversion
2. compose voiced sound marks
-- adopt idntabjpcomp conversion

Example:
1. canonicalization idntabjpcanon
2. composition idntabjpcomp
3. NAMEPREP

Why:
- convenience for users and implementors
-- explicit definition of usable characters and normalization
-- Unicode KC is insufficient
--- differences exist between VGRS and JPNIC

Mark Davis: the changes you are talking about are combining certain 
forms.  I don't see the requirement for additional steps.

YY: difference is between JIS and Unicode KC.

MD: if you map these together in the folding step then KC takes care of 
this for you in the normalization step.
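
As a rough illustration of what Unicode normalization form KC already does
for the cases mentioned in this talk, a Python sketch using the standard
unicodedata module. It does not reproduce the idntabjp tables, and whether
KC covers every case JPNIC cares about is exactly what was being debated:

```python
import unicodedata

def kc(text: str) -> str:
    """Apply Unicode normalization form KC."""
    return unicodedata.normalize("NFKC", text)

# Half-width katakana KA + half-width voiced sound mark -> full-width GA
print(kc("\uff76\uff9e") == "\u30ac")       # True
# Full-width "A", hyphen-minus and "1" -> their ASCII counterparts
print(kc("\uff21\uff0d\uff11") == "A-1")    # True
# KA + combining voiced sound mark ("ka-tenten") -> GA
print(kc("\u30ab\u3099") == "\u30ac")       # True
```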

* Nameprep design team report, Paul Hoffman

This is a summary of what we posted to the mailing list last week.

Overview:
- make it easy for user to enter names.
-- we don't want to make it hard.
- prohibit as little as possible
-- not all domain names are entered by typing.
- keep names sensible
-- there are plenty of chars (such as backspace) that are bad/dangerous.
- linguistic juggling.
-- don't over-restrict.  not a protocol discussion.  "yes, good 
character.  no, bad character"

Proposed changes from -00:
- fewer prohibitions on input
-- may limit on output
-- make it so that input programs do not need to follow the output rules
- make it easier to implement
-- every application has to do nameprep regardless of protocol approach taken
-- give tables for 2 of 3 steps

prohibit less:
- it is difficult and probably not useful to try to limit confusion
-- e.g., should 'O' be prohibited because it looks like '0'?
- get out of the business of disallowing because they look alike
- -01 will have much smaller list of prohibited characters
- -00 prohibited compatibility characters.  -01 says that if you can 
algorithmically change, then accept on input
-- many examples in Arabic and Asian scripts

Change order of steps:
- ordering was prohibit -> fold (case mapping) -> normalize
- ordering is now map -> normalize -> prohibit
-- prohibit on output
Many edge cases in ISO-10646, so doing mapping first can be very clean.
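
The new ordering (map, then normalize, then prohibit on output) can be
sketched in Python as below. The two tables are tiny placeholders invented
for illustration, not the nameprep tables, and NFKC via the standard
unicodedata module is assumed for the normalization step:

```python
import unicodedata

# Placeholder mapping table: a few hyphen-like characters -> normal hyphen.
HYPHEN_MAP = {"\u2010": "-", "\u2011": "-", "\uff0d": "-"}
# Placeholder prohibited set; control characters are also rejected below.
PROHIBITED = {" "}

def prep(label: str) -> str:
    # Step 1: map (hyphen variants, then case mapping to lowercase).
    mapped = "".join(HYPHEN_MAP.get(ch, ch) for ch in label).lower()
    # Step 2: normalize.
    normalized = unicodedata.normalize("NFKC", mapped)
    # Step 3: prohibit, checked on the output of the earlier steps.
    for ch in normalized:
        if ch in PROHIBITED or unicodedata.category(ch) == "Cc":
            raise ValueError(f"prohibited character U+{ord(ch):04X}")
    return normalized

print(prep("Ex\u2010Ample"))   # ex-ample
```

Checking prohibitions last means an input program can accept characters that
mapping or normalization will turn into something allowed.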

Currently we just do case mapping.  New version will do additional mapping 
such as mapping all hyphen characters to normal hyphen.  There are some 
special cases for case-mapping that need to be added so that all characters 
case-map as expected.  Won't change semantic meaning of characters (in JP, 
hyphen and lengthener characters would be treated differently).  At the end 
of the process, we won't have surprising characters.

Have option of mapping into nothing instead of prohibiting.  Haven't 
specified which we would do this to, but Arabic and Hebrew vowels could be 
mapped to nothing (as per discussion on mailing list).

Use a couple of hundred line list of mappings to be done with the first 
step of nameprep.  We think this would simplify things

New case folding:
- mapping to lowercase will be derived from Unicode case mapping file

Non-character code points:
- non-character codepoints will be listed as prohibited characters
- already Unicode code points assigned as non-characters
Make everyone look outside plane 0.

Remove location of nameprep
- this is a protocol issue.  the protocol must say where nameprep is done.
- different protocol proposals need nameprep in different places

Change canonicalize to normalize

Next steps:
- WG reviews design team recommendations
- Marc and Paul produce -01 based on WG consensus
- Design team keeps working on remaining issues
-- what is still prohibited on output vs. what is mapped to nothing on input
-- a few specific characters need attention

PF:  sent a proposal about taking unmapped characters and calling them 
prohibited or pass through depending on usage (registration or lookup).  If 
you want to talk, I'll forward to nameprep design team.

PH: we'd love to see a specification

Mark Welter: did you find a solution to the dotted capital i problem?

PH: yes, we picked one.

MW: Greek capital gamma looks like a capital Russian character, so there is 
room to masquerade one name as another.  One way to fix this is to have the 
user look very hard.  A Greek user most likely would not use Russian chars.

PH: the design group saw this as the '0'/'O' problem.  We didn't want to go 
there.  This will go into the security

Rick Wilson: are some of these font characters?

PH: yes.

RW: Can you characterize where you lightened up?  E.g., can you now put in 
Zapf Dingbats?

PH: yes.  We also allow the compatibility characters

Eric Brunner(?): does the change of ordering allow us to fix problems 
introduced by authors of 10646?

PH: yes.  If you have a list of errors, send them to the nameprep design team.

Chris Neuman:  on the versioning issue, having a recommendation that 
deployed software be able to load mapping table without a new version of 
the software, would be reasonable and sufficient.

PH: please send the suggestion to the design team.

* DNSII transitional reflexive ASCII compatible encoding, Edmund Chang

Trace is not another ACE.  It is a deployment/implementation strategy.

Trace format is a zone file management system.  We've put a control 
character into the ACE so people won't be able to register before things 
have been finalized.

Transition:
- ASCII to multilingual
- local encoding to ISO 10646
- ace to long term solution
- ace to ace

reflexive:
- deployed at the server end and activated only when certain criteria are met
- utilizes existing RR types as ad hoc records: CNAME, DNAME
-- using DNAME is probably better

Long term solution:
- ACE
-- strength: easy to deploy
-- weakness: version control
- Protocol approach
-- expandability
-- weakness: more difficult to deploy
- Phased hybrid approach
-- ACE approach as immediate and fallback, protocol approach as next 
generation (eliminating ACE versioning requirement)

Bit-flag based:
- DNSII-TRace format
- \127\127ILET-Hex
Possible implementation:
- \12701--UTF8inhex
- \12701--acestring

Quasi-directory based
- DNS directory hybrid
-- utilizing the DNS wildcard: *
- *.domain in zonefile
- employ separate server for lookup and sub-delegation

OpenIDN
- open-sourced NeDNS
- Current implementations:
- RACE, simple hex dump DNSII & TRACE
- Contemplated Additions:
-- IDNE, DNS CLASS (John Klensin)
- Invite all I-D authors and interested parties to contribute to the IDN 
server experiments
- http://www.openidn.org

Paul Hoffman:  have you done a draft on this?

EC: Yes, TRACE.

PH: Have you released the IPR for your patents?

EC: I send the info to the secretariat when we put the site up.

JK: For those whose proposals or plans require changing or doing tricky 
things with the DNS, please remember that it is a complex 
protocol.  Wildcards will get you.  DNS is UDP, it has timeout 
properties.  Not many tries possible.  Please understand the protocol 
before you go

EC: that's exactly why we want to create a working prototype and see what 
happens.

JK: the interesting thing about the IETF is that we have exactly one 
problem: scaling.  A prototype can often tell us exactly nothing.

* LACE: Length based ACE for IDN, Mark Davis

Goal: simple design with good compression.  Uses run length encoding.  Uses 
base 32 encoding

Input is UTF-16 after nameprep.  Take each sequence of common top bytes.
If total length happens to be longer than the original, then you just quote 
the UTF-16 and base32 encode
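
The idea can be sketched as follows.  This illustrates run-length grouping 
by common top byte; it is not the wire format from the LACE draft.  The 
0xFF quoting marker, the 254-unit run limit, and the lowercase unpadded 
base32 alphabet are assumptions made for this example.

```python
import base64


def compress_by_top_byte(text):
    """Group UTF-16 code units into runs that share the same top byte."""
    raw = text.encode("utf-16-be")
    units = [(raw[i], raw[i + 1]) for i in range(0, len(raw), 2)]
    out = bytearray()
    i = 0
    while i < len(units):
        high = units[i][0]
        j = i
        # Cap runs at 254 units so a count byte is never 0xFF (the marker).
        while j < len(units) and units[j][0] == high and j - i < 254:
            j += 1
        out.append(j - i)                             # run length
        out.append(high)                              # shared top byte
        out.extend(low for _, low in units[i:j])      # the low bytes
        i = j
    if len(out) > len(raw):
        # No gain from compression: quote the original UTF-16 instead.
        return b"\xff" + raw
    return bytes(out)


def to_base32(data):
    """Base32-encode, lowercase, without padding (assumed presentation)."""
    return base64.b32encode(data).decode("ascii").rstrip("=").lower()
```

A single-script label such as "abc" costs two bytes of overhead plus one 
byte per character.  A label that alternates between top bytes pays that 
overhead for every run and may end up quoted instead.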

Same compression as RACE for incompressible characters.

Simple.  All code points are equal.  No quoting necessary, except when no 
compression possible.

RW: I like this.  You should update your draft

JS: Can you explain why LACE would have better compression than RACE?

MD: If you have two different scripts, LACE will be better than RACE.

MW: Any comments on how it handles names outside plane 0?

MD: It handles it as UTF-16

YY: LACE is simple and efficient for Japanese

* Designing an ACE for IDN, Mark Welter

Had a bunch of ACEs proposed, explores design base well.

Using Unicode as a base is good. Simplicity and efficiency are where the 
differences lie.

Encoding algorithm should be straightforward.
If possible, making it pencil and paper algorithm would be nice.

My two schemes are nibble based.

Efficiency:
- should have uniform treatment for the various scripts
- CJK are pre-compressed, can't expect too much better than hex

Handling surrogates
- in our UTF-6 proposal, we treated surrogates as 16-bit quantities and 
closed our eyes to the issue
- we should be dealing with surrogates expanded

DUDE:
- encoding based on radix 16, representation of initial code point followed 
by encoded hex diffs of subsequent codepoints
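
A toy version of the differential idea is sketched below.  It is not the 
DUDE format: using XOR as the "diff" and a hyphen as the separator between 
hex values are assumptions chosen to keep the example short.

```python
def diff_encode(text):
    """Hex of the first code point, then hex of each XOR difference."""
    parts = []
    prev = 0  # XOR with 0 leaves the first code point unchanged
    for ch in text:
        cp = ord(ch)
        parts.append(format(cp ^ prev, "x"))
        prev = cp
    return "-".join(parts)


def diff_decode(encoded):
    """Invert diff_encode by accumulating the XOR differences."""
    if not encoded:
        return ""
    chars = []
    prev = 0
    for part in encoded.split("-"):
        prev ^= int(part, 16)
        chars.append(chr(prev))
    return "".join(chars)
```

Neighbouring code points from the same script differ only in their low 
bits, so the differences after the first code point are short hex strings.  
The functions work on whole code points, so characters above U+FFFF are 
handled in expanded form rather than as surrogate pairs.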

What about surrogates:
- if you don't expand surrogates, the worst case limits are half to two 
thirds of claimed name lengths
- DUDE handles expanded surrogates gracefully
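
For readers unfamiliar with surrogates, the sketch below shows how one 
character above U+FFFF becomes two UTF-16 code units, and how a pair is 
expanded back into a single code point.  The function names are made up 
for this example.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    raw = text.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


def expand_surrogates(high, low):
    """Combine a high/low surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

An encoding that counts or compresses unexpanded surrogates sees two 
16-bit units for each such character, which is why its worst-case name 
length falls short of what is claimed for plane 0 text.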

Ways to separate IDNs from ordinary DNs
- add a per segment redirector
- add a once per name redirector
- combination of above

MD: Characters above FFFF will be extremely rare, so trying to compress is 
a waste of effort.

MW: focus is handling full Unicode.  Matter of taste.

* Update on RACE, Paul Hoffman

Major changes in -03 draft
- added the need to check for all -STD13 names before encoding and after 
decoding
- added many error conditions in both the ACE and the Base32 encoding and 
decoding
-- didn't change anything on the wire
- Changed all the examples to use lowercase characters on input
-- nameprep is going to change everything to lowercase

What I didn't change:
- left the prefix the same because the bits on the wire are the same

Verisign is using RACE in their testbed.  They provided the first 1500 
rejections.  Some were based on Verisign's prohibited characters.  Most were 
errors in RACE encoding.  Speaks poorly of RACE's ease of implementation.

RW: for those that are implementing RACE, there is a mailing list for 
developers.  Send mail to Rick <rick@ar.com>.

* Judging ACEs, Paul Hoffman

Going to have to pick an ACE.  What we should look at.

Features of a good ACE:
- it has compression
-- least restriction for total name length
-- shorter transmissions
- simplicity
-- easy to code
-- easy to find bugs in code
-- few special cases

Compression:
- prohibiting sensible long names is bad
- all ACEs allow different length for different scripts
- shorter transmissions is useful but not nearly as important as not 
restricting long names
-- if we have a trademark, go with longer names

Simplicity:
- RACE has shown that implementors can easily get it wrong
- Even if compression step is easy, if decompressing is hard, display and 
security errors will be made

Complexity:
- special cases get missed or are misunderstood
- bit stuffing achieves better compression but is very difficult for many 
programmers
-- even base32 seems hard for some

Mandatory Features for all ACEs:
- encoding a string of characters can have only one result
- decoding an ACE string can have only one result
- There must be a way to indicate a version number

Summary
- compression
- simplicity
- mandatory features
- what else?

MW: in terms of decoding, it is complex enough that the only way you can 
guarantee is to take the resulting decoded Unicode, encode it and decode it 
again to see it is the same.

PH: I agree.
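
The round-trip check, together with the "only one result" rule from the 
mandatory features, can be sketched as below.  The hex codec is a 
deliberately trivial stand-in for a real ACE, and all three function names 
are hypothetical.

```python
def toy_encode(text):
    """Toy stand-in for an ACE encoder: lowercase hex of UTF-16BE."""
    return text.encode("utf-16-be").hex()


def toy_decode(ace):
    """Toy stand-in for an ACE decoder; raises ValueError on bad input."""
    return bytes.fromhex(ace).decode("utf-16-be")


def is_canonical(ace, encode, decode):
    """Accept ace only if it decodes and round-trips to the same string."""
    try:
        text = decode(ace)
    except ValueError:
        return False
    # Re-encoding must reproduce the input exactly (one result per name),
    # and decoding that again must give back the same Unicode text.
    return encode(text) == ace and decode(encode(text)) == text
```

With this toy codec, "006a" passes but "006A" is rejected: both decode to 
"j", but only the first is what the encoder would have produced.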

* Discussion and WG Next Steps, Marc Blanchet

We think we have a lot of solutions.  We should narrow the solution 
space.  We've had two design teams, one on nameprep one on protocol.

Is the nameprep work the right thing to do as a major WG orientation?

RW: yes, I agree with what you are doing and you should continue to do it.

MB: Nameprep right direction?

<WG consensus>

MB: on the protocol, we have a recommendation to use an ACE (not 
disallowing something different in the future).  Do we have enough information?

RW:  I think some of the things John Klensin proposed should be looked into 
further.  I don't think we should pick RACE.

Dongman Lee:  I'd like more information about 8-bit approach problems.  A 
web page or report would be helpful.

MB: 8-bit on the wire or 8-bit tagged.

Dongman Lee: more interested in the problems

RW:  there will be a report kicked to the DNSO on operational 
considerations for the registrar constituency

Erik Nordmark:  Are you asking to choose an ACE today?

MB: The question was are we going forward with the design team recommendation.

Rob Austein:  I believe continuing to figure out which ACE to choose is 
good.  Both of JK's drafts came in late.  I don't think we've had time to 
consider them properly.  My major concern is that we may never move on from 
an ACE if we deploy it.

Olafur Gudmundsson:  it was very hard to recommend an ACE while on the 
protocol design team.  I'm not sure it will work, but I can't find a place 
in applications where it will fail.  We're looking for 7-10 year deployment 
period.  All current proposals fail in one way or another.

Paul Hoffman: if we go with a short-term (application only ACE), we are not 
preventing longer term solutions.  I want a long term directory solution.

David Lawrence: as a member of the protocol design team, I don't want the 
IETF/IDN to be marginalized if we don't respond in a timeframe that matches 
the market demand.

Ted Hardie: getting something out of the Internet infrastructure is 
hard.  Effort should be focused on the more general solution.  There might 
be other areas of the presentation side of things that need to be looked at.

Patrik Faltstrom: as an area director, seeing what works and what does not 
work doesn't depend on the applications, we need to look at other 
protocols.  Regarding the ACE encodings, I think the path we are going is a 
much more layered design than we normally do.  This doesn't preclude 
inserting a directory layer

John Klensin:  this working group has two choices regarding market forces 
and competitive ideas.  Do it hastily or do it right.  Too late to do it 
hastily.  If the good solution solves the problem better than the market, 
the right solution wins.

Harald Alvestrand: before we leave this room today we should have an idea 
of whether we should go with ACE or not.

Mark Davis: I agree with the last speaker.  People have been talking about 
domain names as if what you see is what you get, but the stuff over the 
wire is just bits.  Using one of the ACEs gives a different set of bits.

<>: Comfortable with what the protocol design team did.  I don't think we 
should have a final sign off unless we have a transition strategy for the 
long term solution.

MB: Many said we should choose ACE.  Many said we should think more.

JS: Do we have enough information to say the protocol design team is 
working the right direction?

PH: We thought we were done.

MB: Is there enough information to make a decision?

<about 20 hands>

JK:  Of the people who think they have enough information, do you feel you 
know enough about the DNS?

<a few hands>

DL: is the consensus of the working group to be focusing on an application 
area approach?  Should we stop looking at infrastructure approaches?

<half the room>

HA: this does not mean the infrastructure need replacing.  just that this 
WG is not the one to do it.

MB: Next step is choosing an ACE.
