[idn] San Diego Meeting Notes
Dear all,
Here are the minutes of the IDN WG Meeting at the San Diego IETF. Thanks to David
(again) for scribing 43 KB of text notes :P
Some important highlights (feel free to object if you think I am wrong):
1. Requirements I-D is almost ready for last call with some minor changes. It is
likely we will do one more version. If you have any last comments, please send
them ASAP.
2. General feeling that the Nameprep sub-team is working in the right direction.
3. Strawpoll agrees with the Protocol sub-team recommendation that we will focus
on "ACE on Application" now but do not reject the possibility of doing something
else longer term. What the longer-term solution is has been left for future
discussion.
-James Seng
IDN WG Meeting San Diego
<RFC 2026 statement>
* Working group update, Marc Blanchet
20 new documents since Pittsburgh
most are going to be presented this week
not: UDNS, BRACE, SACE, CJK (last presented in Pittsburgh).
Status of documents
- requirements: last rev and go WG last call after San Diego
WG ACE Prefix registry if needed by the WG can be done on a temporary basis
* More radical solutions: directories and basic DNS changes, John Klensin
Internationalized access to Domain Names: Some Radical Solutions
Worried for some months about too narrow a focus. We're looking for quick
solutions. Problem with quick solutions is that they are usually not as
quick as they're supposed to be, especially in deployment. As a result, we
have to get rid of them. This issue is very important since its effects
are very extensive -- affecting current, past and future.
Started exploring a different question -- what should we have done if we
started in the beginning? This is a different question than how do we
start from where we are today.
Short term question: how to get i18n domain names now, i.e., quick
deployment. Steps taken for short term are with us forever.
Design for long term: the problem is figuring out how to get there while
preserving the installed base
Designing for i18n:
- not fitting i18n into the network
- not just a DNS problem
-- applications need to work and make sense
-- need to support presentation of non-ASCII characters everywhere users
would expect
-- extreme use case is one whose keyboard doesn't have ASCII
- rethinking several assumptions
IDN may require a presentation layer
- IETF tradition usually
-- keeps users close to protocol interface
-- often done because we are lazy and it is easy
-- or to keep things simple (may be the same thing)
- internationalization may not permit this
- difference between
-- unique strings stored in Internet-wide databases
-- appearance of names to users
-- selection and choice mechanisms
two hypotheses / strawmen
- not a DNS problem
-- we are looking in the wrong place
-- very restricted protocol element, not ASCII
-- still the right choice
-- solve UI problems at another level
- DNS should be fully international
-- content is names, not protocol elements
-- names should reflect international usage
DNS is being pushed in a direction where it doesn't belong
- symptoms showing up in many places, not just i18n
- DNS doesn't do
-- search
-- ambiguity
-- nearest server
directories
- not a technology or protocol but a class of them
-- typically good at handling problems for which the DNS is weak
--- multiple forms of lookup
--- attributes with values
--- tolerance for ambiguity
- IETF has never succeeded in widespread deployment of a directory protocol
-- but maybe this issue is important enough
Directory protocol modules
- applications don't call DNS with user supplied names
-- call directory with names or keywords
-- involve user in evaluation if needed
-- get results and call DNS
-- DNS labels remain "protocol elements"
For many applications, replace gethostbyname()
Directory advantages
- well adapted to user selection of target from a list
- use with keywords and descriptors not just rigid names
- attributes can identify
- types of use and business location (may help with trademarks)
- multiple server
Disadvantages
- the Internet has never deployed one
- reaching an agreement on protocol and schema
- too many options
Legacy application code:
- don't call directory, hence don't see internationalized names
- can use only a subset of Internet until upgraded
- transition similar to host table to DNS
Other hypothesis
- assume it should have contained non-ASCII (names not protocol elements)
from the beginning
- would look like IS 10646 everywhere
- Mixing old and new conventions causes problems
- A new class? Obsolete "class=IN"
Everything going forward goes into the new class
Changing classes
- in principle, obsoletes everything
-- no requirement to share server tree, delegations, etc. with class=IN
- Some advantages of preserving models
Transition issues
- copy existing records or search
- pointers from the new to old class
- political nightmares
- opportunity to not carry rubbish forward
Legacy applications and Code
- Don't know about new class so "see" only Class=IN
- non-ASCII names are invisible until they are upgraded
Shared problems
- DNS ASCII is convenient
-- case independence and matching rules are easy
-- very small number of characters
- Almost any international character set will require canonicalization and
matching rules
-- and may require localization
- ISO 10646 requires non-obvious coding decisions; UTF-8 may not be the
right choice
Documents
- Directory
-- draft-klensin-dns-role-00.txt
- New Class
-- draft-klensin-i18n-class-00.txt
Eliot Lear: the problems I face: we don't have a successful hierarchical
directory. I don't think we have the luxury of time and we don't have the
technology. I don't see anything that can scale as well as the DNS.
JK: let me lay part of that question aside. I can argue either side. The
question of time is an interesting one. Our hope was that a solution would be
converged upon fast enough that junk wouldn't be deployed. We lost. But
problems with the trash are becoming obvious. Our only opportunity is to
do it right. Half-right and quick won't do it. Taking the extra time to
do it right is the right thing to do as it will help get rid of the
trash. Some better than trash, some worse, average is trash.
Michael Mealling: you said there were components to directory service. Have
you given any thought to what the components might be and whether we
already have some of the components?
JK: If I say "yes" and duck, would that be an appropriate
answer? Directories are complicated. Talk about that as a design
principle instead of getting bogged down in "whois++/LDAP is better"
arguments. After posting my draft I received two groups of
comments. First group: first attempt to view it as an architectural
issue. Second group: my schema better than yours. LDAP better than ....
If WG comes to a decision that a directory approach is the best way
forward, then we sit down and focus.
Keith Mitchell: in response to the new class proposal, it seems that what
you're doing is partitioning the applications in the Internet to those who
can deal with I18n and those that can't. The on the wire protocol doesn't
matter, it is what the applications do with the protocol. The other thing
is if we use this to de-cruft the DNS, trying to clean up things gets out
of hand.
JK: No intent to change protocols. The rest is a bar discussion
Hideyo Imazu: Directory based solution is mostly for WWW. ACE based IDN
solution can be applied to email. Among countries like Japan, China,
Korea, internationalization of entire email address should be considered
JK: Directory works as well for email as for the web. Doesn't work for
non-interactive, non-online applications. Architecturally, it feels like
the right solution.
David Chadwick: Support the directory notion, users need the search
facility. Once we have multiple character sets, we must have a way to search.
James Seng: John's proposal is a very theoretical proposal. Fairly close
to NGDNS. Internationalization of NGDNS is probably one of the problems
that can be fixed. Some of the comments are very true. Good starting
point for NGDNS. Not talking about long term deployment of NGDNS, looking
at RACE for quick deployment. I'm afraid the chaos during transition to
NGDNS would be disastrous.
JK: I believe it will take 10 years to deploy anything this WG comes out with.
JS: Yes and no. To do a directory would require a comprehensive
re-architecture. Removing stuff from current DNS has not really been
discussed.
Ted Hardie: Eliot was wrong, you aren't being radical enough. We have
never had a presentation layer. Putting a directory layer between
application and DNS is a presentation layer. Perhaps coming up with a
presentation layer.
Bob Hinden: Both of these choices have plusses and minuses. There are
harder issues and dealing with them scares me. Solve the important
problems first.
JK: I think this is the most interesting and hardest problem since
deploying IPv4. We need to take this very seriously.
* IDNA, Patrik Faltstrom
ACE from the Applications
Not so nervous about changing applications. Really nervous about backwards
compatibility.
This proposal is to change the presentation layer in applications. Not
change the application layer protocols, the DNS protocols, or the DNS
servers. We only change the presentation and input mechanisms.
When a user inputs a domain name into the application, the information will
be stored in whatever local script the application is using. The
application must do a transformation anyway. Many applications must be
better regardless of what we choose. All applications must transform from
the local character set into something that is entirely backwards compatible.
Changes to applications:
- host names must be nameprep'd
- an ACE is applied
- the encoded name is sent to the resolver
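A minimal sketch of the three application-side steps above, in Python. The WG has not chosen an ACE at this point, so the encoding (Python's built-in `punycode` codec) and the `xn--` prefix are stand-ins assumed purely for illustration. The nameprep step is approximated by lowercasing plus NFKC normalization, which is also an assumption and not the nameprep draft's algorithm.

```python
import unicodedata

ACE_PREFIX = "xn--"  # assumed prefix; the notes leave the choice of ACE open


def nameprep_stand_in(label: str) -> str:
    """Rough stand-in for nameprep: lowercase, then NFKC-normalize."""
    return unicodedata.normalize("NFKC", label.lower())


def to_ace_label(label: str) -> str:
    """Prepare one label and ACE-encode it only if it contains non-ASCII."""
    prepared = nameprep_stand_in(label)
    if prepared.isascii():
        return prepared
    return ACE_PREFIX + prepared.encode("punycode").decode("ascii")


def to_ace_hostname(hostname: str) -> str:
    """Encode each dot-separated label; the result is what goes to the resolver."""
    return ".".join(to_ace_label(part) for part in hostname.split("."))


def to_display_label(label: str) -> str:
    """Reverse step for display: decode labels that carry the assumed prefix."""
    if label.startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label
```

With these assumptions, `to_ace_hostname("Bücher.example")` gives `xn--bcher-kva.example`. An application that has not been updated would show that string unchanged, which is the leakage listed under the known side effects.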
Display of host names
+++
Known side effects
- un-updated applications will display obscure ACE format (leakage)
- moving names from updated programs to un-updated programs might cause more
leakage
- Non-IDN names that use the ACE prefix or suffix will either be considered
illegal or will appear as nonsense characters
- The IDN WG must choose an ACE
- Doesn't internationalize text records in the DNS zone files
Updating name servers
- administrative interfaces for DNS servers must all check IDN names
- probably done with automated scripts converting from and to preferred
native format (which will be different for different users)
- Will probably be important to check all names with nameprep
No major comments (Yet!) to -00 draft
No planned revision
Pete Resnick: work in a protocol that has lots of layer violations. email
embeds DNS names. Do we need sendmail to change headers to ace encoded things?
PF: one of the good things about this proposal is that it is an application
layer issue.
Paul Hoffman: No. What we're talking about is a presentation layer change. If
you've changed your application to handle IDNA, then all applications
should handle IDNA. We don't want to touch things that IDNA will be
looking at layer. You must change your paste function.
PR: I update Eudora to do IDNA, I encode for presentation, decode for
transmission.
PF: Yes.
Yoneya: It should apply to any application protocol
PH: Applies to any protocol that has a presentation layer.
Rick Wesson: encoding and decoding leakage was experienced in the VGRS
testbed. Broke lots of things. Leaks happen.
PF: leakage is a bad thing. If leakage occurs and if you know the ACE
version, you can still figure out where things should go.
Hideo Imazu: Considering the fact that there are very few fonts that have
all Unicode characters in it, even if a platform has full support for ACE
encoding, in some cases, it can't encode/decode an ACE sequence for some
characters. Even on a good implementation, leakage can happen.
PF: Yes. It might be possible for user to look at ACE encoded name.
PH: Decoding will be possible, but display may not be possible.
JS: another leakage not considered: user assuming application is
internationalized and it's not.
PF: Yes.
* Virtually internationalized domain names, Shim
Allows the use of i18n domain names, but without creating i18n names.
** Description
Most domain names in regions where English is not widely spoken are
created by transliterating the characters of the local language into
those of the English language.
VIDN formalizes and uses this knowledge of transliteration
At one end, we have local characters, at the other end, we have regular DNS
characters
1. local characters changed to phonemes
2. each phoneme is matched with English phoneme
3. English phonemes are united to form the transliterated name
provides one to many mappings
For one-to-one reversible mapping
a) each server is pre-assigned a unique code (e.g., Unicode)
b) the code is also generated by VIDN on the client whenever the
corresponding virtual domain name is typed.
c) the code is compared to the code retrieved from the server
d) VIDN matches the code
** Implementation and Administration
Development of VIDN software for each local language may be done by a local
standard body
- the codes for one-to-one mapping may be administered by international
standard body such as IANA
or the codes may be administered by a local standards body
** Key features
VIDN does not require creating and registering additional domain names in
local scripts
VIDN does not make any change to the current DNS infrastructure
VIDN does not need separate name server/resolver
** Testing results
Keywords and ACE require conversion, VIDN already there.
Web browser add-on:
- Korean-English conversion
- 800 KB
Comments/ Suggestions
- Comments and responses on the IETF IDN discussion mail archive
- Contact me
Itashiro: different ideographic script can have same pronunciation, how do
you convert?
Shim: include one-to-one mapping scheme
Paul Hoffman: this can be used today, but only when a code isn't
needed. If a code is needed, they'd be non-obvious to users, so your
assertion that this is easier/faster to deploy isn't true. Might be true
for Korean or maybe Japanese, so same as ACE or IDNA.
Shim: 30%-40% will need the codes. ACE/IDNA will be 100%
Maynard Kang: if you add the code, then it is the same as ACE.
* Internationalized PTR RR, Hong Bo Shi
Why not PTR:
- Current PTR and its mapping method can't support IDNS as well as
traditional ASCII domain names
- It is impossible to let the client choose which it wants from those PTR
records without any additional selection
- It makes no sense to return an unreadable domain name
- But no PTR record is worse.
Why IPTR:
- IPTR can give a language selection to client/end-user
- according to the language tag, the client/end-user can get what they want
from a list of corresponding values
IPTR Format
4.3.1.1.in-addr.arpa. IPTR "LANGUAGE" "name-in-utf8"
- the LANGUAGE field should be treated in a case insensitive manner and
must follow the conventions defined in RFC 1766
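A small sketch of how a client might pick names out of a set of IPTR records by language tag, matching the tag case-insensitively as described above. The `IptrRecord` type and `select_by_language` function are hypothetical names for illustration and are not taken from the IPTR draft.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IptrRecord:
    owner: str     # reverse-lookup owner name, e.g. "4.3.1.1.in-addr.arpa."
    language: str  # language tag carried in the record
    name: str      # internationalized name (UTF-8 on the wire, str here)


def select_by_language(records, wanted):
    """Return the names whose language tag matches `wanted`, ignoring case."""
    wanted_folded = wanted.lower()
    return [r.name for r in records if r.language.lower() == wanted_folded]
```

A lookup for `"ja"` would therefore also match a record tagged `"JA"`.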
EDNS0 is required to implement IPTR
IPTR format
Language: 2 octets, an argument for IPTR that defines the language used in
the following IDN label
IPTR query/response
If qtype=IPTR then all of the corresponding IPTR RRs should be returned in
one response
Transport: use UDP first; if the TC bit is set, then use TCP
in future, EDNS0 is required.
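The UDP-then-TCP rule above can be sketched as a check on the TC (truncation) bit in the DNS message header. The bit position comes from the standard DNS header layout; the function names are hypothetical, and no actual network I/O is shown.

```python
import struct

TC_FLAG = 0x0200  # truncation bit in the 16-bit DNS header flags word


def is_truncated(response: bytes) -> bool:
    """Return True if the TC bit is set in a raw DNS response."""
    if len(response) < 12:
        raise ValueError("DNS message shorter than the 12-octet header")
    _msg_id, flags = struct.unpack("!HH", response[:4])
    return bool(flags & TC_FLAG)


def choose_transport(udp_response: bytes) -> str:
    """Keep the UDP answer unless it was truncated; then retry over TCP."""
    return "tcp" if is_truncated(udp_response) else "udp"
```

Returning all IPTR RRs in one response makes truncation more likely, which is why the fallback matters here.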
PTR extension FOR IDNS ONLY case
it is very difficult to avoid IDN ONLY
IDN ONLY means a host only has its IDN but not any traditional ASCII domain
name
- in above case PTR RR must not be null in a response message
- PTR RR must contain a domain name in ACE to co-op with IDN unaware systems
-- else a syntax error message should be sent back with an administrator
configures zone
Open issues:
- do you think we should return all corresponding IPTR records to a resolver
- nameserver should only send back one IDN in each language
- this kind of function has already been implemented, for example in BIND-9.x.x
To be or not to be
- it is said that the proposal that returns all the corresponding IPTR
records increase the complexity to implement a resolver
- according to the suggestion it is better to let a server just feed back
The above suggestion is in the belief that the IPTR design is to introduce
a language tag. it should be used in queries from client.
Demchenko: language alone is not enough; need character set + language
Jiang: intention was just language, not character set. the character set
is supposed to be UTF-8
Hideo Imazu: specifying language is not enough in some cases.
Jiang: mixture of languages?
HI: some languages have two scripts
J: right. we're just trying to capture language, not script.
* Protocol Design Team Report, David Lawrence:
Primary task: categorize protocol proposals and make recommendation to WG
Members: DL, Olafur Gudmundsson, Dan Oscarsson, Paul Hoffman
Observers/commentators: Marc Blanchet, Harald Alvestrand, John Klensin, Rob
Austein, James Seng
Output of the design team
- report to the mailing list (didn't make it before this meeting, will have
it soon afterwards)
- comparison of protocols
- possibly other reports or recommendations
-- updating RFC 2825 to take a much deeper look
-- looking at different ACEs
Categories of protocols, based on expected implementation
- Do a long term architecturally clean fix that requires upgrading to the
whole naming infrastructure
- Change only the applications in the short term and possibly the
application protocols, but change none of the DNS infrastructure
- change the applications for short-term gain, but transition in the long
term to a clearer solution that requires upgrades to the whole naming
infrastructure (two-phase approach)
IDNA- application
IDNE- infrastructure
Big picture
- we cannot simply put binary characters into the current DNS without
breaking many applications and some DNS servers
- none of the solutions at this point is a comprehensive solution that
considers all the effects of the changes proposed
- the design team has not looked at all impacts on applications
Where the solutions fit
- long term solutions mostly involve changes to the current DNS
infrastructure, although there is also a proposal for using a directory
layer with internationalized input to find resources
- short term solutions are based on using ASCII compatible encoding in
applications
- Two phase solutions are a mixture of the two
Infrastructure solutions
- long term proposals require that all DNS servers in a request path be
updated before the user can get a correct answer to a query for an
internationalized host name
- different proposals and different legacy DNS servers will cause different
error messages to get back to the user if their query traverses a server
that was not upgraded
- a user can get different results for the same query
- maximum breakage of applications
ACE solutions, positive
- they are easier to implement than the long term solutions, but they are
not without problems
- the obvious advantage is that updating just applications will go faster
than updating applications and the entire naming infrastructure
- they work on the presentation layer which means that they don't even
require changes to any application protocols.
ACE solutions, negative
- ASCII-type ugly name leakage in non-updated applications
- non-IDN names that use the ACE prefix or suffix will either be considered
illegal or will appear as nonsense characters
- the ACE solutions only internationalize host names, not textual material
that appears in some types of DNS records
- IDN must choose an ACE
- versioning is harder in ACEs than in other proposals
- there are probably other ACE-specific implications that we haven't
thought about yet
Two phase solution
- request that the applications be updated to handle an ACE encoding in the
case the query using the long-term solution fails
- every application must work correctly with the short term solution and
the long term part of the solution has all of the problems listed earlier
- it is likely that only the short term solution would be implemented
unless the long term solution had other notable advantages
ACE considerations
- ACE must be designed to have one-to-one mapping and versioning
- they must continue to use 63 octet max for name parts, while other
proposals could extend the length of the name parts
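To illustrate the 63-octet constraint: the limit applies to the encoded label, so a non-ASCII label uses up the budget faster than its character count suggests. The sketch below again assumes Python's `punycode` codec and an `xn--` prefix as stand-ins for whatever ACE the WG eventually picks.

```python
MAX_LABEL_OCTETS = 63   # DNS limit for one name part
ACE_PREFIX = "xn--"     # assumed prefix, not a WG decision


def ace_length(label: str) -> int:
    """Length in octets of the label as it would appear in the DNS."""
    if label.isascii():
        return len(label)
    return len(ACE_PREFIX) + len(label.encode("punycode"))


def fits_in_label(label: str) -> bool:
    """True if the (possibly ACE-encoded) label fits in one name part."""
    return ace_length(label) <= MAX_LABEL_OCTETS
```

Under these assumptions a 63-character ASCII label fits, while a label of 63 non-ASCII characters does not, because every encoded character costs at least one octet on top of the prefix.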
Changes to applications
- all scenarios will require that all applications that use the new names
be updated. RFC 2825 lists many protocols for which i18n may be very difficult.
Go with DNS infrastructure Change
- the decision about which of the solutions is chosen should be made by
people in the DNS community and Application area with internationalization
community
Directory infrastructure
- WG needs to work closely with the directory community for both protocol
and schema interoperability
- no successful operational experience in an Internet wide directory service
Suggestion: go with application-only solution,
- focusing on the negative attributes of each solution, the design team
considers the short-term "use ACE from the application"
Looking at costs:
- we note that the more arch. desirable infrastructure solutions are very
costly in terms of new protocol work needed and upgrading deployed name servers
- predicting when any of the solutions might actually be useful is
impossible, making them very difficult to sell to the Internet community
An ACE solution does not prevent an infrastructure solution
- fortunately choosing the ACE solution now does not preclude the IETF
What's next:
- design team finishes its report and send it to the WG
- design team finishes the comparison document
- somebody should study the impacts of the chosen solution more in-depth
- WG decides
J Kim: forgot one negative -- two sided business card problem. Amazon.com
in English, Amazon.com in ACE is a tour site. That is a serious problem to
solve.
DL: which aspect of the two-sided business card are you talking about? This
is a registry issue.
JK: but the problems arise from a technical basis
OG: the ace encoded name will be long and ugly
RW: Verisign has posted their resolution proposal: UTF-8 on the wire. How
did the design team feel about it?
OG: no comment.
PH: UTF-8 on the wire was a non-starter
DL: we're avoiding recommending a two phase solution
EL: operational infrastructure change is not mutually exclusive from
presentation change. Refers back to John's proposal. Two problems:
representing domain names and transliteration. there is value add.
RB: DNS protocol supports 8 bit binary. Does not break DNS but might break
other protocols.
Design Team: agree
MM: directory approach not feasible due to schema is a red herring as DNS
RRs define schema. If you constrain your space, makes the problem more
easily solved.
DL: Yup.
* Proposal for a determining process of ACE identifier, Naomasa Murayama
Requirements of IDN technology:
- unified root
- interoperability
- compatibility to BIND
RACE: row based ACE
- prefix bq--
BRACE: bi-mode row based ACE
Why ACE?
1. permitting 8-bit domain names and modifying name server software to be 8-bit
clean
2. partitioning the current domain name space to accommodate MDNs using
some kind of ACE
problems for approach 1
- difficulty in inter-operability
- needs a change of DNS protocol
problem for 2
- needs a consensus among all domain name registries in partitioning the DN
space
How can we negotiate for a partitioning of the DN space
- how we can select ace identifiers
ACE identifiers := ACE prefixes | ACE suffixes
What is happening in the testbed of MDN in .com,.net,.org
registration started from Nov. 10 but <nihongo>.com is encoded and taken by
ns.bulkregister.com on Nov. 2
ACE identifier candidates
- prefixes: AA--, AB--, ..., 99--
- suffixes: --AA, --AB, ..., --99
Relevant domain names:
aa--a.com, aa--b.org, ..., 99--zzzz.net, aa--x.co.jp, etc.
a--aa.com, b--aa.org, ..., zzzzz--99.or.kr, etc.
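The candidate space and the notion of a "relevant" name can be sketched as follows. This assumes that candidates are exactly two alphanumerics plus a double hyphen, as the AA-- to 99-- and --AA to --99 ranges above suggest, and that matching is case-insensitive. `is_relevant` is a hypothetical helper that looks at a single registered label, not a full domain name.

```python
import re
from itertools import product
from string import ascii_lowercase, digits

ALNUM = ascii_lowercase + digits  # 36 symbols: a-z then 0-9

# All two-alphanumeric identifiers, as prefixes and as suffixes.
PREFIXES = [a + b + "--" for a, b in product(ALNUM, repeat=2)]
SUFFIXES = ["--" + a + b for a, b in product(ALNUM, repeat=2)]

_PREFIX_RE = re.compile(r"^[a-z0-9]{2}--.")
_SUFFIX_RE = re.compile(r".--[a-z0-9]{2}$")


def is_relevant(label: str) -> bool:
    """True if the label collides with some candidate prefix or suffix."""
    folded = label.lower()
    return bool(_PREFIX_RE.search(folded) or _SUFFIX_RE.search(folded))
```

This gives 36 x 36 = 1296 candidates of each kind, which is the set a registry survey (step 2 of the proposal) would have to check existing registrations against.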
Proposal
step 1: tentative suspension of registering relevant domain names for ACE
identifier candidates
step 2: conduct a survey of relevant domain names already registered
step 3: select about 10 to 20 identifiers one of which is for test and
others for real use, based on the survey
step 4: permanent blocking of registrations of domain names relevant to the
selected identifiers (except for registrations compliant to MDN semantics).
when writing an ACE proposal
author should either
- describe the ACE identifier as "to be decided"
(which must be decided by the IDN WG or another body when it is published as
an Internet draft)
or use an ACE identifier
When a proposal becomes an Internet standard
- when a specific ACE proposal is accepted as an Internet standard, the
experimental ACE identifier should be replaced by one for real use
(hopefully decided by IANA)
Important change from -00 to -01
excluded suffixes of one hyphen followed by the alpha numerics from the
candidates
Among 227,852 registrations of .JP domain names, 23,921 were relevant to
these suffixes
Need cooperation of IETF, ICANN, and domain name registries.
* Handling revisions of IDN, Marc Blanchet
Problem statement: Unicode is going to have revisions because of
characters, languages, scripts that change
Nameprep is going to have revisions. Nameprep should not necessarily be
sync'd with Unicode revisions. We might fix bugs in nameprep, etc.
Protocol will have revisions. We should include versioning in the
requirement document
Patrik Faltstrom: I think the requirement should be able to handle changes
in Unicode. I'm not sure we need versioning. I have some ideas on how to
handle this in ACE which would not affect nameprep. Not fully
baked. Might not need versioning. Don't want to have versioning as the
requirement. Unicode and nameprep will change.
MB: will work together on how to specify the wording for the requirements
Versioning with ACEs and IDNA-like approach: there is no protocol. One
way: have a different prefix for each version. But: no negotiation is
handled. Needs the same domain registered with different prefixes.
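Because there is no negotiation, a client under the prefix-per-version approach would have to try one name per version. A sketch, with hypothetical prefixes `v2--` and `v1--` and the simplifying assumption that the encoded label is identical across versions:

```python
# Hypothetical version prefixes, newest first; not from any draft.
VERSION_PREFIXES = ["v2--", "v1--"]


def candidate_query_names(encoded_label: str, parent: str) -> list:
    """List the names a client would query, one per ACE version."""
    return [f"{prefix}{encoded_label}.{parent}" for prefix in VERSION_PREFIXES]
```

Each entry is a separate DNS name, so the zone owner has to register and maintain all of them.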
Versioning with DNS extensions
Version numbers: simple, increment by 1. More complex: major.minor, with
minor for changes in table lookup and major for changes in the lookup algorithm.
Table format simplified Unicode table.
Conclusion: we need versioning in IDN. We can do it in different ways and
need to think about it.
Harald Alvestrand: versioning in the ACE means you'll either see every
i18n domain name disappear each time you upgrade client or you'll need to
do queries for each version. Having different labels is too broken for words.
Mark Welter: what would drive the versioning is characters forbidden at the
application layer. Can push this to the registration layer.
MB: I'd prefer it if we don't need it at all. But we should think about it
before doing the protocol.
Randy Prezen: You only care what happens after the ACE transform is applied.
MB: Just think about it. If we don't need it, good.
* Japanese characters in multilingual domain name label, Yoneya
Definition of characters to be used as JP characters in MDN label
Definition of JP characters to be normalized.
Def. of JP characters:
- idntabjp10.txt
-- does not include NAMEPREP prohibited characters
- usual characters for JP names
- selection of chars is based on JIS
- table consists of code points in Unicode and corresponding JIS code
-- does not mean specifying chars
6531 characters in table
Kanji: 6355
Hiragana: 83
Katakana: 86
Graphic: 7
Definition of normalization:
table of compatible characters to be canonicalized
idntabjpcanon10.txt
- one character must be added, will correct in next version
compatible characters prohibited in NAMEPREP but widely used in PCs, PDA, etc.
- half width katakana
- full width alphabets, numerics, hyphen
- table consists of code points in Unicode to be canonicalized
-- half width katakana and full width katakana must be canonicalized to the
same thing
Def. of normalization (cont)
Table of characters to be composed
idntabjpcomp10.txt
composition of kana and voiced sound mark varieties
- the table consists of code point sequences in Unicode to be composed
-- ka-tenten -> ga
Definition of normalization rules:
1. canonicalize compatible characters
-- adopt idntabjpcanon conversion
2. compose voiced sound marks
-- adopt idntabjpcomp conversion
Example:
1. canonicalization idntabjpcanon
2. composition idntabjpcomp
3. NAMEPREP
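A table-driven sketch of steps 1 and 2 above. The two dictionaries are tiny hypothetical excerpts made up for illustration; they are not the contents of idntabjpcanon10.txt or idntabjpcomp10.txt. Step 3 (NAMEPREP) is left out because it is defined elsewhere.

```python
# Hypothetical excerpt of a "compatible characters" table (step 1).
CANON = {
    "\uFF76": "\u30AB",  # halfwidth katakana KA -> katakana KA
    "\uFF9E": "\u3099",  # halfwidth voiced sound mark -> combining voiced mark
    "\uFF21": "A",       # fullwidth Latin capital A -> ASCII A
    "\uFF0D": "-",       # fullwidth hyphen-minus -> ASCII hyphen
}

# Hypothetical excerpt of a composition table (step 2).
COMPOSE = {
    ("\u30AB", "\u3099"): "\u30AC",  # ka + tenten -> ga
}


def canonicalize(text: str) -> str:
    """Step 1: replace each compatible character by its canonical form."""
    return "".join(CANON.get(ch, ch) for ch in text)


def compose(text: str) -> str:
    """Step 2: merge a kana and a following voiced sound mark."""
    out = []
    for ch in text:
        if out and (out[-1], ch) in COMPOSE:
            out[-1] = COMPOSE[(out[-1], ch)]
        else:
            out.append(ch)
    return "".join(out)


def jp_prepare(text: str) -> str:
    """Steps 1 and 2 in order; the result would then go through NAMEPREP."""
    return compose(canonicalize(text))
```

For the half-width input U+FF76 U+FF9E (ka followed by tenten) this yields U+30AC (ga). Python's `unicodedata.normalize("NFKC", ...)` returns the same result for that particular input.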
Why:
- convenience for users and implementors
-- explicit definition of usable characters and normalization
-- Unicode KC is insufficient
--- differences exist between VGRS and JPNIC
Mark Davis: the changes you are talking about are combining certain
forms. I don't see the requirement for additional steps.
YY: difference is between JIS and Unicode KC.
MD: if you map these together in the folding step then KC takes care of
this for you in the normalization step.
* Nameprep design team report, Paul Hoffman
This is a summary of what we posted to the mailing list last week.
Overview:
- make it easy for user to enter names.
-- we don't want to make it hard.
- prohibit as little as possible
-- not all domain names are entered by typing.
- keep names sensible
-- there are plenty of chars (such as backspace) that are bad/dangerous.
- linguistic juggling.
-- don't over-restrict. not a protocol discussion. "yes, good
character. no, bad character"
Proposed changes from -00:
- fewer prohibitions on input
-- may limit on output
-- make it so that input programs do not need to follow the output rules
- make it easier to implement
-- every application has to do nameprep regardless of protocol approach taken
-- give tables for 2 of 3 steps
prohibit less:
- it is difficult and probably not useful to try to limit confusion
-- e.g., should 'O' be prohibited because it looks like '0'?
- get out of the business of disallowing because they look alike
- -01 will have much smaller list of prohibited characters
- -00 prohibited compatibility characters. -01 says that if you can
algorithmically change, then accept on input
-- many examples in Arabic and Asian scripts
Change order of steps:
- ordering was prohibit -> fold (case mapping) -> normalize
- ordering is now map -> normalize -> prohibit
-- prohibit on output
Many edge cases in ISO-10646, so doing mapping first can be very clean.
Currently we just do case mapping. New version will do additional mapping
such as mapping all hyphen characters to normal hyphen. There are some
special cases for case-mapping that need to be added so that all characters
case-map as expected. Won't change semantic meaning of characters (in JP,
hyphen and lengthener characters would be treated differently). At the end
of the process, we won't have surprising characters.
Have option of mapping into nothing instead of prohibiting. Haven't
specified which we would do this to, but Arabic and Hebrew vowels could be
mapped to nothing (as per discussion on mailing list).
Use a couple of hundred line list of mappings to be done with the first
step of nameprep. We think this would simplify things
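The new map -> normalize -> prohibit ordering can be sketched as below. All three tables are hypothetical placeholders, since the design team has not fixed their contents. `str.lower()` stands in for the Unicode case-mapping file, and NFKC is assumed as the normalization form.

```python
import unicodedata

# Hypothetical tables; the real lists are still to be decided.
MAP_TABLE = {"\u2010": "-", "\u2011": "-"}  # other hyphens -> normal hyphen
MAP_TO_NOTHING = {"\u00AD"}                 # e.g. soft hyphen is dropped
PROHIBITED_OUTPUT = {" ", "\u0008"}         # e.g. space and backspace


def nameprep_sketch(label: str) -> str:
    """Apply map, then normalize, then check prohibitions on the output."""
    # Step 1: map (drop, substitute, lowercase).
    mapped = "".join(
        MAP_TABLE.get(ch, ch) for ch in label if ch not in MAP_TO_NOTHING
    ).lower()
    # Step 2: normalize (NFKC assumed here).
    normalized = unicodedata.normalize("NFKC", mapped)
    # Step 3: prohibit, applied to the output only.
    bad = [ch for ch in normalized if ch in PROHIBITED_OUTPUT]
    if bad:
        raise ValueError(f"prohibited characters in output: {bad!r}")
    return normalized
```

Checking prohibitions last means an input such as an ideographic space (U+3000) is caught after normalization has turned it into an ordinary space, so input programs do not need to know the output rules.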
New case folding:
- mapping to lowercase will be derived from Unicode case mapping file
Non-character code points:
- non-character codepoints will be listed as prohibited characters
- Unicode already has code points assigned as non-characters
Make everyone look outside plane 0.
Remove location of nameprep
- this is a protocol issue. the protocol must say where nameprep is done.
- different protocol proposals need nameprep in different places
Change canonicalize to normalize
Next steps:
- WG reviews design team recommendations
- Marc and Paul produce -01 based on WG consensus
- Design team keeps working on remaining issues
-- what is still prohibited on output vs. what is mapped to nothing on input
-- a few specific characters need attention
PF: sent a proposal about taking unmapped characters and calling them
prohibited or pass through depending on usage (registration or lookup). If
you want to talk, I'll forward to nameprep design team.
PH: we'd love to see a specification
Mark Welter: did you find a solution to the dotted capital i problem?
PH: yes, we picked one.
MW: Greek capital gamma looks like a capital Russian character, so there is
room to masquerade one name as another. One way to fix this is to have the
user look very hard. A Greek user would most likely not use Russian chars.
PH: the design group saw this as the '0'/'O' problem. We didn't want to go
there. This will go into the security
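A small Python illustration of why the look-alike problem is not solved by case mapping or normalization: the Greek and Cyrillic letters mentioned are distinct code points and stay distinct after folding. The fold() helper and the use of NFKC are assumptions made for the example, not the nameprep rules.

```python
import unicodedata

greek_gamma = "\u0393"    # GREEK CAPITAL LETTER GAMMA
cyrillic_ghe = "\u0413"   # CYRILLIC CAPITAL LETTER GHE


def fold(text):
    """Lowercase then normalize; a stand-in for nameprep-style folding."""
    return unicodedata.normalize("NFKC", text.lower())


def same_after_folding(a, b):
    """True if the two strings become identical after folding."""
    return fold(a) == fold(b)
```

Here same_after_folding("A", "a") holds, while the gamma/ghe pair remains two different names that merely look alike.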
Rick Wilson: are some of these font characters?
PH: yes.
RW: Can you characterize where you lightened up? E.g., can you now put in
Zapf Dingbats?
PH: yes. We also allow the compatibility characters
Eric Brunner(?): does the change of ordering allow us to fix problems
introduced by authors of 10646?
PH: yes. If you have a list of errors, send them to the nameprep design team.
Chris Neuman: on the versioning issue, a recommendation that deployed
software be able to load a mapping table without a new version of the
software would be reasonable and sufficient.
PH: please send the suggestion to the design team.
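One way to read the suggestion is that the mapping table is data, not code. The sketch below parses a table from text; the "source; target..." line format with "#" comments is a hypothetical format invented for this example, not anything specified by the design team.

```python
def load_mapping_table(text):
    """Parse lines of the form 'XXXX; YYYY ZZZZ  # comment' into a dict.

    An empty target means the source character is mapped to nothing.
    """
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and blanks
        if not line:
            continue
        source, _, target = line.partition(";")
        mapped = "".join(chr(int(cp, 16)) for cp in target.split())
        table[chr(int(source, 16))] = mapped
    return table
```

Deployed software could then pick up a revised table by reading a new file, without a new release of the program itself.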
* DNSII transitional reflexive ASCII compatible encoding, Edmund Chang
TRACE is not another ACE. It is a deployment/implementation strategy.
TRACE format is a zone file management system. We've put a control
character into the ACE so people won't be able to register before things
have been finalized.
Transition:
- ASCII to multilingual
- local encoding to ISO 10646
- ace to long term solution
- ace to ace
reflexive:
- deployed at the server end and activated only when certain criteria are met
- utilizes existing RR types as ad hoc records: CNAME, DNAME
-- using DNAME is probably better
Long term solution:
- ACE
-- strength: easy to deploy
-- weakness: version control
- Protocol approach
-- expandability
-- weakness: more difficult to deploy
- Phased hybrid approach
-- ACE approach as immediate and fallback, protocol approach as next
generation (eliminating ACE versioning requirement)
Bit-flag based:
- DNSII-TRace format
- \127\127ILET-Hex
Possible implementation:
- \12701--UTF8inhex
- \12701--acestring
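A rough Python sketch of the "\12701--UTF8inhex" form listed above. It assumes that \127 is the usual zone-file decimal escape for octet 127 (the control character mentioned earlier), that "01" is an opaque tag, and that the payload is the UTF-8 bytes written as hex digits. These readings and both function names are assumptions; the TRACE draft is the authority on the actual format.

```python
CONTROL_OCTET = 127   # written \127 in zone-file notation


def trace_like_label(text, tag="01"):
    """Build control octet + tag + '--' + hex of the UTF-8 bytes."""
    payload = text.encode("utf-8").hex()
    return bytes([CONTROL_OCTET]) + f"{tag}--{payload}".encode("ascii")


def zone_file_form(label):
    """Render a label with non-printable octets as \\DDD decimal escapes."""
    parts = []
    for b in label:
        if 0x21 <= b <= 0x7E and chr(b) not in "\\.":
            parts.append(chr(b))
        else:
            parts.append(f"\\{b:03d}")
    return "".join(parts)
```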
Quasi-directory based
- DNS directory hybrid
-- utilizing the DNS wildcard: *
- *.domain in zonefile
- employ separate server for lookup and sub-delegation
OpenIDN
- open-sourced NeDNS
- Current implementations:
- RACE, simple hex dump DNSII & TRACE
- Contemplated Additions:
-- IDNE, DNS CLASS (John Klensin)
- Invite all I-D authors and interested parties to contribute to the IDN
server experiments
- http://www.openidn.org
Paul Hoffman: have you done a draft on this?
EC: Yes, TRACE.
PH: Have you released the IPR for your patents?
EC: I send the info to the secretariat when we put the site up.
JK: For those whose proposals or plans require changing or doing tricky
things with the DNS, please remember that it is a complex
protocol. Wildcards will get you. DNS is UDP, it has timeout
properties. Not many tries possible. Please understand the protocol
before you go
EC: that's exactly why we want to create a working prototype and see what
happens.
JK: the interesting thing about the IETF is that we have exactly one
problem: scaling. A prototype can often tell us exactly nothing.
* LACE: Length based ACE for IDN, Mark Davis
Goal: simple design with good compression. Uses run length encoding. Uses
base 32 encoding
Input is UTF-16 after nameprep. Take each sequence of common top bytes.
If total length happens to be longer than the original, then you just quote
the UTF-16 and base32 encode
Same compression as RACE for incompressible characters.
Simple. All code points are equal. No quoting necessary, except when no
compression possible.
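A simplified Python sketch of the idea as presented: run-length encode runs of UTF-16 code units that share a top byte, fall back to quoting the raw UTF-16 when that is not shorter, then base32 encode. The exact byte layout (count, top byte, low bytes), the 0xFF quoting marker, and the lowercase unpadded base32 output are assumptions for illustration; the LACE draft defines the real format.

```python
import base64


def lace_like_compress(text):
    """Run-length encode UTF-16 code units grouped by common top byte."""
    units = text.encode("utf-16-be")
    pairs = [(units[i], units[i + 1]) for i in range(0, len(units), 2)]
    out = bytearray()
    i = 0
    while i < len(pairs):
        high = pairs[i][0]
        j = i
        # Extend the run while the top byte stays the same (count < 0xFF).
        while j < len(pairs) and pairs[j][0] == high and j - i < 254:
            j += 1
        out.append(j - i)                          # run length
        out.append(high)                           # shared top byte
        out.extend(low for _, low in pairs[i:j])   # low bytes of the run
        i = j
    if len(out) > len(units):
        # No gain from compression: quote the UTF-16 (marker is assumed).
        return b"\xff" + units
    return bytes(out)


def lace_like_encode(text):
    """Compress, then base32 encode to lowercase letters and digits 2-7."""
    encoded = base64.b32encode(lace_like_compress(text))
    return encoded.decode("ascii").rstrip("=").lower()
```

A single-script name collapses to one run, while a name that alternates scripts character by character falls back to the quoted form.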
RW: I like this. You should update your draft
JS: Can you explain why LACE would have better compression than RACE?
MD: If you have two different scripts, LACE will be better than RACE.
MW: Any comments on how it handles names outside plane 0?
MD: It handles it as UTF-16
YY: LACE is simple and efficient for Japanese
* Designing an ACE for IDN, Mark Welter
We've had a bunch of ACEs proposed; they explore the design space well.
Using Unicode as a base is good. Simplicity and efficiency are where the
differences lie.
Encoding algorithm should be straightforward.
If possible, making it a pencil-and-paper algorithm would be nice.
My two schemes are nibble based.
Efficiency:
- should have uniform treatment for the various scripts
- CJK are pre-compressed, can't expect too much better than hex
Handling surrogates
- in our UTF-6 proposal, we treated surrogates as 16-bit quantities and
closed our eyes to the issue
- we should be dealing with surrogates expanded
DUDE:
- encoding based on radix 16, representation of initial code point followed
by encoded hex diffs of subsequent codepoints
What about surrogates:
- if you don't expand surrogates, the worst case limits are half to two
thirds of claimed name lengths
- DUDE handles expanded surrogates gracefully
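To make "expanded surrogates" concrete, here is a small Python sketch that combines UTF-16 surrogate pairs into full code points using the standard UTF-16 arithmetic. It is not part of DUDE itself, and the function name is hypothetical.

```python
def expand_surrogates(units):
    """Turn a list of UTF-16 code units into a list of code points."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        if is_high and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            low = units[i + 1]
            # Standard UTF-16 decoding of a high/low surrogate pair.
            out.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            out.append(u)
            i += 1
    return out
```

Each character outside plane 0 occupies two 16-bit units, which is why an encoding that counts surrogates as separate characters overstates the name lengths it can really carry.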
Ways to separate IDNs from ordinary DNs
- add a per segment redirector
- add a once per name redirector
- combination of above
MD: Characters above FFFF will be extremely rare, so trying to compress is
a waste of effort.
MW: focus is handling full Unicode. Matter of taste.
* Update on RACE, Paul Hoffman
Major changes in -03 draft
- added the need to check for all-STD13 names before encoding and after
decoding
- added many error conditions in both the ACE and the Base32 encoding and
decoding
-- didn't change anything on the wire
- Changed all the examples to use lowercase characters on input
-- nameprep is going to change everything to lowercase
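A minimal Python sketch of the STD13 check, reading it as "the name part consists only of letters, digits and hyphens". That reading, and the function name, are assumptions; the RACE -03 draft gives the precise rule.

```python
import re

# Letters, digits and hyphen only (the traditional host name characters).
_LDH = re.compile(r"[A-Za-z0-9-]+")


def is_all_std13(label):
    """True if the label contains only STD13 host name characters."""
    return _LDH.fullmatch(label) is not None
```

Under this reading an encoder would refuse to encode a label for which is_all_std13() is true, and a decoder would reject an ACE string whose decoded form passes the same test.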
What I didn't change:
- left the prefix the same because the bits on the wire are the same
Verisign is using RACE in their testbed. They provided the first 1500
rejections. Some were based on Verisign's prohibited characters. Most were
errors in RACE encoding. Speaks poorly of RACE's ease of implementation.
RW: for those that are implementing RACE, there is a mailing list for
developers. Send mail to Rick <rick@ar.com>.
* Judging ACEs, Paul Hoffman
Going to have to pick an ACE. What we should look at.
Features of a good ACE:
- it has compression
-- least restriction for total name length
-- shorter transmissions
- simplicity
-- easy to code
-- easy to find bugs in code
-- few special cases
Compression:
- prohibiting sensible long names is bad
- all ACEs allow different length for different scripts
- shorter transmissions is useful but not nearly as important as not
restricting long names
-- if we have a trademark, go with longer names
Simplicity:
- RACE has shown that implementors can easily get it wrong
- Even if compression step is easy, if decompressing is hard, display and
security errors will be made
Complexity:
- special cases get missed or are misunderstood
- bit stuffing achieves better compression but is very difficult for many
programmers
-- even base32 seems hard for some
Mandatory Features for all ACEs:
- encoding a string of characters can have only one result
- decoding an ACE string can have only one result
- There must be a way to indicate a version number
Summary
- compression
- simplicity
- mandatory features
- what else?
MW: in terms of decoding, it is complex enough that the only way you can
guarantee is to take the resulting decoded Unicode, encode it and decode it
again to see it is the same.
PH: I agree.
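The round-trip check described in this exchange can be sketched generically in Python. The encode and decode arguments stand for any ACE codec; the hex-of-UTF-16 toy codec is hypothetical and exists only to make the example runnable.

```python
def checked_decode(ace, encode, decode):
    """Decode, re-encode, decode again, and insist the results agree."""
    text = decode(ace)
    again = decode(encode(text))
    if again != text:
        raise ValueError("ACE string does not survive a round trip")
    return text


# Toy codec for demonstration only: hex digits of the UTF-16BE bytes.
def toy_encode(text):
    return text.encode("utf-16-be").hex()


def toy_decode(ace):
    return bytes.fromhex(ace).decode("utf-16-be")
```

A stricter variant could also require encode(text) to equal the original ACE string, which ties in with the mandatory feature that encoding has only one result.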
* Discussion and WG Next Steps, Marc Blanchet
We think we have a lot of solutions. We should narrow the solution
space. We've had two design teams, one on nameprep one on protocol.
Is the nameprep work the right thing to do as a major WG orientation?
RW: yes, I agree with what you are doing and should continue to do it.
MB: Nameprep right direction?
<WG consensus>
MB: on the protocol, we have a recommendation to use an ACE (not
disallowing something different in the future). Do we have enough information?
RW: I think some of the things John Klensin proposed should be looked into
further. I don't think we should pick RACE.
Dongman Lee: I'd like more information about 8-bit approach problems. A
web page or report would be helpful.
MB: 8-bit on the wire or 8-bit tagged.
Dongman Lee: more interested in the problems
RW: there will be a report kicked to the DNSO on operational
considerations for the registrar constituency
Erik Nordmark: Are you asking to choose an ACE today?
MB: The question was are we going forward with the design team recommendation.
Rob Austein: I believe continuing to figure out which ACE to choose is
good. Both of JK's drafts came in late. I don't think we've had time to
consider them properly. My major concern is that we may never move on from
an ACE if we deploy it.
Olafur Gudmundsson: it was very hard to recommend an ACE while on the
protocol design team. I'm not sure it will work, but I can't find a place
in applications where it will fail. We're looking for 7-10 year deployment
period. All current proposals fail in one way or another.
Paul Hoffman: if we go with a short-term (application only ACE), we are not
preventing longer term solutions. I want a long term directory solution.
David Lawrence: as a member of the protocol design team, I don't want the
IETF/IDN to be marginalized if we don't respond in a timeframe that matches
the market demand.
Ted Hardie: getting something out of the Internet infrastructure is
hard. Effort should be focused on the more general solution. There might
be other areas of the presentation side of things that need to be looked at.
Patrik Faltstrom: as an area director, seeing what works and what does not
work doesn't depend on the applications; we need to look at other
protocols. Regarding the ACE encodings, I think the path we are taking is a
much more layered design than we normally do. This doesn't preclude
inserting a directory layer.
John Klensin: this working group has two choices regarding market forces
and competitive ideas. Do it hastily or do it right. Too late to do it
hastily. If the good solution solves the problem better than the market,
the right solution wins.
Harald Alvestrand: before we leave this room today we should have an idea
of whether we should go with ACE or not.
Mark Davis: I agree with the last speaker. People have been talking about
domain names as if what you see is what you get, but the stuff over the
wire is just bits. Using one of the ACEs gives a different set of bits.
<>: Comfortable with what the protocol design team did. I don't think we
should have a final sign off unless we have a transition strategy for the
long term solution.
MB: Many said we should choose ACE. Many said we should think more.
JS: Do we have enough information to say the protocol design team is
working in the right direction?
PH: We thought we were done.
MB: Is there enough information to make a decision?
<about 20 hands>
JK: Of the people who think they have enough information, do you feel you
know enough about the DNS?
<a few hands>
DL: is the consensus of the working group to focus on an application
area approach? Should we stop looking at infrastructure approaches?
<half the room>
HA: this does not mean the infrastructure need replacing. just that this
WG is not the one to do it.
MB: Next step is choosing an ACE.