
Re: IDN and Kerberos



Some comments.

-James Seng

Background:
   ...
   This ASN.1 type requires the use of designation and invocation escape
   sequences as specified in ISO-2022/ECMA-35 to switch character sets,
   and the default character set that is designated for G0 is the
   ISO-646/ECMA-6 International Reference Version (IRV) (aka U.S. ASCII),
   which mostly works.

JS>> If ASN.1 specifies that its strings are in ISO-2022, then it might
be worthwhile to consider "fixing" this in ASN.1 instead of working
around it in Kerberos. Otherwise you may end up with problems with other
standard ASN.1 encoders/decoders.

   ISO-2022/ECMA-35 defines four character-set code elements (G0..G3)
   and two Control-function code elements (C0..C1).  DER prohibits the
   invocation of character sets into any but the G0 and C0 sets.

JS>> Usually you don't invoke anything into G0 except ISO 646; the
ISO 8859 sets are invoked into G1. But as a matter of curiosity, what
does DER have to do with the invocation of character sets?

Transitioning to the use of UTF-8:

   For various reasons, a transition to the use of UTF-8 encoding is
   desirable.  First, there is a mandate from the IESG to support
   international character sets generally, and UTF-8 specifically.

JS>> RFC 2277 expresses a desire for protocols to support UTF-8, but it
needs to be updated. Unfortunately, what is desirable and what is
practical don't always meet.
You should also take a look at draft-alverstand-i18n-guide.

Therefore, the second sentence is definitely too strong. It would be
more appropriate to say that a transition to the ISO/IEC 10646 (aka
Unicode) character set is desired. The encoding is preferably, but not
necessarily, UTF-8.

   ... As I8N support is deployed in DNS
   there will be a need to represent Unicode service names.

s/I8N/I18N/ s/DNS/domain names/

   At the same time, backward compatibility with the existing installed
   base is crucial.  Few site administrators have the luxury of declaring
   a flash cut-over of all users, applications, servers, etc to an
   incompatible protocol -- many have non-local users over whom they have
   little or no control.  To this end, it is important for new
   implementations to be able to tell whether a particular non-US-ASCII
   string was encoded as UTF-8 by a new implementation, or as something
   else by an old implementation.  In the latter case, it is of course
   impossible to know what the "something else" is without being told in
   advance.

JS>> Cut-and-paste is generally an operating system problem. A
well-behaved I18N OS with I18N apps will be able to handle cut-and-paste
appropriately, transcoding if necessary to the encodings the apps
require.
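
On the quoted point about telling a UTF-8 string from "something else":
without an explicit tag, about the best a receiver can do is a validity
check. The sketch below is only a heuristic, and the function name and
the three labels are invented for this example. A legacy-encoded string
can happen to be valid UTF-8, which is why the draft wants the encoding
to be signalled rather than guessed.

```python
def classify_string(raw: bytes) -> str:
    """Guess how a received string was encoded (heuristic only)."""
    if all(b < 0x80 for b in raw):
        # Pure US-ASCII is the same bytes in every candidate encoding.
        return "ascii"
    try:
        raw.decode("utf-8")  # strict decoding rejects malformed sequences
    except UnicodeDecodeError:
        return "unknown-legacy"
    return "probably-utf-8"

print(classify_string(b"krbtgt"))
print(classify_string("müller".encode("utf-8")))
print(classify_string("müller".encode("latin-1")))
```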

   There have been three proposals for how the fields currently encoded
   as GeneralStrings should be interpreted in order to accomplish such
   a transition:

   (1) Lie.  Start using UTF-8, but continue to encode all of these
       fields as GeneralStrings.  To my knowledge, this is what Microsoft
       is doing today.  This approach is attractive because it requires
       ....

JS>> "Be liberal in what you accept, and conservative in what you send"
     What you describe above is basically a total reverse of this.

   (2) Don't lie.  Start using UTF-8 encoded in GeneralStrings with
       ISO-2022/ECMA-35 compatible escape sequences.

JS>> I don't really think it is possible to use ISO-2022 shifting to tag
UTF-8. But this has nothing to do with G0 sequencing.

   (3) Don't use GeneralString.  In all the places where we currently
       use GeneralString, begin using a new "KerberosString" type instead.
       This type would be defined as an ASN.1 choice, with GeneralString
       and some form of UTF-8 strings as alternatives.  The selection of
       The new KerberosString could be implemented as one of:

JS>> If this is possible, this would be a nice solution to move towards
UTF-8.
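
To make option (3) concrete, here is a rough sketch of how a decoder
could dispatch on the DER tag if KerberosString were an untagged CHOICE
of GeneralString (universal tag 27, 0x1B) and UTF8String (universal tag
12, 0x0C). That particular form of the CHOICE is an assumption, since the
proposal lists several candidates. The function name is hypothetical, and
only short-form lengths are handled.

```python
from typing import Tuple

GENERAL_STRING_TAG = 0x1B  # ASN.1 universal 27, GeneralString
UTF8_STRING_TAG = 0x0C     # ASN.1 universal 12, UTF8String

def decode_kerberos_string(der: bytes) -> Tuple[str, str]:
    """Decode one short-form DER TLV and report which alternative it used."""
    if len(der) < 2:
        raise ValueError("truncated TLV")
    tag, length = der[0], der[1]
    if length & 0x80:
        raise ValueError("long-form lengths are not handled in this sketch")
    value = der[2:2 + length]
    if len(value) != length:
        raise ValueError("value shorter than declared length")
    if tag == UTF8_STRING_TAG:
        return "utf8", value.decode("utf-8")
    if tag == GENERAL_STRING_TAG:
        # Assumption: old implementations are only trusted for US-ASCII.
        return "general", value.decode("ascii")
    raise ValueError("unexpected tag 0x%02x" % tag)

print(decode_kerberos_string(b"\x1b\x05admin"))
print(decode_kerberos_string(b"\x0c\x03" + "é!".encode("utf-8")))
```

The point is that the tag itself tells a new implementation which
encoding the sender used, so no guessing is needed.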

I could also think of two other possible solutions.

(4) Use ASCII Compatible Encoding (ACE) like IDN

(5) Use ISO-2022 since it is already supported in ASN.1 (as you said).
    Yes, but it is a solution.
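
As an illustration of option (4), the sketch below uses Punycode, the
ACE that IDN later standardized in RFC 3492; other ACE candidates would
look similar in use. The "xn--" prefix is borrowed from IDNA, and
applying it to Kerberos names is purely an assumption of this example, as
are the function names.

```python
ACE_PREFIX = "xn--"  # IDNA's prefix, assumed here for illustration

def to_ace(label: str) -> str:
    """Return an ASCII-only form of label, leaving pure-ASCII labels alone."""
    if all(ord(ch) < 0x80 for ch in label):
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")

def from_ace(label: str) -> str:
    """Reverse to_ace for labels that carry the ACE prefix."""
    if not label.startswith(ACE_PREFIX):
        return label
    return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")

print(to_ace("bücher"))
print(from_ace(to_ace("bücher")))
```

The attraction is that old implementations see only ASCII and need no
changes; the cost is that every new implementation must convert at the
edges.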

JS>> You also have other problems like matching, sorting, searching etc.
I am not sure how far you want to go, but here is some further reading:

Normalization
  - draft-hoffman-stringprep
    (see how Nameprep is a profile of stringprep)
  - Unicode Technical Report #15

Sorting
  - Unicode Technical Report #
  - ISO-14651
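
To show why normalization matters for matching, here is a minimal sketch
using the normalization forms from Unicode Technical Report #15 as
exposed by Python's unicodedata module. Using NFKC followed by case
folding as the comparison key is an assumption of this example; a real
stringprep profile also maps and prohibits certain characters, which is
not shown. The function name is made up.

```python
import unicodedata

def match_key(name: str) -> str:
    """Build a comparison key: NFKC normalization, then case folding."""
    return unicodedata.normalize("NFKC", name).casefold()

composed = "CAFÉ"            # precomposed E-acute (U+00C9)
decomposed = "Cafe\u0301"    # 'e' followed by combining acute accent

# The raw strings differ code point by code point, but the keys agree.
print(composed == decomposed)
print(match_key(composed) == match_key(decomposed))
```

Note that sorting by these keys still gives code point order, not a
language-aware collation of the kind the sorting references describe.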