Re: Combining characters (was: Re: [idn] hostname historyhell)
Sorry, I forgot to check the cc field. Thank you for catching it,
and I had better clean up this message a little for easier reading.
> > > liana Ye wrote:
> > >
> > > > I'd like to propose a more specific layering of IDN symbols:
> > > >
> > > > From the top where the user input buffer offers:
> > > >
> > > > Layer 3: label separators and label order normalizing
> > > >
> > > > Layer 2: Bidi label normalizing (or vertical label normalizing)
> > >
> > > What is the current display order for unstructured and structured
> > > data in right-to-left display systems? Does unstructured data ("the
> > > files are on server1") typically flow RTL, while URLs and other
> > > structured data display as LTR?
> > >
As far as I know, unstructured data is mostly based on LTR internally,
and is displayed as RTL or swapped to a top-down display. But I don't
know how people handle Mongolian and Man text specifically;
those are also the cases I had in mind when I mentioned Layer 3
and Layer 2.
> > > It seems that these questions are for the structured data groups to
> > > handle when they decide on an output presentation mechanism.
> > > But if URLs and other structured types will display RTL then that
> > > may affect us as well (your Layer 3 label ordering in particular).
Yes. These issues have been raised before by Ben. For
example, Chinese addresses run from general to specific:
    China, province, city, district, street, building#, apartment#
while English addresses run the reverse way, but only partially:
    building#, street, apartment#, city, state, country
From a processing viewpoint, the Chinese way produces less
confusion, while our TLD ordering has been following the English way.
For IDN, if we are speaking of backward compatibility with
the current DNS, we do need to allow different local label
ordering. Which group is handling this part?
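To make the label ordering point concrete, here is a minimal Python sketch. It assumes a local form that is written general-to-specific and a DNS form that is written specific-to-general; the function name `reorder_labels` and the sample name are illustrative only, not anything this group has agreed on.

```python
def reorder_labels(name, separator="."):
    """Reverse the label order of a dotted name.

    Hypothetical helper: it only swaps the order of the labels and
    does no other normalization.
    """
    return separator.join(reversed(name.split(separator)))


# Assumed local form, written from general to specific.
local_form = "com.example.www"

# The same name in the specific-to-general order the current DNS uses.
dns_form = reorder_labels(local_form)
print(dns_form)  # www.example.com
```

Applying the function twice gives back the original name, so the same step could serve in both directions.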
As far as this group is concerned, I think we need to come up with
1) a list of equivalent symbols to serve as label separators;
2) a list of special character processing protocols, most of
which have been in use since the first days of the ASCII
standard, for example: $/%/?/*, which may indicate
how to handle special characters within a label.
I think this is the base for us to work on bidi and vertical
issues in Layer 2 normalization.
Once we get the two lists out of our way, we can have
a table of prohibitions and a sensible list of equivalents to
work with before we broadly exclude all of them in
[nameprep].
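As a rough illustration of how the two lists might be used, here is a minimal Python sketch. The particular separator code points and the set of special characters are assumptions chosen for the example, not a proposed list, and the function names are hypothetical.

```python
# Assumed, illustrative list of symbols treated as equivalent to ".".
LABEL_SEPARATORS = {
    "\u3002",  # IDEOGRAPHIC FULL STOP
    "\uff0e",  # FULLWIDTH FULL STOP
    "\uff61",  # HALFWIDTH IDEOGRAPHIC FULL STOP
}

# Assumed, illustrative set of special characters to flag inside a label.
SPECIAL_CHARACTERS = set("$%?*")


def split_labels(name):
    """Map every equivalent separator to "." and split into labels."""
    for symbol in LABEL_SEPARATORS:
        name = name.replace(symbol, ".")
    return name.split(".")


def special_characters_in(label):
    """Return the flagged special characters found in a label, sorted."""
    return sorted(set(label) & SPECIAL_CHARACTERS)


labels = split_labels("www\u3002example\uff0ecom")
print(labels)  # ['www', 'example', 'com']
print(special_characters_in("ex*mple?"))  # ['*', '?']
```

Keeping the two lists as plain tables makes it easy to review each entry before deciding whether it should be mapped, allowed, or prohibited.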
Liana
> > > > Layer 1.5: diacritic marks and combining symbol normalizing
> > > >
> > > > Layer 1: IDN identifier matching or whatever comes out
> > > > of [nameprep].
> > > >
> > > > The reason for Layer 1.5 is that these symbols can be
> > > > treated in a similar way with Han characters depending on
> > > > what architecture we end up with, and what ACE will be
> > > > our focus.
> > >
> > > --
> > > Eric A. Hall
> > > http://www.ehsco.com/
> > > Internet Core Protocols
> > > http://www.oreilly.com/catalog/coreprot/