
Re: [idn] what are the IDN identifiers?



>   [STD13] defines LDH as the DNS identifiers,
> so what are the IDN identifiers?  UCS is too big and contains
> many semantically equivalent characters for IDN.  Should we
> ask the Unicode Consortium for a table defining sets of
> semantically equivalent characters?

To me, what you are saying is no different from a Normalization Form.

> 1) label separators, ie punctuation and formatting marks
> 2) structured data indicators, ie. $/%/& ...
> 3) unstructured data identifiers, ie. alphabet, CJKs,
>  sound marks...

Take a look at the Unicode character categories. They are already
there. We just have to use them properly.
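
As a rough illustration of what "already there" means, here is a minimal
Python sketch that looks up the General Category of a few characters with
the standard `unicodedata` module. The sample characters and the helper
name `general_category` are my own picks for illustration, not anything
proposed on this list, and how the categories would map onto the three
classes quoted above is left open.

```python
import unicodedata

# Illustrative sample only: punctuation, symbols, a letter, a CJK
# ideograph, a combining mark and a format character.
SAMPLES = [".", "-", "$", "%", "a", "\u4e2d", "\u0301", "\u200d"]


def general_category(ch: str) -> str:
    """Return the two-letter Unicode General Category of one character."""
    return unicodedata.category(ch)


for ch in SAMPLES:
    # Print the code point next to its category, e.g. "U+0024 Sc".
    print(f"U+{ord(ch):04X} {general_category(ch)}")
```

Punctuation shows up under the P* categories, currency signs under Sc,
letters and ideographs under L*, combining marks under M*, and format
characters under Cf.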

> 1)case insensitive,

Case folding. Done.
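
For anyone who wants to see it, a minimal sketch in Python, whose
`str.casefold()` applies Unicode full case folding. The helper name
`fold_label` and the sample labels are hypothetical; this is not a
proposal for how IDN should wire it in.

```python
def fold_label(label: str) -> str:
    """Return the label with Unicode full case folding applied."""
    return label.casefold()


# Labels that differ only in case compare equal after folding.
print(fold_label("EXAMPLE") == fold_label("example"))
# Full case folding can change length: U+00DF folds to "ss".
print(fold_label("Stra\u00dfe"))
```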

> 2)size or width insensitive,

Normalization Form. Done.
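
A minimal sketch of the width case, assuming the compatibility form NFKC
is the one meant here (the plain NFC form leaves fullwidth and halfwidth
characters untouched). The helper name `fold_width` is hypothetical.

```python
import unicodedata


def fold_width(label: str) -> str:
    """Normalize to NFKC, which maps width variants to their base forms."""
    return unicodedata.normalize("NFKC", label)


# Fullwidth Latin letters U+FF45 U+FF58 become ASCII "ex".
print(fold_width("\uff45\uff58"))
# Halfwidth katakana KA (U+FF76) becomes the ordinary KA (U+30AB).
print(fold_width("\uff76") == "\u30ab")
```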

> 3)font insensitive (include majority of TC/SC)

The Unicode Consortium doesn't deal with fonts. It provides characters
for reference, but not font standardisation. TC-SC is not a "font
sensitivity" issue.
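
To make the point concrete, a small Python sketch with one
traditional/simplified pair that I picked for illustration (U+570B and
U+56FD, both meaning "country"). They are separate code points, and no
Normalization Form maps one to the other, which is what you would expect
if the difference were not a matter of fonts.

```python
import unicodedata

TRADITIONAL = "\u570b"  # traditional form of "country"
SIMPLIFIED = "\u56fd"   # simplified form of "country"


def nfkc(text: str) -> str:
    """Apply NFKC, the most aggressive standard Normalization Form."""
    return unicodedata.normalize("NFKC", text)


# Each character is unchanged by normalization...
print(nfkc(TRADITIONAL) == TRADITIONAL, nfkc(SIMPLIFIED) == SIMPLIFIED)
# ...so the pair is still distinct afterwards.
print(nfkc(TRADITIONAL) == nfkc(SIMPLIFIED))
```

Any TC-SC equivalence would therefore need a mapping table of its own,
outside normalization.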

> 4)language insensitive (include CJK),

Normalization form *is* language insensitive. It only deals with
scripts.

> 5)combination insensitive(regardless NFC or NFKC).

>   Language insensitive: ie. circled numbers, circled
> Han numerals, Dingbats, subset of CJKs.  But other
> subsets of CJK will differ semantically in each
> language, so we would need separate tables to
> work with for each of them.

You are venturing into a very dangerous area of script vs language.
ISO10646/UCS is a script-based CCS, not a language-based one.
The moment we want to deal with "language", we are on our own.

AFAICS, we do have agreement that we can do I18N => script.
We do not have agreement to do multilingual => language.
Please don't confuse the two.

I have no intention of starting a conversation about "multilingual
domain names". We tried, and the conclusion was that it is not possible.

-James Seng