Re: [idn] what are the IDN identifiers?
> We have [STD13], which defines LDH as the DNS identifiers;
> then what are the IDN identifiers? UCS is too big and contains
> many semantically equivalent characters for IDN. Should we
> ask the Unicode Consortium for a table defining sets of
> semantically equivalent characters?
To me, what you are saying is no different from Normalization Form.
> 1) label separators, i.e. punctuation and formatting marks
> 2) structured data indicators, i.e. $/%/& ...
> 3) unstructured data identifiers, i.e. alphabets, CJK,
> sound marks...
Take a look at the Unicode General Categories. They are already there;
we just have to use them properly.
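A rough sketch of querying the categories with Python's standard `unicodedata` module. The mapping of the three proposed groups onto General Category classes is a hypothetical illustration, not an agreed rule:

```python
import unicodedata

def classify(ch: str) -> str:
    """Map one character to the proposed groups via its General Category.

    The group boundaries here are a hypothetical illustration only.
    """
    cat = unicodedata.category(ch)  # two-letter code, e.g. "Po", "Lo", "Cf"
    if cat[0] in ("P", "Z") or cat == "Cf":
        return "label separator"            # punctuation, spaces, format marks
    if cat[0] == "S":
        return "structured data indicator"  # currency, math and other symbols
    if cat[0] in ("L", "M", "N"):
        return "unstructured data"          # letters, combining marks, digits
    return "other"                          # controls, unassigned, private use


for ch in [".", "\u200d", "$", "a", "\u4e2d", "\u3099"]:
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {classify(ch)}")
```

The categories do not line up exactly with the proposed groups: U+0025 PERCENT SIGN and U+0026 AMPERSAND are classed as punctuation (Po), while U+0024 DOLLAR SIGN is a currency symbol (Sc), so the grouping would need adjusting for such characters.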
> 1) case insensitive,
Case folding. Done.
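For example, Python's `str.casefold()` applies Unicode full case folding. A minimal sketch of case-insensitive label comparison (the helper names `fold` and `labels_match` are hypothetical):

```python
def fold(label: str) -> str:
    # str.casefold() performs Unicode full case folding,
    # e.g. "ß" folds to "ss" and "Σ" to "σ".
    return label.casefold()

def labels_match(a: str, b: str) -> bool:
    # Two labels are treated as equal if their case-folded forms are identical.
    return fold(a) == fold(b)

print(labels_match("Example", "EXAMPLE"))  # True
print(labels_match("Straße", "STRASSE"))   # True
```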
> 2) size or width insensitive,
Normalization Form (NFKC). Done.
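A short sketch with Python's `unicodedata.normalize`: NFKC maps fullwidth Latin letters and halfwidth katakana to their ordinary forms, whereas NFC leaves them unchanged:

```python
import unicodedata

fullwidth = "\uff29\uff24\uff2e"  # "ＩＤＮ" in fullwidth Latin letters
halfwidth_ka = "\uff76"           # "ｶ" HALFWIDTH KATAKANA LETTER KA

print(unicodedata.normalize("NFKC", fullwidth))     # prints "IDN"
print(unicodedata.normalize("NFC", fullwidth))      # canonical only: still "ＩＤＮ"
print(unicodedata.normalize("NFKC", halfwidth_ka))  # prints "カ" (U+30AB)
```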
> 3) font insensitive (including the majority of TC/SC),
The Unicode Consortium doesn't deal with fonts. It provides characters,
with reference glyphs, but it does not standardise fonts. TC/SC
(Traditional vs. Simplified Chinese) is not a "font sensitivity" issue.
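Traditional and Simplified forms are separately encoded characters, so normalization does not relate them. A quick check with Python's `unicodedata`, using the pair 漢/汉 as an illustrative example (the helper `nfkc_equal` is hypothetical):

```python
import unicodedata

def nfkc_equal(a: str, b: str) -> bool:
    # True only if the two strings are identical after NFKC normalization.
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)

traditional = "\u6f22"  # 漢
simplified = "\u6c49"   # 汉

print(unicodedata.name(traditional))        # CJK UNIFIED IDEOGRAPH-6F22
print(unicodedata.name(simplified))         # CJK UNIFIED IDEOGRAPH-6C49
print(nfkc_equal(traditional, simplified))  # False: no decomposition links them
```

Any TC/SC equivalence would have to come from data outside the normalization forms, such as the variant fields in the Unihan database.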
> 4) language insensitive (including CJK),
Normalization Form *is* language-insensitive. It deals only with
scripts.
> 5) combination insensitive (regardless of NFC or NFKC).
> Language insensitive: i.e. circled numbers, circled
> Han numerals, Dingbats, a subset of CJK. But other
> subsets of CJK differ semantically between
> languages, so we would need separate tables to
> work with for each of them.
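A sketch with `unicodedata.normalize` for two of the cases above: a combining sequence and its precomposed form are equal under both NFC and NFKC, and NFKC (but not NFC) folds circled digits and circled Han numerals, without reference to any language:

```python
import unicodedata

decomposed = "e\u0301"  # "e" + COMBINING ACUTE ACCENT
precomposed = "\u00e9"  # "é" LATIN SMALL LETTER E WITH ACUTE

for form in ("NFC", "NFKC"):
    # Canonically equivalent strings normalize to the same result in either form.
    same = unicodedata.normalize(form, decomposed) == unicodedata.normalize(form, precomposed)
    print(form, same)

circled = ["\u2460", "\u3280"]  # ① CIRCLED DIGIT ONE, ㊀ CIRCLED IDEOGRAPH ONE
for ch in circled:
    # NFKC removes the <circle> compatibility formatting; NFC leaves it alone.
    print(unicodedata.normalize("NFKC", ch), unicodedata.normalize("NFC", ch) == ch)
```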
You are venturing into a very dangerous area: script vs. language.
ISO 10646 (the UCS) is a script-based CCS, not a language-based one.
The moment we want to deal with "language", we are on our own.
AFAICS, we do have agreement that we can do I18N (=> script).
We do not have agreement to do multilingual (=> language).
Please don't confuse the two.
I have no intention of starting a conversation about "multilingual domain
names". We tried, and the conclusion was that it is not possible.
-James Seng