Re: host names and nameprep (was: Re: [idn] IRIs ought to use internationalized *host* names)
On Tue, 9 Apr 2002, Adam M. Costello wrote:
> > This is a very good first shot. There are some things that have to
> > be carefully checked, e.g. do some M (marks) have to be excluded, or
> > should some signs corresponding to the hyphen-minus be allowed. Two
> > examples I know would be the zero-width space which could be desirable
> > for Farsi,
>
> But that would mean allowing a white-space character in host names.
> There are *lots* of contexts where white-space is a delimiter. Is there
> any non-space character that could serve this purpose for Farsi?
I have had many debates with Persian experts here in Iran about ZWNJ
(U+200C ZERO WIDTH NON-JOINER) in domain names.
domain names. There are four possible scenarios:
1. Forbid ZWNJ: Many words then cannot be written in a readable way.
The best example is the word "daanesh-aamooz" (student), written "Dal,
Alef, Noon, Sheen, ZWNJ, Alef with Madda Above, Meem, Waw, Zain".
2. Respect ZWNJ: This makes two domain names that look identical in every
way resolve to different hosts. For example, my first name, written "Reh,
Waw, Zain, Beh, Heh", is displayed exactly the same way if a ZWNJ is
inserted between the first two letters, according to the definition of
ZWNJ in Unicode: Reh never joins to the letter after it, so the ZWNJ has
nothing to break.
3. Ignore ZWNJ: Sadly, names with and without ZWNJ are then treated as the
same. Some words have different meanings with or without a ZWNJ, or with
the ZWNJ in a different place. One such example is "Noon, Alef, Meem, Heh,
Alef, Farsi Yeh". Without a ZWNJ, it translates to "names"; with a ZWNJ
between Heh and Alef, it means "a letter".
4. Treat ZWNJ in an intelligent way: The implementation ignores ZWNJ
wherever it makes no visual difference. So it is ignored between two
letters that would not join anyway, but respected everywhere else. This
requires the domain resolver to do some Arabic contextual shaping.
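To make the four scenarios concrete, here is a minimal Python sketch of
each policy as a function from a label to the form used for comparison.
The function names (forbid, respect, ignore, smart) are hypothetical, and
the joining-type table is an assumption for illustration: it holds only
the letters from the examples above, with values taken from Unicode's
ArabicShaping.txt. Scenario 4 keeps a ZWNJ only when the letter before it
can join forward (joining type D or L) and the letter after it can join
backward (D or R); transparent marks such as harakat are not handled.

```python
ZWNJ = "\u200c"

# Hypothetical, deliberately partial joining-type table (values from
# Unicode ArabicShaping.txt). A real resolver would need the full table.
JOINING_TYPE = {
    "\u0622": "R",  # ALEF WITH MADDA ABOVE
    "\u0627": "R",  # ALEF
    "\u0628": "D",  # BEH
    "\u062f": "R",  # DAL
    "\u0631": "R",  # REH
    "\u0632": "R",  # ZAIN
    "\u0634": "D",  # SHEEN
    "\u0645": "D",  # MEEM
    "\u0646": "D",  # NOON
    "\u0647": "D",  # HEH
    "\u0648": "R",  # WAW
    "\u06cc": "D",  # FARSI YEH
}


def joins_forward(ch):
    """Can ch connect to the letter that follows it in logical order?"""
    return JOINING_TYPE.get(ch) in ("D", "L")


def joins_backward(ch):
    """Can ch connect to the letter that precedes it in logical order?"""
    return JOINING_TYPE.get(ch) in ("D", "R")


def forbid(label):
    """Scenario 1: reject any label containing ZWNJ."""
    if ZWNJ in label:
        raise ValueError("ZWNJ is not allowed in host names")
    return label


def respect(label):
    """Scenario 2: ZWNJ is significant, so compare labels as-is."""
    return label


def ignore(label):
    """Scenario 3: drop every ZWNJ before comparing."""
    return label.replace(ZWNJ, "")


def smart(label):
    """Scenario 4: keep a ZWNJ only where it visibly breaks a join."""
    out = []
    for i, ch in enumerate(label):
        if ch == ZWNJ:
            prev = out[-1] if out else ""  # last character kept so far
            rest = label[i + 1:].lstrip(ZWNJ)  # skip any further ZWNJs
            nxt = rest[0] if rest else ""
            if not (joins_forward(prev) and joins_backward(nxt)):
                continue  # no visual effect (or a duplicate): drop it
        out.append(ch)
    return "".join(out)
```

With this sketch, smart() keeps the ZWNJ between Heh and Alef in "a
letter" (Heh can join to Alef), so that label stays distinct from
"names", while a ZWNJ after the Reh in "Roozbeh" is dropped because Reh
never joins forward. Runs of ZWNJ collapse to a single one.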
My personal order of preference is 4, 3, 2, 1. The latest nameprep draft
does 3.
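Scenario 3 falls out of stringprep: its table B.1 ("commonly mapped to
nothing") lists U+200C, so nameprep deletes ZWNJ in the mapping step.
Assuming CPython, whose encodings.idna module carries a nameprep()
helper implementing the final RFC 3491 profile (an internal helper of
the "idna" codec rather than documented API, and not necessarily
identical to the April 2002 draft), this can be checked directly:

```python
import stringprep
from encodings.idna import nameprep  # CPython's RFC 3491 nameprep helper

ZWNJ = "\u200c"

# Stringprep table B.1 contains ZWNJ, so nameprep maps it to nothing.
print(stringprep.in_table_b1(ZWNJ))  # True

names = "\u0646\u0627\u0645\u0647\u0627\u06cc"         # "names"
letter = "\u0646\u0627\u0645\u0647\u200c\u0627\u06cc"  # "a letter" (ZWNJ after Heh)

# Both words prepare to the same label, so they would name the same host.
print(nameprep(names) == nameprep(letter))  # True
```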
I'm reading this out of context, so just ask for any specific details,
roozbeh