Re: [idn] ZWNJ
--On Saturday, 28 July, 2001 17:31 +0430 Roozbeh Pournader
<roozbeh@sharif.edu> wrote:
>> Arabic letters put together may have different rendering
>> forms depending on whether there is a break between the letters
>> or not. But it is not necessarily a 'space' because there is
>> no gap. Someone told me that ZWNJ was suggested as a way to
>> insert this break without adding a space, although that was not
>> the reason ZWNJ was there in the first place.
>
> You have some idea. The ZWNJ is actually used in languages
> like Persian, which are not Arabic (language) based, but only
> use the Arabic script. In these languages the words are
> longer than Arabic words, and there is a need to separate
> parts of the word. Consider the case of German: those long
> words are really hard to read. Add the Arabic joining to this,
> and you will surely have problems finding subword boundaries.
> Persian typography then introduced ZWNJ (which is called
> semi-space in Persian traditional typography) to break the
> words so they become readable.
Roozbeh,
My apologies for my lack of understanding and poor statement of
what I did understand. Let me try again.
DNS names are identifiers and, as identifiers, are subject to
certain restrictions. Length is one of them -- we even have
words in English that are over 63 characters long, although there
aren't many of them, and they can't be used in domain names. A
second is that, as identifiers, labels are what computer language
syntax folks call "atoms" or "atomic". That typically means
"one label, one word". There is a long history of pushing
words together to make one DNS label, either using the one
joining character available (hyphen) or just concatenating them.
In the latter case, we just hope the user will figure out what
is going on to preserve whatever mnemonic value we intend.
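As a rough illustration of the identifier restrictions just described,
here is a minimal Python sketch. It assumes the conventional
letter-digit-hyphen host name rules and the 63-octet label limit; the
function name label_problems is hypothetical and not part of any library.

```python
import string

MAX_LABEL_OCTETS = 63  # per-label length limit mentioned above
# Assumption: conventional host name labels use only letters, digits, hyphen.
LDH = set(string.ascii_letters + string.digits + "-")


def label_problems(label: str) -> list[str]:
    """Return the reasons a string fails as a traditional DNS label."""
    problems = []
    if not label:
        problems.append("empty")
    if len(label.encode("utf-8")) > MAX_LABEL_OCTETS:
        problems.append("too long")
    if any(ch not in LDH for ch in label):
        problems.append("non-LDH character")
    if label.startswith("-") or label.endswith("-"):
        problems.append("leading or trailing hyphen")
    return problems


if __name__ == "__main__":
    for candidate in ("oreilly", "two-words", "two words", "a" * 64):
        print(repr(candidate), label_problems(candidate))
```

Two words pushed together, or joined with a hyphen, pass; a space or
any other separator between them does not.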
Parenthetically, this is one place where our colleagues with
ideographic languages have a huge advantage: they can actually
write multi-word phrases into DNS labels/identifiers without
doing violence to the natural rules of the writing system.
So I don't know quite what to do with ZWNJ and other separators
or near-separators without opening the door to other characters
normally used in other languages as near-separators or
punctuation or near-punctuation, e.g., ":", "'", "!", or "&",
which have been used, normally or artistically, in Indo-European
languages using Roman-derived character sets for many years, and
even to recognizing distinct interpretations for some of the
distinct spacing and hyphenation characters in Unicode.
For example, while I would _strongly_ recommend against going down
this path, we could, in principle, adopt presentation and coding
rules that would permit, e.g.,
"O'Reilly & Associates"
to use the domain name
www."O'Reilly & Associates".com
by coding the key second label as
O(U+0027)Reilly(U+00A0)(U+0026)(U+00A0)Associates
Again, I don't suggest doing this, but, ultimately, the DNS
itself would have no problems with it and, if one starts
introducing near-space characters from other scripts, then there
is little justification for prohibiting this type of usage in
Roman-based ones.
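To make the hypothetical coding above concrete, here is a small Python
sketch. It is purely illustrative and not a proposal: UTF-8 is assumed
only as a way to count octets, and the helper name non_ldh_characters
is made up for this example.

```python
import unicodedata

# The second label from the example above, built from the listed code points.
label = "O\u0027Reilly\u00a0\u0026\u00a0Associates"

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER, the character under discussion


def non_ldh_characters(text: str) -> list[tuple[str, str]]:
    """List (code point, Unicode name) for every character that is not
    an ASCII letter, an ASCII digit, or a hyphen."""
    found = []
    for ch in text:
        is_ldh = ch.isascii() and (ch.isalnum() or ch == "-")
        if not is_ldh:
            found.append((f"U+{ord(ch):04X}", unicodedata.name(ch)))
    return found


if __name__ == "__main__":
    print(len(label.encode("utf-8")), "octets in UTF-8 (assumed encoding)")
    for code_point, name in non_ldh_characters(label):
        print(code_point, name)
    # A ZWNJ inside a label is flagged by exactly the same test.
    print(non_ldh_characters("ab" + ZWNJ + "cd"))
```

The label fits comfortably within the length limit; what sets it apart
from a traditional label is only the apostrophe, ampersand, and
no-break spaces, and a ZWNJ falls into that same category.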
john