Re: [idn] ZWNJ
After giving this some thought, I agree with Roozbeh that the ZWNJ should be
removed from the prohibited list, and added to the Mapped Out list.
There is a strong difference between this and the BIDI codes. Those could be
used to provide a display that was very different from the default
ordering, and could lead to confusion or be used for spoofing. ZWNJ, on the
other hand, is harmless. The only effect it would have is to break a normal
cursive join, which would be easily recognized for what it is.
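Here is a minimal Python sketch of the proposal. It assumes a nameprep-like pipeline that maps, then normalizes (NFKC), then checks prohibited code points. The tables `MAPPED_TO_NOTHING` and `PROHIBITED` and the function `prep_label` are hypothetical and heavily abbreviated. They are not the actual nameprep tables. In this sketch ZWNJ is silently dropped at the mapping step, while the bidi embedding and override controls still cause the label to be rejected.

```python
import unicodedata

# Hypothetical, abbreviated tables for illustration only; a real profile
# would define its own, much larger lists.
MAPPED_TO_NOTHING = {"\u200c"}  # ZERO WIDTH NON-JOINER
PROHIBITED = {
    "\u200e", "\u200f",                               # LRM, RLM
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # embeddings/overrides
}


def prep_label(label: str) -> str:
    """Map, normalize, then check a label (a sketch, not real nameprep)."""
    # Step 1: delete characters that are "mapped out" (here, only ZWNJ).
    mapped = "".join(ch for ch in label if ch not in MAPPED_TO_NOTHING)
    # Step 2: compatibility normalization.
    normalized = unicodedata.normalize("NFKC", mapped)
    # Step 3: reject anything left on the prohibited list.
    for ch in normalized:
        if ch in PROHIBITED:
            raise ValueError(f"prohibited code point U+{ord(ch):04X}")
    return normalized


# Persian label typed with a ZWNJ: accepted, ZWNJ removed.
print(prep_label("\u062e\u0627\u0646\u0647\u200c\u0647\u0627"))
```

With this ordering, a label typed with ZWNJ resolves to the same name as one typed without it. A label containing U+202E (RIGHT-TO-LEFT OVERRIDE) is refused outright.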
Mark
—————
"Man is the measure of all things." (Protagoras)
[http://www.macchiato.com]
----- Original Message -----
From: "Roozbeh Pournader" <roozbeh@sharif.edu>
To: "John C Klensin" <klensin@jck.com>
Cc: <idn@ops.ietf.org>
Sent: Sunday, July 29, 2001 04:16
Subject: Re: [idn] ZWNJ
>
> John,
>
> You're right about the identifier nature of DNS names. Being brought up in
> such a world, I'm already well familiar with the way this impacts the
> language. For a good example, see Arthur C Clarke's The Light of Other
> Days, ISBN 0812576403, where words like SearchEngine are common. The DNS
> and other identifier restrictions have changed the shape of the English
> language, for sure.
>
> Getting back to the thread, Arabic lacks many of the possibilities of the
> Latin script for getting a distinguishable sense out of a sequence of
> letters (which we will call identifiers). I consider the use of ZWNJ to be
> equivalent to the use of capitalization inside identifiers (as in
> SearchEngine). Like capitalization, it should be ignored; like
> capitalization, it will help the reader; and like capitalization, the
> original should be retrievable in some way.
>
> Please note that even in single words, ZWNJ is used. In many single words
> like the Persian words for "houses", "circular", "eraser", "compatriot",
> and "synonymous", or single-word names of places, it may not be dropped in
> any way, or the word becomes completely unreadable.
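The quoted point about retrievability can be shown with two spellings of the Persian word for "houses": one with ZWNJ between the stem and the plural suffix, and one without it. This Python sketch uses a hypothetical helper, `map_out_zwnj`. The code point escapes assume the common spelling خانه‌ها. Once ZWNJ is mapped to nothing, the two spellings become identical. The correctly written form therefore cannot be recovered from the prepared label alone and would have to be stored somewhere else.

```python
# Persian "houses" (khane-ha): stem + ZWNJ + plural suffix -ha (assumed spelling).
WITH_ZWNJ = "\u062e\u0627\u0646\u0647\u200c\u0647\u0627"
# Same letters without ZWNJ: the two HEH letters would join when rendered.
WITHOUT_ZWNJ = "\u062e\u0627\u0646\u0647\u0647\u0627"


def map_out_zwnj(label: str) -> str:
    """Hypothetical mapping step: delete every U+200C ZERO WIDTH NON-JOINER."""
    return label.replace("\u200c", "")


# Different code point sequences, so different renderings...
print(WITH_ZWNJ == WITHOUT_ZWNJ)  # False
# ...but identical once ZWNJ is mapped out.
print(map_out_zwnj(WITH_ZWNJ) == map_out_zwnj(WITHOUT_ZWNJ))  # True
```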
>
> Arabic is connected, unlike Latin, where the letters are separate enough
> that you can sometimes omit the space (as in domain names, or German).
> It's also unlike Han, where there is a clear boundary between words
> without even the need for spaces. So Arabic-script text relies heavily on
> spaces and ZWNJ to stop joining where it would ruin the meaning or
> readability of a phrase. Please note that ZWNJ is essentially treated as
> a *nothing* in the Unicode recommendation: it should only affect
> contextual shaping, and nothing else.
>
> While I see the use of space-like characters in Latin as problematic
> (mainly because the resulting written forms are indistinguishable), the
> case is different with ZWNJ. It is not a space character.
>
> BTW, there are also many other reasons to be able to retrieve the
> original, non-nameprepped name. Have you thought about national digit
> shapes (as used in Arabic and Indic scripts), for example? Many countries
> do not use European digits (which Europeans call Arabic).
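A hypothetical digit-folding step shows why the original digit shapes would be lost if labels were folded to European digits. NFKC alone does not turn Arabic-Indic or Devanagari digits into ASCII, because these digits have no compatibility decomposition. The `fold_digits` function below is therefore an assumed extra step and is not part of any nameprep draft. It uses Python's `unicodedata.decimal` to replace any decimal digit with its ASCII equivalent.

```python
import unicodedata


def fold_digits(label: str) -> str:
    """Hypothetical step: replace every Unicode decimal digit with ASCII 0-9."""
    out = []
    for ch in label:
        value = unicodedata.decimal(ch, None)  # None if ch is not a decimal digit
        out.append(str(value) if value is not None else ch)
    return "".join(out)


arabic_indic = "\u0661\u0662\u0663"     # ARABIC-INDIC DIGIT ONE..THREE
persian = "\u06f1\u06f2\u06f3"          # EXTENDED ARABIC-INDIC DIGIT ONE..THREE
devanagari = "\u0967\u0968\u0969"       # DEVANAGARI DIGIT ONE..THREE

# Three differently shaped originals collapse into one folded form.
print({fold_digits(s) for s in (arabic_indic, persian, devanagari)})
```

Because all three inputs produce the same folded string, you cannot tell from the folded label alone which digit shapes the registrant actually used.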
>
> roozbeh
>
> On Sat, 28 Jul 2001, John C Klensin wrote:
>
> > DNS names are identifiers and, as identifiers, are subject to
> > certain restrictions. Length is one of them -- we even have
> > words in English that are over 63 characters long, although there
> > aren't many of them, and they can't be used in domain names. A
> > second is that, as identifiers, labels are what computer language
> > syntax folks call "atoms" or "atomic". That typically means
> > "one label, one word". There is a long history of pushing
> > words together to make one DNS label and using either the one
> > joining character available (hyphen) or just catenating them.
> > In the latter case, we just hope the user will figure out what
> > is going on to preserve whatever mnemonic value we intend.
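The two restrictions in the quoted paragraph, a 63-octet label length and words pushed together with either a hyphen or nothing, can be sketched in Python. `words_to_label` is a hypothetical helper. The letters-digits-hyphen check reflects the traditional host name convention and is an assumption here, since the DNS protocol itself accepts arbitrary octets.

```python
import re
from typing import Sequence

MAX_LABEL_OCTETS = 63               # RFC 1035: a label is at most 63 octets
LDH = re.compile(r"[A-Za-z0-9-]+")  # letters, digits, hyphen (host name convention)


def words_to_label(words: Sequence[str], joiner: str = "-") -> str:
    """Hypothetical helper: push several words together into one DNS label."""
    label = joiner.join(words)
    if not LDH.fullmatch(label):
        raise ValueError("label is empty or has characters outside letters, digits, hyphen")
    octets = len(label.encode("ascii"))
    if octets > MAX_LABEL_OCTETS:
        raise ValueError(f"label is {octets} octets; the limit is {MAX_LABEL_OCTETS}")
    return label


print(words_to_label(["light", "of", "other", "days"]))  # hyphen as the joiner
print(words_to_label(["Search", "Engine"], joiner=""))   # plain catenation
```

In the catenated case, only the capitalization in "SearchEngine" marks where one word ends. That is why capitalization is the Latin-script analogue drawn earlier in the thread.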
> >
> > Parenthetically, this is one place where our colleagues with
> > ideographic languages have a huge advantage: they can actually
> > write multi-word phrases into DNS labels/identifiers without
> > doing violence to the natural rules of the writing system.
> >
> > So I don't know quite what to do with ZWNJ and other separators
> > or near-separators without opening the door to other characters
> > normally used in other languages as near-separators or
> > punctuation or near-punctuation, e.g., ":", "'", "!", or "&",
> > which have been used, normally or artistically, in Indo-European
> > languages using Roman-derived character sets for many years, or even
> > to recognizing distinct interpretations for some of the distinct
> > spacing and hyphenation characters in Unicode.
> >
> > For example, while I would _strongly_ not recommend going down
> > this path, we could, in principle, adopt presentation and coding
> > rules that would permit, e.g.,
> >
> > "O'Reilly & Associates"
> >
> > to use the domain name
> >
> > www."O'Reilly & Associates".com
> >
> > by coding the key second label as
> > O(U+0027)Reilly(U+00A0)(U+0026)(U+00A0)Associates
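The coding in the quoted example writes each character outside the plain letters and digits as its code point. A short Python sketch can reproduce that notation. The helper name `spell_out_code_points` is hypothetical. Note that the label uses U+00A0 NO-BREAK SPACE rather than an ASCII space.

```python
def spell_out_code_points(label: str) -> str:
    """Hypothetical helper: write non-ASCII-alphanumerics as (U+XXXX)."""
    parts = []
    for ch in label:
        if ch.isascii() and ch.isalnum():
            parts.append(ch)  # keep plain ASCII letters and digits
        else:
            parts.append(f"(U+{ord(ch):04X})")  # spell out everything else
    return "".join(parts)


label = "O'Reilly\u00a0&\u00a0Associates"
print(spell_out_code_points(label))
# O(U+0027)Reilly(U+00A0)(U+0026)(U+00A0)Associates
```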
> >
> > Again, I don't suggest doing this, but, ultimately, the DNS
> > itself would have no problems with it and, if one starts
> > introducing near-space characters from other scripts, then there
> > is little justification for prohibiting this type of usage in
> > Roman-based ones.
> >
> > john
> >
>
>
>
>
>