
Re: [idn] The Business Card problem (was: Re: An experiment with UTF-8 domain names)



--On Tuesday, January 09, 2001 9:52 PM +0900 "Martin J. Duerst"
<duerst@w3.org> wrote:

> Hello John,
> 
> I think your considerations are important, but I think
> you went a bit overboard on some issues.

Given that I was trying to identify the extreme cases, that would
not be a surprise.

> At 01/01/06 15:18 -0500, John C Klensin wrote:
> 
>> Let me use myself as an example (this one is inherently
>> European-language-oriented, but I assume it has Asian and
>> African analogues that are as bad or worse).  If I'm handed a
>> business card with an IDN on it, in the "native" character
>>...

> There is one very clear reason that this problem will in
> practice be more serious for what you call European languages
> (really, the Latin script) than for other scripts.

Actually, I really intended European languages, although that
term too is a little sloppy.   A "western-trending subset of the
character sets derived from Old North Semitic" would have been
better, but that isn't precise either, and fewer people would
have understood it.  Explanation below.

> Interestingly enough, it's because Latin is easier for
> many. French or Swedish business cards are not usually
> double-sided, because they are recognizable even if there are
> some accents. If you use them to send a letter, the letter will
> arrive even without the accents. So there is a real danger
> that people will happily create domain names with accents or other
> diacritics, and put them on their business cards.

I can't speak with any confidence about French or Swedish, but it
is certainly possible --as has been pointed out on this list by
others more expert than I-- to create some rather serious
ambiguities in German by simply dropping diacriticals (not,
strictly, accents) without substituting some in-stream
convention.  Whether the results are still deliverable in the
postal system may depend more on the ability of humans to do
fuzzy matching of an envelope with a known number of possible
targets than on any characteristics that would do us good in
the DNS.
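A small sketch of the German ambiguity, assuming a sender who simply drops the marks versus one who uses the conventional ae/oe/ue/ss substitution. The helper names, the fallback table, and the word pairs are illustrative assumptions, not anything proposed on the list.

```python
import unicodedata


def strip_diacritics(label: str) -> str:
    """Drop combining marks, as a sender lacking the right keyboard might."""
    # NFD splits a precomposed letter such as "ü" into "u" + U+0308.
    decomposed = unicodedata.normalize("NFD", label)
    # Discard nonspacing combining marks (Unicode general category "Mn").
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# Conventional German fallback spellings (an in-stream convention).
GERMAN_FALLBACK = {"ä": "ae", "ö": "oe", "ü": "ue",
                   "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}


def german_fallback(label: str) -> str:
    """Replace umlauts and sharp s with their conventional ASCII spellings."""
    # Normalize to NFC first so that "ü" is a single code point for the lookup.
    label = unicodedata.normalize("NFC", label)
    return "".join(GERMAN_FALLBACK.get(ch, ch) for ch in label)


# Illustrative word pairs that differ only by an umlaut.
pairs = [("drücken", "drucken"), ("schön", "schon"), ("zählen", "zahlen")]

for marked, plain in pairs:
    stripped = strip_diacritics(marked)
    print(f"{marked}: stripped={stripped!r} (collides: {stripped == plain}), "
          f"fallback={german_fallback(marked)!r}")
```

Naive stripping turns "drücken" (to press) into "drucken" (to print); the substitution convention keeps the two apart, which is the difference between losing information and encoding it in-stream.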
> 
> On the other hand, for Arabic or Japanese (or even just Greek
> or Cyrillic), there is no expectation that your
> international business partners will somehow be able to send
> you snail mail with just the 'local' side of the business card,
> or even, for that matter, have any idea of whom they met when
> they look at the card a few days later. Therefore I think we
> will quickly see things develop toward dual domain names, one
> for each side of the business card.

And here I would argue that you are correct but, unless you are
willing to walk the slippery slope that overhangs the "it would
just be easier if everyone learned English" argument, you have it
backwards.  Your presumption seems to be that my Russian business
partners will need to put English on their business cards and,
for the DNS, dual-register.  I don't see why they shouldn't take
the position that it is my responsibility to put Cyrillic on my
business cards (presumably the front side, with English on the
back).  And why they should use cards with English/ASCII on them
at all when communicating with, say, their Korean business
partners escapes me.

Just in passing, there are interesting historical artifacts in
some of this, but we should not confuse them with general
principles.  For example, most Greek characters bear more than a
passing resemblance to Roman ones (no surprise, see comment about
North Semitic above) and most of the rest are known to those who
have studied mathematics.  That makes Greek recognizable to a
significant fraction of the educated US and Western European
population.   Does it mean single-sided cards and
single-registrations?  Well, if I were Greek and wanted to carry
out commercial relationships with English-speakers, I wouldn't
risk it.  But I wouldn't risk it if I were Swedish, either.

Similarly, these postal analogies are really very misleading.
What is officially deliverable internationally is specified by a
collection of treaties which, if my memory is correct, have their
origins in the Napoleonic period and require that anything
internationally-significant on envelopes moving between countries
be in French.  So things are deliverable, inter-country, either
on the basis of tolerance and fuzzy matching (as mentioned above)
or because they are in a Roman-based language.  The ideological
portion of the IDN discussion involves not imposing another few
hundred years of French (or English) on people who aren't native
users of those, or closely related, languages... and that position
seems to me to be reasonable.

> As Paul has noted, in the European context (with few non-
> ASCII characters), an approach such as Dan's might even
> work, although I don't think it will become widespread,
> for various reasons including the fact that people don't
> know how to use the hex input feature.

I agree, although a reasonable person might suggest that hex
input is significantly easier for the typical person than
figuring out the algorithms to construct and write an ACE from
scratch.
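To make the contrast concrete, a sketch in Python: hex input only needs a code point read off a table, while an ACE needs an algorithm. Note the assumption: this uses Python's built-in `idna` codec (IDNA 2003, Nameprep plus Punycode, RFCs 3490/3492), which was standardized after this thread; in early 2001 several competing ACE proposals were still under discussion. The helper `char_from_hex` is hypothetical.

```python
def char_from_hex(code_point_hex: str) -> str:
    """Hex input: turn a table position such as '00FC' into its character."""
    return chr(int(code_point_hex, 16))


u_umlaut = char_from_hex("00FC")     # looked up in a code chart
label = "b" + u_umlaut + "cher"

# ACE: Nameprep followed by Punycode, done here by Python's built-in codec.
# Nobody would want to carry this out by hand from a business card.
ace = label.encode("idna")
print(label, "->", ace)   # bücher -> b'xn--bcher-kva'
```

The first step is a table lookup a person can do; the second is a multi-stage transformation that realistically only software can perform.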
 
>> Now, if the characters used are Latin-based, I can probably
>> cope (and I do today).  I'm going to recognize the basic
>> structures. Even if I have to consult a table, I can match
>> what I see on the card with what I see on the screen.  In the
>> worst case --in which I have no idea how to "type" the
>> character-- I can then enter the character position from the
>> table into a software program.  Interestingly, for that
>>...
> Already available, in any Japanese input method (not really
> for Latin) and in software such as MS Word, and probably
> quite a few others.

Available for Roman-based characters longer than there has been
an Internet-- such tablets are widely used to enable writing (and
sometimes "speaking") by those who are impaired manually and/or
orally.

>> First, I can't tell European characters apart to the degree
>> needed by IDN.  If you hand me a card with a native-glyph IDN on
>> it, I need to know the language (or a surrogate for the
>> language) to pick the right glyphs from a table.   I simply
>> cannot tell U+0041 from U+0391 from U+0410 by looking at a
>> glyph written on a page (and those are, of course, not the only
>> examples).  I have to know the language context, or apply
>> heuristics, to pick one.
> 
> Well, there are situations where this can go wrong. But
> as I said above, for Cyrillic and Greek, we'll probably
> see a 'two-sided business card' approach. Also, please
> note that if there are ambiguities, the main people who
> will suffer from them are the locals, so they will try
> to avoid them. Although for them fewer things may be ambiguous
> than for you, and it may be easier for you to start inputting
> an ascii A and then somehow continue on the rest with tables
> (rather than switch the keyboard at the start, as they will
> do), the cases where this won't work won't be that frequent.

I'm sure it isn't what you intend, but I can't parse the above
without a "Roman-based characters are the center of the universe"
assumption.
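The glyph problem quoted above (U+0041, U+0391, U+0410) can be checked directly. This sketch prints the three code points that commonly share a capital-A glyph, applies a deliberately crude script "heuristic" (the first word of the Unicode character name, an assumption for illustration only), and shows that a look-alike label is a different string and therefore a different name. The "apple" example is hypothetical.

```python
import unicodedata

# Three code points that commonly render with the same capital-A glyph.
for cp in (0x0041, 0x0391, 0x0410):
    ch = chr(cp)
    # Crude script guess: the first word of the Unicode character name.
    script_guess = unicodedata.name(ch).split()[0]
    print(f"U+{cp:04X} {unicodedata.name(ch)} (script guess: {script_guess})")

# Identical-looking labels are different strings, and so different names.
latin = "apple"          # all Latin letters
mixed = "\u0430pple"     # first letter is CYRILLIC SMALL LETTER A
print(latin == mixed)              # False
print(mixed.encode("idna"))        # an "xn--" label rather than b'apple'
```

The heuristic only works once you already have the code point; a reader looking at printed glyphs has nothing to feed it, which is the point of the quoted passage.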
 
>> But that isn't the serious problem either.  It turns out that I
>> can do just about the same pattern-matching job --looking at a
>> glyph, deducing its essential characteristics because I'm
>> familiar with the repertoire from which it comes, finding it in
>> a table and picking it or entering it by table position-- if
>> that card contains Greek, Cyrillic, or Hebrew glyphs.  That
>>...
> You are extrapolating this too far. Just because you personally
> can handle (with a lot of pain) some part of this, don't
> expect that this is what your business partners will expect
> from you, or what everybody will have to do, and so on.

That is exactly my point.  I will rush right out and order some
fifteen-sided business cards... as soon as I discover a
sixteen-dimensional space in which to hand them out.

>> So, for the business card, we need, I think, to start asking
>> different questions.  If want an address on my business card to
>> be recognized and usable, after the translator leaves, by a
>> Thai native who lacks familiarity with Latin alphabets, I had
>>...
> Yes. Visitors to such countries interested in business contacts
> with such people regularly prepare double-sided namecards.
> But it's usually just the name of the person, and maybe the
> name of the institution, that is transcribed/translated.
> I would not expect the street address or the email address
> to be translated, because this may raise expectations that
> you can be sent Thai email. For the Web address, that's
> different, if your company has a Thai Web site with a
> Thai address, it should go on there.

But, if a Thai user has a keyboard without ASCII characters on
it, and I expect him to communicate with me, I'd better be able
to render an address into Thai.  Yes?

>> Another important inference from this story is almost certainly
>> that a solution that works for "ASCII" and "other language" is,
>> at least in the long term, inadequate.  The same issues exist
>> between "other language 1" and "other language 2", and the
>> virtual business card will need to be many-sided.  And that may
>>...
> For Web sites for a multinational, clearly yes. For email
> addresses, clearly no. If somebody can't type an ASCII
> address, how are they going to send you email you can
> read?

I don't mean to be facetious here, but, if they wanted to send a
picture, or a musical piece, there really would not be an issue,
would there?  And my examples don't imply that I can't read Thai
(I can't, but...) or go find a translator and bring a diskette
with me, only that, if my email address is only renderable in
ASCII, there may be transmission problems if we are in a world in
which such address-exclusivity is not considered problematic.

>> Or we can try to turn it into an ACE-like question, where what
>> goes on the business card is the "native" set of glyphs plus
>> some Latin-based encoding.  I'm not sure, as argued above, that
>> there is a "native" set of glyphs for the general case.  But,
>>...
> People want names, words, whatever you call it, for their
> Web and email addresses. Trying to provide a fallback
> for a business card won't work. And translating or
> transliterating names is not something that can be automated,
> so this has to be solved by double registrations.

With the qualification that I think "double" is going to need to
be a much larger number, we are probably in agreement about this.
But think about what it means: As a database with synchronize and
commit primitives, the DNS just stinks.  CNAMEs are extremely
limited in capability and often surprise people even within that
range.  

So, "multiple registrations", if done in the DNS, will tend to
imply multiple names (in different languages) mapping onto the
same object/target (RHS of A, MX, or NS RR).  But, when changes
need to be made in, e.g., the address of some host, zone
differences in propagation are going to get us badly, to say
nothing of the odds of human errors (e.g., forgetting one or
two).    But that implies that all of the "multilingual" forms
ought to map onto --perhaps with CNAMEs or some new RR or from
outside the DNS-- a single ASCII-renderable name which is then
the source for, e.g., name to address translation.  And, whether
it takes us toward a directory or a two-stage intra-DNS lookup,
that is a rather different model from the current directions of
the IDN WG.
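A toy, in-memory model of the "many multilingual forms, one ASCII-renderable source name" arrangement described above. This is not DNS code and not a proposal from the thread: the class, method names, and domain names are hypothetical, and the addresses come from the RFC 5737 documentation range.

```python
from dataclasses import dataclass, field


@dataclass
class TwoStageZone:
    """Toy model: many display names -> one canonical ASCII name -> address."""

    aliases: dict = field(default_factory=dict)    # display name -> canonical name
    addresses: dict = field(default_factory=dict)  # canonical name -> IPv4 address

    def register(self, canonical: str, address: str, *display_names: str) -> None:
        self.addresses[canonical] = address
        for name in display_names:
            self.aliases[name] = canonical

    def update_address(self, canonical: str, address: str) -> None:
        # Only the canonical entry changes; every alias follows automatically.
        self.addresses[canonical] = address

    def resolve(self, name: str) -> str:
        # Stage 1: map any registered language-specific form to the canonical name.
        canonical = self.aliases.get(name, name)
        # Stage 2: ordinary name-to-address translation on the canonical name only.
        return self.addresses[canonical]


zone = TwoStageZone()
zone.register("books.example", "192.0.2.10", "bücher.example", "книги.example")
zone.update_address("books.example", "192.0.2.20")
for name in ("books.example", "bücher.example", "книги.example"):
    print(name, "->", zone.resolve(name))
```

With independent registrations, the address change would have to be made, and propagated, once per language form; here it is made once, at the price of an extra lookup stage.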
> 
> 
>> >   Probably a common scenario will be a non-technical person
>> >   receiving a business card with an IDN on it.  They'll try
>> >   to type the name into their browser, and it will fail to
>> >...
> This is the point where your example becomes extremely
> unrealistic. Do you really think somebody from Japan will
> hand you a Japanese-only business card and tell you:
> "Here is my business card, I don't bother whether it's
> of any use to you."? People are a bit more intelligent
> than that.

Not my example, just my (slightly sarcastic) comments on another
note/point of view.  But, usually because I don't have time to
prepare properly, I carry English-only cards into places where
they may be of no direct (i.e., translator-free) use to the
native population fairly regularly.  I shouldn't, and I'm
sensitive enough to feel bad about it, but I've done it too many
times.

And I don't know how anyone not intimately familiar with a
language (and knowing the language, not just the script in which
it is written) can distinguish between an "accent" and a
diacritical symbol that is actually part of the character-symbol.
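For what it is worth, Unicode itself does not record that linguistic distinction. This sketch prints the canonical decompositions of a few characters; the language notes in the table are general background added for illustration. A French accented letter and a Swedish letter both decompose into base plus mark, while the Danish/Norwegian ø does not decompose at all.

```python
import unicodedata

# Decomposition records how a character is built, not whether its mark is an
# "accent" or part of a distinct letter in some particular language.
samples = {
    "\u00e9": "é (French: e with an acute accent)",
    "\u00f6": "ö (German umlaut; a separate letter in Swedish)",
    "\u00e5": "å (a separate letter in Swedish, Danish, Norwegian)",
    "\u00f8": "ø (a separate letter in Danish, Norwegian)",
}

for ch, note in samples.items():
    decomposition = unicodedata.decomposition(ch) or "(none)"
    print(f"U+{ord(ch):04X} {note}: {decomposition}")
```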

     john