
Re: [idn] The Business Card problem (was: Re: An experiment with UTF-8 domainnames)



Hello John,

Your considerations are important, but I think
you went a bit overboard on some issues.


At 01/01/06 15:18 -0500, John C Klensin wrote:

>Let me use myself as an example (this one is inherently
>European-language-oriented, but I assume it has Asian and
>African analogues that are as bad or worse).  If I'm handed a
>business card with an IDN on it, in the "native" character glyph
>set for that name, the next encoding step depends on my
>knowledge and pattern-recognition and discrimination skills, and
>not on any computer issues.  I need to be able to take that
>collection of glyphs and somehow transcribe and enter them into
>a computer.

There is one very clear reason that this problem will in practice
be more serious for what you call European languages (really,
the Latin script) than for other scripts.

Interestingly enough, the reason is that Latin is easier for many
people to read. French or Swedish business cards are not usually
double-sided, because they remain recognizable even with a few
accents. If you use one to send a letter, the letter will arrive
even without the accents. So there is a real danger that people
will happily create domain names with accents or other diacritics
and put them on their business cards.

On the other hand, for Arabic or Japanese (or even just Greek
or Cyrillic), there is no expectation that your international
business partners will be able to send you snail mail using
just the 'local' side of the business card, or, for that
matter, have any idea whom they met when they look at
the card a few days later. Therefore I think we will quickly
see dual domain names develop, one for each side of the
business card.

As Paul has noted, in the European context (with few non-ASCII
characters), an approach such as Dan's might even work,
although I don't think it will become widespread, for
various reasons, including the fact that people don't
know how to use the hex input feature.


>Now, if the characters used are Latin-based, I can probably cope
>(and I do today).  I'm going to recognize the basic structures.
>Even if I have to consult a table, I can match what I see on the
>card with what I see on the screen.  In the worst case --in
>which I have no idea how to "type" the character-- I can then
>enter the character position from the table into a software
>program.  Interestingly, for that purpose, a simple hexadecimal
>encoding of UCS-4 (or UCS-2) is "easier" than UTF-8 which is a
>good deal "easier" than ACE.  While tedious, 0x000000E4 is quite
>easy to figure out and write.  And selection of characters from
>tables is not exactly a new technology; I'd expect user
>interface software to show up quickly and be widely deployed
>that uses table-selection technologies to build up strings that
>could then be pasted.

This is already available in any Japanese input method
(though not really for Latin), in software such as MS Word,
and probably in quite a few others.
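To make the comparison in the quoted paragraph concrete, here is a
minimal Python sketch that writes the same characters in the three
forms mentioned: hex UCS-4, hex UTF-8, and an ACE. It assumes Python's
built-in "idna" codec (IDNA 2003 ToASCII with Punycode, RFC 3490/3492)
as a stand-in for ACE; that codec postdates this discussion, and the
ACE proposals of the time would produce different strings.

```python
# Sketch: three ways to write the same characters on paper.
# Assumption: the "ACE" column uses Python's built-in "idna" codec
# (IDNA 2003 / Punycode), not any of the ACE proposals of early 2001.


def ucs4_hex(text: str) -> str:
    """Each code point as a 32-bit hex value, e.g. 0x000000E4."""
    return " ".join(f"0x{ord(ch):08X}" for ch in text)


def utf8_hex(text: str) -> str:
    """The UTF-8 bytes of the text as two-digit hex pairs."""
    return " ".join(f"{byte:02X}" for byte in text.encode("utf-8"))


def ace(label: str) -> str:
    """ASCII-compatible form of a single label, via the stdlib idna codec."""
    return label.encode("idna").decode("ascii")


if __name__ == "__main__":
    for label in ["ä", "bär"]:
        print(label, "|", ucs4_hex(label), "|", utf8_hex(label), "|", ace(label))
```

For "ä" (U+00E4) the three forms come out as 0x000000E4, C3 A4 and
xn--4ca. The first can be read straight off a code chart, the second
needs a bit-level transformation, and the third bears no visible
relation to the character, which matches the ordering of "easiness"
in the quoted text.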


>But that is the easy part and it is conditioned on my being able
>to recognize Latin-based characters well enough to tell them
>apart.  Extrapolating from it as an example is dangerous:

Yes, it's dangerous, because the difficulty does not scale linearly.


>First, I can't tell European characters apart to the degree
>needed by IDN.  If you hand me a card with a native-glyph IDN on
>it, I need to know the language (or a surrogate for the
>language) to pick the right glyphs from a table.   I simply
>cannot tell U+0041 from U+0391 from U+0410 by looking at a glyph
>written on a page (and those are, of course, not the only
>examples).  I have to know the language context, or apply
>heuristics, to pick one.

Well, there are situations where this can go wrong. But
as I said above, for Cyrillic and Greek, we'll probably
see a 'two-sided business card' approach. Also, please
note that if there are ambiguities, the people who will
suffer most from them are the locals, so they will try
to avoid them. Although fewer things may be ambiguous for
them than for you, and although you may find it easier to
type an ASCII A and then somehow continue with the rest
from tables (rather than switching keyboards at the start,
as they will), the cases where this won't work won't be
that frequent.
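A minimal Python sketch of the ambiguity in the quoted paragraph: the
three code points listed there usually look identical when printed,
yet they are distinct characters from different scripts, so labels
built from them are different names. The `scripts_in` helper is a
hypothetical, crude heuristic based on character names, not a real
script lookup.

```python
import unicodedata

# Three code points that commonly render as the same capital "A" glyph.
LOOKALIKES = ["\u0041", "\u0391", "\u0410"]


def describe(ch: str) -> str:
    """Code point and official Unicode character name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def scripts_in(label: str) -> set:
    """Hypothetical heuristic: the first word of each letter's Unicode
    name (LATIN, GREEK, CYRILLIC, ...). Not a real Script property lookup."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


if __name__ == "__main__":
    for ch in LOOKALIKES:
        print(describe(ch))
    # Visually identical on a card, but different strings to the DNS.
    print("ABC" == "\u0410BC", scripts_in("\u0410BC"))
```

A serious check would use the Unicode Script property (UAX #24),
which Python's `unicodedata` module does not expose.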


>But that isn't the serious problem either.  It turns out that I
>can do just about the same pattern-matching job --looking at a
>glyph, deducing its essential characteristics because I'm
>familiar with the repertoire from which it comes, finding it in
>a table and picking it or entering it by table position-- if
>that card contains Greek, Cyrillic, or Hebrew glyphs.  That puts
>me a bit ahead of most of my American and Western European
>colleagues, for whom those sets of glyphs may look like
>indistinguishable chicken-scratches.   I've even been around
>this work long enough that I've learned to distinguish enough
>kana that I might be able to successfully look them up in a
>table (but not pronounce them), although it would take me a long
>time since I don't know the table order.   But most Han, Kanji,
>and Hangul are hopeless for me: there are lots of them, I can't
>(in general) tell them apart, and I just don't have the pattern
>recognition skills (or training in distinguishing radicals) to
>be able to look them up, especially if the font-rendering on the
>business card is different from that in my table.

You are extrapolating this too far. Just because you
personally can handle some part of this (with a lot of
pain), don't assume that this is what your business
partners will expect from you, or what everybody will
have to do.



>So, for the business card, we need, I think, to start asking
>different questions.  If I want an address on my business card to
>be recognized and usable, after the translator leaves, by a Thai
>native who lacks familiarity with Latin alphabets, I had best
>figure out how to get a Thai rendering (or something else
>recognizable) of my address on that card.

Yes. Visitors to such countries who are interested in
business contacts there regularly prepare double-sided
name cards. But usually only the person's name, and maybe
the name of the institution, is transcribed or translated.
I would not expect the street address or the email address
to be translated, because this may raise the expectation
that people can send you Thai email. The Web address is
different: if your company has a Thai Web site with a
Thai address, it should go on the card.



>Another important inference from this story is almost certainly
>that a solution that works for "ASCII" and "other language" is,
>at least in the long term, inadequate.  The same issues exist
>between "other language 1" and "other language 2", and the
>virtual business card will need to be many-sided.  And that may
>imply that I may need many DNS identities, associated with
>different character sets, to be accessible from all over the
>world.

For the Web sites of a multinational, clearly yes. For email
addresses, clearly no. If somebody can't type an ASCII
address, how are they going to send you email you can
read?



>Or we can try to turn it into an ACE-like question, where what
>goes on the business card is the "native" set of glyphs plus
>some Latin-based encoding.  I'm not sure, as argued above, that
>there is a "native" set of glyphs for the general case.  But,
>more important, if one is going to put an encoding on a card
>from which one can get back and forth from 10646/Unicode, it is
>not clear to me that ACE is superior to hexified UTF-8.  And
>neither may be as convenient as
>hexified UCS-4 (or UCS-2) or a PGP-like biometric word list
>encoding.  Any of these are going to require software to get
>from the business card form to the DNS one; the correct answer
>for encoding them may be different from the correct answer for
>what should actually go into the DNS.

People want names, words, whatever you call them, for their
Web and email addresses. Trying to provide a fallback encoding
on a business card won't work. And translating or transliterating
names is not something that can be automated, so this has
to be solved by double registrations.
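For concreteness, the "software to get from the business card form to
the DNS one" mentioned in the quoted paragraph could be as small as the
following Python sketch. The card formats (space-separated UTF-8 byte
pairs, or 0x-prefixed UCS-4 values) and the function names are
hypothetical, and the DNS form again uses Python's built-in "idna"
codec (IDNA 2003 / Punycode) rather than any ACE proposal of the time.

```python
# Sketch: decode a hypothetical "hexified" form printed on a card back
# into Unicode, then into the ASCII form that would go into the DNS.
# Assumptions: card formats are illustrative; "idna" is IDNA 2003.


def from_utf8_hex(card_text: str) -> str:
    """'62 C3 A4 72' -> 'bär'. Raises ValueError on bad hex or bad UTF-8."""
    return bytes.fromhex(card_text).decode("utf-8")


def from_ucs4_hex(card_text: str) -> str:
    """'0x00000062 0x000000E4 0x00000072' -> 'bär'."""
    return "".join(chr(int(token, 16)) for token in card_text.split())


def to_dns_label(label: str) -> str:
    """Single label to its ASCII-compatible DNS form."""
    return label.encode("idna").decode("ascii")


if __name__ == "__main__":
    label = from_utf8_hex("62 C3 A4 72")
    # Both card forms should describe the same label.
    assert label == from_ucs4_hex("0x00000062 0x000000E4 0x00000072")
    print(label, "->", to_dns_label(label))
```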


> >   Probably a common scenario will be a non-technical person
> >   receiving a business card with an IDN on it.  They'll try to
> > type the name into their browser, and it will fail to resolve.
> > They'll contact their IT support staff, who will tell them
> > what they need to do to use the IDN (and educate them a little
> > bit on this 'new thing'), they'll update their software as
> > required, and they'll move on.
>
>         Sigh.
>
>         JU:  Hello, IT support staff, this is Joe User.  I've
>         got a business card in front of me with a URL on it, and
>         I don't know how to type it in.
>
>         ITSS: Just copy the letters from the card into the "Go
>         to" box on the browser.  Haven't you learned that yet?
>
>         JU: But I don't recognize any of the characters; they
>         aren't on my keyboard.
>
>         ITSS: We can send you out a keyboard for Latin-1, which
>         covers Swedish, German, French,... and instructions for
>         installing it.
>
>         (A week later)
>
>         JU: Hello, IT Support Staff, your technician put in the
>         nice new keyboard and software.  Some of my applications
>         now say "Guten Tag", rather than "Hello", but that is
>         ok.  I still can't type that URL in, the characters
>         still aren't on the keyboard.
>
>         ITSS:  Huh?  What language did you say it was in?
>
>         JU: Beats me.   Doesn't look much like either English or
>         Klingon.

This is the point where your example becomes extremely
unrealistic. Do you really think somebody from Japan will
hand you a Japanese-only business card and tell you:
"Here is my business card; I don't care whether it's
of any use to you."? People are a bit more intelligent
than that.

The case of languages written with the Latin script
worries me much more, because people are accustomed
to accents being dropped (e.g. in snail mail, or in the
pronunciation of a name, which the average foreigner
won't get right anyway).

I'm not saying that we should ignore Latin accents, not
at all. But it may take people much longer to understand
that putting an accented email address on their name
card will not work for every recipient than it will
take somebody in Russia, Egypt,... to understand that
a Latin equivalent is needed.


Regards,   Martin.