Identification, layering, transactions and universal connectivity
Identity is a very interesting concept. What exactly makes a person,
an object or a piece of information that particular person, object or
piece of information, and not someone or something else?
One school of thought is that the total of all attributes provides
identity. Then there is the notion that some attributes are fundamental
while others are inconsequential to the identity of the person, object
or information. Finally, there is the view that identity is a
fundamental property that doesn't depend on any attributes that may or
may not be present.
In the real world the first method of identification works to some
degree, as it is impossible for two objects to be identical in all
respects, including occupying the same place at the same time. For
information this doesn't work so well, however: the identity of a
piece of information would then be the information itself, so
referring to the information by its identifier gains nothing.
The latter two concepts of identification surface in relational
databases and object-oriented databases respectively. In a relational
database model, links between objects are made using one or more key
attributes in the destination object, while in object-oriented
databases objects are assigned an identifier that remains stable
regardless of changes to any or all of an object's attributes.
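To illustrate the difference in Python (the record layouts and names
are made up for the example): the relational link goes through a key
attribute, while the object reference survives any attribute change.

    # Relational style: identity lives in one or more key attributes,
    # and links between records go through those keys.
    customers = {"cust-42": {"name": "Alice", "city": "Haarlem"}}
    orders = [{"order": 1, "customer_key": "cust-42"}]

    # Object style: an identifier is assigned once, independently of
    # the attributes; they may all change and references stay valid.
    class Obj:
        _next_oid = 0
        def __init__(self, **attrs):
            Obj._next_oid += 1
            self.oid = Obj._next_oid   # stable, not derived from attrs
            self.attrs = attrs

    alice = Obj(name="Alice", city="Haarlem")
    ref = alice.oid                    # the link other objects hold
    alice.attrs["city"] = "Amsterdam"  # identity (ref) is unaffected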
So what do we use on the internet today?
At first glance it seems that we use a relatively ephemeral attribute
as the identifier: the host address. The FQDNs we generally use can be
seen as simple mnemonics for addresses. However, I think the role of
FQDNs has expanded over the years, and they now generally provide the
unchanging identifier that survives changes to just about anything
else, such as the IP address, IP version or even the hardware
platform. FQDNs also change from time to time, which undermines this
viewpoint somewhat, but IMO not to the point of invalidating it.
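To make this concrete: the name is what a program holds on to, while
the addresses behind it are looked up at the last moment and may
differ from one day (or one network) to the next. A minimal Python
sketch, with www.example.com as a placeholder name:

    import socket

    # The FQDN stays stable; the addresses behind it may change with
    # renumbering, a switch of IP version or new hardware.
    for family, _, _, _, sockaddr in socket.getaddrinfo(
            "www.example.com", "http", type=socket.SOCK_STREAM):
        print(family, sockaddr)  # e.g. AF_INET6, ('2001:db8::1', 80, 0, 0)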
The question at this point is: are we happy with the way identification
of resources happens at this level? While I agree that the DNS suffers
from many problems, I assert (for now at least) that none of these are
fundamental, and that ditching the DNS/FQDN system and adopting
something substantially different would not fix them any more easily.
Identification on the internet, especially by means of IP addresses or
FQDNs, has traditionally concerned itself with hosts. However, at other
levels of abstraction identification is also useful or required:
- a service running on a host or a group of hosts (e.g. a host:port
  pair or an SRV record)
- a process implementing a service on a host (e.g. an ephemeral port
  number)
- a (reply to a) transaction (e.g. a DNS query ID)
- a position within a stream (e.g. a TCP sequence number)
- a combination of resource and service / service location providing
  the resource (e.g. a URL)
- a resource independent of the service / service location providing
  it (e.g. a URN or a content hash)
For each of these there is a suitable identification mechanism that
works (reasonably) well within the scope of the entity being
identified. (Aside: mapping from an identifier at one layer to an
identifier at the next lower layer seems especially hard at the top
abstraction level: for instance, from a person's name to an email
address, or from a song title to a file location.) The trouble starts
when multihoming and resiliency against failure enter the picture.
The basic idea here is: select options (destination address etc.) when
multiple are available, then execute the transaction. Repeat with
different options on failure. Stop on success, or when all options
have been tried.
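In Python-flavored pseudocode (the option list and the transact
callback are hypothetical names), the loop amounts to:

    def try_options(options, transact):
        # Try the transaction against each option in preference
        # order; stop on the first success, give up when the
        # options run out.
        last_error = None
        for option in options:
            try:
                return transact(option)
            except OSError as exc:
                last_error = exc   # remember the failure, try the next
        raise last_error or OSError("no options available")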
This works very well, of course, unless the transaction is
sufficiently expensive to perform, or has side effects that make
repeating it after a failure problematic. In those cases repeating the
transaction, and possibly failing multiple times, isn't optimal or
even acceptable.
Solution: split up a large transaction into smaller ones that can each
be repeated. This is something TCP does very well, within the
limitations imposed by the protocol (= no changing addresses during a
session). HTTP can also do it, by allowing parts of files to be
transferred. A smart application can request parts of a file from
different hosts and thereby easily recover from the failure of one of
those hosts. So in this respect HTTP provides better functionality
than TCP.
Unfortunately, TCP comes with the OS, while handling partial transfers
over HTTP is something applications have to do for themselves. (And TCP
can do many things that HTTP can't (properly).)
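To sketch the HTTP side of this: a client can use the standard Range
header to fetch different parts of a file from different mirrors, so
a failed part can be re-requested elsewhere without throwing away the
parts that already arrived. The mirror URLs and the file size below
are made up:

    import urllib.request

    def fetch_range(url, start, end):
        # Request one byte range; on failure only this range needs
        # to be retried, possibly against a different mirror.
        req = urllib.request.Request(
            url, headers={"Range": "bytes=%d-%d" % (start, end)})
        with urllib.request.urlopen(req) as resp:
            return resp.read()

    mirrors = ["http://mirror1.example.net/big.iso",
               "http://mirror2.example.net/big.iso"]
    size, chunk = 262144, 65536
    parts = []
    for i, start in enumerate(range(0, size, chunk)):
        url = mirrors[i % len(mirrors)]   # spread ranges over mirrors
        parts.append(fetch_range(url, start, start + chunk - 1))
    data = b"".join(parts)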
In some cases it is completely appropriate for applications to
implement load balancing and failover, because the dynamics are very
specific to the application. A good example is peer-to-peer file
sharing. (Although some research into n-on-m transport protocols might
be interesting.) In other cases this doesn't make sense at all, as it
leads to pretty much re-implementing TCP, only without the
single-address-per-end limitation, with all the duplication of effort
and interoperability issues that must ensue. But the most important
reason why this shouldn't go into applications is that the transition
to IPv6 has proven that it is much harder to get applications to
support new network features than to do the same in the OS. Multiply
that by the number of applications in existence compared to the number
of OSes, and the scale of the potential catastrophe becomes apparent.
I think now is the time to admit that the IETF, in its enthusiasm to
improve IP, has actually broken it: we no longer have universal
connectivity in IPv6. This is even worse than in IPv4, where universal
connectivity is still relatively common, at least in the
client-to-server direction.
With IPv4-style multihoming out the window and in the presence of
address scoping, it is now very easy for two hosts to have a set of
addresses each, without any way to know which combination of addresses
makes bidirectional communication possible, except to try all
combinations.
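"Trying all combinations" means probing the cross product of local
and remote addresses, roughly as in this Python sketch (binding to a
source address before connecting is the standard way to pin the
local side):

    import itertools, socket

    def find_working_pair(local_addrs, remote_addrs, port, timeout=2.0):
        # Only some (local, remote) pairs offer bidirectional
        # reachability; without extra information a host can only
        # probe them all.
        for src, dst in itertools.product(local_addrs, remote_addrs):
            if (":" in src) != (":" in dst):
                continue            # mixed IPv4/IPv6 pairs never work
            fam = socket.AF_INET6 if ":" in dst else socket.AF_INET
            s = socket.socket(fam, socket.SOCK_STREAM)
            s.settimeout(timeout)
            try:
                s.bind((src, 0))    # pin the candidate source address
                s.connect((dst, port))
                return s            # first working pair wins
            except OSError:
                s.close()
        return None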
For someone doing traditional IP on a box sitting in a fixed place this
may not seem like a huge problem. But now try again with a laptop that
connects to a variety of networks throughout the day. And then consider
an airplane that must communicate with the ground when possible, but
also needs stable addresses for its internal communication. Or ad hoc
networks where some people connect to the internet (using different
ISPs of course) and others don't, but they all want to communicate
locally at high speed.
Solutions?
The good part is that we have a number of partial solutions on the
table today. If we integrate those, we're more than halfway there.
First of all, we have the domain name system. Unfortunately the DNS was
tacked on to the IP architecture very late in the game, and we're still
suffering because of this.
Traditionally, the DNS needs root servers to lend authority to name
claims and to help the resolving process. Multicast DNS solves the
latter and makes it possible to use names in ad hoc networks. DNSSEC
could hopefully solve the authority problem, so that authoritative use
of names would be possible in ad hoc environments.
We have SCTP, and proof-of-concept TCP modifications that allow TCP to
jump addresses in the middle of a session. We also have proposals for
locator/identifier splits, some of which are even implemented.
There are also some proposals to take the edge off of the address
selection problem, but unfortunately nothing that completely solves
this problem yet.
The only thing we need to do is integrate it all:
- Hide the use of multiple addresses from 99% of the applications,
  including the ones that use existing APIs (by implementing this
  somewhere between IP and the transport protocols). Unfortunately
  this means another identifier that sits between the real
  identifier (the FQDN) and the addresses.
- Work on the address selection problem.
- Improve the DNS by making it faster and having it also work in ad
  hoc networks where the root is unreachable.
- Create namespace- and protocol-independent APIs with extensions
  for handling multiple addresses (a possible shape for such an API
  is sketched below).
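As a sketch of what such an API could look like: the application
names the peer and the service, and never sees addresses at all. The
function name here is made up, and Python's socket.create_connection()
already behaves roughly like this for plain TCP:

    import socket

    def connect_by_name(fqdn, service, timeout=5.0):
        # Everything below this call -- IPv4 vs IPv6, multiple
        # addresses, selection and failover -- stays hidden from
        # the application.
        last_error = None
        for fam, typ, proto, _, sockaddr in socket.getaddrinfo(
                fqdn, service, type=socket.SOCK_STREAM):
            s = socket.socket(fam, typ, proto)
            s.settimeout(timeout)
            try:
                s.connect(sockaddr)
                return s
            except OSError as exc:
                last_error = exc
                s.close()
        raise last_error or OSError("no usable address for " + fqdn)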
To use someone's favorite quote: "Perfection is achieved, not when
there is nothing more to add, but when there is nothing left to take
away." (Antoine de Saint-Exupéry)
There is still lots left to take away.