Identification, layering, transactions and universal connectivity



Identity is a very interesting concept. What exactly makes a person, an object or a piece of information that particular person, object or piece of information, and not someone or something else?

One school of thought is that the total of all attributes provides identity. Then there is the notion that some attributes are fundamental while others are inconsequential to the identity of the person, object or information. Finally, there is the view that identity is a fundamental property that doesn't depend on any attributes that may or may not be present.

In the real world the first method of identification works to some degree, as it is impossible for two objects to be identical in all aspects, including occupying the same place at the same time. For information this doesn't work so well, however: the identifier of a piece of information would then be the information itself, so referring to information by its identifier becomes useless.

The latter two concepts of identification surface in relational databases and object-oriented databases respectively. In a relational database model, links between objects are made using one or more key attributes of the destination object, while in object-oriented databases objects are assigned an identifier that remains stable regardless of changes to any or all of an object's attributes.
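
To make the contrast concrete, here is a small sketch (Python, with made-up names and addresses) of the two approaches: a relational-style link through a key attribute breaks when that attribute changes, while a stable surrogate identifier survives any change of attributes:

    # Relational style: other tables refer to a row by a key attribute.
    hosts_by_name = {"www.example.com": {"address": "192.0.2.1"}}
    link = "www.example.com"          # foreign key into hosts_by_name

    # Rename the host: the key changes, and the old link dangles.
    hosts_by_name["web.example.com"] = hosts_by_name.pop("www.example.com")
    assert link not in hosts_by_name

    # Object style: a surrogate identifier that never changes, no
    # matter how many of the object's attributes do.
    hosts_by_oid = {42: {"name": "www.example.com", "address": "192.0.2.1"}}
    ref = 42
    hosts_by_oid[ref]["name"] = "web.example.com"   # rename freely
    assert hosts_by_oid[ref]["name"] == "web.example.com"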

So what do we use on the internet today?

At first glance it seems that we use a relatively ephemeral attribute as the identifier: the host address. The FQDNs we generally use can be seen as simple mnemonics for addresses. However, I think over the years the role of FQDNs has expanded and they now generally provide the unchanging identifier that remains the same through changes of just about anything, such as the IP address, IP version or even hardware platform. FQDNs also change from time to time, which undermines this viewpoint to some degree, but not to the degree of invalidating it, IMO.
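
A quick way to see the FQDN in this role is to ask the resolver for everything behind a name: one fixed name, any number of ephemeral addresses (IPv4 or IPv6) behind it. A sketch using Python's standard library, with a placeholder name:

    import socket

    # One stable name, any number of ephemeral addresses behind it.
    # AF_UNSPEC returns both IPv4 and IPv6 addresses, if present.
    for family, type_, proto, _, sockaddr in socket.getaddrinfo(
            "www.example.com", 80, socket.AF_UNSPEC, socket.SOCK_STREAM):
        print(family, sockaddr[0])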

The question at this point is: are we happy with the way identification of resources happens at this level? While I agree that the DNS suffers from many problems, I assert (for now at least) that none of these are fundamental, or easily fixed by ditching the DNS/FQDN system and adopting something substantially different.

Identification on the internet, especially by means of IP addresses or FQDNs, has traditionally concerned itself with hosts. However, at other levels of abstraction identification is also useful or required:

- a service running on a host or a group of hosts
- a process implementing a service on a host
- a (reply to a) transaction
- a position within a stream
- a combination of resource and service / service location providing the resource
- a resource independent of the service / service location providing it

For each of these there is a suitable identification mechanism that works (reasonably) well within the scope of the entity being identified. (Aside: mapping from an identifier at the current layer to one at the next lower layer seems to be especially hard at the top abstraction level: for instance, a person's name to an email address, or a song title to a file location.) The trouble starts when multihoming and resiliency against failure enter the picture.

The basic idea here is: select one of the options (destination address etc.) when multiple are available, then execute the transaction. Repeat with different options on failure. Stop on success, or when all options have been tried.
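
In code the pattern looks roughly like this (a minimal sketch using Python sockets; name and port are placeholders, and the transaction itself is elided):

    import socket

    def do_transaction(name, port):
        # Select options: resolve the name to all candidate addresses.
        candidates = socket.getaddrinfo(name, port, socket.AF_UNSPEC,
                                        socket.SOCK_STREAM)
        for family, type_, proto, _, sockaddr in candidates:
            try:
                with socket.socket(family, type_, proto) as s:
                    s.settimeout(5)
                    s.connect(sockaddr)
                    # ... execute the transaction over s ...
                    return True            # stop on success
            except OSError:
                continue                   # repeat with the next option
        return False                       # all options have been tried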

This works very well of course, unless the transaction is sufficiently expensive to perform or has side-effects that make repeating it after failure problematic. In those cases repeating the transaction and possibly failing multiple times isn't optimal or acceptable.

Solution: split up a large transaction into smaller ones that can each be repeated. This is something that TCP does very well, within the limitations imposed by the protocol (= no changing addresses during a session). HTTP can also do it, by allowing parts of files to be transferred. A smart application can request parts of a file from different hosts and thereby easily recover from a failure of one of those hosts. So in this respect HTTP provides better functionality than TCP. Unfortunately, TCP comes with the OS, while handling partial transfers over HTTP is something applications have to do for themselves. (And TCP can do many things that HTTP can't (properly).)
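
As a sketch of the HTTP side of this (Python; the mirror URLs and sizes are made up, and real code would also have to handle servers that ignore the Range header and reply with the whole file):

    import urllib.request

    # Fetch one file in two halves from two different mirrors, so a
    # failure in one host only forces a retry of that host's part.
    parts = [
        ("http://mirror1.example.com/file.iso", "bytes=0-524287"),
        ("http://mirror2.example.com/file.iso", "bytes=524288-1048575"),
    ]
    data = b""
    for url, byte_range in parts:
        req = urllib.request.Request(url, headers={"Range": byte_range})
        with urllib.request.urlopen(req) as resp:
            # 206 Partial Content means the server honoured the range.
            assert resp.status == 206
            data += resp.read()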

In some cases it is completely appropriate for applications to implement load balancing and failover, because the dynamics are very specific to the application. A good example is peer-to-peer file sharing. (Although some research into n-on-m transport protocols might be interesting.) In other cases this doesn't make sense at all, as it leads to pretty much re-implementing TCP, but now with the single-address-per-end limitation removed, with all the duplication of effort and interoperability issues that must ensue. But the most important reason why this shouldn't go in applications is that the transition to IPv6 has proven that it is much harder to get applications to support new network features than to do the same in the OS. Multiply by the number of applications in existence compared to the number of OSes and the scale of the potential catastrophe becomes apparent.

I think now is the time to admit that the IETF, in its enthusiasm to improve IP, has actually broken it: we no longer have universal connectivity in IPv6. This is even worse than in IPv4, where universal connectivity is still relatively common, at least in the client-to-server direction. With IPv4-style multihoming out the window and in the presence of address scoping, it is now very easy for two hosts to each have a set of addresses, without any way to know which combination of addresses makes bidirectional communication possible, except to try all combinations.
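
Spelled out, "try all combinations" quickly turns into something like the following sketch (placeholder addresses; n local times m remote addresses means up to n*m connection attempts before communication can even begin):

    import itertools, socket

    # Each host has a set of addresses; only some (source, destination)
    # pairs may actually work due to scoping and broken multihoming.
    local_addrs = ["2001:db8:a::1", "fe80::1%eth0"]      # placeholders
    remote_addrs = ["2001:db8:b::1", "fe80::2%eth0"]

    def first_working_pair(port):
        for src, dst in itertools.product(local_addrs, remote_addrs):
            s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
            s.settimeout(3)
            try:
                s.bind((src, 0))
                s.connect((dst, port))
                return s                  # found a working combination
            except OSError:
                s.close()
        return None                       # no combination worked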

For someone doing traditional IP on a box sitting in a fixed place this may not seem like a huge problem. But now try again with a laptop that connects to a variety of networks throughout the day. And then consider an airplane that must communicate with the ground when possible, but also needs stable addresses for its internal communication. Or ad hoc networks where some people connect to the internet (using different ISPs of course) and others don't, but they all want to communicate locally at high speed.

Solutions?

The good part is that we have a number of partial solutions on the table today. If we integrate those, we're more than halfway there.

First of all, we have the domain name system. Unfortunately the DNS was tacked on to the IP architecture very late in the game, and we're still suffering because of this.

Traditionally, the DNS needs root servers to lend authority to name claims and to help the resolving process. Multicast DNS solves the latter and makes it possible to use names in ad hoc networks. DNSSEC could hopefully solve the authority problem, so that authoritative use of names would be possible in ad hoc environments as well.
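
For example, on a system whose resolver speaks multicast DNS (Avahi, Bonjour and the like), names under .local resolve without any root or other unicast DNS server being reachable; the host name below is a placeholder:

    import socket

    # Resolves via multicast on the local link, no root servers needed.
    print(socket.getaddrinfo("somehost.local", None))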

We have SCTP and proof-of-concept TCP modifications that allow TCP to jump addresses in the middle of a session. We also have proposals for locator/identifier splits, some of which are even implemented.
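
For flavour, a minimal sketch of the SCTP case on a platform that supports it (Linux with SCTP enabled; note that Python's standard library only exposes the protocol number, so binding a specific set of local addresses, sctp_bindx, would need a third-party library):

    import socket

    # One-to-one style SCTP socket; the kernel negotiates the peer's
    # addresses into the association during setup and can fail over
    # between paths without breaking the connection.
    s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM,
                      socket.IPPROTO_SCTP)
    s.connect(("2001:db8::1", 5000))      # placeholder address and port
    s.sendall(b"hello over a multihomed association\n")
    s.close()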

There are also some proposals to take the edge off of the address selection problem, but unfortunately nothing that completely solves this problem yet.

The only thing we need to do is integrate it all:

- Hide the use of multiple addresses from 99% of the applications,
  including the ones that use existing APIs (by implementing this
  somewhere between IP and the transport protocols). Unfortunately
  this means another identifier that sits between the real
  identifier (= the FQDN) and the addresses; see the sketch after
  this list.
- Work on the address selection problem.
- Improve the DNS by making it faster and having it work in ad hoc
  networks where the root is unreachable, too.
- Create namespace and protocol independent APIs with extensions
for handling multiple addresses.
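
As a sketch of what such a shim between IP and the transport protocols could maintain (all names here are invented for illustration): transports bind to a stable endpoint identifier, and the shim maps it to whatever locators currently work, rebinding on failure without the application noticing:

    # Hypothetical shim layer: a table from stable endpoint identifiers
    # to the locators (addresses) currently usable for each of them.

    class Shim:
        def __init__(self):
            self.locators = {}    # endpoint identifier -> list of addresses

        def register(self, eid, addresses):
            self.locators[eid] = list(addresses)

        def current_locator(self, eid):
            return self.locators[eid][0]          # preferred address

        def report_failure(self, eid):
            # Demote the failed locator; traffic continues on the next
            # one while the identifier seen above stays the same.
            addrs = self.locators[eid]
            addrs.append(addrs.pop(0))

    shim = Shim()
    shim.register("eid-42", ["2001:db8:a::1", "192.0.2.1"])  # placeholders
    assert shim.current_locator("eid-42") == "2001:db8:a::1"
    shim.report_failure("eid-42")
    assert shim.current_locator("eid-42") == "192.0.2.1"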

To use someone's favorite quote: "Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away." (Antoine de Saint-Exupéry)

There is still lots left to take away.