**Interconnecting the Internets - a way to Accelerate IPv6 Deployment?**

Brian Candler, DRAFT 2016-03-15

# Abstract

Given the choice of connecting just to the IPv4 Internet or to both the IPv4 and IPv6 Internets ("dual-stack"), many users are still choosing to connect just to the IPv4 Internet. This is holding back the deployment of IPv6.

This document proposes to break the logjam by interconnecting the IPv4 and IPv6 Internets with a global, public NAT64 infrastructure, thereby allowing people to connect to just the IPv6 Internet and still access existing IPv4 Internet content.

The (possibly) novel approach is to have multiple NAT64 nodes which are selected via DNS64, in the same way as a Content Distribution Network would select a nearby host to serve web content to a user. In effect, the architecture is a CDN where the "content" is the IPv4 Internet.

In the first section, this paper outlines what such an interconnect might look like. The second section considers the political feasibility (and even desirability) of such an approach, assuming that we can build it. The third section considers the technical feasibility, and a [companion paper](candler-hyperscale-nat64.md.html) outlines a design for a horizontally-scalable NAT64 node for such large-scale deployment.

# Motivation

The IPv4 Internet uses 32-bit addresses, and this address space is now exhausted. End-users and ISPs can obtain only very small allocations directly from registries; otherwise they must attempt to buy addresses on the open market. This is widely agreed to be a problem for the healthy development of the Internet, as it stifles innovation and competition.

For a long time the plan has been to migrate users to a new protocol, IPv6, with 128-bit addresses. The expected approach was that users would roll out IPv4 and IPv6 in parallel ("dual-stack"), with take-up of dual-stack accelerating over time. Eventually there would be so little worth talking to over IPv4 that IPv4 could be removed.

However, in practice this is not happening. Many end-users, faced with the choice between supporting a single protocol (IPv4) or two (IPv4+IPv6), continue to choose the IPv4-only path. Connecting to just the IPv6 Internet is not an option, since most content and users are not there; and it will never be an option until *practically everything* is reachable over IPv6.

Whilst it is fashionable to castigate such users as "short-sighted", this has had little effect, and does not acknowledge the reality that users act in their own direct interests. Dual-stack increases complexity and has ongoing financial costs to support; meanwhile, there is little sign of any impending failure of IPv4. It is also worth observing that deploying dual-stack does *nothing* to reduce requirements for IPv4 address space - and therefore even address space exhaustion gives no practical incentive to end-users to deploy dual-stack.

Content providers have also been slow to deploy dual-stack. For them, the main consideration is "reach" - how well they can connect to their customers. Since they know all their customers are compelled to deploy IPv4 in one form or another, putting their content on IPv4 is all they need to achieve full reach. Furthermore, nobody's business plan is going to include putting content on IPv6 only. This means that for the foreseeable future, all useful content will be either on IPv4 or IPv4+IPv6.
In addition, content providers do not feel address exhaustion very much, since they have been using various mechanisms to share IP addresses for years (e.g. HTTP virtual hosts, HTTPS SNI, reverse proxies, CDNs).

So it looks like we could be heading for a stalemate where all users are either IPv4 or dual-stack, which ensures all content is available on IPv4, which ensures most users don't see any need for deploying dual-stack.

The fundamental problem is that we have ended up with two parallel Internets - an IPv4 Internet and an IPv6 Internet - and these clearly *are* distinct networks. They use different protocols and different numbering schemes. They also have different topologies: some networks peer on v4 but not on v6, or vice versa, so that `traceroute` and `traceroute6` often take different paths. And of course, some hosts are connected to one but not the other. In fact, there is little in common apart from AS numbers, BGP, and the layer 4 and higher protocols - and that some of the packets traverse the same wires. Arguably, the IPv6 and IPv4 Internets are as separate as IP and X.25 were - or even more so, since at least there were some gateways between IP and X.25.

# Overview

The proposal here is to *interconnect* the two Internets. This would mean that end-users would gain a third option: deploy IPv6-only networks, whilst still continuing to have access to content which is only on the IPv4 Internet. Since the IPv4 and IPv6 Internets use different addressing schemes, this basically means connecting them with NAT64.

The potential benefits of this are:

* End-users could deploy single-stack IPv6-only networks, as a realistic alternative to RFC1918 behind NAT44. This would mean no need for multiple address assignment mechanisms, multiple security policies and so forth.
* IPv6-only ISPs could start to spring up - possibly in niche markets, such as new fibre-to-the-home providers.

But why do this globally? After all, ISPs are able to build their own NAT64 infrastructure, and indeed some already do (e.g. some mobile operators).

The idea is to conceptually and really join the two Internets into one single Internet, where the connecting glue is usable by anybody from wherever they are. Any IPv6-only ISPs which spring up can legitimately say they are providing full Internet access; those who have already been saying "the Internet *is* IPv6" would finally be correct. IPv6-only users would be more free to switch providers, knowing that the infrastructure they need to reach IPv4 content is always available.

The required components outlined here are:

* Public NAT64 translator nodes, with their own IPv6 prefixes and IPv4 pools
* Public DNS64 service with client location awareness
* IXP NAT64 translator nodes which integrate with the DNS64 service
* Auxiliary services for IPv4 to IPv6 communication

## Global translator nodes

There would be a number of NAT64 translator "nodes" deployed around the Internet. Each node would be a cluster of physical servers which perform the translation. Each node would announce a translation prefix into the IPv6 Internet, and its own IPv4 pool prefix(es) into the IPv4 Internet.

A naïve approach would have all nodes announce the same well-known prefix `64:ff9b::/96` and form an Anycast service. However, Anycast has some serious problems discussed later, so we'll instead assume that each node announces its own unique IPv6 translator prefix. Each one also has its own IPv4 address pool(s) to ensure symmetric routing of return traffic.

~~~
 . . . . .    announce             +-------+  announce         . . . . .
.          .  2001:db8:6401::/96  | NAT64 |  192.0.0.0/16    .           .
.          . <------------------- | Node  | -------------->  .           .
.   IPv6   .                      +-------+                  .   IPv4    .
. Internet .                                                 .  Internet .
.          .  announce             +-------+  announce       .           .
.          .  2001:db8:6402::/96  | NAT64 |  192.1.0.0/16    .           .
.          . <------------------- | Node  | -------------->  .           .
 . . . . .                        +-------+                   . . . . .
~~~
Any user on the IPv6 Internet who wants to talk to an address on the IPv4 Internet (e.g. `203.0.113.99`) would form an IPv6 destination address combining a translator prefix with the target address, e.g. `2001:db8:6401::203.0.113.99`, and send traffic upstream. This traffic would hit the selected NAT64 node, and be translated to IPv4.

To allow full connectivity, these nodes should be located in Tier 1 ISPs, and in multiple geographic locations so as to be near the users.

Assume that each NAT64 node will use a *consistent hash* of the top 64 bits of the source IPv6 address to select one of the available IPv4 addresses, and as far as possible choose distinct port ranges for each /64 prefix in use.

## DNS64

To make use of NAT64, clients will need to use a DNS64 service, which synthesises AAAA records including the translation prefix for target names which have only an A record.

~~~
   example.com AAAA?   +--------+   example.com AAAA?   +--------+
 --------------------> |        | --------------------> |        |
                       |        | <-------------------- |        |
                       |        |      (NOERROR)        |        |
                       | DNS64  |                       |  auth  |
                       | cache  |   example.com A?      |  name  |
                       |        | --------------------> | server |
                       |        | <-------------------- |        |
                       |        |    A 203.0.113.99     |        |
 <-------------------- |        |                       |        |
  AAAA 2001:db8:6401   +--------+                       +--------+
       ::203.0.113.99
~~~

There are already multiple providers of public DNS resolvers on IPv6, for example:

* Google: `2001:4860:4860::8888` and `2001:4860:4860::8844`
* OpenDNS: `2620:0:ccc::2` and `2620:0:ccd::2`

In principle, it would be relatively straightforward for them to provide an additional globally-usable DNS64 service, on a different IPv6 address. However, since there are multiple NAT64 translators, the DNS64 service will need to select a suitable translator prefix to embed in each response, based on the source address of the client. Similar location-sensitive DNS servers already exist and are widely deployed as authoritative servers in CDNs.
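As a concrete illustration of the synthesis step, here is a minimal sketch (using Python's standard `ipaddress` module, with the prefix and target address from the examples above) of the RFC 6052 /96 embedding a DNS64 would perform:

~~~python
import ipaddress

def synthesize_aaaa(a_record: str, translator_prefix: str) -> str:
    """Embed an IPv4 address in the low 32 bits of a /96 NAT64 prefix (RFC 6052)."""
    net = ipaddress.IPv6Network(translator_prefix)
    assert net.prefixlen == 96, "this sketch assumes the common /96 embedding"
    return str(net[int(ipaddress.IPv4Address(a_record))])

# A client near the first node in the diagram above would receive:
print(synthesize_aaaa("203.0.113.99", "2001:db8:6401::/96"))
# -> 2001:db8:6401::cb00:7163  (the same address as 2001:db8:6401::203.0.113.99)
~~~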
## IXP nodes

Clearly not all traffic wants to use transit connections. It would be unacceptable in the medium term for an IPv6-only user in an African country (say) to have their traffic routed via a Tier 1 in Europe, just to reach an IPv4-only user in the same country. There may therefore be a demand for smaller translators at IXPs. Unlike the global translators, these will only handle traffic between local peers.

To make this work, the IXP translator node needs to have two components: the actual NAT64 translator, and a control system which establishes BGP peerings with any IXP member who wishes to participate. With each peer it can exchange IPv4 routes, IPv6 routes, or both.

~~~
           198.51.100.0/24   +------------+   2001:db8:1000::/48
Peer 1    -----------------> |   NAT64    | <-----------------    Peer 2
(v4 only) <----------------- |   node     | ----------------->    (v6 only)
           192.0.2.0/24      |192.0.2.0/24|   2001:db8:6403::/96
                             +------------+
                                  ^  |
           203.0.113.0/24         |  |   2001:db8:2000::/48
Peer 3    ------------------------+  |
(v4+v6)   <--------------------------+
           192.0.2.0/24                  2001:db8:6403::/96
~~~

The example node above has an IPv6 prefix of `2001:db8:6403::/96` and an IPv4 pool of `192.0.2.0/24`.

* To each IPv4 peer, it will announce its IPv4 translation pool, and learn IPv4 routes from that peer
* To each IPv6 peer, it will announce its IPv6 translation prefix, and learn IPv6 routes from that peer

In order to direct traffic to these local IXP translators, all DNS64 servers would need to be aware of them. Specifically, they would need to learn what IPv6 and IPv4 routes are available at each IXP location, and select appropriate prefixes for certain combinations of source IPv6 and destination IPv4 address. A technical approach to achieve this is described later.

## Inbound services

NAT64 only allows access from the IPv6 world to the IPv4 world. There may additionally be a demand for services to allow communication in the opposite direction. These are discussed later and are "nice to have", but are not critical to the overall thrust of this proposal. At worst, anyone who wants a reliable way to accept traffic from the IPv4 Internet can always dual-stack a small part of their network.

# Political feasibility

## Desirability

Do we even want people to deploy IPv6-only networks? Isn't dual-stack the correct approach today? Perhaps that is true, but in the real world it is not happening. Given a choice between IPv4 and dual-stack, many end-user sites continue to choose the simpler option. If we rephrase the question as: "Do we want people to deploy IPv6-only networks rather than IPv4-only networks?" then the former choice becomes attractive, since we don't want "legacy" IPv4-only networks to be built.

If the desired end goal is for the Internet to be IPv6-only, this means that one day in the future, we *do* want people to be able to deploy IPv6-only networks. So what is being proposed is simply to bring this date forward, and to provide the gateway onto any rump IPv4-only content which may be required then anyway.

Furthermore, there is a positive feedback effect. Once there is any significant base of IPv6-only users accessing IPv4-only content through this global NAT, content providers will have a real incentive to make their content directly available over IPv6. The restricted pools of IPv4 source addresses will make their users harder to identify in logs, and geo-location will work less well. Furthermore, traffic which takes a triangular path via a NAT will necessarily be taking a longer path than it otherwise would. It might only take 5% or 10% of their user base to fall into this category for it to become worthwhile to make the content available dual-stack, and that in turn speeds up the transition.

If nothing else, it would send a powerful message: "The Old Internet is obsolete, and we've just shoved it behind a ruddy great NAT. You can now join us on the New Internet!"

## Usefulness

"If we build it, will they come?" In other words, would there be any significant deployment of IPv6-only networks as a result of this deployment? This is perhaps the biggest uncertainty.

IPv6-only networks have one obvious advantage, which is that you don't need any IPv4 addresses.
(They also have this advantage over dual-stack.) Using NAT64 may also be preferable to being behind NAT444, i.e. two levels of NAT44, in cases where the ISP has insufficient address space to give each customer their own public IPv4 address.

However, for edge users, building IPv6-only networks requires access to ISPs who supply IPv6; it also requires a change away from current practice. Even green-field sites may still find it easier to use existing IPv4 know-how. Some people like using private addresses and NAT44 because it gives them a (false) feeling of security.

Whether any IPv6-only ISPs spring up depends on whether they consider that running an IPv6-only service puts them at a commercial disadvantage to competitors who offer IPv4 (or dual-stack). However, it may be that there are markets in which they can present their IPv6-only service, combined with expertise in deploying IPv6-only edge networks, as a differentiator.

Businesses will almost certainly require dual-stack in some part of their network, for example to allow inbound VPN access from hotels and airports. However, they may see operational advantages to running only IPv6 in most of their network, dual-stack in just small portions, and not having to build their own NAT64 infrastructure.

### The Long Tail: Web Content

There is much web content which is stale or maintained by non-technical users. Whilst it is straightforward for a hosting company to add IPv6 addresses to its web servers, those servers often serve many thousands of domains, and the provider may not have access to the authoritative DNS for all of those domains (needed in order to add the new AAAA records). This means that for a long time there is likely to be *some* IPv4-only content.

The presence of the global translator may help here. Suppose we get to a point where 80% of the sites a client needs to reach are directly accessible over IPv6 (whilst of course, 100% are reachable over IPv4). Without the translator, even dual-stack sites would still have to continue running IPv4. But with it, such sites could make a decision to strip out IPv4 earlier than they otherwise would, thus hastening the migration to IPv6.

In other words, what we have here is an end-game plan; there will be a point at which people can switch over to IPv6 without having to wait for absolutely everyone else to go dual-stack (which may never happen). If we accept that this infrastructure will be required one day to deal with legacy IPv4 content, then we may as well build it now and get it running smoothly.

### The Long Tail: Enterprises

Other important sites which are slow to migrate are enterprises, who have a big investment in existing networks and need their networks to be as reliable and manageable as possible. The presence of the translator means that it becomes more feasible for these sites to transition to IPv6-only, if and when they are comfortable doing so or feel a pressing need for IPv6 for some other reason.

But surely, to migrate their networks from IPv4 to IPv6, they would go via dual-stack anyway? Certainly they would. However, there is a big difference between going dual-stack as a transition step for a few weeks or months, versus dual-stack as an essentially permanent deployment (which is what we are currently asking them to do). The long-running dual-stack scenario means that every device has two addresses, everything has to be monitored on two addresses, every firewall policy has two addresses, there are two mechanisms for handing out addresses to clients, and so forth.
This is exactly why they are rejecting dual-stack today. Furthermore, the "happy eyeballs" failover between IPv6 and IPv4 makes problems harder to detect and diagnose. However, if they were to deploy dual-stack with the ability to remove IPv4 shortly afterwards, then they would end up with a "future" network which is no harder to manage than it was before.

It could be argued that they can do this today, by building their own DNS64 and NAT64 infrastructure. But that represents additional cost and complexity; at this point it is also betting on an IPv6 future which they feel may not even happen. If there are global translators, there is no additional infrastructure to manage, and migrating to IPv6 becomes a potentially lower-risk decision.

This does depend on the translation infrastructure becoming well-established. In the early days, enterprises may take the view that this is an experiment which may be withdrawn or may be insufficiently reliable. Its attractiveness will increase, however, as the ratio of dual-stack to v4-only content increases.

## "NAT does not scale"

A large portion of this document is dedicated to the technical and scaling issues. However, if this statement is true, and with Internet traffic growing at perhaps 40% per year, then it would imply the transition to IPv6 needs to take place sooner rather than later - so anything which can hasten the changeover is desirable.

## Positioning

Shouldn't ISPs be building and managing the NAT64 and DNS64 services themselves, instead of having a shared global service?

There is a strong technical case that it would be better if they did. In that case there would be no sub-optimal routing, no central infrastructure to manage, and in particular there would be no need for the IXP translator nodes and corresponding DNS64 complexity.

There are still some advantages to the global approach though:

* If IPv6-only ISPs spring up, they can use as much capacity on the NAT64 service as they require. However, if they build their own NAT64, they may be limited to the small amount of address space they can get from their RIR (e.g. a /22), which would limit their growth. Perhaps the RIRs could have less restrictive policies for the final allocations which prioritise applications from customers who can prove they are using addresses for NAT64 - but right now they do not.
* Customers already find it hard enough to get IPv6 from their ISP. Finding an ISP who provides IPv6 *and* an acceptable quality of NAT64 service will be even more difficult.
* There is a bootstrapping problem: ISPs won't invest in building the NAT64/DNS64 platform (nor apply scarce IPv4 address space and engineering time to it) without significant use from customers, but customers can't use it until it exists. If there is a global service they can use initially, ISPs may catch up later.
* ISPs may not want to invest significantly in a "transition" service with an intentionally limited lifetime. Equally, they may not want to take on the ongoing operational costs of the service if it is required for an extended period of time.
* A well-managed central service could be more reliable than a poorly-maintained ISP service which has little operational focus. Such a bad service would push users back to IPv4 connections.
  (This is maybe a reason why such a large number of users configure their clients to use one of the global DNS cache services rather than their ISP's cache; the global DNS cache operators can often do a better job.)
* The global approach does not preclude ISPs from building their own NAT64/DNS64 infrastructure as well.

## Reliability

The experience of 6to4 (as reported in RFC 6343) was that it was unreliable, and it ended up encouraging people to turn off IPv6, not use it more. Some of those failures were down to the use of Anycast, protocol 41, and encapsulation - none of which are used in this proposal. But it is important to evaluate what the possible failure modes are.

The main problem is if DNS64 directs a particular client to translator X, but for some reason there is a routing problem somewhere between the client and X, or between X and the IPv4 destination. This will manifest itself as a partial outage (to IPv4-only destinations, not dual-stack ones).

The use of DNS64 to direct traffic to a suitable NAT64 node, based on the client source address, is very much how a Content Delivery Network operates, and there is much operational experience in this area. These networks are able to provide very high levels of availability - if they didn't, they wouldn't have any paying customers. Presumably CDNs also have to deal with the same network reachability issue, and there are different ways they could do so:

* Ensuring their content nodes are highly available and well connected
* Limiting the TTL of DNS replies
* Actively testing connectivity and RTT from customer networks (e.g. using beacon nodes)
* Measuring the response times and failure rates for actual traffic (e.g. using Javascript embedded in web pages)

Hence a web site could include some test code, activated when the page is fetched over IPv6, which makes a test call to an IPv4-only destination. This information could then be fed back to the DNS64 operator.

But the best gains would be made by ensuring that the NAT64 nodes have reliable IPv6 and IPv4 connectivity. For each node, this is not really any different to connecting a standalone webserver and making it available to the world.

## Responsibility

If this doesn't work, who do you call? This is also a very strong argument that translation should take place at the edge, either in the ISP or in the user's own network.

As said above, CDNs manage to achieve extremely high quality of service, but they have full control over everything from the DNS to the content nodes themselves, and manage them as a whole. Given a global collection of NAT64 nodes, it seems better to have them managed by a central authority rather than as a loose-knit collection. This could be done by separating the *hosting* of NAT64 nodes from their *management*. In the limit, a single entity would be responsible for running all the DNS64 and NAT64 nodes together.

Of course, there is nothing to stop a second entity running a completely separate DNS64/NAT64 infrastructure. Users can then choose whichever one works best for them, simply by changing their client DNS settings.

There are perhaps similarities with the management of a large gTLD, which runs multiple authoritative servers in diverse locations, all under one control.

## Coordination

One of the consequences of this design is that the DNS64 operators have a significant degree of control over traffic flows.
This means that there needs to be coordination to manage load effectively and handle operational incidents, such as lack of capacity at a given NAT64 node. Network operators may feel that they are ceding control over their own networks (although they already invite CDNs into their networks, where the CDN DNS steers traffic in a similar way). There certainly is some risk that mis-managed DNS64 could send traffic to the wrong places. Procedures would have to be put in place to ensure incidents are dealt with swiftly.

Some NAT64 operators may wish to ensure their nodes are only used by certain ASes or netblocks, and could publish this policy. Again, these issues are simplified if both DNS64 and NAT64 are under the same administration.

## Acceptability and Completeness

There will not be take-up of IPv6-only networks if the service has unacceptable limitations compared to having an IPv4 address with NAT44. Essentially what users will want to know is "does it work?" - with all the applications that they want to use; or at least well enough that the advantages of single-stack IPv6 outweigh any minor niggles.

Fortunately, almost everyone is used to sitting behind a NAT44 these days. Being behind a NAT64 (with DNS64 and 464XLAT) is not likely to be noticeably worse for the bulk of use cases. Also fortunately, most devices today have IPv6 support out-of-the-box, and many have CLAT support.

On the minus side: anyone who has equipment which is IPv4-only will still need to support it on their network, for example by routing private IPv4 addresses to a 464XLAT on the CPE. This verges back into the territory of dual-stacking the entire network, but it may be feasible for those devices to be contained in a small part of the customer's network.

### Real and perceived performance

If gamer A is on an IPv6-only network, and gamer B is on an IPv4-only network, and assuming traffic between them successfully traverses both A's NAT64 and B's NAT44, it will have a longer latency than a direct path. The question then is how this is perceived in the marketplace: whether it is gamer A's IPv6-only ISP which is "at fault", or gamer B's IPv4-only ISP - and in particular, which gamer is more likely to move their subscription to a different ISP. If gamer A finds that most of their peers are on IPv4-only then they may feel they are putting themselves at a disadvantage. There are some factors which act in gamer A's favour though: in particular, some games automatically use Teredo tunnelling when they can. In practice, gamer A's IPv6 provider will most likely be able to offer dual-stack.

Most users are not as latency-sensitive as gamers, but they *will* expect throughput to be acceptable (which implies low packet loss) and reliability to be very high. If a global NAT64 translator service cannot provide this then it will be rejected.

### Application support

Some applications are not capable of traversing NAT without in-band modifications to the traffic - FTP being a major example (RFC 6384). There are also special considerations for SIP (RFC 6157). It would be difficult to support such applications, especially at scale.

Possibly there is a need for specific application proxies to be built; for example, it is conceivable to build an FTP gateway which allows the user to specify the target host as part of the login username. Some equivalent HTTP proxies already exist (e.g. sixxs.org). Otherwise, such users may be forced to keep at least parts of their infrastructure on IPv4 behind a traditional NAT44 with ALG support.
### Peer-to-peer connectivity

Some network applications (such as games) depend on peer-to-peer connectivity, using various mechanisms to detect and establish connectivity, in the common case where both end users are behind a NAT44. Whether this works depends on the exact details of the NAT in use - in particular, a symmetric NAT does not allow inbound traffic except from the specific IP address and port that the outbound session used, and will not work with mechanisms like STUN/ICE.

A large-scale NAT64 could have a high degree of overloading of individual IP addresses, which may mean that it has to be a symmetric NAT, and such connectivity cannot be achieved. Obviously, this is not a problem for `IPv6<->IPv6` traffic, so it only affects `IPv6<->NAT64<->NAT44<->IPv4`. At worst, applications may fall back to relaying traffic via a third-party relay server which they can both contact (e.g. TURN). The degree of impact this has on users will depend on what kind of applications they use.

### Inbound connectivity

Some users require inbound connectivity to their devices; they mainly fall into these categories:

1. Home users with inbound access to their home servers (e.g. NAS), typically configured with static port forwarding or UPnP
2. Enterprise users with their own mail servers or web portals
3. Enterprise users with inbound VPN requirements (where the clients are almost certainly on a poor-quality IPv4-only network, e.g. a hotel or airport)

(Interestingly, the IoT doesn't seem to matter here; most IoT devices communicate outbound with a cloud-hosted management server, and users talk to the same server from their smartphone or other device.)

For these users, the availability and usefulness of inbound v4-to-v6 services may be a deciding factor. Some options for providing these are given later. Larger enterprises will probably want to keep these services in a dual-stack DMZ.

## Permanence

For early adopters, the translation service may be seen as an experiment which could fail and be withdrawn in future - therefore, depending on it for their IPv6-only network is a risk. This risk would decrease over time if usage takes off and it becomes clear this will become a semi-permanent feature of the Internet. The larger the number of NAT64 translators which exist, the lower the risk that they will all be turned off. Such users know that at worst, they could deploy their own NAT64/DNS64 infrastructure (especially if the global infrastructure is open-source and can be easily replicated).

## Bootstrapping

For this plan to work, in the initial stage it only needs to be deployed at a single Tier 1 ISP. By definition, Tier 1 ISPs peer with each other at no cost, and everyone who buys transit has the ability for their traffic to reach a Tier 1. So as long as a single Tier 1 is announcing a translator prefix, it will be reachable by everyone; the end-users will be paying their own transit costs to reach it, as they normally do.

The first proof-of-concept node could use an existing standard NAT64 device. This allows time for the scalable NAT64 described in the companion paper to be developed.

Over time, if the service becomes widely used, other Tier 1s may wish to deploy their own nodes, to prevent traffic "tromboning" over to the other Tier 1 and back again. The location-aware DNS64 would need to be deployed before additional global translators were brought on line.
Additionally, a complete proof-of-concept would ideally include at least one IXP node, which means that the DNS64 with a routing information feed (described later) would have to be developed.

## Development and running costs

There are several global and regional organisations which have a vested interest in the successful migration to IPv6 (including RIRs, ISP associations, and individual ISPs with a particular orientation towards IPv6) who may be prepared to shoulder the development and/or running costs, and possibly even donate transit to carry the initial NAT64 traffic if no interested Tier 1 can be found.

From the point of view of hardware and hosting, the service would start with essentially zero utilisation, and could be grown in response to demand, minimising up-front cost.

## Additional deployment

Any dual-stack ISP (or even end user) who wants to run their own DNS64, their own translator with their own IPv4 address space, and/or inbound proxies is free to do so. The global project is likely to provide a technical template, including software implementations, that they are free to copy at smaller scale. This may become increasingly attractive once traffic volumes become significant, until ultimately those volumes die away as remaining content moves onto IPv6.

It's important to note that ISPs don't have to build the whole solution for it to be useful; they can opt into components as and when it makes sense. For example, providing DNS64 caches could reduce their outbound DNS traffic and reduce latency for their users, while they continue to rely on the global translators and inbound proxies. Similarly, they can turn components off when no longer needed.

## Opposition

Would there be anyone who would be actively opposed to this plan? There will certainly be some who say it technically can't work. There will be some who say that the current dual-stack migration plan is working just fine and should continue as is; maybe some who say that the answer is to compel people to deploy dual-stack, for example via legislation. Any or all of these might be right. However, it should be clear that this infrastructure doesn't *have* to be used by anyone, neither does it interfere with current operations; at worst it represents a waste of effort and money which could be better used elsewhere.

It remains to be seen if there are any stakeholders who would be actively opposed to it and would attempt to derail it. For example, there may be some large access providers who see benefit in locking out smaller competing players who have difficulty obtaining sufficient IPv4 address space for their needs. They would be unlikely to express such an anti-IPv6 position publicly.

# Scaling

## Traffic and state

The following rough back-of-envelope figures are a starting point for the scale we might have to consider for Internet-wide deployment:

* One billion concurrent users
  * Not everyone in the world has Internet access, nor are they all awake at the same time, nor are they all using the Internet all the time.
  * As of March 2016, about [168 /8s](http://www.potaroo.net/tools/ipv4/) of IPv4 address space are advertised - about 2.6 billion addresses - and not all of it is in use.
* 100 concurrent active translations per user?
  * 32 bytes per active translation
  * Implies a total of 100 x 32GB of RAM for translation state. No problem.
* 0.1Mbps average traffic per user?
  * Total traffic of 100Tbps!!
* On average, 1 connection established per second per user?
  * One billion connections per second!!
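As a sanity check on the figures above, a few lines of Python (the per-user numbers are the guesses from the list, not measurements):

~~~python
users       = 1_000_000_000    # hypothetical concurrent users
per_user    = 100              # concurrent translations per user (a guess)
entry_bytes = 32               # state per translation entry

print(f"state: {users * per_user * entry_bytes / 1e12:.1f} TB")  # 3.2 TB = 100 x 32 GB

avg_mbps = 0.1                 # average traffic per user (a guess)
print(f"traffic: {users * avg_mbps / 1e6:.0f} Tbps")             # 100 Tbps

print(f"connections/s: {users * 1:.1e}")                         # 1.0e+09 per second
~~~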
Clearly such a service cannot be built in a single box, and even with hundreds of nodes distributed around the Internet, each node would face severe demands. Therefore, each node needs to be able to scale horizontally, by the addition of extra capacity as required. A [separate paper](candler-hyperscale-nat64.md.html) discusses the detailed design for such a node.

A figure of 100Tbps would equate to over 12,000 network devices, each with 10G NICs at 80% utilisation, which clearly pushes the limits of what is feasible to build, run and finance. Furthermore, achieving an even traffic balance across all those devices would be very hard.

However, things may not be as bad as they first seem. The figure of 0.1Mbps is pulled out of thin air. While users are engaged in browsing and e-mail, Internet links are often idle; and yet some users will be transferring large files, watching video streams, or have background peer-to-peer file transfers in action.

But what we actually care about here is how much traffic an *IPv6-only* user would be exchanging with the *IPv4-only* Internet. NAT64 would not be used for talking to dual-stack hosts. Anecdotal evidence from existing dual-stack user sites suggests that 40-50% of traffic is already IPv6 (and therefore would bypass NAT64); this proportion would increase over time.

The IPv6-only Internet is currently non-existent, so the traffic would start small and ramp up. As the traffic ramps up, large ISPs may deploy their own translators within their own networks (for maximum control and visibility). More importantly: as the operators of IPv4-only content see significant volumes of traffic from the translators' IPv4 ranges, and notice that their logs are not as detailed and their traffic flows to those consumers are suboptimal, they have a real incentive to roll out dual-stack. The big streaming services are a relatively small number of companies who would have a particularly high incentive to dual-stack, if they have not already done so.

In summary: the scale required is proportional to the *product* of the size of the IPv6-only Internet with the size of the IPv4-only Internet - it is independent of the number of dual-stack connections. As it increases with the size of the IPv6-only client base, it will also go down as the IPv4-only content base shrinks. Therefore I would expect a utilisation curve for the public NAT64 translators to look like this:

~~~
              large ISPs roll out own translators
                    v            content providers move
                   ****          more to dual-stack
                ***    ***         v
              **          **
             *              **
            *                 ***
           *                     ****
          *                          **
   ^  initial                          *****
      ramp-up
~~~

At worst, individual persistent high-volume users could be throttled. Surely we can insist that bittorrent traffic should choose IPv6 or dual-stack peers.

## IPv4 address space

Each of the translators needs its own unique IPv4 space, so that return traffic returns to the same translator node it originated from. Each node's utilisation would vary based on time of day and reachability of other nodes, and it needs enough address space for its peak requirements.

If we assume IPv4 address sharing of 16:1 as a starting point, then this implies a /6 equivalent of IPv4 address space is required for our hypothetical one billion concurrent users. Note that this does not have to be contiguous; it can be made up of any routable space.
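The arithmetic behind that /6 figure, sketched in a few lines (both sharing ratios are the working assumptions of this section; the 64:1 case anticipates the next paragraph):

~~~python
import math

users = 1_000_000_000   # hypothetical concurrent users, as above

for sharing in (16, 64):
    addrs = users / sharing                    # IPv4 addresses needed
    prefix = 32 - math.ceil(math.log2(addrs))  # smallest covering prefix length
    ports = 64512 // sharing                   # share of ports 1024-65535 each
    print(f"{sharing}:1 -> {addrs / 1e6:.1f}M addresses (a /{prefix}), ~{ports} ports/user")

# 16:1 -> 62.5M addresses (a /6), ~4032 ports/user
# 64:1 -> 15.6M addresses (a /8), ~1008 ports/user
~~~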
Address utilisation would increase in proportion to the number of IPv6-only users, until such a time as users no longer need to reach any v4 content. However, the traffic utilisation per user would go down, as more and more content becomes directly reachable over IPv6. This means that the address sharing factor could be increased over time. Once there are one billion IPv6-only users, increasing the address sharing factor to 64:1 would reduce the total space required to a /8, at the cost of increased scarcity of source ports in the translators (on average only ~1000 per user) - although at that stage one would imagine that the overwhelming majority of content would also be on IPv6. But generally it would be wise to aim for the lowest possible initial sharing factor, and leave it to increase only if demand growth exceeds the rate at which the IPv4 pool can be grown.

Is this feasible? A total /6 possibly *could* be found with sufficient concerted effort - for example, if each RIR donated half of its remaining /8 block, that would get most of the way. A /8 is probably more realistic. However, much more could be available by negotiating with holders of registered but unused space. As of March 2016 there are still over 48 /8s' worth of unadvertised space.

Clearly, time may be running out to deploy using remaining unallocated addresses. At the very least, it may be prudent to reserve now at least a /8 of space for future plans like this one, rather than giving it out to end-users so that the chance is lost.

## Source ports

If one IPv4 address is being shared by 16 users, then only 1/16 of the ports are available to each; that is, on average they will have access to a range of only around 4000 ports each. In practice, single users will need less, but busy networks will need more.

Empirically, we know that a typical office or school network with NAT44 normally has a single public IPv4 address, and it works fine. If we take it as good practice that a layer 2 broadcast domain (subnet) has up to 250 devices on it, then those 250 devices happily share a range of around 64,000 ports. If all were active at the same time, they would be using 256 ports each on average; if only a quarter were active at the same time then they would be happy with an average of 1024 ports each. This also ties in with our experience of client devices: if you type "netstat" on a client device it would be rare to see many hundreds of open sockets.

This means that in principle, an address sharing factor of 64:1 would probably be fine *if we are talking about individual source IPv6 addresses* (client devices). The problem comes if we decide to treat a whole /64 prefix (network) as a single entity. For traceability reasons, we would like to map this prefix to the same IPv4 address *and* ideally give it a limited port range so that the traffic is distinguishable. But not all /64 prefixes are the same: a single home user might be happy with as few as 1000 ports, whilst a large office or school network may require a much larger range.

There are a few approaches we could consider:

* Give every /64 prefix a larger range of ports, say 8000 ports each. This implies 8:1 IP address sharing at the level of /64 networks. This is still a compromise: small networks will need much less than this, and large networks may still see problems (8000 ports between 250 devices is only 32 ports each). If we say that our 1 billion users are spread over 500 million networks, then we would need a /6 of address space to service them.
* Use a more dynamic approach: depending on the number of active IPv6 addresses seen in a /64 prefix and/or the number of concurrent sessions, add additional port ranges when required. This is the approach described in the companion paper.
* Symmetric NAT would allow the same port to be re-used multiple times, as long as each connection is to a different destination address and/or port. This could increase port availability, although in the worst case it doesn't help (that is, if all the users are communicating with the same remote server and port). Symmetric NAT is undesirable for UDP as it stops NAT-traversal mechanisms like STUN/ICE from working, although this is not an issue for TCP.

Note also that much UDP port usage comes from DNS, but there is no reason for DNS to traverse NAT64: an IPv6-only user should be talking IPv6 to a DNS64 cache, and the cache itself should be dual-stack. Hence DNS can be blocked at the translator.

Other peak usage might come from batched or bursty applications - for example, a mail server which makes multiple outbound SMTP connections to deliver queued mail after an outage. Such an application is likely to be tolerant of intermittent connection failures (this is the purpose of its queue, after all). Any other server which in normal operation opens many hundreds of concurrent connections to the IPv4 Internet (say, a web crawler) should be dual-stack and not using NAT64 - it would be reasonable to allow such users to fail intermittently on NAT64, in particular to keep some ports available for normal users.

For the purposes of this discussion, we will assume that available source ports probably *are* sufficient for typical client use cases, with careful design.

# Logging and traceability

It is very important to note that a translated IPv4 address may be in use by multiple users at the same time. A NAT64 service is not Tor, and it should not be a tool for miscreants to hide their tracks. Even genuine users traversing NAT64 should have no more expectation of privacy of their source address than if they were connecting directly over IPv6 to the destination. If IPv4 packets could carry an option field with "original-IPv6-address", we would use it! Unfortunately this is not possible, so we need a way to map a translated IPv4 address/port back to the origin, for tracing network abuse; similar access is required for law enforcement.

## Translated source address and port assignment

Here is a proposed algorithm designed to give reasonable port availability, balanced with the ability to track which users are using which port ranges.

The IPv4 source address is chosen using a *consistent hash* of the top 64 bits of the IPv6 source address (a sketch is given after the following list). This algorithm is explained in more detail in the companion paper, since it also forms the mechanism for horizontal scaling of NAT64. The key points to note are:

* There is an even spread of /64 prefixes across the available IPv4 addresses.
* All users on the same /64 map to the same IPv4 address. This is what users will expect from a normal NAT. It also means that we do not care how the lower 64 bits are used, e.g. privacy addresses.
* For as long as the IPv4 pools stay the same, the same /64 prefix always maps to the same IPv4 address.
* If the pools change, then most users will retain their same translated IPv4 address. A minimal proportion of users will get a new IPv4 address, to balance the usage across all addresses - for those users, their existing sessions will be interrupted.
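One consistent-hash scheme with these properties is rendezvous (highest-random-weight) hashing - only one possible choice; the companion paper's actual algorithm may differ, and the pool here is illustrative:

~~~python
import hashlib
import ipaddress

def pick_v4(src_v6: str, pool: list[str]) -> str:
    """Map the /64 of an IPv6 source onto one pool address.

    Rendezvous hashing: score every (prefix, address) pair and take the
    highest. This spreads /64s evenly, is stable while the pool is
    unchanged, and remaps only ~1/N of prefixes when an address is added
    to or removed from a pool of N."""
    prefix64 = ipaddress.IPv6Address(src_v6).packed[:8]   # top 64 bits only
    def score(v4: str) -> bytes:
        return hashlib.sha256(prefix64 + ipaddress.IPv4Address(v4).packed).digest()
    return max(pool, key=score)

pool = [str(a) for a in ipaddress.IPv4Network("192.0.2.0/28")]
print(pick_v4("2001:db8:1:2::3", pool))
print(pick_v4("2001:db8:1:2::dead:beef", pool))   # same /64 -> same IPv4 address
~~~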
Ports are dynamically allocated to /64 prefixes in blocks of 1024 ports. Each /64 prefix is allowed a certain number of blocks of "dedicated" ports which are not shared with anyone else, and beyond that, additional blocks in a "shared" range which may be used by other prefixes.

## Real-time session mapping

Sites should be able to query the translator to find out where any particular connection is coming from. Given the exact combination of (source IP, source port, dest IP, dest port) for an active session, even a symmetric NAT can relate this back to exactly one source IPv6 address (and port).

For TCP there is an existing protocol which could be used for this: IDENT (RFC 1413), where the returned "identity" would be the source IPv6 address. This is easy to implement at the translator, and there are many existing client implementations which can make the callback. The downside is that it is TCP-based (thus resource-intensive, and the three-way handshake adds latency). Also, it currently cannot look up information about UDP connections, although it could be extended to allow this. Since it needs visibility of the complete 4-tuple, it is only usable by the recipient of the connection.

Given the additional latency involved, many web destinations may choose not to perform real-time IDENT lookups anyway. However, it could be extremely valuable for other services like ssh.

As another idea, the "finger" protocol could be used to connect to the translator and list all current translations. For privacy reasons this would show only connections from the finger server's IPv4 address to the finger client's IPv4 address.

## Log storage and access

For post-factum abuse analysis and law enforcement purposes, it is necessary to be able to trace activity from a given IPv4 address/port back to the source. If we have a hypothetical total user base of one billion active users, all creating an average of one session per second, and each session log requires 32 bytes, that would be 32GB per second, or about 2.8PB per day, of logs to store, archive, index and (occasionally) search - a severe problem of scale. Furthermore, if those logs contain records of both the source and the destination, they are highly privacy-sensitive.

Fortunately, this should not be necessary. Firstly: let us weaken the requirement to tracing users back only to the /64 prefix they were on. This should be acceptable, as it is the level of granularity you would get if they were behind a NAT44 anyway. Secondly: note that it is not necessary to record which destinations these prefixes were communicating with - simply that a translation was active, and that a particular source IPv6 prefix was using a particular IPv4 port range. Therefore, if we ensure that each /64 maps to a specific IPv4 address and port range, then all we need to record is which prefixes were active and which IPv4 address and port range(s) they were mapped to.
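For illustration, one such daily record might look like this - a hypothetical layout, since the proposal fixes what must be captured, not the format:

~~~python
import json

# One record per active /64 per day; roughly 32 bytes when packed in binary.
record = {
    "date":   "2016-03-15",
    "prefix": "2001:db8:1234:5678::/64",       # source IPv6 prefix
    "v4":     "192.0.2.17",                    # translated IPv4 source address
    "ports":  [[4096, 5119], [61440, 62463]],  # dedicated block + shared block
}
print(json.dumps(record))
~~~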
### Publishing and collecting mapping information

It is highly desirable to be able to publish this information freely, because this would avoid the administrative overhead of deciding who is or is not authorised to access it (potentially in hundreds of jurisdictions), and of enforcing that control. The collected information says nothing about *who* you were communicating with: only that your source IPv6 was translated to a particular source IPv4 and port range.

If we handle this information at the level of the /64 prefix, then the only part which can be considered sensitive is the information about the times that prefix was active (or not active) on the Internet. This could become a non-issue if the mapping information is stored at a sufficiently low level of granularity - say, per 24-hour period - and is not available for the current 24-hour period (so you can't repeatedly poll it to determine the time at which a prefix became active).

The information stored once every 24 hours would therefore be:

* Source IPv6 prefix
* Source IPv4 address and port range(s) used

Given 500 million active /64 prefixes, and let's say 32 bytes per record, that would be a total of 16GB per day across the entire Internet, which is absolutely feasible. The information could be made available via whois, or a REST API. Given an IP address and date, it would tell you which prefix(es) could have been generating traffic and which port range(s) each one was using.

Note also that in some circumstances it is possible that more than one IPv6 prefix would be mapped to the same IPv4 address and port range. If this happens occasionally it should not be a major problem; it just means that once in a while, an investigation would have to try 2 or 3 options upstream, and often the context will make it clear which is the likely offender. The same would apply if the source port information had not been logged at the destination. This is another reason why it would be desirable to keep the address sharing ratio as low as possible.

### Active prefixes

The above mapping services should make it unnecessary to store any per-session logs. What is essential to record is the "active IPv6 prefixes" and their port ranges, and this may be being done anyway for the purpose of statistics.

There is one special concern here: many networks do not have ingress filters to prevent source address spoofing. To prevent this active prefix table being filled with useless junk, prefixes should only be marked as active once a TCP session has successfully established via a three-way handshake. There is no such mechanism for UDP; however, it is expected that there would be few networks which use UDP exclusively and *never* use TCP.

# Location-aware DNS64

The DNS64 service needs to scale to global demand. As has already been mentioned, there are already global public DNS cache services; they handle a substantial proportion of client DNS traffic today.

Unlike NAT64, there is no problem with the DNS64 service being anycast. The DNS queries and responses are typically single-packet exchanges, not going via any NAT, and hence are not sensitive to short-term topology changes.

## Closest NAT64 selection

The DNS64 service needs to select a prefix for a suitable translator for each client. In the simplest case, each anycast DNS64 node could be located near to a NAT64 translator, and statically return the prefix for that translator. This is trivial to set up and may be good enough for an initial deployment.

A more sophisticated DNS64 would use AS topology, latency and/or geo-location information to select a nearby translator based on the client's IPv6 address. There are CDNs which work this way already. In principle it could also take into account the destination IPv4 address from the 'A' record, although choosing a translator close to the source should be a good enough strategy (and is the only strategy for 464XLAT).
Note that since this would be a caching (resolver) service, it would have direct visibility of true client IP addresses - unlike CDNs which provide authoritative DNS and are a level of indirection away from the client.

When there are several suitable translators for a particular client, DNS64 responses could be weighted to send more or less traffic to particular NAT64 nodes, in response to capacity or availability requirements.

All of the above ought to be straightforward to implement, based on existing DNS64 and CDN DNS practices.

There may be a desire for some ISPs to request that traffic from their AS *must* prefer a specific translator or set of translators (e.g. ones they manage themselves). If so, there needs to be some way for that policy to be published to the DNS64 operators.

## DNS64 with IXPs

Where it gets more complex is when the IXP-level translators are added. At this point, if a combination of (IPv6 source, IPv4 destination) has both addresses available at the same IXP, then the prefix for that IXP translator can be returned. If multiple translators are usable for the same prefix pair, ideally the one nearest to the source (or to both source and destination) would be used.

To implement this requires a feed of information into DNS64 from the IXP translators about what IPv6 and IPv4 prefixes they are learning from their peers. The obvious mechanism for this is multihop eBGP. Each route could be associated with the IXP it was learned from by means of a community. As the same information would be needed by multiple DNS64 operators, it could be aggregated in a small number of route servers, and public feeds made available.

~~~
  example.com?  | ^            | ^
         AAAA   v |            v |
               DNS64          DNS64
                  ^            ^
                   .  multi   .      2001:db8:1000::/48  64640:00001
                    .  hop   .       2001:db8:2000::/48  64640:00002
                     . eBGP .        2001:db8:3000::/48  64640:00003
                      .    .         198.51.100.0/24     64640:00001
                    +--------+       192.0.2.0/24        64640:00002
                    | Route  |+      203.0.113.0/24      64640:00003
                    | Servers||
                    +--------+|
                     +--------+
                    .    .    .
                  .      .      .    multihop eBGP
                .        .        .
             NAT64     NAT64     NAT64
             IXP1      IXP2      IXP3
             ^ ^ ^     ^ ^ ^     ^ ^ ^     routes from peers
             / | \     / | \     / | \
      198.51.100.0/24   192.0.2.0/24   203.0.113.0/24
   2001:db8:1000::/48   2001:db8:2000::/48   2001:db8:3000::/48
~~~

If a DNS64 server receives a query from a particular IPv6 source which resolves to a particular IPv4 destination, and both addresses are in the BGP feed with the same community, then it can return the translator prefix associated with that community.

(Aside: the unique community identifying the node need not be sent by the IXP NAT64 itself; it can be added at the route server based on which eBGP session the route was learned from. Hence the route server can retain central control over the translator IDs, and does not need to either trust or coordinate the configuration at the IXP NAT64 nodes.)

The diagram above suggests a single centrally-managed set of route servers, and this would be appropriate to provide feeds to smaller ISPs who want to run their own DNS64. Large DNS64 providers might want more control, by running their own set of route servers and establishing direct peerings from those to the IXP translator nodes.

The volume of eBGP updates ought to be relatively low - certainly less than is already handled by public route view servers, which carry transit routes as well.
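The selection rule itself is easy to sketch. Here the routes are (prefix, community) pairs as they might arrive from the route servers; the community-to-translator-prefix table is an assumption for illustration:

~~~python
import ipaddress

# (prefix, community) pairs as learned from the route servers; community
# 64640:NNNNN identifies the IXP translator each route was learned at.
ROUTES = [
    ("2001:db8:1000::/48", "64640:00001"), ("198.51.100.0/24", "64640:00001"),
    ("2001:db8:2000::/48", "64640:00002"), ("192.0.2.0/24",    "64640:00002"),
]
# Hypothetical table mapping each community to that node's translation prefix.
XLAT = {"64640:00001": "2001:db8:6403::/96", "64640:00002": "2001:db8:6404::/96"}

def communities(addr):
    ip = ipaddress.ip_address(addr)
    # a v4/v6 version mismatch simply tests False, so routes can be mixed
    return {c for pfx, c in ROUTES if ip in ipaddress.ip_network(pfx)}

def select_translator(src_v6, dst_v4):
    """Return an IXP translator prefix only if the IPv6 source and the IPv4
    destination were both learned at the same IXP (i.e. share a community)."""
    common = communities(src_v6) & communities(dst_v4)
    return XLAT[min(common)] if common else None  # None -> use a global node

print(select_translator("2001:db8:1000::1", "198.51.100.7"))  # 2001:db8:6403::/96
print(select_translator("2001:db8:1000::1", "192.0.2.7"))     # None
~~~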
## Private and public translators

Some ISPs may host private translators which are configured *only* to accept traffic from their own customers' netblocks. They may still wish to announce such services into the DNS64 cloud, so that any of their customers using the public DNS64 service will still be directed to their own translator. This can be done just like an IXP announcement:

~~~
2001:db8:1000::/36   64640:00004   # allowed IPv6 source
2001:db8:f000::/36   64640:00004   # allowed IPv6 source
0.0.0.0/0            64640:00004   # allowed IPv4 destination
~~~

Even the public translators could announce their availability via the same eBGP mesh, by announcing that they are prepared to accept traffic from any IPv6 address and send it to any IPv4 address:

~~~
::/0        64640:00005   # allowed IPv6 source
0.0.0.0/0   64640:00005   # allowed IPv4 destination
~~~

This means that the eBGP information gives a global view of all available translators. If a particular translator suddenly becomes unavailable, DNS64 can stop directing people to it (although in practice, DNS caching means that many users would still be affected for a period of time).

## 464XLAT

Clients will need to have a CLAT to be able to communicate with IPv4 resources directly by IPv4 address (e.g. webpages with embedded IPv4 addresses in URLs, or certain services like Skype). This is increasingly available on client devices, and the same global NAT64 infrastructure should be able to support it without additional work. (TODO: confirm that a PLAT doesn't require any more functionality than plain NAT64.)

There is a mechanism by which the CLAT can automatically detect the NAT64 translation prefix to use, by resolving `ipv4only.arpa` (RFC 7050; limitations in RFC 7051). This depends only on DNS64 returning a suitable prefix for the client's source IPv6 address. There is also a Port Control Protocol mechanism (RFC 7225), although this relies on finding the PCP server in the first place.

464XLAT traffic would unfortunately not be able to use the IXP nodes, because there is no way to control which translator prefix the CLAT will use based on the destination IPv4 address. However, 464XLAT represents only a tiny proportion of the total usage.

## Load management via DNS64

(This is a topic for further consideration.)

DNS64 servers will have some leeway in choosing the most appropriate translator for each session, and potentially this could be used as a mechanism for load management between the available translators. In other words, given a degree of coordination between NAT64 operators and DNS64 operators, this could be used to shift traffic around if required. This should probably be relatively static - we don't want individual user sessions alternating between different translators - but given a desired ratio, load could be apportioned based on a hash of the client source prefix.

## DNSSEC

DNS64 synthesis is not compatible with client-side DNSSEC validation. At the client side at least, DNSSEC has not yet seen widespread adoption, so this would probably not be a barrier to using the translation service. It would be hoped that organisations who are mindful enough to deploy DNSSEC are also mindful enough to deploy dual-stack, and therefore no names in their signed zones would need DNS64 translation.

## Local caches

End-users who wish to run their own DNS64 cache would find it easiest to forward all requests to the global DNS64 service, and let it deal with prefix selection. This means they don't even need any special DNS64 software.

They *could* instead run their own DNS64 cache which statically returns the prefix of the nearest NAT64 device. However, that server must be dual-stack; it MUST NOT send queries to IPv4-only authoritative servers via the NAT64 service.
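Returning briefly to the 464XLAT section above: the RFC 7050 prefix discovery is simple enough to sketch offline. `ipv4only.arpa` has only the well-known A records `192.0.0.170` and `192.0.0.171`, so any AAAA answer must have been synthesised, and stripping the embedded IPv4 address reveals the translator prefix (assuming the common /96 embedding):

~~~python
import ipaddress

# The well-known addresses behind ipv4only.arpa (RFC 7050).
WELL_KNOWN = {ipaddress.IPv4Address("192.0.0.170"),
              ipaddress.IPv4Address("192.0.0.171")}

def prefix_from_answer(aaaa: str) -> ipaddress.IPv6Network:
    """Recover the NAT64 /96 prefix from a synthesised AAAA answer."""
    addr = int(ipaddress.IPv6Address(aaaa))
    embedded = ipaddress.IPv4Address(addr & 0xFFFF_FFFF)   # low 32 bits
    if embedded not in WELL_KNOWN:
        raise ValueError("answer was not synthesised from ipv4only.arpa")
    return ipaddress.IPv6Network((addr >> 32 << 32, 96))

# A DNS64 using 2001:db8:6401::/96 would answer:
print(prefix_from_answer("2001:db8:6401::c000:aa"))   # 2001:db8:6401::/96
~~~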
## DNSSEC

DNS64 is not compatible with DNSSEC. At the client side at least, DNSSEC has not yet seen widespread adoption, so this would probably not be a barrier to using the translation service. It is to be hoped that organisations who are mindful enough to deploy DNSSEC are also mindful enough to deploy dual-stack, and therefore no names in their signed zones would need DNS64 translation.

## Local caches

End-users who wish to run their own DNS64 cache would find it easiest to forward all requests to the global DNS64 service, and let it deal with prefix selection. This means they don't even need any special DNS64 software.

They *could* run their own DNS64 cache which statically returns the prefix of the nearest NAT64 device. However, such a server must be dual-stack: it MUST NOT send queries to IPv4-only authoritative servers via the NAT64 service.

# Routing

Both IPv6 and IPv4 prefixes are announced using BGP as normal. For the IPv4 space, large chunks would be desirable to reduce the number of route announcements (or the space could be contained within existing announcements) - but in the worst case, a pool totalling a /6 made up entirely of individual /22s would require 65,536 route announcements.

## Anycast routing

An obvious question is: why not use the well-known translation prefix `64:ff9b::/96`, and announce it from multiple nodes, thus forming an Anycast service? Indeed, RFC6052 (section 3.2) suggests that providers might announce this prefix to each other.

The most serious problem with this is that the NAT64 service is stateful. Any temporary topology changes or instability on the IPv6 side could result in packets alternating between different translators, and total loss of service for the affected users.

Another problem is that anycast is extremely difficult to debug, especially if the nodes are under different administrative control. If there is a problem and traffic is being disrupted, it will be very hard to identify where the problem is and to get it fixed promptly. Furthermore, the user has no way to override or bypass the selected destination.

Another problem is that it would be very difficult to control the distribution of load between translator nodes. Each node would automatically attract a certain amount of traffic, and a topology change could quickly result in one node being overwhelmed while another was idle; this would mean that significant over-capacity would have to be built.

However, nothing stops the nodes announcing multiple prefixes, and it would be possible (as an experiment) for some or all nodes to announce a global anycast prefix as well as their unique prefix. This might benefit end-users running their own DNS64 or CLAT with a fixed prefix - it would be better for them to statically configure the anycast prefix than the prefix of one specific node, given that a nearer node might be added in future or the old one withdrawn.

It might also be reasonable to use anycast for a collection of nodes *within* one particular large ISP, under the same management domain, as long as they are located in stable core locations.

## NAT64 Prefix Selection

The well-known prefix has the desirable characteristic of being checksum-neutral, but custom prefixes can be selected to be checksum-neutral too:

> 0064 + ff9b = ffff. You just need to insert a 16-bit word into the NAT64
> prefix so that all the words add up to ffff or 1fffe or 2fffd etc.

The examples here have used e.g. `2001:db8:6403::/96` for simplicity of exposition, but the actual route announcements could just as well be `2001:db8:6403::/48`, to make them more likely to be accepted through filters. If the prefix is within already-announced address space then no separate announcement is required.

The NAT64 numbering scheme in RFC6052 allows the target IPv4 address to be carried higher up in the IPv6 address than the last 32 bits. For this application there seems no particular reason to do this, but it could be supported.
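A minimal sketch (Python) of the checksum-neutrality rule quoted above: sum the prefix's 16-bit words and fold carries ones'-complement style, which is why sums of ffff, 1fffe, 2fffd etc. are all equivalent. The candidate prefixes below are the document's examples; the compensating word 6e43 is chosen here purely for illustration.

~~~
# Check whether a /96 NAT64 prefix is checksum-neutral per RFC6052.
import ipaddress

def is_checksum_neutral(prefix):
    """True if the /96 prefix's words sum to zero, ones'-complement."""
    net = ipaddress.ip_network(prefix)
    words = net.network_address.packed[:12]  # first 96 bits
    total = sum(int.from_bytes(words[i:i+2], "big") for i in range(0, 12, 2))
    while total > 0xffff:                    # fold carries back in
        total = (total & 0xffff) + (total >> 16)
    return total in (0, 0xffff)

print(is_checksum_neutral("64:ff9b::/96"))             # True: 0064 + ff9b = ffff
print(is_checksum_neutral("2001:db8:6403::/96"))       # False
print(is_checksum_neutral("2001:db8:6403:6e43::/96"))  # True: 6e43 pads the sum to ffff
~~~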
## IXP Transit

In the IXP network diagram shown earlier, only the translator's own (IPv6 and IPv4) addresses are announced to peers, so its presence at the IXP has no effect on normal traffic flows. The DNS64 will only return responses including the translator's prefix if both the IPv6 source and the IPv4 destination were learned at the IXP. In the steady-state case, the translator therefore needs no IPv4 or IPv6 transit.

However, routes are added and withdrawn, and when the topology changes, a host may no longer be reachable via the IXP. DNS caching means that for a while, new sessions will still be sent to the IXP's translator - and furthermore we would like existing NAT64 sessions not to be interrupted. It would therefore be helpful if a supportive ISP could provide some v4 and v6 transit to the translator, to allow traffic to continue to flow in those circumstances. The translator will also need a small amount of transit for announcing its routes via eBGP to the DNS64 route servers, and for management.

## Other IXP Routes

It may be wise to set the no-advertise community by default on all routes announced from the translator.

It is expected that the IXP node would announce the IPv6 route for its entire translation prefix (e.g. `2001:db8:6403::/96`) to its peers. Potentially it could announce individual, translated IPv6 routes - for example, if it learns `192.0.2.0/24` it could announce `2001:db8:6403::192.0.2.0/120`, as illustrated in the sketch below. There doesn't seem to be any particular benefit in doing so; certainly not if the node has IPv6 transit, in which case the whole block is reachable via a different route anyway.
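A minimal sketch (Python) of the RFC6052 mapping used in that example: an IPv4 prefix embedded in the low 32 bits of a /96 becomes an IPv6 /(96+n). The translator prefix is the document's example value, not a real allocation.

~~~
# Embed an IPv4 prefix into a /96 NAT64 prefix (RFC6052 suffix format).
import ipaddress

NAT64_PREFIX = ipaddress.ip_network("2001:db8:6403::/96")

def embed(v4_prefix):
    """Map an IPv4 /n into the NAT64 /96, giving an IPv6 /(96+n)."""
    v4 = ipaddress.ip_network(v4_prefix)
    base = int(NAT64_PREFIX.network_address) | int(v4.network_address)
    return ipaddress.IPv6Network((base, 96 + v4.prefixlen))

print(embed("192.0.2.0/24"))
# -> 2001:db8:6403::c000:200/120, i.e. 2001:db8:6403::192.0.2.0/120
~~~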
## Private peering

If two large ISPs engage in private peering, then this will not be affected. If they wish to make use of the translator at an IXP, those ISPs may choose to peer (separately) with the IXP translator. However, in doing so they would implicitly accept IPv6-to-IPv4 translated traffic from other ISPs - even those they would not normally peer with. Alternatively, if either (or both) of the large ISPs has their own translator, then DNS64 ought to select one based on AS topology.

# Reliability and management

This is a piece of critical infrastructure which IPv6-only sites would depend on. This means that the nodes and their network connectivity would need to be run to the very highest standards of reliability, and this would have to factor into the running cost. Locating the nodes at core locations with high availability is essential. If a single provider has multiple nodes within their network, they may be able to fail over between nodes or use anycast (with the proviso that such changes should be rare, as they would interrupt existing flows).

There is a degree of administrative control which can be asserted via the DNS64 responses in selecting which node traffic should be sent to, as already described. This would require coordination between DNS64 operators and NAT64 operators to agree on how it is managed.

In the event of total failure of a node, the ability to respond promptly to changes would be hampered by clients caching the DNS64 translated answers they had already received; similarly, CLAT clients would cache the results of the translator prefix discovery they had performed. Artificially clipping the TTL of DNS64 responses might help to a degree, but it could not be set too low without resulting in a high DNS load. Effort would be better spent ensuring the translator nodes themselves are internally resilient, and relying on network routing to ensure they are always reachable. NAT64 operators could enter into arrangements whereby a backup node could start announcing the prefix belonging to another node, on request.

## Capacity management

Both global translator nodes and IXP nodes could advertise their static capacity (e.g. size of IPv4 pool and/or design traffic capacity) and current utilisation (e.g. active NAT overload ratio; traffic utilisation as a percentage of design capacity).

This information can be used for capacity planning, and may be taken into account in real time by DNS64 servers: for example, as one node approaches its maximum capacity, a steadily reducing proportion of requests can be directed to it.

This requires a mechanism for the DNS64 servers to learn the metrics outlined above. If they are already participating in eBGP to learn routes from the IXP nodes, then this could piggyback onto additional communities - for example, as communities on the announcement of the translator's own IPv6 mapping prefix:

~~~
2001:db8:6401::/96   64641:00001   ; translator ID
                     64642:00010   ; design capacity in Gbps
                     64643:00004   ; size of pool in multiples of /24
                     64644:03345   ; current traffic utilisation (33.45%)
                     64645:01754   ; current IPv4 sharing ratio (17.54:1)
~~~

(Incidentally, this also gives a way for the DNS64 servers to find out the translation prefix to use when directing a user to any particular translator node.)

Alternatively, some other mechanism could be used to expose these metrics (e.g. REST API, XMPP). If a node wants to announce upcoming maintenance, it could do so with the same mechanism, e.g. by setting its design capacity to zero.

# Inbound access

As mentioned before, users may judge the acceptability of IPv6-only networks based on the ability to accept inbound connections from the IPv4 world. It will never be a 100% match to having dual-stack, but there are some services which can be built which may be generally useful for delivering traffic to IPv6-only devices.

## HTTP(S) reverse proxy service

This is to allow IPv6-only sites to make HTTP content viewable from the IPv4-only Internet. Anyone who wants to use this would configure the DNS with an AAAA record pointing at their own server, and an A record pointing at a nearby proxy.

~~~
              +------------+
              | HTTP Proxy |
              | 192.0.2.1  |
              +------------+
  +--+          |    ^           DNS for "www.example.com"
  |  |<---------'    `---------- A    192.0.2.1
  +--+<------------------------- AAAA 2001:db8:1000::80

  web server
  2001:db8:1000::80
~~~

The HTTP proxy does not need any per-site configuration; it would look up the Host: header in the DNS. To prevent abuse and loops, it would only listen on its IPv4 address, only make outbound connections to IPv6 addresses, and check that the DNS response includes its own IPv4 address (see the sketch below). It should also add an `X-Forwarded-For:` header with the requestor's IPv4 address. The proxy could also listen on some additional ports (e.g. 8000, 8080) and forward to the corresponding ports on the destination.

This approach is also possible with HTTPS and SNI (RFC6066), or indeed any other TLS-based protocol (IMAPS, POP3S etc.). The target host name is exposed early enough in the TLS handshake that the connection can be proxied without modifying the session. There are existing implementations [1](http://blog.haproxy.com/2012/04/13/enhanced-ssl-load-balancing-with-server-name-indication-sni-tls-extension/), [2](https://github.com/dlundquist/sniproxy). However, in this case no indication of the client's source IPv4 address can be added.
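A minimal sketch (Python, stdlib only) of the proxy's admission check described above. The proxy address and the `ipv6_target` helper are illustrative; a real proxy would also handle the forwarding and the `X-Forwarded-For:` header.

~~~
# Admission check: only proxy for sites whose A record points at us,
# and only ever connect onward over IPv6 (loop prevention).
import socket

MY_IPV4 = "192.0.2.1"  # the proxy's own address (example value)

def ipv6_target(host):
    """Return an IPv6 address for host if it has opted in, else None."""
    try:
        infos = socket.getaddrinfo(host, 80, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return None
    a    = {ai[4][0] for ai in infos if ai[0] == socket.AF_INET}
    aaaa = {ai[4][0] for ai in infos if ai[0] == socket.AF_INET6}
    if MY_IPV4 in a and aaaa:
        return sorted(aaaa)[0]
    return None

target = ipv6_target("www.example.com")
if target:
    print("proxy to [%s]:80, adding X-Forwarded-For" % target)
else:
    print("refuse: site has not opted in via its A record")
~~~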
## SMTP relay or proxy

A domain may list a primary MX with an AAAA record, and a secondary MX which is the relay. The relay would only forward mail if there is a higher-priority MX with only an AAAA record. A stateless SMTP proxy would be preferable to a store-and-forward relay, but it would have to validate each RCPT TO address accordingly, add a Received: header, and probably perform basic anti-spam checking.

This would only apply to people who want to run their own mail servers. The majority of people have their mail hosted at their ISP, and would make outbound connections to it over IMAP or POP3. Ideally that server would be dual-stack, but it doesn't have to be.

## Inbound port forwarding services

This service would be TCP/UDP port forwarding on a shared IPv4 address, forwarding to an IPv6 address. Static port forwarding services could be offered by the user's ISP, or by a third-party provider for a fee.

But the NAT64 translator node could also offer dynamic port forwarding. In this case, the Port Control Protocol (RFC6887) could be used to configure port forwarding dynamically. Note that a port reserved via PCP would be completely unavailable for any other shared use, and therefore total PCP usage should be limited to a fraction of the available port range for a particular client.

The PCP server could be contacted on a well-known name or address. Note that the name `ipv4only.arpa` resolves to 192.0.0.170 and 192.0.0.171; DNS64 would map these to `<prefix>:192.0.0.170` and `<prefix>:192.0.0.171`, so such requests would arrive at a suitable translator node.

A protocol also exists by which a local UPnP router can forward requests over PCP to set up temporary port forwarding (RFC6970); it remains to be seen whether there is sufficient demand to make this worth implementing.

## Static IPv4 address rental services

Such services could deliver a unique IPv4 address directly to an IPv6 host using a 4-in-6 tunnel. This would give maximum transparency to IPv4 traffic which would otherwise be unlikely to survive a NAT46 translation (e.g. IKE, SIP). A typical application would be an inbound VPN endpoint for staff travelling to hotels and airports.

A market may spring up for such services; businesses are already used to paying a fee for a static IPv4 address. It could also make it easier to switch providers and take your IPv4 address with you. On the other hand, users may find it easier simply to dual-stack a small part of their network.

# Security considerations

## State exhaustion and DoS

These issues are inherent to NAT, and the NAT64 platform needs to detect and defend against such misuse.

## Excessive use

There should be statistical monitoring of both traffic and state generation from client IPv6 addresses (not only individual addresses, but also aggregated per /64, per /56, per /48 and per /32), and the ability to apply temporary blocks if required; a sketch of such aggregation follows below.

It is unlikely to happen, but a badly-designed ISP might give each end-user a /128 and put them all on the same /64. Such an ISP would appear as a huge spike from a single /64. Either they could be accommodated with static translation rules, or they could be blocked until they redesign their network.
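A minimal sketch (Python) of the multi-level aggregation just described: each new NAT state is counted against the client's /64, /56, /48 and /32, so a spike at any aggregation level can trigger an alert or a temporary block. The thresholds are placeholder values.

~~~
# Count NAT state creation per aggregate prefix of the client source.
from collections import Counter
import ipaddress

LEVELS = (64, 56, 48, 32)
THRESHOLDS = {64: 10_000, 56: 50_000, 48: 200_000, 32: 1_000_000}  # states/interval
counts = Counter()

def record_state(client_v6):
    """Count one new NAT state against every aggregate of the source."""
    addr = ipaddress.ip_address(client_v6)
    for plen in LEVELS:
        agg = ipaddress.ip_network((addr, plen), strict=False)
        counts[agg] += 1
        if counts[agg] > THRESHOLDS[plen]:
            print(f"ALERT: {agg} exceeded {THRESHOLDS[plen]} states")

record_state("2001:db8:aaaa:bbbb::1")
~~~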
## Pool selection

A malicious user may decide to connect to a specific NAT64 translator of their choice, by manually selecting the translation prefix rather than using the one returned by DNS64. A user in America, for example, might choose a NAT64 in Asia and thereby make their (translated) traffic appear to come from Asia. There is very little which can be done about this, but there is also little benefit to the attacker, as long as the source mapping is made available publicly and in real time.

An attacker might also use this to try to stay below the detection thresholds of brute-force login detectors, by dividing attack traffic between many global NAT nodes. This might be detectable by cross-referencing active prefix information between translators, to identify cases where the same prefix is in use on many translators at once. But it may not be worth the effort, and it is liable to false alarms from genuine monitoring and network research activities.

## IXP selection

There is a similar risk of abuse if the IXP translator has IPv6 transit available, and a third party (not present at the exchange) explicitly chooses to send traffic to the IXP translator's IPv6 range. A small amount of such use could be useful for remote testing and debugging, but high-volume use could not be tolerated, and would therefore have to be carefully monitored.

Such abuse would be of limited value if the IXP translator has no IPv4 transit - people are unlikely to statically configure the IXP translator if it can only reach things at the IXP, and not the rest of the IPv4 Internet. However, there are some cases where having some IPv4 transit would be useful:

* The peer temporarily drops their IPv4 connection to the IXP, but there are ongoing sessions
* The translator has added a new IPv4 block and is announcing it to its peers, but some peers have not yet updated their inbound filters to accept it

In both cases, traffic would need to take a transit path until the problem was fixed. Having the transit available gives an opportunity to detect and fix the problem without it being service-affecting.

Since the translator node knows from its BGP ingress filters what routes it may learn, it can report on traffic which does not match the known (IPv6, IPv4) pairs, and on whether that traffic originates from an IXP member or from transit. It would be reasonable to have an ACL which blocks all IPv6 sources except those belonging to known peers and trusted management addresses, and all destinations except translated peer addresses. At the least, this ACL could be prepared in advance, ready to apply in case of transit spikes.

## Source address (and/or source port range) selection

In an attempt to evade tracing, a miscreant may try to rotate between multiple addresses in the same translator pool, or even to pick a specific address to make their traffic appear similar to that of another user.

For example: suppose the NAT uses a hash of the top 64 bits of the IPv6 address to select a translated IPv4 address. If the attacker has control of a routed /48 prefix, they can use up to 65,536 different pool addresses by rotating through all 2^16 possible /64 subnets. They may aim to intentionally collide with the address/port of a trusted user. If the IPv4 pool information and hash algorithm are public knowledge, this could be done off-line without generating any traffic (the sketch below illustrates the mapping).

As far as possible the NAT64 will try to keep users of the same IPv4 address on different port ranges; but even if this is not possible, it will still be recorded that both the attacker's IPv6 /64 and the bystander's IPv6 /64 were using that address and port range.

## Port exhaustion

Users may exhaust the available source ports at the translator. However, they can only affect source ports in their own range, so this would be a denial of service only against themselves (plus any other user who happens to be mapped to the same address/port range).

A user with a routed /48 would be able to exhaust 65,536 different address/source-port-range combinations, and therefore might be able to affect other users who were mapped to those ranges. Limiting the number of ports usable by each /64 prefix to a fraction of the total port range helps.
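A minimal sketch (Python) of the hash-based mapping discussed under "Source address selection" above: the client's /64 deterministically selects a pool address and a port range. The pool size, the 16-way port split and the use of SHA-256 are all illustrative assumptions.

~~~
# Deterministic (pool address, port range) selection from a client /64.
import hashlib
import ipaddress

POOL = [f"203.0.113.{i}" for i in range(256)]   # example /24 pool
RANGES_PER_ADDR = 16                            # 4096 ports per range

def mapping_for(client_v6):
    """Deterministic (pool address, port range) for the client's /64."""
    top64 = ipaddress.ip_address(client_v6).packed[:8]
    h = int.from_bytes(hashlib.sha256(top64).digest()[:4], "big")
    addr = POOL[h % len(POOL)]
    slot = (h // len(POOL)) % RANGES_PER_ADDR
    lo = slot * 4096
    return addr, (lo, lo + 4095)

# An attacker holding a /48 can enumerate all 2^16 child /64s off-line
# and learn every (address, port range) their traffic could occupy.
print(mapping_for("2001:db8:aaaa:1::1"))
print(mapping_for("2001:db8:aaaa:2::1"))
~~~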
## Unnecessary use of translator service by dual-stack hosts

There is no reason for a dual-stack host to make use of the NAT64 service, apart from testing and research purposes. It could be argued that:

* Public IPv6 tunnel brokers SHOULD block access to the public NAT64 ranges
* Dual-stack server-hosting companies SHOULD block access to the NAT64 ranges from hosted machines (except if they are providing IPv6-only hosting, e.g. for VMs)
* NAT64 nodes SHOULD block access from Teredo and any other ranges which are necessarily used by users with IPv4

However, in the early stages this may hamper adoption, and so it would be better not to do this until or unless necessary.

## Unnecessary translation of DNS

In general, the translator should offer to translate as many types of traffic as possible, which primarily means TCP and UDP. Translating ICMP echo-request would also be worth doing: it means that an end-user on an IPv6 connection with a CLAT device could type `ping 8.8.8.8` and get a response. For most people, this means "the Internet is working".

However, there is one exception, which is DNS. DNS query/response exchanges are short and generate a NAT state for each one. If IPv6 end-users were to incorrectly configure (say) 8.8.8.8 as their DNS cache, and talk to it via 464XLAT, it would result in a huge amount of unnecessary translation state activity. Therefore it is better to block TCP and UDP port 53 completely, forcing them to select a suitable DNS64 cache which is reachable over IPv6 instead. Furthermore, any device with a CLAT is supposed to run its own DNS proxy anyway.

If an end-user is running their own DNS cache *on an IPv6-only network*, then they will want to be able to reach IPv4-only authoritative DNS servers. Arguably, if they have enough clue to run their own DNS cache with DNS64 translating functionality, they have enough clue to dual-stack their network. It would be simpler for the end-user to configure their cache to forward all queries to an upstream DNS64 cache (which also avoids the need for them to run any special DNS64 software).

Rather than just blocking UDP/TCP port 53, it is also possible to run a fake authoritative DNS server on the translator IPv6 prefix ranges, responding with a canned answer pointing to a web server which explains how the user's DNS is misconfigured.

## whois rate limiting

If some sort of "whois" service is provided to query historical mapping data, it would be wise to rate-limit it per /64 (IPv6) and per /32 (IPv4), as in the sketch below.
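A minimal sketch (Python) of such per-prefix rate limiting: a token bucket keyed on the querier's /64 (IPv6) or /32 (IPv4) aggregate. The rate and burst values are placeholders.

~~~
# Token-bucket rate limiting keyed on the querier's aggregate prefix.
import time
import ipaddress

RATE, BURST = 1.0, 10.0   # tokens/second, bucket size (illustrative)
buckets = {}              # aggregate prefix -> (tokens, last refill time)

def allow_query(source):
    """Debit one token from the source's /64 or /32 aggregate bucket."""
    addr = ipaddress.ip_address(source)
    plen = 64 if addr.version == 6 else 32
    key = ipaddress.ip_network((addr, plen), strict=False)
    tokens, last = buckets.get(key, (BURST, time.monotonic()))
    now = time.monotonic()
    tokens = min(BURST, tokens + (now - last) * RATE)
    if tokens < 1.0:
        buckets[key] = (tokens, now)
        return False
    buckets[key] = (tokens - 1.0, now)
    return True

print(allow_query("2001:db8:aaaa:bbbb::1"))  # True until the burst is spent
~~~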
## Spam

Spam is a perennial problem on the Internet, and many anti-spam mechanisms rely on IP reputation sources. The NAT64 is (or should be) a high-reputation source, and is therefore an attractive resource for spammers to abuse. The translator may have to make special provisions for dealing with TCP port 25, for example:

* Making use of reputation lists itself (if/when such lists exist for IPv6)
* Transparently redirecting port 25 to an SMTP relay/proxy which performs filtering and rate limiting (this may have issues with TLS)
* Blocking it entirely, so that users are forced to submit mail to their own ISP's outbound mail relay on port 587 with authentication. This is probably the simplest and best starting point; many ISPs already block port 25 usage by their own customers.

## Privacy

This has been covered earlier. A user connecting to a remote service has no expectation of source address privacy in their interaction with that service. But information released to the wider Internet should be at the level of /64 prefixes, and with coarse enough time granularity that it cannot be used to determine exactly when those prefixes start and stop using the Internet. It is expected this will satisfy the majority of users; anyone with higher privacy demands can build their own NAT64 translator.

## Confidentiality of routing information

It needs to be considered whether the eBGP feed from IXPs constitutes "commercially sensitive information" (despite much of it being available in public looking-glasses anyway). It does not show that X peers with Y; only that X and Y both peer with the translator. However, if it is considered sensitive, then there will need to be agreements in place between the providers and users of that information.

# Gratuitous Aphorism

> "Let us think the unthinkable, let us do the undoable, let us prepare to
> grapple with the ineffable itself, and see if we may not eff it after
> all."
>
> -- Douglas Adams, _Dirk Gently's Holistic Detective Agency_

# Additional References

* RFC6052: IPv6 Addressing of IPv4/IPv6 Translators
* RFC6144: Framework for IPv4/IPv6 Translation
* RFC6145: IP/ICMP Translation Algorithm
* RFC6146: Stateful NAT64
* RFC6147: DNS64
* RFC6877: 464XLAT
* RFC7225: NAT64 prefix discovery with PCP
* RFC7269: NAT64 deployment options and experience

Location-aware DNS:

NAT types: