
Re: Reliable internet site



 On Fri, Jan 26, 2001 at 01:07:00PM +0000, ETI - KOUROUMA Aboubacar wrote:
> Hi all,
>     We are installing our new internet access via VSAT. For this I need
> your ideas about the new configuration of my ISP site to ensure the
> scalability and the reliability of services.
>     Actually, we have a 64 kbps link with a cisco 2511 connected by
> leased line to the Telco network (Sotelgui). From the telco we have a
> 206.98.254.128/26 Ip addresses. In a few days (6 days) we will have a
> satellite link with 512 k / 256 k bandwidth. From this satellite link we
> will get a new /24 IP addresses. We will keep active the link between us
> and the telco as back up link and all traffic to 206.98.254.0/24 will be
> routed on this link  . The default route be the VSAT link. See the
> attached .doc file.
>     How can I configure my site to ensure scalable internet services? I
> mean can I use one cisco router or I have to configure more (I have in
> mind the OSPF protocol with two routers) ?
>     Hope read you soon.
>     Cheers,
> Aboubacar.

Excellent question; I am copying this reply to a couple of lists as there
may well be people in similar situations who have come up with different
approaches which may be useful to you.

There are a few comments and suggestions I can offer. To save some typing, I
will call your 206.98.254.128/26 network "A" and your other network "B", and
I'll give it IP range 192.0.2.0/24 for now.

Whatever you do with this setup, you need to realise you won't get "full"
resilience: that is, if someone is using an IP address of 206.98.254.129,
and the link to your Telco fails, they will be unable to continue existing
TCP sessions by redirecting their packets over the satellite link. This is
because _inbound_ packets to 206.98.254.128/26 will always come in on the
Telco link, and inbound packets to 192.0.2.0/24 will only come in over the
satellite link. You will not get this level of resilience until you have
your own CIDR address range (probably a /19 or /20) and are announcing it to
your upstream providers via BGP, who in turn announce it to the rest of the
Internet.
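
To make the point about inbound traffic concrete, here is a small Python
sketch (purely illustrative, not something you would run on a router). The
function names are made up for this example, and 192.0.2.0/24 is still only
a stand-in for the real network B range. It assumes each address block is
announced by exactly one provider, as described above.

```python
import ipaddress

# Address blocks from the text; 192.0.2.0/24 is a placeholder for network B.
NET_A = ipaddress.ip_network("206.98.254.128/26")  # reached via the Telco
NET_B = ipaddress.ip_network("192.0.2.0/24")       # reached via the satellite


def inbound_link(dst):
    """Return the only link on which packets addressed to dst can arrive."""
    addr = ipaddress.ip_address(dst)
    if addr in NET_A:
        return "telco"
    if addr in NET_B:
        return "satellite"
    return None  # not one of our addresses


def reachable_from_internet(dst, links_up):
    """True if the single link serving dst is among the working links."""
    link = inbound_link(dst)
    return link is not None and link in links_up
```

With only the satellite link up, reachable_from_internet("206.98.254.129",
{"satellite"}) is False: the spare link cannot help an address that the rest
of the Internet only knows how to reach via the Telco.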

Having said that, you are still in a good position and you can get a very
good level of resilience - you will just need to make use of some of the
application-layer failover features which are already built into protocols
such as SMTP and DNS.

Firstly, I'd say you want to make sure that at least _one_ of your links is
always able to carry traffic. That means not putting both links into the
same router, and not using the same hub or switch for your whole network.

This is easily arranged: just keep the two connections entirely separate by
using a second router for your satellite connection. Then if one router or
hub fails, the other link is not affected.

           Telco                     Satellite
             |                           |
             |                           |
             RA                          RB
             |                           |
         ----+------------            ---+--------
         206.98.254.128/26            192.0.2.0/24
           (Network A)                 (Network B)

[There is an extra advantage here: it gives you symmetrical routing. That
is, packets both to AND from 206.98.254.128/26 all go via the Telco link, and
packets to AND from 192.0.2.0/24 go via the Satellite. If you had a single
router, as in your diagram, the problem is: which provider do you point
defaultroute at? To achieve symmetrical traffic flows you would need to use
"policy routing" on your single router, which means using a different
next-hop depending on the _source_ IP address of each packet. If you have
two routers, then anyone on network A just points defaultroute at RA, and
anyone on network B points defaultroute at RB]
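
The idea behind "policy routing" can be sketched in a few lines of Python.
This is only a model of the decision, not router configuration; the function
names are invented for the example and 192.0.2.0/24 remains a placeholder.

```python
import ipaddress

NET_A = ipaddress.ip_network("206.98.254.128/26")
NET_B = ipaddress.ip_network("192.0.2.0/24")  # placeholder for network B


def outbound_link_policy(src):
    """Policy routing: pick the uplink from the packet's SOURCE address."""
    addr = ipaddress.ip_address(src)
    if addr in NET_A:
        return "telco"
    if addr in NET_B:
        return "satellite"
    raise ValueError(f"{src} is not on network A or network B")


def is_symmetric(src, outbound_link):
    """Replies come back on the link serving src; compare with the way out."""
    return outbound_link == outbound_link_policy(src)
```

A single router with one defaultroute via the satellite would give
is_symmetric("206.98.254.130", "satellite") == False: the packet leaves by
satellite but the reply returns via the Telco. Two routers give you the
source-based choice for free, because each network has its own defaultroute.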

If you are using UPSs or other power filtering, you would be well advised to
put Network A's equipment on one UPS and Network B's equipment on a
different one, so you are protected against a UPS failure.

Now, this will work just fine. The only problem is, if a machine on Network
A wants to transfer data to a machine on Network B, it will go all the way
via the Internet and back... which will be very slow and waste bandwidth on
your expensive links.

To fix this, you need a short-cut path between the two networks. If this was
between two different providers this would be called "peering" and you would
use BGP, but since it's all your own network you can just run OSPF. The most
resilient way to do this is with a third router, with two ethernet ports.
This could even be a FreeBSD or Linux PC running gated.

           Telco                     Satellite
             |                           |
             |                           |
             RA                          RB
             |                           |               ^
         ----+------------ RC -----------+--------      OSPF
         206.98.254.128/26            192.0.2.0/24       v
           (Network A)                 (Network B)

Enable OSPF on each of the interfaces, and remember "redistribute connected
subnets" so they tell the other routers about the networks they are
connected to. On RA and RB you can announce defaultroute, like this:

router ospf 1
  network 206.98.254.128 0.0.0.63 area 0  ! RA's LAN; on RB: 192.0.2.0 0.0.0.255
  passive-interface serial0         ! Must NOT talk OSPF to Telco/Satellite!
  redistribute connected subnets
  redistribute static subnets
  default-information originate metric 100
ip route 0.0.0.0 0.0.0.0 serial0    ! Defaultroute via Telco/Satellite

If the uplink fails at layer 1 or layer 2, the static default route will be
withdrawn, and the router will in turn stop announcing it into OSPF. [You
can get slightly better detection of a failed link by getting your upstream
provider to announce a defaultroute to you using BGP, but this probably
isn't worth the effort. Use Cisco HDLC or PPP keepalives to monitor the link]

Now, if a machine on network A pings a machine on network B, it will first
go to its defaultroute, i.e. router RA. Because of OSPF, this will have
learned the route to B via RC, so the packet goes there. Then RC will
deliver it to the machine on B.

Because this is one hop more than the 'optimal' route, RA will generate an
ICMP redirect to try and tell the machine on A the better route. This can
cause more problems than it's worth, so I strongly recommend you turn off
ICMP redirects on all interfaces on all routers - and also proxy ARP.

interface ethernet0
  no ip redirects
  no ip proxy-arp

Now, you say that you want all traffic to 206.98.254.0/24 to go via the
Telco link, presumably because these machines are all within your country
and reached via the Telco. This is now easy to arrange: on router RA only,
add a static route.

ip route 206.98.254.0 255.255.255.0 serial0

This will be learned by the other routers via OSPF. So even if a machine on
Network B pings 206.98.254.1, the packet will be routed out via RC and RA.
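
The reason this works is longest-prefix matching: the /24 static route is
more specific than the default route, so it wins. Here is a rough Python
model of the lookup. The table is a guess at what RB's routing table might
contain once OSPF has converged (the next-hop labels are descriptive strings
only, and 192.0.2.0/24 is still a placeholder).

```python
import ipaddress

# Assumed view of RB's routing table: (prefix, next hop).
RB_ROUTES = [
    ("0.0.0.0/0", "satellite uplink"),         # RB's own static default
    ("192.0.2.0/24", "ethernet (network B)"),  # directly connected
    ("206.98.254.128/26", "RC"),               # network A, learned via OSPF
    ("206.98.254.0/24", "RC"),                 # RA's static route, via OSPF
]


def lookup(dst, routes=RB_ROUTES):
    """Return the next hop of the most specific route that contains dst."""
    addr = ipaddress.ip_address(dst)
    matches = []
    for prefix, next_hop in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net:
            matches.append((net.prefixlen, next_hop))
    # The default route always matches, so matches is never empty here.
    return max(matches)[1]
```

In this model lookup("206.98.254.1") returns "RC", so the packet heads
towards RA and the Telco, while an unrelated address such as 198.51.100.9
falls through to the default and leaves via the satellite.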

Unfortunately, we can't control the path the packet takes on the way back.
When machine 206.98.254.1 sends a packet to 192.0.2.1, it will hit the Telco
network which will send it out via the Internet, i.e. the "long way". If you
want to fix this, you would have to get Telco to add a static route to
network B on _their_ router - or set up proper peering using BGP.

--------------------------------------------------------------------------

Right, so now you have two networks, linked together to optimise the traffic
flow. But still, if one link fails, half your systems go down. How can you
use this to improve your service?

Well, you need to work with the built-in resilience mechanisms that exist in
application-level protocols.

(1) DNS primary and secondary

If you are running your own authoritative DNS, put the primary on Network A
and the secondary on Network B.

Then, if one of your links fails, the outside world will still be able to
reach one of your nameservers.

Note that renumbering a nameserver is not trivial - if you have lots of
zones, which have 'glue' records in the zones above them, you will have to
do some extra work.

(2) Mail delivery using MX records

You can arrange that incoming mail will continue to arrive, even if one of
your links fails, by using different-priority MX records and a backup mail
relay.

example.com.	MX  10  mail1.example.com.
		MX  20  mail2.example.com.

mail1           A   192.0.2.1
mail2           A   206.98.254.129

In this simple example, mail1 is your 'main' mailserver, e.g. the one which
holds POP3 accounts. Any mail to user@example.com will be delivered to this
machine if the satellite link is up, because it has the lowest MX preference
value (the lowest number is tried first).

But if the satellite link is down, the machine on the Internet which is
sending to you will try it and fail. It will then fall back to using the
second MX record, and deliver to mail2.

If mail2 is configured properly, it will accept the mail and relay it to
mail1 (which it talks to via router RC). So mail will flow normally.
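
The sending side's behaviour can be sketched roughly as follows. This is a
simplified Python model with invented function names; a real MTA also
randomises between records of equal preference and queues the message for a
later retry when every host fails.

```python
def delivery_order(mx_records):
    """Order hosts by MX preference; the lowest value is tried first."""
    return [host for preference, host in sorted(mx_records)]


def choose_mx(mx_records, reachable):
    """Return the first host in preference order that can be reached."""
    for host in delivery_order(mx_records):
        if host in reachable:
            return host
    return None  # nothing reachable: the sender keeps the mail queued


# The MX records from the example zone above.
MX = [(10, "mail1.example.com."), (20, "mail2.example.com.")]
```

With both links up, choose_mx(MX, {"mail1.example.com.",
"mail2.example.com."}) picks mail1. With the satellite down only mail2 is
reachable, so the same call with {"mail2.example.com."} picks mail2.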

[You could choose to make mail1 and mail2 the same machine, with two network
interfaces. But this is awkward to make work because it needs to learn which
defaultroute to use, depending on which outside links are up and down. So
you would end up having to run gated on your mail server... and you are not
protected against the machine itself failing. So for these reasons, I
recommend you have two separate mailservers instead]

(3) Outgoing mail delivery

If your customers send their outgoing mail via mail1, it will normally
deliver directly out to the Internet. But what if the satellite link goes
down?

You can configure mail1 to try to deliver normally, and if it fails, to
relay via mail2. Then mail2 can try again to deliver the message, using the
link to Telco.

Configuring this sort of fall-back scenario is, I think, possible in Exim.

But there is a neater solution:

(4) All outgoing UDP/TCP connections

The problem with outgoing mail, in (3), is that the mail server has an IP
address on the 192.0.2.0 network. If the satellite link fails, then actually
outgoing IP packets _will_ be routed via RC and RA, because RA is announcing
defaultroute. But the return packets will never come back, because they can
only come down the (failed) satellite link.

There is a solution here: configure Network Address Translation (NAT) on
router RC, for packets which pass through it. You would need a ruleset which
says:

 "If I am forwarding a packet with a destination address of 192.0.2.0/24 or
 206.98.254.128/26, do not perform any NAT.
 Otherwise, if the source address is 192.0.2.0/24, NAT using my Network A
 interface address. If the source address is 206.98.254.128/26, NAT using my
 Network B interface address."

I don't know how to configure this on a Cisco, as I have never used Cisco
NAT. It's very straightforward under Linux though, something like the
following although I've not checked the exact syntax:

  ipchains -A forward -d 192.0.2.0/24 -j ACCEPT        # fwd but don't NAT
  ipchains -A forward -d 206.98.254.128/26 -j ACCEPT   # fwd but don't NAT
  ipchains -A forward -s 192.0.2.0/24 -j MASQ          # fwd and NAT
  ipchains -A forward -s 206.98.254.128/26 -j MASQ     # fwd and NAT

Then any machine (or modem) with a network B IP address still _will_ be able
to access the Internet via the Telco link; the source address of each packet
will be changed using NAT to a network A address, and so the responses will
come back via Telco to network A. So you don't even need solution (3); your
mailserver will be able to send outgoing SMTP anyway.
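
If it helps to check the logic of the NAT ruleset before committing it to a
router, the decision can be written out as a small Python function. This is
a sketch only: the action names are invented, 192.0.2.0/24 is the placeholder
for network B, and the final fall-through case (a source on neither network)
is my assumption, since the ruleset above does not cover it.

```python
import ipaddress

NET_A = ipaddress.ip_network("206.98.254.128/26")
NET_B = ipaddress.ip_network("192.0.2.0/24")  # placeholder for network B


def nat_action(src, dst):
    """Decide what RC does with a packet it is forwarding."""
    s = ipaddress.ip_address(src)
    d = ipaddress.ip_address(dst)
    if d in NET_A or d in NET_B:
        return "forward"   # internal traffic: never translate
    if s in NET_B:
        return "nat-to-A"  # leaving via RA: use RC's network A address
    if s in NET_A:
        return "nat-to-B"  # leaving via RB: use RC's network B address
    return "forward"       # assumption: unknown sources pass untouched
```

So nat_action("192.0.2.1", "198.51.100.25") gives "nat-to-A", which is the
case that rescues outgoing mail when the satellite link is down, while
nat_action("192.0.2.1", "206.98.254.129") gives "forward".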

What it _can't_ fix is incoming connections. So if you have

www.example.com.   A 192.0.2.4

in the DNS, and your satellite link fails, then users on the Internet will
be unable to reach www.example.com. If that's important to you, then host
your website outside the country, or at least a mirror of it.

But this does show it is possible to host the most important IP services in
a very resilient fashion using your two links, namely:
  - E-mail (inbound and outbound)
  - Web surfing (outgoing HTTP connections) and other TCP/UDP services
  - DNS

If you wish, you can have a degree of 'load balancing' between your two
links, by choosing which machines/modems are plugged into network A, and
which into network B.

Hope this is useful,

Brian.

Network.doc