
RE: CPE router acting as host on its WAN interface (RE: draft-ietf-v6ops-ipv6-cpe-router-03.txt WGLC)



> Not quite; the OS seems to clear *FIB entries* based on the setting of
> the IsRouter flag in the neighbor cache entry corresponding to the
> nexthop. The OS does not clear entries in the nbr cache.

From RFC 4861:

   "Router Solicitations in which the Source Address is the unspecified
   address MUST NOT update the router's Neighbor Cache; solicitations
   with a proper source address update the Neighbor Cache as follows.
...
   Whether or not a Source Link-Layer
   Address option is provided, if a Neighbor Cache entry for the
   solicitation's sender exists (or is created) the entry's IsRouter
   flag MUST be set to FALSE."

> But, if the CE router subsequently sends an NA message with the R bit
> (i.e., the Router bit) set to TRUE, the SP router will set IsRouter in
> the nbr cache entry to TRUE and the danger of FIB entry deletion is
> averted.
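
To make the RS/NA interaction concrete, here is a minimal sketch of how
the SP router's Neighbor Cache entry for the CE Router could change.
The class and method names are hypothetical and are not taken from any
real stack. The model ignores reachability state and the Override flag,
and it assumes an entry is created only when the RS carries a Source
Link-Layer Address option.

```python
from dataclasses import dataclass, field
from typing import Dict

UNSPECIFIED = "::"  # the IPv6 unspecified address


@dataclass
class NeighborEntry:
    is_router: bool = False


@dataclass
class NeighborCache:
    # Hypothetical model: maps a neighbor's address to its cache entry.
    entries: Dict[str, NeighborEntry] = field(default_factory=dict)

    def on_router_solicitation(self, src: str, has_slla: bool) -> None:
        # An RS from the unspecified address must not update the cache.
        if src == UNSPECIFIED:
            return
        # Assumption: create an entry only if the RS carries an SLLA option.
        if has_slla and src not in self.entries:
            self.entries[src] = NeighborEntry()
        entry = self.entries.get(src)
        if entry is not None:
            # Per the RFC 4861 text quoted above: IsRouter is set to FALSE.
            entry.is_router = False

    def on_neighbor_advertisement(self, target: str, router_flag: bool) -> None:
        # Simplified: an NA for an existing entry copies its R bit into
        # IsRouter; an NA with no matching entry is ignored.
        entry = self.entries.get(target)
        if entry is not None:
            entry.is_router = router_flag


cache = NeighborCache()
cache.on_router_solicitation("fe80::ce", has_slla=True)
after_rs = cache.entries["fe80::ce"].is_router   # IsRouter after the RS
cache.on_neighbor_advertisement("fe80::ce", router_flag=True)
after_na = cache.entries["fe80::ce"].is_router   # IsRouter after the NA
```

In this model the entry is a router again only after the NA arrives,
which is the window in which the FIB entry is at risk.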

Well, the CE Router may need to receive an RA in order to know how to do
address acquisition on its WAN interface (doing SLAAC/DHCP, etc.).
Waiting for a periodic RA may not be feasible in some deployments, so a
CE Router MAY send an RS in order to increase the chances of receiving
an RA in a timely manner.  We don't want to block CE Routers from ever
sending RS's on their WAN interface.

Garbage collecting the FIB entries based on IsRouter value in the
Neighbor Cache is not specifically prohibited by RFC 4861 - so we're not
talking about a non-compliance issue.  
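
As a rough illustration of the behavior under discussion, the sketch
below models a FIB garbage collector that drops routes whose next hop is
marked IsRouter FALSE in the Neighbor Cache. It is not the Linux code;
the names Route and gc_fib are hypothetical, and keeping routes whose
next hop has no Neighbor Cache entry at all is an assumption made only
for this example.

```python
from typing import Dict, List, NamedTuple


class Route(NamedTuple):
    prefix: str
    nexthop: str


def gc_fib(fib: List[Route], is_router: Dict[str, bool]) -> List[Route]:
    """Return the routes that survive a GC pass keyed on IsRouter.

    `is_router` maps a neighbor address to the IsRouter flag of its
    Neighbor Cache entry. Routes whose next hop has no entry are kept
    (an assumption of this sketch). The Neighbor Cache is not modified.
    """
    return [r for r in fib if is_router.get(r.nexthop, True)]


fib = [Route("2001:db8:1::/48", "fe80::ce")]
pruned = gc_fib(fib, {"fe80::ce": False})  # CE Router sent an RS: route lost
kept = gc_fib(fib, {"fe80::ce": True})     # CE Router sent an NA with R=1
```

Note that only the FIB shrinks here; the Neighbor Cache entry itself
stays in place, which matches the distinction made earlier in the
thread.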

From RFC 4861:

    "To limit the storage needed for the Destination and Neighbor
     Caches, a node may need to garbage-collect old entries.  However,
     care must be taken to ensure that sufficient space is always
     present to hold the working set of active entries.  A small cache
     may result in an excessive number of Neighbor Discovery messages
     if entries are discarded and rebuilt in quick succession.  Any
     Least Recently Used (LRU)-based policy that only reclaims entries
     that have not been used in some time (e.g., ten minutes or more)
     should be adequate for garbage-collecting unused entries.

     A node should retain entries in the Default Router List and the
     Prefix List until their lifetimes expire.  However, a node may
     garbage-collect entries prematurely if it is low on memory.  If
     not all routers are kept on the Default Router list, a node should
     retain at least two entries in the Default Router List (and
     preferably more) in order to maintain robust connectivity for
     off-link destinations."

And, sending a gratuitous NA after an RA solely for the purpose of
preventing Linux running on the SP from GC'ing the CE Router entry has
the problems that you've already identified, and seems like a hack:

> Two problems with this however. First, it requires the CE router to
> send a gratuitous NA message. Secondly, the CE router has no way of
> knowing if the SP router has received the NA message.

I think the only other options are to tell Linux "don't GC if IsRouter is
FALSE", which may not be an option if you run out of space, or to make
sure there's enough space that GC doesn't run more often than traffic
from the CE Router can be expected to keep the entries alive, which is
already recommended by RFC 4861:

    "However, care must be taken to ensure that sufficient space is
     always present to hold the working set of active entries."
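
For illustration only, the LRU-style policy that RFC 4861 describes
could be sketched as follows. The function name, the capacity parameter
and the dictionary of last-use timestamps are all hypothetical; the
ten-minute idle limit is the example value from the RFC text.

```python
from typing import Dict


def reclaim(last_used: Dict[str, float], now: float, capacity: int,
            idle_limit: float = 600.0) -> Dict[str, float]:
    """Return the entries that remain after one GC pass.

    Entries are considered oldest first and are reclaimed only while the
    cache is over `capacity` AND the entry has been idle for at least
    `idle_limit` seconds (600 s = ten minutes). Recently used entries
    are never reclaimed, even if the cache stays over capacity.
    """
    keep = dict(last_used)
    for addr in sorted(last_used, key=last_used.get):
        if len(keep) <= capacity:
            break
        if now - last_used[addr] >= idle_limit:
            del keep[addr]
    return keep


# Three entries last used at t=0, t=100 and t=900; GC runs at t=1000.
table = {"fe80::a": 0.0, "fe80::b": 100.0, "fe80::c": 900.0}
roomy = reclaim(table, now=1000.0, capacity=2)
tight = reclaim(table, now=1000.0, capacity=0)
```

With room for two entries only the oldest idle one goes; with no room
at all the entry used 100 seconds ago still survives, so a CE Router
that keeps sending traffic keeps its entry alive.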

I think we've analyzed the problem fully now.  From a specification
standpoint, I don't know what you want us to do.  From a practical
implementation standpoint, I think you know what your options are.

- Wes

-----Original Message-----
From: owner-v6ops@ops.ietf.org [mailto:owner-v6ops@ops.ietf.org] On
Behalf Of Templin, Fred L
Sent: Wednesday, January 06, 2010 12:08 PM
To: Hemant Singh (shemant); Fred Baker (fred); v6ops@ops.ietf.org
Cc: kurtis@kurtis.pp.se; rbonica@juniper.net
Subject: RE: CPE router acting as host on its WAN interface (RE:
draft-ietf-v6ops-ipv6-cpe-router-03.txt WGLC)

Hemant,

> -----Original Message-----
> From: Hemant Singh (shemant) [mailto:shemant@cisco.com]
> Sent: Wednesday, January 06, 2010 8:24 AM
> To: Templin, Fred L; Fred Baker (fred); v6ops@ops.ietf.org
> Cc: kurtis@kurtis.pp.se; rbonica@juniper.net
> Subject: RE: CPE router acting as host on its WAN interface (RE:
> draft-ietf-v6ops-ipv6-cpe-router-03.txt WGLC)
> 
> Fred,
> 
> It's a well-known problem in Linux that the OS incorrectly combined 
> the Neighbor Cache and the Destination cache causing data forwarding 
> failures and incorrect on-link assumptions.  This problem you are 
> alluding to about the IsRouter is another bug in the Linux code as to 
> why the OS has FIB clearing entries in the Neighbor Cache?

Not quite; the OS seems to clear *FIB entries* based on the setting of
the IsRouter flag in the neighbor cache entry corresponding to the
nexthop. The OS does not clear entries in the nbr cache.

> The FIB is
> the Prefix List, the Destination Cache, and the Default Router List;
> the FIB should not touch the Neighbor Cache.  I do grant you an OS can
> independently garbage collect entries in the Neighbor Cache and the OS
> is also not non-compliant for ND if the OS deletes entries in the
> Neighbor Cache with IsRouter flag set to FALSE.  Note ND RFC 4861 does
> not say anything about garbage collecting entries in the Neighbor
> Cache with IsRouter flag set to FALSE.

No, I am not talking about garbage collecting *nbr cache* entries based
on IsRouter; I am talking about garbage collecting *FIB entries* which
can lead to loss of connectivity. I have said this a number of times
now. Wes said it in his message, too.

> Now, when anyone reports a bug to me, I try to ascertain the severity
> of the bug.  The issue you raise does not look severe to me.  It's a
> temporary problem that can fix itself.

Fix itself how? Once the FIB entry is gone there would need to be some
protocol for bringing it back and I don't see that specified anywhere.
And, unless the nbr cache entry IsRouter flag gets set to TRUE, the FIB
entry could just be garbage collected all over again resulting in the
same loss of connectivity.

> If an OS has this garbage
> collection nuance and the Neighbor Cache entry is deleted, when the 
> next packet needs to be sent to the node whose entry was deleted in 
> the SP rtr, ND address resolution will take place and resolve the 
> address causing the Neighbor Cache to be populated again. ND also 
> specifies the packet be held in a queue till the packet's destination 
> is resolved - so the SP rtr is not likely to drop any packets.

See above - it is FIB entry deletion and not nbr cache entry deletion
that concerns me.

> Wes already asked, what if the CE Rtr always sets the IsRouter flag in
> ND messages where this flag is possible to be set and that should take
> care of your Linux problem.  If the CE Rtr sends an NA, the CE Rtr
> will set the IsRouter flag to TRUE.

I already said this both in an off-list message and more recently
on-list. If the CE router sends an RS, then the SP router will set
IsRouter in its nbr cache entry for the CE router to FALSE. But, if the
CE router subsequently sends an NA message with the R bit (i.e., the
Router bit) set to TRUE, the SP router will set IsRouter in the nbr
cache entry to TRUE and the danger of FIB entry deletion is averted.

Two problems with this however. First, it requires the CE router to send
a gratuitous NA message. Secondly, the CE router has no way of knowing
if the SP router has received the NA message.

> Did we miss anything?

Yes, but I think I clarified it above?

Fred
fred.l.templin@boeing.com

> Thanks,
> 
> Hemant