being concerned about the correctness of the iBGP routing, the recent days were really exciting. Certainly motivated by the upcoming IETF meeting, two new drafts focussing the correctness of the iBGP routing became available. These are the "Stable iBGP Decision Process with Route-Reflection" (available since yesterday) and the "BGP Optimal Route Reflection (BGP-ORR)" (available since 2010/10/16) drafts.
Let's have a closer look at the former draft first: The basic goal of the authors, Jamak et al., is to force stable routing decisions in ASs that implement BGP Route Reflection. To reach this, the authors proposed to prefer paths learned from close Route Reflection clusters over those learned from distant clusters. Technically speaking, they proposed to evaluate the CLUSTER_LIST length right after the LOCAL_PREF, the AS-path length, and the Origin Number (i.e. right before the MED attribute, cf. RFC4271, section 126.96.36.199). In principle, this implements the necessary condition Griffin et al. specified in 2002 to ensure forwarding correctness. So, the basic idea is well known in the scientific community and the resulting property (that is that the routing converges) is formally verified.
Even if convergence independent of any network design rules is an important property, the concept comes along with two fundamental disadvantages that - from my point of view - outweigh the advantages:
Firstly, hot-potato routing is "subverted". Let assume that as shown in figure 1, two paths p and q to the destination exist. Both paths specify the same LOCAL_PREF, AS-path length, Origin Number, peer-AS, and MED. In this case, BGP usually implements hot-potato routing, meaning that traffic is forwarded via the shortest path out of the AS (here path q). In the example I sketched below, the shortest path is the cluster-external path q from v's point of view. Since the cluster-internal paths are preferred, traffic is forwarded via path p. Unnecessary IGP costs are induced.
Secondly, ASs loose the ability to specify primary and secondary exit-/entry-points by means of the MED attribute. If the primary path for a prefix is cluster-external while a backup path for the same prefix is cluster-internal (both paths specify equivalent global path attributes), the backup path is preferred. Obviously, this is highly unwanted. To avoid this problem, Jamak et al. proposed to make use of communities, but realizing this in practice may be difficult in general.
|Figure 1: Hot-potato routing does not work.|
The second draft, BGP Optimal Route Reflection, is proposed by Raszuk et al.. In contrast to the former draft, the main goal of this proposal is to ensure that hot-potato routing is realized even if Route Reflection is used. Using classical Route Reflection, a reflector provides its clients with its best path. If the topological position of a client differs significantly from the topological position of its reflector, this method may cause situations where routers do not learn the closest exit-point.
In principle, this problem could be solved by using Add-path to advertise all available exit-points to a client. However, as this scheme may cause serious scalability problems in practice, Raszuk et al. proposed to provide clients only with information on their closest exit-point.
Putting aside the principal idea, a very interesting aspect of this draft is the paradigm shift it specifies. While today, the routing information a router advertises is strictly chosen from its local point of view, this drafts recommends to implement a "receiver-based" selection of advertised routing information. In principle, this selection may differ from receiver to receiver. Thinking this idea through to the end, it leads us to a Route Server based iBGP routing architecture (slides), in which every router is provided with specific routing information that ensures optimal, consistent, and stable routing decisions.
Even if ensuring hot-potato routing is an interesting feature, Raszuk et al.'s proposals is only a small step towards inherently optimal routing decisions. Indeed, unnecessary internal forwarding costs are avoided, but traffic may still be forwarded via an exit-point that specifies a suboptimal MED for its peer-AS group. Traffic may be forwarded via backup path into an AS even if the primary exit-/entry-point is available.
Even if the new drafts can sadly not entirely solve the correctness problems of iBGP, I'm really happy to see that network protocol designers working for the global players start to think about the correctness of this protocol. Its still a long path, but I am sure we are on the right way!