IDR issues

January 31, 2011

RPKI-Based Origin Validation Operation - The Concept Behind

Today, a new version of the RPKI-Based Origin Validation Operation draft was published. This draft, and the fact that origin validation will become an important feature in the coming years, motivated me to sketch the basic idea and concepts behind it. This post is meant to complete the discussion on how to distinguish between misconfigurations and attacks that I started seven weeks ago.

The Basic Problem

As I already discussed in several different posts before, the Inter-Domain Routing we know and use today is highly susceptible to misconfigurations and attacks. In principle, every Autonomous System which is part of the global routing may intentionally or unintentionally inject incorrect routing information. For the affected prefixes, traffic may be hijacked from some parts of the Internet and forwarded to a wrong destination. Even if filters (removing routing information that is obviously incorrect) applied by provider ASs may limit the problem, today traffic hijacking cannot be avoided effectively. Authentication schemes that would allow verifying routing information exist theoretically but are not deployed in production systems for several valid reasons.

Common Prefix Hijacking Events

Analyses in different papers as well as global routing information, for example provided by the Route Views and RIPE RIS projects, show that traffic hijacking is a common problem in the global Internet. Hijacking events appear from time to time and affect the global routing to varying degrees. With tens of thousands of ASs dealing with hundreds of thousands of IP prefixes, such events seem to be pretty normal.
As already implied, on a high level of abstraction, prefix hijacking events can be classified into two groups: Traffic may be hijacked intentionally, for example to blackhole or eavesdrop on traffic destined for the real origin AS, or unintentionally, e.g. due to a local misconfiguration. While intended attacks may obscure the fact that traffic is redirected (which clearly complicates detection), misconfiguration is usually very easy to detect: An AS that leaks incorrect paths for prefixes of other ASs into the global routing specifies an incorrect origin AS. From a global point of view, multiple origin ASs can be observed for the same prefix, i.e. a so-called BGP Multiple Origin AS (MOAS) conflict is induced: Paths visible at some ASs specify one origin AS (w.l.o.g. the correct one), while paths visible at other ASs specify another origin AS (w.l.o.g. the incorrect one). Since most hijacking events today are caused by misconfiguration and are thus not obscured, they induce a MOAS conflict.
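To make the notion of a MOAS conflict concrete, here is a minimal Python sketch. It assumes that routes collected from several vantage points are available as (prefix, AS-path) pairs with the origin AS as the last path element; the function name, the prefixes, and the AS numbers are made up for illustration and are not taken from any real data set.

```python
from collections import defaultdict

def find_moas_conflicts(routes):
    """Return {prefix: set of origin ASs} for prefixes seen with more than one origin.

    `routes` is an iterable of (prefix, as_path) pairs; the last element of
    as_path is taken to be the origin AS.
    """
    origins = defaultdict(set)
    for prefix, as_path in routes:
        if as_path:  # ignore entries without a usable path
            origins[prefix].add(as_path[-1])
    return {prefix: asns for prefix, asns in origins.items() if len(asns) > 1}

# Hypothetical observations from different vantage points.
observed = [
    ("192.0.2.0/24", [64500, 64501, 64510]),   # legitimate origin 64510
    ("192.0.2.0/24", [64502, 64666]),          # leaked by AS 64666
    ("198.51.100.0/24", [64500, 64520]),
    ("198.51.100.0/24", [64503, 64520]),       # same origin, different path
]

conflicts = find_moas_conflicts(observed)
print(conflicts)
```

Note that the sketch only tells us that two origins exist for 192.0.2.0/24. It cannot say which one is correct, and it needs observations from more than one vantage point.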

RPKI-Based Origin AS Validation

As MOAS conflicts are almost exclusively visible from a global point of view, they do not seem to be an adequate indicator for routers to detect the common prefix hijacking events in real time. But even if they could be detected locally, a MOAS conflict does not allow a router to tell the correct announcement from the incorrect one. The situation is further complicated by the fact that MOAS conflicts may have valid reasons, for example multi-homed private ASs or anycast routing. This is the part where RPKI-based origin AS validation comes into play.
The idea behind origin AS validation is pretty simple. Based on a PKI, owners of IP-prefixes can specify the valid origin ASs for each prefix. Stored in a distributed architecture, this information can be requested by routers: the origin AS specified in an update message can be validated. This is usually implemented at the border of an AS.
Compared to more comprehensive solutions like Secure BGP, the main advantage of this approach is that it can be deployed step by step: The only real requirement is the PKI. Once such an infrastructure exists, routers that support the new functionality can start to validate origins. The validation process classifies paths according to their origin AS as valid, unknown (notFound), or invalid. While it seems reasonable to filter out invalid paths in the long term, the result of the validation process may at first only be used to define the local preferences. As the draft states, "until the community feels comfortable relying on RPKI data, routing on Invalid origin validity, though at a low preference, may be common". Prefixes without validation data can simply be routed as today, and routers that do not support origin validation simply work as today. A coordinated protocol update in the whole Internet is not necessary.
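The classification into valid, notFound, and invalid can be sketched in a few lines of Python. This is a simplified illustration, not the procedure of the draft: it assumes that the authorization data is available as (prefix, maximum prefix length, origin AS) triples, and the function name, the sample data, and the local preference values are hypothetical.

```python
import ipaddress

def validate_origin(prefix, origin_as, roas):
    """Classify an announcement as 'valid', 'invalid', or 'notFound'.

    `roas` is a list of (roa_prefix, max_length, authorized_as) triples.
    """
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_length, authorized_as in roas:
        roa_net = ipaddress.ip_network(roa_prefix)
        # Only entries that cover the announced prefix are relevant.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        if route.prefixlen <= max_length and origin_as == authorized_as:
            return "valid"
    # Covered by authorization data but no matching entry: invalid.
    return "invalid" if covered else "notFound"

# Hypothetical authorization data and local preference mapping.
roas = [("192.0.2.0/24", 24, 64510)]
LOCAL_PREF = {"valid": 200, "notFound": 100, "invalid": 50}

for prefix, origin in [("192.0.2.0/24", 64510),
                       ("192.0.2.0/24", 64666),
                       ("203.0.113.0/24", 64520)]:
    state = validate_origin(prefix, origin, roas)
    print(prefix, origin, state, LOCAL_PREF[state])
```

Mapping the result to a local preference instead of dropping invalid paths mirrors the cautious deployment strategy described above: invalid paths stay usable, but only as a last resort.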

Current Status of the Infrastructure

As already mentioned, the basic requirement to implement origin AS validation is the public key infrastructure. This infrastructure will be operated by the five Regional Internet Registries (RIRs). At the current point in time, we are in a "beta phase". AfriNIC, LACNIC, and RIPE NCC started their services at the beginning of this year, while APNIC has been offering this service for a while. Only the North American registry ARIN is not offering the service at the moment; however, they state that they will release the service "very early in the second quarter of 2011". Tools to validate the origin AS are also available, for example provided by RIPE NCC or (also using the RIPE tool) integrated in BGPmon.

Limits of the Concept

All in all, origin AS validation is a great step forward in solving the biggest problems of today's Inter-Domain Routing. However, it should be kept in mind that origin AS validation cannot prevent intended prefix hijacking: Hijackers may simply specify an invalid AS-path that ends with the real origin AS. To identify and avoid this kind of hijacking, other, more complex schemes are required.
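The limitation is easy to see in a toy example. The following Python sketch assumes a deliberately simplified authorization table (exact prefix match only) and made-up AS numbers; it only checks the last AS of the path, which is all that origin validation looks at.

```python
# Hypothetical authorization data: prefix -> set of authorized origin ASs.
AUTHORIZED = {"192.0.2.0/24": {64510}}

def origin_is_authorized(prefix, as_path):
    """Check only the origin (last AS in the path), as origin validation does."""
    return bool(as_path) and as_path[-1] in AUTHORIZED.get(prefix, set())

honest_path = [64500, 64501, 64510]
leaked_path = [64500, 64666]          # misconfiguration: wrong origin AS
forged_path = [64500, 64666, 64510]   # AS 64666 appends the real origin AS

print(origin_is_authorized("192.0.2.0/24", honest_path))
print(origin_is_authorized("192.0.2.0/24", leaked_path))
print(origin_is_authorized("192.0.2.0/24", forged_path))
```

The leaked path is rejected, but the forged path passes the check although AS 64666 has no real link to the origin AS.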
If you are more interested in the technical than in the conceptual details, you should have a look at the BGPmon blog.

December 11, 2010

Prefix Hijacking - How to Differ Between Misconfiguration and Intention?

Today, prefix hijacking events are mainly considered from a technical point of view, rarely from a political perspective. However, especially in the context of cyberwarfare, the OSI layer 8 perspective becomes more and more important. Considering a prefix hijacking event from this perspective, an important issue is the "intention" behind the event: Did we observe the impact of a simple configuration error, or did we fall victim to an intended attack? Even if most hijacking events we have observed so far can be traced back to misconfiguration, some events have already been associated with a deliberate attack in the past.

Reasons and Implications

Most hijacking events follow distinct patterns and show distinct indicators. However, even if these patterns, for example the number of MOAS conflicts an AS is involved in, usually make prefix hijacking easy to detect, they tell researchers little about the intention. It is not clear whether we observe a misconfiguration or an intended attack. The fact that the reasons for an event are usually hard to determine unambiguously gives opinion leaders room for interpretation or - being more critical - room to substantiate their individual positions: Attackers may cloud their intention by referring to misconfiguration, while media and politicians may inflate events to increase circulation or fan fears to reinforce their own arguments.

How to Differ Between Misconfiguration and Intention?

In principle, the best solution for the problem I sketched above would be to make use of techniques like Secure BGP (S-BGP) throughout the Internet. This would allow us to protect BGP against all likely misconfiguration and attack scenarios. However, S-BGP and comparable concepts are still far from being used in production systems, most probably because a full validation of global routing information is complex, resource intensive, and difficult to deploy globally. But as cyberwarfare will become a more and more realistic scenario in the future, we urgently need to become able to distinguish between attacks against the routing and misconfiguration.
Focusing on this subgoal, an interesting alternative to S-BGP seems to be BGP Prefix Origin Validation, a concept which is currently under discussion in the Secure Inter-Domain Routing working group. The basic idea behind the draft is not to sign the whole AS-path, but only the origin. This allows ASs to validate whether an AS originating a prefix is authorized by the prefix holder to do so. Even if this limited authentication cannot prevent all possible threats to inter-domain routing, it allows operators to detect the typical globally relevant configuration errors. In principle, only those wrong updates that specify the correct origin may remain undetected. If this is the case, i.e. if a hijacker specifies an invalid next-hop or even an invalid path, prefix hijacking is most likely not the result of a simple misconfiguration. An intended attack, or at least a very good explanation by the source of the hijacking event, can be expected.

Benefits of Prefix Origin Validation

All in all, a simple solution allowing operators to validate whether the origin is authorized to announce a prefix has two important advantages: Firstly, those prefix hijacking events that dominate today can be effectively detected without inducing the problems that comprehensive solutions come along with. Secondly, it prevents AS operators from being falsely accused of stealing Internet traffic with intent. Both aspects are highly relevant from a technical and political perspective, which argues for the solution - even if it cannot address all relevant threats.

UB

November 27, 2010

BGP Path Selection - What's the right perspective?

Using classical BGP as specified in RFC4271, the routing information a router learns from its neighborhood depends on the perspective of its BGP peers. The peer speakers provide those paths they have chosen as best path and use for traffic forwarding. Simply speaking, we can say that BGP implements a sender-based selection of advertised routing information.

Receiver-based Selection of Advertised Routing Information

A few weeks ago, I wrote about “BGP Optimal Route Reflection”, a new draft that was published a few days before the 79th IETF meeting in Beijing. In principle, the draft proposes to combine classical Route Reflection with a receiver-based selection of routing information: Instead of advertising their own best paths, reflectors shall advertise the best known path(s) according to the topological position of the client. Generally, every client may be provided with different information. Today, I read a post in the blog of Cristel Pelsser, another researcher who is working on solutions for the iBGP anomaly problems. In her post, she describes a new concept of distributed Route Servers that provide routers with customized routing information matching their topological position. Similar to the centralized iBGP Route Server architecture we proposed in 2009, this scheme implements a receiver-based selection of advertised routing information. With now at least three schemes that implement a receiver-based selection of advertised routing information, it seems that this idea attracts the interest of more and more protocol designers and researchers. Thus, let’s have a closer look at the pros and cons of the basic idea.

Advantages of a Receiver-based Selection of Advertised Paths

If iBGP is realized via a full mesh, a router is guaranteed to learn a path that optimizes its traffic forwarding costs (the formal proof may be found in our KIVS 2011 paper). If the routing information is reduced by means of Route Reflection (or AS Confederations), this property is lost in general. To avoid problems at this point, the routing decision of a Route Reflector must reflect the local views of its clients. This usually limits the topological size of the clusters, which forces Network Operators to set up a high number of reflectors in their ASs.
If the best path decision of a reflector is separated from the information it provides to its clients, it can be located independently of its clients. For example, as proposed by Raszuk et al., this allows operators to centralize the reflectors. In a next step, reflectors may be replaced by several partly centralized Route Servers or even by one centralized Route Server. This may significantly reduce the effort to operate existing POPs or establish new ones.
Taking Add-path into account, there is no reason why routers should not be provided with several paths. As we could show in 2008, providing routers with several paths that match their topological position inherently avoids routing anomalies (without affecting the semantics of iBGP), while the scalability of the routing is ensured. Thus, a receiver-based selection of advertised routing information allows us to solve the iBGP anomaly problem in practice.

Drawbacks of a Receiver-based Path Selection

Generally, a receiver-based reduction of routing information comes along with several highly interesting advantages. However, as so often in the real world, advantages come along with disadvantages: Using classical BGP, it is very easy to implement the path announcement process. In principle, a router simply determines and advertises its best path to all BGP peers that do not already know the best path. Using a receiver-based selection of advertised information, deciding which information has to be advertised to which peer is not that easy any more: The sender must see things from the receiver’s topological perspective. In general, this perspective may differ from receiver to receiver, which results in additional effort for the sender. But even if this scheme is more complicated than the classical sender-based selection of advertised routing information, the effort seems to be manageable in practice: Up to step c) of the path selection process (comparison of MEDs), routing decisions are independent of the routers’ topological points of view. The most costly sub-decisions are identical for all routers.
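The difference between the two selection schemes can be sketched as follows. The Python example is a simplified illustration under one strong assumption: all candidate exit-points are already tied up to and including the MED comparison, so only the IGP cost from the respective point of view decides. The router names, exit-points, and cost values are hypothetical.

```python
def sender_based(reflector, exits, igp_cost, clients):
    """Classical behavior: every client learns the reflector's own best exit."""
    best = min(exits, key=lambda e: igp_cost[reflector][e])
    return {client: best for client in clients}

def receiver_based(exits, igp_cost, clients):
    """Each client learns the exit that is closest from its own position."""
    return {client: min(exits, key=lambda e: igp_cost[client][e])
            for client in clients}

# Hypothetical IGP costs from each router to the two exit-points x and y.
igp_cost = {
    "rr": {"x": 1, "y": 5},
    "c1": {"x": 2, "y": 9},
    "c2": {"x": 8, "y": 1},
}
exits = ["x", "y"]
clients = ["c1", "c2"]

print(sender_based("rr", exits, igp_cost, clients))
print(receiver_based(exits, igp_cost, clients))
```

In this toy topology, the sender-based scheme sends client c2 to exit x although y is much closer to it. The receiver-based scheme fixes this, at the price of one additional cost comparison per client on the sender's side.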

Next Steps

From my point of view, standardizing techniques to implement a receiver-based reduction of routing information is a logical step to ensure scalability and solve the anomaly problem iBGP comes along with. Starting with a concept that extends the functionality of Route Reflectors certainly makes it easy for Network Operators to integrate the concept in their ASs. However, the (formally provable) benefits that come along with a server-based architecture should motivate us to think about leaving the known way of Route Reflection and think about Route Servers.

November 25, 2010

What do they exactly deny??

I am sure that most people who are interested in Internet Security have heard about the prefix hijacking event that occurred on April 8th, 2010. Triggered by a U.S. government report published at the beginning of last week, the event gained a lot of media attention this month. In brief: China Telecom hijacked a huge number of address prefixes for around 18 minutes.

Plausible Denial

On Wednesday last week, Reuters reported that "The spokesman of China Telecom Corporation Limited denied any hijack of internet traffic". An interesting question is what exactly this means. Data publicly available in the Internet and gathered from different independent ASs unambiguously show that a high number of public prefixes were hijacked by China Telecom. Of course, traffic directed to these prefixes was hijacked as well.

As it seems unlikely that China Telecom denies facts everyone could verify in principle, I believe the interpretation I found on dailytech.com is the most plausible one: They reported that "China Telecom did not deny the incident occurred, but did deny that it intentionally 'hijacked' U.S. citizens' traffic." As described in my last post, this makes perfect sense.

Prefixes and Traffic

Another aspect I want to mention here concerns the statement found on several blogs and in the media that around 11/15/etc. percent of the Internet traffic was hijacked. From a technical perspective, this is not quite correct. Even if the order of magnitude matches the proportion of global prefixes that were hijacked, this does not mean that the same proportion of the global traffic was hijacked: Generally, the amount of traffic forwarded to different address spaces differs significantly. Details on that may be found in the Arbor Networks blog.

UB

November 18, 2010

U.S. Commission accuses China of data hijacking...

...is the title of an article published yesterday on Spiegel Online (German), one of the biggest news websites in Germany (an article discussing this topic may also be found on cnn.com). Referring to a report published by the United States-China Economic and Security Review Commission on Wednesday, they raise the question whether a prefix hijacking event observed in April 2010 and caused by a Chinese ISP could have been a deliberate (eavesdropping) attack against the U.S. government and U.S. companies. Even if the article does not give a final answer to this question, it suggests that this interpretation of the event is likely.

Motivated by this interpretation, I had a closer look at this event yesterday evening. The following analyses are based on the data provided by the Route Views Project. The event took place on April 8th, starting at around 3:54 p.m. UTC. At this point in time, AS23724 (China Telecom Corp. Ltd., the largest ISP in the People's Republic of China) started to originate at least 22,311 address prefixes. This is around 6.84% of the number of prefixes covered by the global routing table at this point in time. Before the event started, China Telecom originated 39 global prefixes. The event lasted for around 18 minutes.
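As a quick plausibility check of these figures, the following Python lines derive the table size implied by the two numbers above and the factor by which the number of originated prefixes grew. The variable names are mine, and the results are only as precise as the rounded percentage.

```python
hijacked_prefixes = 22_311   # prefixes originated by AS23724 during the event
share_of_table = 0.0684      # stated share of the global routing table
prefixes_before = 39         # prefixes originated before the event

# Table size implied by the two figures above.
implied_table_size = hijacked_prefixes / share_of_table
# Growth of the number of originated prefixes.
growth_factor = hijacked_prefixes / prefixes_before

print(f"implied global table size: about {implied_table_size:,.0f} prefixes")
print(f"originated prefixes grew by a factor of about {growth_factor:.0f}")
```

The implied table size of roughly 326,000 prefixes fits the "hundreds of thousands of prefixes" in the global routing table, and the jump from 39 prefixes to more than 22,000 is a factor of several hundred.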

From my point of view, four aspects seem to be relevant to assess the intention behind this event: Firstly, an important question is who is involved in the event. The report tells us that

[...] a state-owned Chinese telecommunications firm “hijacked” massive volumes of Internet traffic. [...]

China Telecom advertised erroneous network traffic routes that instructed U.S. and other foreign Internet traffic to travel through Chinese servers. [...]

This incident affected traffic to and from U.S. government (".gov") and military (".mil") sites, including those for the Senate, the army, the navy, the marine corps, the air force, the office of secretary of Defense, the National Aeronautics and Space Administration, the Department of Commerce, the National Oceanic and Atmospheric Administration, and many others. Certain commercial websites were also affected, such as those for Dell, Yahoo!, Microsoft, and IBM.

Even if this is indeed right, also organizations and companies from other countries were affected. Examples are France Telecom (109.211.0.0/16), Vodafone Ireland (e.g. 109.76.0.0/15), Sanyo (110.172.48.0/22), the Russian Institute for Public Networks (195.209.160.0/19), the Australian Department of Defence (203.10.234.0/24), and ChinaNet (many, many 110.x.x.x/24 networks), but also a lot of other companies and organizations could be mentioned. In fact, most parts of the "first world" were affected (the full list of Org-Names can be found here).

The second important aspect is the precision of the "attack". The event that occurred on April 8th affected a lot of different organizations: We find the U.S. government, government organizations from other countries, business concerns from Europe, telcos from Asia, but also several other companies and organizations from many different countries. Obviously, purposefully redirecting such different kinds of traffic at the same time to the same destination does not really make sense in practice.

Thirdly, the duration of the event should be kept in mind. 18 minutes is not that much time. It does not seem to be long enough to hijack specific information from any of the affected organizations (even if it is theoretically enough time to gather IP or mail addresses). However, it seems long enough to identify and correct an error in the configuration.

Fourthly, China Telecom did not try to hide the prefix hijacking. In all new AS-paths, AS23724 can be identified as the origin of the announcement. After a few minutes, the event and its origin were clearly visible all over the world.

All in all, from my point of view, an intended hijacking of network traffic is highly unlikely. I would guess that we have observed a simple but fatal configuration failure. If someone tried to hijack or eavesdrop on traffic, a plausible strategy would be to attack a few prefixes that belong to one target. Most likely, the attacker would try to obscure the attack or at least its source, for example by manipulating parts of the AS-path.

However, even if we have most likely observed a simple misconfiguration event in this case, the basic problem remains: BGP is highly vulnerable to misconfiguration and intended attacks. Most likely, a good attack could be hidden effectively today. But the report also has an upside: Politicians and the public start to become aware of the problem.

UB

Update: Of course, I am not the only one who had a closer look at the hijacking event on April 8th, 2010. Some further interesting details may be found in the Renesys and Arbor Networks blogs.

October 30, 2010

IETF 79 IDR WG, what's up? Correctness, Correctness, Correctness!

Hi experts,

As someone concerned about the correctness of iBGP routing, I found the last few days really exciting. Certainly motivated by the upcoming IETF meeting, two new drafts focusing on the correctness of iBGP routing became available. These are the "Stable iBGP Decision Process with Route-Reflection" (available since yesterday) and the "BGP Optimal Route Reflection (BGP-ORR)" (available since 2010/10/16) drafts.

Let's have a closer look at the former draft first: The basic goal of the authors, Jamak et al., is to force stable routing decisions in ASs that implement BGP Route Reflection. To achieve this, the authors propose to prefer paths learned from close Route Reflection clusters over those learned from distant clusters. Technically speaking, they propose to evaluate the CLUSTER_LIST length right after the LOCAL_PREF, the AS-path length, and the Origin Number (i.e. right before the MED attribute, cf. RFC4271, section 9.1.2.2). In principle, this implements the necessary condition Griffin et al. specified in 2002 to ensure forwarding correctness. So, the basic idea is well known in the scientific community and the resulting property (namely that the routing converges) is formally verified.

Even if convergence independent of any network design rules is an important property, the concept comes along with two fundamental disadvantages that - from my point of view - outweigh the advantages:
Firstly, hot-potato routing is "subverted". Let us assume that, as shown in figure 1, two paths p and q to the destination exist. Both paths specify the same LOCAL_PREF, AS-path length, Origin Number, peer-AS, and MED. In this case, BGP usually implements hot-potato routing, meaning that traffic is forwarded via the shortest path out of the AS (here path q). In the example I sketched below, the shortest path is the cluster-external path q from v's point of view. Since the cluster-internal paths are preferred, traffic is forwarded via path p. Unnecessary IGP costs are induced.
Secondly, ASs lose the ability to specify primary and secondary exit-/entry-points by means of the MED attribute. If the primary path for a prefix is cluster-external while a backup path for the same prefix is cluster-internal (both paths specify equivalent global path attributes), the backup path is preferred. Obviously, this is highly unwanted. To avoid this problem, Jamak et al. propose to make use of communities, but realizing this in practice may be difficult in general.

Figure 1: Hot-potato routing does not work.
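Both disadvantages can be reproduced with a toy model of the decision process. The following Python sketch is a strong simplification and not the procedure specified in the draft or in RFC4271: it models each rule set as a sort key, assumes that all paths come from the same peer-AS (so MEDs are comparable), and leaves out all further tie-breakers. All attribute values are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Path:
    name: str
    local_pref: int
    as_path_len: int
    origin: int
    med: int
    cluster_list_len: int
    igp_cost: int

def standard_key(p):
    # Simplified classical order: LOCAL_PREF, AS-path length, origin, MED, IGP cost.
    return (-p.local_pref, p.as_path_len, p.origin, p.med, p.igp_cost)

def cluster_first_key(p):
    # Modified order: CLUSTER_LIST length is evaluated right before the MED.
    return (-p.local_pref, p.as_path_len, p.origin,
            p.cluster_list_len, p.med, p.igp_cost)

# Case 1: p is cluster-internal but far away, q is cluster-external but close.
p = Path("p", 100, 3, 0, med=10, cluster_list_len=1, igp_cost=20)
q = Path("q", 100, 3, 0, med=10, cluster_list_len=2, igp_cost=5)

# Case 2: the primary exit (low MED) is cluster-external, the backup is internal.
primary = Path("primary", 100, 3, 0, med=0, cluster_list_len=2, igp_cost=10)
backup = Path("backup", 100, 3, 0, med=100, cluster_list_len=1, igp_cost=10)

for candidates in ([p, q], [primary, backup]):
    print(min(candidates, key=standard_key).name,
          min(candidates, key=cluster_first_key).name)
```

With the classical order, the router picks q and the primary path. With the CLUSTER_LIST length evaluated first, it picks p and the backup path, because the shorter cluster list wins before the MED and the IGP cost are even considered.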

The second draft, BGP Optimal Route Reflection, is proposed by Raszuk et al. In contrast to the former draft, the main goal of this proposal is to ensure that hot-potato routing is realized even if Route Reflection is used. Using classical Route Reflection, a reflector provides its clients with its best path. If the topological position of a client differs significantly from the topological position of its reflector, this method may cause situations where routers do not learn the closest exit-point.
In principle, this problem could be solved by using Add-path to advertise all available exit-points to a client. However, as this scheme may cause serious scalability problems in practice, Raszuk et al. proposed to provide clients only with information on their closest exit-point.
Putting aside the principal idea, a very interesting aspect of this draft is the paradigm shift it specifies. While today the routing information a router advertises is strictly chosen from its local point of view, this draft recommends implementing a "receiver-based" selection of advertised routing information. In principle, this selection may differ from receiver to receiver. Thinking this idea through to the end, it leads us to a Route Server based iBGP routing architecture (slides), in which every router is provided with specific routing information that ensures optimal, consistent, and stable routing decisions.
Even if ensuring hot-potato routing is an interesting feature, Raszuk et al.'s proposal is only a small step towards inherently optimal routing decisions. Indeed, unnecessary internal forwarding costs are avoided, but traffic may still be forwarded via an exit-point that specifies a suboptimal MED for its peer-AS group. Traffic may be forwarded via a backup path into an AS even if the primary exit-/entry-point is available.

Even if the new drafts sadly cannot entirely solve the correctness problems of iBGP, I'm really happy to see that network protocol designers working for the global players start to think about the correctness of this protocol. It's still a long path, but I am sure we are on the right way!

UB

October 24, 2010

My very first post - or: about this blog...

Hi folks,

I spent the last few minutes thinking about the topic of the first post on "IDR issues". For sure, the best I can do is to use this opportunity to introduce myself and explain the basic idea behind this blog.

So, let's start with the latter: "IDR issues": what is this blog about? A brief look at Wikipedia is not really helpful at this point: "Indonesian Rupiah" or "Inner Distribution Road" is not really what I have in mind with IDR. IDR stands for "Inter-Domain Routing", which - simply speaking - denotes the routing of global addresses in the Internet. Today, the Border Gateway Protocol (BGP) is used for that purpose.

Even if the Border Gateway Protocol works fine in principle, BGP in its current state does not seem to be the answer to everything. For example, by applying common (Autonomous System-)internal BGP schemes like BGP Route Reflection and AS Confederations, routing may end up in sub-optimal or inconsistent states. Routing processes may behave non-deterministically or may even fail to converge. Besides operational problems, security problems are also present: In principle, it is really easy to hijack IP address space or eavesdrop on someone's global network traffic. Obviously, all these possibilities are highly unwanted and problematic. Discussing these - for many, many people highly important but yet unknown - aspects and potential problem solutions is what I want to do in this blog. So, this is another technical geek blog ;-).

After sketching the basic idea behind this blog, it's a good idea to introduce myself. My name is Uli Bornhauser. I am a computer scientist at the University of Bonn, Germany, and I do research in the area of the correctness and security of the Border Gateway Protocol. If you are interested in more details, you can have a look at my website.

Finally, I want to say a few words about the "comment policy". If you have any remarks or suggestions, I'm very happy to hear about them! For that purpose, you can use the "post a comment" function you usually find at the end of each post or directly send me an email. You find my email address on my website.

For the future, I plan to write new posts every few weeks. Looking forward to seeing you again...

UB

photo taken at "The Cell", Denver, CO, USA