AS-level Topology Inference

Let's begin this lecture with an elegant and concise definition of the Internet: the Internet is a network of networks. This description is strikingly effective because it captures the essence of the Internet on two levels—not only its structural design but also its decentralized and collaborative governance.

Slide 1

Slide 2

In this definition, "networks" refer to autonomous systems, which are the largest administrative units within the Internet. Autonomous systems are self-governing entities responsible for managing their own traffic routing and maintaining full control over their internal networks. Examples include organizations like Swisscom, SWITCH, Deutsche Telekom, Google, and others.

Slide 3

The "network" refers to the interconnection of these autonomous systems. The key takeaway here is that the Internet lacks a central governance structure—no single entity or organization controls it or has a complete view of its entirety.

If there is one concept to remember from today's lecture, it's this: if we represent each autonomous system as a node and their connections as edges, no one today knows the exact topology of the "Internet graph".

But this raises an important question: why should we care?

Slide 4

The reason is that, in most cases, when the destination one wants to reach is not within the same autonomous system, the Internet must route the traffic through a series of autonomous systems. In other words, it traverses the "network of networks".

Research indicates that, on average, two networks are separated by two to four intermediary networks. However, in some cases, this number can increase to as many as ten to twelve networks. This means that when connecting to another network, the traffic might potentially traverse through ten or more intermediary networks along the way.

Slide 5

We do not know the policies that these intermediary networks have in place. Nor do we know if a shorter or more optimal path—based on certain criteria—might exist to reach the destination if these intermediary networks were bypassed or behaved differently.

As Laurent also pointed out, there are many reasons why we should care about these intermediary networks and the AS paths our traffic traverses. Security is one example, along with privacy and performance. I won't go into detail again here, but very briefly, the key idea is that we want to understand the paths our traffic takes across the Internet.

Slide 6

So, we want to learn more about the topology of the Internet. However, the challenge is that achieving full knowledge of this topology would require access to all the routing tables from every network. Essentially, this would mean logging into each network, observing how it reaches every other network, and repeating this process for all networks. If we could do this, we would have a complete connectivity matrix of the Internet.

Unfortunately, we are still far from achieving this level of visibility.

Slide 7

Instead of having full visibility into the Internet's topology, we rely on measurement platforms that collect data from a small subset of networks. These networks collaborate by providing researchers with data on how they reach other networks. You can think of this partial data as a snapshot of the Internet, captured from a few hundred networks. While these snapshots are valuable, they represent only a fraction of all the networks on the Internet.

In addition to this partial view, we also rely on basic assumptions or best practices about how networks handle and propagate routing information. Specifically, these assumptions include how networks apply rules to the routes they receive from other networks and how they decide to propagate these routes further.

By combining the partial snapshots we have with these basic rules, we can attempt to reverse-engineer the policies that networks apply to the routes they receive. This allows us to infer behavior for routes we don’t directly observe. In doing so, we can develop an approximate understanding of how consecutive networks might handle routes, enabling us to predict how traffic would traverse through the Internet and ultimately reach its destination.

Slide 8

The goal of this part of the lecture is to reconstruct the AS-level Internet topology. Naturally, this presents an inference problem:

Given the observable routes and the common policies that we believe networks apply, how can we develop a model of the Internet topology that is as accurate as possible?

Slide 9

There's both bad news and good news.

The bad news is that, currently, we are still far from achieving this goal in a highly effective way.

The good news is that part of the research in our group focuses on identifying which aspects of this problem are most critical and how each contributes to inaccuracies in the reconstructed topology. We will be presenting this research next week, making this lecture as current and fresh as possible—which is the exciting part!

Slide 10

Let’s start by explaining what I mean by a partial view. Imagine, as shown in this slide, we have seven different networks. For simplicity, let’s assume this represents the Internet.

Slide 11

Now, suppose that two of these networks—AS6 and AS5—agree to share their routing tables with us. This collaboration gives us a limited view of how these networks interact with others.

Slide 12

From AS6's perspective, we obtain some routing information. For example, AS6 learns that the route to AS1 can be reached via AS3. Additionally, AS4 also shares a route to AS1, offering an alternative to the one through AS3. There’s yet another route to AS1, which is longer and traverses AS2, AS5, and AS7 before eventually reaching AS6.

Slide 13

From the collected information, we start to learn certain aspects of the topology. Specifically, we can identify some of the links in this topology. By observing the AS paths, we now know, for instance, that AS1 and AS3 are neighbors, as well as AS3 and AS6. Similarly, every green link you see in the graph represents a connection we’ve inferred based on the routes we observed. These links exist because the observed routes reveal them.

Slide 14

We also gain insights into some policies by observing these AS paths. For example, we learn that AS3 exports routes originating from AS1 to AS6, and AS4 similarly exports routes originating from AS1 to AS6. These export rules reflect the policies applied by these specific networks.

Ultimately, our goal will be to use these observations to infer behavior for the parts of the topology we cannot directly observe. Specifically, we aim to answer the question: Would a given network relay a specific route, based on the observed behavior on other routes?

Slide 15

Now, let’s apply the same process to AS5, which is the other network sharing its routing data with us. By including AS5's perspective, we discover one additional link compared to the previous example. This demonstrates that having routes from multiple vantage points, rather than just one, significantly enhances our ability to uncover more of the Internet's topology.

Slide 16

With more data, additional links appear, which is helpful. However, even after observing all the routes received by AS6 and AS5, it remains challenging to answer questions like how AS7 would reach AS3. This difficulty arises because we don’t have a vantage point at AS7, and we lack the ability to fully transfer the knowledge from the observed routes to those we haven’t observed.

Slide 17

The information we have is not inherently transferable, meaning we haven’t yet been able to fully learn from it. To enable this learning, we need a model—an abstract representation of how routing behavior operates within the topology.

Slide 18

For the remainder of this lecture block, there are two parts. In the first part, we will focus on understanding the model itself. In the second part, once we understand the model, we will revisit the partial view—the observed routes—and, using this input data, we will work on tuning our model by annotating the topology graph.

By "tuning," I mean employing a heuristic algorithm, which we will explore next week. This algorithm will help us annotate the topology based on the observed data. Step by step, this process will bring us closer to aligning the topology with the observed routes.

Slide 19

Let’s begin with the first part: the model.

Slide 20

I assume you already have some understanding of what the Gao-Rexford model is. Fundamentally, it's based on the idea that there are two primary types of relationships on the Internet.

Slide 21

The first type of relationship is the customer-provider relationship.

This is a business relationship fundamentally based on monetary exchange. In this arrangement, the provider offers the customer a service that grants access to the Internet.

The service works as follows: Any route the provider receives from the customer is redistributed to the rest of the Internet. Similarly, any route the provider receives from the Internet is redistributed to the customer.

Essentially, the provider ensures that the customer has access to the entire Internet. In return, the provider charges the customer based on the volume of traffic passed through its network. The more traffic the customer generates, the more of the provider's physical network resources are consumed, and consequently, the higher the charges.

The second type of relationship, peer-to-peer, is not primarily about generating revenue but rather about reducing costs.

In this arrangement, both parties agree to exchange routes with each other. The goal is to avoid sending traffic through their providers, which would incur charges. Instead, by passing traffic directly between each other, they reduce the amount of traffic sent to their providers and, consequently, lower their costs.

Let's explore how this concept translates more formally into the export policies I briefly mentioned a couple of slides earlier.

Slide 22

In this small graph, AS9 is a customer of AS10 and a provider to AS8. Its export policies are as follows: AS9 exports its own routes, along with the routes of all its customers, to its provider.

Slide 23

On the other hand, AS9 does not export routes from its peers or providers. Allowing this would mean that AS9 would transit traffic, for example, from AS11 to AS10. In such a case, AS9 would incur charges for both sending and receiving the traffic, which is not viable from a business perspective.

Similarly, for its peer, AS7, the economics also don’t work. Since peers don’t charge each other, forwarding traffic from AS7 to its provider (AS10) would result in AS9 bearing the cost of that traffic to AS10 without any compensation, making it economically disadvantageous.

Slide 24

The provider-to-customer relationship, which is the reverse of the customer-to-provider relationship, works as follows: Assume AS9 is the provider for both AS6 and AS8. If AS9 receives a route from one of its customers, it relays that route to its other customers because doing so is economically beneficial.

Slide 25

Additionally, in this scenario, AS9 also relays routes it receives from its providers and peers to AS8, as AS8 is its paying customer.

Essentially, the provider in this relationship offers access to any route that it can reach, ensuring comprehensive connectivity for its customers.

Slide 26

For the peer-to-peer relationship, let’s assume AS9 and AS7 are peers. In this case, AS9 relays the routes it receives from its customers to its peer (AS7). This approach is preferable because it avoids routing the traffic through AS9's provider, which would incur additional costs.

Slide 27

Slide 28

However, AS9 does not relay routes from its other peers or providers to its peer. This is because AS9 doesn’t get paid for providing this service. In fact, the opposite is true: there’s no monetary exchange with its peers, and relaying traffic to its provider would result in AS9 incurring costs.

From an economic perspective, this behavior is not viable.

Slide 29

All these export rules are tied to a concept called valley-free paths. This concept means that once a route traverses a provider-to-customer or peer-to-peer edge, it cannot subsequently traverse a customer-to-provider or another peer-to-peer edge.

Slide 30

This slide summarizes the concepts described earlier. Specifically, it highlights that: a provider-to-customer edge can be followed exclusively by other provider-to-customer edges; and that a peer-to-peer edge can only be followed by provider-to-customer edges.

The valley-freeness property translates into expected patterns for how routes should behave, as well as patterns that are not allowed to occur. These rules can be expressed through six distinct patterns.

Slide 31

The first pattern involves a route originating from a customer and moving upwards through a series of customer-to-provider edges—customer to provider, provider to provider, and so on. This pattern is valid because any provider receiving a route from its customer is obligated to forward it to the rest of the network, as doing so generates revenue for the provider.

Slide 32

The second pattern is the opposite of the first. In this case, a route is received by the topmost provider and is relayed downwards through a series of provider-to-customer edges—provider to customer, customer to customer, and so on. Based on the definitions discussed in the previous slides, this pattern is also valid, as routes are allowed to follow this path.

Slide 33

The third pattern involves a route originating from a customer, moving to a provider, and then being relayed further by the provider to its other customers. Since the provider receives this route from its customer, it forwards it to the rest of the network, including its other customers.

This is essentially an extension of the first pattern discussed on the previous slide, where the path continues downward from AS3 to AS4 to AS7.

Slide 34

The same logic applies in the following scenario: a route travels from a customer to its provider, then to another provider, followed by a peer-to-peer connection, and finally from the peer to its customer.

In this case, the provider forwards the route to its peer, and the peer subsequently relays it to its customer. This pattern aligns with the rules we’ve discussed.

Slide 35

Slide 36

Slide 37

The main idea behind all of this is that, based on the two types of business relationships we defined and their corresponding export rules, we can see how these rules combine to form the six distinct patterns.

Using this framework, when we observe a route from the actual Internet, we can attempt to match it to one of these six patterns.

Slide 38

To summarize, in the next lecture, our goal will be to tune the model such that when we observe a route, its parameters align with one of these defined sequences of rules, ensuring that the observed route makes sense within the model.

As you can imagine, the more routes we observe from diverse vantage points, the more links between networks we discover, giving us greater insights into how to adjust the edges in the model to fit these patterns.

This process will be guided by the ASRank algorithm, a heuristic approach that helps us achieve this alignment. We’ll delve deeper into this algorithm in the next lecture.

Thank you for following along, and I look forward to seeing you next week!

Slide 39

Hello, everyone, and welcome to the second part of our lecture. If you recall from last week, the problem we are addressing is the reconstruction and annotation of the Internet topology.

Slide 40

To recap, we defined two types of relationships. First, customer-provider relationships, where the provider network offers global reachability to its customer by propagating the customer's routes to all other networks it can reach. Second, peer-to-peer relationships, where the goal is not to generate revenue but to reduce costs. In this arrangement, the two networks share their own routes and their customers' routes with each other. This avoids routing traffic through their respective providers, which would result in both sides incurring charges from their providers.

Slide 41

After understanding these relationships, we introduced the concept of the valley-free property. This property ensures that routes do not traverse valleys in the network topology.

By definition, once a provider-to-customer edge is traversed, it can only be followed by other provider-to-customer edges. Similarly, if a peer-to-peer edge is traversed, it can only be followed by provider-to-customer edges.

This constraint helps us define valid routing paths within the Internet topology.

We covered all of this last week. Toward the end of the lecture, we discussed how the valley-freeness property enables six distinct types of patterns to be observed when analyzing a path.

Slide 42

The first pattern is called the uphill path. This is defined as a path that originates from a customer and traverses upwards through its providers, continuing through successive customer-to-provider relationships.

Slide 43

The second pattern is the downhill path, where the route originates from a provider and traverses downward through successive provider-to-customer relationships, reaching the customers.

Slide 44

The third pattern is a combination of the first two: an uphill path followed by a downhill path. In this case, the route starts from a customer, moves upward through providers, and then descends through customers.

Slide 45

The fourth pattern is an uphill path followed by a peer-to-peer edge. In this case, the route originates from a customer, traverses upward through providers, and then connects to a peer.

Slide 46

The fifth pattern is a peer-to-peer edge followed by a downhill path, where the route traverses a peer-to-peer connection before moving downward through customer relationships.

Slide 47

The sixth pattern is an uphill path followed by a peer-to-peer edge, which is then followed by a downhill path. In this case, the route starts from a customer, moves upward through providers, connects to a peer, and then descends to customers.

These are the six types of patterns that we expect to observe when a router on the Internet receives or observes a path. We covered these in detail last week.

Slide 48

This week's topic is the ASRank Algorithm, and by understanding this algorithm, we will learn how to annotate the Internet topology.

To reiterate our goal from last week: we started with a partial view of the Internet, which is a subset of networks receiving routes from around the globe. These routes serve as our input. Using these routes and the AS paths they contain, we aim to construct the Internet topology and annotate the relationships between ASes. The end result will be an annotated representation of the Internet topology, derived from the partial view.

The algorithm bridges this gap, transforming the input (observed routes) into the output (an annotated topology).

The core idea here is straightforward: when a router receives a route, it is tagged with an AS path. An AS path represents the sequence of ASes the route traverses. This sequence inherently defines the neighborhood relationships between the ASes, which means that each subsequent AS in the path is linked to the next.

Slide 49

From the six patterns we discussed, if there is a peer-to-peer link between AS1 and AS2, the only type of link that can follow is a provider-to-customer link. Across all six patterns, whenever a peer-to-peer link appears, it must always be followed by a provider-to-customer link, which I’ve denoted here using the "greater than" symbol (>).

The same logic applies to provider-to-customer relationships. For example, in the snippet shown, if we know for certain that the link between AS4 and AS5 is a provider-to-customer link, then the link between AS5 and AS6 can only also be a provider-to-customer link. Neither a peer-to-peer nor a customer-to-provider link would satisfy the valley-free property in this scenario.

Slide 50

Slide 51

If we revisit one of the patterns involving AS1, AS3, AS4, and AS7, let's assume we know that the link between AS1 and AS3 is a provider-to-customer link. This necessarily implies that the link between AS3 and AS4 must also be a provider-to-customer link. Furthermore, if we establish that AS3 to AS4 is a provider-to-customer link, then the link between AS4 and AS7 can only be a provider-to-customer link as well.

This means that by knowing the relationship between AS1 and AS3, we can infer and tag the remaining links in this path. This inference is based on the rules governing how such paths should behave.

Slide 52

The same logic applies to this peer-to-peer example. Suppose we know that the link between AS1 and AS2 is a peer-to-peer link. Based on the rules we discussed, the only type of link that can follow is a provider-to-customer link, with AS2 being the provider and AS5 being the customer.

Knowing this relationship also allows us to deduce that the next link, between AS5 and AS7, must also be a provider-to-customer link.

In this case, the entire AS path—from AS6 (the originating AS) to AS7—includes five different links. However, by knowing just one link in this path (the peer-to-peer link between AS1 and AS2), we were able to infer the relationships for all the other links. This is because the presence of a peer-to-peer link imposes strict constraints, such that the links on either side can only have specific types of relationships.

Slide 53

The ASRank algorithm builds on this intuition. Its goal is to identify the two types of relationships—peer-to-peer and provider-to-customer—and use this knowledge to infer subsequent links along the AS paths.

Naturally, this leads to an important question: Once we build the annotated topology from the observed AS paths, how can we be certain of its accuracy? More specifically, how can we use some known relationships as a seed to start inferring the remaining ones?

Slide 54

To address this, we rely on contextual knowledge about the Internet, and the ASRank algorithm is built upon three key pieces of this knowledge. You can think of these as the foundational assumptions or invariants that ASRank takes for granted about the Internet—properties that are always satisfied, regardless of the specific topology.

Slide 55

The first key assumption made by the ASRank algorithm is that a peer-to-peer clique exists at the top of the hierarchy. This is based on well-known business relationships and industry knowledge about the Internet.

Currently, there are approximately 10 to 15 ASes worldwide referred to as tier-one providers. These are intercontinental companies that do not require a provider to access the entire Internet. In other words, they are transit-free, meaning they don't pay any other network for global reachability.

ASRank assumes that in the Internet graph, these 10 to 15 tier-one ASes are located at the very top. They do not have any providers themselves and connect exclusively through peer-to-peer links, without paying one another.

This assumption implies that, at the very top of the Internet topology, there is a fully connected mesh between these ASes. All the links within this mesh are tagged as peer-to-peer links.

Slide 56

The second key property is that customers announce routes to their providers to enable global reachability. Intuitively, this works as follows:

Imagine a stub AS that requires a provider for global reachability. If its provider is not one of the tier-one ASes, it will, in turn, rely on another provider to ensure global reachability.

Extending this logic, from any point in the Internet, there is a chain of customer-to-provider relationships leading upward until one of the tier-one ASes at the top is reached.

You can visualize the Internet topology as resembling a bracelet. At the very top is the fully connected peer-to-peer mesh of tier-one ASes. Below this mesh, chains of customer-to-provider relationships rise from individual ASes to the mesh. Once routes reach the mesh, they descend back down through providers and customers to their destinations.

This hierarchical structure ensures global connectivity across the Internet.

Slide 57

The third assumption made by ASRank is that customer-to-provider loops do not exist in the Internet topology. This means there is no scenario where: a network is a provider for another; that second network is a provider for a third; and the third network loops back to being a provider for the initial network.

This assumption is supported by extensive research and industry knowledge, as no observed topology has shown such a loop. Therefore, one of the primary goals when annotating topology relationships is to ensure that no such loop is introduced.

From a technical perspective, such loops would also be problematic for protocols like BGP. As highlighted in one of the recommended readings, BGP cannot function properly in the presence of a customer-to-provider loop because it would cause the protocol to oscillate indefinitely. While occasional instances of such loops may occur due to misconfigurations, these are considered anomalies and not representative of a stable or functional Internet topology.

In summary, such loops are not only unsustainable from an organizational perspective but also pose significant technical challenges.

Slide 58

If we accept these properties as true for the Internet, we can begin analyzing the unannotated topology—a graph consisting of ASes and the links between them, as collected from the measurement routers.

From a graph-theoretical perspective, there are two key structures we can identify in this graph:

1. Meshes (or cliques): While the graph may contain multiple meshes or fully connected subgraphs, our primary interest lies in identifying the topmost clique—the group of tier-one ASes that form the fully connected peer-to-peer mesh at the highest level of the hierarchy.

Slide 59

The second structure we can identify in the graph is the leaf nodes, or stub networks. These are ASes at the edges of the topology that no other AS traverses to reach another AS.

While identifying these stub networks is straightforward, they are not particularly useful for our purposes. This is because: a stub network might simply be a customer of another AS; alternatively, it might have a peering relationship with another stub network.

If you recall, the relationship we are most interested in involves provider-to-customer paths. Starting with a stub network does not help much, as the next annotation in the path could still be ambiguous.

Slide 60

On the other hand, if we can identify the topmost clique in the graph—the fully connected group of tier-one ASes—we have a solid starting point. This clique consists entirely of peer-to-peer relationships, which we can be certain of. From there, for AS paths traversing this clique, we can begin annotating the remaining links in those paths with greater confidence.

This is why identifying the top clique is critical—it serves as the foundation for the rest of the annotation process.

This is how the ASRank algorithm begins. It starts by assigning a power value to each AS. These power values represent the influence or significance of an AS within the topology.

Using these assigned power values, the algorithm identifies all the meshes (or cliques) in the graph. The most powerful mesh is then designated as the tier-one clique—the fully connected group of tier-one ASes at the top of the Internet hierarchy.

Slide 61

The initial step, or Step 0, is to measure the power of each AS to determine which ASes are more influential or powerful compared to others.

Slide 62

Step 1 is to identify the top clique in the topology, as previously discussed, and mark all the links within this clique as peer-to-peer relationships.

Slide 63

From the peer-to-peer mesh at the top of the hierarchy, the algorithm works its way down through the graph in a ranked manner, starting with the most powerful AS and proceeding to the least powerful.

The process involves analyzing consecutive links along AS paths: the first link is part of the peer-to-peer clique; the next link, which is initially untagged, is assigned a tag (peer-to-peer or provider-to-customer) based on the inferred relationship.

Once a link is tagged, it informs the tagging of subsequent links in other AS paths. The algorithm continues this process, progressively annotating the graph by following provider-to-customer or peer-to-peer paths, effectively traversing the topology downward.

However, there is an exception: if the algorithm encounters a link where the inferred tag conflicts with the power ranking (e.g., a provider-to-customer link where the "customer" appears more powerful than the "provider"), that link is left untagged and skipped for later resolution.

I’ll explain this step in more detail as we proceed.

Slide 64

At this step, we don't immediately assign a tag to certain ambiguous links. Instead, we leave them unannotated for now. As I go through the algorithm in detail, you'll see why this decision is made.

In subsequent iterations, the algorithm revisits these unresolved links to clarify and assign the appropriate tags. I'll explain this process more thoroughly as we proceed through the steps of the algorithm.

Slide 65

After completing all the algorithmic steps, any remaining unresolved links are assumed to be peer-to-peer connections.

To summarize, the algorithm aims to: identify the topmost clique; determine the hierarchical waterfall of provider-to-customer and customer-to-customer relationships as it moves downward through the graph; resolve as many links as possible through iterative processing.

For any links that remain unannotated after all iterations, we classify them as peer-to-peer links by default.

I will elaborate on this process and its nuances as we proceed.

Slide 66

In this algorithm, the power assigned to each AS is based on a metric called transit degree. The transit degree of a network is calculated as the number of AS neighbors where the network appears in the middle of an AS path, rather than at the beginning or end.

To compute this: for each AS path, if the network is neither the first nor the last AS in the path, count every AS that comes before or after it in the path. Increment the transit degree accordingly.

This metric essentially measures how often a network serves as a transit point for other networks. The more networks that rely on a particular AS for transit, the higher its transit degree, and therefore, the stronger its influence or power in the topology.

Slide 67

Using this logic, we identify the topmost clique as the mesh of ASes with the highest transit degree. This clique consists of the most powerful ASes, those that serve as key transit points for the largest number of networks.

Slide 68

Here is our example topology. Let's assume AS1, AS2, and AS3 form the top clique, identified as the peer-to-peer mesh. This is represented by the green dashed triangle, indicating the fully connected peer-to-peer relationships within this clique.

The blue numbers represent the transit degrees of each AS. Naturally, the tier-one ASes—AS1, AS2, and AS3—emerge as the most powerful, with transit degrees exceeding 1000. In contrast, the transit degrees of other ASes are lower.

The ASRank algorithm now works by identifying triplets in the observed AS paths. For example: In the path AS8 → AS5 → AS3 → AS2, we identify two triplets: AS2, AS3, AS5 and AS3, AS5, AS8. Similarly, for other paths, triplets are identified. In cases where the path length is exactly three, it contains just one triplet.

This process ensures that for every AS path observed, the triplets within the path are systematically identified.

Slide 69

Slide 70

Next, the triplets are ranked based on the transit degree of the last element. The triplets are sorted in descending order, with the most powerful AS listed first.

In this example, AS5 has a transit degree of 656, as shown in the diagram. AS5 is the most powerful AS that is not part of the top clique, so it takes priority in the ranking.

Slide 71

Here is the ranked version of the triplets. Compared to the original set, the triplets are now ordered based on the last element in each triplet.

As you can see, AS5 is ranked first, followed by AS8, then AS7, and so on. The ranking process ensures that the triplets are prioritized according to the transit degree of their last element.

Slide 72

Next, we start identifying relationship patterns based on the rules discussed earlier in the lecture. Let's consider the first triplet: AS2 → AS3 → AS5.

From Step 1, we already know that AS2 and AS3 have a peer-to-peer relationship because they are both part of the top clique. Based on the nature of these relationships, a peer-to-peer link can only be followed by a provider-to-customer link.

This allows us to infer that AS3 is the provider of AS5. If we encounter this relationship in other triplets, we use the same inference to apply the tag consistently across the graph.

Slide 73

In the next step, now that we have identified the AS3-to-AS5 link as a provider-to-customer edge, we can resolve the following triplet, AS5 → AS8. Based on the same reasoning discussed earlier, we infer that the AS5-to-AS8 link is also a provider-to-customer edge.

We tag this relationship accordingly (as shown in red). Once this triplet is resolved, we proceed to the next unresolved triplet.

Slide 74

At this point, we encounter a problem. First, the nature of the first pair in the triplet is unknown. Second, if we examine the transit degree of AS7, we see that its power is greater than that of AS4.

Given this, we cannot immediately conclude that AS4 is the provider of AS7, as it would contradict the assumption that providers are typically more powerful than their customers.

For now, we skip this triplet and will revisit it in a later stage of the algorithm.

Slide 75

Next, we proceed with the last triplet, which begins with a peer-to-peer link within the clique (e.g., AS1 → AS2). From this, we can infer that the subsequent relationship is a provider-to-customer link. This relationship is tagged accordingly.

Resolving this triplet also provides additional information that could help us infer the nature of the previously skipped relationships. We will revisit those relationships later, using the insights gained here.

Slide 76

Now that we've traversed all the triplets, we revisit the previously skipped relationships to try to resolve them—specifically, the relationship between AS4 and AS7.

With the additional information gathered during earlier steps, we can now confidently infer that the AS4-to-AS7 link is a provider-to-customer relationship, and we tag it as such.

In this small topology, this was the only unresolved relationship to revisit, so this step concludes here.

However, in the upcoming exercise, we will explore more complex scenarios, including edge cases where similar situations arise. You will see how these cases are resolved and the reasoning involved. For now, in the interest of time, let's move on.

Slide 77

Slide 78

In the final step, as you can see, there are two unresolved links: AS6-to-AS5 and AS6-to-AS4. Since these relationships remain unidentified after all prior steps, the algorithm concludes that they are most likely peering relationships.

It's important to note that this is an inference—not a certainty. The reasoning is that if these links have not been classified as provider-to-customer or customer-to-provider up to this point, they are more likely to be peerings.

The ASRank algorithm is more complex than what we've covered here. For those interested, the full details are available in the recommended readings, so I encourage you to review the paper if you have time.

We will also address additional edge cases and some of the finer details during the exercise session. However, the steps we've discussed here represent the core of the algorithm.

Slide 79

Slide 80

The ASRank algorithm is not just theoretical—it is actively used in practice. The algorithm is run monthly by an organization called CAIDA, which ranks the power of ASes globally.

These rankings inform a variety of downstream tasks, including: establishing business relationships; managing peering decisions, such as determining who to peer with next for optimal reachability; identifying the most powerful ASes in the world.

Additionally, the rankings are used for security purposes. For example, when a route is received, its legitimacy can be assessed based on the characteristics of the AS it traversed.

In summary, many operational and strategic decisions in the Internet ecosystem today are built on the insights provided by this algorithm, making it a critical tool in practice.

Slide 81

Slide 82

Once the topology is annotated (as we covered in previous weeks), we can use it to compute paths through the network. For example, given the annotated topology and an originating AS (e.g., AS1), we can ask questions like: How would AS7 reach AS1 or AS3?

In the slide example: dashed lines represent peer-to-peer links identified during the walkthrough; straight edges represent provider-to-customer links.

If we compute the path for AS7 to reach AS3, the result is AS7 → AS4 → AS2 → AS3. This path is determined by the algorithm's principle of selecting the shortest policy-compliant path.

The reasoning behind this path selection is based on the algorithm's adherence to routing policies: it finds the shortest route that complies with the established relationships in the annotated topology.

This approach was likely explored in a previous exercise where you computed paths based on policy compliance. Essentially, the annotated topology derived from the partial view allows us to identify or compute paths across the Internet. The paths selected are always the shortest among policy-compliant options for a given origin and destination pair.

Slide 83

The main limitation of this algorithm is that it relies on partial data, which introduces two key issues:

Model abstraction: The algorithm uses a simplified model where each network is represented as a single node (as shown in the example). This abstraction assumes specific types of relationships between nodes, but it does not account for the actual complexity of the Internet, such as routers or more granular routing behaviors. Consequently, we cannot be certain how closely this model aligns with the real Internet.

Partial view: The view of the Internet topology we rely on is incomplete. There are undoubtedly links and relationships that we do not observe because we lack access to routing table dumps from every network worldwide. This limitation was discussed in the previous lecture.

The bigger challenge is that we don't know which of these issues—model abstraction or partial view—introduces more inaccuracies into the final output. This uncertainty complicates the validation and application of the model.

Slide 84

As an example, in the ASRank paper, the authors report that the precision of their algorithm is approximately 99%. This means that, based on their validation set, nearly every provider-to-customer or peer-to-peer link they annotated was found to be accurate.

Slide 85

However, some other studies have shown that when paths are computed using this annotated topology, the results often do not match the actual paths observed on the Internet. This discrepancy could arise for two possible reasons. First, model limitations: the abstraction used in the model may not be robust or detailed enough to accurately represent the paths in the real Internet. Second, insufficient data: the partial dataset used to construct the topology might not provide enough information to infer accurate paths.

These issues highlight the challenges of relying on partial data and simplified models to understand the Internet's complex topology.

Slide 86

In this topology, the green dashed lines represent peer-to-peer links. However, the ability to observe these links depends heavily on the location of your vantage points.

For example, if your vantage point is above a given link (e.g., at AS2), you wouldn't observe the link between AS12 and AS4, because routes traversing that link wouldn't pass through AS2. Similarly, the peer-to-peer link between AS10 and AS14, or the one between AS5 and AS8, would also remain invisible unless your vantage point is situated at AS10 and the route originates from AS14.

This highlights a significant limitation: there are many peer-to-peer links in the topology that go unobserved if vantage points aren’t placed strategically. This affects the algorithm's ability to infer such links accurately.

This issue also impacts the reported precision of the algorithm. Precision measures the success of inferences based only on the links that are visible in the data. However, recall, which evaluates success based on all the links that actually exist, would likely be much lower. Unfortunately, the true recall cannot be determined because we don't have complete knowledge of the Internet's actual topology.

Slide 87

Another issue with the single-node abstraction is illustrated here, and I believe this was discussed in previous weeks. If we didn't represent each AS as a single node, but instead included their individual routers, we would observe more nuanced routing behavior.

For example, AS1 would receive the originating prefix from AS6 through AS3, following a specific path. Meanwhile, AS5, when receiving a prefix from AS6, would traverse a different router within AS2.

This discrepancy occurs because specific routes within the same AS propagate differently depending on proximity to particular routers. In this example, AS2 propagates two different routes, depending on which router handles the traffic.

The single-node abstraction simplifies this complexity but fails to capture such detailed routing behavior, which may lead to inaccuracies in the inferred topology.

Slide 88

Slide 89

In our research, since the actual topology of the Internet is unknown, we synthesized Internet-like graphs—topologies that adhere to the three key properties discussed earlier in the lecture.

Using these synthesized graphs, we ran ASRank and similar algorithms to compute paths across the topologies. What we observed was significant:

First, even with full vantage point coverage (a vantage point in every AS), the model abstraction used by ASRank and similar approaches was not sufficient to compute paths with high precision.

Second, without ASRank, an upperbound oracle algorithm that correctly assigns relationships achieves only 57% precision when computing paths—a relatively low value.

Third, with ASRank, the precision is even lower, plateauing around 35% precision due to its inherent limitations, including its inability to assign relationships with 100% accuracy.

This result highlights that the model granularity used by ASRank is insufficient for precise path computation, even with complete visibility of the topology. This insight led us to devise a new model and a new approach to compute paths, overcoming the limitations of ASRank and similar algorithms.

Slide 90

In our research, we demonstrated that, in those synthesized topologies, our new model significantly outperforms the abstraction and path computation approach explained today.

Unfortunately, we don't have time to dive into the details of our model in this lecture.

Slide 91

We'll be presenting more details about this research in three weeks. If you're interested in this subject—whether it's improving the precision of algorithms like ASRank or developing new models and algorithms that better leverage Internet measurements—please feel free to reach out to us. This is an ongoing area of research, and we welcome collaboration and new ideas.

Slide 92