Routing Algebras

Laurent Vanbever, Jean Megret (Scribe 1)

In this lecture, we will learn about routing algebras and how we can use them to reason about the fundamental properties underpinning distributed routing protocols.

Slide 1

We'll start by introducing the concepts behind routing algebras, beginning with some reminders about graph theory, before diving into the algebras themselves in the second part of the lecture.

Slide 2
Slide 3

A graph is defined by two sets: a set of vertices, $V$, and a set of arcs (i.e. directed edges), $A$. (In this lecture, we will consider the case of directed graphs.) For the graph on the slide, we have $V = \{a, b, c, d\}$ and $A = \{ab, ac, cb, cd, bd\}$.

The notion of in/out-neighbours is particularly useful when reasoning about routing and forwarding in computer networks. The in-neighbours of a node correspond to the predecessors of the node, while the out-neighbours correspond to its successors, that is, the possible next-hops.

For instance, $b$ has two in-neighbours: $a$ and $c$. These two neighbours have a directed edge that "points" towards $b$. In contrast, $b$ has only one out-neighbour (i.e. one successor or possible next-hop), which is $d$.

Slide 4

An important graph notion in the context of networking is that of an in-tree, which one can interpret as a forwarding tree towards a given destination (the root).

Formally, we denote the in-tree towards $t$ as $G_t^{in}$. An in-tree is a subgraph on which a few conditions must hold: there is a root (the destination); all the nodes have exactly one successor, except for the root, which has none; and there must exist a path from each node to the root. If all the nodes are present in the tree, we say that the tree is spanning. In networking, we often care about spanning trees: we want all nodes to be able to send traffic to the destination.

The bottom part of the slide lists several examples of in-trees towards $d$, $G_d^{in}$. The first two in-trees (in the middle) are not spanning, while the two on the right are. As you can see, many in-trees (spanning or not) typically exist.

In this lecture, we are interested in understanding and designing distributed routing protocols that can compute "optimal" in-trees, i.e. in-trees minimizing (or maximizing) some metric such as the distance between any node and the destination.

Slide 5

To reason about optimality, we will assume that each arc is mapped to some attribute(s) given by a function $l$. For instance, $l$ can associate a weight in $\mathbb{R} \cup \{\infty\}$ to each arc representing its delay (i.e. its physical propagation time). In this context, associating $\infty$ with a link means that it cannot be used (its delay is infinite). Observe that any directed graph can be turned into a complete graph where any missing arc is associated with $\infty$ (this will be useful later).

A path is a sequence of vertices connected by arcs. There will typically be many paths between two vertices. We denote with $P^{st}$ the set of all the possible paths between $s$ and $t$.

Slide 6

As mentioned before, in networking we are interested in computing optimal in-trees, the most common example being shortest-path trees.

We denote the shortest-path distance between $s$ and $t$ with $D_{s,t}$. $D_{s,t}$ is the minimum path length $l(p)$ amongst all paths $p \in P^{st}$, where $l(p)$ corresponds to the sum of the arc weights along the path. On the graph on the right, we see that $D_{u,t} = 4$, which corresponds to the length of path $uwt$: $l(uwt) = l(uw) + l(wt) = 4$.

Clearly, the shortest-path problem can be solved efficiently by well-known algorithms such as Dijkstra's algorithm, which sits at the core of link-state routing protocols such as OSPF or IS-IS.
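As a side note, here is a minimal sketch of Dijkstra's algorithm in Python; the graph representation and the weights below are illustrative, not taken from the slide.

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path distances from `source` in a directed graph.

    `graph` maps each node to a list of (neighbour, weight) pairs.
    Missing arcs are implicitly treated as having weight infinity.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

# Hypothetical weights, loosely inspired by the slide's 4-node example.
graph = {"u": [("w", 1)], "w": [("t", 3), ("v", 2)], "v": [("t", 1)]}
print(dijkstra(graph, "u"))  # distances: u=0, w=1, v=3, t=4
```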

Finding the shortest paths is only one useful example. In practice, operators might instead be interested in forwarding traffic along, say, the widest paths (the ones with the most bandwidth), or the shortest-widest paths, etc. We'll see several examples in this lecture. Interestingly, as we shall see, not all of these optimization problems can be solved optimally using distributed routing protocols, which is perhaps surprising since they look alike.

Slide 7

It turns out that expressing the path (optimization) problem that BGP is solving is trickier. Not only are the metrics that BGP relies upon more complex (local-preference, AS-path length, MED, etc.), but the process to select routes (the so-called BGP decision process) is also more involved than minimizing or maximizing a set of values (BGP is more in the business of ranking routes according to a lexicographical order).

Another important difference is that, unlike link-state protocols such as OSPF, BGP is a distance-vector protocol (technically, a path-vector protocol), which relies upon distributed computation to compute its paths. In OSPF, only the flooding of the network topology is distributed, while the computation itself is not: each router runs Dijkstra's algorithm on its full view of the topology.

Slide 8

In the rest of the lecture, we'll ask ourselves what kind of paths distance/path-vector routing protocols can compute in a distributed way, by iteratively exchanging their current best known path to each destination until convergence.

Slide 9

So, eventually, we want to understand what paths can be computed by distributed vector-based routing protocols. You can think of that question as essentially: "What optimization problems can we solve using these distributed protocols?". You will see that not all optimization problems can be solved with these protocols. We are of course interested in understanding for which families of problems these protocols converge. I already told you this: BGP, in the general case, is not guaranteed to converge. Deciding whether a BGP configuration will converge or not is as hard as figuring out whether an arbitrary algorithm terminates or not.

So we would like to understand what the key ingredients are that can make a protocol diverge forever. We will see that we can actually capture that. And when these protocols do terminate, we would like to understand when they are guaranteed to find an optimal solution. Again, BGP in the general case is not guaranteed to find an optimal solution. So not only does it not always terminate, but when it does terminate, it sometimes terminates on sub-optimal solutions. This is clearly an undesirable outcome, yet this is the protocol that we are all dealing with today.

Also, another aspect that we will try to understand is when we can use destination-based forwarding. In destination-based forwarding, you have the destination address in the header of the packet that you are forwarding. Each node on the path then performs a lookup, meaning that each node on the path makes a decision on where to send that packet next. If you have $n$ nodes on the path, you have $n$ decisions. All these $n$ decisions need to be in sync; they somehow need to be coordinated for the packet to actually make it closer and closer to the destination, until it reaches a direct neighbour of the destination and finally the destination itself. That means that when you rely on destination-based forwarding, you implicitly have this constraint on the subsequent decisions that are being made. This prevents some kinds of paths from being used. And it is quite logical: if the downstream node that I am sending the traffic to is not making a decision compatible with mine, then we have an issue, and we cannot use destination-based forwarding anymore.

An alternative, which is not used on the Internet, would be source routing. In source routing, the source writes the entire path it wants to use in the header. So there is only one entity deciding instead of $n$ entities, and there is no constraint on the kind of paths that you can use; it is much more flexible. But it has, of course, many other drawbacks that explain why it is not used on the Internet today.

Slide 10

Reasoning formally about protocols is something that Internet engineers are notoriously bad at. This is a picture of the origin of the BGP routing protocol. BGP is known as the "three-napkin protocol" because it was designed by two engineers, one from IBM and one from Cisco, and the spec was essentially written on three napkins over lunch. You can see that some of the key ingredients are already there, like AS numbers, some kind of updates, attributes, the state machine, etc. When you consider that this is how the protocol came to be, it is not entirely surprising that we end up with protocols that can diverge or compute sub-optimal paths. When people designed these protocols, they did not think about formally capturing the objectives; this came much, much later. The funny story is that when academics tried to understand what BGP is doing from a formal viewpoint, they could not actually figure out what problem BGP is solving. People had to define the problem that BGP is solving in the first place in order to study it.

Slide 11

I would like to go back to the shortest-path routing problem and reason now about how distributed protocols can solve it. Of course, for shortest-path routing it is clear that distributed vector protocols can solve this problem: Bellman-Ford, RIP, and EIGRP are examples of distributed protocols that do so. But I need to define a bit more formally what it is that a distributed vector-based protocol is doing.

We talk about stable routing as the problem that these protocols are solving. The easiest way to think about it, in my view, is that what these protocols are doing in the end is solving fixed-point equations. They solve these equations iteratively until they converge. What the equation says is that, for a node u and a destination t, the length of u's path towards t is either 0 if u happens to be the destination, or the minimum, across u's neighbours, of the length that the neighbour is advertising plus the cost to reach that neighbour. So we exchange these lengths ($E_t$) iteratively, and in each round every node computes the minimum over the lengths it has learned. By exchanging messages over and over, these distances will go down and down. At some point, nothing moves anymore and we have reached an equilibrium. At the end of this process, if all goes well, we end up with a spanning in-tree: every node in the network has exactly one successor, the one giving the minimum cost. When we define stable routing, we also really think about being able to use destination-based forwarding. So not only do you reach this equilibrium, but you also want to be able to use destination-based forwarding.
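To make the fixed-point view concrete, here is a minimal sketch of the iteration for shortest paths (the graph, weights and function names below are illustrative assumptions, not the slide's notation):

```python
INF = float("inf")

def stable_routing_shortest(graph, t):
    """Iteratively solve L[u] = 0 if u == t, else min over out-neighbours v
    of (l(u,v) + L[v]) -- the fixed point a vector protocol converges to.

    `graph` maps each node to a dict {out-neighbour: weight}.
    Returns the converged lengths and the chosen next hop of each node.
    """
    length = {u: (0 if u == t else INF) for u in graph}
    nexthop = {u: None for u in graph}
    changed = True
    while changed:                     # one "round" of message exchanges
        changed = False
        for u in graph:
            if u == t:
                continue
            for v, w in graph[u].items():
                if w + length[v] < length[u]:
                    length[u], nexthop[u] = w + length[v], v
                    changed = True
    return length, nexthop

# Illustrative graph; the destination must appear as a key even without out-arcs.
g = {"u": {"w": 1}, "w": {"t": 3, "v": 2}, "v": {"t": 1}, "t": {}}
print(stable_routing_shortest(g, "t"))   # lengths: u=4, w=3, v=1, t=0
```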

Now let's come back to the question: do stable routing based on link weights and shortest paths agree on the same forwarding decisions? Can stable routing solve the shortest-path problem? The answer is absolutely yes.

Slide 12
Slide 13

On the left is the same graph as before, with the shortest paths computed. On the right, you can see the solution that will be computed by a vector-based protocol. Eventually, when the protocol converges, it converges on these forwarding labels for each of the nodes. And you can see that this is perfectly in sync with the shortest paths; they are perfectly equivalent to one another. This is not surprising, really, because, as you perhaps already know, any sub-path of a shortest path is also a shortest path. So it's quite logical: if u sends to w, it is because it believes that this is the path that will be used, and w's own decision is consistent with that. There are no conflicts here between the decisions that an upstream node and a downstream node are making.

Slide 14

That was shortest-path routing; we now look at different flavours of routing problems. Let us look at the widest-path routing problem, which is also something that people sometimes want in networks: they don't want to compute the shortest path, they want to compute the path with the maximum capacity between two nodes. For instance, perhaps you don't care that a path has a high delay; the only thing that you care about is the amount of data you can get through (for instance, for backup purposes).

First, we give different semantics to the labels of the edges: we talk about capacities, not delays. We also change the way we do the operations on these attributes. When we compute the weight of a path in the shortest-path case, we sum the weights of the links. In the widest-path case, the capacity of a path is the capacity of the bottleneck of the path, that is, the minimum capacity along the path. Furthermore, when I compute the best path between s and t, I am not minimizing anymore, I am maximizing. We have different edge weights, a different optimization function, and a different way to compose the attributes of the edges along a path, and we end up with a perfectly valid, different routing problem. By the way, if you compute the widest path between each node and t, you will see that all of them are widest paths except the direct path wt: from w to t, we prefer to go via v.

Slide 15

So again, the question is: if I would like to use a distributed routing protocol to compute solutions, are distance/vector-based protocols capable of doing that? To answer this, I need to change the definition of stable routing to use widths rather than lengths, and it is very similar to before: I am just using two different operations, min and max, instead of plus and min.

But you see structurally they are very similar to one another. So can you use a vector-based protocol to compute widest path?

Absolutely, this will work just fine.

Let's think about what a distributed protocol does in this network. We are again routing towards t, so t will be the first one to advertise a message, and it will advertise an infinite capacity towards itself: when you are the destination, you can consider that you reach it with infinite capacity. It advertises a message towards v and w. v and w receive this message; it is the first message they know of, so for v this is the best one, and the capacity is the minimum between infinity and 20, which is obviously 20. For w, it is the minimum between 10 and infinity, which is 10. As you can see, in this very first state we have not converged yet: for w, the best way to reach t is not to go directly to t. But then v and w exchange messages as well, and w will be very happy because, right now, the best capacity that it knows is 10, and it learns that it can reach t with a capacity of 20 via v. Eventually, w directs traffic to v and u directs traffic to w. You can also see that destination-based forwarding does not get in the way here: the decisions of u, w and v are compatible from the point of view of widest-path routing.
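As a sketch, here is the same iteration with the two operations swapped: min for the extension and max for the selection. The capacities are the ones I read off the slide's example (uw = 10, wt = 10, wv = 20, vt = 20), so treat them as an assumption.

```python
INF = float("inf")

def stable_routing_widest(graph, t):
    """Fixed point of W[u] = INF if u == t, else max over out-neighbours v
    of min(c(u,v), W[v]) -- same structure as before, with (+, min)
    replaced by (min, max). `graph` maps nodes to {out-neighbour: capacity}.
    """
    width = {u: (INF if u == t else 0) for u in graph}
    nexthop = {u: None for u in graph}
    changed = True
    while changed:
        changed = False
        for u in graph:
            if u == t:
                continue
            for v, c in graph[u].items():
                if min(c, width[v]) > width[u]:
                    width[u], nexthop[u] = min(c, width[v]), v
                    changed = True
    return width, nexthop

# Capacities as I read them off the slide (assumed): uw=10, wt=10, wv=20, vt=20.
g = {"u": {"w": 10}, "w": {"t": 10, "v": 20}, "v": {"t": 20}, "t": {}}
print(stable_routing_widest(g, "t"))
# Converges with w forwarding via v (width 20) and u via w (width 10).
```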

Slide 16
Slide 17

So now we can put them next to each other and look at the problem that shortest paths solves and the problem that widest paths solves. This is already going towards algebras. There are three main operations that you have to keep in mind when you think about routing protocols computing paths.

The first is how you combine the attributes of the links along the path. If you think about shortest paths, you are doing plus; if you think about widest paths, you are doing min. That's what we'll call the extension operation.

Then you have to think about how you compare two paths. In the case of shortest paths, you compare using smaller-than-or-equal; in the case of widest paths, you compare using greater-than-or-equal.

Finally, how do you select one path? There you use min for the shortest path and max for the widest path.

At a high level, this already defines an algebraic structure on these protocols.
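This three-operation view is what makes a generic formulation possible: the same fixed-point iteration can be written once and parameterised by the comparison and extension operations. A minimal sketch (names and graph are illustrative):

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Algebra:
    better: Callable[[Any, Any], bool]   # comparison: is a strictly preferred to b?
    extend: Callable[[Any, Any], Any]    # extension: combine a link label with a path attribute
    identity: Any                        # attribute of the empty path (at the destination)
    worst: Any                           # least-preferred attribute (unreachable)

def stable_routing(graph, t, alg):
    """Generic vector-style iteration: attr[u] = identity if u == t,
    else the best of extend(label(u,v), attr[v]) over out-neighbours v."""
    attr = {u: (alg.identity if u == t else alg.worst) for u in graph}
    nexthop = {u: None for u in graph}
    changed = True
    while changed:
        changed = False
        for u in graph:
            if u == t:
                continue
            for v, label in graph[u].items():
                cand = alg.extend(label, attr[v])
                if alg.better(cand, attr[u]):
                    attr[u], nexthop[u] = cand, v
                    changed = True
    return attr, nexthop

INF = float("inf")
shortest = Algebra(better=lambda a, b: a < b, extend=lambda l, p: l + p,
                   identity=0, worst=INF)
widest = Algebra(better=lambda a, b: a > b, extend=min,
                 identity=INF, worst=0)

g = {"u": {"w": 10}, "w": {"t": 10, "v": 20}, "v": {"t": 20}, "t": {}}
print(stable_routing(g, "t", widest))
```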

Slide 18

But now we can start playing with slightly more complex attributes. Instead of using just one value, I am going to use two values; this is an example we already went through in the first lecture of this course. Now I want to compute the shortest-widest path: I am looking for the widest path and, if there are multiple of them, I would like to pick the shortest one among them. That means that I now attach two attributes to each edge: the capacity first, and then the length (delay). I am also going to have different extension, comparison and selection operations. For selection, I minimize the delay among all the paths that have the maximum capacity.

Slide 19

If you compute the shortest-widest paths in this graph, you should end up with the solution on the left; this is the solution you reach assuming each node knows the entire graph. From u's viewpoint, the best path is $(10, 2)$, the path uwt. For w, it is $(20,2)$, over the path wvt. For v, it is $(20,1)$. Here you can see that we have something iffy going on, because now we have a violation: u and w don't agree anymore. And that already means that destination-based forwarding will not work here; it's going to lead us to a sub-optimal solution.

Slide 20

Destination-based forwarding is not compatible with shortest-widest paths. The question, though, is why? We barely changed anything, we just combined two problems together, and yet we ended up with a violation! Somewhere along these operations, something changed that made the protocol unable to compute the optimal solution.

Slide 21

It's actually even worse than that. Let's imagine that wv fails: the corresponding link disappears from the network. The routers re-exchange messages and converge to a new state. On the right is the state that the routers end up computing after the failure. What you can see, which is quite strange, is that a link fails and the shortest-widest path between u and t changes as well, even though the link that failed was not on the original path between u and t. You have a link which is not on that path that fails, and boom, the path between u and t changes! That is weird, and it would not happen with shortest-path routing: when a link that is not on the shortest path fails, nothing changes. We even saw that used as a strategy, if you remember, in the fast-convergence lecture; it saved us a few cycles of computation.

What is even weirder is that a link fails, and yet, from u's perspective, the path improves from (10,3) to (10,2). This is again something that you will not see in shortest-path routing: when there is a link failure, the only thing that can happen is that the cost to the destination increases. So there is some weird behaviour going on, and the question is why. For that, we need to introduce the notion of an algebra: we try to capture what the protocol is doing by summarizing it with a four-tuple.

Slide 22

A routing algebra is a four-tuple $\langle A,\preceq,\oplus,\cdot \rangle$; these are the four things you need to define in order to have an algebra.

$A$ is the set of attributes. Every link or path has an attribute, which we can get with the function $f:P_{uv}\to A$.

The comparison relation is denoted by the symbol $\preceq$ ("precedes or equal"). It essentially tells you, given two paths, which one is the better one. For instance, here $a$ is better than $b$.

The extension operation is denoted by the symbol $\oplus$. The extension operation describes how you compose a path and a link into a longer path. Here, you have a path with attribute $b$ and you extend it with a link with attribute $a$; you end up with a path with attribute $a\oplus b$.

Finally, there is one special attribute, $\cdot$: think of it as the least-preferred attribute, the one you would essentially never consider. Typically, this is an element like infinity or zero.

Slide 23

A few properties now about the comparison relation. We require it to be a total order. In case you have forgotten, this means that the relation satisfies four properties: reflexivity, anti-symmetry, transitivity and totality. In a nutshell, a total order means that you can always compare two paths with one another: whatever two paths you take, you should be able to compare them. It has to be a total order, not a partial order in which some elements cannot be compared with one another.

A few words about the special attribute $\cdot$. It is the least preferred: any attribute is always preferred to $\cdot$. It is also absorbing, so whenever you extend something with $\cdot$, you get $\cdot$ as the output. If you think about shortest paths, with $\cdot$ being infinity, any path length plus infinity is infinity.

Slide 25

So now we can look at shortest paths. The attributes are the positive real numbers together with infinity, infinity being the least preferred attribute. The comparison is the usual less-than-or-equal, and the extension is the usual addition.

Slide 26

Widest paths use a different set of attributes: the non-negative reals. The comparison is greater-than-or-equal; the direction now is that we prefer paths that have a higher capacity over paths that have a lower capacity. The extension is not plus anymore but min: we take the minimum between $w$ and $x$ and we get the capacity of the path. The least preferred attribute is zero: a capacity of zero essentially means you have no capacity, you cannot reach the destination, as if you had no bandwidth.

Slide 27

Shortest-widest is slightly more complex, but I think you get the intuition by now. We have a tuple as an attribute, composed of a width and a length. The relation is defined as follows: given two paths with attributes $(w,l)$ and $(x,n)$, you prefer $(w,l)$ if $w>x$; if the widths are equal, you tie-break using the shortest length. How do you do the extension? You take the minimum of the widths and you add the lengths: $(w,l)\oplus(x,n) = (\min(w,x),\ l+n)$. It is nearly the product of the two previous algebras. The least preferred attribute is the composition of the least preferred attributes of both previous algebras: $(0,\infty)$.

Slide 28

Then you can flip shortest-widest around and do widest-shortest. It is not quite the same: here you first prefer the shortest path and, if there is more than one, you pick the widest. So you tie-break using the capacity instead of tie-breaking using the length. A few things change: the extension operation stays the same, but the comparison operation changes.
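As a small sketch, the two lexicographic variants share the extension and differ only in the comparison; the attributes are (width, length) tuples as on the slides, and the function names are illustrative:

```python
# Attribute = (width, length); the extension is shared by both variants.
def extend(link, path):
    (w, l), (x, n) = link, path
    return (min(w, x), l + n)

def better_shortest_widest(a, b):
    # Prefer the larger width; tie-break on the smaller length.
    return (a[0], -a[1]) > (b[0], -b[1])

def better_widest_shortest(a, b):
    # Prefer the smaller length; tie-break on the larger width.
    return (a[1], -a[0]) < (b[1], -b[0])

print(extend((10, 1), (20, 2)))                   # (10, 3)
print(better_shortest_widest((20, 2), (10, 1)))   # True: the wider path wins
print(better_widest_shortest((20, 2), (10, 1)))   # False: the shorter path wins
```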

Slide 29

As in ordinary algebra, we can reason about whether these operations are associative and commutative. This is interesting from a theoretical viewpoint, but actually also from a practical one. When you think about associativity, for instance, you can think of it as whether you can compute the attribute of a path from the origin to the destination and/or from the destination to the origin. If the algebra is associative, it doesn't matter: you can compute it in both directions and you will get the same thing.

That can be interesting, for instance, if you think about what Dijkstra does: it computes paths from the source towards the destination. Vector protocols do the opposite: they start from the destination and go towards the sources. So if you have an algebra which is associative, that means you could technically think about using Dijkstra as the algorithm. But if you don't have associativity, then you have to be careful about which version you are using.

Commutativity is whether $a\oplus b$ is equal to $b\oplus a$. Essentially, if your algebra is commutative, then paths will be symmetric.

All the algebras we have seen so far are both associative and commutative. But that will not always be the case.

Slide 30

Now let's start to look at BGP; we want to model BGP algebraically. You already know very well how different ASes compute their decisions using BGP. We consider the classical provider-customer and peer-to-peer relationships: a customer pays a provider in order to get connectivity towards the entire Internet, and two peers peer with each other to exchange traffic between their respective customers without monetary compensation.

An AS transmits data packets from an in-neighbour to an out-neighbour if and only if it is paid to do so, and the aim is to maximize revenue. You can define an algebra that captures this.

Slide 31

The attributes are a little bit different from real numbers. We now have a discrete set of attributes $\{C, R, P,\cdot\}$ where $C$ is a customer path, $R$ is a peer path, $P$ is a provider path, and $\cdot$ is the invalid path.

Note that there is also a special path, the empty path, which is essentially the equivalent of an AS (or node) reaching itself. It is not very important, so we will leave it aside for most of the rest of the lecture. For the comparison, we follow the economic preferences: we prefer a customer path over a peer path, over a provider path, over $\cdot$. The extension is a bit more complicated, because this is where we have to integrate both the selection logic and the extension logic of BGP.

Here, the way to think about it is that you have an existing path with attribute y and you extend it with a link with label x. If you are an AS, you learn a path either from a customer, a peer or a provider. You then extend it over a new link, which could be a customer, peer or provider link, and you think about the resulting path from the point of view of the AS you are advertising the path to. That is, you ask how the path will look once it reaches the other end of that link. For example, if I am an AS and I learn a path from a customer, I will advertise that path to everyone; this is why, in the table, none of the entries for a customer-learned path is $\cdot$. From the point of view of my provider, it stays a customer path; if it goes to a peer, it becomes a peer path; and if it goes to another customer of mine, it becomes a provider path. In contrast, if I receive a route from a peer or a provider, I will not advertise it to another peer or provider, which is why you have $\cdot$ there; I will only advertise it to my customers, for whom it becomes a provider path. This relation really captures the policies that, by now, you should know well.
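Here is a sketch of this extension table in code, as I read it off the slide (treat the exact entries as an assumption): x is the label of the link from the viewpoint of the AS the route is advertised to, y is the attribute of the existing path, and $\cdot$ marks an invalid route.

```python
C, R, P, INVALID = "C", "R", "P", "."   # customer, peer, provider, invalid (the "dot")

# extend(x, y): x = label of the link from the receiving AS's viewpoint,
#               y = attribute of the path being advertised over that link.
EXTEND = {
    # y = C: a customer-learned route is exported to everyone.
    (C, C): C, (R, C): R, (P, C): P,
    # y = R or P: only exported to customers, where it becomes a provider route.
    (C, R): INVALID, (R, R): INVALID, (P, R): P,
    (C, P): INVALID, (R, P): INVALID, (P, P): P,
}

def extend(x, y):
    if y == INVALID:
        return INVALID                  # the invalid attribute is absorbing
    return EXTEND[(x, y)]

print(extend(P, C))  # 'P': a customer route advertised down to a customer
print(extend(C, P))  # '.': a provider route is not re-advertised to a provider
```

Note already that extend(P, C) differs from extend(C, P); we come back to this on the next slide.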

Slide 32

Interestingly, the algebra for BGP as defined previously is neither associative nor commutative, so it is really an algebra with not much going for it. Perhaps it is easier to see commutativity first; here you can see the two instances of the situation. You are the node in the middle and you receive a path from a customer, so of course you will advertise it to another customer of yours: this is going to be a provider path, $P\oplus C=P$. But if you do the other direction, $C\oplus P$, which corresponds to the case where you receive a path from a provider, you will not advertise that path to another provider: $C\oplus P= \cdot$, because you don't want to transit traffic between your providers. That shows that this algebra is not commutative. You also really need to compute things from the destination towards the source, which is a consequence of the algebra not being associative. Why this matters is because of the practical consideration of which algorithm you can use to compute paths.

Slide 33

Now we can think again about stable routing: what is the outcome of a protocol using this algebra? You should already know the notation: dashed lines for peer links, solid lines for provider-customer links, with providers drawn above their customers. 1 is the provider of 5; 5 is the provider of 8 and 9.

Slide 34

There are different paths that you can think of between 1, the source, and 9, the destination. If you think about all the possible paths in this graph, you have four of them: a, b, c, d. You can compute the attribute of each of them using the algebra. What you realize when you do that is that for paths c and d the attribute will be $\cdot$, because these are not valid paths. For path c, the entire computation is shown below; you can do that for all the paths, and we will do so in the exercise session. The only two paths that have a non-$\cdot$ attribute are a and b. This makes sense, because a and b are the customer paths from 1's viewpoint, and can therefore be used from a BGP standpoint.

Slide 35

In this particular algebra, we don't look at the AS path yet. That means that here you really have two customer paths from 1's viewpoint. However, we know that in BGP we would prefer the one on the left to the one on the right because of the AS-path length. As of right now, this is not yet in the algebra; we don't care about it. But we can of course extend the algebra to take it into account.

Slide 36

You see how, by playing with the definitions of the attributes, the comparison and the extension, we can model more and more complex routing protocols. This is now a more interesting, closer-to-reality algebra that captures BGP. As before, we prefer customer over peer over provider, but we also tie-break on the AS-path length. So now we have a tuple as an attribute, composed of the path type and a length. In the extension, we additionally add the corresponding lengths, just as we did before for shortest-widest, where we were adding the lengths alongside the widths.
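A sketch of the extended attribute and its operations (again as I read the table off the slide, so the details are assumptions): an attribute is now a (type, length) pair, types are ranked customer, then peer, then provider, then $\cdot$, and the extension combines the type table from before with an addition on the lengths. The worked numbers anticipate the example on the next slides.

```python
C, R, P, INVALID = "C", "R", "P", "."
RANK = {C: 0, R: 1, P: 2, INVALID: 3}          # lower rank = more preferred

TYPE_EXTEND = {(C, C): C, (R, C): R, (P, C): P,
               (P, R): P, (P, P): P}           # every other combination is invalid

def better(a, b):
    """Is attribute a = (type, length) preferred to attribute b?"""
    (ta, la), (tb, lb) = a, b
    return (RANK[ta], la) < (RANK[tb], lb)

def extend(link, path):
    """Extend path attribute (ty, ly) over a link with label (tx, lx)."""
    (tx, lx), (ty, ly) = link, path
    ty_new = TYPE_EXTEND.get((tx, ty), INVALID)
    return (INVALID, float("inf")) if ty_new == INVALID else (ty_new, lx + ly)

# 1's choice: a customer path of length 4 beats a peer path of length 3 ...
print(better((C, 4), (R, 3)))                                    # True
# ... but after extending both over a provider link of length 1, the order flips:
print(better(extend((P, 1), (C, 4)), extend((P, 1), (R, 3))))    # False
```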

Slide 37

Now we can compute paths again. If we look at the shortest paths between 5 and 10, 5-9-6-10 is the shortest path, with length 3. But it is invalid from the BGP viewpoint, so BGP simply discards it. 5-1-3-4-6-10 is valid; it's a provider path of length 5. But then observe that there is another valid provider path, 5-1-2-6-10, which is shorter, with length 4. However, BGP will not compute the second path; it will compute the first one, because 1 prefers (and therefore only advertises) its customer route. BGP is therefore not computing an optimal solution: the traffic will flow along 5-1-3-4-6-10, while from 5's viewpoint a better path would be 5-1-2-6-10. It's the same kind of behaviour as we saw happening before for shortest-widest.

Slide 38

To understand what is really happening, let me talk about two fundamental properties of algebras that you have probably seen before in other contexts. The first one is monotonicity. What monotonicity means is simply that when you extend an attribute (apply $\oplus$ to it), the attribute you get out is never preferred to the original one: $b \preceq a \oplus b$. In routing terms, that means that a route does not get better as it propagates in the network and then comes back to you. All the routing algebras we have seen until now, including the BGP one, were monotone: routes were not getting better as they were propagated. Monotonicity is very important to guarantee that we converge to something at some point. Intuitively, if routes can get better as they traverse the network and come back to me, that is a recipe for routing loops: we can start going in circles where I send a route, it comes back with better attributes, I send it again, it comes back again, and we never converge.
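Because the attribute set $\{C, R, P, \cdot\}$ of the BGP algebra is finite, monotonicity ($b \preceq a \oplus b$ for every link label $a$ and attribute $b$) can be checked exhaustively. A small sketch, reusing the extension table as read off the slide:

```python
C, R, P, INVALID = "C", "R", "P", "."
RANK = {C: 0, R: 1, P: 2, INVALID: 3}              # lower rank = more preferred
EXTEND = {(C, C): C, (R, C): R, (P, C): P,
          (P, R): P, (P, P): P}                    # anything else -> invalid

def extend(x, y):
    return EXTEND.get((x, y), INVALID)

def is_monotone():
    # b must never be worse than a (+) b, i.e. RANK[b] <= RANK[extend(a, b)].
    return all(RANK[b] <= RANK[extend(a, b)]
               for a in (C, R, P) for b in (C, R, P, INVALID))

print(is_monotone())   # True: a route never gets better as it propagates
```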

Slide 39

So there is a theorem, which we will just state without proving it (if you are interested, there are papers of course): if the routing algebra is monotone, then there exists a stable routing for every destination in the network; it has unique attributes, and you can compute it with a generalised version of Dijkstra's algorithm. Which algorithm you use is not really that important. What is important is that the protocol is guaranteed to converge to a stable routing: the distributed computation has a solution.

Slide 40

This is a classical textbook example of a non-monotone algebra. It is an algebra in which we are solving a shortest-path problem, but we allow negative link lengths. When you allow negative link lengths, you can end up with cycles of negative length. For instance, here we have -3 between a and b, which means that going back and forth between a and b yields a negative-length cycle. If you try to compute the shortest paths from a to c using the distributed protocol we have seen before, it will diverge forever: the paths will start to go in circles, because nodes always prefer to go once more through the a-b-a loop, as it allows them to decrease the length of their path once more. That is why you want monotonicity.
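A tiny sketch of why this diverges: run the same Bellman-style iteration for a bounded number of rounds on a graph with a negative a-b cycle (the -3 comes from the slide, the other weights are made up) and watch the length from a keep decreasing instead of stabilising.

```python
INF = float("inf")

# Directed graph with a negative cycle a -> b -> a (total weight -3 + 1 = -2).
# The -3 is from the slide; the other weights are made up for the example.
g = {"a": {"b": -3, "c": 5}, "b": {"a": 1, "c": 5}, "c": {}}

def run_rounds(graph, t, rounds):
    length = {u: (0 if u == t else INF) for u in graph}
    trace = []
    for _ in range(rounds):                  # bounded number of message rounds
        for u in graph:
            if u == t:
                continue
            for v, w in graph[u].items():
                length[u] = min(length[u], w + length[v])
        trace.append(length["a"])
    return trace

print(run_rounds(g, "c", 6))   # [5, 2, 0, -2, -4, -6]: no fixed point is ever reached
```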

Slide 41

Shortest-widest was monotone. If you remember, shortest-widest was not computing the optimal paths, but it was converging onto something. The intuition is just to observe that if you take any path with attribute $(w,l)$ and you extend it with a link $(w',l')$, the result is going to be worse than or equal to the original. Why? Either $w'$ is at least $w$, in which case the width stays the same but the lengths add up, so the result is worse because the length is greater; or $w'$ has a lower capacity, in which case the extension takes the minimum, $w'$, and the extended path is worse as well. Whatever case you are in, the resulting attribute is worse than (or equal to) the original. So we will compute a stable routing. We saw, though, that we could not compute the optimal paths using vector-based protocols. So there is something more that we need in order to reason about optimality: monotonicity gives you convergence guarantees, it does not give you optimality guarantees.

Slide 42

What does give you optimality guarantees is isotonicity. The idea of that property is that if you have two attributes b and c, where b is preferred to c, then this relation should be preserved when you extend both b and c with another attribute a. In other words, if b is preferred to c, then $a\oplus b$ should be preferred to $a\oplus c$.
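Isotonicity can also be checked mechanically on a sample of attributes. A sketch comparing the two tuple algebras from earlier (the sampled attributes are arbitrary): on this sample, widest-shortest preserves the order under extension, while shortest-widest does not.

```python
from itertools import product

def extend(link, path):
    (w, l), (x, n) = link, path
    return (min(w, x), l + n)

def sw_better(a, b):      # shortest-widest: width first, then length
    return (a[0], -a[1]) > (b[0], -b[1])

def ws_better(a, b):      # widest-shortest: length first, then width
    return (a[1], -a[0]) < (b[1], -b[0])

def is_isotone(better, attrs):
    # If b is preferred to c, extending both by a must not flip the order.
    return all(not better(extend(a, c), extend(a, b))
               for a, b, c in product(attrs, repeat=3) if better(b, c))

attrs = [(10, 1), (10, 2), (20, 1), (20, 2)]
print(is_isotone(ws_better, attrs))   # True on this sample
print(is_isotone(sw_better, attrs))   # False, e.g. a=(10,1), b=(20,2), c=(10,1)
```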

Slide 43

And it so happens that there is again a theorem showing that if the algebra is isotone, then the in-tree paths, i.e. the result of the stable routing problem, are optimal paths. So isotonicity gives you optimality. If you have both isotonicity and monotonicity, then you are in the best possible world, as this guarantees convergence and optimality. So ideally, if you were to redesign BGP, you would like to redesign it in a way that guarantees isotonicity and monotonicity. It was not designed this way but, should we redesign it, it should have these properties, because that would solve a lot of issues.

Slide 44

Remember, the shortest-widest paths algebra does not compute optimal paths. The problem is, in fact, that it is not isotone. The way to see it is by an example. First put yourself in the shoes of w: you prefer $(20,2)$ over $(10, 1)$, because the capacity of the former path is greater than the capacity of the latter. Now let's look at what happens to these two attributes as we propagate them to u. If I propagate $(20,2)$ to u over the link $(10,1)$, the attribute becomes $(10, 3)$. If I propagate $(10, 1)$ to u, it becomes $(10, 2)$, and $(10, 2)$ is better than $(10, 3)$. So you can see that extending these two attributes flips the preference ordering between them. This is a violation of isotonicity, so we lose optimality.

Slide 45

The shortest commercial-type algebra, which is BGP with AS-path length, is also not isotone. It is the same thing as what happened previously. If you look at 1's viewpoint, 1 prefers to go via its customer over going via the peer. But if you take these two paths, the customer path and the peer path, and you extend them by the attribute of the link 5-1, i.e. $(P,1)$, you will see that from 5's point of view it is better to go via the peer, because the length is smaller. It is just another illustration of the fact that this algebra is not isotone.

Slide 46

Just a few more words about monotonicity. If the algebra is not monotone, then there are networks and destinations without any stable routing, or with multiple stable routings. If you remember, in lecture 2 or 3, I was showing you gadgets when we were talking about route reflection. I showed you that in some route-reflection configurations you can have multiple stable states, and you can also have divergence forever. You can model the route-reflection algebra if you wish, and realize that this happens because we lack monotonicity. Also, you have to understand that we always talk about customer, peer and provider, but in reality operators do not all behave the same: they use many more internal routing policies and many more types of neighbours than just these three (C, R, P). So the way they prefer one path over another can quickly make the resulting algebra lose isotonicity or monotonicity. That's why, in theory, BGP can diverge, even though in practice it often doesn't, because in practice it's often monotone.

Something that we won't see in this lecture, but that you can use to reason about convergence, is to think about the properties of the cycles in your network. There is very nice research showing that if all the cycles in your network satisfy a property called freeness, then you are guaranteed to have a unique stable routing solution in your network. It is essentially the counterpart of monotonicity. We won't go into the details here.

Slide 47

If you are interested in going deeper into the details, this is the seminal paper that introduced some of the things we presented today. "An algebraic theory of dynamic network routing" is where you can find many more details, like the reasoning about cycles I was describing before. It is the first paper that showed that when you have eBGP with only C, R and P, so without comparing lengths, the algebra is actually isotone and has this property on cycles that guarantees convergence. This paper was one of the first to really cleanly analyse the convergence properties of eBGP and prove what is required for eBGP to converge. It is a very nice paper.

Slide 48
Slide 49

Remember Dragon? We saw it in the 4th or 5th lecture. Dragon actually also relies on isotonicity at its core to prove the optimality of the algorithm. In Dragon, we are filtering routes, and the condition that we use to filter is: as an AS, I compare the attributes of the parent and of the child, and if I realize that the attributes of the child are less preferred than the attributes of the parent, I filter out the child. That is, in a nutshell, what Dragon is all about. What the paper shows is that if your routing policies, i.e. your routing algebra, are isotone, then the Dragon algorithm leads to the optimal savings: we filter as much as we can. And it makes perfect sense: if the algebra is isotone, it means that if I prefer P over Q, you would also prefer P over Q if you had to make the decision for me, because extending P and Q over a link does not change the order of preference. In essence, local decisions lead to a global optimum. It's a very nice property because it aligns local incentives with global results. So isotonicity really helps when it comes to interdomain routing.

Slide 50

Finally, just to mention it and show that routing algebras are a very useful framework to analyse any kind of routing problem: this is a paper that applied routing algebras to quantum networks. Figuring out routing in quantum networks is a very different setting from BGP. Yet you can still abstract it as a routing problem in the end, as you are just computing paths in a network. That means that the tools of routing algebras also apply there, and you can use known results to prove properties about your network.

Slide 51