Introduction

Prof. Laurent Vanbever

Welcome, everyone, to this lecture, in which we're going to dive, sometimes deep dive, into many aspects of computer networks that you probably have not seen in the introductory lecture I'm assuming you have all had.

Slide 1

I see some faces here from previous lectures, but also many new faces. So in case you do not know me, just a few words about myself. I'm originally from Belgium, from Brussels to be precise, where I was born and raised. I did my education there: I have a master's and a PhD in computer science from the University of Louvain, even though I'm now part of the electrical engineering department. I then did a two-year postdoc at Princeton University in the US before starting here at ETH almost 10 years ago (time flies!), first as an assistant professor and now as an associate professor. Right now, I'm leading a research group at ETH of around 13 people, both PhDs and postdocs.

Slide 2
Slide 3

Since June, so very, very recently, we co-founded a startup, NetFabric. I will try to say a few words about it as we go through the lecture, but in a nutshell, we are trying to build a framework to observe any network data and reason about it.

If you are interested in the research or the more industry side of things, we have projects both in my research group, where we offer semester projects, master's theses, and internships, and now also within the startup. Just ping me.

Slide 4

What will you learn in this lecture? Essentially, you will learn how to improve networks. To be slightly more precise, we will look at performance, scalability, flexibility, reliability, security, and sustainability, and, if that was not enough, we will also look at how we can better use the network, or improve it, from the end hosts' viewpoint (avoiding bottlenecks created by transport protocols).

What I would like to do today is go through these topics at a high level, just to give you the gist of the lecture, so that, if you have not yet decided whether you would like to take it, you can hopefully make an educated decision.

Slide 5
Slide 6
Slide 7
Slide 8
Slide 9
Slide 10

Let me speak a little bit about performance. In your intro lecture so far, you have probably learned a lot about shortest-path routing. Shortest-path routing is notoriously inefficient, and it's also pretty boring if you ask me.

Slide 11

Here is one example of its inefficiencies.

In this super simple network, we have seven routers; two traffic sources are connected to R1 and R2, and the destination is connected to R5. All the links are 10-gig links, except for the links connected to the sources and the destination, which have unlimited bandwidth (think of them as 100-gig links, for instance). We are running vanilla OSPF, so shortest-path routing.

So, first question of the lecture, probably one of the easiest: what is the maximum throughput between the two sources and the destination that can be reached in this basic network, assuming both sources transmit all the time at their maximum allowed rate?

Slide 12
Slide 13
Slide 14

To answer this question, we need to compute the shortest paths in this network. As I hope you see, both flows collide and go over this path over there. That means we have two flows going over a bottleneck capacity of 10 gig per second, in which case TCP congestion control will kick in and allocate throughput to both flows according to a max-min fair allocation. So, the answer is 5 Gbps.

This wastes a lot of capacity in the network, because, as you can see, there is this entire path here that is essentially completely unused, and over which, if we could put one of the two flows, we would allow the other flow to reach 10 gig per second as well. That would be a better allocation of flows to paths, but shortest-path routing prevents us from reaching this allocation.
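
To make this concrete, here is a minimal sketch of progressive filling, the textbook way to compute a max-min fair allocation. The link names are hypothetical; only the links that flows share matter for the result.

```python
def max_min_fair(flows, capacity):
    """flows: {flow: list of links}, capacity: {link: Gbps}."""
    rate = {f: 0.0 for f in flows}
    active = set(flows)
    cap = dict(capacity)
    while active:
        # Grow all active flows equally until some link saturates.
        inc = min(cap[l] / sum(1 for g in active if l in flows[g])
                  for f in active for l in flows[f])
        for f in active:
            rate[f] += inc
        for l in cap:
            cap[l] -= inc * sum(1 for g in active if l in flows[g])
        # Freeze every flow that crosses a saturated link.
        saturated = {l for l in cap if cap[l] <= 1e-9}
        active = {f for f in active if not saturated & set(flows[f])}
    return rate

# Both flows forced onto the same shortest path: 5 Gbps each.
print(max_min_fair({"f1": ["upper"], "f2": ["upper"]}, {"upper": 10}))
# One flow moved to the unused path: both reach 10 Gbps.
print(max_min_fair({"f1": ["upper"], "f2": ["lower"]},
                   {"upper": 10, "lower": 10}))
```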

Slide 14
Slide 15

In this lecture we will see how we can compute better paths, and more specifically how we can do that in a distributed way, meaning paths that we can compute using distributed algorithms. We will not consider the case in which there is an oracle that can see the entire network and directly compute the best paths for all the flows. We look more at real networks that run distributed routing protocols, like OSPF and BGP, and at the paths that they can compute.

We will also look at forwarding techniques other than destination-based forwarding, to actually forward traffic along these paths. And then we will also look at how we can load balance traffic onto multiple paths. Because once you allow a network to use more than one path to reach a destination, say we could also use this path over there, then comes the question of how you spread the traffic onto these paths.

Slide 16
Slide 17

Let us consider the first problem: computing better paths. Here I will consider a routing scheme that goes beyond shortest-path routing, just for the sake of the example. This scheme is known as shortest-widest path routing. As the name indicates, the idea is that we would like routers to forward traffic along paths that share two characteristics: they have the maximum capacity and, among those, the minimum latency. This is different from shortest-path routing, whose goal is just to route traffic along the path that minimizes the sum of the link weights along this path.

Slide 18

Let us consider a simple network to which we will apply this shortest-widest path routing. What you can see is that if I want to optimize for more than one objective, I need metrics that reflect them. So, instead of just one value per link, the link cost, we now have a couple of two values. One is the width (think of the capacity of the link): 20 could be 20 gig per second, 100 would be 100 gig per second. The second member of the couple is the length (think latency): one millisecond for this link, for instance, four milliseconds for that link, and so on.

Slide 19

When you think fundamentally about what the routing protocol is doing, there are two key operations you need to model. The first one is how we add multiple attributes together. In shortest-path routing, if you have a link that costs 10 and another link that costs 20, going through both costs the sum, 10 plus 20, which is 30. That's the cost of the path going through the two links: you just sum the link costs. Here we have a couple, a width and a length, so we need to define what it means to combine one couple with another. We essentially need to define an addition operation on these couples.

Slide 21

What does it mean to add (W,L) to (W',L')? As I was saying, the idea is that we are trying to find the paths with the largest capacity. And as I hope you remember from the intro lecture, the capacity of a path is given by the capacity of the bottleneck link along this path. So if you have a link of 100 gig connected to a link of 10 gig, the path composed of both has a bottleneck capacity of 10 gig per second.

So whenever we do a "plus" in our world now, we take the minimum of the two capacities; that becomes the capacity of the combined metric. And for the latency, it is exactly as in shortest path: we just take the sum.

Slide 22
Slide 23

To give you one example, if you have a link with attribute (10,3) and another link with attribute (10,5), the path I get by going through these two links back-to-back will have as its cost: the minimum between 10 and 10, which is 10; and 3 plus 5, which is 8. So (10,8). That's how you combine these metrics together.
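
As a minimal sketch, in plain Python, just to pin down the operation:

```python
def extend(a, b):
    """Combine two (width, length) attributes back-to-back."""
    return (min(a[0], b[0]),  # path capacity = bottleneck capacity
            a[1] + b[1])      # path latency  = sum of the latencies

print(extend((10, 3), (10, 5)))  # -> (10, 8), as in the example
```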

Slide 24

That's step one: I need to be able to combine these metrics together, and that's how you do it. Step two is to model how routers select a path. Whenever routers learn more than one path to reach a destination (which they usually do), they need to pick one. In shortest-path routing, you pick the path that minimizes the sum of the costs. Here, we say that we prefer a metric (W, L) of one path over the metric (W', L') of another path if the capacity W of the first path is higher than the capacity W' of the other one, or if the capacities are exactly the same but the latency L of the first path is smaller than the latency L' of the second. That defines, as I hope you can see, a lexicographic ordering amongst the paths.
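
And the selection rule, sketched in the same style; together, these two little functions are all we need to replay the examples that follow:

```python
def better(a, b):
    """True if attribute a = (W, L) is preferred over b = (W', L')."""
    return a[0] > b[0] or (a[0] == b[0] and a[1] < b[1])

print(better((20, 5), (10, 2)))  # -> True: higher capacity wins
print(better((10, 8), (10, 5)))  # -> False: same capacity, longer path
```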

Slide 25

Let us now play a little bit here. I'm asking you: what is the shortest-widest path from R2 to R4? Here, to give you a little hint, you are this router over there. (As you can see, the links are directed in this network.) Essentially, you have only two paths that you need to compare against one another: this one, and this one over there. So just take a second and, applying the rule I just mentioned, tell me which is the shortest-widest path from R2 to R4.

Slide 26

So this is the best path between R2 and R4. Again, you have two of them to compare. You have the path that goes over R3. What is the metric of that path? You take the minimum between 20 and 20, that's 20; and 4 plus 1 is 5. So (20,5). And you compare that against (10,2). As I said, you prefer the path with the higher capacity, so (20,5) wins over (10,2).

Slide 27

Now, the same question between R1 and R4. Again, I'm not talking about distributed routing here; I'm just asking, in this graph, what is the shortest-widest path between R1 and R4. Here you have three paths to compare against one another. And again, there is one answer: (R1, R2, R4), this path over there.

Slide 28

So, in blue, we have the shortest-widest path from R1 to R4, and in red, the shortest-widest path from R2 to R4. Do you see something interesting with these paths? As you can see, there is essentially a clash between what R1 thinks is the best path and what R2 thinks is the best path. In this clash, R1 will lose, because R1 has to go through R2 or R3 in order to get to the destination. By sending its packets to R2, R1 is then bound by the decision of R2. So effectively, and I'm already spoiling a little bit of the fun here, we observe that a distributed routing protocol might not send traffic along optimal paths.

Slide 29
Slide 30

This is actually the next question: what is the path that will be computed by the distributed routing protocol? Here, I'm telling you to consider a standard distance-vector routing protocol. Just a quick reminder of what a vectoring protocol is: it's a protocol in which each router sends to its neighbors its currently known best attribute to reach the destination. They do that ad nauseam until there is no more change, at which point the network has converged.

Here, R4 will start by advertising itself as the destination. You can think of its attribute as (infinity, 0): it has infinite bandwidth to itself, and the latency is zero.

So R4 will advertise (infinity, 0) to R3 and R2. R3 and R2 will then select their best path. R3 will pick this one over there; it doesn't have a choice. And R2, as we have said, will receive the message from R4 and also the message from R3, and it will pick this one over there. Now both R3 and R2 send their messages to R1, R1 compares the two, and it selects the path that goes through R2. So as you can see, this is an example where the distributed computation is not able to compute the optimal path.
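
Here is a minimal synchronous simulation of such a vectoring protocol. The link attributes are reconstructed from the example (an assumption on my part; only the links used in the discussion are modeled), and each router can only choose among what its neighbors currently advertise:

```python
def extend(a, b): return (min(a[0], b[0]), a[1] + b[1])
def better(a, b): return a[0] > b[0] or (a[0] == b[0] and a[1] < b[1])

links = {("R1", "R2"): (10, 3), ("R2", "R3"): (20, 4),
         ("R2", "R4"): (10, 2), ("R3", "R4"): (20, 1)}
best = {"R4": (float("inf"), 0)}  # R4 advertises itself

for _ in range(4):  # enough synchronous rounds to converge
    for u in ["R1", "R2", "R3"]:
        offers = [extend(attr, best[v]) for (x, v), attr in links.items()
                  if x == u and v in best]
        if offers:
            choice = offers[0]
            for o in offers[1:]:
                if better(o, choice):
                    choice = o
            best[u] = choice  # routers only see neighbors' CURRENT best

print(best["R2"])  # (20, 5): R2 prefers the path via R3
print(best["R1"])  # (10, 8): suboptimal; R1-R2-R4 would give (10, 5)
```

Note that R1 briefly holds (10, 5) while R2 still advertises its direct path, and loses it as soon as R2 switches to the path via R3.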

Slide 31
Slide 32

So there is something going on here, something about this optimization objective that makes the routing protocol fail to compute the best path. This would not have been the case, of course, in shortest-path routing: there, the routing protocol is always able to compute the best possible path.

Slide 33

In this lecture, I will teach you a formal framework in which we use an algebraic model to describe routing computations. This is a very nice framework that allows us to reason about fundamental properties of protocols: whether a protocol will converge, whether it will compute optimal paths, and whether it will support destination-based forwarding.

Slide 34

These properties are direct consequences of properties of the algebra that governs the computation we are looking at. Here, to spoil a little bit this part of the lecture and give you the gist of it, the problem is that the algebra we are looking at lacks a fundamental property called isotonicity. Isotonicity is a property of the extension (the "add") operation: we say that the extension is isotone if the relative preference between any two attributes is preserved when both are extended by a third attribute. This property is violated in our example, and when it is violated, you can actually prove that vectoring protocols will fail to compute optimal paths in general.

Let us go back to our example and look at the two paths that R2 has towards R4. There is the path that goes from R2 via R3 to R4, which has attribute (20,5). And there is the direct path between R2 and R4, which has attribute (10,2). So (20,5) versus (10,2): the former wins. Now, if we look at the definition of isotonicity, we want this preference to be preserved when both attributes are extended by a third. Let us extend them with (10,3). If I do (20,5) plus (10,3), I end up with (10,8). And if I do (10,2) plus (10,3), I end up with (10,5). As you can see, the latter is better. We have now inverted the preference between these two paths by extending them with a third attribute, which is a violation of isotonicity.
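
Numerically, with the two operations from before (repeated so the snippet runs on its own):

```python
def extend(a, b): return (min(a[0], b[0]), a[1] + b[1])
def better(a, b): return a[0] > b[0] or (a[0] == b[0] and a[1] < b[1])

p, q, e = (20, 5), (10, 2), (10, 3)          # R2's two paths, the R1-R2 link
print(better(p, q))                          # True:  (20,5) beats (10,2)
print(better(extend(e, p), extend(e, q)))    # False: (10,8) loses to (10,5)
```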

Of course this is just one example here. We'll see many others in the lecture, but that's just to give you the gist of how really, at a very high level, one can reason about what protocols can and cannot do provably.

Slide 35
Slide 36
Slide 37
Slide 38

All right, enough about computation. Another thing we will look at is the forwarding of the traffic itself. In your intro lectures, again, you have seen destination-based forwarding. Going back to our example: with destination-based forwarding, if the traffic from source one and source two goes to the same destination, once it reaches R3 the two flows stay together forever; they effectively merge (and collide). This is because R3 has one entry in its forwarding table towards the destination, and all the flows that match this entry go to the same next hop. This prevents us from using this lower path over there.

In the lecture, we will look at a more advanced forwarding scheme, namely label switching, which allows routers to forward traffic not only according to the destination, but also according to other fields in the packet. In this case, we look at labels (you can also think of them as tags or colors). We'll have the flexibility of attaching different tags to different packets, and then have the routers forward traffic differently according to the tag. Here is something you can do right away: attach one tag here, say dashed yellow, and over there another tag, plain yellow, and then have R3 forward the two differently in its forwarding table. That gives us some kind of source-based forwarding, and it allows both flows to reach 10 Gbps independently of what the other is doing.
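
A minimal sketch of the idea, with hypothetical labels and next hops (here R4 for the upper path and R6 for the lower one):

```python
# R3's tables: destination-based forwarding has ONE entry per
# destination; label switching can have one entry PER LABEL.
dest_table  = {"D": "R4"}                      # everything to one next hop
label_table = {"dashed-yellow": "R4",          # upper path
               "plain-yellow":  "R6"}          # lower path

def forward(pkt):
    # Prefer the label entry if the packet carries a label.
    return label_table.get(pkt.get("label"), dest_table[pkt["dst"]])

print(forward({"dst": "D", "label": "dashed-yellow"}))  # -> R4
print(forward({"dst": "D", "label": "plain-yellow"}))   # -> R6
```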

Slide 39
Slide 40

Then, a third thing we will look at is how we can spread traffic over equivalent paths. Up to now, I was assuming one path per flow. But consider the case where there is only a single flow: that flow would be bottlenecked at 10 gig per second, while there are actually two paths, each of which has 10 Gbps. If only we could break the flow onto the two different paths, we could enable it to reach 20 Gbps. But default forwarding schemes don't allow you to do that.

Yet there are load-balancing schemes that exist and are deployed, especially in data centers, that let you load balance at a finer granularity than the flow level. That allows you to split traffic of the same flow over different paths. Now you might ask: what about reordering? Whenever you start to spread traffic onto multiple paths, you run the risk that the packets you send on the lower path and the packets you send on the upper path get reordered, because at some point one path is a little bit congested, so the traffic you send there takes much longer than the traffic you send over the other path. If you remember TCP: when there is reordering, it mistakenly interprets it as a loss, which means that throughput takes a huge hit, and you essentially ruin the benefit of load balancing in the first place. That's why you normally avoid splitting a flow over multiple paths: you run the risk of reordering.

There is a load-balancing technique, flowlet switching, that prevents reordering. We'll see how it does that, but the trick is to divide the flow at points that minimize the risk of reordering.
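
A minimal sketch of the idea, with an assumed 500-microsecond flowlet timeout; real implementations pick the timeout larger than the delay difference between paths:

```python
import random

FLOWLET_GAP = 500e-6   # assumed timeout, larger than the path-delay skew
PATHS = ["upper", "lower"]
state = {}             # flow id -> (current path, time of last packet)

def pick_path(flow, now):
    path, last = state.get(flow, (None, float("-inf")))
    if path is None or now - last > FLOWLET_GAP:
        # Gap is large enough: earlier packets have drained, so moving
        # to a (possibly) new path cannot reorder this flow.
        path = random.choice(PATHS)
    state[flow] = (path, now)
    return path

# Back-to-back packets stay on one path; a 10 ms pause starts a new flowlet.
for t in [0.0, 0.0001, 0.0002, 0.0100, 0.0101]:
    print(f"t={t}: {pick_path('f1', t)}")
```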

Slide 41
Slide 42

Let us now speak a little bit about scalability. We will talk about two things: scaling routing and scaling forwarding. Think about what scalable routing means. A good analogy here is the road network and the signposts that you see in the streets. What you see here on the left is an example of a scalable routing system: it maintains detailed information about nearby destinations. If I'm here, say on the Polyterrasse, I see this signpost. I have detailed information about how to reach Technopark, for instance, or the market; these are Zurich destinations. At the same time, I have very coarse-grained information about how to reach Bern or Basel, or, say, another country via the highway.

Slide 43
Slide 44

If you think about what BGP gives you in the context of the internet, it's essentially a bit of a mess: you have detailed signposts for every single destination in the world. It's a little bit like standing here in Zurich and finding signposts for every single street in Brussels, for instance.

So this obviously is not a scalable routing system. And indeed that creates a lot of issues when you look at BGP scaling.

Slide 45

This plot shows the evolution, since 2004, of the size of the BGP routing table. As you can see, as of yesterday, when I took the latest number, we are reaching 1.1 million prefixes. Each prefix can have many paths associated with it, so we very often end up in a network with dozens of millions of paths, of routes, pertaining to 1 million prefixes.

Slide 46

As you may imagine, there is a cost to maintaining all this state. Every now and then, you see articles asking what happens when we hit the next milestone. For instance, there was a very famous bug, well, not exactly a bug, but a limitation: some routers had a hard limit of 512k prefixes. So the day, a couple of years ago, when we hit that limit, all these routers essentially started crashing. What happens when a BGP router crashes? All its neighbors consider that a failure, and they start propagating withdrawals and updates. That's not great, but then the routers that crashed rebooted. What happens when a router reboots? It re-establishes its sessions, learns all the prefixes until... it hits 512k, and then crashes again. That particular day was very bad on the internet, because it was extremely "churny": we had massive amounts of prefixes going up and down, up and down, for a while, until operators put a hard limit, a filter, on their routers, preventing them from reaching the 512k mark, or just took them down altogether.

This happened a few years ago, but it is a moving target: these numbers keep growing, and there will always be some routers whose maximum will be reached. So this is, of course, a question that operators always have to keep in the back of their heads.

Slide 47
Slide 48

So in the lecture, we'll see how we can build scalable networks; we will actually start the lecture with that question. We'll consider state-of-the-art network sizes: we will look at how we can build networks with up to 10k routers, carrying up to 10 million routes for up to 1 million prefixes. We will look at two fundamental techniques. The first is hierarchical routing: as always, when you want something to scale, you should introduce hierarchy. The second is prefix aggregation and filtering.
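
As a tiny teaser for the aggregation part, Python's standard library can already collapse contiguous prefixes (the prefixes below are made up for illustration):

```python
import ipaddress

# Four contiguous /24s collapse into one /22: one signpost instead of four.
prefixes = [ipaddress.ip_network(p) for p in
            ["10.0.0.0/24", "10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]]
print(list(ipaddress.collapse_addresses(prefixes)))
# -> [IPv4Network('10.0.0.0/22')]
```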

Slide 49
Slide 50

Let us now speak about flexibility. Think about a networking device up to roughly 2010: if you were a network operator, you were buying a completely "closed" box, which came with closed software and closed hardware. The only thing you could do with it was configure it and connect it to its neighboring devices. That's it. You could not modify how the software works, and forget about modifying how the hardware works. Around the early 2010s, people got tired of this lack of flexibility in networks and network devices, and what happened then was network programmability.

This completely changed the name of the game. With network programmability, you can have a network controlled by open-source software, which you can download from GitHub, for instance, using standardized interfaces to program forwarding state in reprogrammable hardware devices. Now everything is open, and you have a lot of flexibility to adapt your network's behavior. You're no longer bound by what router vendors sell.

Slide 51
Slide 52
Slide 53

In our lecture, we'll focus more on this part, the reprogrammable hardware. We will look at a specific programming language, P4, a domain-specific language that allows you to specify how a device, like a network switch, should forward traffic through it. Reprogrammable hardware like this really exists; it is not just an academic pipe dream. For instance, this is a subset of our lab, in which we have eight of these reprogrammable switches, each of which can forward 3.2 terabits per second in the backplane. Their forwarding behavior is defined by P4 code that you can change, recompile, and reload into the device.

While we won't be able to give all of you access to a hardware switch, you will still be able to play with the technology in virtual networks. You will actually write P4 code in the lecture.

Slide 54
Slide 55

Let us speak about reliability. This is probably not a surprise to most of you, but managing networks is very hard. Hard to the point that it's actually quite common to kill an entire network infrastructure by just making a few mistakes.

Slide 56
Slide 57

There are very famous examples here. This one happened in 2021, at Facebook. They essentially misconfigured the network and, in doing so, took down their entire infrastructure. That means Facebook.com down, WhatsApp down, Instagram down, and a couple of others. It was actually so bad that they could not even enter their buildings anymore, because the access control of the doors was connected to the network, and the network was dead. So they had to find the physical keys to access the routers and reboot the entire thing. It was very, very bad. But still, you could say that perhaps the world's productivity increased on that specific day, so it's not a huge problem.

Slide 58

Microsoft had similar issues, where part of their platform also went down. As you may know, Microsoft runs email for a lot of businesses. No email that day. Good or bad for productivity?

Slide 59

Now a much more annoying one, and this one is very close to us. There was a big outage at Swisscom in 2020, during which the emergency numbers went down. Swisscom has a mandate from the Swiss confederation to run the emergency numbers. This traffic is actually voice-over-IP-based, so it goes over an IP network. Maintenance work on that network went (very) wrong, and because of that, all the emergency numbers were unreachable for a couple of hours. Now we are really talking about life-threatening events. We could laugh about the first two, but this is really serious: people might die just because of that. All this to say that when you are managing a network, you have a (big) responsibility to ensure that the network is always correct and always doing the right thing.

Slide 60

Guaranteeing network correctness is very hard to do, though, because of us humans. The complexity of the network is so big that it goes beyond the brain of even the brightest ETH students. What you see here is a bit of a sour illustration of that: this is the percentage of incorrect BGP announcements as a function of the day of the week. You can clearly see that on Saturday and Sunday the internet is actually working better. Why? Because humans are touching it less, and because of that, there are fewer problems in the internet.

Slide 61

Those of you who took a lecture with me in your bachelor's know that managing a network is hard, thanks to the mini-internet project. For those of you who did not do the project: we give groups of students their own little AS, their own little network, which they have to configure, and then we connect all these networks together into a mini internet. This allows us (the teaching team) to track internet-wide connectivity across all the groups. This is something that no one can do in the real internet, but we can, and we do so through a connectivity matrix. In this matrix, each cell indicates whether a group can reach another group. We start the project with some green cells, because these are the networks that we, the TA team, manage; these networks are pre-configured.

At the beginning of the project, we ask each group to first configure its own network. That (normally) creates a green diagonal in the matrix, with each group being able to reach itself. From there on, we connect groups together, and they have to start using BGP to enable what the internet does, which is to provide end-to-end connectivity between every pair of IPs (theoretically).

What you will see now is a movie of the evolution of the connectivity matrix over the course of the project. We start with a red matrix, which turns greener and greener, and then of course we will see many red cells appearing, each of which corresponds to an outage.

In the end, the students ended up with 98% connectivity, meaning the matrix was 98% green. That was the record, until this year.

Slide 62
Slide 63
Slide 64
Slide 65
Slide 66

Alright, now I would like to go through an example of a subtle misconfiguration and see whether you can pinpoint what is wrong with this network.

Slide 67

The example we'll consider is as simple as it gets: a BGP network with only three routers, which relies on shortest-path routing internally (OSPF) with these weights: 1, 1, and 10. On top of that, I'm assuming some external connectivity towards neighboring networks: one as a customer, one as a peer, and one as a provider. I am assuming basic BGP policies, the ones you have seen in the intro lecture.

These policies, if you remember, tell you that you should provide transit to your customers, because they are paying you. You want your customers to be able to reach any prefix announced by your provider or your peer, so this direction of traffic is totally allowed. Likewise, the opposite direction is also allowed: you want your provider and your peer to be able to reach your customers' prefixes, because you are receiving money for that.

What you never want is to provide transit between your provider and your peer, because then you are essentially losing money. You don't want to see traffic coming from a peer and then going to a provider; that is a red flag in an ISP network.

Slide 68
Slide 69
Slide 70
Slide 71
Slide 72

How do you configure such a thing? The default, simplest, textbook way is to use tags, or communities: labels on the routes. You configure route maps/filters on your routers, on R1, R2, and R3. The way it works is that you attach the customer tag to all the routes you learn from a customer, the peer tag to all the routes you learn from a peer, and the provider tag to all the routes you learn from a provider. That's it.

In the opposite direction, for the routes you announce to the exterior, you only allow the routes that make sense economically. To your provider, you only allow the routes with the customer tag. Same thing on R3: you only allow it to advertise to a peer the routes that come from a customer, else you deny. To your customers, you essentially don't filter, because you want them to be able to reach everything.
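
Here is a minimal sketch of this tag-and-filter policy as plain Python (a hypothetical data model, not actual router syntax):

```python
def on_import(route, neighbor_type):
    """Tag every route with the type of session it was learned on."""
    route["tag"] = neighbor_type          # "customer", "peer" or "provider"
    return route

def may_export(route, neighbor_type):
    """Only announce routes that make economic sense."""
    if neighbor_type == "customer":
        return True                       # customers may reach everything
    return route["tag"] == "customer"     # peers/providers: customer routes only

r = on_import({"prefix": "203.0.113.0/24"}, "peer")
print(may_export(r, "provider"))          # False: no peer-to-provider transit
print(may_export(r, "customer"))          # True
```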

Slide 73

The configuration you see here is perfectly correct, meaning that the peer and the provider, this guy over here and that one over there, will only learn the customer prefixes. No problem whatsoever.

Slide 74

But then, as an operator, you monitor your network, and when you start wiretapping this link, surprisingly, you observe this: traffic coming from your peer and going to your provider.

How could that be? What is going on there? What if your peer were actually malicious? Well, the peer might be tricking you by setting up a static route that redirects the traffic for Google and Netflix towards R3 as the next hop, even though it doesn't learn a route for these two destinations from you.

In this case, your peer is essentially violating the contract: it is not receiving a route for a destination, and yet it installs a static route for that destination pointing at you, just trying, and then seeing whether it works. In many cases, it does work.

Slide 75
Slide 76
Slide 77

So what could you do to prevent this situation from happening in the first place? One way is to make the configuration a bit more complex by configuring more route filters on R3, preventing it from learning, from its internal neighbors, routes that are not customer routes.

This means configuring filters on the iBGP sessions, not only on the eBGP sessions. On the internal sessions, I allow R3 to learn customer routes, but I do not allow it to learn provider or peer routes. When you configure that, R3 won't have an entry for Google, nor for Netflix, in its routing and forwarding tables. So when your peer points a static route at you, all this traffic will be dropped. With that, you kind of claim victory.

The issue is that this only works until it doesn't. And I posit that it will probably stop working during the night, in which case your phone rings: your customer now cannot reach any destination anymore. So of course you are awake now, in the middle of the night, because you have to fix the problem immediately. This is a major issue: all the customer traffic is dropped on the floor.

What could possibly have happened? It was working before, and now it's not working anymore. Maybe the link between R1 and R2 failed? Yes, exactly. This is a problem that was not there before, because your customer traffic was transiting via this path over there, the shortest path internally. Whenever your customer traffic transits via R3, it is dropped, because R3 does not carry any provider routes. So the only way your customer traffic can reach the provider is over that link over there. If that link fails, all your customer traffic going to the provider is dropped on the floor.

Slide 78
Slide 79
Slide 80
Slide 81
Slide 82
Slide 83

I posit that detecting this misconfiguration is very hard, because your network works fine until a particular, specific link failure triggers the bug. The bug is "hidden", if you want: you don't see it until it's activated, and then all the traffic is gone. And it could be extremely specific: it could be that multiple links have to fail in specific ways for the bug to appear. The example here is very basic in that sense.

Slide 84
Slide 85

So, how can we protect against that? Well, you should really not rely only on humans to say that a configuration is correct; that's essentially the lesson here. This problem is similar to a software bug that is only triggered when a particular code path is taken.

And for software, there is program verification, of course. The equivalent here is network verification, and we will also see this in the lecture. We will see how we can formally prove that a network is correct: how we can specify a property we want to guarantee, for instance connectivity or reachability, and how we can guarantee this property under all possible link failures. The output of the verification will either be "fine", meaning the configuration is correct in all these environments, or "no, it isn't, and here is a counterexample".
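
To give you the flavor, here is a toy verifier for the example above: it enumerates single-link failures and checks reachability. The model is heavily simplified and hypothetical (nodes C and PROV stand for the customer and provider attachment points, and R3's missing provider routes are modeled by giving it no usable outgoing edge for this traffic):

```python
from itertools import combinations

# Directed edges usable by customer traffic heading to the provider.
# R3 carries no provider routes, so nothing leaves R3: the latent bug.
edges = {("C", "R1"), ("R1", "R2"), ("R1", "R3"), ("R2", "PROV")}
internal = {("R1", "R2"), ("R1", "R3")}   # links that may fail

def reachable(src, dst, links):
    seen, stack = {src}, [src]
    while stack:
        u = stack.pop()
        if u == dst:
            return True
        for a, b in links:
            if a == u and b not in seen:
                seen.add(b)
                stack.append(b)
    return False

for failed in combinations(internal, 1):
    ok = reachable("C", "PROV", edges - set(failed))
    print(failed, "OK" if ok else "counterexample: customer traffic dropped")
```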

Slide 86
Slide 87

Another facet of reliability pertains to the ability of the network to react upon failures. When a link does fail and there is no misconfiguration, you still need the network to reconverge and send the traffic around the failed link.

Slide 88

It turns out that reconverging can take a lot of time. As an illustration, we measured the convergence time of the previous generation of Cisco routers that ETH was using. We ran a little experiment in which we injected more and more routes into the router, up to half a million, and then measured the time it takes for the router to move all the traffic, for up to half a million destinations, from one link to another following a failure. These were the numbers we observed. In the worst case, we saw convergence times above two minutes. That means you are losing traffic, not for all the destinations but for some of them, for north of 2.5 minutes. That is a major downtime, if you think of it, and definitely downtime that you would notice as a user of the network. Anything north of a few seconds, you would notice, and probably even start to complain.

Slide 89
Slide 90
Slide 91
Slide 92
Slide 93
Slide 94

So what takes so long? Well, as a router adjacent to the failure, you first have to detect the failure; then you have to report it (inform your neighbors about the failure); then you have to generate new routing messages and recompute the routing paths, redoing the shortest-path computation and the BGP computation; then you have to download all this new forwarding state into the hardware; and finally you have to activate the entries you have downloaded.

As you can see on the slide, there is one clear bottleneck: the pipe between the control plane and the data plane (between the software and the hardware). If you have to reprogram 1 million forwarding entries in hardware, that takes time: on the order of a few hundred microseconds per entry; multiply this by 1.1 million entries, and you end up with hundreds of seconds.
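
As a back-of-the-envelope check (the per-entry cost below is an assumed order of magnitude, chosen to be consistent with the ~2.5 minutes measured above for half a million routes):

```python
per_entry = 300e-6          # assumed ~300 us per hardware FIB update
entries   = 1.1e6           # ~1.1 million prefixes, as in the plot above
print(per_entry * entries)  # -> 330.0 seconds of reprogramming
```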

If we want to speed things up, we need to take a holistic view, look at each component, and make each of them faster, with a particular focus on this one: communicating the next hops to the router's hardware.

Slide 95
Slide 96

In this lecture, we'll look at fast-reroute technologies that are used in practice. Using these technologies, we'll make networks able to converge within a second for any single link or node failure.

Slide 97

All right, let me be a bit quicker on the last few points. First, security. We'll look specifically at DDoS in the lecture. DDoS attacks, Distributed Denial of Service attacks, are a common threat to all networks out there, so we'll look at how we can defend ourselves against them.

Slide 98

As you can see, DDoS attacks are still all the rage. In 2020, four years ago, state-of-the-art attacks were reaching six million requests per second; now, in 2023, we are reaching two orders of magnitude more: 400 million requests per second. That's the kind of attack a network like Google's could be seeing. We are really reaching terabits per second of attack traffic now. Besides looking at attack vectors, we will also look at the defense mechanisms that exist: the generic ones, but also the more advanced ones that use network programmability.

Slide 99
Slide 100

Towards the very end of the lecture, we'll talk about sustainability. Here, we will talk about the fact that network infrastructures consume a lot of energy, and it would be nice if we could reduce that. (IT infrastructure in general is a huge consumer of energy worldwide.)

Slide 101

When we zoom in on networks in particular, they are particularly bad, because they tend to be underutilized (they are often over-provisioned), meaning the average load is actually quite low. Plus, they are power hungry: network devices consume a lot of power, because they have not been optimized for that. They are quite inefficient when it comes to the power they draw as a function of utilization. As you can see, it's pretty much a flat function: they very quickly draw almost the maximum power. That means that if you run a network at low utilization, you are in a very bad regime: you are carrying very little traffic, but your devices are consuming almost the maximum energy they would ever consume. It's a very bad cocktail, in a sense.
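
To see how bad this regime is, here is a tiny sketch with assumed numbers (a device idling at 90% of its maximum power, which mimics the near-flat curve on the slide):

```python
P_IDLE, P_MAX, CAPACITY = 90.0, 100.0, 100.0   # watts, watts, Gbps (assumed)

def power(util):
    # Near-flat curve: the device draws ~90% of max power even when idle.
    return P_IDLE + (P_MAX - P_IDLE) * util

for util in (0.1, 0.5, 1.0):
    print(f"util {util:4.0%}: {power(util):5.1f} W, "
          f"{power(util) / (CAPACITY * util):.2f} W/Gbps")
# At 10% utilization the device burns 9.10 W/Gbps, nine times worse
# than at full load (1.00 W/Gbps).
```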

In this part of the lecture, we will try to answer concrete questions, for instance: what is the carbon footprint of a user streaming one hour of Netflix? We will look at how we can measure such a thing. It's very hard, as you may imagine, and sometimes a bit hand-wavy. We'll also look at how we can improve things.

Slide 102
Slide 103
Slide 104
Slide 105

And then the final piece: the end host. It's not enough to have a great network; you also need great networking stacks in your phone, your laptop, your servers, so that they can actually leverage the great network you're building.

Slide 106

And here, once you have a very good network, TCP can quickly become the limiting factor. TCP is an old protocol from the '70s, and even though there have been a gazillion extensions to it, at its core it still suffers from fundamental problems. One of them is that it is still very slow to establish connections. Another is that it can suffer from what is known as "head-of-line blocking": if one packet is lost on the way to the receiver, all subsequent packets must be held in the receiver's TCP buffer until the lost packet is retransmitted and arrives. This can unnecessarily delay entire connections. Also, it's very hard to change anything in TCP. If you want to change the congestion control scheme you are using, unless you can mess with the kernel, you are essentially stuck. On your smartphone, for instance, unless you root it, you cannot change the congestion control you are using.
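
Head-of-line blocking is easy to see in a few lines. In this sketch, packet 2 is lost and its retransmission arrives last; because a single TCP-like byte stream must be delivered in order, packets 3 and 4 are stuck behind the gap even though they arrived early (QUIC avoids this by running independent streams, so a loss on one stream never blocks another):

```python
arrivals = [1, 3, 4, 2]      # packet 2 was lost; its retransmission is last

buffered, expected = set(), 1
for seq in arrivals:
    buffered.add(seq)
    released = []
    while expected in buffered:   # a byte stream must deliver in order
        released.append(expected)
        expected += 1
    print(f"packet {seq} arrives -> deliver {released}")
# packets 3 and 4 sit in the buffer until 2 finally shows up:
# [1], [], [], then [2, 3, 4] all at once
```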

Enter QUIC. QUIC, as the name indicates, stands for Quick UDP Internet Connections. It runs on top of UDP, not on top of TCP, and it's a protocol proposed by Google a few years ago, because Google was frustrated by TCP's shortcomings. QUIC has been designed for speed: faster connection establishment, and support for multiplexing to avoid head-of-line blocking. And it is extensible, because QUIC puts everything into user space.

This means that, for instance, the congestion control code is part of the application, and it can be updated whenever you update the application (e.g., Chrome).

There will be two lectures on this topic: one on QUIC itself as a transport protocol, and then we will also look at advanced congestion control.

Slide 107
Slide 108
Slide 109
Slide 110

All right, that's it for the overview. Clearly, we'll see a lot of things in this lecture!

Regarding course organization, the lecture is organized in blocks. There will be seven blocks, some of which we'll spend a few weeks on. For instance, we will start with advanced routing, scalability specifically, which will occupy us for three weeks, including this one. Then there will be two weeks on programmability; two weeks on network measurements, where we figure out properties of network traffic; two weeks on verification, actually more like 2.5 weeks; two weeks on transport; one week on security; and then two weeks on sustainability.

It's kind of like a tentative plan. We'll adapt as we go.

Slide 111
Slide 112

Thankfully, for you and for me, I won't be alone in this ordeal. There will be an entire team of TAs from our group who will help you with each of the blocks.

As I hope you have seen already, we are using a website as the course platform. The slides will systematically be uploaded there before each lecture, so check it regularly. You can access all the resources there; that's the main platform we use.

Slide 113
Slide 114

The final exam, you probably care about that as well. It's a written, closed-book exam, with essentially one question per block. So you already kind of know what's going to happen: there will be seven questions, one per block, and the length of each question will roughly follow the length of its block. Exams from previous years will be available on the website, including last year's. I must say that this is a changing lecture; we try to make it as up to date as we can every year. Last year was roughly similar, but there are still new things this year. So take previous years' exams with a grain of salt, because the material has been moving.

Slide 115

Another question I'm expecting, one I'm actually surprised we didn't receive during the break: recordings. So yes, there are recordings. I'm recording now; you see the recording over there. The lectures are recorded, but you won't have access to the recordings immediately. We release them later in the semester.

Slide 116
Slide 117

So, with this long intro: should you take this course? If you like computer networks in general, yes, absolutely; you will learn, I hope, a lot. That said, you shouldn't take the lecture if you hate programming, if it is a huge burden for you to touch any piece of code, because there will be some pieces of code in the labs. You don't need to be a programming guru, and you don't need to learn a specific language, but you must not be averse to programming. Also, if you can't work at all during the semester, and that goes back to what I was saying before, I wouldn't recommend taking the lecture: you should be able to work throughout the semester. Finally, as I said, the lecture is still new and still moving, so there is a little bit of exam history now, but clearly not a 10-year exam history. This is also something you should perhaps factor into your decision.

Slide 118
Slide 119
Slide 120
Slide 121
Slide 122