Romain Jacob
Welcome back, everyone, for today's last lecture of the AdvNet class.
So today, quick program. I'm going to make a very short recap of the main takeaway from last week's lecture, less than three minutes. Then we're going to finally talk a bit more about networking and sustainability aspect related to networking.
Well, see, there's a lot in the slides. I'm not entirely sure I will cover everything that is in the PDF that you found on the website. If not, the material I do not cover won't be part of the exam, but I still give you all the material if you want to have a look.
Before I start, let's do the recap first. So last week, what did we see?
We talked about three important points. The first is that when we talk about sustainability, you need to be clear on what type of reasoning or what type of discussion you're having. Otherwise, you will very likely misunderstand the person you're interacting with.
We talked about the difference between operational and embedded costs.
Attributional versus consequential reasoning. We will talk about this a little bit more today.
Top-down versus bottom-up type of analysis. So top-down, we take the total, and then we attribute that total to the sub-parts. Whereas bottom-up, we start from the ground, and we say, "Okay, so this piece consumes that much, this piece consumes that much, this piece consumes that much..." and we sum things together.
Then we talked about how emissions can be categorized or "scoped" between 1, 2, and 3. Scope 1 being direct emissions, so the fuel you burn yourself or the gas you emit yourself. Scope 2, the gas emitted to produce the electricity you buy. And Scope 3, essentially all the rest.
Then we talked about Netflix, and we saw that Netflix reported this number of 55 gCO2eq per hour of streaming. We talked about how this can be measured.
I highlighted the fact that - it was a surprise to me, and I suppose it was a surprise for you as well - the actual action of streaming, the digital part of it, is responsible for a very small part of Netflix's total carbon footprint.
I really tried to emphasize, and I hope this is clear, that this number is not fundamentally wrong, but it needs to be understood as an attributional number. It is a way of dividing up the total footprint of Netflix per hour of useful work Netflix does, which is you streaming the movies or the series you like.
Importantly, because it's an attributional number, it cannot tell you what would change if people streamed more or less; it does not support consequential reasoning.
All right. Then in the second part, we talked about trust. And I tried to emphasize two important takeaways.
The first is that you should be critical of everything you read, even everything I say. Be critical. Not everything is guaranteed to be correct; I know for a fact that, usually, not 100% of things are. So be critical.
In particular, I emphasized two things. The first is that it's great that some companies are making a serious effort to try to buy green energy to supply their own services with renewable energy. But you need to be aware that if the total amount of electricity being produced doesn't change, then in a grand scheme of things, that doesn't really help anyone.
It means: you get the green one, I get the brown one. At the end, the same amount of brown electricity is being used. So it's not really helping much.
And the second very important thing - we will talk about it a little bit more today, in particular during the exercise - be careful with ratios. Relative numbers are very useful to get a sense of how we are doing in terms of cost versus benefit.
But it should not hide the absolute trend. If the absolute trend increases, like in this case (right side), where you have data for the power draw of different Ethernet technologies running at different speeds: the speed increases faster than the power, but the power still increases.
If the power increases, we consume more electricity. If we consume more electricity, we emit more carbon. So improving the efficiency is a good thing, but the goal is to reduce the absolute number. Try to always keep that in mind. When you see just the relative number, ask yourself, "Okay, but what's the total?"
And finally, I mentioned that efficiency improvements must be taken with care because, in certain cases, improving the efficiency of something or the efficiency of consuming a resource can lead to an increase of consumption of that resource. That's a phenomenon known as the "Jevons paradox" or "rebound effects." That's a term you should know and you must know after taking any lecture on sustainability.
Let's move on to part two.
I presented this slide last week. We said that efficiency can be viewed as the product of three terms. I went over the last two terms very quickly; today we'll focus only on the first one.
So here we are talking about efficiency as how much energy you use per useful task, and we'll see different examples of how this can be improved in a networking context.
So first question of the day. Let's take a step back and look at an energy profile for performing a task. You have two options. Either you draw a big amount of power for a short amount of time or you draw a small amount of power for a longer period of time. We assume here that the two blue areas are the same.
Which option do you think is more efficient? The high power one? The low power one? Or are both the same? Come on. You know what to do. Hands up. Don't be shy. OK? Areas are the same. Keep your hand up.
What about now? What do you think is more energy efficient? Come on. Okay, so you all think that high power is more energy efficient. Keep that thought, we'll get back to this.
Indeed, on this slide, it seems that doing this is more efficient. Why? Because we only need to draw the idle power when we actually do the task.
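To make that concrete, here is a tiny back-of-the-envelope calculation. The numbers are made up, just to illustrate the reasoning:

```python
# Made-up numbers: energy for one task when the idle power is only
# drawn while the device is on.
P_IDLE = 50.0                  # W, drawn whenever the device is on

p_dyn_hi, t_hi = 100.0, 1.0    # option 1: high power, short time
p_dyn_lo, t_lo = 25.0, 4.0     # option 2: low power, long time
# Same "blue area": the dynamic energy is 100 J in both cases.

energy_hi = (P_IDLE + p_dyn_hi) * t_hi   # 150 J
energy_lo = (P_IDLE + p_dyn_lo) * t_lo   # 300 J
# The slow option pays the idle power four times longer. Racing through
# the task and turning off afterwards wins.
```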
This is fundamentally the basic idea of how to improve the energy efficiency of a device. I left my phone over there on the table; my phone and your phones do this all the time. Your phone gets through the day mostly because the screen is off, I don't know, 99% of the time. Because as soon as you don't need it, it turns off. If it did not, the idle power would kill you. You would drain your battery in a matter of hours, and that's game over.
So that's the basic idea. This is generally known as "sleeping" but you can think about this as turning components off. This is a very classic principle. We use this in laptops, in phones, in your digital watch, if you have one.
How does that translate to networks?
I showed this slide before last week when I introduced the problem of proportionality. The idea is that here, in this case, 100W is the idle power. And 200W would be the max power for a laptop at 100% CPU utilization, let's say. Those, by the way, are reasonable orders of magnitude for a laptop power draw today.
The issue in networking is that we don't have those curves. Vendors don't tell us what they look like. Typically, the only thing we can do - and this is what we've been doing, I'll talk more about this - is to directly measure routers like those in the lab to get a sense of what these curves look like.
And it looks kind of like this. For that particular router, max power at 100% load is about 200W as well. And the idle power is 158W.
So if you think back to the example I had before, this split of things is kind of in the same ballpark.
The fact that the power profile is extremely flat is called "inelasticity."
You may wonder how big of a problem that is.
There is a positive side to it. The good news is that if you look at this data from Telefonica and Cogent - two of the largest networks, so-called Tier-1 ISPs - they report the same thing.
They report that over time, the total amount of traffic they forward is increasing. But the energy somewhat remained stable. This is (in part) because we are inelastic. In other words, sending more traffic doesn't cost much more energy.
But it is important to understand that this increase in traffic is not consistent over time.
(I apologize for the quality of the plot.) Let me try to explain what it shows. This is data from Telefonica alone. On the x-axis you have all the different hours of the day. And the points show the difference between the peak and the valley for each hour. More specifically, it shows the difference between the maximum amount of traffic that the network sees and the minimum, every hour.
And so what it shows is that, of course, during nighttime, there is generally less traffic. So the difference gets smaller. But the important thing is that the curves get higher and higher. Year after year after year, the difference between min and max grows.
If we make an analogy with a network being a set of roads, what we are saying is that today, this is what we have. We have some times at night where the network is pretty much empty. And we have times where it's congested.
In the future, what we're heading towards is this situation. Same thing, but things get bigger. We have more roads. And at times, they are empty. And at times, they are full. Same scenario, but because the scale grows, the magnitude between empty and full gets bigger and bigger in absolute terms.
And this is the dark side of this inelasticity. Because when we have little traffic, we essentially consume the same amount as when the network is full. It is why some people estimated that the Internet core consumes more energy per bit than if you were using wireless (back-of-the-envelope sort of calculation).
That was 20 years ago. And the wireless 20 years ago was nowhere near as energy efficient as the one we have today.
Those two researchers were trying to shake people up a little bit, to make them realize how inefficient the Internet core effectively is. Why is that? How can it be that it is inefficient? I already gave some of the clues, but it's not enough to really understand everything.
The first thing is something you might know. When you deploy a switch or a router in a network, you deploy it and it stays on until it dies, pretty much. You may reboot it once in a while, but it's designed to stay always on. Whereas, hopefully, you turn off your laptop once in a while.
The second point is the one I already made. The network devices have an energy profile that is flat or inelastic.
So the consequence, again using that analogy with roads: a network is not really a set of roads, because a road does not consume energy when there is no car driving on it.
Whereas a network is more like a set of hyperloop tubes. You need to spend a lot of energy to create the vacuum inside the tube so that the train can go fast. But whether there is a train or not, you still need to spend the energy to maintain the vacuum. So hyperloop tubes are a better analogy for energy in networking.
The third factor is that network devices are fundamentally underutilized. This is for a reason. It's because internet service providers typically overprovision their network on purpose so that they can tolerate the peak traffic (remember from the earlier slides: the peak gets bigger and bigger) and to be resilient to faults. You've seen in the earlier parts of the lecture how to do fast reroute and so on in case of link failures. In order to do fast reroute, you need to have a route to reroute to.
But the consequence when you compound those things is that we operate the network most of the time in the least energy efficient region.
So you may wonder to what extent this is true. In particular, the third point. So let me introduce you to the Switch network. Switch is the internet service provider of ETH. You can see here the network with the different nodes. You can guess which cities they are from the two-letter labels. Anyway, here is what the network looks like.
At the scale of the Internet, it's not one of the smallest networks, but it is a reasonably small ISP.
If you look at the average link utilization in the Switch network, this is what it looks like. So we see daytime, nighttime, daytime, and here a weekend period. So we see the fluctuation of traffic following your activities as students. Makes sense.
Now what do you think is the average link load? The average link load is 2.1%.
The average link utilization in the Switch network is around 2% on average. Of course, you should not generalize this to any ISP, but I've talked about this problem with several ISPs of different size, different domains. What I can confidently say is that single digits are not rare.
Keep in mind that when we're talking about average utilization in a network, we are talking about very low numbers. And so effectively, that means that on average, we operate the network here in this region. So if P0 is very high, we're spending a lot of the total energy just to keep things on with very little added benefit.
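You can put a number on "least energy efficient region" with the ballpark figures from before: roughly 158W idle and about 200W at full load for a 100 Gbps router. A rough sketch, assuming a linear power profile between the two:

```python
def joules_per_bit(load_fraction, p_idle=158.0, p_max=200.0, cap_bps=100e9):
    # Rough linear power profile between idle and full load
    # (diverges as the load goes to zero, which is exactly the point).
    power = p_idle + (p_max - p_idle) * load_fraction
    return power / (cap_bps * load_fraction)

joules_per_bit(1.00)   # ~2e-9 J/bit at full load
joules_per_bit(0.02)   # ~79e-9 J/bit at 2% load: ~40x more energy per bit
```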
[Question] Is this 2.1% number attributional?
Not really, no. These numbers come from summing the load reported by each interface in the network. So it is a bottom-up approach. And it's not really attributional either: attributional would be more like knowing everything that enters your network and dividing it up by the number of links. That's not what we do. What we do here is look at each individual interface and sum the packet counters and byte counters. You could see this as attributional, but I said "not really" because there is no real consequential alternative here. We are talking about the total utilization, not about the effect of "something" on the utilization, which could be analyzed in either an attributional or consequential sense. [/Question]
[Question] Can we conclude that, if we were to divide the number of links by 10, then the utilization would go up roughly by 10? So we would be at roughly 20%?
So the answer is a bit more complicated than that, because the links do not all have the same capacity, and because turning a link off may change the length of the paths taken by the traffic. But essentially, yes. We've actually done this, and we're going to talk about it today. [/Question]
So we are in this situation. This is a fact. This is a reality that you should know if you care about energy in networks. We have things that look like the red curve, and what we would want for energy efficiency is to get closer to proportional.
So if we go back to the example we had before, the idle power dominates, yes. And you've all said, well, then, high power is more efficient.
Actually, what we would want is to make the idle smaller.
But the idle power, you don't pay it only when you process the task. You pay it all the time. This is the point I made before with the difference between a road and a hyperloop tube. You have to pay the price for idle all the time, not just when you run the task.
And actually, in practice, the idle power dominates.
So what we need to do is make things look somewhat like this. There is some idle cost that you will have to pay no matter what if you're on. But you need to try to push the power as low as possible when you're not doing anything. And then when you need to do something, you ramp up, do your task as fast as possible, and go back down. This is generally the most efficient strategy for energy efficiency. It's not the only one, but it's the one we're going to talk about today.
So if we go back to this picture, there are two ways you can improve your efficiency that you could think of. The first one is to go further to the right. If you go further to the right, then most of your idle cost will be used for useful work. So it would be going from a profile like this to a profile like that. So you will have better efficiency.
Or you could try to take this red curve and try to put it lower down. You can do this in two ways. The first one is the sleeping; this is what I showed just before. The second approach is called "rate adaptation." I will explain what this is in a second. Today we're going to focus on approaches to take low-utilization power down.
What can we turn off in a router? I said in your phone, the screen, the radios, all those things are being turned off as soon as you don't need them.
In a router, you could turn off ports, line cards - which are essentially a set of ports that are bundled together - or maybe the entire machine. If the load is very, very low, like in the switch network, maybe you can get rid of an entire device. Why not?
But there are also things inside your router that you may not need. Maybe you don't need all your memory, because you don't need to store BGP entries for all the Internet prefixes in every single machine. So maybe you could save power there.
Power supplies are another idea; we're going to talk about that more later.
Note that when I say "turning things off," it doesn't have to be an either/or, really turning things completely off. For example, if you have a port like the one we have on this device, they are capable of forwarding 100 Gbps of traffic, but you can also configure them to forward only 10 Gbps. And if you do this, you can save some power. That is what we call rate adaptation, which I was mentioning before.
So I've introduced to you what proportionality is. Now we're going to look at how we can implement proportionality in networks. We're going to talk a little bit more about link sleeping and rate adaptation, things that have been proposed and how efficient they are.
Then we'll see that if we really want to assess them, we need to talk about how we can model the power draw of network devices. How can we get power curves like I showed before?
Then, if time allows, I will go a bit deeper into power supply units and how interesting they can be for power savings.
So let's dive into link sleeping.
Today I'm going to talk mainly about turning ports off and a bit about rate adaptation.
It's an old story, as I've explained before: turning things off is a very old idea. It's sort of the default strategy for energy efficiency. IoT devices have been doing this for decades.
15 years ago, some academics proposed that we could reduce the network energy consumption by either sleeping or rate adaptation. They made a theoretical study of "what would happen if?" And what they found is that for a certain parameter gamma-which I'll introduce in a second-if you have an average utilization in a network which is low-which, as we saw, is the case in the Switch network-then you could hope to save 50+% of the total energy consumed by the network. It's a lot! More than half of the energy could be saved...
This part of the lecture is essentially the story of us trying to make this happen.
So how does this work? The basic idea, for link sleeping at least, is to do what the paper calls "buffer and burst." If we want to sleep, we need to bundle packets together. Rather than letting messages go in and out of the network as they please, you buffer them at the entrance of the network. Then you wait for a certain amount of time, you turn everything on, you send everything in bursts, and then you go back to sleep as soon as you're done.
It's not quite as easy as it sounds. There are some tricks here and there so that you don't add buffering delay at every step of the way, but that's the basic idea. They explain how you can do this in the paper.
First, we need a model to understand what we're talking about. The energy model is as follows. The total energy for a link is the time you spend active multiplied by the power when active, plus the time you spend idle multiplied by the power when idle. The slide illustrates those parameters with our previous visuals. That's the default.
Now, if we sleep (the bottom right image), the gamma that we had before is the ratio between the power when we sleep (p_s) and the power when we are idling as normal (p_i). That is the definition of gamma. But sleeping doesn't change anything when we are active.
Rate adaptation is different. The idea is that you take both p_a and p_i down, but the time you spend active gets longer. For example, in the case of a CPU, you can reduce the clock frequency: it will take longer to complete the task, but you will spend less energy per cycle. This is the analogy for networking.
In this model, the switching between modes (be it rate adaptation or sleeping) takes some time. This time is the wake up delay delta from the previous slide.
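Written out (my notation, following the quantities just introduced):

```latex
% Default (always on): active time t_a at power p_a, idle time t_i at p_i.
E_{\text{default}} = t_a \, p_a + t_i \, p_i
% Sleeping replaces the idle power by the sleep power p_s:
E_{\text{sleep}} = t_a \, p_a + t_i \, \gamma \, p_i,
    \quad \gamma = p_s / p_i
% Rate adaptation lowers both powers (p_a' < p_a, p_i' < p_i) but
% stretches the active time (t_a' > t_a); every mode switch costs \delta.
E_{\text{rate}} = t_a' \, p_a' + t_i' \, p_i'
```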
So how does that play out? Well, it plays out somewhat as you would expect. Here, we see the average utilization on the x-axis. And we have different curves for how fast we can wake up, assuming we buffer traffic for a maximum of 10ms. Naturally, the faster we wake up, the more we can sleep. And if the wake-up time grows to the same order of magnitude as how long we allow ourselves to buffer, then we can no longer sleep, because we need to start waking up as soon as we start buffering: the buffering time must be longer than the wake-up delay delta. That makes sense.
Here we have a different graph, with the transition time delta on the x-axis, and the different curves are different buffering times. And again, as you would expect, the more you allow yourself to buffer, the more you can sleep. It just makes perfect sense.
On the other hand, obviously, the more you buffer, the more you're going to delay your traffic. So if you say "easy, I buffer for an hour and save a lot," it means that if a packet is unlucky and just misses a burst, it will have to wait an hour for the next burst.
So this is a graph I showed before. Remember, gamma is the ratio between the power when we sleep and the power when we're idling.
We asked ourselves, "Can we make that happen? So how much would we really save if we implement this today? And how fast could we actually wake up? What is the delta that we can effectively support?"
So let's see first how much we can save.
I showed the graph on the left before. This is just an abstract representation. On the right is what the power profile of a router effectively looks like. That's the same one I showed before. We are a bit below 200 watts at maximum. And idle, we have 150-something watts. The purple curve is when we implement sleeping. So what are we saying? We're saying: I have one port on, I can forward 100 Gbps of traffic. So as long as I don't have more than 100 Gbps of traffic, I am good with one port. As soon as I need more than 100 Gbps, I turn on a second port. Then a third port, etc. Can you see those steps in the purple curve? Those steps are every time I turn up one more port. This graph tells us that when we implement sleeping, we do better. We can take the idle power a bit lower by sleeping. We're still far from proportional, but we win a little something.
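The staircase logic itself is trivial. As a toy sketch, with made-up power numbers and assuming 32 identical 100G ports between the two routers:

```python
import math

P_CHASSIS = 150.0   # W, router with all ports asleep (made up)
P_PORT    = 1.5     # W, per awake port, transceiver included (made up)
N_PORTS   = 32

def power_with_port_sleeping(load_gbps):
    # Keep only as many 100G ports awake as the load requires,
    # with at least one awake to stay connected.
    awake = min(N_PORTS, max(1, math.ceil(load_gbps / 100)))
    return P_CHASSIS + awake * P_PORT   # one step up per extra port
```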
But wait a minute. Doing this implicitly assumes that we have one router with 32 interfaces facing another router with 32 interfaces, all connected together. And that's all there is, because otherwise you cannot just turn any link on and off. If you have one router that connects to four different machines, it doesn't matter what the total load is. You still need the four links if you want to connect to the four neighbors.
Besides, it also assumes that we would normally keep all ports up even if we don't use them. We don't. That's not quite realistic.
To make things more realistic, we started looking at the savings depending on how many parallel connections there are between two routers.
If we have only one link between two routers, well, we cannot sleep, because if we sleep, we lose the connection between the two. We can use rate adaptation, however. In that particular router, three rates are available: 10, 25, and 100 Gbps. We can see (left plot) that as soon as we have some traffic, we set the link at 10 Gbps: that gives the first green segment. Then we need to switch to the 25 Gbps mode, and then to the 100 Gbps mode. That lets us get a little closer to the power-proportional line.
When we have two links (middle plot), we can start combining the approaches. We can sleep, we can do rate adaptation, and we can combine the two.
And as we go on, the more links we have in parallel, the closer and closer we can get to proportionality (right plot).
That's nice, but don't get fooled by the plots. Look at the axes. What we save is very small. The blue dashed line is the idle power when you turn the machine on, and when you consider only a couple of links, the power increase above it is quite small.
So yes, the power profile gets more proportional. That's nice, but the absolute savings are fairly small.
So how much does it save in practice? The paper assumed a gamma of 0.3. That means: "when you're sleeping, you consume 30% of what you would consume normally when everything is on." Compare this with the numbers from the previous plot: we're very, very, very far from this.
So gamma values of 0.3 or smaller are clearly not realistic.
The second important factor is that we need to be able to wake up faster than the buffering delay, which is 10 ms. So we started bringing links up and down and measuring how long it takes for them to come up.
You need to understand that turning things off is easy. You can say, "Oh, I don't need you right now. Just go to sleep. Turn off." It takes some time, but how long it takes doesn't really matter in this case.
What matters is how long it takes to turn back on. Because when you need the link, you need it right away.
If you had your phone in your hand, pressed the button, and it took three seconds to turn on the screen, I would bet it wouldn't be long until you bought a new phone. So the wake-up needs to be fast.
The point is to be on when you need it. And there are two ways you can do this.
Either you're reactive, like on your phone. You click, and it turns on so quickly that you don't really see the delay; you don't perceive it.
Or you need to be proactive. This is, for example, the way the electricity grid works. The electricity grid needs to maintain a balance between demand and production of electricity. This is done by predicting, hour by hour or quarter-hour by quarter-hour, how much the demand is going to be, so that the production is adjusted to match.
So now, that's the theory. It assumes a wake-up delay of one ms.
We measured this in practice. The delay is in seconds.
So the absolute number looks good. The first time I saw the plot, I was like "It's not so bad!" and then "Wait, what?" The numbers are in seconds.
The different colors are different types of transceivers. On the left are copper cables (the big one that you saw on the slides last week). Then we have RJ45. And on the right are optical transceivers. The transceiver types don't really matter.
The point is, the order of magnitude is completely off: the wake-up delay is 1,000 times slower than what we would need if we wanted to sleep at the traffic time scale.
So we can't sleep at the traffic time scale as predicted in this paper.
So bottom line: we cannot do reactive, and it's really, really hard to do proactive in networking. That would be the topic of an entire lecture of its own; let's not open that box. Take my word for it: people have tried, and it's really hard.
But remember this graph. Those peaks and valleys: They are pretty strong, they are pretty clear, and they are pretty stable.
We are not the only ones observing this. You can see this in almost every network. So we thought: "Okay, maybe we cannot do it at the traffic time scale, but we could do it at the daily time scale. We could do it at night or over the weekend." At night, we don't need as many links as during peak time.
It may take a couple of seconds to turn on and off, but if you do it only two or three times a day, that sounds reasonable. So let's try it out and see what happens.
How would you do this?
Here's what you need to do. There are four different steps. You look at your network, and you look at the links that are highly loaded or lowly loaded. You can define arbitrary thresholds for this.
Among the links that are lowly loaded, you select a subset of links that you could spare; you do not need those to forward the traffic you have right now.
Then you would effectively turn them off by issuing a "port shut down" command.
And then you need a mechanism such that, if something happens (e.g., a new version of whatever your favorite game is gets released, and everybody wants to download it on the ETH network) and the links get overloaded, you can wake links up quickly.
Nothing is hard in this process except some subtlety in step two. How do you select the links that you can spare? We proposed an algorithm to do this that we call Hypnos.
This algorithm is extremely simple. Here's what Hypnos does. We select those links using four heuristics; I'll sketch them in code right after.
We take all the links, and we start by ranking them by utilization.
Then we start turning things off, and we look at how much traffic each link carries. We sum those amounts of traffic and put a cap on the total: this is because we don't want a lot of traffic being rerouted when we start turning links on and off.
Then the idea is that, for each router, you want to guarantee that you're not going to overload any of your interfaces. The idea is as follows: picture a router with four interfaces. If you say, "I want to turn north down," then the traffic that used to go north will have to go out through the other interfaces. Thus, you need to make sure that the other interfaces have enough capacity left to absorb the traffic that used to go out through north. Not crazy, right? This is what checking for local bottlenecks means.
And of course, the fourth one is we don't want to disconnect our network.
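Here is roughly what those four heuristics look like in code. This is my own minimal sketch, not the actual Hypnos implementation; the data structures (utilization, traffic, and capacity dicts keyed by sorted edge tuples) and the cap are illustrative:

```python
import networkx as nx

def key(e):                       # normalize edge keys for undirected links
    return tuple(sorted(e))

def select_links_to_sleep(g, util, traffic, capacity, max_rerouted):
    asleep, rerouted = set(), 0.0
    # 1) Rank links by utilization, least loaded first.
    for e in sorted((key(x) for x in g.edges), key=lambda e: util[e]):
        # 2) Cap the total amount of traffic we are willing to reroute.
        if rerouted + traffic[e] > max_rerouted:
            continue
        # 3) Local bottleneck check: at both endpoints, the remaining awake
        #    interfaces need enough spare capacity to absorb e's traffic.
        def spare(node):
            return sum(capacity[key(x)] - traffic[key(x)]
                       for x in g.edges(node)
                       if key(x) != e and key(x) not in asleep)
        if any(spare(n) < traffic[e] for n in e):
            continue
        # 4) Never disconnect the network.
        h = g.copy()
        h.remove_edges_from(asleep | {e})
        if nx.is_connected(h):
            asleep.add(e)
            rerouted += traffic[e]
    return asleep
```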
What happens if we do this? It works really well, as we'll see in a second. Actually, it works really well because the average link utilization of all the links in the Switch network is very low.
This plot shows the CDF of the link utilization. You can see that more than 50% of the links have an average utilization in the very low numbers that I showed before. That makes sense.
And here is what happens when we implement Hypnos. This is simulation. What you can see on this graph is the number of links being turned off as a function of time. The y-axis values are percentages. So we turn off more than 36%. Those flat sections here are the nighttimes. And then you can see that as soon as traffic picks up, Hypnos realizes: "Hey, there is more traffic now. I need to back off a little bit and turn some links back on."
We can also see the fluctuation between weekdays and weekends, where the peak is not as strong.
Finally, it becomes flat here because we hit the fourth heuristic: at this point, if we turn off one more link, then we start disconnecting the network.
So the point here is that we can do this. And it seems that we can do this without causing any congestion. At least, this is what we saw in simulation.
What other problems could happen? What would happen in a network if you start turning off links?
[Answers from the class]
If we have route reflectors and we disconnect a link that the route reflector was using, and the other link fails, then we may end up with a situation where there is a lot of rerouting going on in the network.
Yes. Correct.
More fundamentally, what other performance metric could be affected by turning links off?
Latency.
Indeed! So far I've been talking about congestion. What about latency? More generally, what about timing?
[/Answers]
Well, what about it?
Here's a graph that is a bit convoluted. So let me try to explain what happens.
This shows data from an experiment in hardware where each line shows one TCP flow. The horizontal length of the bars shows the start and stop time of each of those flows. The shaded areas show points in time where we activate Hypnos to turn some links off.
Finally, the color of the bars shows the difference in flow completion time between the cases where link sleeping is active or not. So we ran the exact same scenario with sleeping activated and deactivated and compared the two. Green is when the flow completion time doesn't change. Red is when it gets longer with sleeping on. And blue - there are some blues up there, and some left and right - is when the flow actually gets faster when sleeping is enabled.
This experiment shows that you can see a flow completion time increase when you enable sleeping, but we observe this only for a handful of flows.
Why is that? Because all the other flows have time to catch up after congestion is resolved.
You turn links back up, and after a couple of seconds, TCP does its thing and the full link capacity is available again. The congestion window increases, and the flow catches up.
What this slide doesn't show is that we actually had to work quite hard to craft a scenario where you see any red at all. Why? Because if you don't have enough traffic, then even when you sleep, you don't create congestion at all. And if you have too much traffic, then you don't turn anything off, because even those simple heuristics tell you that you would create congestion if you did.
So, in practice, there is only a tiny range with just about the right amount of traffic - enough, but not too much - where putting links to sleep makes some mistakes and can create some congestion that induces some delay in your network.
So congestion: not that big of a deal.
Flow completion time: not that big of a deal.
Latency (which is what you mentioned at first) can increase, because the paths that your traffic goes through will change when you start turning links off. Note, however, that it is not at all guaranteed to increase, because you don't know whether the packets in the non-sleeping state were actually using the fastest path. It remains to be seen whether latency is an actual problem or not.
Right. So I'm not going to say anything more about this. Here are all the references, if you're interested in looking at it. This was a short paper we presented earlier this year. There's all the code and all the yada yada that goes with it.
Now the key thing is how much does it save. We said we cannot sleep at the traffic timescale, but we can sleep maybe at the daily timescale. How would you quantify the savings?
Well, you could look at the power demand of the transceivers that you are turning off. You can look at the distribution of links that you are turning off in the network and you combine those things together.
Roughly speaking, if we turn off 1/3 of the links, we get about 1/3 of the energy savings. That makes sense.
... except, keep in mind, we're only talking about the transceiver power here!
So the question you need to ask yourself is: "How much does this power cost? How big a share do the transceivers represent with respect to the total network power?"
And the issue here is that we don't really know.
If you take a fairly small router, like the one I showed in the slides many times before: it draws about 200W max and has a small power footprint of its own.
If you plug 32 long-range optical transceivers into it, which draw about 10W apiece, then the transceiver power is going to account for more than half of the total.
But if instead you consider a core router, a larger beast that draws 600, 700, or 1000W on its own, then, in relative terms, the transceivers represent a smaller share.
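The back-of-the-envelope behind those two cases, with illustrative numbers:

```python
def transceiver_share(router_watts, n_transceivers, watts_per_trx=10.0):
    # Share of the total power that goes to the transceivers.
    trx = n_transceivers * watts_per_trx
    return trx / (router_watts + trx)

transceiver_share(200, 32)    # small router: ~0.62, transceivers dominate
transceiver_share(1000, 32)   # core router:  ~0.24, a much smaller share
```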
So the issue is that we don't really know how big the gray box is. And actually, it's also not so clear how big the orange box is. But I'll come to that a little bit later.
When we put transceivers to sleep, what we are saying is: "Okay, we turn off 1/3 of the transceivers, we get 1/3 of the savings of the orange box," however big the orange box actually is.
But we would also expect to save power on the router side. Because you turn off not only the transceivers, you also turn off the I/O lines that are in the box that would serve to forward the traffic in and out of that interface.
So. We saw two different ways you can implement some form of "turning things off."
But what I'm hoping you understand by now is that if we really want to know how much those techniques can save, we need to understand how big is the gray box and how big is the orange box.
Then, we will be able to run the numbers and make some estimates of how much those techniques (or others) could save.
This is what we're going to talk about in the next part of the lecture. I'll explain how we can derive power models and how this enables us to say things about how much we could hope to save.
[Break]
So in the first hour, I introduced the concept of power proportionality. I showed you that we have rather bad proportionality in wired networks. And I showed you some means, namely link sleeping and rate adaptation, that could help make power more proportional.
Now, the problem is the following: If we really want to know how much we can hope to save, we need a way of modeling-that is, a way of predicting-how much we can save. So, we need to talk about modeling power.
Before I get to that, let me explain why we need to be inventive about this.
The first reason is that the power data we have (before we model it) is very, very limited. The first source of information is data sheets. If you take the data sheet of any router today, like this one or another one, you will typically have - when you are lucky - two power values, usually called max and "typical" power.
When you are even luckier, the vendor grants you some description of what that means. For that particular router, it would be at 25 and at 55 degrees Celsius. Okay... It is somewhat under-specified. We don't know which transceivers. We don't know how much traffic. We don't know pretty much anything. You get those two numbers, and you have to be happy with that.
Those numbers are intended for designing the power supply of your rack or of your server room. That means you need to know how much power could be drawn, at maximum, by all the machines in your rack, to make sure that you don't try to draw more power than you can deliver.
So, we expect those numbers to be overestimates of the device's power demand, even when that device is under full load. If you remember the first half of the lecture, we know that most of them are never under full load.
So let's look at some numbers.
Here (right column) are numbers from the data sheets for some routers you're going to work on during the exercise session. Those are different router models that are used in the Switch network. (You don't need to care about what those router model names mean.)
And these (middle column) are some measurements that we did for the actual power drawn by the routers. You can see that indeed the data sheets, as suspected, overestimate the actual power.
So that sounds fine until we consider two more router models for which the data sheet underestimates-by a lot-the power that we effectively measured.
So we're already in a situation where we say "Okay, data sheets are overestimates, so we cannot really make any predictions based on those." We wanted to validate our hypothesis, we tried, and this is what we find: Data sheet values are not even guaranteed to be overestimates.
Note that, although it is not really our concern here, this is a pretty serious problem. It means that if you were to dimension your rack saying, "For ten of those units, I need less than 3kW," and then actually deploy them in a rack with a 3kW power supply, you would get a so-called "brown-out" failure: the hardware fails because it lacks power. Not great.
Anyway. So, data sheets: meh. They are not useful for our modeling needs.
The next idea is to say "Okay, well, those routers have modern power supplies. Those power supplies can measure how much power they deliver. So we can collect that data and we will know what is effectively going on in the router."
There are two issues with this.
First: The way those measurements are done is not standard. So it's really hard to compare what's going on in different routers. It's also not always exposed to the user. But this is a software issue that could-should-be fixed.
The second issue is more fundamental: We don't really know whether those numbers are correct. So we know we have power supply units. We know they measure something and report a value in watts (Or sometimes in dB, because why make your life simple?). But can we trust this?
The only way you can answer that question (Can we trust those numbers?), is to effectively compare the internal measurement of the PSU with external power measurements like the ones I mentioned before.
So we've been trying to do exactly this: compare internal versus external measurements; that is, power supply unit measurements versus external power meter measurements.
To make the external measurements easier, we ran a student project. Here's Jonathan, the bachelor student I worked with earlier this year on this project. He decided to call the system AutoPower.
In short: this is a power meter with a Raspberry Pi stacked on top, connected to the network, so that you can just deploy your unit, plug it in, and then start measuring and controlling the measurements remotely. It's a pretty cool project. You can have a look at the report or the repo on GitHub if you're interested in more details.
And so what does it show? We'll come back to that in a bit.
But even assuming the measurements are trustworthy, they're not really what we need.
Why? Because the measurements only tell you what you draw right now; they cannot tell you what you would draw if something changed.
What I could do is increase the luminosity of my screen and then see the resulting power effect in the measurements. But I have no way of predicting that effect before making the change.
If you think back to what we discussed last week, measurements are inherently attributional. You measure something (the total power), and then you try to see what could explain that total.
Whereas what we want is a way of answering "What happens if I do x?" So we need something that gives us a consequential power, or the ability of doing consequential reasoning. That is, a power model.
Modeling those things is kind of tricky for a number of reasons.
The first is that the vendors (Cisco, Juniper, and so on) obviously don't want to tell you what's in the box, because the box is kind of what they sell you.
Usually, we cannot open the box. (Well we can. We actually do it. Don't tell anyone.) But even if you do, you cannot easily measure the power draw of the individual components in it. It's not easy. You can do it, but there are some safety risks involved in that.
We know that the box internally measures its power, but if the OS doesn't expose those values to you, you can't really look at them, because the OS is closed source. And so, you basically have no idea what is effectively being measured.
So what it means is that we are kind of fundamentally limited to top-down approaches, where we measure the whole router-this is what we have access to-and then we need to find a way of attributing this total to the different parts. This is the top-down kind of reasoning.
The attribution works with a power-delta approach.
The idea is: you try to change something, and then you see what happens. This is what I explained before with my laptop example. So you measure some reference power. You change stuff. You look at whether the power changes. If yes, then you assume that this change in power is attributable to whatever you changed. And if not, then you try something else, until you've tried everything you can think of. That's pretty much it.
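Schematically, the loop looks like this. A sketch only: measure() and toggle() are hypothetical stand-ins for the power meter and for whatever configuration knob you are testing:

```python
def power_delta(measure, toggle, settle_s=60, n_reps=10):
    """Attribute a power change to a single configuration change."""
    deltas = []
    for _ in range(n_reps):            # repeat: power readings fluctuate
        reference = measure(settle_s)  # averaged power, reference state
        toggle(on=True)                # change ONE thing: a port, traffic, ...
        changed = measure(settle_s)
        toggle(on=False)               # restore the reference state
        deltas.append(changed - reference)
    return sum(deltas) / len(deltas)   # mean delta, attributed to the change
```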
So what would you try? What would you try to change in a router and see if that induces a power change? What do you think could affect the amount of power that is drawn by a router?
[Answers from the class]
Plugging in and out some cables. Yes. So plugging in and out a transceiver in an interface. Yes, this is one.
Deactivating LEDs or other user interfaces, like console port or so.
CPU fan speed. Yes, very good.
Traffic. Finally!
[/Answers]
We've talked about this for the entire first lecture. We said, "There is no traffic." What happens if there is? What happens if there is more traffic? What happens if there is less?
Remember, we talked about turning links off. When you turn links off, the traffic doesn't disappear. It just goes somewhere else. So you need to weigh: what is the impact of having this traffic transmitted somewhere else versus what I saved by turning the transceiver off?
So all your previous answers are things that you can try to turn off and see what happens. We tried some of them; not all of them, and we'll see together why.
Here is kind of like a depiction of the different steps that we follow in our power model. On the top right there is a depiction of a (beautiful) router with two interfaces, two ports. And this is one link with two transceivers.
The baseline is "nothing": you plug the router in. It's turned on, but nothing is active. There is no transceiver plugged in.
Then there's something we call the idle stage: you plug in the transceivers, and that's it. You don't change anything. You don't activate any port. You don't do anything.
Turns out, we already see a power increase there, sometimes.
Now, you activate this line (in orange). You bring the upper port up, but you keep the lower one down. That means there is no active connection between the two ports through the link, because the other side is open. This is what we call the "static" case but, whatever, the names don't really matter.
Then, if you bring up the other side as well, the link goes up. There is a power cost for the transceiver being ready to effectively forward traffic.
And then, at last, we have the traffic.
So here, you need a second machine that sends traffic into one port, and the router forwards the traffic out through the second port. You measure the power impact of traffic by doing that.
Those are the different things we tried, and so the different power deltas that we can measure: idle versus base, static versus idle, and so on and so forth.
It's actually a lot of experiments that one has to do. Because it's not just those five things, obviously.
For the last four experiment types, you want to use different transceivers - optical versus electrical, or different ranges of optical - and see the impact of those parameters. I've mentioned rate adaptation. What happens if you set the interfaces at 100 Gbps? What happens at 10 Gbps? For the tests with traffic, we want to know what happens with high bandwidth, low bandwidth, big packets, small packets; what is the impact of all this? All of this actually matters for power.
And of course, you need to repeat all those experiments multiple times, because there are a lot of fluctuations in the power values. If you want reliable answers, you need replicability. So, all this to say: it can be done. We've done it. We've automated it, but it still takes a while.
Ultimately, if you can combine all those tests, you can post-process those results-I won't get into those details, we can talk about it more if you are interested-and derive a power model that looks like this.
Note: We assume here that the transceiver power is constant (which matches well what we observe experimentally). So, we allocate a constant power value for the transceiver itself.
For the router, we have a sum of four terms: essentially the base power (just what is drawn when the chassis is empty), plus some static power for each port that you activate, plus the traffic costs. There are two costs for traffic: an energy per bit and an energy per packet. Can you think about why we need both? We need both. Why?
The answer is, the router has two different tasks to perform for each packet. When the packet arrives, we need to look at the header to decide where this packet should go. That is a unit of work that needs to be done. That is a fixed cost per packet-at least we assume so. Then, once we've made the forwarding decision, the bits need to be copied over from the input buffer to the output buffer. And there's, again, a fixed cost per bit.
Note that those two costs are independent. For 100 Gbps of traffic, if I have big packets, then I pay the energy-per-packet price less often - because the packets are big, I have fewer header processing tasks to do. If I have smaller packets, then I need to do more processing.
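Putting the pieces together, the model has this shape. A sketch with placeholder coefficients; the real values come out of the measurement campaign:

```python
def router_power(n_active_ports, bits_per_s, packets_per_s,
                 p_trx_total=0.0,  # W: constant cost of plugged transceivers
                 p_base=150.0,     # W: chassis on, nothing active
                 p_port=2.0,       # W: static cost per activated port
                 e_bit=5e-12,      # J per bit copied between buffers
                 e_pkt=50e-9):     # J per packet of header processing
    return (p_base + p_trx_total
            + n_active_ports * p_port
            + e_bit * bits_per_s       # J/bit * bit/s = W
            + e_pkt * packets_per_s)   # J/pkt * pkt/s = W

# Why both traffic terms: at 100 Gbps, 1500-byte packets mean ~8.3 Mpps,
# while 64-byte packets mean ~195 Mpps, so small packets pay the
# per-packet cost ~23x more often at the same bitrate.
```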
Schematically, here is how this works.
We have the power meter that we saw before, the device under test (that is, the router you are interested in modeling) and we have an orchestrator that, at the same time, configures the router, controls the starting and stopping of the power measurements, and eventually sends traffic to the router (if it's a test with traffic).
At the output of the modeling pipeline, you get a power model that looks like this (right side). Here on the slide is an actual power model we derived for one router model, where you can see values for different port types and different transceivers running at different speeds. You don't have to look at the numbers themselves right now.
In practice, it kind of looks like this. We have a small workstation that orchestrates all this. In this picture, we were running a test with traffic, so we connect interfaces from the router to the workstation and loop the traffic through and back.
So we said we need a power model. I've told you, briefly, how we can derive one. Now we have it. So finally, let's answer the two questions we've asked earlier.
Can we trust the PSU measurement?
Are we able to predict the power draw with the model?
Here is the output of some measurements we've done. What do we have on this graph? In red are the internal power measurements; that is, the ones from the power supply units themselves. Blue is what we measure externally with our AutoPower units. And green is the prediction made by the power model. We look at the power on the y-axis over time.
So what can we see here?
Note: We assume that blue is correct. We can debate whether this assumption holds or not another time if you want, but here the assumption we make is that blue is the ground truth.
The first observation we can make is that, compared to blue, red and green look somewhat right, but with some pseudo-constant offset.
The second observation is that the model in green seems to capture very well the daily fluctuation of traffic. We can see those spikes and valleys that are actually weekdays and weekends.
If we zoom in - here is the same data zoomed in on one week, where we manually corrected the offset - you can see that green and blue match really well over time. That means the model effectively captures the traffic costs, and (if you look back at the previous slide) it also captures well those steps, which are interfaces going up and down.
At least some of them, because there are some steps that appear on green that don't appear either on blue or red. This is a bit mysterious, for now.
OK. Now let's look at another router. Same plot. Again, we can make some observations.
The first one is, again, that the model in green seems to match reasonably well the dynamic patterns.
Although, if we zoom in, the match is not as good as before. The magnitude seems to be a bit off. OK, that's the first observation.
Second observation: we have, again, this offset between green and blue. But red is kind of like: eh? The red curve (remember: red is the internal power measurements) looks pseudo-constant, with some jumps that we're not really able to explain.
And then there is this.
It might look like a coincidence that we have this big drop in red just when blue starts. Turns out, it's not a coincidence. Note that those routers have two PSUs, which is why we can install a power meter without having to turn the device off.
Turns out, what happened is: we unplugged one PSU, plugged in the power meter, then unplugged the other PSU and plugged in the other power meter. And when we did this, one of the two power supplies, after we replugged it, started reporting 7W less than before.
Except that nothing changed on the router side. Nothing whatsoever. Like, all of a sudden, boom, -7W.
The takeaway here is that, out of the two routers we looked at, one has power supply measurements that seem reasonable, and the other has some that don't. So the only conclusion we can draw is that we cannot blindly trust those measurements, because there's no guarantee that they actually give us anything useful...
The power model, however, appears able to capture what it was designed for. We are able to catch at least some interfaces going up and down, and we capture the traffic power quite well. But clearly, we are missing some power costs, right? Because there are some steps that we don't get, and we saw in one of the examples that the magnitude is not quite right for the traffic.
And for the PSU measurements - to be honest, that was not entirely a surprise - some routers have good measurements, some routers don't. The problem is that, until we try it out, we cannot know which we can trust and which we can't.
So now we have this model, and we validated somewhat that it doesn't output garbage: the green power profiles from before seemed reasonable.
So we can use this model to quantify how much we can save with Hypnos.
And the answer is: almost nothing.
So this is where you should be thinking: "Wait, how can we turn off 1/3 of the links and save less than 1%?"
If you followed what I said before, this result would suggest that the orange box is really small. Remember, we were wondering how big the orange box is compared to the gray one? This suggests that the orange box is small.
Turns out, this is not true. The problem is not that the orange box is small.
The problem is this. It's not because you "turn off" that you actually "power off."
So think about what that means for a second. It means that we're sending the command "Please turn that link off," and the OS says, "Okay, link is down," but the link remains, in fact, powered on.
I'm laughing, but it's actually not funny.
In fact, we can see this in the power model.
If you look at the numbers in detail, P_TRX is the power cost for the transceiver, which is the sum of two costs: the cost when you plug it in (P_TRX_IN) and the cost when the link goes up (P_TRX_UP). PCC stands for "passive copper cable." Those are short-range electrical cables that don't draw much power. Here, we have roughly a 40/60 split. That is, you pay 40% of the power cost just by plugging that transceiver in. This is what the model says.
Below, we have an optical transceiver that draws 10x more power. You can see that the cost is pretty much 100% at plug-in. So that means: you take that transceiver, you plug it into a machine, not connected to anything, and it draws 5W. You can have the fiber dangling in the air; you plug the transceiver in: boom, 5W.
Not great.
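In model terms, with magnitudes consistent with what I just quoted (the exact watt values are illustrative, not vendor specs):

```python
def p_trx(plugged_in, link_up, p_in, p_up):
    # Transceiver power = plug-in cost + link-up cost.
    return (p_in if plugged_in else 0.0) + (p_up if link_up else 0.0)

# Passive copper cable: roughly a 40/60 split of a small total.
p_trx(True, False, p_in=0.4, p_up=0.6)   # dangling PCC: 0.4 W already
# Long-range optics: essentially the whole cost is paid at plug-in.
p_trx(True, False, p_in=5.0, p_up=0.0)   # dangling optics: 5 W already
```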
And this is exactly what happened here.
By looking at the event log of that router, we could see that there was one link that kept going up and down. The reason was a problem on the other side of the link, on the other router. So at some point - at this point in time, precisely - the link was taken down manually (that is, by issuing an "interface shutdown" command to the OS) in order to fix the issue on the other side.
We can actually see that the power goes down a bit on the red and blue curves, but not that much. But there is a big drop on the green curve. Why?
Because the model sees the port as down, it doesn't count any power for it; it doesn't know that the transceiver is still plugged in. In other words, the model assumes the transceiver disappeared, but effectively, it is still plugged in.
This is what explains the difference in magnitude between the drops on the different curves.
And so if we can expand those lessons learned a little bit: We are missing some power cost in the model.
One reason is that, so far, we've used packet counters as the signal to identify whether there is something plugged into a port or not. As soon as the port is active, the interface starts counting the packets and the bits that go in and out. And we thought: "Okay, as soon as we have data, the port is on." This is correct. The problem is that the reverse is not true. The absence of traffic data does not imply that there is nothing plugged into the interface.
"Absence of evidence is not evidence of absence," as the saying goes.
Turns out the presence of packet counters is not the right signal for identifying transceivers plugged into a router. If we rely on it, we effectively miss the power cost of all the transceivers that are plugged into inactive ports.
Another insight we gained: it seems we also miss other potentially important parameters. And they may not be what you are thinking of... We talked about fan power before; one of you mentioned it. You might have noticed it's not in the model. There's something else that is not in the model:
The version of the operating system runnign on the router.
Now you should think: "Why should this matter?"
This data happens to be from the exact same router we looked at before, though that is kind of a coincidence. On that day of March 2024, the operating system was upgraded. Boom! Plus 45W. That's a 10% increase in power!
Any guess what happened? Why did the operating system update increase the power so much?
[Answers from the class]
The CPU was more active than it used to be before.
The OS has a new default where ports would be set at higher speeds.
Ta-da! Here we are. Fans...
[/Answers]
Effectively, what happened - which you cannot see on this graph - is that, as soon as the update kicks in, we reload, and then the fans start going "wooooooom!", going crazy, while nothing changed. The need for cooling did not change, but now there is a different driver controlling the fans, and ta-da!
(This update was rolled back.)
This is what happened.
The point here is it's really hard to know what we miss. Again, remember last week we talked about top-down versus bottom-up approaches. Here we're trying to do bottom up. We're trying to see which parameters we can count. But we have data that come from top-down (power measurement of the entire device) and we're trying to match the two things together.
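One way to picture that matching exercise, as a sketch (hypothetical names, not our actual pipeline): compute the residual between the top-down measurement and the bottom-up model, and watch for structure in it.

```python
def residual_w(measured_total_w: float, modeled_terms_w: list[float]) -> float:
    """Top-down measurement minus the sum of the bottom-up terms.

    A residual that jumps at a discrete event (say, an OS upgrade
    that changes the fan driver) flags a parameter the model does
    not capture, even if the model was never designed to predict it.
    """
    return measured_total_w - sum(modeled_terms_w)
```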
It's pretty difficult. There will always be parameters that we miss. The question is: do we miss parameters that hurt our ability to predict what we need to predict? There is of course no way our model predicts the effect of this OS update, but it was not designed to do that: there is no information in the model about which version of the operating system you run. For the same reason, we did not include anything about fan speeds, because they are actually really, really hard to model.
Think about it: if you wanted to model the fan power with the delta approach I mentioned before, you would need to forcefully set the fans to a higher or lower speed and see what the power delta is. Except the router OS won't let you control the fans. So if you're really stubborn and want to do it anyway (which is what I wanted to do), the only thing you can do is block the air from coming into the fans. It makes for funny experiments, but don't do this at home...
More seriously, you can't really do this in a trustworthy and replicable way.
I gave one example where we saw this "off but not powered off" behavior ourselves. But it's not just us:
Vendors actually reported that, on some of their machines, there is a default in the OS whereby the lines that connect the ports to the chip are always on. Even if you take the interface down, they remain on, drawing power, even if you have nothing plugged in.
So that means you log into this router, you can see ports shut down, and internally everything is powered on...
This is a bug. This is a software thing; there is no reason the hardware has to behave this way. And, in this great blog post, they actually provide a simple workaround: an additional magical keyword, "unused." If, for a particular port, you configure "shutdown" AND "unused," then you finally power off the line.
The takeaway here is that the problem is not necessarily entirely in the hardware. It isn't.
We need to fix the software. We need better firmware; that would most likely help us save a lot of energy, because we would get better power proportionality by design.
As soon as we would disable the port, we would save power. Those are things we hope to investigate by looking at routers that support open operating systems.
[Question]
Can you power off interfaces in software?
Yes. We believe it's software-fixable, but I have no proof of this yet, because we haven't managed to fix it. There is no technical reason that we know of that would justify keeping those hardware components powered on. Transceivers do not have to be powered on as soon as you plug them in. Nothing justifies this, as far as we know for now.
So I'm going to talk about PSU sleeping now, which relates to what we're going to do in the exercise.
So here, what I'm talking about is the idea that maybe we could turn off some power supplies in the network.
Power supplies have efficiency ratings. One standard is known as "80 Plus."
That rating quantifies how efficient the unit is at converting power from 220V AC to 12V DC, as a function of the load on the power supply. The load is how much power the supply is delivering compared to the maximum it can deliver.
If you look at those curves, they get very bad - or at least worse - as the load gets closer to zero. That's a very well-known fact in power conversion.
Maybe the electrical engineering students in the room understand why; I don't. But that does not really matter here. What is important to know is the shape of the curve.
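To see why the shape matters, here is a toy model in Python. The curve and all the numbers are invented for illustration; only the shape - poor efficiency near zero load - reflects the real 80 Plus curves.

```python
def efficiency(load_fraction: float) -> float:
    """Toy PSU efficiency curve: bad near zero load, flattening out.
    Illustrative only; real 80 Plus curves are measured, not analytic."""
    return 0.92 * load_fraction / (load_fraction + 0.05)

def wall_power_w(delivered_w: float, psu_max_w: float) -> float:
    """Power drawn from the wall to deliver `delivered_w` to the device."""
    return delivered_w / efficiency(delivered_w / psu_max_w)

# At 10% load, the conversion overhead is heavy...
print(wall_power_w(50.0, 500.0))    # ~81.5 W from the wall for 50 W delivered
# ...at 50% load, it is much smaller, relatively speaking.
print(wall_power_w(250.0, 500.0))   # ~299 W from the wall for 250 W delivered
```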
Looking at this curve, you could have the same idea as before: if we could push the load to the right, we would operate at a higher efficiency point and potentially save energy.
If we look at data from the Switch network, it is not really surprising that the power loads (x-axis) are very low. Remember: we saw that the utilization of the links is low. If the link utilization is low, it's likely that the power you draw from your power supplies is also going to be far from its maximum.
On this graph, each dot is a snapshot of the efficiency of one PSU in the network.
What was more surprising was to see how inefficient some of the power supplies appear to be compared to how efficient others are.
The numbers are all over the place!
If you look at specific router models, you see different patterns.
Some routers appear to have power supplies that are generally efficient.
Some appear to have generally inefficient ones.
And some are all over the place. And we cannot really explain why things might be so different for different routers of the same model.
But hold off before you go and tell that to your friends at dinner tonight. It's very unclear how much we can trust this data, for two reasons.
First, it's only a single data point per power supply unit. We use this because it's the only data we could get.
Second, we already mentioned earlier that we don't know whether we can trust measurements from the power supplies.
So, be cautious with the conclusions you draw here.
But let us assume this data is correct. What it tells us is that if we took every single point here and raised it along the y-axis to the different 80 Plus levels, we could save between 2% and 7%. This is what the sample efficiency data we have allows us to project.
If we were using only one power supply unit - what I mentioned before - meaning: say those two power supplies belong to the same router, and you turn one of them off. One dot goes to zero; the other moves to the right, and so the efficiency goes up. If we did that everywhere, we would expect to save around 4%.
You could combine both ideas: you turn one PSU off, and the one left running is a more efficient unit. The benefits would roughly add up.
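With the same toy curve as before, here is what that consolidation looks like (illustrative numbers again; this sketch does not reproduce the ~4% figure, which comes from the real network data):

```python
def efficiency(load_fraction: float) -> float:
    # Same toy curve as in the earlier sketch; illustrative only.
    return 0.92 * load_fraction / (load_fraction + 0.05)

def wall_power_w(delivered_w: float, psu_max_w: float) -> float:
    return delivered_w / efficiency(delivered_w / psu_max_w)

# Two PSUs sharing a 100 W load, 500 W max each, so 10% load each:
two_psus = 2 * wall_power_w(50.0, 500.0)   # ~163 W drawn from the wall
# Turn one off; the survivor now runs at 20% load:
one_psu = wall_power_w(100.0, 500.0)       # ~136 W drawn from the wall
print(two_psus - one_psu)                  # ~27 W saved for the same work
```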
That's already it for this part. So, again, be careful with the conclusion here because it's not yet clear whether we can trust this data. We need more research and more data collection to understand this better, which is something we're actually currently doing.
But assuming the data is somewhat correct, it means that having more efficient PSUs, or using only one of them, could be a very effective means of saving energy in routers.
Compared to the <1% I mentioned before for link sleeping - using the same sort of model, the same reasoning - the numbers I just showed appear a lot more promising. We are investigating PSU sleeping in the Switch network and also in other, larger networks.
Let's recap.
Today we talked about link sleeping and how it can be done. We cannot really do it at traffic timescale. We can do it at daily timescale.
Then I showed that if we apply a decent power model to link sleeping, it seems we cannot save that much, actually. However, this is not really because sleeping is bad per se. It's in part because network devices today tend to have really poor firmware that does not actually "power off" when we turn things "off." And so turning things off doesn't buy you much - or not as much as it should.
And finally, I showed that - even though it is not really networking-related - optimizing power supplies in routers is an energy-saving vector worth investigating.
So the last part of the slide pack - which I won't have time to cover - discusses the following.
Rather than proportionality, what happens if we try to reduce the footprint instead?
Remember, I said last week - and again in the intro today - that proportionality is a means to an end, not the end itself.
We've covered quite a lot of ground. I hope this was enlightening to a certain extent.
If you want to go further, I've put together different types of resources, some websites that are interesting in sustainability, not only in the ICT sector, but also elsewhere.
If you're into podcasting, here are some very interesting podcasts I would recommend.
A number of tools that relate to ICT sustainability: Electricity Map (which I showed last week), things about efficiency on the web, power measurement in Linux or software in general.
If you want to check out more research, there are a couple of key venues to look at. I put some pointers there.
And if you are yourself into sustainability and you know about something you like that is not there yet, please let me know! I'm always interested in new resources.
With that, that's all from my side.
I hope you've enjoyed the lecture in general. I hope you enjoyed the sustainability block in particular.
Looking forward to seeing you at the exam.