Data-plane programmability

Prof. Laurent Vanbever, Georgette Weingärtner (Scribe 1), Fabrice Longchamp (Scribe 2), Max Striebel (Scribe 3)

We'll now look at data-plane programmability through the lens of the P4 programming language, starting with a practical example.

Slide 1
Slide 2

The example will be about how we can implement a router in P4. We'll consider this topology composed of three routers in which two of them act as gateways to two distinct Local Area Networks (LANs), and where the third one routes traffic in-between them. Hosts on the left LAN use IP addresses in 1.2.3.0/24, while hosts on the right LAN use IPs in 5.6.7.0/24.

A P4 program defines the forwarding behavior of a router, not its routing behavior. The latter is left to userspace processes running protocols such as OSPF or BGP. These processes are responsible for computing for each router where to forward the traffic for each prefix (that is, the content of the forwarding tables). In contrast, a P4 program specifies what happens to packets entering the router on a given port before they leave the router on another port.

Slide 3
Slide 4

When forwarding an IP packet, routers essentially perform four simple actions. They first perform a lookup in a forwarding table to figure out the next hop (i.e., the output port) to forward the packet to, if any. They then update the source and destination MAC addresses. (Remember that layer-2 addresses are only meaningful on a single link, so the MAC addresses need to be rewritten at each hop.) The routers then decrement the TTL by 1 before finally forwarding the packet on the chosen output port.

To build a router out of a P4 switch, we need to implement each of these four steps in P4.
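
To make the four steps concrete before turning to P4, here is a minimal Python sketch of them. It is only an illustration: the table contents, the MAC addresses, the port numbers, and the dictionary-based packet are all made up, and the choice of the new source MAC is an assumption flagged in the comments.

```python
import ipaddress

# Hypothetical forwarding table: prefix -> (next-hop MAC, output port).
FORWARDING_TABLE = {
    ipaddress.ip_network("1.2.3.0/24"): ("00:00:00:00:01:01", 1),
    ipaddress.ip_network("5.6.7.0/24"): ("00:00:00:00:02:02", 2),
}


def forward(packet):
    """Apply the four steps to a packet (a dict); return (packet, port) or None."""
    dst = ipaddress.ip_address(packet["ip_dst"])
    # Step 1: look up the next hop, preferring the longest matching prefix.
    matches = [net for net in FORWARDING_TABLE if dst in net]
    if not matches:
        return None  # no next hop: drop the packet
    next_hop_mac, port = FORWARDING_TABLE[max(matches, key=lambda n: n.prefixlen)]
    # Step 2: update the MAC addresses. Assumption: the frame was addressed to
    # this router, so its old destination MAC becomes the new source MAC.
    packet["eth_src"] = packet["eth_dst"]
    packet["eth_dst"] = next_hop_mac
    # Step 3: decrement the TTL by 1.
    packet["ttl"] -= 1
    # Step 4: forward on the chosen output port.
    return packet, port
```

Calling forward() on a packet destined to 5.6.7.8 returns the rewritten packet together with port 2, while a packet for an unknown destination returns None.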

Slide 5

A P4 program is composed of three parts: a parser, a match-action pipeline, and a deparser.

Slide 6
Slide 7

The parser part, as the name indicates, specifies the parsing logic of the program, that is, the specific headers that the P4 switch will try to extract from any received packet. (In general, P4 programs only instruct the switch to extract the headers that are needed to perform the corresponding forwarding logic.) What headers do we need to extract to build a router? Well, for one, we need to extract the Ethernet header because the router needs to update the MAC addresses. A router also needs to extract the IP header because it needs to perform an IP lookup on the destination address and update the TTL.

If you wanted to implement, say, a firewall, you would typically also extract the transport headers (TCP/UDP) so that you can also match on the port numbers to drop accordingly.

The fields extracted by the parser will then go through what is known as the match-action pipeline.

Slide 8

The match-action pipeline part actually executes the forwarding logic on the extracted headers using a sequence of match-action tables. These match-action tables use (some of) the extracted headers as keys and return the specific actions to perform, such as adding headers, modifying their values, or removing them.

Slide 9

Finally, the deparser part specifies what the packet should look like when sent on the output wire. (The deparser is the exact opposite of the parser.)

Slide 10

In a P4 program, all these parts are packed together into the equivalent of a main() method. You can see there are a few other things that I'm skipping for now; we'll come back to them later.

Slide 11

P4 is a rather simple language which is very close to C syntax-wise. It's also very restrictive. Why so? Well, the idea is to be able to run P4 programs at line rate, possibly at terabits per second, meaning a P4 switch might have only a few nanoseconds to process a packet before the next packet hits it. Understandably, this seriously limits the amount of processing that can be done.

Slide 12

Let us now start a deeper dive into the various pieces of a P4 program, starting with the parser.

When you buy a P4 switch and take it out of the box, it cannot do a single thing. It cannot forward any packet. Actually, it does not even know what a "packet" is. It is the P4 program itself that specifies what the headers look like, for instance, that the Ethernet header is composed of a 48-bit destination address, followed by a 48-bit source address, followed by a 16-bit type field.

Defining all these headers might look like a lot of work, but it really isn't, as this part tends to be highly repetitive across programs. At the same time, it allows you to define your own networking protocols and have hardware that can forward them at line rate. You could create IPv5, IPv8, or IPv10 today. This flexibility is used in practice. (For instance, people have been creating new protocols geared towards specific applications such as high-frequency trading.)

Slide 13
Slide 14

The parser itself is implemented as a state machine where each state extracts a sequence of bits, typically the bits pertaining to a specific layer of the protocol stack (Ethernet, IP, TCP, UDP, etc.). The transition between states is typically defined based on the content of the parsed headers. For instance, depending on the type field in the Ethernet header, the next state will be either IPv4 or IPv6. Likewise, depending on the value of the protocol field in the IP header, the next state will be TCP or UDP.

Slide 15
Slide 16

The match-action pipeline executes the forwarding logic using the parsed headers and the metadata. Metadata are extra pieces of information that the switch adds alongside the parsed headers. Think, for instance, of the arrival timestamp or the ingress port. This information is not written in the packet headers but is nonetheless critical to many forwarding logics. Both the parsed headers and the metadata flow through the pipeline, which is composed of a sequence of match-action tables. Each table in this pipeline can perform operations on the extracted headers and/or on the metadata.

Slide 17
Slide 18
Slide 19
Slide 20

Match-action pipelines (typically) rely on tables to implement their logic. A P4 program specifies the structure of these tables, which are then populated by the control plane. (This was the case for the router example we saw together.)

Concretely this means that, if you are only looking at a P4 program, you only get "half" of the picture, with the other half being populated by the control plane at runtime.

Slide 21

Specifying a table involves defining a key, that is, the set of fields that will be used in the lookup, together with the action(s) that will be applied upon a match.

Slide 22

In addition, specifying a table involves defining the size of the table (how many entries it can hold), the matching strategy (such as longest-prefix match or exact match; more on that soon), and a default action (if you want one). As the name indicates, the default action is applied when no entry in the table matches.
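
The ingredients of a table (key, actions, size, default action) can be modeled in a few lines of Python. This is a toy model rather than P4: the Table class, its method names, and the two actions are hypothetical, and only exact matching is shown to keep the sketch short.

```python
class Table:
    """A toy exact-match table: key fields, allowed actions, size, default action."""

    def __init__(self, key_fields, actions, size, default_action):
        self.key_fields = key_fields          # header fields used in the lookup
        self.actions = actions                # action name -> function(packet, **params)
        self.size = size                      # maximum number of entries
        self.default_action = default_action  # applied when no entry matches
        self.entries = {}                     # empty until the control plane fills it

    def add_entry(self, key, action_name, **params):
        if len(self.entries) >= self.size:
            raise OverflowError("table is full")
        self.entries[key] = (action_name, params)

    def apply(self, packet):
        key = tuple(packet[field] for field in self.key_fields)
        action_name, params = self.entries.get(key, (self.default_action, {}))
        return self.actions[action_name](packet, **params)


def set_port(packet, port):
    packet["egress_spec"] = port
    return packet


def drop(packet):
    packet["dropped"] = True
    return packet


table = Table(key_fields=("ip_dst",),
              actions={"set_port": set_port, "drop": drop},
              size=2, default_action="drop")
table.add_entry(("1.2.3.4",), "set_port", port=1)
```

A packet whose destination is 1.2.3.4 hits the installed entry, whereas any other packet falls back to the default action and is marked as dropped.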

Slide 23
Slide 24

Here we look again at the specification of a table that is used for forwarding IP packets. Logically, the key for this table is gonna be the destination IP address, and the matching strategy is gonna be longest prefix match.

Slide 25
Slide 26

The action is either to forward the packet, in which case the ipv4_forward() action is called, or to drop the packet with the drop() action.

Slide 27
Slide 28

Here is the content of the ipv4_forward() action. This function first updates the MAC addresses, then decrements the TTL, and finally sets the egress port.

Observe that the egress port is set by defining the value of a standard metadata field commonly known as egress_spec. As we'll see soon, this metadata field will always be present.

Slide 29
Slide 30

Compiling a P4 program will, in addition to producing a binary that can be loaded into a switch, produce a runtime agent that the control plane can communicate with through an API to modify the content of the forwarding tables that are defined in the program.

This slide illustrates this communication. Here the control plane populates two forwarding rules in the table, defining the prefixes (keys) that should be matched, 1.2.3.0/24 and 5.6.7.0/24, respectively, along with the corresponding action parameters. In this case, the action parameters are a MAC address and an output port (these correspond to the parameters of the ipv4_forward function in Slide 29).
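
The split between the two halves (structure in the program, content from the control plane) can be sketched as follows. The table_add() helper is purely hypothetical and does not correspond to any real runtime API; the MAC addresses and port numbers are invented for the example.

```python
import ipaddress

lpm_entries = []  # the table starts empty: the program only defines its structure


def table_add(prefix, action, **params):
    """Control-plane side: install one entry (hypothetical helper, not a real API)."""
    lpm_entries.append((ipaddress.ip_network(prefix), action, params))


def lookup(dst_ip):
    """Data-plane side: longest-prefix match; returns (action, params) or None."""
    dst = ipaddress.ip_address(dst_ip)
    candidates = [entry for entry in lpm_entries if dst in entry[0]]
    if not candidates:
        return None
    best = max(candidates, key=lambda entry: entry[0].prefixlen)
    return best[1], best[2]


# The two rules discussed above, with made-up action parameters.
table_add("1.2.3.0/24", "ipv4_forward", dst_mac="00:00:00:00:01:01", port=1)
table_add("5.6.7.0/24", "ipv4_forward", dst_mac="00:00:00:00:02:02", port=2)
```

Installing a more specific prefix later (say, a /25 inside 5.6.7.0/24) changes the forwarding behavior for the covered addresses without touching the data-plane code, which is the sense in which the program alone is only half of the picture.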

Slide 31

Compared to the parser, the deparser is not that interesting. The deparser tells the switch which headers should be put on the outgoing packet. To do so, you use the emit() function.

Slide 32

For instance, in this example, we tell the switch to put the Ethernet, IP, and TCP headers back onto the outgoing packet. If you had omitted, say, the IP and TCP headers, the switch would produce a packet that only contains the Ethernet header followed by the payload of the packet. In most cases, you'll emit the same headers as the ones you extracted.

The payload of the packet is automatically appended after whatever headers have been emitted. In particular, when the packet enters, whatever bits are not parsed are considered payload. Unlike the parsed headers, the bits of the payload are not processed by the match-action pipeline.
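
A rough Python model of the deparser is shown below. The dictionary layout (a "valid" flag and a "bytes" field per header) is an assumption made for the sketch; the behavior it mimics is that only valid headers are emitted and the untouched payload is appended at the end.

```python
EMIT_ORDER = ("ethernet", "ipv4", "tcp")  # the order of the emit() calls


def deparse(headers, payload):
    """Serialize the emitted headers in order, then append the payload."""
    out = b""
    for name in EMIT_ORDER:
        header = headers.get(name)
        # A header that was never extracted (or was invalidated) is skipped.
        if header is not None and header["valid"]:
            out += header["bytes"]
    return out + payload
```

With an invalid TCP header, for example, the output consists of the Ethernet and IPv4 bytes immediately followed by the payload.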

Note that, in practice, there tends to be a hard limit on the number of bits that can be parsed (and therefore go through the pipeline). The reason is simple: hardware switches only have a few nanoseconds to forward one packet, and parsing takes time. (It's actually one of the few operations that is inherently sequential, meaning that it can't be done in parallel.) In practice, most parsing in P4 will be limited to a few hundred bytes at most.

Slide 33
Slide 34
Slide 35
Slide 36
Slide 37

We are using (and have been using) P4 extensively in our group. In the following, I'd like to showcase a few recent example works.

Slide 38

The first work is a paper from 2022 that describes a way to protect against DDoS attacks. Most DDoS attacks nowadays tend to be very "dumb" (e.g. TCP SYN flood attacks) and are relatively easy to filter out. Alongside these simple attacks, though, are sophisticated ones. Amongst the recent sophisticated attacks, we find "pulse-wave" attacks. In pulse-wave attacks, the attackers morph the shape of the attack every few seconds in order to confuse existing defense mechanisms.

In order to defend against pulse-wave attacks, you need a system that is as fast as they are. In this paper, we build such a system in P4 so as to run it directly inside a P4 switch. The system relies on an online classifier which adapts the forwarding decision in real time depending on how "malicious" the traffic hitting the switch is.

Slide 39

This paper shows how we can use P4 to detect gray failures. Unlike hard failures (a link or a node going down), gray failures lead to only a subset of the traffic being dropped. For instance, think of a link that drops randomly 1% of the traffic. Detecting gray failures is particularly hard as, most of the time, the beacons (whether they are generated by BFD or not) will happily fly on that link: BGP will see no problem, OSPF will see no problem, etc.

The paper describes a way to detect gray failures by having directly-connected P4 switches exchange counters about how many packets they see hitting each of their forwarding entries, and compare them in real time. Any discrepancy is then reported as a gray failure. (As you may imagine, there are many problems to solve to make this practical, including how we deal with synchronization, how we scale to many counters, etc. If you are interested, you should check the paper out.)

Slide 40

This paper is about packet scheduling. (This is something we won't see in the lecture.) Packet scheduling is all about deciding in which order to send out packets in case of congestion (that is, when not all packets can make it through). By default, most network devices will simply follow the First-In-First-Out (FIFO) order, which gives you no guarantee whatsoever on which packets are going to be dropped. Of course, one can be smarter and optimize the ordering, e.g. by prioritizing voice packets over video packets.

In the paper, we show how one can program scheduling algorithms in P4 and run them on existing hardware switches.

Slide 41

This is another paper about fast convergence, where we look at how we can speed up convergence when remote links fail. (The techniques we have seen in the previous lectures indeed only work within a network.)

In the paper, we show how we can track data-plane signals in P4, namely the number of TCP retransmissions over time, to detect remote failures in (almost) real time. The intuition here is that when you have a lot of TCP flows that get hit by a remote failure, they will all retransmit at more or less the same time, creating a signal that can be extracted.

Slide 42

The following two papers relate to security; more specifically, they relate to obfuscating some aspects of the network, namely its traffic characteristics and its topology.

In both papers, we look at how we can prevent inside attackers from inferring important information about the network or about the applications that are used in the network. In the first paper, we use P4 to "fill up" every link in the network with the exact same traffic pattern that repeats forever. Doing so removes any signal that one can get from analyzing traffic.

Slide 43

In the second paper, we use P4 to reroute (traceroute) probes so as to present a different (more robust) topology to the exterior.

Slide 44

Finally, in this workshop paper, we ask whether one can remove the control plane altogether and have the data plane directly compute its own forwarding state. The dream is to have routing packets processed while they are being forwarded, and the forwarding state updated at line rate.

The paper showcases how to implement simple shortest-path routing (using a simplified Bellman-Ford algorithm). The particular constraints imposed by P4 (e.g. the limited amount of space) make this quite hard, but not impossible, as we show.

Slide 45

In the rest of the lecture, I'd like to deep dive into P4. We'll talk about three things. First, we define the programming environment surrounding P4, in particular, we'll define the concept of a (P4) target. We'll then spend time describing the different language constructs, and focus a bit more on how we can build stateful applications in P4. By stateful, I mean applications whose state is maintained across several packets.

Slide 46
Slide 47

Let us start with a quick historical recap. P4 is quite a young programming language, which appeared ~10 years ago. Two versions of the language exist: P4$_{14}$ and P4$_{16}$, where the subscript represents the year in which the specification first appeared. As of today, P4$_{16}$ is still the latest version, and the one we'll use in the lecture.

Slide 48
Slide 49

P4$_{16}$ introduces the concepts of a target and of an architecture. The target is the concrete entity you're programming. This can be a hardware target (e.g. a programmable switch) or a software target (e.g. the Linux kernel). As a language, P4 is target-agnostic: this means that, in theory, you should be able to run the same P4 program on many different targets, and at different speeds. In practice though, P4 programs tend to be target-specific, as different targets often have drastically different constraints in terms of memory and functionality (as we shall see).

Besides the target, you'll also see the concept of an architecture, which you should think of as a programming model for the particular target you're programming.

Slide 50

Whenever you program a target in P4, you need to provide two things: the P4 program which will run on the target, along with the control plane that will "drive" it, i.e. compute the forwarding state that the P4 program will rely upon. These are represented in blue on the slide. (Note that the control plane can be coded in any programming language you want.)

Besides these two things are vendor-supplied elements (here, in yellow) such as the architecture model, the compiler and, of course, the target itself.

Slide 51

Let's look at the specific architecture model we'll use in the lecture.

Slide 52

The model is known as the Protocol-Independent Switch Architecture, or PISA. It is similar to what we have seen before in that there is a parser, match-action pipelines, and a deparser. What changes is that the switching logic (the entity which "moves" packets from the input port to the output port) is now made concrete (the model we saw earlier, e.g. in Slide 7, didn't define the switching logic), and that there are two match-action pipelines: one before the switching logic, known as the ingress pipeline, and one after the switching logic, known as the egress pipeline.

Most P4 programs tend to be ingress-heavy, meaning that most of their logic will be defined in the ingress pipeline.

Slide 53
Slide 54

Different architectures/targets support different types of metadata. Again, think of metadata as extra pieces of state that flow alongside the parsed packet headers throughout the pipelines. Metadata come in two flavors: standard and intrinsic. Standard metadata are supported by all targets. They include things like the input port on which the packet was received (ingress_port), the output port to which the packet will be switched (egress_spec), or the timestamp at which the packet was enqueued (enq_timestamp). Notice the semantic difference between egress_spec, which is the output port the packet should be switched to, and egress_port, which is the output port the packet was switched to. The latter is only defined when the packet reaches the egress pipeline, while the former is defined in the ingress pipeline. Observe that metadata can be (and will be) written to. The egress_spec, for instance, is a metadata field that should be overwritten by your P4 program. (And that was the case in our IP router example, see Slide 29.)
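
The difference between egress_spec and egress_port can be sketched with a toy Python pipeline. The function names and the forwarding rule are hypothetical, and the model deliberately ignores everything else the switching logic does (queueing, replication, and so on).

```python
def ingress(headers, meta):
    # The ingress pipeline *writes* egress_spec to request an output port.
    # Hypothetical rule: traffic for 5.6.7.0/24 leaves on port 2, the rest on port 1.
    meta["egress_spec"] = 2 if headers["ip_dst"].startswith("5.6.7.") else 1


def switching_logic(meta):
    # The switching logic moves the packet and records where it actually went.
    meta["egress_port"] = meta["egress_spec"]


def egress(headers, meta):
    # In this sketch, the egress pipeline simply reads the final decision.
    headers["sent_on"] = meta["egress_port"]


def process(headers, ingress_port):
    meta = {"ingress_port": ingress_port}       # set by the switch on arrival
    ingress(headers, meta)
    defined_in_ingress = "egress_port" in meta  # still undefined at this point
    switching_logic(meta)
    egress(headers, meta)
    return meta, defined_in_ingress
```

Running process() shows that egress_port only appears after the switching logic has run, whereas egress_spec is already set at the end of the ingress pipeline.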

Intrinsic metadata are target-specific, meaning they will change depending on the target you're programming. (This is one aspect of P4 programming that makes it "target-specific".) On the right side of the slide, you see the intrinsic metadata that the software model you'll use in the lecture supports. These would be different if you were programming, say, an FPGA-based target or an ASIC-based target (like an Intel Tofino chipset).

Slide 55

Besides metadata, architectures also define different sets of external functions (known as "externs") that you can call in your P4 code. Externs are similar to Java interfaces in that you only get their specification, not their implementation.

Slide 56

This slide lists the externs that are supported by the architecture you'll use in the lecture (the v1model). Externs typically involve functions whose implementation tends to be hardware-dependent, for instance, functions that pertain to memory access (cf. the register extern here), computing hashes (hash), or generating random numbers (random). Accessing the memory is indeed very different depending on whether you're programming an ASIC, the Linux kernel, or an FPGA. The same goes for hash and random functions, for which you might (or might not) have hardware support.

Slide 57

Like intrinsic metadata, relying upon externs tends to make P4 programs target-specific. As an illustration, this slide and the following one show the metadata supported by a NetFPGA target, which is an FPGA board geared towards networking applications. This particular board comes with 4 SFP+ transceivers to connect to optical fibers.

(Most of the big players nowadays, such as the Googles, Microsofts, and Amazons of the world, rely upon similar kinds of boards inside their servers to accelerate some of their workloads.)

Slide 60-61-62

That's it for the environment. Next stop is the P4 language itself.

Slide 58

Let's start with the basics—data types, operations, and statements.

Slide 62

P4 is a statically-typed language, meaning that the type of a variable must be defined at declaration time, directly in the source code. Base types include bool, bit<W>, int<W> (where W indicates the number of bits the integer is defined on), and varbit<W> (where W indicates the maximum length of the value, meaning it can be shorter; this is useful to parse variable-length fields such as IPv4 options). Since P4 is about networking applications, most of these types relate to binary strings. Also, to guarantee efficiency, P4 doesn't support types such as float or string.
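
One practical consequence of fixed-width types is that arithmetic wraps around within the declared width. The two helpers below are a small Python model of that behavior (the names bit and int_ are only chosen to mirror bit<W> and int<W>; Python integers themselves are unbounded).

```python
def bit(width, value):
    """Model an unsigned bit<W> value: keep only the W low-order bits."""
    return value & ((1 << width) - 1)


def int_(width, value):
    """Model a signed int<W> value in two's complement."""
    value &= (1 << width) - 1
    # If the sign bit is set, the value represents a negative number.
    return value - (1 << width) if value >> (width - 1) else value
```

For instance, adding 1 to the 8-bit value 255 wraps to 0, and adding 1 to the signed 8-bit value 127 wraps to -128.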

P4 also supports other data types such as error which you can use to represent... error codes.

Slide 63

Alongside base types, P4 also supports what are known as derived (or composed) types. One common one that we have seen already is the header type. You can think of the header type as a struct in C, that is, an ordered set of named elements. The header type is used when parsing. The example here defines the Ethernet header, which is composed of a 48-bit destination MAC address, a 48-bit source MAC address, and a 16-bit EtherType. Note that this header definition could be extended to support VLAN tagging using 802.1Q.

Slide 64

A header type has a validity bit, which will be set by the parser as it tries to populate it from the packet headers. (This is the only difference with a classical struct in C.) By checking this validity bit in your code, you can then particularize your application logic depending on whether the packet (or pieces thereof) was parsed correctly.

Slide 65

Another common derived type is a header stack, used to support protocols such as MPLS. As we have seen, the MPLS header is composed of 32 bits, of which 20 bits are used for the label and 1 bit indicates the Bottom-of-Stack (BoS). Header stacks are defined using an array-like notation, and the size of the array must be defined at compilation time. Here, for instance, the header stack can be up to 10 MPLS headers in length, not longer. Defining the maximum size of the header stack allows us to bound the size of the loops required to parse it. (And note that this is the only place in P4 where you can have a loop.)

Then there is the header_union type which, as the name indicates, simply refers to the union of two simpler types. The example here illustrates how one can abstract from IPv4 and IPv6 and simply refer to an "IP" header instead.

Slide 66

The semantics of struct are the same as in C, and a tuple is essentially a struct whose fields are unnamed. So nothing special there.

Slide 67

Type specialization is useful for readability. It allows you to define an alias for a type definition, e.g. macAddr_t for bit<48>. You'll often see this syntactic sugar used in P4 code, e.g. for MAC addresses (as here) or for IP addresses.

The enum type is similar in spirit: it allows you to refer to distinct values (here, High and Low) using their names directly. This is again syntactic sugar that makes your life slightly easier while writing and reading your code.

Slide 68

Operations, like types, are simple in P4 and mainly restricted to simple arithmetic and binary/logical operations. Arithmetic operations include the classical +, -, and *, but include neither division nor modulo. (This is again for performance reasons.) Binary operations include complement, and, or, xor, as well as shifting bits to the right or to the left (which can be used to approximate division). On top of that, you can also do bit slicing and bit concatenation.
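
The bit-level operations mentioned above can be tried out in plain Python. The helpers bit_slice() and concat() are hypothetical names standing in for P4's slicing and concatenation operators, and the example value is arbitrary.

```python
def bit_slice(value, hi, lo):
    """Return bits hi..lo (inclusive) of value, like x[hi:lo] in P4."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def concat(high, low, low_width):
    """Concatenate two bit-strings, like high ++ low in P4."""
    return (high << low_width) | low


x = 0b10110110                     # an 8-bit value (182)
approx_div_by_8 = x >> 3           # shifting right by 3 divides by 2**3
upper = bit_slice(x, 7, 4)         # the four high-order bits
lower = bit_slice(x, 3, 0)         # the four low-order bits
rebuilt = concat(upper, lower, 4)  # gluing both halves gives x back
complement = ~x & 0xFF             # mask to stay within 8 bits
```

Note that the shift only divides exactly by powers of two and always rounds down, which is why it is an approximation of general division.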

Slide 69

Of course, you can also define variables. P4 being statically-typed, all variables must be defined with their types. In the example here, the variable $x$ will be stored as an 8-bit binary string. As mentioned before, it is often useful for readability purposes to define new types (you can also think of them as aliases). Here MyType will be used as an alias for bit<8>.

Variables can be defined as constants as well, using the const keyword. Using constants is useful to ensure that values that shouldn't be modified indeed stay that way. (Any reassignment to $x$ in the example will trigger a compilation error.)

Slide 70

An important point to know in P4 is that variables are not maintained across code executions. Stated differently, variables will be reset to their defined values every time a packet enters the switch. (You can think that the entire P4 program will be re-executed for each packet.) For instance, in the previous slide, if you increment the value of $x$ by 1 while processing a packet, the next packet will again see the original value of 123.

To maintain state across packets in P4 (to build stateful applications), one needs to use a different construct, registers, which we'll see shortly.
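
The contrast between per-packet variables and state that persists can be mimicked in Python as follows. This is only an analogy: the one-element list stands in for a register cell, and process_packet() stands in for one execution of the P4 program.

```python
packet_counter = [0]  # stands in for a one-cell register: survives across packets


def process_packet():
    """Model one execution of the program for one packet."""
    x = 123                  # a plain variable: re-initialized for every packet
    x = x + 1                # this change is lost once the packet is processed
    packet_counter[0] += 1   # register read-modify-write: this change persists
    return x, packet_counter[0]
```

Calling process_packet() twice returns the same value of x both times, while the counter keeps growing.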

Slide 71

P4 also includes common statements such as return, exit, conditionals, etc. These behave as in other programming languages.

Slide 72

That's it for the P4 syntax. Let us now dive a bit more into the parser, match-action pipelines, and deparser.

Slide 73

The parser relies upon a state machine which instructs the switch how to extract the packet headers.

Slide 74

The state machine is composed of a set of states. Each state will (typically) extract some parts of the header using packet.extract() before transitioning to another state according to what was parsed.

You can see this at play on this slide, where we start by parsing the Ethernet header and transition to the IPv4 state only if the value of the etherType field is 0x800. (Check out this webpage in case you care about the other values that the EtherType can take.) The same logic is followed in the parse_ipv4 state: we first extract the IP header (hdr.ipv4) before transitioning according to the value of the protocol field.
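
The same state machine can be written out in Python to see the transitions explicitly. This is a simplified sketch under stated assumptions: the state name parse_tcp and the field names are illustrative, and both the IPv4 and TCP headers are assumed to be 20 bytes long (no options).

```python
import struct

TYPE_IPV4 = 0x0800
PROTO_TCP = 6


def parse(packet):
    """Walk the parse graph; return (extracted headers, unparsed payload)."""
    headers, offset, state = {}, 0, "parse_ethernet"
    while state != "accept":
        if state == "parse_ethernet":
            dst, src, ether_type = struct.unpack_from("!6s6sH", packet, offset)
            headers["ethernet"] = {"dstAddr": dst, "srcAddr": src, "etherType": ether_type}
            offset += 14
            state = "parse_ipv4" if ether_type == TYPE_IPV4 else "accept"
        elif state == "parse_ipv4":
            # Assumption: a 20-byte IPv4 header without options.
            ttl, protocol = struct.unpack_from("!BB", packet, offset + 8)
            headers["ipv4"] = {"ttl": ttl, "protocol": protocol}
            offset += 20
            state = "parse_tcp" if protocol == PROTO_TCP else "accept"
        elif state == "parse_tcp":
            src_port, dst_port = struct.unpack_from("!HH", packet, offset)
            headers["tcp"] = {"srcPort": src_port, "dstPort": dst_port}
            offset += 20  # assumption: no TCP options
            state = "accept"
    return headers, packet[offset:]
```

Whatever follows the last extracted header is returned untouched as the payload, which matches the way unparsed bits are treated by the switch.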

Slide 75

Let me now illustrate the usefulness of having a programmable parser by showing you how to add support for two non-default protocols: tunneling and source routing.

Slide 77

The first one is tunneling; think virtual private networks (VPNs). I got a question about this during the break. Many ISPs (Swisscom is one of them, Deutsche Telekom is another) provide private network services. Say you are UBS: you have hundreds of branches all over Switzerland and the world. You can go to Swisscom and ask them to provide a virtual network connecting all your branches, and you get it.

That means that the branch in Basel and the branch in Zurich are connected to the same virtual network, and they can use private IPs inside it. Let's say UBS is using the prefix 10/8, which is private IP space. The branch in Zurich, Bahnhofstrasse, uses one /24 within that space, and the branch in Basel uses 10.0.1.0/24. A host in Zurich can then send traffic to a host in Basel through the Swisscom network.

Here you have to understand that this is private IP space, so Swisscom cannot route these packets normally. On top of that, crossfinance, which is also a client of Swisscom, may be using 10/8 internally as well. It's actually likely that both clients use the same private space internally. So not only does Swisscom have to route private addresses, it may also have different clients using exactly the same IP addresses.

Long story short, you need to disambiguate between different IP spaces. The way you typically do that is by adding something to the packet header. In practice, it's often an MPLS label. You would attach, let's say, the blue MPLS label to UBS and the yellow one to crossfinance. Whenever a packet enters Swisscom from one of the UBS branches, the first Swisscom router attaches the blue label to the packet. Likewise, when a crossfinance packet hits one of the Swisscom routers, that router attaches the yellow label to it. All the routers inside Swisscom then match on both the label and the IP destination. This is exactly what is going on here, and you can implement this in P4 very easily.

Forget about UBS, Credit Suisse, and crossfinance. Here we have ETH and Uni Zurich. As you can see, we also have different branches: we are here in Zentrum, and there is also Hönggerberg; same thing for Uni Zurich with Zentrum and Irchel. Let's say we use exactly the same IP space as our friends. What you would do, for instance, is attach the color green to Uni Zurich and blue to ETH Zurich. This router over here then matches on both the color and the destination IP. When it receives a packet with a destination in 10.1/24 and the color is blue, it uses this next hop. When it receives a packet with a destination in 10.1/24 and the color is green, it uses the other next hop. So you see, these extra colors let us tell the two apart. This is an extremely common usage of MPLS, and you can absolutely implement that.

Slide 78

In P4, it's actually pretty easy; I think by now you should be able to do it. In practice, you need a new tunnel header: either you use an MPLS header, or you just create your own tunnel header. You can do that now because you are programming in P4. For instance, you could create a tunnel header with two 16-bit fields: a protocol ID and a destination ID. The destination ID is just the color (think ETH or Uni Zurich), while the protocol ID tells you what is contained within the tunnel packet (an IPv4 packet, an IPv6 packet, etc.) so that you know what to parse next.

You would then put that tunnel header in between Ethernet and IPv4, like an MPLS header. In the Ethernet state, you select according to the EtherType, and you create a new EtherType value that makes sense for you, for instance 0x1212 (the exact value doesn't matter; it could just as well be 0x4242). On that value, you parse your tunnel header. Otherwise, if the EtherType is 0x800, that's IPv4, and you parse IPv4 directly. So essentially you are implementing this graph now. You can go straight from Ethernet to IPv4: that's a non-tunnel packet.

Of course, Swisscom internally wants to be able to route its own packets to, say, 10.1.0.4. Swisscom itself might be using this private IP space, which is totally fine. But when it is a tunnel packet, you first parse the tunnel header, then the IP header, and then you go to accept. That's an example of a VPN application you can do in P4.
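
The extended parse graph (Ethernet, then optionally the tunnel header, then IPv4) can be sketched in Python as follows. Several things here are assumptions made for the sketch: the tunnel EtherType is written as 0x1212, the tunnel header is laid out as a 16-bit protocol ID followed by a 16-bit destination ID, and the protocol ID is assumed to reuse EtherType values.

```python
import struct

TYPE_IPV4 = 0x0800
TYPE_TUNNEL = 0x1212  # made-up EtherType for the custom tunnel header


def parse_tunnel_aware(packet):
    """Return the list of states visited and the tunnel header (or None)."""
    visited = ["ethernet"]
    (ether_type,) = struct.unpack_from("!H", packet, 12)
    offset = 14
    tunnel = None
    next_type = ether_type
    if ether_type == TYPE_TUNNEL:
        # Tunnel packet: extract the 4-byte tunnel header first.
        proto_id, dst_id = struct.unpack_from("!HH", packet, offset)
        tunnel = {"proto_id": proto_id, "dst_id": dst_id}
        offset += 4
        visited.append("tunnel")
        next_type = proto_id  # assumption: proto_id reuses EtherType values
    if next_type == TYPE_IPV4:
        visited.append("ipv4")
    return visited, tunnel
```

A regular packet follows the path Ethernet to IPv4, while a tunnel packet goes through the tunnel state in between, after which a table could match on both dst_id (the "color") and the destination IP.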

Slide 79

Another very common application is source routing. In source routing, as the name indicates, you put the forwarding decisions in the hands of the source, instead of having each switch perform its own forwarding decision. One very common way to implement this is to add a stack of labels on top of your packets, where each label contains the output port to use at one intermediate node. You can think of each switch as simply numbering its outgoing ports (1, 2, and so on).

When the packet hits the first switch, the switch looks at the outermost label to find which output port it should use. Here it is 1, so the switch sends the packet on its port 1 and also strips away the outermost label. The next switch forwards according to its label, 1, and strips it away; the following switch does the same with 1; finally, the packet arrives here, the label is 2, the switch strips it away, and the packet goes out there.

You see that with source routing the source is now in charge of defining the path inside the network. If you have heard about SCION, for instance, it's using this idea of source routing. There are different schemes that benefit from source routing, and it's very easy to implement. Again, as I said before, for that you need to be able to parse a varying number of headers. For this particular path you can see that I have four labels. If I had a packet that was only going here and then out there, I would have maybe only two labels. If I had a packet that goes through 10 more hops, I would have perhaps 14 labels. So depending on the length of the path you have more or fewer headers in your stack. You don't know in advance how many you have, so you need a little bit of flexibility to parse a variable number of these headers.

Slide 80
Slide 80

The way to do that, as I said, is simply to use this notation where you define a stack of headers. You have to pick a size for it; set it to the maximum diameter of the network if you want to have connectivity. Essentially that means that you can now loop in order to parse this stack. As you can see there is a loop here: when I'm parsing this stacked header, I have a transition which depends on whether this is the last header in the stack or not. If it is the last header in the stack, I now parse the IPv4 header. If it's not the last header in the stack, I loop back to parse source routing and I do this dance again. So I have a loop in the parser: one state that points to itself, and that transition can be taken at most MAX_HOPS times. That's how you should interpret it, and what this does is essentially unpack all the headers there.
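
The self-loop in the parser can be modeled as follows in Python (again a sketch of the behavior, not P4). The label layout is an assumption: 16 bits per label, with a 1-bit bottom-of-stack flag followed by a 15-bit output port. The value of MAX_HOPS is arbitrary as well.

```python
MAX_HOPS = 9  # assumed bound: the maximum diameter of the network

class ParserError(Exception):
    pass

def parse_source_routes(data: bytes):
    """Extract labels until the bottom-of-stack bit is set, at most MAX_HOPS times."""
    labels = []
    offset = 0
    while True:
        if len(labels) == MAX_HOPS:
            raise ParserError("stack deeper than MAX_HOPS")
        if offset + 2 > len(data):
            raise ParserError("packet too short")
        word = int.from_bytes(data[offset:offset + 2], "big")
        offset += 2
        bottom = word >> 15           # assumed layout: 1-bit bottom-of-stack flag
        labels.append(word & 0x7FFF)  # followed by a 15-bit output port
        if bottom:                    # last label: leave the loop, parse IPv4 next
            return labels, data[offset:]
```

The function returns the extracted labels together with the remaining bytes, which the next parser state would interpret as the IPv4 header.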

Question: What happens if the maximum number of hops is reached? Answer: If the maximum is reached and you haven't hit the bottom of the stack, you get a parsing error, so the validity bit will be set to zero and you can detect that. If the traffic is adversarial, you could also have garbage: for instance, the bottom-of-stack bit is set but there are still labels after it. It's a bug in a sense. Then your parser will start to interpret the rest as IPv4, so everything will be garbled and weird, but the parser doesn't know. For the parser it's just bits: it takes 32 bits expecting to get an IP address, and it gets bits of the labels instead. Then of course the application will fail, and that will be very hard to debug, by the way.

Slide 81
Slide 81
Slide 82
Slide 82

I mentioned this before as well: this is the use of variable-length bit fields. Typically you use them for parsing optional fields, that is, fields that may or may not be there. This is typically the case with IPv4 options. As I said, the IPv4 options can take up to 320 bits. How do you know the actual size? It's written in the packet header: there is a field, the IHL, which tells you how many 32-bit words there are in the IPv4 header.

What this computation is doing is simply computing how many extra bits should be parsed in order to catch them all. You should convince yourself that if the value of IHL is 5, this leads to zero: the IHL counts 32-bit words and the fixed part of the header is five words long, so there are no options. The IHL can go up to 15, that is 10 extra words, and then you are at 320 bits of options. This is just a computation which tells you how many bits of options there are, and then you ask the parser to extract these bits. That's it, really nothing else there. The actual computation really doesn't matter; we won't ask you to reproduce it at the exam.
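
As a worked version of that computation, here is a minimal Python sketch. The function name is made up; the arithmetic follows directly from the IHL being a 4-bit count of 32-bit words, of which the first five are the fixed header.

```python
def ipv4_options_bits(ihl: int) -> int:
    """Number of option bits following the fixed 20-byte IPv4 header."""
    if not 5 <= ihl <= 15:
        raise ValueError("IHL is a 4-bit field and must be at least 5")
    return (ihl - 5) * 32   # IHL counts 32-bit words; the first 5 are the fixed header
```

With IHL equal to 5 there is nothing extra to extract, and with the maximum IHL of 15 the parser extracts the full 320 bits of options.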

Slide 83
Slide 83

The parser offers tons of other things. You won't use them in the exercises, but in case you actually get into P4 coding later: you can do error handling in the parser, so you can adapt the parsing logic according to whether you have errors or not. You can also do lookahead, so you can adapt the parsing logic depending on what is coming next in the packet. You can have subroutines, and all kinds of things that go beyond the scope of this lecture. Okay, so that's it for the parser.

Slide 84
Slide 84

The pipeline now. I'll probably be brief there, but essentially in the pipeline you have tables, actions, and control flow. You should think of control flow as the glue between the different tables.

Slide 85
Slide 85
Slide 86
Slide 86
Slide 87
Slide 87

Tables. We already discussed one example, where I was matching on the destination address. You can match on more than one field, by the way. When keys are combined like here, think of it as doing an AND between the two matches: you match the destination address using a ternary match, and you also match the IPv4 version using an exact match. You can match on two fields, or on k fields, at the same time.

Slide 88
Slide 88

The types of match: you can do exact match, that's easy, it's like a switch. You can do lpm, longest prefix match, like a router. And you can also do ternary match, which essentially is matching a field with a mask, like a firewall. That's what's nice with a P4 switch: you can implement a wide variety of devices, ranging from switch to firewall to router. In some architectures you can also check whether a value is in a given range.
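
The semantics of the four match kinds can be written down in a few lines of Python. This is only a model of what a lookup returns, not of how a target implements it (hardware would use TCAM or hash tables); the function names and the example table are made up.

```python
import ipaddress

def exact_match(value: int, key: int) -> bool:
    return value == key

def ternary_match(value: int, key: int, mask: int) -> bool:
    # Only the bits set in the mask are compared.
    return (value & mask) == (key & mask)

def range_match(value: int, low: int, high: int) -> bool:
    return low <= value <= high

def lpm_lookup(addr: str, table: dict):
    """Return the entry of the longest prefix containing addr, or None on a miss."""
    ip = ipaddress.ip_address(addr)
    best = None
    for prefix, action in table.items():
        net = ipaddress.ip_network(prefix)
        if ip in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, action)
    return None if best is None else best[1]

routes = {"1.2.3.0/24": "port1", "1.2.0.0/16": "port2", "0.0.0.0/0": "default"}
```

For example, a lookup of 1.2.3.4 in routes matches all three prefixes, and the /24 entry wins because it is the most specific.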

Slide 89
Slide 89

Okay, so you define these tables. Again, just put this really in your brain: in the P4 code you only define the structure of the table; the content of the table is pushed by the control plane. When you compile your P4 code, it essentially also produces a software agent that the target runs and that the control plane connects to. The control plane can use the API exposed by that software agent (think of it as a RESTful-style API) to push forwarding state into the target.

Here I'm using this API to push two rules into the ipv4_lpm table. These two rules refer to the ipv4_forward action, and then you have the action parameters there. That's really the way to think about it.

Slide 90
Slide 90

Okay, actions. The only thing perhaps interesting to mention about actions is that they can take directional parameters. What this means is that you will oftentimes see the parameters prefixed with in, out, or inout. As the names indicate, it's really self-explanatory. An in parameter instructs the compiler that this parameter will be read-only inside the action. An out parameter is a parameter which is initially uninitialized and which will be written inside the action. And inout, as the name indicates, is a combination of both. This is a bit different from a normal programming language, and we do that to help the compiler analyze the behavior of the P4 code.

Slide 91
Slide 91
Slide 92
Slide 92
Slide 93
Slide 93
Slide 94
Slide 94

For instance, this is an example where we are looking at the packet reflector. It's a very useful P4 application that essentially takes a packet that enters on one port and blasts it back out on the same port. That's it, it doesn't really do much more. The source MAC and the destination MAC get swapped, and the out port becomes the in port. The source MAC and the destination MAC are declared inout, because they are both read and written by the action. The in port is only in, because it is read but not written, and the out port is out, because it is not read but only written. That's the way to understand this.

Slide 95
Slide 95

This is perhaps a more visual depiction of what is going on. Source and destination need to be both in and out. The in port is just read, and the out port is just written.

Slide 96
Slide 96

And then, to make it a bit more confusing, not all parameters have a direction. Whenever an action's parameters come from the table, that is, their values are supplied by the matching table entry, you don't need a direction for them. These are a little bit like the quirks of P4 that don't really matter for this lecture. If you are interested, the spec of course describes everything; for this lecture you don't need it that much.

Slide 97
Slide 97

Control flow is more interesting. Control flow is essentially how we piece together the results of the different tables we are going through. We have this match-action pipeline: it's a pipeline of table 1, table 2, table 3, etc. Typically you have a dozen tables that you can go through in the ingress and a dozen tables in the egress, so 24 tables in total where you can do stuff. Essentially you can stitch these tables together. You can do a match in table 2 according to the results of table 1, and according to the results of table 1 you might, for instance, skip table 2 and go to table 3. You can build a control flow going through these different tables according to what happens in the lookup phase of each.

The way to make the P4 code perform a lookup in a table is to use the apply command. This essentially tells the P4 code to apply the ipv4_lpm table. A lookup is then performed, and you can check whether there was a hit, using this conditional. You can also check which action was executed, using that command. For instance, if the action that ran corresponds to a label being added, you might want to count how many packets with that label have been created. This is just one example I'm making up, to show you the kind of logic you may want to build with this.

Slide 98
Slide 98
Slide 99
Slide 99

Another thing that you need to do as a router, because you're modifying the headers (for instance changing the TTL), is to update checksums. There are functions to do that: you can verify the checksum and update the checksum. It's a bit mechanical, but essentially you define the fields over which the switch has to compute the checksum: destination address, source address, protocol, all the way to the IPv4 version. What you are saying is that if it is a valid packet, then you recompute the checksum over all these fields with this algorithm, the 16-bit checksum (csum16), and you write the result into the header checksum field. Most of the time you just copy this from an example, because it's quite standard at this point.
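
To see what the switch computes when it updates the checksum, here is a minimal Python sketch of the standard IPv4 header checksum (the 16-bit ones' complement sum) together with a TTL decrement. The function names are made up, and the sketch assumes an even-length header with a TTL greater than zero.

```python
def csum16(header: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of all 16-bit words."""
    total = 0
    for i in range(0, len(header), 2):
        total += int.from_bytes(header[i:i + 2], "big")
    while total >> 16:                       # fold the carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def decrement_ttl(header: bytes) -> bytes:
    """Decrement the TTL (byte 8) and rewrite the header checksum (bytes 10-11)."""
    h = bytearray(header)
    h[8] -= 1
    h[10:12] = b"\x00\x00"                   # checksum field counts as zero
    h[10:12] = csum16(bytes(h)).to_bytes(2, "big")
    return bytes(h)
```

A useful property for verification: running csum16 over a header that already contains a correct checksum gives zero.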

Slide 100
Slide 100
Slide 101
Slide 101

Other stuff: cloning packets and sending packets to the control plane. This is very common when you want to do, for instance, telemetry or measurements about the network. If you would like to, let's say, sample some of the traffic in order to understand how traffic flows through the network, you send packets to the control plane.

You can also do recirculation, which is something we very often use in research. I told you that you have 12 tables in the ingress and 12 tables in the egress, so 24 tables in total. That's not that much: you can do 24 things, and then the packet has to get out. There is a loophole, though: you can tell the switch to recirculate the packet back to the beginning and reapply the 12 ingress tables and then the 12 egress tables. By doing this you can effectively create more complex logic. Of course it has a cost: you are losing throughput, since you halve the throughput of all the traffic that goes through this loop. Also, you have 100 Gbps of recirculation throughput on the switch, so that will be a limiting factor as well.

Slide 102
Slide 102

Okay, so now I have enough to introduce how you would implement the example of PIC that we saw in fast convergence, if you remember. As we'll see, it's very simple to do in P4. If you remember PIC, what I told you is that in the data plane you would like to have two tables: the first table is populated by BGP and the second table is populated by the IGP. In the first table I'm matching a destination prefix to an index, a pointer, and in the second table I'm matching this pointer to an actual physical port. That's really what we have seen, one-to-one. Now you can do that in P4. You can define a first table, ipv4_lpm, and a second table, forward. The first table matches on the destination IP and sets the next hop index, and that next hop index is then matched by the forward table. According to that next hop index I choose an actual output port.

Here you can see that what this set_next_hop_index action does is write into this metadata. That's probably something I haven't mentioned yet, but you can add metadata of your own as well. Here I'm creating metadata: a new index, which is an 8-bit value that indirectly represents the output port onto which this packet is going. This is totally fine. Again, the first table maps the destination prefix to this index, and then the second table maps the index to a port.
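
The benefit of this indirection is easy to see in a small Python model of the two tables (a sketch of the lookup behavior, not P4 code). The prefixes are the ones from the example topology; the index and port values are made up.

```python
import ipaddress

ipv4_lpm = {            # filled by BGP: prefix -> next-hop index
    "1.2.3.0/24": 1,
    "5.6.7.0/24": 1,
}
forward = {1: 3}        # filled by the IGP: next-hop index -> output port

def lookup(dst: str):
    """Two-stage lookup; returns the output port, or None on a miss."""
    ip = ipaddress.ip_address(dst)
    matches = [(ipaddress.ip_network(p).prefixlen, idx)
               for p, idx in ipv4_lpm.items() if ip in ipaddress.ip_network(p)]
    if not matches:
        return None
    nhop_index = max(matches)[1]      # longest prefix wins; this is the metadata
    return forward.get(nhop_index)
```

If the link behind port 3 fails, a single write such as forward[1] = 4 reroutes every prefix that points to index 1, no matter how many prefixes there are in the first table.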

Slide 103
Slide 103
Slide 104
Slide 104

You define your two tables, it's not even 20 lines of code, and then you need to glue them together. If I've just defined my two tables, I've done nothing: I've just defined tables. What I also need to tell the P4 code is that first you match there, and then you match there. I need the control flow, and the way to do that is essentially just by using successive conditions.

Here you first check that the header is valid, that is, that IPv4 was parsed correctly. Then you apply the ipv4_lpm table, and if there is a hit in that table you apply the other table. You could have up to 12 tables here, for instance, if you needed them. Here there are only two, stitched together using these conditions.

Question: You mentioned 12 tables. Is that limit part of the P4 standard, or is it just the limit in our course? Answer: It's really what you should expect to see in popular hardware targets, but it is subject to change. The next generation is, I think, around 20, but it's not going to be hundreds. It's going to be a small number, but it's not written in the standard. In the standard you have an arbitrary number; the targets then limit it to some value, and you need to be aware of that.

Slide 105
Slide 105

State. I told you that variables are not the right way to maintain state in P4: they are reset for each packet. The way to maintain state in P4 is to use one of these four objects. Tables we have seen already; they are managed by the control plane. Registers are the most interesting ones; that's what we will look at together. Then you also have counters and meters, which we will see briefly. Registers, though, are the important ones.

Slide 106
Slide 106
Slide 107
Slide 107
Slide 108
Slide 108

A register, as the name indicates, is essentially a memory construct: a memory unit which you can write to and read from. That memory unit is preserved across different packets, so you can write something into it with one packet and read it back with another. Say the first packet arrives: you write, for instance, the timestamp of that packet. Then when the next packet arrives, you read the timestamp of the previous packet, and that allows you to compute the time that has elapsed between the two packets. This is a typical usage; you will actually do that in the exercises.

Slide 109
Slide 109

These registers are allocated in arrays, so you define an array of registers of a given length. For instance, you can define 1,000 64-bit counters; you then have an array indexed from 0 to 999, with each entry containing one 64-bit counter. You can write a value at a given index, or you can read from an index into a given result variable. It's a bit twisted that the index is the first argument for a write and the second argument for a read, but I guess you need to get used to it.

Slide 110
Slide 110

That's the example that I was mentioning about calculating inter-packet gaps. You define a register, here with 16k entries, each 48 bits wide; 48 bits because timestamps in these targets tend to be defined on 48 bits. To compute the inter-packet gap, you define a temporary variable. When a packet arrives, you read into that variable the timestamp of the previous packet. You compute the interval as the current timestamp minus the previous one. And then you write the current timestamp into the corresponding register entry, so that the gap for the next packet is computed relative to the current one.
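
The read-compute-write sequence can be modeled like this in Python. It is only a sketch of the behavior: the Register class, the last_seen name, and the use of a flow ID as index are assumptions, and the Python read returns its result instead of filling an out parameter as a P4 register read would.

```python
class Register:
    """Array of fixed-width cells that persists across packets."""
    def __init__(self, size: int, width: int):
        self.cells = [0] * size
        self.mask = (1 << width) - 1

    def read(self, index: int) -> int:
        return self.cells[index]

    def write(self, index: int, value: int) -> None:
        self.cells[index] = value & self.mask

last_seen = Register(size=16384, width=48)

def inter_packet_gap(flow_id: int, now: int) -> int:
    """Return the time since the previous packet of this flow, then remember now."""
    previous = last_seen.read(flow_id)
    interval = (now - previous) & last_seen.mask   # 48-bit wrap-around arithmetic
    last_seen.write(flow_id, now)
    return interval
```

Because every cell starts at zero, the first packet of a flow sees a gap equal to its own timestamp, which is something the surrounding logic has to tolerate.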

In the exercise you will do that to figure out whether enough time has elapsed between packets that you can change the forwarding decision without running into reordering. If you want to load balance traffic across a set of paths, it's tempting to do what is known as packet spraying, in which you send one packet here, one packet there, one packet there. But if you do that, you run the risk of reordering. With this very simple piece of code you can ensure that you don't change paths at the packet level, while still allowing changes between bursts of packets. The way to do that is to define a threshold, for instance 50 milliseconds: if the elapsed time between two packets is more than 50 milliseconds, then it's okay to change the load balancing decision. If you can guarantee that the round trip time in your network is less than 50 milliseconds, you can guarantee that you won't have reordering either.
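
Here is a minimal Python sketch of that burst-level load balancing. The names, the list of paths, and the microsecond unit are assumptions; the dictionaries stand in for what would be register arrays indexed by a hash of the flow on a real target.

```python
import random

FLOWLET_GAP = 50_000      # threshold, assumed to be in microseconds (50 ms)
PATHS = [1, 2, 3]         # hypothetical output ports to balance across

last_seen = {}            # flow id -> timestamp of its previous packet
current_path = {}         # flow id -> path used by the current burst

def choose_path(flow_id: int, now: int, rng=random) -> int:
    """Keep the path within a burst; allow a new one after a long enough pause."""
    previous = last_seen.get(flow_id)
    if previous is None or now - previous > FLOWLET_GAP:
        current_path[flow_id] = rng.choice(PATHS)
    last_seen[flow_id] = now
    return current_path[flow_id]
```

Packets that follow each other closely always get the same path, so they cannot overtake one another; only a pause longer than the threshold gives the switch a chance to pick a different path.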

This simple mechanism is implemented, for instance, in advanced switches, but you can implement it in P4 with really 10 lines of code, and then you benefit from some of the most advanced capabilities inside your network as well.

Slide 111
Slide 111

Another popular example is the stateful firewall. This is very, very common. What is a stateful firewall doing? In its basic form, it allows connections that are created from the inside of the network and it drops connections that are created from the outside of the network. If you are a host here and you create a connection to Facebook, this is a connection which is initiated from the inside. It crosses the P4 switch from the inside towards the outside. That's allowed: it should go through, and then the return packet, the SYN-ACK if you think about TCP, should be allowed to go through as well.

But if you have an attacker sending you a SYN-ACK, or even a SYN packet, without anyone inside your network having contacted that attacker, that is something you should drop.

Question: What's the difference between this and a NAT? Answer: Here there is no address translation whatsoever. The only thing that happens is the decision of whether to allow or to drop. This is typically combined with a NAT, which will also rewrite the source IP. But here, as you will see in the code, there is no change in addresses. It's really just: do I let this one through, or do I drop it?

Slide 112
Slide 112

Okay, so this is the state machine that we can implement. We have a TCP packet that arrives. If it comes from the internal side, you check whether it's a SYN packet or not. If the SYN flag is set, you add the flow to the register, so that you remember that this host has tried to open a connection, say to Facebook. You store (host 1, Facebook) inside the switch. Then if you receive a packet that is not from the internal side, say a return packet from Facebook, you check whether that flow is in the register. In this case it is, so you forward the packet; if it is not, you drop it. That's essentially the idea of a stateful firewall. Again, you can buy devices that are doing this.

Slide 113
Slide 113

You can also just implement it yourself. This one is a bit more involved, if you wish, even though it's really not that complicated once you read through it. Essentially you can see that we are checking whether TCP was parsed correctly. Then I'm checking whether the ingress port is the internal one or the external one, that is, the one facing the wide area network. Here internal means port 1 and external means port 2. If the packet is coming from the internal side, I check whether the SYN flag is set; if so, I keep track of the fact that this flow has been initiated from the inside. If it comes from the external side, I read the known_flows register array and check whether there is a 1 there or not, meaning that a previous write has been done for that flow.
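
The decision logic fits in a few lines of Python. This is a behavioral sketch under stated assumptions: a Python set stands in for the register array (on a switch you would hash the flow into an index of a register array holding one bit per entry, so collisions could let some unsolicited packets through), and the function and parameter names are made up.

```python
INTERNAL_PORT = 1
EXTERNAL_PORT = 2

known_flows = set()   # stand-in for the register array of known connections

def allow(ingress_port, src, sport, dst, dport, syn):
    """Return True to forward the TCP segment, False to drop it."""
    if ingress_port == INTERNAL_PORT:
        if syn:
            # Connection opened from the inside: remember it, inside endpoint first.
            known_flows.add((src, sport, dst, dport))
        return True
    if ingress_port == EXTERNAL_PORT:
        # Only packets belonging to a known flow may enter; the key is reversed
        # so that both directions of a connection map to the same entry.
        return (dst, dport, src, sport) in known_flows
    return False
```

An unsolicited packet from the outside is dropped, but the same packet is forwarded once an internal host has sent a SYN for that connection.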

This is very commonly implemented: if you are running, say, a Linux firewall at home, any kind of stateful filtering it does is based on this principle.

Question: What is the memory model for using registers? Is only one packet at a time being processed? Answer: It's a good question. You're correct that packets go through these pipelines, and you can have multiple packets in the pipeline at once: a packet here, a packet there, and a packet there. But the registers are attached to the different stages of the pipeline. That means that you can have a packet here reading these registers at the same time as a packet there is reading other registers, but they won't access the same registers at the same time. There is a guarantee of isolation; there is no parallelism at that point. Otherwise it would be very complex: you would need locks and these kinds of things. And again, you have so little time per packet that you cannot start implementing locking mechanisms, so you need to ensure isolation when it comes to memory access.

Slide 114
Slide 114

In addition to registers you can also use counters and meters. Counters are just specialized registers that are used to count: with counters you can count how many bytes a port has seen, or how many packets a port has seen. With meters you can measure rates, for instance the packets per second or bytes per second that a port is seeing. They are very similar to registers, and they are also defined as arrays. You can have an array of counters of different types: packets, bytes, or packets and bytes.

Essentially you can ask the switch to count for you: for instance, if you want to count how many packets arrive on an ingress port, you just call count on the ingress port and the switch will do the plus one for you. You can think of counters and meters as helpers: you could implement them with registers yourself, but if the only thing you are doing is counting, you can use a counter and have the switch do a bit of the processing for you.

Slide 115
Slide 115
Slide 116
Slide 116
Slide 117
Slide 117
Slide 118
Slide 118
Slide 119
Slide 119
Slide 120
Slide 120
Slide 121
Slide 121
Slide 122
Slide 122
Slide 123
Slide 123

Same thing with meters. Meters are typically used for rate limiting applications, where you use the switch as a coloring device. You define different rates and have the switch color the packets according to the incoming rate: packets in green are within the limit, packets in orange are close to the limit, and packets in red are above the limit. You can then use that, for instance, to apply different treatment: you can perhaps start dropping the red packets. The logic is up to you to define. Meters are also defined as arrays. But as I said, for the lecture registers are what you need to know, and we have covered them; the two others you don't need to know as well.

Slide 124
Slide 124
Slide 125
Slide 125
Slide 126
Slide 126
Slide 127
Slide 127
Slide 128
Slide 128
Slide 129
Slide 129
Slide 130
Slide 130
Slide 131
Slide 131

To finish, here is a brief summary of how you can access the different stateful constructs in P4. Observe that registers are the only constructs that can be both read and written in the data plane. All the others require the control plane for at least one direction. Counters, for example, cannot be read in the data plane, only by the control plane.

Based on this, it might look like registers are the way to go because they are the most flexible construct. This flexibility comes at a price, though: the amount of register memory available on hardware-based targets is typically very limited. It is therefore better to use registers with care, only when you really need read/write support in the data plane.

Slide 132
Slide 132