Network Verification

In this lecture, we will discuss about network verification (how do we proof that a network configuration is correct) and synthesis (how do we automatically generate correct network configuration).

Slide 1

Slide 2

Let's start by considering a network. Imagine we are operating an Internet service provider (ISP) network. Suppose a customer contacts us, reporting they cannot access a critical website. What should we, as operators, do?

Slide 3

A standard, somewhat simplistic approach that many operators currently employ is to perform pings and trace-routes to investigate the problem. Ultimately, they hope to determine the root cause and fix the problem. The root cause of the issue can vary greatly. For example, it might be a configuration bug introduced in a recent update that only now surfaced because the customer complained. Alternatively, the problem might be more subtle, emerging from an overlooked network bug that has existed for an extended period (even years) until a specific link failure triggered it. While the technique of performing pings and trace routes work for identifying issues, it is inherently reactive—it does not prevent problems from occurring in the first place. It also lacks comprehensive coverage, as it only tests a single destination.

Slide 4

Slide 5

So, how can we improve our understanding of the network's functionality and proactively prevent these bugs from occurring?

Slide 6

One potential solution is establishing a monitoring infrastructure that continuously sends pings or trace routes to a defined set of destinations. This way, we can identify problems before the user realizes an issue. While this strategy increases our chances of detecting problems earlier, it remains reactive and costly if we monitor numerous destinations. As a result, we networks only monitor a tiny subset of destinations, leaving parts of the network unmonitored.

Slide 7

Slide 8

Another approach is to simulate the entire network by replicating everything in a simulated environment—whether through simulation, emulation, or even with physical hardware outside the production network. In this simulated environment, we can test whether our desired properties hold under various scenarios, such as link failures or changes in advertised routes from peers. This approach is finally proactive, allowing us to test configurations extensively before deploying them to the network. However, one challenge remains—due to the potentially infinite number of scenarios, we can't guarantee that our network will function correctly under all circumstances.

Slide 9

Slide 10

This brings us to the key question: how can we provide guarantees so operators can trust their configurations are correct? This is the purpose of network verification. The objective is to guarantee that a network adheres to a formal specification across various environments.

Slide 11

Slide 12

What does a specification look like for a network? For instance, in an ISP, specifications can involve desired routing or forwarding behaviors. We strive for reachability and prioritize routes from customers over those from providers to maintain revenue. Other policies include ensuring that firewall rules are enforced and solid rules are in place for a demilitarized zone. Additionally, operators can consider performance-related properties, such as congestion, delay, packet loss, and even convergence time. Overall, there are numerous specifications we could formalize.

Slide 13

The goal is to verify whether these specifications hold across various environments. Think of an environment as the part of the network that operators cannot control. While operators manage their network's topology and configuration, they lack control over routing inputs from neighboring networks, possible link failures, or the traffic demands placed on the network. Verification can provide guarantees that the specification is satisfied under all these external conditions.

Slide 14

Network verification is a system that operates as follows: we provide it with the network's topology, configuration, specification, and a description of the environments. The verifier will then respond with either "yes," indicating that the specifications are satisfied across all the given environments, or "no," indicating that they are not.

Slide 15

Slide 16

Slide 17

We can classify the approaches discussed before on a spectrum between validation on the left and verification on the right. On the left side, we have the first approach of manually pinging individual destinations to debug issues. We call such an approach validation: it validates that a specific destination in the current environment is reachable. The second approach of periodically monitoring many destinations is still on the validation side, as it only validates properties in the current environment. A formal proof that a specification is satisfied in all environments is verification. Simulation is somewhere in between, depending on the number of environments we iterate over. Generally, verifying is more time-consuming than validation, but it can also provide more guarantees than validation and can be done before deploying the configuration.

Slide 23

Let’s look at a practical example: Consider a network consisting of four routers—A, B, C, and D—connected to one provider and customer. The network's configuration is simple: an IGP with all link weights set to one and BGP configured in a full mesh with no route maps. Our specification mandates that routes from customers must always take precedence over those from providers to ensure profitability. The environments we’re examining include all potential routing inputs we could receive from the provider and the customer.

When we run a network verifier on these inputs, it should return that the specification is violated and provide a counter-example. The counter-example is one environment in which the specification is violated. In this case, such a counter-example might be the routing inputs where both the provider and customer advertise the same route but where the provider's AS path is shorter. This causes the network to prefer the provider's route over the customer's.

Although the counter-example does not explain the reasoning behind the failure, it is invaluable for understanding the issue. We can further investigate the underlying situation by running simulations in the environment as described by the counter-example.

The question is: How can this counter-example be efficiently found?

Run this slide!

Slide 28

Slide 29

To tackle this, we turn to a different problem: SAT (Satisfiability) and SMT (Satisfiability Modulo Theories). These are powerful tools used widely in network verification and across many areas, including program analysis and combinatorics.

At its core, SAT deals with propositional logic. It revolves around atoms, i.e., Booleans, that can be true or false. Atoms can also be variables that can take the value of true or false. If you reference the same variable twice in the formula, both instances must take the same value.

A formula may consist of one or more atoms combined through logical operations like conjunction, disjunction, and negation. A solution to a formula is an assignment (also called an interpretation or a model) of the variables that make the formula evaluate to true.

Slide 30

SAT is the problem of determining whether a solution exists that satisfies the formula. A SAT solver returns a specific interpretation that solves the formula if such a solution exists, or it reports that there exists no solution. Using the SAT terminology, we say a formula is either satisfiable (if an interpretation exists that solves the model) or unsatisfiable (if no such interpretation exists).

SAT is as an NP-complete problem, yet various algorithms can efficiently handle many practical applications. Usually, modern SAT solvers can solve complex problems within a reasonable time.

Slide 31

Now, what about SMT? SMT builds on SAT by incorporating additional theories, like integers, real numbers, arrays, and bit vectors. It allows for more complex formulas involving first-order logic, i.e., the universal and the existential quantifiers, enabling us to express certain properties and constraints in more detail. First-order logic allows us to assert properties to hold for all (or for at least one) elements in a set.

Slide 32

Multiple SMT solvers exist, including Z3, which originated at Microsoft Research and features a user-friendly Python API. With Z3, we can create symbolic variables and build formulas and expressions seamlessly within Python. We can impose various constraints and check for satisfiability.

As a quick demonstration, you can import Z3 symbols, define symbolic variables, and construct expressions. You add constraints and use the solver to check satisfiability, i.e., that added constraints are simultaneously true. If satisfiable, you can retrieve an assignment that satisfies all conditions; otherwise, the solver will indicate that no solution exists. Z3 calls such an assignment a model.

Slide 33

Slide 34

Slide 35

Slide 36

Slide 37

In your homework assignment, try solving a simple 4x4 Sudoku puzzle using SMT. Although it’s straightforward to solve manually, applying SMT will help solidify your understanding of SMT. Begin by creating a 4x4 array of symbolic variables and encode the Sudoku rules, ensuring each digit from 1 to 4 appears exactly once in each row, column, and box. Then, check for a solution and print the results.

Slide 38

Slide 39

Let's turn our focus back to network verification. Our central idea to do verification is to represent the entire network as a set of equations and let the solver find the solution.

Our goal is to model the network as a combinatorial circuit. SAT solvers are widely used in combinatorics and chip design, and we also leverage these concepts.

We model the network as a system with inputs and outputs and relate the inputs to the outputs through equations and constraints. In our network context, the inputs represent the environment, such as routing inputs and link failures. The system is the network, and the output corresponds to the network's state, including the routing and forwarding states. From this state, we can verify if the specifications are satisfied

We must encode the complete routing semantics and forwarding behavior within SMT. Recall that SMT doesn't know anything about networks. We must teach SMT how networks operate by developing equations that describe the network semantics.

Slide 40

We build the SMT model by separating the problem into three parts: The network model, specification, and environment restrictions.

The symbolic network model captures the system's behavior through equations. We will represent both the routing inputs and the routing state as symbolic variables—we don't know beforehand what the exact routing inputs will be. The symbolic network model describes how the routing inputs and routing state relate. For now, think of this symbolic network model as one large formula that is true when the routing inputs yield the corresponding routing state and false otherwise.

The specification is a boolean formula (equation) for the variables describing the routing state. It evaluates to either true or false, indicating whether the specification is satisfied.

Finally, the environment conditions form restrictions on the set of environments to consider. This restricts the space of environments for which the specification must be satisfied.

Slide 41

Slide 42

Slide 43

Slide 44

How can we combine these three components into a single SMT model to verify the network? Our goal is to design an SMT problem that either proves that the network satisfies the specification in all given environments, or that yields a counter-example.

Slide 45

The first approach is to ask the SMT solver to find an assignment that satisfies all three equations. However, this does not yield the correct results. If expressed like that, we ask the SMT solver to find one environment where the resulting network state satisfies the specification. But we want to verify that the specifications are satisfied in all environments.

Slide 48

Instead, we invert the specification, a commonly used trick. By inverting the specification, we instruct the solver to find one assignment of routing inputs and routing states where the network equations are satisfied, and the environments are valid, yet the specification is not. If such an assignment exists, we have identified a counter-example. Conversely, if the solver cannot find such an assignment, we have proven that the specification is satisfied across all environments, as no counter-examples exist.

In summary, we are not simply asking whether the network fulfills the specification but whether it fails to do so. We can either identify a counter-example indicating failure for specific environments or demonstrate that the specification holds for all environments.

Slide 53

We now cover these three components in detail: the symbolic network model, the specification, and constraints on the environment. Understanding the network model is challenging, but we can generate it automatically. In contrast, the specification and environment constraints are easier to understand, but correctly writing them is challenging. Let’s start with the symbolic network model. We rely on the model presented in a paper by Becket et al., which is referenced in the reading materials.

Slide 54

The symbolic network model we intend to encode consists of the routing state and the routing semantics. The routing state encompasses the variables used to characterize the network, while the routing semantics describe how these variables interact. The semantics are governed by the processes through which routes disseminate through the network and how routes are selected. We focus mainly on BGP (Border Gateway Protocol). We can extend it to other protocols, such as OSPF, but we will focus solely on BGP in this lecture.

Slide 55

In BGP networks, routes are the essential states we need to represent. Recall that a BGP route is comprised of a set of BGP attributes, including local preference, next-hop, the AS path, and others. We must encode all these attributes using types in SMT. For instance, the LocalPref can be represented as a simple integer and the AsPath as a sequence of integers. The next-hop, however, should refer to another router. In the following, we represent each router using a unique identifier, allowing us to encode the NextHop as an integer.

There’s also the Available flag, which is a bit more intricate; it indicates whether a router knows a route or advertises it to a neighbor. If this variable is false, it signifies the absence of a route. Nevertheless, we must account for other attributes because when constructing the model, the availability of a route is not predetermined.

Slide 56

Slide 57

Slide 58

Slide 59

There are additional attributes that I still need to explain, one of which is the prefix itself. The prefix is special, as BGP treats each prefix independently of others, allowing us to consider one prefix at a time. Keeping the prefix symbolic allows the model to discover which prefix yields a counter-example. Consequently, we can verify all prefixes concurrently by running the SMT model only once.

Slide 60

Slide 61

In the following, every symbolic variable or expression starts with a capital letter. In contrast, fixed variables begin with a lowercase letter.

Slide 62

Next, I will show you how we represent the complete state of the network. We duplicate these variables for each router and capture them as the attributes or routes each router selects. For example, we can track which routes are visible to a router or whether router r_C is aware of a route.

Slide 63

If we input these variables into the SMT, the solver can assign arbitrary values to all variables, resulting in nonsensical results. To avoid this, we must construct equations that effectively relate these variables. Ultimately, we desire all variables to be influenced or defined by the routing inputs from the external network.

Slide 64

These routing inputs, which are symbolic variables, represent unknown routes that external networks may advertise. We aim to formulate equations that closely relate the routes from the external network to the internal routing state.

Slide 65

I'm now delving into the routing semantics, which comprises two critical aspects: the propagation of routes through the network and the selection of optimal routes. We must encode both elements.

Slide 66

Slide 67

Let us first discuss route propagation. The fundamental concept is that a router learns a route from a neighboring router. We add another variable that indicates where the router acquires its route. This variable, termed SelectsFrom can take the ID of a neighbor as its value. For instance, if router r_A learns its route from router r_B, this variable would take r_B's ID as its value. Alternatively, it may be invalid, denoting that the router does not learn any route.

Slide 69

We must link this SelectsFrom variable back to the available variable since the route's availability is contingent on learning it from a valid neighbor. If the route is Available, then the router has learned a route from one of its neighbors; thus, SelectsFrom must be valid. Conversely, if Available is false, SelectsFrom must be invalid. This constitutes the first equation we establish.

Slide 70

Continuing with router r_A, its route is available only if it selects a route from one of its neighbors—either r_B, r_C, or r_D.

Slide 72

To encode whether r_A selects its route from one of its neighbors, we create a conjunction over all neighbors whether A selects its route from that neighbor. We do so by checking if A.SelectsFrom equals the neighbor's router ID.

Slide 73

In pseudocode, we iterate over all routers R and require that R.Route.Available is true if and only if R learns its route from one of its neighbors, which in terms is a conjunction over all neighbors N of R, whether R.SelectsFrom == N.id.

Slide 75

Now let's consider how router r_A learns its route from router r_B. To understand what r_B advertises to r_A in BGP, let's revisit how BGP disseminates routes.

Slide 76

In BGP, when r_B sends a route to r_A, it doesn't transmit the original attributes but instead transforms them. For example, the local preference can be modified during the advertisement process—perhaps by route maps that increase or decrease the preference. The next-hop is typically set to the sending router when sending via iBGP to maintain connectivity to the destination. Furthermore, the AS path generally remains unchanged.

Slide 77

Slide 78

Slide 79

Slide 80

Slide 81

To express this process, we can create new variables for the transformed route, and add constraints describing how these new variables depend on those of router r_B's selected route. However, this tends to become unwieldy due to the numerous variables involved.

Slide 82

Slide 83

Instead, we represent this transformation as expressions, i.e., formulas, linking the attributes r_B advertises to r_A to those attributes selected by r_B. We write this transformation as TransformedRoute, or TR for short. For instance, TR from r_B to r_A indicates that we are transforming the route such that its LocalPref is modified according to the route maps from r_B to r_A, the NextHop is the ID of r_B, and the AS path remains untouched.

Slide 84

Router r_A selecting its route from r_B (i.e., A.SelectsFrom == B.id) implies two properties. First, it implies that r_B advertises a route to r_A, which we can write as TR_B,A(r_B.Route).Available. Second, r_A picks the attributes from the route it receives from r_B.

Slide 87

In SMT, we write that by asserting that all attributes of r_A are equal to the attributes of TR_B,A(r_B.Route). Remember that all complexities of BGP's rules, such as route reflection and route maps, are encapsulated within the transform function. Accurately encoding these rules is essential for our purpose, but we can hide this complexity using the abstraction of a transformed route.

Slide 88

Next, we must articulate how routes are selected. Thus far, we have not imposed preferences indicating that router r_A favors the route from r_B over r_C, for example. We express the BGP selection process by encoding the stable state of the network in SMT.

Slide 89

Slide 90

Let's again consider the session from r_B to r_A, and assume that r_B advertises a route to r_A, but r_A selects a different one. For this to be a valid, stable state, r_A must prefer its rote over the one it receives from r_B.

Slide 92

How do we express the preference of two routes, i.e., r_A.Route > TR_B,A(r_B.Route)? Recall that BGP compares routes by their attributes in a defined order, first comparing the local preference, then the AS path length, followed by MED (which we don't model in this simple verifier) and IGP cost. r_A's route is preferred over the one it receives from r_B if either r_A.LocalPref is larger, or if both LocalPref attributes are equal, but r_A.AsPathLen is smaller, or both the LocalPref and AsPathLen are equal, but r_A.IgpCost is smaller, etc...

Slide 95

Working with SMT can sometimes be tricky—you need to express every last detail of the system you wish to model. If you miss one corner case, the solver exploits it and returns unreasonable results. The most common problem is when there are cases in which some key variables are unconstrained, allowing the solver to pick an arbitrary assignment.

The equations we built up so far have such an issue; what happens if r_A does not select any route, i.e., what if r_A.Route.Available is false? In that case, there are no constraints on the actual attributes of r_A's route (we only do so if r_A selects its route from a neighbor). This allows the solver to pick attributes that are preferred over the one r_A receives from r_B. Therefore, the current set of equations does not accurately describe the stable state; it also accepts a state where routers don't select a route but receive a route from neighbors they should choose.

This issue can be resolved by asserting that r_A does select another route if r_B advertises a route to r_A (i.e., if TR_B,A(r_B.Route).Available), but r_A doesn't select its route from r_B (i.e., if r_A/SelectsFrom != r_B.id). We extend the implication, requiring that both r_A.Route.Available and r_A.Route > TR_B,A(r_B.Route).

Slide 96

Slide 97

Slide 98

The equations for propagating and preferring routes have a similar structure. Both equations are implications based on whether r_A selects its route from r_B (or not). We can combine them into one single conditional expression. If r_A selects its route from r_B, then it also takes all the attributes it receives from r_B. However, if r_A doesn't select its route from r_B, then it prefers its route over the one it receives from r_B.

Slide 99

Slide 100

To build up the network equations, we iterate over all BGP sessions. For each session from Src to Dst, we compute the route that Src sends to Dst. Then, we add the conditional constraint from the last slide for Dst, concerning the route from Src: If Dst selects its route from Src, then Src must advertise a route to Dst (i.e., R'.Available), and Dst takes all attributes from the route it receives (i.e., Dst.Route == R'). If Dst doesn't select its route from Src, then Src's route must be preferred over the one it receives from Dst. That is, if Dst.SelectsFrom != Src.id, and if R'.Available, then Dst must itself select a route (i.e., Dst.Route.Available), and it must prefer its own route (i.e., Dst.Route > R').

Slide 101

This concludes the symbolic network model and its two parts; the routing state and all its variables, and the routing semantics that describe both the route propagation and selection in the stable state.

Slide 102

Now, let’s pivot to the specifications.

Writing precise specifications is often the most challenging aspect of verification—sometimes even exceeding the complexity of modeling configurations. The difficulties of writing specifications slow down the industry's adoption of network verification.

This challenge is well known in domains such as software engineering, where static program analysis tools still need to be utilized due to their complexity.

Slide 103

We encode the specification as constraints (or equations) on the routing states. The resulting equations should determine if the specification holds based on the routing state.

Slide 104

For instance, we may require to always prefer routes from customers or prevent transit routes between providers. This means we would avoid advertising a route from one provider to another, as it could incur financial losses.

Slide 105

We can also define requirements based on the resulting forwarding state, such as ensuring all packets traverse specific waypoints or avoid forwarding loops. These properties are more challenging to articulate due to the complexity of representing forwarding paths. The forwarding path itself is recursively defined and might loop. There are encodings suitable for this task, but they are outside the scope of this lecture.

Slide 107

Slide 108

For an illustrative example, we require the network to prefer customer routes over provider routes. Our specification should not constrain the routing state if the customer doesn't advertise a route. In contrast, if the customer advertises a route, then all routers must select the route from the customer.

Slide 110

Whether the customer advertises a route is Customer.Route.Available. To express that all routers select the customer's route, we can assert that C selects its route from the customer, and all other routers select their route from C.

This specification only works in an iBGP full-mesh. To encode this specification in a general network, we can augment the routing state and add an attribute, encoding whether the customer initially advertised the route.

Slide 111

Slide 112

Now, let's discuss constraints on environments, the third component of our discussion. When verifying, we define the environments in which the network must satisfy the specification.

Slide 113

In our current model, we focus on routes that eBGP neighbors advertise, encompassing all possible attributes they may advertise or whether they choose to advertise at all.

Other environments could include link failures, necessitating modifying our model to account for how these affect IGP costs and the resulting changes to the forwarding paths. In that case, operators typically bound the number of failures, adding an additional constraint to the model.

Slide 115

Now, I will briefly demo everything we have discussed so far. In the exercise, you will work on the same verifier shown here and complete it by writing the equations.

The skeleton provides a variable called net, which encodes the topology and the configuration. It also contains all symbolic variables of the routing state and several helper functions to work with them.

The NetworkModel is the list of the equations we just discussed. The Spec is the exact specification that we have seen before. We build a solver s that should satisfy the NetworkModel, and the inverted Spec. When calling s.check(), Z3 tries to solve the problem, returning either sat or unsat. Recall that the equation is satisfiable if the specification is violated and a counter-example exists. In that case, we can get the counter-example by calling s.model() (which lists the assignment for all variables). We provide a convenient method net.pprint(s) to interpret the counter-examle and display it in a human-readable form.

Slide 117

Many state-of-the-art network verifiers leverage a model similar to the one I presented today.

Slide 118

The first few systems generate a comprehensive equation for the entire network, which becomes unwieldy in larger networks with hundreds or thousands of routers.

Slide 119

Recent systems leverage modularization strategies, splitting the equation into smaller pieces that can be solved independently. These methodologies enable the independent verification of network segments, thereby enhancing scalability.

Slide 120

In the second portion of today, I will touch on synthesis and our group's research in this general area. Similar techniques from network verification can be applied for direct synthesis as well.

Slide 121

In verification, we input a configuration alongside the topology, the specification, and environment constraints. The verifier then determines if the configuration satisfies the specification. If not, it returns one specific counter-example. This counter-example is a specific combination of routing inputs or environments, yielding one routing state linked to that environment.

Slide 122

The operator must then figure out how to resolve the bug in the configuration, often without a clear starting point. After making adjustments, the operator can rerun the verifier to confirm if the problem has been resolved or if a new issue has been introduced.

Slide 123

Slide 124

This process is inherently challenging for a couple of reasons. First, the counter-example itself does not indicate the error—it merely communicates what could go wrong. Further, the operator must review the entire set of specifications, as a potential fix may unintentionally introduce new bugs related to entirely different aspects of the specifications. Moreover, finding a solution can be particularly challenging for performance goals like load balancing or delay and jitter requirements, which are complex to optimize.

Slide 125

This brings us to configuration synthesis. Given a topology, a specification, and a set of environments, we aim to discover a configuration that satisfies the specifications across all desired environments.

Slide 126

Let’s consider a specific example using the same network as before. The specification dictates that we must prefer the customer's route when available, and the set of environments comprises all potential routes from customers or providers. The synthesizer aims to produce a configuration adhering to the specification, such as assigning a local preference of 50 to routes from the provider and 200 to the customer's routes.

Slide 127

Achieving configuration synthesis renders verification unnecessary since the synthesizer inherently creates a correct configuration. However, synthesis still utilizes verification to ensure that the generated configuration satisfies the specification.

Slide 128

Synthesis enables automation in various ways. For instance, we can automate the configuration repair process by asking for a new configuration that minimally deviates from the current one while ensuring compliance with specifications. Intent-based networking also emerges from this, where operators express their desired network-wide behavior, and the system automatically configures all devices accordingly. Additionally, synthesis enables optimizations that often surpass human capabilities due to the complexity of the configuration space.

Slide 129

Configuration synthesis is the problem of finding a configuration that satisfies the specifications across all possible environments.

Slide 130

But what constitutes finding the configuration? It involves handling a large and often unstructured text file or several CLI commands. However, SMT can only deal with symbolic variables of fundamental types, such as numbers, bools, and lists. Thus, synthesis requires us to parameterize the configuration and represent it using symbolic variables. This allows the SMT solver to find an assignment of these variables.

Slide 131

Examples of parameters include link weights and route maps. We can parametrize link weights using a single variable for each weight. However, parametrizing route maps requires many individual variables used in different ways.

Once we find a suitable parametrization and represent it using symbolic variables, we can write the network equations in terms of those parameters. We then feed that equation into the SMT solver, which finds an assignment of the parameters that satisfy the specification. Finally, we reconstruct the configuration from these parameters.

Slide 133

Let’s revisit our earlier example with parameters for which local preference value we want to assign to routes from the customer and provider. We assume there are no additional parameters and focus solely on finding an assignment for these two parameters.

Slide 134

We now express network equations in terms of those parameters. For instance, we state that the local preference of the customer's transformed route is now equal to the parameter instead of a specific numeric value. We similarly construct the complete network equation, replacing numeric values with the parameters.

Slide 135

The second part of our task involves ensuring these configurations hold for all environments.

Slide 136

Slide 137

In verification, we assert that specifications must apply universally across all environments. Fortunately, we could rely on identifying one counter-example to avoid quantifiers. If no such counter-example exists, then the specification is satisfied.

Slide 138

In synthesis, the SMT solver must ultimately return a set of valid parameters across all environments. There is no trick to avoid this for synthesis as there is for verification.

Slide 139

For complicated formulas, SMT cannot efficiently deal with the universal quantifier. It usually checks the specification to be satisfied against all possible inputs. This approach doesn't scale to large networks and many parameters.

Slide 141

Instead, we can borrow recent advances in program synthesis from the field of programming languages. The idea is to synthesize a configuration that complies with a few selected environments. Then, we can verify that the generated configuration satisfies the specification in all environments. If the verifier yields a counter-example, we add it to the set of environments and repeat the process.

Slide 142

This method is known as counterexample-guided inductive synthesis. Starting from an empty set of environments, we find a random configuration. We then verify the configuration works for all routing inputs. Either the configuration does satisfy the specification, in which case we found a solution. Otherwise, we extend the set of environments by the generated counter-example and go back to synthesizing a configuration that works for all counter-examples that we already found.

Slide 147

Let’s illustrate this further with an example. We again utilize the parameterization method from before. The two parameters represent local preferences assigned to routes from the provider and customer. Again, we specify that the customer's routes must be preferred over the provider's.

Slide 148

We can visualize all possible configurations in the 2-dimensional space spanned by these two parameters. We deduce that satisfying the specification requires the local preference for customer routes to exceed that of provider routes, i.e., where C_cust > C_prov. We visualize this space of valid solutions in the configuration space with the green shaded area. Notice that the positive diagonal line—where C_cust is equal to C_prov—is not part of the space of valid solutions.

Slide 149

Initially, we start with an empty set of environments. We then let SMT determine an initial configuration randomly. Let's say it returns a configuration with c_prov = 100, and c_cust = 200.

Slide 150

This configuration falls outside the valid solution space. Thus, the verifier can find a counter-example. Let's say it returns the counter example x₁, where the customer advertises a route with a shorter AS path than the route advertised by the provider.

Run this slide!

Slide 151

Consequently, the counter-example x₁ invalidates all configurations for which the customer’s local preference is lower than the provider's, i.e., C_cust < C_prov, visualized in red. The synthesizer can no longer generate a configuration from that space, as any such configuration will yield an invalid state for x₁.

Slide 152

In the next synthesis step, the SMT can find any configuration outside of this red space, i.e., those configurations in which the network satisfies the specification under the counter-example x₁. Say the SMT finds a configuration where both local preferences equal 100.

Slide 153

The verifier yields another counter-example, where the customer's route has a longer AS path than the one from the provider.

Run this slide!

Slide 154

Again, this counter-example invalidates any configuration in which C_cust is equal to C_prov.

Slide 155

With those two counter-examples, we return to synthesis, tasked with finding a configuration that satisfies the specification for both invalidated environments. Now, the solver must return a valid solution in which C_cust > C_prov, as the two counter-examples invalidate all other options.

Run this slide!

Slide 156

This strategy works in general; each iteration reduces the space of possible solutions until the synthesizer either finds a valid solution by chance or until only valid solutions remain.

Identifying the absence of a solution might require considerable time. To that end, the synthesizer must first collect enough counter-examples so that a configuration that works for all counter-examples no longer exists.

While this approach can synthesize configurations for smaller networks or a few parameters, we eventually experience scaling issues. Scaling synthesis to large-scale networks remains an open research question.

Slide 157

We in this group have done extensive research in network verification and synthesis.

Slide 158

Early efforts evolved around analyzing the probabilities of the specification being satisfied according to a given likelihood of environments. Specifically, we focused on the probabilities of links failing and, thus, on how likely forwarding properties are violated. This allows us to provide guarantees—say 99.999%—that the network adheres to the specification without actually checking all environments.

Slide 159

We are currently looking into probabilistic verification, not only for link failures but also for routing inputs. Most counter-examples we find with verification are corner cases—routing inputs that appear with an extremely low probability. It might be more critical for operators to focus on the behavior of the network upon the most likely routing inputs.

Slide 160

In verification, we typically model the network. These models must accurately reflect the network for the results to be valid. In this paper from a couple of years ago, we presented a system that automatically and thoroughly tests these models.

Slide 161

Currently under submission is a system for verifying worst-case link loads that could occur in response to arbitrary route changes in the network. This requires simultaneously reasoning about all destinations instead of focusing on one in isolation.

Slide 162

An open research questions remain on how we can synthesize configurations to improve network performance.

Slide 163

Our earlier work in configuration synthesis marked a significant initial effort in achieving network-wide synthesis by representing the network as a stratified Datalog program.

Slide 164

A later focus sought to refine the synthesis process and improve performance by using counter-example guided inductive synthesis, similar to what I have explained before. Accepting and completing a partial configuration enables operators to control the generated configuration precisely or extend their existing one.

Slide 165

Recent explorations involve employing large language models to enhance configuration synthesis, showcasing various methodologies beyond traditional SMT approaches.

Slide 166

Work conducted two or three years ago tackled the computational complexity of synthesis problems. We distinguished the problem of synthesizing configurations for protocols such as BGP or OSPF. We found that some problems are solvable in polynomial time, while others are NP-complete or NP-hard. While SMT provides a valuable starting point for synthesis, it isn't always the optimal choice; sometimes, alternative techniques like linear programming might be a better fit.

Slide 167

The synthesis framework allows us to play with inputs and outputs to solve many networking problems.

Slide 168

One of which is determining the precise specification that a configuration satisfies. In other words, the largest set of properties that the network satisfies in all (or most) of the given environments. In this paper, we showed how to do this efficiently. We maintain a set of candidate properties refined iteratively by verifying and analyzing the data plane in counter-examples.

Slide 169

Slide 170

The generated specification is a very long list of individual properties, which makes it difficult to see the actual behavior of the network. An open research question remains: How can we summarize the specification to describe the networks' behavior?

Slide 171

Another verification problem we can formulate is finding the set of environments where the network satisfies the specification. This then becomes a measure of the robustness of the configuration: how many external events can occur while the network still behaves correctly. Our group is currently working on this problem.

Slide 172

Slide 173

Slide 174

Finally, there is the question of reconfiguration. How can we safely transition from the initial to the final configuration, assuming both configurations satisfy the specification?

Slide 175

With different initial and final states, the network somehow converges from one to the other, visiting transient states in the process. For instance, the convergence process can cause transient black holes and forwarding loops.

Slide 176

Slide 177

The idea is to find intermediate configurations that ensure each intermediate state satisfies the specification.

Slide 178

Our first attempt computes the difference between the two configurations and generates a set of CLI commands that must be applied to reconfigure the network. Then, it orders these commands so that every intermediate configuration satisfies the specification.

Slide 179

However, even those small CLI commands can cause transient anomalies. With Chameleon, we generate a reconfiguration plan that is proven to satisfy the specification throughout the reconfiguration, even in any transient state. We achieve this by synthesizing a convergence process free from anomalies and enforcing that process using temporary configurations.

Slide 180

Both these approaches assume we know the final state of the network. However, most convergence processes in the internet occur upon receiving BGP messages from external networks. We are currently exploring such convergence processes, understanding which ones can cause forwarding anomalies and for how long.

This concludes the part on network verification and synthesis.

Slide 181