Today’s networking and systems infrastructure faces a pressing need for feature velocity (i.e., deploying new features quickly) and high availability (i.e., zero-downtime upgrades). While we are adept at updating user-level applications, performing live upgrades to the infrastructure itself is challenging, as it risks disrupting all the layers above it. The infrastructure’s tight coupling and interaction with hardware further constrain flexibility. This project aims to develop runtime programmability for next-generation computing infrastructure. The goal is to reprogram the entire infrastructure stack in a matter of seconds without downtime: vertically, from host kernels down to the NICs, and horizontally, across a path of switches extending to the other end.
In our vision for next-generation infrastructure, desired changes are incorporated instantaneously through live reconfiguration across the datacenter. The infrastructure provides a collection of basic utilities (e.g., L3 routing, access control). On demand, extensions are partially reconfigured into it by injecting, removing, or overriding specific functions. This creates possibilities unavailable in today’s infrastructure. For instance, we can summon security defenses into the network or end hosts precisely when they are needed. Defenses can migrate to the attack location or replicate across the infrastructure to maximize their effectiveness. They can even shapeshift in real time to counter evolving attacks.
Our project is structured along four dimensions. First, runtime programming must control infrastructure-wide datapaths and their real-time changes as a whole while ensuring runtime portability across software and hardware devices; this requires a new programming system. Second, compiling a whole-infrastructure program onto a heterogeneous substrate, while continuously reoptimizing it as the runtime changes, requires novel compilation support. Third, changes always come with risks, so we must innovate on runtime verification and validation tools alongside the programming and compilation systems. Finally, managing the unprecedented dynamic footprints of FlexNet programs (e.g., runtime migration) requires a powerful management system. The end-to-end realization of these four building blocks will form the foundation for a new ecosystem in which new applications are developed, optimized, verified, and managed in a closed loop for novel use cases.
This project is supported by a $3M NSF CNS Large grant.