If you read the TripleO setup for network isolation, it lists eight distinct networks. Why does TripleO need so many networks? Let’s take it from the ground up.
WiFi to the Workstation
I run Red Hat OpenStack Platform (OSP) Director, which is the productized version of TripleO. Everything I say here should apply equally well to the upstream and downstream variants.
My setup has OSP Director running in a virtual machine (VM). Getting that virtual machine set up takes network connectivity. I do this over wireless, as I move around the house with my laptop, and the workstation has a built-in wireless card.
Let’s start here: Director runs inside a virtual machine on the workstation. It has complete access to the network interface card (NIC) via macvtap. This NIC is attached to a Cisco Catalyst switch. A wired cable from my laptop is also attached to the switch. This allows me to set up and test the first stage of network connectivity: SSH access to the virtual machine running on the workstation.
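Before going further, I like to confirm that this first stage actually works. Here is a trivial sanity-check sketch in Python; the address is a placeholder for whatever the Director VM’s macvtap interface was assigned in your lab:

```python
import socket

# Hypothetical address for the Director VM on my lab network;
# substitute whatever your macvtap interface was actually assigned.
DIRECTOR = ("192.168.1.40", 22)

def ssh_reachable(addr, timeout=5.0):
    """Return True if something is listening on the SSH port."""
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True
    except OSError:
        return False

print("Director SSH reachable:", ssh_reachable(DIRECTOR))
```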
Provisioning Network
The Blue network here is the provisioning network. This reflects two of the networks from the TripleO document:
- IPMI* (IPMI System controller, iLO, DRAC)
- Provisioning* (Undercloud control plane for deployment and management)
These two distinct roles can be served by the same network in my setup, and, in fact, they must be. Why? Because my Dell servers have a single NIC that acts as the IPMI endpoint and is also the only NIC that supports PXE. Thus, unless I wanted to do some serious VLAN wizardry to get the NIC to switch between the two (tough to debug during the setup stage), I am better off with both using untagged VLAN traffic. Thus, each server is allocated two static IPv4 addresses: one to be used for IPMI, and one that will be assigned during hardware provisioning.
Apologies for the acronym soup. It bothers me, too.
Another way to think about the set of networks you need is in terms of DHCP traffic. Since the IPMI cards are statically assigned their IP addresses, they do not need a DHCP server. But the operating system on the hardware will get its IP address from DHCP. Thus, it is OK for these two functions to share a network.
This does not scale very well. IPMI and iDRAC both support DHCP, and that would be the better way to go in the future, but it is beyond the scope of what I am willing to mess with in my lab.
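To make the static-plus-DHCP split concrete, here is a minimal sketch of how the provisioning subnet gets carved up. The 192.168.24.0/24 range and the exact split points are assumptions for illustration, not values taken from any TripleO template:

```python
import ipaddress

# Hypothetical provisioning network; adjust to your lab.
provisioning = ipaddress.ip_network("192.168.24.0/24")
hosts = list(provisioning.hosts())

# Each server gets one static address for IPMI, assigned by hand...
ipmi_addresses = hosts[10:20]   # e.g. 192.168.24.11 - .20

# ...and one address handed out by the undercloud's DHCP server
# during hardware provisioning (PXE, introspection, deployment).
dhcp_pool = hosts[50:100]       # e.g. 192.168.24.51 - .100

print(f"IPMI (static): {ipmi_addresses[0]} - {ipmi_addresses[-1]}")
print(f"DHCP pool:     {dhcp_pool[0]} - {dhcp_pool[-1]}")
```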
Deploying the Overcloud
In order to deploy the overcloud, the Director machine needs to perform two classes of network calls:
- SSH calls to the baremetal OS to launch the services, almost all of which are containers. This is on the Blue network above.
- HTTPS calls to the services running in those containers. These services also need to be able to talk to each other. This is on the Yellow internal API network above. (I didn’t color code the word “Yellow,” as you wouldn’t be able to read it.)
Internal (not) versus External
You might notice that my diagram has an additional network; the External API network is shown in Red.
Provisioning and calling services are two very different use cases. The most common API call in OpenStack is POST https://identity/v3/auth/token. This call is made prior to any other call. The second most common is the call to validate a token. The create-token call needs to be accessible from everywhere that OpenStack is used. The validate-token call does not. But if the API server only listens on the network that is used for provisioning, that network has to be wide open; people who should only be able to access the OpenStack APIs could then send network attacks against the IPMI cards.
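To see why the create-token call has to be reachable from everywhere, here is roughly what every OpenStack client does before anything else. A minimal sketch with keystoneauth1; the auth URL and credentials are placeholders for your own deployment:

```python
from keystoneauth1.identity import v3
from keystoneauth1 import session

# Placeholder endpoint and credentials; in a TripleO overcloud the
# auth URL would point at the External API network.
auth = v3.Password(
    auth_url="https://identity.example.com/v3",  # POST .../auth/token goes here
    username="demo",
    password="secret",
    project_name="demo",
    user_domain_name="Default",
    project_domain_name="Default",
)
sess = session.Session(auth=auth)

# This triggers the POST /v3/auth/token call described above.
token = sess.get_token()
print("Got token:", token[:16], "...")
```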
To split this traffic, either the API services need to listen on both networks, or the provisioning needs to happen on the external API network. Either way, both networks are going to be set up when the overcloud is deployed.
Thus, the red server represents the API servers running on the controller, and the yellow server represents the internal agents running on the compute node.
Some Keystone History
When a user performs an action in the OpenStack system, they make an API call. This request is processed by the web server running on the appropriate controller host. There is no difference between a Nova server requesting a token and a project member requesting a token. Historically, though, these were seen as separate use cases and were put on separate network ports: internal traffic went to port 35357, and project member traffic went to port 5000.
It turns out that running on two different ports of the same IP address does not solve the real problem people were trying to solve. They wanted to limit API access by network, not by port. Thus, there was really no need for two different ports, but rather for two different IP addresses.
This distinction still shows up in the Keystone service catalog, where endpoints are classified as internal or public (external).
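The split is visible from a client, too. Continuing with the `sess` object from the sketch above, the catalog can hand back a different URL for each interface:

```python
# Continuing with the `sess` object from the previous sketch.
# Which interfaces exist, and what URLs they map to, depends
# entirely on how the cloud was deployed.
for interface in ("public", "internal"):
    url = sess.get_endpoint(service_type="identity", interface=interface)
    print(f"{interface:8s} -> {url}")
```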
Deploying and Using a Virtual Machine
Now our diagram has gotten a little more complicated. Let’s start with the newly added red laptop, attached to the External API network. This system is used by our project member to create a new virtual machine via the compute create_server API call. In order:
- The API call comes from the outside world, travels over the Red external API network to the Nova server (shown in red)
- The Nova server posts messages to the queue, which are eventually picked up and processed by the compute agent (shown in yellow).
- The compute agent talks back to the other API servers (also shown in Red) to fetch images, create network ports, and connect to storage volumes.
- The new VM (shown in green) is created and connects via an internal, non-routable IP address to the metadata server to fetch configuration data.
- The new VM is connected to the provider network (also shown in green).
At this point, the VM is up and running, and if an end user wants to connect to it, they can do so. Obviously, the provider network does not run all the way through the router to the end user’s system, but this path is the “open for business” network pathway.
Note that this is an instance of a provider network as Assaf defined in his post.
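From the project member’s point of view, that whole sequence is hidden behind a single call. A minimal sketch with the openstacksdk; the cloud entry, image, flavor, and network names are placeholders for whatever your deployment actually has:

```python
import openstack

# 'mycloud' is a placeholder entry in clouds.yaml; the image, flavor,
# and network names below are likewise assumptions for illustration.
conn = openstack.connect(cloud="mycloud")

image = conn.compute.find_image("cirros")
flavor = conn.compute.find_flavor("m1.small")
network = conn.network.find_network("provider-net")

# This one call kicks off the whole sequence above: the Nova API on
# the external network, messages on the queue, and the compute agent
# fetching the image and wiring up the port.
server = conn.compute.create_server(
    name="demo-server",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
server = conn.compute.wait_for_server(server)
print(server.name, "is", server.status)
```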
Tenant Networks
Let’s say you are not using a provider network. How does that change the setup? First, let’s relabel the green network as the “External Network.” Notice that the virtual machines do not connect to it now. Instead, they connect via the new purple networks.
Note that the purple networks connect to the external network on the network controller node, shown in purple on the bottom server. This service plays the role of a router, converting internal traffic on the tenant networks into external traffic. This is where the floating IPs terminate and are mapped to addresses on the internal network.
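Here is a sketch of that wiring from the API side, again with the openstacksdk; the network and router names, the CIDR, and the external network’s name are all placeholders:

```python
import openstack

conn = openstack.connect(cloud="mycloud")  # placeholder cloud name

# The purple network: a tenant network with its own private subnet.
net = conn.network.create_network(name="tenant-net")
subnet = conn.network.create_subnet(
    network_id=net.id, name="tenant-subnet",
    ip_version=4, cidr="10.0.0.0/24",
)

# The router on the network controller node: its gateway sits on the
# (green) external network and NATs the tenant traffic out.
ext = conn.network.find_network("external")
router = conn.network.create_router(
    name="tenant-router",
    external_gateway_info={"network_id": ext.id},
)
conn.network.add_interface_to_router(router, subnet_id=subnet.id)

# A floating IP: allocated on the external network, terminated at the
# router, and mapped to an address on the tenant network.
fip = conn.network.create_ip(floating_network_id=ext.id)
print("Floating IP:", fip.floating_ip_address)
```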
Wrap Up
The TripleO network story has evolved to support a robust configuration that splits traffic into its component segments. The diagrams above attempt to pass along my understanding of how they work, and why.
I’ve left off some of the story: I do not show the separate networks that can be used for storage, and I’ve collapsed the controllers and agents into simple blocks to avoid confusing detail. My goal is accuracy, but here it sacrifices precision. The diagrams also show only a simple rack configuration, much like the one here in my office, but the concepts presented should let you understand how it would scale up to a larger deployment. I expect to talk about that in the future as well.
I’ll be sure to update this article with feedback. Please let me know what I got wrong, and what I can state more clearly.