NSX Edges and DRS Rules
I am currently working for a large enterprise customer on an NSX deployment. We have designed and implemented the following high-level setup based on the customer's requirements.
The NSX Network topology looks like this:
As you can see we have various NSX components belonging to two different environments.
- Production
- Management
Each environment has its own set of Edge Services Gateways in ECMP mode, Distributed Logical Router Control Edges in HA mode and various Logical Switches.
There are four physical compute resources that host these Edges in a special NSX Edge Cluster. The Management and Compute clusters are out of scope for this blog article.
So because we have four physical hosts, we need to spread these NSX components across them in the most efficient way that will cater for at least one failure.
The following “golden rules” should be taken into account:
- Initially, the two Production ESGs should never be on the same host
- Initially, the two Production DLRs should never be on the same host
- Initially, the two Production ESGs should never be on the same host together with the two Production DLRs
The same rules apply for the Management ESGs and DLRs.
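To make the golden rules concrete, here is a minimal Python sketch that checks a placement against them. It is only an illustration: the host names, the `PLACEMENT` dictionary and the helper functions are my own assumptions, and I read the third rule as "an ESG and a DLR of the same environment should not share a host".

```python
from collections import defaultdict

# Illustrative placement: VM name -> host. The host names are assumptions.
PLACEMENT = {
    "ESG-PRD-1": "ESX-01", "DLR-MNG-01 (Active)": "ESX-01",
    "ESG-MNG-1": "ESX-02", "DLR-PRD-01 (Standby)": "ESX-02",
    "ESG-PRD-2": "ESX-03", "DLR-MNG-01 (Standby)": "ESX-03",
    "ESG-MNG-2": "ESX-04", "DLR-PRD-01 (Active)": "ESX-04",
}


def role_and_env(vm_name):
    """Split a name such as 'ESG-PRD-1' into ('ESG', 'PRD')."""
    role, env = vm_name.split("-")[:2]
    return role, env


def golden_rule_violations(placement):
    """Return (host, reason) for every golden rule that a placement breaks."""
    by_host = defaultdict(list)
    for vm, host in placement.items():
        by_host[host].append(role_and_env(vm))

    violations = []
    for host in sorted(by_host):
        # Group the roles found on this host per environment (PRD, MNG).
        roles_by_env = defaultdict(list)
        for role, env in by_host[host]:
            roles_by_env[env].append(role)
        for env in sorted(roles_by_env):
            roles = roles_by_env[env]
            if roles.count("ESG") > 1:
                violations.append((host, f"two {env} ESGs share a host"))
            if roles.count("DLR") > 1:
                violations.append((host, f"two {env} DLR nodes share a host"))
            if "ESG" in roles and "DLR" in roles:
                violations.append((host, f"{env} ESG and {env} DLR share a host"))
    return violations
```

Calling `golden_rule_violations(PLACEMENT)` returns an empty list for the placement above, and returns one entry per broken rule as soon as, for example, both Production ESGs end up on the same host.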
The reason for this is to form failure domains, so that the impact of a single failure stays limited and traffic is still able to flow whenever one host fails. Obviously, if we place the two ESGs that are responsible for production traffic on one host and that host fails, the complete production environment is down.
We can start by placing the NSX Edges manually, spreading the components across the hosts in the way we think is best. In my case I did it like this:
Now we need to define some DRS rules in order to make sure the Edges move around the way we want them to in case of a failure.
This is done with the following steps:
Place the Edges on the hosts we want.
Do initial DRS configuration on the cluster:
---> Keep specific VMs together
1) Create DRS Rule to keep ESG-PRD-1 + DLR-MNG-01 (Active) together on the same host
2) Create DRS Rule to keep ESG-MNG-1 + DLR-PRD-01 (Standby) together on the same host
3) Create DRS Rule to keep ESG-PRD-2 + DLR-MNG-01 (Standby) together on the same host
4) Create DRS Rule to keep ESG-MNG-2 + DLR-PRD-01 (Active) together on the same host
---> Separate the PRD and MNG ESGs
5) Create DRS Rule to keep ESG-PRD-1 + ESG-PRD-2 separate from each other
6) Create DRS Rule to keep ESG-MNG-1 + ESG-MNG-2 separate from each other
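The effect of these six rules can be reasoned about with a small Python sketch. This is not how DRS works internally; it is a simplified model under my own assumptions: the host names are illustrative, the VMs on a failed host move as one unit (which holds here because they form a keep-together pair), and only rules 1 to 6 are considered.

```python
# Rules 1-4: VMs that must share a host.
KEEP_TOGETHER = [
    {"ESG-PRD-1", "DLR-MNG-01 (Active)"},
    {"ESG-MNG-1", "DLR-PRD-01 (Standby)"},
    {"ESG-PRD-2", "DLR-MNG-01 (Standby)"},
    {"ESG-MNG-2", "DLR-PRD-01 (Active)"},
]
# Rules 5-6: VMs that must never share a host.
KEEP_APART = [
    {"ESG-PRD-1", "ESG-PRD-2"},
    {"ESG-MNG-1", "ESG-MNG-2"},
]

# Illustrative starting placement: VM name -> host (host names are assumptions).
START = {
    "ESG-PRD-1": "ESX-01", "DLR-MNG-01 (Active)": "ESX-01",
    "ESG-MNG-1": "ESX-02", "DLR-PRD-01 (Standby)": "ESX-02",
    "ESG-PRD-2": "ESX-03", "DLR-MNG-01 (Standby)": "ESX-03",
    "ESG-MNG-2": "ESX-04", "DLR-PRD-01 (Active)": "ESX-04",
}


def satisfies_rules(placement):
    """True when a placement honours every keep-together and keep-apart rule."""
    for group in KEEP_TOGETHER:
        if len({placement[vm] for vm in group}) != 1:
            return False
    for group in KEEP_APART:
        if len({placement[vm] for vm in group}) != len(group):
            return False
    return True


def evacuation_targets(placement, failed_host):
    """Hosts that can take all VMs of failed_host without breaking a rule."""
    moving = [vm for vm, host in placement.items() if host == failed_host]
    survivors = sorted(set(placement.values()) - {failed_host})
    targets = []
    for host in survivors:
        candidate = dict(placement)
        for vm in moving:
            candidate[vm] = host
        if satisfies_rules(candidate):
            targets.append(host)
    return targets
```

In this model, `evacuation_targets(START, "ESX-01")` leaves out ESX-03, because moving ESG-PRD-1 there would put both Production ESGs on one host and break rule 5.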
In order to test this I placed one of my ESX hosts (esx-07) into maintenance mode, and we see that the Edges that were on this host have moved in a specific way to another host (see the video). If you look at this, you can see that a second failure can happen and the traffic will still be able to flow! AMAZING!
Now let's bring that host back up and take it out of maintenance mode. We see that the Edges stay on the host they failed over to.
This does not necessarily have to be a problem, but I would feel better if the Edges moved back once that host is available again.
So how do we do this?
In order to do that we need to do the following:
> create host groups with one host in the group
7) Create a Host DRS Group with ESX-01 (name is HOST-GROUP-1)
8) Create a Host DRS Group with ESX-02 (name is HOST-GROUP-2)
9) Create a Host DRS Group with ESX-03 (name is HOST-GROUP-3)
10) Create a Host DRS Group with ESX-04 (name is HOST-GROUP-4)
> create VM groups, each with a set of two VMs
11) Create a VM DRS Group with ESG-PRD-1 + DLR-MNG-01 (Active) (name is VM-GROUP-1)
12) Create a VM DRS Group with ESG-MNG-1 + DLR-PRD-01 (Standby) (name is VM-GROUP-2)
13) Create a VM DRS Group with ESG-PRD-2 + DLR-MNG-01 (Standby) (name is VM-GROUP-3)
14) Create a VM DRS Group with ESG-MNG-2 + DLR-PRD-01 (Active) (name is VM-GROUP-4)
> pin VM groups to host groups
15) Create a DRS Rule where you say that VM-GROUP-1 "should run on" HOST-GROUP-1
16) Create a DRS Rule where you say that VM-GROUP-2 "should run on" HOST-GROUP-2
17) Create a DRS Rule where you say that VM-GROUP-3 "should run on" HOST-GROUP-3
18) Create a DRS Rule where you say that VM-GROUP-4 "should run on" HOST-GROUP-4
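Steps 7 to 18 boil down to a small mapping of VM groups to single-host groups. The Python sketch below models that mapping and the soft nature of a "should run on" rule: the preferred host wins whenever it is available, and otherwise the VM runs somewhere else. The `place` function and its fallback choice are simplifications of my own, not the actual DRS algorithm, and they ignore the other rules.

```python
# Steps 7-10: one host per host group.
HOST_GROUPS = {
    "HOST-GROUP-1": ["ESX-01"],
    "HOST-GROUP-2": ["ESX-02"],
    "HOST-GROUP-3": ["ESX-03"],
    "HOST-GROUP-4": ["ESX-04"],
}
# Steps 11-14: two VMs per VM group.
VM_GROUPS = {
    "VM-GROUP-1": ["ESG-PRD-1", "DLR-MNG-01 (Active)"],
    "VM-GROUP-2": ["ESG-MNG-1", "DLR-PRD-01 (Standby)"],
    "VM-GROUP-3": ["ESG-PRD-2", "DLR-MNG-01 (Standby)"],
    "VM-GROUP-4": ["ESG-MNG-2", "DLR-PRD-01 (Active)"],
}
# Steps 15-18: VM group "should run on" host group.
SHOULD_RUN_ON = {
    "VM-GROUP-1": "HOST-GROUP-1",
    "VM-GROUP-2": "HOST-GROUP-2",
    "VM-GROUP-3": "HOST-GROUP-3",
    "VM-GROUP-4": "HOST-GROUP-4",
}


def preferred_host(vm):
    """Return the single host that the VM's 'should run on' rule points to."""
    for vm_group, host_group in SHOULD_RUN_ON.items():
        if vm in VM_GROUPS[vm_group]:
            return HOST_GROUPS[host_group][0]
    raise KeyError(vm)


def place(vm, available_hosts, current_host):
    """Soft rule: prefer the pinned host, else stay put, else take any host."""
    preferred = preferred_host(vm)
    if preferred in available_hosts:
        return preferred
    if current_host in available_hosts:
        return current_host
    # Simplification: pick the first remaining host in alphabetical order.
    return sorted(available_hosts)[0]
```

With ESX-01 unavailable, `place("ESG-PRD-1", {"ESX-02", "ESX-03", "ESX-04"}, "ESX-01")` falls back to another host. Once ESX-01 is available again, the same call with all four hosts returns ESX-01, which is the fail-back behaviour these rules are meant to give.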
We now see that the Edges move back to the hosts we want them on when a failed host comes back up. Again, AMAZING!
The related video can be found below: