After seeing firsthand what @virtualhobbit went through, I was amazed by how many "gotchas" are involved with the VSAN HCL, especially if you are trying to deploy VSAN in your homelab. This blog is about VMware gotchas, so a post was in order. I hope this helps others avoid some of these mistakes.
Here's a (non-definitive) list:
In case you have been living under a rock, you should know that a device on the standard VMware HCL does not automatically qualify for use with VMware VSAN. There is, in fact, a separate HCL. Not knowing this before deploying VSAN will cause you lots of misery. The VSAN HCL has details for I/O controllers, HDDs and SSDs. For everything else, assume the normal HCL is valid, at least until those components show up in the VSAN HCL...
Once you select the VSAN-specific HCL option from the drop-down, the interface does not resemble the normal HCL, where you can search for components (especially by the four handy PCI device registers: VID, DID, SVID and SSID). Instead, you are greeted with a page that lets you select a "VSAN Ready Node", which is a complete, pre-configured server configuration. In this example, I checked all Dell servers compatible with VSAN 6.0U2.
Right. So the idea is that you then go tell your server vendor "sell me exactly this". But what if you are looking for a specific hardware device, such as an SSD or RAID controller? Where would you check the HCL to choose what to buy?
Somewhat hidden below the VSAN Ready Node selector interface is a disclaimer that tells you that if you are willing to "brave the path of building your own server" (I paraphrase), you can get to see the actual HCL.
The URL to the real VSAN HCL? It's exactly the same as the URL for the Ready Node HCL. I know. I got a tip in the vExpert Slack: to access it directly, go to http://vmwa.re/vsanhclc
Ehm... where are the four hardware identifiers we've been relying on for so long to unequivocally verify the HCL? They are not used here, at least not for the initial search for devices. You have to browse first using the available options. You can then verify the PCI registers from the results, but sometimes you get little jewels like this:
"The device PID string (model) is truncated in ESXi. Please use both model number and firmware version when trying to identify the device. When in doubt, please consult with the hardware provider."
I'd like to offer a screenshot of this but I don't have one (I found this on the Intel P3700 NVMe drive, which is definitely a PCI device). If you do, please send one, as I'm really curious as to why lspci -v would truncate the PCI registers.
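For what it's worth, pulling the four identifiers out of a host's device listing is simple enough to script. The sketch below is a minimal example, assuming each line is shaped like `<PCI address> <VID>:<DID> <SVID>:<SSID> ...`, which is how I understand `vmkchdev -l` prints devices on ESXi. The sample line and the `parse_pci_ids` helper are made up for illustration, so check the format against your own host before trusting it.

```python
import re
from typing import NamedTuple, Optional


class PciIds(NamedTuple):
    vid: str
    did: str
    svid: str
    ssid: str


# Assumed line shape: "<address> <VID>:<DID> <SVID>:<SSID> <rest>"
_LINE = re.compile(
    r"^\S+\s+([0-9a-fA-F]{4}):([0-9a-fA-F]{4})"
    r"\s+([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\b"
)


def parse_pci_ids(line: str) -> Optional[PciIds]:
    """Return the four IDs (lowercased), or None if the line does not match."""
    match = _LINE.match(line.strip())
    if match is None:
        return None
    return PciIds(*(part.lower() for part in match.groups()))


# Illustrative sample only, not captured from a real host.
sample = "0000:03:00.0 1000:005D 1028:1f49 vmkernel vmhba0"
print(parse_pci_ids(sample))
```

Lowercasing the values up front avoids false mismatches when one source prints `005D` and another prints `005d`.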
Each type of device has different columns. Please be mindful as these details can be extremely important. For example:
- certain I/O (RAID) controllers are only supported in a particular mode and have specific VID, DID, SVID and SSID values.
- certain SSDs can only be used in a particular Tier (All Flash, Hybrid Caching, etc)
- certain capacity drives are only supported in a certain disk series. Go find that out.
All this comes on top of the driver/firmware requirements you have come to love.
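To make the column differences concrete, here is a rough sketch of the check you end up doing by hand for an I/O controller: match all four IDs, then confirm that the mode you plan to run is one the listing supports. The record layout, field names and sample values are hypothetical. The real HCL presents this as columns on a web page, and I'm not claiming this mirrors any VMware data format.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Tuple

Ids = Tuple[str, str, str, str]  # (VID, DID, SVID, SSID), lowercase hex


@dataclass(frozen=True)
class ControllerEntry:
    """Hypothetical stand-in for one controller row on the VSAN HCL."""
    model: str
    ids: Ids
    supported_modes: FrozenSet[str]


def find_entry(entries: Iterable[ControllerEntry], ids: Ids) -> Optional[ControllerEntry]:
    """Find the row whose four IDs all match; a partial match is not a match."""
    wanted = tuple(part.lower() for part in ids)
    for entry in entries:
        if entry.ids == wanted:
            return entry
    return None


def check_controller(entries: Iterable[ControllerEntry], ids: Ids, mode: str) -> str:
    """Report whether the controller is listed, and listed for this mode."""
    entry = find_entry(entries, ids)
    if entry is None:
        return "not listed"
    if mode.lower() not in entry.supported_modes:
        return f"listed, but not in {mode} mode"
    return "listed"


# Made-up sample row for illustration.
hcl = [
    ControllerEntry(
        model="Example RAID Controller",
        ids=("1000", "005d", "1028", "1f49"),
        supported_modes=frozenset({"pass-through"}),
    )
]

print(check_controller(hcl, ("1000", "005D", "1028", "1F49"), "pass-through"))
print(check_controller(hcl, ("1000", "005d", "1028", "1f49"), "raid0"))
print(check_controller(hcl, ("1000", "005d", "1028", "0000"), "pass-through"))
```

The same shape would work for SSDs by swapping the mode check for a tier check, with driver and firmware versions as further fields to compare.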
Everybody knows NVMe is wicked fast and it's the future - but only a handful of drives are available today on the VSAN HCL. As far as I can tell, they are all Intel drives - HP just happens to sell them too and provides its own firmware release. I'm told Samsung and others are coming.
I know the VSAN team is hard at work trying to make this process easier. It's not easy to squeeze all the performance out of the wide variety of devices out there. Add to this the inherent human inefficiencies and costs associated with certifying and supporting all vendor hardware combinations and you can imagine how difficult things can be.
My hope is that the very cool things that have stemmed from trying to help VSAN administrators will make it into regular vSphere. In particular, I think the VSAN HCL check, part of the included VSAN Health Check, should be easy to port and would be a welcome addition for all of us who manage VMware HCLs (which is everybody...).
In the meantime, this particular PowerCLI script looks promising, as long as we can find the corresponding JSON file location for the "regular" HCL. I wonder if someone has already thought of that and been able to get it to work? It would sure make a nice addition to my documentation templates effort!