Sunday, February 17, 2019

Reclaiming disks for vSAN when they used to have a datastore, especially a coredump or scratch location that prevents deletion

Typically, if you want to reuse a disk that was hosting a datastore for vSAN, you delete the datastore and for good measure use the UI to erase partitions, and life is good.

In some cases, you may get an error deleting the datastore. I find this happens typically in homelabs, but I've also seen posts in the VMTN communities when googling this topic, so it's not uncommon. To be fair, this is an old phenomenon; it can happen whether your intention is to reuse the disk for vSAN or not :) This particular blog post was written with ESXi 6.7u1.

Especially when ESXi is installed to "remote" media such as USB and SD cards, the first datastore will also automatically be configured as the location for the Core Dump and the Scratch space. This can even happen post installation, right after or on the next reboot after the datastore is created, because ESXi wants a local disk for these locations. A more in-depth explanation can be found in the ESXi installation guide, in VMware KBs, and elsewhere on the web.

You will not be able to delete the datastore or erase all partitions until those two settings are changed. To do this, I prefer opening an SSH session to the host and running the following commands:

esxcli storage filesystem list

This lists your datastores, and provides the Datastore UUID; we will focus on Datastore1, the one I couldn't delete:

Mount Point                                        Volume Name                  UUID                                 Mounted  Type            Size          Free
-------------------------------------------------  ---------------------------  -----------------------------------  -------  ------  ------------  ------------
/vmfs/volumes/900eb6ff-a901e725                    LenovoEMC_PX4-300D_NFS_ISOs  900eb6ff-a901e725                       true  NFS     211244736512  161516716032
/vmfs/volumes/5be9ba64-49b90678-5ec4-f44d3065284a  Datastore1                   5be9ba64-49b90678-5ec4-f44d3065284a     true  VMFS-6  255818989568  254282825728
/vmfs/volumes/0a71fde6-7fce32f8-8357-9857d9c81feb                               0a71fde6-7fce32f8-8357-9857d9c81feb     true  vfat       261853184     113819648
/vmfs/volumes/d973d9e5-0b4c944c-4341-5608ca2f3424                               d973d9e5-0b4c944c-4341-5608ca2f3424     true  vfat       261853184     107634688
/vmfs/volumes/5c476983-01be6fdc-53a3-f44d3065284a                               5c476983-01be6fdc-53a3-f44d3065284a     true  vfat       299712512     116998144
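As an aside, if you want that UUID in a script, here's a quick sketch. The sample row is copied from the listing above; on a live host you would pipe the output of esxcli storage filesystem list itself:

```shell
# Sample row from the listing above; on a host you would instead run:
#   OUTPUT=$(esxcli storage filesystem list)
OUTPUT='/vmfs/volumes/5be9ba64-49b90678-5ec4-f44d3065284a  Datastore1  5be9ba64-49b90678-5ec4-f44d3065284a  true  VMFS-6  255818989568  254282825728'

# Column 2 is the volume name, column 3 is the UUID.
UUID=$(echo "$OUTPUT" | awk '$2 == "Datastore1" { print $3 }')
echo "$UUID"
```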

Changing the Scratch location

You can run this simple command to confirm which datastore is hosting the Scratch (/scratch is a symlink, and the shell prompt will show the real path it resolves to once you change into it):

cd /scratch

Or, a little more involved, with vim-cmd:

[root@esxihost:~] vim-cmd hostsvc/advopt/view ScratchConfig.ConfiguredScratchLocation
(vim.option.OptionValue) [
   (vim.option.OptionValue) {
      key = "ScratchConfig.ConfiguredScratchLocation",
      value = "/vmfs/volumes/5be9ba64-49b90678-5ec4-f44d3065284a/.locker"
   }
]

We can change the Scratch location in the UI (under advanced options) or from the command line. If this is for production, you really want to set a persistent network location, with each host getting its own dedicated folder. This is critical when ESXi is installed to USB/SD media, which is considered remote and has terrible write endurance!

But in my case, this is for my homelab, so I'm going to use a "bogus" location, /tmp. The following command comes from the VMware KB, which does a great job of listing several options, including PowerCLI. You would change the part after "string" to an actual datastore location. Again, don't use /tmp in production: /tmp is wiped on every reboot, and you could lose all your Scratch files exactly when you need them most!

vim-cmd hostsvc/advopt/update ScratchConfig.ConfiguredScratchLocation string /tmp

The setting requires a reboot to take effect. You will get a "System logs on host esxihost.ariel.lab are stored on non-persistent storage." alert until you set a proper Scratch location (check the KB again for how to set one up).
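For reference, a sketch of what a persistent, per-host location would look like. The datastore path and folder name here are placeholders, not from this lab:

```shell
# Placeholder values -- substitute your shared datastore's real mount
# point and a folder name unique to each host.
DATASTORE="/vmfs/volumes/shared-datastore-uuid"
SCRATCH="$DATASTORE/.locker-esxihost"
echo "$SCRATCH"
# On the ESXi host you would then run:
#   mkdir -p "$SCRATCH"
#   vim-cmd hostsvc/advopt/update ScratchConfig.ConfiguredScratchLocation string "$SCRATCH"
# ...and reboot for the change to take effect.
```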

Changing the CoreDump location

To check where your core dump is configured, you can run these commands:

esxcli system coredump file get
esxcli system coredump network get
esxcli system coredump partition get
   Active: t10.SanDisk_Cruzer_Fit______4C530012450221105421:9
   Configured: t10.SanDisk_Cruzer_Fit______4C530012450221105421:9

In this particular case, ESXi is meant to run from a USB disk, and the last command confirms that the coredump is configured on the USB disk. If it were mapped to the datastore instead, you would need to change it and then reboot the host for the change to take effect.
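If you want to script that check, something like this works. The sample output is embedded for illustration; on a live host you would capture the output of esxcli system coredump partition get instead:

```shell
# Sample output copied from above; on a host you would instead run:
#   OUTPUT=$(esxcli system coredump partition get)
OUTPUT='   Active: t10.SanDisk_Cruzer_Fit______4C530012450221105421:9
   Configured: t10.SanDisk_Cruzer_Fit______4C530012450221105421:9'
DISK="t10.SanDisk_Cruzer_Fit______4C530012450221105421"

# Does the active dump partition live on the USB disk?
if echo "$OUTPUT" | grep -q "Active: ${DISK}:"; then
  STATUS="on-usb"
else
  STATUS="elsewhere"
fi
echo "coredump partition: $STATUS"
```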

You can use the file, network, or partition option with a variety of list and set commands; you will need to reboot after setting a new location for the change to take effect. There are good blog posts with screenshots covering this, including more advanced scenarios. Once you have a network location set up, you can use this command to "unconfigure" the dump partition:

esxcli system coredump partition set -u
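Alternatively, recent ESXi versions can dump to a file on another datastore instead of a partition. A sketch with placeholder names; double-check the flags with esxcli system coredump file add --help before relying on them:

```shell
# Placeholder datastore/file names -- run these on the ESXi host itself.
# Create a new dump file on another datastore...
esxcli system coredump file add -d Datastore2 -f esxihost-dump
# ...and activate it ("smart" picks the best available dump file).
esxcli system coredump file set --smart --enable true
```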

We should be able to delete the datastore now; nothing special is using it anymore, it's just a plain datastore, and the disk can then be re-used for vSAN. But what happens if you still can't delete it?

Check partitions

You can take a different approach by listing the disks and their partitions, and then figuring out what they are:

ls /vmfs/devices/disks/
t10.ATA_____INTEL_SSDSC2BX400G4R______________________BTHC5215055Y400VGN  vml.0100000000202042544843353231353035355934303056474e494e54454c20
t10.NVMe____PLEXTOR_PX2D256M8PeGN____________________EB88003056032300     vml.010000000034433533303031323435303232313130353432314372757a6572
t10.NVMe____PLEXTOR_PX2D256M8PeGN____________________EB88003056032300:1   vml.010000000034433533303031323435303232313130353432314372757a6572:1
t10.SanDisk_Cruzer_Fit______4C530012450221105421                          vml.010000000034433533303031323435303232313130353432314372757a6572:5
t10.SanDisk_Cruzer_Fit______4C530012450221105421:1                        vml.010000000034433533303031323435303232313130353432314372757a6572:6
t10.SanDisk_Cruzer_Fit______4C530012450221105421:5                        vml.010000000034433533303031323435303232313130353432314372757a6572:7
t10.SanDisk_Cruzer_Fit______4C530012450221105421:6                        vml.010000000034433533303031323435303232313130353432314372757a6572:8
t10.SanDisk_Cruzer_Fit______4C530012450221105421:7                        vml.010000000034433533303031323435303232313130353432314372757a6572:9
t10.SanDisk_Cruzer_Fit______4C530012450221105421:8                        vml.0100000000454238385f303033305f353630335f3233303000504c4558544f
t10.SanDisk_Cruzer_Fit______4C530012450221105421:9                        vml.0100000000454238385f303033305f353630335f3233303000504c4558544f:1

By identifying the disk, we can explore its partition table. Very important: pass the full device path exactly as shown, quotes included!

partedUtil getptbl "/vmfs/devices/disks/t10.NVMe____PLEXTOR_PX2D256M8PeGN____________________EB88003056032300"
31130 255 63 500118192
1 2048 500115456 AA31E02A400F11DB9590000C2911D1B8 vmfs 0

This output tells us there's a VMFS partition. In this case, since I've already moved the Scratch and removed the CoreDump partition, they no longer show up; before, you would also have seen partitions of type vmkDiagnostic alongside the VMFS datastore.
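Scripted, that partition review can look like this. The vmkDiagnostic row below is hypothetical -- an example of what you would have seen before the coredump partition was removed -- and its numbers are illustrative:

```shell
# Sample getptbl output; on a host you would instead run:
#   OUTPUT=$(partedUtil getptbl "/vmfs/devices/disks/<your-disk>")
# First line is disk geometry; the rest are: part# start end GUID type attr
OUTPUT='31130 255 63 500118192
1 2048 500115456 AA31E02A400F11DB9590000C2911D1B8 vmfs 0
7 500115457 500118158 9D27538040AD11DBBF97000C2911D1B8 vmkDiagnostic 0'

# List every partition that is not the VMFS datastore itself.
RESULT=$(echo "$OUTPUT" | awk 'NR > 1 && $5 != "vmfs" { print "partition " $1 ": " $5 }')
echo "$RESULT"
```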

So if you've already moved the Scratch and CoreDump partitions and you still can't delete the datastore, it may have been used for other things, such as HA heartbeating. You will have to work through the partitions to figure out what they are; there is a good VMware KB on this worth reading.

Once only the VMFS partition remains, you should be able to delete it, since nothing special is using it anymore; it's just a plain datastore. The disk can now be re-used for vSAN.
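As a last resort, if the UI still refuses, the leftover partition can be removed from the command line with partedUtil. This is destructive, and the device path below is my lab disk -- triple-check yours before running it:

```shell
# Destructive! Deletes partition 1 from this disk's partition table.
partedUtil delete "/vmfs/devices/disks/t10.NVMe____PLEXTOR_PX2D256M8PeGN____________________EB88003056032300" 1
```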