VM Stun
VM Stun – The cause I keep seeing, the why and how to fix it!
Posted on September 25, 2020 by moodraman
VM Stun typically happens when a VM is being snapshotted. If you’re using backup software that leverages VM snapshots (as most do for VMware today), then you run the risk of VM Stun.
The Why
*Please note: I’m trying my best to explain this in layman’s terms.
In order to take a snapshot of a VM and have it continue to run, VMware creates a checkpoint and saves all new writes to that VM in a separate delta file as they come in. When the snapshot is later consolidated, VMware must merge these changes back into the original disk. At some point in the consolidation process, VMware must flip the writes from the delta file back to the original disk, which pauses the VM very briefly. The larger the VM and the longer the snapshot stays open, the more impactful the consolidation will be.
In highly transactional environments, this additional load and even the slightest pause can cause a VM Stun, where the VM is briefly unresponsive. This can cause a myriad of problems.
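To make the delta file idea a bit more concrete, here is a toy model in Python. It is only a sketch of the general redirect-the-writes idea described above, not VMware's actual on-disk format or consolidation logic, and the class and method names (SnapshottedDisk, take_snapshot, consolidate) are made up for illustration.

```python
class SnapshottedDisk:
    """Toy model of a virtual disk with at most one open snapshot.

    Hypothetical illustration only: real hypervisors work on disk
    extents and do far more bookkeeping than this.
    """

    def __init__(self, base_blocks):
        self.base = dict(base_blocks)  # original disk: block number -> data
        self.delta = None              # delta file, present only while a snapshot is open

    def take_snapshot(self):
        # From now on the base disk is frozen and new writes go to the delta.
        self.delta = {}

    def write(self, block, data):
        if self.delta is not None:
            self.delta[block] = data   # snapshot open: redirect the write
        else:
            self.base[block] = data    # no snapshot: write straight to the disk

    def read(self, block):
        # The newest data wins: check the delta first, then the base disk.
        if self.delta is not None and block in self.delta:
            return self.delta[block]
        return self.base[block]

    def consolidate(self):
        # Merge every changed block back into the base disk, then flip
        # writes back to it. The more blocks piled up in the delta, the
        # more work there is to do here.
        if self.delta is None:
            raise RuntimeError("no snapshot is open")
        merged = len(self.delta)
        self.base.update(self.delta)
        self.delta = None
        return merged


disk = SnapshottedDisk({0: "a", 1: "b"})
disk.take_snapshot()
disk.write(1, "B")                 # lands in the delta, base block 1 is untouched
disk.write(2, "c")
frozen_block = disk.base[1]        # what a backup reading the base would see
live_block = disk.read(1)          # what the running VM sees
merged_blocks = disk.consolidate()
```

The thing to notice is that the work done in consolidate() scales with how much has piled up in the delta, which is why a long-lived snapshot on a busy VM hurts more.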
NOTE: < sales rant > Any time you take a VM snapshot in vSphere, you run the risk of VM Stun. I work for Veeam, and we tell people we can help mitigate this risk by leveraging Backup from Storage Snapshots (this process is explained later), but I have seen marketing for OTHER “Modern” backup solutions claim they can “Completely eliminate VM Stun with Backup from Storage Snapshot”. Unless their process runs agents or is snapshotless in VMware (certain architectures of Veeam can do this), this is a flat-out lie. Remember, if ANY snapshot is taken in vSphere, you still run the risk of VM Stun. < /sales rant>
Back to that Customer First theme 😉
HOWEVER!
Modern infrastructures and the hypervisors supporting them have improved dramatically from the early days of virtualization. Shortly after the inclusion of Storage vMotion into VMware, the consolidation was changed to leverage some of these new efficiencies and mitigate this issue. On the infrastructure side, storage performance and network or fibre performance have improved drastically, and compute has become more powerful.
In a nutshell, if you are running modern infrastructure with the latest version of a hypervisor, you should not be experiencing VM Stun. If you are, either you’re working with a VM so sensitive that it can never withstand a VM snapshot, in which case another backup methodology for this data is the best bet, or something is wrong with your infrastructure or your backup product.
How to decrease the likelihood of VM Stuns
The best way to decrease the likelihood of a stun is to keep the VMware snapshot open for as short a time as possible. Modern data protection products have started integrating with storage vendors so that they can leverage native storage snapshot capabilities to shorten the time a VM snapshot is needed.
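A bit of back-of-the-napkin math shows why the open time matters so much. The sketch below assumes a hypothetical VM writing a steady 20 MB/s and treats every write as landing on a new block, so it gives a rough upper bound on how much data piles up in the delta file. The numbers and the function name are mine, purely for illustration, not measurements from any real environment.

```python
def delta_size_gb(write_rate_mb_s, snapshot_open_s):
    """Rough upper bound on delta growth, in GB.

    Assumes a constant write rate and that no block is overwritten twice
    (rewrites of the same block would not grow the delta further).
    """
    return write_rate_mb_s * snapshot_open_s / 1024


WRITE_RATE_MB_S = 20  # hypothetical, steady write rate of a busy VM

# Hypothetical durations: snapshot held for a whole 2-hour backup window
# versus held only long enough to trigger a storage snapshot.
scenarios = {
    "VM snapshot open for a 2 hour backup": 2 * 60 * 60,
    "VM snapshot open for about 30 seconds": 30,
}

for label, seconds in scenarios.items():
    print(f"{label}: up to {delta_size_gb(WRITE_RATE_MB_S, seconds):.2f} GB to consolidate")
```

Under those assumptions, the long-lived snapshot leaves a couple hundred times more data to merge back at consolidation time than the short-lived one.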
Enter… Backup From Storage Snapshot with Veeam.