Post Mortem

On 22/10/2020 the domain controller (DC) was powered off. The cause is still under investigation.
The outage caused two failures:

  • DNS stopped working
  • Authentication for the media drives was unavailable, so the data was inaccessible

Previously, the CPU type of all VMs had been changed from an emulated model to ‘host’.
This did not cause any issues; in fact, it improved performance.
However, while testing Windows Server VM performance, the KVM option was disabled for the DC.
Because the ‘host’ CPU type relies on KVM hardware virtualization, the DC was unable to boot again with this combination.
Once this misconfiguration was detected and rectified, the DC booted up again and DNS services came back online.
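
A misconfiguration like the one described above could be caught before a reboot with a small check. The following is a minimal sketch, assuming the VM configuration is available as simple "key: value" lines with a "cpu" and a "kvm" key. The config format, the function name and the sample values are hypothetical and not taken from the actual environment.

```python
def has_host_cpu_without_kvm(config_text: str) -> bool:
    """Return True if the config uses the 'host' CPU type while KVM is disabled.

    Assumes one 'key: value' pair per line; this format is an assumption.
    """
    settings = {}
    for line in config_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and anything that is not a key/value pair.
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        settings[key.strip()] = value.strip()

    # The CPU value may carry extra flags after a comma; only the first token is checked.
    cpu_type = settings.get("cpu", "").split(",")[0]
    # Assumption: KVM counts as enabled unless it is explicitly set to 0.
    kvm_enabled = settings.get("kvm", "1") != "0"
    return cpu_type == "host" and not kvm_enabled


# Hypothetical config resembling the DC's state during the incident.
sample = """cpu: host
kvm: 0
memory: 4096
"""
print(has_host_cpu_without_kvm(sample))  # True
```

Running such a check over all VM configs after a change would flag the DC before its next power cycle.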

Due to an incorrect time on the DC clock, permissions could not sync with the storage systems.
As a result, the media drives remained unavailable even after the DC was back online.
Streaming services were therefore stopped to prevent loss of library metadata.
After the clock was corrected to the current time, Active Directory permissions were able to sync again.
The media drives were immediately re-mounted and the streaming services restarted, which resolved the downtime in the short term.
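
Clock-related failures like this one can be detected with a simple skew check. The sketch below assumes that the sync failure was caused by clock skew between the DC and the systems authenticating against it. It uses a 5-minute tolerance, which is the default Kerberos limit in Active Directory. Whether that limit was the actual trigger here is an assumption, and the function name and timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: 5 minutes, the default Kerberos clock-skew tolerance in Active Directory.
MAX_SKEW = timedelta(minutes=5)


def clock_skew_ok(dc_time: datetime, reference_time: datetime,
                  max_skew: timedelta = MAX_SKEW) -> bool:
    """Return True if two timezone-aware timestamps differ by at most max_skew."""
    return abs(dc_time - reference_time) <= max_skew


# Hypothetical timestamps: the DC clock is one hour behind the reference.
reference = datetime(2020, 10, 22, 12, 0, tzinfo=timezone.utc)
dc_clock = reference - timedelta(hours=1)
print(clock_skew_ok(dc_clock, reference))  # False
```

In practice the reference time would come from a trusted time source rather than a hard-coded value.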

In the long term, it needs to be investigated why the DC was powered off in the first place.
A second investigation is required into why the DC keeps reverting to an incorrect time despite being set to the correct time zone.


TLDR: The DC went down and took DNS and access to the media drives with it.
After an issue with the DC VM was fixed and its clock corrected, everything worked again.
Investigations into the root cause of this incident are still pending.