The gooey filling
2 Windows 2003 Servers running Exchange 2003 SP2, clustered
1 shared external hard drive array (RAID 5)
The flammable parts
0 IT staff who were involved in setting up current environment
0 documentation of current environment
The ignition source
1 recently hired Network Administrator who is on his way out after three months because he's a dumbass
1 new Network Administrator (to replace the above)
1 incompetent support person
We're going to assemble our flambé in layers. First, a bit of prep work:
- Old Network Administrator gives New Network Administrator access to the domain administrator account. On his first day.
- Old Network Administrator leaves New Network Administrator to poke around Active Directory to learn the structure of things.
- It is Old Network Administrator's last day, so he is gone. Old Network Administrator didn't set up any of the current environment, but he had some knowledge of it.
- New Network Administrator realizes most servers have some level of local security implemented, so he downloads and installs a software package on all the servers.
- Something in the software breaks the driver for the card that connects the Exchange server to the external RAID array.
- The RAID array starts flashing a light indicating a connectivity error.
- Call support
- Reboot the servers, then call support if it doesn't reconnect
- Power down the servers, reboot the array, reboot the servers, then call support if it doesn't reconnect
- New Network Administrator calls HP Support.
- Support identifies one drive has a significant number of errors.
- New Network Administrator is instructed to upgrade the firmware on the drive.
- After the firmware is upgraded the drive is offline.
- It is pulled and reseated to allow the array to reinitialize it.
- The RAID controller doesn't recognize the new firmware, and the drive is flagged as failed.
- New Network Administrator discovers the hardware failure in Device Manager and applies all outstanding Windows Updates.
- Because the clustered servers are failing over back and forth about every five seconds, connectivity to the array is sporadic, which causes the drives to start thrashing.
- With one drive already in a failed state, another one starts flashing a warning.
- New Network Administrator pulls the drive flashing a warning. While another drive is offline.
An HP support rep finally came on site, dug through some logs, and determined that the firmware that was installed was completely wrong for our environment. Also, there is a known problem with one particular Windows update that breaks the installer for the drivers, so new drivers can't be installed and the old ones can't be reinstalled.
He replaced the failed drive, let the array rebuild, replaced the drive that had the bad firmware, and by some miracle we were able to recover three of our five Exchange storage groups. Now the New Network Administrator and the PC Tech are piecing together backups and trying to reconstruct messages from logs on our SMTP relay.