Many businesses and government organizations, especially smaller ones, tend to ignore their computing environments, preparing for nothing and praying for the best. In some cases, it is argued, the money saved by not paying for hardware and software maintenance can be used instead in emergencies. We recently had a client in exactly that situation, and surprisingly, the results of that gamble were inconclusive.
Most readers, considering the tone of articles such as this one, would assume such a client would be warned that it was playing with fire and told to suck it up and pay for regular maintenance. After all, the argument for that approach typically holds that such maintenance saves money over the long term. In this case it was not so clear.
The business in question had recently been acquired by a major manufacturing company. The acquired company had spent next to nothing over the past several years to maintain its computing environment. Further, the acquiring company has spent very little since the acquisition a couple of years ago. So we were left with hardware that was about 8 years old, running VMware ESX 4.1, which went end-of-life in 2014, and Windows Server 2003 … which, for those keeping count, is about 15 years old now.
After several years of basically being ignored, the inevitable happened: The server ran out of disk space. The fix sounded simple: Use an existing storage area network (SAN) owned by the parent company, connect it to the VMware host, and all would be good.
The problem was that the SAN was a relatively new model and could not be made to work with this old version of VMware. Normally, VMware, by virtue of its market share, is compatible with most anything you’d want to connect. With a version this old, however, support was scarce, both officially from VMware and unofficially via web searches.
We abandoned that SAN and tried to implement a new one, using a function built into Windows Server 2008 R2 and later. While we were able to bring up the new SAN and make it work with other servers, we still could not make it work with the problem server.
Parallel to the SAN efforts, we set out to acquire new hard drives. Luckily, the problem server had open bays for additional hard drives. Newer servers often do not have this luxury, since they are built to rely on external storage solutions such as a SAN.
As all local information technology folks know, new gear meeting the required specifications can sometimes be a day or two away. We got lucky: It turned out that an old, recently decommissioned server had the drives we needed. We cannibalized that server and installed the drives in the problem server. Voila! Back up and running.
Total system downtime when all was said and done: just over 24 hours. Cost: about 20 hours of consultant time, as well as internal IT resources. Rates for such services vary from company to company but can be over $200 per hour, especially for after-hours emergency work.
In the end, the tangible cost turned out to be less than regular maintenance would have cost over the years. So was this a good approach? It depends on the intangible costs. How much did 24 hours of downtime cost? In addition to lost sales (this server hosted the database for the company’s online sales), a number of employees could not work while the system was down.
Further, a little bit of luck helped minimize the downtime in this case. Without the old drives that just happened to be lying around, the outage could have stretched on for another 24 hours or even more. Remember, as computing resources get older, keeping them running properly gets harder if you don’t have spare gear on hand.
John Agsalud is an IT expert with more than 25 years of information technology experience. Reach him at jagsalud@live.com.