Jobs for the troubleshooters
Steve White is having problems with compatibility…
Recently I was called to investigate a problem with a machine that was getting close to fully booting and then blue-screened. Apparently there were no log files to help identify the problem and no way of instrumenting the machine to find out what was going on. (I found out afterwards that the support staff were being economical with the truth: there are plenty of ways of instrumenting it, but since they hadn’t thought of them they chose not to mention them.) The red herring was that this was the first boot following a major patch upgrade, so the effort went into rolling back the patch installation, and still the machine would not boot.
There seems to be a battle of fundamental philosophy in IT provision at the moment. On one hand, some manufacturers develop and make hardware to a standard, others write software to a standard, and the customers put the two together and run the world on it. On the other hand, a single manufacturer owns both the hardware and the operating system stack. This is a philosophical fight between Windows and the Linux distributors, who run on compatible hardware, and Sun (now an Oracle brand) and Apple, who own the hardware and operating system stack (and of course Oracle, who own the hardware, OS and application, for a specific definition of ‘application’). Add to the mix the emulation software which allows an operating system to run as an application on different hardware, and we have a third battalion joining the fray.
Putting the ‘I’m a PC; I’m an Apple’ argument to one side for a moment, and concentrating on the differences between the troubleshooting of the two environments, it matters whether there is one throat to choke or many.
This company were unlucky in that the hardware provider was not responsible for the crash; it was someone else’s problem. The operating system was written by someone who does not own the hardware, and it took several days for the appropriate instrumentation and the right people to be brought to bear on the problem. It was only when a machine that had not just been patched also failed in the same way that the investigation switched from the patching being the possible cause to there being ‘something else’. That line of thought revealed the reason: a completely different server, used for automatic backups, had a full /var file system which did not show on any alert dashboard, and its mode of failure caused a bad thing in the driver of the affected clients.
The visible downside of the one-provider model for users is the ongoing dispute between Apple and Adobe. From the Apple site I quote: “We know from painful experience that letting a third party layer of software come between the platform and the developer ultimately results in sub-standard apps and hinders the enhancement and progress of the platform.” This is where the philosophical war is being fought. If Apple holds its position it may be inconvenient for users in the short term, but it will be a visible upside for troubleshooting, availability and reliability for users into the future.
The Apple quote is from www.apple.com/hotnews/thoughts-on-flash/







