Troubleshooting - Top Tips to Make Files Work
So, you've built yourself the world's most perfect IT-based file workflow and a state-of-the-art media facility? Congratulations, have a beer. But at some point, something is going to go wrong!
Given that fewer than 50% of support cases at any company I have worked for result in changes to software, it helps to have a good analytical approach to finding the causes of problems and to communicating them effectively to your suppliers.
What do you do? How do you find those problems?
The first approach is to separate symptoms from causes, and being specific is crucial. Telling support "Our MAM system triggers an API refresh cycle at 2 minutes past every hour, which seems to happen 5 minutes before the storage anomaly" is a great starting point. It tells support that you are seeing something happen regularly and that there is some correlation between the symptom and another event in the system or workflow.
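To back up a statement like that before raising the ticket, it can help to line up the two sets of timestamps. The following is a minimal Python sketch, assuming you have already extracted the refresh times from the MAM log and the anomaly times from your storage monitoring. The timestamps and the function name `lag_after_last_event` are hypothetical. For each anomaly, it reports how long after the most recent refresh the anomaly occurred.

```python
from datetime import datetime

# Hypothetical timestamps, as they might be pulled from the MAM log
# (API refresh cycles) and from the storage monitoring (anomalies).
refreshes = [datetime(2024, 5, 1, h, 2) for h in (9, 10, 11)]
anomalies = [
    datetime(2024, 5, 1, 9, 7),
    datetime(2024, 5, 1, 10, 7),
    datetime(2024, 5, 1, 11, 8),
]


def lag_after_last_event(symptoms, events):
    """For each symptom, return the time since the most recent earlier event, or None."""
    lags = []
    for symptom in symptoms:
        earlier = [e for e in events if e <= symptom]
        lags.append(symptom - max(earlier) if earlier else None)
    return lags


for symptom, lag in zip(anomalies, lag_after_last_event(anomalies, refreshes)):
    if lag is None:
        print(f"{symptom:%H:%M} anomaly: no refresh logged before it")
    else:
        minutes = int(lag.total_seconds() // 60)
        print(f"{symptom:%H:%M} anomaly: {minutes} min after the last refresh")
```

A consistent lag (here 5, 5 and 6 minutes) is the kind of specific, checkable detail that makes the pattern easy for support to reproduce.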
As we all know, correlation does not imply causation. So how do you find causes and not symptoms?
One of my favourite troubleshooting techniques for IT and the broadcast & media industry is the 5 Whys. The technique is widely attributed to one of my all-time engineering heroes, Sakichi Toyoda. Wikipedia defines it as follows: "The 5 Whys is an iterative interrogative technique used to explore the cause-and-effect relationships underlying a particular problem. The primary goal of the technique is to determine the root cause of a defect or problem by repeating the question 'Why?'."
Here is an example of chaining the question 'Why?':
The Problem: The media file can't be transcoded
Why #1: The source file can't be opened.
Why #2: Permission failure. The file cannot be opened from the transcode machine with the account associated with the transcode service.
Why #3: Group settings. We check the account on the transcode machine and discover it is in the wrong group, so we change it. It still won't open.
Why #4: Drive mount settings. The way the drive was mounted in the operating system included an override for the group control. Updating the mount instruction remaps the group to the correct value and the system works. But why did the system go from working to not working?