Regular readers of this column will know that I am frequently tasked with covering the sexiest of broadcast industry subjects. Indeed, just two months ago I got to discuss test and measurement (T&M). Well, if you thought that was intense, just wait for this month's topic. Not only is it potentially even less glamorous than T&M, it has the potential to be an instant cure for insomnia. Unfortunately, it's also a hugely important subject if you're talking about storage and archive (as this issue of TV-Bay is). What am I talking about? Is it a gizmo? Is it a widget? Nope, it’s metadata. Hold on to your hats. This is going to be wild.
Now, before I wade into the metadata stuff, let’s give this article some context. Whether owned by a broadcaster, a producer or a library, a content archive isn’t just a historical record of its owner’s acquisition efforts and televisual or filmic output. In many cases it is that owner’s biggest asset.
The exploitation of archive content is a big thing. Rich media content is in demand and, if you can make it available quickly and easily, archive footage and programmes can be re-licensed for a myriad of uses from historical documentaries and primetime clip shows to one-off mobile downloads.
To make this happen, more and more archives are being digitized, turning film, tape and digital tape into files. This allows for instant retrieval and exploitation, micro payments, process automation and more.
Yet, while digitization is a must, I see two major inconveniences when it comes to long-term preservation of file-based media: the choice of format that the footage should be archived in; and the amount of metadata required for it to be at its most useful.
Currently, television is produced on anything from a smartphone to a 4K camcorder, in any format from MP4 to AVC-Intra. The commonly held belief is that you archive in the highest quality possible, which here would be AVC-Intra. But in several years’ time another format will be in vogue. So, if we archive in AVC-Intra now, in order to re-use that content in the future, we would need to match it up with the ‘new’ format. To do this you would have to transcode the AVC-Intra. And when the next format comes round after that, you would have to transcode it again. Is that a problem? Well, um, yes, because each time a transcode takes place the content gains further coding impairments. And if you’re compressing or re-compressing it, you get even more impairments. This is far from ideal.
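For the numerically minded, here is a back-of-an-envelope sketch of that generational loss. It is a toy model, not a real codec: rounding each sample to the nearest multiple of a step size stands in for lossy coding, and the sample values and step sizes are made up purely for illustration.

```python
def quantize(samples, step):
    """Round each sample to the nearest multiple of `step`.

    A crude stand-in for lossy coding: a bigger step throws more away.
    """
    return [round(s / step) * step for s in samples]


def mean_abs_error(original, decoded):
    """Average absolute difference between the original and a later copy."""
    return sum(abs(a - b) for a, b in zip(original, decoded)) / len(original)


# Made-up "footage": 1,000 sample values.
original = [i * 0.37 for i in range(1000)]

# Hypothetical chain: today's archive format, the next format in vogue,
# then a third format that happens to be as fine-grained as the first.
generation_steps = [4, 5, 4]

current = original
errors = []
for step in generation_steps:
    current = quantize(current, step)  # each transcode starts from the last copy
    errors.append(mean_abs_error(original, current))

# A lossless copy, by contrast, changes nothing however often it is repeated.
lossless_copy = list(original)
lossless_error = mean_abs_error(original, lossless_copy)

for generation, error in enumerate(errors, start=1):
    print(f"generation {generation}: mean error {error:.3f}")
print(f"lossless copy: mean error {lossless_error:.3f}")
```

In this toy chain the third generation uses exactly the same step size as the first, yet it ends up further from the original, because it was made from a copy that had already been through a different format. Real codecs are far more sophisticated, but the principle is the same.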
What may be required (please don’t shoot me) is ANOTHER format. An agreed format that we stick to for archiving purposes. It must be one that is high quality (ideally lossless), open, widely adopted and easy for future computer systems to decode. The suggested format should be either lightly compressed (in order to keep storage requirements realistic), losslessly compressed or completely uncompressed, and it should be wrapped up in something like MXF.
John Zubrzycki, the Section Leader for Archives Research at BBC R&D, has done a lot of good work on this subject. He urges archive owners to “work together to present common requirements to industry” and argues for what he calls a “light compression standard” that can be used for SD and HDTV archiving. This would avoid the need to recode footage every time production moves forward. Which makes a lot of sense.
So, that’s a potential solution to the format problem. What about this here metadata stuff then? In the new file-based world that TV now inhabits, metadata rules (pun intended) and the efficient implementation of metadata is key to content management and file-based workflows.
Technical metadata is used, for example, to drive entire MAM systems or playout operations and, without it, some files simply won’t work properly in certain devices. Descriptive metadata (shot length, content, music type etc.), meanwhile, is what interests humans: it is the information required for indexing, archiving, monetizing and more.
Unfortunately, as metadata has grown in importance, the requirements placed on it have got very complicated. To quote Niall Duffy, the managing director of the media technology consultancy Mediasmiths, “you’ve currently got people who for very correct reasons want to come up with very structured metadata models because from their point of view that is essential for building any sort of long-term archive. But from an archive user’s point-of-view the more fields there are, the less they’ll find as it becomes too confusing for them.”
Having lots of metadata fields to complete also makes data entry an unrealistic, nigh-on-impossible job for human beings. And, in fact, when you look closely at how researchers actually use metadata and what they look for, you find that searches revert to what we all know best: a Google-type search.
With that in mind, metadata should not be about increasing the number of fields to improve archive or asset search. Instead, it should start from how people actually seek out content.
The ideal scenario, it would seem, is for footage to be given a small set of structured metadata fields that allow for things like categorization and process automation and then a large unstructured data field that is automatically generated, user generated or based on tagging.
This approach would allow researchers (human or otherwise) to ‘discover’ based on their own requirements rather than the restrictions of the metadata fields. In short, less is definitely more when it comes to metadata.
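To make that a little more concrete, here is a minimal sketch of what such a record and a Google-type search over it might look like. Everything in it is an assumption for illustration: the field names, the three sample clips and the simple "every word must appear" matching are mine, not any MAM vendor's schema or search engine.

```python
from dataclasses import dataclass


@dataclass
class ArchiveItem:
    # The small structured set: enough for categorization and automation.
    # (Hypothetical field names.)
    item_id: str
    category: str
    duration_seconds: int
    rights_cleared: bool
    # The large unstructured field: auto-generated text, user notes, tags.
    description: str = ""


def search(items, query):
    """Return items whose category or description contains every query word.

    Matching is case-insensitive and substring-based, so "train" also
    finds "trains". A real system would use a proper text index.
    """
    words = query.lower().split()
    results = []
    for item in items:
        text = f"{item.category} {item.description}".lower()
        if all(word in text for word in words):
            results.append(item)
    return results


# Made-up sample records.
ARCHIVE = [
    ArchiveItem("A001", "news", 95, True,
                "Street party crowds waving flags, brass band, jubilee"),
    ArchiveItem("A002", "sport", 3600, False,
                "Cup final highlights, crowds, late penalty"),
    ArchiveItem("A003", "documentary", 1800, True,
                "Steam trains in Yorkshire, interviews with drivers"),
]

# A researcher just types what they are after...
for item in search(ARCHIVE, "crowds flags"):
    print(item.item_id, item.description)

# ...while automation leans on the structured fields.
licensable = [item.item_id for item in ARCHIVE if item.rights_cleared]
print(licensable)
```

The point of the split is that the structured fields stay few enough for a human to fill in reliably, while the free-text field can grow as large as automatic logging and tagging allow without anyone having to agree a schema for it first.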
To my mind, making this work means dealing with metadata further up the production chain. It cannot be left to the archivists or the data wranglers. Producers need to take responsibility for it, and that doesn’t mean reminding a runner or a camera assistant as an afterthought. If that happens, and the importance of metadata is not asserted, you get poor metadata. And programmes and footage with poor metadata will have a limited archival afterlife.
So, there you have it: what we need is a new format for archive preservation and a different approach to metadata. It’s not sexy. But it is big. And it is clever. When it comes to archives and storage, metadata rules.
I wonder what subject is up next?