The semantic web made simple

TV-Bay Magazine
There is a new term creeping into discussions in the broadcasting business, particularly among those involved in asset management and research projects. The term is “semantic web” and, I suspect, a lot of people are neither completely sure what it means nor convinced of its relevance to our business. I aim to provide some simple answers on both counts.
The expression itself was coined by Tim Berners-Lee who, as everyone who watched the London 2012 Olympic opening ceremony now knows, invented the world wide web. He first talked about it in Scientific American as long ago as 2001, describing it as “a web of data that can be processed directly and indirectly by machines”.
Put simply, the idea rests on two assumptions: first, that you will want to look at multiple sources of information to build up the full picture; and second, that some form of machine intelligence will do this collection, evaluating the sources as it goes, and present the user with a prioritised, ordered set of data.
If that sounds like a complex and theoretical definition, a practical example may help.
Take the example of a typical broadcaster. While you may hear consultants and asset management specialists talk about bringing all a broadcaster’s information into a single system, in practice that rarely happens. The news department will have its own archive, as will sport. Other departments will also have their own approach to information and asset management.
What happens if a documentary-maker is commissioned to make a programme about, say, Usain Bolt? At the research stage they will want to look at what the sports department has to say about him, and see what can be gleaned from the news department. The programme-maker will probably already be engaged with the subject and have both a store of background information and an outline of the story to be told.
There will also be an acknowledgement that valuable information may lie outside the broadcaster’s internal resources. Some will come from hard, reliable sources, like the records of the Olympic and international athletics bodies; some will come from softer sources, ranging from Wikipedia to Twitter feeds and gossip sites.
Under pressure, and using traditional search techniques, the documentary-maker may go to the sources that offer the easiest route to verification. Potentially interesting information may never be discovered, or may be dropped because it cannot be positively confirmed from multiple sources. The more time spent researching and confirming the data, the better the programme will be, but the bigger the budget required.
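The trade-off described above, between hard and soft sources and between corroborated and uncorroborated facts, is exactly the kind of evaluation the semantic web approach hands to the machine. The Python sketch below is purely illustrative and is not how IPV's products work: it assumes each candidate fact arrives as a (claim, source) pair, and the source names and reliability weights are hypothetical. It ranks each claim by the summed weight of the distinct sources that support it, so a fact confirmed by several reliable sources rises to the top.

```python
from collections import defaultdict

# Hypothetical reliability weights: "hard" sources count for more than "soft" ones.
SOURCE_WEIGHTS = {
    "athletics_records": 1.0,
    "news_archive": 0.8,
    "sports_archive": 0.8,
    "wikipedia": 0.5,
    "twitter": 0.2,
    "gossip_site": 0.1,
}
DEFAULT_WEIGHT = 0.05  # assumed weight for any source not listed above


def rank_claims(observations):
    """observations: iterable of (claim, source) pairs.
    Returns (claim, score, sources) tuples, best-supported first."""
    support = defaultdict(set)
    for claim, source in observations:
        support[claim].add(source)  # a source repeating itself counts only once
    ranked = []
    for claim, sources in support.items():
        score = sum(SOURCE_WEIGHTS.get(s, DEFAULT_WEIGHT) for s in sources)
        ranked.append((claim, round(score, 2), sorted(sources)))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Placeholder claims; in practice these would be facts extracted about the subject.
observations = [
    ("claim A", "athletics_records"),
    ("claim A", "news_archive"),
    ("claim A", "wikipedia"),
    ("claim B", "sports_archive"),
    ("claim B", "twitter"),
    ("claim C", "gossip_site"),
    ("claim C", "gossip_site"),
]
for claim, score, sources in rank_claims(observations):
    print(f"{score:4.2f}  {claim}  {sources}")
```

Claim C is reported twice by the same gossip site but still scores only 0.1, because repetition from one source is not corroboration.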
Automating research
What the semantic web offers is the ability to collect information from many, many sources automatically. It does this by moving away from the traditional, keyword-based Google-type searching to a new way of exploring links between data, so that the picture grows organically.
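Exploring links rather than matching keywords can be pictured as walking a graph of subject, predicate, object statements, the triple model used by RDF on the semantic web. The Python sketch below is a simplified illustration, not Teragator's engine: the statements and identifiers such as `sports_clip_17` and `news_story_99` are hypothetical, and it uses a plain breadth-first search over an in-memory list rather than an RDF store or query language.

```python
from collections import deque


def explore(triples, start, max_hops=2):
    """Breadth-first walk over (subject, predicate, object) statements.
    Returns {entity: number of hops from start} for everything reachable
    within max_hops, following links in either direction."""
    neighbours = {}
    for subject, predicate, obj in triples:
        neighbours.setdefault(subject, []).append((predicate, obj))
        neighbours.setdefault(obj, []).append((predicate, subject))

    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for _predicate, other in neighbours.get(node, []):
            if other not in hops:
                hops[other] = hops[node] + 1
                queue.append(other)
    return hops


# Hypothetical statements; the clip, story and draft IDs are invented for illustration.
triples = [
    ("Usain Bolt", "competed_in", "Beijing 2008"),
    ("Usain Bolt", "born_in", "Jamaica"),
    ("sports_clip_17", "features", "Usain Bolt"),
    ("news_story_42", "reports_on", "Beijing 2008"),
    ("news_story_99", "reports_on", "Jamaica"),
    ("doc_draft_3", "cites", "news_story_99"),
]
reach = explore(triples, "Usain Bolt")
for entity, distance in sorted(reach.items(), key=lambda kv: kv[1]):
    print(distance, entity)
```

Here `news_story_99` is never tagged with the athlete's name, yet it surfaces two hops away through Jamaica. Raising `max_hops` widens the net (reaching `doc_draft_3` at three hops) at the cost of more noise.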
At IPV we have been in the business of enabling collaborative broadcast workflows for more than 15 years, and our products include asset management systems where they are appropriate. But since 2010 we have had growing success with a product we call Teragator, which is designed to be the platform for semantic web searches.
It was developed for exactly this application: making sense of federated media asset management by aggregating data from disparate sources in a non-invasive way, without changing the way individual departments operate. It has a lot of very clever technology underpinning its data mining, but from the user’s point of view it presents the complex relational links between assets and information in a simple, readily understandable way, and lets the user identify and manage them. You certainly do not have to understand the semantic web to be able to use it.
Unlike conventional search engines that match keywords against a predefined schema, Teragator allows you to uncover seemingly unlinked data that is relevant to the story you are trying to tell. Data sources are analysed in a single, uniform application, but can be modelled with different views to explore the way that parts of the story interlink. By analysing the data and identifying entities such as people and places, the system discovers new links that were not even on the agenda at the start of the process, enriching the editorial content and offering new ideas and even new stories. Use of the semantic web makes for a better programme.
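Entity identification can be approximated, very crudely, by matching text against a list of known names and then linking any two items that mention the same entity. The Python sketch below assumes a hypothetical hand-built gazetteer and invented document IDs; it is only meant to show how shared entities create links, and real systems typically rely on trained named-entity recognition rather than exact string matching.

```python
import re
from itertools import combinations

# Hypothetical gazetteer of known entity names and their types.
GAZETTEER = {
    "Usain Bolt": "person",
    "Jamaica": "place",
    "Beijing": "place",
}


def spot_entities(text):
    """Return the set of gazetteer names that appear in the text as whole words."""
    found = set()
    for name in GAZETTEER:
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            found.add(name)
    return found


def shared_entity_links(documents):
    """documents: {doc_id: text}. Returns {(doc_a, doc_b): [shared entities]}
    for every pair of documents that mention at least one common entity."""
    entities = {doc_id: spot_entities(text) for doc_id, text in documents.items()}
    links = {}
    for a, b in combinations(sorted(entities), 2):
        shared = entities[a] & entities[b]
        if shared:
            links[(a, b)] = sorted(shared)
    return links


# Invented snippets standing in for items from different departmental archives.
documents = {
    "sport_0412": "Usain Bolt wins again in Beijing.",
    "news_0077": "Street parties across Jamaica after the final.",
    "news_0078": "Beijing traffic restrictions lifted.",
}
print(shared_entity_links(documents))
```

The Beijing traffic story never mentions the athlete, but it is linked to the sports item through the shared place name, which is the kind of unexpected connection a researcher might otherwise miss.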
The engine that enables this linking and association of data is critical to success. As well as forming the links, it has to present them in a way that the typical non-specialist user can readily understand. There is no point in enabling comprehensive research if learning to interpret the data takes longer than sitting down with an outline script and Google.
We have invested a great deal of time and effort into developing this user interface. It is hard to describe it on paper, aside from saying it is really obvious once you get your head around it. If you think you have an application, ask for a demonstration.
Metadata Central
Underlying the associative intelligence is another layer, which does the core work of accessing multiple databases and, most importantly, evaluating the results. That layer is a powerful tool in its own right, so we also offer it as a separate module, called Metadata Central.
In the USA, the Golf Channel covers all the professional tournaments on the tours, and so is constantly adding content, all of which needs to be tagged. Each week brings a different set of players at a different course. Simply loading the basic information – the players and their current rankings, the yardage of the course, the history of the tournament and the course – would be almost unmanageable.
But all of this information exists, on the websites of the PGA, the tournament, the sponsors and the club. So Metadata Central polls these sources and sets up the logging screens so that when the tournament starts, all the right information is presented to the loggers. It eliminates a huge amount of work. And better core information for the loggers makes for better metadata from the event, which makes the archive more valuable in the future.
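The polling step can be thought of as a merge: several sources are queried, and for each field on the logging screen the system picks a value and remembers where it came from. The Python sketch below is a hypothetical illustration, not Metadata Central's actual logic: the source names, precedence order, field names and values are all invented, and the network polling is replaced by in-memory dictionaries.

```python
# Hypothetical precedence order: earlier sources win when they disagree.
SOURCE_PRIORITY = ["tour_body", "tournament_site", "club_site", "sponsor_site"]


def build_logging_template(results, fields):
    """results: {source: {field: value}} gathered by polling each source.
    For each field, keep the value from the highest-priority source that has it,
    and record where it came from so loggers can see its provenance."""
    template = {}
    for field in fields:
        for source in SOURCE_PRIORITY:
            value = results.get(source, {}).get(field)
            if value is not None:
                template[field] = {"value": value, "source": source}
                break
        else:
            # No source supplied this field; leave it for a logger to fill in.
            template[field] = {"value": None, "source": None}
    return template


# Invented example data standing in for the results of polling each website.
results = {
    "tour_body": {"rankings": {"Player A": 3, "Player B": 17}},
    "tournament_site": {"field": ["Player A", "Player B"], "course_yardage": 7215},
    "club_site": {"course_yardage": 7200, "course_history": "(history text)"},
}
fields = ["field", "rankings", "course_yardage", "course_history", "tournament_history"]
for name, entry in build_logging_template(results, fields).items():
    print(f"{name:20} {entry['value']!r:40} from {entry['source']}")
```

The two sources disagree on the course yardage, so the precedence list decides; fields nobody supplies are left blank rather than guessed. Keeping the source alongside each value lets a logger see why a field holds what it does.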
The same advantage applies to aggregator broadcasters, for example. Say you have just bought a package of movies. You could get a team of researchers to sit and copy-type the cast lists and synopses into your database. Or you could use Metadata Central to talk to IMDb and set it all up virtually instantaneously. Good metadata helps consumers find content on VoD and over-the-top (OTT) platforms, which boosts access and revenues.
Combine Metadata Central with some of the added intelligence in Teragator and you can make value judgements on automatically collected data. You can, for example, interpret tweets while a programme is being transmitted to gauge the general audience reaction. If it is a live show you can report on them, or even respond to what the audience is saying by changing the production. Intelligence in the system can distinguish genuine sentiment from irony and sarcasm used for comic effect, making the results reasonably reliable.
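The basic shape of tracking audience reaction during a transmission can be shown with a deliberately naive Python sketch: score each message against small lists of positive and negative words, then tally the results per minute of airtime. The word lists and messages are hypothetical, and unlike the system described above this sketch makes no attempt to detect irony or sarcasm, so a sarcastic "great, just great" would be counted as positive.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists; no irony or sarcasm handling.
POSITIVE = {"love", "brilliant", "great", "funny"}
NEGATIVE = {"hate", "boring", "awful", "terrible"}


def score_message(text):
    """Count positive words minus negative words in one message."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def reaction_by_minute(messages):
    """messages: iterable of (minute, text) pairs.
    Returns {minute: Counter of 'positive' / 'negative' / 'neutral' labels}."""
    buckets = {}
    for minute, text in messages:
        score = score_message(text)
        if score > 0:
            label = "positive"
        elif score < 0:
            label = "negative"
        else:
            label = "neutral"
        buckets.setdefault(minute, Counter())[label] += 1
    return buckets


# Invented messages, keyed by minute of transmission.
messages = [
    (1, "Love this show, brilliant opening"),
    (1, "so boring already"),
    (2, "awful, terrible judging"),
    (2, "what channel is this on?"),
]
for minute, counts in sorted(reaction_by_minute(messages).items()):
    print(minute, dict(counts))
```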
IPV has demonstrated a topical application with reality TV programmes, which rely heavily on audience reaction and sentiment. Using Teragator, producers can poll multiple sources of live audience data (social networking data) and give the production team valuable insight, both during the show and in commentary afterwards. Using this data in real time, for example to monitor social interaction around characters and presenters, allows the producers to optimise the show and its reach.
That is the aim of semantic web tools such as Teragator and Metadata Central from IPV. They make it easy to explore and correlate many, many sources of information, to evaluate what you are finding, and to explore links and lines of development which may never otherwise appear.

By giving researchers and programme-makers more powerful and more comprehensive tools, these products speed up the process and shine a spotlight on new angles to the story. Ultimately, they help make better programmes even when budgets tighten, and they help broadcasters find and serve audiences with the content they need.

Tags: iss070 | ipv | web searching | semantic web | N/A
Article Copyright tv-bay limited. All trademarks recognised.
Reproduction of the content strictly prohibited without written consent.
