While admirable in intention, the automatic captions on Google-owned video service YouTube should not yet be seen as a cheap substitute for human subtitlers, says SysMedia CEO Andrew Lambourne.
Google's recent announcement of automatic captions (auto-caps) for YouTube caused a predictable flurry in the world of subtitling as people wondered just how good the company's automatic speech recognition technology would prove to be.
Lambourne says, "In Google's own words, 'The captions will not always be perfect'. In practice they vary from quite impressive to truly awful; and subtitlers understand only too well the reasons why."
Automatic Speech Recognition (ASR) has advanced significantly in the last ten years, and auto-caps should be applauded for aptly demonstrating that point. However, the new service also highlights the challenges inherent in unleashing automated speech processing technology on real-world problems.
ASR has now reached the point where someone speaking clearly and fairly consistently can be recognised to an accuracy of perhaps 90 per cent or more, provided (and it's a big proviso) that there's no background noise or other speakers interrupting, nor any particularly unusual vocabulary. In reality, very few media clips are like that, and so useful captioning of professional content will need the human touch for many years to come.
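To put a figure such as 90 per cent in context, recognition accuracy is commonly scored at word level: count the substitutions, insertions and deletions needed to turn the recogniser's output into the correct transcript, then divide by the number of words in that transcript. The article does not say how its figure was measured, so the Python sketch below is only an illustration under that assumption. The function names `word_errors` and `word_accuracy` and the sample sentences are hypothetical.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum number of word substitutions, insertions and deletions
    needed to turn the hypothesis into the reference (edit distance)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()

    # prev[j] holds the distance between the reference words seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # word missing from the hypothesis
                curr[j - 1] + 1,     # extra word in the hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy as 1 minus the word error rate (assumed metric)."""
    ref_len = len(reference.split())
    if ref_len == 0:
        raise ValueError("reference transcript must not be empty")
    return 1 - word_errors(reference, hypothesis) / ref_len


if __name__ == "__main__":
    # Hypothetical ten-word transcript with a single misrecognised word.
    spoken = "the quick brown fox jumps over the lazy sleeping dog"
    recognised = "the quick brown fox jumps over the hazy sleeping dog"
    print(f"{word_accuracy(spoken, recognised):.0%}")
```

On this measure, 90 per cent accuracy still means roughly one wrong word in every ten, which is often enough to change the meaning of a caption.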
"As subtitlers and their audiences know only too well, errors in subtitles can be very confusing for people who cannot hear or understand the original soundtrack," says Lambourne, adding, "For example, try to make sense of, 'We're about to witness the most trying to draw my natural world the mayor persist well Frank'."
ASR, along with automated translation technology, certainly has its place in the subtitling workflow, and SysMedia specialises in products that blend the productivity benefits of these technologies with the more adaptable skills of human subtitlers to achieve consistently high-quality subtitles.
SysMedia is currently developing WinCAPS Quantum, a brand new subtitle production platform for the broadcast subtitling market, which combines the efficiency of the latest technologies, including automated speech transcription and machine translation, with fast and targeted manual editing.
Lambourne concludes, "Automation technology is continually improving, and as it does so SysMedia will continue to harness it to make subtitling more efficient. But I suspect that for a good while yet, human subtitlers will still be an essential part of the editorial process."