HPA Tech Retreat — How AI/ML Are Revolutionizing Content Creation & Distribution
This presentation will discuss the importance of having a full set of microservices to accomplish a full range of tasks. We'll include real-world examples across the content production and delivery chain: content acquisition, contribution, and post-production. We'll also talk about applications of AI in the content delivery area, primarily mining insights from distributed content in ways that enable product placement and advertising, but also in the area of compliance.
Vice President, Sales & Marketing at Digital Nirvana
Head of AI Products and Services at Digital Nirvana
Co-Founder, President, & Chief Executive Officer at Digital Nirvana
Transcription of above Video:
Russell Wise: Good morning or good afternoon, depending on where you're at. I'm Russell Wise with Digital Nirvana. Today, we're going to be presenting how we apply AI to various media workflows. We'll take two passes on this. First, we'll be looking at a specific customer application where we deployed AI to increase the speed of production. Second, we'll be looking at how we envision applying AI to some future applications. For the first use case today, we'll be talking about a developer and distributor of entertainment news with a very specific challenge. I'll be asking Hiren Hindocha to step you through the business challenge. Hiren is also CEO of Digital Nirvana. Take it away.
Hiren Hindocha: Great. Thanks, Russ. So the business problem that we set out to solve was this: our client regularly needed to boil 20 hours of footage down into a 20-minute show with a turnaround time of two hours. The entire problem was that you had to create accurate transcripts so that editors could quickly find content of interest and edit it into a show. And then, once the show was generated, we had to quickly generate the closed captions for it, not only in English but in Spanish as well. And all of this had to happen within two hours. That's where the technology comes into play: the ability to take as many files as you're dealing with and have a constant turnaround time. Whether it's 10 hours of footage or 100 hours of footage, using speech-to-text you can quickly generate the transcript that allows editors to go in and make the edits quickly. That's what the technology allows you to do, and we'll show how that is done. Then, once the show is generated, you can quickly generate captions off of that show. And once the English-language or primary-language captions are generated, using AI technology you can quickly translate them into another language, whether it be Spanish or French or any of the languages the system supports. That's what the system allows you to do: turn around content fast. With that, I will hand it over to Russell Vijayan to show how the system works. Take it over, Russell.
Russell Vijayan: Thanks, Hiren. I'll explain in a little more detail, with an example, how automatic transcription, captioning, and translation all work on the platform, Trance. The media gets ingested from different sources: from the production asset management system, from a cloud location, or uploaded directly to a portal such as this one, where the speech-to-text gets generated. The system can also use any speech-to-text provider, whichever gives the best results for that content, and then present the output, as you see, in a word-editor form that users can access and process easily. They can focus on errors, or on wherever the system reports low confidence, quickly make edits, and then copy or export the content back into the production asset management system, like Hiren said, so the content becomes searchable.
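The low-confidence review step described above can be sketched in a few lines. This is a minimal illustration, not Trance's actual implementation; it assumes the speech-to-text engine returns per-word confidence scores (most major engines do, though field names vary by provider), and the `[?...]` marker is an arbitrary choice for this example.

```python
def flag_low_confidence(words, threshold=0.8):
    """Mark words an editor should double-check.

    `words` is a list of {"text": ..., "confidence": ...} dicts, as a
    speech-to-text engine might emit. Words below `threshold` are
    wrapped in [?...] so the editor's eye lands on them first.
    """
    out = []
    for w in words:
        text = w["text"]
        if w["confidence"] < threshold:
            text = f"[?{text}]"  # flag for the editor's attention
        out.append(text)
    return " ".join(out)
```

A UI would render the flagged spans with highlighting rather than literal brackets, but the filtering logic is the same.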
Now, transcription is also a preliminary process for generating captions. Once you have the transcripts ready, the system uses commonly used parameters, preset in the system, to automatically convert the transcript into a form like this. You can define whether captions are two-line or three-line, what the reading speed should be, and the number of characters per second; all of this can be predefined. The result is presented in such a way that the user can quickly go in, make any necessary changes, and export the captions. Since this specific use case involves translations, you can then click Add Language, and users get access to what we call a dual-pane window: the user now has access to the source video, the source-language captions, and an automatically translated version in the other language. You can translate into any number of languages this way. The system translates automatically, and then you can quickly go in, make changes, and export the output for English, French, Spanish, whichever languages you're doing, and export it back into your automation system. That's an overall visual representation of how this whole process is done by the system. That's all I have. Russ, over to you.
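The transcript-to-caption conversion described above, segmenting timed words into caption blocks under line-length and reading-speed presets, can be sketched as follows. The preset values (42 characters per line, two lines per caption) are illustrative stand-ins; the actual Trance presets are not specified in the talk.

```python
def segment_captions(words, max_chars=42, max_lines=2):
    """Group timed words into caption blocks under line-length limits.

    `words` is a list of (text, start_sec, end_sec) tuples, as a
    speech-to-text engine might emit. Returns a list of blocks, each
    a list of word tuples fitting within `max_lines` lines of at most
    `max_chars` characters.
    """
    captions, block, line_len, lines = [], [], 0, 1
    for text, start, end in words:
        needed = line_len + len(text) + (1 if line_len else 0)
        if needed > max_chars:          # current line is full
            if lines < max_lines:       # wrap within the same block
                lines, line_len = lines + 1, len(text)
            else:                       # block is full: flush it
                captions.append(block)
                block, line_len, lines = [], len(text), 1
        else:
            line_len = needed
        block.append((text, start, end))
    if block:
        captions.append(block)
    return captions

def reading_speed(block):
    """Characters per second for one caption block, for preset checks."""
    chars = sum(len(t) for t, _, _ in block)
    duration = block[-1][2] - block[0][1]
    return chars / duration if duration else float("inf")
```

A real captioning engine also snaps cue boundaries to sentence breaks and shot changes; this sketch shows only the core packing-and-reading-speed logic.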
Russell Wise: Yeah, great. Thanks for that, Russell. Appreciate it. So we just showed a very concrete example of how customers use AI speech-to-text and translation engines to increase both the speed of developing content and its quality. We continue to see media customers who want to consume these types of capabilities easily and apply them to their workflows. Now we're going to retrace our steps a little bit and look at futuristic applications, or not so futuristic, really: things we're actually working on today that are in the process of being evaluated and deployed. Hiren can step us through some of the newer areas where we're seeing traction in harnessing AI for media workflows.
Hiren Hindocha: I believe we are in the beginning stages of the golden age of AI and machine learning, and its potential across multiple industries, especially in the media and entertainment space, is tremendous. The ability to create content, make it searchable, and translate it into multiple languages is allowing content from all over the world to be consumed by users anywhere. I'll talk briefly about the initial use case we mentioned, where you have tons of raw footage, 20 or 30 hours, which you have to boil down into a 20- or 30-minute show. That is one example: you take your content and run it through a speech-to-text engine that automatically generates the speech-to-text metadata. But beyond that, there are areas where you can generate not only speech-to-text but rich metadata, using video intelligence. Computer vision has taken a tremendous leap in the last couple of years, and machine learning models can now identify logos and recognize faces. This is what I call rich metadata: not only do you have speech-to-text, but you also have the ability to identify objects, faces, and logos in a video stream, and the use cases for that are huge. Take a sporting event, for example, where you're contractually obligated to show a logo a certain number of times during the event. Using computer vision, you can quickly identify all of the instances where that logo was shown. Work that is currently being done by humans can be delegated to machines that can do it much faster and more accurately.
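The logo-counting task just described can be sketched as a post-processing step over a vision model's output. This assumes a computer-vision model has already emitted (timestamp, label) detections; the model itself and the one-second gap threshold are illustrative assumptions, not details from the talk.

```python
def logo_exposures(detections, label, max_gap=1.0):
    """Merge frame-level detections of `label` into continuous exposures.

    `detections` is a list of (timestamp_sec, label) pairs sorted by
    time. Detections closer together than `max_gap` seconds are treated
    as one on-screen exposure. Returns a list of (start, end) intervals,
    so len(result) is the exposure count and the interval sums give
    total screen time, the numbers a sponsorship contract cares about.
    """
    times = [t for t, lbl in detections if lbl == label]
    exposures = []
    for t in times:
        if exposures and t - exposures[-1][1] <= max_gap:
            exposures[-1] = (exposures[-1][0], t)  # extend current exposure
        else:
            exposures.append((t, t))               # start a new exposure
    return exposures
```

For example, three detections of a logo at 0.0 s, 0.5 s, and 5.0 s collapse into two exposures: one spanning 0.0–0.5 s and a second at 5.0 s.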
In addition to logos, you can identify billboards in sporting events. So we've talked about metadata generation; translation is another example of natural language processing that has taken a huge leap in the past few years. The ability to take spoken language, turn it into text, and then convert it into any other language with a high degree of accuracy opens up new possibilities for anybody in the media and entertainment industry. Netflix has shown us that people want good content, and good content is available from all over the world. Technology that enables that content to be viewed by anyone is tremendous. As for what is coming down the line, I'm really excited by the possible applications of AI and machine learning. I'll talk about one example that we're dealing with. Say you are an MVPD bound by FCC regulations ensuring that certain ads do not play, or only play during certain times on certain channels: what we call restricted versus unrestricted ads, whether they be political ads, financial ads, gambling ads, or alcohol-related ads. There are rules and restrictions that apply to these advertisements. Today, most or all of the MVPDs have a team of people who view each and every ad that comes in and classify those ads as restricted or unrestricted. Now, imagine how difficult this task becomes as more and more MVPDs move into personalization of ads, investing money in creating ads that are local to each area. In that scenario, the number of ads they have to deal with increases dramatically, even exponentially.
That scenario is where machine learning and AI come into play: we can take an advertisement and, using speech-to-text, computer vision, and machine learning, automatically figure out what the advertisement is about and whether it's a restricted or an unrestricted ad. This tremendously enhances the workflow and reduces the time to put an ad out into the market. So there's a huge application there.
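The metadata-combining step can be illustrated with a deliberately simple rule-based sketch. The category keyword lists below are made-up stand-ins; a production system like the one described would likely use trained text and image classifiers rather than keyword matching, but the shape of the pipeline, fusing speech-to-text and vision metadata into a restricted/unrestricted verdict with reasons, is the same.

```python
# Illustrative keyword lists; real restricted-ad policies are far richer.
RESTRICTED_CATEGORIES = {
    "political": {"vote", "candidate", "ballot", "campaign"},
    "alcohol":   {"beer", "wine", "whiskey", "brewery"},
    "gambling":  {"casino", "betting", "odds", "jackpot"},
}

def classify_ad(transcript, detected_objects):
    """Combine speech-to-text and computer-vision metadata into a verdict.

    `transcript` is the ad's speech-to-text output; `detected_objects`
    is a list of labels from a vision model. Returns
    ("restricted", reasons) or ("unrestricted", []).
    """
    evidence = {w.lower() for w in transcript.split()} | {
        o.lower() for o in detected_objects
    }
    reasons = [cat for cat, kws in RESTRICTED_CATEGORIES.items()
               if evidence & kws]
    return ("restricted", reasons) if reasons else ("unrestricted", [])
```

Returning the matched categories, not just a yes/no flag, is what gives the reviewer the "reason for the restriction" mentioned later in the demo.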
Another application where we feel AI and computer vision can be applied is the detection of objectionable content, with tons of footage being exported to different countries. Each country has its own regulations on what content can and cannot be shown. As an example, in some countries, when an on-screen character is shown smoking, there has to be a disclaimer at the bottom that says, you know, "Smoking is harmful to your health." Imagine being able to do this automatically; that's what AI and machine learning enable you to do. Just in the past year, we've seen more potential uses of AI and machine learning. Amazon released its celebrity voices, wherein the system mimics a celebrity's voice. They started with Samuel L. Jackson, then Michael B. Jordan, and the system mimics the voices of those celebrities. Now, imagine the applications of something like this. If the system can accurately mimic the voice of a celebrity, the use cases in radio advertisements and even in dubbing are huge: you can take a celebrity's voice in the original language and have it translated into another language, yet it sounds like Michael B. Jordan is speaking in Spanish. The applications of this are mind-boggling.
Now, keep in mind that since this is a cloud-based solution, the turnaround time remains constant whether you have 10 ads or 100 ads. Everything is classified as restricted or unrestricted within a set amount of time. With that, Russell, do you want to take over and show an example?
Russell Vijayan: Thanks, Hiren. Here is an example of what you would generally see in ad classification. The system automatically generates a set of metadata, then combines that metadata to derive a classification of what the ad is about, whether it is restricted based on certain categories, and, if so, the reason for the restriction. This gives the end user the ability to quickly gather information on their ad assets. That's an example of ad classification. Over to you, Russ.
Russell Wise: Yeah. Thanks for that, Hiren. And thank you all for being with us today for this presentation.