Microsoft's Video Indexer is still in preview, but from what we have seen it is set to revolutionise video transcription. Elastacloud Data Scientist Bianca Furtuna has completed a deep dive into this new Azure service and predicts large productivity gains and deep insights from intelligent search.
Video Indexer will be a powerful tool for researchers and content managers who work with large numbers of video files that need transcribing. It can automate many manual processes, which should drive substantial productivity gains.
Bianca tested the service with a cohort of researchers and content managers who use video and audio interviews as part of their research. These interviews are typically stored and shared for each project. Because human transcribers are expensive, not every interview in a project is transcribed, so for the untranscribed interviews staff have to search the video or audio manually to extract the key information. Video Indexer automates this process, making it quick and easy to obtain a searchable transcript for every video.
Its main capabilities are:
Audio transcription: Video Indexer's speech-to-text functionality gives customers a transcript of the spoken words. Supported languages include English, Spanish, French, German, Italian, Chinese (Simplified), Portuguese (Brazilian), Japanese and Russian, with more to come.
Face tracking and identification: Video Indexer detects faces in a video and matches them against a celebrity database to identify which celebrities appear. Customers can also label faces that do not match a celebrity; Video Indexer builds a face model from those labels and can then recognise the same faces in videos submitted later.
Speaker indexing: Video Indexer identifies which speaker said which words, and when.
Visual text recognition: Video Indexer extracts text that is displayed on screen in the video.
Voice activity detection: Video Indexer separates voice activity from background noise.
Scene detection: Video Indexer analyses the video visually to detect when the scene changes.
Keyframe extraction: Video Indexer automatically detects keyframes in a video.
Sentiment analysis: Video Indexer performs sentiment analysis on the text extracted by speech-to-text and optical character recognition, and reports the results as positive, negative or neutral sentiment, along with timecodes.
Translation: Video Indexer can translate the audio transcript from one language to another. Supported languages are English, Spanish, French, German, Italian, Chinese (Simplified), Portuguese (Brazilian), Japanese and Russian. Once a transcript is translated, users can display captions in the translated language in the video player.
Visual content moderation: This technology enables detection of adult and/or racy material present in the video and can be used for content filtering.
Keyword extraction: Video Indexer extracts keywords from the transcript of the spoken words and from the text detected by visual text recognition.
Annotation: Video Indexer annotates the video using a predefined model of 2,000 objects.
For further information contact: Gary Hunter, firstname.lastname@example.org