The legal system generates a huge and ever-increasing amount of data. In the UK alone there are more than 100,000 new court cases each year, each adding to the body of knowledge that a lawyer has to get to grips with to do their job. Judicial rulings, precedents and interpretations of legislation all create more data, and within this, witness statements, court logs and judges’ summaries contain hidden insights that could help win legal arguments.
Innovation within the regime
Surprisingly, until recently there was little innovation in the way the legal profession uses Big Data. That is now changing with the arrival of the modern IT stack and a new breed of innovative, data-savvy lawyers and IT professionals.
The first data tools for lawyers focused on billing, time management, marketing and customer relations. Now, data scientists and lawyers are teaming up to develop tools that automate research and case preparation, which are at the core of a lawyer’s job.
Currently, the world of data-driven legal research is ruled by three entities: LexisNexis, Practical Law Company and Westlaw. These companies hold databases containing huge amounts of case details and are often the default starting point for legal researchers. However, they mainly function as simple text-based search engines and offer little in the way of advanced analytical tools. Their results tend to be hit and miss, and they charge high monthly fees for the privilege.
Modern Textual Analytics
Machine-learning-based textual analytics uses analytics to surface the detail contained within legal documents. A page-by-page review of such documents used to take hundreds of expensive man-hours. Today, digital textual analytics tools can perform such a review automatically and surface critical insights. Using Hadoop-based systems, millions of data points can be processed in hours rather than weeks.
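Hadoop’s processing model, MapReduce, splits work into a map step that emits key-value pairs from each document independently and a reduce step that combines the values for each key. The following is a minimal, single-machine sketch of that pattern in plain Python, counting how often each term appears across a few documents. It does not use Hadoop’s API; the function names and sample snippets are hypothetical, chosen only to show why the work parallelises across many documents.

```python
import re
from collections import defaultdict
from itertools import chain


def map_document(doc_id: str, text: str) -> list[tuple[str, int]]:
    """Map step: emit (term, 1) for every word in one document.

    doc_id mirrors the input key a MapReduce job receives; it is unused here.
    """
    return [(word, 1) for word in re.findall(r"[a-z']+", text.lower())]


def shuffle(pairs) -> dict[str, list[int]]:
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Reduce step: sum the values collected for each term."""
    return {term: sum(values) for term, values in grouped.items()}


def term_counts(documents: dict[str, str]) -> dict[str, int]:
    # On a real cluster each map_document call could run on a different node.
    mapped = chain.from_iterable(map_document(d, t) for d, t in documents.items())
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    # Hypothetical snippets standing in for a witness statement and a court log.
    docs = {
        "statement_1": "The witness saw the defendant leave the premises.",
        "court_log_7": "Defendant denies leaving the premises before noon.",
    }
    counts = term_counts(docs)
    print(counts["premises"], counts["defendant"])  # 2 2
```

Because each map call looks at only one document, the map step scales out by adding machines; only the much smaller grouped counts need to be brought together in the reduce step.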
Microsoft’s Video Indexer service cuts the considerable cost of manually transcribing video and audio evidence. Video Indexer automates this process, making it quick and easy to obtain a searchable transcript for all video and audio content.
Audio Transcription: Video Indexer has speech-to-text functionality, which enables customers to get a transcript of the spoken words. Supported languages include English, Spanish, French, German, Italian, Chinese (Simplified), Portuguese (Brazilian), Japanese and Russian (with many more to come in the future).
Face tracking and identification: Face technologies enable detection of faces in a video. Users can also label faces to build a model based on those labels, which can then recognise those faces in videos submitted in the future.
Speaker indexing: Video Indexer has the ability to map and understand which speaker spoke which words and when.
Visual text recognition: With this technology, the Video Indexer service extracts text displayed in videos.
Voice activity detection: This enables Video Indexer to distinguish voice activity from background noise.
Scene detection: Video Indexer has the ability to perform visual analysis on the video to determine when a scene changes in a video.
Keyframe extraction: Video Indexer automatically detects keyframes in a video.
Sentiment analysis: Video Indexer performs sentiment analysis on the text extracted using speech-to-text and optical character recognition, and provides that information as positive, negative or neutral sentiment, along with timecodes.
Translation: Video Indexer has the ability to translate the audio transcript from one language to another. The following languages are supported: English, Spanish, French, German, Italian, Chinese (Simplified), Portuguese (Brazilian), Japanese and Russian. Once the transcript is translated, users can also view captions in other languages in the video player.
Visual content moderation: This technology enables detection of adult and/or racy material present in the video and can be used for content filtering.
Keyword extraction: Video Indexer extracts keywords based on the transcript of the spoken words and the text recognised by the visual text recogniser.
Annotation: Video Indexer annotates the video based on a pre-defined model of 2000 objects.
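To make the idea of keyword extraction over a timecoded transcript more concrete, here is a deliberately simple sketch. It is not how Video Indexer works internally (that is not described here); it swaps in plain term frequency after removing stop words and records the timecode at which each keyword first appears. The `Segment` record, the tiny stop-word list and the sample transcript are all hypothetical, chosen for illustration.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Small illustrative stop-word list; a real system would use a much larger one.
STOP_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "on", "at", "i", "you",
    "he", "she", "it", "we", "they", "was", "were", "is", "that", "did",
    "not", "where",
}


@dataclass
class Segment:
    """Hypothetical transcript segment: start time in seconds, speaker, text."""
    start: float
    speaker: str
    text: str


def tokenize(text: str) -> list[str]:
    """Lower-case the text, split it into words and drop stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def extract_keywords(segments: list[Segment], top_n: int = 3) -> list[tuple[str, int, float]]:
    """Return (keyword, count, first_timecode) for the top_n most frequent terms."""
    counts: Counter[str] = Counter()
    first_seen: dict[str, float] = {}
    for seg in sorted(segments, key=lambda s: s.start):
        for word in tokenize(seg.text):
            counts[word] += 1
            first_seen.setdefault(word, seg.start)
    # Counter.most_common lists equal counts in the order first encountered.
    return [(w, c, first_seen[w]) for w, c in counts.most_common(top_n)]


if __name__ == "__main__":
    transcript = [
        Segment(0.0, "Counsel", "Where were you on the night of the robbery?"),
        Segment(4.5, "Witness", "I was at the warehouse until midnight."),
        Segment(9.0, "Counsel", "Did anyone see you at the warehouse?"),
        Segment(12.5, "Witness", "The night guard saw me at the warehouse."),
    ]
    print(extract_keywords(transcript))
    # [('warehouse', 3, 4.5), ('night', 2, 0.0), ('robbery', 1, 0.0)]
```

Keeping the first timecode alongside each keyword is what makes the output useful to a lawyer: it points straight to the moment in the recording where a term first came up.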
Machine learning’s core technologies align well with the complex searches lawyers perform daily, but they can deliver far more accurate information and greater insight by uncovering patterns hidden across many digitised documents. Many of the algorithms being developed are iterative, designed to learn continually and seek optimised outcomes for the researcher. Because these algorithms iterate in milliseconds, lawyers can arrive at pinpoint-accurate research in minutes rather than months.