Video recognition, audio training data pave way to more intelligent machines
At the Google Cloud Next ’17 event on Wednesday, Fei-Fei Li, chief scientist of Google’s cloud-based machine learning efforts, introduced the company’s Cloud Video Intelligence API. Presently in private beta, the service allows developers to query video content using words that describe depicted objects and their characteristics.
This can already be done in a limited way with video that has been annotated with metadata or associated with closed-captioned descriptions.
But Google – by analyzing every video frame using image recognition algorithms and data models developed through machine learning – can provide a far more comprehensive set of data for finding things depicted in videos through its API.
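To make the frame-by-frame idea concrete, here is a minimal Python sketch of how per-frame labels could be turned into a searchable index. It does not call Google's API: the function name `build_label_index`, the sample labels, and the frame rate are all illustrative assumptions, and the labels stand in for whatever an image recognition model might emit.

```python
from collections import defaultdict


def build_label_index(frame_labels, fps):
    """Map each label to the (start, end) time ranges, in seconds, where it appears.

    frame_labels: one set of labels per frame (hypothetical recognizer output).
    fps: frames per second of the video.
    """
    index = defaultdict(list)
    for frame_no, labels in enumerate(frame_labels):
        timestamp = frame_no / fps
        for label in labels:
            segments = index[label]
            # Extend the last segment if this frame directly follows it;
            # otherwise start a new segment for this label.
            if segments and segments[-1][2] == frame_no - 1:
                segments[-1] = (segments[-1][0], timestamp, frame_no)
            else:
                segments.append((timestamp, timestamp, frame_no))
    # Drop the internal frame counter before returning.
    return {
        label: [(start, end) for start, end, _ in segments]
        for label, segments in index.items()
    }


# Made-up labels for a five-frame clip sampled at one frame per second.
sample_frames = [{"dog"}, {"dog", "ball"}, {"ball"}, set(), {"dog"}]
label_index = build_label_index(sample_frames, fps=1)
print(label_index.get("dog", []))
```

Looking up a word in `label_index` then returns the stretches of video where that label was detected, which is the kind of query metadata and captions alone can only partly answer.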
Amazon, for what it’s worth, launched something similar, called Rekognition, last year.
Google Cloud Machine Learning and Cloud Datalab both entered general availability, a status that matters to enterprise customers. The Cloud Vision API 1.1 (still in beta) can now recognize entities from Google’s Knowledge Graph and has gained OCR capabilities to better extract text from documents. And the Cloud Jobs API gained the ability to query employment opportunities based on commute times and preferred modes of transportation.
Also, Google has launched what it’s calling its Advanced Solutions Lab, a facility for training enterprise customers to train their machines to learn.
Meanwhile, the Mountain View, California-based data hoarder has opened another section of its attic to reveal AudioSet, a collection of more than 2 million 10-second YouTube clips annotated with sound labels and categorized into more than 600 different classes.
The AudioSet ontology, or data model, describes the audio content in terms of event categories, which include human and animal sounds, musical instruments and genres, and environmental sounds.
For example, one JSON object described in the model is labelled “Chuckle, chortle.” It includes a citation field that points to a text definition of the sound and it lists seven YouTube clips, with embedded start and end times, that present audible examples of chuckling.
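As a rough illustration of that structure, the Python sketch below parses a made-up ontology entry and pulls the start and end times out of each clip reference. The field names (`id`, `name`, `citation_uri`, `positive_examples`), the identifier, the citation URL, and the video IDs are all placeholders assumed for this example rather than values copied from the published data, and only two clips are shown instead of seven.

```python
import json
from urllib.parse import urlparse, parse_qs

# Hypothetical entry: field names and values are illustrative assumptions,
# not copied from the published ontology.
ENTRY_JSON = """
{
  "id": "/example/chuckle",
  "name": "Chuckle, chortle",
  "citation_uri": "http://example.com/definition-of-chuckle",
  "positive_examples": [
    "youtu.be/VIDEO_ID_1?start=30&end=40",
    "youtu.be/VIDEO_ID_2?start=0&end=10"
  ]
}
"""


def parse_example(url):
    """Split a clip reference into (video_id, start_seconds, end_seconds)."""
    # The references lack a scheme, so add one before parsing.
    parsed = urlparse("https://" + url)
    query = parse_qs(parsed.query)
    video_id = parsed.path.lstrip("/")
    return video_id, int(query["start"][0]), int(query["end"][0])


entry = json.loads(ENTRY_JSON)
clips = [parse_example(url) for url in entry["positive_examples"]]
for video_id, start, end in clips:
    print(f"{entry['name']}: {video_id} from {start}s to {end}s")
```

Each tuple in `clips` identifies a 10-second window that could be fetched and fed to a model as a positive example of the labelled sound.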
Google’s goal for this sound and fury is to help researchers and developers train machine learning models that can be made available to applications for identifying sounds.
Those working with images can already avail themselves of a variety of public datasets for model training. Music and speech datasets are also available, though there are fewer of them.
Now those looking to teach their software to identify, say, throat clearing or the rustling of leaves don’t have to resort to collecting their own training samples. ®