Object tracking focuses on identifying and locating individual objects within a video frame, using bounding boxes and labels. For instance, in a video of vehicles at an intersection, object tracking would highlight and label each item as "car," "truck," or "bike," with bounding boxes and time-stamped segments showing their positions and movement over time. The AI can currently recognize over 20,000 different types of objects.
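Once an object has time-stamped bounding boxes, its movement can be summarized with simple post-processing. The sketch below assumes a hypothetical `TrackedFrame` record (timestamp in seconds, box edges in normalized 0 to 1 coordinates); it is not the output format of any particular API. It measures how far the center of a tracked box moves between its first and last observations.

```python
import math
from dataclasses import dataclass


@dataclass
class TrackedFrame:
    """One time-stamped observation of a tracked object (hypothetical format)."""
    time_s: float
    left: float    # normalized x of the box's left edge
    top: float     # normalized y of the box's top edge
    right: float   # normalized x of the box's right edge
    bottom: float  # normalized y of the box's bottom edge

    def center(self) -> tuple[float, float]:
        return ((self.left + self.right) / 2, (self.top + self.bottom) / 2)


def displacement(frames: list[TrackedFrame]) -> float:
    """Straight-line distance the box center travels from first to last frame."""
    ordered = sorted(frames, key=lambda f: f.time_s)
    (x0, y0), (x1, y1) = ordered[0].center(), ordered[-1].center()
    return math.hypot(x1 - x0, y1 - y0)


car = [
    TrackedFrame(0.0, 0.10, 0.40, 0.20, 0.50),
    TrackedFrame(1.0, 0.40, 0.40, 0.50, 0.50),
]
print(round(displacement(car), 2))  # 0.3
```

Dividing the displacement by the elapsed time would give an approximate on-screen speed in frame widths per second.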
The AI Face Analyzer feature identifies faces in videos and returns the segments in which faces are present. It can optionally provide bounding boxes for each detected face and, when enabled in the FaceDetectionConfig, can also detect attributes such as headwear, eye visibility, glasses, and facial expressions.
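Because several faces can be on screen at once, face segments often overlap. A minimal sketch, assuming each segment is a hypothetical `(start_s, end_s)` tuple in seconds, merges overlapping segments so that total face screen time is not double-counted.

```python
def merge_segments(segments: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Merge overlapping (start_s, end_s) face segments into disjoint intervals."""
    merged: list[list[float]] = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def face_screen_time(segments: list[tuple[float, float]]) -> float:
    """Seconds during which at least one face is visible."""
    return sum(end - start for start, end in merge_segments(segments))


faces = [(0.0, 4.0), (2.0, 6.0), (10.0, 12.0)]
print(merge_segments(faces))    # [(0.0, 6.0), (10.0, 12.0)]
print(face_screen_time(faces))  # 8.0
```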
The audio-to-text capability transcribes spoken audio into blocks of text, with an emphasis on US English. Features include alternative word suggestions, profanity filtering, transcription cues, audio track selection, automatic punctuation, and speaker diarization to distinguish between multiple speakers.
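Speaker diarization typically labels each transcribed word with a speaker. The sketch below assumes a hypothetical `Word` record carrying an integer speaker label and shows how consecutive words from the same speaker can be collapsed into readable turns; the record format is illustrative, not a real API type.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Word:
    text: str
    speaker: int  # hypothetical speaker label assigned by diarization


def speaker_turns(words: list[Word]) -> list[tuple[int, str]]:
    """Collapse consecutive words from the same speaker into (speaker, text) turns."""
    return [
        (speaker, " ".join(w.text for w in group))
        for speaker, group in groupby(words, key=lambda w: w.speaker)
    ]


words = [Word("hello", 1), Word("there", 1), Word("hi", 2), Word("thanks", 1)]
for speaker, text in speaker_turns(words):
    print(f"Speaker {speaker}: {text}")
```

`itertools.groupby` only groups adjacent items, which is exactly what turn-taking needs: speaker 1 appears as two separate turns here.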
The Logo Recognition feature detects, tracks, and recognizes over 100,000 brands and logos in videos, quantifying brand presence by counting appearances and measuring on-screen duration.
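The two brand-presence measures mentioned above, appearance count and on-screen duration, can be computed from a list of detections. This sketch assumes each detection is a hypothetical `(brand, start_s, end_s)` tuple; the brand names are placeholders.

```python
from collections import defaultdict


def brand_presence(appearances: list[tuple[str, float, float]]) -> dict[str, dict]:
    """Summarize (brand, start_s, end_s) detections into per-brand stats."""
    stats: dict[str, dict] = defaultdict(lambda: {"appearances": 0, "seconds": 0.0})
    for brand, start, end in appearances:
        stats[brand]["appearances"] += 1        # one more sighting of this brand
        stats[brand]["seconds"] += end - start  # add its on-screen duration
    return dict(stats)


detections = [("BrandA", 0.0, 2.5), ("BrandB", 1.0, 3.0), ("BrandA", 10.0, 11.5)]
print(brand_presence(detections))
# {'BrandA': {'appearances': 2, 'seconds': 4.0}, 'BrandB': {'appearances': 1, 'seconds': 2.0}}
```

This simple sum assumes appearances of the same brand do not overlap in time; if they can, merge the intervals first.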
The AI Text Detector uses Optical Character Recognition (OCR) to locate text in video frames and returns the detected text with precise timestamps. This is valuable for media workflows, such as extracting cast lists or subtitles, and supports more than 60 languages.
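Timestamped OCR output makes it easy to locate where a given name or phrase appears on screen. A minimal sketch, assuming detections are hypothetical `(time_s, text)` pairs, performs a case-insensitive search and returns the matching timestamps in order.

```python
def find_text(detections: list[tuple[float, str]], query: str) -> list[float]:
    """Return sorted timestamps (seconds) of OCR detections containing query."""
    q = query.casefold()  # casefold() gives robust case-insensitive matching
    return sorted(t for t, text in detections if q in text.casefold())


ocr = [(12.0, "Directed by Jane Doe"), (3.5, "STARRING JANE DOE"), (40.0, "The End")]
print(find_text(ocr, "jane doe"))  # [3.5, 12.0]
```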
Explicit Content Detection identifies material in videos that may be unsuitable for minors (typically viewers under 18), including nudity, sexual acts, and other explicit imagery. The technology also detects similar themes in animated or anime-style content.
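Detection results of this kind are often reported per frame as a graded likelihood instead of a yes/no answer, leaving the cut-off to the application. The sketch below defines its own five-level `Likelihood` scale purely for illustration (names and values are assumptions) and flags the frames that meet a chosen threshold.

```python
from enum import IntEnum


class Likelihood(IntEnum):
    # Hypothetical five-level scale; a higher value means more likely explicit.
    VERY_UNLIKELY = 1
    UNLIKELY = 2
    POSSIBLE = 3
    LIKELY = 4
    VERY_LIKELY = 5


def flag_frames(frames: list[tuple[float, Likelihood]],
                threshold: Likelihood = Likelihood.LIKELY) -> list[float]:
    """Return timestamps of frames whose likelihood meets or exceeds threshold."""
    return [t for t, level in frames if level >= threshold]


frames = [
    (0.0, Likelihood.VERY_UNLIKELY),
    (1.0, Likelihood.LIKELY),
    (2.0, Likelihood.POSSIBLE),
    (3.0, Likelihood.VERY_LIKELY),
]
print(flag_frames(frames))                       # [1.0, 3.0]
print(flag_frames(frames, Likelihood.POSSIBLE))  # [1.0, 2.0, 3.0]
```

Lowering the threshold trades more false positives for fewer missed frames, which may suit stricter audiences.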
Label detection identifies and annotates entities such as objects, locations, and activities in video frames, providing labels such as "train" or "transportation." Each label comes with a time segment and an entity ID for further exploration, unlike object tracking, which focuses on individual objects within bounding boxes.
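Because each label carries a time segment, labels can be ranked by how long they are present. This sketch assumes hypothetical `(label, start_s, end_s)` tuples and uses `collections.Counter` to total and sort the durations.

```python
from collections import Counter


def label_durations(segments: list[tuple[str, float, float]]) -> list[tuple[str, float]]:
    """Total seconds each label is present, most prominent first."""
    totals: Counter = Counter()
    for label, start, end in segments:
        totals[label] += end - start
    return totals.most_common()


segs = [("train", 0.0, 8.0), ("transportation", 0.0, 12.0), ("train", 20.0, 22.0)]
print(label_durations(segs))  # [('transportation', 12.0), ('train', 10.0)]
```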
Shot change detection splits a video into segments at points where abrupt visual transitions occur. Each segment begins with a frame that differs sharply from the preceding frame, signaling a change of shot.
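One classic way to spot such abrupt transitions is to compare color or brightness histograms of consecutive frames. The sketch below uses that histogram-difference heuristic as a stand-in; it is not necessarily how any particular service detects shots. Frames are assumed to be flat lists of grayscale pixel values (0 to 255), and the bin count and threshold are illustrative defaults.

```python
def histogram(frame: list[int], bins: int = 8) -> list[float]:
    """Normalized grayscale histogram of a frame given as pixel values 0-255."""
    counts = [0] * bins
    for p in frame:
        counts[p * bins // 256] += 1
    return [c / len(frame) for c in counts]


def shot_boundaries(frames: list[list[int]], threshold: float = 0.5) -> list[int]:
    """Indices of frames that differ sharply from the preceding frame."""
    if not frames:
        return []
    cuts = []
    prev = histogram(frames[0])
    for i in range(1, len(frames)):
        cur = histogram(frames[i])
        # Half the L1 distance lies in [0, 1]: 0 = identical, 1 = disjoint.
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / 2
        if diff > threshold:
            cuts.append(i)  # frame i starts a new shot
        prev = cur
    return cuts


dark = [10, 20, 30, 40]
bright = [220, 230, 240, 250]
print(shot_boundaries([dark, dark, bright, bright, dark]))  # [2, 4]
```

Histogram comparison ignores where pixels are, so it tolerates camera motion within a shot but can miss cuts between two visually similar scenes.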
The Person Detection feature identifies people in videos and tracks them with bounding boxes. Beyond detecting people, it also identifies specific body parts and attributes such as clothing color and type, providing detailed annotations for each detected individual.
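Per-person attributes make it possible to query for specific individuals, for example everyone wearing red. This sketch assumes a hypothetical `Person` record whose attributes are a simple name-to-value dictionary; the attribute names are illustrative, not those of any real API.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    """One detected individual (hypothetical format, not a real API type)."""
    track_id: int
    # Attribute name -> value; names such as "upper_clothing_color" are illustrative.
    attributes: dict[str, str] = field(default_factory=dict)


def find_people(people: list[Person], **wanted: str) -> list[int]:
    """Track IDs of people whose attributes match every requested value."""
    return [
        p.track_id
        for p in people
        if all(p.attributes.get(k) == v for k, v in wanted.items())
    ]


people = [
    Person(1, {"upper_clothing_color": "red", "upper_clothing_type": "jacket"}),
    Person(2, {"upper_clothing_color": "blue", "upper_clothing_type": "shirt"}),
    Person(3, {"upper_clothing_color": "red", "upper_clothing_type": "shirt"}),
]
print(find_people(people, upper_clothing_color="red"))  # [1, 3]
print(find_people(people, upper_clothing_color="red", upper_clothing_type="shirt"))  # [3]
```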