Entry Date: February 22, 2018

Moments in Time Dataset: A Large-Scale Dataset for Recognizing and Understanding Action in Videos

Co-investigator Nick Montfort


Moments is a research project dedicated to building a very large-scale dataset to help AI systems recognize and understand actions and events in videos.

Today, the dataset includes one million labeled 3-second videos involving people, animals, objects, or natural phenomena, each capturing the gist of a dynamic scene.
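
As a concrete sketch of what "labeled 3-second videos" means in practice, the following Python snippet shows one way such a collection might be indexed and loaded with PyTorch. The annotations.csv index file, its (path, label) row layout, and the directory structure are illustrative assumptions, not the dataset's actual distribution format.

import csv
from pathlib import Path

from torch.utils.data import Dataset
from torchvision.io import read_video

class LabeledClipDataset(Dataset):
    """Sketch of a loader for short labeled video clips.

    Assumes a hypothetical index file annotations.csv whose rows are
    (relative_video_path, action_label), e.g. "opening/vid_0001.mp4,opening".
    """

    def __init__(self, root, index_file="annotations.csv"):
        self.root = Path(root)
        with open(self.root / index_file) as f:
            self.samples = [(row[0], row[1]) for row in csv.reader(f)]
        # Map each distinct action label to an integer class id.
        self.classes = sorted({label for _, label in self.samples})
        self.class_to_idx = {c: i for i, c in enumerate(self.classes)}

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, i):
        path, label = self.samples[i]
        # Decode the 3-second clip into a (T, H, W, C) uint8 frame tensor.
        frames, _, _ = read_video(str(self.root / path), pts_unit="sec")
        return frames, self.class_to_idx[label]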

Moments: Three-second events capture an ecosystem of changes in the world: three seconds convey enough information to understand how agents (human, animal, artificial, or natural) transform from one state to another.

Diversity: Designed to have large inter-class and intra-class variation, representing dynamic events at different levels of abstraction (e.g., "opening" doors, drawers, curtains, presents, eyes, mouths, and even flower petals).

Generalization: A large-scale, human-annotated video dataset capturing visual and/or audible actions produced by humans, animals, objects, or nature, which together can compose into compound activities occurring at longer time scales.

Transferability: Supervised training across broad coverage of the visual and auditory ecosystem helps construct powerful yet flexible feature detectors, allowing models to quickly transfer learned representations to novel domains, as sketched below.
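
As an illustration of that transfer recipe, the sketch below freezes a pretrained video backbone and retrains only a new classification head for a novel target domain. It uses torchvision's r3d_18 weights (pretrained on Kinetics-400) as a stand-in backbone; the target class count and the dummy batch are arbitrary illustrative choices, and none of this is the project's official fine-tuning code.

import torch
import torch.nn as nn
from torchvision.models.video import r3d_18

# Load a video backbone pretrained on a large action-recognition corpus.
# (torchvision's r3d_18, pretrained on Kinetics-400, stands in here; the
# same recipe would apply to a backbone pretrained on Moments in Time.)
model = r3d_18(pretrained=True)

# Freeze the pretrained feature detectors ...
for param in model.parameters():
    param.requires_grad = False

# ... and replace the classification head for the novel domain,
# with, say, 27 target classes (an arbitrary number for illustration).
num_target_classes = 27
model.fc = nn.Linear(model.fc.in_features, num_target_classes)

# Only the new head's parameters are updated during fine-tuning.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)

# One illustrative training step on a dummy batch of short clips,
# shaped (batch, channels, frames, height, width):
clips = torch.randn(4, 3, 16, 112, 112)
labels = torch.randint(0, num_target_classes, (4,))
optimizer.zero_grad()
loss = nn.functional.cross_entropy(model(clips), labels)
loss.backward()
optimizer.step()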