EGGNOG: A continuous, multi-modal data set of naturally occurring gestures with ground truth labels

By: Isaac Wang, Mohtadi Ben Fraj, Pradyumna Narayana, Dhruva Patil, Gururaj Mulay, Rahul Bangar, J. Ross Beveridge, Bruce A. Draper, Jaime Ruiz
Colorado State University, University of Florida
2017 IEEE 12th International Conference on Automatic Face & Gesture Recognition (FG).


People communicate through words and gestures, but current voice-based computer interfaces such as Siri exploit only words. This is a shame: human-computer interfaces would be more natural if they incorporated gestures as well as words. To support this goal, we present a new dataset of naturally occurring gestures made by people working collaboratively on blocks world tasks. The dataset, called EGGNOG, contains over 8 hours of RGB video, depth video, and Kinect v2 body position data of 40 subjects. The data has been semi-automatically segmented into 24,503 movements, each of which has been labeled according to (1) its physical motion and (2) the intent of the participant. We believe this dataset will stimulate research into natural and gestural human-computer interfaces.

Here are a few sample frames from the dataset:

A sample frame from the EGGNOG dataset.

The goal of this data set is to support research into recognizing the types of gestures that occur during human communication. The data set, including video, depth, and body pose data, is publicly available here, along with the corresponding segment boundaries and ground truth labels.
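As a rough illustration of how segment boundaries and the two kinds of labels might be used together, here is a minimal Python sketch. It does not reflect the actual EGGNOG file format: the `Segment` class, its field names, the helper functions, and the label strings are all hypothetical placeholders, and it assumes boundaries are given as inclusive frame indices.

```python
from dataclasses import dataclass
from typing import List, Sequence, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Segment:
    """One labeled movement. Field names are hypothetical, not the EGGNOG schema."""
    start_frame: int  # first frame of the movement (assumed inclusive)
    end_frame: int    # last frame of the movement (assumed inclusive)
    motion: str       # label describing the physical motion
    intent: str       # label describing the participant's intent


def frames_for(segment: Segment, frames: Sequence[T]) -> List[T]:
    """Return the frames (RGB, depth, or body pose) covered by one segment."""
    if not 0 <= segment.start_frame <= segment.end_frame < len(frames):
        raise ValueError("segment boundaries fall outside the recording")
    return list(frames[segment.start_frame:segment.end_frame + 1])


def segments_with_intent(segments: Sequence[Segment], intent: str) -> List[Segment]:
    """Select the segments whose intent label matches the given string."""
    return [s for s in segments if s.intent == intent]


# Placeholder data: integers stand in for per-frame records, and the
# label strings are invented for illustration only.
recording = list(range(10))
segments = [
    Segment(start_frame=0, end_frame=2, motion="motion-A", intent="intent-X"),
    Segment(start_frame=3, end_frame=5, motion="motion-B", intent="intent-Y"),
]
clip = frames_for(segments[1], recording)
```

Keeping the motion label and the intent label as separate fields mirrors the two-level labeling described above, so the same segments can be grouped either by what the body did or by what the participant meant.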