UNITN Social Interaction (USI) Dataset Description
The USI Dataset consists of 4 types of two-person interactions: Talking, Shaking, Hugging and Fighting.
Each interaction type has 16 samples, for a total of 16 x 4 = 64 samples. All videos were recorded outdoors and are provided in '.avi' format at a resolution of 320x240 and a frame rate of 30 fps. Detections of STIPs (Spatial-Temporal Interest Points) are also provided as annotations: for each video, a text file lists the positions, spatial-temporal scales, histograms of gradient (HOG) and histograms of optical flow (HOF) of all its interest points.
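The annotation text files described above could be read along the following lines. This is only a minimal sketch: the exact file layout is not specified here, so the column order (x, y, t, spatial scale, temporal scale, then the HOG values, then the HOF values), the descriptor lengths (72 and 90) and the `#` comment convention are all assumptions, and the names `InterestPoint` and `parse_stip_lines` are hypothetical. Check them against an actual annotation file before relying on them.

```python
from dataclasses import dataclass
from typing import Iterable, List

# ASSUMPTION: descriptor lengths are not stated in the dataset description.
# 72 (HOG) and 90 (HOF) are placeholder defaults; adjust to the real files.
HOG_DIM = 72
HOF_DIM = 90


@dataclass
class InterestPoint:
    x: float        # spatial position (assumed column 1)
    y: float        # spatial position (assumed column 2)
    t: float        # temporal position, e.g. frame index (assumed column 3)
    sigma2: float   # spatial scale (assumed column 4)
    tau2: float     # temporal scale (assumed column 5)
    hog: List[float]
    hof: List[float]


def parse_stip_lines(lines: Iterable[str],
                     hog_dim: int = HOG_DIM,
                     hof_dim: int = HOF_DIM) -> List[InterestPoint]:
    """Parse whitespace-separated interest point records, one per line."""
    expected = 5 + hog_dim + hof_dim
    points = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        # ASSUMPTION: blank lines and lines starting with '#' carry no data.
        if not line or line.startswith("#"):
            continue
        values = [float(v) for v in line.split()]
        if len(values) != expected:
            raise ValueError(
                f"line {number}: expected {expected} values, got {len(values)}"
            )
        x, y, t, sigma2, tau2 = values[:5]
        hog = values[5:5 + hog_dim]
        hof = values[5 + hog_dim:]
        points.append(InterestPoint(x, y, t, sigma2, tau2, hog, hof))
    return points
```

To process one video, pass an open file object to the parser, for example `parse_stip_lines(open(path))`, where `path` points at that video's annotation text file. Raising an error on an unexpected column count is a deliberate choice, since it surfaces a wrong layout assumption immediately instead of silently misaligning the descriptors.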
[Figure: sample frames for Talking, Hugging, Shaking and Fighting]
Video sequences and annotations are provided as .zip files:
Version 1.0 - Videos and Annotations
Db Paper "Real Time Detection of Social Interactions in Surveillance Video" of ECCV 2012 Workshop - Videos and Annotations
Db Paper "Exploiting visual search theory to infer social interactions" of SPIE/Electronic Imaging 2013 - Videos and Annotations
Please cite this dataset as:
@inproceedings{rota2012real,
All rights reserved. Multimedia Signal Processing and Understanding Lab - University of Trento