UNITN Social Interaction (USI) Dataset Description:

The USI Dataset covers four types of two-person interaction: Talking, Shaking, Hugging and Fighting. Each type has 16 samples, for a total of 16 x 4 = 64 samples.

All videos were recorded outdoors and are provided in '.avi' format at a resolution of 320x240 and a frame rate of 30 fps.
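
After unpacking the archive, it can be useful to confirm that all 64 videos are present. The sketch below assumes a hypothetical layout with one folder per interaction type (root/Talking/*.avi and so on); the actual archive may be organised differently, so the function name index_dataset, the helper check_counts and the folder names are illustrative assumptions, not part of the dataset.

```python
from pathlib import Path

# The four interaction classes named in the dataset description.
CLASSES = ("Talking", "Shaking", "Hugging", "Fighting")
SAMPLES_PER_CLASS = 16  # 16 x 4 = 64 samples in total


def index_dataset(root):
    """Map each class name to a sorted list of its .avi files.

    ASSUMPTION: videos live under root/<ClassName>/<name>.avi.
    Adjust the glob pattern if the real archive layout differs.
    """
    root = Path(root)
    return {c: sorted((root / c).glob("*.avi")) for c in CLASSES}


def check_counts(index):
    """Return a list of readable problems; the list is empty if every
    class has exactly SAMPLES_PER_CLASS videos."""
    problems = []
    for c in CLASSES:
        n = len(index.get(c, []))
        if n != SAMPLES_PER_CLASS:
            problems.append(
                f"{c}: expected {SAMPLES_PER_CLASS} videos, found {n}"
            )
    return problems
```

Calling check_counts(index_dataset("path/to/USI")) returns an empty list when the counts match the description above, and one message per class otherwise.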

Detected STIPs (Spatial-Temporal Interest Points) are also provided as annotations. For each video, a text file lists the position, spatial-temporal scales, histogram of oriented gradients (HOG) and histogram of optical flow (HOF) of every interest point.
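
The annotation files could be read with a small parser along the following lines. The exact column layout is not stated here, so this is only a sketch under assumptions: each non-comment line is taken to hold 3 position values, 2 scale values, then the HOG bins, then the HOF bins, separated by whitespace, with '#' marking comment lines. The defaults of 72 HOG and 90 HOF bins are likewise assumptions; check them against the header of the real files. The names InterestPoint and parse_stip_lines are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InterestPoint:
    position: List[float]  # ASSUMED order: (x, y, t)
    scales: List[float]    # ASSUMED order: (spatial, temporal)
    hog: List[float]       # histogram of oriented gradients
    hof: List[float]       # histogram of optical flow


def parse_stip_lines(lines, hog_dim=72, hof_dim=90):
    """Parse interest points from an iterable of text lines.

    ASSUMED layout per line: 3 position values, 2 scale values,
    hog_dim HOG bins, hof_dim HOF bins. Blank lines and lines
    starting with '#' are skipped.
    """
    expected = 3 + 2 + hog_dim + hof_dim
    points = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        values = [float(v) for v in line.split()]
        if len(values) != expected:
            raise ValueError(
                f"line {line_no}: expected {expected} values, "
                f"got {len(values)}"
            )
        points.append(
            InterestPoint(
                position=values[:3],
                scales=values[3:5],
                hog=values[5:5 + hog_dim],
                hof=values[5 + hog_dim:],
            )
        )
    return points
```

A file can be parsed by passing the open file object directly, for example: with open(path) as f: points = parse_stip_lines(f). The length check makes a wrong layout assumption fail immediately instead of silently producing misaligned descriptors.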

[Figure: sample frames of the four interaction types: Talking, Hugging, Shaking, Fighting]

Video sequences and annotations are provided as .zip files:
Version 1.0 - Videos and Annotations
Database for the paper "Real Time Detection of Social Interactions in Surveillance Video" (ECCV 2012 Workshops) - Videos and Annotations
Database for the paper "Exploiting visual search theory to infer social interactions" (SPIE/Electronic Imaging 2013) - Videos and Annotations

Please cite this dataset as:

    @inproceedings{rota2012real,
      title={Real time detection of social interactions in surveillance video},
      author={Rota, Paolo and Conci, Nicola and Sebe, Nicu},
      booktitle={Computer vision--ECCV 2012. Workshops and demonstrations},
      year={2012}
    }

All rights reserved. Multimedia Signal Processing and Understanding Lab - University of Trento