Affect Task

The 2012 Affect Task: Violent Scenes Detection
This task requires participants to deploy multimodal features to automatically detect portions of movies containing violent material. Violence is defined as "physical violence or accident resulting in human injury or pain". Any features automatically extracted from the video, including the subtitles, can be used by participants.

The data (and the information on purchasing the movies) for the Violent Scenes Detection 2012 task has been made available by the task organizers here:
https://research.technicolor.com/rennes/vsd

This challenge derives from a use case at Technicolor: helping users choose movies that are suitable for children of different ages. Suitability is judged in terms of violent content, e.g., whether a movie is appropriate for viewing by a user's family. Users select or reject movies by previewing the parts of the movies (i.e., scenes or segments) that are the most violent.

This task follows up on the 2011 Violent Scenes Detection task, which served as a pilot, and will see little change from it. The 2012 task has been designed to be interesting both for last year's participating teams and for teams tackling the task for the first time this year.

Target group
Researchers in the areas of event detection, multimedia affect or multimedia content analysis.

Data
A set of ca. 15 Hollywood movies that must be purchased by the participants. The movies are of different genres (from extremely violent movies to movies without violence).

Ground truth and evaluation
The ground truth is created by human assessors and is provided by the task organizers. In addition to segments containing physical violence (as defined above), annotations include the following high-level concepts: presence of blood, fights, presence of fire, presence of guns, presence of cold arms, car chases and gory scenes, for the visual modality; gunshots, explosions and screams for the audio modality. Note that participants are welcome to carry out detection of the high-level concepts. However, concept detection is not a requirement for the task since these high-level concept annotations are provided for training purposes.

Several performance measures will be used for diagnostic purposes: false alarm rate, missed detection rate, AED-precision and recall, and mean average precision. Last year's MediaEval cost, a weighted combination of the estimated probabilities of false alarm and missed detection, will also be computed for comparison with last year's results. Whenever possible, detection error trade-off curves will also be used, so that systems are not compared only at fixed operating points.
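As an illustration of how the rate-based measures relate to each other, the following minimal Python sketch computes false alarm rate, missed detection rate, precision and recall from binary per-segment decisions, and then a weighted cost of the form cost_false_alarm * P(false alarm) + cost_miss * P(miss). It is not the official evaluation tool: the per-segment representation, the function names and the weight values used in the example are assumptions made for illustration only, and the official weights should be taken from the task documentation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DetectionRates:
    false_alarm_rate: float  # non-violent segments flagged as violent
    miss_rate: float         # violent segments not flagged
    precision: float
    recall: float


def detection_rates(truth: Sequence[bool], predicted: Sequence[bool]) -> DetectionRates:
    """Compute rates from per-segment ground truth and binary decisions.

    Assumption: both sequences are aligned, one entry per segment.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")

    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)

    def ratio(numerator: int, denominator: int) -> float:
        # Return 0.0 when a rate is undefined (empty denominator).
        return numerator / denominator if denominator else 0.0

    return DetectionRates(
        false_alarm_rate=ratio(fp, fp + tn),
        miss_rate=ratio(fn, fn + tp),
        precision=ratio(tp, tp + fp),
        recall=ratio(tp, tp + fn),
    )


def weighted_cost(rates: DetectionRates, cost_false_alarm: float, cost_miss: float) -> float:
    """Weighted combination of false alarm and miss probabilities."""
    return cost_false_alarm * rates.false_alarm_rate + cost_miss * rates.miss_rate


# Toy example: 8 segments, 3 of them violent in the ground truth.
truth = [True, True, False, False, False, True, False, False]
predicted = [True, False, True, False, False, True, False, False]

rates = detection_rates(truth, predicted)
# The weights below are placeholders, not the official task values.
cost = weighted_cost(rates, cost_false_alarm=1.0, cost_miss=10.0)
print(rates, cost)
```

Weighting a miss more heavily than a false alarm, as in the toy call above, would reflect the use case in which overlooking a violent scene is worse than flagging a harmless one. Comparing systems at a single threshold hides this trade-off, which is why detection error trade-off curves are also used.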

Task schedule
2 May: Development set release
1 July: Test set release
10 September: Run submission

Recommended reading

Bermejo Nievas, E., Deniz Suarez, O., Bueno García, G. and Sukthankar, R. 2011. Violence detection in video using computer vision techniques. Proceedings of the 14th International Conference on Computer Analysis of Images and Patterns, Volume Part II, Springer-Verlag, pp. 332-339.

Chen, L.-H., Hsu, H.-W., Wang, L.-Y. and Su, C.-W. 2011. Violence Detection in Movies. Eighth International Conference on Computer Graphics, Imaging and Visualization (CGIV), pp. 119-124.

Demarty, C.H., Penet, C., Gravier, G. and Soleymani, M. 2011. The MediaEval 2011 Affect Task: Violent Scenes Detection in Hollywood Movies. Working Notes Proceedings of the MediaEval 2011 Workshop.

Giannakopoulos, T., Makris, A., Kosmopoulos, D., Perantonis, S. and Theodoridis, S. 2010. Audio-visual fusion for detecting violent scenes in videos. Artificial Intelligence: Theories, Models and Applications. Lecture Notes in Computer Science, Volume 6040/2010, pp. 91-100.

Gong, Y., Wang, W., Jiang, S., Huang, A. and Gao, W. 2008. Detecting Violent Scenes in Movies by Auditory and Visual Cues. Advances in Multimedia Information Processing - PCM 2008. Lecture Notes in Computer Science, Volume 5353/2008, pp. 317-326.

de Souza, F.D.M., Chavez, G.C., do Valle, E.A. and de A Araujo, A. 2010. Violence Detection in Video Using Spatio-Temporal Features. 23rd SIBGRAPI Conference on Graphics, Patterns and Images, pp. 224-230.

Lin, J. and Wang, W. 2009. Weakly-Supervised Violence Detection in Movies with Audio and Video Based Co-training. PCM'09, pp. 930-935.

Task organizers
Mohammad Soleymani, University of Geneva, Switzerland
Claire-Helene Demarty, Technicolor, France
Guillaume Gravier, IRISA, France
Cedric Penet, Technicolor, France

This task is made possible by a collaboration of projects including:
