Affect Task

The 2011 Affect Task: Violent Scenes Detection
This task requires participants to deploy multimodal features to automatically detect portions of movies containing violent material. Violence is defined as "physical violence or accident resulting in human injury or pain". Any features automatically extracted from the video, including the subtitles, can be used by participants.

This challenge derives from a use case at Technicolor: helping users choose movies whose violent content makes them suitable for children of different ages, e.g., for viewing with their families. Users select or reject a movie by previewing its most violent parts (i.e., scenes or segments).

Target group
Researchers in the areas of event detection, multimedia affect or multimedia content analysis.

Data
A set of approximately 15 Hollywood movies, which participants must purchase. The movies span different genres and levels of violence, from extremely violent movies to movies with no violence at all.

Ground truth and evaluation
The ground truth is created by human assessors and provided by the task organizers. In addition to segments containing physical violence (as defined above), the annotations include the following high-level concepts: for the visual modality, presence of blood, fights, presence of fire, presence of guns, presence of cold arms, car chases and gory scenes; for the audio modality, gunshots, explosions and screams. Participants are welcome to detect these high-level concepts, but concept detection is not required for the task: the concept annotations are provided for training purposes.

Recommended reading
T. Giannakopoulos, A. Makris, D. Kosmopoulos, S. Perantonis and S. Theodoridis, "Audio-visual fusion for detecting violent scenes in videos", Artificial Intelligence: Theories, Models and Applications. Lecture Notes in Computer Science, 2010, Volume 6040/2010, pp. 91-100.

Y. Gong, W. Wang, S. Jiang, Q. Huang and W. Gao, "Detecting violent scenes in movies by auditory and visual cues", Advances in Multimedia Information Processing - PCM 2008. Lecture Notes in Computer Science, 2008, Volume 5353/2008, pp. 317-326.

Task coordinators:
Mohammad Soleymani, University of Geneva
Claire-Helene Demarty, Technicolor
Guillaume Gravier, IRISA

MediaEval Benchmarking Initiative for Multimedia Evaluation
