The 2018 Emotional Impact of Movies Task

Task Description
Affective video content analysis aims at the automatic recognition of emotions elicited by videos. It has a large number of applications, including emotion-based personalized content delivery, video indexing, summarization, and the protection of children from potentially harmful video content. While major progress has been achieved in computer vision for visual object detection, scene understanding and high-level concept recognition, a natural next step is the modelling and recognition of affective concepts. This has recently received increasing interest from research communities such as computer vision and machine learning, with the overall goal of endowing computers with human-like perception capabilities. This task is therefore proposed to offer researchers a place to compare their approaches to predicting the emotional impact of movies.

This year’s task builds on the 2017 edition and integrates the feedback of last year’s participants. The task consists of two subtasks. In both cases, long movies are considered.

1. Valence/Arousal prediction: participants’ systems are expected to predict a score of induced valence (negative-positive) and induced arousal (calm-excited) continuously (every second) along each movie;
2. Fear prediction: the purpose here is to predict the beginning and ending times of sequences inducing fear in movies. The targeted use case is the prediction of frightening scenes to help systems protect children from potentially harmful video content. An illustrative sketch of the expected outputs is given after this list.
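For concreteness, here is a minimal, purely illustrative Python sketch of the kind of output each subtask asks for. The variable names and data structures are assumptions made for illustration; the actual run submission format is specified by the organizers and may differ.

# Illustrative only: the official run format is defined by the organizers.

# Subtask 1: one induced valence and arousal score per second of the movie.
valence_arousal = [
    {"time_s": 0, "valence": -0.12, "arousal": 0.34},
    {"time_s": 1, "valence": -0.10, "arousal": 0.37},
    # ... one entry per second, over the whole movie
]

# Subtask 2: start and end times (in seconds) of segments predicted to induce fear.
fear_segments = [
    (125.0, 148.0),   # hypothetical frightening scene
    (842.0, 910.0),
]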

Target group
This task targets (but is not limited to) researchers in the areas of multimedia information retrieval, machine learning, event-based processing and analysis, affective computing and multimedia content analysis. Note that even though the 2018 task is a sequel to last year’s task, potential participants will not be disadvantaged if they join this year without having taken part last year.

Data
The dataset used in this task is the LIRIS-ACCEDE dataset (liris-accede.ec-lyon.fr). It contains videos from a set of 160 professionally made and amateur movies, shared under Creative Commons licenses that allow redistribution.

A total of 44 movies (total duration of 15 hours and 20 minutes) selected from the set of 160 movies are provided as the development set, together with annotations for fear, valence and arousal. Additional data will be provided as the test set.

In addition to the data, participants will also be provided with general-purpose audio and visual content descriptors. In solving the task, participants are expected to exploit the provided resources, but the use of external resources (e.g., Internet data, pre-trained deep models) is also allowed.
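As an illustration of how the provided descriptors might be used, the sketch below trains a simple regressor for per-second valence prediction. The file names, the feature layout (one descriptor row per second) and the choice of SVR are assumptions for the sake of the example, not part of the task definition.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical arrays: one row of concatenated audio-visual descriptors per
# second of movie, aligned with a per-second induced valence value.
X_train = np.load("features_train.npy")   # assumed file name
y_train = np.load("valence_train.npy")    # assumed file name
X_test = np.load("features_test.npy")     # assumed file name

# Standardize the descriptors and fit a kernel regressor.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0))
model.fit(X_train, y_train)
valence_pred = model.predict(X_test)       # one predicted score per second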

Ground truth and evaluation
In order to collect continuous valence and arousal annotations for the first subtask, annotators continuously indicated their level of arousal or valence while watching the movies, using a modified version of the GTrace annotation tool and a joystick. The movies are divided into two subsets: each annotator continuously annotated one subset considering the induced valence and the other subset considering the induced arousal. Thus, each movie is continuously annotated by at least three annotators. Post-processing is then applied to remove noise and create a continuous mean signal of the valence and arousal self-assessments.
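As a rough sketch of what such post-processing could look like, per-second annotator signals can be averaged and lightly smoothed. The smoothing method and window size below are assumptions for illustration; the organizers’ actual procedure is not detailed here.

import numpy as np

def aggregate_annotations(annotator_signals, smooth_window=10):
    # annotator_signals: list of 1-D arrays, one per annotator, each sampled
    # at one value per second over the same movie.
    signals = np.vstack([np.asarray(s, dtype=float) for s in annotator_signals])
    mean_signal = signals.mean(axis=0)                # mean over annotators
    kernel = np.ones(smooth_window) / smooth_window   # moving-average kernel
    return np.convolve(mean_signal, kernel, mode="same")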

Fear annotations for the second subtask are generated using a tool specifically designed for the classification of audio-visual media, which allows annotation to be performed while watching the movie. The annotations are carried out by two experienced NICAM team members, both trained in the classification of media. Each movie is annotated by one annotator, who reports the start and stop times of each sequence in the movie expected to induce fear.

Standard evaluation metrics will be used to assess the systems’ performance. We will consider Mean Square Error and Pearson correlation coefficient for the Valence/Arousal prediction subtask, and Intersection over Union of time intervals for the Fear prediction subtask.
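The following sketch illustrates one plausible implementation of these metrics. In particular, the interval IoU details (merging of overlapping segments, handling of the case where both sets are empty) are assumptions, and the official evaluation scripts may differ.

import numpy as np
from scipy.stats import pearsonr

def mse(y_true, y_pred):
    # Mean Square Error between per-second ground truth and predictions.
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

def pearson_cc(y_true, y_pred):
    # Pearson correlation coefficient between the two per-second signals.
    return float(pearsonr(y_true, y_pred)[0])

def interval_iou(pred_intervals, true_intervals):
    # Intersection over Union of the time covered by predicted and annotated
    # fear segments, each given as (start, stop) pairs in seconds.
    def merge(intervals):
        merged = []
        for start, stop in sorted(intervals):
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], stop)
            else:
                merged.append([start, stop])
        return merged

    def length(intervals):
        return sum(stop - start for start, stop in intervals)

    pred, true = merge(pred_intervals), merge(true_intervals)
    inter = sum(max(0.0, min(e1, e2) - max(s1, s2))
                for s1, e1 in pred for s2, e2 in true)
    union = length(pred) + length(true) - inter
    return inter / union if union > 0 else 1.0  # both empty: count as agreement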

Recommended reading
E. Dellandréa, M. Huigsloot, L. Chen, Y. Baveye and M. Sjöberg, The MediaEval 2017 Emotional Impact of Movies Task, In Working Notes Proceedings of the MediaEval 2017 Workshop, Dublin, Ireland, September 13-15, 2017.

E. Dellandréa, L. Chen, Y. Baveye, M. Sjöberg and C. Chamaret, The MediaEval 2016 Emotional Impact of Movies Task, In Working Notes Proceedings of the MediaEval 2016 Workshop, Hilversum, The Netherlands, October 20-21, 2016.

Y. Baveye, E. Dellandréa, C. Chamaret and L. Chen, LIRIS-ACCEDE: A Video Database for Affective Content Analysis, In IEEE Transactions on Affective Computing, 2015.

Y. Baveye, E. Dellandréa, C. Chamaret and L. Chen, Deep Learning vs. Kernel Methods: Performance for Emotion Prediction in Videos, In 2015 Humaine Association Conference on Affective Computing and Intelligent Interaction (ACII), 2015.

A. Hanjalic, Extracting moods from pictures and sounds: Towards truly personalized TV, IEEE Signal Processing Magazine, vol. 23, no. 2, pp. 90–100, March 2006.

J. Eggink, A large scale experiment for mood-based classification of TV programmes, In IEEE ICME 2012.

S. Benini, L. Canini and R. Leonardi, A connotative space for supporting movie affective recommendation, In IEEE Transactions on Multimedia, vol. 13, no. 6, pp. 1356-1370, 2011.


Task organizers
Emmanuel Dellandréa, Ecole Centrale de Lyon, France (contact person) emmanuel.dellandrea at ec-lyon.fr
Martijn Huigsloot, NICAM, Netherlands
Liming Chen, Ecole Centrale de Lyon, France
Yoann Baveye, Capacités, France
Mats Sjöberg, Aalto University, Finland

Task schedule
Development data release: 15 June 2018
Test data release: 30 June 2018
Runs due: 1 October 2018
Working Notes paper due: 17 October 2018