The 2010 Affect Task

Task
Boredom detection: This task involved identifying videos with high and low levels of dramatic tension, in particular, distinguishing video content that causes viewers to feel bored from video content that causes them to feel engaged. Visual features, speech transcripts, and metadata were supplied with the task, so participants could make use of spoken, visual, and audio content as well as the accompanying metadata.

Target group
The target audience is researchers in the field of multimedia content analysis who are interested in understanding the affective dimension of their content. Estimating the audience's affect and emotion can support better summarization and tagging of multimedia. This analysis can be carried out on text, speech/audio, or visual content. Low-level prosodic audio features and video content features such as color energy, motion components, and color variance are typical examples of features used in affective multimedia content analysis.
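As a rough illustration of the kind of low-level visual features mentioned above (not the feature extractor used in the task), the following sketch computes per-frame color variance and a crude motion estimate with OpenCV; the video path and sampling step are placeholders.

```python
# Illustrative sketch: simple low-level visual features (color variance, motion proxy).
import cv2
import numpy as np

def simple_visual_features(video_path, frame_step=10):
    cap = cv2.VideoCapture(video_path)
    color_variances, motion_magnitudes = [], []
    prev_gray = None
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % frame_step == 0:
            # Color variance: variance of pixel values across the color channels.
            color_variances.append(float(np.var(frame)))
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            if prev_gray is not None:
                # Crude motion proxy: mean absolute difference between sampled frames.
                motion_magnitudes.append(float(np.mean(cv2.absdiff(gray, prev_gray))))
            prev_gray = gray
        idx += 1
    cap.release()
    return {
        "mean_color_variance": float(np.mean(color_variances)) if color_variances else 0.0,
        "mean_motion": float(np.mean(motion_magnitudes)) if motion_magnitudes else 0.0,
    }
```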

Data
The video set consisted of short videos from a travelogue, Bill Bowles's travel project, My Name is Bill. Bill is a filmmaker and traveling video blogger; each episode tells a story about a place visited during his travels around the world. The videos are about two to five minutes long and were chosen to vary along a broad spectrum with respect to their potential to be either boring or entertaining. Participants were provided with the videos, the speech transcripts extracted by automatic speech recognition, the available metadata (including each episode's popularity), and the annotations.

Ground truth
The ground truth was created using the crowdsourcing platform Mechanical Turk. Crowdsourcing work is generally piecemeal, and in order to gather boredom scores for the data set, it was necessary to ensure that crowdsourcing workers watched all of the videos in the set. We developed a methodology that we refer to as "high commitment crowdsourcing" in order to collect the annotation scores. This methodology and the creation of the data set are explained in greater detail in:

Soleymani, M. and Larson, M. Crowdsourcing for Affective Annotation of Video: Development of a Viewer-reported Boredom Corpus. In Proceedings of the SIGIR 2010 Workshop on Crowdsourcing for Search Evaluation.

We were pleased when this paper won the runner-up award for innovation at the workshop, which was sponsored by Bing.

Evaluation
The estimated boredom level was used to rank the videos from the most entertaining (least boring) to the most boring, and the ranking distance was measured with Kendall's tau (following the definition given in Yi-Hsuan Yang and Homer H. Chen, "Music Emotion Ranking," ICASSP 2009).
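A minimal sketch of this evaluation step is shown below, assuming hypothetical ground-truth and predicted boredom scores; it uses scipy's Kendall's tau, and the normalization of the correlation into a distance in [0, 1] is an illustrative assumption rather than the task's official definition.

```python
# Sketch: compare a predicted boredom ranking with the ground-truth ranking.
from scipy.stats import kendalltau

ground_truth_scores = [0.1, 0.7, 0.4, 0.9, 0.2]   # hypothetical annotated boredom scores
predicted_scores    = [0.2, 0.6, 0.5, 0.8, 0.1]   # hypothetical system predictions

tau, p_value = kendalltau(ground_truth_scores, predicted_scores)
ranking_distance = (1.0 - tau) / 2.0   # 0 = identical ranking, 1 = fully reversed
print(f"Kendall's tau: {tau:.3f}, ranking distance: {ranking_distance:.3f}")
```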

2010 Results and Links to Related Working Notes Papers
Three groups completed this task. The University of Geneva was the only site that exploited low-level visual features and achieved its best results with a regression model that combined these features with features related to the amount of information in the video, the relative obscurity of the location, the video length, and the number of shots. Delft University of Technology proposed five simple approaches based on characteristics of the video that reflect humor, "cuteness", dynamism, interactivity, and popularity. The top-performing approach modeled humor by the raw count of laughter events in the video detected by the speech recognition system. The SINAI group at the University of Jaen used a regression model trained on tf-idf vectors built from words extracted from the speech recognition transcripts. In all, the task proved to be a challenging one, but the results suggest that it is indeed feasible to build a system that automatically predicts general-audience levels of reported boredom when watching Internet video.
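To make the transcript-based line of work concrete, here is a hedged sketch in the spirit of a tf-idf regression approach (not the SINAI group's actual system): transcripts and boredom scores below are placeholders, and the choice of Ridge regression is an assumption for illustration.

```python
# Sketch: regression on tf-idf vectors built from ASR transcripts.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

train_transcripts = [
    "today we visit a quiet museum and read the plaques",   # placeholder transcript
    "we jump off a cliff into the sea and everyone laughs", # placeholder transcript
]
train_boredom = [0.8, 0.1]   # hypothetical viewer-reported boredom scores

model = make_pipeline(TfidfVectorizer(), Ridge(alpha=1.0))
model.fit(train_transcripts, train_boredom)

test_transcripts = ["a long walk through empty streets at noon"]
predicted = model.predict(test_transcripts)
# Videos would then be ranked from least to most boring by these predictions.
print(predicted)
```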

Perea-Ortega, J.M., Montejo-Raez, A., Martin-Valdivia, T. and Urena-Lopez, L.A. SINAI at Affect Task in MediaEval 2010.

Shi, Y. and Larson, M. First Approaches to Automatic Boredom Detection: DMIR tackles the MediaEval 2010 Affect Task.

Soleymani, M. Travelogue Boredom Detection with Content Features.

Acknowledgments
Thank you to ICSI and SRI International for supplying the speech recognition transcripts for the affect task.


Thank you to Bill Bowles for making available his video material for this task.


Task coordinator:
Mohammad Soleymani, University of Geneva
(Mohammad dot Soleymani at unige dot ch)