Emotion in Music

The 2014 Affect in Music Task (Emotion in Music)
Emotions are central to how we experience and respond to content, so this task has a strong human component. Mood-based or emotion-based online radios already exist, e.g., Stereomood (www.stereomood.com); however, these systems do not employ traditional models of emotion and rely only on user-generated tags. This task pushes forward the state of the art by focusing on models of emotion.

This task comprises two subtasks:

The first task is feature design. Affective features for music are interesting and important because they can be used in recommendation and retrieval platforms. Approaches to this task can combine or rework existing features already used in Music Information Retrieval (MIR), but they must be reproducible. The features will be evaluated according to their correlation with arousal and valence on the test set. It must be possible to generate the features from any audio signal, and code that can regenerate the features should be submitted together with the runs.
The second task is the continuous emotion characterization task. The emotional dimensions, arousal and valence, should be determined for a given song continuously in time. The values will be quantized in time per frame (e.g., 1 s). We will provide a set of music licensed under Creative Commons from the Free Music Archive (http://freemusicarchive.org/) with human annotations.
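As an illustration of the two subtasks, the following minimal sketch computes one feature value per 1-second frame and correlates it with per-frame arousal labels. The RMS energy feature, the function names `frame_rms` and `pearson`, and the toy signal are assumptions made for this example only; they are not features or tools prescribed by the task.

```python
import math

def frame_rms(samples, sample_rate, frame_seconds=1.0):
    """Return one RMS energy value per non-overlapping frame (hypothetical feature)."""
    frame_len = int(sample_rate * frame_seconds)
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    features = []
    # A trailing partial frame is dropped so every value covers the same duration.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        features.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return features

def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant sequence; report 0
    return cov / math.sqrt(var_x * var_y)

# Toy signal: 3 seconds at an (unrealistically low) rate of 4 samples per second.
signal = [1.0] * 4 + [2.0] * 4 + [3.0] * 4
feature = frame_rms(signal, sample_rate=4)
arousal = [0.1, 0.2, 0.3]  # made-up per-frame annotations
print(feature, pearson(feature, arousal))
```

With real data, the samples would come from a decoded audio file and the arousal values from the provided annotations; the framing and correlation steps stay the same.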

For the dataset, we extend the existing 2013 data set, which will serve as this year's development set.

Target group
Researchers in the areas of multimedia affect or music retrieval, e.g., the MIREX community.

Data
We will use an extension of the 744-song dataset developed for the same task at MediaEval 2013. The annotations are collected on Amazon Mechanical Turk. Individual workers provided arousal-valence (A-V) labels for clips from our dataset, which consists of 744 30-second clips; the clips are extended to 45 seconds in the annotation task to give workers additional practice. The labels are collected at 1 Hz. Workers are given detailed instructions describing the A-V space. We aim to extend this dataset by another 700 songs that have tags on last.fm.

Ground truth and evaluation
The ground truth is created by human assessors and is provided by the task organizers. For both the feature design task and the continuous (dimensional) emotion characterization task, we will use the coefficient of determination (R-squared).
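For reference, a minimal sketch of the coefficient of determination is given below. It assumes the common definition R-squared = 1 - SS_res / SS_tot; the exact variant used by the organizers is not specified here, and the function name `r_squared` is hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination, assuming R^2 = 1 - SS_res / SS_tot."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("need two non-empty sequences of equal length")
    mean_true = sum(y_true) / n
    # Residual sum of squares: squared error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared deviation of the ground truth from its mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined for constant ground truth")
    return 1.0 - ss_res / ss_tot

# Made-up per-frame valence values for one clip.
truth = [1.0, 2.0, 3.0]
print(r_squared(truth, [1.0, 2.0, 3.0]))  # perfect prediction
print(r_squared(truth, [2.0, 2.0, 2.0]))  # always predicting the mean
```

Under this definition, a perfect prediction scores 1, predicting the mean of the ground truth scores 0, and worse predictions can score below 0.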

Recommended reading
[1] Barthet, M., Fazekas, G., Sandler, M. Multidisciplinary Perspectives on Music Emotion Recognition: Implications for Content and Context-based Models. In Proceedings of CMMR Computer Music Modeling and Retrieval. London, UK, 2012, 492-507.

[2] Kim, Y. E., Schmidt, E. M., Migneco, R., Morton, B. G., Richardson, P., Scott, J., Turnbull, D. Music Emotion Recognition: A State of the Art Review. In Proceedings of the 11th ISMIR International Society for Music Information Retrieval Conference. Utrecht, Netherlands, 2010, 255-266.

[3] Soleymani, M., Caro, M. N., Schmidt, E. M., Sha, C.-Y., Yang, Y.-H. 1000 Songs for Emotional Analysis of Music. In Proceedings of the 2nd ACM CrowdMM International Workshop on Crowdsourcing for Multimedia. ACM, Barcelona, Spain, 2013, 1-6.

[4] Yang, Y.-H., Chen, H. H. Machine Recognition of Music Emotion: A Review. ACM Transactions on Intelligent Systems and Technology, 3(3), 2012.

Task organizers
Mohammad Soleymani, University of Geneva, Switzerland
Anna Aljanaki, Utrecht University, Netherlands
Yi-Hsuan Yang, Academia Sinica, Taiwan

Task auxiliary
Sung-Yen Liu, Academia Sinica, Taiwan

Task schedule
5 May Development data release
2 June Test data release
10 September Run submission due
15 September Results returned
28 September Working notes paper deadline

A special Thank You to the CVML Laboratory of the University of Geneva, without whose support the data set creation would not have been possible for this task.

MediaEval Benchmarking Initiative for Multimedia Evaluation
