The 2017 AcousticBrainz Genre Task: Content-based music genre recognition from multiple sources
Register to participate in this challenge on the MediaEval 2017 registration site.

This task invites participants to predict the genre and subgenre of unknown music recordings (songs) given automatically computed features of those recordings. We provide a training set of such audio features taken from the AcousticBrainz database and genre and subgenre labels from four different music metadata websites. The taxonomies that we provide for each website vary in their specificity and breadth. Each source has its own definition of its genre labels, meaning that these labels may differ between sources. Participants must train model(s) using this data and then generate predictions for a test set. Participants can choose to consider each set of genre annotations individually or take advantage of combining sources together.

Audio-based music genre recognition is a challenging task within the Music Information Retrieval (MIR) community. It is commonly used as a benchmark for audio analysis algorithms and machine learning techniques in the context of the larger problem of indexing digital music libraries. The definition of genres is often fuzzy and culture/context-dependent. Musical genre labels vary in specificity (rock, metal, death metal), may be subjective and context-dependent (alternative vs indie rock), and are not always mutually exclusive. For this reason, several different genre labels can be legitimately applied to the same music piece, and there may be no uniquely correct genre label annotation for a given recording [7].

The goal of our task is to understand how genre classification can explore and address the subjective and culturally dependent nature of genre categories. Traditionally, genre classification is performed using a single source of ground truth with broad genre categories as class labels. In contrast, this task aims to explore how to combine multiple, more detailed sources of annotations. Each source has a different genre class space, providing an opportunity to analyze the problem of music genre recognition from new perspectives and with the potential of reducing evaluation bias.

To this end, we will release a dataset of music features precomputed from audio and genre annotations for a large corpus of music recordings. The music features are taken from the community-built database AcousticBrainz. The ground-truth genre annotations come from four independent websites. The annotations include both broad genre categories and their subgenres, forming a two-level tree structure. These datasets partially intersect: a music recording in the corpus is not necessarily annotated by all data sources. Each instance in a dataset is thus a vector of music features representing a particular music recording (a song), annotated with one or more genre labels from one or more ground truths.
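To make this structure concrete, the sketch below shows one way such an instance could be represented in Python. All field names, the "genre---subgenre" label encoding and the example values are illustrative assumptions only; the released files may use a different layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Recording:
    """One corpus instance: a feature vector plus per-source genre labels.

    Field names here are illustrative, not the released data format.
    """
    mbid: str                                    # MusicBrainz recording identifier
    features: List[float]                        # precomputed AcousticBrainz features
    # Mapping from annotation source to its genre/subgenre labels; a recording
    # may be annotated by only some of the four sources.
    labels: Dict[str, List[str]] = field(default_factory=dict)

# Hypothetical instance annotated by two of the four sources; subgenres are
# written as "genre---subgenre" purely for illustration.
example = Recording(
    mbid="00000000-0000-0000-0000-000000000000",
    features=[0.42, 0.13, 0.77],                 # truncated for illustration
    labels={
        "discogs": ["rock", "rock---indie rock"],
        "lastfm": ["rock", "rock---alternative"],
    },
)
```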

The task has two subtasks. In both subtasks, participants must create a system that takes the provided music features as input and predicts genre labels as output.

Subtask 1: Single-source Classification. This subtask will explore conventional systems, each one trained on a single dataset. Participants will submit predictions for the test set of each dataset separately, following their respective class spaces (genres and subgenres). These predictions will be produced by a separate system for each dataset, trained without any information from the other sources.

[Figure: Subtask 1 overview]
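As a concrete starting point, the following sketch shows a minimal single-source baseline using scikit-learn: one multi-label classifier per dataset, trained only on that dataset's development set. The variable names and the choice of logistic regression are assumptions, not a prescribed approach.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

def train_single_source(X_train, y_train_labels):
    """Train one multi-label classifier on one source's development set.

    X_train: (n_recordings, n_features) feature matrix
    y_train_labels: list of label lists, e.g. [["rock", "rock---indie rock"], ...]
    """
    binarizer = MultiLabelBinarizer()
    Y = binarizer.fit_transform(y_train_labels)              # multi-hot label matrix
    clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
    clf.fit(X_train, Y)
    return clf, binarizer

def predict_single_source(clf, binarizer, X_test):
    """Predict genre/subgenre labels for the test set of the same source."""
    Y_pred = clf.predict(X_test)
    return binarizer.inverse_transform(Y_pred)               # back to label tuples
```

Under this setup, four such systems would be trained, one per ground-truth source.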

Subtask 2: Multi-source Classification. This subtask will explore how to combine several ground-truth sources to create a classification system. We will provide the same four test sets, each created from one of the four data sources. Participants will submit predictions for each test set separately, again following each corresponding genre class space. Predictions may be produced by a single system for all datasets or by one system for each dataset. Participants are free to make their own decision, however, about how to combine the training data/ground truth.

[Figure: Subtask 2 overview]
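The combination strategy is left entirely to participants. One naive option, sketched below under the same assumptions as the Subtask 1 baseline, is late fusion by stacking: the Subtask 1 models of the other sources produce genre probability estimates that are appended to the audio features before training the target-source classifier.

```python
import numpy as np

def stack_features(X, other_models):
    """Append genre probability estimates from other sources' models to the features.

    X: (n_recordings, n_features) audio-feature matrix
    other_models: list of (clf, binarizer) pairs from Subtask 1, trained on
                  the *other* ground-truth sources
    """
    extra = [clf.predict_proba(X) for clf, _ in other_models]  # one block per source
    return np.hstack([X] + extra)

# Usage sketch (all names hypothetical): to build a Subtask 2 system for the
# Discogs test set, augment the Discogs features with predictions from the
# AllMusic, Last.fm and Tagtraum models, then retrain as in Subtask 1.
# X_aug = stack_features(X_discogs_train, [allmusic_model, lastfm_model, tagtraum_model])
# clf, binarizer = train_single_source(X_aug, y_discogs_train)
```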

Submissions for subtask 1 will serve as baselines for subtask 2, which will allow us to study the effect of learning from multiple sources at the same time.

Target group
Researchers in the areas of music information retrieval, machine learning, mathematical and computational modeling. Because we will release pre-computed features instead of raw audio, experience with audio and music processing is not necessary.

Data
We will provide a corpus built using data from AcousticBrainz, an open database of music features extracted by open-source audio analysis tools for a large number of music recordings (currently more than 2.2 million unique recordings) [1, 2]. This corpus includes editorial metadata (artist, album, track name) and unique recording identifiers (MBIDs - MusicBrainz Identifiers) from MusicBrainz. We have collected ground-truth genre annotations for the recordings that exist in AcousticBrainz by matching their editorial metadata against various music databases [3].

At least four datasets containing genre annotations for specific MBIDs will be provided (including development/testing sets for each one). Annotations from different sources can be compared using MBIDs. The datasets will be created from the following genre annotation sources:
  • Discogs (community-built database of editorial metadata): 730,849 recordings, ~1.6 genre tags per recording
  • AllMusic (online music database with annotations by expert editorial staff): 789,423 recordings, ~1.3 genres and ~3 subgenres per recording
  • Last.fm (genre annotations inferred from collaborative tags): 1,031,100 recordings, 22.9 tags per recording (all tags, not just genres)
  • Tagtraum (genre annotations inferred from collaborative tags)
All four datasets are based on different genre taxonomies with different vocabularies. Discogs and AllMusic are built by music enthusiasts and experts, and contain explicit genre and subgenre annotations according to a fixed schema and guidelines. Last.fm and Tagtraum follow a more free-form folksonomy model with no imposed structure, from which genre and subgenre annotations will be extracted through tag cleaning and co-occurrence analysis [6]. As these sources are independent, there is no expectation of agreement on the genre labels given to a specific recording, or even on the meaning of the labels themselves. Depending on participant interest, we may also provide a combined dataset created by refining the annotations from all four sources [6].
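For orientation, the sketch below joins annotations from the four sources by MBID with pandas, which makes the partial intersection of the datasets directly visible. The file names and the "mbid"/"genre" column layout are assumptions for illustration and may not match the released files.

```python
import pandas as pd

# Hypothetical file names and column layout; the released files may differ.
sources = {
    "discogs": "discogs-train.tsv",
    "allmusic": "allmusic-train.tsv",
    "lastfm": "lastfm-train.tsv",
    "tagtraum": "tagtraum-train.tsv",
}

# Collect, per source, the set of genre labels attached to each recording.
frames = []
for name, path in sources.items():
    df = pd.read_csv(path, sep="\t")                          # assumed columns: mbid, genre
    frames.append(df.groupby("mbid")["genre"].apply(set).rename(name))

# Outer join on MBID: recordings missing from a source get NaN in that column,
# reflecting the partial intersection of the four datasets.
merged = pd.concat(frames, axis=1, join="outer")
print(merged.notna().sum())                                   # recordings annotated per source
```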

Ground truth and evaluation
Ground truth annotations will be mined from online music databases [3]. By using various datasets for evaluation we will try to avoid potential evaluation bias due to the intrinsic subjectivity of many genre categories (e.g., Rock vs Alternative), which leads to inconsistencies in genre annotations across different ground truths. In addition, we will study how system comparisons change when considering results on a dataset-by-dataset basis as opposed to all datasets at the same time.

Participants will train their systems on provided development sets and submit genre/subgenre estimations for the respective test sets available in each subtask.

For evaluation, we will use standard measures for classification adapted to our setting [4]. The task is multi-label and uses a two-level hierarchy, so we will mainly employ hierarchical versions of Precision (hP), Recall (hR) and F-measure (hF) as defined in [5]. In particular, for each track we will compute hP, hR and hF of the predicted genres and macro-average them across tracks (hF will be the main evaluation measure). This will allow us to study the expected performance for a new track. In addition, for each genre we will compute binary (non-hierarchical) P, R and F over the tracks for which the genre was predicted, again macro-averaging. This will allow us to study possible biases across genres.
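To make these measures concrete, here is a minimal sketch of per-track hP, hR and hF under the two-level hierarchy, following the set-based definition in [5]: every predicted and true label is extended with its ancestor genre before computing precision and recall. The "genre---subgenre" label encoding is an assumption for illustration.

```python
def extend_with_ancestors(labels):
    """Add the parent genre of every subgenre label.

    Labels are assumed to be encoded as "genre" or "genre---subgenre";
    the actual dataset encoding may differ.
    """
    extended = set()
    for label in labels:
        extended.add(label)
        if "---" in label:
            extended.add(label.split("---")[0])   # ancestor genre
    return extended

def hierarchical_prf(predicted, truth):
    """Per-track hierarchical precision, recall and F-measure (cf. [5])."""
    P = extend_with_ancestors(predicted)
    T = extend_with_ancestors(truth)
    overlap = len(P & T)
    hP = overlap / len(P) if P else 0.0
    hR = overlap / len(T) if T else 0.0
    hF = 2 * hP * hR / (hP + hR) if hP + hR else 0.0
    return hP, hR, hF

# A correct genre with a wrong subgenre still receives partial credit
# through the shared ancestor:
print(hierarchical_prf({"rock---punk"}, {"rock---indie rock"}))  # (0.5, 0.5, 0.5)
```

The per-track values would then be macro-averaged over the test set, as described above.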

Recommended reading
[1] Porter, A., Bogdanov, D., Kaye, R., Tsukanov, R., Serra, X. Acousticbrainz: a community platform for gathering music information obtained from audio. In Proceedings of the 16th International Society for Music Information Retrieval Conference. Málaga, Spain, 2015, 786-792.

[2] Bogdanov, D., Wack, N., Gómez, E., Gulati, S., Herrera, P., Mayor, O., Roma, G., Salamon, J., Zapata, J., Serra, X. ESSENTIA: an audio analysis library for music information retrieval. In Proceedings of the 14th International Society for Music Information Retrieval Conference. Curitiba, Brazil, 2013, 493-498.

[3] Porter, A., Bogdanov, D., Serra, X. Mining metadata from the web for AcousticBrainz. In Proceedings of the 3rd International workshop on Digital Libraries for Musicology. New York, USA, 2016, 53-56. ACM.

[4] Silla, C.N., Freitas, A. A. A survey of hierarchical classification across different application domains. Data Mining and Knowledge Discovery, 2011, 22(1-2), 31-72.

[5] Kiritchenko, S., Matwin, S., Famili, A.F. Functional annotation of genes using hierarchical text categorization. In Proceedings of the ACL workshop on linking biological literature, ontologies and databases: mining biological semantics. 2005.

[6] Schreiber, H. Improving genre annotations for the million song dataset. In Proceedings of the 16th International Society for Music Information Retrieval Conference. Málaga, Spain, 2015, 242-247.

[7] Pachet, F., Cazaly, D. A taxonomy of musical genres. In Content-Based Multimedia Information Access, RIAO 2000, volume 2, 1238-1245.

Task organizers
Dmitry Bogdanov, Music Technology Group, Universitat Pompeu Fabra, Spain (first.last @upf.edu)
Alastair Porter, Music Technology Group, Universitat Pompeu Fabra, Spain (first.last @upf.edu)
Julián Urbano, Music Technology Group, Universitat Pompeu Fabra, Spain
Hendrik Schreiber, Tagtraum Industries Incorporated, USA

Task schedule
1 May: Development/Test data release
14 August: Run submission
21 August: Results returned to participants
28 August: Working notes paper deadline
13-15 Sept: MediaEval Workshop in Dublin

Acknowledgements
AcousticBrainz and Audio Commons


This research has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 688382.