Genre Tagging

The 2011 Genre Tagging Task
The task requires participants to automatically assign tags to Internet videos using features derived from speech, audio and visual content, or from associated textual or social information. This year we will focus on tags that reflect the genre of the video (e.g., "tutorial", "sport", "spoof").

Target group
Researchers in the area of multimedia retrieval, spoken content search and social media.

Data
The task will reuse the data set of the 2010 Wild Wild Web Task, applying it to a different task. The data set was gathered from a range of blip.tv shows (i.e., channels). It contains approximately 350 hours of video, for a total of 1974 episodes (247 development / 1727 test). The episodes were drawn from 460 different shows; shows with fewer than four episodes were not considered for inclusion in the data set. The set is predominantly English, with approximately 6 hours of non-English content divided among French, Spanish and Dutch. All videos are shared by their owners under a Creative Commons license.

Participants are provided with a video file for each episode along with metadata (e.g., title and description), speech recognition transcripts and social network information gathered from Twitter (i.e., who tweeted to whom about which video).

Ground truth and evaluation
The ground truth will be genre-related tags assigned by users to their videos. We will carry out a manual process for normalizing and de-noising the tags. The official evaluation metric will be Mean Average Precision (MAP).
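To make the evaluation metric concrete, the following is a minimal sketch of Mean Average Precision, assuming that each genre tag is treated as a query for which a system returns a ranked list of episodes, and that the ground truth gives the set of episodes carrying that tag. The video IDs and the function names `average_precision` and `mean_average_precision` are hypothetical and for illustration only; details of the official scoring (e.g., rank cutoffs) may differ.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision for one genre (hypothetical helper).

    Sums precision@k at every rank k where a relevant episode appears,
    then divides by the total number of relevant episodes, so relevant
    episodes missing from the ranking contribute zero.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, video_id in enumerate(ranked, start=1):
        if video_id in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], truth: Dict[str, Set[str]]
) -> float:
    """Mean of per-genre average precision over all genres in the ground truth.

    A genre with no submitted ranking is scored as an empty list (AP = 0).
    """
    if not truth:
        return 0.0
    scores = [average_precision(runs.get(g, []), truth[g]) for g in truth]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical ground truth and system output for two genre tags.
    truth = {"tutorial": {"v1", "v3"}, "sport": {"v2"}}
    runs = {"tutorial": ["v1", "v2", "v3"], "sport": ["v3", "v2"]}
    print(round(mean_average_precision(runs, truth), 4))  # prints 0.6667
```

In this example, "tutorial" scores (1/1 + 2/3) / 2 = 5/6 and "sport" scores 1/2, so MAP is their mean, 2/3. Dividing by the number of relevant episodes, rather than by the number retrieved, penalizes a system for failing to return relevant episodes at all.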

Recommended reading
Brezeale, D. and Cook, D.J., "Automatic Video Classification: A Survey of the Literature", IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 38, no. 3, pp. 416-430, May 2008.

Task organizers:
Martha Larson, Delft University of Technology
Sebastian Schmiedeke, Technical University of Berlin
Christoph Kofler, Delft University of Technology
Isabelle Ferrané, Université Paul Sabatier

MediaEval Benchmarking Initiative for Multimedia Evaluation

The "multi" in multimedia: speech, audio, visual content, tags, users, context