MediaEval 2018

MediaEval is a benchmarking initiative dedicated to developing and evaluating new algorithms and technologies for multimedia retrieval, access and exploration. It offers tasks to the research community that are related to human and social aspects of multimedia. MediaEval emphasizes the 'multi' in multimedia and seeks tasks involving multiple modalities, e.g., audio, visual, textual, and/or contextual. Our larger aim is to promote reproducible research that makes multimedia a positive force for society.

Register to participate in MediaEval 2018 at the MediaEval 2018 registration site. Once you have registered, please sign and return the data usage agreement:

2018 Task List

Multimedia Satellite Task: Emergency Response for Flooding Events
The purpose of this task is to combine the information inherent in satellite images and social multimedia content in order to provide a more comprehensive view of disaster events. In 2018, we will again focus on flooding. The task involves two subtasks: “Flood classification for social multimedia” and “Flood detection in satellite images”. Participants receive data and are required to train classifiers. Fusion of satellite and social multimedia information is encouraged to solve either of the subtasks. The task moves forward the state of the art by concentrating on aspects that are important to people, but are not generally studied by multimedia researchers. Specifically, we look at the extent to which an area has been affected by a flood in terms of access, which is a practical, human-specific aspect of a flood event. Read more...

Medico Multimedia Task
The goal of the task is efficient processing of medical multimedia data for disease prediction. Participants are provided with images and videos of the human gastrointestinal tract, and are required to develop classifiers that minimize the necessary resources (processing time, training data). The ground truth labels are created by medical experts. The task differs from existing medical imaging tasks in that it uses only multimedia data (i.e., images and videos) and not medical imaging data (i.e., CT scans). A further innovation is its focus on two non-functional requirements: using as little training data as possible and being computationally efficient. In addition, this year's task includes one-shot classification as one of the subtasks. The task lays the basis for automatic, real-time generation of medical reports on the basis of recordings made by standard and capsule endoscopies. Read more...

AcousticBrainz Genre Task: Content-based music genre recognition from multiple sources
In this task, participants are provided with a rich set of features that have been extracted from a very large collection of music tracks, and are asked to create a system that can automatically assign genre labels to the tracks. They are provided with ground truth genre labels from multiple genre trees, each representing a different genre class space. The larger goal of the task is to understand how genre classification can address the subjective and culturally dependent nature of genre categories. Traditionally, genre classification is performed using a single source of ground truth with broad genre categories as class labels. In contrast, this task is aimed at exploring how to combine multiple sources of annotations, each of which is more detailed and represents a different genre class space. Making use of multiple genre class spaces provides an opportunity to analyze the problem of music genre recognition from new perspectives and to support insights into how evaluation bias can be reduced. Read more...
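The idea of multiple genre class spaces can be made concrete: each annotation source defines its own genre tree, and a system produces (and is evaluated on) predictions per source. A minimal sketch of this data layout — the source names and genre labels below are hypothetical illustrations, not the task's actual taxonomies:

```python
# Each annotation source defines its own genre class space; the same
# track can carry labels from several sources at once.
# (Source and label names here are hypothetical.)
annotations = {
    "source_a": {"track1": {"rock", "alternative"}, "track2": {"jazz"}},
    "source_b": {"track1": {"rock---indie"}, "track2": {"jazz---bebop"}},
}

def labels_for(track):
    """Collect a track's genre labels across all sources, keyed by source."""
    return {src: tracks.get(track, set()) for src, tracks in annotations.items()}

print(labels_for("track1"))
```

Keeping the class spaces separate like this, rather than flattening all labels into one vocabulary, is what lets systems be compared across annotation sources with different granularities.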

Emotional Impact of Movies Task
In this task, the goal is to develop systems designed to predict the emotional impact of movies. The task involves two subtasks: (1) predicting induced valence and arousal scores continuously along movies, and (2) predicting beginning and ending times of sequences inducing fear in movies. The training data will consist of Creative Commons-licensed movies (professional and amateur) together with human annotations of valence, arousal and fear. The results on a test set will be evaluated using standard evaluation metrics. Read more...
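The task description does not fix the metrics beyond "standard evaluation metrics"; for continuous valence/arousal prediction, mean squared error and Pearson correlation are common choices. A minimal pure-Python sketch, where the scores are hypothetical per-second predictions against human annotations:

```python
import math

def mse(pred, truth):
    """Mean squared error between predicted and annotated scores."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred)

def pearson(pred, truth):
    """Pearson correlation between predicted and annotated scores."""
    n = len(pred)
    mp, mt = sum(pred) / n, sum(truth) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, truth))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in truth))
    return cov / (sp * st)

# Hypothetical per-second valence scores in [-1, 1].
pred  = [0.1, 0.3, 0.5, 0.4, 0.2]
truth = [0.0, 0.2, 0.6, 0.5, 0.1]
```

MSE rewards small absolute errors, while Pearson correlation rewards tracking the shape of the annotated curve; reporting both is typical because a system can do well on one and poorly on the other.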

Predicting Media Memorability Task (New!)
The purpose of this task is the automatic prediction of multimedia content memorability. Understanding what makes content memorable has a very broad range of current applications, e.g., education and learning, content retrieval and search, content summarization, storytelling, targeted advertising, content recommendation and filtering. Efficient memorability prediction models will also push forward the semantic understanding of multimedia content, by placing human perception, via memorability, at the center of scene understanding. For the task, participants will be provided with extensive datasets of multimedia content (images and/or videos) associated with memorability annotations, and with pre-extracted state-of-the-art audio-visual features. The corresponding ground truth consists of objective measures of memory performance and has been collected through recognition tests. Participants will be required to train computational models capable of inferring multimedia content memorability from features of their choice. Models will be evaluated using standard evaluation metrics. (The task is an evolution of the Predicting Media Interestingness task.) Read more...

Human Behavior Analysis Task: No-Audio Multi-Modal Speech Detection in Crowded Social Settings (New!)
An important but under-explored problem is the automated analysis of conversational dynamics in large unstructured social gatherings such as networking or mingling events. Research has shown that attending such events contributes greatly to career and personal success. While much progress has been made in the analysis of small pre-arranged conversations, scaling up robustly presents a number of fundamentally different challenges. This task focuses on analysing one of the most basic elements of social behavior: the detection of speaking turns. Research has shown the benefit of deriving features from speaking turns for estimating many different social constructs, such as dominance or cohesion, to name but a few. Unlike traditional tasks that have used audio to do this, here the idea is to leverage the body movements (i.e., gestures) that are performed during speech production and that are captured from video and/or wearable acceleration and proximity sensors. The benefit of this is that it enables a more privacy-preserving method of extracting socially relevant information and has the potential to scale to settings where recording audio may be impractical. Read more...

GameStory: Video Game Analytics Challenge (New!)
E-sport is huge. Already in 2013, concurrent users for a single event exceeded eight million for a League of Legends Championship. In 2017, approximately 143 million viewers accessed e-sport streams frequently. The rich bouquet of data, including audio and video streams, commentaries, game data and statistics, interaction traces, viewer-to-viewer communication and many more channels, allows for particularly challenging multimedia research questions. In this task we encourage participants to think up and investigate ways to summarize how e-sport matches ramp up, evolve and play out over time. Instead of simply enumerating highlights, the summary needs to present an engaging and captivating story, which boils down the thrill of the game to its essence. Training and test data for the task are provided in cooperation with ZNIPE.TV, a rapidly growing platform for e-sport streaming. Participants are asked to create a specific number of summaries, which are evaluated by an expert panel. The expert panel will include professionals from ZNIPE.TV and researchers from the fields of game studies and non-linear narratives. The exact criteria for evaluating submissions will be available to the participants in the in-depth task description. Read more...

Recommending Movies Using Content: Which content is key? (Brave New Task)
This task explores ways in which multimedia content is useful for movie recommendation. Participants are provided with (audio-visual-textual) features computed from trailers and scenes, corresponding to 800 movies in the MovieLens 20M data set. They are required to create an automatic system that can predict the average ratings that users assign to movies and also the rating variance. The emphasis will be on exploiting features derived from trailers and selected movie scenes. Currently, it is not known which parts of a movie need to be analyzed for recommendation, or whether the trailer is a better source of features than content drawn directly from the film. This task will provide insight into how content can be best used to replace user-item interactions for successful movie recommendation. Read more...

Pixel Privacy Task (Brave New Task)
This task develops image enhancement approaches that protect user privacy. Specifically, it is dedicated to creating technology that invisibly changes or visibly enhances images in such a way that it is no longer possible to automatically infer the location at which they were taken. Participants receive a set of images (representative of images shared on social media) and are required to enhance them. The enhancement should achieve two goals: (1) Protection: it must block the ability of an automatic pixel-based algorithm to correctly predict the setting/location at which the photo was taken (i.e., prevent automatic inference), and (2) Appeal: it must make the image more beautiful or interesting from the point of view of the user (or at least not ruin the image from the user's point of view). The task extends the state of the art by looking at the positive (protective) ability of adversarial machine learning, and also by exploring how people's natural preference for appealing images can be linked to privacy protection. Read more...

NewsREEL Multimedia: News recommendation with image/text content (Brave New Task)
The goal of this task is to gain insight into the relationship between images accompanying news articles, and the number of times these articles are clicked by users. The task is a multimedia-related spinoff of the independent NewsREEL challenge, which allows researchers to test news recommendation algorithms online in real world conditions. These conditions are characterized by a rapidly changing set of news items, a sparsity of interactions, short interaction sessions, and no user information. Read more...

Task Force
Task forces are groups of people working together to design and plan a task to be offered in future years. If you are interested in the work of a task force and would like to have more information about what is being planned, please contact: m.a.larson (at)

Eyes and Ears Together (EET)
Processing of multimedia content has conventionally been treated as a set of parallel monomedia tasks, e.g., speech recognition, computer vision, and late fusion in video search. Real-world processing of this content is inherently multimodal, where cross-modal processing provides understanding of objects and language, and interpretation of events. Interest is now emerging in extending conventional monomedia processing to exploit multimodal signals, e.g., multimodal speech recognition and visual grounding in natural language processing. Eyes and Ears Together (EET) will offer tasks exploring new and emerging multimodal recognition and understanding tasks. Potential tasks for MediaEval 2018 include: the use of computer vision for pronoun resolution, e.g., resolving "it" or "he" in a transcript using visual signals; naming people based on information in a transcript, e.g., identifying a person found in an image; and the use of visual signals to support speech processing.

Participating in a task: MediaEval attracts researchers who are interested in community-based benchmarking. This means that they are not only interested in creating solutions to MediaEval tasks, but are also interested in discussing and exchanging ideas with other researchers who are taking part in MediaEval. Researchers who are primarily looking to achieve a high rank in a benchmark, and are not interested in attending the workshop or in engaging in discussion of techniques and results with other researchers, do not benefit from the unique community-driven character of MediaEval.

MediaEval 2018 Timeline

If you are interested in proposing a task:

Indication of Intent: Friday 26 January 2018
Full proposal deadline: Friday 16 February (updated deadline)

If you are interested in participating in a task:
March-May 2018: Registration for task participation
May-June 2018: Development data release
June-July 2018: Test data release
Run submission: End September 2018
Workshop: 29-31 October 2018

MediaEval 2018 Workshop

The MediaEval 2018 Workshop will be held 29-31 October 2018 at EURECOM, Sophia Antipolis, France.

Did you know?
Over its lifetime, MediaEval teamwork and collaboration have given rise to over 700 papers, not only in the MediaEval workshop proceedings but also at conferences and in journals. Check out the MediaEval bibliography.

General Information about MediaEval

MediaEval was founded in 2008 as a track called "VideoCLEF" within the CLEF benchmark campaign. In 2010, it became an independent benchmark, and in 2012 it ran for the first time as a fully "bottom-up benchmark", meaning that it is organized for the community, by the community, independently of a "parent" project or organization. The MediaEval benchmarking season culminates with the MediaEval workshop. Participants come together at the workshop to present and discuss their results, build collaborations, and develop future task editions or entirely new tasks. MediaEval was co-located with CLEF in 2017, with ACM Multimedia in 2010, 2013, and 2016, and with the European Conference on Computer Vision in 2012. It was an official satellite event of Interspeech in 2011 and 2015. Past working notes proceedings of the workshop include:

MediaEval 2015:
MediaEval 2016:
MediaEval 2017: