MediaEval 2018

MediaEval is a benchmarking initiative dedicated to developing and evaluating new algorithms and technologies for multimedia retrieval, access and exploration. It offers tasks to the research community that are related to human and social aspects of multimedia. MediaEval emphasizes the 'multi' in multimedia and seeks tasks involving multiple modalities, e.g., audio, visual, textual, and/or contextual. Our larger aim is to promote reproducible research that makes multimedia a positive force for society.

Organizing a task: MediaEval tasks are run by autonomous groups of task organizers, who submit task proposals. Please see the MediaEval 2018 Call for Task Proposals for full details. Proposals are selected on the basis of a viability check and a survey that confirms community interest. The force that makes MediaEval possible is the vision, dedication and hard work of the MediaEval task organizers: we look forward to welcoming you into the team.

2018 Task List
The 2018 task list will be finalized at the end of February. We will continue adding tasks to this provisional list, so that potential participants know what to expect:

Multimedia Satellite Task: Emergency Response for Flooding Events
The purpose of this task is to augment events captured by satellite images with social multimedia content in order to provide a more comprehensive view. In 2018, we will again focus on flooding. The task involves two subtasks: flood detection in satellite images and flood classification in social multimedia. Participants receive data and are required to train classifiers. Fusion of satellite and social multimedia information is encouraged. The task moves forward the state of the art by concentrating on aspects that are important to people but are not generally studied by multimedia researchers, such as the extent to which an area has been affected by a flood in terms of human-specific aspects such as road access.

Medico Multimedia Task
The goal of the task is efficient processing of medical multimedia data for disease prediction. Participants are provided with images and videos of the human gastrointestinal tract and are required to develop a classifier that minimizes the necessary resources (processing time, training data). The ground-truth labels are created by medical experts. The task differs from existing medical imaging tasks in that it uses only multimedia data (i.e., images and videos) and not medical imaging data (i.e., CT scans). A further innovation is its focus on two non-functional requirements: using as little training data as possible and being computationally efficient. In addition, this year's task will also include one-shot classification as one of the subtasks.
The task lays the basis for automatic, real-time generation of medical reports on the basis of recordings made by standard and capsule endoscopies.

Eyes and Ears Together (EET)
Processing of multimedia content has conventionally been treated as a set of parallel monomedia tasks, e.g., speech recognition, computer vision, and late fusion in video search. Real-world processing of this content, however, is inherently multimodal: cross-modal processing provides understanding of objects and language, and interpretation of events. Interest is now emerging in extending conventional monomedia processing to exploit multimodal signals, e.g., multimodal speech recognition and visual grounding in natural language processing. Eyes and Ears Together (EET) will offer tasks exploring new and emerging multimodal recognition and understanding problems. Potential tasks for MediaEval 2018 include the use of computer vision for pronoun resolution, e.g., resolving "it" or "he" in a transcript using visual signals; naming people based on information in a transcript, e.g., identifying a person found in an image; and the use of visual signals to support speech processing.

Pixel Privacy Task
This task develops image enhancement approaches that protect user privacy. Specifically, it is dedicated to creating technology that invisibly changes or visibly enhances images in such a way that it is no longer possible to automatically infer the location at which they were taken. The task has two subtasks: "geo-protect" and "geo-predict". The "geo-protect" subtask requires participants to develop protective image enhancements (evaluated with respect to a user study), and the "geo-predict" subtask requires participants to predict the geo-location of an image despite the protective enhancement (evaluated using the distance to the correct location). This task advances the state of the art in multimedia analysis by investigating the interplay between what users intend to communicate with their images (which must be preserved) and what they do not intend to communicate (which must be protected).

AcousticBrainz Genre Task: Content-based music genre recognition from multiple sources
The goal of our task is to understand how genre classification can explore and address the subjective and culturally dependent nature of genre categories. Traditionally, genre classification is performed using a single source of ground truth with broad genre categories as class labels. In contrast, this task explores how to combine multiple sources of annotations, each of which is more detailed. Each source has a different genre class space, providing an opportunity to analyze the problem of music genre recognition from new perspectives, with the potential of reducing evaluation bias.

Predicting Media Memorability Task
The purpose of this task is the automatic prediction of multimedia content memorability. Understanding what makes content memorable has a very broad range of current applications, e.g., education and learning, content retrieval and search, content summarization, storytelling, targeted advertising, and content recommendation and filtering. Efficient memorability prediction models will also push forward the semantic understanding of multimedia content by putting human perception, through memorability, at the center of scene understanding. For the task, participants will be provided with extensive datasets of multimedia content (images and/or videos) associated with memorability annotations, along with pre-extracted state-of-the-art audio-visual features. The corresponding ground truth consists of objective measures of memory performance collected through recognition tests. Participants will be required to train computational models capable of inferring multimedia content memorability from features of their choice. Models will be evaluated using standard evaluation metrics. (The task is an evolution of the Predicting Media Interestingness task.)

Emotional Impact of Movies task
In this task, the goal is to develop systems that predict the emotional impact of movies. It builds on last year's edition, integrating participants' feedback. Two subtasks are proposed: (1) predicting induced valence and arousal scores continuously along movies, and (2) predicting the beginning and ending times of sequences inducing fear in movies. The training data will consist of Creative Commons-licensed movies (professional and amateur) together with human annotations of valence, arousal and fear. The results on a test set will be evaluated using standard evaluation metrics.

Summarizing E-Sport Matches
E-sport is huge. Already in 2013, concurrent users for a single event exceeded eight million for a League of Legends championship. In 2016, approximately 161 million viewers accessed e-sport streams frequently. The rich bouquet of data, including audio and video streams, commentaries, game data and statistics, interaction traces, viewer-to-viewer communication and many more channels, allows for particularly challenging multimedia research questions. In this task, we encourage participants to think up and investigate ways to summarize how e-sport matches ramp up, evolve and play out over time. Instead of merely listing highlights, the summary needs to present an engaging and captivating story that boils down the thrill of the game to its mere essence. Training and test data for the task are provided in cooperation with ZNIPE.TV, a rapidly growing platform for e-sport streaming. Participants are asked to create a specific number of summaries, which are evaluated by an expert panel. The expert panel will include professionals from ZNIPE.TV and researchers from the fields of game studies and non-linear narratives. The exact criteria for evaluating submissions will be made available to participants in the in-depth task description.

Recommending Movies Using Content: Which content is key?
This task explores ways in which multimedia content is useful for movie recommendation. Participants are supplied with features (audio, visual and textual modalities) corresponding to a large subset of the movies in the well-known MovieLens 20M data set (20 million ratings and 465,000 tag assignments applied to 27,000 movies by 138,000 users). The goal of the task is to improve the accuracy and diversity of top-N recommendation. The emphasis will be on exploiting features derived from trailers. On a small set of movies, the difference between using different parts of a trailer and different parts of the complete movie will be explored. Currently, it is not known how much of a movie, or which part of a movie, is essential for successful recommendation. This task will provide insight into how content can best be used to replace user-item interactions for successful movie recommendation.

NewsREEL Multimedia: News recommendation with image/text content
The goal of this task is to gain insight into the relationship between images accompanying news articles, and the number of times these articles are clicked by users. The task is an offshoot of the independent NewsREEL challenge, which allows researchers to test news recommendation algorithms online in real world conditions. These conditions are characterized by a rapidly changing set of news items, a sparsity of interactions, short interaction sessions, and no user information. (This description will be extended as soon as possible.)

Analysing Social Behavior in Crowded Settings
An important but under-explored problem is the automated analysis of conversational dynamics in large unstructured social gatherings such as networking or mingling events. Research has shown that attending such events contributes greatly to career and personal success. While much progress has been made in the analysis of small pre-arranged conversations, scaling up robustly presents a number of fundamentally different challenges. This task focuses on analysing one of the most basic elements of social behavior: the detection of speaking turns. Research has shown the benefit of deriving features from speaking turns for estimating many different social constructs, such as dominance or cohesion, to name but a few. Unlike traditional tasks that have used audio for this, here the idea is to leverage the body movements (i.e., gestures) that accompany speech production, captured from video and/or wearable acceleration and proximity sensors. The benefit is that this enables a more privacy-preserving method of extracting socially relevant information and has the potential to scale to settings where recording audio may be impractical.

Participating in a task: MediaEval attracts researchers who are interested in community-based benchmarking. This means that they are not only interested in creating solutions to MediaEval tasks, but also in discussing and exchanging ideas with other researchers taking part in MediaEval. Researchers who are primarily looking to achieve a high rank in a benchmark, and are not interested in attending the workshop or in engaging in discussion of techniques and results with other researchers, do not benefit from the unique community-driven character of MediaEval.

MediaEval 2018 Timeline

If you are interested in proposing a task:

Indication of Intent: Friday 26 January 2018
Full proposal deadline: Friday 16 February (updated deadline)

If you are interested in participating in a task:
March-May 2018: Registration for task participation
May-June 2018: Development data release
June-July 2018: Test data release
Run submission: End September 2018
Workshop: Late Oct. or Early Nov. 2018

MediaEval 2018 Workshop

The MediaEval 2018 Workshop will be held late October 2018 at EURECOM, Sophia Antipolis, France.

Did you know?
Over its lifetime, MediaEval teamwork and collaboration have given rise to over 700 papers, not only in the MediaEval workshop proceedings but also at conferences and in journals. Check out the MediaEval bibliography.

General Information about MediaEval

MediaEval was founded in 2008 as a track called "VideoCLEF" within the CLEF benchmark campaign. In 2010, it became an independent benchmark and in 2012 it ran for the first time as a fully "bottom-up benchmark", meaning that it is organized for the community, by the community, independently of a "parent" project or organization. The MediaEval benchmarking season culminates with the MediaEval workshop. Participants come together at the workshop to present and discuss their results, build collaborations, and develop future task editions or entirely new tasks. MediaEval co-located itself with CLEF in 2017, with ACM Multimedia in 2010, 2013, and 2016, and with the European Conference on Computer Vision in 2012. It was an official satellite event of Interspeech in 2011 and 2015. Past working notes proceedings of the workshop include:

MediaEval 2015:
MediaEval 2016:
MediaEval 2017: