Medico: The 2019 Multimedia for Medicine Task

Task Description
The 2019 Medico task tackles the challenge of predicting various aspects of sperm quality from microscope video recordings of sperm and associated data. The provided dataset consists of 85 videos, a set of sperm characteristics (hormone and fatty-acid measurements, etc.), anonymized data about the study participants, and WHO analysis data (the ground truth for sperm quality assessment).

Task participants are required to predict common measurements used for sperm quality assessment, specifically the motility (movement) and morphology (head, midpiece, and tail defects) of spermatozoa (living sperm). The task encourages participants to combine all the available data sources in order to make their predictions.

Specifically, participants build systems to address three tasks:

1) Prediction of motility. Motility is the ability of an organism to move independently: a progressive spermatozoon is able to move forward, whereas a non-progressive one moves in circles without any forward progression. Movement is predicted in terms of the percentage of progressive and non-progressive spermatozoa. The prediction needs to be performed sample-wise, resulting in one value per sample per predicted attribute. No sperm tracking or bounding boxes are required to solve this task.

2) Prediction of morphology. Morphology is the branch of biology dealing with the study of an organism's form and structural features. In the context of semen, doctors typically examine the three parts that make up a spermatozoon: the head, midpiece, and tail. This task requires predicting the percentage of sperm with head defects, midpiece defects, and tail defects. Like motility, morphology analysis only requires sample-wise prediction, resulting in one value per sample per predicted attribute, and no sperm tracking or bounding boxes are needed.

For both tasks 1) and 2), participants are asked to perform video analysis rather than single-frame analysis. This is important because single-frame analysis cannot capture the movement of the spermatozoa, which carries important information for the predictions.
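One simple way to move from single-frame to video analysis is to feed a model short clips of consecutive frames instead of individual images. The sketch below is only an illustration in NumPy; the clip length and stride are arbitrary choices, not task requirements:

```python
import numpy as np

def make_clips(frames, clip_len=16, stride=8):
    """Group consecutive frames into overlapping clips so that a model
    sees motion across frames rather than isolated images.

    frames: array of shape (n_frames, height, width[, channels])
    returns: array of shape (n_clips, clip_len, height, width[, channels])
    """
    starts = range(0, len(frames) - clip_len + 1, stride)
    return np.stack([frames[i:i + clip_len] for i in starts])

# e.g. 32 grayscale frames of 4x4 pixels -> 3 overlapping clips of 16 frames
clips = make_clips(np.zeros((32, 4, 4)))
```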

3) Unsupervised sperm tracking. Find the spermatozoon that moves fastest compared to all others. This task requires participants to track the spermatozoa. Among the tracked spermatozoa, the fastest is defined in two ways:
i) Fastest average speed: the one that moves the longest distance during the video (total distance / length of the video). This can be calculated by summing the distances (in pixels) between positions in consecutive frames and dividing by the number of frames; and
ii) Highest top speed: the one with the highest instantaneous speed. This can be calculated as the maximum of the inter-frame distances.
One specific challenge of this third subtask is that the video also changes the view on the sample. This happens because the sample is moved under the microscope to observe the complete sample area. Therefore, the tracking has to be performed per viewpoint on the sample.
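Under the definitions above, and assuming a tracker that yields one pixel coordinate per frame for a given spermatozoon within a single viewpoint (the function name and input layout below are illustrative, not part of the task), the two speed measures can be sketched as:

```python
import numpy as np

def speed_measures(positions):
    """positions: (n_frames, 2) array of a tracked spermatozoon's
    pixel coordinates, one per frame, within a single viewpoint."""
    # Euclidean distance moved between each pair of consecutive frames
    steps = np.linalg.norm(np.diff(positions, axis=0), axis=1)
    avg_speed = steps.sum() / len(positions)  # total distance / number of frames
    top_speed = steps.max()                   # largest single inter-frame distance
    return avg_speed, top_speed

# a spermatozoon that moves 5 pixels in one frame and then stays still
avg, top = speed_measures(np.array([[0, 0], [3, 4], [3, 4]]))
```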

For all sub-tasks, we require participants to measure and report the data processing performance in terms of time spent on each analyzed frame. As we aim for real-time applications, processing speed is a key factor, especially for the individual spermatozoon selection task during the in vitro fertilisation procedure.

Task motivation
We hope that this task will encourage the multimedia community to aid in the development of computer-assisted reproductive health and to discover new and clever ways of analyzing multimodal datasets. In addition to good analysis performance, an important aspect is the efficiency of the algorithms, since the assessment of the sperm is performed in real time and therefore requires real-time feedback.

It is important to point out that all data is fully anonymized and handled in accordance with the state of the art in medical-information privacy.

Target group

The task is of interest to researchers in the areas of machine learning (classification), visual content analysis and multimodal fusion. Overall, this task is intended to encourage the multimedia community to help improve the health care system through application of their knowledge and methods to reach the next level of computer and multimedia assisted diagnosis, detection and interpretation of abnormalities.

The VISEM data set contains data from 85 male participants aged 18 years or older. For each participant, we include a set of measurements from a standard semen analysis, a video of live spermatozoa, a sperm fatty acid profile, the fatty acid composition of serum phospholipids, participant-related data, and WHO analysis data. The dataset contains over 35 gigabytes of videos, with each video lasting between two and seven minutes.


Each video has a resolution of 640x480 and runs at 50 frames-per-second. In addition to a folder containing the videos, the dataset contains six CSV-files in total (five data files and one that maps video IDs to participant IDs) and a text file describing some of the columns of the CSV-files. The name of each video file contains the video's ID, the date it was recorded, and a short optional description; the end of the filename contains the code of the person who assessed the video. One row in each CSV-file represents a participant. The provided CSV-files are:
  • semen_analysis_data: The results of standard semen analysis.
  • fatty_acids_spermatozoa: The levels of several fatty acids in the spermatozoa of the participants.
  • fatty_acids_serum: The serum levels of the fatty acids of the phospholipids (measured from the blood of the participant).
  • sex_hormones: The serum levels of sex hormones measured in the blood of the participants.
  • study_participant_related_data: General information about the participants such as age, abstinence time and Body Mass Index (BMI).
  • videos: Overview of which video-file belongs to what participant.
We will provide pre-extracted features for all visual data, i.e., videos. For the final evaluation, participants will use a provided test dataset without annotations.
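Since one row in each CSV-file corresponds to a participant, the tabular sources can be combined into a single feature table with an ID-keyed join. A pandas sketch, in which the column names and values are made up for illustration (in practice the listed files would be read with pd.read_csv):

```python
import io
import pandas as pd

# Stand-ins for two of the provided files, e.g. semen_analysis_data and
# sex_hormones; the "ID" column and the values here are illustrative only.
semen = pd.read_csv(io.StringIO("ID,motile_pct\n1,60\n2,45\n"))
hormones = pd.read_csv(io.StringIO("ID,testosterone\n1,15.2\n2,12.8\n"))

# One row per participant in each file, so a merge on the ID lines them up
features = semen.merge(hormones, on="ID")
```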

The VISEM dataset is publicly available to participants and other multimedia researchers without any restriction. All study participants agreed to donate their data for the purpose of science and provided the necessary consent for us to distribute the data (checked and approved by the Norwegian data authority and ethical committee).

Ground truth and evaluation
For the evaluation, we will use the mean squared error, mean absolute error, and mean absolute percentage error for the first two subtasks. For the ranking, the mean absolute percentage error will be used, which shows the improvement of the automatic prediction compared to the average prediction. For the optional third task, we will use manual evaluation with the help of three different experts in human reproduction.
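For reference, the three error measures for the first two subtasks can be computed as below. This is only a sketch of the standard formulas, not the official evaluation code:

```python
import numpy as np

def regression_errors(y_true, y_pred):
    """Errors for sample-wise predictions (e.g. the percentage of
    progressive spermatozoa per sample)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    return {
        "MSE": float(np.mean(err ** 2)),
        "MAE": float(np.mean(np.abs(err))),
        # undefined when y_true contains zeros
        "MAPE": float(100 * np.mean(np.abs(err / y_true))),
    }

scores = regression_errors([10, 20], [12, 18])
```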

For the processing speed evaluation, we will use the minimum, average, and maximum frame processing time in seconds, measured by the participants as the interval from the moment an image has been completely loaded into memory to the moment the final decision has been made by the corresponding sub-task analysis algorithm.
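The per-frame timing described above can be collected with a simple wrapper; here `analyze` is a placeholder for whichever sub-task algorithm is being measured:

```python
import time

def frame_timings(frames, analyze):
    """Time each frame from the moment it is fully in memory until the
    analysis returns its decision; report min/avg/max in seconds."""
    timings = []
    for frame in frames:              # frame is already loaded into memory here
        start = time.perf_counter()
        analyze(frame)                # the sub-task analysis algorithm
        timings.append(time.perf_counter() - start)
    return min(timings), sum(timings) / len(timings), max(timings)

# dummy frames and a trivial "analysis" just to exercise the wrapper
mn, avg, mx = frame_timings(range(10), lambda f: f * f)
```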

Recommended reading
[1] Riegler, Michael, et al. "Multimedia and Medicine: Teammates for Better Disease Detection and Survival." Proceedings of the 2016 ACM Multimedia Conference. ACM, 2016.

We also recommend having a look at past years' papers in the MediaEval Workshop Proceedings:

Guillaume Gravier et al. (eds.) 2017. Proceedings of the MediaEval 2017 Workshop, Dublin, Ireland, Sept. 13-15, 2017.
Martha Larson et al. (eds.) 2018. Proceedings of the MediaEval 2018 Workshop, Sophia Antipolis, France, Oct. 29-31, 2018.

Task organizers
Steven Hicks, SimulaMet, Norway, steven at
Hugo Lewi Hammer, OsloMet, Norway
Haakon K Stensland, Simula Research Laboratory, Norway
Michael Riegler, SimulaMet, Norway michael at
Konstantin Pogorelov, Simula Research Laboratory, Norway
Pia Smedsrud, Simula Research Laboratory, Norway

Pål Halvorsen, SimulaMet, Norway
Trine B Haugen, OsloMet, Norway
Jorunn Marie Andersen, OsloMet, Norway
Oliwia Witczak, OsloMet, Norway
Rune Borgli, Simula Research Laboratory, Norway
Duc-Tien Dang-Nguyen, University of Bergen, Norway, ductien.dangnguyen at
Mathias Lux, Alpen-Adria-Universität Klagenfurt, Austria

Task schedule
Development data release: 31 May
Test data release: 15 June
Runs due: 20 September
Results returned: 23 September
Working Notes paper due: 30 September
MediaEval 2019 Workshop (in France, near Nice): 27-29 October 2019