The 2018 Predicting Media Memorability Task

Task description
The Predicting Media Memorability Task addresses the problem of predicting how memorable a video will be. Participants are required to automatically predict memorability scores for videos, where a score reflects the probability that a video will be remembered. Participants will be provided with an extensive dataset of videos with memorability annotations and pre-extracted state-of-the-art visual features. The ground truth was collected through recognition tests and therefore reflects objective measures of memory performance. In contrast to previous work on image memorability prediction, where memorability was measured only a few minutes after memorization, the dataset comes with both ‘short-term’ and ‘long-term’ memorability annotations. Because memories continue to evolve in long-term memory, in particular during the first day following memorization, we expect the long-term annotations to be more representative of long-term memory performance, which is the measure of interest in many applications. Participants will be required to train computational models capable of inferring video memorability from visual content. Optionally, the descriptive titles attached to the videos may be used. Models will be evaluated with standard evaluation metrics used in ranking tasks.

Two subtasks will be offered to participants:
(1) Short-term Memorability Subtask: the task involves predicting a ‘short-term’ memorability score for a given short video.
(2) Long-term Memorability Subtask: the task involves predicting a ‘long-term’ memorability score for a given short video.
For both subtasks, participants will be allowed to use external data, depending on the run.

The motivation for this task derives from the need for new techniques that help organize and retrieve digital content and make it more useful in our daily lives. The problem is a pressing one, since media platforms such as social networks, search engines, and recommender systems must deal with ever-growing amounts of content. Like other cues of video importance, such as aesthetics or interestingness, memorability can help in choosing between otherwise comparable videos. Consequently, a large number of applications, e.g., education and learning, content retrieval and search, content summarization, storytelling, targeted advertising, and content recommendation and filtering, would benefit from models capable of ranking videos according to their memorability.

Target group
Researchers will find this task interesting if they work in areas related to human perception and scene understanding, including (but not limited to) image and video interestingness, memorability, attractiveness, and aesthetics prediction; event detection; multimedia affect and perceptual analysis; multimedia content analysis; and machine learning.

Data
The dataset is composed of 10,000 short (soundless) videos extracted from raw footage used by professionals when creating content. The videos are shared under a license that allows their use and redistribution solely in the context of MediaEval 2018. They come with a set of pre-extracted features, such as HoG descriptors, LBP, GIST, Color Histogram, Fc7 layer from Inception, C3D features, etc.

Ground truth and evaluation
Each video is a coherent unit in terms of meaning and is associated with two memorability scores, each reflecting the probability that the video is remembered after a different retention interval. Memorability was measured objectively through recognition tests: first a few minutes after memorization of the videos, and again 24 to 72 hours later.
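
As a rough illustration of how a recognition test can yield a probability-like score, the sketch below assumes that a video's score is simply the fraction of participants who recognized it when it was shown again. This is an assumption made for illustration only: the function name and the response lists are hypothetical, and the actual annotation protocol may apply corrections that are not described here.

```python
def recognition_rate(responses):
    """Fraction of participants who recognized the video.

    `responses` is a list of booleans, one per participant: True if the
    participant recognized the video on its second presentation.
    ASSUMPTION: the memorability score is this raw proportion.
    """
    if not responses:
        raise ValueError("at least one response is required")
    return sum(responses) / len(responses)


# Hypothetical responses for one video at the two retention intervals.
short_term_responses = [True] * 9 + [False]        # a few minutes later
long_term_responses = [True, True, False, True]    # 24 to 72 hours later

short_term_score = recognition_rate(short_term_responses)
long_term_score = recognition_rate(long_term_responses)
print(short_term_score, long_term_score)
```

Under this assumption, each video ends up with two values between 0 and 1, one per retention interval, which matches the two scores described above.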

The outputs of the prediction models – i.e., the predicted memorability scores for the videos – will be compared with ground truth memorability scores using classic evaluation metrics (e.g., Spearman’s rank correlation).
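
To make the example metric concrete, here is a minimal pure-Python sketch of Spearman's rank correlation, computed as the Pearson correlation of the rank vectors, with tied values given the average of their ranks. The function names and the sample scores are illustrative assumptions; this is not the official evaluation script, whose exact handling of ties and output format is not specified here.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(predicted, ground_truth):
    """Spearman's rank correlation between two equally long score lists."""
    if len(predicted) != len(ground_truth) or len(predicted) < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    rx, ry = average_ranks(predicted), average_ranks(ground_truth)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical predicted and ground-truth scores for four videos.
predicted = [0.9, 0.7, 0.8, 0.6]
ground_truth = [0.95, 0.80, 0.70, 0.75]
print(spearman(predicted, ground_truth))
```

Because the metric depends only on ranks, a model is rewarded for ordering videos correctly by memorability, not for matching the absolute score values.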

Recommended reading
[1] Aditya Khosla, Akhil S Raju, Antonio Torralba, and Aude Oliva. 2015. Understanding and predicting image memorability at a large scale. In Proc. IEEE Int. Conf. on Computer Vision (ICCV). 2390–2398.
[2] Phillip Isola, Jianxiong Xiao, Devi Parikh, Antonio Torralba, and Aude Oliva. 2014. What makes a photograph memorable? IEEE Transactions on Pattern Analysis and Machine Intelligence 36, 7 (2014), 1469–1482.
[3] Hammad Squalli-Houssaini, Ngoc Duong, Gwenaëlle Marquant, and Claire-Hélène Demarty. 2018. Deep learning for predicting image memorability. In Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP).
[4] Junwei Han, Changyuan Chen, Ling Shao, Xintao Hu, Jungong Han, and Tianming Liu. 2015. Learning computational models of video memorability from fMRI brain imaging. IEEE Transactions on Cybernetics 45, 8 (2015), 1692–1703.
[5] Sumit Shekhar, Dhruv Singal, Harvineet Singh, Manav Kedia, and Akhil Shetty. 2017. Show and Recall: Learning What Makes Videos Memorable. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2730–2739.
[6] Romain Cohendet, Karthik Yadati, Ngoc K.Q. Duong, and Claire-Hélène Demarty. 2018. Annotating, Understanding, and Predicting Long-term Video Memorability. In Proceedings of the ACM International Conference on Multimedia Retrieval (ICMR).
Please note that a dataset for predicting long-term video memorability is publicly available with [6] at the following address:

Task organizers
Romain Cohendet, Technicolor, France (romain.cohendet at
Claire-Hélène Demarty, Technicolor, France (claire-helene.demarty at
Quang-Khanh-Ngoc Duong, Technicolor, France
Bogdan Ionescu, University Politehnica of Bucharest, Romania
Mats Sjöberg, Aalto University, Finland
Thanh-Toan Do, ARC Center of Excellence for Robotic Vision (ACRV), The University of Adelaide, Australia

Task auxiliaries
Ricardo Savii, Federal University of São Paulo
Mihai Gabriel Constantin, University Politehnica of Bucharest

Task schedule
Development data release: 24 May 2018
Test data release: 25 June 2018
Runs due: 1 October 2018
Working Notes paper due: 17 October 2018