Media Interestingness

The 2017 Predicting Media Interestingness Task

Task Results
Task overview: [Slides] [Presentation video]
Participant results: [Playlist of all presentation videos]

Task Description
The task requires participants to automatically select images and/or video segments which are considered to be the most interesting for a common viewer. Interestingness of media is to be judged based on visual appearance, audio information and text accompanying the data, including movie metadata.

The Predicting Media Interestingness Task was first offered in 2016. This year's edition is a follow-up that builds incrementally on that experience. Participation in last year's task is not required in order to succeed at this year's task.

Participating systems are expected to select frames/video excerpts that are interesting in the context of helping users decide whether they want to watch a movie. This use scenario is similar to the one in 2016 and derives from a use case at Technicolor: helping professionals illustrate a Video on Demand (VOD) web site by selecting interesting frames and/or video excerpts for the movies.

Two subtasks will be offered to participants:

1) Predicting Image Interestingness

Given a set of key-frames extracted from a certain movie, the task involves automatically identifying those images that viewers report to be interesting. To solve the task, participants can make use of visual content as well as accompanying metadata, e.g., Internet data about the movie, social media information, etc.

2) Predicting Video Interestingness

Given a set of video segments extracted from a certain movie, the task involves automatically identifying the segments that viewers report to be interesting. To solve the task, participants can make use of visual and audio data as well as accompanying metadata, e.g., subtitles, Internet data about the movie, etc.

Target group
Researchers will find this task interesting if they work in the areas of human perception and scene understanding. Relevant topics include, but are not limited to, image/video interestingness, memorability and attractiveness prediction, image aesthetics, event detection, multimedia affect and perceptual analysis, multimedia content analysis, and machine learning.

Data
The data will be extracted from Hollywood-like movies shared under Creative Commons licenses that allow redistribution.

For the video interestingness subtask, the data will consist of movie segments (obtained after manual segmentation). Prediction will be carried out on a per-movie basis. For the image interestingness subtask, the data will consist of collections of key-frames extracted from the video segments used for the video subtask (e.g., one key-frame per segment). This will allow results from the two subtasks to be compared. Prediction will also be carried out on a per-movie basis.

Precomputed features will be provided along with the dataset to help teams from different communities participate in the task.

Ground truth and evaluation
All data will be manually annotated in terms of interestingness by human assessors.

As in last year's edition, a pair-wise comparison protocol will be used. Annotators will be shown one pair of images/video segments at a time and asked to indicate which of the two they find more interesting. The process will be repeated across the whole dataset. To avoid an exhaustive comparison of all possible pairs, a boosting selection method will be employed (i.e., the adaptive square design method). The resulting annotations will then be aggregated into final interestingness degrees for the images/video shots.
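To make the aggregation step more concrete, here is a minimal Python sketch that turns pair-wise judgments into per-item scores. It is only an illustration: the task description does not say which aggregation model is used, so the sketch substitutes a simple win fraction (wins divided by comparisons) as a stand-in, and it does not implement the adaptive square design pair selection. The function names and the (winner, loser) input format are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def win_fraction_scores(comparisons: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Score each item by the fraction of its comparisons that it won.

    `comparisons` holds (winner, loser) pairs, one per annotator judgment.
    This is a stand-in for the task's (unspecified) aggregation method.
    """
    wins: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        total[winner] += 1
        total[loser] += 1
    return {item: wins[item] / total[item] for item in total}


def rank_items(scores: Dict[str, float]) -> List[str]:
    """Return items ordered from most to least interesting (ties broken by name)."""
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical judgments for three key-frames of one movie.
judgments = [("frame_a", "frame_b"), ("frame_a", "frame_c"), ("frame_b", "frame_c")]
scores = win_fraction_scores(judgments)
ranking = rank_items(scores)
```

A win fraction ignores how strong each opponent was; a probabilistic pair-wise model would account for that, at the cost of an iterative fit.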

The official evaluation metric will be the mean average precision at 10 (MAP@10) computed over all videos, where average precision is computed on a per-video basis over the 10 top-ranked images/video segments. MAP@10 is selected because it reflects the VOD use case, where the goal is to select a small set of the most interesting images or video segments.
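As a rough illustration of the metric, the following Python sketch computes MAP@10 under two assumptions that the task description does not state: ground-truth labels are binary (1 = interesting, 0 = not interesting), and average precision at 10 is normalized by min(number of interesting items, 10). The official evaluation tooling may normalize differently, and the function names here are hypothetical.

```python
from typing import Dict, Sequence


def average_precision_at_k(ranked_labels: Sequence[int], k: int = 10) -> float:
    """AP@k for one video.

    `ranked_labels` are binary ground-truth labels, ordered by decreasing
    predicted interestingness. Normalization by min(#relevant, k) is an
    assumption of this sketch.
    """
    n_relevant = sum(ranked_labels)
    if n_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels[:k], start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    return precision_sum / min(n_relevant, k)


def map_at_k(per_video: Dict[str, Sequence[int]], k: int = 10) -> float:
    """Mean of the per-video AP@k values."""
    if not per_video:
        return 0.0
    return sum(average_precision_at_k(v, k) for v in per_video.values()) / len(per_video)


# Hypothetical ranked labels for two movies.
runs = {"movie_1": [1, 0, 1, 0], "movie_2": [1, 1]}
result = map_at_k(runs)
```

Only the first 10 ranked items per video affect the score, so an interesting item ranked 11th or lower contributes nothing.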

Recommended reading
[1] Demarty, C.-H., Sjöberg, M., Ionescu, B., Do, T.-T., Wang, H., Duong, N. Q. K., and Lefebvre, F., MediaEval 2016 Predicting Media Interestingness Task. In Proceedings of MediaEval 2016 Workshop. Hilversum, Netherlands, Oct. 20-21, 2016.

[2] Katti, H., Bin, K. Y., Seng, C. T., Kankanhalli, M., Pre-attentive discrimination of interestingness in images. In Proceedings of the IEEE ICME International Conference on Multimedia and Expo 2008.

[3] Jiang, Y.-G., Wang, Y., Feng, R., Xue, X., Zheng, Y., Yan, H., Understanding and Predicting Interestingness of Videos, In Proceedings of The 27th AAAI Conference on Artificial Intelligence (AAAI-13). Bellevue, Washington, USA, July 14-18, 2013.

[4] Gygli, M., Grabner, H., Riemenschneider, H., Nater, F., van Gool, L., The Interestingness of Images, In Proceedings of the ICCV International Conference on Computer Vision. Sydney, Australia, 2013.

[5] Khosla, A., Sarma, A., and Hamid, R., What Makes an Image Popular?, In Proceedings of the WWW International Conference on World Wide Web, 2014.

[6] Isola, P., Xiao, J., Parikh, J., and Torralba, A., What Makes a Photograph Memorable? IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI, 2014.

[7] Fu, Y., Hospedales, T. M., Xiang, T., Gong, S., Yao, Y., Interestingness Prediction by Robust Learning to Rank, In Proceedings of the ECCV European Conference on Computer Vision, 2014.

[8] Yoon, S., Pavlovic, V., Sentiment Flow for Video Interestingness Prediction, In Proceedings of the ACM International Workshop on Human Centered Event Understanding from Multimedia, 2015.

[9] Soleymani, M., The quest for visual interest. In Proceedings of the ACM Multimedia 2015.

[10] Berlyne, D. E., Conflict, arousal and curiosity, McGraw-Hill, 1960.

[11] Silvia, P. J., Appraisal components and emotion traits: Examining the appraisal basis of trait curiosity, Cognition and Emotion, 2008.

[12] Gygli, M., Soleymani, M., Analyzing and Predicting GIF Interestingness, In Proceedings of the ACM Multimedia, Amsterdam, the Netherlands, 2016.

Task organizers
Claire-Helene Demarty, Technicolor, France (contact person) claire-helene.demarty at technicolor.com
Ngoc Duong, Technicolor, France
Bogdan Ionescu, University Politehnica of Bucharest, Romania
Mats Sjöberg, University of Helsinki, Finland
Toan Do, University of Science, Vietnam & University of Adelaide, Australia
Michael Gygli, ETH Zurich, Switzerland & Gifs.com, US

Task schedule
5 May: Development data release
1 June: Test data release
18 August: Run submission
21 August: Results returned
27 August: Working notes paper: initial submission deadline
4 September: Working notes paper: camera ready deadline
13-15 September: MediaEval Workshop in Dublin

Acknowledgments
Part of the UPB contribution was funded under research grant PN-III-P2-2.1-PED-2016-1065, agreement 30PED/2017, project SPOTTER.

MediaEval Benchmarking Initiative for Multimedia Evaluation