The 2019 Multimedia for Recommender System Task

Task description
For this task, participants analyse items and derive feature sets that combine modalities such as audio, images, and text. Subsequently, they implement predictors that estimate which items will be relevant to users.

Participants can target two subtasks that cover different domains.
  • Movie Recommendation subtask: asks participants to predict the average rating that users give a movie, the variance of those ratings as a measure of raters’ agreement, and the popularity of the movie. The provided data set includes links to movie trailers, precomputed state-of-the-art audio-visual features, and metadata from MovieLens.
  • News Recommendation subtask: challenges participants to predict the reading frequencies of news articles. The provided data set comes from a set of German publishers and spans multiple months. It features text snippets, image URLs, and some pre-extracted neural image representations.

Recommender Systems (RecSys) permeate our digital landscape. Whenever users face an overwhelming amount of information, system operators introduce recommender functionalities to preselect a subset of presumably relevant information. Multimedia RecSys investigates what role multimodal data can play in improving recommendations.

Task motivation
Traditionally, recommender systems and multimedia data processing have been studied by separate groups of researchers. These researchers have a lot to learn from each other, and this task offers an interdisciplinary forum that promotes exactly such an exchange.

An often-cited motivation for exploiting features derived from multimedia is the cold-start problem. In this task, however, we also relate the importance of multimedia in recommender systems to the drawbacks of personalization. Personalized information access comes with caveats: predictions succeed for some users and fail for others. Understanding how multimedia affects users’ perception of items facilitates creating fair and unbiased information access systems. Recommender systems have been found to induce “filter bubbles” that prevent access to some information. The high complexity of content data promises to overcome this issue, as content similarities can be defined among all items. Further, the use of multimedia has the potential to promote the development of recommender systems that need less user-specific interaction data to make recommendations, thus promoting privacy.

Target group
Researchers will find this task interesting if they work in multimedia processing, personalization and recommender systems, machine learning, or information retrieval.

Data
The movie dataset includes links to the videos (YouTube URLs), precomputed state-of-the-art audio-visual features, and metadata from MovieLens and other sources. The news dataset is collected from a set of German publishers and spans multiple months. It includes text snippets, image URLs, and some pre-extracted neural image features.

Ground truth and evaluation

Movie Recommendation: Each team will be asked to provide one submission file containing three predicted scores for each item in the test set. The scores should be in comma-separated format in the form: id, s1, s2, s3, where ‘id’ is the item id, ‘s1’ is the predicted score for the rating average, ‘s2’ is the predicted score for the rating standard deviation, and ‘s3’ is the predicted popularity score. Participants’ runs are evaluated with the standard error metric root-mean-square error (RMSE) between the predicted scores and the actual scores in the ground truth.
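
As an illustration only, the following minimal Python sketch writes rows in the id, s1, s2, s3 layout and computes RMSE. The item ids, scores, ground-truth values, and the helper names write_submission and rmse are hypothetical. Computing one RMSE per score column and writing fields without spaces after the commas are both assumptions, since the description above does not specify these details.

```python
import csv
import io
import math


def write_submission(rows, handle):
    """Write (id, s1, s2, s3) rows as comma-separated lines."""
    writer = csv.writer(handle, lineterminator="\n")
    for item_id, s1, s2, s3 in rows:
        writer.writerow([item_id, s1, s2, s3])


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical predictions: (id, rating average, rating std, popularity).
predictions = [(1, 3.5, 1.0, 0.2), (2, 4.0, 0.8, 0.6)]
# Hypothetical ground truth keyed by item id, same score order.
ground_truth = {1: (3.0, 1.0, 0.4), 2: (4.0, 1.2, 0.6)}

# Serialise to an in-memory buffer; a real run would open a file instead.
buffer = io.StringIO()
write_submission(predictions, buffer)
submission_text = buffer.getvalue()

# Assumption: one RMSE per score column, not one pooled value.
scores = {}
for column, name in enumerate(["rating_average", "rating_std", "popularity"]):
    pred = [row[1 + column] for row in predictions]
    true = [ground_truth[row[0]][column] for row in predictions]
    scores[name] = rmse(pred, true)

print(submission_text)
print(scores)
```

Reporting the three errors separately keeps the differently scaled targets from masking one another; whether the official evaluation pools them is not stated here.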

News Recommendation: We remove a set of weeks from the reading statistics. Participants submit at most five lists with the estimated number of reads for all articles for each held-out week. Subsequently, we sort the estimates and establish a ranking. We measure performance in terms of precision. More specifically, we consider two cut-off points. First, we compute the precision for the top ten articles. Second, we compute the precision for the top ten percent of articles. Based on the results, we determine the best submission for each metric.
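
The two cut-off points can be sketched in Python as follows. This is a hedged illustration, not the organizers' evaluation script: the description does not say which articles count as relevant, so the sketch assumes that an article is relevant at cut-off k if it is among the k most-read articles of the held-out week. The function names, the tie-breaking by article id, the rounding up of the ten-percent cut-off, and the toy read counts are all assumptions as well.

```python
def top_k(reads, k):
    """Return the k article ids with the most reads (ties broken by id)."""
    ranked = sorted(reads, key=lambda article: (-reads[article], article))
    return ranked[:k]


def precision_at_k(estimated, actual, k):
    """Share of the k top-estimated articles that are truly among the top k.

    Assumption: relevance means membership in the actual top k.
    """
    relevant = set(top_k(actual, k))
    hits = sum(1 for article in top_k(estimated, k) if article in relevant)
    return hits / k


def evaluate_week(estimated, actual):
    """Compute precision at ten articles and at ten percent of articles."""
    n = len(actual)
    k_percent = max(1, (n + 9) // 10)  # ten percent, rounded up
    return {
        "p_at_10": precision_at_k(estimated, actual, 10),
        "p_at_10pct": precision_at_k(estimated, actual, k_percent),
    }


# Toy held-out week with 20 hypothetical articles, a00 being the most read.
actual = {f"a{i:02d}": 200 - 10 * i for i in range(20)}
# Hypothetical estimates: perfect except that the top article is missed.
estimated = dict(actual)
estimated["a00"] = 0

result = evaluate_week(estimated, actual)
print(result)
```

In this toy week, missing the single most-read article costs one of ten hits at the first cut-off and one of two at the second, which shows how much more sensitive the ten-percent metric is when the article pool is small.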

Recommended reading
Yashar Deldjoo, Maurizio Ferrari Dacrema, Mihai Gabriel Constantin, Hamid Eghbal-zadeh, Stefano Cereda, Markus Schedl, Bogdan Ionescu, and Paolo Cremonesi. 2019. Movie genome: alleviating new item cold start in movie recommendation. User Model. User-Adapt. Interact. 29, 2 (2019), 291–343.

Ricci, Francesco, Lior Rokach, and Bracha Shapira. "Recommender Systems: Introduction and Challenges." Recommender systems handbook. Springer, Boston, MA, 2015. 1-36.

Michael D. Ekstrand, John T. Riedl and Joseph A. Konstan (2011), "Collaborative Filtering Recommender Systems", Foundations and Trends® in Human–Computer Interaction: Vol. 4: No. 2, pp 81-173.

Lops, Pasquale, Marco De Gemmis, and Giovanni Semeraro. "Content-based recommender systems: State of the art and trends." Recommender systems handbook. Springer, Boston, MA, 2011. 73-105.

Yashar Deldjoo, Mehdi Elahi, Paolo Cremonesi, Franca Garzotto, Pietro Piazzolla, Massimo Quadrana. Content-based Video Recommendation System based on Stylistic Visual Features, Journal on Data Semantics, 5(2), pp. 99-113, 2016.

Yimin Hou, Ting Xiao, Shu Zhang, Xi Jiang, Xiang Li, Xintao Hu, Junwei Han, Lei Guo, L. Stephen Miller, Richard Neupert, Tianming Liu. Predicting Movie Trailer Viewer's “like/dislike” via Learned Shot Editing Patterns, IEEE Transactions on Affective Computing, 7(1), pp. 29-44, 2016.

Yashar Deldjoo, Mihai Gabriel Constantin, Hamid Eghbal-Zadeh, Markus Schedl, Bogdan Ionescu, and Paolo Cremonesi. 2018. Audio-Visual Encoding of Multimedia Content to Enhance Movie Recommendations. In Proceedings of the Twelfth ACM Conference on Recommender Systems. ACM.

Javaria Ahmad, Prakash Duraisamy, Amr Yousef, and Bill Buckles. 2017. Movie success prediction using data mining. In 2017 8th International Conference on Computing, Communication and Networking Technologies (ICCCNT). IEEE, 1–4.

Abhinandan S Das, Mayur Datar, Ashutosh Garg, and Shyam Rajaram. 2007. Google News Personalization: Scalable Online Collaborative Filtering. In Proceedings of the 16th International Conference on World Wide Web. ACM, 271–280.

Florent Garcin, Boi Faltings, Olivier Donatsch, Ayar Alazzawi, Christophe Bruttin, and Amr Huber. 2014. Offline and Online Evaluation of News Recommender Systems at In Proceedings of the 8th ACM Conference on Recommender systems. ACM, 169–176.

Lommatzsch, Andreas, et al. “CLEF 2017 NewsREEL overview: A stream-based recommender task for evaluation and education.” In International Conference of the Cross-Language Evaluation Forum for European Languages, pp. 239-254. Springer, Cham, 2017.

Corsini, Francesco, and M. A. Larson. “CLEF newsreel 2016: Image-based recommendation.” (2016).

Journal special issue
Participants with the best-performing or most original or creative approaches may be invited to extend their Working Note papers to full articles and submit to our Special Issue on Multimedia Recommender Systems to appear in the Springer International Journal of Multimedia Information Retrieval (

Task organizers
Yashar Deldjoo, Politecnico di Milano, Italy, first.last
Benjamin Kille, TU Berlin, Germany, first.last at
Markus Schedl, Johannes Kepler University Linz, Austria
Andreas Lommatzsch, TU Berlin, Germany, first.last at
Jialie Shen, Queen’s University Belfast, UK j.shen @

Task schedule
Data release: 15 May
Runs due: 20 September
Results returned: 25 September
Working Notes paper due: 30 September
MediaEval 2019 Workshop (in France, near Nice): 27-29 October 2019