The 2015 Context of Experience Task: Recommending Videos Suiting a Watching Situation (New!)
This task tackles the challenge of predicting the multimedia content that users find most fitting to watch in specific viewing situations. Most work on video recommendation focuses on predicting personal preferences and, as such, overlooks cases in which context has a strong impact on preference relatively independently of the personal tastes of specific viewers. A particularly strong influence of context can be expected in unusual, potentially psychologically or physically straining, situations.

In this task, we focus on the case of viewers watching movies on an airplane. Here, independently of personal preferences, viewers share the common goal (which we consider to be a “viewing intent”) of passing the time and keeping themselves occupied in the small space of an airplane cabin. The objective of the task is to predict which videos allow viewers to achieve this goal, given the context, which includes the limitations of the technology (e.g., screen size) and the environment (e.g., background noise, interruptions, presence of strangers). We choose airplanes since the role of stress and viewers’ intent to distract themselves are widely acknowledged, e.g., in online descriptions such as [1]. Although this year’s task limits itself to the airplane scenario, we note that the challenge of Context of Experience is much broader in scope. Other stressful contexts in which videos are becoming increasingly important include hospital waiting rooms and dentists’ offices, where videos are shown during treatment.

The task will provide participants with a list of movies (including links to descriptions and video trailers) and require them to classify each movie into the +goodonairplane or -goodonairplane class. The ground truth of the task is derived from two sources: first, actual movie lists used by a major airline, and second, user judgments on movies collected via a crowdsourcing tool.

Task participants should form their own hypothesis about what is important for users viewing movies on an airplane, and design an approach using appropriate features and a classifier or decision function. Figure 1 gives an impression of such a screen and of the very specific constraints on the size and quality of the video.

Figure 1: A set of conditions, including small screen and confined, crowded space, characterize the context of watching a movie on an airplane.

The following video was made to provide a more detailed impression of viewing conditions, and to encourage people to reflect on the characteristics that make certain movies suitable for watching on an airplane.

The value of the task is in understanding the ability of content-based and metadata-based features to discriminate the kinds of movies that people would like to watch on small screens in stressful situations. The task is closely related to work in the area of Quality of Multimedia Experience and producer/consumer intent [2-5].

Task participants will be provided with a collection of videos (we will provide trailers as representative of the movie, plus context, e.g., video URLs in different qualities, metadata, and user votes) and will need to develop methods that predict to which intent class each video belongs.
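As an illustration only, a baseline along these lines could threshold a few metadata features with a simple decision function. The feature names (runtime, genres) and threshold values below are hypothetical assumptions for the sketch, not part of the task data.

```python
# Hypothetical baseline: classify a movie as suitable/unsuitable for an
# airplane from simple metadata features. Feature names and thresholds
# are invented for illustration; the real task data may differ.

def good_on_airplane(movie: dict) -> bool:
    """Toy decision function over metadata features."""
    # Hypothesis: shorter, lighter movies suit a cramped, noisy cabin.
    short_enough = movie.get("runtime_min", 0) <= 130
    light_genre = bool({"Comedy", "Animation", "Family"} & set(movie.get("genres", [])))
    not_disturbing = "Horror" not in movie.get("genres", [])
    return short_enough and light_genre and not_disturbing

# Example usage with made-up metadata:
movie = {"title": "Example", "runtime_min": 95, "genres": ["Comedy"]}
label = "+goodonairplane" if good_on_airplane(movie) else "-goodonairplane"
```

A real submission would of course replace the hand-picked rules with features extracted from trailers and metadata, fed into a trained classifier.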

Target group
The task is attractive to researchers with a wide range of interests since it can be addressed by leveraging techniques from multiple multimedia-related disciplines including social computing (intent), machine learning (classification), multimedia content analysis, multimodal fusion, and crowdsourcing. It is also a practical and attractive topic from a content provider's point of view, since the exploitation of intent in combination with, for example, user satisfaction could lead to sophisticated ways of providing a better service to users.

We will release a data set including titles and links that allow participants to gather online metadata and trailers for the movies; we will not provide the video files. The data set will include around 500 movies. Examples will be collected in part based on movie lists from a major international airline. The dataset will contain both positive and negative examples, carefully sampled in order to create a fair and representative negative class, and will be split into a training and a test set. To collect user judgments, we will use an existing system that has been built for the purpose of collecting user feedback of this sort. We will evaluate systems both with respect to the airline’s choice of movies and the crowd’s choice of airline-suitable movies. The crowd’s choice will be considered the authoritative labels.
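For instance, if the crowd judgments arrive as per-movie lists of binary votes, authoritative labels could be derived by simple majority. The vote format below is an assumption made for this sketch, not the actual format of the task's crowdsourcing system.

```python
# Sketch: derive a binary label per movie from crowd votes by majority.
# The input format (movie id -> list of 0/1 judgments) is an assumption.

def majority_labels(votes: dict[str, list[int]]) -> dict[str, int]:
    """Map each movie id to 1 (+goodonairplane) if more than half of its
    crowd judgments are positive, else 0 (-goodonairplane)."""
    return {mid: int(sum(v) > len(v) / 2) for mid, v in votes.items() if v}

labels = majority_labels({"m1": [1, 1, 0], "m2": [0, 0, 1], "m3": [1, 0]})
# m3 has a 1-1 tie, which this sketch resolves to the negative class.
```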

Ground truth and evaluation
Overall, we are interested in measuring the accuracy with which an automatic method can distinguish between different intent categories. Hence, given a set of labeled instances (videos + context + label) that can be used for training, participants should predict the labels of the test cases. As a first proposal, the classical measures of precision, recall, and weighted F1-score (WF1) could be used to quantify performance.
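The measures mentioned above can be computed directly. The sketch below evaluates predicted against true binary labels in plain Python (no external libraries), treating +goodonairplane as the positive class 1; weighted F1 averages per-class F1 scores weighted by class support.

```python
# Precision, recall, and F1 for one class, plus weighted F1 over all classes.

def prf(y_true, y_pred, positive):
    """Precision, recall, F1 treating `positive` as the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def weighted_f1(y_true, y_pred):
    """Average of per-class F1 scores, weighted by class support."""
    n = len(y_true)
    return sum(
        (y_true.count(c) / n) * prf(y_true, y_pred, c)[2]
        for c in set(y_true)
    )

# Toy example: 1 = +goodonairplane, 0 = -goodonairplane.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
wf1 = weighted_f1(y_true, y_pred)  # ≈ 0.6 for this toy example
```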

Recommended reading
[1], December 2014.

[2] Reiter, U., Brunnström, K., De Moor, K., Larabi, M. C., Pereira, M., Pinheiro, A., & Zgank, A. (2014). Factors influencing quality of experience. In Quality of Experience (pp. 55-72). Springer International Publishing.

[3] Redi, J. A., Zhu, Y., de Ridder, H., & Heynderickx, I. (2015). How Passive Image Viewers Became Active Multimedia Users. In Visual Signal Quality Assessment (pp. 31-72). Springer International Publishing.

[4] Riegler, Michael, et al. “Exploitation of Producer Intent in Relation to Bandwidth and QoE for Online Video Streaming Services.” Proceedings of the ACM SIGMM Workshop on Network and Operating Systems Support for Digital Audio and Video, ACM, 2014.

[5] Lebreton, Pierre, et al. “Evaluating complex scales through subjective ranking.” Proceedings of QoMEX 2014, IEEE, 2014.

[6] Rainer, Benjamin, et al. “A Quality of Experience Model for Adaptive Media Playout.” Proceedings of QoMEX 2014, IEEE, 2014.

[7] Reiter, Ulrich, et al. “Long duration audiovisual content: Impact of content type and impairment appearance on user quality expectations over time.” Proceedings of QoMEX 2013, IEEE, 2013.

[8] Jumisko-Pyykkö, S., & Hannuksela, M. M. (2008, September). Does context matter in quality evaluation of mobile television?. In Proceedings of the 10th international conference on Human computer interaction with mobile devices and services (pp. 63-72). ACM.

[9] Zhu, Y., Heynderickx, I., & Redi, J. A. (2015). Understanding the role of social context and user factors in video Quality of Experience. Computers in Human Behavior, 49, 412-426.

[10] Sackl, A., Zwickl, P., & Reichl, P. (2013, October). The trouble with choice: An empirical study to investigate the influence of charging strategies and content selection on QoE. In 2013 9th International Conference on Network and Service Management (CNSM) (pp. 298-303). IEEE.

[11] Kilkki, K. (2008). Quality of experience in communications ecosystem. Journal of Universal Computer Science, 14, 615-624.

Task organizers
Michael Riegler, Simula Research Laboratory and University of Oslo, michael (at), Norway
Concetto Spampinato, University of Catania, Italy

Task auxiliaries
Minoo Kargar, Simula Research Laboratory and University of Oslo, Norway
Martha Larson, Delft University of Technology, Netherlands

Task schedule
25 May: Development data release
30 June: Test data release
15 August: Run submission
28 August: Working notes paper deadline
14-15 September: MediaEval 2015 Workshop

And also
CrowdRec EC FP7 No. 610594