The 2017 Retrieving Diverse Social Images Task

Task Results
Task overview: [Slides] [Presentation video]
Participant results: [Playlist of all presentation videos]

Task Description
The task aims at image search result diversification in the context of social media. It is a follow-up of the 2013, 2014, 2015, and 2016 editions. Participation in past years does not offer a particular advantage in 2017, and we especially welcome new participants.

The task addresses the use case of a general ad-hoc image retrieval system, which provides the user with visually diverse representations of the queries (see, for instance, Google Image Search). The system should be able to tackle complex and general-purpose, multi-concept queries (e.g., dancing on the street, trees reflected in water, sailing boat).

Given a ranked list of query-related photos retrieved from Flickr using text queries, participating systems are expected to refine the results by providing a set of images that are relevant to the query and, at the same time, offer a visually diversified summary of it. Initial results of a general ad-hoc image retrieval system are typically noisy and often redundant. The refinement and diversification process will be based on the social metadata associated with the images and on their visual characteristics. Additionally, selected runs may make use of information related to user tagging credibility, external information, and/or human-machine-based approaches in order to assess the relevance of the images and to provide a diverse image set.

Target group
Target communities involve both machine- and human-based media analysis such as image and text processing and analysis (text, computer vision, multimedia communities), re-ranking, relevance feedback, crowdsourcing, and automated geo-tagging.

Data
The dataset consists of redistributable Creative Commons licensed information about general-purpose, multi-topic queries. Each query will be represented with up to 300 Flickr photos and their associated social metadata (e.g., title, description, geo-tagging information, number of views, and number of posted comments). The data is partitioned as follows: (1) development data intended for designing and training the approaches (ca. 100 general-purpose, multi-concept queries with 30,000 images); (2) credibility data intended to estimate the global quality of tag-image content relationships for a user's contribution (metadata for ca. 3,000 users); (3) evaluation data intended for the actual benchmark (ca. 100 general-purpose, multi-concept queries with 30,000 images).

To encourage participation of groups from different communities (e.g., information retrieval, computer vision, and multimedia), resources such as general-purpose visual descriptors and text models will be provided for the entire collection.

Ground truth and evaluation
All the images are to be annotated in terms of relevance to the query and visual-based diversity. Annotations are to be carried out by expert annotators. Relevance annotation will consist of yes/no annotations (including the “don’t know” option). Input from different annotators is aggregated with majority voting schemes. Diversity annotation will mainly consist of regrouping visually similar images into clusters. Each image cluster is provided with a short textual description that justifies its choice. Naturally, only relevant images are annotated for diversity. Multiple annotations of diversity will be available for the test set in order to investigate the aspect of subjectivity in the perception of visual-based diversification.
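The aggregation of relevance votes can be illustrated with a minimal sketch. The exact voting scheme used by the organizers is not specified here, so the label names ("yes", "no", "dont_know"), the function name `aggregate_relevance`, and the tie-breaking rule (ties resolve to "dont_know") are all assumptions made for illustration.

```python
from collections import Counter


def aggregate_relevance(votes):
    """Return the majority label among annotator votes for one image.

    votes: list of strings, each "yes", "no", or "dont_know"
           (hypothetical label names).
    Ties are resolved as "dont_know"; this is an assumption of the
    sketch, not a documented rule of the task.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes).most_common()
    top_label, top_count = counts[0]
    # A tie exists when the runner-up has the same count as the leader.
    if len(counts) > 1 and counts[1][1] == top_count:
        return "dont_know"
    return top_label
```

With three annotators, two "yes" votes and one "no" vote would yield "yes", whereas an even split between two annotators would fall back to "dont_know" under the assumed tie rule.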

System performance is to be assessed with three measures: Cluster Recall at X (CR@X), which assesses how many different clusters from the ground truth are represented among the top X results (only relevant images are considered); Precision at X (P@X), which measures the proportion of relevant photos among the top X results; and F1-measure at X (F1@X), defined as the harmonic mean of the previous two. Various cutoff points are to be considered, e.g., X = {5, 10, 20, 30, 40, 50}. The official ranking metric will be F1@20, which gives equal importance to diversity (via CR@20) and relevance (via P@20). This metric simulates the content of a single page of a typical Web image search engine and reflects user behavior, i.e., inspecting the first page of results first. Additionally, we will consider further evaluation metrics that are well established in the information retrieval community, such as the intent-aware expected reciprocal rank (ERR-IA) and the α-normalized discounted cumulative gain (α-nDCG).
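The three cutoff-based measures can be sketched as follows. This is an illustrative reading of the definitions above, not the official evaluation tool: the function names, the `cluster_of` mapping (relevant image id to ground-truth cluster id), and the choice to divide precision by X even when fewer than X images are returned are assumptions.

```python
def precision_at(ranked, relevant, x):
    """P@X: share of the top-x results that are relevant."""
    return sum(1 for img in ranked[:x] if img in relevant) / x


def cluster_recall_at(ranked, cluster_of, x):
    """CR@X: share of ground-truth clusters seen in the top-x results.

    cluster_of maps each relevant image id to its cluster id;
    images missing from the mapping are treated as non-relevant.
    """
    total = len(set(cluster_of.values()))
    if total == 0:
        return 0.0
    found = {cluster_of[img] for img in ranked[:x] if img in cluster_of}
    return len(found) / total


def f1_at(ranked, cluster_of, x):
    """F1@X: harmonic mean of P@X and CR@X."""
    p = precision_at(ranked, set(cluster_of), x)
    cr = cluster_recall_at(ranked, cluster_of, x)
    if p + cr == 0:
        return 0.0
    return 2 * p * cr / (p + cr)


# Toy example: four returned images, three ground-truth clusters.
ranked = ["a", "b", "c", "d"]
cluster_of = {"a": 0, "b": 0, "d": 1, "e": 2}
score = f1_at(ranked, cluster_of, 4)
```

In the toy example, three of the four returned images are relevant (P@4 = 0.75) and they cover two of the three clusters (CR@4 = 2/3), so returning a second image from cluster 0 improves precision but not cluster recall.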

Recommended reading
[1] Boato, G., Dang-Nguyen, D.-T., Muratov, O., Alajlan, N., and De Natale, F. G. B. Exploiting visual saliency for increasing diversity of image retrieval results. Multimedia Tools and Applications, 2016, 75(10):5581–5602.

[2] Ionescu, B., Gînscă, A. L., Boteanu, B., Lupu, M., Popescu, A., and Müller, H. Div150multi: A social image retrieval result diversification dataset with multi-topic queries. In: International Conference on Multimedia Systems (MMSys), 2016, pp. 46:1–46:6.

[3] Ionescu, B., Gînscă, A. L., Zaharieva, M., Boteanu, B., Lupu, M., and Müller, H. Retrieving diverse social images at MediaEval 2016: Challenge, dataset and evaluation. In: Proceedings of MediaEval Benchmarking Initiative for Multimedia Evaluation, vol. 1739, 2016.

[4] Ionescu, B., Popescu, A., Lupu, M., Gînscă, A. L., Boteanu, B., and Müller, H. Div150cred: A social image retrieval result diversification with user tagging credibility dataset. In: International Conference on Multimedia Systems (MMSys), 2015, pp. 207–212.

[5] Ionescu, B., Popescu, A., Radu, A.-L., and Müller, H. Result diversification in social image retrieval: a benchmarking framework. Multimedia Tools and Applications, 2016, 75(2):1301–1331.

[6] Santos, R. L. T., Macdonald, C., and Ounis, I. Search result diversification. Foundations and Trends in Information Retrieval, 2015, 9(1):1–90.

[7] Wang, M., Yang, K., Hua, X. S., and Zhang, H. J. Towards a relevant and diverse search of social images. IEEE Transactions on Multimedia, 2010, 12(8):829–842.

Task organizers
Maia Zaharieva, Vienna University of Technology, Austria (contact person) maia.zaharieva at
Bogdan Ionescu, LAPI, University "Politehnica" of Bucharest, Romania
Alexandru Lucian Gînscă, CEA LIST, France
Rodrygo L.T. Santos, Universidade Federal de Minas Gerais (UFMG), Brazil
Henning Müller, University of Applied Sciences Western Switzerland in Sierre, Switzerland

Task auxiliaries
Bogdan Boteanu, LAPI, University Politehnica of Bucharest, Romania
Mihai Lupu, Vienna University of Technology, Austria

Task schedule
1 April: Development data release
1 June: Test data release
17 August: Run submission
21 August: Results returned
28 August: Working notes paper: initial submission deadline
13-15 Sept: MediaEval Workshop in Dublin