The 2010 Placing Task

Geotagging, or assigning geographical coordinates to shared content, has recently become a very popular activity on the Web. As of October 2009, over 130 million images on Flickr alone had been placed on the world map, either manually or automatically (via GPS devices). At the same time, recent developments at leading video-sharing websites have motivated their users to start geotagging their videos as well. As a result, it is already possible to browse thousands of YouTube or Flickr videos just by traversing the map. Check out the following links:

YouTube videos on the map of Paris
Flickr videos on the World Map

Participants will try to automatically guess the location of each video, i.e., assign geo-coordinates (latitude and longitude) to videos using one or more of: video metadata (tags, titles), visual content, audio content, and social information. Any use of open resources, such as gazetteers or geotagged articles in Wikipedia, is encouraged. The goal of the task is to come as close as possible to the geo-coordinates of the videos as provided by users or their GPS devices. Similar research on automatic geotagging of images was recently described in:

Serdyukov, P., Murdock, V., and van Zwol, R. Placing Flickr photos on a map. In SIGIR 2009.

Hays, J. and Efros, A. A. im2gps: estimating geographic information from a single image. In CVPR 2008.

Target group
The task is of interest to researchers in the area of geo-IR as well as social media.

The data set contained geotagged Flickr videos (~10,000 for development and test purposes) and the metadata for geotagged Flickr images (~3.2 million). A set of basic visual features, extracted for all images and for the frames of the videos, was provided. Evaluation of runs submitted by participating groups was based on distances between the predicted and the actual geo-coordinates. Ground truth was supplied by the Flickr users who uploaded the videos and the images. All videos and images are shared by their owners under Creative Commons licenses.

Ground truth and evaluation
The geo-coordinates associated with the Flickr videos were used as the ground truth. Since these do not always pinpoint the location of the video precisely, we evaluated at each of a series of widening circles, with radii of 1 km, 10 km, 100 km, 1,000 km and 10,000 km.
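As an illustration of this kind of evaluation, the following minimal sketch counts the fraction of predictions that fall within each circle. The function names, the haversine great-circle formula and the Earth radius constant are assumptions made for the example; the task description does not state which distance formula the organizers used.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption, not given in the task description


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def accuracy_at_radii(predicted, actual, radii_km=(1, 10, 100, 1000, 10000)):
    """Fraction of videos predicted within each radius of the ground truth.

    `predicted` and `actual` are equally long, non-empty lists of (lat, lon) pairs.
    """
    distances = [haversine_km(p[0], p[1], a[0], a[1]) for p, a in zip(predicted, actual)]
    return {r: sum(d <= r for d in distances) / len(distances) for r in radii_km}


if __name__ == "__main__":
    predicted = [(48.8566, 2.3522), (0.0, 0.0)]
    actual = [(48.8566, 2.3522), (51.5074, -0.1278)]
    print(accuracy_at_radii(predicted, actual))
```

Because the circles are nested, the reported fractions can never decrease as the radius grows.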

2010 Results and Links to Related Working Notes Papers
Five groups crossed the finish line for this task in MediaEval 2010. The best-performing run was submitted by Ghent University, which used a two-step approach based on metadata. In the first step, a language model identified the most likely area of the video. In the second step, the location of the video was pinpointed by identifying the closest resources (images and videos) from the training set.
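The general shape of such a two-step approach can be sketched as follows. This is a toy illustration, not the Ghent University system: the predefined area labels, the add-one smoothed unigram model and the Jaccard tag similarity are all assumptions chosen to keep the example short.

```python
import math
from collections import Counter, defaultdict


def train_area_models(training):
    """training: iterable of (area_id, tags, (lat, lon)) tuples."""
    counts = defaultdict(Counter)  # area -> tag frequencies
    items = defaultdict(list)      # area -> [(tag set, coordinates)]
    for area, tags, coord in training:
        counts[area].update(tags)
        items[area].append((set(tags), coord))
    return counts, items


def most_likely_area(tags, counts):
    """Step 1: pick the area whose smoothed unigram model best explains the tags."""
    vocab_size = len({t for c in counts.values() for t in c})

    def log_prob(area):
        c = counts[area]
        total = sum(c.values())
        # Add-one (Laplace) smoothing; an assumption for this sketch.
        return sum(math.log((c[t] + 1) / (total + vocab_size)) for t in tags)

    return max(counts, key=log_prob)


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def place(tags, counts, items):
    """Step 2: within the chosen area, return the coordinates of the most similar item."""
    area = most_likely_area(tags, counts)
    _, coord = max(items[area], key=lambda item: jaccard(set(tags), item[0]))
    return area, coord


if __name__ == "__main__":
    training = [
        ("paris", ["eiffel", "paris"], (48.858, 2.294)),
        ("paris", ["louvre", "paris"], (48.861, 2.336)),
        ("london", ["bigben", "london"], (51.501, -0.125)),
    ]
    counts, items = train_area_models(training)
    print(place(["louvre", "paris"], counts, items))
```

Splitting the problem this way means the fine-grained nearest-neighbour search only has to consider training items from one area rather than the whole collection.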

ICSI proposed two approaches. The first exploited the prior distribution of tags, in particular choosing tag candidates extracted from video metadata on the basis of small spatial variance. The second approach also undertook supervised resolution of toponyms (using GeoNames).
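The idea of preferring tags with small spatial variance can be sketched as below. This is a simplified illustration under stated assumptions, not the ICSI implementation: variance is taken as the sum of latitude and longitude variances in degrees, longitude wrap-around is ignored, and the hypothetical `min_count` parameter filters out tags seen only once (which would trivially have zero variance).

```python
from statistics import mean, pvariance


def tag_statistics(training):
    """training: iterable of (tags, (lat, lon)).

    Returns tag -> (centroid, spatial variance, number of training items).
    """
    coords_by_tag = {}
    for tags, (lat, lon) in training:
        for tag in set(tags):
            coords_by_tag.setdefault(tag, []).append((lat, lon))
    stats = {}
    for tag, coords in coords_by_tag.items():
        lats = [c[0] for c in coords]
        lons = [c[1] for c in coords]
        variance = pvariance(lats) + pvariance(lons)  # crude measure, in squared degrees
        stats[tag] = ((mean(lats), mean(lons)), variance, len(coords))
    return stats


def place_by_lowest_variance(tags, stats, min_count=2):
    """Return the centroid of the video's most spatially concentrated known tag, or None."""
    candidates = [stats[t] for t in tags if t in stats and stats[t][2] >= min_count]
    if not candidates:
        return None
    centroid, _, _ = min(candidates, key=lambda s: s[1])
    return centroid


if __name__ == "__main__":
    training = [
        (["paris", "trip"], (48.85, 2.35)),
        (["paris", "eiffel"], (48.86, 2.29)),
        (["trip", "london"], (51.50, -0.12)),
    ]
    stats = tag_statistics(training)
    print(place_by_lowest_variance(["trip", "paris"], stats))
```

In this toy data the tag "trip" occurs in both Paris and London and therefore has a large variance, so the more concentrated tag "paris" determines the prediction.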

The TALP group at the Universitat Politecnica de Catalunya pursued an approach that combined knowledge resources (a placenames database) with natural language processing to detect and disambiguate geographical names in the metadata. The SINAI group from the University of Jaen applied a geographic named entity recognizer.

The only group to make use of visual features was the Technical University of Berlin, which took a grid-based approach to the task, predicting the grid position of the video by combining a textual model based on metadata and a visual model based on low-level visual features.
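A grid-based prediction of this kind can be sketched as follows. The one-degree cell size, the weighted linear combination and the `text_weight` value are assumptions made for illustration; the description above does not say how the Technical University of Berlin defined its grid or fused the two models.

```python
import math


def cell_of(lat, lon, cell_deg=1.0):
    """Map a coordinate in degrees to a (row, column) grid cell."""
    return (math.floor((lat + 90) / cell_deg), math.floor((lon + 180) / cell_deg))


def cell_centre(cell, cell_deg=1.0):
    """Return the (lat, lon) at the centre of a grid cell."""
    row, col = cell
    return ((row + 0.5) * cell_deg - 90, (col + 0.5) * cell_deg - 180)


def combine(text_scores, visual_scores, text_weight=0.8):
    """Weighted sum of per-cell scores; cells missing from a model score 0 there."""
    cells = set(text_scores) | set(visual_scores)
    return {
        c: text_weight * text_scores.get(c, 0.0)
        + (1 - text_weight) * visual_scores.get(c, 0.0)
        for c in cells
    }


def predict(text_scores, visual_scores, text_weight=0.8, cell_deg=1.0):
    """Return the centre of the best-scoring cell under the combined model."""
    combined = combine(text_scores, visual_scores, text_weight)
    best = max(combined, key=combined.get)
    return cell_centre(best, cell_deg)


if __name__ == "__main__":
    paris, london = cell_of(48.8566, 2.3522), cell_of(51.5074, -0.1278)
    text_scores = {paris: 0.6, london: 0.4}    # hypothetical textual model output
    visual_scores = {paris: 0.3, london: 0.7}  # hypothetical visual model output
    print(predict(text_scores, visual_scores))
```

With a grid, the best achievable precision is bounded by the cell size, since every video assigned to a cell receives that cell's centre as its predicted location.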

Task coordinators:
Pavel Serdyukov, Delft University of Technology
Vanessa Murdock, Yahoo! Research