Overview of VideoCLEF 2009

VideoCLEF 2009 was a track of the CLEF benchmark campaign that was devoted to developing and evaluating tasks involving access to video content in a multilingual environment. In 2009, there were three tasks. For each task, the participants were provided with a corpus of video data (Dutch-language television, predominantly documentaries) accompanied by speech recognition transcripts.

The first task, called "Subject Classification," involved automatic tagging of videos with subject theme labels (e.g., 'Music', 'History', 'Politics', and 'Museums') used in the archive of the Netherlands Institute of Sound and Vision, which supplied the video data. The TRECVid 2007 and 2008 data sets were used for training and testing. The best performance was achieved by approaching subject tagging as an information retrieval task. The highest-scoring approach combined the speech recognition transcripts with archival metadata, which was also supplied to participants. Pseudo-relevance feedback and thesaurus-based query expansion proved helpful. Classifiers were also used; these were trained on either the speech recognition transcripts from the test data or data collected from Wikipedia or general Web search.
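
To make the idea of subject tagging as an information retrieval task concrete, the following is a minimal sketch and not a reconstruction of any participant's system. It treats a subject label as a query and ranks video transcripts by a simple TF-IDF score. The function names, the toy transcripts, and the Dutch query terms standing in for an expanded 'Music' label are all illustrative assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def rank_videos_for_label(
    label_query: str, transcripts: Dict[str, str]
) -> List[Tuple[str, float]]:
    """Rank videos for one subject label, treating the label as a query.

    Each video's score is the sum over query terms of
    term frequency * log(1 + N / document frequency).
    """
    docs = {vid: Counter(tokenize(text)) for vid, text in transcripts.items()}
    n_docs = len(docs)
    query_terms = set(tokenize(label_query))

    scores: Dict[str, float] = {}
    for vid, term_freqs in docs.items():
        score = 0.0
        for term in query_terms:
            doc_freq = sum(1 for d in docs.values() if term in d)
            if doc_freq == 0:
                continue  # term occurs in no transcript
            score += term_freqs[term] * math.log(1 + n_docs / doc_freq)
        scores[vid] = score

    # Highest score first; ties keep insertion order (sort is stable).
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data, invented for illustration only.
toy_transcripts = {
    "video_1": "muziek concert orkest muziek",
    "video_2": "museum schilderij kunst",
    "video_3": "politiek debat kamer",
}
# Hypothetical query for the label 'Music' after expansion to Dutch terms.
ranking = rank_videos_for_label("muziek concert", toy_transcripts)
```

In a sketch like this, a label would be assigned to the top-ranked videos or to every video whose score exceeds a threshold; query expansion would simply add terms to `label_query`.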

The second task, called "Affect," involved detecting narrative peaks in short-form documentaries. Narrative peaks are points in the narrative flow of a video at which viewers perceive an increase in dramatic tension. The goal of the Affect task is to move beyond thematic content and analyze characteristics that are important to viewers but orthogonal to the video's subject matter. The task was carried out on the "Beeldenstorm" collection, which contains 45 short-form documentaries on the visual arts. The best runs exploited affective vocabulary and audience-directed speech. Other approaches used topic changes, elevated speaking pitch, increased speaking intensity, and radical visual changes.

The third task, called "Finding Related Resources Across Languages," involved linking video to material on the same subject in a different language. Participants were provided with a list of multimedia anchors (short video segments) in the Dutch-language "Beeldenstorm" collection and were expected to return target pages drawn from English-language Wikipedia. Ground truth for the task was created by three assessors, who identified one primary link and several secondary links for each of the 165 anchors in the data set. The best-performing methods used the transcript of the speech spoken during the multimedia anchor to build a query, which was then used to search an index of the Dutch-language Wikipedia. The Dutch Wikipedia pages returned were used to identify related English pages. The top performer achieved a Mean Reciprocal Rank (the average inverse rank position of the primary link) of 0.25. Participants also experimented with pseudo-relevance feedback, query translation, and methods that targeted proper names.
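
The Mean Reciprocal Rank measure mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the official evaluation script: the function name, the data layout, and the toy anchors and page titles are hypothetical, and an anchor whose primary link is missing from the returned list is assumed to contribute zero.

```python
from typing import Dict, List


def mean_reciprocal_rank(
    ranked_results: Dict[str, List[str]], primary_links: Dict[str, str]
) -> float:
    """Average, over anchors, of 1 / rank of the primary link.

    Ranks are 1-based. An anchor whose primary link does not appear
    in its result list contributes 0 (an assumption of this sketch).
    """
    if not primary_links:
        return 0.0
    total = 0.0
    for anchor, target in primary_links.items():
        results = ranked_results.get(anchor, [])
        if target in results:
            total += 1.0 / (results.index(target) + 1)
    return total / len(primary_links)


# Toy data, invented for illustration only.
primary = {"anchor_1": "Rembrandt", "anchor_2": "Delftware", "anchor_3": "Windmill"}
returned = {
    "anchor_1": ["Rembrandt", "Night Watch"],              # primary at rank 1
    "anchor_2": ["Pottery", "Delft", "Tile", "Delftware"],  # primary at rank 4
    "anchor_3": ["Polder", "Dike"],                         # primary not returned
}
score = mean_reciprocal_rank(returned, primary)  # (1 + 1/4 + 0) / 3
```

Under this definition, a score of 0.25 corresponds to the primary link appearing at rank 4 on average in the reciprocal sense, although individual anchors may rank it much higher or miss it entirely.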

VideoCLEF at CLEF 2009
The VideoCLEF Session of the CLEF workshop (Corfu, Greece, 30 Sept-2 Oct 2009) included the following presentations:

"Classification as IR task: Experiments and Observations"
Jens Kuersten, Technical University of Chemnitz
"Identification of Narrative Peaks in Clips: Text Features Perform Best."
Mohammad Soleymani, University of Geneva
"When to cross over? Cross-language linking using Wikipedia for VideoCLEF 2009"
Agnes Gyarmati, Dublin City University
"A cocktail approach to the VideoCLEF'09 linking task"
Stephan Raaijmakers, TNO


Thank you to TrebleCLEF for providing the funding for “Dublin Days,” the VideoCLEF 2009 assessment and data set creation effort. Thank you to the Dublin Days participants for all their hours of effort!

Thank you to Michael Kipp for use of the Anvil video annotation research tool.
