Spoken Web Search

The 2011 Spoken Web Search Task
The task involves searching FOR audio content WITHIN audio content USING an audio content query. It is particularly interesting for speech researchers working on spoken term detection, and it addresses the challenge of handling multiple, resource-limited languages. Participants are required to build a language-independent audio search system that, given a spoken query, finds the appropriate audio file and the location of the query term within that file.
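To make the expected input and output concrete, the following is a minimal sketch of one common language-independent approach, subsequence dynamic time warping (DTW) over acoustic feature frames. The task description does not prescribe any method; DTW, the function names, and the toy one-dimensional "features" are all assumptions made for illustration. The sketch returns, for each audio file, a match cost and the frame span where the query fits best.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature frames (lists of floats)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def subsequence_dtw(query, utterance):
    """Align the whole query to any contiguous span of the utterance.

    Returns (average_cost, start_frame, end_frame), with both frame
    indices inclusive. Hypothetical helper, not part of the task.
    """
    n, m = len(query), len(utterance)
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]   # accumulated alignment cost
    start = [[0] * m for _ in range(n)]    # start frame of the best path
    for j in range(m):
        # The match may begin at any utterance frame at no extra cost.
        cost[0][j] = frame_distance(query[0], utterance[j])
        start[0][j] = j
    for i in range(1, n):
        for j in range(m):
            candidates = [(cost[i - 1][j], start[i - 1][j])]
            if j > 0:
                candidates.append((cost[i][j - 1], start[i][j - 1]))
                candidates.append((cost[i - 1][j - 1], start[i - 1][j - 1]))
            best_cost, best_start = min(candidates)
            cost[i][j] = best_cost + frame_distance(query[i], utterance[j])
            start[i][j] = best_start
    # The match may also end at any utterance frame.
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return cost[n - 1][end] / n, start[n - 1][end], end


def search(query, collection):
    """Rank every file in the collection by how well the query matches it."""
    hits = []
    for name, frames in collection.items():
        c, s, e = subsequence_dtw(query, frames)
        hits.append((c, name, s, e))
    return sorted(hits)


# Toy data: one-dimensional "features" standing in for real acoustic frames.
query = [[0.0], [1.0], [2.0]]
collection = {
    "file_a": [[5.0], [5.0], [0.0], [1.0], [2.0], [5.0]],
    "file_b": [[5.0], [5.0], [5.0], [5.0]],
}
hits = search(query, collection)
```

In a real system the frames would be acoustic features extracted from the audio, and frame indices would be converted to times using the feature frame shift. A decision threshold on the cost would then determine which hits are reported.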

Target group
The task is of interest to researchers in the areas of speech technology, spoken term detection and spoken content search.

Data
Participants will be provided with a data set that has been kindly made available by the Spoken Web team at IBM Research, India. The audio content is spontaneous speech recorded over the phone in a live setting by low-literate users. While most of the audio content is related to farming practices, other domains are represented as well. The data set comprises audio in four different Indian languages: English, Hindi, Gujarati and Telugu. Each data item is an 8 kHz audio file approximately 4 to 30 seconds in length. In total there are approximately 400 items, plus about 100 spoken search queries. Language labels will not be provided. Participants may use any additional resources available to them, as long as their use is documented.

Recommended reading
Fiscus, J., Ajot, J., Garofolo, J., Doddington, G., “Results of the 2006 Spoken Term Detection Evaluation,” The 2007 Special Interest Group on Information Retrieval (SIGIR-07) Workshop in Searching Spontaneous Conversational Speech, 2007.

Ground truth and evaluation
The ground truth is created manually and provided by the task organizers, following the principles of NIST's Spoken Term Detection (STD) evaluations.
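The text above does not state which metric is used for scoring. As an illustration of the NIST STD principles it refers to, the sketch below computes the Term-Weighted Value (TWV) as defined for the NIST 2006 STD evaluation: one minus the average over query terms of the miss probability plus a weighted false-alarm probability, with weight beta = 999.9. Whether this task adopts exactly this metric and weight is an assumption, and the function name and input layout are hypothetical.

```python
def term_weighted_value(counts, speech_seconds, beta=999.9):
    """Compute TWV at a fixed decision threshold.

    counts maps each term to (n_true, n_correct, n_false_alarm):
    reference occurrences, correct detections, and spurious detections.
    speech_seconds is the total duration of the searched audio; the
    number of non-target trials is taken as one trial per second of
    speech minus the true occurrences, as in the NIST 2006 definition.
    """
    total = 0.0
    n_terms = 0
    for n_true, n_correct, n_false_alarm in counts.values():
        if n_true == 0:
            # Miss probability is undefined for terms that never occur.
            continue
        p_miss = 1.0 - n_correct / n_true
        p_false_alarm = n_false_alarm / (speech_seconds - n_true)
        total += p_miss + beta * p_false_alarm
        n_terms += 1
    if n_terms == 0:
        raise ValueError("no term has a reference occurrence")
    return 1.0 - total / n_terms


# Hypothetical counts for two query terms over 1002 seconds of audio.
example_counts = {
    "term_a": (4, 3, 0),  # one of four occurrences missed
    "term_b": (2, 2, 1),  # all found, plus one false alarm
}
twv = term_weighted_value(example_counts, speech_seconds=1002)
```

A system that finds every occurrence with no false alarms scores 1.0, and a system that returns nothing scores 0.0. Because beta is large, even a few false alarms per term reduce the score noticeably.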

Recommended reading
Arun Kumar, Nitendra Rajput, Dipanjan Chakraborty, Sheetal K. Agarwal, Amit Anil Nanavati, “WWTW: The World Wide Telecom Web,” NSDR 2007 (SIGCOMM workshop), Kyoto, Japan, 27 August, 2007.

Task organizers
Nitendra Rajput, IBM Research India
Florian Metze, CMU

MediaEval Benchmarking Initiative for Multimedia Evaluation
