The 2019 Sport Video Classification Task: Table Tennis

Task description
Participants are provided with a set of videos of table tennis games and are required to build a classification system that automatically labels video segments with the strokes performed by the players in those segments.

Action detection and classification is one of the main challenges in visual content analysis and mining. Sport video analysis has been a very popular research topic, due to the variety of application areas, ranging from multimedia intelligent devices with user-tailored digests to the analysis of athletes' performances. Datasets focused on sport activities, or including a large number of sport activity classes, are now available, and many methods are benchmarked on them. A large amount of work is also devoted to fine-grained classification through the analysis of sport gestures using motion capture systems. However, body-worn sensors and markers could disturb the natural behavior of players, and motion capture devices are not always available to potential users, be it a university faculty or a local sports team. Future editions of the task will build upon this first, basic task offered in 2019. The ultimate goal of this research is to produce automatic annotation tools for sport faculties, local clubs and associations, helping coaches to better assess and advise athletes during training.

This task offers researchers an opportunity to compare their approaches to fine-grained sports video annotation by testing them on the task of recognizing strokes in table tennis videos. The low inter-class variability makes the task more difficult than on general-purpose datasets such as UCF-101 and DeepMind Kinetics.

Target group
The task is of interest to researchers in the areas of machine learning (classification), visual content analysis, computer vision and sport performance. We explicitly encourage participation by researchers focusing on computer-aided analysis of sport performance.

Data
Our focus is on recordings made with widespread, inexpensive video cameras, e.g. GoPro. We use a dataset specifically recorded in a sport faculty facility and continuously extended by students and teachers. The dataset consists of player-centred videos recorded in natural conditions, without markers or sensors. It comprises 20 table tennis stroke classes and a rejection class. The problem is hence a typical research topic in the field of video indexing: for a given recording, the system must label the video by recognizing each stroke appearing in it.
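
As an illustration of this video-indexing formulation, below is a minimal sketch in Python that slides a fixed-length temporal window over a recording and assigns each window one of the 21 labels (20 strokes plus rejection). The window length, stride and the classify_window placeholder are assumptions made here for illustration only; classify_window stands in for a trained model such as a spatio-temporal CNN.

    import numpy as np

    NUM_CLASSES = 21  # 20 stroke classes + 1 rejection class
    WINDOW = 64       # frames per window (assumed value, not from the task)
    STRIDE = 32       # hop between consecutive windows (assumed value)

    def classify_window(frames: np.ndarray) -> int:
        """Placeholder for a trained model, e.g. a spatio-temporal CNN.

        A real system would return an index in [0, NUM_CLASSES); this
        stand-in always returns 0, taken here to mean the rejection class.
        """
        return 0

    def label_video(frames: np.ndarray):
        """Yield (start_frame, end_frame, class_index) for each window."""
        for start in range(0, max(1, len(frames) - WINDOW + 1), STRIDE):
            window = frames[start:start + WINDOW]
            yield start, start + len(window), classify_window(window)

    if __name__ == "__main__":
        # Dummy clip: 300 frames of 120x120 RGB standing in for a real video.
        clip = np.zeros((300, 120, 120, 3), dtype=np.uint8)
        for start, end, cls in label_video(clip):
            print(f"frames {start:3d}-{end:3d} -> class {cls}")

A real system would additionally merge adjacent windows receiving the same label into a single stroke segment before writing out its annotations.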

Ground truth and evaluation
Twenty stroke classes and an additional rejection class are considered, defined according to the rules of table tennis. This taxonomy was designed with professional table tennis teachers. We are working on videos recorded at the Faculty of Sports of the University of Bordeaux. The filmed players are students, and the teachers supervise the exercises conducted during the recording sessions. The recordings are markerless and allow the players to perform in natural conditions. In each video file, every table tennis stroke is delimited by temporal boundaries, which are supplied in an XML file. For each test video, participants are asked to produce an XML file in which each stroke is labeled according to the given taxonomy. Submissions will be evaluated in terms of per-class accuracy and global accuracy.
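
To make the evaluation measures concrete, the sketch below parses stroke annotations and compares a run against the ground truth, reporting per-class and global accuracy. The XML layout shown in the docstring (one action element per stroke with begin, end and label attributes) and the example class name are hypothetical; the actual schema and taxonomy are distributed with the data.

    import xml.etree.ElementTree as ET
    from collections import defaultdict

    def load_strokes(path):
        """Parse stroke annotations from an XML file.

        Hypothetical layout, for illustration only (the real schema is
        distributed with the data):
            <video>
              <action begin="120" end="250" label="Serve Forehand Topspin"/>
              ...
            </video>
        Returns a list of (begin, end, label) tuples in document order.
        """
        root = ET.parse(path).getroot()
        return [(int(a.get("begin")), int(a.get("end")), a.get("label"))
                for a in root.iter("action")]

    def accuracies(truth, pred):
        """Per-class and global accuracy over aligned stroke lists.

        Assumes one predicted label per ground-truth stroke, in the same
        order, since temporal boundaries are supplied for test videos.
        """
        per_class = defaultdict(lambda: [0, 0])  # label -> [correct, total]
        correct = 0
        for (_, _, t_label), (_, _, p_label) in zip(truth, pred):
            per_class[t_label][1] += 1
            if t_label == p_label:
                per_class[t_label][0] += 1
                correct += 1
        global_acc = correct / len(truth) if truth else 0.0
        return {c: ok / n for c, (ok, n) in per_class.items()}, global_acc

Because the temporal boundaries of the test strokes are given, the sketch aligns ground-truth and predicted strokes by position; a setting where boundaries also had to be detected would require a temporal-overlap matching step instead.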

Recommended reading
Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, and Jitendra Malik, "AVA: A video dataset of spatio-temporally localized atomic visual actions," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 6047-6056.

Khurram Soomro, Amir Roshan Zamir, and Mubarak Shah, "UCF101: A dataset of 101 human actions classes from videos in the wild," CRCV-TR-12-01, 2012.
https://www.crcv.ucf.edu/papers/UCF101_CRCV-TR-12-01.pdf

Gül Varol, Ivan Laptev, and Cordelia Schmid, "Long-term temporal convolutions for action recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 6, pp. 1510–1517, 2018.

Joao Carreira and Andrew Zisserman, "Quo vadis, action recognition? A new model and the Kinetics dataset," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 6299-6308.

Pierre-Etienne Martin, Jenny Benois-Pineau, Renaud Péteri, and Julien Morlier, "Sport action recognition with siamese spatio-temporal CNNs: Application to table tennis," in Proceedings of the International Conference on Content-Based Multimedia Indexing (CBMI), IEEE, 2018, pp. 1–6.

Task organizers
Jenny Benois-Pineau, LaBRI, Bordeaux, France (contact person) jenny.benois-pineau (at) u-bordeaux.fr
Pierre-Etienne Martin, LaBRI, Bordeaux, France
Boris Mansencal, LaBRI, Bordeaux, France
Julien Morlier, IMS Bordeaux, France
Renaud Péteri, MIA La Rochelle, France renaud.peteri (at) univ-lr.fr
Laurent Mascarilla, MIA La Rochelle, France
Jordan Calandre, MIA La Rochelle, France

Task schedule
Data release: 5 July (updated)
Runs due: 20 September
Results returned: 23 September
Working Notes paper due: 30 September
MediaEval 2019 Workshop (in France, near Nice): 27-29 October 2019