IIIT Hyderabad Publications
Enhancing soccer analysis through computer vision: A study on player detection in broadcast video

Author: Chris Andrew Gadde 2018701019
Date: 2023-12-08
Report no: IIIT/TH/2023/193
Advisor: C V Jawahar

Abstract

Player detection is a fundamental building block for numerous applications in sports analytics, encompassing player recognition, player tracking, and activity detection. However, the majority of existing research in this domain relies on fixed-camera top-view videos of the field, which inherently simplifies the player detection task. Such videos are not readily accessible to the general public, rendering them an unreliable data source for comprehensive player analysis. In contrast, broadcast videos of matches offer a readily available resource. Performing player detection on these videos proves considerably more challenging due to the presence of diverse sources of noise. This study investigates player detection in the context of continuous long-shot broadcast videos, acknowledging the complexities associated with this particular setting.

In the initial phase of our research, we examine the distinctions between player detection and person detection and investigate the challenges inherent to player detection. We begin by formulating player detection as a domain adaptation problem and analysing the various challenges associated with this approach. Our analysis covers the overarching challenges encountered in player detection, along with the obstacles posed specifically by broadcast videos and domain adaptation settings.

In the second phase of our research, we develop an extensively annotated player detection dataset, curated from soccer broadcast videos of the FIFA 2018 World Cup matches.
This dataset serves as a robust foundation for evaluating the efficacy of player detection algorithms within the context of broadcast videos. We devise a comprehensive pipeline for generating automatic labels for our dataset, which are then corrected further down the pipeline to facilitate the annotation process. The resultant dataset comprises over 200,000 high-resolution frame images, encompassing more than 2,000,000 annotated bounding boxes extracted from three distinct FIFA 2018 World Cup matches. Notably, our dataset encompasses a diverse set of player positions, orientations, and bounding box sizes, effectively capturing the inherent variability encountered in soccer broadcasts. Additionally, the dataset incorporates numerous instances of challenging noisy data points, elevating its complexity beyond previous datasets in the field.

In the third phase of our research, we present a novel transductive approach to address the player detection challenge, treating it as a domain adaptation problem. We demonstrate the significance of instance-level domain labels in achieving effective adaptation, specifically for soccer broadcast videos. To efficiently annotate these domain labels on the bounding box predictions generated by our inductive model, we propose a multi-model greedy labelling scheme that leverages visual features. The annotated domain labels are then used to train a transductive counterpart of the model, using reliable instances derived from the inductive model's inferences. This approach enables substantial performance improvements for a given match with a minimal number of labelled samples. Our experimental results show an average increase of 16 points in mean Average Precision (mAP) for soccer broadcast videos, accomplished by annotating domain labels for approximately 100 samples per video.
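
For readers unfamiliar with the metric, the sketch below illustrates how an Average Precision score for a single class (here, "player") can be computed from scored bounding boxes. It is a minimal illustration only: the abstract does not state the evaluation protocol used in the thesis, so the 0.5 IoU threshold, the greedy matching rule, the non-interpolated AP definition, and the names `iou` and `average_precision` are all assumptions made for this example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_threshold=0.5):
    """Non-interpolated AP for one class on one image (assumed protocol).

    detections: list of (score, box); ground_truth: list of boxes.
    """
    matched = set()  # indices of ground-truth boxes already claimed
    hits = []        # 1 for a true positive, 0 for a false positive
    # Visit detections from most to least confident.
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth):
            if j in matched:
                continue  # each ground-truth box can be matched once
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_iou >= iou_threshold:
            matched.add(best_j)
            hits.append(1)
        else:
            hits.append(0)
    # Sum the precision at each true-positive rank, then normalise by
    # the number of ground-truth boxes.
    true_positives, total = 0, 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            true_positives += 1
            total += true_positives / rank
    return total / len(ground_truth) if ground_truth else 0.0
```

Mean Average Precision averages this quantity over classes (and, in some protocols, over several IoU thresholds); with a single "player" class it reduces to the AP itself.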
In the culminating phase of our research, we demonstrate the practical utilization of robust player detection algorithms in constructing analytical systems to enhance game analysis. Specifically, we develop field heat maps that effectively depict the spatial distribution of players on the field over time. Leveraging bounding box detections derived from our proposed approach, we employ homographic projections to achieve accurate top-view registration of the detected bounding boxes in each frame. These generated heat maps serve as a valuable resource for deriving insightful inferences that directly correlate with significant events transpiring during the match. Furthermore, we present additional potential applications for leveraging reliable detection systems, while also outlining avenues for future enhancements and refinements to our system.

Full thesis: pdf

Centre for Visual Information Technology
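
The heat-map construction described in the abstract (projecting detections to a top view with a homography, then accumulating positions over time) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the thesis implementation: the per-frame 3x3 homography `H` is assumed to be already estimated, each player is represented by the bottom-centre of the bounding box, and the 105 m x 68 m pitch, the grid resolution, and the function names are hypothetical choices made for the example.

```python
def project_point(H, x, y):
    """Map image point (x, y) to the top view using a 3x3 homography H."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w  # divide out the homogeneous coordinate


def accumulate_heatmap(frames, field_size=(105.0, 68.0), bins=(21, 14)):
    """Count projected player positions in a grid laid over the pitch.

    frames: iterable of (H, boxes), where boxes are (x1, y1, x2, y2) in pixels
    and H maps image pixels to pitch coordinates in metres (assumed given).
    """
    nx, ny = bins
    heat = [[0] * ny for _ in range(nx)]
    for H, boxes in frames:
        for x1, y1, x2, y2 in boxes:
            # Assumption: the bottom-centre of the box approximates the
            # point where the player touches the ground plane.
            fx, fy = project_point(H, (x1 + x2) / 2.0, y2)
            # Ignore projections that land outside the pitch.
            if 0 <= fx < field_size[0] and 0 <= fy < field_size[1]:
                i = int(fx / field_size[0] * nx)
                j = int(fy / field_size[1] * ny)
                heat[i][j] += 1
    return heat
```

For example, with a toy homography `H = [[0.1, 0, 0], [0, 0.1, 0], [0, 0, 1]]` that scales a 1050 x 680 pixel view onto the pitch, a box `(100, 50, 120, 100)` projects to roughly (11 m, 10 m) and increments the grid cell `heat[2][2]`. In real broadcast footage the homography changes as the camera moves, so it would need to be re-estimated for every frame.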
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.