
Module COMP52615

Assignment

MISCADA Computer Vision module

academic year 2023/2024

Academic in charge and contact information

Professor Paolo Remagnino, Department of Computer Science

You can reach me by email: [email protected]

If a meeting in person is required, send me an email and I will arrange a suitable date and time.

Introduction

In this assignment, you are asked to implement software that detects a badminton court and then the players on the court. You are then asked to track the players on the court. Data are provided; read the dedicated section for instructions on how to obtain the dataset. The image frames of video clips of two international badminton matches have been zipped.

Task specification

A badminton court is a rectangular area delimited by lines. The court in the available dataset is green, although a court could be a different colour. The court has given dimensions (see figure 1 for more details) and is divided into rectangular regions related to the game itself.

figure 1: A badminton court, left for single, right for double matches.

You might be a badminton player, perhaps a professional; however, if you are not familiar with the game, you can find the international rules at https://olympics.com/en/news/badminton-guide-how-to-play-rules-olympic-history.

We will work with both single and double matches. You are asked to implement solutions to solve the following tasks:

•    Task 1: Image frames detection - for this part of the assignment, you are asked to use both colour segmentation and pattern segmentation to extract all the frames in the provided video clips that contain the full badminton court. A court is fully visible when the action is shot from a distance, giving full visibility of the game; this could be from the side of the court, or from behind one of the two teams. See figure 2 for examples.

Expected Output: Image frames with a fully visible court should be saved in a list of image names, easily accessible to retrieve the frames for further processing; please do adhere to the image frame names of the provided datasets.

figure 2: left, a single match, from the perspective of the player in black; right, a double match, from the perspective of the yellow team.
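The colour cue for Task 1 could be sketched as follows. This is a minimal illustration, not a complete solution: it covers only the colour part (the line pattern cue still has to be added), it uses green chromaticity g = G / (R + G + B) as one possible colour model that is less sensitive to brightness than raw RGB, and the function names and both thresholds (`g_min`, `min_ratio`) are hypothetical values that would need tuning on the dataset.

```python
import numpy as np

def green_court_ratio(frame_rgb, g_min=0.40):
    """Fraction of pixels whose green chromaticity G / (R + G + B) exceeds g_min.

    frame_rgb is an H x W x 3 array in RGB order. Frames loaded with
    OpenCV's imread are in BGR order and must be reordered first.
    g_min is an assumed threshold, not a value from the assignment.
    """
    rgb = frame_rgb.astype(np.float64)
    total = rgb.sum(axis=2)
    # Pure black pixels have total == 0; leave their chromaticity at 0.
    g = np.divide(rgb[..., 1], total, out=np.zeros_like(total), where=total > 0)
    return float((g > g_min).mean())

def select_full_court_frames(frames, min_ratio=0.35):
    """Return the sorted names of frames that are mostly court coloured.

    frames maps an image frame name to its RGB array.
    min_ratio is an assumed threshold, not a value from the assignment.
    """
    return sorted(
        name for name, image in frames.items()
        if green_court_ratio(image) >= min_ratio
    )
```

Keeping the result as a list of the original frame names matches the expected output of Task 1. A close-up of a player on a green background would also pass this test, which is why a second cue such as the white court lines is needed.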

•    Task 2: Court detection - for all the extracted image frames that contain the full view of the court, you are asked to annotate the court with lines. For this, you will have to develop code that extracts the lines and effectively builds a 2D model of the court.

Expected Output: both the coordinates of the polygons used to highlight the court and the image frames with the overlapped court should be provided.
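One way to turn detected lines into a 2D model of the court is to estimate a homography between a metric court template and the image. The sketch below assumes the four outer corners of the court have already been found in the image (for example from line intersections) and assumes the standard doubles court of 13.40 m by 6.10 m; `COURT_MODEL`, `estimate_homography` and `project` are hypothetical names, and the estimation is a plain direct linear transform without point normalisation or outlier handling.

```python
import numpy as np

# Outer corners of a doubles court in metres (assumed standard size:
# 6.10 m wide, 13.40 m long), listed in order around the rectangle.
COURT_MODEL = np.array([[0.0, 0.0], [6.10, 0.0], [6.10, 13.40], [0.0, 13.40]])

def estimate_homography(src, dst):
    """Direct linear transform: find H with dst ~ H @ src in homogeneous
    coordinates, from at least four point correspondences."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=np.float64))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]

def project(h, points):
    """Map an N x 2 array of points through the homography h."""
    pts = np.asarray(points, dtype=np.float64)
    homogeneous = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homogeneous @ h.T
    return mapped[:, :2] / mapped[:, 2:3]
```

With `H = estimate_homography(COURT_MODEL, image_corners)`, any line of the template can be drawn on the frame by projecting its end points; for instance `project(H, [[0.0, 6.70], [6.10, 6.70]])` gives the image position of the net line, which lies halfway along the court.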

•    Task 3: Player detection - for this part, you are asked to exploit one of the taught algorithms to extract bounding boxes that identify the players on the court. If a person is detected outside the court, its bounding box must be discarded as an outlier.

Expected Output: both coordinates of the rectangles bounding the players and the image frames with overlapped bounding rectangles should be provided.
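The outlier rule for Task 3 could be implemented as a point-in-polygon test. The sketch below is one possible criterion, not the required one: it assumes each detection is a tuple `(x_min, y_min, x_max, y_max)` in pixels, assumes the court is available as a polygon of image points, and treats the bottom centre of a box as the approximate position of the player's feet. Both function names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting test: count how many polygon edges a horizontal ray
    starting at (x, y) and going right crosses; odd means inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def keep_players_on_court(boxes, court_polygon):
    """Keep only boxes whose bottom-centre point lies inside the court.

    boxes: iterable of (x_min, y_min, x_max, y_max) in pixels (assumed format).
    court_polygon: list of (x, y) image points around the court.
    """
    kept = []
    for x_min, y_min, x_max, y_max in boxes:
        foot_x = (x_min + x_max) / 2.0
        if point_in_polygon(foot_x, y_max, court_polygon):
            kept.append((x_min, y_min, x_max, y_max))
    return kept
```

A player who jumps or steps just behind the baseline can fail this strict test, so in practice the polygon may need to be enlarged by a margin.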

•    Task 4: Player tracking - for this part, you are asked to use one of the taught techniques to track the players. Bear in mind that, while the TV camera does not move much from the best perspective of the game, it might still pan a little; this movement will have to be compensated for by using the information about the court.

Expected Output: For this task you should provide short videoclips using the image frames and overlapped court lines and bounding rectangles for the players.
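As an illustration of a dynamic filter for Task 4, the sketch below implements a linear Kalman filter with a constant velocity motion model for a single player position. It is a minimal example under stated assumptions: the class name, the time step and the noise variances (`process_var`, `meas_var`) are hypothetical and would need tuning, and data association between detections and tracks is not covered.

```python
import numpy as np

class ConstantVelocityKalman:
    """Kalman filter with state [x, y, vx, vy] and position measurements."""

    def __init__(self, x, y, dt=1.0, process_var=1.0, meas_var=4.0):
        self.state = np.array([x, y, 0.0, 0.0])
        self.P = np.eye(4) * 100.0            # large initial uncertainty
        self.F = np.array([[1, 0, dt, 0],     # constant velocity transition
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],      # only position is observed
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * process_var      # assumed process noise
        self.R = np.eye(2) * meas_var         # assumed measurement noise

    def predict(self):
        """Propagate the state one time step and return the predicted position."""
        self.state = self.F @ self.state
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.state[:2]

    def update(self, zx, zy):
        """Correct the prediction with a measured position (zx, zy)."""
        innovation = np.array([zx, zy]) - self.H @ self.state
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.state = self.state + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.state[:2]
```

For each frame, `predict()` is called first and `update()` is called when a detection is matched to the track. One option for compensating a small camera pan is to run the filter on positions expressed in court coordinates rather than in pixels, so that the motion model describes the player and not the camera.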

Dataset

Data can be found at

https://drive.google.com/drive/folders/15ihQI3D9tU8VsZJsU5sP0FV9JPCpMwr5?usp=share_link

The data consist of two zip files containing image frames of short clips of two badminton matches, a single and a double, both played at international level. For simplicity, the image frames have been extracted from the clips and compressed into zip files.


Marking scheme

Task                                                              comment                                            %

Description   A general introduction to your solution must be provided (5%), as well as a detailed description of how you solve each of the following tasks (5%).   10

Task 1   Hint: this seems simple, but it can be tricky. As you have learnt from the lectures and workshops, what appears to be of a given colour, blue say, is not really blue, so a model of colour in a suitable colour space must be devised. You might also want to exploit the idea that a court is well delineated by white lines. Whenever you can, do use a combination of information, with a multitude of features.

Marking: the implemented method must extract all the frames that have the entire court fully visible; marks will be deducted if some frames are missed or some are included erroneously. 10% will be assigned to a fully working method, and only 5% if the method partially works.   10

Task 2   Hint: Once the frames of interest have been selected, you can then search for lines or shapes of interest and find the court delimiters: the lines defining each side of the court and its various areas. For this part you can exploit deep learning methods that have been previously trained to recognise a court.

Marking: the implementation must extract all the lines correctly, so errors in the detection will result in the deduction of marks. The full 30% goes to a complete solution; if the solution works in part, then 15%.   30

Task 3   Hint: For this part I am expecting you to exploit the latest YOLO and SAM versions to detect the players. The players will have to be in and on the court, so one or more criteria will have to be defined to discard any proposals (deep network suggestions of bounding rectangles) that do not fit. You can also use the idea that at most two players are on each side of the court.

Marking: the implementation exploits existing deep architectures, so validation and testing will be essential to get full marks. Ideally, the result will have to show “tight” bounding rectangles for the extracted players, with an error measure such as the intersection over union (IoU) as a viable metric. Comparison of existing architectures applied to the problem attracts 10%, the other 10% is for the most suitable metric to assess the result.   20

Task 4   Hint: For this task, you can use the dynamic filters you have been taught to track players on the court. You are also free to use any deep learning method you have read about during the four weeks.

Marking: the accuracy of the tracking will be a determining factor in obtaining full marks; you are recommended to use the metrics you used to solve Task 3. Full marks if any spatial-temporal filter is implemented and manages to track players on the court. A comparison of at least two filters attracts 15%. The other 5% is for the generation of a video clip as qualitative output and 10% for quantitative results.   30

Total   100
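The intersection over union (IoU) named in the Task 3 marking criteria, and recommended again for Task 4, can be computed for two axis-aligned boxes as below. The box format `(x_min, y_min, x_max, y_max)` and the function name are assumptions made for this sketch.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Each box is (x_min, y_min, x_max, y_max) (assumed format).
    Returns a value between 0.0 (no overlap) and 1.0 (identical boxes).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0
```

For example, two 10 by 10 boxes offset by 5 pixels in each direction share an area of 25 out of a union of 175, giving an IoU of about 0.14. Using the metric for validation requires ground truth boxes, which would have to be annotated on a subset of frames.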


Submission instructions

You are asked to create a single Jupyter Notebook, where you will provide the textual description of your solutions and the implemented code. Your notebook should be structured in sections. At the beginning of the notebook, an introduction should describe in detail how you solved the tasks, all the libraries you used and where to find them. Then you should include one section for each task solution, where again you describe your solution. The code should be executed, so that the solutions are visualised in terms of graphics, images and results; it is strongly recommended that you also include the formulae implemented by the algorithms you used and diagrams of the architectures employed. Your code should be implemented in Python, using OpenCV as the main set of computer vision libraries. PyTorch or TensorFlow can be used for the deep learning methods you have chosen. Please do make sure your code runs.