Automatic video analysis in sports is a possible answer to the demand from fans and professionals for many kinds of information. Its applications include player localization, ball trajectory extraction, content extraction and indexing, summarization, highlight detection, on-demand 3D reconstruction, animation, virtual view generation, editorial content creation, virtual content insertion, content visualization and enhancement, gameplay analysis and evaluation, and the identification of player actions, referee decisions, and other fundamental elements required to analyze a game. Recent work in sports video analysis focuses on computer vision techniques for detailed, complex tasks, such as detecting each player in every frame and classifying them by team, or recognizing jersey numbers; identifying players in this way helps to classify the events in which each player is involved. Higher-level analysis, such as tracking the player or the ball, supports further evaluations: assessing a player’s skills, detecting team strategies and events, and identifying tactical formations (for example, midfield analysis) in sports such as soccer and basketball, as well as sports vision applications such as smart assistants, virtual umpires, and assistant coaches. Higher-level semantic interpretation is an effective substitute for manual analysis, especially when reduced human intervention and real-time analysis are desired for exploiting the system outputs.

Studies in Basketball

Ref. | Problem Statement | Proposed Methodology | Precision and Performance Characteristics | Limitations and Remarks |
---|---|---|---|---|
[3] | Recognizing actions of basketball players by using image recognition techniques | Bi-LSTM Sequence2Sequence | Evaluated with the Spearman rank-order correlation coefficient, Kendall rank-order correlation coefficient, Pearson linear correlation coefficient, and Root Mean Squared Error, achieving 0.921, 0.803, 0.932, and 1.03, respectively. | The methodology failed to recognize difficult actions, which reduced its accuracy. The accuracy of action recognition can be improved with a deep convolutional neural network. |
[26] | Multi-future trajectory prediction in basketball. | Conditional Variational Recurrent Neural Networks (RNN) - TrajNet++ | The proposed methodology was tested on the Average Displacement Error and Final Displacement Error metrics. The methodology is considered robust if the values it achieves are smaller than 7.01 and 10.61, respectively. | The proposed methodology fails to predict trajectories in uncertain and complex scenarios. As the behavior of the basketball or players is dynamic, belief maps cannot steer future positions. Training the model with a dataset of different events can rectify the prediction failures. |
[30] | Predicting line-up performance of basketball players by analyzing the situation of the field. | RNN + NN | At the point guard (pg) position 4 candidates were detected and at the center (c) position 3 candidates were detected. The total score of pg candidates is 13.67, 12.96, 13.42, 10.39, and where the total score of c candidates is 10.21, 14.08, and 13.48, respectively. | - |
[4] | Multiplayer tracking in basketball videos | YOLOv3 + Deep-SORT, Faster-RCNN + Deep-SORT, YOLOv3 + DeepMOT, Faster-RCNN + DeepMOT, JDE | Among the baseline detectors, Faster-RCNN provides better accuracy than YOLOv3. Among the multi-object tracking methods, Joint Detection and Embedding (JDE) performs better in tracking accuracy and computing speed. | Handling specific cases such as severe occlusions and improving detection precision would improve accuracy and computation speed. Adopting frame extraction methods may be an alternative way to achieve balanced performance in terms of speed and accuracy. |
[12] | Recognizing the referee signals from real-time videos in a basketball game. | HOG + SVM, LBP + SVM | Achieved an accuracy of 95.6% for referee signal recognition using local binary pattern features and SVM classification. | In the case of a noisy environment, a significant chance of occlusion, an unusual viewing angle, and/or variability of gestures, the performance of the proposed method is not consistent. Detecting jersey color and eliminating all other detected elements in the frame can be the other solution to improve the accuracy of referee signal recognition. |
[2] | Event recognition in basketball videos | CNN | mAP for group activity recognition is 72.1% | The proposed model can recognize the global movement in the video. By recognizing the local movements, the accuracy can be improved. |
[31] | Analyzing the behavior of the player. | CNN + RNN | Achieved an accuracy of 76.5% for four types of actions in basketball videos. | The proposed model gives less accuracy for actions such as passing and fouling. This also gives less accuracy of recognition and prediction on the test dataset compared to the validation dataset. |
[5] | Tracking ball movements and classification of players in a basketball game | YOLO + Joy2019 | Jersey number recognition in terms of Precision achieved is 74.3%. Player recognition in terms of Recall achieved 89.8%. | YOLO confuses the overlapped image for a single player. In the subsequent frame, the tracking ID of the overlapped player is exchanged, which causes wrong player information to be associated with the identified box. |
[1] | Event classifications in basketball videos | CNN + LSTM | The average accuracy using a two-stage event classification scheme achieved 60.96%. | Performance can be improved by introducing information such as individual player pose detection and player location detection |
[21] | Classification of different defensive strategies of basketball players, particularly when they deviate from their initial defensive action. | KNN, Decision Trees, and SVM | Achieved 69% classification accuracy for automatic defensive strategy identification. | Considered only two defensive strategies in basketball, ‘switch’ and ‘trap’. In addition, an alternative method of labeling large spatio-temporal datasets would also lead to better results. Future research may also consider other defensive strategies such as pick-and-roll and pick-and-pop. |
[10] | Basketball trajectory prediction based on real data and generating new trajectory samples. | BLSTM + MDN | The proposed method performed well in terms of convergence rate and final AUC (91%) and showed that deep learning models perform better than conventional models (e.g., GLM, GBM). | To improve the accuracy, time series prediction has to be considered. Considering factors such as player cooperation and defense when predicting NBA player positions can improve the performance of the model. |
[23] | Generating basketball trajectories. | GRU-CNN | Validated on a hierarchical policy network (HPN) with ground truth and 3 baselines. | The proposed model failed on three-dimensional trajectories in a basketball match. |
[13] | Score detection, highlights video generation in basketball videos. | BEI + CNN | Automatically analyzes the basketball match, detects scoring, and generates highlights. Achieved an accuracy, precision, recall, and F1-score of 94.59%, 96.55%, 92.31%, and 94.38%, respectively. | The proposed method is slow, achieving only 5 frames per second. Therefore, it cannot be applied to a real-time basketball match. |
[6] | Multi-person event recognition in basketball videos. | BLSTM | Event classification and event detection were achieved in terms of mean average precision, i.e., 51.6% and 43.5%. | A high-resolution dataset can improve the performance of the model. |
[16] | Player behavior analysis. | RNN | Achieved an accuracy of 80% over offensive strategies. | The methodology struggles with factors such as the complexity of interactions, the distinctiveness and diversity of the target classes, and extrinsic factors such as reactions to defense, unexpected events such as fouls, and consistency of execution. |
[11] | Prediction of the 3-point shot in the basketball game | RNN | Evaluated in terms of AUC and achieved 84.30%. | The proposed method fails in the case of high ball velocity and the noisy nature of motion data. |
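
The trajectory prediction entry [26] above reports Average Displacement Error (ADE) and Final Displacement Error (FDE). As a minimal sketch of how these two metrics are commonly computed for a single predicted trajectory, assuming 2D positions sampled at matching time steps (the function name and the sample coordinates are hypothetical and not taken from the cited study):

```python
import math


def displacement_errors(predicted, ground_truth):
    """Return (ADE, FDE) for one predicted trajectory.

    Both arguments are equal-length sequences of (x, y) positions,
    one per future time step.
    """
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    # Euclidean distance between prediction and ground truth at each step.
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    ade = sum(distances) / len(distances)  # mean error over all time steps
    fde = distances[-1]                    # error at the final time step only
    return ade, fde


# Hypothetical three-step trajectory (units are arbitrary).
predicted = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
ground_truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = displacement_errors(predicted, ground_truth)
print(ade, fde)
```

Lower values are better for both metrics. For multi-future prediction, a common variant (again an assumption here, not a detail given in the table) reports the minimum ADE and FDE over the set of sampled futures.
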

Studies in Soccer

Ref. | Problem Statement | Proposed Methodology | Precision and Performance Characteristics | Limitations and Remarks |
---|---|---|---|---|
[43] | Player and ball detection and tracking in soccer. | YOLOv3 and SORT | Methodology achieved tracking accuracy of 93.7% on multiple object tracking accuracy metrics with a detection speed of 23.7 FPS and a tracking speed of 11.3 FPS. | This methodology effectively handles challenging situations, such as partial occlusions, players and the ball reappearing after a few frames, but fails when the players are severely occluded. |
[44] | Player, referee and ball detection and tracking by jersey color recognition in soccer. | DeepPlayerTrack | The model achieved a tracking accuracy of 96% and 60% on MOTA and GMOTA metrics, respectively, with a detection speed of 23 FPS. | The limitation of this method is that, when a player with the same jersey color is occluded, the ID of the player is switched. |
[49] | Tracking soccer players to evaluate the number of goals scored by a player. | Machine Learning and Deep Reinforcement Learning. | Performance of the player tracking model measured in terms of mAP achieved 74.6%. | The method failed to track the ball at critical moments such as passing at the beginning and shooting. It also failed to overcome the identity switching problem. |
[67] | Extracting ball events to classify the player’s passing style. | Convolutional Auto-Encoder | The methodology was evaluated in terms of accuracy and achieved 76.5% for 20 players. | Combining the auto-encoder with extreme learning machine techniques would improve event classification performance. |
[74] | Detecting events in soccer. | Variational Auto-encoder and EfficientNet | Achieved an F1-score of 95.2% on event images and a recall of 51.2% on images not related to soccer at a threshold value of 0.50. | The deep extreme learning machine technique, which employs the auto-encoder technique, may enhance the event detection accuracy. |
[54] | Action spotting in soccer video. | YOLO-like encoder | The algorithm achieved an mAP of 62.5%. | - |
[50] | Team performance analysis in soccer | SVM | On the dataset used, the prediction models achieved an overall accuracy of 75.2% in predicting the correct segmental outcome and the likelihood of the team making a successful attempt to score a goal. | The proposed model failed to identify the players who are more frequently involved in match events that end with an attempt at scoring, i.e., a ‘SHOT’ at goal; such information may assist sports analysts and team staff in developing strategies suited to an opponent’s playing style. |
[57] | Motion Recognition of assistant referees in soccer | AlexNet, VGGNet-16, ResNet-18, and DenseNet-121 | The proposed algorithm achieved 97.56% accuracy with real-time operations. | Though the proposed algorithm is immune to variations of illuminance caused by weather conditions, it failed in the case of occlusions between referees and players. |
[79] | Predicting the attributes (Loss or Win) in soccer. | ANN | The proposed model predicts 83.3% for the winning case and 72.7% for loss. | - |
[82] | Team tactics estimation in soccer videos. | Deep Extreme Learning Machine (DELM). | The performance of the model is measured on precision, recall, and F1-score and achieved 87.6%, 88%, and 87.8%, respectively. | Team tactics are estimated based on the relationship between tactics of the two teams and ball possession. The method fails to estimate the team formation at the beginning of the game. |
[61] | Action recognition in soccer | CNN-based Gaussian Weighted event-based Action Classifier architecture | Accuracy in terms of F1-score achieved was 52.8% for 6 classes. | By classifying the actions into subtypes, the accuracy of action recognition can be enhanced. |
[34] | Detection and tracking of the ball in soccer videos. | VGG – MCNN | Achieved an accuracy of 87.45%. | It could not detect when the ball moved out of play in the field, in the stands region, or from partial occlusion by players, or when ball color matched the player’s jersey. |
[68] | Automatic event extraction for soccer videos based on multiple cameras. | YOLO | The U-encoder is designed for feature extraction and has better performance in terms of accuracy compared with fixed feature extractors. | To carry out a tactical analysis of the team, player trajectory needs to be analyzed. |
[52] | Shot detection in a football game | MobileNetV2 | The MobileNetV2 method performed better than other feature extractor methods. | Extracting the features with the MobileNetV2 and then using 3D convolution on the extracted features for each frame can improve detection performance. |
[58] | Predicting player trajectories for shot situations | LSTM | Performance is measured in terms of F1-score and achieved 53%. | The model failed to predict the player trajectory in the case of players confusing each other by changing their speed or direction unexpectedly. |
[81] | Analyzing the team formation in soccer and formulating several design goals. | OpenCV is used for back-end visualization. | The formation detection model achieved a maximum accuracy of 96.8%. | The model is limited in scalability, as it cannot be used on high-resolution soccer videos. The results are bound to a particular match, and the model cannot evaluate tactical schemes across different games. Visualization of real-time team formation is another drawback, as it limits the visualization of non-trivial spatial information. Applying state-of-the-art tracking algorithms can substantially improve the performance of tactics analysis. |
[32] | Player recognition with jersey number recognition. | Spatial Constellation + CNN | Achieved an accuracy of 82% by combining Spatial Constellation + CNN models. | The proposed model failed to handle the players that are not visible for certain periods. Predicting the position of invisible players could improve the quality of spatial constellation features. |
[62] | Evaluating and classifying the passes in a football game. | SVM | The proposed model achieves an accuracy of 90.2% during a football match. | To determine the quality of each pass, some factors such as pass execution of player in a particularly difficult situation, the strategic value of the pass, and the riskiness of the pass need to be included. To rate the passes in sequence, it is necessary to consider the sequence of passes during which the player possesses the ball. |
[56] | Detecting dribbling actions and estimating positional data of players in soccer. | Random forest | Achieved an accuracy of 93.3%. | The proposed methodology fails to evaluate the tactical strategies. |
[76] | Team tactics estimation in soccer videos. | SVM | The performance of the methodology is measured in terms of precision, recall, and F1-score and achieved 98%, 97%, and 98%. | The model fails when audiovisual features could not recognize quick changes in the team’s tactics. |
[66] | Analyzing past events in the case of non-obvious insights in soccer. | k-NN, SVM | To extract the features of pass location, they used heatmap generation and achieved an accuracy of 87% in the classification task. | By incorporating temporal information, the classification accuracy can be improved and also offers specific insights into situations. |
[33] | Tracking the players in soccer videos. | HOG + SVM | Player detection is evaluated in terms of accuracy and achieved 97.7%. Classification accuracy using k-NN achieved 93% for 15 classes. | - |
[51] | Action classification in soccer videos | LSTM + RNN | The model achieves a classification rate of 92% on four types of activities. | By extracting the features of various activities, the accuracy of the classification rate can be improved. |
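
Several tracking entries above ([43], [44]) report multiple object tracking accuracy (MOTA). The table does not define the metric, so the following is only a sketch of the commonly used CLEAR MOT formulation, MOTA = 1 - (FN + FP + IDSW) / GT, with error counts summed over all frames. The function name and the counts in the usage line are hypothetical and do not come from any cited study.

```python
def mota(false_negatives, false_positives, id_switches, num_ground_truth):
    """Multiple object tracking accuracy (CLEAR MOT formulation).

    All arguments are counts summed over every frame of the sequence;
    num_ground_truth is the total number of ground-truth objects.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    errors = false_negatives + false_positives + id_switches
    # MOTA can be negative when the tracker makes more errors than
    # there are ground-truth objects.
    return 1.0 - errors / num_ground_truth


# Hypothetical counts: 40 misses, 25 false alarms, 5 identity switches
# over 1000 ground-truth boxes.
score = mota(40, 25, 5, 1000)
print(f"MOTA = {score:.3f}")
```

Because identity switches enter the numerator directly, the ID-switch failures noted for [44] and [49] lower MOTA even when detection itself is accurate.
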

Studies in Cricket

Refs. | Problem Statement | Proposed Methodology | Precision and Performance Characteristics | Limitations and Remarks |
---|---|---|---|---|
[90] | Shot classification in cricket. | CNN-Gated Recurrent Unit | It was evaluated in terms of precision, recall, and F1-score and achieved 93.40%, 93.10%, and 93% for 10 types of shots. | Incorporating unorthodox shots played in T20 into the dataset may improve the testing accuracy. |
[98] | Detecting the action of the bowler in cricket. | VGG16-CNN | It was evaluated in terms of precision, recall, and F1-score and the maximum average accuracy achieved is 98.6% for 13 classes (13 types of bowling actions). | Training the model with the dataset of wrong actions can improve detection accuracy. |
[102] | Movement detection of the batsman in cricket. | Deep-LSTM | The model was evaluated in terms of mean square error and achieved a minimum error of 1.107. | - |
[115][116] | Cricket video summarization. | Gated Recurrent Neural Network + Hybrid Rotation Forest-Deep Belief Networks; YOLO | The methodology was evaluated in terms of precision, F1-score, and accuracy and achieved 96.82%, 94.83%, and 96.32% for four classes. YOLO was evaluated on precision, recall, and F1-score and achieved 97.1%, 94.4%, and 95.7% for 8 classes. | Decision tree classifier performance is low due to the existence of a huge number of trees. Therefore, a small change in the decision tree may improve the prediction accuracy. Extreme Learning Machines have faced the problem of overfitting, which can be overcome by removing duplicate data in the dataset. |
[107] | Prediction of individual player performance in cricket | Efficient Machine Learning Techniques | The proposed algorithm achieves a classification accuracy of 93.73% which is good compared with traditional classification algorithms. | Replacing machine learning techniques with deep learning techniques may improve the performance in prediction even in the case of different environmental conditions. |
[87] | Classification of different batting shots in cricket. | CNN | The average classification precision is 0.80, recall is 0.79, and F1-score is 0.79. | To improve the accuracy of classification, the deep learning algorithm has to be replaced with a better neural network. |
[103] | Outcome classification task to create automatic commentary generation. | CNN + LSTM | Maximum of 85% training accuracy and 74% validation accuracy | Due to the unavailability of a standard dataset for ball-by-ball outcome classification in cricket, the accuracy is not up to the mark. In addition, better accuracy would enable automatic commentary generation in sports. |
[105] | Detecting the third umpire decision and an automated scoring system in a cricket game. | CNN + Inception V3 | Achieved 94% accuracy with the Deep Convolutional Neural Network (DCNN) and 100% with Inception V3 for the classification of umpire signals to automate the scoring system of cricket. | The results obtained are sufficient for building an automated umpiring system based on computer vision and artificial intelligence. |
[106] | Classification of cricket bowlers based on their bowling actions. | CNN | The test set accuracy of the model is 93.3% which demonstrates its classification ability. | The model lacks data for detecting spin bowlers. As the dataset is confined to left-arm bowlers, the model misclassifies the right-arm bowlers. |
[88] | Recognition of various batting shots in a cricket game | Deep-CNN | The proposed models can recognize a shot being played with 90% accuracy. | As the model depends on the frame rate of the video, it fails to recognize shots when the number of frames per second increases. |
[104] | Automatic highlight generation in the game of cricket. | CNN + SVM | Mean Average Precision of 72.31% | The proposed method lacks clear metrics to evaluate the false positives in highlights. |
[106] | Umpire pose detection and classification in cricket. | SVM | VGG19-Fc2 Player testing accuracy of 78.21% | Classification and summarization techniques can minimize false positives and false negatives. |
[89] | Activity recognition for quality assessment of batting shots. | Decision Trees, k-Nearest Neighbours, and SVM. | The proposed method identifies 20 classes of batting shots with an average F1-score of 88% based on the recorded movement of data. | To assess the player’s batting caliber, certain aspects of batting also need to be considered, i.e., the position of the batsman before playing a shot and the method of batting shots for a particular bowling type can be modeled. |
[109][110] | Predicting the outcome of the cricket match. | k-NN, Naïve Bayesian, SVM, and Random Forest | Achieved an accuracy of 71% upon the statistics of 366 matches. | Imbalance in the dataset is one of the causes which produces lower accuracy. Deep learning methodologies may give promising results by training with a dataset that included added features. |
[97] | Performance analysis of the bowler. | Multiple regression | Variation in ball speed has a feeble significance in influencing the bowling performance (the p-value being 0.069). The variance ratio of the regression equation to that of the residuals (F-value) is given as 3.394 with a corresponding p-value of 0.015. | - |
[108] | Predicting the performance of the player. | Multilayer perceptron Neural Network | The model achieves an accuracy of 77% on batting performance and 63% on bowling performance. | - |
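
Many entries in these tables report precision, recall, and F1-score together. Since F1 is the harmonic mean of precision and recall, a reported triple can be checked for internal consistency. The sketch below uses hypothetical helper names; the input values are the precision and recall quoted above for YOLO in [115][116], and the counts in the second example are invented for illustration.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Precision and recall from raw detection or classification counts."""
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision 97.1% and recall 94.4%, as quoted for YOLO in [115][116].
f1_yolo = f1_score(0.971, 0.944)
print(f"F1 = {f1_yolo:.3f}")  # compare with the 95.7% listed in the table

# Hypothetical counts: 8 correct detections, 2 false alarms, 2 misses.
p, r = precision_recall(8, 2, 2)
print(p, r, f1_score(p, r))
```

The same check applies to any row that lists all three values; a large gap between the recomputed and reported F1 usually indicates averaging over classes rather than an error.
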

Studies in Tennis

Ref. | Problem Statement | Proposed Methodology | Precision and Performance Characteristics | Limitations and Remarks |
---|---|---|---|---|
[118] | Monitoring and analyzing tactics of tennis players. | YOLOv3 | The model achieved an mAP of 90% with 13 FPS on high-resolution images. | Using a lightweight backbone for the detection modules can improve the processing speed. |
[132] | Player action recognition in tennis. | Temporal Deep Belief Network (Unsupervised Learning Model) | The accuracy of the recognition rate is 94.72% | If two different movements are similar, then the model fails to recognize the current action. |
[136] | Tennis swing classification. | SVM, Neural Network, K-NN, Random Forest, Decision Tree | Maximum classification accuracy of 99.72% achieved using NN with a Recall of 1. The second-highest classification accuracy of 99.44% was achieved using K-NN with a recall of 0.98. | If the play styles of the players are different but the patterns are the same, in that case, models failed to classify the current swing direction. |
[130] | Player activity recognition in a tennis game. | Long Short Term Memory (LSTM) | The average accuracy of player activity recognition based on the historical LSTM model was 0.95, and that of the typical LSTM model was 0.70. | The model lacks real-time learning ability and requires a large computing time at the training stage. The model also lacks online learning ability. |
[121] | Automatic detection and classification of change of direction from player tracking data in a tennis game. | Random Forest Algorithm | Among all the proposed methods, model 1 had the highest F1-score of 0.801, as well as the smallest rate of false-negative classification (3.4%) and average accuracy of 80.2% | In the case of non-linear regression analysis, the classification performance of the proposed model is not up to the mark. |
[127] | Prediction of shot location and type of shot in a tennis game. | Generative Adversarial Network (GAN) (Semi-Supervised Model) | Performance is measured by the minimum distance recorded between the predicted and ground truth shot locations. | The performance of the model varies across play styles, as it is trained on a limited player dataset. |
[133] | Analyzing individual tennis matches by capturing spatio-temporal data for player and ball movements. | For data extraction, a player and ball tracking system such as HawkEye is used. | Generation of 1-D space charts for patterns and point outcomes to analyze the player activity. | The performance of the model deviates from different matches, as it was trained only on limited tennis matches. |
[131] | Action recognition in tennis | 3-Layered LSTM | The classification accuracies are as follows: Improves from 84.10 to 88.16% for players of mixed abilities. Improves from 81.23 to 84.33% for amateurs and from 87.82 to 89.42% for professionals, when trained using the entire dataset. | The detection accuracy can be increased by incorporating spatio-temporal data and combining the action recognition data with statistical data. |
[135] | Shot prediction and player behavior analysis in tennis | For data extraction, player and ball tracking systems such as HawkEye are used and a Dynamic Bayesian Network for shot prediction is used. | By combining factors (Outside, Left Top, Right Top, Right Bottom) together, speed, start location, the player movement assessment achieved better results of 74% AUC. | As the model is trained on limited data (only elite players), it cannot be performed on ordinary players across multiple tournaments. |
[122] | Ball tracking in tennis | Two-Layered Data Association | Evaluation results in terms of precision, recall, and F1-score are 84.39%, 75.81%, and 79.87% for Australian Open tennis matches and 82.34%, 67.01%, and 73.89% for U.S. Open tennis matches. | The proposed method cannot handle multi-object tracking. Integrating audio information could facilitate high-level analysis of the game. |
[134] | Highlight extraction from racket sports videos based on human behavior analysis. | SVM | The proposed algorithm achieved an accuracy of 90.7% for tennis videos and 87.6% for badminton videos. | The proposed algorithm fails to recognize the player, as the player is a deformable object whose limbs move freely during action recognition. |

Studies in Volleyball

Ref. | Problem Statement | Proposed Methodology | Precision and Performance Characteristics | Limitations and Remarks |
---|---|---|---|---|
[147] | Group activity recognition by tracking players. | CNN + Bi-LSTM | The model achieved an accuracy of 93.9%. | The model fails to track the players if the video is taken from a dynamic camera. Temporal action localization can improve the accuracy of tracking the players in severe occlusion conditions. |
[148] | Recognizing and classifying player’s behavior. | SVM | The achieved recognition rate was 98% for 349 correct samples. | - |
[141] | Classification of tactical behaviors in beach volleyball. | RNN + GRU | The model achieves better classification results as prediction accuracies range from 37% for forecasting the attack and direction to 60% for the prediction of success. | By employing a state-of-the-art method and training on a proper dataset that has continuous positional data, it is possible to predict tactics behavior and set/match outcomes. |
[149] | Motion estimation for volleyball | Machine Vision and Classical particle filter. | Tracking accuracy is 89% | Replacing methods with deep learning algorithms gives better results. |
[142] | Assessing the use of Inertial Measurement Units in the recognition of different volleyball actions. | KNN, Naïve Bayes, SVM | Unweighted Average Recall of 86.87% | By incorporating different frequency domain features, the performance factor can be improved. |
[31] | Predicting the ball trajectory in a volleyball game by observing the motion of the setter player. | Neural Network | The proposed method predicts the trajectory of the volleyball 0.3 s in advance based on the motion of the setter player. | When predicting 3D body position data, the method records a large error. This can be overcome by training state-of-the-art methods on a large, properly annotated dataset. |
[138] | Activity recognition in beach volleyball | Deep Convolutional LSTM | The approach achieved a classification accuracy of 83.2%, which is superior compared with other classification algorithms. | Instead of using wearable devices, computer vision architectures can be used to classify the activities of the players in volleyball. |
[144] | Volleyball skills and tactics analysis | ANN | Evaluated in terms of Average Relative Error for 10 samples and achieved 0.69%. | - |
[139] | Group activity recognition in a volleyball game | LSTM | The group activity recognition accuracy of the proposed model in volleyball is 51.1%. | The performance of the architecture is poor because the individual and group activity dataset lacks hierarchical considerations. |

Studies in Hockey

Ref. | Problem Statement | Proposed Methodology | Precision and Performance Characteristics | Limitations and Remarks |
---|---|---|---|---|
[150] | Detecting the player in hockey. | SVM, Faster RCNN, SSD, YOLO | HD+SVM achieved the best results in terms of accuracy, recall, and F1-score with values of 77.24%, 69.23%, and 73.02%. | The model failed to detect the players in occlusion conditions. |
[162] | Localizing puck Position and Event recognition. | Faster RCNN | Evaluated in terms of AUC and achieved 73.1%. | Replacing the detection method with the YOLO series can improve the performance. |
[155] | Identification of players in hockey. | ResNet + LSTM | Achieves player identification accuracy of over 87% on the split dataset. | Some of the jersey number classes such as 1 to 4 are incorrectly predicted. The diagonal numbers from 1 to 100 are falsely classified due to the small number of training examples. |
[151] | Activity recognition in a hockey game. | LSTM | The proposed model recognizes activities such as free hits, goals, penalty corners, and long corners with an accuracy of 98%. | As the proposed model is focused on spatial features, it does not distinguish activities such as free hits and long corners, which appear as similar patterns. Including temporal features and incorporating LSTM into the model would make it more robust. |
[154] | Pose estimation and temporal-based action recognition in hockey. | VGG19 + LiteFlowNet + CNN | A novel approach was designed and achieved an accuracy of 85% for action recognition. | The architecture is not robust to abrupt changes in the video, e.g., it fails to predict hockey sticks. Activities such as a goal being scored, or puck location, are not recognized. |
[161] | Action recognition in ice hockey using a player pose sequence. | CNN+LSTM | The performance of the model is better in similar classes such as passing and shooting. It achieved 90% parameter reduction and 80% floating-point reduction on the HARPET dataset. | As the number of hidden units to LSTM increases, the number of parameters also increases, which leads to overfitting and low test accuracy. |
[152] | Human activity recognition in hockey. | CNN + LSTM | An F1-score of 67% was calculated for action recognition on the multi-labeled imbalanced dataset. | The performance of the model is poor because of the imbalanced dataset. |
[153] | Player action recognition in an ice hockey game | CNN | The accuracy of the actions recognized in a hockey game is 65%; when similar actions are merged, accuracy rises to 78%. | Low accuracy is caused by pose estimation problems under severe occlusion, by motion blur due to the speed of the game, and by the lack of a proper dataset to train models. |

Studies in Badminton

Ref. | Problem Statement | Proposed Methodology | Precision and Performance Characteristics | Limitations and Remarks |
---|---|---|---|---|
[169] | Shuttlecock detection problem of a badminton robot. | Tiny YOLOv2 and YOLOv3 | Results show that, compared with state-of-the-art methods, the proposed networks achieved good accuracy with efficient computation. | The proposed method fails to detect the shuttlecock under different environmental conditions. As it uses the binocular camera to detect a 2D shuttlecock, it cannot detect the 3D shuttlecock trajectory. |
[165] | Automated badminton player action recognition in badminton games. | AlexNet + CNN, GoogleNet + CNN and SVM | Recognition accuracy of badminton actions by the linear SVM classifier for AlexNet and GoogleNet, using local and global extractor methods, is 82% and 85.7%, respectively. | The architecture can be improved by fine-tuning in an end-to-end manner with a larger dataset on features extracted at different fully connected layers. |
[166] | Badminton activity recognition | CNN | Nine different activities were distinguished: seven badminton strokes, displacement, and moments of rest. With accelerometer data, accurate estimation was conducted using CNN with 86% precision. Accuracy is raised to 99% when gyroscope data are combined with accelerometer data. | Computer vision techniques can be employed instead of sensors. |
[167] | Classification of badminton match images to recognize the different actions performed by the athletes. | AlexNet, GoogleNet, VGG-19 + CNN | The GoogleNet model has the highest accuracy compared to the other models, with only two hit actions falsely classified as non-hit actions. | The proposed method classifies only hit and non-hit actions; it can be improved by classifying more actions in various sports. |
[170] | Tracking shuttlecocks in badminton | An AdaBoost algorithm which can be trained using the OpenCV Library. | The performance of the proposed algorithm was evaluated based on precision, and it achieved an average precision of 94.52% at 10.65 fps. | The accuracy of tracking shuttlecocks could be enhanced by replacing the AdaBoost algorithm with state-of-the-art AI algorithms. |
[164] | Tactical movement classification in badminton | k-Nearest Neighbor | The average accuracy of player position detection is 96.03% and 97.09% on the two halves of a badminton court. | Exploiting properties specific to the application, such as the length of frequent trajectories or the dimensions of the vector space, may improve classification performance. |
Studies in Various Sports | ||||
---|---|---|---|---|
Ref. | Problem Statement | Proposed Methodology | Precision and Performance Characteristics | Limitations and Remarks |
[185] | Beach sports image recognition and classification. | CNN | The model achieved a recognition accuracy of 91%. | Lightweight networks of deep learning algorithms can improve the recognition accuracy and can also be implemented in real-time scenarios. |
[173] | Motion image segmentation in the sport of swimming | GDA + SVM | The performance of the Symmetric Difference Algorithm was measured in terms of recall and achieved 76.2%. | Using advanced optimization techniques such as Cosine Annealing Schedulers with deep learning algorithms may improve the performance. |
[174] | Identifying and recognizing wrong strokes in table tennis. | k-NN, SVM, Naïve Bayes | Various ML algorithms were evaluated; an accuracy of 69.93% was achieved using the Naïve Bayes algorithm. | A standard dataset can improve the accuracy of recognizing the wrong strokes in table tennis. |
[186] | Multi-player tracking in sports | Cascade Mask R-CNN | The proposed Deep Player Identification method studies the patterns of jersey number, team class, and pose-guided partial feature. To handle player identity switching, the method correlates the coefficients of player ID in the K-shortest path with ID. The proposed framework achieves state-of-the-art performance. | Compared with existing methods, the computation cost is higher, which can be considered a major drawback of the proposed framework. To refine 2D detection, temporal information needs to be considered, and the method could then be transferred to real-time tracking in sports such as soccer and basketball. |
[189] | Individual player tracking in sports events. | Deep Neural Network | Achieved an Area Under Curve (AUC) of 66% | Tracking by jersey number recognition may increase the performance of the model. |
[178] | Skeleton-based key pose recognition and classification in sports | Boltzmann machine+CNN Deep Boltzmann machine + RNN | The proposed architecture successfully analyses feature extraction, motion attitude model, motion detection, and behavior recognition of sports postures. | The architecture is bound to individual-oriented sports and can be further extended to group-based sports, where challenges include severe occlusion and misdetection due to failure in blob detection during object tracking. |
[199] | Human action recognition and classification in sports | VGG 16 + RNN | The proposed method achieved an accuracy of 92% for ten types of sports classification. | The model fails when the dataset is scaled up to more classes, where it shows ambiguity between players and similar environmental conditions. Pairs of classes such as football and hockey, tennis and badminton, and skiing and snowboarding have similar environmental features; thus, they can only be separated based on the relevant actions, which can be achieved by state-of-the-art methods. |
[212] | Replay and key event detection for sports video summarization | Extreme Learning Machine (ELM) | The framework is evaluated on a dataset that consists of 20 videos of four different sports categories. It achieves an average accuracy of 95.8%, which illustrates the significance of the method in terms of key-event and replay detection for video summarization. | The performance of the proposed method drops in the case of the absence of a gradual transition of a replay segment. It can be extended by incorporating artificial intelligence techniques. |
[203] | Event detection in sports videos for unstructured environments with arbitrary camera angles. | Mask RCNN + LSTM | The proposed method is accurate in unsupervised player extraction, which is used for precise temporal segmentation of rally scenes. It is also robust to noise in the form of camera shaking and occlusions. | It can be extended to doubles games with fine-grained action recognition for detecting various kinds of shots in an unstructured video and it can be extended to analyze videos of games such as cricket, soccer, etc. |
[204] | Human motion quality assessment in complex motion scenarios. | 3-Dimensional CNN | Achieved an accuracy of 81% on the MS-COCO dataset. | Replacing the learning-rate strategy used with Stochastic Gradient Descent by a cosine annealing scheduler may improve the performance. |
[187] | Court detection using markers, player detection, and tracking using a drone. | Template Matching + Particle Filter | The proposed method achieves better accuracy (94%) in the case of two overlapping players. | As the overlap between players increases, the accuracy of detection and tracking decreases due to similar features of players on the same team. The method uses a template matching algorithm, which can be replaced with a deep learning-based state-of-the-art algorithm to acquire better results. |
[188] | Study of target tracking theory and analysis of its advantages in video tracking. | Mean Shift + Particle Filter | Achieves better tracking accuracy compared to existing algorithms such as TMS and CMS algorithms. | If the target's scale changes, the tracking of players fails because the window of the mean-shift algorithm stays unchanged. Furthermore, it cannot track objects that are similar in color to the background. The accuracy of tracking players can be improved by replacing these algorithms with artificial intelligence algorithms. |
[211] | Automatically generating a summary of sports video. | 2D CNN + LSTM | Describes a novel method for automatic summarization of user-generated sports videos and demonstrated the results for Japanese fencing videos. | The architecture can be improved by fine-tuning in an end-to-end manner with a larger dataset for illustrating potential performance and also to evaluate in the context of a wider variety of sports. |
[202] | Action Recognition and classification | SVM | Achieved an accuracy of 59.47% on the HMDB 51 dataset. | In cases where the object takes up most of the frame, the human detector cannot completely cover the body of the object. This leads to the system missing movements of body parts such as hands and arms. In addition, recognition of similar movements is a challenge for this architecture. |
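Two rows in the table above ([173] and [204]) suggest cosine annealing schedulers as a possible improvement. As a minimal sketch of the general idea, and not the configuration of any cited study, the learning rate can be decayed from a maximum to a minimum value along half a cosine period. The function name `cosine_annealing_lr` and the default values are assumptions chosen only for illustration:

```python
import math


def cosine_annealing_lr(step, total_steps, lr_max=0.1, lr_min=0.0):
    """Learning rate at `step` under a single-cycle cosine annealing schedule.

    lr_max, lr_min and total_steps are illustrative hyperparameters,
    not values taken from the studies in the table.
    """
    # Clamp the step so the schedule is well defined outside [0, total_steps].
    step = min(max(step, 0), total_steps)
    # Cosine factor goes smoothly from 1 (start) to 0 (end of the cycle).
    cos_factor = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return lr_min + (lr_max - lr_min) * cos_factor


# The rate starts at lr_max, passes the midpoint value halfway through,
# and reaches lr_min at the final step.
for s in (0, 25, 50, 75, 100):
    print(s, round(cosine_annealing_lr(s, total_steps=100), 4))
```

The schedule changes only how the step size evolves during training; it is normally used together with an optimizer such as stochastic gradient descent rather than as a substitute for it.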
This entry is adapted from the peer-reviewed paper 10.3390/app12094429