The knee is a hinge joint held together by four ligaments, bands of tissue that hold the bones together and help control joint motion. As shown in Figure 1, there is a collateral ligament on each side of the knee and two ligaments deep within the knee. The anterior cruciate ligament (ACL) and posterior cruciate ligament (PCL) are the two ligaments that cross each other inside the knee; both attach to the end of the thigh bone (femur) on one side and the top of the tibia on the other.
ACL injuries are prevalent in sports activities, and their incidence has been on the rise in the US. The direct and indirect costs of diagnosing and treating ACL injuries surpass USD 7 billion annually. If left untreated, ACL injuries can cause knee instability, meniscal damage, and osteoarthritis, significantly impacting an individual’s health and daily life. Raising awareness about ACL injuries and encouraging athletes to take the necessary precautions is vital. Timely, accurate diagnosis combined with effective treatment is essential for preventing further damage and improving patient outcomes.
2. Medical Image Classification
With the rapid development of machine learning, computers have achieved good results on traditional image analysis tasks, and attention has increasingly turned to applications in various fields, such as face recognition and vehicle detection. Medical image analysis has been of great help in medical research, clinical disease diagnosis, and treatment, where computer-aided methods, known as Computer-Aided Detection/Diagnosis (CAD), have become a trend. Among these, medical image classification generally belongs to Computer-Aided Diagnosis and is one of its most popular applications.
CAD based on medical images can be traced back to the 1960s, when Becker et al. 
input X-ray images into a computer to calculate the cardiothoracic ratio in order to identify cardiac lesions; medical image-based CAD developed directly from this work. Lee et al. 
proposed classifying ultrasound liver images by selecting fractal feature vectors based on the M-band wavelet transform. Paredes et al. 
obtained small square windows, i.e., local representations, from images and combined this approach with k-nearest neighbor techniques to achieve state-of-the-art results. Caicedo et al. 
used bag-of-features combined with SVM to select appropriate kernel functions for processing. Moreover, they conducted extensive experiments on the use of different strategies and analyzed the impact of each configuration on the classification results.
Medical imaging has evolved more rapidly with the introduction of convolutional neural networks. The lung nodule classification problem was addressed in 
, where a multiscale CNN extracts discriminative features from alternately stacked layers while capturing nodule heterogeneity. Payan et al. 
used sparse autoencoders and 3D CNNs to diagnose Alzheimer’s disease, demonstrating that 3D CNNs produced state-of-the-art results. Gong et al. 
extended the interpretability of deep networks and modeled complex spatial variations through deformable Gabor convolutions (DGConv). This approach improved the representation ability and robustness for complex objects, yielding the Deformable Gabor Feature Network (DGFN). Wei et al. 
approached histopathology image classification from a curriculum learning perspective and proposed a simple curriculum learning method, which yielded a 4.5% improvement in AUC compared with vanilla training.
3. Attention Mechanism
Attention mechanisms are designed to mimic the human ability to find salient regions in a scene; they highlight specific parts of a feature map by weighting it. Attention mechanisms are usually classified into channel attention, spatial attention, temporal attention, branch attention, channel-spatial attention, and spatio-temporal attention 
. Spatial attention, channel attention, and hybrid-domain attention mechanisms are the most commonly used in image analysis.
The spatial attention mechanism aims to find regions in the spatial domain of the image that are helpful for the task. For instance, according to 
, Spatial Transformer Networks (STN) were proposed to learn adaptive spatial transformations of the input data, thereby obtaining spatial transformation invariance.
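The core of an STN is a grid generator and sampler that warp a feature map according to a predicted transformation. Below is a minimal NumPy sketch of that sampling step, using nearest-neighbor interpolation for brevity where the paper uses a differentiable bilinear sampler; the function name is hypothetical, while the 2×3 affine parameterization over normalized [-1, 1] coordinates follows the STN convention:

```python
import numpy as np

def affine_grid_sample(img, theta):
    """Grid generator + sampler of an STN, with nearest-neighbor sampling.

    img   : (H, W) input image
    theta : 2x3 affine matrix mapping normalized output coordinates in
            [-1, 1] to normalized input coordinates, as in the STN paper.
    """
    H, W = img.shape
    # Grid generator: a regular grid of normalized output coordinates.
    ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W),
                         indexing="ij")
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)])  # homogeneous
    src = theta @ coords                                          # (2, H*W)
    # Sampler: map the source coordinates back to pixel indices
    # (nearest neighbor here; the paper uses bilinear sampling).
    sx = np.round((src[0] + 1) * (W - 1) / 2).astype(int)
    sy = np.round((src[1] + 1) * (H - 1) / 2).astype(int)
    out = np.zeros(H * W)
    valid = (sx >= 0) & (sx < W) & (sy >= 0) & (sy < H)
    out[valid] = img[sy[valid], sx[valid]]
    return out.reshape(H, W)
```

With the identity matrix `[[1, 0, 0], [0, 1, 0]]` as `theta`, the sampler reproduces the input image; a learned `theta` instead rotates, scales, or translates the sampling grid.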
On the other hand, the channel attention mechanism calculates the weight of each channel in the network to form the attention on the channel domain. Hu et al. 
proposed SENet, which reweights the original features channel by channel through a three-step operation of squeeze, excitation, and rescaling.
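The squeeze-excitation-rescale pipeline can be sketched in a few lines of NumPy. This is a minimal illustration rather than the SENet implementation; `w1`, `b1`, `w2`, and `b2` stand in for the learned bottleneck MLP with a hypothetical reduction ratio `r`:

```python
import numpy as np

def se_block(x, w1, b1, w2, b2):
    """Channel attention in the squeeze-and-excitation style.

    x  : feature map of shape (C, H, W)
    w1 : (C // r, C) and w2 : (C, C // r) -- hypothetical learned weights
         of the excitation bottleneck MLP with reduction ratio r.
    """
    # Squeeze: global average pooling collapses each channel to one scalar.
    z = x.mean(axis=(1, 2))                          # (C,)
    # Excitation: bottleneck MLP (ReLU, then sigmoid) yields channel weights.
    s = np.maximum(0.0, w1 @ z + b1)                 # (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ s + b2)))         # (C,), each in (0, 1)
    # Rescale: reweight every channel of the original feature map.
    return x * s[:, None, None]
```

Because each channel weight lies in (0, 1), informative channels are preserved while uninformative ones are suppressed, without changing the shape of the feature map.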
Woo et al. 
designed the Convolutional Block Attention Module (CBAM), a simple and efficient feed-forward CNN attention module that combines the spatial and channel attention mechanisms, generating attention maps along these two separate dimensions.
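As a simplified illustration of CBAM's spatial branch: the module pools along the channel axis and turns the pooled maps into a per-pixel weight. The sketch below replaces the paper's learned 7×7 convolution with a weighted combination of the two pooled maps, where `w_avg` and `w_max` are hypothetical scalars:

```python
import numpy as np

def spatial_attention(x, w_avg=0.5, w_max=0.5):
    """CBAM-style spatial attention over a (C, H, W) feature map.

    w_avg, w_max : hypothetical scalars standing in for the learned
    7x7 convolution of the original module.
    """
    # Pool along the channel axis to obtain two (H, W) descriptor maps.
    avg_map = x.mean(axis=0)
    max_map = x.max(axis=0)
    # Combine the descriptors and squash to a per-pixel weight in (0, 1).
    logits = w_avg * avg_map + w_max * max_map
    attn = 1.0 / (1.0 + np.exp(-logits))             # (H, W)
    # Broadcast the spatial map over all channels.
    return x * attn[None, :, :]
```

In the full CBAM, this spatial map is applied after the channel attention map, so each location and each channel is reweighted in turn.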
In medical image analysis, Schlemper et al. 
proposed a novel Attention Gate (AG) model for medical imaging, which can automatically learn to focus on target structures of different shapes and sizes. An interactive attention mechanism was developed by Dai et al. 
to implicitly instruct the network to focus on pathological tissue in multimodal data. Tao et al. 
introduced an inter-slice contextual attention mechanism and an intra-slice spatial attention mechanism in lesion detection to improve model performance using fewer slices.
4. Loss Function
The loss function evaluates the difference between the model’s output and the true label and serves as the objective function that the model optimizes. The smaller the loss, i.e., the closer the output is to the true label, the better the model performs.
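As a concrete instance of this definition, the multiclass cross-entropy loss shrinks as the predicted probability of the true class approaches 1. A minimal NumPy sketch (the function name is illustrative):

```python
import numpy as np

def cross_entropy(probs, label, eps=1e-12):
    """Multiclass cross-entropy for a single sample.

    probs : predicted class probabilities (e.g. a softmax output), shape (K,)
    label : integer index of the true class
    """
    # eps guards against log(0) when the predicted probability is zero.
    return -np.log(probs[label] + eps)

# The closer the true-class probability is to 1, the smaller the loss:
confident = cross_entropy(np.array([0.95, 0.03, 0.02]), 0)   # ~0.051
unsure = cross_entropy(np.array([0.40, 0.35, 0.25]), 0)      # ~0.916
```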
The standard loss functions in image classification are the 0–1 loss, the binary cross-entropy loss, and the multiclass cross-entropy loss. The 0–1 loss is a discontinuous piecewise function, which makes its minimization challenging to solve. Since Rubinstein et al. 
proposed an adaptive algorithm using cross-entropy to estimate the probability of rare events in complex random networks, cross-entropy loss has also been applied to classification tasks 
. Since then, many improvements have been made to cross-entropy. Liu et al. 
proposed the large-margin softmax (L-Softmax) loss, which encourages the learning of intra-class compact and inter-class separable features. Circle loss was proposed by Sun et al. 
to re-weight under-optimized similarity scores and thereby improve pair-based similarity optimization.
Lin et al. 
proposed the focal loss for unbalanced samples, easing the difficulty of training by assigning relatively large weights to the losses of hard samples in an unbalanced dataset. Building on this, Li et al. 
proposed the Generalized Focal (GFocal) loss, which turns the labels into continuous values between 0 and 1 and processes them using an extended form of the focal loss defined on continuous labels.
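The re-weighting idea behind the focal loss can be made concrete with a short NumPy sketch of its binary form, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), using the commonly cited defaults gamma = 2 and alpha = 0.25:

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    p : predicted probability of the positive class; y : label in {0, 1}.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # The (1 - p_t)**gamma factor down-weights well-classified (easy)
    # samples, leaving the total loss dominated by hard samples.
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t + 1e-12)
```

An easy positive (p = 0.9) thus contributes far less loss than a hard positive (p = 0.1), which is what keeps the many easy samples in an unbalanced dataset from swamping training.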
In medical imaging, Mazumdar et al. 
proposed a new composite loss function for medical image segmentation by combining the Dice, focal, and Hausdorff distance (HD) loss functions. This function handles extreme class imbalance and directly optimizes the Dice score and HD, thus significantly improving segmentation accuracy. Exploiting the ordinal nature of the knee injury grades, Chen et al. 
developed a novel ordinal loss for the detection of knee osteoarthritis. This loss imposes a greater penalty on misclassifications that lie farther from the actual knee injury grade. Liu et al. 
addressed the drawback that the ordinal loss uses fixed penalty weights by proposing an adaptive ordinal weight adjustment strategy on this basis.
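The distance-dependent penalty underlying these ordinal losses can be illustrated with a deliberately simplified NumPy sketch, an expected-ordinal-distance penalty rather than the exact loss of Chen et al. or Liu et al.:

```python
import numpy as np

def ordinal_penalty(probs, label):
    """Expected ordinal distance between the prediction and the true grade.

    probs : predicted probabilities over ordered grades, shape (K,)
    label : index of the true grade
    """
    grades = np.arange(len(probs))
    # Probability mass placed k grades away from the truth costs k,
    # so distant misclassifications are penalized more than adjacent ones.
    return float(np.sum(np.abs(grades - label) * probs))
```

For a true grade of 1, a prediction peaked on grade 1 incurs a much smaller penalty than one peaked on grade 3, even if both assign the same probability to their top class.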