Loss given default (LGD) is the ratio of the loss a lender incurs when a borrower defaults to the exposure at risk. LGD is an important credit risk parameter in the regulatory framework for financial institutions.
1. Introduction
When financial institutions extend loans to borrowers, credit risk is a major concern; it refers to the risk that the borrower defaults and fails to fulfil its debt servicing obligations
[1][2][3]. One of the key drivers of credit risk is loss given default (LGD). LGD is the ratio of the loss a lender incurs when a borrower defaults to the exposure at risk. It is critical for understanding potential losses, for the effective allocation of regulatory and economic capital, and for credit risk pricing. According to Article 107(1) of the Capital Requirements Regulation (CRR), financial institutions should use either the Standardized Approach (SA) or the Internal Ratings-Based Approach (IRBA) when calculating their regulatory capital requirements for credit risk. When implementing the advanced IRBA, internal models must be developed to estimate exposure at default (EAD), probability of default (PD), and LGD. EAD is the risk exposure that arises when a default occurs. PD is the probability that a borrower defaults on a loan within a given period. One of the primary objectives of the IRBA is to achieve risk-adjusted capital requirements (see Basel Committee on Banking Supervision
[4]). As shown by Gürtler and Hibbeln
[5], accurate LGD forecasts may provide a competitive advantage for the financial institution applying them, and therefore banks use a variety of methodologies to estimate LGD.
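To make the definition concrete, the following minimal Python sketch (with hypothetical figures, not taken from the cited studies) computes LGD as the share of the exposure at default that is lost after recoveries:

```python
# Illustrative sketch of the LGD definition; the exposure and recovery
# figures below are hypothetical.

def loss_given_default(ead: float, recovered: float) -> float:
    """LGD = loss at default / exposure at default = 1 - recovery rate."""
    return (ead - recovered) / ead

# Hypothetical defaulted loan: 100,000 exposure, 60,000 eventually recovered.
print(loss_given_default(100_000, 60_000))  # 0.4, i.e. 40% of the exposure is lost
```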
LGD is an important measure that banks need to estimate accurately for several reasons. First, LGD is critical to risk management in banks and other financial institutions. Understanding and measuring LGD can help financial institutions better assess and control their credit risk exposures, i.e., it can be used in conjunction with PD and EAD to estimate expected financial losses, so banks can more accurately measure potential credit losses and thus be well prepared for future defaults. Second, financial institutions can improve their overall risk modelling by better understanding and estimating LGD, thereby improving their ability to measure and manage credit risk. This is important for maintaining the stability of the financial system and preventing financial crises. Third, estimations of LGD and portfolio financial risk are an indispensable part of calculating the capital requirements for covering credit losses under extreme economic conditions
[6][7][8]. Thus, reliable LGD prediction models play an important role in loss control and profit maximization.
2. Theoretical Development of LGD
The Basel Capital Accord aims to better align regulatory capital with the underlying risks in a bank’s credit portfolio. Banks can calculate their credit risk capital in two distinct ways: a modified standardized approach rooted in the original 1988 Capital Accord, or the Internal Ratings-Based (IRB) approach, available in two variants, which allows banks to develop and use their own internal risk ratings. The internal ratings methodology relies on four main parameters for assessing credit risk: EAD, PD, LGD, and M. M is maturity, i.e., the time by which the loan must be repaid. For a particular maturity, these parameters are used to compute two forms of expected loss (EL): expected loss as an amount, EL = EAD × PD × LGD, and expected loss as a percentage of exposure at default, EL% = PD × LGD.
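As a quick illustration of the two formulas above, the sketch below plugs in hypothetical parameter values (the figures are illustrative assumptions, not taken from any cited study):

```python
# Hypothetical risk parameters for a single exposure.
ead = 500_000   # exposure at default (currency units)
pd_ = 0.02      # probability of default over the horizon
lgd = 0.45      # loss given default

el_amount = ead * pd_ * lgd   # EL = EAD x PD x LGD
el_percent = pd_ * lgd        # EL% = PD x LGD (loss as a share of EAD)

print(f"EL  = {el_amount:,.0f}")   # 4,500
print(f"EL% = {el_percent:.2%}")   # 0.90%
```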
Several decades ago, academic research and banking practice primarily emphasized predicting PD. In recent years, however, considerable attention has shifted towards modelling LGD. The main reason is that the Basel II/III framework requires banks to provide their own estimates of LGD when using the IRBA for corporate exposures or internal rating methods for retail exposures. Apart from meeting regulatory demands, precise LGD predictions play a crucial role in risk-informed decision-making. For example, they help determine risk-adjusted loan pricing, calculate economic capital, and price assets such as asset-backed securities or credit derivatives [9].
The literature on LGD comprises several streams. Some studies aim to gauge the LGD distribution for credit portfolio modelling
[10][11]. Meanwhile, others focus on examining the factors that impact individual LGD. In addition, certain studies explore the relation between PD and LGD
[12][13][14]. While a large part of the literature consists of empirical investigations into corporate bonds, there is relatively less emphasis on bank loans, primarily due to constraints related to data availability.
3. LGD Modelling
A wide range of LGD modelling techniques has been applied in the literature. Benchmark regression models include simple linear regression and fractional response regression, in which a logit link function maps linear combinations of covariates to fractional values bounded by 0 and 1
[15]. A more sophisticated regression approach applies a beta transformation to accommodate irregular LGD distributions. However, machine learning (ML) techniques, such as decision trees (DT) and support vector regression, have proven more effective and competitive than traditional parametric regression models
[16][17][18][19]. In recent studies, random forest (RF) has been found to outperform other techniques in predicting LGD
[20][21][22][23].
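To illustrate the kind of comparison these studies make, the sketch below contrasts a simple fractional-logit-style benchmark (OLS on the logit-transformed LGD, a rough stand-in for fractional response regression) with a random forest on synthetic data; the data-generating process, features, and hyperparameters are illustrative assumptions, not those of the cited papers.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic LGD data: features could stand for loan-to-value, seniority, etc.
rng = np.random.default_rng(0)
n = 2_000
X = rng.normal(size=(n, 5))
signal = 0.8 * X[:, 0] - 0.5 * X[:, 1] + 0.3 * X[:, 2] ** 2
lgd = np.clip(1 / (1 + np.exp(-signal)) + rng.normal(scale=0.05, size=n), 0.0, 1.0)

X_tr, X_te, y_tr, y_te = train_test_split(X, lgd, test_size=0.3, random_state=0)

# Benchmark: OLS on the logit-transformed target, then map back to (0, 1).
eps = 1e-4
y_clipped = np.clip(y_tr, eps, 1 - eps)
ols = LinearRegression().fit(X_tr, np.log(y_clipped / (1 - y_clipped)))
pred_ols = 1 / (1 + np.exp(-ols.predict(X_te)))

# Challenger: random forest regression on the raw LGD values.
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)
pred_rf = np.clip(rf.predict(X_te), 0.0, 1.0)

print("MAE, fractional-logit-style benchmark:", mean_absolute_error(y_te, pred_ols))
print("MAE, random forest:                   ", mean_absolute_error(y_te, pred_rf))
```

Because the synthetic target includes a non-linear term, the tree-based model is likely to fit it more closely; the sketch only illustrates the comparison and does not reproduce the cited results.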
Unsupervised ML algorithms typically include clustering algorithms, important data mining techniques that group samples into clusters of similar objects rather than producing direct predictions. As such, unsupervised ML algorithms are often used as complementary tools to supervised ML algorithms. Some studies have combined support vector machine (SVM) models with unsupervised clustering algorithms (e.g., K-means and self-organizing maps (SOMs))
[24][25][26][27]. On the other hand, unsupervised ML algorithms such as SOMs have also been proposed for prediction itself, but relatively few applications have been reported in the field of LGD evaluation
[28][29].
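As one possible illustration of using clustering as a complement to a supervised learner (in the spirit of the cluster-then-SVM studies cited above), the sketch below segments loans with K-means and fits one support vector regressor per cluster; the synthetic data, number of clusters, and hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic loan features and LGD values, for illustration only.
rng = np.random.default_rng(1)
n = 1_000
X = rng.normal(size=(n, 4))
lgd = np.clip(0.5 + 0.3 * np.tanh(X[:, 0]) + rng.normal(scale=0.1, size=n), 0, 1)

# Unsupervised step: segment the portfolio into clusters of similar loans.
X_std = StandardScaler().fit_transform(X)
clusters = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X_std)

# Supervised step: one support vector regressor per cluster.
models = {c: SVR(C=1.0, epsilon=0.05).fit(X_std[clusters == c], lgd[clusters == c])
          for c in np.unique(clusters)}

# Predict each loan's LGD with the model of its own cluster.
preds = np.empty(n)
for c, model in models.items():
    mask = clusters == c
    preds[mask] = np.clip(model.predict(X_std[mask]), 0, 1)
print("in-sample MAE:", np.mean(np.abs(preds - lgd)))
```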
Many studies have proposed multi-stage models for LGD prediction
[5][6][30][31][32]. In the earliest studies, Lucas
[31] proposes a two-stage model to analyse mortgage-related LGD, i.e., dividing loans according to whether or not they are recovered and calculating the loss in the event of recovery. A scorecard is constructed to estimate the likelihood of repossession, followed by a model that estimates the “haircut”, i.e., the proportion of the estimated house sale value that is realized in the actual sale. However, the scorecard is not applicable to certain credit risk problems with a high degree of complexity. Gürtler and Hibbeln
[5] classify defaults into two types (recovery/write-off) and model LGD through a two-step approach that accounts for length-biased sampling, different loan characteristics across default end types, and different information sets for default status, which provides a significant improvement in predictive power compared with direct regression methods. However, they do not fully consider the potential impact of macroeconomic recessions or volatility on LGD forecasts, so their estimates may be somewhat biased. Bellotti and Crook
[6] propose a multi-stage model for LGD prediction, consisting of two logistic regression (LR) classifications and an ordinary least squares (OLS) regression, and find that it is important to incorporate macroeconomic features into the model. However, the class imbalance problem in the LR classification stage is not addressed, and the overall predictive ability of the model could still be improved. Tanoue et al.
[33] analyse the factors influencing LGD using Japanese bank loan data and develop a multi-stage model for predicting LGD and expected loss (EL). A shortcoming of their study is that, owing to data limitations, only credit scores and the quotas of different collateral types are considered, and other potential factors are not fully explored. Li et al.
[30] incorporate disclosed post-default information to build two models, a hierarchical (two-stage) model and a hybrid model, to predict LGD separately. Most techniques use supervised algorithms as advanced learners in multiple stages to achieve good prediction accuracy. However, these approaches face several challenges. First, they must cope with complex data, such as multidimensional credit information; traditional supervised learning algorithms may not adequately capture this complexity, degrading model performance. Second, under sample imbalance, supervised learning algorithms tend to favour categories with more samples over those with fewer, leading to poorer performance in predicting defaults. Finally, over-reliance on supervised algorithms can lead to overfitting, reducing the model’s ability to generalize to new data
[2][34].
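To make the structure of such multi-stage approaches concrete, the following sketch shows a generic two-stage LGD model: a first-stage classifier estimates the probability that a default ends with zero loss, and a second-stage regression estimates LGD on the remaining cases. It is a simplified illustration on assumed synthetic data, not a reproduction of any of the cited models, and class weighting is included as one simple way to address the imbalance problem noted above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Synthetic defaulted loans: features, a zero-loss indicator, and realized LGD.
rng = np.random.default_rng(2)
n = 3_000
X = rng.normal(size=(n, 6))
zero_loss = rng.random(n) < 1 / (1 + np.exp(-(1.0 + X[:, 0])))   # imbalanced outcome
lgd = np.where(zero_loss, 0.0,
               np.clip(0.6 - 0.2 * X[:, 1] + rng.normal(scale=0.1, size=n), 0, 1))

# Stage 1: probability that the loss is zero; balanced class weights mitigate
# the imbalance between zero-loss and positive-loss defaults.
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, zero_loss)
p_zero = clf.predict_proba(X)[:, 1]

# Stage 2: regression of LGD fitted on the positive-loss cases only.
reg = LinearRegression().fit(X[~zero_loss], lgd[~zero_loss])
lgd_if_loss = np.clip(reg.predict(X), 0, 1)

# Combine the stages: expected LGD = P(loss > 0) * E[LGD | loss > 0].
pred_lgd = (1 - p_zero) * lgd_if_loss
print("mean predicted LGD:", round(pred_lgd.mean(), 3),
      "| mean observed LGD:", round(lgd.mean(), 3))
```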