Predicting Risk of Corporate Bankruptcy: History
Subjects: Business, Finance

Predicting the risk of corporate bankruptcy is one of the most important challenges for researchers dealing with the issue of financial health evaluation. The risk of corporate bankruptcy is most often assessed with the use of early warning models. The results of these models are significantly influenced by the financial features entering them. 

  • data envelopment analysis
  • domain knowledge
  • feature
  • LASSO

1. Introduction

Research shows that no company can be sure of its future, even in times of peace and prosperity. The risk of corporate bankruptcy is highly relevant today and is being addressed by many researchers. Interest in the problem has accelerated because of the events of the last few years (COVID-19, the war in Ukraine), especially in Europe. Early signals of bankruptcy need to be detected so that business managers can pay increased attention to them and prevent bankruptcy. Various methods of selecting bankruptcy prediction features, as well as various bankruptcy prediction models, serve this purpose. Several studies indicate that domain knowledge plays a significant role in this process and, when combined with a suitable prediction method, can yield strong results. Examples include Veganzones and Severin (2021), who selected features based on their popularity in the prior literature; Min and Lee (2008), who used expert opinion; and Zhou et al. (2015), who applied a domain knowledge approach. Altman’s (1968) features are frequently used in bankruptcy prediction. They were used in the study of Hu (2009) and that of De Andrés et al. (2011). Barboza et al. (2017) combined the features of Altman (1968) with the features of Carton and Hofer (2006), which have a greater impact on financial performance models in the short term. Similarly, Du Jardin (2015) applied financial ratios traditionally used in the literature since Altman (1968). These ratios were chosen based on the main financial dimensions which govern bankruptcy. Tseng and Hu (2010) used features inspired by the research of Lin (1999) and Lin and Piesse (2004).
Several published studies (Kirkos 2015; Zvarikova et al. 2017; Kovacova et al. 2019) examined the occurrence of individual features in bankruptcy prediction models. The research followed up on the results of Kovacova et al. (2019), who reviewed the most frequently used bankruptcy prediction features in Visegrad Group countries.
Based on the above, the research question was as follows: Which way of selecting financial features for a data envelopment analysis (DEA) model ensures higher model performance: the domain knowledge approach or a data mining technique, namely LASSO regression?

2. Predicting Risk of Corporate Bankruptcy

Determining corporate bankruptcy risk is one of the main challenges of economic and financial research as well as one of the most important issues for investors and decision-makers (Korol 2019). Predicting, measuring and assessing a company’s risk of bankruptcy is of particular interest to investors before they invest their capital, as optimizing risk is a prerequisite for maximizing the return on the investment, which ensures the payment of dividends. However, value maximization can only occur if capital providers selectively choose a profitable and sustainable business from which they can obtain the maximum share of business income (Agustia et al. 2020). The risk of bankruptcy is an important topic in many scientific articles, primarily because of its implications for stakeholders’ decisions (Lukason and Camacho-Miñano 2019). Bankruptcy risk (insolvency) can be understood as “the company’s inability to meet maturing obligations resulting either from current operations, whose achievement conditions the continuation of activity, or from compulsory levies” (Bordeianu et al. 2011, p. 250). According to Achim et al. (2012), the risk of business bankruptcy is closely related to economic and financial risk. While financial risk is determined by the level of indebtedness, economic risk depends on the ratio of fixed and variable costs. In general, knowledge of these risks makes it possible to quantify a company’s risk of bankruptcy. Bankruptcy risk is the risk of a company no longer being able to meet its debt obligations. This risk is also referred to as the risk of failure or insolvency (Campbell 2011).
Bankruptcy risk represents a constant threat to businesses and determines how long they will survive (Khan et al. 2020). If a business goes bankrupt, the probability of bankruptcy in connected businesses increases (Battiston et al. 2007), which can have a negative effect on the entire economy. Therefore, predicting the risk of bankruptcy is the subject of many research studies that search for the most suitable bankruptcy prediction model as well as the features that best describe bankruptcy.
Research on bankruptcy prediction dates back to Fitzpatrick (1932), who was the first to examine the financial conditions of bankrupt and non-bankrupt firms by comparing the values of their financial ratios. He found significant differences between bankrupt and non-bankrupt companies, especially in liquidity, debt and turnover indicators (Fejér-Király 2015). In the early days of the development of bankruptcy prediction models, discriminant analysis (DA) was very popular. Beaver (1966) applied univariate discriminant analysis to investigate the predictive ability of 30 financial ratios. The best discriminating factor was identified as the working capital/debt ratio. The second one was the net income/total assets ratio (Gameel and El-Geziry 2016). Despite the criticism it received, this method was a starting point for the development of other models. The most famous bankruptcy-risk-scoring model, known as the Z-score, was published by Altman in 1968 (Voda et al. 2021). This model was developed with the use of multiple discriminant analysis. Since the introduction of Altman’s model, many other authors (Deakin 1972; Altman et al. 1977; Norton and Smith 1979; Taffler 1983) have developed their models based on multiple discriminant analysis. In the 1980s, logistic regression analysis was introduced into bankruptcy prediction, followed by probit analysis. The first logistic regression model intended to predict the financial situation of businesses was developed by Ohlson (1980). In the following period, many authors (Kim and Gu 2006; Mihalovic 2016; Barreda et al. 2017; Khan 2018; Affes and Hentati-Kaffel 2019) compared the accuracy of the multiple discriminant analysis model and the logistic regression model. These two models were the most used parametric models in bankruptcy prediction (Fejér-Király 2015). Probit analysis has not been as widely used as logistic regression. The first probit model was developed by Zmijewski (1984), followed by Zavgren (1985).
Since the 1990s, the development of computer science has enabled the use of more computationally demanding methods in bankruptcy prediction. These methods are mainly non-parametric. Within them, Mousavi et al. (2023) identify two main groups: machine learning and artificial intelligence, and operations research. The most used methods within the machine learning and artificial intelligence group include artificial neural networks, such as those used by Messier and Hansen (1988), Odom and Sharda (1990), Atiya (2001) and Abid and Zouari (2002), decision trees (Frydman et al. 1985; Chen et al. 2011; Stankova and Hampel 2018), Bayesian models (Sarkar and Sriram 2001; Aghaie and Saeedi 2009; Cao et al. 2022), genetic algorithms (Kingdom and Feldman 1995; Alfaro-Cid et al. 2007; Bateni and Asghari 2020), modeling based on rough sets (Ahn et al. 2000; Wang and Wu 2017) and support vector machines (Huang et al. 2004; Olson et al. 2012).
The main method within operations research is Data Envelopment Analysis. Simak (1997) was the first to use this method to predict corporate failure. In his master’s thesis, he compared the results of DEA with the results of Altman’s Z-score. In recent years, numerous models based on Data Envelopment Analysis have been developed to predict bankruptcy, and their results have been compared with those achieved with other techniques. Cielen et al. (2004) found that DEA outperformed a discriminant analysis model and a rule induction (C5.0) model in terms of classification accuracy. Ouenniche and Tone (2017) proposed an out-of-sample evaluation of decision-making units by applying DEA. The out-of-sample framework was based on an instance of case-based reasoning methodology. They found that “DEA as a classifier is a real contender to Discriminant Analysis, which is one of the most commonly used classifiers by practitioners” (Ouenniche and Tone 2017, p. 249). Premachandra et al. (2009) compared the results of an additive DEA model with the results of a Logit model. They found that DEA outperformed the Logit model in evaluating bankruptcy out of sample. Condello et al. (2017, p. 2186) found that DEA has “a greater capacity for bankruptcy prediction, while Logit Regression and Discriminant Analysis perform better in non-bankruptcy and overall prediction in the short term”. Janova et al. (2012) achieved similar results. They found that the additive DEA model performs well in correctly identifying bankrupt agricultural businesses but is less powerful when identifying non-bankrupt agricultural businesses. The performance of DEA models is assessed mainly with the use of sensitivity, specificity, or overall accuracy. In this regard, Premachandra et al. (2011) pointed out that the cut-off point of 0.5 traditionally used to classify bankrupt and non-bankrupt businesses may not be appropriate for the DEA model.
According to these authors, “depending on the precision with which predictions for bankrupt and non-bankrupt businesses need to be done, the decision maker has to determine an appropriate cut-off point” (Premachandra et al. 2011, p. 623). Stefko et al. (2020) determined the optimal cut-off of the additive DEA model at the point at which the sum of sensitivity and specificity is the highest. Stankova and Hampel (2023) selected an optimal threshold by applying the Youden index and the distance from the corner. They found that “selecting a suitable threshold improves specificity visibly with only a small reduction in the total accuracy” (Stankova and Hampel 2023, p. 129).
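The cut-off selection discussed in this paragraph can be illustrated with a minimal sketch. It assumes a generic risk score for which higher values indicate a higher risk of bankruptcy; the scores, labels and function names (`confusion_rates`, `youden_cutoff`) are hypothetical and are not taken from the cited studies. Maximizing the Youden index J = sensitivity + specificity − 1 is equivalent to maximizing the sum of sensitivity and specificity.

```python
def confusion_rates(scores, labels, cutoff):
    """Return (sensitivity, specificity) when score >= cutoff predicts bankruptcy."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    sensitivity = tp / (tp + fn)  # share of bankrupt firms correctly flagged
    specificity = tn / (tn + fp)  # share of healthy firms correctly cleared
    return sensitivity, specificity


def youden_cutoff(scores, labels):
    """Pick the observed score that maximizes J = sensitivity + specificity - 1."""
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(scores)):
        sens, spec = confusion_rates(scores, labels, cutoff)
        j = sens + spec - 1.0
        if j > best_j:  # ties keep the lowest cut-off
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical risk scores (not real data); label 1 marks a bankrupt business.
scores = [0.10, 0.20, 0.25, 0.30, 0.35, 0.40, 0.60, 0.80]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

cutoff, j = youden_cutoff(scores, labels)
print("selected cut-off:", cutoff, "Youden index:", j)
print("rates at the conventional 0.5 cut-off:", confusion_rates(scores, labels, 0.5))
```

On this toy sample the selected cut-off lies below 0.5, and the conventional 0.5 cut-off misses half of the bankrupt businesses, which is the kind of trade-off the cited authors describe.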
In the development of the above-mentioned models, the variables included in the model are as important as the method applied (Nurcan and Köksal 2021). In order to select appropriate variables from high-dimensional datasets, various dimensionality reduction methods can be applied. Depending on whether the original features are transformed into new features or not, feature extraction methods and feature selection methods are differentiated (Wang et al. 2016; Li et al. 2020). Feature extraction methods transform existing features into a lower-dimensional space (new set of features) while preserving the original relative distance between the features (Subasi and Gursoy 2010; Li et al. 2020). Well-known feature extraction methods often used in current research include Principal Component Analysis (Adisa et al. 2019; Karas and Reznakova 2020), Multidimensional Scaling (Tang et al. 2020) and Isometric Mapping (Gao et al. 2020). Since the new features differ from the original ones, they may be difficult to interpret (Wang et al. 2016). When using feature selection methods, the original features are sorted according to specific criteria and the features with the highest ranking are selected to form a subset (Li et al. 2020). Among the feature selection methods, researchers can differentiate between filter, wrapper, embedded and combined methods (Liu et al. 2018). Filter methods examine each feature independently while ignoring the individual performance of the feature in relation to the group. Within filter methods, researchers frequently use the t-test (Chandra et al. 2009; Xiao et al. 2012), correlation analysis (Zhou et al. 2012) and stepwise methods (Lin et al. 2010). Wrapper methods use machine learning algorithms to evaluate the performance of selected feature subsets. Within them, decision trees (Ratanamahatana and Gunopulos 2003), Naive Bayes (Chen et al. 2009), artificial neural networks (Ledesma et al. 2008) and genetic algorithms (Amini and Hu 2021) are often used. The results of wrapper methods are often superior to the results of filter methods; however, the computational cost of wrapper methods is high. Embedded methods integrate feature selection and learning procedures. Important embedded techniques are regularization approaches, which have recently attracted growing interest, for example, LASSO (Fonti and Belitser 2017; Cao et al. 2022; Paraschiv et al. 2021) and Elastic net (Jones et al. 2016; Amini and Hu 2021). Combined methods include different types of feature selection measures, such as filter and wrapper.
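As an illustration of a filter method, the following minimal sketch scores each feature independently with Welch's t-statistic between bankrupt and non-bankrupt businesses and ranks the features by its absolute value. The feature names, values and helper functions (`welch_t`, `rank_features`) are hypothetical, and Welch's variant of the t-test is an assumption made here; the cited studies may use a different variant.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t-statistic for two independent samples with unequal variances."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def rank_features(data, labels):
    """Rank feature names by |t| between bankrupt (1) and non-bankrupt (0) firms."""
    ranking = []
    for name, values in data.items():
        bankrupt = [v for v, y in zip(values, labels) if y == 1]
        healthy = [v for v, y in zip(values, labels) if y == 0]
        # Each feature is scored on its own, ignoring all other features.
        ranking.append((name, abs(welch_t(bankrupt, healthy))))
    return sorted(ranking, key=lambda item: item[1], reverse=True)


# Hypothetical financial ratios for six firms (not real data).
labels = [1, 1, 1, 0, 0, 0]
data = {
    "debt_ratio": [0.90, 0.85, 0.95, 0.40, 0.45, 0.35],
    "asset_turnover": [1.0, 1.4, 0.8, 1.1, 0.9, 1.3],
}

ranking = rank_features(data, labels)
for name, score in ranking:
    print(f"{name}: |t| = {score:.2f}")
```

Because every feature is evaluated in isolation, this kind of ranking is cheap to compute but cannot detect features that are informative only in combination, which is the limitation of filter methods noted above.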
Various methodologies have been applied to select features for DEA models. Cielen et al. (2004) selected variables according to how effectively they had predicted bankruptcy in prior research. Similarly, Psillaki et al. (2010) focused on financial ratios which appeared to be most successful in previous studies. Premachandra et al. (2009) approached this issue in the same way. When creating DEA models, they used ratios which had been applied in past bankruptcy literature, and some of them were the same as the ratios used by Altman (1968) and Cielen et al. (2004). The ratios selected by Premachandra et al. (2009) were later applied in the study of Condello et al. (2017) and other studies. Min and Lee (2008) combined expert opinion and factor analysis when selecting features for DEA models. The resulting set of indicators contained the most relevant financial classification dimensions, while taking into account the mathematical relationships among ratios as well. Sueyoshi and Goto (2009) applied Principal Component Analysis to reduce the number of financial factors in order to reduce the computational burden of the DEA-DA model. Stefko et al. (2021) used Principal Component Analysis and Multidimensional Scaling when selecting inputs and outputs for DEA models. Huang et al. (2015) selected variables for DEA models based on gray relational analysis. They showed this method to be an effective technique for obtaining variables for DEA models. Gray relational analysis was later used in this way by Nurcan and Köksal (2021) as well. Lee and Cai (2020) dealt with the curse of dimensionality in DEA. They proposed the LASSO variable selection technique and combined it with sign-constrained convex nonparametric least squares (SCNLS) to support estimating the production function using DEA for small datasets. They also showed that this approach provides useful guidelines for DEA with small datasets. Chen et al. (2021) were inspired by their approach and proposed a simplified two-step LASSO+DEA approach to handle the dimensionality of data entering the DEA models via LASSO. They used standard cross-validation LASSO to select an optimal number of regressors. These regressors were then used in the DEA model. As an important advantage of this approach over that of Lee and Cai (2020), Chen et al. (2021) state that the tuning parameter λ was not chosen manually but was determined by optimizing the classical cross-validation criterion in order to select the relevant variables optimally.
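The idea of choosing the LASSO tuning parameter λ by cross-validation and keeping only the features with non-zero coefficients can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of Chen et al. (2021): it uses a plain coordinate-descent LASSO without an intercept, a small hypothetical grid of λ values, interleaved folds, and toy data in which the third feature is irrelevant by construction. The subsequent DEA step is omitted, and all function names are hypothetical.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty (the LASSO shrinkage step)."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_fit(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho, norm = 0.0, 0.0
            for i in range(n):
                # Prediction that leaves out feature j.
                pred_wo_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_wo_j)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (norm / n)
    return beta


def cv_error(X, y, lam, k=4):
    """Mean squared validation error of the LASSO fit over k interleaved folds."""
    n = len(X)
    total = 0.0
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        X_train = [X[i] for i in range(n) if i not in test_idx]
        y_train = [y[i] for i in range(n) if i not in test_idx]
        beta = lasso_fit(X_train, y_train, lam)
        for i in test_idx:
            pred = sum(x * b for x, b in zip(X[i], beta))
            total += (y[i] - pred) ** 2
    return total / n


# Hypothetical toy data: a replicated two-level design with three standardized
# features; the response depends on the first two features only.
levels = (-1.0, 1.0)
design = [[a, b, c] for a in levels for b in levels for c in levels]
X = design + design
noise = [0.03, -0.02, 0.01, -0.04, 0.02, -0.01, 0.04, -0.03,
         -0.01, 0.02, -0.03, 0.04, -0.02, 0.01, -0.04, 0.03]
y = [2.0 * row[0] - 1.5 * row[1] + e for row, e in zip(X, noise)]

lambdas = [0.05, 0.1, 0.2, 0.4]  # assumed grid, chosen only for illustration
best_lam = min(lambdas, key=lambda lam: cv_error(X, y, lam))
beta = lasso_fit(X, y, best_lam)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("lambda chosen by cross-validation:", best_lam)
print("indices of selected features:", selected)
```

In a two-step procedure of the kind described above, only the features in `selected` would be passed on as inputs or outputs to the DEA model.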

This entry is adapted from the peer-reviewed paper 10.3390/risks11110199
