Urbanization has accelerated the degradation of urban aquatic ecosystems, and the associated ecological issues have been recognized in China. To mitigate these effects and improve urban river health, large-scale water-clearing regulations have been instituted for plain urban river networks, such as the one that empties into the Yangtze River Delta. Water transparency is a commonly used indicator of water quality because it integrates physical, chemical, and biological processes (e.g., flow rate, nutrient cycling, and phytoplankton photosynthesis, respectively). Moreover, it is a key indicator for measuring the effect of ecological restoration and is widely valued in China when constructing urban water environments. Additionally, studies have shown that the three abovementioned processes also relate to geographical location. For instance, in shallow areas such as the Yangtze Delta, water transparency is dominated by dynamic conditions associated with wind, waves, and human activities. Other environmental water quality indicators include chlorophyll-a (Chl-a); nutritional status; total phosphorus; sediment resuspension, transportation and settlement; and total suspended substance. For a long time, Yangtze River water was diverted to supply the cities on the coastal plain, but this method was once considered a waste of money and labour. Therefore, an investigation into how hydrodynamic and hydro-environmental factors affect water transparency was begun. An improved understanding of the large-scale water-clearing regulation can provide a theoretical basis for a better evaluation of water diversion projects.
The Secchi disk depth, or Secchi depth (SD), is a simple, traditional measure of water transparency. A black-and-white disk is lowered vertically into the water, and the depth at which it is no longer visible is called the Secchi depth, which indicates the water transparency. Although the Secchi disk is a powerful tool, its main disadvantage is that observations over large areas are spatially and temporally discrete and asynchronous. Additionally, errors may occur because readings depend on the observer’s visual acuity. Therefore, the traditional method might not be adequate to evaluate water quality for large river networks. The Secchi method also consumes a great deal of labour and resources because of the complex branch system of a plain urban river network. By contrast, satellite sensors such as China’s CBERS-1 can provide high-quality water transparency measurements with high spatial-temporal resolution. Colour sensors combined with remote sensors have been successfully applied to estuarine and marine waters, but there are limited studies on their application to China’s inland shallow areas.
Previous studies have shown that SD is a function of hydrodynamic factors such as velocity and water level. Further, hydrodynamic variations may lead to variations in hydro-environmental indicators, which in turn might cause variations in SD. As such, hydrodynamic indicators alone might be inadequate to predict SD. For this reason, previous studies have explored SD as a function of various environmental parameters. A partial list is given in Table 1. Over the past 50 years, several regression models have been developed in which the SD (or its natural or decimal logarithm) was expressed as a function of one or two parameters. Compared with regression models, studies that have employed an artificial neural network (ANN) for SD prediction remain limited to date.
Table 1. Selected studies for different Secchi depth (SD) prediction models.
|Reference|Model|
|---|---|
|Carlson (1977)|Ln(SD) = 2.040 − 0.68Ln(Chl-a)|
|Carlson (1977)|Ln(SD) = 3.876 − 0.98Ln(TP)|
|Brezonik (1978)|Log(SD) = 0.63 − 0.55Log(Chl-a)|
|Brezonik (1978)|Log(SD) = 0.48 − 0.72Log(TUR)|
|Gikas et al. (2006)|SD = 0.52(Chl-a) − 0.05|
|Gikas et al. (2006)|SD = 0.85(Chl-a) − 0.22|
|Gikas et al. (2009)|Log(SD) = 5.32Log(Chl-a) − 0.38 + 2.11Log(TSS) − 0.16|
|Gikas et al. (2009)|Log(SD) = 10.96Log(TP) − 0.54|
|Wu et al. (2009)|Ln(SD) = −0.712 + 0.093WL − 0.278WV [WL ≥ 14.75]|
|Wu et al. (2009)|Ln(SD) = −19.887 + 1.393WL − 0.217WV [WL < 14.75]|
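As an illustration of how the regression models in Table 1 are evaluated, the following minimal Python sketch computes SD from three of the listed equations. The function names are hypothetical, "Log" is assumed to denote the decimal logarithm, and the units of each input follow the cited studies and are not restated in this section; WL and WV are used as the symbols appear in Table 1.

```python
import math


def sd_carlson_chl(chl_a):
    """Carlson (1977): Ln(SD) = 2.040 - 0.68 Ln(Chl-a)."""
    return math.exp(2.040 - 0.68 * math.log(chl_a))


def sd_brezonik_chl(chl_a):
    """Brezonik (1978): Log(SD) = 0.63 - 0.55 Log(Chl-a).

    Assumption: "Log" is the base-10 logarithm.
    """
    return 10 ** (0.63 - 0.55 * math.log10(chl_a))


def sd_wu(wl, wv):
    """Wu et al. (2009): piecewise model that switches at WL = 14.75."""
    if wl >= 14.75:
        ln_sd = -0.712 + 0.093 * wl - 0.278 * wv
    else:
        ln_sd = -19.887 + 1.393 * wl - 0.217 * wv
    return math.exp(ln_sd)
```

With these coefficients, the two branches of the Wu et al. (2009) model take the same value at WL = 14.75 when WV = 0, so at zero WV the piecewise prediction is continuous at the threshold.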
In particular, few studies have used an ANN for SD prediction that considers the impact of the three processes (physical, chemical, and biological) on a plain urban river network. Inspired by big-data analysis and machine learning techniques, we developed a machine learning model based on remotely sensed SD and other hydrodynamic and environmental parameters from 2013 to 2019 to assess the response of SD to the large-scale water-clearing regulation in the Yangtze Delta. The candidate inputs for the machine learning model comprise a hydrodynamic condition index and water environmental factors: surface velocity (V), total suspended solids (TSS) concentration, dissolved oxygen (DO) concentration, near-surface chlorophyll (Chl) concentration, chemical oxygen demand (COD) concentration, and water temperature (TE).
The objectives of this study are:
To evaluate the big data analysis and self-learning ability of the developed machine learning model in SD prediction for a plain urban river network with long-term field observations;
To compare the SD prediction performance of the machine learning model with that of a regression model, in order to provide a better prediction model and highlight suitable parameters.
2. Plain Urban Water Transparency Prediction
The ANN model and the MLR model are compared based on their performance in the (i) training, (ii) verification, and (iii) testing phases, with results summarized in Table 2. The ANN model appears more accurate and consistent across the subsets: its RMSE and MAE values are similar in all phases, its correlation coefficients (CC) are all close to unity, and its performance is well demonstrated by the RMSE. The ANN model also yields a much higher CC than the MLR model, with an improvement of approximately 38.1% in the verification phase and approximately 36.9% in the testing phase. In some previous studies, the reported prediction of SD was not tested on the training dataset because of insufficient data size. In this study, as Table 2 shows, the results for SD modelling are very encouraging, and the model fitted well in all phases.
Table 2. Statistical parameters of the input data.
According to Table 2, the ANN model gives a reasonable estimate of SD during the verification phase. Models M1 and M3 reach an acceptable level, and a comparison of the statistical indices (CC, RMSE, and MAE) shows that the ANN models outperform the MLR models, which demonstrates the advantage of the ANN method in predicting the SD of the plain urban river network. In the verification phase, the best ANN results are achieved with model M4, whose prediction performance is slightly better than that of M1 and M3. In the testing phase, as shown in Table 3, model M4 is consistently the best model, whereas for the MLR model, M1 is the best. For good predictive ability, RMSE and MAE should be as low as possible and CC should be as high as possible.
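The three indices can be computed as in the sketch below. It assumes that CC denotes the Pearson correlation coefficient, which this section does not state explicitly; the function names and the sample values are hypothetical and are not data from this study.

```python
import math


def rmse(obs, pred):
    """Root mean square error between observed and predicted SD."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error between observed and predicted SD."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def cc(obs, pred):
    """Correlation coefficient (assumed here to be Pearson's r)."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov / math.sqrt(var_o * var_p)


# Hypothetical SD values in metres, for illustration only.
observed = [0.40, 0.55, 0.62, 0.48, 0.70]
predicted = [0.42, 0.50, 0.65, 0.47, 0.66]

scores = {
    "RMSE": rmse(observed, predicted),
    "MAE": mae(observed, predicted),
    "CC": cc(observed, predicted),
}
```

RMSE penalizes large individual errors more heavily than MAE, so a model whose two values are close has errors of fairly uniform size.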
Table 3. Statistical characteristics of different SD (m).
|Weekly average SD
|Weekly maximum SD
|Weekly minimum SD
Consequently, the inclusion of the two parameters DO and TE may not improve the performance of the model. Interestingly, besides TSS and Chl, COD assumed major importance when included simultaneously as an input to the model. Although DO and TE are water quality parameters that affect the SD of the water body, they did not contribute significantly to the prediction of SD in the urban river network once COD was included. As the most important environmental factors in water bodies, DO and TE mainly affect the degradation rate of pollutants in urban river networks. Because the river network of plain cities in the Yangtze River Delta has undergone years of diversion and flow control, its water quality has gradually improved and entered a steady state. Therefore, transparency is less sensitive to DO and TE, which affect the chemical process. In contrast, COD is extremely difficult to degrade in urban river network water bodies and is closely related to TSS, which causes the SD to be more sensitive to COD in the plain urban river network. The final model selected to predict the SD of the urban river network in this study contained velocity, TSS, Chl, and COD (M4). Including DO and TE may not improve the model performance and sometimes even increases the values of the error indices. Additionally, fewer and more suitable inputs simplify the implementation and the calculation procedure, which improves the practicality of the model.
Finally, for a given regression model, the LR model exhibited a slightly lower RMSE for the exponential correlation than for the power correlation, whereas the SVR model showed the opposite. The results indicate that, within a certain flow velocity threshold, transparency is positively correlated with flow velocity, which reveals that the hydrodynamic factors of the plain river network have a significant impact on water transparency and can be used as effective parameters in the prediction model. Within the flow velocity range of 0.22–0.45 m/s, an increased flow velocity has a positive effect on SD. On the one hand, the improvement in hydrodynamics brought by water resources regulation does have a positive impact on the water environment of urban river networks. On the other hand, improving the water environment through hydrodynamic regulation is subject to a flow velocity threshold, which means that hydrodynamic control is not a once-and-for-all method.
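A comparison between an exponential and a power correlation can be sketched as follows. This is an illustration only: it assumes both forms are fitted by ordinary least squares after a logarithmic transform, which may differ from the fitting procedure used in this study, and the velocity and SD values are synthetic (generated from a hypothetical exponential law), not observations.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_exponential(v, sd):
    """Fit SD = a * exp(b * V) via ln(SD) = ln(a) + b * V."""
    b, ln_a = fit_line(v, [math.log(s) for s in sd])
    return math.exp(ln_a), b


def fit_power(v, sd):
    """Fit SD = a * V**b via ln(SD) = ln(a) + b * ln(V)."""
    b, ln_a = fit_line([math.log(x) for x in v], [math.log(s) for s in sd])
    return math.exp(ln_a), b


def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


# Synthetic data within the 0.22-0.45 m/s range discussed in the text.
velocity = [0.22, 0.28, 0.34, 0.40, 0.45]
secchi = [0.30 * math.exp(1.5 * x) for x in velocity]  # hypothetical law

a_exp, b_exp = fit_exponential(velocity, secchi)
a_pow, b_pow = fit_power(velocity, secchi)

rmse_exp = rmse(secchi, [a_exp * math.exp(b_exp * x) for x in velocity])
rmse_pow = rmse(secchi, [a_pow * x ** b_pow for x in velocity])
```

Because the fit is performed in logarithmic space, it minimizes relative rather than absolute errors in SD; the RMSE values are then computed in the original units so that the two forms can be compared on the same scale.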
Additionally, when analysing long-term SD changes and the ANN model results, it is essential to consider the influence of flow velocity changes caused by water regulation. The comparison of the correlation coefficients shown in Table 3 reveals that flow velocity carries a larger weight in the water transparency of urban river networks than dissolved oxygen and temperature: the absolute value of its correlation coefficient ranks above theirs.
In this study, an artificial neural network (ANN) model is proposed for estimating Secchi depth in a plain urban river network using long-term observed data. The comparison between the ANN model and the MLR model reveals that hydrodynamic parameters can be used as effective parameters in SD prediction models for the urban river network. Additionally, COD concentration has a crucial impact on transparency in the river network, as shown by the notable improvement when COD is included as a model input. The most accurate and practical SD model takes flow velocity, TSS, COD, and Chl as inputs, with sensitivity ranked from high to low as TSS, Chl, COD, and flow velocity.
In addition, the ANN models perform better than the MLR models, which demonstrates the existence of a complex nonlinear relationship between SD and the various parameters. A support vector machine was used to derive the relationship between SD and the hydrodynamic parameter, and a strong positive correlation was found when velocity ranged from 0.22 m/s to 0.45 m/s. Over 90% of the data fall within the predicted intervals of SD for this method, reflecting the flow velocity threshold of hydrodynamic regulation for improving water transparency in the urban river network.