Machine learning (ML) modeling is widely regarded as one of the most promising and powerful tools for the development of diagnostic methods and technologies. It permits the fast sifting and analysis of huge amounts of data from highly complex biological matrices, which, applied to diagnostics, can be translated into valuable support technologies that ease rapid decision-making in early diagnosis and screening programs. A great number of colorimetric and electrochemical sensing methods exist for the detection of biomarkers related to diabetes mellitus (DM) and diabetic retinopathy (DR), including some recent efforts towards sensor-array technologies, with or without ML models, for the sensing of diverse biomarkers and for diagnostic purposes (including DM).
Diabetes mellitus (DM) is a group of metabolic diseases involving either severe insulin deficiency, with a usually acute onset of hyperglycemia due to autoimmune destruction of pancreatic beta cells, or a gradual onset of hyperglycemia due to insulin resistance. Diabetic retinopathy (DR) is a complication of DM and the main cause of blindness in working-age adults; it can be delayed, palliated, or even avoided if detected early. Unfortunately, most patients who develop this condition are asymptomatic until late stages, when treatment is less effective or DR is irreversible. Screening procedures for DR are classically based on imaging of the fundus of the eye. In recent years, the analysis of such images has been optimized by machine learning (ML) methods capable of detecting even microaneurysms, the earliest visible sign of retinal damage. The relatively recent use of user-friendly and reasonably affordable smartphones to implement computer-assisted approaches has allowed such imaging-based screening methods to be extended to developing countries, where lack of funds and personnel has often been the limiting factor for healthcare systems. Many recent reviews have already extensively explored the possibilities of computer-assisted methods for screening DM and DR, which still have few actual applications in the form of commercially available products. Less (but commendable) attention has been paid to alternative and early DR detection techniques, such as electrochemical or colorimetric ones. These methods usually rely upon the detection of a biomarker (or of multiple biomarkers simultaneously) that might indicate the presence of the target disease, potentially earlier than image-based methods, which usually rely upon the observation of already-damaged tissues.
The higher reliability of these multi-targeted sensing approaches can be improved further by technologies with multiple and diversely integrated sensors, whose large data output can be efficiently filtered and processed by ML. In this entry, the researchers present a series of colorimetric (with special emphasis on naked-eye approaches) and electrochemical techniques for the non-invasive detection and/or quantification of DM biomarkers, including sensor arrays with or without ML models. The researchers want to encourage the reader towards the development of ML-powered sensor arrays for the early diagnosis of DM and DR.
It has been observed that a subset of patients with type 2 DM (T2D) have signs of DR at the time of diagnosis; moreover, glucose intolerance or pre-diabetes is also associated with diabetic eye disease. Therefore, improving the screening for DM and its accessibility is directly related to the screening and identification of DR. Although DR seems to be more prevalent and to develop faster in type 1 DM (T1D) patients, it is more difficult to prevent DR in T2D patients, as its progression strongly depends on the duration of DM, and T2D often remains undiagnosed for a longer time (even several years) compared to T1D. Moreover, T2D is much more common than T1D (approximately 90% of the total DM cases). Consequently, special emphasis has to be placed on the screening of T2D, given the higher impact of the early detection of this type of DM and, therefore, of the prevention and mitigation of DR.
Even though blood testing is one of the most used methods in the diagnosis of DM, great efforts have been made during the last century in the study of non-invasive testing alternatives. Blood tests present a risk of infection and require skilled personnel during both sampling and analysis. Moreover, multiple extractions are painful and generate fear and anxiety in some patients, which can lead to avoidance behaviors, with the subsequent public health and social consequences. Thus, the use of other physiological fluids, which do not require invasive techniques per se, is indeed desirable. Hence, the fluids considered in this research are urine, saliva, tears, sweat, and breath, which contain biomarkers relevant for the diagnosis of DM and DR and have already been studied in terms of sensing; these have turned out to be promising alternatives for the development of less painful and more economical methods compared to blood testing.
Urine is one of the best candidates as a diagnostic fluid for sensor arrays. Apart from permitting easy, abundant, and non-invasive sampling, it is an aqueous solution (95% water) of inorganic salts containing, among others, urea. If obtained from a healthy individual, urine contains low concentrations of lipids, proteins, and other high-molecular-weight compounds, which eases the detection of abnormally high quantities of these large molecules. The detection of glucose in urine is used for the diagnosis of DM, with the test being considered positive at glucose concentrations above ~100 mg/dL (5.6 mM), especially if used together with parallel methods or as part of a multisensing system. Most colorimetric glucose sensors rely upon enzyme-based reactions due to their high specificity and catalytic efficiency. These enzymatic approaches usually involve glucose oxidase (GOx), due to its specificity towards glucose and its tolerance of extreme pH, temperature, and ionic strength changes in comparison to other enzymes. Even if these tests might not be the most sensitive, they are a good alternative to invasive and more expensive or time-consuming methods for fast and cost-effective screening.
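The equivalence between ~100 mg/dL and 5.6 mM quoted above can be checked with a short unit conversion. The following is a minimal sketch; the molar mass of glucose (about 180.16 g/mol) is a standard value assumed here rather than one given in the text, and the function names are illustrative only.

```python
# Molar mass of D-glucose (C6H12O6) in g/mol. This is a standard reference
# value assumed for the sketch; it is not stated in the text.
GLUCOSE_MOLAR_MASS_G_PER_MOL = 180.156


def mg_per_dl_to_mmol_per_l(mg_per_dl, molar_mass=GLUCOSE_MOLAR_MASS_G_PER_MOL):
    """Convert a mass concentration (mg/dL) to a molar concentration (mmol/L, i.e. mM)."""
    mg_per_l = mg_per_dl * 10.0   # 1 L = 10 dL
    return mg_per_l / molar_mass  # (mg/L) / (g/mol) = mmol/L


def mmol_per_l_to_mg_per_dl(mmol_per_l, molar_mass=GLUCOSE_MOLAR_MASS_G_PER_MOL):
    """Inverse conversion: mmol/L (mM) back to mg/dL."""
    return mmol_per_l * molar_mass / 10.0


# The urinary glucose threshold quoted in the text.
threshold_mm = mg_per_dl_to_mmol_per_l(100.0)
print(round(threshold_mm, 1))  # 5.6
```

The same helper applies to any analyte once its molar mass is supplied through the `molar_mass` argument.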
Saliva is an aqueous solution (~99.5% water) containing mainly electrolytes, sugars, vitamins, proteins, and polypeptides. It is a good candidate as a diagnostic fluid, since its sampling is easy (no need for trained personnel) and non-invasive, it can be collected in substantial volumes (0.1–7 mL/min), and it is a less complex matrix than other body fluids (e.g., blood). However, the simplicity of saliva stems from its concentration, not from its composition; it has been shown to contain more than 1000 different proteins, as well as biomarkers that have great potential for rapid test purposes. Some pathological conditions can modify the composition of saliva. In fact, several studies point to its potential for screening purposes and its successful use for testing several diseases (e.g., renal disease monitoring, human immunodeficiency virus, dental studies, or Cushing’s disease). Indeed, a correlation between salivary and blood glucose levels in patients with and without DM has been shown, and saliva has already been declared an excellent candidate for the monitoring of T2D.
The analysis of breath for disease diagnosis is a practice used since the time of Hippocrates, when it was found to be a useful method for monitoring human health. Human breath is mainly composed of N2 (78%), O2 (16%), CO2 (4–5%), H2 (5%), inert gases (0.9%), and water vapor, but it also contains traces of inorganic (e.g., N2O, NO, or CO) and organic (e.g., acetone, ethanol, isoprene, or ethane) volatile species. Acetone was first considered a good breath biomarker for DM in 1857, and it is indeed a good candidate, as its concentration in human breath appears to rise as the severity of DM increases, and there is a linear correlation between its concentration in blood and in breath. As shown in Table 1, patients without DM usually present breath acetone concentrations lower than 2 ppm, while those with DM have values that can rise to tens of ppm. Along with acetone, isoprene (105 ppb in the breath of a person without DM) and aldehydes have also proved to be good biomarkers for DM. There are specific sensing approaches for acetone, aldehydes, and isoprene, but they are either not colorimetric or use gas samples other than breath as a proof of concept.
|Ocular fluids||1.8–9.0 mg/dL
|Breath||0.1–2 ppm||0.1–103.7 ppm||7.4–8.1||-|
The volume of tear sample obtained by means of, for instance, a Schirmer strip can be enough for a single-sensor method, but slightly higher volumes would usually be needed for multi-array sensing in order to obtain sufficiently high sensitivity. Moreover, tear collection is only relatively non-invasive, as sampling involves somewhat cumbersome methods that might not be especially comfortable for the patient, although they are easily approachable. Beyond this, the potential of this rich fluid should be mentioned: it contains numerous analytes of great relevance for the assessment of the patient's health status, including numerous proteins that permit proteomic approaches in the diagnosis of DR.
Sweat might not be the best option for multi-array screening approaches; this body fluid seems more suitable for monitoring purposes, especially in the form of wearable electrochemical devices, which are usually interfaced with algorithm-based software. This approach can overcome the lack of large sample volumes through direct contact of the sensor with the skin. For more insight into this complex fluid and its applications in diagnostics, the researchers refer the reader to the reviews by Senf et al., Bandodkar et al., and Kim et al. Additionally, an interesting microfluidic, colorimetric approach is presented by Choi et al.
All physiological fluids are complex matrices with many potential interfering factors, which are a concomitant part of the heterogeneous nature of DM (depending on, for instance, the patient). Consequently, relying on a single biomarker in a fluid can easily lead to technical issues, with the results obtained being false positives or false negatives. In order to overcome this problem, the researchers propose the design of sensor arrays based on diverse principles (e.g., chemical, enzymatic, pH, or immunoassay-based), as well as the inclusion of diverse biomarkers. This approach can provide a much more meaningful set of qualitative and quantitative information, which would significantly improve the accuracy of the screening and lead to a more robust and reliable interpretation of the results. The multi-marker approach is not new in the diagnostics area, but simpler, faster, and cheaper methods are needed for the effective screening of DM and the consequent minimization of the related retinal damage in these patients. The designed sensor array should present the so-called ASSURED characteristics (affordable, sensitive, specific, user-friendly, rapid and robust, equipment-free, and deliverable to end-users), as described by the World Health Organization, and should permit its application as a point-of-care (POC) screening device.
Sensor arrays can generate a great amount of data that, when correctly interpreted, provide filtered and relevant information. Such a great amount of data cannot always be understood intuitively by applying simple mathematical models (e.g., linear or polynomial regressions). However, a hidden pattern usually explains the observed data, even if the exact analytes and/or mechanisms that generate the output are unknown to the researchers. ML approaches make it possible to bypass the need for a deep understanding of the hidden rules underlying the studied system while still obtaining relevant information for the prediction of trends, groups, and characteristics. ML models have proved to be robust for the diagnosis of several diseases, including ophthalmology-related ones. The literature offers numerous examples of ML-sensor-array technologies for diagnostics, from techniques to detect lung cancer to multi-sensors capable of diagnosing respiratory diseases and breast cancer from breath, which shows the great potential of these approaches for pre-clinical diagnosis, for screening purposes, and for assisting practitioners in making fast decisions. The miniaturization of such devices and the optimization of the related production process could lead in the near future to the fabrication of POC sensor-array systems, with coupled software trained to diagnose a disease or condition (or even more than one) by simple non-invasive analysis of the breath, urine, saliva, tears, and/or sweat of the patient.
Electronic noses (gas electrochemical sensor arrays) are a promising technological platform for the diagnosis of T2D. Esfahani et al. present electronic nose-based technologies that show a differentiated electrochemical response to urinary volatile organic compounds. After analyzing 140 urine samples from healthy individuals and T2D patients and classifying the data using diverse ML approaches, they demonstrated that the developed tools discriminated between the two groups with an area under the curve of 85–96%. A more recent publication reaches 96–100% accuracy for T2D diagnosis exploiting the same electronic nose principle with urine samples, using both principal component analysis and an ML algorithm. Malik et al., driven by the strong potential of electrochemical variations in saliva for the detection of blood glucose, developed an electrochemical ML sensing technology that could use this physiological fluid for the assessment of glucose in blood. They reached an accuracy of 85%, which is comparable to that of commercially available glucose sensing devices. They also point out the possibility of miniaturizing these ion-selective sensor arrays for POC usage. Sarno et al. propose an Arduino-assisted electronic nose capable of distinguishing among healthy (150 mg/dL) blood glucose concentrations by the analysis of the breath of the patient. The electrochemical data obtained from the sensor array were processed by a deep-learning classification method after optimization with a discrete wavelet transform, and the output shows an accuracy of 96.29%. A similar approach is also presented by Parte et al., with a conductivity-based metal oxide gas sensor array and a modified deep-learning convolutional neural network algorithm integrated with support vector machines in order to detect acetone in breath and correlate it to DM.
The latter are just two recent representative examples of the great potential of electronic noses (and gas sensor arrays in general) coupled to ML approaches for the non-invasive diagnosis of diabetes.
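The studies above report their performance either as an area under the (ROC) curve or as an accuracy, and the two figures are not interchangeable. The following minimal sketch shows how each can be computed from classifier scores. The scores are made-up numbers chosen for illustration, not data from the cited studies, and the function names and the 0.5 decision threshold are assumptions.

```python
def roc_auc(scores_positive, scores_negative):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive sample scores higher than a randomly chosen negative one
    (ties count as one half)."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


def accuracy(scores_positive, scores_negative, threshold):
    """Fraction of samples classified correctly when scores at or above the
    threshold are called positive."""
    correct = sum(s >= threshold for s in scores_positive)
    correct += sum(s < threshold for s in scores_negative)
    return correct / (len(scores_positive) + len(scores_negative))


# Hypothetical classifier outputs (illustrative values only).
t2d_scores = [0.9, 0.8, 0.7, 0.4]      # samples from the diseased group
healthy_scores = [0.6, 0.3, 0.2, 0.1]  # samples from the healthy group

print(roc_auc(t2d_scores, healthy_scores))        # 0.9375
print(accuracy(t2d_scores, healthy_scores, 0.5))  # 0.75
```

In this toy case the area under the curve (0.9375) is noticeably higher than the accuracy at the chosen threshold (0.75), because the former summarizes the ranking over all thresholds while the latter depends on a single cut-off.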
The diagnosis of a disease is a complex task involving a great number of factors, variability, and uncertainty, and DM and DR are no exception. It is implausible to reach a satisfactory verdict within an acceptable confidence range by relying upon a single factor. That is one of the reasons why there is a growing interest in big-data studies (and their related computational methods). ML techniques, such as principal component analysis (PCA) and partial least squares (PLS), are already widely used to model and predict inherent correlations in complex biological data, as is the case in metabolomics and proteomics studies.
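To give a feel for what PCA does with multi-sensor data, the following minimal sketch extracts the first principal component of a small data set by power iteration and projects each sample onto it. The three-sensor responses are hypothetical values invented for illustration, and power iteration is just one simple way to obtain the leading component; library implementations typically rely on a singular value decomposition instead.

```python
import math


def center(rows):
    """Subtract the per-sensor mean from every sample."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (sensors x sensors) of centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_principal_component(cov, iterations=200):
    """Leading eigenvector of the covariance matrix via power iteration.
    Assumes the covariance matrix is not all zeros."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(centered, component):
    """Score of each sample along the given component."""
    return [sum(x * c for x, c in zip(r, component)) for r in centered]


# Hypothetical responses of a three-sensor array: the first three samples
# belong to one group and the last three to another (made-up values).
samples = [
    [1.0, 2.0, 0.5], [1.1, 2.1, 0.4], [0.9, 1.9, 0.6],
    [3.0, 4.1, 0.5], [3.2, 3.9, 0.6], [2.9, 4.0, 0.4],
]

centered = center(samples)
pc1 = first_principal_component(covariance(centered))
scores = project(centered, pc1)
print([round(s, 2) for s in scores])
```

With these values, the two groups fall on opposite sides of zero along the first component, so a single score per sample already captures the difference that was spread over several sensors.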
Figure 1. Simplified comparison of the sense of taste and a sensor array + ML technology.
In some cases, by using such big-data approaches, the researchers are, in a way, mimicking what nature already does in order to interpret complex information. This is the case, for instance, for taste or smell (see Figure 1), where a considerable number of sensors presenting diverse responses to different odor/taste molecules produce a great amount of both relevant and irrelevant data, creating a specific profile or ‘fingerprint’ that is later filtered and interpreted by the brain. This is how (generically speaking) some big-data/ML approaches work, including electronic noses/tongues. It is worth highlighting that this fingerprint-based approach, commonly known as ‘non-targeted’, yields useful output without the need to know exactly which specific analytes or processes are present in the sample (which tends to be markedly complex).
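The fingerprint idea described above can be sketched in a few lines: each sample is represented by the vector of its sensor responses, and an unknown sample is assigned to the class whose reference fingerprint it most resembles, without identifying any analyte. This is a minimal illustration under stated assumptions; the four-sensor readings, the class labels, and the choice of cosine similarity as the matching rule are all hypothetical and are not taken from the cited works.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two fingerprints based on their shape, not their magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_fingerprint(fingerprints):
    """Average response of each sensor over a set of training samples."""
    n = len(fingerprints)
    return [sum(column) / n for column in zip(*fingerprints)]


def classify(fingerprint, references):
    """Return the label of the reference fingerprint most similar to the sample."""
    return max(references,
               key=lambda label: cosine_similarity(fingerprint, references[label]))


# Hypothetical training responses of a four-sensor array (made-up values).
training = {
    "profile_A": [[0.9, 0.1, 0.4, 0.2], [1.0, 0.2, 0.5, 0.1]],
    "profile_B": [[0.2, 0.8, 0.1, 0.6], [0.1, 0.9, 0.2, 0.7]],
}
references = {label: mean_fingerprint(fps) for label, fps in training.items()}

# An unknown sample with a stronger overall signal but the same response pattern.
unknown = [1.8, 0.3, 0.9, 0.3]
print(classify(unknown, references))  # profile_A
```

Cosine similarity was chosen here because it compares the pattern across sensors rather than the absolute signal, which loosely mirrors the way a profile is recognized regardless of overall intensity; other distance measures or trained classifiers could be used in its place.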