4. Data Integration
GEO AquaWatch, long recognized as the Global Water Quality Initiative with an extensive global network, coordinates an integrated approach to community stakeholder involvement, and its Knowledge Hub is recognized as a trusted global center for water quality data and information. Ongoing efforts by GEO AquaWatch working groups and short-term, cross-cutting focus groups have yielded metadata compilations, best practices for data providers, recommendations for aquatic analysis-ready data, and expanded community-wide information sharing. Transformational water quality knowledge is delivered to GEO AquaWatch stakeholders via our website and social media accounts, a successful and sustained webinar series, topical workshops, and Initiative meetings. Printed and electronic work products, such as whitepapers, publications, and data products, services, and tools compiled by our diverse subject matter experts, are easily accessible, community-wide resources serving the full range of water quality users. Leveraging GEO AquaWatch’s association with other national and international water quality organizations amplifies best-practice messaging and enables broader user input.
Stakeholder engagement in the delivery of fit-for-purpose, open-source products that meet user needs is foundational to GEO AquaWatch’s portfolio of projects. A highlight is its namesake Google Earth Engine project for inland and coastal waters, which compiles a reference database of biogeochemical, inherent, and apparent optical properties that is essential for calibration and validation of satellite products. AquaWatch features a self-guided user interface with a nimble API that delivers chunk-sized data streams to the Google Earth Engine servers.
To effectively achieve integrated inland and coastal water quality management, users ideally inform their decision-making by combining an evaluation of acute conditions and general trends in satellite-derived and in situ data with the use of iterative models. We recognize, however, that many users lack one or more of these “three pillars” of data assimilation and must otherwise rely upon the best available information. The complexity of environmental systems, and the need to provide meaningful information to stakeholders, requires the integration of data collected from different and heterogeneous sources. GEO AquaWatch’s Real Earth Portal, expected to debut in 2022, synthesizes multi-platform water quality data from disparate sources and over broad spatio-temporal scales, and is accessible through the GEO AquaWatch website. Many hydrodynamic and ecological models have evolved into relevant predictive tools, and it is essential to calibrate and validate them accurately against available observations so that they faithfully represent reality. Data fusion (DF) techniques have been used to extract consistent spatial and temporal descriptions of variables within hydrodynamic-biological models, cohesively and systematically, and so produce more robust and informative datasets. DF algorithms include the wavelength-based method, the Jointly Sparse Fusion of Images (J-SparseFI) designed for pan-sharpening fusion, and other algorithms. The effectiveness of DF has been demonstrated in improving the estimated spatio-temporal distributions of total organic carbon (TOC), chlorophyll-a concentrations, and water transparency, and in the prediction of harmful algal blooms. NASA-promoted STAR-FM software (https://www.ars.usda.gov/research/software/download/?softwareid=432, accessed in July 2021), designed for the fusion of Landsat and MODIS images, is available for direct implementation of fusion techniques at the pixel level. Spatial and spectral information of the pixel is correlated and combined to better characterize the single pixel.
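As a rough illustration of pixel-level fusion, the sketch below predicts a fine-resolution image at a later date by adding the change observed between two coarse-resolution images to an earlier fine-resolution image. This captures only the core idea behind spatio-temporal fusion and is not the STAR-FM algorithm itself, which additionally weights neighboring pixels by spectral, temporal, and spatial similarity. The function names, the nearest-neighbor upsampling, and all reflectance values are assumptions made for illustration.

```python
from typing import List

Grid = List[List[float]]


def upsample_nearest(coarse: Grid, factor: int) -> Grid:
    """Replicate each coarse pixel into a factor x factor block of fine pixels."""
    fine: Grid = []
    for row in coarse:
        expanded = [value for value in row for _ in range(factor)]
        fine.extend([list(expanded) for _ in range(factor)])
    return fine


def fuse_temporal_change(fine_t1: Grid, coarse_t1: Grid, coarse_t2: Grid,
                         factor: int) -> Grid:
    """Predict a fine image at t2 as fine(t1) plus the coarse-scale change t1 -> t2."""
    up_t1 = upsample_nearest(coarse_t1, factor)
    up_t2 = upsample_nearest(coarse_t2, factor)
    return [
        [f + (c2 - c1) for f, c1, c2 in zip(f_row, c1_row, c2_row)]
        for f_row, c1_row, c2_row in zip(fine_t1, up_t1, up_t2)
    ]


# Hypothetical reflectance-like values: one 4x4 fine image (e.g. Landsat-like)
# and two 2x2 coarse images (e.g. MODIS-like) at dates t1 and t2.
fine_t1 = [
    [0.10, 0.12, 0.30, 0.32],
    [0.11, 0.13, 0.31, 0.33],
    [0.20, 0.22, 0.40, 0.42],
    [0.21, 0.23, 0.41, 0.43],
]
coarse_t1 = [[0.115, 0.315], [0.215, 0.415]]
coarse_t2 = [[0.135, 0.315], [0.215, 0.465]]

fine_t2 = fuse_temporal_change(fine_t1, coarse_t1, coarse_t2, factor=2)
for row in fine_t2:
    print([round(value, 3) for value in row])
```

In this toy case only the top-left and bottom-right coarse pixels change between dates, so only the corresponding 2x2 blocks of the fine image are adjusted; the fine-scale spatial pattern within each block is carried over from the earlier date.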
To refine model parameters and better align predictions with measurements, data assimilation (DA) techniques have been used. DA enables automated calibration of models, enhances the forecast capabilities of biogeochemical and hydrological models, and improves ecosystem state assessments. Among the most common DA methods, the Ensemble Steady State Kalman Filter (EnSSKF) and the Ensemble Kalman Filter (EnKF) have been widely and successfully used to improve the prediction of lake temperature, suspended particulate matter concentrations, water levels and currents, and algae and algal bloom dynamics through the integration of remotely sensed and in situ data. Data integration techniques vary and are site-specific, depending on the phenomenon and scale at hand; a detailed treatment is beyond the scope of this paper.
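To make the EnKF idea concrete, the following minimal sketch performs a single analysis step for a scalar state that is observed directly, using the stochastic (perturbed-observation) form of the filter. It is a teaching example under simplifying assumptions, not the configuration used in any of the studies mentioned above: the lake temperature values, the observation error variance, the ensemble size, and the function name `enkf_update` are all hypothetical.

```python
import random
from statistics import mean, variance
from typing import List


def enkf_update(ensemble: List[float], observation: float, obs_variance: float,
                rng: random.Random) -> List[float]:
    """Stochastic EnKF analysis step for a scalar, directly observed state."""
    forecast_variance = variance(ensemble)  # sample variance of the forecast
    gain = forecast_variance / (forecast_variance + obs_variance)  # Kalman gain
    obs_sd = obs_variance ** 0.5
    # Each member assimilates its own randomly perturbed copy of the observation.
    return [
        member + gain * (observation + rng.gauss(0.0, obs_sd) - member)
        for member in ensemble
    ]


rng = random.Random(42)
# Hypothetical model forecast ensemble of lake surface temperature (deg C).
forecast = [rng.gauss(18.0, 1.5) for _ in range(200)]
# Hypothetical satellite-derived observation (deg C) and its error variance.
analysis = enkf_update(forecast, observation=20.0, obs_variance=0.25, rng=rng)

print(f"forecast mean {mean(forecast):.2f}, analysis mean {mean(analysis):.2f}")
print(f"forecast var  {variance(forecast):.2f}, analysis var  {variance(analysis):.2f}")
```

Because the assumed observation error is much smaller than the forecast spread, the gain is close to one: the analysis ensemble is pulled toward the observation and its variance shrinks. In a real application the state is a high-dimensional model field and the gain is built from ensemble covariances between state variables and observed quantities.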
To promote and facilitate the use of DA methodologies, several working groups have developed and released ready-made algorithms and toolkits. These include the Parallel Data Assimilation Framework (http://pdaf.awi.de/trac/wiki, accessed in July 2021), the Data Assimilation Research Testbed (DART, https://www.image.ucar.edu/DAReS/DART/, accessed in July 2021), and OpenDA (https://www.openda.org/, accessed in July 2021), which was used successfully for the automatic calibration of a water level forecasting model and of current and salinity profiles, and for the improvement of the forecast accuracy of a water quality model. With such a reservoir of ready-to-apply methodologies, the effort required to use these techniques and apply them to current modeling is reduced. Nevertheless, accurate calibration of system models requires a high-resolution description of the model inputs, which is not readily available for many environmental systems because of the limitations of current measurement technologies. This remains a major challenge for DA applications in water quality management, one that can be overcome by using DF techniques to integrate, fuse, and harmonize in situ, remote sensing, volunteer monitoring, and modeled data.
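The notion of automatic calibration referred to above can be illustrated with a toy example: a single model parameter is adjusted until the simulated values best match the observations. The sketch below does not use the OpenDA, PDAF, or DART interfaces; the first-order decay model, the golden-section search, the synthetic observations, and all names are assumptions chosen only to keep the example short and self-contained.

```python
import math
from typing import Callable, List, Sequence


def simulate(decay_rate: float, initial: float, times: Sequence[float]) -> List[float]:
    """Hypothetical first-order decay model: C(t) = C0 * exp(-k * t)."""
    return [initial * math.exp(-decay_rate * t) for t in times]


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square error between model output and observations."""
    return math.sqrt(
        sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)
    )


def golden_section_minimize(cost: Callable[[float], float], low: float, high: float,
                            tolerance: float = 1e-6) -> float:
    """Minimize a unimodal cost function on the interval [low, high]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = low, high
    while b - a > tolerance:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if cost(c) < cost(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2.0


# Synthetic "observations": model output for k = 0.3 plus small fixed errors.
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
errors = [0.05, -0.04, 0.03, -0.02, 0.02, -0.01]
observed = [c + e for c, e in zip(simulate(0.3, 10.0, times), errors)]

calibrated_rate = golden_section_minimize(
    lambda k: rmse(simulate(k, 10.0, times), observed), low=0.0, high=2.0
)
print(f"calibrated decay rate: {calibrated_rate:.3f}")
```

Operational toolkits replace each piece of this sketch with something more capable, such as a full hydrodynamic or water quality model, multi-parameter optimizers, and observation-error handling, but the loop of simulating, comparing with observations, and adjusting parameters is the same.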