Advances in Internet of Things (IoT) technologies and enabling platforms, which combine software and hardware (e.g., sensors and actuators), allow healthcare providers and users to measure and analyze physical environments at home or in hospital. The measured environmental parameters contribute to improving healthcare in real time. Researchers in this domain need representative datasets to develop machine-learning techniques that learn physical variables from the surrounding environment. However, such environmental datasets are rare and take considerable effort to generate, and none are available for some countries, including Saudi Arabia.
Recent advancements in information and communication technology and the Internet of Things (IoT) have increased the number of sensors used in smart cities. Collecting and analyzing enormous amounts of data from connected devices in homes and cities has emerged as a significant study area. At the same time, algorithms for machine learning have been deployed to learn and analyze data collected from various sensors to classify and predict trends in smart homes and cities. Several factors have contributed to the popularity of utilizing algorithms for machine learning in today’s society. Three primary aspects inspire the use of machine learning 
: (a) computers have high performance and memory; (b) machine learning algorithms learn and train on behavioral patterns in a way that resembles the human brain; and (c) huge datasets are available.
Therefore, collecting environmental datasets is essential for monitoring environmental conditions and understanding the state of the environment in any country, including Saudi Arabia. Deserts, mountains, coastal regions, and marine environments are only a few of the ecosystems found in Saudi Arabia, so collecting environmental data there matters for several reasons. First, it can help identify environmental problems such as pollution, habitat degradation, conditions affecting healthcare, and the implications of climate change. Scientists and officials can better understand the condition of the environment and make informed decisions about environmental challenges if they acquire data on air and water quality, changes in the environment, and the distribution of animals. Second, collecting environmental data can help track progress toward environmental objectives and targets. For instance, Saudi Arabia has vowed to drastically decrease its greenhouse gas emissions by 2030 and to significantly raise the share of its total energy supply that comes from renewable sources. Collecting data on energy use, emissions, and renewable energy generation may help track progress toward these objectives and identify areas where extra action may be required. Finally, collecting environmental data may facilitate research and innovation in various sectors, such as healthcare, smart cities, ecosystems, climatology, and environmental engineering. Using environmental data, researchers may improve their knowledge of ecosystem dynamics, their ability to anticipate environmental changes, and their ability to recommend solutions to environmental issues.
Collecting environmental data is a crucial component of environmental management and holds significant importance in Saudi Arabia, which is confronted with various environmental problems. The nation has been facing noteworthy air pollution challenges due to industrialization and urbanization, which have led to increased health and environmental concerns 
. Furthermore, the nation depends mostly on wells for its water supply 
. Moreover, the climate varies from city to city. Summer temperatures may exceed 40 °C, while temperatures in the northern cities can fall below 0 °C. In response to these environmental challenges, Saudi Arabia has undertaken various environmental initiatives such as the Saudi Green Initiative and the Middle East Green Initiative. These initiatives are designed to mitigate greenhouse gas emissions, conserve natural resources, and enhance renewable energy production 
. The efficacy of these endeavors is contingent upon the accurate and reliable collection of environmental data. Environmental data collection in Saudi Arabia is presently conducted by various governmental and non-governmental entities, such as the Ministry of Environment, Water, and Agriculture, the General Authority for Meteorology and Environmental Protection, and the Saudi Arabian Society for Environmental Sciences 
. These entities gather ecological information on various parameters, including but not limited to environmental and water quality, soil and land use, and biodiversity.
Saudi Arabia has to deal with the problem of urbanization, as over 85% of its population resides in urban areas, as per the studies conducted by 
. This is in addition to the environmental challenges that the country is dealing with. The nation is addressing the challenges of urbanization by creating intelligent cities, such as Neom and Riyadh, that strive to leverage technology and data to enhance the prosperity of citizens and promote sustainable development 
. In addition, collecting environmental data is crucial in advancing smart city initiatives. Monitoring and mitigating urban heat island effects in cities can be facilitated by collecting air quality, temperature, and humidity data. According to 
, collecting energy usage and emissions data can optimize energy consumption and mitigate greenhouse gas emissions within smart urban areas. In addition, collecting environmental data is important for meeting healthcare needs in Saudi Arabia. Environmental factors such as air pollution, water contamination, and poor sanitation can substantially affect public health. Collecting environmental data can identify possible risk factors and provide valuable information for developing public health policies and interventions to mitigate these risks. An investigation was carried out to examine the effects of air pollution on public health in the city of Jeddah through the collection and analysis of relevant data 
. The authors of 
revealed that the prevalence of respiratory and cardiovascular diseases was linked to elevated levels of air pollutants in Jeddah. The above data was utilized to develop policies and interventions that focused on enhancing the air quality within the urban area. These measures included the mitigation of emissions from both transportation and industrial sources.
In light of the growing interest in smart homes and cities, researchers have attempted to incorporate different aspects of urban life into smart homes and cities through important technologies, including the IoT paradigm, wireless sensor networks, embedded systems, and other related technologies. Machine learning algorithms have various potential applications in smart home and city environments, including problems related to healthcare, power consumption, and automation. To apply machine learning algorithms effectively, it is necessary to have a dataset that is representative of the problem domain for training and testing. Such a dataset makes it possible to evaluate and validate the accuracy of a proposed method. Although many real-world datasets are publicly accessible, some lack adequate physical factors such as temperature, humidity, pressure, and altitude. Moreover, such datasets are built from sensor readings that can be noisy, are collected under poor network conditions, and are subject to several uncontrolled problems, such as missing data and faulty sensors. Consequently, they are prone to errors that affect the accuracy of the data.
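The data-quality problems described above (missing values and faulty readings) are commonly handled with a cleaning pass before any model is trained. The following is a minimal sketch, not the procedure used for this dataset: the function name `clean_series`, the plausibility bounds, and the forward-fill strategy are all assumptions chosen for illustration.

```python
from typing import Optional

# Hypothetical plausibility bounds for an indoor temperature sensor (deg C).
TEMP_MIN_C = -10.0
TEMP_MAX_C = 60.0


def clean_series(readings: list[Optional[float]],
                 lo: float = TEMP_MIN_C,
                 hi: float = TEMP_MAX_C) -> list[Optional[float]]:
    """Replace missing or out-of-range readings with the last valid value."""
    cleaned: list[Optional[float]] = []
    last_valid: Optional[float] = None
    for value in readings:
        # A reading is valid only if it is present and inside the bounds.
        if value is not None and lo <= value <= hi:
            last_valid = value
        # Invalid readings are forward-filled; leading gaps stay None
        # because there is no earlier value to carry forward.
        cleaned.append(last_valid)
    return cleaned


# Made-up readings: one dropped sample (None) and one sensor glitch (250.0).
raw = [None, 24.1, 24.3, None, 250.0, 24.6]
print(clean_series(raw))
```

Forward-filling is only one option; interpolation or dropping the affected rows may suit slowly varying or sparsely sampled signals better.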
As the previous discussion shows, both authorities and researchers have made efforts to collect environmental data in Saudi Arabia. However, the focus has been on the outdoor environment. Indoor environmental data collection is lacking; to our knowledge, no such dataset is currently available to researchers. The dataset presented here was generated using Arduino, an open-source electronics platform that combines hardware and software, programmed through the Arduino integrated development environment (IDE). The board carries several sensors that generate data, including light, temperature, and humidity sensors. The dataset was generated from real sensors that measured physical variables in a home for one month. It contains five features representing temperature, humidity, pressure, light, and altitude, recorded in real time.
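To make the five-feature structure concrete, the sketch below shows one way such records could be represented and loaded. It is illustrative only: the CSV layout, the column names, the class name `IndoorReading`, and the sample values are assumptions, not the published file format or real measurements.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class IndoorReading:
    """One record with the five features named in the text (units assumed)."""
    temperature: float
    humidity: float
    pressure: float
    light: float
    altitude: float


def parse_readings(csv_text: str) -> list[IndoorReading]:
    """Parse CSV text whose header matches the IndoorReading field names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [IndoorReading(**{name: float(value) for name, value in row.items()})
            for row in reader]


# Made-up sample rows for illustration; not taken from the dataset.
sample = (
    "temperature,humidity,pressure,light,altitude\n"
    "23.5,41.0,1009.2,310.0,33.8\n"
    "23.6,40.8,1009.1,305.0,34.6\n"
)
readings = parse_readings(sample)
print(len(readings), readings[0])
```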
2. Indoor Environmental Parameters in Northern Saudi Arabia
The collection and availability of environmental datasets are essential for making accurate choices regarding environmental management, sustainability, and public health. Research has indicated that open environmental datasets can substantially benefit research, policymakers, and decision-making processes 
. Notwithstanding, challenges exist in enhancing the accessibility of environmental data sets, such as concerns regarding data quality, privacy, and ownership. Effective management of big data for environmental sustainability requires adopting data integration, standardization, and quality control strategies, as Hřebíček and Hejč 
highlighted. Implementing environmentally sustainable practices heavily relies on data-driven decision-making, which involves systematically collecting, analyzing, and visualizing data to inform policies and interventions to foster environmental sustainability 
. Understanding the availability and accessibility of environmental data is crucial for proficient environmental governance. Refs. 
conducted a study on the availability of environmental data in the United States. The study revealed that, despite the abundance of environmental data sources, access remains limited, and certain data are inaccessible to the public.
Furthermore, collecting indoor environmental parameters is crucial for maintaining a healthy and comfortable indoor environment. These parameters include temperature, humidity, air quality, lighting, and noise levels. They have the potential to influence indoor air quality and thereby cause negative health effects, especially for vulnerable populations such as children and elderly people. Collecting and monitoring indoor environmental parameters can help identify potential sources of indoor air pollution and support effective interventions to improve indoor air quality. It is also crucial for optimizing energy efficiency and enhancing building performance. By monitoring and controlling indoor environmental factors, facility managers and residents can reduce energy consumption. For example, adjusting temperature and lighting according to occupancy patterns can yield considerable energy savings while maintaining indoor comfort. In addition, collecting indoor environmental parameters is essential for implementing intelligent buildings and integrating the Internet of Things (IoT). IoT-based sensors and monitoring devices can gather real-time data on indoor environmental parameters and communicate this information to a central control system. The control system can then analyze the data and regulate the building systems accordingly, aiming to enhance energy efficiency and improve indoor comfort. This approach has the potential to reduce energy expenses, optimize building functionality, and boost occupant satisfaction.
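The sense-then-regulate loop described above can be sketched as a simple rule-based controller. This is a hypothetical illustration rather than a system from the cited studies: the names `RoomState` and `decide_actions`, the setpoints, the lux threshold, and the deadband are all assumed values.

```python
from dataclasses import dataclass


@dataclass
class RoomState:
    """Latest sensor snapshot for one room (field names are assumptions)."""
    temperature_c: float
    light_lux: float
    occupied: bool


def decide_actions(state: RoomState,
                   comfort_temp_c: float = 24.0,
                   setback_temp_c: float = 28.0,
                   min_lux: float = 300.0,
                   deadband_c: float = 0.5) -> dict[str, bool]:
    """Choose cooling and lighting actions from occupancy and sensor data."""
    # Relax the cooling target when nobody is present to save energy.
    target_c = comfort_temp_c if state.occupied else setback_temp_c
    # The deadband avoids switching on for tiny excursions above the target.
    cooling_on = state.temperature_c > target_c + deadband_c
    # Lights are needed only in an occupied room that is too dark.
    lights_on = state.occupied and state.light_lux < min_lux
    return {"cooling_on": cooling_on, "lights_on": lights_on}


print(decide_actions(RoomState(temperature_c=26.0, light_lux=100.0, occupied=True)))
print(decide_actions(RoomState(temperature_c=26.0, light_lux=100.0, occupied=False)))
```

With the same temperature and light level, the occupied room triggers both cooling and lighting, while the empty room triggers neither. This is the occupancy-based saving the paragraph refers to.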
Many academic studies have highlighted the significance of collecting indoor environmental parameters. Figure 1 shows the number of publications related to indoor environments over a 10-year range. As the figure shows, the number of publications increased sharply in 2021, when smart cities became a hot topic. However, researchers believe that the number of publications has declined because of the lack of datasets. Table 1 lists some of the environmental research by country and by the parameters collected.
Figure 1. Number of publications in 10 years.
Table 1. Some indoor environmental-related research.
The collection of accurate indoor environmental parameters requires the utilization of suitable monitoring equipment and data analysis software. Sophisticated monitoring equipment, such as sensors and data loggers, can facilitate the collection of precise and up-to-date indoor environmental data. Using data analysis tools, such as statistical models and machine learning algorithms, can facilitate the identification of patterns and trends within the data. This, in turn, can enable the implementation of effective interventions aimed at improving indoor air quality.
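As a small illustration of identifying patterns and trends in logged sensor data, the sketch below smooths a series with a moving average and estimates its overall direction with a least-squares slope. The function names and the sample values are assumptions for illustration; real analyses would typically rely on a statistics or machine learning library.

```python
def moving_average(values: list[float], window: int) -> list[float]:
    """Smooth a series by averaging each run of `window` consecutive samples."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def linear_trend(values: list[float]) -> float:
    """Least-squares slope of the series against its sample index."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    covariance = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    variance = sum((i - x_mean) ** 2 for i in range(n))
    return covariance / variance


# Made-up hourly temperature samples (deg C) with a slow upward drift.
temps = [22.0, 22.4, 22.1, 22.9, 23.2, 23.0, 23.8]
print(moving_average(temps, window=3))
print(linear_trend(temps))  # positive slope means a warming trend
```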
The previous table illustrates that the studies were conducted in various cities and regions worldwide, emphasizing the importance of collecting indoor environmental parameters. These studies have shown that various factors, including outdoor air pollution, building materials, ventilation rates, and human activities, can influence indoor air quality. Through collecting and analyzing indoor environmental parameters, building managers and policymakers may develop successful strategies to enhance indoor air quality and preserve the well-being of building residents. An interesting dataset is generated by the authors of 
that consists of timestamps, temperature, and humidity, with 4,164,267 records spanning over two months in Columbia, USA. A total of 12 sensors were installed within the laboratory to measure the temperature and humidity levels accurately. The dataset was produced to support Internet of Things (IoT) researchers, and machine learning algorithms were used to create and evaluate models on it. Beyond the studies in the previous table, some research addresses Saudi Arabia’s environmental measures, such as 
. Both papers proved the importance of country-specific environmental parameters in different fields.
Table 2 shows different environmental datasets that are available to the public. However, none of them are related to Saudi Arabia’s environment.
Table 2. Available environmental datasets.
Furthermore, related research on time series forecasting has recently been presented in the literature. Common approaches for time series forecasting include statistical methods, machine learning, and deep neural networks. Statistical methods use mathematical and probabilistic analysis to model time series data based on past trends. Common statistical forecasting techniques include the autoregressive (AR) model, the autoregressive moving average (ARMA) model, and the autoregressive integrated moving average (ARIMA) model. Zeng et al. 
, Wang et al. 
, and Chen 
have investigated the integration of statistical models with backpropagation neural networks to enhance predictions for various applications, including wind power, cloud coverage, and power generation.
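For reference, the statistical models named above have the following standard textbook forms. The notation is an assumption introduced here, not taken from the cited works: $X_t$ is the series, $\varepsilon_t$ is white noise, $c$ is a constant, $\phi_i$ and $\theta_j$ are model coefficients, and $B$ is the backshift operator with $B X_t = X_{t-1}$.

```latex
% AR(p): the current value is a linear function of the p previous values.
X_t = c + \sum_{i=1}^{p} \phi_i X_{t-i} + \varepsilon_t

% ARMA(p, q): AR(p) plus a moving average of the q previous noise terms.
X_t = c + \sum_{i=1}^{p} \phi_i X_{t-i} + \varepsilon_t
        + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}

% ARIMA(p, d, q): an ARMA(p, q) model fitted to the d-times differenced series.
Y_t = (1 - B)^{d} X_t
```

Differencing ($d \ge 1$) removes trends, so ARIMA can be applied to non-stationary series that AR and ARMA cannot model directly.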
Machine learning techniques are better suited for nonlinear fitting problems because they adjust model parameters through internal iterations. Xiao 
proposed a rough set backpropagation model for short-term load prediction, which mitigates the effect of noise on prediction accuracy. Multilayer perceptron (MLP) 
, support vector machine (SVM) 
, and hidden Markov models 
are additional machine learning techniques used for time series forecasting. Deep neural networks (DNNs) 
have also facilitated the handling of intricate data. DNNs have exhibited exceptional performance in diverse applications, including fault detection, speech recognition, natural language processing (NLP), and disease diagnosis. Recurrent neural networks (RNNs) have strong nonlinear fitting abilities because the connections in their hidden layers capture the temporal aspects of the data. Traditional RNNs 
encounter vanishing gradients, which hinder their ability to capture long-term dependencies. Long short-term memory (LSTM) networks and gated recurrent units (GRUs) 
have been introduced to address this limitation. LSTMs address the vanishing gradient and long-term dependency issues of RNNs by using multiple gated structures.
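The gated structure can be written out explicitly. The equations below give the widely used standard LSTM formulation, shown as a reference sketch rather than the exact variant used in the cited works. The symbols are assumptions introduced here: $x_t$ is the input, $h_t$ the hidden state, $c_t$ the cell state, $W$, $U$, and $b$ are learned parameters, $\sigma$ is the logistic sigmoid, and $\odot$ is element-wise multiplication.

```latex
% Forget, input, and output gates (each entry lies in (0, 1)).
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)

% Candidate cell state.
\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)

% Cell state update: keep part of the old state, add part of the candidate.
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t

% Hidden state exposed to the next layer and the next time step.
h_t = o_t \odot \tanh(c_t)
```

Because $c_t$ is updated additively and scaled by $f_t$, gradients can flow across many time steps without shrinking as quickly as in a traditional RNN.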
The significant developments in statistical data-driven machine learning have revived notable interest in artificial intelligence (AI). The success of AI can be attributed to two factors: the availability of extensive datasets and increasing computational power. Deep learning (DL) algorithms achieved significant success in various industries and everyday applications starting around 2010. Examples include Siri, Alexa, and DeepL. This resurgence indicates the beginning of a second AI revolution. OpenAI’s ChatGPT 
is a recent example of advanced natural language technology that demonstrates the outstanding potential of contemporary AI. It shows these capabilities even though it lacks human senses.
The main objective of AI is to develop the theoretical basis for ML, which enables the creation of software that can learn independently from previous experiences without human involvement 
. To achieve practical intelligence, specific steps must be taken. The process involves utilizing historical information, collecting knowledge, making generalizations, addressing issues related to high-dimensional data, and uncovering explanatory factors within the data. The objective of machine learning is to create algorithms that can learn from data, acquire knowledge, and improve their learning abilities over time. The main challenge is recognizing relevant structural and temporal patterns, also known as “knowledge”, which are frequently hidden in high-dimensional spaces and are therefore difficult for humans to access. 
Nevertheless, there are known challenges in analyzing data within particular application domains. Data quality and relevant feature inclusion are essential. Previous research has shown that the optimal approach involves combining various low-level features with high-level contextual details 
. However, the limited ability of these algorithms to reproduce results, interpret findings, and explain outcomes to domain experts constrains the full potential of AI and ML.