Human Activity Recognition (HAR) consists of monitoring and analyzing the behavior of one or more persons in order to deduce their activity. In a smart home context, HAR consists of monitoring the daily activities of the residents through a network of IoT devices. Thanks to this monitoring, a smart home can offer personalized home assistance services that improve the quality of life, autonomy and health of its residents, especially elderly and dependent people.
HAR from sensors consists of using a network of sensors and connected devices to track a person’s activity. This produces data in the form of a time series of state changes or parameter values. The wide range of sensors (contact detectors, RFID, accelerometers, motion sensors, noise sensors, radar, etc.) can be placed directly on a person, on objects or in the environment. Sensor-based solutions are generally divided into three categories: wearable sensors, sensors on objects and ambient sensors.
Tracking people's activities at home can pose serious privacy issues. While camera installation can be part of various security services, residents are generally reluctant to leave cameras and monitoring systems turned on when they are home. Sensor-based systems have dominated applications of daily activity recognition in smart homes because they are generally less intrusive. Thanks to the development of the Internet of Things and the proliferation of cheap and powerful smart devices, smart homes based on ambient sensors have become a viable technical solution for offering various services. Beyond the hardware, such smart homes also need algorithms able to exploit this potential.
Many surveys address this research question, but they rarely tackle the problems specific to smart homes. Yet HAR in smart homes is a crucial and challenging problem because human activity is complex and varies from one resident to another. Every resident has a different lifestyle, habits and abilities. The wide range of daily activities, as well as the variability and flexibility in how they can be performed, requires an approach that is both scalable and adaptive. Algorithms for HAR in smart homes, and the challenges of HAR with ambient sensors, can be classified as pertaining to a problem of pattern classification, of temporal data analysis or of data variability. The proposed taxonomy is described below.
Algorithms for HAR in smart homes are first and foremost pattern recognition algorithms. Based on characteristics and criteria, a method identifies patterns in order to assign them a category. The methods found in the literature can be divided into two broad categories: Knowledge-Driven Approaches (KDA) and Data-Driven Approaches (DDA).
In KDA, an activity model is built by incorporating rich prior knowledge gleaned from the application domain, using knowledge engineering and knowledge management techniques. KDA are motivated by real-world observations that involve activities of daily living and lists of the objects required for performing such activities. In real-life situations, even if an activity is performed in different ways, the number and type of objects involved do not vary significantly. For example, the activity “to brush teeth” contains actions involving a toothbrush, toothpaste, a water tap, a cup and a towel. On the other hand, as humans have different lifestyles, habits and abilities, they can perform an activity in different ways.
KDA also use the observation that most activities, in particular routine activities of daily living and working, take place in specific circumstances of time and location. For example, “brushing teeth” is generally undertaken twice a day in a bathroom, in the morning and before going to bed, and involves at a minimum the use of toothpaste and a toothbrush. These implicit relations, which build up a local universe on the basis of singular actions, temporal and spatial data and the objects involved, provide a variety of hints and foster heuristics for inferring activities.
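The heuristic above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `ActivityModel` structure, the hour ranges, the object lists and the scoring weights are hypothetical, and real knowledge-driven systems typically rely on ontologies and reasoners rather than hand-written rules of this kind.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActivityModel:
    """Hypothetical hand-written activity description (not an ontology)."""
    name: str
    location: str
    required_objects: frozenset   # objects that must be observed
    optional_objects: frozenset   # objects that may be observed
    hour_ranges: tuple            # (start_hour, end_hour) pairs, end exclusive


# Illustrative model only: hours and object lists are assumptions.
BRUSHING_TEETH = ActivityModel(
    name="brushing teeth",
    location="bathroom",
    required_objects=frozenset({"toothbrush", "toothpaste"}),
    optional_objects=frozenset({"water tap", "cup", "towel"}),
    hour_ranges=((6, 10), (20, 24)),
)


def match_score(model, location, objects, hour):
    """Return a score in [0, 1]; 0 means the observation contradicts the model."""
    objects = set(objects)
    if location != model.location:
        return 0.0
    if not model.required_objects <= objects:
        return 0.0
    in_time = any(start <= hour < end for start, end in model.hour_ranges)
    optional_seen = len(model.optional_objects & objects)
    optional_ratio = optional_seen / len(model.optional_objects)
    # Required objects give a base score; time and optional objects refine it.
    return 0.5 + 0.25 * in_time + 0.25 * optional_ratio
```

With this sketch, an observation in the bathroom at 7 a.m. involving a toothbrush, toothpaste and the water tap scores higher than the same objects observed at 2 p.m., while an observation in the kitchen scores zero. The arbitrary weights show why such rules are easy to read but hard to tune across residents.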
KDA for HAR are in practice ontology-based approaches: ontological activity models that do not depend on algorithmic choices. They have been used extensively to construct reliable activity models, and comprehensive overviews of such approaches are available in the literature. Ontologies can be used to represent objects in activity spaces, notably by exploiting the semantic relations between objects and activities. Such approaches aim at automatically detecting the possible activities related to an object.
DDA include both supervised and unsupervised machine learning methods, which primarily use probabilistic and statistical reasoning. The strength of DDA is their capacity for probabilistic modeling. These models can handle noisy, uncertain and incomplete sensor data. They can capture domain heuristics, e.g., that some activities are more likely than others. They do not require predefined domain knowledge. However, DDA require large amounts of data and, in the case of supervised learning, clean and correctly labeled data.
Several classification algorithms have been evaluated. Boundary-based classifiers, such as Decision Trees, Conditional Random Fields or Support Vector Machines, have been used. Probabilistic classifiers, such as the Naive Bayes classifier, have also shown good performance in learning and classifying activities offline when a large amount of training data is available.
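As an illustration of a probabilistic classifier of this kind, the following is a minimal sketch of a multinomial Naive Bayes classifier with Laplace smoothing, written in plain Python. The class name `SensorNaiveBayes`, the sensor identifiers and the toy training data are all hypothetical. The sketch also assumes that the stream has already been segmented into labeled windows, which is a problem in its own right.

```python
import math
from collections import Counter, defaultdict


class SensorNaiveBayes:
    """Multinomial Naive Bayes over sensor identifiers, with Laplace smoothing."""

    def fit(self, windows, labels):
        self.class_counts = Counter(labels)
        self.sensor_counts = defaultdict(Counter)
        self.vocabulary = set()
        for sensors, label in zip(windows, labels):
            self.sensor_counts[label].update(sensors)
            self.vocabulary.update(sensors)
        return self

    def log_posteriors(self, sensors):
        """Unnormalised log P(activity | sensors) for every known activity."""
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, n in self.class_counts.items():
            counts = self.sensor_counts[label]
            denom = sum(counts.values()) + vocab_size
            score = math.log(n / total)
            for s in sensors:
                if s in self.vocabulary:  # ignore sensors never seen in training
                    score += math.log((counts[s] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, sensors):
        scores = self.log_posteriors(sensors)
        return max(scores, key=scores.get)


# Toy, made-up training windows: each is the list of sensors fired in a segment.
train_windows = [
    ["kitchen_motion", "fridge_door", "stove"],
    ["kitchen_motion", "stove", "cupboard_door"],
    ["bathroom_motion", "tap", "bathroom_door"],
    ["bathroom_motion", "tap"],
]
train_labels = ["cooking", "cooking", "washing", "washing"]

model = SensorNaiveBayes().fit(train_windows, train_labels)
prediction = model.predict(["stove", "kitchen_motion"])
```

The classifier treats each window as an unordered bag of sensor activations, so it ignores the order and timing of events. This keeps the model simple but discards the temporal structure of the stream.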
Most of these methods use hand-crafted features, much as KDA use prior knowledge. Generating features by hand is time-consuming and limits adaptability. Another axis of DDA, Deep Learning (DL) approaches (CNN, FCN, auto-encoders, LSTM, etc.), is now used to overcome this limitation of feature extraction while also performing the classification task.
Smart home sensors produce a data stream of events that are ordered in time. Unlike in many other time series problems, this stream is not sampled at regular intervals. It is an event stream: sensors are activated or change state when the resident interacts with them or performs an action. A sensor activation corresponds more or less to a resident's action. There may be a few seconds or several minutes between events. It is this sequence of events that translates into a sequence of actions and therefore an activity. Dealing with the temporal complexity of human activity data in real use cases raises several challenges.
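The irregular sampling of such an event stream can be made concrete with a short Python sketch. The log format (`date time sensor_id state`), the sensor names and the timestamps are assumptions chosen for illustration and are not taken from any particular dataset.

```python
from datetime import datetime
from typing import NamedTuple


class SensorEvent(NamedTuple):
    timestamp: datetime
    sensor_id: str
    state: str


def parse_event(line):
    """Parse 'YYYY-MM-DD HH:MM:SS sensor_id state' (an assumed log format)."""
    date, time, sensor_id, state = line.split()
    timestamp = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
    return SensorEvent(timestamp, sensor_id, state)


def inter_event_gaps(events):
    """Seconds elapsed between consecutive events, in stream order."""
    return [
        (later.timestamp - earlier.timestamp).total_seconds()
        for earlier, later in zip(events, events[1:])
    ]


# Made-up excerpt of a morning routine.
raw_log = [
    "2024-01-01 07:02:11 bedroom_motion ON",
    "2024-01-01 07:02:14 bedroom_door OPEN",
    "2024-01-01 07:02:40 bathroom_motion ON",
    "2024-01-01 07:09:40 bathroom_motion OFF",
    "2024-01-01 07:09:45 kitchen_motion ON",
]
events = [parse_event(line) for line in raw_log]
gaps = inter_event_gaps(events)  # [3.0, 26.0, 420.0, 5.0]
```

In this toy excerpt the gaps range from 3 seconds to 7 minutes, which is why methods designed for regularly sampled signals cannot be applied without some form of segmentation or resampling.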
Activities can be more or less complex. A simple activity consists of a single action or movement, such as walking, running, turning on the light or opening a drawer. A complex activity involves a sequence of actions, potentially with different interactions with objects, equipment or other people, such as cooking.
Clearly, if monitoring the activities of daily living performed by a single resident is already a complex task, the complexity increases drastically when several residents are involved. The same activities become harder to recognize. On the one hand, residents in a group may interact to perform common activities. In this case, the activation of the sensors reflects the same activity for each resident in the group. On the other hand, each resident can perform a different activity at the same time. This produces simultaneous sensor activations for different activities. These activations are then merged and combined in the activity sequences. An activity performed by one resident may become noise for the activities of another.
Long Short-Term Memory (LSTM) algorithms show excellent performance on the classification of irregular time series in the context of a single resident and simple activities. However, human activity is much more complex than this. Long-term dependencies often occur in activities of daily living. To tackle these non-Markovian time series, a context can be introduced to help interpret the observed activations. Language models have been used both to estimate the duration of this dependency and to encode this context. The recognition of concurrent, interleaved or idle activities is more difficult still. Currently, work on HAR in smart homes does not take these types of activities into account. Moreover, people generally do not live alone in a house. This introduces even more complex challenges, including the recognition of activity in homes with multiple residents. These challenges, which involve multi-class classification problems, are still unsolved.
The complexity of real human activities is not the sole problem. The application of human activity recognition in smart homes to real use cases also faces issues that cause a substantial discrepancy between training and test data. Some of these issues are inherent to smart homes: the temporal drift of the data and the variability of settings.
To accommodate this drift, algorithms for HAR in smart homes should incorporate life-long learning to continuously learn and adapt to changes in human activities from new data, as has been proposed in the literature. Recent work on life-long learning that incorporates deep learning could help tackle this issue of temporal drift. In particular, one can imagine an interactive system that requests labeled data from users from time to time in order to continue to learn and adapt. Such algorithms have been developed under the names of interactive reinforcement learning or active imitation learning in robotics. For instance, one such system learns micro and compound actions while minimizing the number of requests for labeled data by choosing when to ask, what information to ask for, and even whom to ask for help. Such principles could inspire a smart home system that continues to adapt its model while minimizing user intervention and making each intervention more useful by pointing out the missing key information.
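One simple way to decide when to ask the resident for a label is uncertainty sampling, a standard active learning heuristic. It is used here only as a stand-in and is not the interactive reinforcement learning or active imitation learning methods mentioned above. In the following Python sketch, the function names, the entropy threshold and the query budget are hypothetical, and the class probabilities are assumed to come from some already trained activity classifier.

```python
import math


def entropy(probabilities):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def should_request_label(probabilities, threshold_bits=1.0):
    """Ask the resident only when the prediction is uncertain enough."""
    return entropy(probabilities) >= threshold_bits


def select_queries(predictions, budget, threshold_bits=1.0):
    """Pick at most `budget` segment indices, most uncertain first."""
    scored = [(entropy(p), i) for i, p in enumerate(predictions)]
    uncertain = [(h, i) for h, i in scored if h >= threshold_bits]
    uncertain.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in uncertain[:budget]]


# Made-up class probabilities for four activity segments.
predictions = [
    [0.90, 0.05, 0.05],   # confident
    [0.40, 0.35, 0.25],   # uncertain
    [0.34, 0.33, 0.33],   # very uncertain
    [0.70, 0.20, 0.10],   # moderately uncertain
]
queries = select_queries(predictions, budget=2)  # [2, 1]
```

The budget caps the number of interruptions per period, and the threshold keeps the system from asking about segments it already classifies confidently. Both values would have to be tuned to what residents find acceptable.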
Besides these long-term evolutions, the data also differ greatly from one house to another. A model learned in one house is therefore hardly applicable in another, because of changes in house configuration, sensor equipment, and family composition and habits. The location, number and type of sensors in a smart home can all influence the activity recognition performance of a system. Smart homes can be equipped in different ways and may have different architectures in terms of sensors, room configuration, appliances, etc. Some have many sensors, multiple bathrooms or bedrooms and multiple appliances, while others are smaller, such as a single apartment, with fewer sensors, more overlap between them and noisier sequences. Because of these differences in house configuration, a model optimized for a particular smart home could perform poorly in another. This issue can of course be addressed by collecting a new dataset for each new household in order to train the models anew, but this is costly. A less costly solution, in the spirit of data augmentation, would be to collect the data in a simulated apartment.
Another solution is to adapt the models learned in one household to another. Transfer learning methods have recently been developed to allow pre-trained deep learning models to be used with different data distributions. Transfer learning using deep learning has been successfully applied to time series classification. For activity recognition, the different types of knowledge that could be transferred in traditional machine learning have been reviewed. These methods can be updated with deep learning algorithms by taking advantage of current research in transfer learning for deep learning. Furthermore, adaptation to new settings has recently been improved by the development of meta-learning algorithms. Their goal is to train a model on a variety of learning tasks so that it can solve new learning tasks using only a small number of training samples. This field has seen recent breakthroughs but has not yet been applied to HAR. Given the particular variability of HAR data in smart homes, the field stands to benefit from such algorithms.
HAR in smart homes has shown interesting advances, owed mainly to the development of recent Deep Learning algorithms for end-to-end classification such as convolutional neural networks. It also benefits from recent algorithms for sequence learning such as long short-term memory. However, as with video processing, sequence learning still needs to be improved, both to deal with the vanishing gradient problem and to take into account the context of the sensor readings. The temporal dimension is a particularity of ambient sensor systems, as the data form a sparse and irregular time series. The irregular sampling in time has also been tackled with adapted windowing methods for data segmentation. In addition to the time windows used in other HAR fields, sensor event windows are commonly used as well. The sparsity of ambient sensor data does not allow machine learning algorithms to take advantage of the redundancy of data over time, as they can with videos, where successive frames are mostly similar. Moreover, whereas in video-based HAR the context of the human action can be seen in the images by detecting the environment or the objects of attention, the sparsity of ambient sensor data results in a high reliance on past information to infer the context.
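The difference between the two segmentation strategies mentioned above, time windows and sensor event windows, can be sketched in Python as follows. The function names, the window sizes and the toy event list are assumptions made for illustration. Events are represented as `(seconds, sensor_id)` pairs sorted by time.

```python
def sensor_event_windows(events, size, step=None):
    """Sliding windows holding a fixed number of consecutive events."""
    step = step or size
    return [events[i:i + size] for i in range(0, len(events) - size + 1, step)]


def time_windows(events, duration):
    """Fixed-duration windows; `events` are (seconds, sensor_id), time-sorted.

    Empty windows are kept so the sparsity of the stream remains visible.
    """
    if not events:
        return []
    start = events[0][0]
    n_windows = int((events[-1][0] - start) // duration) + 1
    windows = [[] for _ in range(n_windows)]
    for t, sensor in events:
        windows[int((t - start) // duration)].append((t, sensor))
    return windows


# Made-up stream: a burst of events, a long silence, then another burst.
events = [
    (0, "bedroom_motion"),
    (3, "bedroom_door"),
    (29, "bathroom_motion"),
    (449, "bathroom_door"),
    (454, "kitchen_motion"),
    (460, "fridge_door"),
]
by_count = sensor_event_windows(events, size=3)  # 2 windows of 3 events each
by_time = time_windows(events, duration=60)      # 8 windows, 6 of them empty
```

On this toy stream, 60-second time windows are mostly empty, whereas sensor event windows always contain the same number of events but span very different durations (29 seconds and 11 seconds here, separated by a 7-minute silence that neither window covers).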
While HAR with ambient sensors has to face the problems of complex activities, such as sequences of activities, concurrent activities or multi-occupant activities, as well as data drift, it also has to tackle specific unsolved problems such as the variability of data. Indeed, the data collected by ambient sensors are even more sensitive to the house configuration, the choice of sensors and their placement.