Figure 1. High-level Perception Framework.
Data fusion entails combining information to accomplish something, and this 'something' is usually to sense the state of some aspect of the universe [14]. The applications of this state sensing are versatile, to say the least; some high-level areas are neurology, biology, sociology, engineering, and physics [15,16,17,18,19,20,21]. Because data fusion is applied so broadly, throughout this manuscript we limit our review to the fusion of LiDAR and camera data for autonomous navigation. Kolar et al. [118] performed an exhaustive data fusion survey and documented several details about data fusion using these two sensors.
Sensors and their Input to Perception
A sensor is an electronic device that measures physical aspects of an environment and outputs machine-readable (digital) data. Sensors provide a direct perception of the environment in which they are deployed. Typically, a suite of sensors is used, since an individual sensor inherently provides only a single aspect of an environment. Using multiple sensors not only improves the completeness of the data but also improves the accuracy with which the environment is measured.
The Merriam-Webster dictionary defines a sensor[22] as
A device that responds to a physical stimulus (such as heat, light, sound, pressure, magnetism, or a particular motion) and transmits a resulting impulse (as for measurement or operating a control)
The initial step is raw data capture using the sensors. The data are then filtered, an appropriate fusion technique is applied, and the result is fed into localization and mapping techniques such as SLAM. The same data can be used to identify static or moving objects in the environment and to classify those objects; the classification information is then used to complete a model of the environment, which in turn can be fed into the control algorithm [27]. The classification information could potentially give details of pedestrians, furniture, vehicles, buildings, etc. Such a classification is useful both in pre-mapped, i.e., known, environments and in unknown environments, since it increases the potential of the system to explore its environment and navigate.
Multiple Sensors vs. Single Sensor
Most autonomous systems require multiple sensors to function optimally. However, why should we use multiple sensors? Relying on a single sensor can impair the system in which it is used because of the limitations of that sensor. Hence, to obtain acceptable results, one may deploy a suite of different sensors and exploit the benefits of each of them. The diversity offered by a suite of sensors contributes positively to the perception of the sensed data [38,39]. Another reason is the risk of system failure caused by the failure of a single sensor [21,27,40]; hence, one should introduce a level of redundancy. For instance, while executing the obstacle avoidance module, if the camera is the only installed sensor and it fails, the result could be catastrophic. However, if the system has an additional camera or a LiDAR, it can navigate itself to a safe place after successfully avoiding the obstacle, provided such logic is built in for that failure. Many researchers have studied high-level decision data fusion and concluded that using multiple sensors with data fusion is better than using individual sensors without data fusion. The research community found that every sensor provides a different, sometimes unique, type of information about the selected environment, which includes the tracked object, the avoided object, the autonomous vehicle itself, the world in which it operates, and so on, and that this information is provided with differing accuracy and differing detail [27,39,44,45,46].
Using multiple sensors also has disadvantages, one of which is the additional level of complexity; however, an optimal technique for fusing the data can mitigate this challenge efficiently. There may also be some uncertainty in the functioning, accuracy, and appropriateness of the sensed raw data [47]. Due to these challenges, the system must be able to diagnose failures accurately when they occur and ensure that the failed component(s) are identified for apt mitigation.
Need for Sensor Data Fusion
Some of the limitations of single sensor unit systems are as follows:
Deprivation: If a sensor stops functioning, the system in which it was incorporated suffers a loss of perception.
Uncertainty: Inaccuracies arise when features are missing, when ambiguities exist, or when not all required aspects can be measured.
Imprecision: Individual sensor measurements have limited precision and accuracy.
Limited temporal coverage: There is initialization/setup time for a sensor to reach its maximum performance and transmit a measurement, which limits the maximum measurement frequency.
Limited spatial coverage: Normally, an individual sensor will cover only a limited region of the entire environment—for example, a reading from an ambient thermometer on a drone provides an estimation of the temperature near the thermometer and may fail to correctly render the average temperature in the entire environment.
Some of the advantages of using multiple sensors or a sensor suite [38,44,46,50,51] are as follows:
Extended spatial coverage: Multiple sensors can measure across a wider range of space and sense where a single sensor cannot.
Extended temporal coverage: Time-based coverage increases when using multiple sensors.
Improved resolution: By combining multiple independent measurements of the same property, the resolution is better than that of a single sensor measurement.
Reduced uncertainty: Considering the sensor suite as a whole, uncertainty decreases, since the combined information reduces the set of ambiguous interpretations of the sensed value.
Increased robustness against interference: By increasing the dimensionality of the sensor space (e.g., measuring with both a LiDAR and stereo vision cameras), the system becomes less vulnerable to interference.
Increased robustness: The redundancy provided by multiple sensors yields more robustness, even when one of the sensors partially fails.
Increased reliability: Due to the increased robustness, the system becomes more reliable.
Increased confidence: When the same domain or property is measured by multiple sensors, one sensor can confirm the accuracy of the others; this re-verification increases confidence.
Reduced complexity: The fused output is less uncertain, less noisy, and more complete.
Data Fusion Techniques
Over the years, scientists and engineers have applied concepts of sensing that occur in nature to their own research areas and have developed new disciplines and technologies that span several fields. In the early 1980s, researchers used aerial sensor data to perform passive sensor fusion of stereo vision imagery. Crowley et al. performed fundamental research in the areas of data fusion, perception, and world model development that is vital for robot navigation [57,58,59]. They developed systems with multiple sensors and devised mechanisms and techniques to combine the data from all the sensors and obtain the 'best' data as output from this set of sensors, also known as a 'suite of sensors'. In short, this augmentation or integration of data from multiple sensors can simply be termed multi-sensor data fusion. In this survey paper, we discuss the following integration techniques.
K-Means
K-means is a popular algorithm that has been widely employed; it provides a good generalization for data clustering and is guaranteed to converge.
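To make the clustering step concrete, the following is a minimal NumPy sketch of Lloyd's algorithm for k-means; the function name and the random initialization strategy are illustrative choices rather than an implementation from any of the surveyed works.

```python
import numpy as np

def kmeans(points, k, iters=100, seed=0):
    """Minimal Lloyd's algorithm: cluster N x D points into k groups."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct points at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, labels

# Example: cluster fused 3D points (e.g., LiDAR returns) into 3 groups.
pts = np.random.default_rng(1).normal(size=(300, 3))
centers, assignments = kmeans(pts, k=3)
```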
Probabilistic Data Association (PDA)
PDA was proposed by Bar-Shalom and Tse and is also known as the "modified filter of all neighbors" [86]. It assigns an association probability to each hypothesis arising from the correct measurement of a destination/target and then processes it. PDA is mainly suited to tracking targets that do not make abrupt changes in their movement pattern.
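As an illustration of the association step, the sketch below shows a simplified single-target PDA update under a linear-Gaussian measurement model; the detection probability and clutter-mass parameters are illustrative assumptions, and the full PDA filter in [86] additionally includes measurement gating and an explicit clutter-density model.

```python
import numpy as np

def pda_update(x, P, H, R, measurements, p_detect=0.9, clutter_mass=0.1):
    """Simplified probabilistic data association update for one target.

    x, P         : predicted state mean and covariance
    H, R         : linear measurement model z = H x + noise(R)
    measurements : list of candidate measurement vectors
    """
    S = H @ P @ H.T + R                      # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
    z_pred = H @ x
    d = len(z_pred)
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(S))

    # Likelihood of each candidate under the predicted measurement density.
    nus = [z - z_pred for z in measurements]
    L = np.array([p_detect * norm * np.exp(-0.5 * nu @ np.linalg.inv(S) @ nu)
                  for nu in nus])

    # Association probabilities; beta0 covers the "none is correct" hypothesis.
    total = clutter_mass + L.sum()
    beta, beta0 = L / total, clutter_mass / total

    # Combined (probability-weighted) innovation.
    nu_comb = sum(b * nu for b, nu in zip(beta, nus)) if len(nus) else np.zeros(d)

    x_new = x + K @ nu_comb
    P_kf = P - K @ S @ K.T                   # standard Kalman-updated covariance
    spread = sum(b * np.outer(nu, nu) for b, nu in zip(beta, nus)) \
        - np.outer(nu_comb, nu_comb)
    P_new = beta0 * P + (1 - beta0) * P_kf + K @ spread @ K.T
    return x_new, P_new
```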
Distributed Multiple Hypothesis Test
This is a very useful technique that can be used in distributed and decentralized systems [90]. It is an extension of the multiple hypothesis test and is efficient at tracking multiple targets in cluttered environments [91]. It can be used as an estimation and tracking technique [90]. The main disadvantage is the high computational cost, which grows exponentially.
State Estimation
Also known as tracking techniques, these assist with calculating the moving target's state when measurements are given. The measurements are obtained using the sensors [87]. This is a fairly common technique in data fusion, mainly for two reasons: (1) measurements are usually obtained from multiple sensors; and (2) there could be noise in the measurements. Some examples are Kalman filters, extended Kalman filters, and particle filters.
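As a concrete example of such a state estimator, the following is a minimal linear Kalman filter predict/update cycle for a constant-velocity target; the matrices and measurement values are illustrative, not taken from any surveyed system.

```python
import numpy as np

def kf_predict(x, P, F, Q):
    """Propagate the state estimate through the motion model F with noise Q."""
    return F @ x, F @ P @ F.T + Q

def kf_update(x, P, z, H, R):
    """Correct the prediction with a measurement z from model H with noise R."""
    y = z - H @ x                            # innovation
    S = H @ P @ H.T + R                      # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
    return x + K @ y, (np.eye(len(x)) - K @ H) @ P

# Example: 1D constant-velocity target tracked from noisy position readings.
dt = 0.1
F = np.array([[1, dt], [0, 1]])
H = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[0.5]])
x, P = np.zeros(2), np.eye(2)
for z in [0.9, 1.1, 1.3]:                    # illustrative measurements
    x, P = kf_predict(x, P, F, Q)
    x, P = kf_update(x, P, np.array([z]), H, R)
```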
Covariance Consistency Methods
These methods were initially proposed by Uhlmann et al. [84,87]. This is a distributed technique that maintains covariance estimates and means in a distributed system. They comprise estimation-fusion techniques.
Distributed Data Fusion
As the name suggests, this is a distributed fusion approach and is often used in multi-agent, multisensor, and multimodal systems [84,94,95]. It is efficient when distributed and decentralized systems are present. An optimum fusion can be achieved by adjusting the decision rules; however, it is difficult to finalize the decision uncertainties.
Decision Fusion Techniques
These techniques can be used when successful target detection occurs [87,92,93]. They enable high-level inference for such events. When multiple classifiers are present, this technique can be used to arrive at a single decision from their outputs. For multiple classifiers to be combined, a priori probabilities need to be available, which is difficult.
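One simple way to combine classifier outputs with a priori class probabilities is a naive-Bayes style fusion rule, sketched below; the class names and probability values are purely illustrative, and the decision fusion schemes in [87,92,93] differ in how they model classifier dependence.

```python
import numpy as np

def bayesian_decision_fusion(class_priors, classifier_posteriors):
    """Fuse per-classifier class scores into a single decision.

    class_priors          : prior probability of each class (a priori knowledge)
    classifier_posteriors : list of per-classifier probability vectors over classes
    """
    log_score = np.log(np.asarray(class_priors, dtype=float))
    for p in classifier_posteriors:
        # Independence assumption: accumulate each classifier's evidence.
        log_score += np.log(np.asarray(p, dtype=float)) - np.log(np.asarray(class_priors, dtype=float))
    fused = np.exp(log_score - log_score.max())
    fused /= fused.sum()
    return int(fused.argmax()), fused

# Example: camera and LiDAR classifiers voting over {pedestrian, vehicle, clutter}.
priors = [0.3, 0.3, 0.4]
camera = [0.7, 0.2, 0.1]
lidar  = [0.5, 0.3, 0.2]
label, probabilities = bayesian_decision_fusion(priors, [camera, lidar])
```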
LiDAR
Light Detection and Ranging (LiDAR) is a technology that is used in several autonomous tasks and functions as follows: an area is illuminated by a light source, the light is scattered by the objects in that scene, and the return is detected by a photo-detector. The LiDAR provides the distance to the object by measuring the time it takes for the light to travel to the object and back [104,105,106,107,108,109].
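The time-of-flight principle reduces to a one-line range computation, sketched below for a pulsed LiDAR; the example timing value is illustrative.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def lidar_range(round_trip_time_s: float) -> float:
    """Range from time of flight: the pulse travels to the object and back."""
    return SPEED_OF_LIGHT * round_trip_time_s / 2.0

# Example: a return detected 66.7 ns after emission is roughly 10 m away.
print(lidar_range(66.7e-9))  # ~10.0 m
```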
Camera
Camera types include conventional color (RGB) cameras such as USB/web cameras [115], RGB-mono cameras, RGB cameras with depth information (RGB-Depth, RGB-D), 360-degree cameras [28,116,117,118], and Time-of-Flight (ToF) cameras [119,120,121].
Implementation of Data Fusion using a LiDAR and camera
We review an input-output type of fusion as described by Dasarathy et al., who propose a classification strategy based on the input and output of entities such as data, architecture, features, and decisions: raw data are fused in the first layer, features in the second, and decisions in the final layer. In the case of LiDAR and camera data fusion, two distinct steps effectively integrate/fuse the data:
- Geometric Alignment of the Sensor Data
- Resolution Match between the Sensor Data
Geometric Alignment of the Sensor Data
The first and foremost step in the data fusion methodology is the alignment of the sensor data. In this step, the logic finds a LiDAR data point for each pixel in the optical image. This ensures the geometric alignment of the two sensors [28].
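A common way to realize this geometric alignment is to project each LiDAR point into the image plane using the extrinsic rotation/translation between the sensors and the camera intrinsic matrix. The sketch below assumes these calibration parameters are already known; it is an illustrative projection, not the specific pipeline of [28].

```python
import numpy as np

def project_lidar_to_image(points_lidar, R, t, K):
    """Project 3D LiDAR points into camera pixel coordinates.

    points_lidar : (N, 3) points in the LiDAR frame
    R, t         : extrinsic rotation (3x3) and translation (3,), LiDAR -> camera
    K            : 3x3 camera intrinsic matrix
    """
    # Transform points into the camera frame.
    pts_cam = points_lidar @ R.T + t
    # Keep only points in front of the camera.
    in_front = pts_cam[:, 2] > 0
    pts_cam = pts_cam[in_front]
    # Perspective projection onto the image plane.
    uvw = pts_cam @ K.T
    pixels = uvw[:, :2] / uvw[:, 2:3]
    depths = pts_cam[:, 2]
    return pixels, depths, in_front
```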
Resolution Match between the Sensor Data
Once the data are geometrically aligned, the resolutions of the two sensors' data must be matched. The optical camera has the highest resolution of 1920 × 1080 pixels at 30 fps, followed by the depth camera output at 1280 × 720 pixels at 90 fps, and finally the LiDAR data have the lowest resolution. This step performs an extrinsic calibration of the data. Madden et al. [126] performed a sensor alignment of a LiDAR and a 3D depth camera using a probabilistic approach. De Silva et al. [28] performed resolution matching by finding a distance value for the image pixels that have no distance value. They solve this as a regression-based missing-value prediction problem, formulating the missing data values from the relationships between the measured data points using a multi-modal technique, Gaussian Process Regression (GPR), as discussed by Lahat et al. [39]. The resolution matching of two different sensors can be performed through extrinsic sensor calibration. Considering the depth information of a LiDAR and a stereo vision camera, 3D depth maps can be developed from simple 2D images.
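As a sketch of this regression-based fill-in idea, the code below uses scikit-learn's Gaussian process regressor to predict depth at pixels without a LiDAR return from nearby pixels that have one; the kernel choice and length scale are illustrative assumptions, not the exact formulation of De Silva et al. [28].

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def upsample_depth(known_pixels, known_depths, query_pixels):
    """Predict depth at image pixels that have no LiDAR return.

    known_pixels : (N, 2) pixel coordinates with an associated LiDAR depth
    known_depths : (N,) measured depths at those pixels
    query_pixels : (M, 2) pixel coordinates whose depth is missing
    """
    kernel = RBF(length_scale=20.0) + WhiteKernel(noise_level=0.05)
    gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gpr.fit(known_pixels, known_depths)
    # The predictive standard deviation quantifies uncertainty in the fill-in.
    mean, std = gpr.predict(query_pixels, return_std=True)
    return mean, std
```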
Challenges with Sensor Data Fusion
Several challenges have been observed while implementing multisensor data fusion. Some are data related, such as complexity in the data or conflicting and/or contradicting data; others are technical, such as resolution differences between the sensors, differences in alignment between the sensors, etc. We review two of the fundamental challenges surrounding sensor data fusion: the resolution differences between heterogeneous sensors, and understanding and utilizing heterogeneous sensor data streams while accounting for the many uncertainties in the sensor data sources [28,39]. We focus on reviewing the utilization of the fused information in autonomous navigation, for example to assist persons with severe motor disabilities with their navigational requirements. This is challenging since many autonomous systems work in complex environments, be it at home or at work, and such settings pose significant challenges for decision-making due to safety, efficiency, and accuracy requirements. For reliable operation, decisions need to be made by considering the entire set of multi-modal sensor data acquired, keeping a complete solution in mind. In addition, the decisions need to account for the uncertainties associated with both the data acquisition methods and the implemented pre-processing algorithms. Our focus in this review is to survey the data fusion techniques that consider the uncertainty in the fusion algorithm.
Some researchers used mathematical and/or statistical techniques for data fusion. Others used reinforcement learning techniques to implement multisensor data fusion [70], where they encountered conflicting data. In that study, smart mobile systems were fitted with sensors that made them sensitive to the environment(s) in which they operated. The challenge they tried to solve was mapping the multiple streams of raw sensory data received by the smart agents to their tasks. In their environment, the tasks were different and conflicting, which complicated the problem. As a result, their system learned to translate the multiple inputs into the appropriate tasks or sequences of system actions. Crowley et al. developed mathematical tools to counter uncertainties in fusion and perception [133]. Other implementations include adaptive learning techniques [134], wherein the authors use D-CNN techniques in a multisensor environment for fault diagnostics in planetary gearboxes.
Sensor data noise and rectification
Noise filtering techniques include a suite of Kalman filters and their variations. We discuss the following: the Kalman filter, extended Kalman filter, unscented Kalman filter, distributed Kalman filter, and particle filter.
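Since the linear Kalman filter was sketched above, the following complements it with a minimal bootstrap particle filter step (predict, reweight, resample); the motion model and likelihood are user-supplied callables, and the effective-sample-size threshold of N/2 is an illustrative choice.

```python
import numpy as np

def particle_filter_step(particles, weights, control, measurement,
                         motion_model, likelihood, rng):
    """One predict/update/resample cycle of a bootstrap particle filter.

    particles    : (N, D) array of state hypotheses
    weights      : (N,) normalized particle weights
    motion_model : callable(state, control, rng) -> next (noisy) state
    likelihood   : callable(measurement, state) -> measurement likelihood
    """
    # Predict: propagate every particle through the (noisy) motion model.
    particles = np.array([motion_model(p, control, rng) for p in particles])
    # Update: reweight particles by the measurement likelihood.
    weights = weights * np.array([likelihood(measurement, p) for p in particles])
    weights = weights / weights.sum()
    # Resample when the effective sample size collapses.
    if 1.0 / np.sum(weights ** 2) < len(particles) / 2:
        idx = rng.choice(len(particles), size=len(particles), p=weights)
        particles = particles[idx]
        weights = np.full(len(particles), 1.0 / len(particles))
    return particles, weights
```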
Autonomous Navigation
Robot navigation has been studied extensively in the community for several decades [161,162,163,164,165,166,167]. It can be defined as the safe movement of a robot from a source location to a target location, without harming people or property in its environment and without damaging itself, with no or limited need for a human operator. This means that the navigation system is also responsible for decision-making when the system faces situations (critical or otherwise) that demand negotiation with humans and/or other robots. Autonomous navigation is a task that takes in the output of a sensor data fusion module. Autonomous navigation means that a vehicle can plan its path and execute its plan without human intervention. An autonomous robot is one that can not only maintain its stability as it moves, but also plan its movements. Such robots use navigation aids when possible, but can also rely on visual, auditory, and olfactory cues. Decision-making relies on data fusion, which combines inputs from various sources to produce more accurate combined sensor data as output [35,38,44,51]. Figure 2 shows a simple sensor data fusion pipeline and its implementation in an autonomous dynamic model generator. The sub-systems of mapping, localization, path planning, and obstacle avoidance are detailed below.
Figure 2. Tasks that accept integrated sensor information.
Mapping
When a system maps an environment, the mapping module senses the environment in which the robot operates and provides data to analyze it for optimal functioning. It is also a process of establishing a spatial relationship among stationary objects in an environment. Efficient mapping is a crucial process that gives rise to accurate localization and driving decision making. Using LiDARs for mapping is beneficial, as they are well known for their high-speed, long-range sensing and hence long-range mapping, while cameras (RGB and RGB-Depth) are used for short-range mapping and also to efficiently detect obstacles [170,171,172].
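A minimal mapping sketch is a log-odds occupancy grid updated with LiDAR returns, shown below. For brevity it only accumulates "hit" evidence and omits the ray-casting free-space update used in full occupancy grid mapping; the grid size, resolution, and log-odds increment are illustrative.

```python
import numpy as np

def update_occupancy_grid(grid, log_odds_hit, points_xy, origin, resolution):
    """Mark grid cells containing LiDAR returns as more likely occupied.

    grid       : 2D array of log-odds occupancy values, indexed [y, x] (in place)
    points_xy  : (N, 2) LiDAR returns already expressed in the map frame
    origin     : (x, y) world coordinate of grid cell (0, 0)
    resolution : cell size in metres
    """
    cells = np.floor((points_xy - np.asarray(origin)) / resolution).astype(int)
    for cx, cy in cells:
        if 0 <= cx < grid.shape[1] and 0 <= cy < grid.shape[0]:
            grid[cy, cx] += log_odds_hit   # accumulate evidence of occupancy

# Example: a 20 m x 20 m map at 0.1 m resolution, centred on the robot.
grid = np.zeros((200, 200))
update_occupancy_grid(grid, log_odds_hit=0.85,
                      points_xy=np.array([[2.0, 3.5], [2.1, 3.5]]),
                      origin=(-10.0, -10.0), resolution=0.1)
```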
Localization
Localization is one of the most fundamental competencies required by an autonomous system, as knowledge of the vehicle's location is an essential precursor to any decision about future actions, whether planned or unplanned. In a typical localization situation, a map of the environment or world is available and the robot is equipped with sensors that sense and observe the environment as well as monitor the robot's motion [188,198,199,200]. Hence, localization is the branch of autonomous system navigation that deals with the study and application of a robot's ability to localize itself in a map or plan. The localization module informs the robot of its current position at any given time. It is a process of establishing the spatial relationship between the intelligent system and the stationary objects around it. Localization is achieved using devices such as Global Positioning Systems (GPS), odometric sensors, Inertial Measurement Units (IMUs), etc. These sensors give the position information of the autonomous system, which the system can use to see where it is in the environment or the robot world [198,201,202]. Some of the localization techniques are dead reckoning [205,206], GPS [212,213], signal-based localization [207], vision-based localization [217,218], indoor-VR localization [219], and networked sensor-based localization [214].
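As an example of the simplest of these techniques, dead reckoning integrates odometry velocities into a pose estimate, as sketched below; the unicycle motion model and the velocity values are illustrative.

```python
import math

def dead_reckon(x, y, theta, v, omega, dt):
    """Integrate wheel-odometry velocities to update the robot pose.

    v : forward speed (m/s), omega : yaw rate (rad/s), dt : time step (s)
    """
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta += omega * dt
    return x, y, theta

# Example: drive straight for 1 s, then turn while moving.
pose = (0.0, 0.0, 0.0)
pose = dead_reckon(*pose, v=1.0, omega=0.0, dt=1.0)   # -> (1.0, 0.0, 0.0)
pose = dead_reckon(*pose, v=1.0, omega=0.5, dt=1.0)
```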
Path planning
Path planning is an important subtask of autonomous navigation. It is generally described as the problem of searching for a path that an autonomous system has to follow in a described environment; it requires the vehicle to move in the direction closest to the goal, and, generally, the map of the area is already known [220,221,222,223]. Path planning used in conjunction with obstacle avoidance techniques gives a more robust deployment of the path planner module by enabling the system to avoid hazardous collision objects, no-go zones, and negative objects such as potholes. Some types of path planners are global, local, heuristic, static, and dynamic planners.
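A common heuristic planner of this kind is A* over an occupancy grid. The sketch below uses a 4-connected grid and a Manhattan-distance heuristic, both illustrative choices.

```python
import heapq
from itertools import count

def astar(grid, start, goal):
    """A* on a 4-connected occupancy grid: 0 = free cell, 1 = obstacle."""
    def h(a, b):  # Manhattan-distance heuristic
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    tie = count()                      # tie-breaker so the heap never compares nodes
    open_set = [(h(start, goal), next(tie), 0, start, None)]
    came_from, g_cost = {}, {start: 0}
    while open_set:
        _, _, g, node, parent = heapq.heappop(open_set)
        if node in came_from:
            continue
        came_from[node] = parent
        if node == goal:               # reconstruct the path back to the start
            path = [node]
            while came_from[path[-1]] is not None:
                path.append(came_from[path[-1]])
            return path[::-1]
        r, c = node
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nb[0] < len(grid) and 0 <= nb[1] < len(grid[0])
                    and grid[nb[0]][nb[1]] == 0 and nb not in came_from):
                ng = g + 1
                if ng < g_cost.get(nb, float("inf")):
                    g_cost[nb] = ng
                    heapq.heappush(open_set, (ng + h(nb, goal), next(tie), ng, nb, node))
    return None  # no path exists

# Example: plan across a small map with a single gap in the wall.
grid = [[0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0]]
print(astar(grid, (0, 0), (2, 0)))
```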
Obstacle avoidance
For successful navigation of an autonomous system, avoiding obstacles while in motion is an absolute requirement. The vehicles must be able to navigate in their environment safely [30,32,33,35,170]. Obstacle avoidance involves choosing the best direction among multiple non-obstructed directions in real time; hence, obstacle avoidance can be considered more challenging than path planning [223]. Obstacles can be of two types: (i) immobile obstacles and (ii) mobile obstacles. Static object detection deals with localizing objects that are immobile in an environment; indoor static obstacles include tables, sofas, beds, planters, TV stands, walls, etc., while outdoor static obstacles include buildings, trees, parked vehicles, poles (light, communication), standing or sitting persons, animals lying down, etc. Moving object detection deals with localizing dynamic objects across the data frames obtained by the sensors in order to estimate their future state; indoor moving objects include walking or running pets, moving persons, operating vacuum robots, crawling babies, people moving in wheelchairs, etc., while outdoor moving obstacles include moving vehicles, pedestrians walking on the pathway, a ball thrown in the air, flying drones, running pets, etc. The object's state has to be updated at each time instance. Moving object localization is not a simple task, even with precise localization information, and the challenge increases when the environment is cluttered with obstacles. Obstacles can be detected using two approaches that rely on prior mapped knowledge of the targets or the environments [33,37,48,180,244]: (i) feature-based approaches, which use LiDAR and detect the dynamic features of the objects; and (ii) appearance-based approaches, which use cameras and detect moving objects or temporally static objects. Autonomous vehicles must be able to navigate their environment safely. We can broadly classify obstacle avoidance into static and mobile obstacle avoidance [245,246]. As the name suggests, static obstacle avoidance deals with navigating around obstacles that do not move, where only the autonomous vehicle is in motion; it is a process of establishing the temporal and spatial relationship between the mobile vehicle and the immobile obstacles, for example a sofa in a living room. In contrast, mobile obstacle avoidance is a process of establishing the temporal and spatial relationship between the mobile objects in the environment, in addition to the vehicle and the stationary objects. While path planning requires the vehicle to move in the direction nearest to the goal [223], and generally the map of the area is known, obstacle avoidance entails selecting the best direction among several unobstructed directions in real time, as sketched below. Obstacle avoidance techniques span static and mobile obstacle avoidance, and common approaches include grid-based, topological, hybrid, vector field histogram, dynamic window, and neural-network-based approaches.
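The sketch below illustrates the vector-field-histogram idea in its simplest form: bin sensed obstacle bearings into angular sectors, discard blocked sectors, and steer toward the free sector closest to the goal direction. The sector count and blocking radius are illustrative parameters.

```python
import numpy as np

def choose_heading(obstacle_angles, obstacle_ranges, goal_angle,
                   n_sectors=36, block_radius=1.5):
    """Pick the free sector closest to the goal direction (VFH-style sketch).

    obstacle_angles : bearings of sensed obstacle points (radians, robot frame)
    obstacle_ranges : corresponding distances (metres)
    goal_angle      : desired direction of travel (radians)
    """
    sector_width = 2 * np.pi / n_sectors
    blocked = np.zeros(n_sectors, dtype=bool)
    for ang, rng in zip(obstacle_angles, obstacle_ranges):
        if rng < block_radius:                       # obstacle close enough to matter
            idx = min(int((ang % (2 * np.pi)) / sector_width), n_sectors - 1)
            blocked[idx] = True
    candidates = [i for i in range(n_sectors) if not blocked[i]]
    if not candidates:
        return None                                  # fully boxed in: stop
    centers = (np.array(candidates) + 0.5) * sector_width
    # Angular distance to the goal, wrapped to [-pi, pi].
    diff = (centers - goal_angle + np.pi) % (2 * np.pi) - np.pi
    return centers[np.abs(diff).argmin()]
```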
In this section, we discussed the four subtasks that are prominent in autonomous navigation.
We used a Velodyne 3D LiDAR and integrated it with an Intel RealSense D435. Our results showed that the millisecond response time of the LiDAR, when integrated with the high-resolution data of the camera, gave us accurate information to detect static or moving targets, perform accurate path planning, and implement SLAM functionality. We plan to extend this research to develop an autonomous wheelchair controlled by a brain-computer interface using human thought.