Autonomous navigation is an important area within the broad domain of mobile autonomous vehicles, and sensor integration is a key concept critical to its successful implementation. In this publication, we review the integration of laser sensors such as LiDAR with vision sensors such as cameras. The past decade has witnessed a surge in the application of sensor integration in smart autonomous mobility systems. Such systems can be used in many areas of life, such as safe mobility for the disabled, disinfecting hospitals after coronavirus treatments, driverless vehicles, sanitizing public areas, and smart systems that detect deformation of road surfaces, to name a handful. These smart systems depend on accurate sensor information to function optimally. This information may come from a single sensor or from a suite of sensors of the same or different modalities. We review various types of sensors, their data, and the need to integrate these data to produce the best output for the task at hand, which in this case is autonomous navigation. To obtain such accurate data, we need optimal technology to read the sensor data, process the data, eliminate or at least reduce the noise, and then use the data for the required tasks. We present a survey of current data processing techniques that implement the integration of multimodal data from different types of sensors, such as LiDAR, which uses light-scan technology, and various types of Red Green Blue (RGB) cameras, which use optical technology, and we review the efficiency of using fused data from multiple sensors, rather than a single sensor, in autonomous navigation tasks such as mapping, obstacle detection and avoidance, and localization. This survey provides sensor information to researchers who intend to accomplish the task of motion control of a robot and details the use of LiDAR and cameras to accomplish robot navigation.
Autonomous systems can play a vital role in assisting humans in a variety of problem areas. This could potentially be in a wide range of applications like driverless cars, humanoid robots, assistive systems, domestic systems, military systems, and manipulator systems, to name a few. Presently, the world is at the bleeding edge of technologies that can enable this even in our daily lives. Assistive robotics is a crucial area of autonomous systems that helps persons who require medical, mobility, domestic, physical, and mental assistance. This research area is gaining popularity in applications like autonomous wheelchair systems [1,2], autonomous walkers, lawn mowers [4,5], vacuum cleaners, intelligent canes, and surveillance systems in places like assisted living [8,9,10,11]. Data are one of the most important components to optimally start, continue, or complete any task. Often, these data are obtained from the environment in which the autonomous system functions; examples of such data are the system's position and location coordinates in the environment, the static objects, the speed/velocity/acceleration of the system, its peers, or any moving object in its vicinity, the vehicle heading, air pressure, and so on. Since this information is obtained directly from the operational environment, it is up-to-date and can be accessed through either built-in or connected sensing equipment/devices. This survey is focused on the navigation of an autonomous vehicle. We review past and present research using Light Detection and Ranging (LiDAR) and imaging systems like cameras, which are laser- and vision-based sensors, respectively. Autonomous systems use sensor data for tasks like object detection, obstacle avoidance, mapping, localization, etc. As we will see in the upcoming sections, these two sensors can complement each other and hence are being used extensively for detection in autonomous systems.
The LiDAR market alone is expected to reach $52.5 billion by the year 2032, according to a recent survey by the Yole Group, documented by the "First Sensors" group.
In a typical autonomous system, a perception module inputs the optimal information into the control module. Refer to Figure 1. Crowley et al. define perception as "the process of maintaining an internal description of the external environment."
Figure 1. High-level Perception Framework.
Data fusion entails combining information to accomplish something; this 'something' is usually to sense the state of some aspect of the universe. The applications of this 'state sensing' are versatile, to say the least; some high-level areas are neurology, biology, sociology, engineering, and physics [15,16,17,18,19,20,21]. Due to the very versatile nature of data fusion applications, throughout this manuscript we limit our review to data fusion using LiDAR data and camera data for autonomous navigation. Kolar et al. performed an exhaustive data fusion survey and documented several details about data fusion using the above two sensors.
A sensor is an electronic device that measures physical aspects of an environment and outputs machine-readable (i.e., digital-computer-readable) data. Sensors provide a direct perception of the environment in which they are implemented. Typically, a suite of sensors is used, since an individual sensor inherently provides only a single aspect of an environment. This not only enables the completeness of the data but also improves the accuracy of measuring the environment.
The Merriam-Webster dictionary defines a sensor as a device that responds to a physical stimulus (such as heat, light, sound, pressure, magnetism, or a particular motion) and transmits a resulting impulse (as for measurement or operating a control).
The initial step is raw data capture using the sensors. The data are then filtered, and an appropriate fusion technique is applied; the result is fed into localization and mapping techniques such as SLAM. The same data can be used to identify static or moving objects in the environment and to classify those objects; the classification information is then used to finalize a model of the environment, which in turn can be fed into the control algorithm. The classification information could potentially give details of pedestrians, furniture, vehicles, buildings, etc. Such a classification is useful in both pre-mapped (i.e., known) environments and unknown environments, since it increases the potential of the system to explore its environment and navigate.
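The stages described above can be sketched as a simple processing chain. The following minimal Python sketch is purely illustrative; every stage function, the thresholds, and the toy data are assumptions standing in for the real filtering, fusion, and classification algorithms discussed later in this survey:

```python
# Illustrative perception pipeline: raw capture -> filtering -> fusion
# -> classification. All functions are simplified placeholders for the
# stages described in the text.

def capture(lidar_scan, camera_frame):
    # Step 1: raw data capture from both sensors.
    return {"lidar": lidar_scan, "camera": camera_frame}

def filter_noise(raw):
    # Step 2: drop obviously invalid LiDAR returns (e.g., zero range).
    raw["lidar"] = [r for r in raw["lidar"] if r > 0.0]
    return raw

def fuse(filtered):
    # Step 3: trivial stand-in for LiDAR/camera fusion: pair each range
    # with the mean image intensity.
    mean_intensity = sum(filtered["camera"]) / len(filtered["camera"])
    return [(r, mean_intensity) for r in filtered["lidar"]]

def classify(fused):
    # Step 4: label each fused point; a real system would use the
    # classifier described in the text (pedestrian, vehicle, ...).
    return ["obstacle" if r < 1.0 else "free" for r, _ in fused]

# One pass through the pipeline on toy data.
raw = capture([0.5, 0.0, 2.3], [100, 120, 110])
labels = classify(fuse(filter_noise(raw)))
print(labels)  # ['obstacle', 'free']
```

The point of the sketch is the data flow, not the per-stage logic: each stage consumes the previous stage's output, exactly as in the pipeline described above.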
It is a known fact that most autonomous systems require multiple sensors to function optimally. However, why should we use multiple sensors? Any individual sensor, used alone, could impact the system due to that sensor's limitations. Hence, to get acceptable results, one may utilize a suite of different sensors and exploit the benefits of each. The diversity offered by the suite of sensors contributes positively to the perception of the sensed data [38,39]. Another reason is the risk of system failure due to the failure of a single sensor [21,27,40]; hence, one should introduce a level of redundancy. For instance, while executing the obstacle avoidance module, if the camera is the only installed sensor and it fails, the result could be catastrophic. However, if the system has an additional camera or a LiDAR, it can navigate itself to a safe place after successfully avoiding the obstacle, provided such logic is built in for that failure. Many researchers have studied high-level decision data fusion and concluded that using multiple sensors with data fusion is better than using individual sensors without data fusion. The research community discovered that every sensor provides a different, sometimes unique, type of information about the selected environment, which includes the tracked object, the avoided object, the autonomous vehicle itself, the world in which it is used, and so on, and that this information is provided with differing accuracy and differing detail [27,39,44,45,46].
There are some disadvantages to using multiple sensors, one of which is the additional level of complexity; however, an optimal technique for fusing the data can mitigate this challenge efficiently. There may also be a level of uncertainty in the functioning, accuracy, and appropriateness of the sensed raw data. Due to these challenges, the system must be able to diagnose failures accurately when they occur and ensure that the failed component(s) are identified for apt mitigation.
Some of the limitations of single sensor unit systems are as follows:
Deprivation: If a sensor stops functioning, the system where it was incorporated in will have a loss of perception.
Uncertainty: Inaccuracies arise when features are missing, when ambiguities exist, or when not all required aspects can be measured.
Imprecision: A single sensor's measurements are limited to the precision and accuracy of its sensing element.
Limited temporal coverage: There is initialization/setup time to reach a sensor’s maximum performance and transmit a measurement, hence limiting the frequency of the maximum measurements.
Limited spatial coverage: Normally, an individual sensor will cover only a limited region of the entire environment—for example, a reading from an ambient thermometer on a drone provides an estimation of the temperature near the thermometer and may fail to correctly render the average temperature in the entire environment.
Extended Spatial Coverage: Multiple sensors can measure across a wider region of space and sense where a single sensor cannot.
Extended Temporal Coverage: Time-based coverage increases when using multiple sensors.
Improved resolution: With a union of multiple independent measurements of the same property, the resolution is better than that of a single sensor's measurement.
Reduced Uncertainty: Taken as a whole, the sensor suite's uncertainty decreases, since the combined information reduces the set of ambiguous interpretations of the sensed value.
Increased robustness against interference: By increasing the dimensionality of the sensor space (e.g., measuring with both a LiDAR and stereo vision cameras), the system becomes less vulnerable to interference.
Increased robustness: The redundancy that is provided due to the multiple sensors provides more robustness, even when there is a partial failure due to one of the sensors being down.
Increased reliability: Due to the increased robustness, the system becomes more reliable.
Increased confidence: When the same domain or property is measured by multiple sensors, one sensor can confirm the accuracy of other sensors; this can be attributed to re-verification and hence the confidence is better.
Reduced complexity: The output of multiple-sensor fusion is better; it has less uncertainty, is less noisy, and is more complete.
Over the years, scientists and engineers have applied concepts of sensing that occur in nature to their research areas and have developed new disciplines and technologies that span several fields. In the early 1980s, researchers used aerial sensor data to perform passive sensor fusion of stereo vision imagery. Crowley et al. performed fundamental research in the areas of data fusion, perception, and world-model development that is vital for robot navigation [57,58,59]. They developed systems with multiple sensors and devised mechanisms and techniques to augment the data from all the sensors and obtain the 'best' data as the output of this set of sensors, also known as a 'suite of sensors'. In short, this augmentation or integration of data from multiple sensors can simply be termed multi-sensor data fusion. In this survey paper, we discuss the following integration techniques.
K-means is a popular algorithm that has been widely employed; it provides a good generalization of data clustering, and it guarantees convergence.
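A minimal sketch of K-means (Lloyd's algorithm) in one dimension may help fix the idea; the data values, initial centers, and iteration count below are toy choices, not a recommendation:

```python
# Minimal 1-D K-means (Lloyd's algorithm) sketch on toy range readings.

def kmeans(points, centers, iters=20):
    for _ in range(iters):
        # Assignment step: each point joins its nearest center's cluster.
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[idx].append(p)
        # Update step: each center moves to its cluster's mean.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

# Two well-separated groups of range readings (metres).
data = [1.0, 1.2, 0.8, 9.9, 10.1, 10.0]
print(kmeans(data, centers=[0.0, 5.0]))  # ≈ [1.0, 10.0]
```

The guaranteed convergence mentioned above is to a local optimum: each assignment/update pair never increases the within-cluster distance, so the loop settles, though the result depends on the initial centers.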
PDA was proposed by Bar-Shalom and Tse, and it is also known as the "modified filter of all neighbors". Its function is to assign an association probability to each hypothesis arising from the correct measurement of a destination/target and then to process it. PDA is mainly suited to tracking targets that do not make abrupt changes in their movement pattern.
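The association-probability step at the heart of PDA can be sketched in one dimension: each validated measurement receives a probability of having originated from the target, proportional to its Gaussian likelihood around the predicted measurement. The fixed clutter weight below is a simplified stand-in for the full PDA clutter-density and detection-probability terms:

```python
# Sketch of PDA's association step in 1-D. The remaining probability
# mass (beta0) is the "none of the measurements is correct" hypothesis.
# clutter_weight is an illustrative simplification of the full model.
import math

def association_probs(z_pred, innov_var, measurements, clutter_weight=0.1):
    likes = [math.exp(-(z - z_pred) ** 2 / (2 * innov_var)) /
             math.sqrt(2 * math.pi * innov_var) for z in measurements]
    total = clutter_weight + sum(likes)
    beta0 = clutter_weight / total          # no measurement is correct
    betas = [l / total for l in likes]      # one probability per measurement
    return beta0, betas

b0, betas = association_probs(z_pred=5.0, innov_var=1.0,
                              measurements=[4.8, 9.0])
# The measurement near the prediction (4.8) dominates the association.
```

In a full PDA filter these probabilities then weight the innovations in a combined Kalman-style update, rather than committing to a single measurement.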
A very useful technique that can be used in distributed and decentralized systems, this is an extension of multiple hypothesis testing. It is efficient at tracking multiple targets in cluttered environments and can be used as an estimation and tracking technique. Its main disadvantage is the high computational cost, which is of exponential order.
Also known as tracking techniques, these assist with calculating a moving target's state from given measurements, which are obtained using the sensors. This is a fairly common technique in data fusion, mainly for two reasons: (1) measurements are usually obtained from multiple sensors; and (2) there could be noise in the measurements. Some examples are Kalman filters, extended Kalman filters, and particle filters.
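As a concrete illustration, the following is a minimal scalar Kalman filter estimating a constant range from noisy measurements; the process-noise and measurement-noise values are illustrative, not tuned for any real sensor:

```python
# Minimal scalar Kalman filter sketch: estimate a constant range from
# noisy measurements. q (process noise) and r (measurement noise) are
# toy values chosen only for illustration.

def kalman_step(x, p, z, q=1e-4, r=0.25):
    # Predict: constant-state model, so only the uncertainty grows.
    p = p + q
    # Update: blend prediction and measurement via the Kalman gain.
    k = p / (p + r)
    x = x + k * (z - x)
    p = (1 - k) * p
    return x, p

x, p = 0.0, 1.0  # initial estimate and variance
for z in [5.1, 4.9, 5.2, 5.0, 4.8]:
    x, p = kalman_step(x, p, z)
# x converges toward 5.0 and the variance p shrinks with each update.
```

Extended Kalman filters replace the linear predict/update relations with local linearizations of nonlinear models, and particle filters replace the single Gaussian (x, p) with a sampled set of hypotheses; the predict/update rhythm stays the same.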
6.5. Covariance Consistency Methods
These methods were initially proposed by Uhlmann et al. [84,87]. This is a distributed technique that maintains covariance estimates and means in a distributed system. They comprise estimation-fusion techniques.
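The best-known member of this family is covariance intersection, shown below in its scalar form; in practice the weight omega is chosen by minimizing the fused variance, whereas here it is a fixed illustrative value:

```python
# Scalar covariance intersection sketch: fuse two estimates (x1, p1) and
# (x2, p2) whose cross-correlation is unknown. omega is fixed here for
# illustration; a real implementation optimizes it.

def covariance_intersection(x1, p1, x2, p2, omega=0.5):
    p_inv = omega / p1 + (1 - omega) / p2
    p = 1.0 / p_inv
    x = p * (omega * x1 / p1 + (1 - omega) * x2 / p2)
    return x, p

# Two range estimates of the same landmark from different sensors.
x, p = covariance_intersection(10.2, 0.4, 9.8, 0.4)
# With equal variances and omega = 0.5, the fused mean is the average,
# and the fused variance does not shrink below either input: the method
# stays consistent even when the estimates share unknown correlation.
```

This conservatism is exactly why the technique suits distributed systems: nodes can fuse each other's estimates without tracking who has already incorporated which measurements.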
As the name suggests, this is a distributed fusion system and is often used in multi-agent systems, multisensor systems, and multimodal systems [84,94,95]. It is efficient when distributed and decentralized systems are present. An optimum fusion can be achieved by adjusting the decision rules; however, there are difficulties in finalizing decision uncertainties.

Decision Fusion Techniques

These techniques can be used when successful target detection occurs [87,92,93]. They enable high-level inference for such events. When multiple classifiers are present, this technique can be used to arrive at a single decision from the multiple classifiers. For multiple classifiers to be enabled, a priori probabilities need to be present, and this is difficult to achieve.
Light Detection and Ranging (LiDAR) is a technology used in several autonomous tasks and functions as follows: an area is illuminated by a light source, the light is scattered by the objects in that scene, and the scattered light is detected by a photo-detector. The LiDAR provides the distance to an object by measuring the time it takes for the light to travel to the object and back [104,105,106,107,108,109].

Camera

The types of cameras include conventional color cameras such as USB/web cameras; RGB, RGB-mono, and RGB cameras with depth information (RGB-Depth, RGB-D); 360-degree cameras [28,116,117,118]; and Time-of-Flight (ToF) cameras [119,120,121].

Implementation of Data Fusion Using a LiDAR and Camera

We review an input-output type of fusion as described by Dasarathy et al. They propose a classification strategy based on the input and output of entities like data, architecture, features, and decisions: the fusion of raw data in the first layer, the fusion of features in the second, and finally decision-layer fusion. In the case of LiDAR and camera data fusion, two distinct steps effectively integrate/fuse the data.
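The three layers of Dasarathy's scheme can be caricatured for a LiDAR/camera pair as follows; every function, threshold, and feature here is a toy placeholder, not a published method:

```python
# Illustrative sketch of the three fusion layers for a LiDAR + camera
# pair: raw data, features, and decisions. All logic is a toy stand-in.

def data_level_fusion(lidar_ranges, pixel_intensities):
    # Layer 1: combine raw measurements directly (here, pair them up).
    return list(zip(lidar_ranges, pixel_intensities))

def feature_level_fusion(lidar_ranges, pixel_intensities):
    # Layer 2: extract a per-sensor feature first, then combine them.
    nearest = min(lidar_ranges)                                   # LiDAR feature
    brightness = sum(pixel_intensities) / len(pixel_intensities)  # camera feature
    return {"nearest_m": nearest, "brightness": brightness}

def decision_level_fusion(lidar_ranges, pixel_intensities):
    # Layer 3: each sensor votes independently; the votes are combined.
    lidar_says_obstacle = min(lidar_ranges) < 1.0
    camera_says_obstacle = max(pixel_intensities) > 200
    return lidar_says_obstacle or camera_says_obstacle

print(decision_level_fusion([0.6, 3.2], [90, 210]))  # True
```

The trade-off the layers encode: data-level fusion preserves the most information but demands the tightest alignment between sensors, while decision-level fusion is robust to misalignment at the cost of discarding detail.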
Geometric Alignment of the Sensor Data
Resolution Match between the Sensor Data
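The first of these steps amounts to projecting each LiDAR point into the image plane. A minimal pinhole-model sketch follows; the intrinsics (fx, fy, cx, cy) are illustrative values, and the extrinsics are assumed to be identity (LiDAR and camera frames coincide), whereas a real system obtains both from calibration:

```python
# Sketch of the geometric-alignment step: project a 3-D LiDAR point into
# camera pixel coordinates using a pinhole model. Intrinsics are toy
# values; extrinsics are assumed identity for simplicity.

def project(point, fx=600.0, fy=600.0, cx=320.0, cy=240.0):
    x, y, z = point  # LiDAR frame assumed equal to camera frame
    if z <= 0:
        return None  # behind the camera: no corresponding pixel
    u = fx * x / z + cx
    v = fy * y / z + cy
    return u, v

# A LiDAR return 2 m ahead and 0.5 m to the right maps to a pixel
# to the right of the image centre.
print(project((0.5, 0.0, 2.0)))  # (470.0, 240.0)
```

With a real extrinsic calibration, the point would first be rotated and translated from the LiDAR frame into the camera frame before this projection is applied.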
Geometric Alignment of the Sensor Data

The first and foremost step in the data fusion methodology is the alignment of the sensor data. In this step, the logic finds LiDAR data points for each of the pixel data points from the optical image. This ensures the geometric alignment of the two sensors.

Resolution Match between the Sensor Data

Once the data are geometrically aligned, there must be a match in resolution between the data of the two sensors. The optical camera has the highest resolution of 1920 × 1080 at 30 fps, followed by the depth camera output with a resolution of 1280 × 720 pixels at 90 fps; the LiDAR data have the lowest resolution. This step performs an extrinsic calibration of the data. Madden et al. performed a sensor alignment of a LiDAR and a 3D depth camera using a probabilistic approach. De Silva et al. performed a resolution match by finding a distance value for the image pixels that have no distance value. They solve this as a missing-value prediction problem based on regression, formulating the missing data values from the relationship between the measured data points using a multi-modal technique called Gaussian Process Regression (GPR), developed by Lahat et al. The resolution matching of two different sensors can be performed through extrinsic sensor calibration. Considering the depth information of a LiDAR and a stereo vision camera, 3D depth maps can be developed out of simple 2D images.

Challenges with Sensor Data Fusion

Several challenges have been observed while implementing multisensor data fusion. Some of them are data-related, such as complexity in the data or conflicting and/or contradictory data; others are technical, such as resolution differences between the sensors, differences in alignment between the sensors, etc.
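As a simplified stand-in for the GPR-based resolution matching described above, the sketch below fills a missing depth at a pixel from sparse LiDAR-backed pixels using kernel-weighted (RBF) regression; a real implementation would use full Gaussian Process Regression, and the bandwidth here is an arbitrary illustrative value:

```python
# Simplified stand-in for GPR-based resolution matching: predict a
# missing depth at a pixel from sparse LiDAR-backed pixels via
# RBF-weighted kernel regression. Bandwidth is illustrative only.
import math

def predict_depth(pixel, known, bandwidth=20.0):
    # known: list of ((u, v), depth) pairs where LiDAR provided a range.
    weights, weighted = 0.0, 0.0
    for (u, v), d in known:
        dist2 = (pixel[0] - u) ** 2 + (pixel[1] - v) ** 2
        w = math.exp(-dist2 / (2 * bandwidth ** 2))
        weights += w
        weighted += w * d
    return weighted / weights

sparse = [((100, 100), 2.0), ((140, 100), 2.0), ((500, 400), 8.0)]
d = predict_depth((120, 100), sparse)
# The prediction is dominated by the two nearby 2.0 m points.
```

Unlike this sketch, GPR also yields a variance for each predicted depth, which is precisely the per-pixel uncertainty that the fusion techniques surveyed below can exploit.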
We review two of the fundamental challenges surrounding sensor data fusion: the resolution differences between heterogeneous sensors, and understanding and utilizing heterogeneous sensor data streams while accounting for the many uncertainties in the sensor data sources [28,39]. We focus on reviewing the utilization of the fused information in autonomous navigation. This is challenging since many autonomous systems work in complex environments, be it at home or at work, for example to assist persons with severe motor disabilities with their navigational requirements; such systems pose significant challenges for decision-making due to their safety, efficiency, and accuracy requirements. For reliable operation, decisions need to be made by considering the entire set of multi-modal sensor data the systems acquire, keeping a complete solution in mind. In addition, the decisions need to account for the uncertainties associated with both the data acquisition methods and the implemented pre-processing algorithms. Our focus in this review is to survey the data fusion techniques that consider uncertainty in the fusion algorithm.
Some researchers used mathematical and/or statistical techniques for data fusion. Others used reinforcement learning techniques to implement multisensor data fusion, where they encountered conflicting data. In that study, they fitted smart mobile systems with sensors that made the systems sensitive to the environment(s) in which they were active. The challenge they tried to solve was mapping the multiple streams of raw sensory data from smart agents to their tasks. In their environment, the tasks were different and conflicting, which complicated the problem. As a result, their system learned to translate the multiple inputs into the appropriate tasks or sequences of system actions. Crowley et al. developed mathematical tools to counter uncertainties in fusion and perception. Other implementations include adaptive learning techniques, wherein the authors use deep convolutional neural network (D-CNN) techniques in a multisensor environment for fault diagnostics in planetary gearboxes.

Sensor Data Noise and Rectification

Noise filtering techniques include a suite of Kalman filters and their variations.
LiDAR and camera are two of the most important sensors that can provide situational awareness to an autonomous system, applicable in tasks like mapping, visual localization, path planning, and obstacle avoidance. We are currently integrating these two sensors for autonomous tasks on a power wheelchair. Our results agree with past observations that accurate integration does provide better information for all the above-mentioned tasks. The improvement is seen because the two sensors complement each other: we have been successful in combining the speed of the LiDAR with the data richness of the camera.
We used a Velodyne 3D LiDAR and integrated it with an Intel RealSense D435 camera. Our results showed that the millisecond response time of the LiDAR, when integrated with the high-resolution data of the camera, gave us accurate information to detect static or moving targets, perform accurate path planning, and implement SLAM functionality. We plan to extend this research to develop an autonomous wheelchair controlled by a brain-computer interface using human thought.