Advanced visual perception in mining – multimodal fusion and enhancement

- Organization: The Australasian Institute of Mining and Metallurgy
- Pages: 4
- File Size: 229 KB
- Publication Date: Sep 1, 2024
Abstract
Multi-sensor fusion visual perception technology significantly enhances perception capabilities in complex and harsh environments by combining visual data from different sensors. It is especially useful in scenarios where a single traditional visible-light sensor, such as a standard camera, struggles because of poor lighting, adverse visual conditions, or other obstructions. In environments with low light, smoke, or dust, the performance of conventional visible-light sensors degrades, limiting their application. By integrating data from sensors such as radar and infrared thermal imaging, multi-sensor fusion technology provides a more comprehensive and complementary visual solution. This fusion not only enhances the robustness of visual systems but also greatly expands their potential applications in fields such as visual surveillance. In recent years, deep learning methods have become the mainstream approach to multi-sensor fusion, demonstrating significant advantages in adaptive feature selection over traditional machine learning methods. However, deep learning-based multi-sensor fusion algorithms still face several challenges.
First, network structures are often designed with redundancy and lack effective screening of the useful components within multimodal information. Second, mainstream fusion algorithms focus excessively on improving display effects without adequately considering the needs of downstream applications and tasks. Third, existing fusion perception algorithms are generally designed for open visual scenes and lack targeted designs for the complex visual degradation factors present in mine tunnel environments.
Current deep learning methods for designing multi-sensor fusion networks rely heavily on manual experience to create fusion modules. This reliance increases the complexity of network design and can lead to redundancy. Redundant network modules are difficult to identify in current end-to-end learning pipelines; they significantly slow down network inference and can interfere with the output of fusion perception. Additionally, training these fusion modules typically requires large amounts of well-annotated multimodal data, further increasing implementation difficulty and cost.
For instance, the fusion network design proposed by Li and Wu (2018) focuses on extracting
multimodal fusion features, while Zhao et al (2020) introduced a feature decomposition mechanism
in the feature fusion and extraction modules. Researchers like Zhang and Ma (2021), Xu et al (2020),
and Liu et al (2017) have attempted to improve network structures through dense cascades and the
introduction of residual connections. Although these methods have improved the handling of
multimodal information to some extent, they still face limitations in distinguishing between useful and
redundant information, and redundancy persists in their network designs. Ideally, fusion networks should be designed more intelligently, with the network structure guided heuristically by how important each modality's information is to the current task. This would reduce information redundancy and the number of network weights, and it calls for new algorithms or frameworks capable of automatically identifying and optimising fusion strategies, thereby reducing dependence on manual experience while improving fusion efficiency and network performance.
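
To make the preceding ideas concrete, the sketch below shows one minimal way a two-modality (visible and infrared) fusion block with a residual connection and a learned modality-importance gate could be written in PyTorch. It is an illustrative assumption only: the module names, channel sizes and gating scheme are not taken from this paper or from the cited works.

```python
# Illustrative sketch only: a minimal visible/infrared feature-fusion block with a
# learned modality-importance gate and a residual connection. All module names and
# hyper-parameters are assumptions for illustration, not the network proposed in
# the paper or in the cited works.
import torch
import torch.nn as nn


class GatedFusionBlock(nn.Module):
    def __init__(self, channels: int = 32):
        super().__init__()
        # Per-modality encoders: visible images have 3 channels, infrared has 1.
        self.vis_encoder = nn.Sequential(
            nn.Conv2d(3, channels, kernel_size=3, padding=1), nn.ReLU(inplace=True)
        )
        self.ir_encoder = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, padding=1), nn.ReLU(inplace=True)
        )
        # A lightweight gate predicts a per-channel weight for each modality from
        # globally pooled features, approximating importance-based screening.
        self.gate = nn.Sequential(nn.Linear(2 * channels, 2 * channels), nn.Sigmoid())
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, visible: torch.Tensor, infrared: torch.Tensor) -> torch.Tensor:
        f_vis = self.vis_encoder(visible)                         # (B, C, H, W)
        f_ir = self.ir_encoder(infrared)                          # (B, C, H, W)
        pooled = torch.cat([f_vis.mean(dim=(2, 3)),
                            f_ir.mean(dim=(2, 3))], dim=1)        # (B, 2C)
        weights = self.gate(pooled).unsqueeze(-1).unsqueeze(-1)   # (B, 2C, 1, 1)
        weighted = torch.cat([f_vis, f_ir], dim=1) * weights      # gated features
        fused = self.fuse(weighted)
        # Residual connection: keep the visible-stream features as a shortcut so
        # the fused output is no worse than the unimodal path at initialisation.
        return fused + f_vis


if __name__ == "__main__":
    block = GatedFusionBlock(channels=32)
    rgb = torch.randn(2, 3, 64, 64)      # batch of visible-light images
    thermal = torch.randn(2, 1, 64, 64)  # batch of infrared thermal images
    print(block(rgb, thermal).shape)     # torch.Size([2, 32, 64, 64])
```

Here the gate is simply a stand-in for the importance-based screening discussed above; in practice the weighting could equally be learned per spatial location or discovered automatically by an architecture-search procedure rather than designed by hand.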
Current deep learning designs for multi-sensor fusion algorithms often overlook the requirements of the downstream applications and tasks that consume the fused output.