New Techniques Extract more Precise Data from Images that have been Deteriorated by Environmental Factors

Automatic surveillance systems, self-driving cars, facial recognition, healthcare, and social distancing tools are all examples of where computer vision technology is being applied. Users need accurate and trustworthy visual data to properly benefit from video analytics solutions, however, video data quality is frequently impacted by environmental factors such as rain, nighttime circumstances, or crowds (where there are multiple images of people overlapping with each other in a scene).

A team of researchers led by Yale-NUS College Associate Professor of Science (Computer Science) Robby Tan, who is also from the National University of Singapore’s (NUS) Faculty of Engineering, has developed novel approaches to solve the problem of low-level vision in videos caused by rain and night-time conditions, as well as improve the accuracy of 3D human pose estimation in videos, using computer vision and deep learning.

The findings were presented at the Computer Vision and Pattern Recognition (CVPR) Conference in 2021.

Combating visibility issues during rain and night-time conditions

Low light and man-made light effects such as glare, glow, and floodlights affect nighttime photographs, whereas rain images are affected by rain streaks or buildup (or rain veiling effect).

“Many computer vision systems like automatic surveillance and self-driving cars, rely on clear visibility of the input videos to work well. For instance, self-driving cars cannot work robustly in heavy rain and CCTV automatic surveillance systems often fail at night, particularly if the scenes are dark or there is significant glare or floodlights,” explained Assoc Prof Tan.

Assoc Prof Tan and his team used deep learning algorithms to improve the quality of night-time and rain-related videos in two different experiments. In the first experiment, they increased the brightness while reducing noise and light effects (glare, glow, and floodlights) to provide crisp nighttime images.

This is a new technology that tackles the issue of clarity in night-time photographs and movies when glare cannot be avoided. In comparison, current state-of-the-art approaches are incapable of dealing with glare.

The rain veiling effect can drastically reduce the visibility of movies in tropical areas like Singapore, where heavy rain is prevalent. The researchers adopted a frame alignment method in the second investigation, which allows them to collect superior visual information without being influenced by rain streaks that appear randomly in different frames and degrade image quality.

Many computer vision systems like automatic surveillance and self-driving cars, rely on clear visibility of the input videos to work well. For instance, self-driving cars cannot work robustly in heavy rain and CCTV automatic surveillance systems often fail at night, particularly if the scenes are dark or there is significant glare or floodlights.

Assoc Prof Tan

They then employed a moving camera and depth estimation to remove the rain veiling effect generated by accumulated rain droplets. Unlike previous approaches, which focused just on removing rain streaks, the new ways can erase both rain streaks and the rain veiling effect simultaneously.

3D Human Pose Estimation: Tackling inaccuracy caused by overlapping, multiple humans in videos

Assoc Prof Tan also presented his team’s study on 3D human posture estimation at the CVPR conference, which has applications in video surveillance, video gaming, and sports broadcasting.

Researchers and developers have been progressively focusing on 3D multi-person posture estimation from a monocular video (video obtained from a single camera) in recent years. Monocular videos, rather of employing numerous cameras to capture footage from various locations, offer greater versatility because they may be captured with a single, regular camera, even a mobile phone camera.

However, significant activity, i.e. several humans in the same scene, has an impact on human detection accuracy, especially when individuals are interacting together or appear to be overlapping in the monocular video.

In this third study, the researchers combine two current methods to estimate 3D human poses from a video, namely a top-down approach and a bottom-up one. The novel method can offer more reliable pose estimation in multi-person environments and handle distance between individuals (or scale fluctuations) more robustly by integrating the two ways.

Members of Assoc Prof Tan’s team from the NUS Department of Electrical and Computer Engineering, where he holds a dual appointment, as well as partners from City University of Hong Kong, ETH Zurich, and Tencent Game AI Research Center, were involved in the three investigations. His lab specializes on computer vision and deep learning research, notably in the areas of low-level vision, human pose and motion analysis, and deep learning applications in healthcare.

“As a next step in our 3D human pose estimation research, which is supported by the National Research Foundation, we will be looking at how to protect the privacy information of the videos. For the visibility enhancement methods, we strive to contribute to advancements in the field of computer vision, as they are critical to many applications that can affect our daily lives, such as enabling self-driving cars to work better in adverse weather conditions,” said Assoc Prof Tan.