PathSense

This study addresses a critical gap in computer vision: most monocular depth estimation models (such as Depth Anything) are tuned for long-range accuracy (up to 80 m) for autonomous vehicles, while pedestrian navigation demands high precision in the immediate 0–3 meter range on low-power hardware such as a Raspberry Pi 4.

The researchers developed a custom dataset and a U-Net-based architecture optimized for close-range obstacle detection with minimal computational overhead.

The Challenge: Close-Range Precision

Standard models treat all distances with equal importance. However, for a person navigating a sidewalk or a room, an error of 20cm at a distance of 1 meter is far more dangerous than the same error at 50 meters.

Key Problems:

  • Resource Constraints: Mobile and assistive devices lack high-end GPUs.
  • Depth Ambiguity: Standard datasets (KITTI, NYU Depth) lack the specific urban pedestrian contexts of Southeast Asia.

Custom Dataset & Preprocessing

The team collected 21,777 image-depth pairs using an OAK-D Pro W stereo camera.

The Rolling Average Filter:

Because stereo cameras often produce "holes" (invalid pixels) where disparity cannot be calculated, the team implemented a Rolling Average Filter.

  • It fills missing data using local and global means.
  • It downsamples the depth map by ~99% (to 40×54 pixels). This sounds drastic, but it preserves the "semantic structure" (where the obstacles are) while making the model incredibly fast to train and run.
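The two bullets above can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the authors' code: the function name, window size, and zero-as-invalid convention are assumptions; only the fill-from-local/global-means idea and the coarse 40×54 output come from the text.

```python
import numpy as np

def fill_and_downsample(depth, out_shape=(40, 54), win=5):
    """Hypothetical sketch of the Rolling Average Filter: fill invalid
    (zero) pixels from local/global means, then block-average the map
    down to a coarse grid (e.g. 40x54)."""
    depth = depth.astype(np.float64)
    valid = depth > 0                      # stereo "holes" assumed stored as 0
    global_mean = depth[valid].mean() if valid.any() else 0.0

    filled = depth.copy()
    h, w = depth.shape
    r = win // 2
    ys, xs = np.where(~valid)
    for y, x in zip(ys, xs):
        patch = depth[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1]
        pv = patch[patch > 0]
        # local mean if the window contains valid neighbours, else global mean
        filled[y, x] = pv.mean() if pv.size else global_mean

    # block-average downsampling: each output cell is the mean of its block
    oh, ow = out_shape
    small = np.zeros(out_shape)
    for i in range(oh):
        for j in range(ow):
            block = filled[i * h // oh:(i + 1) * h // oh,
                           j * w // ow:(j + 1) * w // ow]
            small[i, j] = block.mean()
    return small
```

Block averaging (rather than nearest-neighbor subsampling) is what lets the coarse map keep the semantic layout of obstacles while discarding per-pixel detail.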

Optimized Model Architecture

The model uses a U-Net-style encoder-decoder design. It takes a 240×320 RGB image and predicts a metric depth map.

  • Encoder Path: Four blocks extract hierarchical features.
  • Skip Connections: These are vital. They pass fine-grained spatial data directly from the encoder to the decoder, ensuring that the boundaries of nearby objects (like a chair leg or a curb) remain sharp.
  • Softplus Activation: Instead of a standard Sigmoid, the model uses Softplus to ensure all predicted depth values are naturally positive ($0$ to $\infty$).
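A minimal PyTorch sketch of this design, assuming conventional U-Net components: the class name, channel widths, and normalization choices are illustrative guesses, but the four encoder blocks, skip connections, and Softplus depth head follow the description above.

```python
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.BatchNorm2d(cout), nn.ReLU(),
        nn.Conv2d(cout, cout, 3, padding=1), nn.BatchNorm2d(cout), nn.ReLU())

class TinyDepthUNet(nn.Module):
    """Illustrative U-Net-style depth net: 4 encoder blocks, skip
    connections, Softplus head for strictly positive metric depth."""
    def __init__(self, widths=(16, 32, 64, 128)):
        super().__init__()
        self.encs = nn.ModuleList()
        cin = 3
        for w in widths:                       # four encoder blocks
            self.encs.append(conv_block(cin, w))
            cin = w
        self.pool = nn.MaxPool2d(2)
        self.bottom = conv_block(widths[-1], widths[-1] * 2)
        self.ups, self.decs = nn.ModuleList(), nn.ModuleList()
        cin = widths[-1] * 2
        for w in reversed(widths):             # mirrored decoder path
            self.ups.append(nn.ConvTranspose2d(cin, w, 2, stride=2))
            self.decs.append(conv_block(2 * w, w))
            cin = w
        self.head = nn.Sequential(nn.Conv2d(cin, 1, 1), nn.Softplus())

    def forward(self, x):
        skips = []
        for enc in self.encs:
            x = enc(x)
            skips.append(x)                    # fine-grained spatial detail
            x = self.pool(x)
        x = self.bottom(x)
        for up, dec, skip in zip(self.ups, self.decs, reversed(skips)):
            # skip connection: concatenate encoder features at same scale
            x = dec(torch.cat([up(x), skip], dim=1))
        return self.head(x)                    # (N, 1, H, W), all values > 0
```

Note how Softplus replaces a Sigmoid head: Sigmoid would squash depth into (0, 1) and need rescaling, while Softplus maps logits smoothly onto $(0, \infty)$, so the network can regress metric depth directly.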

The Secret Sauce: Exponential Loss

The breakthrough in this model is the Close-Range Optimized Loss Function. It weights each pixel's error by an exponential decay in its true depth (decay constant $\tau = 0.5\,\text{m}$), so mistakes on nearby points cost far more than mistakes on distant ones.

$$L_{close} = \text{mean}(w_{dist} \odot |p_{close} - t_{close}|)$$

$$w_{dist} = \exp(-t_{close}/\tau)$$
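The two equations translate directly into a few lines of NumPy (a PyTorch version would be elementwise identical); the function name is my own, but the formula is exactly $L_{close}$ as written above:

```python
import numpy as np

TAU = 0.5  # exponential decay constant from the paper, in meters

def close_range_loss(pred, target, tau=TAU):
    """L_close = mean(w ⊙ |p - t|) with w = exp(-t / tau):
    nearby pixels (small target depth) dominate the loss."""
    w = np.exp(-target / tau)          # w_dist, per-pixel weight
    return np.mean(w * np.abs(pred - target))
```

With $\tau = 0.5$ m, the weight ratio between a point at 0.5 m and one at 2.0 m is $e^{(2.0 - 0.5)/0.5} = e^{3} \approx 20$, which is where the 20x figure comes from.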

The Result of this Math:

  • An error at 0.5m is penalized 20x more heavily than an error at 2.0m.
  • This forces the AI to be "obsessed" with the accuracy of objects closest to the user.

Performance Results

The model achieves high reliability in the critical "danger zone" for pedestrians:

Metric                      Result (0–3 m Range)
--------------------------  --------------------
Mean Absolute Error (MAE)   0.32 meters
Median Error                0.1725 meters
Hardware Target             Raspberry Pi 4
Inference Speed             Real-time capable

Qualitative Success: The model produces depth maps that clearly delineate obstacles like walls, stairs, and people, providing enough spatial awareness for an assistive navigation system to trigger haptic or audio alerts.

YEAR 3 PathSense Poster

https://drive.google.com/drive/folders/1PtnxBrBYGJQgA9PnYTTpmpxySxlr8YqL?usp=drive_link

Project Advisor(s)

Sally Goldin
Associate Director, Learning Innovation

Research Team member(s)

Jira Pitakwong
Undergraduate Student
Chawin Luangtumrongjareon
Undergraduate Student
Phakin Dhamsirimongkol
Undergraduate Student
Natthanon Prajogo
Undergraduate Student