Research Statement

I design resource-efficient multimodal machine learning systems for real-world health sensing. My research focuses on enabling accurate, robust, and interpretable inference under strict on-device constraints (limited compute, power, and sensing bandwidth).

My work lies at the intersection of machine learning, embedded systems, and mobile health (mHealth), with three primary directions:

  • On-device multimodal perception: developing adaptive sensing pipelines that integrate heterogeneous data streams (e.g., thermal, RGB/depth, IMU, physiological signals) while minimizing unnecessary computation.
  • Efficient model design and deployment: leveraging techniques such as neural architecture search (NAS), lightweight deep models, and conditional computation to enable real-time inference on resource-constrained devices.
  • Robust real-world modeling: building systems that generalize across users, environments, and sensing conditions, with an emphasis on interpretability and reliability in longitudinal deployments.

I have applied these principles to problems including eating behavior understanding, energy expenditure modeling, indoor localization, and stress monitoring, with a consistent focus on end-to-end system design—from sensing to inference to deployment.


Ongoing Projects

Multi-Stage Thermal-Triggered VLM Framework for On-Device Eating Detection and Caloric Estimation

A resource-aware multimodal sensing system that introduces a thermal-triggered gating mechanism to selectively activate high-cost sensors (RGB/depth). The pipeline combines temperature-based filtering, connected-component analysis, and spatial constraints for event detection, followed by a NAS-optimized vision-language model and depth-based volumetric reconstruction for caloric estimation. Designed for real-time, on-device deployment under strict power and compute budgets.

  • Thermal-triggered sensing: Designed a low-cost event detection module using temperature thresholding, spatial clustering, and centroid constraints to gate RGB/depth sensing, significantly reducing unnecessary sensor activation.
  • Multimodal pipeline: Integrated thermal (MLX90640), RGB/depth, and IMU streams into a unified inference pipeline for eating detection and intake estimation.
  • Model optimization: Applied neural architecture search (NAS) to design lightweight models optimized for on-device inference, balancing latency, accuracy, and energy consumption.
  • Caloric estimation: Implemented depth-based volumetric reconstruction to estimate food portion size and combined it with classification outputs for kCal estimation.
  • System deployment: Built an end-to-end embedded system (ESP32 / edge device) supporting real-time inference, adaptive sensing, and efficient data logging under hardware constraints.
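The gating logic above can be sketched as follows. This is a minimal illustration, not the deployed implementation: it assumes a 24x32 MLX90640 frame as a NumPy array of temperatures in degrees Celsius, and the threshold, minimum blob size, and region-of-interest bounds are placeholder values.

```python
import numpy as np
from scipy import ndimage

HOT_THRESHOLD_C = 40.0   # temperature cutoff for "hot" pixels (assumed)
MIN_BLOB_PIXELS = 6      # reject tiny noise clusters (assumed)
ROI_X = (4, 28)          # plausible plate region, frame columns (assumed)
ROI_Y = (4, 20)          # plausible plate region, frame rows (assumed)

def should_wake_rgb_depth(frame: np.ndarray) -> bool:
    """Return True if the thermal frame warrants activating RGB/depth sensing."""
    hot = frame > HOT_THRESHOLD_C                 # temperature-based filtering
    labels, n = ndimage.label(hot)                # connected-component analysis
    for i in range(1, n + 1):
        blob = labels == i
        if blob.sum() < MIN_BLOB_PIXELS:
            continue                              # too small: likely sensor noise
        cy, cx = ndimage.center_of_mass(blob)     # centroid (spatial constraint)
        if ROI_X[0] <= cx <= ROI_X[1] and ROI_Y[0] <= cy <= ROI_Y[1]:
            return True                           # hot blob inside plate region
    return False
```

Because the thermal frame is tiny, this check costs a few thousand operations per frame, which is what makes it a viable always-on gate for the far more expensive RGB/depth sensors.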

Figures: kCal estimation pipeline; system pipeline; study device.
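The caloric-estimation step can be illustrated with a simple height-map integration over a segmented food region. This is a sketch under stated assumptions, not the project's implementation: it presumes an aligned depth map in meters, a binary food mask, known focal lengths, and a known plate-plane depth; the density and energy-density constants are hypothetical stand-ins for per-class values a classifier would supply.

```python
import numpy as np

def estimate_kcal(depth_m, food_mask, fx, fy, plate_depth_m,
                  density_g_per_cm3=0.9, kcal_per_g=1.5):
    """Approximate food volume by integrating height above the plate plane."""
    d = depth_m[food_mask]
    height_m = np.clip(plate_depth_m - d, 0.0, None)   # food sits above the plate
    # Each pixel's metric footprint grows with depth: (z/fx) * (z/fy) m^2.
    pixel_area_m2 = (d / fx) * (d / fy)
    volume_cm3 = np.sum(height_m * pixel_area_m2) * 1e6
    mass_g = volume_cm3 * density_g_per_cm3            # assumed food density
    return mass_g * kcal_per_g                         # assumed energy density
```

In practice the plate plane would be fit from the depth map itself (e.g., RANSAC on non-food pixels) rather than passed in as a constant.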

Wi-Fi Indoor Positioning via RSSI–FTM Fusion

A probabilistic localization framework that fuses RSSI fingerprinting and FTM ranging through Bayesian filtering, with explicit modeling of LOS/NLOS conditions. Incorporates a graph neural network–based correction module to improve cross-site generalization, reducing dependence on dense site-specific calibration.

  • Sensor fusion: Combined RSSI-based fingerprinting with FTM ranging using Bayesian filtering to improve localization robustness in multipath environments.
  • Feature engineering: Designed signal preprocessing and feature extraction pipelines to stabilize RSSI variance and improve spatial consistency.
  • GNN correction model: Developed a graph neural network–based refinement model to capture spatial dependencies and enhance cross-site generalization.
  • System evaluation: Built automated data collection and benchmarking workflows to ensure reproducible evaluation across different indoor environments.
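One step of the RSSI–FTM fusion can be sketched as a precision-weighted Gaussian update: RSSI is converted to a coarse range via a log-distance path-loss model, then combined with the FTM range, whose variance is inflated under NLOS. The path-loss parameters and variance values below are illustrative, not the framework's fitted values.

```python
TX_POWER_DBM = -40.0   # RSSI at 1 m reference distance (assumed)
PATH_LOSS_EXP = 2.5    # indoor path-loss exponent (assumed)

def rssi_to_range_m(rssi_dbm):
    """Log-distance path-loss model: d = 10^((P0 - RSSI) / (10 n))."""
    return 10 ** ((TX_POWER_DBM - rssi_dbm) / (10 * PATH_LOSS_EXP))

def fuse_ranges(rssi_dbm, ftm_range_m, nlos=False,
                rssi_var=4.0, ftm_var=0.25, nlos_inflation=16.0):
    """Precision-weighted (Gaussian) fusion of the two range estimates."""
    r_rssi = rssi_to_range_m(rssi_dbm)
    ftm_v = ftm_var * (nlos_inflation if nlos else 1.0)  # LOS/NLOS modeling
    w_rssi, w_ftm = 1.0 / rssi_var, 1.0 / ftm_v
    fused = (w_rssi * r_rssi + w_ftm * ftm_range_m) / (w_rssi + w_ftm)
    fused_var = 1.0 / (w_rssi + w_ftm)
    return fused, fused_var
```

Under LOS the fused range leans on the low-variance FTM measurement; under NLOS the inflated FTM variance shifts weight toward the RSSI prior, which is the behavior the Bayesian filter generalizes across time steps and anchors.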