Image Coding for Machines via Feature-Preserving RDO
Optimizing image compression for machine analysis, not only human perception
At a glance
Many images and videos today are processed primarily by computer vision algorithms, with humans inspecting them only occasionally. Feature-preserving RDO provides a method to compress such content efficiently while preserving the features that downstream vision models rely on.
Overview
Purpose
FP-RDO adapts standard codecs for remote inference: the encoder optimizes for machine vision tasks by preserving task-relevant features.
Approach
Block-wise approximation of feature distances using input-dependent squared error (IDSE), compatible with existing encoders.
Key Features
- ✓ Compatible with standard codecs (AVC, HEVC).
- ✓ Simplifies to per-pixel importance during RDO.
- ✓ Low additional encoder complexity (~7.86%).
Applications
Useful for distributed vision systems, IoT devices, or any application where machine analysis dominates human viewing.
Coding for Machines: Different setups
Setup 1: Compressing the original content is unnecessary since it is not transmitted.
- For a single task: Local inference (running the model at the transmitter and sending only the results) is optimal.
- For multiple tasks sharing the same features: Compressing and transmitting the features (feature coding for machines, FCM) is more efficient.
Setup 2: Content must be transmitted for potential human viewing, in addition to machine analysis.
- For a single task: Compress the image while preserving task-relevant features using FP-RDO.
- For multiple tasks sharing the same features: Compress the image while preserving the common features using FP-RDO.
Input-Dependent Squared Error (IDSE)
The core idea behind FP-RDO is to minimize the feature distance between the original and compressed images, ensuring that machine-relevant features are preserved for downstream tasks.
By applying a Taylor expansion to the feature extractor, the feature distance can be approximated as a quadratic form involving the Jacobian of the neural network. This linearization makes the metric tractable for optimization.
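The linearization described above can be written out as follows. This is a sketch in notation assumed here for illustration (not taken from the page): $x$ is the original image, $\hat{x}$ the compressed image, $f$ the feature extractor, and $J_f(x)$ its Jacobian evaluated at $x$.

```latex
\begin{align}
f(\hat{x}) &\approx f(x) + J_f(x)\,(\hat{x} - x), \\
\| f(\hat{x}) - f(x) \|_2^2 &\approx (\hat{x} - x)^\top J_f(x)^\top J_f(x)\,(\hat{x} - x).
\end{align}
```

The first line is the first-order Taylor expansion of $f$ around $x$; substituting it into the squared feature distance gives the quadratic form in the second line. Because $J_f(x)$ depends on the input image, the resulting squared error is input-dependent.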
The resulting Input-Dependent Squared Error (IDSE) can be computed blockwise, allowing integration with standard block-based codecs. This enables efficient rate-distortion optimization that preserves features critical for machine analysis, without requiring changes to the decoder.
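A minimal NumPy sketch of this idea, under stated assumptions: `toy_features` (a `tanh` of a random linear map) is a hypothetical stand-in for a real feature extractor, and the block-wise version simply drops cross-block terms of the quadratic form, which may differ from the exact approximation used in the paper.

```python
import numpy as np

def toy_features(x, W):
    """Hypothetical stand-in for a feature extractor: f(x) = tanh(W x)."""
    return np.tanh(W @ x)

def toy_jacobian(x, W):
    """Analytic Jacobian of tanh(W x) with respect to x."""
    pre = W @ x
    return (1.0 - np.tanh(pre) ** 2)[:, None] * W

def idse(x, x_hat, J):
    """Quadratic approximation of the feature distance: e^T J^T J e."""
    e = J @ (x_hat - x)
    return float(e @ e)

def blockwise_idse(x, x_hat, J, block_size):
    """Sum of per-block quadratic forms (assumption: cross-block terms dropped)."""
    total = 0.0
    for start in range(0, x.size, block_size):
        sl = slice(start, start + block_size)
        e_b = J[:, sl] @ (x_hat[sl] - x[sl])
        total += float(e_b @ e_b)
    return total

rng = np.random.default_rng(0)
n_pixels, n_features = 16, 8
W = rng.normal(size=(n_features, n_pixels)) / np.sqrt(n_pixels)
x = rng.normal(size=n_pixels)
x_hat = x + 1e-3 * rng.normal(size=n_pixels)  # small "compression" error

J = toy_jacobian(x, W)
exact = float(np.sum((toy_features(x_hat, W) - toy_features(x, W)) ** 2))
approx = idse(x, x_hat, J)
approx_blocks = blockwise_idse(x, x_hat, J, block_size=4)
print(exact, approx, approx_blocks)
```

For small errors the quadratic form tracks the exact feature distance closely; the block-wise sum is what an encoder working one block at a time could evaluate independently per block.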
Results
As an example, we study object detection and instance segmentation using Mask R-CNN. Our feature extractor is an FPN with a ResNet-50 backbone, pre-trained on MS COCO, which we denote as RPN-FE(50).
Importance map and block-wise quantization step variation. IDSE defines a per-pixel importance map for feature preservation. FP-RDO allocates bits according to this feature importance, preserving regions critical for downstream tasks.
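To illustrate how a per-pixel importance map can steer bit allocation, here is a hedged sketch. It assumes the importance of each pixel is the corresponding diagonal entry of J^T J (one natural reading of "per-pixel importance", not confirmed by this page), and it uses a generic Lagrangian cost (weighted distortion plus lambda times rate). The candidate reconstructions, rates, and Jacobians are hypothetical toy values.

```python
import numpy as np

def pixel_importance(J):
    """Diagonal of J^T J: squared column norms of the Jacobian (assumed form)."""
    return np.sum(J ** 2, axis=0)

def weighted_sse(block, recon, weights):
    """Squared error with one importance weight per pixel."""
    return float(np.sum(weights * (block - recon) ** 2))

def choose_candidate(block, candidates, rates, weights, lam):
    """Return the index of the candidate minimizing D_w + lam * R."""
    costs = [weighted_sse(block, c, weights) + lam * r
             for c, r in zip(candidates, rates)]
    return int(np.argmin(costs))

block = np.array([10.0, 21.0, 33.0, 40.0])
coarse = np.array([8.0, 24.0, 32.0, 40.0])   # hypothetical coarse reconstruction
fine = np.array([10.0, 20.0, 32.0, 40.0])    # hypothetical fine reconstruction
rates = [4.0, 12.0]                           # hypothetical bit costs
lam = 1.0

# Block whose pixels barely affect the features: low importance.
w_flat = pixel_importance(0.2 * np.eye(4))
# Block whose pixels strongly affect the features: high importance.
w_salient = pixel_importance(3.0 * np.eye(4))

pick_flat = choose_candidate(block, [coarse, fine], rates, w_flat, lam)
pick_salient = choose_candidate(block, [coarse, fine], rates, w_salient, lam)
print(pick_flat, pick_salient)
```

With these toy numbers the low-importance block selects the cheaper coarse candidate, while the high-importance block selects the fine one, which mirrors the block-wise variation in quantization step described above.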
Rate-distortion curves: Up to 17% reduction in bitrate compared to standard SSE-RDO for the same task accuracy across multiple datasets and codecs.
Computational Overhead: FP-RDO adds minimal encoder complexity and does not affect the decoder.
In our experiments, the average encoder runtime overhead is approximately 7.9% compared to standard (SSE-based) rate-distortion optimization.
Cite
@article{fernandez2025image,
title={Image Coding for Machines via Feature-Preserving Rate-Distortion Optimization},
author={Fernández Menduiña, Samuel and Pavez, Eduardo and Ortega, Antonio},
journal={IEEE Transactions on Multimedia},
year={2025},
url={https://sf219.github.io/TMM_CfM/}
}
Contact
Samuel Fernández Menduiña
University of Southern California
samuelf9@usc.edu
sf219.github.io