Image Coding for Machines via Feature-Preserving RDO

Optimizing image compression for machine analysis, not only human perception

University of Southern California

At a glance

Many images and videos today are primarily processed by computer vision algorithms, with humans inspecting only occasionally. Feature-preserving RDO provides a method to compress such content efficiently while preserving the features that downstream vision models rely on.

Overview

Purpose

FP-RDO adapts standard codecs for remote inference: it preserves task-relevant features during encoding, so that compression is optimized for machine vision tasks.

Approach

Block-wise approximation of feature distances using input-dependent squared error (IDSE), compatible with existing encoders.

Key Features

  • Compatible with standard codecs (AVC, HEVC).
  • Simplifies to per-pixel importance during RDO.
  • Low additional encoder complexity (~7.86%).

Applications

Useful for distributed vision systems, IoT devices, or any application where machine analysis dominates human viewing.

Coding for Machines: Different setups

CfM Setup 1

Setup 1: The original content does not need to be transmitted, so compressing it is unnecessary.

  • For a single task: Local inference (running the model at the transmitter and sending only the results) is optimal.
  • For multiple tasks sharing the same features: Compressing and transmitting the features (FCM) is more efficient.

CfM Setup 2

Setup 2: Content must be transmitted for potential human viewing, in addition to machine analysis.

  • For a single task: Compress the image while preserving task-relevant features using FP-RDO.
  • For multiple tasks sharing the same features: Compress the image while preserving the common features using FP-RDO.

Input-Dependent Squared Error (IDSE)

The core idea behind FP-RDO is to minimize the feature distance between the original and compressed images, ensuring that machine-relevant features are preserved for downstream tasks.

Taylor Expansion of Feature Distance

By applying a Taylor expansion to the feature extractor, the feature distance can be approximated as a quadratic form involving the Jacobian of the neural network. This linearization makes the metric tractable for optimization.
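
As a sketch of this step, with notation assumed here rather than taken from the paper: let f be the feature extractor, x the vectorized original image, x̂ the compressed image, and J_f(x) the Jacobian of f evaluated at x. A first-order expansion around x then gives:

```latex
\begin{align}
f(\hat{x}) &\approx f(x) + J_f(x)\,(\hat{x} - x), \\
\| f(\hat{x}) - f(x) \|_2^2 &\approx (\hat{x} - x)^\top J_f(x)^\top J_f(x)\,(\hat{x} - x).
\end{align}
```

The matrix J_f(x)^T J_f(x) changes with the input x, which is what makes the resulting squared error input-dependent.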

Blockwise Computation of IDSE

The resulting Input-Dependent Squared Error (IDSE) can be computed blockwise, allowing integration with standard block-based codecs. This enables efficient rate-distortion optimization that preserves features critical for machine analysis, without requiring changes to the decoder.
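
A minimal sketch of a blockwise computation, assuming IDSE has already been simplified to a per-pixel importance map so that each block's distortion is a weighted sum of squared pixel errors. The function name `blockwise_weighted_sse`, the toy data, and the weights are hypothetical and only illustrate the idea; they are not the authors' implementation.

```python
def blockwise_weighted_sse(orig, recon, importance, block=4):
    """Return a 2-D list with one weighted squared error per block.

    orig, recon, importance: 2-D lists of equal size (rows x cols).
    importance[r][c] is a hypothetical non-negative per-pixel weight.
    """
    rows, cols = len(orig), len(orig[0])
    out = []
    for r0 in range(0, rows, block):
        row_out = []
        for c0 in range(0, cols, block):
            total = 0.0
            # Accumulate weight * squared error over the pixels of this block.
            for r in range(r0, min(r0 + block, rows)):
                for c in range(c0, min(c0 + block, cols)):
                    err = orig[r][c] - recon[r][c]
                    total += importance[r][c] * err * err
            row_out.append(total)
        out.append(row_out)
    return out


# Toy 2x4 image split into two 2x2 blocks (values are made up).
orig = [[10, 10, 20, 20],
        [10, 10, 20, 20]]
recon = [[11, 10, 22, 20],
         [10, 10, 20, 20]]
importance = [[1.0, 1.0, 0.5, 0.5],
              [1.0, 1.0, 0.5, 0.5]]
block_costs = blockwise_weighted_sse(orig, recon, importance, block=2)
```

Because each block's cost depends only on its own pixels and weights, an encoder can evaluate it inside its usual per-block decision loop.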

Results

As an example, we study object detection and instance segmentation using Mask R-CNN. Our feature extractor is an FPN with a ResNet-50 backbone, pre-trained on MS COCO, which we denote as RPN-FE(50).

Importance Map and Bit Allocations

Importance map and block-wise quantization step variation. IDSE defines a per-pixel importance map for feature preservation. FP-RDO allocates bits according to this feature importance, preserving regions critical for downstream tasks.
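
To illustrate how importance can steer bit allocation, the sketch below runs a generic Lagrangian mode decision in which the block distortion is scaled by an importance weight. It simplifies the per-pixel map to a single hypothetical scalar per block, and the candidate names, distortions, rates, and `lam` are made-up values rather than numbers from the paper.

```python
def pick_mode(candidates, weight, lam):
    """Pick the candidate minimizing weight * distortion + lam * rate.

    candidates: list of (name, sse_distortion, rate_bits) tuples.
    weight: hypothetical scalar importance of the block.
    lam: Lagrange multiplier trading distortion against rate.
    """
    return min(candidates, key=lambda c: weight * c[1] + lam * c[2])


# Two hypothetical coding options for one block: coarse vs. fine quantization.
candidates = [("coarse", 100.0, 10.0), ("fine", 20.0, 40.0)]
lam = 2.0

low = pick_mode(candidates, weight=0.5, lam=lam)   # low-importance block
high = pick_mode(candidates, weight=2.0, lam=lam)  # high-importance block
```

In this toy setting the low-importance block selects the coarse option and the high-importance block selects the fine one. Scaling the distortion by `weight` is equivalent to dividing `lam` by `weight`, so important blocks are encoded as if rate were cheaper.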

BD-Rate Results

Rate-distortion curves: FP-RDO reduces bitrate by up to 17% relative to standard SSE-RDO at the same task accuracy, across multiple datasets and codecs.

Computational Overhead

FP-RDO adds little encoder complexity and does not affect the decoder. In our experiments, the average encoder runtime overhead is approximately 7.9% relative to standard (SSE-based) rate-distortion optimization.

Cite

@article{fernandez2025image,
  title={Image Coding for Machines via Feature-Preserving Rate-Distortion Optimization},
  author={Fern{\'a}ndez Mendui{\~n}a, Samuel and Pavez, Eduardo and Ortega, Antonio},
  journal={IEEE Transactions on Multimedia},
  year={2025},
  url={https://sf219.github.io/TMM_CfM/}
}

Contact

Samuel Fernández Menduiña
University of Southern California
samuelf9@usc.edu
sf219.github.io