Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Components used in satellites have become smaller, making them more widely available.
  • Smaller organizations can now deploy satellites with data-intensive applications.
  • Image analysis is a popular application used on satellites.
  • Resource-constrained nature of devices on satellites creates challenges.
  • This paper investigates performance of edge devices for deep-learning-based image processing in space.
  • Hardware accelerators are necessary to meet latency requirements.
  • State-of-the-art edge devices with GPUs have high power draw, making them unsuitable for satellites.

Paper Content

Introduction

  • Innovation in real-world satellite applications was historically available only to large countries
  • Satellite components have shrunk in size, and the cost of manufacturing and deployment has dropped
  • CubeSat class of miniature satellites was introduced, based on a 10 cm cube unit
  • Satellites can now be owned by small or private organizations
  • CubeSats pose new complex challenges such as power and thermal constraints
  • Satellites perform resource-intensive tasks
  • Images must be compressed or filtered by quality and areas of interest
  • Paper is a step towards understanding requirements and limitations of deep-learning-based image filtering systems on small satellites
  • Characterize requirements and constraints of an image processing unit on a CubeSat
  • Compare performance of multiple edge devices on different scenarios and their suitability for deployment on a satellite

Background

  • Introduces DISCO project
  • Surveys related work targeting data processing in satellites

DISCO

  • DISCO is a collaboration between four universities in Denmark to launch student CubeSats into Low Earth Orbit.
  • DISCO2 is designed to carry an Earth imaging payload and will support research in Greenland.
  • Payload includes one or more high resolution cameras and a dedicated IPU for machine learning applications.
  • Students can carry out conventional on-satellite image processing and machine learning applications.
  • Low bandwidth on small satellites requires data post-processing
  • Neural-network-based filtering has been studied for this purpose
  • GPUs, ASICs, and TPUs have been tested for suitability
  • Intel’s Movidius Myriad VPU has seen real-life deployment
  • TPUs have only been explored in one work, with no real-life deployments
  • DISCO project would be the first satellite to leverage TPU for onboard deep learning applications

Requirements

  • DISCO2 Arctic imaging mission use case is the basis for requirements and constraints
  • Power and mass constraints of edge device based on DISCO2 engineering design
  • 3U CubeSat with off-the-shelf modules for attitude control, power, communications, and flight control
  • Payload capacity is 1U (10x10x10cm) and 1.3kg, including cameras, module enclosure, optics, and image processing unit (IPU)
  • Polar orbit at 550 km altitude
  • Camera sensor is Alvium 1800 C-2040, with the lens at its maximum focal length
  • 50% overlap between images, 4.42s between consecutive images
  • 3 image processing scenarios with different levels of difficulty
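The link between orbit altitude, image overlap, and the time between consecutive images can be sketched as follows. This is a minimal sketch under stated assumptions: a circular orbit, Earth's rotation ignored, and an along-track footprint that is back-calculated rather than taken from the paper. The function names and constants are illustrative only. Under these assumptions, a 4.42 s interval with 50% overlap at 550 km implies an along-track footprint on the order of 60 km.

```python
import math

# Physical constants (standard values, not taken from the paper).
MU_EARTH_KM3_S2 = 398600.4418  # Earth's gravitational parameter
R_EARTH_KM = 6371.0            # mean Earth radius


def ground_speed_km_s(altitude_km: float) -> float:
    """Speed of the sub-satellite point for a circular orbit.

    Assumption: Earth's rotation is ignored.
    """
    r = R_EARTH_KM + altitude_km
    orbital_speed = math.sqrt(MU_EARTH_KM3_S2 / r)
    return orbital_speed * R_EARTH_KM / r


def imaging_interval_s(footprint_km: float, overlap: float, altitude_km: float) -> float:
    """Time between images so that consecutive footprints overlap by `overlap`."""
    return footprint_km * (1.0 - overlap) / ground_speed_km_s(altitude_km)


def implied_footprint_km(interval_s: float, overlap: float, altitude_km: float) -> float:
    """Inverse of imaging_interval_s: footprint implied by a given interval."""
    return interval_s * ground_speed_km_s(altitude_km) / (1.0 - overlap)


if __name__ == "__main__":
    footprint = implied_footprint_km(4.42, 0.5, 550.0)
    print(f"ground speed: {ground_speed_km_s(550.0):.2f} km/s")
    print(f"implied along-track footprint: {footprint:.1f} km")
```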

Methodology and setup

  • Goal is to characterize performance of low-power hardware
  • Experimental methodology and setup used to achieve goal

Devices under test

  • Three devices chosen based on physical dimensions, weight, power draw, and performance
  • ARM Cortex-M microcontroller has the potential for the lowest power draw
  • ARM Cortex-M microcontrollers have an extensive flight history
  • TensorFlow Lite for Microcontrollers used for neural network inference on the microcontroller
  • TensorFlow Lite for Microcontrollers does not need an operating system
  • Specialized neural network kernels supported
  • X-CUBE-AI framework for deployment of neural networks on the microcontroller
  • Quantized model used for better latency and lower memory footprint
  • NVIDIA Jetson Nano supports TensorFlow Core
  • CoralAI TPU is an ASIC AI accelerator
  • Compiled model must be quantized to 8-bit integer
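To illustrate what quantization to 8-bit integers means for a tensor of weights or activations, here is a minimal NumPy sketch of affine (scale and zero-point) quantization. It is a simplified illustration and not the implementation used by TensorFlow Lite or the Edge TPU compiler; the function names are hypothetical.

```python
import numpy as np


def quantize_int8(x):
    """Map float values to int8 using a per-tensor scale and zero point.

    Assumption: simple min/max calibration over the tensor itself.
    """
    x = np.asarray(x, dtype=np.float32)
    # The representable range must include 0.0 so that zero maps exactly.
    lo = min(float(x.min()), 0.0)
    hi = max(float(x.max()), 0.0)
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = int(round(-128 - lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate float values from the int8 representation."""
    return (q.astype(np.float32) - zero_point) * scale


if __name__ == "__main__":
    weights = np.array([-0.5, 0.0, 0.25, 1.0], dtype=np.float32)
    q, scale, zp = quantize_int8(weights)
    print(q, scale, zp)
    print(dequantize(q, scale, zp))
```

Each int8 value takes a quarter of the memory of a 32-bit float, which is where the lower memory footprint comes from; the price is a rounding error bounded by the quantization step.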

Metrics

  • Evaluate suitability of devices as on-satellite IPUs based on requirements
  • Latency measured in seconds per inference of sample
  • Nominal power draw is the measured power draw in mW multiplied by the duty cycle
  • Peak power draw measured in mW
  • Power consumption reported per inference of sample
  • Accuracy of deployed model measured to show effect of quantization
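The relationship between these metrics can be sketched as below. This is a minimal sketch, assuming the duty cycle is the fraction of each imaging period spent on inference and that idle power draw is ignored; the paper's exact definitions may differ. All function names and the numbers in the usage example are hypothetical placeholders, not measurements from the paper.

```python
def duty_cycle(latency_s: float, period_s: float) -> float:
    """Fraction of each imaging period the device spends on inference (capped at 1)."""
    return min(latency_s / period_s, 1.0)


def nominal_power_mw(active_power_mw: float, latency_s: float, period_s: float) -> float:
    """Active power scaled by the duty cycle. Assumption: idle draw is ignored."""
    return active_power_mw * duty_cycle(latency_s, period_s)


def energy_per_inference_mj(active_power_mw: float, latency_s: float) -> float:
    """Energy for one inference: mW multiplied by seconds gives millijoules."""
    return active_power_mw * latency_s


if __name__ == "__main__":
    # Placeholder values chosen only to exercise the functions.
    power_mw, latency_s, period_s = 2000.0, 2.21, 4.42
    print(duty_cycle(latency_s, period_s))
    print(nominal_power_mw(power_mw, latency_s, period_s))
    print(energy_per_inference_mj(power_mw, latency_s))
```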

Workload

  • Used an image classification workload with a 5-class classification problem
  • Used MobileNetV1 model pre-trained on ImageNet dataset
  • Fine-tuned model with Flowers dataset
  • Pixel values rescaled to range of [0, 1] for Jetson Nano
  • Used tiling method to divide images into 400 patches of size 224 x 224
  • Reported results as average of 10 inferences on full 4512 x 4512 pixel image
  • Tested devices with scaling factors 0.25, 0.5, and 1.0, except Cortex-M7 which couldn’t fit larger models in memory
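The tiling step can be sketched with NumPy as follows. A 4512 x 4512 image holds 20 x 20 = 400 full patches of 224 x 224 pixels, with 32 pixels left over along each axis. How the paper handles that remainder is not stated here, so this sketch simply crops it, which is an assumption. The function names are illustrative.

```python
import numpy as np

PATCH = 224  # patch side length, matching the model's input size


def tile_image(image: np.ndarray, patch: int = PATCH) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping (patch, patch, C) tiles.

    Assumption: pixels that do not fill a whole patch are cropped away.
    """
    h, w, c = image.shape
    rows, cols = h // patch, w // patch
    cropped = image[: rows * patch, : cols * patch]
    tiles = cropped.reshape(rows, patch, cols, patch, c).swapaxes(1, 2)
    return tiles.reshape(rows * cols, patch, patch, c)


def rescale_unit(patches: np.ndarray) -> np.ndarray:
    """Rescale 8-bit pixel values to the range [0, 1]."""
    return patches.astype(np.float32) / 255.0


if __name__ == "__main__":
    image = np.zeros((4512, 4512, 3), dtype=np.uint8)
    patches = tile_image(image)
    print(patches.shape)
```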

Results

  • Analysis of latency and power draw with respect to 3 scenarios
  • Impact of quantization on accuracy of models with various scaling factors
  • Impact of batch size on performance of NVIDIA Jetson Nano

Scenarios

  • Table 5 shows latency and power consumption of devices using different scaling factors
  • Tables 6 and 7 show nominal and peak power draw of different configurations respectively

Scenario 1:

  • Real-time imaging requires a high degree of specialization
  • Latency of at most 4.42 s per image is required for 50% overlap
  • No passive periods, so inference latency must be lower than the imaging interval
  • TPU configurations fit into nominal power budget
  • Relaxing constraints allows for higher latency and lower power draw
  • TPU performs inference more efficiently than other devices
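The real-time constraint can be expressed as a per-patch latency budget. As a minimal sketch, assuming the 400 patches of one image are processed sequentially with a batch size of 1 and that tiling and data transfer overheads are ignored, 4.42 s per image leaves roughly 11 ms per patch. The function names are hypothetical.

```python
IMAGING_INTERVAL_S = 4.42   # time between consecutive images (from the requirements)
PATCHES_PER_IMAGE = 400     # patches per full image (from the workload)


def per_patch_budget_ms(interval_s: float = IMAGING_INTERVAL_S,
                        patches: int = PATCHES_PER_IMAGE) -> float:
    """Maximum average latency per patch that still keeps up with imaging."""
    return interval_s / patches * 1000.0


def meets_real_time(per_patch_latency_ms: float,
                    interval_s: float = IMAGING_INTERVAL_S,
                    patches: int = PATCHES_PER_IMAGE) -> bool:
    """True if processing every patch finishes before the next image arrives."""
    return per_patch_latency_ms * patches / 1000.0 <= interval_s


if __name__ == "__main__":
    print(f"budget per patch: {per_patch_budget_ms():.2f} ms")
```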

Effect of quantization and scaling factor

  • Model accuracy with various scaling factors before and after quantization to 8-bit integer shown in Table 8
  • Models with scaling factors 0.5 and 1.0 show no significant change in predictive performance
  • Model with smallest scaling factor shows 3% drop in accuracy, but difference is acceptable
  • Benefits of lower memory footprint, latency, and higher overall efficiency outweigh drop in accuracy

Batch size impact on NVIDIA Jetson Nano

  • NVIDIA Jetson Nano is the only device that allows inference with a batch size higher than 1.
  • Increasing the batch size can lead to 45.3-74.6% lower latency and 41.7-64.8% lower power consumption.
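Batching groups several patches into one inference call, so fewer calls are needed per image. The sketch below shows one way to split a stack of patches into batches, plus the arithmetic behind a "percent lower" comparison. It is a minimal sketch with hypothetical function names and does not reproduce the paper's measurement setup.

```python
from typing import Iterator

import numpy as np


def iter_batches(patches: np.ndarray, batch_size: int) -> Iterator[np.ndarray]:
    """Yield consecutive slices of at most `batch_size` patches."""
    for start in range(0, len(patches), batch_size):
        yield patches[start:start + batch_size]


def relative_reduction_pct(baseline: float, improved: float) -> float:
    """Percentage by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline * 100.0


if __name__ == "__main__":
    # Small dummy patches keep the example light; the shapes are placeholders.
    patches = np.zeros((400, 8, 8, 3), dtype=np.float32)
    sizes = [len(batch) for batch in iter_batches(patches, 32)]
    print(len(sizes), sizes[-1])
```

Note that the last batch can be smaller than the others when the batch size does not divide the number of patches evenly.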

Discussion

  • TPU is the only device that can fulfill the requirements for real-time imaging
  • ARM Cortex-M7 microcontroller can only fulfill the least constrained scenario
  • NVIDIA Jetson Nano can meet the latency requirement of the real-time imaging scenario but exceeds the power budget
  • GPU architecture can perform massively parallel operations but is less power-efficient

Conclusion

  • Three devices were tested for image analysis scenarios on a satellite
  • Specialized hardware architectures are needed to meet the low-latency and power-budget requirements
  • CoralAI TPU was the only device that fulfilled the requirements for all scenarios
  • NVIDIA Jetson Nano could match CoralAI TPU’s performance but at a higher power draw
  • ARM Cortex-M7 could only fulfill the requirements of the least constrained scenario
  • Results included latency, power consumption, nominal power draw, peak power draw, and accuracy