Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Components used in satellites have become smaller, making them more widely available.
- Smaller organizations can now deploy satellites with data-intensive applications.
- Image analysis is a popular application used on satellites.
- Resource-constrained nature of devices on satellites creates challenges.
- This paper investigates performance of edge devices for deep-learning-based image processing in space.
- Hardware accelerators are necessary to meet latency requirements.
- State-of-the-art edge devices with GPUs have high power draw, making them unsuitable for satellites.
Paper Content
Introduction
- Innovation in real-world satellite applications was once available only to large countries
- Satellite components shrank in size, and the costs of manufacturing and deployment fell
- The CubeSat class of miniature satellites was introduced, based on a 10 cm cube unit
- Satellites can now be owned by small or private organizations
- CubeSats pose new complex challenges such as power and thermal constraints
- Satellites perform resource-intensive tasks
- Images must be compressed or filtered by quality and areas of interest
- Paper is a step towards understanding requirements and limitations of deep-learning-based image filtering systems on small satellites
- Characterize requirements and constraints of an image processing unit on a CubeSat
- Compare performance of multiple edge devices on different scenarios and their suitability for deployment on a satellite
Background
- Introduces DISCO project
- Surveys related work targeting data processing in satellites
DISCO
- DISCO is a collaboration between four universities in Denmark to launch student CubeSats into Low Earth Orbit.
- DISCO2 is designed to carry an Earth imaging payload and will support research in Greenland.
- Payload includes one or more high resolution cameras and a dedicated IPU for machine learning applications.
- Students can carry out conventional on-satellite image processing and machine learning applications.
Related work
- Low bandwidth on small satellites requires data post-processing
- Neural-network-based filtering has been studied for this purpose
- GPUs, ASICs, and TPUs have been tested for suitability
- Intel’s Movidius Myriad VPU has seen real-life deployment
- TPUs have only been explored in one work, with no real-life deployments
- DISCO project would be the first satellite to leverage TPU for onboard deep learning applications
Requirements
- DISCO2 Arctic imaging mission use case is the basis for requirements and constraints
- Power and mass constraints of edge device based on DISCO2 engineering design
- 3U CubeSat with off-the-shelf modules for attitude control, power, communications, and flight control
- Payload capacity is 1U (10x10x10cm) and 1.3kg, including cameras, module enclosure, optics, and image processing unit (IPU)
- Polar orbit at 550 km altitude
- Camera sensor is the Alvium 1800 C-2040, with the lens at its maximum focal length
- 50% overlap between images, 4.42s between consecutive images
- 3 image processing scenarios with different levels of difficulty
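The link between altitude, image overlap, and the time between images can be sketched as follows. This is a minimal illustration, not the paper's derivation: it assumes a circular orbit, ignores Earth's rotation, and uses standard constants. The function names are hypothetical. Because the summary does not state the along-track footprint of an image, the sketch works backwards from the 4.42 s interval and 50% overlap to the footprint those values would imply.

```python
import math

MU_EARTH = 3.986004418e14  # Earth's gravitational parameter, m^3/s^2
R_EARTH = 6_371_000.0      # mean Earth radius, m


def ground_speed(altitude_m: float) -> float:
    """Speed of the sub-satellite point for a circular orbit, in m/s."""
    r = R_EARTH + altitude_m
    orbital_speed = math.sqrt(MU_EARTH / r)
    # Project the orbital speed down to the surface.
    return orbital_speed * R_EARTH / r


def capture_interval(footprint_m: float, overlap: float, altitude_m: float) -> float:
    """Seconds between images so that consecutive frames share `overlap`."""
    return footprint_m * (1.0 - overlap) / ground_speed(altitude_m)


def footprint_from_interval(interval_s: float, overlap: float, altitude_m: float) -> float:
    """Along-track footprint (m) implied by a capture interval and overlap."""
    return ground_speed(altitude_m) * interval_s / (1.0 - overlap)


if __name__ == "__main__":
    speed = ground_speed(550e3)
    footprint = footprint_from_interval(4.42, 0.5, 550e3)
    print(f"ground speed: {speed:.0f} m/s")
    print(f"implied along-track footprint: {footprint / 1000:.1f} km")
```

Under these assumptions, a higher overlap or a smaller footprint shortens the interval, which in turn tightens the latency budget of the IPU.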
Methodology and setup
- Goal is to characterize performance of low-power hardware
- Experimental methodology and setup used to achieve goal
Devices under test
- Three devices chosen based on physical dimensions, weight, power draw, and performance
- ARM Cortex-M Microcontroller has potential for lowest power draw
- Extensive flight history
- TensorFlow Lite for Microcontrollers used for neural network inference
- No need for operating system
- Specialized neural network kernels supported
- X-CUBE-AI framework for deployment of neural networks
- Quantized model used for better latency and lower memory footprint
- NVIDIA Jetson Nano supports TensorFlow Core
- CoralAI TPU is an ASIC AI accelerator
- Compiled model must be quantized to 8-bit integer
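To illustrate what quantizing a model to 8-bit integers involves, here is a minimal sketch of generic affine (scale and zero-point) quantization in plain Python. It is not the implementation used by any of the toolchains listed above, which apply their own schemes; the function names and sample values are hypothetical.

```python
def quant_params(x_min, x_max, q_min=-128, q_max=127):
    """Derive scale and zero point mapping [x_min, x_max] onto int8."""
    # The representable range must contain 0.0 so that zero maps exactly.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (q_max - q_min)
    if scale == 0.0:
        scale = 1.0  # degenerate case: all values are zero
    zero_point = round(q_min - x_min / scale)
    zero_point = max(q_min, min(q_max, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, q_min=-128, q_max=127):
    """Map floats to int8 codes, clamping to the valid range."""
    return [max(q_min, min(q_max, round(v / scale) + zero_point)) for v in values]


def dequantize(codes, scale, zero_point):
    """Recover approximate floats from int8 codes."""
    return [(q - zero_point) * scale for q in codes]


if __name__ == "__main__":
    weights = [-1.0, -0.25, 0.0, 0.5, 1.0]  # hypothetical sample weights
    scale, zp = quant_params(min(weights), max(weights))
    codes = quantize(weights, scale, zp)
    restored = dequantize(codes, scale, zp)
    for w, q, r in zip(weights, codes, restored):
        print(f"{w:+.4f} -> {q:4d} -> {r:+.4f}")
```

Each value is stored in one byte instead of four, and the round-trip error stays within about one quantization step, which is the trade-off between memory footprint and accuracy that the bullets above refer to.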
Metrics
- Evaluate suitability of devices as on-satellite IPUs based on requirements
- Latency measured in seconds per inference of sample
- Nominal power draw is the measured power draw in mW multiplied by the duty cycle
- Peak power draw measured in mW
- Power consumption reported per inference of sample
- Accuracy of deployed model measured to show effect of quantization
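The power metrics above can be sketched as simple arithmetic. This is an illustration under stated assumptions: the duty cycle is taken to be the fraction of the imaging interval spent on inference, idle power draw is ignored, and "power consumption per inference" is interpreted as energy. The function names and all numbers in the usage section are hypothetical, not measurements from the paper.

```python
def duty_cycle(latency_s: float, interval_s: float) -> float:
    """Fraction of each imaging interval the device spends on inference."""
    return min(latency_s / interval_s, 1.0)


def nominal_power_mw(active_power_mw: float, latency_s: float, interval_s: float) -> float:
    """Active power draw scaled by the duty cycle (idle draw ignored)."""
    return active_power_mw * duty_cycle(latency_s, interval_s)


def energy_per_inference_mj(active_power_mw: float, latency_s: float) -> float:
    """Energy for one full-image inference; mW times seconds gives mJ."""
    return active_power_mw * latency_s


if __name__ == "__main__":
    # Hypothetical device: 2000 mW while active, 2.21 s per full image.
    print(nominal_power_mw(2000.0, 2.21, 4.42))
    print(energy_per_inference_mj(2000.0, 2.21))
```

Peak power draw is not derived this way; it is the highest instantaneous draw observed and constrains the power system independently of the duty cycle.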
Workload
- Used an image classification workload with a 5-class classification problem
- Used MobileNetV1 model pre-trained on ImageNet dataset
- Fine-tuned model with Flowers dataset
- Pixel values rescaled to range of [0, 1] for Jetson Nano
- Used tiling method to divide images into 400 patches of size 224 x 224
- Reported results as average of 10 inferences on full 4512 x 4512 pixel image
- Tested devices with scaling factors 0.25, 0.5, and 1.0, except Cortex-M7 which couldn’t fit larger models in memory
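The tiling step can be sketched as below. A 4512 x 4512 image holds 20 whole patches of 224 pixels per side, which gives the 400 patches mentioned above and leaves a 32-pixel border uncovered. The sketch assumes non-overlapping tiles anchored at the top-left corner and 8-bit input pixels; how the paper treats the leftover border is not stated here, and the function names are hypothetical.

```python
def tile_origins(image_size: int, patch_size: int):
    """Top-left (row, col) of every whole, non-overlapping patch."""
    per_side = image_size // patch_size
    return [
        (row * patch_size, col * patch_size)
        for row in range(per_side)
        for col in range(per_side)
    ]


def rescale_pixel(value: int) -> float:
    """Map an 8-bit pixel value to the range [0, 1]."""
    return value / 255.0


if __name__ == "__main__":
    origins = tile_origins(4512, 224)
    print(len(origins))            # number of patches
    print(origins[-1])             # origin of the last patch
    print(4512 - (4512 // 224) * 224)  # uncovered border in pixels
```

Each origin can then be used to slice a 224 x 224 patch out of the full frame before it is passed to the classifier.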
Results
- Analysis of latency and power draw with respect to 3 scenarios
- Impact of quantization on accuracy of models with various scaling factors
- Impact of batch size on performance of NVIDIA Jetson Nano
Scenarios
- Table 5 shows latency and power consumption of devices using different scaling factors
- Tables 6 and 7 show nominal and peak power draw of different configurations respectively
Scenario 1:
- Real-time imaging requires a high degree of specialization
- Latency of at most 4.42 s per image is required to maintain 50% overlap
- No passive periods, so inference latency must be lower than the imaging interval
- TPU configurations fit into nominal power budget
- Relaxing constraints allows for higher latency and lower power draw
- TPU performs inference more efficiently than other devices
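The real-time constraint can be expressed as a simple feasibility check. With 400 patches per image and 4.42 s between images, each patch has roughly 11 ms of processing time available. The sketch below is illustrative only: the function names are hypothetical, and the latency, power, and budget values in the usage section are placeholders rather than figures from the paper.

```python
def per_patch_budget_s(interval_s: float, patches: int) -> float:
    """Time available per patch if a full image must finish within the interval."""
    return interval_s / patches


def meets_real_time(latency_s: float, nominal_mw: float,
                    interval_s: float, power_budget_mw: float) -> bool:
    """True if inference keeps up with imaging and stays within the power budget."""
    return latency_s <= interval_s and nominal_mw <= power_budget_mw


if __name__ == "__main__":
    print(per_patch_budget_s(4.42, 400))
    # Placeholder device figures and power budget, for illustration only.
    print(meets_real_time(latency_s=3.0, nominal_mw=1500.0,
                          interval_s=4.42, power_budget_mw=2000.0))
```

A device can fail this check on either axis: it may be fast enough but draw too much power, or frugal enough but too slow.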
Effect of quantization and scaling factor
- Model accuracy with various scaling factors before and after quantization to 8-bit integer shown in Table 8
- Models with scaling factors 0.5 and 1.0 show no significant change in predictive performance
- Model with smallest scaling factor shows 3% drop in accuracy, but difference is acceptable
- Benefits of lower memory footprint, latency, and higher overall efficiency outweigh drop in accuracy
Batch size impact on NVIDIA Jetson Nano
- NVIDIA Jetson Nano is the only device that allows inference with a batch size higher than 1.
- Increasing the batch size can lead to 45.3-74.6% lower latency and 41.7-64.8% lower power consumption.
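As a small illustration of how batching changes the workload, the sketch below counts the inference calls needed for 400 patches at a given batch size and computes a relative reduction of the kind quoted above. The function names are hypothetical and the latency values are placeholders, not measurements from the paper.

```python
import math


def inference_calls(patches: int, batch_size: int) -> int:
    """Number of model invocations needed to process all patches."""
    return math.ceil(patches / batch_size)


def relative_reduction_pct(baseline: float, improved: float) -> float:
    """Percentage by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline * 100.0


if __name__ == "__main__":
    for batch in (1, 8, 32):
        print(batch, inference_calls(400, batch))
    # Placeholder latencies in seconds, for illustration only.
    print(relative_reduction_pct(10.0, 4.0))
```

Fewer, larger invocations amortize per-call overhead across more patches, which is one plausible reason batching helps on a GPU.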
Discussion
- TPU is the only device that can fulfill the requirements for real-time imaging
- ARM Cortex-M7 microcontroller can only fulfill the least constrained scenario
- NVIDIA Jetson Nano can fulfill the requirements of the real-time imaging scenario but exceeds the power budget
- GPU architecture can perform massively parallel operations but is inefficient
Conclusion
- Three devices were tested for image analysis scenarios on a satellite
- Specialized hardware architectures are needed to meet the low-latency and power-budget requirements
- CoralAI TPU was the only device that fulfilled the requirements for all scenarios
- NVIDIA Jetson Nano could match CoralAI TPU’s performance but at a higher power draw
- ARM Cortex-M7 could only fulfill the requirements of the least constrained scenario
- Results included latency, power consumption, nominal power draw, peak power draw, and accuracy