Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

State-of-the-art approaches for weather and climate modeling are based on physics-informed numerical models.
Data-driven approaches based on machine learning aim to directly solve a downstream forecasting or projection task.
These networks are trained using curated and homogeneous climate datasets.
ClimaX is a flexible and generalizable deep learning model for weather and climate science.
ClimaX is pre-trained with a self-supervised learning objective on climate datasets.
ClimaX can be fine-tuned to address a breadth of climate and weather tasks.
ClimaX results in superior performance on benchmarks for weather forecasting and climate projections.

Paper Content

Introduction

Modeling weather and climate is a challenge for science and society.
Numerical methods for global modeling of weather and climate are parameterized via general circulation models (GCMs).
GCMs have limitations in simulating atmospheric variables quickly at short time scales or accurately at long time scales.
Data-driven approaches for forecasting of atmospheric variables have been rising.
Deep neural networks are trained to predict target atmospheric variables using historical global datasets.
ML models trained to solve a single task using supervised learning are label-hungry and brittle when deployed outside their training distribution.
Pretraining large unsupervised “foundation” models on huge passive datasets can mitigate the supervision bottleneck.
ClimaX is proposed as a foundation model for weather and climate.
ClimaX is pretrained on a large dataset using an unsupervised objective.
ClimaX uses a vision transformer and a randomized forecasting objective.
ClimaX is benchmarked and is state-of-the-art on ClimateBench.
ClimaX can scale using heterogeneous climate datasets during pretraining.

Background and related work

Current weather and climate models rely on numerical methods and computational simulations.
Primitive equations are at the core of both weather and climate models.
Earth system models (ESM) are used for climate modeling.
Numerical Weather Prediction (NWP) models share components with GCMs.

Data sources

Weather and climate data is not solely based on sensed data, but incorporates information from a range of sources.
Data measurements are heterogeneous, representing various physical variables with different data types.
Data sources span multiple axes, from direct weather measurements to physics-informed climate projections.

ERA5

The ERA5 reanalysis archive is the main data source for weather forecasting systems.
ERA5 is a detailed record of the global atmosphere, land surface, and ocean waves from 1950 onwards.
ERA5 combines a forecasting model with available observations.
The ERA5 data is huge: 40 years on a 0.25° × 0.25° grid, at hourly intervals and 37 altitude levels.
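
To give a feel for the scale quoted above, here is a rough back-of-envelope estimate for a single ERA5 variable stored on all 37 levels. The 4-byte float storage, the 365.25-day year, and a latitude axis that includes both poles are assumptions made for illustration, not details from the paper.

```python
# Back-of-envelope size of ONE variable on all levels, using the figures quoted above.
# Assumptions (not from the text): float32 storage, 365.25 days per year,
# and a latitude axis that includes both poles.
YEARS = 40
RES_DEG = 0.25
LEVELS = 37
BYTES_PER_VALUE = 4  # float32, assumed

n_lat = int(180 / RES_DEG) + 1       # latitude rows, poles included
n_lon = int(360 / RES_DEG)           # longitude columns
n_steps = int(YEARS * 365.25 * 24)   # hourly snapshots over 40 years

values = n_lat * n_lon * LEVELS * n_steps
terabytes = values * BYTES_PER_VALUE / 1e12

print(f"grid: {n_lat} x {n_lon}, time steps: {n_steps}")
print(f"approx. size of one variable: {terabytes:.1f} TB")
```

Under these assumptions a single multi-level variable already runs to tens of terabytes, which is why much work uses coarser regridded versions of the data.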

Tasks

Machine learning is being used for weather and climate modeling tasks.
Global forecasting tasks range from a few hours to days and weeks.
Evaluation is done on the ERA5 reanalysis dataset, with the operational IFS of ECMWF serving as the current state-of-the-art NWP baseline.
ClimateBench is a benchmark dataset providing an evaluation framework for machine learning models to improve accuracy of climate projections.

Foundation models

Bommasani et al. introduced the term “foundation models” to refer to deep learning models trained on broad data via self-supervision.
Examples of foundation models include BERT, GPT, and PaLM.
Foundation models have been applied to web data and to scientific domains such as protein design.
The key significance of foundation models is the emergence of new model capabilities and the homogenization of methodologies across tasks, domains, and modalities.
Most research at the intersection of weather and climate science and ML has designed a separate model for every task, although recent works have proposed pretraining techniques for satellite imagery and remote sensing.

Approach

Aim to build a generalizable deep learning foundation model
Model needs to be able to input heterogeneous datasets of different variables
Model needs to provide spatio-temporal coverage based on physical groundings

Input representation

Model takes an input of shape V × H × W and predicts an output of shape V′ × H′ × W′
V refers to the number of input variables, such as weather conditions and climate forcing factors
H and W refer to the spatial resolution of the input data
V′, H′, W′ refer to the variables and spatial resolution of the predicted outputs
Mainly work with two spatial resolutions: 5.625° (32 × 64 grid points) and 1.40625° (128 × 256 grid points)
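
The grid sizes quoted above follow directly from the angular spacing. The sketch below writes shapes as (variables, latitude points, longitude points). The helper names `grid_shape` and `io_shapes` are hypothetical, and the example assumes a global forecasting setup in which input and output share the same grid.

```python
def grid_shape(res_deg):
    """Return (H, W): latitude and longitude point counts for a global
    equiangular grid with the given spacing in degrees."""
    h = round(180 / res_deg)
    w = round(360 / res_deg)
    return h, w


def io_shapes(num_in_vars, num_out_vars, res_deg):
    """Input and output tensor shapes, assuming both live on the same grid."""
    h, w = grid_shape(res_deg)
    return (num_in_vars, h, w), (num_out_vars, h, w)


for res in (5.625, 1.40625):
    print(res, grid_shape(res))

# Example: 48 input variables and 4 target variables at 5.625 degrees.
print(io_shapes(48, 4, 5.625))
```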

Model architecture

Aim to design a foundation model that can be pretrained on heterogeneous data sources and finetuned to solve various downstream weather and climate tasks.
Tasks can be thought of as image-to-image translation problems with input and output channels.
Image architectures such as UNet, ResNet, and Vision Transformers (ViT) are natural fits.
Climate and weather tasks require more flexibility than current CNN-based architectures can provide.
Build ClimaX architecture upon Vision Transformers (ViT).
Propose two major architectural changes: variable tokenization and variable aggregation.
Variable tokenization tokenizes each variable in the input separately.
Variable aggregation performs a cross-attention operation for each spatial position.
Use a standard Vision Transformer (ViT) for generating output tokens.
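
The two architectural changes can be illustrated with a small NumPy sketch. This is not the authors' implementation: the patch size, embedding width, single attention head, random weights, and the use of one shared query vector per spatial position are all assumptions chosen to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)
V, H, W = 3, 32, 64   # variables, latitude points, longitude points
P, D = 4, 8           # patch size and embedding width (illustrative values)


def patchify(x, p):
    """(V, H, W) -> (V, N, p*p), with N = (H/p) * (W/p) non-overlapping patches."""
    v, h, w = x.shape
    x = x.reshape(v, h // p, p, w // p, p)
    x = x.transpose(0, 1, 3, 2, 4)
    return x.reshape(v, (h // p) * (w // p), p * p)


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


x = rng.standard_normal((V, H, W))

# Variable tokenization: every variable is patchified and embedded separately,
# here with its own weight matrix, giving V * N tokens instead of N.
w_embed = rng.standard_normal((V, P * P, D)) * 0.1
tokens = np.einsum("vnp,vpd->vnd", patchify(x, P), w_embed)   # (V, N, D)

# Variable aggregation: at each spatial position, a query cross-attends
# over the V variable tokens at that position and reduces them to one token.
query = rng.standard_normal(D)              # assumed: one shared query vector
w_k = rng.standard_normal((D, D)) * 0.1
w_v = rng.standard_normal((D, D)) * 0.1
keys, values = tokens @ w_k, tokens @ w_v   # both (V, N, D)
scores = np.einsum("d,vnd->nv", query, keys) / np.sqrt(D)     # (N, V)
attn = softmax(scores, axis=-1)             # weights over variables
aggregated = np.einsum("nv,vnd->nd", attn, values)            # (N, D)

print(tokens.shape, aggregated.shape)
```

After aggregation the sequence length no longer depends on the number of input variables, so the result can be passed to an unmodified ViT backbone.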

Datasets

Pretraining

ClimaX is a computer model used to predict future weather conditions given current conditions.
During pretraining, the lead time (how far into the future the model is predicting) is randomized from 6 hours to 168 hours (1 week).
The lead time is added to the tokens to inform the model of how long it is forecasting.
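
A minimal sketch of how a randomized forecasting pair might be drawn during pretraining. The function names, the choice of integer hours sampled uniformly, and the assumption of an hourly time axis are illustrative and not taken from the paper.

```python
import random


def sample_lead_time_hours(rng, low=6, high=168):
    """Draw a lead time in whole hours, uniformly from [low, high] (assumed)."""
    return rng.randint(low, high)


def make_training_pair(num_steps, rng, step_hours=1):
    """Pick an input index t and a target index t + lead on a time axis with
    num_steps snapshots spaced step_hours apart.
    Returns (input_index, target_index, lead_hours)."""
    lead = sample_lead_time_hours(rng)
    offset = lead // step_hours
    t = rng.randrange(num_steps - offset)   # keeps the target inside the data
    return t, t + offset, lead


rng = random.Random(0)
for _ in range(3):
    print(make_training_pair(num_steps=1000, rng=rng))
```

The sampled lead time is what would then be embedded and added to the tokens, so that a single model can serve all forecast horizons.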
The ERA5 reanalysis data is used for finetuning and evaluation for various weather related downstream tasks.

Finetuning

ClimaX has four learnable components
Evaluate performance of ClimaX on various downstream tasks
Downstream tasks categorized into two finetuning scenarios
First scenario: finetune entire model
Second scenario: replace embedding layers and prediction head with newly initialized networks, finetune or freeze other two components
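
The two scenarios can be written down as a small lookup that says, for each component, whether it is re-initialized and whether it is trained. The component names and the function `finetune_plan` are hypothetical labels for illustration, not identifiers from the ClimaX code base.

```python
# Hypothetical labels for the four learnable components.
COMPONENTS = ("embedding", "variable_aggregation", "attention", "prediction_head")


def finetune_plan(scenario, freeze_shared=False):
    """Return {component: (reinitialized, trainable)} for a finetuning scenario.

    "full":   keep all pretrained weights and train everything.
    "new_io": replace the embedding and prediction head with fresh networks;
              the other two components are finetuned or frozen.
    """
    if scenario == "full":
        return {c: (False, True) for c in COMPONENTS}
    if scenario == "new_io":
        plan = {}
        for c in COMPONENTS:
            if c in ("embedding", "prediction_head"):
                plan[c] = (True, True)                 # new weights, always trained
            else:
                plan[c] = (False, not freeze_shared)   # pretrained weights
        return plan
    raise ValueError(f"unknown scenario: {scenario}")


print(finetune_plan("full"))
print(finetune_plan("new_io", freeze_shared=True))
```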

Experiments

Evaluate performance and generality of ClimaX on downstream tasks
Compare ClimaX performance to current state-of-the-art NWP system
Analyze scaling property of ClimaX with increasing data size, model capacity, and data resolution
Perform ablation studies to understand trade-off between computation and performance

Neural baselines

Compared ClimaX with IFS, the current gold standard in weather forecasting
Compared with UNet and ResNet, two CNN baselines commonly used in vision tasks
Borrowed ResNet architecture from Weatherbench

Forecasting

We want to forecast the weather at a future time given global weather conditions.
There are 48 input variables in total.
We are predicting four target variables: geopotential at 500hPa, temperature at 850hPa, temperature at 2 meters from the ground, and zonal wind speed at 10 meters from the ground.
We are considering seven lead times: 6 hours, 1, 3, 5, 7 days, 2 weeks, and 1 month.
We are comparing ClimaX with IFS and two CNN baselines on the ERA5 dataset at 5.625° and 1.40625° resolutions.
We are using latitude-weighted MSE loss and evaluating the best checkpoint on the test set.
We are comparing all methods on latitude-weighted root mean squared error (RMSE) and latitude-weighted anomaly correlation coefficient (ACC).
We are evaluating ClimaX on regional forecasting of relevant variables in North America.
We are comparing ClimaX with two CNN baselines and the scratch-trained version of ClimaX.
We are evaluating ClimaX on S2S prediction of four target variables: T850, T2m, U10, and V10.
ClimaX achieves the lowest error for all variables.
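
For reference, the latitude-weighted metrics mentioned above can be sketched for a single (latitude, longitude) field as follows. The cosine-of-latitude weighting normalized to mean one follows the common WeatherBench-style convention, which is assumed here; the ACC is a simplified variant that uses anomalies relative to a supplied climatology and omits further mean-centering.

```python
import numpy as np


def latitude_weights(lats_deg):
    """cos(latitude) weights normalized to mean 1, so that grid cells near
    the poles (which cover less area) count less."""
    w = np.cos(np.deg2rad(lats_deg))
    return w / w.mean()


def lat_weighted_rmse(pred, target, lats_deg):
    """Latitude-weighted RMSE for fields of shape (H, W)."""
    w = latitude_weights(lats_deg)[:, None]
    return float(np.sqrt(np.mean(w * (pred - target) ** 2)))


def lat_weighted_acc(pred, target, climatology, lats_deg):
    """Simplified latitude-weighted anomaly correlation coefficient."""
    w = latitude_weights(lats_deg)[:, None]
    p = pred - climatology
    t = target - climatology
    num = np.sum(w * p * t)
    den = np.sqrt(np.sum(w * p ** 2) * np.sum(w * t ** 2))
    return float(num / den)


# Toy example on a 32 x 64 grid with cell-centered latitudes (assumed layout).
lats = np.linspace(-87.1875, 87.1875, 32)
rng = np.random.default_rng(0)
target = rng.standard_normal((32, 64))
pred = target + 0.1 * rng.standard_normal((32, 64))
clim = np.zeros((32, 64))
print(lat_weighted_rmse(pred, target, lats), lat_weighted_acc(pred, target, clim, lats))
```

Lower RMSE and higher ACC indicate a better forecast; a perfect forecast gives an RMSE of 0 and an ACC of 1.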

Climate projection

ClimaX is tested on ClimateBench, a benchmark designed for testing machine learning models for climate projections
Goal is to predict annual mean global distributions of surface temperature, diurnal temperature range, precipitation, and the 90th percentile of precipitation
Input variables are four anthropogenic forcing factors: carbon dioxide (CO2), sulfur dioxide (SO2), black carbon (BC), and methane (CH4)
Finetuning pipeline of ClimaX for ClimateBench includes replacing pretrained embedding layers and prediction heads with newly initialized networks, while keeping the attention layers and the variable aggregation module
Two finetuning protocols are considered: freezing or finetuning the attention layers
Two components are added to the pipeline: history of the preceding ten years of the forcing factors and global average pooling layer
Mean squared error is used as the loss function
Results show that ClimaX with frozen attention layers (ClimaX frozen) performs the best in predicting the two temperature-related variables, followed by ClimaX with finetuned attention layers

Climate model downscaling

Climate models are often run at coarse grids due to their high computational cost.
Downscaling aims to obtain higher-resolution projections and reduce biases from the outputs of these models.
ClimaX achieves the lowest RMSE and a mean bias closest to 0 for all three target variables.
ClimaX has successfully captured the spatial structure of weather data.

Scaling laws analysis

Transformers have favorable scaling properties
Performance improves with data size and model capacity
Figure 11 shows performance of ClimaX as a function of data size and model capacity
Error rate of two biggest models decreases with increasing data and model size
Larger models are more data efficient
High-resolution data contains finer details and local processes
Figure 12 compares performance of ClimaX on 5.625° and 1.40625° data
ClimaX-all-vars achieves comparable performance to ClimaX
ClimaX-iter works reasonably well up to 1-day prediction
ClimaX-cont performs competitively on 6-hour to 7-day forecasting
Finetuning cost scales linearly with number of target variables and lead times
ClimaX is most expensive, ClimaX-iter is cheapest
Performance is proportional to computational cost
Scaling has transformative impact in AI subdisciplines
ClimaX opens up new opportunities for scaling
Future research could explore incorporating observational and simulated datasets
Future research could explore single multi-scale architectures
