Abstract

  • Formulates machine learning mechanism as bi-level optimization problem
  • Inner level optimization loop minimizes loss function on training data
  • Outer level optimization loop maximizes performance metric on validation data
  • Entails model engineering, experiment tracking, dataset versioning, etc.
  • Automated via AutoML or left to intuition of ML students, engineers, researchers
  • Need to reduce computational cost and carbon footprint of developing AI algorithms
  • Considers supervised, semi-supervised, self-supervised, unsupervised, and other learning settings
  • Surfaces open problems in the field
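
The bi-level structure described above can be written compactly. This is a sketch, not the paper's exact notation: L_train and w follow the symbols used in the Background section, while λ (the hyper-parameters) and M_val (the validation performance metric) are names assumed here for illustration.

```latex
\begin{aligned}
\lambda^{*} &= \arg\max_{\lambda} \; M_{\text{val}}\bigl(w^{*}(\lambda),\, \lambda\bigr)
  && \text{(outer level: validation data)} \\
\text{s.t.} \quad w^{*}(\lambda) &= \arg\min_{w} \; L_{\text{train}}(w,\, \lambda)
  && \text{(inner level: training data)}
\end{aligned}
```

Each evaluation of the outer objective requires solving the inner problem, that is, training a model, which is where the computational cost of the iteration step comes from.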

Paper Content

Introduction

  • Data is the source code for machine learning
  • Model is written to fit the data and trained
  • Model is evaluated using performance metric
  • Iteration step is repeated to find most performant solution
  • Model is tested on test data before production
  • Iteration step is computationally expensive and increases carbon footprint

Background

  • Trying to solve bi-level optimization problems
  • Hyper-parameters of the model include learning rate, depth and width of neural networks, normalization layers, convolutional kernel sizes, dropout, etc.
  • Performance metric (e.g. accuracy) is often a discrete, non-differentiable function of the hyper-parameters
  • Training loss function is denoted by L_train, model parameters by w
  • Solutions to the problem are complex and case specific
  • Solutions include intuition, tools, grid search, random search, Bayesian optimization, reinforcement learning, evolutionary algorithms
  • Computational cost is high, carbon footprint is high
  • Some solutions trade computation for higher memory consumption
  • Approximate validation function with differentiable function
  • Compromise between automated and expert-in-the-loop solutions
  • Performance metric pursues multiple competing objectives
  • Balancing trade-off between optimizing one objective versus the other is an open problem
  • Data augmentation policies could be part of the search space
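
As a rough illustration of the points above, the following sketch pairs an inner training loop with random search as the outer loop. Everything in it is an assumption made for the example rather than something taken from the paper: the toy 1-D dataset, the logistic-regression model, the two hyper-parameters (lr, weight_decay), and their search ranges.

```python
import math
import random


def make_data(n, rng):
    """Hypothetical toy data: 1-D inputs, label is 1 when x > 0."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 1 if x > 0 else 0) for x in xs]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(data, lr, weight_decay, steps=200):
    """Inner level: gradient descent on the regularized training loss over (w, b)."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        gw, gb = 0.0, 0.0
        for x, y in data:
            err = sigmoid(w * x + b) - y  # gradient of the log loss w.r.t. the logit
            gw += err * x
            gb += err
        w -= lr * (gw / len(data) + weight_decay * w)
        b -= lr * gb / len(data)
    return w, b


def accuracy(params, data):
    """Validation metric: discrete and non-differentiable in the hyper-parameters."""
    w, b = params
    correct = sum((1 if w * x + b > 0 else 0) == y for x, y in data)
    return correct / len(data)


def random_search(train_data, val_data, trials, rng):
    """Outer level: sample hyper-parameters, keep the best validation accuracy."""
    best = None
    for _ in range(trials):
        hp = {
            "lr": 10 ** rng.uniform(-3, 0),            # log-uniform in [1e-3, 1]
            "weight_decay": 10 ** rng.uniform(-4, 0),  # log-uniform in [1e-4, 1]
        }
        params = train(train_data, **hp)  # one full inner optimization per trial
        score = accuracy(params, val_data)
        if best is None or score > best[0]:
            best = (score, hp, params)
    return best


rng = random.Random(0)
train_data = make_data(100, rng)
val_data = make_data(50, rng)
best_score, best_hp, best_params = random_search(train_data, val_data, trials=10, rng=rng)
print(best_score, best_hp)
```

Every trial retrains the model from scratch, so the cost grows linearly with the number of trials. That is the expense the more sample-efficient methods listed above (for example Bayesian optimization) try to reduce.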

Open problems

  • Iteration step is a bi-level optimization problem
  • Increasing complexity from supervised to unsupervised learning
  • Exposes open challenges in the field

Supervised learning

  • Most progress in AutoML is in supervised learning, particularly image classification
  • Performance metric is usually accuracy, or a differentiable approximation of it
  • Could investigate if techniques work with other performance metrics, such as precision, recall, F1 score, calibration
  • Could investigate if techniques are memory efficient
  • Could investigate if techniques generalize to architectures such as transformers or multi-layer perceptrons
  • Could investigate if data augmentation strategies such as mix-up and cut-mix can be discovered
  • Could investigate if automating the discovery of learning rate schedules is possible
  • Could investigate if automating the process of making pre-trained models smaller is possible
  • Could investigate the impact of the iteration process on vulnerability to adversarial and backdoor attacks
  • Could investigate the impact of the iteration process on explainability
  • Could investigate the impact of the iteration process on transferability of learned features to downstream tasks
  • Could investigate the impact of the iteration process on semantic segmentation
  • Could investigate the impact of the iteration process on creative tasks such as super-resolution, denoising, colorization, and style transfer
  • Could investigate the impact of the iteration process on human pose estimation
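
To make the point about alternative performance metrics concrete, here is a minimal sketch that computes precision, recall, and F1 for a binary task. The function name and the toy labels are hypothetical. The example is chosen so that accuracy and F1 disagree, which is why the choice of metric can change what the iteration step selects.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical imbalanced labels: 3 positives, 5 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]

acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(acc, precision, recall, f1)
```

On these labels the accuracy is 0.625 while F1 is 0.4, so a search guided by accuracy alone would overrate this classifier on the positive class.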

Beyond supervised learning

  • Moving beyond supervised learning to semi-supervised, self-supervised, unsupervised, few-shot, federated, reinforcement, and physics-informed learning
  • Natural Language Processing: using pre-trained large language models on unlabeled text data, fine-tuning on downstream tasks, using perplexity as a measure of the goodness of language models
  • Multimodal Learning: translating images to text, text to image synthesis, visual question answering, training large language models on both text and images
  • Generative Networks: maximizing likelihood or minimizing distance/divergence between training data and model predictions, using Inception score or Fréchet Inception Distance to guide iteration process
  • Domain Adaptation: minimizing risk of making errors on target data, using unsupervised hyper-parameter selection techniques
  • Few-shot Learning: handling new labels given few observations, using N-way K-shot classification accuracy to guide iteration phase

Concluding remarks

  • AI evangelizes automation
  • AI algorithms generate insights and predictions without human intervention
  • Iteration process involves model engineering and management, experiment tracking, dataset versioning and augmentation
  • Iteration process is typically carried out by highly-trained specialists
  • AutoML can streamline part of the work
  • AutoML has demonstrated promise in solving simple supervised learning problems, in particular (image) classification