Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Formulates machine learning mechanism as bi-level optimization problem
- Inner level optimization loop minimizes loss function on training data
- Outer level optimization loop maximizes performance metric on validation data
- Entails model engineering, experiment tracking, dataset versioning, etc.
- Automated via AutoML or left to intuition of ML students, engineers, researchers
- Need to reduce computational cost and carbon footprint of developing AI algorithms
- Considers supervised, semi-supervised, self-supervised, and unsupervised learning, among other settings
- Surfaces open problems in the field
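The bi-level formulation summarized above can be written compactly. This is a sketch rather than the paper's exact notation: L_train and w follow the Background section, while λ (the hyper-parameters), M_val (the performance metric on validation data), and w*(λ) (the trained parameters for a given λ) are symbols assumed here for illustration.

```latex
% Outer level: choose hyper-parameters that maximize the validation metric.
% Inner level: for a fixed \lambda, train the model by minimizing the training loss.
\lambda^{*} = \arg\max_{\lambda} \; M_{\mathrm{val}}\bigl(w^{*}(\lambda), \lambda\bigr)
\quad \text{subject to} \quad
w^{*}(\lambda) = \arg\min_{w} \; L_{\mathrm{train}}(w, \lambda)
```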
Paper Content
Introduction
- Data is the source code for machine learning
- Model is written to fit the data and trained
- Model is evaluated using performance metric
- Iteration step is repeated to find most performant solution
- Model is tested on test data before production
- Iteration step is computationally expensive and increases carbon footprint
Background
- Trying to solve bi-level optimization problems
- Hyper-parameters of the model include learning rate, depth and width of neural networks, normalization layers, convolutional kernel sizes, dropout, etc.
- Performance metric (e.g. accuracy) is a discrete and non-differentiable function of hyper-parameters
- Loss function is denoted by L_train, model parameters by w
- Solutions to the problem are complex and case specific
- Solutions include intuition, tools, grid search, random search, Bayesian optimization, reinforcement learning, evolutionary algorithms
- Computational cost is high, carbon footprint is high
- Trade off computation for more memory consumption
- Approximate validation function with differentiable function
- Compromise between automated and expert-in-the-loop solutions
- Performance metric pursues multiple competing objectives
- Balancing trade-off between optimizing one objective versus the other is an open problem
- Data augmentation policies could be part of the search space
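As a rough illustration of the points above, the sketch below uses random search (one of the listed solutions) as the outer loop and plain gradient descent as the inner loop. Everything in it is an assumption made for the example, not something taken from the paper: the toy 1-D dataset, the logistic-regression model, the two hyper-parameters (`lr`, `weight_decay`), their search ranges, and all function names.

```python
import math
import random


def make_data(n, rng):
    """Toy 1-D binary classification: label is 1 when x plus noise is positive."""
    data = []
    for _ in range(n):
        x = rng.uniform(-3.0, 3.0)
        y = 1 if x + rng.gauss(0.0, 0.5) > 0 else 0
        data.append((x, y))
    return data


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(train_set, lr, weight_decay, steps=200):
    """Inner loop: gradient descent on the (L2-regularized) logistic training loss."""
    a, b = 0.0, 0.0  # model parameters w = (a, b)
    n = len(train_set)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in train_set:
            err = sigmoid(a * x + b) - y
            grad_a += err * x
            grad_b += err
        a -= lr * (grad_a / n + weight_decay * a)
        b -= lr * (grad_b / n)
    return a, b


def accuracy(params, dataset):
    """Validation metric: discrete and non-differentiable in the hyper-parameters."""
    a, b = params
    correct = sum(1 for x, y in dataset if (a * x + b > 0) == (y == 1))
    return correct / len(dataset)


def random_search(train_set, val_set, trials, rng):
    """Outer loop: sample hyper-parameters, train, keep the best validation score."""
    best_score, best_hp, best_params = -1.0, None, None
    for _ in range(trials):
        hp = {
            "lr": 10 ** rng.uniform(-3, 0),            # assumed log-uniform range
            "weight_decay": 10 ** rng.uniform(-4, 0),  # assumed log-uniform range
        }
        params = train(train_set, **hp)
        score = accuracy(params, val_set)
        if score > best_score:
            best_score, best_hp, best_params = score, hp, params
    return best_score, best_hp, best_params
```

Calling `random_search(make_data(200, rng), make_data(100, rng), trials=10, rng=rng)` with `rng = random.Random(0)` runs ten full inner-loop trainings. Each outer-loop step therefore costs a full training run, which is where the high computational cost noted above comes from.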
Open problems
- Iteration step is a bi-level optimization problem
- Increasing complexity from supervised to unsupervised learning
- Exposes open challenges in the field
Supervised learning
- Most progress in AutoML is in supervised learning, particularly image classification
- Performance metric is usually accuracy, or a differentiable approximation of it
- Could investigate if techniques work with other performance metrics, such as precision, recall, F1 score, calibration
- Could investigate if techniques are memory efficient
- Could investigate if techniques generalize to architectures such as transformers or multi-layer perceptrons
- Could investigate if data augmentation strategies such as mix-up and cut-mix can be discovered
- Could investigate if automating the discovery of learning rate schedules is possible
- Could investigate if automating the process of making pre-trained models smaller is possible
- Could investigate the impact of the iteration process on vulnerability to adversarial and backdoor attacks
- Could investigate the impact of the iteration process on explainability
- Could investigate the impact of the iteration process on transferability of learned features to downstream tasks
- Could investigate the impact of the iteration process on semantic segmentation
- Could investigate the impact of the iteration process on creative tasks such as super-resolution, denoising, colorization, and style transfer
- Could investigate the impact of the iteration process on human pose estimation
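For the alternative performance metrics named in this list, a minimal sketch of precision, recall, and F1 score for binary labels is given below. The function name, the `positive` argument, and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for one positive class.

    Assumed convention: a metric with a zero denominator is reported as 0.0.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

Like accuracy, these are computed from discrete counts, so they are also non-differentiable with respect to the hyper-parameters. Any of them could replace accuracy as the quantity the outer loop maximizes on validation data.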
Beyond supervised learning
- Moving beyond supervised learning to semi-supervised, self-supervised, unsupervised, few-shot, federated, reinforcement, and physics-informed learning
- Natural Language Processing: using pre-trained large language models on unlabeled text data, fine-tuning on downstream tasks, using perplexity as a measure of the goodness of language models
- Multimodal Learning: translating images to text, text to image synthesis, visual question answering, training large language models on both text and images
- Generative Networks: maximizing likelihood or minimizing distance/divergence between training data and model predictions, using Inception score or Fréchet Inception Distance to guide iteration process
- Domain Adaptation: minimizing risk of making errors on target data, using unsupervised hyper-parameter selection techniques
- Few-shot Learning: handling new labels given few observations, using N-way K-shot classification accuracy to guide iteration phase
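Perplexity, mentioned above as a measure of the goodness of language models, is the exponential of the average negative log-likelihood per token. The sketch below assumes the model's natural-log probabilities for each observed token are already available as a list. The function name and input format are hypothetical.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities of the observed tokens."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)
```

As a sanity check, a model that assigns probability 1/4 to every observed token has a perplexity of about 4, and lower values indicate a better fit to the text.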
Concluding remarks
- AI evangelizes automation
- AI algorithms generate insights and predictions without human intervention
- Iteration process involves model engineering and management, experiment tracking, dataset versioning and augmentation
- Iteration process is typically carried out by highly-trained specialists
- AutoML can streamline part of the work
- AutoML has demonstrated promise in solving simple supervised learning problems, in particular (image) classification