Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • TESS mission produces large amount of time series data
  • Deep learning techniques used to differentiate promising astrophysical eclipsing candidates from other phenomena
  • Dataset curated using manual review process and used to train neural network
  • Neural network achieves 99.6% recall and 75.7% precision
  • Neural network able to recover 3577 out of 4140 TOIs

Paper Content

Introduction

  • Human judgement has been used to detect exoplanets for 30 years
  • Exoplanets are hard to detect due to their size and faintness
  • Historically, humans have been used to classify planet signals as either false positives or viable planet candidates
  • Humans are slow and inconsistent when classifying planet signals
  • Machine learning has become a popular tool for identifying planet candidates
  • Astronet-Triage was used in the TESS Quick-Look Pipeline to triage planet candidates
  • Astronet-Triage-v2 was created to reduce the number of lost planet candidates while throwing out more false positives
  • Input transit signals and corresponding light curves were used for training and testing the classifier
  • Data was processed before being input to the neural network classifier
  • Neural network architecture and training process were described
  • Results of the classifier were quantified and presented
  • Implications of the results were discussed

Data

  • Used 25000 human vetted transit signals for training and testing model
  • Signals detected by Quick-Look Pipeline (QLP)

Tces from tess ffis

  • TESS collected full-frame images every 30 minutes for 2 years
  • FFI cadence was updated to 10 minutes for 1st Extended Mission
  • QLP produces light curves from images for targets in TIC with TESS-band magnitude brighter than 13.5
  • Flux time series extracted for each star from five different sized circular apertures
  • Low-frequency variability removed by dividing light curve from each orbit by basis spline
  • Detrended light curves merged with previous TESS sectors
  • Optimal aperture selected for target star based on TESS magnitude
  • BLS algorithm used to search for transit signals
  • Transit signals with signal-to-pink-noise > 9 and BLS peak significance > 5/9 filtered out
  • Signals with semi-major axis to stellar radius ratio < 1 labeled as inside the star

Assembling a set of signals to label

  • Labeling every TCE would take a lot of time
  • Three batches of labeled TCEs were collected from the first two years of TESS Primary Mission and the first year of the TESS 1st Extended Mission
  • 8992 TCEs were selected from Sector 13 for labeling
  • 13372 brightest TCEs were selected from Sectors 14-26
  • 2588 TCEs were added from Sectors 27-39
  • Final TCE distribution is shown in Figures 1 and 2

Labels and their definitions

  • Assigned one of five labels: E, S, B, J, N
  • E denotes periodic eclipsing signal (planetary transits and non-contact eclipsing binaries)
  • S denotes single transit or incorrect period
  • B denotes contact eclipsing binaries
  • J denotes junk (astrophysical and instrumental phenomena)
  • N denotes not sure

Labeling process

  • Labels assigned to targets based on human-visual representations
  • Targets with conflicting labels discussed to reach consensus
  • Weights assigned to labels with only B, J, or N votes
  • Process took over 2 years
  • Majority of labels are J
  • Comparable amount of signals identified as eclipsing objects (E) and contact binaries (B)
  • Majority of TCEs with period smaller than 0.5 days not caused by eclipses
  • Majority of shallow events with period longer than 10 days not caused by eclipses
  • Clear pile-up of TCEs at TESS orbital period and its alias not caused by eclipses
  • Majority of TCEs with extremely short/long transit duration not caused by eclipses

Model input representations

  • Pass raw flux time series to neural network
  • Pass relevant information about detected periodic signal and target star to neural network

Time series data

  • Preprocess raw flux time series
  • Mask out transit signals
  • Use multiple detrending settings
  • Generate 7 different plots/views
  • Bin data with robust binning technique
  • Normalize binned data
  • Global View: full light curve folded on reported period
  • Local View: points within two transit durations of transit center
  • Secondary View: most significant secondary transit
  • Local Half-Period View: folded at half detected period
  • Global Double Period View: folded at twice period of global view

Scalar data

  • Uses scalar values to describe transit, host star and light curve
  • Transit features include period, duration, depth and number of full periods
  • Host star features include TESS magnitude, mass and radius
  • Estimate radius using distance, apparent magnitude and color/temperature/bolometric corrections
  • Light curve features include total number of points
  • Normalize scalar values to zero mean and unit variance, except for number of full periods which is truncated and log-scaled
  • Include detected phase of secondary eclipse and scaling factor when normalizing views

Neural network architecture

  • Model uses convolutional neural network architecture from Astronet
  • Features grouped together and passed through convolutional tower
  • Convolutional tower consists of convolutional layers with ReLU activation and pooling layers
  • Output of each tower flattened into vector shape
  • Flattened outputs concatenated with auxiliary inputs to form input for fully-connected tower

Training

  • Trained model using Adam optimization for 20,000 steps
  • Binary cross-entropy loss used as loss function
  • Model produces independent scores for each label
  • Weight of label determined by majority of votes

Prediction and ensembling

  • Model outputs prediction score for each label
  • If “E” label score exceeds threshold, model predicts “E”
  • Ensemble of 10 models constructed, if any predict “E” then ensemble prediction is “E”
  • Otherwise, ensemble prediction is majority of models, ties broken at random
  • Primarily interested in “E” label, other labels used to encourage network to learn natural representations

Results

  • Used precision and recall to evaluate performance
  • Precision is number of true positives divided by true positives and false positives
  • Recall is number of true positives divided by true positives and false negatives
  • High precision means fewer false positives, high recall means successful recovery of more planet candidates

Performance on validation and test sets

  • Obtained AUC-PR value of 0.977 on validation dataset
  • Obtained AUC-PR value of 0.965 on test set
  • 100% recall at 41% precision with prediction threshold of 0.0105 on validation set
  • 96.9% recall at 79.8% precision with prediction threshold of 0.215 on validation set
  • 100% recall at 15% precision with prediction threshold of 0.0005 on test set
  • 99.6% recall at 39.7% precision with prediction threshold of 0.0105 on test set
  • 97.2% recall at 75.7% precision with prediction threshold of 0.215 on test set

Generalizing to tess 1st extended mission data

  • Astronet-Triage-v2 is trained on previously observed sectors to classify new observations taken by TESS
  • 90% of training dataset comes from TESS Primary Mission, QLP data from TESS 1st Extended Mission used to test generalization
  • Random sample of 759 targets with T mag < 11 from camera 1 and 590 targets with 11 < T mag < 13.5 from camera 2 used
  • 255 TCEs assigned an E label
  • 3 models applied to Sector 33 dataset: Astronet-Triage, Astronet-Triage-v2, 3 independent instances of Astronet-Triage-v2
  • S-labeled data removed from precision and recall calculations
  • Astronet-Triage-v2 improves on Astronet-Triage with AUC-PR scores of 0.961 and 0.927
  • Models trained on Y1, Y2, and Y3 data perform similarly to Astronet-Triage
  • Supports Astronet-Triage-v2’s ability to generalize to future sectors

Performance on the toi catalog

  • TESS Objects of Interest (TOI) catalog is a useful benchmark for high-confidence E or S labels
  • Astronet-Triage-v2 provides higher precision even when trained only on Primary Mission data taken during Y1 or Y2
  • A good model should label all TOI entries as E or S
  • After evaluating all TOI signals with Astronet-Triage-v2, 93% of the TOIs have E scores > 0.0105
  • Astronet-Triage-v2 passes 86% of the TOIs at a cutoff of 0.215
  • Astronet-Triage-v2 performance is better on known, confirmed, or validated planets compared to planet candidates
  • Astronet-Triage recovers 3349 TOIs at a threshold of 0.09
  • Astronet-Triage-v2 recovers 3577 TOIs at a threshold of 0.2

Use in producing the toi catalog

  • Astronet-Triage-v2 was developed to improve on Astronet-Triage and reduce the number of planet candidates lost when searching for TOIs via QLP.
  • Astronet-Triage-v2 is expected to save many planet candidates without adding false positives or increasing the hours needed for human TOI vetting.
  • Astronet-Triage-v2 has officially replaced Astronet-Triage within QLP.
  • Astronet-Triage-v2 is not yet developed enough for population statistics.

What is limiting our precision?

  • False negatives can be caused by patterns with borderline label assessments
  • Examples of ambiguous patterns include eclipsing binaries, noisy transits, and transits on a background of high stellar variability
  • Errors in period and duration values estimated by BLS can lead to de-trending distortions
  • Phase folding and binning processes are lossy and can cause a loss of precision

Comparison to other works

  • Astronet-Triage was trained and tested on Sectors 1-5, Astronet-Triage-v2 was trained and tested on Sectors 1-39.
  • Astronet-Triage used 16,516 labeled TCEs, Astronet-Triage-v2 used 24,926 TCEs.
  • Astronet-Triage used labels from one vetter, Astronet-Triage-v2 used labels from 3-5 vetters.
  • Astronet-Triage-v2 should have more reliable labels.

Applications to exoplanet population statistics

  • Planet catalogs can be used to estimate occurrence rates of exoplanets
  • Characterization of catalog completeness and reliability is important for occurrence rate studies
  • Kepler mission used a fully automated pipeline to characterize completeness and reliability
  • TESS does not yet have a fully automated pipeline
  • Astronet-Triage-v2 is an important step towards uniformly vetted FFI planet catalogs
  • Future improvements to Astronet-Triage-v2 expected to improve precision and recall of resulting planet catalog

Further improvements to the neural network

  • Deep learning classifiers have seen success due to increasing size of training datasets
  • Training dataset for this work is relatively low and has a large class-imbalance
  • Data augmentation techniques can be used to increase training dataset without obtaining new labelled data
  • Data augmentation methods such as reversing or clipping light curves and applying random Gaussian noise can help reduce over-fitting
  • More complex augmentation methods such as fitting a model to the minority class light curves can also help improve limited data

Conclusion

  • Astronet-Triage-v2 is the next in a line of Astronet architectures
  • First used for Kepler, then extended to K2 and TESS
  • Improvements over Astronet-Triage include larger and more robust training set, expanded list of possible classifications, and more views used to analyze each signal
  • 86% recall at a cutoff of 0.215 compared to 82% recall by Astronet-Triage
  • Better recall of E and S labels than Astronet-Triage for similar (or better) levels of precision
  • Replaced Astronet-Triage within QLP starting in Sector 34
  • Trained and tuned on Google Compute Engine
  • Used data from ESA mission Gaia
  • Dataset focused on brightest TCEs in Y1 and Y2, added TCEs more uniformly across magnitudes in Y3
  • Distribution of labels: E, S, B, J, N
  • Scatterplots of transit depth, planet radii, transit duration vs. orbital period
  • Test set of 2516 targets used for final evaluation
  • Figure 17 shows incorrect BLS estimation