Abstract
- TESS mission produces large amount of time series data
- Deep learning techniques used to differentiate promising astrophysical eclipsing candidates from other phenomena
- Dataset curated using manual review process and used to train neural network
- Neural network achieves 99.6% recall and 75.7% precision
- Neural network able to recover 3577 out of 4140 TOIs
Paper Content
Introduction
- Human judgement has been used to detect exoplanets for 30 years
- Exoplanets are hard to detect due to their size and faintness
- Historically, human vetters classified planet signals as either false positives or viable planet candidates
- Humans are slow and inconsistent when classifying planet signals
- Machine learning has become a popular tool for identifying planet candidates
- Astronet-Triage was used in the TESS Quick-Look Pipeline to triage planet candidates
- Astronet-Triage-v2 was created to reduce the number of lost planet candidates while throwing out more false positives
- Input transit signals and corresponding light curves were used for training and testing the classifier
- Data was processed before being input to the neural network classifier
- Neural network architecture and training process were described
- Results of the classifier were quantified and presented
- Implications of the results were discussed
Data
- Used 25000 human vetted transit signals for training and testing model
- Signals detected by Quick-Look Pipeline (QLP)
TCEs from TESS FFIs
- TESS collected full-frame images every 30 minutes for 2 years
- FFI cadence was updated to 10 minutes for 1st Extended Mission
- QLP produces light curves from images for targets in TIC with TESS-band magnitude brighter than 13.5
- Flux time series extracted for each star from five different sized circular apertures
- Low-frequency variability removed by dividing light curve from each orbit by basis spline
- Detrended light curves merged with previous TESS sectors
- Optimal aperture selected for target star based on TESS magnitude
- BLS algorithm used to search for transit signals
- Transit signals with signal-to-pink-noise > 9 and BLS peak significance > 5/9 are selected as TCEs
- Signals with semi-major axis to stellar radius ratio < 1 labeled as inside the star
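The semi-major axis to stellar radius check in the last bullet can be illustrated with Kepler's third law. This is a minimal sketch, assuming a circular orbit and a companion of negligible mass; the function names and the choice of solar units are illustrative and are not taken from QLP.

```python
import math

# Physical constants in SI units. GM_SUN is the solar gravitational parameter.
GM_SUN = 1.32712440018e20   # m^3 / s^2
R_SUN = 6.957e8             # m
SECONDS_PER_DAY = 86400.0


def a_over_rstar(period_days, mass_msun, radius_rsun):
    """Semi-major axis divided by stellar radius, from Kepler's third law.

    Assumes the companion mass is negligible compared to the star.
    """
    period_s = period_days * SECONDS_PER_DAY
    a_m = (GM_SUN * mass_msun * period_s ** 2 / (4.0 * math.pi ** 2)) ** (1.0 / 3.0)
    return a_m / (radius_rsun * R_SUN)


def is_inside_star(period_days, mass_msun, radius_rsun):
    """True when the implied orbit would lie within the stellar surface."""
    return a_over_rstar(period_days, mass_msun, radius_rsun) < 1.0
```

For a Sun-like star, a reported period of about an hour implies an orbit inside the star, while a period of a few days does not.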
Assembling a set of signals to label
- Labeling every TCE would take a lot of time
- Three batches of labeled TCEs were collected from the first two years of TESS Primary Mission and the first year of the TESS 1st Extended Mission
- 8992 TCEs were selected from Sector 13 for labeling
- 13372 brightest TCEs were selected from Sectors 14-26
- 2588 TCEs were added from Sectors 27-39
- Final TCE distribution is shown in Figures 1 and 2
Labels and their definitions
- Assigned one of five labels: E, S, B, J, N
- E denotes periodic eclipsing signal (planetary transits and non-contact eclipsing binaries)
- S denotes single transit or incorrect period
- B denotes contact eclipsing binaries
- J denotes junk (astrophysical and instrumental phenomena)
- N denotes not sure
Labeling process
- Labels assigned to targets based on human-visual representations
- Targets with conflicting labels discussed to reach consensus
- Weights assigned to labels with only B, J, or N votes
- Process took over 2 years
- Majority of labels are J
- Comparable numbers of signals identified as eclipsing objects (E) and contact binaries (B)
- Majority of TCEs with period smaller than 0.5 days not caused by eclipses
- Majority of shallow events with period longer than 10 days not caused by eclipses
- Clear pile-up of TCEs at TESS orbital period and its alias not caused by eclipses
- Majority of TCEs with extremely short/long transit duration not caused by eclipses
Model input representations
- Pass raw flux time series to neural network
- Pass relevant information about detected periodic signal and target star to neural network
Time series data
- Preprocess raw flux time series
- Mask out transit signals
- Use multiple detrending settings
- Generate 7 different plots/views
- Bin data with robust binning technique
- Normalize binned data
- Global View: full light curve folded on reported period
- Local View: points within two transit durations of transit center
- Secondary View: most significant secondary transit
- Local Half-Period View: folded at half detected period
- Global Double Period View: folded at twice period of global view
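The folding and binning behind these views can be sketched as follows. This is an illustration only: median binning stands in for the unspecified robust binning technique, and the function names, window arguments, and bin counts are hypothetical rather than the paper's settings.

```python
from statistics import median


def phase_fold(times, period, t0):
    """Return each time as an offset from the nearest transit center.

    Offsets fall in the interval [-period/2, period/2).
    """
    half = period / 2.0
    return [((t - t0 + half) % period) - half for t in times]


def median_bin(phases, fluxes, lo, hi, num_bins):
    """Median-bin flux over [lo, hi); empty bins are returned as None."""
    width = (hi - lo) / num_bins
    buckets = [[] for _ in range(num_bins)]
    for p, f in zip(phases, fluxes):
        if lo <= p < hi:
            idx = min(int((p - lo) / width), num_bins - 1)
            buckets[idx].append(f)
    return [median(b) if b else None for b in buckets]


def make_view(times, fluxes, period, t0, half_width, num_bins):
    """Fold on `period`, then bin the window [-half_width, half_width)."""
    phases = phase_fold(times, period, t0)
    return median_bin(phases, fluxes, -half_width, half_width, num_bins)
```

Under these assumptions, a global view uses `half_width = period / 2`, a local view uses `half_width = 2 * duration`, and the half-period and double-period views call `make_view` with `period / 2` and `2 * period` as the folding period.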
Scalar data
- Uses scalar values to describe transit, host star and light curve
- Transit features include period, duration, depth and number of full periods
- Host star features include TESS magnitude, mass and radius
- Estimate radius using distance, apparent magnitude and color/temperature/bolometric corrections
- Light curve features include total number of points
- Normalize scalar values to zero mean and unit variance, except for number of full periods which is truncated and log-scaled
- Include detected phase of secondary eclipse and scaling factor when normalizing views
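The scalar normalization described above can be sketched in a few lines. The cap of 100 periods and the use of `log1p` are assumptions made for illustration; the summary states only that the number of full periods is truncated and log-scaled.

```python
import math
from statistics import mean, pstdev


def standardize(values):
    """Shift and scale values to zero mean and unit (population) variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def log_scale_truncated(n_periods, cap=100):
    """Clip the count to [0, cap], then apply log(1 + n).

    `cap` is a hypothetical truncation value, not taken from the paper.
    """
    clipped = min(max(n_periods, 0), cap)
    return math.log1p(clipped)
```

In practice the mean and standard deviation would be computed on the training set and reused unchanged for validation and test data.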
Neural network architecture
- Model uses convolutional neural network architecture from Astronet
- Features grouped together and passed through convolutional tower
- Convolutional tower consists of convolutional layers with ReLU activation and pooling layers
- Output of each tower flattened into vector shape
- Flattened outputs concatenated with auxiliary inputs to form input for fully-connected tower
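The data flow through a tower can be sketched in plain Python as a toy example. It uses a single convolution layer with hand-set kernels, whereas the real model stacks several learned layers in a deep-learning framework; all names and sizes here are hypothetical.

```python
def conv1d(signal, kernel, bias=0.0):
    """Valid 1-D cross-correlation, the operation used by 'conv' layers."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]


def relu(xs):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, x) for x in xs]


def max_pool(xs, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]


def conv_tower(view, kernels, pool_size=2):
    """Apply conv -> ReLU -> pool for each kernel and flatten the results."""
    flat = []
    for kernel in kernels:
        flat.extend(max_pool(relu(conv1d(view, kernel)), pool_size))
    return flat


def build_dense_input(views, kernels, aux_scalars):
    """Concatenate the flattened tower outputs with the auxiliary scalars."""
    features = []
    for view in views:
        features.extend(conv_tower(view, kernels))
    return features + list(aux_scalars)
```

The list returned by `build_dense_input` plays the role of the vector handed to the fully-connected tower.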
Training
- Trained model using Adam optimization for 20,000 steps
- Binary cross-entropy loss used as loss function
- Model produces independent scores for each label
- Weight of label determined by majority of votes
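A weighted binary cross-entropy over independent label scores might look like the sketch below. The per-label weights are treated as given inputs, since how they are derived from the votes is not spelled out here; the dictionary layout and the clipping constant `eps` are assumptions.

```python
import math

LABELS = ("E", "S", "B", "J", "N")


def weighted_bce(scores, targets, weights, eps=1e-7):
    """Mean weighted binary cross-entropy over independent label scores.

    scores, targets and weights are dicts keyed by label name.
    """
    total = 0.0
    for label in LABELS:
        p = min(max(scores[label], eps), 1.0 - eps)  # avoid log(0)
        y = targets[label]
        total += -weights[label] * (y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(LABELS)
```

Because each label has its own sigmoid-style score, the scores need not sum to one, which matches the bullet on independent scores.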
Prediction and ensembling
- Model outputs prediction score for each label
- If “E” label score exceeds threshold, model predicts “E”
- Ensemble of 10 models constructed; if any member predicts “E”, the ensemble prediction is “E”
- Otherwise, ensemble prediction is majority of models, ties broken at random
- Primarily interested in “E” label, other labels used to encourage network to learn natural representations
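The prediction and ensembling rules above can be sketched as follows. The fallback in `predict_label` (choosing the highest-scoring non-E label when E does not clear the threshold) is an assumption; the function names are hypothetical.

```python
import random
from collections import Counter


def predict_label(scores, e_threshold):
    """Single-model rule: 'E' if its score clears the threshold.

    Otherwise fall back to the highest-scoring other label (an assumption).
    """
    if scores["E"] > e_threshold:
        return "E"
    others = {k: v for k, v in scores.items() if k != "E"}
    return max(others, key=others.get)


def ensemble_predict(all_scores, e_threshold, rng=random):
    """Ensemble rule: 'E' if any member predicts 'E', else majority vote."""
    votes = [predict_label(s, e_threshold) for s in all_scores]
    if "E" in votes:
        return "E"
    counts = Counter(votes)
    best = max(counts.values())
    tied = sorted(label for label, n in counts.items() if n == best)
    return rng.choice(tied)  # ties broken at random
```

Letting a single member trigger an "E" prediction favors recall over precision, which suits a triage step that is followed by human vetting.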
Results
- Used precision and recall to evaluate performance
- Precision is the number of true positives divided by the sum of true positives and false positives
- Recall is the number of true positives divided by the sum of true positives and false negatives
- High precision means fewer false positives; high recall means successful recovery of more planet candidates
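The two definitions above translate directly into code. This is a minimal sketch for a single positive class at a fixed score threshold; the function name and the boolean label encoding are illustrative.

```python
def precision_recall(true_labels, scores, threshold):
    """Precision and recall for the positive class at a score threshold.

    true_labels holds booleans (True = positive); scores holds model outputs.
    """
    tp = fp = fn = 0
    for is_pos, score in zip(true_labels, scores):
        predicted_pos = score > threshold
        if predicted_pos and is_pos:
            tp += 1
        elif predicted_pos:
            fp += 1
        elif is_pos:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall
```

Lowering the threshold can only keep recall the same or raise it, usually at the cost of precision, which is the trade-off behind reporting results at several thresholds.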
Performance on validation and test sets
- Obtained AUC-PR value of 0.977 on validation dataset
- Obtained AUC-PR value of 0.965 on test set
- 100% recall at 41% precision with prediction threshold of 0.0105 on validation set
- 96.9% recall at 79.8% precision with prediction threshold of 0.215 on validation set
- 100% recall at 15% precision with prediction threshold of 0.0005 on test set
- 99.6% recall at 39.7% precision with prediction threshold of 0.0105 on test set
- 97.2% recall at 75.7% precision with prediction threshold of 0.215 on test set
Generalizing to TESS 1st Extended Mission data
- Astronet-Triage-v2 is trained on previously observed sectors to classify new observations taken by TESS
- 90% of training dataset comes from TESS Primary Mission; QLP data from TESS 1st Extended Mission used to test generalization
- Random sample of 759 targets with T mag < 11 from camera 1 and 590 targets with 11 < T mag < 13.5 from camera 2 used
- 255 TCEs assigned an E label
- Three model setups applied to Sector 33 dataset: Astronet-Triage, Astronet-Triage-v2, and 3 independent instances of Astronet-Triage-v2
- S-labeled data removed from precision and recall calculations
- Astronet-Triage-v2 improves on Astronet-Triage, with AUC-PR scores of 0.961 and 0.927, respectively
- Models trained on Y1, Y2, and Y3 data perform similarly to Astronet-Triage
- Supports Astronet-Triage-v2’s ability to generalize to future sectors
Performance on the TOI catalog
- TESS Objects of Interest (TOI) catalog is a useful benchmark for high-confidence E or S labels
- Astronet-Triage-v2 provides higher precision even when trained only on Primary Mission data taken during Y1 or Y2
- A good model should label all TOI entries as E or S
- After evaluating all TOI signals with Astronet-Triage-v2, 93% of the TOIs have E scores > 0.0105
- Astronet-Triage-v2 passes 86% of the TOIs at a cutoff of 0.215
- Astronet-Triage-v2 performance is better on known, confirmed, or validated planets compared to planet candidates
- Astronet-Triage recovers 3349 TOIs at a threshold of 0.09
- Astronet-Triage-v2 recovers 3577 TOIs at a threshold of 0.2
Use in producing the TOI catalog
- Astronet-Triage-v2 was developed to improve on Astronet-Triage and reduce the number of planet candidates lost when searching for TOIs via QLP.
- Astronet-Triage-v2 is expected to save many planet candidates without adding false positives or increasing the hours needed for human TOI vetting.
- Astronet-Triage-v2 has officially replaced Astronet-Triage within QLP.
- Astronet-Triage-v2 is not yet developed enough for population statistics.
What is limiting our precision?
- False negatives can be caused by patterns with borderline label assessments
- Examples of ambiguous patterns include eclipsing binaries, noisy transits, and transits on a background of high stellar variability
- Errors in period and duration values estimated by BLS can lead to de-trending distortions
- Phase folding and binning processes are lossy and can cause a loss of precision
Comparison to other works
- Astronet-Triage was trained and tested on Sectors 1-5; Astronet-Triage-v2 was trained and tested on Sectors 1-39.
- Astronet-Triage used 16,516 labeled TCEs; Astronet-Triage-v2 used 24,926 TCEs.
- Astronet-Triage used labels from one vetter; Astronet-Triage-v2 used labels from 3-5 vetters.
- Astronet-Triage-v2 should have more reliable labels.
Applications to exoplanet population statistics
- Planet catalogs can be used to estimate occurrence rates of exoplanets
- Characterization of catalog completeness and reliability is important for occurrence rate studies
- Kepler mission used a fully automated pipeline to characterize completeness and reliability
- TESS does not yet have a fully automated pipeline
- Astronet-Triage-v2 is an important step towards uniformly vetted FFI planet catalogs
- Future improvements to Astronet-Triage-v2 expected to improve precision and recall of resulting planet catalog
Further improvements to the neural network
- Deep learning classifiers have seen success due to increasing size of training datasets
- Training dataset for this work is relatively small and has a large class imbalance
- Data augmentation techniques can be used to increase training dataset without obtaining new labelled data
- Data augmentation methods such as reversing or clipping light curves and applying random Gaussian noise can help reduce over-fitting
- More complex augmentation methods, such as fitting a model to the minority-class light curves, can also help compensate for limited data
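Two of the simple augmentations mentioned above, reversing a light curve and adding Gaussian noise, can be sketched as follows. The noise level `sigma` and the function names are hypothetical; this is suggested future work in the paper, not a description of its training pipeline.

```python
import random


def reverse_curve(fluxes):
    """Time-reverse a light curve."""
    return list(reversed(fluxes))


def add_gaussian_noise(fluxes, sigma, rng):
    """Add independent zero-mean Gaussian noise to every flux value."""
    return [f + rng.gauss(0.0, sigma) for f in fluxes]


def augment(fluxes, sigma=0.01, seed=None):
    """Return the original curve plus a reversed copy and a noisy copy."""
    rng = random.Random(seed)
    return [list(fluxes), reverse_curve(fluxes), add_gaussian_noise(fluxes, sigma, rng)]
```

Each augmented copy keeps the label of the original curve, so the minority classes can be enlarged without new human vetting.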
Conclusion
- Astronet-Triage-v2 is the next in a line of Astronet architectures
- First used for Kepler, then extended to K2 and TESS
- Improvements over Astronet-Triage include larger and more robust training set, expanded list of possible classifications, and more views used to analyze each signal
- 86% recall at a cutoff of 0.215 compared to 82% recall by Astronet-Triage
- Better recall of E and S labels than Astronet-Triage for similar (or better) levels of precision
- Replaced Astronet-Triage within QLP starting in Sector 34
- Trained and tuned on Google Compute Engine
- Used data from ESA mission Gaia
- Dataset focused on brightest TCEs in Y1 and Y2, added TCEs more uniformly across magnitudes in Y3
- Distribution of labels: E, S, B, J, N
- Scatterplots of transit depth, planet radii, transit duration vs. orbital period
- Test set of 2516 targets used for final evaluation
- Figure 17 shows incorrect BLS estimation