A Machine Learning Technique to Identify Transit Shaped Signals
Daniel Wysocki • journal_club
Paper
“A Machine Learning Technique to Identify Transit Shaped Signals” by Thompson et al.
http://arxiv.org/pdf/1509.00041.pdf
Summary
Supervised machine learning is used to automatically identify exoplanets from Kepler light curves.
Figures
Figure 1 shows some transit-like TCEs, folded on the left, and binned on the right. Figure 2 shows the same for non-transit-like TCEs.
Figure 4 shows a histogram of log(T_LPP) for transit-like and non-transit-like TCEs.
Figure 5 shows a histogram of log(T_LPP) for injected transits, divided into pass/fail.
Procedure
Binning
(Summarized in Section 3.2, page 3, 3rd paragraph in the right column)
- Each star’s light curve is folded on the Threshold Crossing Event (TCE) periods provided by the NExScI archive
- An equal number of bins near the TCE are chosen for each light curve
- Bins are selected such that 51 lie within, and 90 outside the transit
- (See Section 3.5 on page 5 for details)
- The mean of each bin is used as the magnitude
- Points are sorted by phase
- Points are normalized such that the minimum occurs at -1
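The binning steps above can be sketched roughly as follows. This is only an illustration, not the paper's code: the function name `fold_and_bin`, the parameters `half_width` and `window` (both in units of phase), the even left/right split of the 90 out-of-transit bins, and the use of the median as the baseline for normalization are all assumptions made here.

```python
import numpy as np

def fold_and_bin(time, flux, period, epoch, half_width, window,
                 n_in=51, n_out=90):
    """Fold a light curve on a TCE period and bin it around the transit.

    half_width and window are in phase units (hypothetical parameters):
    the transit is assumed to span [-half_width, half_width] and the
    binned region to span [-window, window].
    """
    # Phase in [-0.5, 0.5), with the transit centred on phase 0.
    phase = ((time - epoch) / period + 0.5) % 1.0 - 0.5

    # n_in bins inside the transit, n_out bins split evenly on either side.
    side = n_out // 2
    edges = np.concatenate([
        np.linspace(-window, -half_width, side + 1)[:-1],
        np.linspace(-half_width, half_width, n_in + 1),
        np.linspace(half_width, window, side + 1)[1:],
    ])

    # Mean flux in each bin; bins are already ordered by phase.
    idx = np.digitize(phase, edges) - 1
    n_bins = edges.size - 1
    binned = np.full(n_bins, np.nan)  # empty bins stay NaN
    for b in range(n_bins):
        members = flux[idx == b]
        if members.size:
            binned[b] = members.mean()

    # Assumed normalization: median baseline at 0, minimum at -1.
    binned = binned - np.nanmedian(binned)
    return binned / abs(np.nanmin(binned))
```

With the defaults this returns 141 values per TCE (51 in transit, 90 outside), so every light curve ends up as a vector of the same length regardless of its period or duration.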
Dimensionality Reduction
(Summarized in Section 3.3, beginning of page 5)
- Locality preserving projections (LPP) are used to project the binned light curves into a lower-dimensional space
- Algorithm is similar to PCA
- Less sensitive to outliers
- Better at preserving locality (hence the name) for methods like k-NN
- Produces a 20-dimensional feature vector for each event
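To make the PCA comparison concrete, here is a minimal sketch of a basic LPP in NumPy. It follows the textbook formulation (a nearest-neighbor graph with heat-kernel weights, then a generalized eigenproblem) and is not taken from the paper: the names `lpp_fit` and `lpp_transform`, the neighbor count, the kernel scale, and the small ridge term `reg` are all assumptions.

```python
import numpy as np

def lpp_fit(X, n_components=20, n_neighbors=5, reg=1e-6):
    """Fit a basic LPP; returns (mean, projection matrix)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    n, dim = Xc.shape

    # Pairwise squared Euclidean distances between samples.
    d2 = ((Xc[:, None, :] - Xc[None, :, :]) ** 2).sum(axis=-1)
    t = np.median(d2[d2 > 0])  # heat-kernel scale (a heuristic choice)

    # Symmetric nearest-neighbor graph with heat-kernel weights.
    order = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]  # skip self
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = order.ravel()
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-d2[rows, cols] / t)
    W = np.maximum(W, W.T)
    D = np.diag(W.sum(axis=1))

    # Generalized eigenproblem: (X^T L X) a = lambda (X^T D X) a, L = D - W.
    A = Xc.T @ (D - W) @ Xc
    B = Xc.T @ D @ Xc
    B += reg * np.trace(B) / dim * np.eye(dim)  # keep B positive definite

    # Reduce to a standard symmetric problem via Cholesky, B = C C^T.
    C = np.linalg.cholesky(B)
    Y = np.linalg.solve(C, A)
    M = np.linalg.solve(C, Y.T).T
    _, U = np.linalg.eigh((M + M.T) / 2)  # eigenvalues in ascending order

    # The smallest eigenvalues give the locality-preserving directions.
    P = np.linalg.solve(C.T, U[:, :n_components])
    return mean, P

def lpp_transform(X, mean, P):
    """Project binned light curves onto the fitted LPP directions."""
    return (X - mean) @ P
```

Where PCA keeps the directions of largest variance, this keeps the directions in which neighboring light curves stay close together, which is the property a distance-based classifier such as k-NN relies on.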
Classification
(Summarized in Section 3.5, pages 5 and 7)
- The k-nearest neighbor (k-NN) algorithm is used to label the events as transit-like or not
- Training data are known transit-like events from Kepler
- (Section 3.4, page 5)
- k = 15
- Distance is measured by the Euclidean 2-norm
- T_LPP is the mean of the k distances
- If T_LPP for a given TCE is within the range of values for known transit-like events, it is labeled as one
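The T_LPP statistic and the labeling rule can be sketched as below. The notes do not say how "within the range of values" is turned into a cutoff, so the percentile threshold used here is an assumption, as are the function names `t_lpp` and `classify`.

```python
import numpy as np

def t_lpp(train, query, k=15, exclude_self=False):
    """Mean Euclidean distance from each query point to its k nearest
    training points (known transit-like events)."""
    # Full distance matrix, shape (n_query, n_train).
    dist = np.linalg.norm(query[:, None, :] - train[None, :, :], axis=2)
    dist = np.sort(dist, axis=1)
    # When scoring the training set against itself, drop the zero
    # distance from each point to itself.
    start = 1 if exclude_self else 0
    return dist[:, start:start + k].mean(axis=1)

def classify(train, tces, k=15, percentile=99.0):
    """Label TCEs as transit-like when their T_LPP falls within the range
    spanned by the training events (assumed cutoff: a high percentile)."""
    train_scores = t_lpp(train, train, k, exclude_self=True)
    threshold = np.percentile(train_scores, percentile)
    scores = t_lpp(train, tces, k)
    return scores, scores <= threshold
```

Here `train` and `tces` are arrays of 20-dimensional feature vectors. A small T_LPP means the event sits inside the cloud of known transits; a large one means it is far from anything transit-shaped.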
Results
- Removes over 90% of non-transiting candidates from Kepler data, and retains over 99% of known transits
- Loses 1% of injected transits
- (Injection described in Section 3.6, page 7)