This paper is the result of an effort that started back during my PhD thesis. In the early 2010s we began investigating a way to automate the spectral classification of Be X-ray binaries (BeXBs). The problem with these sources is that, because of the strong emission in the Balmer lines, those lines cannot be used as characteristic features for the corresponding spectral classes. A different automated approach is therefore needed (based on the classification scheme that we developed in Maravelias et al. 2014). We started with a rather small sample of well-classified OB stars in the Galaxy and the Small Magellanic Cloud and implemented a Naive Bayesian classifier, which proved to work very well. However, more tests and a larger sample were needed before proceeding to a publication, and since my time was limited I kept postponing the project.
Finally, Elias Kyritsis showed up as a graduate student willing to take this on. After a successful undergraduate thesis on the spectral classification of BeXBs in the Large Magellanic Cloud, Elias moved from visual inspection to the automated approach. He succeeded on many fronts: drastically increasing the sample, trying, optimizing, and developing a different machine-learning approach, improving the line measurements, and submitting the paper to A&A. His tremendous effort has finally paid off!
I am really excited about this journey and his accomplishment. Without his help this project would have been delayed a loooooot, at the very least! Thanks Elia!
A new automated tool for the spectral classification of OB stars
E. Kyritsis, G. Maravelias, A. Zezas, P. Bonfini, K. Kovlakas, P. Reig
(abridged) We develop a tool for the automated spectral classification of OB stars according to their sub-types. We use the regular Random Forest (RF) algorithm, the Probabilistic RF (PRF), and we introduce the KDE-RF method which is a combination of the Kernel-Density Estimation and the RF algorithm. We train the algorithms on the Equivalent Width (EW) of characteristic absorption lines (features) measured in high-quality spectra from large Galactic (LAMOST, GOSSS) and extragalactic surveys (2dF, VFTS) with available spectral-types and luminosity classes. We find that the overall accuracy score is ∼70% with similar results across all approaches. We show that the full set of 17 spectral lines is needed to reach the maximum performance per spectral class. We apply our model to other observational data sets providing examples of potential application of our classifier on real science cases. We find that it performs well for both single massive stars and for the companion massive stars in Be X-ray binaries. In addition, we propose a reduced 10-feature scheme that can be applied to large data sets with lower S/N. The similarity in the performances of our models indicates the robustness and the reliability of the RF algorithm when it is used for the spectral classification of early-type stars. The score of ∼70% is high if we consider (a) the complexity of such multi-class classification problems, (b) the intrinsic scatter of the EW distributions within the examined spectral classes, and (c) the diversity of the training set since we use data obtained from different surveys with different observing strategies. In addition, the approach presented in this work is applicable to data of different quality and of different format (e.g., absolute or normalized flux) while our classifier is agnostic to the Luminosity Class of a star and, as much as possible, metallicity independent.
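For readers less familiar with the features mentioned in the abstract, here is a minimal sketch of how an equivalent width can be measured, using the standard definition EW = ∫ (1 − F/F_c) dλ. It is not the measurement code from the paper. The function name `equivalent_width`, the synthetic Gaussian line (placed at 4471 Å purely for illustration), and all numerical values are assumptions made for this example.

```python
import math


def equivalent_width(wavelengths, flux, continuum):
    """Trapezoidal estimate of EW = integral of (1 - F/Fc) d(lambda).

    Hypothetical helper for illustration; positive values mean absorption.
    """
    if not (len(wavelengths) == len(flux) == len(continuum)):
        raise ValueError("wavelengths, flux and continuum must have equal length")
    # Fractional line depth at each pixel, relative to the local continuum.
    depth = [1.0 - f / c for f, c in zip(flux, continuum)]
    ew = 0.0
    for i in range(len(wavelengths) - 1):
        step = wavelengths[i + 1] - wavelengths[i]
        ew += 0.5 * (depth[i] + depth[i + 1]) * step
    return ew


# Synthetic Gaussian absorption line (all values are illustrative assumptions).
centre, sigma, line_depth = 4471.0, 1.0, 0.3  # Angstrom, Angstrom, fraction
wl = [centre - 10.0 + 0.05 * i for i in range(401)]
norm_flux = [1.0 - line_depth * math.exp(-0.5 * ((w - centre) / sigma) ** 2) for w in wl]

# Case 1: normalized spectrum, continuum equal to 1 everywhere.
ew_norm = equivalent_width(wl, norm_flux, [1.0] * len(wl))

# Case 2: the same line in arbitrary absolute flux units.
scale = 3.2e-14
abs_flux = [scale * f for f in norm_flux]
ew_abs = equivalent_width(wl, abs_flux, [scale] * len(wl))

# Analytic EW of a Gaussian line: depth * sigma * sqrt(2 * pi).
analytic = line_depth * sigma * math.sqrt(2.0 * math.pi)
print(ew_norm, ew_abs, analytic)
```

Because the flux is divided by the local continuum, the two cases give the same EW. This is one way to see why an EW-based classifier can accept both absolute and normalized spectra, provided the continuum is estimated sensibly.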