Tag Archive



New Paper: A machine-learning photometric classifier for massive stars in nearby galaxies I. The method

This is the first paper resulting from my work with the ASSESS team over the last few years. It focuses on the development of a machine-learning photometric classifier to characterize massive stars drawn from IR (Spitzer) catalogs, which will help us understand episodic mass loss. This first paper presents the method and the multiple tests we performed to understand its capabilities and limitations. Now we proceed with the derivation of the catalogs and their analysis.


A machine-learning photometric classifier for massive stars in nearby galaxies I. The method

Grigoris Maravelias, Alceste Z. Bonanos, Frank Tramper, Stephan de Wit, Ming Yang, Paolo Bonfini

Context. Mass loss is a key parameter in the evolution of massive stars. Despite the recent progress in the theoretical understanding of how stars lose mass, discrepancies between theory and observations still hold. Moreover, episodic mass loss in evolved massive stars is not included in the models while the importance of its role in the evolution of massive stars is currently undetermined.
Aims. A major hindrance to determining the role of episodic mass loss is the lack of large samples of classified stars. Given the recent availability of extensive photometric catalogs from various surveys spanning a range of metallicity environments, we aim to remedy the situation by applying machine learning techniques to these catalogs.
Methods. We compiled a large catalog of known massive stars in M31 and M33 using IR (Spitzer) and optical (Pan-STARRS) photometry, as well as Gaia astrometric information, which helps with foreground source detection. We grouped them into 7 classes (Blue, Red, and Yellow supergiants, B[e] supergiants, Luminous Blue Variables, Wolf-Rayet stars, and outliers, e.g. QSOs and background galaxies). As this training set is highly imbalanced, we implemented synthetic data generation to populate the underrepresented classes and improved separation by undersampling the majority class. We built an ensemble classifier utilizing color indices as features. The probabilities from three machine-learning algorithms (Support Vector Classification, Random Forests, Multi-layer Perceptron) were combined to obtain the final classification.
Results. The overall weighted balanced accuracy of the classifier is ∼ 83%. Red supergiants are always recovered at ∼ 94%. Blue and Yellow supergiants, B[e] supergiants, and background galaxies achieve ∼ 50 − 80%. Wolf-Rayet sources are detected at ∼ 45%, while Luminous Blue Variables are recovered at ∼ 30%, mainly from one method. This is primarily due to the small sample sizes of these classes. In addition, the mixing of spectral types, as there are no strict boundaries in the feature space (color indices) between those classes, complicates the classification. In an independent application of the classifier to other galaxies (IC 1613, WLM, Sextans A) we obtained an overall accuracy of ∼ 70%. This discrepancy is attributed to the different metallicity and extinction effects of their host galaxies. Motivated by the presence of missing values, we investigated the impact of missing data imputation using simple replacement with mean values and an iterative imputer, which proved to be more capable. We also investigated the feature importance and found that r − i and y − [3.6] were the most important, although different classes are sensitive to different features (with potential improvement from additional features).
Conclusions. The prediction capability of the classifier is limited by the available number of sources per class (which corresponds to the sampling of their feature space), reflecting the rarity of these objects and the possible physical links between these massive star phases. Our methodology is also efficient in correctly classifying sources with missing data, as well as at lower metallicities (with some accuracy loss), making it an excellent tool for accentuating interesting objects and prioritizing targets for observations.
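
As a rough illustration of the pipeline described in the Methods paragraph above, the sketch below wires together a soft-voting ensemble of SVC, Random Forest, and MLP on color-index features, preceded by iterative imputation and class re-balancing. The feature matrix, the class labels, SMOTE as a stand-in for the synthetic data generation, and all hyperparameters are placeholders of my own, not the configuration actually used in the paper.

```python
# Minimal sketch of the kind of pipeline described in the Methods paragraph:
# a soft-voting ensemble of SVC, Random Forest, and MLP trained on color indices,
# with imputation of missing photometry and re-balancing of the training set.
# Feature names, SMOTE as the oversampling step, and all hyperparameters are
# illustrative placeholders, not the configuration used in the paper.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from imblearn.pipeline import Pipeline as ImbPipeline

# Hypothetical design matrix of color indices (e.g. r-i, y-[3.6], ...) and labels.
X = np.random.normal(size=(500, 5))                    # placeholder features
y = np.random.choice(["RSG", "BSG", "YSG"], 500)       # placeholder labels

ensemble = VotingClassifier(
    estimators=[
        ("svc", make_pipeline(StandardScaler(), SVC(probability=True))),
        ("rf", RandomForestClassifier(n_estimators=300)),
        ("mlp", make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000))),
    ],
    voting="soft",  # average the per-class probabilities of the three algorithms
)

clf = ImbPipeline(steps=[
    ("impute", IterativeImputer()),          # iterative imputation of missing colors
    ("oversample", SMOTE()),                 # populate under-represented classes
    ("undersample", RandomUnderSampler()),   # trim the majority class
    ("ensemble", ensemble),
])
clf.fit(X, y)
```

Soft voting simply averages the per-class probabilities of the three algorithms, which mirrors the abstract's description of combining their probabilities to obtain the final classification.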

The confusion matrix for the 54 sources without missing values in the three galaxies (IC 1613, WLM, and Sextans A). We achieve an overall accuracy of ~70%, and we notice that the largest confusion occurs between BSG and YSG. The difference in accuracy compared to that obtained with the M31 and M33 sample is attributed to the photometric errors and the effects of metallicity and extinction in these galaxies.
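
For reference, the quantities quoted in the caption (overall accuracy, balanced accuracy, and the confusion matrix itself) can be computed with standard scikit-learn metrics; the snippet below uses placeholder label arrays and is not the paper's actual evaluation code.

```python
# Hypothetical evaluation snippet: overall accuracy, balanced accuracy,
# and a per-class confusion matrix like the one shown in the figure above.
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             confusion_matrix)

# y_true / y_pred would be the spectroscopic labels and the classifier output.
y_true = ["RSG", "BSG", "YSG", "RSG", "BSG"]   # placeholder labels
y_pred = ["RSG", "YSG", "YSG", "RSG", "BSG"]   # placeholder predictions

print("accuracy          :", accuracy_score(y_true, y_pred))
print("balanced accuracy :", balanced_accuracy_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred, labels=["BSG", "YSG", "RSG"]))
```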

arXiv: 2203.08125

New paper: A new automated tool for the spectral classification of OB stars

This paper is the result of an effort that actually started way back during my PhD thesis. Back then, in the early 2010s, we started investigating a way to automate the spectral classification of Be X-ray binaries. The problem with these sources is that, due to the strong emission in the Balmer lines, these lines cannot be used as characteristic features for their corresponding classes. Thus, a different automated approach is needed (based on a classification scheme that we developed in Maravelias et al. 2014). We started with a rather small sample of well-classified OB stars in the Galaxy and the Small Magellanic Cloud and implemented a Naive Bayesian Classifier, which actually proved to work very well. However, more tests and a larger sample were needed to proceed to a publication, and as time was limited I kept postponing the project.

Finally, Elias Kyritsis showed up as a graduate student willing to take this on. After a successful undergraduate thesis on the spectral classification of BeXBs in the Large Magellanic Cloud, Elias moved from visual inspection to the automated approach. He succeeded on many fronts: drastically increasing the sample, trying, optimizing, and developing a different machine-learning approach, improving the line measurements, and submitting the paper to A&A. His tremendous effort has finally paid off!

I am really excited about this journey and his accomplishment. Without his help this project would have been delayed a loooooot longer! Thanks Elia!


A new automated tool for the spectral classification of OB stars

E. Kyritsis, G. Maravelias, A. Zezas, P. Bonfini, K. Kovlakas, P. Reig

(abridged) We develop a tool for the automated spectral classification of OB stars according to their sub-types. We use the regular Random Forest (RF) algorithm, the Probabilistic RF (PRF), and we introduce the KDE-RF method, which is a combination of Kernel Density Estimation and the RF algorithm. We train the algorithms on the Equivalent Widths (EWs) of characteristic absorption lines (features) measured in high-quality spectra from large Galactic (LAMOST, GOSSS) and extragalactic surveys (2dF, VFTS) with available spectral types and luminosity classes. We find that the overall accuracy score is ∼70%, with similar results across all approaches. We show that the full set of 17 spectral lines is needed to reach the maximum performance per spectral class. We apply our model to other observational data sets, providing examples of the potential application of our classifier to real science cases. We find that it performs well both for single massive stars and for the companion massive stars in Be X-ray binaries. In addition, we propose a reduced 10-feature scheme that can be applied to large data sets with lower S/N. The similarity in the performance of our models indicates the robustness and reliability of the RF algorithm when it is used for the spectral classification of early-type stars. The score of ∼70% is high if we consider (a) the complexity of such multi-class classification problems, (b) the intrinsic scatter of the EW distributions within the examined spectral classes, and (c) the diversity of the training set, since we use data obtained from different surveys with different observing strategies. In addition, the approach presented in this work is applicable to data of different quality and different formats (e.g., absolute or normalized flux), while our classifier is agnostic to the luminosity class of a star and, as much as possible, metallicity independent.
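
To make the setup concrete, the sketch below shows what the plain Random Forest baseline described above could look like in scikit-learn: equivalent widths of diagnostic absorption lines as features, spectral sub-type as the label. The data, the line count, and the hyperparameters are placeholders of my own, and the PRF and KDE-RF variants are not reproduced here.

```python
# Minimal sketch of a plain Random Forest classifier on EW features:
# EWs of characteristic absorption lines as features, spectral sub-type as label.
# The line list, data, and hyperparameters are illustrative placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_stars, n_lines = 300, 17          # e.g. 17 diagnostic lines, as in the full scheme
X_ew = rng.normal(size=(n_stars, n_lines))                       # placeholder EWs
y_subtype = rng.choice(["O8", "O9", "B0", "B1", "B2"], n_stars)  # placeholder classes

rf = RandomForestClassifier(n_estimators=500, class_weight="balanced")
scores = cross_val_score(rf, X_ew, y_subtype, cv=5, scoring="accuracy")
print(f"cross-validated accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```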

arXiv:2110.10669

Fig. 8. The top left panel shows the confusion matrix of the best RF model applied to the test sample, the right panel shows the confusion matrix of the best PRF model applied to the same data set, and the bottom panel shows the confusion matrix of the KDE-RF method. The overall accuracy is the same for all algorithms, ∼70%, with the majority of misclassified objects belonging to neighboring classes, indicating the reliability of the algorithms.