Tag Archive



New Paper: A machine-learning photometric classifier for massive stars in nearby galaxies I. The method

This is the first paper to result from my work with the ASSESS team over the last few years. It focuses on the development of a machine-learning photometric classifier to characterize massive stars drawn from IR (Spitzer) catalogs, which will help us understand episodic mass loss. This first paper presents the method and the multiple tests we performed to understand its capabilities and limitations. We now proceed with the derivation of the catalogs and their analysis.


A machine-learning photometric classifier for massive stars in nearby galaxies I. The method

Grigoris Maravelias, Alceste Z. Bonanos, Frank Tramper, Stephan de Wit, Ming Yang, Paolo Bonfini

Context. Mass loss is a key parameter in the evolution of massive stars. Despite the recent progress in the theoretical understanding of how stars lose mass, discrepancies between theory and observations still hold. Moreover, episodic mass loss in evolved massive stars is not included in the models while the importance of its role in the evolution of massive stars is currently undetermined.
Aims. A major hindrance to determining the role of episodic mass loss is the lack of large samples of classified stars. Given the recent availability of extensive photometric catalogs from various surveys spanning a range of metallicity environments, we aim to remedy the situation by applying machine learning techniques to these catalogs.
Methods. We compiled a large catalog of known massive stars in M31 and M33 using IR (Spitzer) and optical (Pan-STARRS) photometry, as well as Gaia astrometric information which helps with foreground source detection. We grouped them in 7 classes (Blue, Red, Yellow, B[e] supergiants, Luminous Blue Variables, Wolf-Rayet, and outliers, e.g. QSOs and background galaxies). As this training set is highly imbalanced, we implemented synthetic data generation to populate the underrepresented classes and improve separation by undersampling the majority class. We built an ensemble classifier utilizing color indices as features. The probabilities from three machine-learning algorithms (Support Vector Classification, Random Forests, Multi-layer Perceptron) were combined to obtain the final classification.
Results. The overall weighted balanced accuracy of the classifier is ∼ 83%. Red supergiants are always recovered at ∼ 94%. Blue and Yellow supergiants, B[e] supergiants, and background galaxies achieve ∼ 50 − 80%. Wolf-Rayet sources are detected at ∼ 45% while Luminous Blue Variables are recovered at ∼ 30% from one method mainly. This is primarily due to the small sample sizes of these classes. In addition, the mixing of spectral types, as there are no strict boundaries in the features space (color indices) between those classes, complicates the classification. In an independent application of the classifier to other galaxies (IC 1613, WLM, Sextans A) we obtained an overall accuracy of ∼ 70%. This discrepancy is attributed to the different metallicity and extinction effects of their host galaxies. Motivated by the presence of missing values we investigated the impact of missing data imputation using simple replacement with mean values and an iterative imputor, which proved to be more capable. We also investigated the feature importance to find that r − i and y − [3.6] were the most important, although different classes are sensitive to different features (with potential improvement with additional features).
Conclusions. The prediction capability of the classifier is limited by the available number of sources per class (which corresponds to the sampling of their feature space), reflecting the rarity of these objects and the possible physical links between these massive star phases. Our methodology is also efficient in correctly classifying sources with missing data, as well as at lower metallicities (with some accuracy loss), making it an excellent tool for accentuating interesting objects and prioritizing targets for observations.
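The abstract says that the probabilities from the three algorithms were combined to obtain the final classification, but it does not spell out the rule. As a minimal sketch, assuming a simple equal-weight average (soft voting), the idea could look like this. The function names and the probability values are made up for illustration and are not taken from the paper.

```python
def combine_probabilities(per_model_probs, weights=None):
    """Weighted average of class probabilities from several models (one source)."""
    if weights is None:
        # Assumption: every model gets the same weight.
        weights = [1.0] * len(per_model_probs)
    total = sum(weights)
    classes = per_model_probs[0].keys()
    return {
        c: sum(w * p[c] for w, p in zip(weights, per_model_probs)) / total
        for c in classes
    }


def predict(per_model_probs):
    """Return the class with the highest combined probability, and that probability."""
    combined = combine_probabilities(per_model_probs)
    best = max(combined, key=combined.get)
    return best, combined[best]


# Made-up probabilities for a single source from three hypothetical models.
svc = {"RSG": 0.70, "YSG": 0.20, "BSG": 0.10}
rf = {"RSG": 0.55, "YSG": 0.35, "BSG": 0.10}
mlp = {"RSG": 0.40, "YSG": 0.45, "BSG": 0.15}

label, prob = predict([svc, rf, mlp])
print(label, round(prob, 2))
```

Note that one model (the hypothetical `mlp`) prefers YSG here, yet the averaged probabilities still favor RSG. Averaging probabilities rather than counting votes lets a confident model outweigh a hesitant one.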

The confusion matrix for 54 sources without missing values in the three galaxies (IC 1613, WLM, and Sextans A). We achieve an overall accuracy of ~70%, and we notice that the largest confusion occurs between BSG and YSG. The overall difference in the accuracy compared to that obtained with the M31 and M33 sample is attributed to the photometric errors, and the effect of metallicity and extinction in these galaxies.
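For readers less familiar with these metrics, here is a small sketch of how an overall accuracy and a balanced accuracy can be read off a confusion matrix. The matrix below is invented (it does not reproduce the figure), and the balanced accuracy shown is the plain unweighted mean of per-class recalls, which may differ from the weighted variant quoted in the paper.

```python
def overall_accuracy(cm):
    """Fraction of all sources on the diagonal (rows: true class, columns: predicted)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def balanced_accuracy(cm):
    """Unweighted mean of per-class recall, skipping classes with no true members."""
    recalls = [row[i] / sum(row) for i, row in enumerate(cm) if sum(row) > 0]
    return sum(recalls) / len(recalls)


# Invented counts for three classes, in the order BSG, YSG, RSG.
cm = [
    [8, 4, 0],   # true BSG
    [3, 6, 1],   # true YSG
    [0, 1, 19],  # true RSG
]

print(f"overall accuracy:  {overall_accuracy(cm):.3f}")
print(f"balanced accuracy: {balanced_accuracy(cm):.3f}")
```

With imbalanced classes the two numbers diverge: the well-populated, well-recovered class pulls the overall accuracy up, while the balanced accuracy gives every class the same say.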

arXiv: 2203.08125

Contribution to IAUS 366: The origin of outflows in evolved stars

During the week of 1-5 November 2021, I participated (virtually) in the IAU Symposium 366 on the origin of outflows in evolved stars. I had the opportunity to present our recently submitted work on a photometric machine-learning classifier.

New paper: A new automated tool for the spectral classification of OB stars

This paper is the result of an effort that actually started way back during my PhD thesis. Back then, in the early 2010s, we started investigating a way to automate the spectral classification of Be X-ray binaries. The problem with these sources is that the strong emission in the Balmer lines prevents those lines from being used as characteristic features for their corresponding classes. Thus, a different automated approach is needed (based on a classification scheme that we developed in Maravelias et al. 2014). We started with a rather small sample of well-classified OB stars in the Galaxy and the Small Magellanic Cloud and implemented a Naive Bayesian Classifier, which actually proved to work very well. However, more tests and a larger sample were needed before proceeding to a publication, and as time was limited I kept postponing the project.

Finally, Elias Kyritsis showed up as a graduate student willing to deal with this. After a successful undergraduate thesis on the spectral classification of BeXBs in the Large Magellanic Cloud, Elias moved from visual inspection to the automated approach. He was successful on many fronts: drastically increasing the sample, trying, optimizing, and developing a different machine-learning approach, improving the line measurements, and submitting the paper to A&A. His tremendous effort has finally paid off!

I am really excited about this journey and his accomplishment. Without his help this project would have been delayed a loooooot, at the very least! Thanks Elia!


A new automated tool for the spectral classification of OB stars

E. Kyritsis, G. Maravelias, A. Zezas, P. Bonfini, K. Kovlakas, P. Reig

(abridged) We develop a tool for the automated spectral classification of OB stars according to their sub-types. We use the regular Random Forest (RF) algorithm, the Probabilistic RF (PRF), and we introduce the KDE-RF method which is a combination of the Kernel-Density Estimation and the RF algorithm. We train the algorithms on the Equivalent Width (EW) of characteristic absorption lines (features) measured in high-quality spectra from large Galactic (LAMOST,GOSSS) and extragalactic surveys (2dF,VFTS) with available spectral-types and luminosity classes. We find that the overall accuracy score is ∼70% with similar results across all approaches. We show that the full set of 17 spectral lines is needed to reach the maximum performance per spectral class. We apply our model in other observational data sets providing examples of potential application of our classifier on real science cases. We find that it performs well for both single massive stars and for the companion massive stars in Be X-ray binaries. In addition, we propose a reduced 10-features scheme that can be applied to large data sets with lower S/N. The similarity in the performances of our models indicates the robustness and the reliability of the RF algorithm when it is used for the spectral classification of early-type stars. The score of ∼70% is high if we consider (a) the complexity of such multi-class classification problems, (b) the intrinsic scatter of the EW distributions within the examined spectral classes, and (c) the diversity of the training set since we use data obtained from different surveys with different observing strategies. In addition, the approach presented in this work, is applicable to data of different quality and of different format (e.g.,absolute or normalized flux) while our classifier is agnostic to the Luminosity Class of a star and, as much as possible, metallicity independent.

arXiv:2110.10669
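The features used by the classifier are equivalent widths of absorption lines. As a rough illustration of what such a measurement involves, the standard definition EW = ∫ (1 − F/F_c) dλ can be approximated with a trapezoidal sum. This is only a sketch under simple assumptions (a known continuum, a clean integration window, positive EW for absorption). It is not the line-measurement procedure of the paper, and the toy spectrum is invented.

```python
def equivalent_width(wavelengths, flux, continuum):
    """Trapezoidal estimate of EW = integral of (1 - F/Fc) over wavelength.

    Positive values mean absorption (a sign convention assumed here).
    The result has the same units as `wavelengths`.
    """
    ew = 0.0
    for i in range(len(wavelengths) - 1):
        depth_left = 1.0 - flux[i] / continuum[i]
        depth_right = 1.0 - flux[i + 1] / continuum[i + 1]
        step = wavelengths[i + 1] - wavelengths[i]
        ew += 0.5 * (depth_left + depth_right) * step
    return ew


# Toy triangular absorption line on a normalized continuum (wavelengths in Angstrom).
wl = [4469.0, 4470.0, 4471.0, 4472.0, 4473.0]
fl = [1.0, 0.8, 0.6, 0.8, 1.0]
cont = [1.0] * len(wl)

ew = equivalent_width(wl, fl, cont)
print(f"EW = {ew:.2f} Angstrom")
```

Because the flux is divided by the continuum, the same function works for absolute and for normalized spectra, as long as a continuum estimate is supplied.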

Fig. 8. Top left panel shows the confusion matrix of the best RF model applied to the test sample. The right panel shows the confusion matrix of the PRF best model applied to the same data set. The bottom panel shows the confusion matrix of the KDE-RF method. The overall accuracy is the same for all algorithms, 70 %, with the majority of misclassified objects belonging to neighboring classes, indicating the reliability of the algorithms.
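The statement that most misclassified objects fall in neighboring classes can be quantified directly from a confusion matrix whose classes are ordered (e.g. from early to late sub-types). A minimal sketch, with an invented four-class matrix and a hypothetical helper name:

```python
def neighbor_misclassification_fraction(cm):
    """Fraction of off-diagonal counts that sit one class away from the diagonal.

    Assumes the rows and columns follow the same physically ordered class sequence.
    """
    off_diagonal = 0
    adjacent = 0
    for i, row in enumerate(cm):
        for j, count in enumerate(row):
            if i == j:
                continue
            off_diagonal += count
            if abs(i - j) == 1:
                adjacent += count
    return adjacent / off_diagonal if off_diagonal else 0.0


# Invented counts for four ordered sub-type bins (rows: true, columns: predicted).
cm_ordered = [
    [7, 2, 1, 0],
    [2, 6, 2, 0],
    [0, 1, 8, 1],
    [0, 0, 3, 7],
]

fraction = neighbor_misclassification_fraction(cm_ordered)
print(f"{fraction:.0%} of misclassifications land in a neighboring class")
```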

EAS 2021 poster contributions

Three poster contributions during EAS 2021 with the following … statistics: all of them on massive stars, two within the framework of the ASSESS project, and two on machine-learning applications.

1. Applying machine-learning methods to build a photometric classifier for massive stars in nearby galaxies

Grigoris Maravelias, Alceste Bonanos, Frank Tramper, Stephan de Wit, Ming Yang, Paolo Bonfini

Mass loss is a key parameter in the evolution of massive stars. Despite the recent progress in the theoretical understanding of how stars lose mass, discrepancies between theory and observations still hold. Even worse, episodic mass loss in evolved massive stars is not included in the models while the importance of its role in the evolution of massive stars is currently undetermined. A major hindrance to determining the role of episodic mass loss is the lack of large samples of classified stars. Given the recent availability of extensive photometric catalogs from various surveys spanning a range of metallicity environments, we aim to remedy the situation by applying machine learning techniques to these catalogs. We compiled a large catalog of known massive stars in M31 and M33, using IR (Spitzer) and optical (Pan-STARRS) photometry, as well as Gaia astrometric information. We grouped them in 7 classes (Blue, Red, Yellow, B[e] supergiants, Luminous Blue Variables, Wolf-Rayet, and outliers, e.g. QSOs and background galaxies). Using this catalog as a training set, we built an ensemble classifier utilizing color indices as features. The probabilities from three machine-learning algorithms (Support Vector Classification, Random Forests, Neural Networks) are combined to obtain the final classifications. The overall performance of the classifier is ~87%. Highly populated (Red/Blue/Yellow Supergiants) and well-defined classes (B[e] Supergiants) have a high recovery rate of ~74-98%. On the contrary, Wolf-Rayet sources are detected at ~20% while Luminous Blue Variables are almost never recovered. This is mainly due to the small sample sizes of these classes, although M31 and M33 have spectral classifications for several massive stars (about 2500). In addition, the mixing of spectral types, as there are no strict boundaries in the feature space (color indices) between those classes, complicates the classification.
In an independent application of the classifier to other galaxies (IC 1613, WLM, Sextans A) we obtained an overall accuracy of ~71% despite the missing values on their features (which we replace with averaged values from the training sample). This approach results only in a few percent difference, with the remaining discrepancy attributed to the different metallicity environments of their host galaxies. The classifier’s prediction capability is only limited by the available number of sources per class, reflecting the rarity of these objects and the possible physical links between these massive star phases. Our methodology is also efficient in correctly classifying sources with missing data and at lower metallicities, making it an excellent tool for spotting interesting objects and prioritizing targets for observations. Future spectroscopic observations will offer a test-bed of its actual performance along with opportunities for improvement.
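The replacement of missing features with averaged values from the training sample, mentioned above, can be sketched in a few lines. This is only the simple mean replacement. The iterative imputer discussed in the paper abstract is a different, more capable technique and is not shown here. All values and names below are invented for illustration.

```python
def column_means(rows):
    """Mean of each feature over the training rows, ignoring missing (None) entries."""
    n_features = len(rows[0])
    means = []
    for j in range(n_features):
        values = [row[j] for row in rows if row[j] is not None]
        means.append(sum(values) / len(values))
    return means


def impute(row, means):
    """Replace each missing feature of one source with the training-sample mean."""
    return [mean if value is None else value for value, mean in zip(row, means)]


# Invented color indices for three training sources (None marks a missing value).
training = [
    [0.5, 1.2, None],
    [0.7, None, 2.0],
    [0.9, 1.0, 2.4],
]
means = column_means(training)

# A new source with two missing color indices.
source = [0.6, None, None]
filled = impute(source, means)
print(filled)
```

The means are computed once from the training sample and then reused for every new source, so the new galaxy's own (possibly biased) photometry never enters the replacement values.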

For more see this k-poster (submitted for SS32: Machine Learning and Visualisation in Data Intensive Era).

2. A new automated tool for the spectral classification of OB stars

E. Kyritsis, G. Maravelias, A. Zezas, P. Bonfini, K. Kovlakas, P. Reig

As more and more large spectroscopic surveys become available, an automated approach to spectral classification becomes necessary. Given the importance of massive stars, it is essential to identify their phenomenological parameters (e.g., the spectral type), which can be used as proxies for their physical parameters (e.g., mass, temperature).
In this work, we use the Random Forest (RF) algorithm to develop a tool for automated spectral classification of the OB-type stars into their sub-types. We use the regular RF algorithm, the Probabilistic RF (PRF) which is an extension of RF that incorporates uncertainties, and we introduce the KDE – RF method which is a combination of the Kernel-Density Estimation and the RF algorithm. We train the algorithms on the Equivalent Width (EW) of characteristic absorption lines measured in the spectra from large Galactic (LAMOST, GOSSS) and extragalactic surveys (2dF, VFTS) with available spectral-type classification. By following an adaptive binning approach we group the labels of these data on 11 sub-types within the range O3-B9. We examined which of the characteristic spectral lines (features) are more important to use based on a number of feature selection methods and we searched for the optimal hyper-parameters of the classifiers, to achieve the best performance.
From the feature screening process, we find 13 spectral lines as the optimal number of features. We find that the overall accuracy score is ~ 76 % with similar results across all approaches, with our KDE – RF being slightly lower at ~ 73 %. In addition, we show that our optimized RF model can reach an overall accuracy score of ~ 85 % in the ideal case of robust measurement of the weakest characteristic spectral lines. We apply our model in other observational data sets providing examples of potential application of our classifier on real science cases. We find that it performs well for both single massive stars and for the companion massive stars in Be X-ray Binaries, especially for data with S/N in the range 50-300. Furthermore, we present an alternative model for lower quality data S/N < 25 based on a reduced feature-set classification scheme, including only the strongest spectral lines.
The similarity in the performances of our models indicates the robustness and the reliability of the RF algorithm when used for spectral classification of early-type stars. This is also strengthened by the fact that we are working with real-world data and not with simulations. In addition, the approach presented in this work is very fast and applicable to survey products of different quality (e.g., different resolutions) and different formats (e.g., absolute or normalized flux).

For more see this k-poster (submitted for S16: Massive stars: birth, rotation, and chemical evolution).

3. Evolved massive stars in the Magellanic Clouds

Ming Yang, Alceste Bonanos, Biwei Jiang, Jian Gao, Panagiotis Gavras, Grigoris Maravelias, Man I Lam, Shu Wang, Xiaodian Chen, Yi Ren, Frank Tramper, Zoi Spetsieri

We present two clean, magnitude-limited (IRAC1 or WISE1≤15.0 mag) multiwavelength source catalogs for the Large and Small Magellanic Clouds (LMC and SMC). The catalogs were built by crossmatching (1″) and deblending (3″) between the source list of Spitzer Enhanced Imaging Products (SEIP) and Gaia Data Release 2 (DR2), with strict constraints on the Gaia astrometric solution in order to remove the foreground contamination. It is estimated that about 99.5% of the targets in our catalog are most likely genuine members of the LMC and SMC. The LMC catalog contains 197,004 targets in 52 different bands, while the SMC catalog contains 45,466 targets in 50 different bands, ranging from the ultraviolet to the far-infrared. Additional information about radial velocities and spectral and photometric classifications was collected from the literature. For the LMC, we compare our sample with the sample from Gaia Collaboration et al. (2018), indicating that the bright end of our sample is mostly comprised of blue helium-burning stars (BHeBs) and red HeBs with inevitable contamination of main sequence stars at the blue end. For the SMC, by using the evolutionary tracks and synthetic photometry from MESA Isochrones & Stellar Tracks and the theoretical J-Ks color cuts, we identified and ranked 1,405 red supergiant (RSG), 217 yellow supergiant (YSG), and 1,369 blue supergiant (BSG) candidates in the SMC in five different color-magnitude diagrams (CMDs), where attention should also be paid to the incompleteness of our sample. For the LMC, due to the problems with models, we applied modified magnitude and color cuts based on previous studies, and identified and ranked 2,974 RSG, 508 YSG, and 4,786 BSG candidates in the LMC in six CMDs.
The comparison between the CMDs from the two catalogs of the LMC and SMC indicates that the most distinct difference appears at the bright red end of the optical and near-infrared CMDs, where the cool evolved stars (e.g., RSGs, asymptotic giant branch stars, and red giant stars) are located, which is likely due to the effect of metallicity and star formation history. A further quantitative comparison of colors of massive star candidates in equal absolute magnitude bins suggests that there is essentially no difference for the BSG candidates, but a large discrepancy for the RSG candidates since LMC targets are redder than the SMC ones, which may be due to the combined effect of metallicity on both spectral type and mass-loss rate as well as the age effect. The effective temperatures (Teff) of massive star populations are also derived from reddening-free color of (J-Ks). The Teff ranges are 3500≤Teff≤5000 K for an RSG population, 5000≤Teff≤8000 K for a YSG population, and Teff≥8000 K for a BSG population, with larger uncertainties toward the hotter stars.
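The Teff ranges quoted above translate into a very simple lookup. Since the stated ranges share their end points (5000 K and 8000 K), the sketch below assumes that a boundary value goes to the hotter class. That choice is an assumption of this example, not something stated in the abstract.

```python
def population_from_teff(teff_kelvin):
    """Map an effective temperature (K) to RSG / YSG / BSG using the quoted ranges.

    Assumption: boundary values (5000 K, 8000 K) are assigned to the hotter class.
    Returns None below 3500 K, where no range is quoted.
    """
    if teff_kelvin < 3500:
        return None
    if teff_kelvin < 5000:
        return "RSG"
    if teff_kelvin < 8000:
        return "YSG"
    return "BSG"


for teff in (3000, 4000, 5000, 6500, 8000, 12000):
    print(teff, population_from_teff(teff))
```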

For more see this k-poster (submitted for S16: Massive stars: birth, rotation, and chemical evolution).

Conference contributions – summer 2019 edition

Although the summer finished long ago, only now have I found some time to post an update on my summer activities, i.e. a number of conferences I attended and contributed to.

Since 2018, I have been working on an automated classifier for massive stars in nearby galaxies, using photometric datasets. These have been produced by my colleagues within the ASSESS team, an ERC project led by Alceste Bonanos at the National Observatory of Athens, and I have been responsible for developing a machine-learning method to achieve this. We have made a lot of progress and have reached the point where the results are almost final (we are now working on the Maravelias et al. paper). This work has been presented as:

  1. A poster presentation at the Supernova Remnants II, Chania, Greece, 3-8 June 2019,
    as “Identifying massive stars in nearby galaxies, in a smart way”
  2. A talk, given by Frank Tramper because I was unable to attend, at the 14th Hellenic Astronomical Conference, Volos, Greece, 8-11 July 2019,
    as “Automated classification of massive stars in nearby galaxies”
  3. A talk at the Computational Intelligence in Remote Sensing and Astrophysics, FORTH workshop, Heraklion, Greece, 17-19 July 2019,
    as “An automated classifier of massive stars in nearby galaxies”
  4. A remote talk for the ASTROSTAT 2nd Consortium meeting, Boston, USA, 18-19 July 2019,
    as “Towards an automated classifier of massive stars in nearby galaxies”

Grigoris Maravelias, Alceste Z. Bonanos, Ming Yang, Frank Tramper, Stephan A. S. de Wit, Paolo Bonfini

Abstract:
Current photometric surveys can provide us with multiwavelength measurements for vast numbers of stars in many nearby galaxies. Although the majority of these stars are evolved luminous stars (e.g. Wolf-Rayet, Blue/Yellow/Red Supergiants), we lack an accurate spectral classification, due to the demands that spectroscopy faces at these distances and for this number of stars. What we can do instead is to take advantage of machine learning algorithms (such as Support Vector Machines, Random Forests, Convolutional Neural Networks) to build an automated classifier based on a large multi-wavelength photometric catalog. We have compiled such a catalog based on optical (e.g. Pan-STARRS, OGLE) and IR (e.g. 2MASS, Spitzer) surveys, combined with astrometric information from the Gaia mission. We have also gathered spectroscopic samples of massive stars for a number of nearby galaxies (e.g. the Magellanic Clouds, M31, M33) and by using our algorithm we have achieved a success ratio of more than 80% for the training and test samples. By applying the fully trained algorithm to the available photometric datasets, we can uncover previously unclassified sources, which will become our prime candidates for spectroscopic follow-up aiming to confirm their nature and our approach.


Ming has also presented his work at a couple of conferences:

1. As a poster presentation at the Supernova Remnants II, Chania, Greece, 3-8 June 2019,

“Evolved Massive Stars at Low-metallicity: A Source Catalog for the Small Magellanic Cloud”

Ming Yang, Alceste Z. Bonanos, Bi-Wei Jiang, Jian Gao, Panagiotis Gavras, Grigoris Maravelias, Yi Ren, Shu Wang, Meng-Yao Xue, Frank Tramper, Zoi T. Spetsieri, Ektoras Pouliasis, Stephan A. S. de Wit

We present a clean, magnitude-limited (IRAC1 or WISE1 ≤ 15.0 mag) multiwavelength source catalog for the SMC with 45,466 targets in total, with the purpose of building an anchor for future studies, especially for the massive star populations at low-metallicity. The catalog contains data in 50 different bands including 21 optical and 29 infrared bands, ranging from the ultraviolet to the far-infrared. Additionally, radial velocities and spectral classifications were collected from the literature, as well as infrared and optical variability statistics were retrieved from different datasets. The catalog was essentially built upon a 1′′ crossmatching and a 3′′ deblending between the SEIP source list and Gaia DR2 photometric data. Further constraints on the proper motions and parallaxes from Gaia DR2 allowed us to remove the foreground contamination. We estimated that about 99.5% of the targets in our catalog were most likely genuine members of the SMC. By using the evolutionary tracks and synthetic photometry from MIST and the theoretical J−Ks color cuts, we identified 1,405 RSG, 217 YSG and 1,369 BSG candidates in the SMC in five different CMDs. We ranked the candidates based on the intersection of different CMDs. A comparison between the models and observational data shows that the lower limit of initial masses for the RSGs population may be as low as 7 or even 6 M⊙, making RSGs a unique population connecting the evolved massive and intermediate stars, since stars with initial mass around 6 to 8 M⊙ are thought to go through a second dredge-up to become AGBs. We encourage the interested reader to further exploit the potential of our catalog.
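The 1″ crossmatching step mentioned in the abstract boils down to pairing sources whose angular separation is below a radius. Here is a brute-force sketch of that idea with the haversine formula, assuming coordinates in degrees and a nearest-neighbor rule. It does not reproduce the actual catalog pipeline (no deblending, no proper-motion or parallax cuts), and the coordinates are invented. For real catalogs of this size one would use an indexed sky-matching tool rather than this O(N×M) loop.

```python
import math


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a))) * 3600.0


def crossmatch(cat_a, cat_b, radius_arcsec=1.0):
    """For each source in cat_a, find the nearest cat_b source within the radius.

    Returns a list of (index_a, index_b, separation_arcsec) tuples.
    """
    matches = []
    for i, (ra_a, dec_a) in enumerate(cat_a):
        best_j, best_sep = None, radius_arcsec
        for j, (ra_b, dec_b) in enumerate(cat_b):
            sep = angular_separation_arcsec(ra_a, dec_a, ra_b, dec_b)
            if sep <= best_sep:
                best_j, best_sep = j, sep
        if best_j is not None:
            matches.append((i, best_j, best_sep))
    return matches


# Invented (RA, Dec) positions in degrees, roughly in the direction of the SMC.
infrared_sources = [(13.0000, -72.8000), (13.1000, -72.9000)]
optical_sources = [(13.0000, -72.8002), (13.2000, -72.9000)]

matches = crossmatch(infrared_sources, optical_sources, radius_arcsec=1.0)
for i, j, sep in matches:
    print(f"source {i} matches source {j} at {sep:.2f} arcsec")
```

Only the first pair, offset by 0.0002° in declination, falls inside the 1″ radius. The second infrared source has no counterpart and is left unmatched.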

2. As a talk at the ESO workshop “A synoptic view of the Magellanic Clouds: VMC, Gaia and beyond”, Garching near Munich, Germany, September 9-13, 2019

“Evolved Massive Stars and Red Supergiant Stars in the Magellanic Clouds”

Ming Yang, Alceste Z. Bonanos, Bi-Wei Jiang, Jian Gao, Panagiotis Gavras, Grigoris Maravelias, Yi Ren, Shu Wang, Meng-Yao Xue, Frank Tramper, Zoi T. Spetsieri, Ektoras Pouliasis, and Stephan de Wit

We present an ongoing investigation of infrared properties, variabilities, and mass loss rate (MLR) of evolved massive stars in the Magellanic Clouds, especially the red supergiant stars (RSGs). For the LMC, 744 RSGs compiled from the literature are identified and analysed by using the color-magnitude diagram (CMD), spectral energy distribution (SED) and mid-infrared (MIR) variability, based on 12 bands of near-infrared (NIR) to MIR co-added data from 2MASS, Spitzer and WISE, and ∼6.6 yr of MIR time-series data collected by the ALLWISE and NEOWISE-R projects. The results show that there is a relatively tight and positive correlation between the brightness, MIR variability, MLR, and the warm dust or continuum, where both the variability and the luminosity may be important for the MLR. The identified RSG sample has been compared with the theoretical evolutionary models and shown that the discrepancy between observation and evolutionary models can be mitigated by considering both variability and extinction. For the SMC, we present a relatively clean, magnitude-limited (IRAC1 or WISE1 ≤ 15.0 mag) multiwavelength source catalog with 45,466 targets in total, intending to build an anchor for the future studies, especially the massive stars at low-metallicity. It contains data in 50 different bands including 21 optical and 29 infrared bands, retrieved from SEIP, VMC, IRSF, AKARI, Heritage, Gaia, SkyMapper, NSC, Massey et al. (2002), and GALEX, ranging from the ultraviolet to the far-infrared. Additionally, radial velocities and spectral classifications are collected from the literature, as well as the infrared and optical variability information derived from WISE, SAGE-Var, VMC, IRSF, Gaia, NSC, and OGLE. The catalog is essentially built upon a 1” crossmatching and a 3” deblending between the Spitzer Enhanced Imaging Products (SEIP) source list and Gaia Data Release 2 (DR2) photometric data. 
Further constraints on the proper motions and parallaxes from Gaia DR2 allow us to remove the foreground contamination. We estimate that about 99.5% of the targets in our catalog are likely to be the genuine members of the SMC. By using the evolutionary tracks and synthetic photometry from MESA Isochrones & Stellar Tracks and the theoretical J−Ks color cuts, we identify 1,405 red supergiant, 217 yellow supergiant and 1,369 blue supergiant candidates in the SMC in five different CMDs. We rank the candidates based on the intersection of the different CMDs. A comparison between the models and observational data shows that, the lower limit of the RSGs population may reach to 7 or even 6M⊙, making RSGs an unique population connecting the evolved massive and intermediate stars, since stars with initial mass around 6 to 8M⊙ are thought to go through a second dredge-up to become asymptotic giant branch stars. We encourage the interested reader to further exploit the potential of our catalog, including, but not limited to, massive stars, supernova progenitors, star formation history and stellar population. Detailed analysis and comparison of RSGs in the LMC and SMC may be also presented depending on the progress of the investigation.

Master presentation by Elias Kyritsis

Elias Kyritsis has been a student at the Physics Department of the University of Crete, whom I have co-supervised with Prof. Andreas Zezas since 2017.

Initially, as an undergraduate student, he worked on the visual classification of High-Mass X-ray Binary sources in the Large Magellanic Cloud, but later on he decided to work on a more automated approach. Over the last year he has developed an automated spectral classifier (focusing on the early OB-type stars) with Random Forests. Last Friday (Sep 27, 2019) he defended his work and obtained his MSc diploma. Congratulations!

 

Elias Kyritsis on the day of his MSc defense. (Credit: Elias Kyritsis)