Article

FFNN–TabNet: An Enhanced Stellar Age Determination Method Based on TabNet

1 School of Computer Science and Engineering, Sichuan University of Science and Engineering, Yibin 644000, China
2 Sichuan Province Big Data Visual Analysis Technology Engineering Laboratory, Sichuan University of Science and Engineering, Yibin 644000, China
3 Sichuan Key Provincial Research Base of Intelligent Tourism, Sichuan University of Science and Engineering, Zigong 643000, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(3), 1203; https://doi.org/10.3390/app14031203
Submission received: 28 December 2023 / Revised: 25 January 2024 / Accepted: 30 January 2024 / Published: 31 January 2024
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

The precise ascertainment of stellar ages is pivotal for astrophysical research into stellar characteristics and galactic dynamics. To address the prevalent challenges of suboptimal accuracy in stellar age determination and limited proficiency in apprehending nonlinear dynamics, this study introduces an enhanced model for stellar age determination, amalgamating the Feedforward Neural Network (FFNN) with TabNet (termed FFNN–TabNet). The methodology commences with the acquisition of a stellar dataset via meticulous cross-matching. Subsequent advancements encompass refinements to the activation functions within TabNet, coupled with augmentations to the Attentive transformer module by incorporating an FFNN module. These enhancements substantially boost training efficiency and precision in age estimation while amplifying the model’s capability to decode complex nonlinear interactions. Leveraging Bayesian Optimization Algorithm (BOA) for hyperparameter fine-tuning further elevates the model’s efficiency. Comprehensive ablation and comparative analyses validate the model’s superior performance in stellar age determination, demonstrating marked enhancements in accuracy. The experiment also demonstrates an enhanced ability of the model to capture nonlinear relationships between features.

1. Introduction

The age of stars constitutes one of the most fundamental parameters in the study of stellar characteristics, as well as the current structure and formation history of the Milky Way galaxy [1]. The accurate determination of stellar ages is crucial for conducting a comprehensive census of stellar populations within the Milky Way, revealing its origins and understanding its integrated history of formation and evolution.
Contemporary large-scale astronomical surveys have amassed a significant collection of high-quality spectral data, including but not limited to the LAMOST (Large Sky Area Multi-Object Fiber Spectroscopic Telescope) survey data [2,3], GAIA (Global Astrometric Interferometer for Astrophysics) [4,5], and the SDSS (Sloan Digital Sky Survey) [6]. These spectral analyses have enabled the determination of highly accurate atmospheric parameters and chemical abundances for a substantial number of stars [7]. However, determining the age of stars via spectral analysis proves challenging. Typically, it requires correlating observed data with stellar evolution models, as is done in spectroscopic methods. Unfortunately, these approaches often yield age determinations with relatively low precision.
In recent years, the field of stellar age determination based on machine learning and deep learning has rapidly evolved with the proposition of several models, including but not limited to Artificial Neural Networks (ANN) [8], Kernel Principal Component Analysis (KPCA) [9], and Random Forests (RF) [10]. These models, thanks to their progressively refined mechanisms and functionalities, have demonstrated considerable potential in this domain. Traditional methods typically specialize in distinct data types, with the spectroscopic method, for instance, being devoted to the analysis of stellar spectra to ascertain stellar ages. However, the spectroscopic method is limited for low-mass cooling main-sequence stars or giants with narrow and indistinct isochrone intervals. The gyrochronology method’s utility is limited by its reliance on rotational periods and stellar chromatic attributes, reducing its overall applicability. The asteroseismology method deduces stellar attributes like mass and age from oscillation frequencies but is mainly suitable for stars with solar-like oscillations. The method based on stellar activity has lower age determination accuracy. A comprehensive review of traditional methods is provided in Section 2.1. In contrast, machine-learning methods have wide applicability, being able to identify and extract relevant features from various data sources (such as spectral data and asteroseismic parameters) through training, and they exhibit higher accuracy. However, the stellar data derived from spectroscopic analysis and asteroseismology encompass a wealth of physical and chemical attributes. Current models often struggle to accurately discern the key features significantly impacting age determination, resulting in suboptimal precision. Additionally, there exist complex nonlinear relationships among these datasets, and current models exhibit significant shortcomings in capturing these complexities. Further optimization and improvement are necessary to enhance their accuracy and efficiency in determining stellar ages.
Addressing the aforementioned issues, this study introduces an improved model for determining stellar ages: FFNN–TabNet, which combines Feedforward Neural Network (FFNN) [11] and TabNet [12]. This model aims to resolve the challenges of low precision and the inadequate capture of complex nonlinear relationships inherent in existing models. Specifically, the main contribution of this work can be summarized as follows:
  • This study introduces an enhanced model for stellar age determination based on TabNet, named FFNN–TabNet. We have made significant improvements to the attentive transformer module within TabNet, specifically by incorporating a Feedforward Neural Network (FFNN). This step not only optimizes the functionality of the module but also significantly enhances its ability to process complex data structures and capture nonlinear relationships, thereby effectively increasing the accuracy of stellar age determination.
  • To address the issue of gradient vanishing or diminishing that may be caused by the Rectified Linear Unit (ReLU) activation function [13] within TabNet, this study has replaced it with the Parametric Rectified Linear Unit (PReLU) activation function [14]. The PReLU function dynamically adjusts activation thresholds to avoid gradient vanishing, effectively enhancing the model’s learning speed and stability. This improvement is particularly crucial for enhancing the model’s ability to process complex nonlinear relationships, contributing to increased accuracy and efficiency in stellar age determination.
  • To further optimize the performance of the model, this study incorporated the Bayesian optimization algorithm (BOA) [15,16] for hyperparameter tuning. This method effectively resolves the issues of inefficiency and inaccuracy inherent in traditional hyperparameter tuning approaches, significantly enhancing the model’s capability to process complex data structures and improving the accuracy of its determination. Additionally, the algorithm accelerates the hyperparameter search process and more accurately identifies the optimal combination of hyperparameters, thereby increasing the model’s stability and robustness.
  • To verify the effectiveness of the FFNN–TabNet model, we conducted detailed ablation experiments and compared it comprehensively with six other different models. Through these experiments, we were able to clearly observe the performance differences between the models. The results show that in the challenging task of determining stellar ages, the FFNN–TabNet model excels, surpassing the other models in terms of accuracy and stability. The comparative experiment on feature contribution leads to the conclusion that there has been an enhancement in the model’s ability to capture nonlinear relationships among features.
The structure of this article is organized as follows: Section 2 delves into the existing research on methods for determining stellar ages. Section 3 explains the sources of the datasets used and describes the architectural design of the FFNN–TabNet. Section 4 elucidates the construction process of the FFNN–TabNet model. Section 5 provides a detailed analysis of the experimental results. Section 6 discusses the research findings presented in this paper. Section 7 summarizes the article.

2. Related Works

In the pursuit of swift and precise determination of stellar ages, numerous scholars worldwide have engaged in comprehensive research. Presently, the methodologies in this domain are primarily classified into two categories: conventional stellar age determination methods and efficient approaches utilizing machine-learning and deep-learning algorithms.

2.1. Traditional Stellar Age Determination Methods

1. Spectroscopic Method
Spectroscopic methods excel in determining the ages of the main-sequence turnoff and subgiant stars, owing to the broad and non-intersecting isochrone intervals characteristic of these stars. However, for low-mass cooling main-sequence stars or giants, the accuracy of this method is limited due to the narrow and indistinct isochrone intervals. Edvardsson et al. [17] employed spectral data to determine the ages of thousands of F/G-type stars, but the method's efficacy is limited for stars with indistinct spectral features. Nordström et al. [18] combined the ubvyβ photometric method [19] with precise trigonometric parallax data to determine the ages of 16,682 F- and G-type stars, although this approach demands high-quality observational data. Haywood et al. [20] and Bergemann et al. [21] conducted age determinations by comparing the positions of stars on the Hertzsprung–Russell diagram with theoretical predictions of stellar evolutionary models, but the dependency and uncertainty of these models impact the accuracy of the results. Wu et al. [22] utilized LSP3 [23] and asteroseismic features based on Kepler photometry [24] to construct stellar evolutionary models, determining the ages of 150 main-sequence turnoff stars. Although effective in parameter extraction, this method's reliance on complex data limits its widespread application.
2. Gyrochronology Method
The gyrochronology approach infers stellar ages by analyzing the rotational periods and chromatic characteristics of stars, being particularly applicable to solar-type and young stars whose rotation periods and colors can be measured with high precision. Barnes et al. [25] and Angus et al. [26] have employed this methodology for the determination of stellar ages. However, the utility of this method is somewhat constrained due to its reliance on the rotational periods and stellar chromatic attributes, limiting its general applicability.
3. Asteroseismology Method
The asteroseismology method infers global stellar attributes, including mass and age, based on frequencies generated by stellar oscillations [27]. However, it is primarily applicable to stars with spectral types exhibiting solar-like oscillations [28]. Kjeldsen and Bedding [29] identified two key asteroseismic parameters: Δν (large frequency separation) and νmax (frequency of maximum oscillation power), which were used to deduce the mass and radius from oscillation data. Mazumdar et al. [30], Christensen–Dalsgaard et al. [31], and Soderblom et al. [32] report that asteroseismology-derived stellar ages have an error margin of about 30%. N. Gai et al. [33] utilized these two asteroseismic parameters and stellar temperature to determine the radius and mass of approximately 7300 stars, determining their ages based on masses and metallicity. S. Deheuvels and E. Michel [34] used two asteroseismic parameters, Δν and νcross, to determine the age of the star HD 49385, with an uncertainty of about 5.18%, but its applicability to a broader range of star types is limited. A. V. Silva et al. [35] implemented a novel Bayesian scheme capable of averaging asteroseismic parameters. By applying this scheme to stellar evolution models, they derived the fundamental properties of 33 Kepler planetary candidate stars, with a median statistical uncertainty of 14%, and it can be as good as 5% for red giants. C. Jiang and L. Gizon [36] proposed a Bayesian Estimation of STellar Parameters (BESTP), which is a tool that utilizes Bayesian statistics and the nested sampling Monte Carlo algorithm to search for the stellar models that best match a particular set of classical and asteroseismic constraints from observations. For HD 222076, the uncertainties of estimated masses, radii, and ages are reduced by 2%, 0.5%, and 4.7%. However, Bayesian methods rely heavily on prior knowledge and model selection. If these priors are not accurate or the model does not fit the data perfectly, then the results may be biased. Li et al. [37] utilized the asteroseismology method to conduct detailed modeling of the age of 36 Kepler subgiant stars, and this estimation is more reliable due to its lower model dependency compared to other types of stars. However, with an average uncertainty of about 15%, it also indicates that there is room for improvement in age determination.
4. Method Based on Stellar Activity
Skumanich et al. [38], Mentuch et al. [39], and Nascimento et al. [40] successfully predicted stellar ages using lithium depletion. Tian et al. [41] successfully predicted stellar ages using helium flash indicators. However, the precision of these age determinations remains relatively low.

2.2. Machine-Learning and Deep-Learning Methods

Machine-learning and deep-learning methods involve using stellar elemental abundance and asteroseismic parameters as features to train models for age determination.
Ness et al. [42] used The Cannon [43], a model for generating observed spectra, with features like $T_{\mathrm{eff}}$ (effective temperature), $\log g$ (surface gravity), [Fe/H] (metallicity, the logarithm of the ratio of iron to hydrogen atoms), and α-element abundances, predicting the mass and age of 70,000 distant red giants, including red clump stars. Bellinger et al. [44] trained a random forest regressor, applied to observational data and evolutionary models, to quickly and accurately infer the initial conditions and current age of several stars, including the Sun and 16 Cyg A and B. However, this method has issues with inaccurate parameter weighting. Verma et al. [45] used neural networks (NNs) to determine the fundamental parameters of Sun-like stars, demonstrating robustness in a broad parameter space and low computational cost, but the accuracy of their age determinations needs improvement. Martig et al. [46] used linear regression algorithms with [C/Fe] and [N/Fe] elemental abundances (individual element abundances determined by the Convolutional Neural Network (CNN) [47] method) and asteroseismic parameters as features, determining the masses and ages of 1475 red giants. Ho et al. [48] applied Martig et al.'s method to LAMOST DR2, determining the ages of 230,000 red giants. Wu et al. [49] used KPCA to determine the masses and ages of 6940 red giants, with errors around 7% for mass determination and 25% for age determination. Das and Sanders [50] proposed a new method (MADE), centered on ANN, which can learn from and entirely substitute stellar isochrones. By combining asteroseismic parameters, luminosity, and spectroscopic data, they determined the masses, ages, and distances of red giants. Wu et al. [51] continued their 2017 work, selecting 640,986 Milky Way disk Red Giant Branch stars and providing determinations of their ages and masses. Hendriks and Aerts [52] adopted a deep neural network (DNN) to enhance the computational efficiency of asteroseismic modeling, focusing on the coherent oscillation modes of medium- to high-mass stars. Although their method revealed significant uncalibrated issues in core overshooting and envelope mixing, it shows notable differences from other methods in modeling low-mass stars solely based on asteroseismic quantities. M. Hon et al. [53] utilized a Mixture Density Network (MDN) [54] to combine modal frequency information with other spectroscopic and global seismic parameters to determine the age and mass of subgiant stars. However, with a Mean Absolute Percentage Error (MAPE) of 8.12%, there remains room for improvement. Li et al. [55] combined KPCA with RF, using labels like Δν (large frequency separation), $T_{\mathrm{eff}}$, $\log g$, and the [Ba/Fe], [C/Fe], [Ca/Fe], and [Fe/H] elemental abundances (individual element abundances determined by the CNN method), determining the masses and ages of 163,105 red clump stars, with mass determination errors around 9% and age determination errors around 18%.
Compared to traditional methods, machine-learning and deep-learning approaches can integrate a variety of input features, such as spectroscopic data and asteroseismic parameters, to capture complex relationships and enhance determination accuracy.
While current machine-learning and deep-learning methodologies have garnered notable achievements, they are not without their constraints. For example, techniques like linear regression and KPCA are inadequate in comprehensively capturing intricate nonlinear correlations and latent linkages within the features of high-dimensional data. Additionally, these approaches frequently necessitate elaborate feature engineering and domain-specific expertise for the selection of pertinent input features. Methods such as ANN and random forests introduce an element of nonlinear modeling. However, they are still limited by intrinsic network architectures, exemplified by the layer count in ANNs and the tree quantity in random forests. Adjusting these parameters, often through methods like cross-validation, complicates the process of model selection and training. Furthermore, conventional deep-learning models, despite frequently surpassing other methodologies in performance, generally suffer from a lack of interpretability.
In order to address the current challenges in the field of stellar age determination, our study introduces an innovative method that combines FFNN and TabNet. This approach aims to enhance the precision and efficiency of age determination. By integrating the strengths of FFNN in handling complex data structures and the powerful capability of TabNet in capturing nonlinear relationships, this combined method provides a more robust and accurate tool for determining stellar ages.

3. Method

3.1. Data

Red giants, a category of stars characterized by their slow evolutionary rate, are prevalent throughout the entire evolutionary process of the Milky Way galaxy. These stars carry extensive information about the evolution of both individual stars and the galaxy as a whole, making them ideal candidates for studies on stellar and galactic evolution. Consequently, this research focuses on red giants as the primary subject of study.
The LAMOST DR8 survey data, by analyzing stellar spectra, provided atmospheric parameters for 1,033,787 stellar spectra and determined the α-element abundance, effective temperature, surface gravity, and abundance of 12 individual elements through the label-transfer method based on CNN. Wu et al. [51] provided 640,986 entries with mass parameters for red giant star spectra. The SDSS DR17 [6] survey data provided 733,901 stellar spectra data, including age parameters.
This study utilized the Tool for Operations on Catalogues And Tables (TOPCAT) (http://www.starlink.ac.uk/topcat/, accessed on 24 January 2024) [56] software (version 4.8-7) to intricately cross-match data possessing identical fiber-pointing right ascension (RA) and declination (DEC) parameters across the LAMOST DR8, Wu et al. [51], and SDSS DR17 datasets. This rigorous process successfully identified a dataset comprising 41,590 red giants. Owing to the CNN in LAMOST DR8 employing training samples from SDSS, the derived dataset features a synthesis of parameters: α-element abundance, effective temperature, surface gravity, and chemical abundances from LAMOST DR8; mass parameters as per Wu et al. [51]; and age labels attributed to SDSS DR17. Table 1 presents a list of parameters contained in the red giant star data.

3.2. The FFNN–TabNet

TabNet represents a neural network architecture meticulously engineered for the classification and regression of structured data. Distinct from conventional architectures predominantly oriented towards image or textual data, TabNet demonstrates exceptional aptitude in assimilating complex structured data in a tabular format, ideally aligning with the intricacies of red giant star data. This study is constructed based on the original TabNet by Arik and Pfister (2021) [12] with specific modifications aimed at enhancing network performance. The following section delves into an in-depth exposition of the myriad design attributes and architectural elements integral to the TabNet model.

3.2.1. TabNet Neural Network Decision Process

The TabNet [12] neural network architecture employs a strategically devised fully connected (FC) [57] layer, coupled with the ReLU function, to adeptly simulate the decision-making paradigm characteristic of tree models. This innovative approach enables the network to replicate the intricate decision processes inherent in these models, thereby enhancing its analytical capabilities. In this research, the standard ReLU activation function has been supplanted by the more versatile PReLU. The decision process is delineated in Figure 1.
This modified model structure, as illustrated in Figure 1, initiates the decision process by inputting a feature vector [x1, x2] into the TabNet neural network. After traversing the Mask layer, which selectively filters features x1 and x2 from the vector, these features undergo a linear transformation via a fully connected layer, custom-tailored with distinct weights and biases. Activation functions are pivotal in augmenting the learning capabilities of neural networks, serving as a fundamental component that significantly influences the network’s ability to process complex data patterns and learn from them effectively. The ReLU activation function traditionally manipulates each vector element, ensuring a singular positive element per vector, designating the chosen feature and assigning zero to all other elements, denoting their non-selection. The formula for ReLU is shown in Equation (1).
$$f(x) = \max(0, x)$$
However, this approach can induce gradient degradation or disappearance, particularly when inputs turn negative, resulting in the deactivation of neurons and their unresponsiveness to subsequent data variations. To mitigate these limitations, the study introduces the PReLU activation function. PReLU introduces a learnable parameter, α, which ensures a consistent non-zero gradient, even for negative input values. This innovative approach effectively addresses the issue of gradient vanishing, thereby significantly enhancing the model’s training efficiency and facilitating improved gradient flow throughout the network. Additionally, PReLU’s allowance for minor activation values in negative inputs addresses the issue of “dead neurons”. The decision process culminates with the fully connected layer and PReLU activation function performing conditional judgments, linearly amalgamating the decision outcomes of both features, with the final output determined by the Softmax function. The formula for PReLU is shown in Equation (2).
$$f(x) = \max(0, x) + \alpha \min(0, x)$$
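For concreteness, the two activation functions above can be sketched in a few lines of PyTorch. `nn.PReLU` already exposes the learnable slope α; the manual module below is shown only to mirror Equation (2), and the initial value of α is an assumption, as the paper does not state it.

```python
import torch
import torch.nn as nn

class PReLUManual(nn.Module):
    """Parametric ReLU per Equation (2): f(x) = max(0, x) + alpha * min(0, x)."""
    def __init__(self, init_alpha: float = 0.25):  # initial slope is illustrative
        super().__init__()
        # alpha is learned jointly with the network weights, keeping a
        # non-zero gradient for negative inputs (no "dead neurons").
        self.alpha = nn.Parameter(torch.tensor(init_alpha))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.clamp(x, min=0) + self.alpha * torch.clamp(x, max=0)

x = torch.tensor([-2.0, -0.5, 0.0, 1.5])
print(nn.ReLU()(x))      # tensor([0.0, 0.0, 0.0, 1.5]) -- negatives zeroed
print(PReLUManual()(x))  # negatives scaled by alpha instead of zeroed
```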

3.2.2. The FFNN–TabNet Architecture

The FFNN–TabNet framework operates through a streamlined process, wherein the final feature vector is meticulously constructed via a series of continuous feature selections and the aggregation of pertinent information, ultimately facilitating the execution of decision-making tasks. The architecture of the FFNN–TabNet’s encoder and decoder is delineated in Figure 2.
In this methodology, the red giant dataset, comprising M data instances each with D features, serves as the input to TabNet. The model processes the feature vector $Y \in \mathbb{R}^{M \times D}$ at each stage. The detailed feature-selection mechanism encompasses several phases:
(1) Initially, the primal feature vector Y undergoes normalization in a batch normalization (BN) [58] layer, followed by computational processing in the Feature Transformer layer. Within this layer, data traverses a shared parameter layer constituted of two FC + BN + GLU units, connected via a residual network and scaled by a factor of $\sqrt{0.5}$, following the original TabNet [12]. This layer's chief purpose is to deduce commonalities among features. Subsequent to this layer, data proceeds to an independent parameter layer, mirroring the preceding layer's architecture [12]. The feature transformer layer architecture is depicted in Figure 3.
(2) Subsequent to its processing through the Feature Transformer layer, the red giant dataset advances into the Split module. This module functions to segregate the output derived from the initial Feature Transformer, thereby isolating and acquiring the features $a[i-1]$ in the first step, specifically when $i = 1$. The formula is shown in Equation (3).
$$[d[i], a[i]] = f_i(M[i] \cdot f)$$
In Equation (3), the term $d[i]$ denotes the decision output of the current step, while $a[i]$ denotes the input to the next decision step.
(3) Subsequent to traversing the Split module, the dataset proceeds into the Attentive Transformer layer, analogous to the Mask layer depicted in Figure 1. The primary role of this layer is the computation of the current step's Mask, which is essential for feature selection. The Attentive Transformer layer is intricately composed of an FC layer, a BN layer, and a Sparsemax layer, along with a weighted scaling factor as a foundational scale component [12]. Owing to the intricate nonlinear interrelations among the features of red giant star data, in the current research, we have strategically integrated an FFNN upstream of the BN layer. The attentive transformer architecture is illustrated in Figure 4.
(1) Feedforward Neural Network (FFNN)
As illustrated in Figure 4, in the architecture of the attentive transformer, the FFNN module is positioned between the FC layer and the BN layer. The FFNN module consists of two sequences of FC and PReLU layers. The formula is shown in Equation (4).
$$\mathrm{FFNN}(x) = \mathrm{PReLU}(\mathrm{FC}_2(\mathrm{PReLU}(\mathrm{FC}_1(x))))$$
The integration of the FFNN module significantly enhances the TabNet model’s proficiency in comprehending and leveraging the intrinsic characteristics of features, thereby substantially amplifying its capacity to discern and interpret complex nonlinear interrelationships. This architectural innovation plays a pivotal role in advancing the model’s analytical capabilities.
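Under the description above, Equation (4) corresponds to two stacked FC + PReLU pairs. The following PyTorch sketch is one possible reading; the hidden width is illustrative, since the text does not fix the layer dimensions.

```python
import torch.nn as nn

class FFNN(nn.Module):
    """FFNN(x) = PReLU(FC2(PReLU(FC1(x)))), per Equation (4).
    hidden_dim is an assumption; the paper does not state it here."""
    def __init__(self, in_dim: int, hidden_dim: int, out_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),   # FC1
            nn.PReLU(),                      # learnable-slope activation
            nn.Linear(hidden_dim, out_dim),  # FC2
            nn.PReLU(),
        )

    def forward(self, x):
        return self.net(x)
```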
(2) Sparsemax [59]
The Sparsemax layer serves to induce sparsity in the output of the Attentive Transformer. It accomplishes this by directly mapping the feature vector onto a simplex, thereby effectuating sparsification. The formula is delineated in Equation (5).
$$\mathrm{Sparsemax}(z)_i = \max\left(0,\ z_i - \tau(z)\right)$$
where τ ( z ) is a threshold function that ensures the sum of the resulting vector is 1.
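A minimal sketch of the Sparsemax projection of Equation (5), following the standard simplex-projection algorithm of Martins and Astudillo cited in [59]; the threshold τ(z) is computed from the sorted inputs.

```python
import torch

def sparsemax(z: torch.Tensor) -> torch.Tensor:
    """Sparsemax of Equation (5): Euclidean projection of each row of z onto
    the probability simplex. Rows sum to 1 and many entries are exactly 0."""
    z_sorted, _ = torch.sort(z, dim=-1, descending=True)
    k = torch.arange(1, z.size(-1) + 1, device=z.device, dtype=z.dtype)
    cumsum = z_sorted.cumsum(dim=-1)
    support = 1 + k * z_sorted > cumsum          # admissible support set
    k_max = support.sum(dim=-1, keepdim=True)    # size of the support
    # tau(z) = (sum of the k_max largest entries - 1) / k_max
    tau = (cumsum.gather(-1, k_max - 1) - 1) / k_max.to(z.dtype)
    return torch.clamp(z - tau, min=0)

z = torch.tensor([[2.0, 1.0, 0.1]])
print(sparsemax(z))  # tensor([[1., 0., 0.]]) -- a hard, sparse selection
```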
Subsequently, the Attentive Transformer fabricates masks $M[i] \in \mathbb{R}^{B \times D}$, functioning as a nuanced selection mechanism for prominent features, utilizing the processed features $a[i-1]$ derived from the preceding step [12]. The formula is delineated in Equation (6).
$$M[i] = \mathrm{Sparsemax}\left(P[i-1] \cdot h_i(a[i-1])\right)$$
Sparsemax functions as a sparse probability activation function, analogous to Softmax, yet it yields a more sparsified output, predominantly clustered around 0 or 1, with limited intermediate values. This characteristic is instrumental in ensuring the selection of the most salient features during the feature-selection phase. In the context of feature selection, if a feature has been recurrently utilized in prior feature extraction phases, the model is designed to eschew its selection in the current stage. The primary function of the prior scale factor P[i] is to diminish the significance of such repeatedly harnessed features in the present feature selection process [12]. The formula is delineated in Equation (7).
$$P[i] = \prod_{j=1}^{i} \left(\gamma - M[j]\right)$$
The assignment of values is contingent on specific conditions: with $\gamma = 1$, a feature is designated for singular usage within the training phase. When $\gamma > 1$, it allows for the repeated utilization of a feature across various stages of feature selection, where its weighting diminishes in accordance with increasing usage frequency, thus attenuating its significance in future feature selection endeavors. Upon model initialization, all feature instances encapsulated by $P[0]$ are uniformly assigned a value of 1. To augment its proficiency in sparse feature selection, TabNet incorporates an entropy-style sparse regularization component [12]. The formula is delineated in Equation (8).
$$L_{\mathrm{sparse}} = \sum_{i=1}^{N_{\mathrm{steps}}} \sum_{b=1}^{B} \sum_{j=1}^{D} \frac{-M_{b,j}[i]}{N_{\mathrm{steps}} \cdot B} \log\left(M_{b,j}[i] + \varepsilon\right)$$
Within the specified formula, $N_{\mathrm{steps}}$ denotes the number of decision steps, $B$ the batch size, $D$ the dimensionality of the features, and $\varepsilon$ is a minuscule numeric value, crucial for maintaining numerical stability. The quintessential aim of this regularization term is to intensify the sparsity of $M[i]$, steering its distribution predominantly towards 0 or 1. The magnitude of $L_{\mathrm{sparse}}$ is inversely proportional to the sparsity of $M[i]$; a diminutive $L_{\mathrm{sparse}}$ value correlates with heightened sparsity in $M[i]$. Ultimately, $L_{\mathrm{sparse}}$ is assimilated into the overall loss function. Upon the culmination of feature extraction across all steps, TabNet amalgamates the outputs from each Feature Transformer layer, processed via the PReLU activation function, aggregates these results, and subsequently channels them through an FC layer to derive the final output. The comprehensive process diagram of the FFNN–TabNet model is illustrated in Figure 5.
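Putting Equations (6)–(8) together, the following sketch chains one masking step and the accumulated sparsity penalty, reusing the `sparsemax` function from the previous sketch. The upstream FC → FFNN → BN computation of $h_i(a[i-1])$ is assumed done elsewhere, and γ = 1.3 is an illustrative value rather than one fixed by the paper.

```python
import torch

def mask_step(processed, prior, gamma=1.3):
    """One Attentive Transformer masking step.
    `processed` is h_i(a[i-1]) after the FC -> FFNN -> BN stack (assumed
    computed upstream); `prior` is P[i-1]."""
    mask = sparsemax(prior * processed)   # Equation (6)
    prior = prior * (gamma - mask)        # Equation (7): down-weight reused features
    return mask, prior

def sparse_regularization(masks, eps=1e-15):
    """Entropy-style sparsity penalty of Equation (8), summed over all steps."""
    n_steps, batch = len(masks), masks[0].shape[0]
    return sum((-m * torch.log(m + eps)).sum() for m in masks) / (n_steps * batch)

# Usage sketch: B=4 samples, D=6 features, N_steps=3 decision steps.
prior = torch.ones(4, 6)            # P[0]: every feature starts equally available
masks = []
for _ in range(3):
    processed = torch.randn(4, 6)   # stand-in for the real transformer output
    m, prior = mask_step(processed, prior)
    masks.append(m)
loss_sparse = sparse_regularization(masks)
```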

4. Construction of the FFNN–TabNet Determination Model

The construction of the red giant age determination model utilizing the FFNN–TabNet involves the following specific steps:

4.1. Data Preprocessing

The data acquired post-cross-matching is subjected to preprocessing. This includes conducting anomaly detection to identify and eliminate extreme outliers. Since the dataset contained only a few missing values, these incomplete entries were excluded to ensure the quality and consistency of the data. Subsequently, the data undergo min-max normalization, compressing them into the [0, 1] range. The formula is delineated in Equation (9).
$$\hat{X}_i = \frac{X_i - X_{\min}}{X_{\max} - X_{\min}}$$
Here, $X_i$ and $\hat{X}_i$ correspond to the values of the red giant data before and after normalization, respectively, while $X_{\min}$ and $X_{\max}$ indicate the minimum and maximum values present in the red giant data, respectively.
Following the data preprocessing phase, a comprehensive selection of 6470 red giant stars was finalized to serve as input data. The distribution of red giant samples in $T_{\mathrm{eff}}$ (from LAMOST DR8), mass (from Wu et al. [51]), and age (from SDSS DR17) is represented in Figure 6.
In this study, the Python sklearn library was used to divide the dataset of 6470 entries into a training set and a test set in an 8:2 ratio. Specifically, out of the 6470 entries, 5176 entries (which is 80%) were allocated to the training set for model learning and adjustment. The remaining 1294 entries (20%) were designated as the test set for evaluating the model’s performance and generalization ability.
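A minimal sketch of this preprocessing and split, assuming the cross-matched catalog has already been exported to NumPy files; the file names and the random seed are illustrative, not from the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical arrays: X holds the Table 1 features, y the SDSS DR17 ages.
X = np.load("red_giant_features.npy")   # shape (6470, D); file name illustrative
y = np.load("red_giant_ages.npy")       # shape (6470,)

# Min-max normalization per Equation (9); keep the extrema so that the
# final predictions can be inversely normalized, as described in Section 4.3.
x_min, x_max = X.min(axis=0), X.max(axis=0)
X_norm = (X - x_min) / (x_max - x_min)

# 8:2 split -> 5176 training and 1294 test entries, as stated above.
X_train, X_test, y_train, y_test = train_test_split(
    X_norm, y, test_size=0.2, random_state=42)
```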

4.2. Bayesian Optimization

To further optimize the performance of the model, this study incorporated the Bayesian optimization algorithm (BOA), aimed at systematically adjusting the hyperparameter space of the FFNN–TabNet. Bayesian optimization, a probabilistic model-based optimization method, effectively tunes hyperparameters and predicts the performance of unexplored parameter configurations. Prior to the commencement of the algorithm, it is essential to define the search range for hyperparameters.
The detailed steps of the BOA are as follows:
(1) Initialization: Based on the aforementioned search range, a set of initial hyperparameters is randomly selected, denoted as T[0].
(2) Selection of Candidate Combination: Based on T[0], select a potential set of candidate hyperparameter combinations, denoted as R[t].
(3) Construction of Probability Model: Build a Bayesian network based on the current distribution of hyperparameters to represent the probabilistic relationship between hyperparameters and the objective function.
(4) Objective Function Fitting: Employ the constructed probability model to estimate the objective function, thereby generating a new set of hyperparameters, denoted as R1[t + 1].
(5) Update of Parameter Combination: Substitute a part of R1[t + 1] into T[t] to form a new hyperparameter combination, T[t + 1].
(6) Termination Condition Assessment: Evaluate whether the current candidate combination R[t] meets the predetermined termination criteria. If not, revert to Step 2 for further iteration; if satisfied, conclude the search.
The Bayesian optimizer was initialized with the parameters init_points = 15 and n_iter = 50, i.e., 15 random initial evaluations followed by 50 guided iterations, for a maximum of 65 training runs of TabNet. After optimizing the hyperparameter space with the Bayesian optimizer, we identified four key parameters: N_d, N_a, N_steps, and LR, with their optimal settings presented in Table 2. Additionally, the table includes several other significant hyperparameters, which, although not derived through the Bayesian optimizer, also impact the model's performance.
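A sketch of this tuning loop with the widely used `bayesian-optimization` package, whose `BayesianOptimization` class accepts exactly the `init_points` and `n_iter` settings quoted above. The objective wrapper `train_and_validate` is a hypothetical helper standing in for a short training run, and the search bounds are illustrative rather than the paper's.

```python
from bayes_opt import BayesianOptimization

def objective(n_d, n_a, n_steps, lr):
    """Train FFNN-TabNet with the candidate hyperparameters and return a
    score to maximize (negative validation RMSE here)."""
    rmse = train_and_validate(          # hypothetical helper wrapping training
        n_d=int(n_d), n_a=int(n_a), n_steps=int(n_steps), lr=lr)
    return -rmse

# Search ranges are illustrative; the paper defines its own bounds.
pbounds = {"n_d": (8, 64), "n_a": (8, 64), "n_steps": (3, 10), "lr": (1e-3, 1e-1)}

optimizer = BayesianOptimization(f=objective, pbounds=pbounds, random_state=1)
optimizer.maximize(init_points=15, n_iter=50)  # settings used in the paper
print(optimizer.max)  # best score and hyperparameter combination
```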

4.3. Construction of the FFNN–TabNet Model

Following the meticulous preprocessing of the data, they are segregated into distinct training and testing datasets. The model's loss functions, Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE), are established, alongside the integration of a Bayesian optimization approach. These prepared data are subsequently input into the model for training, during which the parameters are fine-tuned to ascertain the most efficacious model configuration. Subsequently, the model undergoes a comprehensive evaluation utilizing the validation set, culminating in the derivation of determination results. The determination results are inversely normalized and saved, and the model is evaluated.
The construction process diagram is illustrated in Figure 7.
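For orientation, a baseline training loop with the open-source pytorch-tabnet implementation of TabNet [12] is sketched below. Reproducing FFNN–TabNet itself would additionally require patching the package's attentive transformer with the FFNN module and replacing ReLU with PReLU, which is not shown; the hyperparameter values are placeholders for those in Table 2.

```python
from pytorch_tabnet.tab_model import TabNetRegressor

# Baseline TabNet regressor; n_d, n_a, n_steps are illustrative placeholders
# for the Bayesian-optimized values reported in Table 2.
model = TabNetRegressor(n_d=16, n_a=16, n_steps=5)

model.fit(
    X_train, y_train.reshape(-1, 1),            # the regressor expects 2-D targets
    eval_set=[(X_test, y_test.reshape(-1, 1))],
    eval_metric=["mae", "rmse"],                # losses used in this paper
    max_epochs=200, patience=20,                # early-stopping budget, illustrative
)
y_pred = model.predict(X_test).ravel()          # inverse-normalize afterwards
```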

5. Experiments and Results

5.1. Loss Functions and Evaluation Metric

The loss functions used in this paper are MAE and RMSE. The MAE formula is delineated in Equation (10).
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
The RMSE formula is delineated in Equation (11).
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}$$
The evaluation metric used in this paper is the Coefficient of Determination (R²). The formula is delineated in Equation (12).
$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2}$$
Here, $n$ is the number of samples, $y_i$ is the target age of the i-th sample, $\hat{y}_i$ is the FFNN–TabNet determination for $y_i$, and $\bar{y}$ is the mean of the target ages.
MAE represents the average absolute error between the predicted ages of red giant stars by the model and the actual ages in the sample, indicating the actual size of the error in the model’s determination. RMSE is the square root of the ratio of the sum of the squared differences between the predicted and actual ages of red giants and the number of samples n, measuring the deviation of the model’s determination from the actual ages. Smaller values of MAE and RMSE indicate better model performance. R2 reflects the degree of fit between the model’s predicted ages for red giant stars and the actual ages in the sample. Generally, the closer R2 is to 1, the better the model’s fit.
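The three quantities can be computed directly from Equations (10)–(12); a small NumPy sketch:

```python
import numpy as np

def evaluate(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """MAE, RMSE, and R^2 as defined in Equations (10)-(12)."""
    err = y_true - y_pred
    mae = float(np.mean(np.abs(err)))
    rmse = float(np.sqrt(np.mean(err ** 2)))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"MAE": mae, "RMSE": rmse, "R2": float(r2)}

# Usage: evaluate(ages_true, ages_pred) -> {"MAE": ..., "RMSE": ..., "R2": ...}
```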

5.2. Experimental Results and Analysis

In this study, the age of red giants is determined using an optimized TabNet model, wherein the model’s efficacy is enhanced through strategic modifications to the network architecture and meticulous fine-tuning of Bayesian hyperparameters. The scatter plot depicting the determination results is illustrated in Figure 8, with a red solid line denoting the optimal linear fit for the data points.
The determination analysis indicates that the FFNN–TabNet model exhibits remarkable precision in estimating the ages of red giants, as evidenced by its MAE of 0.535, RMSE of 0.711, and R2 of 0.946. These exceptional results underscore the superior performance of the FFNN–TabNet model. The model’s high-accuracy age estimations significantly contribute to a more profound understanding of stellar evolution, offering substantial support for studies exploring the genesis and evolutionary trajectory of the Milky Way. Moreover, the findings of this research provide vital insights into the ongoing refinement and optimization of models for stellar age determination.

5.3. Ablation Experiment and Analysis

This study details an ablation study performed on the model to scrutinize each module's contribution within the FFNN–TabNet framework for determining the age of red giants and to ascertain its impact on improving determination accuracy. The study entailed disassembling the modules for individual determination and assessing their accuracy. To neutralize parameter-induced variances, identical parameters were employed across all model iterations. The model underwent segmentation into four distinct configurations, as shown in Figure 9:
(1) TabNet 1: The model excluding both the PReLU activation function and the FFNN module, termed TabNet without PReLU and FFNN;
(2) TabNet 2: The model excluding only the FFNN module, termed TabNet without FFNN;
(3) TabNet 3: The model excluding only the PReLU activation function, termed TabNet without PReLU;
(4) FFNN–TabNet: The FFNN–TabNet model proposed in this study.
As indicated in Table 3, the experimental outcomes, integrated with model analysis, suggest that the exclusion of the FFNN module leads to a notable increase in the determination error. This reveals the FFNN module's vital role in enhancing the TabNet model's performance and its ability to intricately capture the nonlinear relationships present in red giant data. Furthermore, the removal of the PReLU activation function results in a modest reduction in determination accuracy, highlighting the function's contribution to the model's nonlinear extraction capabilities. Most tellingly, the simultaneous removal of both the PReLU activation function and the FFNN module results in a drastic reduction in determination accuracy, with R² falling from 0.946 for the full model to 0.882. This indicates that the incorporation of the PReLU activation function and FFNN module is crucial for significantly improving the model's overall determination efficacy.

5.4. Model Comparison and Analysis

In an effort to substantiate the enhanced model's efficacy in determining the ages of red giants, this investigation juxtaposed it with a spectrum of prevalent regression methodologies. These included the deep-learning approaches Long Short-Term Memory (LSTM) [60], Backpropagation Neural Networks (BPNN) [61], Convolutional Neural Networks (CNN) [47], Recurrent Neural Networks (RNN) [62], and Temporal Convolutional Networks (TCN) [63], as well as the traditional machine-learning regression algorithm Random Forest (RF) [10]. These algorithms were applied to the same dataset used in this study. The optimal hyperparameter configurations for the different algorithms are delineated in Table 4.
The results of these comparative analyses are delineated in Figure 10 and Table 5.
Experimental results show that FFNN–TabNet outperforms these current popular regression methods in determining stellar ages, effectively enhancing the performance of the determination.

5.5. Feature Correlation and Analysis

The interpretability of the TabNet model is quantified using the encoder output termed “feature attributes” [12], as depicted in Figure 2. The formula is delineated in Equation (13).
$$\eta_b[i] = \sum_{c=1}^{N_d} \mathrm{PReLU}\left(d_{b,c}[i]\right)$$
Equation (13) delineates the contribution of the i-th decision step to the model's final determination outcome [12]. The computational methodology for ascertaining the global contribution of features is delineated in Equation (14).
$$M_{\mathrm{agg},b,j} = \sum_{i=1}^{N_{\mathrm{steps}}} \eta_b[i]\, M_{b,j}[i]$$
After normalization, the global contribution of features is delineated in Equation (15).
$$\bar{M}_{\mathrm{agg},b,j} = \frac{\sum_{i=1}^{N_{\mathrm{steps}}} \eta_b[i]\, M_{b,j}[i]}{\sum_{j=1}^{D} \sum_{i=1}^{N_{\mathrm{steps}}} \eta_b[i]\, M_{b,j}[i]}$$
Here, $M_{\mathrm{agg}}$ reflects the global contribution of the feature, while $M[i]$ represents the feature contribution at the i-th decision step.
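A compact sketch of this aggregation, given the per-step masks and the step weights of Equation (13); the list-of-tensors layout is an assumption about how the encoder outputs are collected.

```python
import torch

def global_feature_importance(etas, masks):
    """Normalized global feature contributions, per Equations (13)-(15).
    etas:  list of N_steps tensors of shape (B,)   -- eta_b[i] from Eq. (13)
    masks: list of N_steps tensors of shape (B, D) -- M_b,j[i]"""
    agg = sum(eta.unsqueeze(-1) * m for eta, m in zip(etas, masks))  # Eq. (14)
    return agg / agg.sum(dim=-1, keepdim=True)                       # Eq. (15)
```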
Figure 11 compares the feature contributions of TabNet 1 and FFNN–TabNet. The trend line in Figure 11b is smoother than that in Figure 11a, indicating an enhanced ability of the FFNN–TabNet model to capture complex nonlinear relationships between features. This enhancement allows for a more effective utilization of feature information in the determination of stellar ages. Specifically, within the FFNN–TabNet model, mass has the highest contribution among all features, indicating the critical importance of a star's mass in determining its age. This finding aligns with astrophysical knowledge, where mass is indeed a key factor in determining a star's evolutionary stage. Subsequently, chemical abundances such as N_H and Ca_H also occupy high positions in the feature contribution chart, highlighting their significance in stellar age determination. These chemical abundances reflect the star's internal nucleosynthesis history, and the abundance of different elements is related to the star's age. The contribution of $\log g$ is also high, as surface gravity is related to the mass and radius of the star, serving as an indirect indicator of its age. The contribution of α elements is not the highest but is still significant; α elements are produced in stars through nuclear fusion processes, and their abundance is closely linked to the star's chemical evolutionary history. $T_{\mathrm{eff}}$ has a moderate contribution, indicating that temperature is also an important factor in determining stellar age.
Observations from Figure 11 also indicate that the contributions of certain chemical abundance features, specifically C_H, K_H, Co_H, Ti_H, Al_H, and Mn_H, are exceedingly minimal, to the extent of being almost negligible. Consequently, this study delved into the critical question of whether the exclusion of these features would affect the model’s accuracy in determination. The results after removing these features are shown in Figure 12.
An analysis of Figure 12 reveals that the exclusion of certain features results in a negligible impact on the performance capabilities of the model proposed in this study, with MAE = 0.542, RMSE = 0.716, and R2 = 0.939. This observation underscores the model’s substantial robustness, demonstrating its ability to rely on other more significant features for effective determination.

6. Discussion

The FFNN–TabNet model proposed in this study provides a new approach for the accurate determination of stellar ages. Previous research methods, such as traditional stellar age determination techniques and those based on machine learning, often faced challenges in accuracy and effectively capturing complex nonlinear relationships in the data. The main feature of the FFNN–TabNet model lies in its efficient processing of complex data structures and deep understanding of nonlinear relationships. This not only ensures high accuracy in determining stellar ages but also demonstrates its powerful capability in analyzing and interpreting complex astronomical data. The use of the Bayesian optimization algorithm played a key role in hyperparameter tuning, enhancing the overall performance of the model. This approach provides new insights into optimizing deep-learning models, especially when dealing with high-dimensional and complex data like astronomical observations. An analysis of the feature contribution comparison experiment reveals that the inclusion of the FFNN module significantly enhances the model’s capability to capture nonlinear relationships among features. Furthermore, this analysis identifies the features most closely related to stellar age. These pivotal features can guide future observational and data-collection strategies, allowing researchers to focus on attributes with the most significant impact on determining stellar ages. FFNN–TabNet breaks through the limitations of traditional methods, offering a new tool for future research in stars and galaxies and paving new paths for the application of deep-learning technologies in astronomy.
Future research will focus on three main areas: (1) applying the FFNN–TabNet model to different types of stars to enhance its generalization ability; (2) further refining feature engineering to improve the computational efficiency of the model; (3) integrating with astronomical visualization technologies to help researchers more intuitively understand the data and model results.

7. Conclusions

In this study, we proposed a stellar age determination method named FFNN–TabNet, which integrates FFNN and TabNet. This method employs the PReLU activation function instead of the ReLU activation function, solving the issue of gradient vanishing or diminishing. By incorporating an FFNN module composed of two layers of FC and PReLU activation functions into the attentive transformer module, it addresses the inadequacy of existing models in capturing nonlinear relationships. Furthermore, the model’s performance is further optimized by using the Bayesian optimization algorithm for hyperparameter tuning. Experimental results demonstrate that the FFNN–TabNet model proposed in this paper is more capable of capturing complex nonlinear relationships in stellar data compared to other models, yielding more stable training results and higher determination accuracy.

Author Contributions

Conceptualization, H.Z., Y.W. and W.Z.; methodology, H.Z., Y.W. and W.Z.; investigation, H.Z., Y.W., W.Z. and Y.Z.; writing—review and editing, H.Z., Y.W., W.Z. and Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Sichuan Provincial Department of Science and Technology Project (2023YFG0307); and in part by the Sichuan Province Intelligent Tourism Research Base Project (ZHYJ22-02).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset used in this study was obtained through cross-matching datasets from LAMOST, SDSS, and those mentioned in the paper (datasets available on the LAMOST official website). These are publicly available datasets, which can be found on the official websites of LAMOST and SDSS.

Acknowledgments

The authors would like to thank the authors of all references used in the paper, the editors, and the anonymous reviewers for their detailed comments and suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Tian, H.-J.; Liu, C.; Wu, Y.; Xiang, M.-S.; Zhang, Y. Time stamps of vertical phase mixing in the Galactic disk from LAMOST/Gaia stars. Astrophys. J. Lett. 2018, 865, L19. [Google Scholar] [CrossRef]
  2. Bai, Z.-R.; Zhang, H.-T.; Yuan, H.-L.; Fan, D.-W.; He, B.-L.; Lei, Y.-J.; Dong, Y.-Q.; Yu, S.-C.; Zhao, Y.-H.; Zhang, Y.; et al. The first data release of LAMOST low-resolution single-epoch spectra. Res. Astron. Astrophys. 2021, 21, 249. [Google Scholar] [CrossRef]
  3. Li, S.-S.; Fan, D.W.; Cui, Z.Z.; He, B.L.; Tao, Y.H.; Huo, Z.Y.; Mi, L.Y.; Luo, A.L.; Chen, J.J.; Hou, W.; et al. Review of LAMOST Open Data Access and Future Prospect. China Sci. Technol. Resour. Rev. 2022, 54, 47–55. [Google Scholar]
  4. Torra, F.; Castañeda, J.; Fabricius, C.; Lindegren, L.; Clotet, M.; González-Vidal, J.J.; Bartolomé, S.; Bastian, U.; Bernet, M.; Biermann, M.; et al. Gaia Early Data Release 3-Building the Gaia DR3 source list–Cross-match of Gaia observations. Astron. Astrophys. 2021, 649, A10. [Google Scholar] [CrossRef]
  5. Brown, A.; Vallenari, A.; Prusti, T.; De Bruijne, J.; Babusiaux, C.; Biermann, M.; Creevey, O.; Evans, D.; Eyer, L.; Hutton, A.; et al. Gaia data release 2-summary of the contents and survey properties. Astron. Astrophys. 2018, 616, A1. [Google Scholar]
  6. Abdurro’uf, N.; Accetta, K.; Aerts, C.; Aguirre, V.S.; Ahumada, R.; Ajgaonkar, N.; Ak, N.F.; Alam, S.; Prieto, C.A.; Almeida, A.; et al. The seventeenth data release of the Sloan Digital Sky Surveys: Complete release of MaNGA, MaStar, and APOGEE-2 data. Astrophys. J. Suppl. Ser. 2022, 259, 35. [Google Scholar] [CrossRef]
  7. Xiang, M.; Liu, X.; Shi, J.; Yuan, H.; Huang, Y.; Chen, B.; Wang, C.; Tian, Z.; Wu, Y.; Yang, Y.; et al. The ages and masses of a million galactic-disk main-sequence turnoff and subgiant stars from the LAMOST galactic spectroscopic surveys. Astrophys. J. Suppl. Ser. 2017, 232, 2. [Google Scholar] [CrossRef]
  8. Zou, J.; Han, Y.; So, S.S. Overview of artificial neural networks. Methods Mol. Biol. 2009, 458, 15. [Google Scholar]
  9. Schölkopf, B.; Smola, A.; Müller, K.-R. Kernel Principal component analysis. In Artificial Neural Networks—ICANN’97; Gerstner, W., Germond, A., Hasler, M., Nicoud, J.-D., Eds.; Springer: Berlin/Heidelberg, Germany, 1997; pp. 583–588. [Google Scholar]
  10. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  11. Bebis, G.; Georgiopoulos, M. Feed-forward neural networks. IEEE Potentials 1994, 13, 27–31. [Google Scholar] [CrossRef]
  12. Arik, S.Ö.; Pfister, T. Tabnet: Attentive interpretable tabular learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Palo Alto, CA, USA, 2–9 February 2021; pp. 6679–6687. [Google Scholar]
  13. Nair, V.; Hinton, G.E. Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 21–24 June 2010; pp. 807–814. [Google Scholar]
  14. He, K.; Zhang, X.; Ren, S.; Sun, J. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE International Conference on Computer Vision, Las Condes, Chile, 11–18 December 2015; pp. 1026–1034. [Google Scholar]
  15. Pelikan, M.; Goldberg, D.E.; Cantú-Paz, E. BOA: The Bayesian optimization algorithm. In Proceedings of the Genetic and Evolutionary Computation Conference GECCO-99, Citeseer, Orlando, FL, USA, 13–17 July 1999. [Google Scholar]
  16. Pelikan, M. Bayesian Optimization Algorithm: From Single Level to Hierarchy; University of Illinois at Urbana-Champaign: Champaign, IL, USA, 2002. [Google Scholar]
  17. Edvardsson, B.; Andersen, J.; Gustafsson, B.; Lambert, D.; Nissen, P.; Tomkin, J. VizieR Online Data Catalog: Chemical evolution of the galactic disk I. (Edvardsson+ 1993). In VizieR Online Data Catalog; Université de Strasbourg: Strasbourg, France, 1993; p. J-A+. [Google Scholar]
  18. Nordström, B.; Mayor, M.; Andersen, J.; Holmberg, J.; Pont, F.; Jørgensen, B.R.; Olsen, E.H.; Udry, S.; Mowlavi, N. The Geneva-Copenhagen survey of the Solar neighbourhood-Ages, metallicities, and kinematic properties of ∼14,000 F and G dwarfs. Astron. Astrophys. 2004, 418, 989–1019. [Google Scholar] [CrossRef]
  19. Franco, G. VizieR Online Data Catalog: Ubvyβ photometry in Puppis-Vela (Franco, 2012). In VizieR Online Data Catalog; Université de Strasbourg: Strasbourg, France, 2012; p. J-A+. [Google Scholar]
  20. Haywood, M.; Di Matteo, P.; Lehnert, M.D.; Katz, D.; Gómez, A. The age structure of stellar populations in the solar vicinity-Clues of a two-phase formation history of the Milky Way disk. Astron. Astrophys. 2013, 560, A109. [Google Scholar] [CrossRef]
  21. Bergemann, M.; Ruchti, G.R.; Serenelli, A.; Feltzing, S.; Alves-Brito, A.; Asplund, M.; Bensby, T.; Gruiters, P.; Heiter, U.; Hourihane, A.; et al. The Gaia-ESO Survey: Radial metallicity gradients and age-metallicity relation of stars in the Milky Way disk. Astron. Astrophys. 2014, 565, A89. [Google Scholar] [CrossRef]
  22. Wu, Y.-Q.; Xiang, M.-S.; Zhang, X.-F.; Li, T.-D.; Bi, S.-L.; Liu, X.-W.; Fu, J.-N.; Huang, Y.; Tian, Z.-J.; Liu, K.; et al. Stellar parameters of main sequence turn-off star candidates observed with LAMOST and Kepler. Res. Astron. Astrophys. 2017, 17, 5. [Google Scholar] [CrossRef]
  23. Xiang, M.; Liu, X.W.; Huang, Y.; Huo, Z.Y.; Chen, B.Q.; Zhang, H.H.; Sun, N.C.; Wang, C.; Zhao, Y.H.; Shi, J.R.; et al. The LAMOST stellar parameter pipeline at Peking University–LSP3. Mon. Not. R. Astron. Soc. 2015, 448, 822–854. [Google Scholar] [CrossRef]
  24. Clarke, R.W.; Bianco, F.; Gizis, J. Detection and Removal of Periodic Noise in KeplerK2 Photometry with Principal Component Analysis. Res. Notes AAS 2021, 5, 175. [Google Scholar] [CrossRef]
  25. Barnes, S.A. Ages for illustrative field stars using gyrochronology: Viability, limitations, and errors. Astrophys. J. 2007, 669, 1167. [Google Scholar] [CrossRef]
  26. Angus, R.; Aigrain, S.; Foreman-Mackey, D. Probabilistic stellar rotation periods with Gaussian processes. IAU Gen. Assem. 2015, 29, 2258396. [Google Scholar]
  27. Cunha, M.; Metcalfe, T. Asteroseismic signatures of small convective cores. Astrophys. J. 2007, 666, 413. [Google Scholar] [CrossRef]
  28. Vauclair, S. Stellar ages from asteroseismology: A few examples. Proc. Int. Astron. Union 2008, 4, 443–448. [Google Scholar] [CrossRef]
  29. Kjeldsen, H.; Bedding, T.R.; Viskum, M.; Frandsen, S. Solar-like oscillations in eta Boo. arXiv 1994, arXiv:astro-ph/9411016. [Google Scholar]
  30. Mazumdar, A. Asteroseismic diagrams for solar-type stars. Astron. Astrophys. 2005, 441, 1079–1086. [Google Scholar] [CrossRef]
  31. Christensen-Dalsgaard, J. The Sun as a fundamental calibrator of stellar evolution. Proc. Int. Astron. Union 2008, 4, 431–442. [Google Scholar] [CrossRef]
  32. Soderblom, D.R. The ages of stars. Annu. Rev. Astron. Astrophys. 2010, 48, 581–629. [Google Scholar] [CrossRef]
  33. Gai, N.; Basu, S.; Chaplin, W.J.; Elsworth, Y. An in-depth study of grid-based asteroseismic analysis. Astrophys. J. 2011, 730, 63. [Google Scholar] [CrossRef]
  34. Deheuvels, S.; Michel, E. Constraints on the structure of the core of subgiants via mixed modes: The case of HD49385. Astron. Astrophys. 2011, 535, A91. [Google Scholar] [CrossRef]
  35. Silva, A.V.; Davies, G.R.; Basu, S.; Christensen-Dalsgaard, J.; Creevey, O.; Metcalfe, T.S.; Bedding, T.R.; Casagrande, L.; Handberg, R.; Lund, M.N.; et al. Ages and fundamental properties of Kepler exoplanet host stars from asteroseismology. Mon. Not. R. Astron. Soc. 2015, 452, 2127–2148. [Google Scholar] [CrossRef]
  36. Jiang, C.; Gizon, L. BESTP—An automated Bayesian modeling tool for asteroseismology. Res. Astron. Astrophys. 2021, 21, 11. [Google Scholar] [CrossRef]
  37. Li, T.; Bedding, T.R.; Christensen-Dalsgaard, J.; Stello, D.; Li, Y.; Keen, M.A. Asteroseismology of 36 Kepler subgiants–II. Determining ages from detailed modelling. Mon. Not. R. Astron. Soc. 2020, 495, 3431–3462. [Google Scholar] [CrossRef]
  38. Skumanich, A. Time scales for CA II emission decay, rotational braking, and lithium depletion. Astrophys. J. 1972, 171, 565. [Google Scholar] [CrossRef]
  39. Mentuch, E.; Brandeker, A.; van Kerkwijk, M.H.; Jayawardhana, R.; Hauschildt, P.H. Lithium depletion of nearby young stellar associations. Astrophys. J. 2008, 689, 1127. [Google Scholar] [CrossRef]
  40. Nascimento, J.D.; Castro, M.; Meléndez, J.; Bazot, M.; Théado, S.; de Mello, G.F.P.; De Medeiros, J.R. Age and mass of solar twins constrained by lithium abundance. Astron. Astrophys. 2009, 501, 687–694. [Google Scholar] [CrossRef]
  41. Tian, H.-J.; Liu, C.; Wan, J.-C.; Wang, Y.-G.; Wang, Q.; Deng, L.-C.; Cao, Z.-H.; Hou, Y.-H.; Wang, Y.-F.; Wu, Y.; et al. Peculiar in-plane velocities in the outer disc of the Milky Way. Res. Astron. Astrophys. 2017, 17, 114. [Google Scholar] [CrossRef]
  42. Ness, M.; Hogg, D.W.; Rix, H.-W.; Martig, M.; Pinsonneault, M.H.; Ho, A. Spectroscopic determination of masses (and implied ages) for red giants. Astrophys. J. 2016, 823, 114. [Google Scholar] [CrossRef]
  43. Ness, M.; Hogg, D.W.; Rix, H.-W.; Ho, A.Y.; Zasowski, G. The cannon: A data-driven approach to stellar label determination. Astrophys. J. 2015, 808, 16. [Google Scholar] [CrossRef]
  44. Bellinger, E.P.; Angelou, G.C.; Hekker, S.; Basu, S.; Ball, W.H.; Guggenberger, E. Fundamental Parameters of Main-Sequence Stars in an Instant with Machine Learning. Astrophys. J. 2016, 830, 31. [Google Scholar] [CrossRef]
  45. Verma, K.; Hanasoge, S.; Bhattacharya, J.; Antia, H.; Krishnamurthi, G. Asteroseismic determination of fundamental parameters of Sun-like stars using multilayered neural networks. Mon. Not. R. Astron. Soc. 2016, 461, 4206–4214. [Google Scholar] [CrossRef]
  46. Martig, M.; Fouesneau, M.; Rix, H.-W.; Ness, M.; Mészáros, S.; García-Hernández, D.A.; Pinsonneault, M.; Serenelli, A.; Aguirre, V.S.; Zamora, O.; et al. Red giant masses and ages derived from carbon and nitrogen abundances. Mon. Not. R. Astron. Soc. 2016, 456, 3655–3670. [Google Scholar] [CrossRef]
  47. Bouvrie, J. Notes on Convolutional Neural Networks; Massachusetts Institute of Technology: Cambridge, MA, USA, 2006. [Google Scholar]
  48. Ho, A.Y.; Rix, H.-W.; Ness, M.K.; Hogg, D.W.; Liu, C.; Ting, Y.-S. Masses and ages for 230,000 LAMOST giants, via their carbon and nitrogen abundances. Astrophys. J. 2017, 841, 40. [Google Scholar] [CrossRef]
  49. Wu, Y.; Xiang, M.; Bi, S.; Liu, X.; Yu, J.; Hon, M.; Sharma, S.; Li, T.; Huang, Y.; Liu, K.; et al. Mass and age of red giant branch stars observed with LAMOST and Kepler. Mon. Not. R. Astron. Soc. 2018, 475, 3633–3643. [Google Scholar] [CrossRef]
  50. Das, P.; Sanders, J.L. MADE: A spectroscopic mass, age, and distance estimator for red giant stars with Bayesian machine learning. Mon. Not. R. Astron. Soc. 2019, 484, 294–304. [Google Scholar] [CrossRef]
  51. Wu, Y.; Xiang, M.; Zhao, G.; Bi, S.; Liu, X.; Shi, J.; Huang, Y.; Yuan, H.; Wang, C.; Chen, B.; et al. Ages and masses of 0.64 million red giant branch stars from the LAMOST Galactic Spectroscopic Survey. Mon. Not. R. Astron. Soc. 2019, 484, 5315–5329. [Google Scholar] [CrossRef]
  52. Hendriks, L.; Aerts, C. Deep Learning Applied to the Asteroseismic Modeling of Stars with Coherent Oscillation Modes. Publ. Astron. Soc. Pac. 2019, 131, 108001. [Google Scholar] [CrossRef]
  53. Hon, M.; Bellinger, E.P.; Hekker, S.; Stello, D.; Kuszlewicz, J.S. Asteroseismic inference of subgiant evolutionary parameters with deep learning. Mon. Not. R. Astron. Soc. 2020, 499, 2445–2461. [Google Scholar] [CrossRef]
54. Bishop, C.M. Mixture Density Networks; Technical Report NCRG/94/004; Aston University: Birmingham, UK, 1994. [Google Scholar]
  55. Li, Q.-D.; Wang, H.F.; Luo, Y.P.; Li, Q.; Deng, L.C.; Ting, Y.S. Large Sample of Stellar Age Determination Based on LAMOST Data and Machine Learning. J. China West Norm. Univ. (Nat. Sci.) 2023, 44, 195–200. [Google Scholar]
  56. Taylor, M.B. TOPCAT & STIL: Starlink table/VOTable processing software. In Astronomical Data Analysis Software and Systems XIV; Astronomical Society of the Pacific: San Francisco, CA, USA, 2005; p. 29. [Google Scholar]
  57. Ma, W.; Lu, J. An equivalence of fully connected layer and convolutional layer. arXiv 2017, arXiv:1712.01252. [Google Scholar]
  58. Hoffer, E.; Hubara, I.; Soudry, D. Train longer, generalize better: Closing the generalization gap in large batch training of neural networks. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
  59. Martins, A.F.T.; Astudillo, R.; Hokamp, C.; Kepler, F. Unbabel’s Participation in the WMT16 Word-Level Translation Quality Estimation Shared Task. In Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers, Berlin, Germany, 1–2 August 2016. [Google Scholar]
  60. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  61. Wythoff, B.J. Backpropagation neural networks: A tutorial. Chemom. Intell. Lab. Syst. 1993, 18, 115–155. [Google Scholar] [CrossRef]
62. Medsker, L.R.; Jain, L.C. Recurrent Neural Networks: Design and Applications; CRC Press: Boca Raton, FL, USA, 2001. [Google Scholar]
  63. Lea, C.; Vidal, R.; Reiter, A.; Hager, G.D. Temporal convolutional networks: A unified approach to action segmentation. In Computer Vision–ECCV 2016 Workshops: Amsterdam, The Netherlands, October 8–10 and 15–16, 2016, Proceedings, Part III 14; Springer: Berlin/Heidelberg, Germany, 2016; pp. 47–54. [Google Scholar]
Figure 1. TabNet neural network decision process. In contrast to the original TabNet by Arik and Pfister (2021) [12], which employs ReLU as the activation function, our implementation uses PReLU for improved performance.
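To illustrate the activation change noted in the caption, the following is a minimal PyTorch sketch (ours, not the authors' released code): PReLU keeps a learnable slope for negative inputs where ReLU outputs zero.

```python
import torch
import torch.nn as nn

# ReLU zeroes negative inputs; PReLU scales them by a learnable slope a:
# f(x) = max(0, x) + a * min(0, x), with a initialized to 0.25 by default.
relu = nn.ReLU()
prelu = nn.PReLU(num_parameters=1, init=0.25)

x = torch.tensor([-2.0, -0.5, 0.0, 1.5])
print(relu(x))   # [0.00, 0.00, 0.00, 1.50]
print(prelu(x))  # approximately [-0.50, -0.125, 0.00, 1.50] before training
```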
Figure 2. The architecture of the FFNN–TabNet encoder and decoder. In contrast to the original TabNet by Arik and Pfister (2021) [12], our implementation improves the activation functions and the Feature Transformer modules.
Figure 3. The feature transformer layer architecture.
Figure 4. The attentive transformer architecture. In contrast to the original TabNet by Arik and Pfister (2021) [12], our implementation adds an FFNN module to improve performance.
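The modification shown in Figure 4 can be approximated in code. The sketch below is our reconstruction under stated assumptions, not the authors' implementation: the hidden width of 64 is invented, and softmax stands in for TabNet's sparsemax so the sketch stays dependency-free. An FFNN block is inserted between the batch-normalized FC output and the mask normalization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveTransformerFFNN(nn.Module):
    """Illustrative sketch of the modified attentive transformer.

    Assumptions (not from the paper): hidden size 64; softmax standing in
    for TabNet's sparsemax.
    """

    def __init__(self, n_a: int = 16, n_features: int = 20, hidden: int = 64):
        super().__init__()
        self.fc = nn.Linear(n_a, n_features)   # original FC layer
        self.bn = nn.BatchNorm1d(n_features)   # original batch norm
        self.ffnn = nn.Sequential(             # the added FFNN module
            nn.Linear(n_features, hidden),
            nn.PReLU(),
            nn.Linear(hidden, n_features),
        )

    def forward(self, a: torch.Tensor, prior: torch.Tensor) -> torch.Tensor:
        x = self.ffnn(self.bn(self.fc(a)))
        return F.softmax(x * prior, dim=-1)    # TabNet uses sparsemax here

# Example: a batch of 64 attention embeddings (N_a = 16, Table 2) over the
# 20 input features of Table 1.
mask = AttentiveTransformerFFNN()(torch.randn(64, 16), torch.ones(64, 20))
```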
Figure 5. Comprehensive process diagram of the FFNN–TabNet model.
Figure 6. Pairwise relationships for T_eff (from LAMOST DR8), mass (from Wu et al. [51]), and age (from SDSS DR17) of red giant stars.
Figure 7. Construction process diagram of the FFNN–TabNet model.
Figure 8. Comparison of the actual ages of red giants and the determination results by the FFNN–TabNet model. This experiment utilized the test dataset.
Figure 9. Comparison of ablation experiment results for the (a) TabNet 1, (b) TabNet 2, and (c) TabNet 3 models. The red solid line denotes the optimal linear fit for the data points. This experiment utilized the test dataset.
Figure 10. Comparison of the actual ages of red giants and the determined results by (a) LSTM, (b) RF, (c) BPNN, (d) RNN, (e) CNN, and (f) TCN models. The red solid line denotes the optimal linear fit for the data points. This experiment utilized the test dataset.
Figure 11. Comparison of feature contributions. (a) TabNet feature contributions; (b) FFNN–TabNet feature contributions.
Figure 12. Comparison of the actual ages of red giants and the determination results by the FFNN–TabNet model. The red solid line denotes the optimal linear fit for the data points. This experiment utilized the test dataset with the features C_H, K_H, Co_H, Ti_H, Al_H, and Mn_H removed.
Table 1. Red giant stars data parameters.

Parameter | Description
T_eff (K) | Effective temperature
log g (dex) | Surface gravity
mass | Red giant mass
alpha | α elements
Fe_H, N_H, Ca_H, V_H, C_H, P_H, Ni_H, Na_H, Mg_H, K_H, Co_H, Ti_H, Al_H, Mn_H, O_H, Cr_H | Chemical abundances: logarithm of the ratio of iron, nitrogen, calcium, vanadium, carbon, phosphorus, nickel, sodium, magnesium, potassium, cobalt, titanium, aluminum, manganese, oxygen, and chromium to hydrogen atoms
Age (Gyr) | Red giant age
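For concreteness, a minimal sketch of assembling these parameters into a training table follows; the file name, column spellings, and the 80/20 split are illustrative assumptions rather than the released catalogue.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the cross-matched red giant catalogue (file name is an assumption).
df = pd.read_csv("red_giants_crossmatched.csv")

# The 20 input features of Table 1; "age" is the regression target.
features = (["Teff", "logg", "mass", "alpha"]
            + ["Fe_H", "N_H", "Ca_H", "V_H", "C_H", "P_H", "Ni_H", "Na_H",
               "Mg_H", "K_H", "Co_H", "Ti_H", "Al_H", "Mn_H", "O_H", "Cr_H"])
X, y = df[features], df["age"]

# Hold out a test set; the 80/20 ratio is assumed for illustration.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```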
Table 2. The optimal hyperparameter configuration for FFNN–TabNet.

Hyperparameter | Description | Value Range | Value
N_d | Width of the decision prediction layer | 4–64 | 16
N_a | Width of the attention embedding for each mask | 4–64 | 16
N_steps | Number of steps in the architecture | 1–10 | 6
Lr | Learning rate | 0.001–0.1 | 0.01
Optimizer_fn | PyTorch optimizer function | - | Adam
Max Epochs | Maximum number of training epochs | - | 100
Batch Size | Number of examples per batch | - | 64
Virtual Batch Size | Size of the mini-batches used for GBN | - | 128
Early Stopping Patience | Epochs without improvement before training stops | - | 20
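The search ranges above lend themselves to a BOA loop. Below is a hedged sketch using the open-source bayesian-optimization package; `train_and_score` is a hypothetical stand-in for training FFNN–TabNet and returning validation R², not the authors' tuning script.

```python
from bayes_opt import BayesianOptimization  # pip install bayesian-optimization

def train_and_score(n_d, n_a, n_steps, lr):
    # Hypothetical stand-in: train FFNN-TabNet with these settings and
    # return validation R^2. A dummy surrogate keeps the sketch runnable.
    return -(n_d - 16) ** 2 * 1e-4 - (lr - 0.01) ** 2

def objective(n_d, n_a, n_steps, lr):
    # BOA proposes continuous values; round the integer-valued hyperparameters.
    return train_and_score(int(round(n_d)), int(round(n_a)),
                           int(round(n_steps)), lr)

pbounds = {"n_d": (4, 64), "n_a": (4, 64), "n_steps": (1, 10), "lr": (0.001, 0.1)}
opt = BayesianOptimization(f=objective, pbounds=pbounds, random_state=1)
opt.maximize(init_points=5, n_iter=25)  # 5 random probes, then 25 Bayesian steps
print(opt.max)                          # best hyperparameters found
```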
Table 3. Comparison of ablation experiment results.

Methods | MSE | RMSE | R²
TabNet 1 | 0.659 | 0.875 | 0.883
TabNet 2 | 0.715 | 0.803 | 0.913
TabNet 3 | 0.623 | 0.752 | 0.935
FFNN–TabNet | 0.535 | 0.711 | 0.946
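For reference, the three reported metrics can be computed from test-set predictions as follows; the arrays below are placeholders, not the paper's data.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Illustrative placeholders for actual test-set ages and model determinations.
y_test = np.array([4.2, 6.8, 9.1, 2.5, 11.0])  # "true" ages (Gyr)
y_pred = np.array([4.0, 7.1, 8.7, 2.9, 10.4])  # determined ages (Gyr)

mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)
print(f"MSE={mse:.3f}  RMSE={rmse:.3f}  R2={r2:.3f}")
```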
Table 4. The optimal hyperparameter configurations for different algorithms.

Methods | Settings
LSTM | Learning rate = 0.001, batch size = 32, epochs = 50, optimizer = Adam, activation function = tanh
RF | n_estimators = 100, max_depth = 3
BPNN | Learning rate = 0.001, batch size = 64, epochs = 50, optimizer = Adam, activation function = ReLU
CNN | Learning rate = 0.001, batch size = 64, epochs = 50, optimizer = Adam, activation function = ReLU
RNN | Learning rate = 0.001, batch size = 32, epochs = 50, optimizer = Adam, activation function = tanh
TCN | Learning rate = 0.001, batch size = 64, epochs = 50, optimizer = Adam, activation function = ReLU
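As one example of how these baselines map to code, here is a sketch of the BPNN under the settings above; the hidden layer sizes (64, 32) are our assumption, as the paper does not list them.

```python
import torch
import torch.nn as nn

# Sketch of the BPNN baseline; hidden sizes (64, 32) are assumed.
model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),   # 20 inputs = the features in Table 1
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 1),               # output: determined age (Gyr)
)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # Table 4 settings
loss_fn = nn.MSELoss()

# One training step on dummy data with batch size 64 (Table 4); the paper
# trains for 50 epochs over the real training set.
x, y = torch.randn(64, 20), torch.randn(64, 1)
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```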
Table 5. Comparison results of determination using different algorithms.

Methods | MSE | RMSE | R²
LSTM | 0.860 | 1.146 | 0.802
RF | 0.749 | 0.981 | 0.848
BPNN | 0.697 | 0.913 | 0.856
CNN | 0.672 | 0.882 | 0.871
RNN | 0.681 | 0.876 | 0.878
TCN | 0.593 | 0.796 | 0.901
FFNN–TabNet | 0.535 | 0.711 | 0.946
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
