Proceeding Paper

Fast Algorithm for Impact Point Selection in Semiparametric Functional Models †

1
MODES Research Group, CITIC, Universidade da Coruña, 15071 A Coruña, Spain
2
Institut de Mathématiques, Université Paul Sabatier, 31062 Toulouse, France
*
Author to whom correspondence should be addressed.
† Presented at the 2nd XoveTIC Conference, A Coruña, Spain, 5–6 September 2019.
Proceedings 2019, 21(1), 14; https://doi.org/10.3390/proceedings2019021014
Published: 31 July 2019
(This article belongs to the Proceedings of The 2nd XoveTIC Conference (XoveTIC 2019))

Abstract

A new sparse semiparametric functional model is proposed, which incorporates the influence of two functional variables on a scalar response in a simple and interpretable way. One of the functional variables enters through a single-index structure and the other linearly, through the high-dimensional vector of its discretized observations. For this model, a new algorithm for impact point selection in the linear part and for model estimation is proposed. The procedure exploits the functional origin of the linear covariates. Some asymptotic results will ensure the good performance of the method. The computational efficiency of the algorithm, without loss of predictive power, will be shown through a simulation study and a real data application, by comparing its results with those obtained through the standard PLS method.

1. Introduction

In the Big Data era, it is increasingly common to have observations of variables measured on a continuous support (the data are curves or images). The richness of information provided by functional variables means that they frequently appear in regression problems. In many situations, we have a scalar variable of interest and we want to know which points of a functional variable are the most influential (points of impact) on this scalar variable (see [1]). The problem is that functional variables are usually observed at many points, and standard variable selection methods from the multidimensional context can give inadequate results. On the one hand, these procedures are affected by the dependence between observations, which in this case derives directly from their functional origin. On the other hand, the large number of observations makes it difficult to obtain results in a reasonable amount of time.
In this work, we focus on a regression model with scalar response which incorporates the influence of two functional variables: one of them is included through a single-index type structure (see [2,3] for details) and the other linearly, through a high-dimensional vector formed by its discretized observations (see [1,4] for details and motivation of this structure). In this way we obtain a very flexible model, which combines interpretable estimates with dimension reduction. For this model, the so-called Multi-functional Partial Linear Single-Index Model (MFPLSIM), we work in a framework where there is a very large number of linear covariates but only a few of them have a real influence on the response (sparse context). Accordingly, we develop an efficient algorithm for impact point selection in the linear part and for the estimation of the model (the Fast Algorithm for Sparse Semiparametric Multi-functional Regression, FASSMR), which takes advantage of the functional origin of the scalar variables included in the linear part. The good practical behaviour of the proposed methodology will be shown through a simulation study and a real data application. In both cases, we will show its computational efficiency, without loss of predictive power, by comparing its results with those of the standard PLS procedure. Furthermore, some asymptotic results will provide theoretical support for the FASSMR.

2. The Model

The MFPLSIM is defined by the relationship
$$Y = \sum_{j=1}^{p_n} \beta_{0j}\,\zeta(t_j) + m\left(\langle \theta_0, X \rangle\right) + \varepsilon, \qquad (1)$$
where $Y$ is a real random response, $X$ denotes a random curve taking values in some Hilbert space $\mathcal{H}$ with inner product $\langle \cdot, \cdot \rangle$, and $\zeta$ denotes another random curve defined on some interval $[c, d]$. The curve $\zeta$ is observed at the points $c \le t_1 < \dots < t_{p_n} \le d$, and $\zeta(t_j)$, $j = 1, \dots, p_n$, denote its discretized observations; $(\beta_{01}, \dots, \beta_{0p_n})^{\top}$ is a vector of unknown coefficients, $m$ is an unknown link function and $\theta_0$ denotes an unknown curve in $\mathcal{H}$. Finally, $\varepsilon$ is the random error, which satisfies $E\left(\varepsilon \mid \zeta(t_1), \dots, \zeta(t_{p_n}), X\right) = 0$. In model (1), we assume that only a few points of the curve $\zeta$ have an effect on the response $Y$. Accordingly, we denote $S_n = \{j = 1, \dots, p_n \text{ such that } \beta_{0j} \neq 0\}$, and its cardinality satisfies $\sharp S_n = s_n = o(p_n)$.
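To make the structure of model (1) concrete, the following is a minimal simulation sketch. It is not taken from the paper: the curve generators, the link function `np.tanh`, the index curve `theta0`, the impact positions and the noise level are all illustrative assumptions, and the inner product is taken to be the $L^2[0,1]$ one, approximated by a Riemann sum.

```python
import numpy as np

def simulate_mfplsim(n=50, p_n=100, grid_x=100, seed=0):
    """Draw a toy sample (zeta_i, X_i, Y_i) from model (1) under illustrative choices."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, p_n)      # discretization points t_1..t_{p_n}, with [c, d] = [0, 1]
    s = np.linspace(0.0, 1.0, grid_x)   # grid on which the curves X_i are stored

    # Smooth random curves built from a few basis functions (assumption).
    a = rng.normal(size=(n, 3))
    zeta = a[:, [0]] * np.sin(2 * np.pi * t) + a[:, [1]] * np.cos(2 * np.pi * t) + a[:, [2]] * t
    b = rng.normal(size=(n, 2))
    X = b[:, [0]] * np.sin(np.pi * s) + b[:, [1]] * s**2

    # Sparse coefficients: only s_n = 2 impact points among the p_n covariates.
    beta0 = np.zeros(p_n)
    impact = [p_n // 4, (3 * p_n) // 4]  # 0-based positions, hypothetical
    beta0[impact] = [2.0, -1.5]

    # Index curve with unit L2 norm on [0, 1]; <theta_0, X_i> via a Riemann sum.
    theta0 = np.sqrt(2.0) * np.sin(np.pi * s)
    index = X @ theta0 * (s[1] - s[0])

    eps = rng.normal(scale=0.1, size=n)  # random error with zero conditional mean
    Y = zeta @ beta0 + np.tanh(index) + eps
    return t, s, zeta, X, Y, beta0, theta0
```

Calling `simulate_mfplsim()` returns an `n × p_n` matrix of discretized observations $\zeta_i(t_j)$, an `n × grid_x` matrix of curves $X_i$, and the responses, which is the data layout the rest of the procedure works with.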

3. The FASSMR

Our procedure is based on the fact that the variables $\zeta(t_j)$, $j = 1, \dots, p_n$, come from the discretization of the functional variable $\zeta$. Hence, when $t_j$ is close to $t_k$, the corresponding variables $\zeta(t_j)$ and $\zeta(t_k)$ carry roughly the same information about the response. As a consequence, some variables can be discarded before applying the variable selection procedure.
To present the FASSMR, let us assume that we have a statistical sample of size $n$, $\{(\zeta_i, X_i, Y_i),\ i = 1, \dots, n\}$, i.i.d. as $(\zeta, X, Y)$. We will assume, without loss of generality, that $p_n$ can be factorized as $p_n = q_n w_n$, with $q_n$ and $w_n$ integers. The previous considerations lead us to the following set of variables:
$$\mathcal{R}_n^{1} = \left\{ \zeta(t_k^{1}) = \zeta\left(t_{[(2k-1)q_n/2]}\right),\ k = 1, \dots, w_n \right\},$$
where $[z]$ denotes the smallest integer not less than $z \in \mathbb{R}$. Note that the correlation between consecutive variables in $\mathcal{R}_n^{1}$ is much weaker than in the whole set of $p_n$ initial linear covariates. As a consequence, the variable selection procedure will be carried out on the variables belonging to $\mathcal{R}_n^{1}$. In other words, we will consider the following model with only $w_n$ linear covariates:
$$Y_i = \sum_{k=1}^{w_n} \beta_{0k}^{1}\,\zeta_i(t_k^{1}) + m^{1}\left(\langle \theta_0^{1}, X_i \rangle\right) + \varepsilon_i^{1}. \qquad (2)$$
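As a small worked illustration of how the reduced grid is formed, the sketch below computes the indices $[(2k-1)q_n/2]$, $k = 1, \dots, w_n$, using 1-based indexing as in the text. The function name `reduced_indices` is hypothetical; the only inputs are $p_n$ and $w_n$, with $q_n = p_n / w_n$.

```python
def reduced_indices(p_n, w_n):
    """Return the 1-based indices ceil((2k - 1) * q_n / 2) for k = 1..w_n."""
    if p_n % w_n != 0:
        raise ValueError("p_n must factorize as q_n * w_n")
    q_n = p_n // w_n
    # Integer ceiling of (2k - 1) * q_n / 2, avoiding floating point.
    return [((2 * k - 1) * q_n + 1) // 2 for k in range(1, w_n + 1)]

print(reduced_indices(100, 10))  # [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
print(reduced_indices(9, 3))     # [2, 5, 8]
```

Each retained point sits roughly in the middle of a block of $q_n$ consecutive discretization points, so consecutive retained variables are about $q_n$ grid steps apart.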
Then, the variable selection task can be carried out following the standard procedure described in [5] and detailed in [6], which is based on transforming model (2) into a linear one and applying the PLS procedure. We denote by $(\hat{\beta}_0^{1}, \hat{\theta}_0^{1})$ the estimates of the parameters of model (2), where $\hat{\beta}_0^{1} = (\hat{\beta}_{01}^{1}, \dots, \hat{\beta}_{0w_n}^{1})^{\top}$. Then, $\zeta(t_k^{1})$ is selected in $\mathcal{R}_n^{1}$ if and only if $\hat{\beta}_{0k}^{1} \neq 0$.
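The idea of transforming the model into a linear one and then applying a penalized fit can be sketched as follows. This is a simplified illustration under several assumptions that are not in the text: the index curve is held fixed (the procedure in [5,6] estimates it together with the coefficients), `index[i]` already holds the projected value $\langle \theta, X_i \rangle$, the kernel is Epanechnikov, and a lasso penalty solved by coordinate descent stands in for the penalty actually used in [5,6]. All function names are hypothetical.

```python
import numpy as np

def nw_weights(index, h):
    """Nadaraya-Watson weight matrix built from the projections <theta, X_i>."""
    u = np.abs(index[:, None] - index[None, :]) / h
    K = np.where(u <= 1.0, 0.75 * (1.0 - u**2), 0.0)  # Epanechnikov kernel (assumption)
    return K / K.sum(axis=1, keepdims=True)           # diagonal is 0.75, so rows never sum to 0

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_cd(A, y, lam, n_iter=200):
    """Coordinate descent for (1/(2n)) * ||y - A b||^2 + lam * ||b||_1."""
    A = np.asarray(A, dtype=float)
    r = np.array(y, dtype=float)          # residual y - A b, starting from b = 0
    n, p = A.shape
    b = np.zeros(p)
    col_sq = (A**2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += A[:, j] * b[j]           # remove the contribution of coordinate j
            rho = A[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= A[:, j] * b[j]           # add the updated contribution back
    return b

def fit_linear_part(Z, index, Y, h, lam):
    """Partial out the single-index component, then run the penalized linear fit."""
    W = nw_weights(index, h)
    Z_tilde = Z - W @ Z                   # covariates with the smoothed part removed
    Y_tilde = Y - W @ Y                   # response with the smoothed part removed
    return lasso_cd(Z_tilde, Y_tilde, lam)
```

Here `Z` would be the `n × w_n` matrix of reduced covariates $\zeta_i(t_k^{1})$; the nonzero entries of the returned vector indicate the selected variables for the given bandwidth `h` and penalty level `lam`, both of which would need to be tuned in practice.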
Considering the whole set of $p_n$ initial linear covariates, that is, returning to model (1), a variable $\zeta(t_j) \in \{\zeta(t_1), \dots, \zeta(t_{p_n})\}$ is selected if and only if it belongs to $\mathcal{R}_n^{1}$ and its estimated coefficient, which can be denoted by $\hat{\beta}_{0k_j}^{1}$, is non-null. Then, $\hat{S}_n = \{j = 1, \dots, p_n \text{ such that } \zeta(t_j) = \zeta(t_{k_j}^{1}) \in \mathcal{R}_n^{1} \text{ and } \hat{\beta}_{0k_j}^{1} \neq 0\}$, with $\hat{\beta}_{0j} = \hat{\beta}_{0k_j}^{1}$ if $j \in \hat{S}_n$ and $\hat{\beta}_{0j} = 0$ otherwise. Finally, $\hat{\theta}_0 = \hat{\theta}_0^{1}$, and an estimator of the function $m_{\theta_0}(\chi) \equiv m(\langle \theta_0, \chi \rangle)$, denoted by $\hat{m}_{\hat{\theta}_0}(\chi)$, can be obtained by smoothing the residuals from the parametric fit (see Appendix A).
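The return from the reduced model to the full set of $p_n$ covariates is a simple bookkeeping step, sketched below. The function name and the example values are hypothetical; `idx_1based` is assumed to hold the 1-based positions of the retained points $t_k^{1}$ in the original grid.

```python
import numpy as np

def expand_to_full_grid(beta_hat_1, idx_1based, p_n):
    """Place the w_n reduced-model estimates in a length-p_n vector, zeros elsewhere."""
    beta_hat = np.zeros(p_n)
    beta_hat[np.asarray(idx_1based) - 1] = beta_hat_1   # convert to 0-based positions
    # Selected set: 1-based indices j whose estimated coefficient is non-null.
    S_hat = [int(j) + 1 for j in np.flatnonzero(beta_hat)]
    return beta_hat, S_hat

beta_hat, S_hat = expand_to_full_grid(
    beta_hat_1=[0.0, 1.2, 0.0, 0.0, -0.7],
    idx_1based=[2, 6, 10, 14, 18],
    p_n=20,
)
print(S_hat)  # [6, 18]
```

Only points that were retained in the reduced set can ever enter the selected set; every discarded point receives a zero coefficient by construction.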

4. Theory, Simulation and Real Data Application Conclusions

The good behaviour of the proposed algorithm will be ensured theoretically. Furthermore, the simulation study shows that the FASSMR delivers the variable selection and estimation of model (1) in a reasonable amount of time, even for very large values of $p_n$. As will be derived from the simulation study, the developed algorithm clearly outperforms the standard PLS procedure in terms of computational time, without loss of predictive power. A real data application will also illustrate the flexibility and applicability of model (1) together with the FASSMR estimation.

Funding

The authors acknowledge partial support by MINECO grants MTM2014-52876-R and MTM2017-82724-R (EU ERDF support included). Additionally, financial support from the Xunta de Galicia (Centro Singular de Investigación de Galicia accreditation ED431G/01 2016-2019 and Grupos de Referencia Competitiva ED431C2016-015) and the European Union (European Regional Development Fund—ERDF), is gratefully acknowledged. The first author also thanks the financial support from the Xunta de Galicia and the European Union (European Social Fund—ESF), the reference of which is ED481A-2018/191.

Conflicts of Interest

The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, and in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
FASSMR: Fast Algorithm for Sparse Semiparametric Multi-functional Regression
i.i.d.: Independent and identically distributed
MFPLSIM: Multi-functional Partial Linear Single-Index Model
PLS: Penalized Least Squares

Appendix A

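Denoting by $\hat{\beta}_0$ the vector of estimated parameters,
$$\hat{m}_{\hat{\theta}_0}(\chi) \equiv \hat{m}\left(\langle \hat{\theta}_0, \chi \rangle\right) = \frac{\sum_{i=1}^{n} \left(Y_i - \zeta_i^{\top} \hat{\beta}_0\right) K\left(d_{\hat{\theta}_0}(\chi, X_i)/h\right)}{\sum_{i=1}^{n} K\left(d_{\hat{\theta}_0}(\chi, X_i)/h\right)},$$
where we have denoted $\zeta_i = (\zeta_i(t_1), \dots, \zeta_i(t_{p_n}))^{\top}$, $h > 0$ is a bandwidth, $K$ is a kernel and, for any $\theta \in \mathcal{H}$, $d_{\theta}(\cdot, \cdot)$ is the semimetric defined as $d_{\theta}(\chi, \chi') = |\langle \theta, \chi - \chi' \rangle|$ for each $\chi, \chi' \in \mathcal{H}$.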
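The estimator above is a kernel-weighted average of the residuals from the linear fit, and can be sketched as follows. The choices here are assumptions rather than part of the text: curves are stored on an equally spaced grid with step `ds`, the inner product is the $L^2$ one approximated by a Riemann sum, and $K$ is the Epanechnikov kernel. The function name `m_hat` is hypothetical.

```python
import numpy as np

def m_hat(chi, X, Y, Z, beta_hat, theta_hat, h, ds):
    """Kernel smoothing of the residuals Y_i - zeta_i' beta_hat at the curve chi."""
    resid = Y - Z @ beta_hat                       # residuals from the parametric fit
    d = np.abs((X - chi) @ theta_hat * ds)         # d_theta(chi, X_i) = |<theta, chi - X_i>|
    u = d / h
    K = np.where(u <= 1.0, 0.75 * (1.0 - u**2), 0.0)  # Epanechnikov kernel (assumption)
    denom = K.sum()
    if denom == 0.0:
        return float("nan")                        # no sample curve within bandwidth h
    return float(K @ resid / denom)
```

Here `X` is the `n × grid` matrix of sample curves, `Z` the `n × p_n` matrix of discretized observations $\zeta_i(t_j)$, and `chi` a new curve on the same grid. Evaluating at a sample curve `X[i]` always gives a finite value, since that curve contributes weight $K(0) > 0$.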

References

  1. Aneiros, G.; Vieu, P. Variable selection in infinite-dimensional problems. Stat. Probab. Lett. 2014, 9, 12–20.
  2. Ait-Saïdi, A.; Ferraty, F.; Kassa, R.; Vieu, P. Cross-Validated Estimations in the Single-Functional Index Model. Statistics 2008, 42, 475–494.
  3. Novo, S.; Aneiros, G.; Vieu, P. Automatic and location-adaptive estimation in functional single-index regression. J. Nonparametric Stat. 2019, 31, 364–392.
  4. Aneiros, G.; Vieu, P. Partial linear modelling with multi-functional covariates. Comput. Stat. 2015, 30, 647–671.
  5. Novo, S.; Aneiros, G.; Vieu, P. Sparse Semi-Functional Partial Linear Single-Index Regression. Proceedings 2018, 2, 1190.
  6. Novo, S.; Aneiros, G.; Vieu, P. Sparse semiparametric regression when predictors are mixture of functional and high-dimensional variables. preprint.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Novo, S.; Aneiros, G.; Vieu, P. Fast Algorithm for Impact Point Selection in Semiparametric Functional Models. Proceedings 2019, 21, 14. https://doi.org/10.3390/proceedings2019021014
