1. Introduction
Thin-Plate Spline Generalized Linear Models (TPS-GLMs) extend semiparametric generalized linear models (SGLMs) by allowing smoothing splines in more than one dimension. These models share the characteristics of the generalized linear model (GLM) described by McCullagh and Nelder [1]. Like GLMs, TPS-GLMs can assume a variety of distribution families for the response variable. They also allow a non-linear relationship between the mean of the response and the linear predictor via a link function, and they accommodate non-constant variance in the data. Furthermore, TPS-GLMs can model non-linear joint interaction effects of some covariates, as well as the effects of coordinates in spatial data, which makes them a useful tool for modeling dynamic patterns in scientific areas such as environmental science, agronomy, and ecology. Some of the main works on the thin-plate spline technique are Duchon [2,3], Bookstein [4], and Chen et al. [5], while in the context of statistical modeling, Wahba [6], Green and Silverman [7], Wood [8], and Moraga et al. [9] can be mentioned, among others.
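To illustrate how a smooth surface enters the linear predictor, the following Python sketch assumes the standard bivariate thin-plate construction: eta_i = x_i'beta + f(t_i), with f(t) = sum_j delta_j phi(||t - t_j||) + a0 + a1 t1 + a2 t2, radial function phi(r) = r^2 log(r^2)/(16 pi), side condition T'delta = 0, and roughness penalty J(f) = delta'E delta. This notation, the helper names (tps_phi, tps_matrices, tps_surface, linear_predictor), and the simulated inputs are assumptions for illustration and are not necessarily the parametrization used in this paper.

```python
import numpy as np

def tps_phi(r):
    """Hypothetical helper: phi(r) = r^2 log(r^2) / (16 pi) for d = 2, m = 2, with phi(0) = 0."""
    r = np.asarray(r, dtype=float)
    out = np.zeros_like(r)
    pos = r > 0
    out[pos] = r[pos] ** 2 * np.log(r[pos] ** 2) / (16.0 * np.pi)
    return out

def tps_matrices(t):
    """E[i, j] = phi(||t_i - t_j||) and T = [1, t1, t2] for knots t of shape (n, 2)."""
    t = np.asarray(t, dtype=float)
    E = tps_phi(np.linalg.norm(t[:, None, :] - t[None, :, :], axis=-1))
    T = np.column_stack([np.ones(len(t)), t])
    return E, T

def tps_surface(s, t, delta, a):
    """f(s) = sum_j delta_j phi(||s - t_j||) + a0 + a1 s1 + a2 s2 for points s of shape (k, 2)."""
    s = np.atleast_2d(np.asarray(s, dtype=float))
    t = np.asarray(t, dtype=float)
    radial = tps_phi(np.linalg.norm(s[:, None, :] - t[None, :, :], axis=-1))
    return radial @ delta + a[0] + s @ a[1:]

def linear_predictor(X, beta, t, delta, a):
    """eta_i = x_i' beta + f(t_i): parametric part plus thin-plate surface at the knots."""
    return X @ beta + tps_surface(t, t, delta, a)

# Hypothetical example: 30 locations, one covariate, log link
rng = np.random.default_rng(0)
n = 30
t = rng.uniform(size=(n, 2))
X = rng.normal(size=(n, 1))          # no intercept column: a[0] plays that role
E, T = tps_matrices(t)
P = np.eye(n) - T @ np.linalg.solve(T.T @ T, T.T)
delta = P @ rng.normal(size=n)       # projection enforces T' delta = 0
a = np.array([0.5, -1.0, 2.0])
beta = np.array([0.3])
eta = linear_predictor(X, beta, t, delta, a)
mu = np.exp(eta)                     # mean under a log link (e.g., Poisson response)
penalty = delta @ E @ delta          # roughness J(f) = delta' E delta
print(penalty)
```

Because phi is conditionally positive definite of order 2, the penalty delta'E delta is positive for any non-zero delta satisfying T'delta = 0, which is why the side condition accompanies the penalty.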
However, diagnostic analysis is a fundamental step in statistical modeling for any data set. It allows us to validate the assumptions made about the model and to identify discrepant observations and, eventually, observations that are influential on the model fit. One of the main diagnostic techniques used in GLMs and SGLMs is local influence. In general, the idea of the local influence technique introduced by Cook [10] is to evaluate the sensitivity of the maximum likelihood estimators (MLEs) when small perturbations are introduced in the model assumptions or in the data, in either the response variable or the explanatory variables. Compared with case deletion, this technique has the advantage that the parameter estimates do not need to be recomputed for each excluded case. Here, we are interested in developing the local influence technique for the TPS-GLM in order to detect observations that may have a disproportionate influence on the estimators of both the parametric (regression coefficients) and non-parametric (surface) parts of the linear predictor. Such influence may arise, for example, because each experimental unit contributes differently to the model or because the variable of interest is subject to some modification. In the context of GLMs and SGLMs, there is empirical evidence that the MLEs and maximum penalized likelihood estimators (PMLEs) are sensitive to this type of situation; we therefore expect this sensitivity to be present in the estimators of the TPS-GLM as well, in particular in the surface estimator.
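To make the curvature idea concrete, the following Python sketch computes Cook's normal curvature C_l = 2|l'Delta'(L_ddot)^{-1}Delta l| and the unit direction l_max that maximizes it. It deliberately uses a simple normal linear model under case-weight perturbation, with sigma^2 treated as fixed, rather than the TPS-GLM; in the TPS-GLM, L_ddot would be the Hessian of the penalized log-likelihood and Delta the corresponding perturbation matrix. The helper max_normal_curvature and the simulated data are hypothetical. Note that a single fit suffices, whereas case deletion would require one refit per excluded case.

```python
import numpy as np

def max_normal_curvature(hessian, delta):
    """Hypothetical helper for Cook's local influence.
    F = Delta' (L_ddot)^{-1} Delta and C_l = 2 |l' F l|; returns C_max and l_max,
    the eigenvector of F with the largest absolute eigenvalue."""
    F = delta.T @ np.linalg.solve(hessian, delta)
    F = (F + F.T) / 2.0                      # symmetrize against rounding error
    vals, vecs = np.linalg.eigh(F)
    k = np.argmax(np.abs(vals))
    return 2.0 * abs(vals[k]), vecs[:, k]

# Illustration: normal linear model with case-weight perturbation
# L(beta | omega) = -(1 / (2 sigma^2)) * sum_i omega_i (y_i - x_i' beta)^2, omega_0 = (1, ..., 1)
rng = np.random.default_rng(42)
n = 20
x = rng.uniform(0, 1, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(0, 0.3, n)
y[5] += 10.0                                 # plant one outlying response

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta_hat                         # residuals at the unperturbed fit
sigma2 = e @ e / (n - X.shape[1])            # treated as fixed in this sketch
hessian = -X.T @ X / sigma2                  # L_ddot = d^2 L / d beta d beta'
delta = X.T * e / sigma2                     # Delta (p x n): column i is x_i e_i / sigma^2

c_max, l_max = max_normal_curvature(hessian, delta)
print(c_max, np.argmax(np.abs(l_max)))       # the planted case (index 5) should dominate l_max
```

For this model F = -diag(e) H diag(e) / sigma^2, with H the hat matrix, so large components of |l_max| point to cases that combine a large residual with non-negligible leverage.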
Various studies have extended the local influence technique to different parametric models. Thomas and Cook [11] applied Cook's method of local influence [10] to generalized linear models to assess the impact of minor data perturbations. Ouwens and Beger [12] obtained the normal curvature under a generalized linear model to identify influential subjects and/or individual observations. Zhu and Lee [13] developed the local influence technique for incomplete data and extended these results to generalized linear mixed models (see also Zhu and Lee [14] for further details). Espinheira et al. [15] extended local influence analysis to beta regression models under various perturbation scenarios. Rocha and Simas [16] and Ferrari et al. [17] derived the normal curvature for a beta regression model whose dispersion parameter varies with some covariates. Ferreira and Paula [18] developed the local influence approach for partially linear skew-normal models under different perturbation schemes, and Emami [19] evaluated the sensitivity of Liu penalized least squares estimators using local influence analysis. More recently, Liu et al. [20] implemented influence diagnostics for AR time series models with skew-normal (SK) distributions.
Within a semiparametric framework, Thomas [21] developed local influence diagnostics to assess the sensitivity of smoothing parameter estimates obtained with the cross-validation criterion. Zhu and Lee [14] and Ibacache-Pulgar and Paula [22] introduced local influence measures to analyze the sensitivity of maximum penalized likelihood estimates in normal and partially linear Student-t models, respectively. Ibacache-Pulgar et al. [23,24] explored local influence curvature in elliptical semiparametric mixed models and symmetric semiparametric additive models. Subsequently, ref. [25] and Ibacache-Pulgar and Reyes [26] further extended local influence measures to normal and elliptical partially varying-coefficient models, respectively. Ibacache-Pulgar et al. [27] developed the local influence method for semiparametric additive beta regression models. Meanwhile, Cavieres et al. [28] calculated the normal curvature to assess the sensitivity of estimators in a thin-plate spline model with skew-normal random errors. Jeldes et al. [29] applied the partially varying-coefficient model with symmetric random errors to air pollution data from Santiago, Chile, and Lima, Peru, and used the local influence technique to detect observations that were influential on the model fit. Saavedra-Nievas et al. [30] extended the local influence technique to the spatio-temporal linear model with normal errors and separable covariance. Recently, Sánchez et al. [31] obtained the normal curvature for the varying-coefficient quantile regression model under log-symmetric distributions and applied these results to an environmental pollution data set.
In this work, we extend the local influence approach to the TPS-GLM.
The contents are organized as follows. Section 2 introduces the thin-plate spline generalized linear model. Section 3 details the method for obtaining maximum penalized likelihood estimators and discusses some inferential results. In Section 4, we describe the local influence method in detail and derive normal curvatures for various perturbation schemes. In Section 5, the methodology is illustrated using two datasets. The paper concludes with some final remarks in Section 6.
6. Concluding Remarks and Future Research
In this work, we study several aspects of thin-plate spline generalized linear models. Specifically, we derive an iterative process to estimate the parameters, together with the Fisher information matrix, whose inverse approximates the variance–covariance matrix of the estimators. In addition, we extend the local influence method, obtaining closed-form expressions for the Hessian and perturbation matrices under case-weight perturbation and additive perturbation of the response variable. We analyzed two real data sets from the agronomic and environmental areas. The study showed the advantage of incorporating a smooth surface to model the joint effect of a pair of explanatory variables or the spatial effect determined by the coordinates. In both applications, the fitted values of the response variable were consistent with the observed values. Moreover, our model provided a better fit to the soybean yield and ozone concentration data than some classical parametric and semiparametric models, respectively. The observations detected as potentially influential produced substantial changes in the estimates, but no significant inferential changes. Finally, our study confirms the need for the local influence method to evaluate the sensitivity of maximum penalized likelihood estimators and thus identify observations that may exert excessive influence on the parametric component, the non-parametric component, or both.
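The iterative estimation step and the covariance approximation summarized above can be illustrated with a penalized iteratively reweighted least squares (Fisher scoring) sketch for a Poisson TPS-GLM with log link. This is a generic penalized IRLS and not necessarily the iterative process derived in the paper; the Poisson family, the reparametrization delta = Z gamma that enforces T'delta = 0, the fixed smoothing parameter alpha (which in practice would be chosen by a data-driven criterion), the simulated data, and the helper names are all assumptions. The inverse of the penalized Fisher information M'WM + alpha S serves as the approximate covariance matrix.

```python
import numpy as np

def tps_phi(r):
    """Hypothetical helper: thin-plate radial function for d = 2, m = 2."""
    out = np.zeros_like(r)
    pos = r > 0
    out[pos] = r[pos] ** 2 * np.log(r[pos] ** 2) / (16.0 * np.pi)
    return out

def fit_poisson_tps_glm(y, X, t, alpha, max_iter=100, tol=1e-10):
    """Hypothetical sketch: penalized IRLS for log E(y_i) = x_i' beta + f(t_i)."""
    n = len(y)
    E = tps_phi(np.linalg.norm(t[:, None, :] - t[None, :, :], axis=-1))
    T = np.column_stack([np.ones(n), t])
    Q, _ = np.linalg.qr(T, mode="complete")
    Z = Q[:, T.shape[1]:]                        # delta = Z gamma guarantees T' delta = 0
    M = np.hstack([X, T, E @ Z])                 # design columns for (beta, a, gamma)
    p, p0 = X.shape[1], X.shape[1] + T.shape[1]
    S = np.zeros((M.shape[1], M.shape[1]))
    S[p0:, p0:] = Z.T @ E @ Z                    # roughness penalty acts on gamma only
    mu = y + 0.5                                 # usual Poisson starting values
    eta = np.log(mu)
    for _ in range(max_iter):
        z = eta + (y - mu) / mu                  # working response for the log link
        A = M.T @ (mu[:, None] * M) + alpha * S  # penalized Fisher information, W = diag(mu)
        theta = np.linalg.solve(A, M.T @ (mu * z))
        eta_new = M @ theta
        mu = np.exp(eta_new)
        converged = np.max(np.abs(eta_new - eta)) < tol
        eta = eta_new
        if converged:
            break
    A = M.T @ (mu[:, None] * M) + alpha * S      # recompute at the final fit
    cov = np.linalg.inv(A)                       # approximate covariance of theta-hat
    return {"beta": theta[:p], "a": theta[p:p0], "delta": Z @ theta[p0:],
            "mu": mu, "se_beta": np.sqrt(np.diag(cov)[:p])}

# Hypothetical data: 60 locations in the unit square, one covariate
rng = np.random.default_rng(1)
n = 60
t = rng.uniform(size=(n, 2))
X = rng.normal(size=(n, 1))                      # no intercept column: a[0] plays that role
eta_true = 0.5 + 0.4 * X[:, 0] + 0.5 * np.sin(2 * np.pi * t[:, 0]) + t[:, 1]
y = rng.poisson(np.exp(eta_true))
fit = fit_poisson_tps_glm(y, X, t, alpha=1.0)
print(fit["beta"], fit["se_beta"])
```

Because the penalty does not touch beta or the polynomial coefficients a, the converged fit satisfies the score equations X'(y - mu) = 0 and T'(y - mu) = 0, which gives a quick check that the iteration has actually converged.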
As future work, we propose incorporating a correlation component into the model and extending the local influence technique to other perturbation schemes, mainly on the non-parametric component of the model.