### **1. Introduction**

Financial Transmission Rights (FTRs) are energy derivatives that allow market participants to receive an annual or monthly share of the congestion cost revenues collected by Independent System Operators (ISOs) through settled electricity prices, known as locational marginal prices (LMPs) [1,2]. An ISO is a third-party organization that ensures the reliability of the electric system across generation resources and transmission lines. The congestion cost at a pricing location is the price difference between the least expensive electricity available in the ISO region and the more expensive options that must be used because of transmission system constraints.

FTR holders are paid a congestion cost difference (credit) settled on a transmission path when it is positive (prevailing flow FTR) and must pay the difference (charge) when negative (counterflow FTR). The above FTR is called the FTR Obligation, compared to the FTR Option where FTR Option holders do not have to pay the difference even when the settled value is negative [1]. As of April 2020, FTR Option products are not available in ISO New England (ISO-NE) in the U.S., but exist in other electric markets, such as the Pennsylvania–New Jersey–Maryland Interconnection LLC (PJM Interconnection).

Because FTR values are derived from pairs of pricing locations in an electric market, the number of possible FTR paths can be very large, providing many bidding opportunities for market participants. For example, ISO New England has an existing generating capacity of 31,200 MW from 1976 generators in its six member states, with a generation mix of natural gas (40%), nuclear (25%), net imports (19%), renewable (9%), and hydro (7%) [3]. According to the Day-Ahead Energy Market Hourly LMP Report for 14 April 2020 published by ISO-NE, the total number of pricing locations in the region is 1192, comprising 1125 network nodes, 32 hub nodes, 20 demand response locations, 8 load zones, 6 external nodes, and 1 hub. These pricing locations could theoretically translate into 1.4 million FTR paths in the ISO auction for prevailing flows, and the total could double when counterflows are included.
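As a rough check on the order of magnitude, the figure of about 1.4 million is consistent with counting every ordered source-sink pair among the 1192 pricing locations. The sketch below assumes that counting convention, which the report itself does not state, and uses hypothetical variable names.

```python
# Location counts by type, as listed in the ISO-NE Day-Ahead LMP report for 14 April 2020.
location_counts = {
    "network nodes": 1125,
    "hub nodes": 32,
    "demand response locations": 20,
    "load zones": 8,
    "external nodes": 6,
    "hub": 1,
}

n_locations = sum(location_counts.values())

# Assumption: a path is an ordered (source, sink) pair with source != sink.
n_ordered_paths = n_locations * (n_locations - 1)

print(n_locations)      # 1192
print(n_ordered_paths)  # 1419672
```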

There are two types of FTR market participants: hedgers and speculators. Hedging participants, who have electricity supply obligations, want to hedge against congestion costs by purchasing FTRs on the paths from their supply sources to their customer load zones, while speculating participants, who have no physical supply obligations, may purchase FTRs to arbitrage differences between the expected and actual settled values of FTR paths [1]. With so many potential FTR paths available, FTR market participants need a reasonable way to evaluate which paths to bid on in the auctions, subject to their limited capital budgets. A consistent and standardized methodology is crucial for evaluating candidate paths in terms of their profitability and associated risks.

Item Response Theory (IRT) is one of the most influential methods in the field of educational and psychological measurement for understanding the behavior of individual test items or variables [4]. IRT models provide information about item parameters and the latent traits of test respondents, yielding insights into and assessments of both respondent performance and the items themselves. IRT is also useful for test development, item analysis, equating, item banking, and computerized adaptive testing (CAT) [5]. As a group of statistical models with probabilistic and stochastic procedures, IRT links the pattern of responses across a group of items to a latent trait or ability, and then converts discrete item responses into probability-based estimates of the level, or location, of the underlying latent trait that respondents possess [6,7].

The most basic model is the One-Parameter Logistic (1PL) model, or the Rasch model, named after Georg Rasch, a Danish mathematician. In this model, the probability of a correct response (denoted *Xi* = 1) to each item (labeled *i*) is a function of the item's difficulty level (labeled *bi*) and the respondent's trait level (labeled θ), as expressed in Equation (1) [7]:

$$P_i(\theta) = P_i\left(X_i = 1 \mid \theta\right) = \frac{e^{(\theta - b_i)}}{1 + e^{(\theta - b_i)}} = \frac{1}{1 + e^{-(\theta - b_i)}} \tag{1}$$

In Equation (1), *Xi* = 1 indicates that a respondent endorsed item *i* or provided a correct response. The horizontal line at *Pi*(θ) = 0.5 on the y-axis in Figure 1 marks the mid-probability, that is, the point at which a respondent has a 50% chance of providing a correct response to item *i*. The difficulty coefficient (*bi*) of an item is the latent trait level (θ) on the x-axis at which the item characteristic curve (ICC) of the item intersects this horizontal line. Figure 1 illustrates three items in the Rasch 1PL model with difficulty coefficients *bi* = −1, 0, and 1. Both the latent trait level (θ) and the difficulty coefficient (*bi*) are on the same z-score metric, with the latent trait level (θ) typically in the range of [−2, 2] [6].

IRT models are logistic regression models that predict dichotomous, or binary, outcomes with monotonically increasing S-shaped curves called Item Characteristic Curves (ICCs) [8]. ICCs display the relationship between a latent trait level and the probability of a correct response. Figure 1 illustrates three ICCs of the probability of a correct response (*Pi*(θ)), assuming item difficulty parameters of −1, 0, and 1, respectively, over latent trait levels in the range [−4, 4]. This paper uses the terms latent trait and ability interchangeably in describing IRT and its application to FTR path evaluation in the U.S. electricity market.

**Figure 1.** Item characteristic curves, Item Response Theory (IRT) 1PL model.

The ICCs in Figure 1 may be interpreted as follows: an item whose curve lies further to the right of the chart is more difficult, and one further to the left is easier; a low *Pi*(θ) for an item implies that a correct response is highly unlikely at the given latent trait level [7]. For example, at a latent trait level (θ) of −1 in Figure 1, the probability of a correct response differs across the three items: 50%, 27%, and 12% for *bi* = −1, 0, and 1, respectively, as indicated by the vertical line in Figure 1.
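The three probabilities quoted for θ = −1 follow directly from Equation (1). The short sketch below evaluates the 1PL curve at that trait level; the function name `icc_1pl` is a hypothetical choice made for illustration.

```python
import math

def icc_1pl(theta: float, b: float) -> float:
    """Probability of a correct response under the Rasch 1PL model, Equation (1)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

theta = -1.0
difficulties = [-1.0, 0.0, 1.0]
probabilities = [icc_1pl(theta, b) for b in difficulties]

for b, p in zip(difficulties, probabilities):
    print(f"b = {b:+.0f}: P = {p:.2f}")
```

Rounded to two decimals, the values are 0.50, 0.27, and 0.12, matching the percentages read from Figure 1.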

While IRT is most popular in the psychometrics discipline, it has also been applied in several studies in the fields of health behavior research [6,7] and financial literacy measurement [9]. The Two-Parameter Logistic (2PL) model is an extension of the Rasch 1PL model with one additional parameter, the item's discrimination. Item discrimination represents how well an item differentiates among respondents on the latent trait continuum, e.g., how well it separates respondents of different ability levels. The IRT 2PL model and ICCs will be discussed in greater detail later in this study.

In summary, the IRT 2PL model provides analytical advantages in terms of parameter parsimony, ease of parameter interpretation, distinguishability among multiple items, and visual presentation. The parameters, difficulty (*bi*) and discrimination (*ai*), may be derived from historical data, and their magnitudes may be used to interpret how difficult or how discriminating each item is relative to other items. The parameters also provide a foundation for building the probability function of a correct response, *Pi*(θ) or *Pi*(*Xi* = 1|θ), and for presenting it visually as ICCs over the level of the latent trait variable (θ). When multiple opportunities are available in the marketplace, essential decision-making factors include the estimation and comparison of their return and risk profiles. With this capability, IRT may be applied to FTR markets, where a very large number of paths is available (1.4 million in ISO New England alone) and where FTR participants require a consistent and standardized evaluation model to understand the return and risk profiles of the paths they are interested in.

This paper is the first experiment to apply IRT, particularly the IRT 2PL model, to the U.S. energy market, in evaluating and selecting the FTR paths to bid in market auctions. This paper is organized as follows: Section 2. Literature Review; Section 3. Data and Methodology; Section 4. Results and Discussion; and Section 5. Conclusion and Implications.

### **2. Literature Review**

### *2.1. Financial Transmission Rights*

The first FTR auction took place in 1999 in the PJM Interconnection in the U.S. In the auctions, ISOs aim to maximize FTR revenues, subject to the constraints of transmission capacity and contingencies [10]. Electric suppliers calculate the FTR values of the paths on which they bid, based on their own forecasts of future LMPs at the locations of interest. The FTR calculation may rely on analytical frameworks such as game-theoretic models with multiple participants, or on network contingencies in the ISO systems [11,12].

FTRs are defined in U.S. dollars (\$) per megawatt (MW), from a source (receipt, injection) pricing point to a sink (delivery, withdrawal) pricing point on a transmission path. In New England, FTR products are offered in monthly and annual auctions for two categories: onpeak hours (weekday hours ending 0800–2300) and offpeak hours (weekday hours ending 2400–0700, and all 24 h on weekends and NERC holidays). Available pricing locations in ISO-NE include generator nodes, external nodes, the hub (a specified set of predefined pricing nodes), load zones (aggregates of pricing nodes in a specific area), and DRR (demand response resource) aggregate zones [13]. Figure 2 presents a flow chart that summarizes typical FTR auction procedures in terms of the data and information exchanged among the entities involved [13].
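The onpeak and offpeak definitions above can be expressed as a small classification rule. The following is a minimal sketch, assuming hours are labeled by hour ending (1 to 24) and that the caller supplies the NERC holiday dates; the function name `ftr_period` and the `holidays` argument are hypothetical, and no official holiday calendar is embedded.

```python
from datetime import date

def ftr_period(day: date, hour_ending: int, holidays: frozenset = frozenset()) -> str:
    """Classify one hour as 'onpeak' or 'offpeak' under the ISO-NE FTR definition.

    hour_ending runs from 1 to 24; `holidays` is a caller-supplied set of
    NERC holiday dates (an assumed input, not an official calendar).
    """
    if not 1 <= hour_ending <= 24:
        raise ValueError("hour_ending must be between 1 and 24")
    is_weekday = day.weekday() < 5  # Monday = 0 ... Friday = 4
    if is_weekday and day not in holidays and 8 <= hour_ending <= 23:
        return "onpeak"
    return "offpeak"

# 14 April 2020 was a Tuesday; 18 April 2020 was a Saturday.
print(ftr_period(date(2020, 4, 14), 8))   # onpeak
print(ftr_period(date(2020, 4, 14), 24))  # offpeak
print(ftr_period(date(2020, 4, 18), 12))  # offpeak
```

Under this rule each non-holiday weekday contributes 16 onpeak hours and 8 offpeak hours.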

**Figure 2.** Financial Transmission Rights (FTR) auction flow chart.

FTR auction results in ISO-NE provide the magnitude of the auctions and major FTR participants [14]. For the April 2020 auction, a total of 5625 FTR paths were cleared, with onpeak at 2719 and offpeak at 2906, on a total of 25,586 MWs. There was a total of 28 FTR winners, with the top five companies accounting for 71% of total MWs cleared: NextEra Energy Marketing, Vitol, MAG Energy Solutions, Exelon Generation, Transgrid Midwest. In the annual FTR auction for 2020, the number of cleared FTR paths was 3348 (onpeak 1593 and offpeak 1755), on a total of 17,514 MWs, onpeak and offpeak combined. The top five participants accounted for 75% of total MWs cleared: Vitol, Mercuria Energy America, Castleton Commodities Merchant, Citigroup Energy, NextEra Energy Marketing.

The ISO operates the wholesale electricity market, which consists of two markets: the Day-Ahead Market (DAM) and the Real-Time Market (RTM). The DAM is a forward spot market in which DAM LMPs are settled in day-ahead auctions. Generators submit offers, and customer loads submit bids, to the ISO with hourly MWs for each hour of the next day. The ISO calculates a nodal price, or locational marginal price (LMP), for each location based on all the submitted offers and bids, subject to the constraints of active power balance and transmission and their associated Lagrange multipliers [10].

The settled LMP is made up of three components: energy, congestion cost, and loss [15]. Congestion cost is created by binding constraints on transmission lines and generation resources in the auction, which result in incremental costs at some points and hence different LMPs. The value of an FTR is the difference in the congestion cost component of the LMPs between two locations. The RTM is a balancing market to the DAM that addresses actual power system conditions and generated MWs. Energy sellers in the DAM are paid real-time prices for the MWs generated in real time in excess of the MWs sold in the day-ahead market at cleared DAM prices [16].
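The settlement implied by this definition can be sketched for a single hour. This is a minimal illustration, assuming the path value is the sink congestion component minus the source congestion component, and assuming an FTR Option simply floors a negative path value at zero, as described in the Introduction. The function name `ftr_settlement` and the sample prices are hypothetical.

```python
def ftr_settlement(mw: float, congestion_source: float, congestion_sink: float,
                   option: bool = False) -> float:
    """Hourly settlement in $ for an FTR of `mw` megawatts on a source-to-sink path.

    Congestion components are in $/MWh. A positive result is a credit to the
    holder; a negative result is a charge (FTR Obligation only).
    """
    path_value = congestion_sink - congestion_source  # assumed sign convention
    if option:
        path_value = max(path_value, 0.0)  # an Option holder never pays
    return mw * path_value

# Hypothetical congestion components for one hour.
credit = ftr_settlement(10.0, congestion_source=-2.0, congestion_sink=3.5)
charge = ftr_settlement(10.0, congestion_source=3.5, congestion_sink=-2.0)
floored = ftr_settlement(10.0, congestion_source=3.5, congestion_sink=-2.0, option=True)

print(credit, charge, floored)  # 55.0 -55.0 0.0
```

The same path held in the opposite direction flips the sign of the settlement, which is why a counterflow position carries a payment liability under the Obligation product.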

There is uncertainty in future LMPs, which may differ significantly from the LMPs that FTR market participants expected when estimating FTR values for bidding. In this context, FTR holders may face liability risks when LMPs settle, due to counterflows on the awarded paths, unexpected outages, severe economic congestion, and transmission losses [17]. Load-serving utilities (LSE), or suppliers to retail customers, are exposed to the major risk components of LMP: energy (fuel), transmission constraint cost at a given time, and line losses. The transmission constraint cost may be called a transmission opportunity cost, or the difference between the clearing LMPs on a given path [18]. Other reliability service costs include capacity for adequate resources and ancillary services to maintain the electric system.

In this context, the biggest challenge for FTR participants is how to simulate market participants' behavior [10], as well as how to calculate the expected FTR payoffs and financial risks associated with an FTR path [19]. Realistically, it is difficult to formulate all the electricity prices and market behaviors, given the thousands of pricing nodes available in the ISO [11]. Due to the complexity of predicting clearing LMPs and estimating the value of FTR derivative products, research studies that addressed FTR bidding strategies have usually relied on simulation approaches or on problem formulations limited to two to four pricing nodes [10,19].

There have been some studies related to FTR from bidder and generator standpoints, but none of them addressed the question of how to evaluate and select FTR paths to bid among multiple choices. Hogan [18] also noted that the U.S. energy market design has been successful with bid-based LMP and FTRs, but still has remaining challenges with both theory and implementation.

Li and Shahidehpour [19] illustrated a three-bus system with four FTR bidders, subject to the ISO's goal of maximizing auction revenues, as well as the impacts of transmission constraints, forecasted LMP differences, and bidder's risk tolerance on FTR bidding strategies. Das et al. [12] experimented with a matrix-game model to analyze FTR bidding strategies. This study involved multiple FTR participants on a sample network, with assumptions that bidders have forecasts of LMPs, and assessed impacts of various bid prices.

From a power plant generator's perspective, Liu and Wu [11] investigated an FTR position by exploring the interaction between generator's bidding and transmission rights holding. The study suggested that transmission rights helped reconfigure a generator's behavior in bidding their electricity into the ISO. Liu and Gross [20] proposed a way, based on simulation approaches, to integrate bi-lateral transaction with a centralized pool market, or ISO, for the efficient allocation of transmission services affecting FTR evaluations.

### *2.2. Item Response Theory in Psychometrics*

IRT models have been developed as a way of analyzing categorical data to measure a latent trait variable (also called ability, denoted θ) and to model the item responses (*Xi*) of respondents. The data may be dichotomous (binary) or polytomous [4,21]. Major assumptions in IRT models include monotonicity of the relationship between the latent trait variable and the probability of a correct item response, unidimensionality (a set of items measures one single latent ability), and local independence among the item responses.

There are two basic IRT logistic models, the Rasch 1PL (one-parameter logistic) model and the 2PL (two-parameter logistic) model, distinguished by the number of item parameters used in modeling, together with a single latent variable parameter underlying a respondent's item responses [21]. The two basic item parameters in IRT are item difficulty (*bi*, location index) and item discrimination (*ai*, differentiation). The latent trait (ability, θ) parameter is the construct or factor measured by the item responses.

Further to the Rasch 1PL model described in Equation (1) and Figure 1, the 2PL model is introduced here as an extension of the 1PL. The IRT 2PL model has one more parameter than the Rasch model, discrimination (*ai*): the higher the discrimination (*ai*) of an item, the more discriminating the item is and the steeper the slope of its ICC. Conversely, a flatter ICC indicates that the item is less able to discriminate among respondents than other items. The discrimination coefficient (*ai*) typically takes values in the range [−0.5, 2] [6].

The IRT 2PL model is expressed as in Equation (2):

$$P_i(\theta) = P_i\left(X_i = 1 \mid \theta\right) = \frac{e^{a_i(\theta - b_i)}}{1 + e^{a_i(\theta - b_i)}} = \frac{1}{1 + e^{-a_i(\theta - b_i)}} \tag{2}$$

where *e* is the base of the natural logarithm (approximately 2.718), *bi* is the difficulty parameter for item *i*, *ai* is the discrimination parameter for item *i*, and θ is the ability level.
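Equation (2) can be evaluated directly. The sketch below is illustrative only: the function name `icc_2pl` and the sample trait level θ = 1 are assumptions chosen to show how the discrimination parameter changes the probability for a fixed difficulty of 0.

```python
import math

def icc_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under the IRT 2PL model, Equation (2)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

theta, b = 1.0, 0.0
probs = {a: icc_2pl(theta, a, b) for a in (0.5, 1.0, 2.0)}

for a, p in probs.items():
    print(f"a = {a}: P = {p:.2f}")
```

With *ai* = 1 the function reduces to the 1PL model of Equation (1), and at θ = *bi* it returns 0.5 for any value of *ai*.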

Figure 3 illustrates 2PL-based item characteristic curves (ICCs) for three items, built on Equation (2). It displays the impact of varying the discrimination coefficient (*ai* = 0.5, 1, and 2) across the three items while holding the difficulty coefficient (*bi*) at 0 for all items. The item discrimination parameter represents the slope at the inflection point of each ICC. Because the discrimination coefficients (*ai*) describe how sharply each item distinguishes between respondents, the latent trait levels (θ) that correspond to a given range of probability of correct response (*P*(θ)), here 25% to 75% as marked by two horizontal lines, also vary. When the discrimination coefficient (*ai*) = 0.5, the range of the trait (θ) is [−2.1, 2.1]; at *ai* = 1, the range is [−1.3, 1.3]; and at *ai* = 2, the range narrows to [−0.8, 0.8]. The results indicate that a greater discrimination coefficient (*ai*) produces a tighter range of latent trait levels (θ) with a steeper slope, and more power to discriminate between lower and upper groups of respondents.
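The 25% to 75% band can also be obtained in closed form by inverting Equation (2), which gives θ = *bi* + ln(*p*/(1 − *p*))/*ai*, so the band is *bi* ± ln(3)/*ai*. The sketch below computes it together with the slope at the inflection point, which equals *ai*/4 under Equation (2). The function names are hypothetical, and the ranges quoted above are approximate, so they need not coincide exactly with these closed-form values.

```python
import math

def theta_at_probability(p: float, a: float, b: float) -> float:
    """Invert Equation (2): latent trait level at which P_i(theta) equals p."""
    return b + math.log(p / (1.0 - p)) / a

def slope_at_inflection(a: float) -> float:
    """Slope dP/dtheta at theta = b, where P = 0.5: a * P * (1 - P) = a / 4."""
    return a / 4.0

b = 0.0
bands = {}
for a in (0.5, 1.0, 2.0):
    lower = theta_at_probability(0.25, a, b)
    upper = theta_at_probability(0.75, a, b)
    bands[a] = (lower, upper)
    print(f"a = {a}: theta in [{lower:.2f}, {upper:.2f}], "
          f"slope at theta = b: {slope_at_inflection(a):.3f}")
```

The band width shrinks in inverse proportion to *ai*, which is the quantitative form of the statement that a more discriminating item separates respondents over a tighter range of trait levels.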

**Figure 3.** Item characteristic curves, IRT 2PL model: difficulty coefficient *b* = 0 for all items.

While more advanced IRT models exist that add features such as weighted scoring of survey responses in education and psychology [22], IRT has also been applied in other disciplines, including the health care and financial sectors. Hays et al. [6] experimented with IRT models in health outcome measurement. Their study analyzed a 9-item measure completed by participants in the HIV Cost and Services Utilization Study (HCSUS) [23]. The nine items covered physical functioning in the past four weeks, including basic activities, instrumental activities, and mobility, and the study interpreted the difficulty parameter (*bi*) and discrimination parameter (*ai*) of each activity in terms of physical functioning level and distinction among the activities.

Warne et al. [7] introduced the IRT 2PL model to health behavior research using lifetime substance-use data from the Adolescent Risk Health Behavior Questionnaire [24]. The study analyzed 1360 responses on health behaviors related to 23 substance items, i.e., alcohol, tobacco, and other drug use, and interpreted the difficulty parameter (*bi*) of each substance as the likelihood that a respondent had tried it. The difficulty parameters (*bi*) of the substances helped identify two groups of substances that respondents are likely to endorse. Himelfarb [25] also introduced IRT models to chiropractic and health educators as a standard approach to standardized assessment in their practice.

In the financial sector, Knoll and Houts [9] developed a measure of financial knowledge components in financial literacy by applying the IRT 2PL model to narrow down the items from three national surveys: the ALP, a RAND-managed Internet-based panel; the Health and Retirement Study (HRS), conducted by the University of Michigan since 1992; and the National Survey (NS-NFCS) portion of the 2009 National Financial Capability Study. The study suggested that an index based on the twenty selected items would be useful for comparing financial knowledge among programs and populations.

### **3. Data and Methodology**
