*3.4. Summary*

These three topics highlight how the entire literature on a topic can go decades without recognizing likely sources of bias. I cannot speak to how extensive this is among the many big topics in empirical economics, but there must be other topics that have had similar problems. For example, there is a literature on how the state unemployment rate (or other measures of local economies) affects various health or social outcomes—I have had several articles in this literature. I cannot recall one (including mine) that recognized that using state fixed effects (or controls) exacerbates any bias from measurement error in the economic measure, which would likely cause attenuation bias—a lesson I learned far too late into my career. This is not terribly harmful in this case, as at least it is a bias against finding significant estimates, but it has been an unrecognized bias, nonetheless.

### **4. Recommended Changes and New Topics for Graduate Econometrics**

In this section, I propose seven new recommendations for redesigning graduate econometrics courses, most of which follow from the premises in Section 2. First, however, I contend that the first two of Angrist and Pischke's (2017) recommendations for changing undergraduate econometrics would also work well for graduate econometrics. These recommended changes are:


The first is consistent with basic tenets of pedagogical theory, as practicing a skill in order to learn a concept can be much more instructive than learning the abstract equations underlying that concept. The second is important for avoiding potential biases, and it goes hand in hand with my first new recommendation below.

What follows are my recommended changes: new components, changes in pedagogy, and shifts in emphasis that should help develop effective and responsible academics and practitioners. The changes and shifts I will discuss are:


### A. Increase emphasis on some regression basics ("holding other factors constant" and regression objectives)

These two concepts of "holding other factors constant" and the various regression objectives are important building blocks needed to understand when there could be potential bias to a coefficient estimate and for determining the optimal set of control variables to use—Angrist and Pischke's second point. In addition, together they should help foster understanding of why modeling strategies should be different depending on the objectives of a regression analysis.

I believe it is commonly assumed that students will understand "holding other factors constant" from the few pages, if that, devoted to the concept in textbooks. However, in my view, this is usually not the case. Lessons on this topic should include a discussion of the purpose of holding other factors constant, a demonstration of what happens when you do so, and a discussion of the circumstances in which you would not want to hold certain factors constant. In Arkes (2019), I describe a simple issue of whether adding cinnamon to your chocolate-chip cookies improves the taste. In this example, I ask which is the better approach: (1) make two batches from scratch, adding cinnamon to one; or (2) make one batch, split it in two, and add cinnamon to one of them. Most would agree that the second would be a better test because you do not want any other factor that could affect the outcome of taste (butter, sugar, and chocolate chips) to vary as you switch from the no-cinnamon to the cinnamon batch, i.e., you want to hold those other factors constant. This is the point of multivariate models: design the model so that the only relevant factor that changes is the treatment or key explanatory variable. That said, with interval (quantitative) variables, it is impossible to perfectly control for the variable, and so perhaps the best that can be said is that one is attempting to adjust for the variable.

In Arkes (2019), I describe what I believe are the four main objectives of regression analyses: (1) estimating causal effects; (2) forecasting/predicting an outcome; (3) determining predictors for an outcome; and (4) analyzing relative performance by removing the influence of contextual factors, which is similar to the concept of "anomaly detection." I proceed to describe how the choice of control variables (what should be held constant) should depend on the objective. For example, a causal-effects model might attempt to estimate the effect of a college degree on the probability of getting in a car accident in a given year. An insurance company, on the other hand, might be more interested in predicting the probability of a person getting in a car accident—the second objective above. One potential control variable in both analyses would be whether the person has a white-collar job. That could be a mediating factor (a "bad control") for how a college degree affects the probability of an accident, so it would be best to exclude that variable in the causal-effects analysis. However, the insurance company might find that variable to be a valuable contributor to a more accurate prediction of the probability of an accident. The insurance company does not care about obtaining the correct estimate of how a college degree affects the likelihood of an accident. Likewise, forecasting GDP (or Gross State Product, GSP) growth would involve a different strategy from that for estimating the effects of tax rates on GDP/GSP growth. In these cases of predicting an outcome or forecasting, including explanatory variables is not meant to hold other factors constant but rather to improve the prediction/forecast.
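To make the distinction concrete, here is a minimal simulation sketch (the variable names and all magnitudes are hypothetical, chosen only to mirror the degree/white-collar/accident example): including the mediator pulls the degree coefficient away from the total causal effect, yet it improves the fit that a predictive model cares about.

```python
# Minimal sketch (hypothetical numbers): why the "right" controls depend on
# the regression objective. The white-collar job is a mediator ("bad control")
# for the causal effect of a degree on accidents, yet it helps prediction.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

degree = rng.binomial(1, 0.3, n)                     # treatment
white_collar = rng.binomial(1, 0.2 + 0.5 * degree)   # mediator, affected by degree
p_accident = 0.15 - 0.03 * degree - 0.05 * white_collar
accident = rng.binomial(1, p_accident)               # outcome

def ols(y, *cols):
    X = np.column_stack([np.ones(len(y)), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, (y - X @ beta).var()

b_short, v_short = ols(accident, degree)               # causal objective: total effect
b_long, v_long = ols(accident, degree, white_collar)   # prediction objective: add the mediator

print(f"degree coefficient, mediator excluded: {b_short[1]:.3f}  (total effect, ~ -0.055 here)")
print(f"degree coefficient, mediator included: {b_long[1]:.3f}  (direct effect only, ~ -0.03)")
print(f"residual variance without vs. with mediator: {v_short:.4f} vs. {v_long:.4f}")
```

Under these assumed parameters, the short regression recovers the total effect of the degree, while adding the mediator shrinks the coefficient toward the direct effect even as it lowers the residual variance that a prediction-focused analyst cares about.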

Some textbooks, e.g., Greene (2012), indicate that the adjusted *R*<sup>2</sup> could be used as part of the "model selection criteria". However, any measure of goodness-of-fit would primarily be useful for determining whether a variable should be included for forecasting/prediction. For estimating causal effects, whether a potential control variable contributes to explaining the dependent variable should not be a factor in determining whether it should be included in the model. These are just a few examples of why understanding the objective of the regression is important.

### B. Reduce emphasis on getting the standard errors correct

This was a passing point by Angrist and Pischke (2017). However, in my view, it deserves status as one of the main recommendations. The justification for this recommendation is partly based on one of the premises from Section 2: that biases to standard errors are typically minimal compared to the potential biases to coefficient estimates. To this point, Harford (2014) argues that sampling bias can be much more harmful than sampling error, as demonstrated by the 1936 Literary Digest poll that found a 55-41 advantage for Landon over Roosevelt in the Presidential election. The 2.4-million sample size (and tiny standard errors) did not matter when there was sampling bias. This idea goes back to Leamer (1988), who argued that corrections for heteroscedasticity are mere "white-washing" if there is no consideration of the validity of the coefficient estimates.

Further justification for reducing the emphasis on corrections for standard errors comes from the vagueness of the *p*-value and statistical significance. Getting the standard errors correct is typically meant to produce proper confidence intervals or correct conclusions on hypothesis tests, which are usually based on t-stats or *p*-values meeting certain thresholds. However, as I learned not too long ago (and far too late into my career), the *p*-value by itself actually has little meaning, given the Bayesian critique of *p*-values. This is discussed by Ioannidis (2005), who points out that the probability that an empirical relationship is real depends on: (1) the t-statistic; (2) the *a priori* probability that there could be an empirical relationship; and (3) the statistical power of the study (which depends on the probability of a false negative and requires an alternative hypothesized value). The *p*-value is based just on the first one, the t-statistic. The less likely such a relationship is *a priori*, the less likely it is that any given t-statistic indicates a real relationship, as Nuzzo (2014) demonstrates. For example, even for an *a priori* toss-up (a 50% chance that there is a relationship), *p*-values of 0.05 and 0.01 translate to only 71% and 89% probabilities that the relationship is real.
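Those 71% and 89% figures can be reproduced with a short calculation. The sketch below assumes the Sellke–Bayarri–Berger lower bound on the Bayes factor, $-e \cdot p \cdot \ln p$, which I take to be the calibration behind such illustrations:

```python
# Minimal sketch: converting a p-value into the probability that a relationship
# is real, assuming the Sellke-Bayarri-Berger bound (-e * p * ln p) as the
# calibration behind the 71% / 89% figures cited above.
import math

def prob_real(p_value, prior_prob=0.5):
    bf_for_null = -math.e * p_value * math.log(p_value)  # minimum Bayes factor favoring H0
    prior_odds = prior_prob / (1 - prior_prob)
    post_odds = prior_odds / bf_for_null                  # posterior odds that the effect is real
    return post_odds / (1 + post_odds)

for p in (0.05, 0.01):
    print(f"p = {p}: P(relationship is real | 50-50 prior) = {prob_real(p):.2f}")
# prints roughly 0.71 and 0.89
```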

Unfortunately, it is nearly impossible to know beforehand what the probability is that there is an effect of one variable on another. This uncertainty means that higher levels of significance than the current convention would be needed to make any strong conclusions about statistical relationships being real. Given the vagueness of the *p*-value and that high levels of significance should be required for any strong conclusions, errors in the standard errors would tend to be much less impactful to those conclusions than would potentially much larger biases in coefficient estimates.

Correcting standard errors for heteroskedasticity and clustering is still important, yet it is easy to recognize when it is needed and typically takes only a few characters of code. Recognizing and addressing biases to coefficient estimates is more difficult and requires much more practice to become proficient at, and so greater emphasis should go towards those concepts.

### C. Adopt new approaches to teach how to recognize biases

As I described above, I do not believe that "conditional mean dependence of the error term" is an effective concept for teaching how to recognize biases. I believe that calling a source of bias what it is (e.g., reverse causality) rather than what it does (cause conditional mean dependence of the error term) is a good starting point. I believe it would be more effective if we were to list the most common sources of bias, provide some visual depictions of the biases (when possible), and give examples of the various types of situations in which they might arise. In Arkes (2019), I list what I believe are the six most common biases for coefficient estimates when estimating causal effects: reverse causality, omitted-variables bias, self-selection bias, measurement error, and including mediating factors or outcomes as control variables. In addition, I give guidance on how to recognize such biases. These are the main *alternative stories* that need to be considered before making conclusions from results. (I have since added a seventh bias, from *improper reference groups*<sup>2</sup>.) Useful visual depictions could be the directed acyclic graph (DAG) approach (Pearl and Mackenzie 2018; Cunningham 2018), basic flowcharts (Arkes 2019), and animations produced by Nick Huntington-Klein on his website: http://nickchk.com/causalgraphs.html. These tools demonstrate when there could be bias and what needs to be controlled for.

As an example of using visualizations (with flowcharts), let us return to the research issue from Section 3 on how occupation-specific bonuses affect retention decisions in the military:

$$R_{iso} = \beta_1 \times (\text{BONUS})_{iso} + X_{iso}\beta_2 + \mu_o + \varepsilon_{iso}$$

Figure 1 demonstrates the concept of reverse causality and omitted-variables bias. An arrow in such a pictorial representation of a model represents the causal effect of a one-unit change in the pointing variable on the pointed-to variable. The objective would be to estimate **A**, the average causal effect of the occupation-specific bonus on the probability that a serviceperson reenlists. We hope that $\hat{\beta}_1$ is an unbiased estimate of **A** in Figure 1. However, $\hat{\beta}_1$ captures all the reasons why the bonus and the retention decision might move together (or not), after adjusting for the factors in X.

**Figure 1.** A visual representation of reverse causality.

Determining whether there is any potential bias does not require a formal theoretical model with assumptions on what factors affect what other factors. Rather, one would start by considering the reasons why the bonus and retention variables move (or do not move) together, other than the bonus affecting the retention decision. This would also determine what needs to be controlled for.

<sup>2</sup> The new source of bias on improper reference groups is available under the "eResources" tab at https://www.routledge.com/Regression-Analysis-A-Practical-Introduction-1st-Edition/Arkes/p/book/9781138541405 or at https://tinyurl.com/yytmnq65.

The question to ask for reverse causality is whether the probability of reenlistment could affect the bonus, represented by the arrow labeled **B** in Figure 1. It very likely could, as a decrease in the probability of reenlistment for people in a certain occupation (due perhaps to increases in civilian labor market demand for the skill or increases in the deployment rates for the occupation) would cause the military service to have to increase the bonus; and an increase in the probability of reenlistment would allow the service to reduce the bonus.

Because **B** is likely negative, there would be a negative bias from the reverse causality on the estimated effect of the bonus on the probability of reenlistment. This bias would cause $\hat{\beta}_1$ to be lower than the value of **A** in Figure 1. (It requires much deeper and more convoluted thought to determine the sign of the bias from an argument based on conditional mean dependence of the error term.) Thus, we would have an alternative story for why the estimated effect of the bonus is what it is—i.e., alternative to the causal-effects story. Attempts to address this with fixed effects would need to make sure that within the fixed-effects group, there still would not be any potential reverse causality (or omitted-variables bias).
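A stylized simulation can make the direction of this bias visible. Everything below is hypothetical (the parameter values, the occupation-level propensity shocks); the point is only that when the service sets the bonus to offset reenlistment propensity (**B** < 0), OLS understates the true effect **A**, and can even flip its sign:

```python
# Stylized sketch (hypothetical parameters): reverse causality in the
# bonus-retention example. The service raises the bonus when an occupation's
# reenlistment propensity falls, so OLS understates the true bonus effect A.
import numpy as np

rng = np.random.default_rng(1)
n_occ, n_per_occ = 500, 200
A = 0.02                                           # true effect of a $1k bonus on P(reenlist)

shock = rng.normal(0, 0.10, n_occ)                 # occupation-level propensity shock
bonus = 10 - 40 * shock + rng.normal(0, 1, n_occ)  # B < 0: bonus offsets low propensity

occ = np.repeat(np.arange(n_occ), n_per_occ)
p_reenlist = 0.45 + A * bonus[occ] + shock[occ]
reenlist = rng.binomial(1, np.clip(p_reenlist, 0, 1))

X = np.column_stack([np.ones(len(occ)), bonus[occ]])
beta, *_ = np.linalg.lstsq(X, reenlist, rcond=None)
print(f"true effect A = {A}, OLS estimate = {beta[1]:.4f}")  # well below A (here even negative)
```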

For omitted-variables bias, in my experience, students have a hard time thinking of whether any variable might affect both the treatment and the outcome. Therefore, I found it to be more effective to use three steps: (a) What factors are the main drivers of why some have high vs. low (or 1 vs. 0) values of the treatment? (b) Which of those (if any) can you not adequately control for? (c) Could any of those factors affect the outcome beyond any effects through the treatment?

In Figure 1, we would need to think of what causes variation in the bonus in the sample. I imagine the list would include the occupation, the year (and factors specific to a given year, such as national economic conditions), the particular demand for the skills of servicepersons in a given occupation, and the working conditions for those in that occupation—such working conditions could change over time, and they would probably be rougher (say, more negative in theory) during periods of wars or increased deployments. All of these factors could affect the outcome beyond any effects through the bonus, and so if not controlled for, they would cause omitted-variables bias. I demonstrate this in Figure 1, with the omitted factor being working conditions for those in the occupation, using an oval to represent that we do not have a measure for it. Therefore, if we cannot adequately control for this, then better working conditions for an occupation in a given year would negatively affect the bonus (directly or indirectly through higher retention, leading to reverse causality) and positively affect retention, so **C** < 0 and **D** > 0. Thus, not adequately controlling for working conditions (and other things that could impact both the bonus and retention) for the occupation would lead to a negative omitted-variables bias for $\hat{\beta}_1$ (the product of **C** and **D** in Figure 1 would be negative). These are perhaps the most common sources of bias, and they follow directly from such a figure. However, there are other sources of bias, such as measurement error, that need to be considered.
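In equation form, the sign logic is just the standard omitted-variables-bias formula, written here for the simple case in which working conditions (WC) are the single omitted factor:

$$\operatorname{plim} \hat{\beta}_1 = \beta_1 + \beta_{WC} \times \frac{\mathrm{Cov}(\text{BONUS}, WC)}{\mathrm{Var}(\text{BONUS})}$$

where $\beta_{WC}$ corresponds to arrow **D** (positive) and the covariance takes the sign of arrow **C** (negative), so the bias term is negative.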

Figure 2 demonstrates another type of omitted-variables bias based on the research issue of how the state unemployment rate (representing the strength of the labor market) affects marijuana use for teenagers. Here, **A** is the true average effect of a one-percentage-point increase in the state unemployment rate on the probability of a teenager using marijuana.

The problem is that whereas there is probably not any general factor that systematically affects both the state unemployment rate and teenage marijuana use, it still could be that states that have a higher general propensity to use marijuana (outside the influence of the economy) tend to have higher or lower unemployment rates, but not due to any systematic relationship. Therefore, whereas the occupational bonus and retention propensity for an occupation, in Figure 1, might have "spurious correlation" (due to a systematic relationship) contributing to why the variables move together, the state unemployment rate and propensity for teenage marijuana use might have "incidental correlation" that contributes to why they move together. If so, this would cause omitted-variables bias. (Line **C**, without arrows, is indicative of an incidental correlation that does not have an underlying systematic relationship.)

**Figure 2.** A visual representation of omitted-variables bias from incidental correlation.

Alternatively, you could put a specific state (say, California) as the potential omitted factor at the bottom of the figure and then have an arrow pointing from the California variable to both the unemployment rate (a positive effect, as California tends to have higher unemployment rates than the U.S.) and teen marijuana use (I am guessing positive). In this case, there would be positive omitted-variables bias from not controlling for California. For other states, it could be different. And, as a whole, it is quite possible that either the negative or positive biases could dominate the other, leading to a non-trivial bias in the estimate.

In this case, controlling for the states (with dummy variables or state fixed effects) would help towards addressing this problem. However, based on the concept mentioned at the end of Section 3, using fixed effects (or controlling for a categorization) when the treatment has some error could cause greater bias from measurement error. In this case, a higher proportion of the usable variation in the state unemployment rate within states would be due to measurement error. (That is, in the ratio of variation due to measurement error to overall usable variation, state fixed effects reduce the denominator significantly but do not reduce the numerator.)
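A small simulation (with hypothetical magnitudes for the measurement error and for the within- and between-state variation) makes the ratio argument concrete: demeaning by state removes much of the true signal but none of the measurement error, so the within-state estimate is attenuated more than the pooled estimate.

```python
# Sketch (hypothetical magnitudes): classical measurement error in the state
# unemployment rate. State fixed effects strip out the persistent (true)
# variation but not the noise, so attenuation is worse within states.
import numpy as np

rng = np.random.default_rng(2)
n_states, n_years, beta = 50, 30, 0.50

state_mean = rng.normal(6.0, 1.5, n_states)[:, None]            # persistent cross-state differences
true_ur = state_mean + rng.normal(0, 0.5, (n_states, n_years))  # true unemployment rate
obs_ur = true_ur + rng.normal(0, 0.4, true_ur.shape)            # observed with classical error
y = beta * true_ur + rng.normal(0, 1.0, true_ur.shape)          # outcome (e.g., a teen use rate)

def slope(x, y):
    x, y = x - x.mean(), y - y.mean()
    return (x * y).sum() / (x * x).sum()

pooled = slope(obs_ur.ravel(), y.ravel())                        # pooled OLS
within = slope((obs_ur - obs_ur.mean(axis=1, keepdims=True)).ravel(),
               (y - y.mean(axis=1, keepdims=True)).ravel())      # state fixed effects (within)
print(f"true beta = {beta}, pooled OLS = {pooled:.3f}, state fixed effects = {within:.3f}")
```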

Let me give two tangents. First, omitted-variables bias is not a problem if the regression objective is forecasting or determining predictors of an outcome—this provides an example of how an understanding of regression objectives (recommendation A above) is important. Second, Arkes (2019) notes that the conventional definition of omitted-variables bias needs some modification. Note that in Figures 1 and 2, the correlation between the treatment and the omitted variable is based on the omitted variable affecting the treatment or incidental correlation. If, on the other hand, the omitted variable were a mediating factor and were affected by the treatment, then there would not be omitted-variables bias by excluding the variable; rather, there would be bias by *including* the variable. Therefore, the conventional definition that an omitted variable is correlated with the treatment and affects the outcome needs to add as a condition that the correlation is not solely due to the treatment affecting the omitted variable. Seeing this in a flow-chart demonstration can help with this concept.

In another visual lesson to recognize the direction of the biases, the bias from non-differential measurement error can be demonstrated with a simple bar graph of an outcome (say, income) for two groups (no-college-degree and college-degree). One can then easily see what would happen to the difference in income if people get randomly misclassified as to whether they have a college degree. The two averages would converge, and the estimated effect of a college degree would be biased downwards, at least from measurement error.
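The same demonstration can be scripted in a few lines (the income figures below are made up): randomly flipping degree status for a share of the sample pulls the two group means toward each other.

```python
# Sketch (hypothetical incomes): randomly misclassifying college-degree status
# pulls the two group means together, so the estimated degree "effect" shrinks.
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
degree = rng.binomial(1, 0.35, n)
income = np.where(degree == 1, rng.normal(70, 15, n), rng.normal(45, 12, n))  # in $1000s

def gap(group):
    return income[group == 1].mean() - income[group == 0].mean()

flip = rng.random(n) < 0.15                         # 15% non-differential misclassification
degree_obs = np.where(flip, 1 - degree, degree)

print(f"true gap in mean income:           {gap(degree):.1f}")
print(f"gap with misclassified group flag: {gap(degree_obs):.1f}")  # attenuated toward zero
```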

Finally, any teaching of how to recognize biases would be served well by having numerous examples to which to apply the concepts. This is consistent with the lessons from the book *How Learning Works* (Ambrose et al. 2010), in which the authors argue that mastery of a subject requires much practice applying the topic and knowing when, and which topic, to apply to a new situation. This is also consistent with higher levels of understanding, based on Bloom's Taxonomy. Furthermore, seeing research mistakes in action could provide meaningful lessons. And, dissecting media reports on research and gauging how trustworthy that research is (based on simply reading the media report) might be worthwhile for developing intuition on scrutinizing research.

### D. Shift focus to the more practical quasi-experimental methods

This recommendation is in the same spirit as, but diverges from, the third of Angrist and Pischke's (2017) recommendations. They espouse a shift in the focus of econometrics classes to randomized control trials (RCTs) and quasi-experimental methods. One method they mention is the Regression-Discontinuity (RD) approach, which appears to have become the new favorite approach for graduate students. This strategy parallels an earlier article (Angrist and Pischke 2010) and their book (Angrist and Pischke 2009).

However, I would argue there could be a better approach. The methods Angrist and Pischke espouse are more for academics who can search for randomness or a discontinuity and build a topic from that. They are not as effective for non-academics and other academics who are trying to address a specific policy question that will probably not afford the opportunity to apply an RCT or RD to the problems they are given. Furthermore, this focus limits the usefulness of economists. As Sims (2010) stated:

"If applied economists narrow the focus of their research and critical reading to various forms of pseudo-experimental, the profession loses a good part of its ability to provide advice about the e ffects and uncertainties surrounding policy issues".

Sims (2010) also suggested that many of the quasi-experimental studies have limited scope with regard to the extrapolation of the results. This could occur, for example, due to non-linearities or just the nature of the Local Average Treatment Effects that some quasi-experimental methods estimate. This further limits the usefulness of economics research.

Meanwhile, the less-complicated quasi-experimental methods might be more fruitful for most people conducting economic research and may be less limiting in extrapolation of the estimates. In particular, from my experiences in the non-academic world, a fixed-effects model is often the only plausible approach to addressing some potential sources of bias.

Given that the fixed-effects method would likely be a more useful tool than RD and other quasi-experimental methods, more emphasis should be placed on the nuances of fixed effects. These include many particulars of fixed effects that I wish I had learned in graduate school. For example:


Although the RCT is the most valid type of study and is easy to analyze, few will have the resources to conduct one. The RD approach is a rare occurrence, and it is more for "finding a topic to use the method" than "finding a method for a topic". The fixed-effects method is much more widely used, and so shifting focus to the nuances of fixed effects would be more practical and useful to most students.

### E. Add emphasis on interpretations of statistical significance and *p*-values

Perhaps the most important topic for which interpretations need to be taught better is statistical significance and insignificance. In a recent article in *Nature*, Amrhein et al. (2019) call for an end to statistical significance and *p*-values and instead for the use of confidence intervals. There have been similar calls for teaching statistical analysis beyond the "*p*-value" approach over the last few decades—e.g., Gigerenzer (2004), Wasserstein and Lazar (2016), and Wasserstein et al. (2019). Wasserstein et al. (2019) said: "Statistics education will require major changes at all levels to move to a post '*p* < 0.05' world". However, most textbooks continue to teach hypothesis tests based on the conventional approach that uses *p*-values.

I am aware of only two textbooks that discuss the problems of *p*-values and potential solutions: Paolella (2018) and Arkes (2019). Paolella (2018) points out all the problems with hypothesis tests, the *p*-value, and even the use of confidence intervals. He makes the point that hypothesis tests should not be used, but he notes how *p*-values still might be useful. A single study with a low *p*-value provides little evidence for a theory or an empirical relationship. However, repeated studies with low *p*-values would provide stronger evidence. This underscores the importance of replications. Furthermore, as Paolella argues, finding a *p*-value of 0.06 on a new drug that could cure cancer does not mean that society should discard any further research on the drug. Rather, the result should be interpreted as "something might be there," and it should be further investigated (Paolella 2018).

One area that could also use better instruction is on the various possible explanations for insignificance. Amrhein et al. (2019) find that over half of 791 articles across five journals made the mistake of interpreting insignificance as meaning that there is no effect—and these do not include the hot-hand studies. Aczel et al. (2018) find an even worse statistic for three leading psychology journals: 72% of 137 studies from 2015 with negative results had incorrect interpretations of those results. What highlights the problem with these interpretations is that an insignificant estimate may still provide more evidence for the alternative hypothesis than for the null (Aczel et al. 2018). Abadie (2020) makes the argument that an insignificant estimate might have more information than a significant estimate. In addition, there is always the possibility that a biased coefficient estimate has caused the insignificance; and a bias could cause significance when there is no causal effect. Furthermore, as described in Arkes (2019), if a treatment were to positively affect some and negatively affect others, then it could be that an insignificant effect is the average of these positive and negative effects that are, to some extent, cancelling each other out. Thus, it would be improper to conclude that the treatment has no effect based on an insignificant estimate. That said, a precisely estimated coefficient very close to zero ("precise nulls", as some call them), if free from potential biases, could mean that there is evidence for no meaningful *average* effect.

In light of the problems with the traditional *p*-value approach and the misinterpretations of insignificant estimates, lessons from Kass and Raftery (1995) or Startz (2014) on how to calculate posterior odds and determine the most likely hypothesis would be useful components of any teaching on statistical testing. Unfortunately, these often introduce an inconvenient vagueness in properly interpreting a hypothesis test. However, this is the proper approach to interpreting statistical tests. In addition, introducing the Bayesian critique should give the important lesson that strong conclusions on an empirical relationship should require quite high levels of significance.
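A minimal sketch of the posterior-odds logic is below (the Bayes factors are illustrative assumptions, not values from any study). It also shows why replications matter: independent studies multiply their Bayes factors, so several modestly informative results can be far more convincing than one.

```python
# Sketch (illustrative numbers): posterior probability that an effect is real,
# combining prior odds with Bayes factors from one or more independent studies.
def posterior_prob(prior_prob, bayes_factors):
    odds = prior_prob / (1 - prior_prob)       # prior odds that the effect is real
    for bf in bayes_factors:                   # independent studies multiply their Bayes factors
        odds *= bf
    return odds / (1 + odds)

print(posterior_prob(0.10, [3.0]))             # one study with modest evidence: ~0.25
print(posterior_prob(0.10, [3.0, 3.0, 3.0]))   # three such replications: ~0.75
```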

Another important consideration for hypothesis tests would be the costs (loss) from a wrong conclusion. Therefore, such costs should be considered when determining the optimal significance level for the hypothesis test (Kim and Ji 2015; Kim 2020). Adjustments to the optimal significance level should be made for quite large samples. In addition, there should be some discussion on statistical significance for a meaningful effect size (rather than using zero as a baseline effect).

In the end, perhaps the post-(*p* < 0.05) world should be one without hypothesis tests. Even the correct conclusions of "fail to reject" and "reject" (and not including "accept") come across as more conclusive than they actually are. And, they do not account for the potential biases and the practicality of the estimated relationship.

### F. Advocate less complexity

The current go-to model for the Department of Defense (DoD) for evaluating the effects of various manpower policies (including bonuses) on retention is the Dynamic Retention Model (DRM). This is a complex model that only relatively recently has become feasible to estimate, given the huge computing power it requires. Even though I had a (very) minor role in an application of it, I do not have a strong understanding of the model. And, my educated speculation is that no one at DoD funding such studies understands the model either.

However, I do understand the model enough to know that the DRM is deficient in many ways, as described in Arkes et al. (2019). In retention models, the DRM estimates complex concepts, such as the discount rate and a taste-for-military parameter. However, it fails to control for basic factors that could partly address the reverse causality for bonuses I describe above, such as military occupation, fiscal year, and their interactions (Arkes 2018). Furthermore, the DRM will never be able to address the other problems noted above of measurement error and excess supply, as we only observe whether a person reenlists, not their willingness to reenlist. Thus, the DRM will probably not give a more reliable answer than the simpler and more-direct models. And, in my view, guesses from subject-matter experts would be more reliable than what any model would tell us.

These empirical challenges are probably not well known to DoD officials. Perhaps as a result, they appear to be enamored with the complexity of the model. Some may put more faith in complex models. However, the simpler models are often more credible, as they rely on fewer assumptions.

One lesson may come from the history of instrumental-variables models. Early studies tended to not pay much attention to the validity of the instruments. For example, Sims (2010) noted that Ehrlich's (1975) research on capital punishment lacked any discussion on the validity of the numerous instruments that were used, such as lagged endogenous variables. Later studies (e.g., Bound et al. 1995) noted the major problems with instrumental variables if assumptions were violated. This is an example of how the problems with complex models emerge as people come to understand them better.

### G. Add a simple ethical component

We conduct research to help inform society on the best public policies, health behaviors, business practices, and more. What we hope to see in others' research is the product of the optimal model they can develop, not the product of their efforts to find statistical significance. This means that our goal in conducting research should not be to find statistical significance, but rather to develop the best model to answer a research question and to give a responsible assessment of that model.

I recommend a few basic lessons in ethics (or good research practices). The first one would stress honesty in research and would give examples of when or how people might not be honest, such as with p-hacking. This could include some efforts to detect p-hacking, as described in Christensen and Miguel (2018). The second lesson would be the simple concept that "significance is not the goal of research". This is obvious to my students when they hear it (after they have taken other statistics and econometrics classes), but it is new to them and proves to be a valuable lesson. One student said, in an end-of-term reflection paper, that she had an insignificant estimate on her treatment variable in her thesis. She had the temptation to change the model to find significance, but she resisted that temptation based on this simple lesson that significance is not the goal. Other students, before hearing this lesson, tell me that something must be wrong with their model because their main coefficient estimate was insignificant. A simple statement on the order of "insignificant estimates are okay" might help change the culture. The third lesson in ethics would be on the importance of making responsible conclusions. This should involve being completely forthright about all potential pitfalls and biases to the coefficient estimates that could not be addressed and being careful with the conclusions on significance based on the Bayesian critique of *p*-values. This is important for society to properly synthesize the meaning and conclusions that can be drawn from a study. Overall, having textbooks incorporate lessons on the ethics of research might be a good step towards contributing to more honest research.

These lessons may also benefit from what Baicker et al. (2013) did for the study on how an expansion of Medicaid in Oregon affected health outcomes. They developed their model and published the research plan before implementing it. New resources, such as from the Center for Open Science, are promoting the online posting of research plans<sup>3</sup>.

### **5. Implications for Undergraduate Econometrics**

It follows logically that if my argument is correct that graduate econometrics training needs to be changed as I suggest, then so too does undergraduate econometrics training. Here is an equation from the textbook assigned in the undergraduate econometrics class I took many years ago, which remains in the current edition:

$$\hat{\beta}_2 = \frac{\left(\sum y_i x_{2i}\right)\left(\lambda^2 \sum x_{2i}^2 + \sum v_i^2\right) - \left(\lambda \sum y_i x_{2i} + \sum y_i v_i\right)\left(\lambda \sum x_{2i}^2\right)}{\sum x_{2i}^2 \left(\lambda^2 \sum x_{2i}^2 + \sum v_i^2\right) - \left(\lambda \sum x_{2i}^2\right)^2}$$

It makes me wonder what would be a more efficient use of students' time: deciphering equations such as this or learning how to recognize biases.

One colleague said to me as I was writing my textbook, "Undergraduate econometrics is taught as if everyone will go on to a Ph.D. Economics program." I would take that statement further and argue that undergraduate econometrics is generally taught as if everyone will become an econometric theorist. However, few will.

To highlight how misguided it might be to use a high-level math approach rather than a more practical approach, consider these numbers. There are about 26,500 undergraduate economics majors per year (Stock 2017). And, according to the American Economic Association, there are about 1000 new Economics Ph.D.'s each year<sup>4</sup>. I will guess that no more than 10% of those Ph.D.'s become econometric theorists. There are also some Economics Ph.D. students who may not have had undergraduate econometrics. Therefore, less than 4% of undergraduate econometrics students end up receiving an Economics Ph.D., and easily less than 1% of them end up becoming econometric theorists.

Just as with graduate econometrics, I would agree with the first two of Angrist and Pischke's (2017) recommended changes to undergraduate econometrics:


However, I would argue that there is even less justification (than for graduate econometrics) for their third point (increase emphasis on RCTs and quasi-experimental methods), at least for most quasi-experimental methods that have limited opportunities to be applied. More so than graduate students, few undergraduate econometrics students will become academics, and so few will have the opportunity to search for randomness, valid instrumental variables, or discontinuities. Rather, they will mostly have to make the best of non-random data. Therefore, lessons should focus on developing skills for dealing with such data, understanding what the potential sources of bias are, figuring out how (if possible) to address the potential biases, and making responsible conclusions. These are the skills that will be needed for most people using regression analysis to try to solve problems. Learning how to conduct regression analysis without learning how to properly scrutinize a model and interpret results (in terms of causality and significance) has the potential to do more harm than good.

Table 2 is similar to Table 1, but it is for the top undergraduate textbooks, as used by Angrist and Pischke (2017), and with a few more I added. It shows, again, my estimate of the number of pages centered around a given topic. There is the same problem as with the graduate textbooks: important concepts are not covered much or at all, while much space (and likely time in undergraduate classes) is devoted to concepts that are relatively minor for what would be useful to learn, in my view. Although the things that could cause bias in standard errors appear to have a large emphasis, there remains minimal coverage of things that could bias coefficient estimates. Furthermore, while those books that do discuss how to interpret an insignificant coefficient estimate do so correctly (albeit briefly for each of them), it appears that only four of the eight books discuss it in the main discussion of hypothesis tests.

<sup>3</sup> See https://www.cos.io/our-services/prereg?\_ga=2.152997817.1848170691.1585117163-115791253.1585117163.

<sup>4</sup> This comes from https://www.aeaweb.org/resources/students/careers/the-economics-profession.



Note: "N/A" in the last row indicates that I do not believe the topic of how to interpret insignificant estimates is discussed.

With a more practical approach, there can be useful lessons that would actually be applicable for most of the undergraduate econometrics students. I grant that not all undergraduate econometrics students will have a job using econometrics. However, perhaps the greater skill they should come away with is the ability to recognize sources of bias. This could help them understand why correlation can, but does not always, mean causation. It could help them understand other important statistical concepts such as omitted-variables bias and Type I errors, both of which have applications to many workplace situations. Learning about biases could help engender a healthy skepticism in the statistics and research they hear about every day. And, these are skills that could form the foundation for more efficient learning in graduate econometrics, for those who take that route.

### **6. Conclusions and Topics for Further Discussion**

The goal we should have as econometrics instructors is to teach the skills that would encourage solid, honest, and responsible research that can help improve the world. Being able to have a voice for improving the world requires trust that what we produce is valid. Therefore, efforts in instruction should foster honesty, responsibility, and the skills and research practices that produce valid research.

This means that we need to assess what concepts and what methods of instruction are the most important for producing solid researchers. Based on this idea, I have made the case, building on Angrist and Pischke (2017), that we need to shift emphases.

The teaching of graduate (and undergraduate) econometrics needs to be revamped. As instructors, we need to think about what most students will be doing with their skills, what the most practical lessons from econometrics are, what potential problems are most likely to affect the validity of a study, and how we can produce ethical and responsible researchers.

Not everyone is going to be an academic with the freedom to search the world for random assignment and choose their own topics from the randomness they find. Rather, most students will become non-academic practitioners who will need to address important problems with data that do not have random assignment. Their task would be to recognize the potential sources of bias, design the optimal method to address the issue, choose the optimal set of control variables, recognize the remaining sources of bias that could not be addressed, and make responsible conclusions. Or, as consumers of research, they should have the tools to recognize biases, which can also apply to the everyday statistics they hear in the news or even to properly assessing events by considering alternative stories that could explain why two variables move (or do not move) together. These are the things that econometrics courses should be aimed towards, both at the graduate and undergraduate levels.

Certainly, such shifts would impact certain fields, such as Macroeconomics, that have methods particular to the field. And, anyone studying econometric theory would need a new course on the high-level math underlying econometrics. However, such shifts would allow class time and students' studying time to be spent more efficiently, avoiding field-specific methods or high-level math for students who would never need such material. Furthermore, a class that spent more time giving examples that demonstrate the nuances of certain methods should help students better understand the mathematical theory behind the models.

Let me end by calling for a larger assessment of what skills Ph.D. economists need in their research. Would most Ph.D. students benefit from a shift in focus from the high-level math to something more practical? Should basic graduate econometrics be any different from undergraduate econometrics? For what I believe is a large share of Ph.D. economists, two good low-math undergraduate courses (that incorporate the changes I describe above), along with applied graduate courses and plenty of practice, should be sufficient to prepare them to become successful researchers. Based on my experiences at research organizations and in academia, I believe that these lessons would have been sufficient for most of my colleagues. The redesign and shifts that I have discussed, as I have argued in this article, would have helped me avoid most of my research mistakes.

**Funding:** This research received no external funding.

**Acknowledgments:** I would like to thank Thomas Ahn, Jiadi Chen, Judith Hermis, Ercio Munoz, Daniel Stone, and anonymous referees for their helpful comments.

**Conflicts of Interest:** The strategy for the teaching of econometrics that I espouse in this article is consistent with much of (although not all of) the teachings in my own textbook. There are ideas in this paper that go beyond what is in my textbook, and at least one idea that is inconsistent with my textbook. If this article were to lead to more book sales, there would be a very minor financial benefit. That said, my views expressed here were based on: (1) my perspectives that shaped my textbook; and (2) other perspectives of mine that have evolved since the publication of my textbook.
