#### *1.3. India's Ministry of Education*

Similarly, in 2002, India's Ministry of Education developed the capacity to gather data on organizational budgets and school expenditures to better distribute resources in India's developing areas outside of its major cities, such as Mumbai, Delhi, and Bangalore [10]. Before 2002, India's government had released only broad data to the World Bank on government spending on education as a percentage of India's overall gross domestic product (GDP), without any student- or school-level data [11]. As technology proliferated in India and the Indian government established more policies emphasizing data-driven decision-making, more data could be collected on school enrollment growth, the establishment of new schools, and gender equity, culminating in some of India's most comprehensive education reports in the mid-2000s [12]. When the COVID-19 pandemic rocked the world of education, India's school closures were among the longest in the world, averaging 73 weeks per school compared to the global average of 35 weeks [13]. Contributing to this length was the fact that many of India's public schools are situated in densely populated urban areas or remote rural areas with inadequate access to medical care [13]. Yet, because of India's increasingly centralized educational data system, India was able to swiftly compile a comprehensive report targeting equity gaps among its most under-resourced rural schools, allowing the government to provide interventions and assistance, as well as guidance on how to formulate future-year budgets to fill these gaps [14].

#### *1.4. The United States*

Among developed nations, the United States (U.S.) likely has the longest-standing and most comprehensive educational data collection methods and reporting structures. In the United States, governmental policies after the first Morrill Act of 1862 greatly expanded educational opportunities for the U.S. people, and the Department of Education Act of 1867 created the U.S. Office of Education, which later became the U.S. Department of Education [15]. The aim of the office was to organize educational functions at the federal level and provide resources for states to measure the educational progress of students in their schools. In 1867, the Office of Education began making early attempts at building large datasets to measure educational goals and outcomes; the first national-level education surveys were administered in 1870, with data collected largely from public grade schools [15]. However, scholars have long lamented that better, more robust data was not collected earlier in the history of postsecondary data collection in the United States [16].

Partially owing to the success of the 1870s surveys, the second Morrill Act of 1890 greatly expanded the federal government's data collection program, carrying the U.S. into the 1900s, when multiple data collection and analysis efforts built upon the second Morrill Act: the statistical program of 1920, the Vocational Rehabilitation Act of 1943, the Information and Education Exchange Act of 1948, and the establishment of the National Center for Education Statistics (NCES) in 1962. Today, the NCES maintains secondary and postsecondary data at the school, district, state, and regional levels, constituting one of the most robust national educational datasets in the world [15]. Moreover, the Civil Rights Era of the 1960s and President Lyndon Baines Johnson's aggressive education agenda produced many landmark education developments in the U.S., including the signing of the Elementary and Secondary Education Act (ESEA) and the Higher Education Act (HEA) in 1965 [15], both of which required data reporting by schools to the federal government. These acts paved the way for the Office of Education (now known as the Department of Education) to begin administering the National Assessment of Educational Progress in 1969, the "largest nationally representative and continuing assessment of what students in public and private schools in the United States know and are able to do in various subjects" [17] (para. 1). To date, it remains the largest and most comprehensive collection and report of big education data in the United States and the world.

Decades later, the United States made even more formal attempts to compile large education datasets, beginning in 1990 with the advent of the National Education Goals Panel pursuant to a Congressional mandate under President George H. W. Bush [18]. The aim of the panel was to report annually on national and state educational progress toward the National Education Goals adopted by the President and the nation's governors, as well as to satisfy requirements by the U.S. Office of Management and Budget for data documenting the effectiveness of federal programs both in and outside of education under the Government Performance and Results Act (GPRA) of 1993 [18]. More recently, the American Recovery and Reinvestment Act (2009) indicated that federal education officials sought to ensure that data and evidence are used to inform policy and practice [19]. The Act provided USD 10 billion to "help local educational agencies hire, retain, or rehire employees who provided school-level educational and related services", including bolstering data collection and analysis initiatives related to the profession of education in the United States [19] (para. 1).

### **2. Issues with Big Data**

Many developed nations (defined as sovereign states with a high quality of life and a high Human Development Index per the International Monetary Fund) gather high-quality data to make informed educational decisions [20,21]. However, many developing nations do not have the resources to compile the types of large, national-level datasets that the European Union, India, or the United States have. Moreover, researchers have criticized these organizations and countries for failing to target equity gaps and facilitate resources for the most marginalized populations [20,21]. In these cases, more data does not mean, and has not meant, more progress for the most impoverished, in-need communities around the world.

Moreover, many developing nations (defined as sovereign states with a lower Human Development Index than developed nations per the International Monetary Fund) in South America, Africa, and Asia do not report local- or national-level data beyond information shared with the OECD, making it difficult for developed nations, charitable non-profit organizations, and schools themselves to make data-informed decisions to improve the education and lives of children, their families, their local communities, and their nations. As a result, this study will explore how developed or developing nations can assemble large, inclusive educational datasets, using the United States as both an exemplar and a deeply flawed model. Although the U.S. has built enviable educational datasets, these datasets are often compiled inequitably and do not allow for appropriate disaggregation to inform targeted intervention and policy work to assist the children and families most in need. By learning from the U.S., both the positives and the negatives, other countries can compile datasets in an equitable fashion to ensure that minoritized populations are heard and supported by their school systems and governments.
