**6. Summary**

Big data research is important. Its large sample sizes can almost always discern statistically significant relationships. Randomized trials are not available or feasible for many pressing clinical questions in our field. The examples presented here come from the Pediatrix Clinical Data Warehouse (CDW) [16]. The source of CDW data is medical records from approximately 350 NICUs managed by MEDNAX, Inc. (Sunrise, FL, USA), which account for approximately one-fourth of NICU admissions in the United States. Despite its size, the CDW has several limitations. It is not geographically representative. The data are generated from physicians' documentation, and some information might be better obtained via a standardized case report form (a method used by the Vermont Oxford Network). Similarly, each neonatal dataset has its own unique limitations [17]. There are additional limitations to all United States NICU data that are currently collected [18]. How one defines the denominator when using these sources can introduce bias and influence a study's results, validity, and generalizability. For this reason, we urge everyone to think critically about both the numerator and the denominator, because both matter.

**Author Contributions:** Conceptualization, V.N.T. and R.H.C.; validation, formal analysis, data curation, R.H.C.; writing—original draft preparation, V.N.T.; writing—review and editing, R.H.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Acknowledgments:** The authors wish to acknowledge Deepti M. Tolia and Carol Bedsole Clark for their insight, patience, and contributions to this work.

**Conflicts of Interest:** The authors declare no conflict of interest.
