1. Introduction
Consumption of caffeine remains a topic of popular interest, but it is also often a cause of confusion for medical professionals, nutritionists, and the public. The editors of this special issue of
Nutrients, related to the impact of coffee and caffeine on human health, invited us to provide a summary of the recently published article, “Systematic Review of the Potential Adverse Effects of Caffeine Consumption in Healthy Adults, Pregnant Women, Adolescents and Children”, for a broad audience. The large (64-page) systematic review was published in Food and Chemical Toxicology in April 2017, received much attention in the press, and was chosen “Best Paper of the Year” by the Editors of the journal [
1]. The format of the paper followed a systematic review (SR) approach, which used an established and recognized framework that was specifically chosen to ensure transparency. Staying true to this framework required a large amount of documentation, which rendered the paper groundbreaking in terms of content but perhaps challenging to read and digest. At the same time, tracking statistics have demonstrated that the general public, in fact, has an interest in the SR findings with regard to caffeine. Scientific findings lose their value if they cannot be easily comprehended by diverse audiences. The Institute of Medicine (IOM) also recognizes this fact, and their guidance related to systematic reviews suggests that plain-language summaries can improve the work’s usability for general audiences [
2]. Thus, the aim of this paper is to provide a plain-language summary of this important review, and the reader is referred to the original work for full references [
1]. We hope that this approach will allow the findings to be more understandable and help individuals make educated decisions regarding their (or their patients’) consumption of caffeine.
Caffeine (1,3,7-trimethylxanthine) is a pharmacologically active component of many foods, beverages, dietary supplements, and drugs. Interestingly, it is also used to treat very ill, often premature, newborns afflicted with apnea (temporary cessation of breathing) [
3]. Caffeine is probably best recognized for its use as a flavor in cola-type beverages, and for its natural occurrence in some seeds, such as coffee and cocoa. Coffee is one of the major contributors of caffeine to the diet [
4] and it has been consumed safely for centuries, as have black and green tea. Energy drinks entered the market in the 1980s, introducing another popular source of caffeine. A number of other caffeine-added products have also attempted entry into the marketplace, such as maple syrup, beef jerky, donuts, and chewing gum. These products, with varying degrees of success, have attempted to provide novel sources of caffeine to the consumer.
The long history of caffeine use and the wide array of new products offered as sources suggest that consumers continue to desire caffeine’s pharmacological effects. In recent decades, caffeine has received both favorable and unfavorable attention from various stakeholders, such as the scientific community, the press, and non-governmental organizations. Any general internet search yields many consumer questions related to the health and safety of caffeine. Mixed messaging in the press related to benefits and potential adverse effects, combined with the possible difficulty of assessing one’s own exposure to caffeine, can lead to a great deal of uncertainty for the consumer. To address this concern in the United States, health-care professionals made a public request in the form of a letter to the FDA to gather data related to overall caffeine safety [
5]. As part of this request for more investigation, the IOM’s Food and Nutrition Board and Board on Health Science Policy hosted a two-day workshop in August of 2013, entitled, “Caffeine in Food and Dietary Supplements: Examining Safety”. This workshop provided a public forum for discussion and examination of the potential health hazards of caffeine, which were later summarized in a large (190-page) publication [
6]. The bulk of the data presented at that time came from the Oak Ridge National Laboratory (ORNL) report that was commissioned by the FDA [
7]. The IOM’s public forum event to discuss caffeine safety was not unprecedented—in the past couple of decades, many other countries have initiated discussions about the use of caffeine in food and beverages, with the intent of better understanding the consumption practices and potential safety concerns (India [
8]; Australia and New Zealand [9]; Europe [10]; and Canada [11]). The European Food Safety Authority (EFSA) has the most recent publication of such an effort [
10].
Most of the authoritative reviews or discussions mentioned above allowed for some sort of public and stakeholder input, either via submission of public comments directly or participation in public forums for discussion, and three major themes or requests continually surfaced: (1) help the consumer understand how much caffeine is actually in food and beverages (exposure); (2) help the consumer understand what level of caffeine is safe (risk); and (3) better elucidate what sort of adverse effects are associated with particular doses (dose-effect). Throughout the discussions and various publications, another commonality was the repeated references to one particular publication—Nawrot et al. (2003) [
11]—and subsequent references to the suggested “safe values” for ingestion of caffeine those authors put forward.
Nawrot et al. (2003) [
11] is a peer-reviewed publication from Health Canada, which conducted a narrative, but not systematic, review of scientific literature. We believe that at least part of the reason this article has been so heavily cited is that it is easy to read and covers multiple areas of interest related to caffeine. In developing their conclusions, Nawrot et al. (2003) [
11] reviewed many potential adverse-event areas; however, given the voluminous scope, they focused primarily on five outcomes: (1) acute toxicity (defined herein as abuse, overdose, and potential death); (2) cardiovascular; (3) bone and calcium; (4) behavior; and (5) developmental and reproductive toxicity. The authors also touched on genotoxicity, mutagenicity, and carcinogenicity, but these have not been a focal point of concern for caffeine outside of reproductive toxicity. After conducting their qualitative review, the authors concluded that the consumption of up to 300 mg/day for pregnant women and 2.5 mg/kg body weight/day for children is not associated with adverse effects. They went on to conclude that an intake dose of up to 400 mg caffeine/day is not associated with adverse effects in healthy adults [
11]. Importantly, since Nawrot et al. [
11] was published in 2003, more than 10,000 papers on caffeine-related topics have been published, and of those, more than 5000 address effects or exposure in humans. In addition, 800+ reviews related to various human health effects of caffeine have also been published (nearly all are specific to a particular adverse endpoint category).
With this as background and in light of the wealth of new data in the peer-reviewed literature, and because Health Canada’s work is so commonly referenced in discussions and debates over caffeine safety, the goal of our systematic review was to investigate whether or not the Nawrot et al. (2003) [
11] conclusions still provide an acceptable level of protection for the healthy general public. We chose the same outcomes for evaluation because these endpoints are important, as documented in other comprehensive evaluations [6, 10, 11, 12, 13], and reflect stakeholder interest. Therefore, it is useful to determine whether the values that were put forth by Nawrot et al. (2003) [
11] remain appropriate and as such can still serve as a basis to assure the typical healthy caffeine consumer of a reasonable certainty of no harm. This evaluation also allows scientists to move on from this question and focus more on sensitive subpopulations that may be at greater risk.
Thus, the need for our systematic review was established. Specifically, our objective was to determine whether the literature published since the 2003 Health Canada review supports the conclusion that caffeine consumption at amounts up to 400 mg/day for healthy adults, 300 mg/day for healthy pregnant women, and 2.5 mg/kg body weight/day for healthy children is not associated with adverse effects. We also evaluated the consumption of 2.5 mg/kg body weight/day in adolescents, although this was not specifically addressed by Nawrot et al. (2003) [
11].
2. Materials and Methods
The Systematic Review (SR) was conducted using the IOM’s
Finding What Works in Health Care—Standards for Systematic Reviews as guidance [
14]. The overall work flow of the systematic review is shown in
Figure 1 and it included problem formulation; developing a protocol; conducting a systematic search (informed by a librarian) of three databases; screening of literature for inclusion/exclusion; critically appraising individual studies; conducting endpoint, outcome, and overall syntheses and weight-of-evidence analyses; and reporting the systematic review.
Consistent with IOM recommendations, the first step involved establishing a team with appropriate expertise and experience (
Table 1). The project team was composed of eight scientists from ToxStrategies with a range of expertise, as well as a scientific advisory board (SAB), of which each member had expertise in an outcome (e.g., cardiovascular) evaluated in the review.
Develop the Population Exposure Comparator Outcome (PECO). As part of the IOM framework problem formulation, the specific research question or objective addressed in the systematic review was based on a “PECO” format (which is different from the PICO (population, intervention, comparator, and outcome) format that is often used in nutrition and clinical medicine). Specifically, the PECO was:
“For (population), is caffeine intake above (dose), compared to intakes (dose) or less, associated with adverse effects on (outcome)?” As an example, for healthy adults, the PECO would be, “For healthy adults, is caffeine intake above 400 mg/day, compared to 400 mg/day or less, associated with adverse cardiovascular effects?”
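As a minimal sketch, the PECO template above can be treated as a fill-in-the-blank string. The function name `peco_question` and the adjective form of each outcome are illustrative assumptions, not part of the registered protocols; the population doses are the comparator values stated in the objective.

```python
# Comparator doses per population, as stated in the review's objective.
COMPARATOR_DOSES = {
    "healthy adults": "400 mg/day",
    "healthy pregnant women": "300 mg/day",
    "healthy adolescents": "2.5 mg/kg body weight/day",
    "healthy children": "2.5 mg/kg body weight/day",
}


def peco_question(population: str, dose: str, outcome: str) -> str:
    """Fill the PECO template for one population, dose, and outcome.

    `outcome` is given as an adjective (e.g., "cardiovascular"); this
    phrasing is an assumption made for illustration only.
    """
    return (
        f"For {population}, is caffeine intake above {dose}, "
        f"compared to {dose} or less, associated with adverse "
        f"{outcome} effects?"
    )


if __name__ == "__main__":
    # Print the cardiovascular question for every population.
    for population, dose in COMPARATOR_DOSES.items():
        print(peco_question(population, dose, "cardiovascular"))
```

Calling the function with "healthy adults", "400 mg/day", and "cardiovascular" reproduces the example question given in the text.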
The SR focused on five outcomes (
Figure 2): acute, cardiovascular, bone and calcium, behavior, and development and reproduction (further descriptions of the endpoints included within each of these outcomes can be found in the results section for each outcome). It should be emphasized that, within each outcome (e.g., cardiovascular), there were many endpoints (e.g., morbidity, mortality, blood pressure, and heart rate). A sixth outcome, pharmacokinetics (PK), was included as a contextual topic; the objective was to generally characterize the current understanding of caffeine kinetics and critically review any information that advances the science. Thus, this topic particularly pertained to the differences and similarities between our populations of interest, characterization of kinetics in child and adolescent populations of interest, and characterization of kinetic parameters (particularly fast/slow phenotypes) in the context of the outcomes of interest.
Four populations were evaluated: healthy adults, healthy pregnant women, healthy adolescents (aged 12–19 years), and healthy children (aged 3–<12 years). For all outcomes, except acute, the daily intake (exposure) values that were evaluated were based on those established by Nawrot et al. (2003) as acceptable levels of daily intake. Thus, the exposure values (the “E” in the PECO) were 400 mg/day (10 g for acute), 300 mg/day, and 2.5 mg/kg body weight/day for adults, pregnant women, and adolescents and children, respectively. Similarly, comparators (the “C” in the PECO) were ≤400 mg/day for adults (10 g for acute), ≤300 mg/day for pregnant women, and ≤2.5 mg/kg body weight/day for adolescents and children. Thus, for example, we investigated whether the literature supports a finding that a daily exposure of 400 mg caffeine per day is safe for adults (the exposure), or rather, whether the literature supports the safety of daily exposures to less than 400 mg caffeine per day for adults (the comparator).
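Because the comparator for adolescents and children scales with body weight, it can help to see the arithmetic. The sketch below is illustrative only: the function names, the population keys, and the 30 kg body weight are assumptions introduced here, while the numeric comparators are those given above.

```python
from typing import Optional

# Fixed daily comparators (mg/day) and weight-based comparators (mg/kg bw/day).
FIXED_MG_PER_DAY = {"adult": 400.0, "pregnant": 300.0}
MG_PER_KG_PER_DAY = {"adolescent": 2.5, "child": 2.5}


def comparator_mg_per_day(population: str,
                          body_weight_kg: Optional[float] = None) -> float:
    """Return the non-acute comparator in mg/day for a population."""
    if population in FIXED_MG_PER_DAY:
        return FIXED_MG_PER_DAY[population]
    if population in MG_PER_KG_PER_DAY:
        if body_weight_kg is None:
            raise ValueError("body weight is required for this population")
        return MG_PER_KG_PER_DAY[population] * body_weight_kg
    raise KeyError(f"unknown population: {population}")


def exceeds_comparator(intake_mg_per_day: float, population: str,
                       body_weight_kg: Optional[float] = None) -> bool:
    """True when the intake is above the comparator (the 'E' in the PECO)."""
    return intake_mg_per_day > comparator_mg_per_day(population, body_weight_kg)


if __name__ == "__main__":
    # Hypothetical 30 kg child: 2.5 mg/kg/day * 30 kg = 75 mg/day.
    print(comparator_mg_per_day("child", 30))
    print(exceeds_comparator(400, "adult"))
```

An intake exactly at the comparator (for example, 400 mg/day for an adult) counts as "at or below" rather than "above", matching the ≤ notation used for the comparators.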
Protocol Registration. Consistent with expectations for transparency as part of the framework, a protocol for each outcome was developed and registered on PROSPERO (PROSPERO protocol nos. CRD42015026704, CRD42015027413, CRD42015026673, CRD42015026609, and CRD42015026736;
https://www.crd.york.ac.uk/PROSPERO/). Each protocol included: (1) context and rationale for the review; (2) study selection and screening criteria; (3) descriptions of outcome measures, time points, and comparison groups; (4) search strategy; (5) procedures for study selection; (6) data extraction strategy; (7) approach for critically appraising individual studies; and (8) method for evaluating the body of evidence. The objective of registering a protocol is to make the approach apparent a priori, as is consistent with the IOM guidelines and standard practice of systematic review.
Literature Search. A comprehensive search strategy was iteratively developed and employed with the assistance of a librarian who had expertise in the conduct of SRs. Three databases were searched: PubMed, EMBASE, and the Cochrane Database of Systematic Reviews. DistillerSR (a software tool that facilitates systematic review) was used for screening and selecting studies, as well as for documenting the extraction and evaluation of data. It is important to note that, to be included in the SR, studies had to provide a quantitative estimate or measurement of individual exposure to a caffeine source associated with an adverse effect. We included many forms of caffeine, such as coffee, tea, chocolate, cola-type beverages, energy drinks, supplements, medicines, and energy shots. For included studies, basic information that was reported by the author was extracted from each study (i.e., direct extraction of information from the text), along with other selected information needed to inform the PECO questions (e.g., dose/exposure calculations) that may have required interpretation by the analysts. For example, the exposure (dose) of caffeine was extracted directly from the studies when the authors of the studies evaluated caffeine directly or reported findings based on the amount of caffeine in given sources. In cases where this was not directly reported, the reviewers standardized the quantity of caffeine; this process was explained in supplementary materials to the original publication, and the interested reader can find more details there.
Individual Study Evaluation. During extraction of information from an individual study, the level of adversity (potential for harm) of the endpoints within the study was characterized [
15]. That is, the reviewer noted whether the study evaluated a clinical (e.g., morbidity or mortality) or physiological endpoint (e.g., blood pressure changes), as well as the importance of the effect for decision making (e.g., mortality vs. blood pressure changes). Additionally, from each study and each eligible endpoint within a study, specific values were selected or determined in order to compare to the PECO (i.e., the conclusions of Nawrot et al., 2003 [
11]). This involved identifying effect and no-effect levels. Specifically, we endeavored to establish a lowest-observed-effect level (LOEL), or, preferably, a no-observed-effect level (NOEL) (e.g., a daily exposure of X caffeine/day was without effects on Y endpoint in study Z), which could then be used for comparison to the PECO.
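The comparison of a study's effect level with the PECO comparator can be sketched as follows. This is a simplified illustration rather than the reviewers' actual procedure: the `EffectLevel` record, the function name, and the example values are hypothetical, and the only rule taken from the text is that a NOEL is preferred over a LOEL when both are available.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EffectLevel:
    """Effect levels (mg/day) extracted for one endpoint of one study."""
    study: str
    noel_mg_day: Optional[float] = None
    loel_mg_day: Optional[float] = None


def position_vs_comparator(level: EffectLevel, comparator_mg_day: float) -> str:
    """Describe where the preferred effect level sits relative to the comparator."""
    # Prefer the NOEL, as described in the text; fall back to the LOEL.
    if level.noel_mg_day is not None:
        kind, value = "NOEL", level.noel_mg_day
    elif level.loel_mg_day is not None:
        kind, value = "LOEL", level.loel_mg_day
    else:
        return "no effect level reported"

    if value > comparator_mg_day:
        side = "above"
    elif value < comparator_mg_day:
        side = "below"
    else:
        side = "at"
    return f"{kind} {side} comparator"


if __name__ == "__main__":
    # Hypothetical studies, compared with the 400 mg/day adult comparator.
    print(position_vs_comparator(EffectLevel("Study A", noel_mg_day=450), 400))
    print(position_vs_comparator(EffectLevel("Study B", loel_mg_day=200), 400))
```

Tallying these labels across studies gives a rough picture of how much evidence lies above and below the comparator; interpreting that picture still requires the weight-of-evidence judgement described in the following paragraphs.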
Following data extraction, individual studies were assessed for the risk of bias (internal validity) using the National Toxicology Program’s Office of Health Assessment and Translation (OHAT) Risk of Bias Rating Tool for Human and Animal Studies [
15]. Bias is differentiated from the broader concept of quality of the methodology and is aimed at assessing the systematic error—a measure of whether the design and conduct of a study compromised the credibility of the link between exposure and outcome [
14, 15, 16]. This approach evaluated what are called “specific domains” based on study type (i.e., controlled trial vs. observational study). Specific domains related to bias included selection, confounding, performance, detection/measurement, attrition/missing data, reporting, and other types of bias. Each domain was rated from “definitely low risk of bias” to “definitely high risk of bias” per the OHAT tool. These ratings for individual studies were then considered in the weight-of-evidence assessment when developing conclusions for each endpoint, each outcome, and overall (
Figure 3).
Determination of Weight of Evidence. Following the appraisal of individual studies, the body of evidence was evaluated using a weight-of-evidence approach for each endpoint, each outcome, and overall (
Figure 3). Similar to the approach and conclusions of Nawrot et al. (2003) [
11], the objective in the weight-of-evidence assessment was not to find the most protective amount or the lowest amount associated with an effect,
per se, but rather, to make a determination that is based on the body of evidence as a whole, which included considerations for positive and negative findings, quality of data, level of adversity, consistency, and magnitude of effect (for studies with effects below the comparator). The weight-of-evidence approach implemented was based on the framework established by the IOM [
14] and it was complemented by guidance from the National Toxicology Program handbook on systematic reviews [
17], given the specific application to toxicological assessments. We also relied on the GRADE (Grades of Recommendation, Assessment, Development and Evaluation) process in determining and implementing our weight-of-evidence approach [
18, 19].
In evaluating and conducting a qualitative synthesis of the body of evidence, data were described based on the volume of data above and below the comparator, as well as the types of effects and quality of evidence of data that are above and below the comparator. An initial level of confidence in the evidence was assigned based on key features of study design: controlled exposure, exposure prior to outcome, individual outcome data, and comparison group used [
17]. Then, using expert judgement, a number of additional factors were considered for the overall body of evidence, which yielded increases or decreases in the confidence level. These factors included the following: overall risk of bias, indirectness (when the population, exposure, or outcome differ from those in which we were interested), magnitude of effect, confounding, and overall consistency [
17, 18, 19]. Consideration of endpoint importance in terms of the endpoint’s degree of adversity [18, 19] was also important in reaching weight-of-evidence conclusions.
Weight-of-evidence determinations were made by endpoint, outcomes, and overall (
Figure 4). Such determinations were also made by population, because the comparators were different for healthy adults, pregnant women, and children. Conclusions were developed by categorizing evidence relative to the comparator (an intake value not associated with adverse effects) as follows: comparator is acceptable (i.e., evidence supports the Nawrot et al., 2003 [
11], conclusions regarding intake), comparator is too high (i.e., evidence suggests the comparator is too high for a given endpoint), or comparator is too low (i.e., evidence suggests the comparator could be higher for a given endpoint). Using a similar approach, conclusions were also developed for the outcome. When developing outcome conclusions, clinical endpoints with a high level of adversity were given the most weight. Several tools were used to facilitate and support the weight-of-evidence evaluation, including generation of evidence tables, risk-of-bias heat maps, summary plots of selected NOEL/LOEL data from individual studies, and a tabular summary of the confidence in the evidence for each outcome and endpoint. Conclusions were not developed for endpoints that contained fewer than five studies; in these instances, summary thoughts were provided, but data were determined to be insufficient to reach a conclusion.
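The conclusion categories and the five-study minimum can be sketched in code. Only the category labels and the "fewer than five studies" rule come from the text; the choice among the three substantive categories rested on expert judgement, so the sketch takes that judgement as an input instead of computing it. All names are hypothetical.

```python
from enum import Enum


class Conclusion(Enum):
    ACCEPTABLE = "comparator is acceptable"
    TOO_HIGH = "comparator is too high"
    TOO_LOW = "comparator is too low"
    INSUFFICIENT = "insufficient data to reach a conclusion"


# Endpoints with fewer than five studies did not receive a conclusion.
MIN_STUDIES = 5


def endpoint_conclusion(n_studies: int, expert_judgement: Conclusion) -> Conclusion:
    """Apply the minimum-evidence rule before accepting an expert judgement."""
    if n_studies < MIN_STUDIES:
        return Conclusion.INSUFFICIENT
    return expert_judgement


if __name__ == "__main__":
    print(endpoint_conclusion(4, Conclusion.ACCEPTABLE).value)
    print(endpoint_conclusion(12, Conclusion.ACCEPTABLE).value)
```

Keeping the judgement as an explicit argument makes clear that the categorization was not a mechanical calculation; the code only enforces the evidence threshold.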
Transparency in Reporting. All data from the systematic review were placed in a freely available Agency for Healthcare Research and Quality (AHRQ) Systematic Review Database Repository (SRDR).
4. Discussion
The article, “Systematic Review of the Potential Adverse Effects of Caffeine Consumption in Healthy Adults, Pregnant Women, Adolescents and Children [
1]”, summarized herein, provides a comprehensive assessment of evidence in the peer-reviewed literature regarding caffeine safety. Results demonstrated that the conclusions from Health Canada established in 2003 [
11] still hold true today. That is, moderate caffeine consumption—up to 400 mg/day in healthy adults, 300 mg/day in healthy pregnant women, or 2.5 mg/kg body weight/day in children and adolescents—is unlikely to be associated with adverse effects. The Special Issue of Nutrients afforded us the opportunity to provide a plain-language summary of the systematic review, thus improving the usability of the SR for health-care professionals and consumers of caffeine.
Serious consideration was given to the strengths and weaknesses of the systematic review. Key strengths included: (1) Use of the systematic review format based on IOM standards (IOM, 2011) [
14]; this format imparts transparency and rigor to the review process (and subsequent confidence in the overall assessment); (2) Assessment of five health outcomes (reproductive and developmental toxicity, behavior, cardiovascular, bone and calcium homeostasis, and acute toxicity); (3) Assessment of four populations (healthy adults, healthy pregnant women, healthy adolescents, healthy children); (4) A large evidence base (>5000 studies considered for eligibility, >381 included across the five outcomes); (5) A multidisciplinary team consisting of subject-matter experts and systematic-review experts; (6) Full transparency in analysis and reporting via the registration of systematic review protocols on PROSPERO, use of the AHRQ Systematic Review Data Repository, and open access to both this summary and the systematic review publication in
Food and Chemical Toxicology. Additionally, the review sponsor supports a website containing all relevant resources (
http://ilsina.org/caffeine-systematic-review-2017).
Weaknesses of the systematic review included: (1) The large volume of information reviewed precluded the ability to discuss or present all aspects of each study (e.g., all findings, critical appraisal of individual study strengths and limitations); (2) The evidence base was complex and heterogeneous. Study design and reporting varied widely, both within an outcome or endpoint and between outcomes and endpoints; for example, different methods were used to assess caffeine intake, or different approaches were used to measure effects on sleep; (3) Limitations in the overall evidence base did not allow for an assessment of chronic exposures for all endpoints evaluated in the review; for example, data from studies that reported physiological endpoints (e.g., blood-pressure changes) were most often obtained from short-term (often single-exposure) controlled trials; (4) Not all study designs properly controlled for confounding; (5) Various sources of potential bias (pregnancy signal and recall bias) were discussed briefly here, but the reader is also referred to an article in this special issue devoted solely to this topic [
27]; (6) Difficulties encountered in characterizing exposure (discussed in more detail below).
One of the largest areas of uncertainty in the underlying body of evidence assessed herein, and one of much interest to the consumer, is that of exposure. In the case of the SR, confidence in the characterization of exposure for each individual study was not high. Several of the caffeine sources that were included in the SR are complex mixtures with other potentially active compounds, and the amount of caffeine within each source can be highly variable. This is a problem for coffee in particular [
4], which was the primary substance evaluated in >20% of studies assessed in this SR. To address this, we attempted to standardize this metric in the SR. It should be noted, however, that the evidence also contains a large number of controlled trials in which exposure was well characterized, although these studies were associated primarily with physiological endpoints. Providing consumers with information that is related to caffeine levels contained in specific products (e.g., better product labeling) will help them to make educated decisions regarding their personal exposure level.
Recent literature shows that other aspects of caffeine consumption are important to consider when determining caffeine safety; for example, the conditions under which various sources of caffeine are consumed and whether caffeine consumption is habitual or not. Our SR evaluated consumption of total caffeine amounts within a day; however, consistent with the kinetic behavior of caffeine, effects may vary based on how the caffeine is consumed within a day. The most dramatic examples of this are the case studies that report lethal events associated with rapid and excessive consumption of capsules or powders (the comparator for lethality (10 g) is equivalent to ~100 cups of coffee). This concern is supported by recent FDA activity designating pure or highly concentrated caffeine in powder or liquid as unlawful (FDA guidance, 2018;
https://www.fda.gov/newsevents/newsroom/pressannouncements/ucm604485.htm). Therefore, it is important for the consumer to understand such nuances of exposure. To that end, considering the wide array of caffeine-containing products in the marketplace, and hence, the potential for exposure to caffeine, the consumer’s own perception of the effects of caffeine and self-limitation will remain an important area of research. A recent review by Nehlig (2018) [
36] provides insight into consumer self-limiting based on objective (what caffeine does to the body that may not be recognized by the consumer) and subjective effects (the caffeine effects sought by the consumer) of caffeine. Further research will likely continue in the area of interindividual sensitivity and consumption practices, as related to genetic makeup [
37].
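The equivalence quoted above between the 10 g lethality comparator and roughly 100 cups of coffee implies about 100 mg of caffeine per cup. The sketch below works through that arithmetic; the per-cup figure is derived only from that equivalence and is an assumption, since the caffeine content of coffee is highly variable, as noted earlier in this section.

```python
# Values taken from the text: 10 g comparator for lethality, ~100 cups of coffee.
ACUTE_COMPARATOR_MG = 10_000
APPROX_CUPS = 100

# Implied caffeine per cup (an approximation, not a measured value).
IMPLIED_MG_PER_CUP = ACUTE_COMPARATOR_MG / APPROX_CUPS


def cups_equivalent(caffeine_mg: float,
                    mg_per_cup: float = IMPLIED_MG_PER_CUP) -> float:
    """Convert a caffeine amount in mg to approximate cups of coffee."""
    return caffeine_mg / mg_per_cup


if __name__ == "__main__":
    # The 400 mg/day adult comparator under the same per-cup assumption.
    print(cups_equivalent(400))
    # The 10 g acute comparator, recovering the ~100 cups quoted in the text.
    print(cups_equivalent(ACUTE_COMPARATOR_MG))
```

Under this assumption, the 400 mg/day adult comparator corresponds to about four cups; passing a different `mg_per_cup` shows how sensitive that conversion is to the strength of the coffee.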
Based on our findings, we would suggest that any discussion with consumers or patients should consider the magnitude and level of the adversity of effects. That is, the pharmacological effects of caffeine are anticipated to cause certain physiological changes and thus require some characterization of the level of significance to health (because not all physiological changes are adverse). An example is that caffeine intake is expected to result in increased alertness, which is often desirable; however, under some conditions (such as prior to bedtime), this is an adverse effect, leading to difficulty sleeping. Another good example is that, while data suggest that caffeine intake can result in changes to heart rate or blood pressure, it is less clear at what level these effects are clinically significant.
The findings of the SR support the safety of standard consumption practices in the United States, because both mean and upper-end estimated intakes (mean of 165 mg/day and 90th percentile of 395 mg/day, all ages) are below the comparator value evaluated herein. Findings of this assessment, however, also confirm that there is no “bright-line” safe exposure, because potential effects depend on many conditional factors; further, there is some limited evidence that self-regulation reduces consumption [
38]. With regard to child and adolescent populations, limited data were identified; however, based on the available studies reviewed, there is no evidence to suggest a need for a change from the recommendation of 2.5 mg/kg body weight/day. Our review supports the view that additional research would be valuable in this area, as well as in other areas that were identified as having insufficient information, a finding similar to that of other investigators (e.g., Ruxton 2014 [
39]). This includes more research on effects in sensitive populations and establishing better quantitative characterization of interindividual variability, as well as subpopulations (e.g., unhealthy populations, those with preexisting conditions), conditions (e.g., co-exposures), and outcomes (e.g., exacerbation of risk-taking behavior) that could render individuals at greater risk relative to healthy adults and pregnant women.
In addition to the area of self-regulation mentioned above, this work identified other suggested research areas, listed here per outcome area. Bone & calcium: more research in non-adult populations as well as a better understanding of caffeine’s effects on physiology and the role of calcium would be valuable. Cardiovascular disease: a better understanding of dose-response relationships following chronic exposure for some endpoints (e.g., endothelial function and heart rate variability) would be useful. Additionally, for certain physiological effects, research should better characterize what, if any, magnitude of change may be considered harmful. Behavior: more research is necessary on children and adolescents, particularly with regard to caffeine’s effects on sleep and risk-taking behavior. It would also be helpful to give more consideration to, or gain a better understanding of, the effects of caffeine withdrawal on these endpoints. There are no data available on pregnant women that fit the quantitative inclusion criteria, so studies designed to account for this would be beneficial. Finally, a better understanding of the effects of caffeine on anxiety and sleep in sensitive subpopulations as well as in individuals with polymorphisms (e.g., ADORA2A) would be of use. Reproductive and developmental: more research is necessary to understand the effect of caffeine on childhood cancer and childhood behavior with properly designed/controlled studies. In addition, more consideration and accounting for the pregnancy signal would be beneficial. Overall, as noted for all outcomes, better exposure characterization in pregnant women to reduce measurement error, which continues to be a major challenge for observational study design, would be valuable. Acute: the main identified research need in this area is improved exposure characterization; testing of blood concentrations would prove valuable.