In this presentation, we draw a parallel between public health screening of asymptomatic individuals to detect a disease (e.g., cancer) and attempts to identify individuals who may be classified as being “at risk” of committing a violent act. We discuss concepts related to the early detection of a condition through screening. In particular, we focus on the natural history of the disease over time, as it moves through different stages: onset, start of asymptomatic detectability, start of symptomatic detectability, and death. Notably, this description refers to the latent (possible) onset of the disease and to its would-be progression if left undetected and untreated. Within this framework, we discuss the relevant quantities involved in performing a screening test. These include the probabilities of true and false positive and negative test results and the area under the receiver operating characteristic (ROC) curve (AUC). A test result is typically declared positive when some measured index crosses a threshold. How the probabilities that characterize the test change as the threshold defining a positive result is varied can be explored with an online application developed for this presentation, available at
https://marcobonetti.shinyapps.io/shinyapp/. We also discuss the positive predictive value (PPV) and the negative predictive value (NPV). The PPV is the probability that a subject is truly a case given a positive test result; the NPV is the probability that a subject is truly not a case given a negative test result. The PPV may be surprisingly low, especially when the disease has a low prevalence in the population, because even a small false positive rate applied to the large number of non-cases can yield more false positives than true positives. We then discuss some modern screening methods based on the analysis of online data streams, collected either actively or passively: in active data collection, the subject takes part in the data collection process; in passive data collection, secondary data streams are used, such as Twitter tweets, Facebook posts, or Google searches. Machine learning algorithms allow not only the study of phenomena at the population level (e.g., trends in a phenomenon of interest) but also the prediction of features of individual subjects. We discuss two specific examples, both based on Facebook data. The first concerns the prediction of a depression diagnosis from the language used on Facebook, comparing the output of an automatic classification algorithm with the true (known) depression diagnoses of a set of consenting subjects. The second is Facebook’s suicide prevention program, which has been in place for over 10 years and identifies individuals who, based on their posts, comments, and live videos, may be at risk of suicide.