2.1. Microorganisms Used
In addition to Escherichia coli, which has been used in many previous experiments, we tested our detection technique on Bacillus subtilis, Planococcus halocryophilus, and Pseudoalteromonas haloplanktis. B. subtilis is a common and resistant soil bacterium. P. halocryophilus and P. haloplanktis are adapted to salty and cold environments, making them suitable analog organisms for life detection on Mars and on ocean worlds such as Europa and Enceladus.
Escherichia coli is the most widely studied prokaryotic organism. It is a rod-shaped, facultatively anaerobic, gram-negative bacterium that lives in the guts of warm-blooded organisms. The cells are about 2 µm long and 1 µm in diameter, and are motile by means of flagella [4]. E. coli is peritrichous: its 5–10 flagella are randomly distributed across the cell surface [7]. We used E. coli K-12 wild-type (DSM 498) in our study, obtained from Dirk Wagner, Helmholtz Centre Potsdam GFZ German Research Centre for Geosciences.
Bacillus subtilis is a facultatively anaerobic bacterium and one of the best-studied gram-positive organisms. B. subtilis occurs naturally in soil and is also found in the human gastrointestinal tract [8]. It is rod-shaped and, because it is able to form endospores, can survive extreme temperatures and desiccation. It is typically 4 to 10 µm long and 0.25 to 1 µm in diameter [9]. We used the type strain B. subtilis Marburg (DSM 10), obtained from Dirk Wagner, Helmholtz Centre Potsdam GFZ German Research Centre for Geosciences. This strain moves by using approximately 20 flagella per cell, which are non-randomly distributed over the cell surface [10].
Pseudoalteromonas haloplanktis is a curved, rod-shaped psychrophilic bacterium isolated from Antarctica. It is typically 1.2 to 2.3 µm long and 0.5 to 0.6 µm wide. This strictly aerobic extremophile is gram-negative, non-spore-forming, and motile by means of a single polar flagellum [11]. Its growth optimum is at 26 °C, and it can grow at temperatures as low as 0 °C [12]. Because of its ability to survive in cold environments, we used the type strain Pseudoalteromonas haloplanktis 545 (DSM 6060), obtained from DSMZ (Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures GmbH).
Planococcus halocryophilus is a gram-positive, aerobic bacterium found in Arctic permafrost. The cells are non-spore-forming cocci, which occur individually or in pairs and have a diameter of between 0.8 and 1.2 µm. The bacterium is motile and thrives in environments of high salinity and low temperature [13]. It grows at temperatures as low as −10 °C and is metabolically active at temperatures as low as −25 °C [14]. Cells remain motile under all growth conditions [14]. The strain was first isolated from permafrost active-layer soil at Eureka, Ellesmere Island, Canada. We used the type strain Planococcus halocryophilus Or1 (DSM 24743), obtained from DSMZ (Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures GmbH).
2.3. Measurements
We used a Primo Star Full Köhler phase contrast microscope with a 40× objective for the observations. An AxioCam 105 color camera connected to the microscope transmitted the images live to a computer. The combination of microscope and digital camera gives a pixel scale of 0.11 µm, while the optical resolution is restricted by the Abbe diffraction limit (approximately 0.4 µm). We used a Neubauer counting chamber with a depth of 10 µm to observe the microbial samples and a milky solution as a reference sample. The microscopic measurements were performed at temperatures of 25 °C, 30 °C, 35 °C, and 40 °C. To maintain a stable temperature, we enclosed the microscope and the camera in a thermally insulating chamber consisting of 4 cm polystyrene walls and a 5 mm insulating aluminum layer. The temperature was measured with a DOSTMANN P700 Universal Precision Thermometer, whose probe was attached to the microscope slide, and was controlled with a Philips ThermoProtect 2100 W hairdryer. The setup of the temperature control chamber can be seen in Figure 1.
Due to time constraints, we tracked 50 individual movement pathways over ten seconds for each species. This corresponds to approximately 9.4% of the individual E. coli cells observed, 10% of the P. halocryophilus cells, 12% of the P. haloplanktis cells, and approximately 40% of the B. subtilis cells. Exposure times, and therefore the lighting needed for the best image quality, varied between species. The resulting videos had frame rates between 5.41 and 6.94 frames per second; in each case, the shutter speed was fast enough for the microbes to be clearly seen in the individual images without smearing.
Video files (.tiff) were imported into Fiji/ImageJ for tracking. We used the manual tracking plugin, setting its time interval and x/y calibration according to the parameters of the microscopic recordings. By clicking on the center of each microbe with the mouse pointer in every video frame, we obtained its x/y-coordinates and speed.
2.4. Errors
We obtained images of 2560 × 1920 pixels with our system. Using the digital zoom in the Fiji software, we could determine the location of the microbes with an accuracy of ±1 pixel (= 0.11 µm). Depending on the frame rate of the videos, this is equivalent to an accuracy of ±0.62 µm/s for E. coli and B. subtilis, ±0.73 µm/s for P. halocryophilus, and ±0.76 µm/s for P. haloplanktis. A second source of error is that we tracked the microbes in only two dimensions, even though they were in a medium 10 µm deep (about ten microbial body lengths). This limitation applies to all observed microbes. A third potential source of error is convection: for observations of this kind, externally induced drift must be prevented. Therefore, before the start of each run, we observed a reference sample (a milk solution in the motility medium) on the stage with the same configuration of heating source, thermal chamber walls, and microscope, and adjusted the setup until no convective motion occurred in the medium.
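The conversion from a one-pixel position error to a speed error can be sketched as follows. This is a minimal illustration, assuming the speed error is simply the pixel scale divided by the time between two frames; the function name `speed_resolution` is hypothetical, and the two frame rates are the bounds quoted in Section 2.3. The per-species frame rates are not listed in the text, so only the upper bound can be compared directly with the values above.

```python
def speed_resolution(pixel_scale_um: float, frames_per_second: float) -> float:
    """Speed change (um/s) caused by a one-pixel position error between two frames."""
    frame_interval_s = 1.0 / frames_per_second
    return pixel_scale_um / frame_interval_s


# Pixel scale from Section 2.3; frame rates are the quoted lower and upper bounds.
for fps in (5.41, 6.94):
    print(f"{fps:.2f} fps -> +/-{speed_resolution(0.11, fps):.2f} um/s")
```

At 6.94 frames per second this gives about ±0.76 µm/s, the value stated for P. haloplanktis.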
2.5. Classes and Features of Data
We worked with aggregates of features (defined as the properties of the observed cells averaged over the length of the recordings) to avoid biases caused by methodological differences, such as variations of exposure time from recording to recording. At the same time, it is important to provide the tracking and classification algorithms only with information that does not give them a priori information about the observed species, for example through a higher occurrence of a specific species in a certain coordinate range. Therefore, no coordinate data were passed on to the algorithms.
Two classes were used to classify the cells and the abiotic particles with the supervised learning algorithms: the first class consisted of the four observed bacterial species at 25 °C, and the second class consisted of the simulated mobility traces due to Brownian motion at the same temperature. Subsequently, the cells were classified into four classes representing the species used.
The microbes and the simulated particles were tracked using individual motility/mobility traces over ten seconds. Depending on the species and exposure time, between 54 and 69 consecutive images were taken for each microbial pathway. Two consecutive images were needed to determine the speed and three consecutive images to determine the angle change (e.g., a ten-second image sequence of 54 consecutive images resulted in 53 speed values and 52 angle values for each microbial pathway).
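The relation between the number of images and the number of speed values can be sketched as follows. This is an illustrative sketch only: the function name `step_speeds` and the synthetic straight-line track are assumptions, and the speeds in the study were obtained from the Fiji tracking plugin rather than from code like this.

```python
import math


def step_speeds(xy_um, frame_interval_s):
    """Speed (um/s) between each pair of consecutive positions."""
    return [
        math.hypot(x2 - x1, y2 - y1) / frame_interval_s
        for (x1, y1), (x2, y2) in zip(xy_um, xy_um[1:])
    ]


# Synthetic track of 54 positions, moving 0.1 um per frame along x.
track = [(0.1 * i, 0.0) for i in range(54)]
speeds = step_speeds(track, frame_interval_s=1.0 / 5.41)
print(len(track), "positions ->", len(speeds), "speed values")
```

A pathway of 54 positions yields 53 speed values, as stated above.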
To ensure that the different exposure times do not provide the classifiers with a priori information, we determined the average values of the microbial pathway features over ten seconds. We intended to select properties that classify the motion as simply as possible to minimize the computational requirements. An overly complex analysis of the motion behavior (including complex consecutive motion properties) would risk overfitting to certain features that might work well for our data set but fail for another data set. A general problem with classification is that it is not known in advance which features are most suitable. In our case, this means: which motility features lead to the best accuracy in classifying microbial species, and how are biotic and abiotic particles best distinguished? The features we used were: (1) mean speed of the particle, (2) standard deviation of the particle speed, (3) relative amount of clockwise directional change, (4) relative amount of counterclockwise directional change, (5) relative amount of low (<20°) directional change, (6) average directional changing angle, (7) standard deviation of directional changing angles, (8) relative amount of low speed (set arbitrarily at <1.4 µm/s and used as a tunable parameter), and (9) mean distance of particles after ten seconds.
Measured angles during movement are taken from three consecutive coordinates based on the motility behavior of the microbes: the angles here are the angles formed by the two sides $\overline{12}$ and $\overline{23}$, where 1 is the coordinate of the microbe in the first video frame, 2 in the following one, and 3 in the one thereafter (see Figure 2).
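One way to compute such angles from three consecutive coordinates is sketched below. It assumes that the angle of interest is the change of heading between the step from point 1 to point 2 and the step from point 2 to point 3, and that its sign separates the two turning directions; the function name `turning_angles_deg` is hypothetical. With a mathematical y-axis (pointing up), positive values are counterclockwise turns; in image coordinates, where y usually points down, the sense is reversed.

```python
import math


def turning_angles_deg(xy):
    """Signed change of heading (degrees) at each interior point of a track."""
    angles = []
    for (x1, y1), (x2, y2), (x3, y3) in zip(xy, xy[1:], xy[2:]):
        ax, ay = x2 - x1, y2 - y1          # step from point 1 to point 2
        bx, by = x3 - x2, y3 - y2          # step from point 2 to point 3
        cross = ax * by - ay * bx          # sign gives the turning direction
        dot = ax * bx + ay * by            # size gives the alignment of the steps
        # atan2(0, 0) is 0 in Python, so a zero-length step counts as "no turn".
        angles.append(math.degrees(math.atan2(cross, dot)))
    return angles


left_turn = turning_angles_deg([(0, 0), (1, 0), (1, 1)])
straight = turning_angles_deg([(0, 0), (1, 0), (2, 0)])
print(left_turn, straight)
```

A track of n positions gives n − 2 angles, so 54 images yield 52 angle values.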
Besides features (1), (2), (6), and (7) (mean speed and angles, respectively, and their standard deviations), which are the most apparent properties to quantify the microbial pathways, there are additional ways to quantify motility patterns. Bacteria moving near surfaces experience hydrodynamic forces that attract them towards the surface and cause them to move in circular trajectories [15]. Because of this phenomenon, we also quantified the direction of turning with features (3) and (4). In feature (5), we defined a directional change of <20° as a straight movement. The definition of low speed as <1.4 µm/s is added as feature (8). This is an empirical parameter determined after the observations were concluded, and it is suitable for classification because of its statistically significant variations between bacterial species. The speed of 1.4 µm/s equals approximately the mean displacement per second of a particle with a diameter of 0.5 µm at 25 °C. Feature (9) is a way to easily quantify the straightness of an object’s average movement over the observation period.
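The nine aggregate features could be computed from per-frame values roughly as follows. This is a sketch under stated assumptions, not the implementation used in the study: the function and key names are hypothetical, negative angles are taken to mean clockwise turns, the population standard deviation is used, and feature (9) is interpreted as the straight-line distance between the first and last position of a pathway.

```python
import math
from statistics import mean, pstdev


def aggregate_features(speeds, signed_angles_deg, start_xy, end_xy,
                       low_speed=1.4, low_angle=20.0):
    """Collapse per-frame speeds (um/s) and signed turning angles (deg) into nine features."""
    abs_angles = [abs(a) for a in signed_angles_deg]
    n_angles = len(signed_angles_deg)
    return {
        "mean_speed": mean(speeds),                                             # (1)
        "std_speed": pstdev(speeds),                                            # (2)
        "frac_clockwise": sum(a < 0 for a in signed_angles_deg) / n_angles,     # (3) assumed sign
        "frac_counterclockwise": sum(a > 0 for a in signed_angles_deg) / n_angles,  # (4)
        "frac_low_turn": sum(a < low_angle for a in abs_angles) / n_angles,     # (5)
        "mean_turn": mean(abs_angles),                                          # (6)
        "std_turn": pstdev(abs_angles),                                         # (7)
        "frac_low_speed": sum(s < low_speed for s in speeds) / len(speeds),     # (8)
        "net_distance": math.hypot(end_xy[0] - start_xy[0],
                                   end_xy[1] - start_xy[1]),                    # (9) assumed meaning
    }


# Tiny made-up pathway, for illustration only.
features = aggregate_features([1.0, 2.0, 3.0], [10.0, -30.0], (0.0, 0.0), (3.0, 4.0))
print(features)
```

Only these aggregates, and no coordinates, would then be handed to the classifiers; `low_speed` is kept as an argument because the threshold is described as a tunable parameter.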
It is crucial to select the features that contribute most to the predicted output. Irrelevant features in the data can decrease the accuracy of the classification. Feature selection means less redundant data and, therefore, less impact from noise, more accurate models, and faster training of algorithms. To remove irrelevant features, we implemented an exhaustive feature selector that samples and evaluates all possible feature combinations to classify the species. Calculating all nine features from the observed coordinates of the respective frames required very little computing power.
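An exhaustive search over feature combinations can be sketched as below. The function `exhaustive_feature_search`, the feature names, and the scoring function `toy_score` are all hypothetical; in practice the score of a subset would be something like the cross-validated accuracy of a classifier trained on those features, and the toy score here merely stands in for it.

```python
from itertools import combinations

FEATURES = ["mean_speed", "std_speed", "frac_cw", "frac_ccw", "frac_low_turn",
            "mean_turn", "std_turn", "frac_low_speed", "net_distance"]


def exhaustive_feature_search(feature_names, score_subset):
    """Score every non-empty feature subset and return the best one."""
    best_subset, best_score, n_evaluated = None, float("-inf"), 0
    for size in range(1, len(feature_names) + 1):
        for subset in combinations(feature_names, size):
            n_evaluated += 1
            score = score_subset(subset)
            if score > best_score:          # strict '>' keeps the first of any ties
                best_subset, best_score = subset, score
    return best_subset, best_score, n_evaluated


def toy_score(subset):
    """Stand-in score: rewards two arbitrarily chosen features, penalizes subset size."""
    useful = {"mean_speed", "std_turn"}
    return len(useful & set(subset)) - 0.1 * len(subset)


best, score, n = exhaustive_feature_search(FEATURES, toy_score)
print(best, n)  # ('mean_speed', 'std_turn') 511
```

With nine features there are 2^9 − 1 = 511 non-empty subsets, which is small enough to evaluate exhaustively.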
We used 10-fold cross-validation for all algorithms: the dataset was shuffled randomly and split into ten groups. Each group was used once as a test group, with the remaining groups used as training data. A model was fitted to the training set and evaluated on the test group; the result of the evaluation was kept, the model was discarded, and the next group was used as the test group. The accuracy of each algorithm was then determined from the individual test results of all ten test groups. This procedure gives a more reliable estimate of how the models are expected to perform on data not used during training. We used Python 3.7 in the scientific environment Spyder for running the algorithms. The classifiers were Logistic Regression Classifiers (LRC), Linear Discriminant Analysis (LDA), K-Nearest Neighbor Classifiers (KNN), Classification and Regression Trees (CART), Naïve Bayes (NB), and Support Vector Machines (SVM). For detailed information on these classifiers, see the Supporting Information and [16].
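The cross-validation procedure can be sketched with the standard library alone. The text does not name the library used for the classifiers, so this sketch substitutes a simple nearest-centroid classifier for the six classifiers listed above; all function names and the two-class data set are hypothetical and purely synthetic.

```python
import random


def k_fold_splits(n_samples, k=10, seed=0):
    """Shuffle sample indices, split them into k groups, and use each group once as the test set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f_no, fold in enumerate(folds) if f_no != i for j in fold]
        yield train_idx, test_idx


def fit_centroids(X, y):
    """Stand-in classifier: the 'model' is the mean feature vector of each class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for d, value in enumerate(row):
            acc[d] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_nearest(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(centroids[lab], row)))


def cross_val_accuracy(X, y, k=10, seed=0):
    """Accuracy on each test group; a fresh model is fitted for every split."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(X), k, seed):
        model = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(predict_nearest(model, X[i]) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return scores


# Synthetic one-feature data: well-separated "biotic" and "abiotic" mean speeds.
X = [[10.0 + 0.01 * i] for i in range(50)] + [[1.0 + 0.01 * i] for i in range(50)]
y = ["biotic"] * 50 + ["abiotic"] * 50
scores = cross_val_accuracy(X, y)
print(sum(scores) / len(scores))
```

Because every sample appears in exactly one test group, the mean of the ten scores uses each observation once for evaluation and never for training the model that evaluates it.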
2.6. Simulation of Brownian Motion
Einstein (1905) developed a statistical mechanics theory for Brownian motion [17]. It states that the mean displacement of a Brownian particle is proportional to the square root of the elapsed time. That is,

$$\overline{x^2} = 2Dt,$$

where the mean displacement squared is $\overline{x^2}$, the diffusivity is $D$, and the elapsed time is $t$. The diffusivity is given by:

$$D = \mu k_B T,$$

with $k_B$ (Boltzmann’s constant), $T$ being the temperature, and $\mu$ being the ratio of the particle’s drift speed to an applied force, which can be calculated by $\mu = \frac{1}{6\pi\eta r}$. $\eta$ is the dynamic viscosity of the fluid, and $r$ is the particle’s radius. This leads to:

$$\overline{x^2} = \frac{k_B T}{3\pi\eta r}\,t.$$
Based on these equations, we used Matlab to simulate the movements of particles due to Brownian motion at the temperatures investigated. We ran simulations with the viscosity of water at 25 °C (η = 0.89 mPa·s), assuming spherical particles with radii between 0.25 µm and 1 µm. The advantage of the simulations is that all relevant parameters can easily be varied. An additional benefit is that large data sets can be generated quickly to train supervised learning procedures.
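A simulation of this kind can be sketched in Python as follows. This is not the Matlab code used in the study: the function names are hypothetical, and the sketch assumes independent Gaussian steps along x and y, each with variance 2DΔt per frame, with D taken from the Stokes-Einstein relation for a sphere.

```python
import math
import random

K_B = 1.380649e-23  # Boltzmann's constant in J/K


def diffusivity_um2_per_s(radius_um, temp_c=25.0, viscosity_pa_s=0.89e-3):
    """Stokes-Einstein diffusivity of a sphere, D = k_B T / (6 pi eta r), in um^2/s."""
    temp_k = temp_c + 273.15
    d_m2_per_s = K_B * temp_k / (6.0 * math.pi * viscosity_pa_s * radius_um * 1e-6)
    return d_m2_per_s * 1e12  # m^2/s -> um^2/s


def simulate_track(radius_um, n_frames, frame_interval_s, seed=None):
    """Two-dimensional Brownian track: Gaussian steps with variance 2*D*dt per axis."""
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * diffusivity_um2_per_s(radius_um) * frame_interval_s)
    x = y = 0.0
    track = [(x, y)]
    for _ in range(n_frames - 1):
        x += rng.gauss(0.0, sigma)
        y += rng.gauss(0.0, sigma)
        track.append((x, y))
    return track


d = diffusivity_um2_per_s(0.25)               # particle with a diameter of 0.5 um
rms_1s = math.sqrt(2.0 * d * 1.0)             # root-mean-square displacement per axis after 1 s
print(f"D = {d:.2f} um^2/s, displacement after 1 s = {rms_1s:.2f} um")
track = simulate_track(0.25, n_frames=54, frame_interval_s=1.0 / 5.41, seed=1)
```

For a radius of 0.25 µm at 25 °C this gives a displacement of about 1.40 µm after one second, consistent with the 1.4 µm/s low-speed threshold used for feature (8).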