*5.6. Sequencing*

Prepared libraries were sequenced on an Illumina MiSeq platform, 2 × 300 sequence reading in paired ends mode. The run contained PhiX libraries (PhiX Control Kit v3, Illumina® San Diego, CA, USA), to serve as an internal positive quality control.

#### *5.7. 16S rRNA Bacterial Gene Analyses*

Reads from the sequencing run were imported into the QIIME 2 version 2019.10 artifact [77]. Then sequences were trimmed at first 21 bp for forward and reverse reads and truncated to 250 for forward reads and 240 for reverse reads. Reads were then denoised with DADA2 (Divisive Amplicon Denoising Algorithm v. 2) [78] and merged together. Sequences were aligned with MAFFT (Multiple Alignment using Fast Fourier Transform) [79] and used to construct a phylogeny with fasttree [80]. Rarefaction was performed with at least 14,096 sequences per sample for subsequent stages of the analysis. Taxonomic assignments of representative sequences were conducted using q2-feature-classifier with the sklearn classifier [81] trained on SILVA 132 database at 99% similarity level [82].

#### *5.8. ITS2 Region Analyses*

Analogous steps as for "*16S rRNA bacterial gene analyses*" were performed for the ITS2 analysis. Reads were trimmed and denoised separately, they were then merged for further analysis. Reads were then trimmed at first 21 bp for forward and reverse reads and truncated to 300 for forward reads and 215 for reverse reads. Taxonomic assignments were conducted analogous to 16S analysis with classifier trained on ITS gene clustered at 99% similarities within UNITE database released 04.02.2020 containing all eukaryotes [83]. Sampling depth was set to 34,100 sequences for the diversity analyses.

#### *5.9. Analyses of Amplicon from Honeybee Samples*

Amplicons for the 16S region and ITS2 were sequenced using the Illumina MiSeq platform. Data were trimmed and merged. For 16S analyses only full-length reads over 229 bp with medium length of all sequences at 414 bp were used. Sequences were assigned to taxonomy using classifier trained on SILVA 132 database with minimum similarity 90% of read matching to the reference. For ITS2 analyses only full-length reads over 269 bp with medium length of all sequences at 337 bp were used. Sequences were assigned to taxonomy using classifier trained on all eukaryotes UNITE database v8.2 with the minimum similarity of 90% of the read matching to the reference [84,85].

#### *5.10. Screening for Pathogen Infected Honeybee Samples*

Isolated DNA was used as the template for screening pathogens: *Nosema apis*, *Nosema ceranae*, *Nosema bombi*, tracheal mite (*Acarapis woodi*), any organism in the parasitic order Trypanosomatida, including Crithidia spp. (i.e., *Crithidia mellificae*), neogregarines including *Mattesia* and *Apicystis* spp. (i.e., *Apicistis bombi*), using PCR techniques described earlier [67,86–88]. Primers used for pathogen detection are listed in the Supplementary Materials Table S5. Detection of the pathogens in honeybee samples.

#### *5.11. Statistical Analysis*

Analyses of correlations and Principal Component Analysis (PCA) were performed using software Statistica (version 12.0, StatSoft Inc., Oklahoma; USA) at the significance level of α = 0.05. The analysis was used to determine the relationships between the bee sample and the bacterial group, plant group, and fungi group. The optimum number of principal components obtained in the PCA analysis was established based on Cattell's criterion. The data matrix for the PCA of the Polish bees had 37 columns and 6 rows and of the world samples of bees had 61 columns and 6 rows (UK, Spain, Greek, Thailand and Poland) had 61 columns and 6 rows. The input matrix was auto-scaled. One-way ANOVA was performed to establish the correlation between honeybees' health status and the bacteria, fungi and plant pollen detected. For the ANOVA test, the level of statistical significance was assumed to be α = 0.05, and the same level of statistical significance was used in all comparisons. The results for which *p* values are equal to, or less than, 0.05 were obtained differ significantly from each other.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/2076-0 817/10/3/381/s1, **Table S1.** Taxonomy analysis of 16S and ITS amplicon sequencing. **Table S1. A.** Taxonomy of 16S amplicon sequencing. Six sub-groups of three bees each were analysed. Table presents relative abundance of each of the listed bacterium (numbers as percentages) in relation to the entire amplicon. **Table S1. B.** Taxonomy of ITS2 amplicon sequencing. Six subgroups of three bees each were analysed. Hits for plants and fungi were grouped into separate fractions. % for either Plant or Fungi is a sum of all counts for each fraction (highlighted in grey). **Table S2. A.** Taxonomy of 16S amplicon sequencing. Four subgroups of two bees (UK, GR, ES) or four bees (TAI) each were analysed. Table presents relative abundance of each of the listed bacterium (numbers as percentages) in relation to the entire amplicon. **Table S2. B.** Taxonomy of ITS2 amplicon sequencing. Four subgroups of two bees (UK, GR, ES) or four bees (TAI) each were analysed. Hits for plants and fungi were grouped into separate fractions. % for either Plant or Fungi is a sum of all counts for each fraction (highlighted in grey). **Table S3.** Detection of the pathogens in honeybee samples. **Table S4.** The correlation between honeybees' health status and the detected bacteria. Data with significant differences are written in bold. The ANOVA test, α = 0.05; p ≤ 0.05. **Figure S3.** Loading plot (a) and score plot (b) of the principal components analysis (PC1 and PC2) carried out on the analytical data of the taxonomy detected in all Polish bees (PL1 to PL6). **Table S5.** The list of primers and PCR conditions.

**Author Contributions:** A.A.P. (senior author), designed the experiments, analysed data, and wrote the paper. P.L., analysed data, prepared figures, supplemental information, and methods. P.J.H., analysed data, especially of metabiome and parasites, co-wrote the paper. A.P. co-wrote the paper. J.M.-M., conducted laboratory work for sequencing library preparation, sequencing, and detection of pathogens. Ł.G., analysed raw data from metabiom sequencing, prepared tables, co-wrote the paper. D.S., analysed data from metabiom sequencing, co-wrote the paper. S.G., performed genetic analyses. D.Z., drafted and made a correction of the manuscript. M.G., and R.R., analysed data using PCA, interpreted the results, co-wrote the paper. P.K., analysed Thai data. R.M.H., and M.H.P., analysed UK data, co-wrote the paper. A.L.S., analysed data, prepared figures and tables and co-wrote

the paper. All authors read and approved the final manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** The publication of the article was financed by the Polish National Agency for Academic Exchange under the Foreign Promotion Programme (NAWA), for AAP (bee-research.umcs.pl; Api Lab UMCS PPI/PZA/2019/1/00039). Honeybees were collected during the Miniatura 2 project ID 418332 founded by the NCN for AAP and the first research plans were possible thanks to EU: GB-TAF-7137 SYNTHESYS project for AAP. Work in the Starosta lab was funded by EMBO Installation Grants 2017, and POIR. 04.04.00-00-3E9C/17-00 for ALS. Work in the Hurd lab was funded by the BBSRC (BB/L023164/1) and granted to PJH.

**Institutional Review Board Statement:** In Europe, the EU Directive 2010/63/EU on the protection of animals used for scientific purposes laid the down the ethical framework for the use of animals in scientific experiments. The scope of this directive also includes specific invertebrate species, i.e. cephalopods, but no insects. Thus, according to European legislation no specific permits were required for the described studies.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are openly available in https://www. ncbi.nlm.nih.gov/bioproject/PRJNA686953 (accessed on 21 March 2021).

**Data Citation:** Gancarz: M., Hurd, P.J., Latoch, P., Polaszek, A., Michalska-Madej, J., Grochowalski, Ł., Strapagiel, D., Gnat, S., Załuski, D., Rusinek, R., Starosta, A.L.., Krutmuang, P., Martín Hernández, R., Higes Pascual, M., Ptaszy ´nska, A.A. (2021). Dataset of the next-generation sequencing of variable 16S rRNA from bacteria and ITS2 regions from fungi and plants derived from honeybees kept under anthropogenic landscapes. Data in Brief.

**Lead Contact:** Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Aneta A. Ptaszy ´nska (aneta.ptaszynska@poczta.umcs.lublin.pl).

**Conflicts of Interest:** The authors declare no competing interest.

#### **References**

