*2.3. RNAseq*

Library preparation and Next Generation Sequencing was performed at CD Genomics (Shirley, New York, NY, USA). The concentration and quality of the RNA samples was again measured using Agilent 2100 Bioanalyzer before library preparation. After normalization, rRNA was depleted from the total RNA sample using the Ribo-Zero rRNA removal kit and followed by purification and fragmentation steps. To construct the sequencing libraries, a strand-specific cDNA synthesis was performed, the 3 ends were adenylated and adaptors were ligated. The resulting libraries were subjected to a quality control and normalization process. Paired-end sequencing was performed with Illumina HiSeq X Ten PE150 (Illumina, San Diego, CA, USA).

The data presented in this study (raw data of microarray and RNA-seq experiments) are openly available in Gene Expression Omnibus (GEO) under the series reference GSE159036.

### *2.4. CircRNA Detection and Quantification in RNA-Seq Data*

First, quality of the sequencing was checked and, then, reads were mapped to the human genome (hg19, downloaded from UCSC Genome Browser [23]) using STAR version 2.5.4b (https://code.google. com/archive/p/rna-star/) [24] or BWA version 0.7.17-1 (http://maq.sourceforge.net) [25]. Subsequently, circRNA prediction was performed by CIRCexplorer2 version 2.3.3 (https://circexplorer2.readthedocs. io/en/latest/) [26] and CIRI2 version 2.0.5 (https://sourceforge.net/projects/ciri/) [27] adhering to the recommendation by the authors. Moreover, only circRNAs supported by both algorithms were considered as bona fide circRNAs and used in subsequent analyses, a method that has been described in the literature by other authors [28]. CircRNA expression was based on back-spliced junctions-spanning reads according to CIRI2 quantification. After filtering out circRNAs with low expression (sum of reads < 10), di fferential expression analysis was performed using DESeq2 version 1.28.1 (http://www. bioconductor.org/packages/release/bioc/html/DESeq2.html) [29] package for R (version 3.6.3) (https: //www.r-project.org/) in R-studio (Version 1.0.136) (https://rstudio.com/), considering as di fferentially expressed those circRNA showing a *p*-value < 0.05 and a fold-change higher than 2. To select a group of candidates for validation purposes, two additional filters were applied: (1) BaseMean value higher than five (BM > 5) and (2) the transcripts having a read count value of zero in any of the samples were removed. Finally, we selected ten candidate circRNAs having the highest absolute fold-change value.

### *2.5. Linear RNA Detection and Quantification in RNA-Seq Data*

After sequencing and quality control, the Kallisto version 0.44.0 (http://pachterlab.github.io/ kallisto/) [30] algorithm was used for transcript pseudoalignment, identification and quantification. Di fferential expression analysis was performed using DESeq2 package. Before running the DESeq2 algorithm, we applied a filter of a minimum number of reads (sum of number of reads higher than or equal to 10) to remove low expression transcripts. For subsequent analysis, due to the large number of transcripts detected and our small sample size, we added a filter to keep only the most robust transcripts regarding their read count. Therefore, we created a detection filter, as follows: (1) Transcripts detected in both groups—transcripts having at least two reads in 3 4 of samples from each group. (2) Transcripts detected only in one of the groups—transcripts having at least two reads in 3 4 of samples in the negative group but only in 1 sample from the positive group (detected only in negative group), and transcripts having at least two reads in 3 4 of positive group but only in 1 from negative group (detected only in positive group). Transcripts selected with this detection filter were considered as consistent linear RNAs. Finally, di fferentially expressed transcripts were considered those showing a *p*-value < 0.05 and an absolute fold-change value higher than two. For validation purposes, we selected ten candidate linear RNAs having the highest absolute fold-change value and having an assigned gene name.

All these profiling experiments and data analysis steps, tools and filters are summarized in Figure 1.

### *2.6. cDNA Synthesis and Quantitative PCR*

For the validation of candidate microRNAs, total RNA (10 ng) was reverse transcribed into cDNA using TaqManTM Advanced miRNA cDNA synthesis kit (Applied BiosystemsTM, Foster City, CA, USA). To quantify miRNA expression, we used TaqMan Advanced miRNA assays and TaqMan Fast Advanced Master Mix (Applied BiosystemsTM, Foster City, CA, USA), following the manufacturer's protocol. Assay for hsa-miR-191-5p was used as a reference miRNA for normalization purposes.

**Figure 1.** Summary of the study indicating profiling experiments, data analysis tools, filters and candidate selection criteria. Pos—positive. Neg—negative. DE—differentially expressed. FC—fold-change. BM—BaseMean. BSJ—backspliced junction. Padj—adjusted *p*-value. BWA—Burrows-Wheeler Aligner.

For the validation of candidate circRNA, total RNA (500 ng) was reverse transcribed into cDNA with random primers using the High Capacity cDNA Reverse Transcription Kit (Applied Biosystems, Foster City, CA, USA), following the kit protocol. PCR reaction was set up using 10 ng of cDNA as the template and using Power SYBRGreen Master Mix (Applied BiosystemsTM Foster City, CA, USA) with the following thermal cycling program: A temperature of 50 ◦C for 2 min, 95 ◦C for 10 min, 40 cycles of 95 ◦C for 15 s and 60 ◦C for 1 min, followed by a dissociation curve analysis. Divergent primers were used so that the amplicon spans the backspliced junction (BSJ), as described previously [18]. EEF1A1 and B2M were used as the reference genes, using the mean value of both genes for normalization purposes. The presence of a single-peak in the melting curve indicated the specificity of the amplification.

For the validation of candidate linear RNAs, total RNA (500 ng) was reverse transcribed into cDNA with oligo-dT and random primers using the miScript II RT (Qiagen, Hilden, Germany) kit with the HiFlex buffer, as indicated in manufacturer's protocol. PCR reaction was set up using 10 ng of cDNA as template and using miSscript SYBRGreen PCR kit (Qiagen, Hilden, Germany) with the following thermal cycling program: A temperature of 95 ◦C for 15 min, 40 cycles of 94 ◦C for 15 s, 55 ◦C for 30 s and 70 ◦C for 30 s, followed by a dissociation curve analysis. The amplification of each linear RNA was carried out with QuantiTect primer Assays (Qiagen, Hilden, Germany) (Table S2) and B2M was used as a reference gene for normalization. The presence of a single-peak in the melting curve indicated the specificity of the amplification.

All retrotranscription reactions were run in a Veriti Thermal Cycler (Applied Biosytems, Foster City, CA, USA) and quantitative PCR reactions were performed in a CFX384 Touch Real-Time PCR Detection System (Bio-Rad laboratories, Inc., Hercules, CA, USA).

Prior to run all the validation experiments, a technical validation of the circRNA amplification was carried out. The RT-qPCR amplification products were subsequently purified with the ExoSAP-IT™ PCR Product Cleanup Reagent (Applied Biosystems, Foster City, CA, USA) following the manufacturer's instructions and Sanger sequenced (ABIprism 3130) (Applied Biosystems, Foster City, CA, USA) in order to check for the presence of the BSJ. In addition, the PCR products were also subjected to electrophoresis on agarose gel to confirm the presence of a single amplification product.

The Raw Cq values and melting curves were analyzed in CFXMaestro 1.0 (Bio-Rad laboratories, Inc., Hercules, CA, USA). Expression levels represented as fold-change (FC) were calculated using the 2ˆDDCq method. Statistical analysis was done by R in RStudio on DCq data. Distribution of data was tested with Shapiro–Wilk test and the difference of the distribution was assessed by Student t-test or non-parametric Wilcoxon Rank sum test. In each circRNA and linear RNA dataset, outlier samples were removed before the statistical analysis, identifying those values using boxplot.stat function in R.

### *2.7. Gene Overrepresentation Test*

In order to describe the function of differentially expressed transcripts, the PANTHER overrepresentation test was applied (released 7 November 2019, Panther [31]), based on the Complete Biological Process from Gene Ontology database (released 9 December 2019). As a reference background, we used the list of genes giving rise to detected transcripts defined above. Fisher test was carried out and false discovery rate (FDR) correction was applied to raw *p*-values, taking as a significant threshold FDR values below 0.05.

### *2.8. ROC Curve Analysis*

Based on DCq values obtained by RT-qPCR, we performed a receiver operating characteristic curve (ROC curve) analysis in R environment in RStudio. Three different datasets were used: (1) circRNA validation data, (2) linear RNA validation data and (3) samples in both datasets to test different combination of transcripts. This last dataset was smaller given that we needed to include only samples without any missing value (Table S1). The combination of different transcripts was computed with ROC function of "Epi" package.
