*4.2. Data Analysis*

Data were imported into, and analysed with, the *R* statistical programming platform (version 3.4.3, https://cran.r-project.org) [36], using the packages included in the standard distribution together with the following additional packages: data.table [37], nortest [38], formattable [39] and org.Hs.eg.db [40].

To match gene expression data to IDR data for proteins in the D2P2 data set, it was first necessary to assign an ENSEMBL protein ID (from the ENSEMBL database, http://www.ensembl.org/index.html) to each of the genes identified in the RNAseq experiment. This was done by matching entries in the RNAseq data with entries in the org.Hs.eg.db annotation database, using ENTREZID as a common key. The ENSEMBL protein ID (ENSEMBLPROT) and gene name (SYMBOL) fields were then extracted and appended to the RNAseq data. Of the 23,445 entries in the RNAseq data set, 18,686 were matched and also had identical gene names, and this set was used in the further analysis. The annotation for the vast majority of the non-matched genes indicated that they were non-protein-coding genes, putative protein-coding genes or pseudogenes. Of the 1050 adhesion-regulated genes, 1009 were matched to an ENSEMBL protein ID, and a control set of 17,612 genes that were not shown to be regulated by adhesion was uniquely matched. Entries in the D2P2 IDR data whose "SEQID" matched an ENSEMBL protein ID in a set or subset of the adhesion-regulated or non-regulated genes were used for the analysis of that set or subset. Only D2P2 entries for IDRs of ≥ 30 amino acid residues were used, and the data for each of the 9 IDR predictors were extracted from the database and analysed separately.
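The matching and filtering steps described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' R code: the records are hypothetical in-memory stand-ins for the RNAseq table, the org.Hs.eg.db annotation and the D2P2 IDR table. Only the field names ENTREZID, SYMBOL, ENSEMBLPROT and SEQID come from the text; `gene_name`, `predictor`, `start`, `end`, the predictor labels, the helper function names and the inclusive-coordinate length rule are assumptions made for illustration.

```python
from collections import defaultdict

MIN_IDR_LENGTH = 30  # only IDRs of >= 30 residues are analysed


def match_to_proteins(rnaseq_rows, annotation_rows):
    """Join RNAseq rows to annotation rows on ENTREZID, keeping only rows whose
    annotation SYMBOL equals the RNAseq gene name; attach the ENSEMBLPROT IDs.
    (Hypothetical helper; field 'gene_name' is an assumed column name.)"""
    by_entrez = defaultdict(list)
    for ann in annotation_rows:
        by_entrez[ann["ENTREZID"]].append(ann)
    matched = []
    for row in rnaseq_rows:
        prots = {
            ann["ENSEMBLPROT"]
            for ann in by_entrez.get(row["ENTREZID"], [])
            if ann["SYMBOL"] == row["gene_name"] and ann["ENSEMBLPROT"]
        }
        if prots:  # unmatched or name-mismatched genes are dropped
            matched.append({**row, "ENSEMBLPROT": sorted(prots)})
    return matched


def idr_counts_by_predictor(d2p2_rows, protein_ids):
    """Count IDRs of >= MIN_IDR_LENGTH residues per predictor, restricted to
    D2P2 entries whose SEQID is in protein_ids. (Hypothetical helper.)"""
    counts = defaultdict(int)
    for r in d2p2_rows:
        length = r["end"] - r["start"] + 1  # assumes inclusive residue coordinates
        if r["SEQID"] in protein_ids and length >= MIN_IDR_LENGTH:
            counts[r["predictor"]] += 1
    return dict(counts)


# Hypothetical toy records (not real identifiers or study data).
rnaseq = [
    {"ENTREZID": "1", "gene_name": "GENE_A"},
    {"ENTREZID": "2", "gene_name": "GENE_B"},
    {"ENTREZID": "3", "gene_name": "GENE_C"},  # symbol differs in annotation
    {"ENTREZID": "4", "gene_name": "GENE_D"},  # absent from annotation
]
annotation = [
    {"ENTREZID": "1", "SYMBOL": "GENE_A", "ENSEMBLPROT": "ENSP_A"},
    {"ENTREZID": "2", "SYMBOL": "GENE_B", "ENSEMBLPROT": "ENSP_B"},
    {"ENTREZID": "3", "SYMBOL": "GENE_C_OLD", "ENSEMBLPROT": "ENSP_C"},
]
d2p2 = [
    {"SEQID": "ENSP_A", "predictor": "predictor_1", "start": 1, "end": 40},   # 40 aa: kept
    {"SEQID": "ENSP_A", "predictor": "predictor_2", "start": 50, "end": 60},  # 11 aa: too short
    {"SEQID": "ENSP_B", "predictor": "predictor_1", "start": 5, "end": 34},   # 30 aa: kept
    {"SEQID": "ENSP_C", "predictor": "predictor_1", "start": 1, "end": 90},   # protein not in set
]

matched = match_to_proteins(rnaseq, annotation)
protein_ids = {p for row in matched for p in row["ENSEMBLPROT"]}
print([row["gene_name"] for row in matched])       # ['GENE_A', 'GENE_B']
print(idr_counts_by_predictor(d2p2, protein_ids))  # {'predictor_1': 2}
```

Keeping the predictor as a grouping key mirrors the separate per-predictor analysis: each predictor's counts can be passed independently to the statistical tests.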

Differences in the number of IDRs between test sets and control sets were evaluated statistically using *z*-scores and associated *p*-values, calculated by comparing the value measured for the test set with the mean and standard deviation of 1000 control values. Each control value was calculated from a re-sample drawn randomly, with replacement, from the control data, and each re-sample had the same size as the test set. Differences in IDR length between test sets and control sets were evaluated statistically using the Mann–Whitney test, a non-parametric test appropriate for non-normally distributed data. *p*-values were adjusted for multiple testing using the false discovery rate (FDR) method.
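The three statistical steps can be illustrated with a minimal Python sketch under stated assumptions. The choice of statistic for "number of IDRs" (here, the mean number of IDRs per protein), the two-sided normal *p*-value for the *z*-score and the normal approximation (with tie correction, without continuity correction) for the Mann–Whitney test are assumptions; the authors' R implementation is not described at this level of detail, and results may differ slightly from the defaults of R's `wilcox.test()`. The FDR step implements the Benjamini–Hochberg procedure, which is what R's `p.adjust(method = "fdr")` computes. All function names and numbers below are hypothetical illustrations, not study data.

```python
import math
import random
import statistics


def resampled_z(test_value, control_data, statistic, n_test, n_resamples=1000, seed=1):
    """z-score of test_value against n_resamples control values, each computed
    from a with-replacement re-sample of size n_test drawn from control_data.
    The two-sided normal p-value is an assumption for illustration."""
    rng = random.Random(seed)
    control_values = [statistic(rng.choices(control_data, k=n_test))
                      for _ in range(n_resamples)]
    z = (test_value - statistics.mean(control_values)) / statistics.stdev(control_values)
    return z, math.erfc(abs(z) / math.sqrt(2))


def mann_whitney(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation with a tie
    correction (no continuity correction). Returns (U for x, p-value)."""
    n1, n2 = len(x), len(y)
    values = list(x) + list(y)
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    tie_term = 0.0
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the run
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # every value tied: no evidence of a shift
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))


def fdr_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (as in R's p.adjust(method = "fdr"))."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda idx: pvalues[idx], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, idx in enumerate(order):
        rank = m - pos  # 1-based rank in ascending order of p
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-protein IDR counts and IDR lengths (illustrative only).
control_counts = [0, 1, 1, 2, 0, 3, 1, 0, 2, 1] * 50
test_counts = [2, 3, 1, 4, 2, 3, 2, 5]
z, p_number = resampled_z(statistics.mean(test_counts), control_counts,
                          statistics.mean, n_test=len(test_counts))

test_lengths = [45, 80, 120, 60, 33, 95]
control_lengths = [31, 40, 35, 50, 38, 42, 30, 36]
u, p_length = mann_whitney(test_lengths, control_lengths)

p_adjusted = fdr_adjust([p_number, p_length])
print(round(z, 1), p_adjusted)
```

Fixing the random seed makes the re-sampling reproducible; in practice, the same `fdr_adjust` call would be applied to the full family of *p*-values being tested together.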

**Author Contributions:** Conceptualization, G.A. and A.P.W.; Methodology, G.A. and A.P.W.; Formal Analysis, G.A. and A.P.W.; Investigation, G.A. and A.P.W.; Data Curation, G.A. and A.P.W.; Writing-Original Draft Preparation, A.P.W.; Writing-Review & Editing, G.A. and A.P.W.; Visualization, G.A. and A.P.W.; Supervision, A.P.W.; Project Administration, A.P.W.; Funding Acquisition, A.P.W.

**Funding:** This research was funded by the Swedish Cancer Society and the Swedish Research Council.

**Conflicts of Interest:** The authors declare no conflict of interest.
