*2.8. Evaluating Model Performance*

With local sequence and 22 types of epigenomic features in our highly imbalanced dataset, the prediction performance was assessed through an Area Under Receiver Operating Characteristic (AUROC) with test data. Note that we had 21 test datasets corresponding to 21 chromosomes besides chromosomes 8 and 9, which were used as the validation data. Since the numbers of enhancer-promoter pairs varied with chromosomes, we reported a weighted average and standard deviation of the AUROC's across 21 chromosomes, where the weight was the (# of samples on each chromosome)/(total # of samples on 21 chromosomes).

To examine if either sequence or epigenomics model prediction performance could benefit from the other data source, we also evaluated the chromosome-wise test AUROCs for the combined model structure in Figure 4. Data splitting and training configurations (including tuning both structural and training parameters) followed exactly as before. We combined the models with the highest mean AUROCs from the two data sources respectively in Table 4, which were the sequence attention model (central region) and epigenomics basic CNN model. We also conducted the paired *t*-test across the 21 test chromosomes for the difference between two methods (with a null hypothesis *H*<sup>0</sup> that two methods gave the same test AUROC for each chromosome). The *p* values reported here were not adjusted for multiple comparisons, but would remain significant (or insignificant) after the Bonferroni adjustment.
