**1. Introduction**

Non-coding genome sequences, including enhancers, promoters, and other regulatory elements, play important roles in transcriptional regulation. In particular, through enhancer-promoter interactions (EPIs; i.e., physical contacts), enhancers and promoters coordinately regulate gene expression. Although enhancers can be distal from promoters in the genome, the two are brought close to, and possibly into contact with, each other in 3-D space through chromatin looping. Some enhancers even bypass adjacent promoters to interact with their target promoters in response to histone or transcriptional modifications on the genome. An accurate mapping of such distant interactions is of particular interest for understanding gene expression pathways and identifying target genes of GWAS loci [1–3].

Experimental methods based on chromosome conformation capture (3C, 4C, and Hi-C), or extensions that incorporate ChIP-sequencing such as chromatin interaction analysis by paired-end tag sequencing (ChIA-PET), are, however, costly, and results are available for only a few cell types [4–7]. Computational tools offer an alternative: they use DNA sequence and/or epigenomic annotation data to predict EPIs with machine learning models trained on experimentally obtained EPI data [8–11].

Whalen et al. [11] reported that a gradient boosting method, called TargetFinder, accurately distinguished between interacting and non-interacting enhancer-promoter pairs based on epigenomic profiles. The features included histone modifications and transcription factor binding (from ChIP-seq) and DNase I hypersensitive sites (from DNase-seq), with a focus on distal interactions (>10 kb) at high resolution. The idea was further extended to predict EPIs solely from local DNA sequence data, achieving high prediction accuracy [12–14]. In particular, convolutional neural networks (CNNs), known for capturing stationary patterns in data and applied successfully in image and text recognition [15,16], were shown to perform well in predicting EPIs from DNA sequence alone. A natural question is whether CNNs can further improve prediction performance when given regional epigenomic features. It has also been noted that for DNA sequence data, unlike for images, a simple CNN model seemed to perform well [14]; a similar conclusion was drawn for other biological data [17].

On the other hand, two recent studies pointed out an experimental design issue with randomly splitting the original data into training and test sets, as adopted by most, if not all, previous studies: many promoters interact with multiple, possibly overlapping, enhancers concurrently. Such pairs, some in the training data and others in the test data, are not independent, which can lead to over-fitting a model and over-estimating its predictive performance [18,19]. Since promoters primarily interact with enhancers on the same chromosome, the problem can be avoided by assigning different chromosomes to the training and test data. Based on such a valid training/test split, Xi and Beer [19] concluded that local epigenomic features around enhancers and promoters alone were not informative enough to predict EPIs, and they suggested re-evaluating similar studies on EPI prediction. Although combining local epigenomic features and sequence data was found to improve prediction in a recent study [20], that study was based on a random split of the whole dataset into training and test data and thus possibly suffered from inflated performance. After correcting for this experimental design bias, we would like to address two important and interesting questions: (1) whether local enhancer-promoter sequences are more informative than the corresponding local epigenomic features; and (2) whether we can gain more information by combining local sequence and epigenomic annotations.
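As a minimal sketch of the chromosome-wise split described above (not the pipeline of any cited study), the following Python snippet assigns every enhancer-promoter pair on a held-out chromosome to the test set. The `Pair` record, its field names, and the toy data are hypothetical and serve only to illustrate the idea.

```python
from collections import namedtuple

# Hypothetical record layout; field names are illustrative assumptions.
Pair = namedtuple("Pair", ["chrom", "enhancer", "promoter", "label"])


def split_by_chromosome(pairs, test_chroms):
    """Place all pairs on the held-out chromosomes in the test set."""
    test_chroms = set(test_chroms)
    train = [p for p in pairs if p.chrom not in test_chroms]
    test = [p for p in pairs if p.chrom in test_chroms]
    return train, test


# Toy data: promoter p1 is paired with two enhancers on chr1.
pairs = [
    Pair("chr1", "e1", "p1", 1),
    Pair("chr1", "e2", "p1", 0),
    Pair("chr2", "e3", "p2", 1),
    Pair("chr3", "e4", "p3", 0),
]

train, test = split_by_chromosome(pairs, test_chroms=["chr1"])

# No promoter appears on both sides of the split.
shared_promoters = {p.promoter for p in train} & {p.promoter for p in test}
```

Under a random split, the two `p1` pairs could land on opposite sides; holding out whole chromosomes keeps them together, so no promoter is shared between training and test data.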

Here, we report an extensive study of local sequence and epigenomic data for predicting long-range EPIs. In addition to the more recent and popular CNNs and gradient boosting adopted in most prior studies, we also considered more traditional feed-forward neural networks (FNNs) [21,22]. After avoiding the previous experimental design issue, we found that local sequence data alone were insufficient to predict EPIs well; in comparison, local epigenomic signals, albeit not highly predictive either, were more informative than sequence data. Furthermore, combining local sequence data with local epigenomic profiles did not improve over using local epigenomic data alone. These results may be useful for future studies.
