**1. Introduction**

Since the reform and opening-up, China has undergone rapid urbanization. In stark contrast, the development of rural areas has not kept pace with that of urban areas and remains lagging and constrained. Mass rural-to-urban migration has had a succession of impacts on rural areas, including population decline, industrial recession, and land abandonment [1,2]. In 2018, China stepped up its efforts to revitalize rural regions. Building new-style rural communities with better infrastructure is one of the important measures for improving the wellbeing of rural residents. Thus, a spatially explicit understanding of the distribution of rural settlements is essential to effective land management and policy making.

Satellite-based earth observation is a key enabler for capturing spatial information on buildings in rural areas. High spatial resolution (HSR) images open new opportunities for detecting slums and informal settlements and for mapping rural land cover [3,4]. Compared with medium-resolution images, which mainly offer spectral information (in terms of a single image) [5], HSR images make it possible to leverage both spectral and spatial information. HSR image analysis basically relies on image classification (e.g., pixel-based) and segmentation (e.g., Object-Based Image Analysis (OBIA)) techniques [6,7], with the help of handcrafted features extracted from the spectral domain (e.g., reflectance and spectral indices such as the Normalized Difference Vegetation Index (NDVI)) and the spatial domain (e.g., texture statistics, morphological profiles, and oriented gradients) [8,9]. With an ever-increasing focus on rural areas, satellite images have been extensively used for rural settlement mapping [10,11]. Nevertheless, applying HSR images to rural settlement detection remains a challenging task for the following reasons. First, the size and spatial distribution of rural settlements vary significantly (e.g., clustered or scattered) because rural planning changes over time. Second, intra-class variation makes it difficult to distinguish rural settlements from construction materials when using spectral information alone. Third, over large spatial areas, the spectral and spatial responses of ground objects present an extremely complex pattern [8]. To discriminate rural settlements, more contextual information is required in the classification. In previous studies, such as [12,13], landscape metrics were used as spatial contextual information to identify rural settlements from HSR satellite imagery. These methods exploit tailored segment-based features and have achieved acceptable performance. However, parameter optimization and handcrafted feature selection are laborious tasks that depend heavily on expert experience and trial-and-error testing.
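As an illustration of the handcrafted spectral features mentioned above, the following minimal sketch computes NDVI per pixel as (NIR − Red) / (NIR + Red). The function name `ndvi`, the `eps` guard, and the 2 × 2 reflectance values are hypothetical and chosen only for illustration; they are not taken from the data used in this study.

```python
def ndvi(nir, red, eps=1e-12):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red) for two equally sized 2-D bands."""
    out = []
    for nir_row, red_row in zip(nir, red):
        row = []
        for n, r in zip(nir_row, red_row):
            denom = n + r
            # Guard against division by zero (e.g., no-data pixels); 0.0 is an assumed fill value.
            row.append((n - r) / denom if abs(denom) > eps else 0.0)
        out.append(row)
    return out


# Hypothetical reflectances: vegetated pixels (top row) vs. non-vegetated surfaces (bottom row).
nir = [[0.50, 0.45], [0.30, 0.25]]
red = [[0.10, 0.05], [0.30, 0.20]]
ndvi_map = ndvi(nir, red)
```

Vegetated pixels yield values close to 1, whereas surfaces with similar NIR and red reflectance yield values near 0. An index of this kind must be selected and thresholded by hand, which is the burden that learned features aim to remove.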

Deep learning methods, such as convolutional neural networks (CNNs), have shown great potential for automatic feature learning without human intervention. CNNs are able to generate robust feature representations hierarchically and have become increasingly popular in image classification and semantic segmentation [14]. Semantic segmentation of remote sensing data usually refers to extracting terrestrial objects from earth observation images using CNN models; that is, each pixel is assigned a semantic label, as in pixel-based classification [15]. The fully convolutional network (FCN) [16] extends CNNs to segmentation and has emerged as the preferred scheme for semantic labeling tasks. An FCN feeds images of arbitrary size into a standard CNN, extracts feature maps through layer-wise activation and abstraction, and then outputs high-resolution predictions in an end-to-end fashion. The essential advantages of FCNs are their intrinsic ability to enhance feature representation and their flexibility to accept input images of any size. Previous studies have applied the FCN and its variants to detect buildings and settlements [17–19]. It has further been found that incorporating contextual relations in CNNs can improve classification accuracy [20,21]. Nevertheless, most of the above-mentioned approaches are designed to extract target objects in urban areas from standard datasets [22]. In rural areas, built-up areas tend to be sparse and can easily be omitted [23]. Because urban and rural buildings differ significantly in appearance, directly employing existing deep learning approaches to map rural settlements does not guarantee good performance. In addition, the difficulty of image interpretation increases sharply as the spatial resolution increases. Therefore, we aim to exploit the advantages of deep learning to contribute to rural settlement identification in HSR images. So far, only a few studies have applied FCNs to extract rural residential areas [24,25], and most of them were limited by the spatial resolution of the images or the extent of the application. The effectiveness of FCNs in rural settlement mapping using HSR images requires further in-depth examination. In short, it is imperative to develop an effective method to support the automatic extraction of rural settlements from HSR images.
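To make the notion of pixel-wise semantic labeling concrete, the sketch below converts a stack of per-class score maps, such as those produced at the output of an FCN, into a label map by taking the highest-scoring class at each pixel. The function name `label_map`, the class indices (0 = background, 1 = low-density settlement, 2 = high-density settlement), and the score values are assumptions made for illustration only.

```python
def label_map(scores):
    """scores[c][i][j] is the score of class c at pixel (i, j).

    Returns a 2-D map holding, for every pixel, the index of the highest-scoring class.
    """
    n_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(n_classes), key=lambda c: scores[c][i][j]) for j in range(width)]
        for i in range(height)
    ]


# Hypothetical 3-class score maps for a 2 x 2 image.
scores = [
    [[0.7, 0.2], [0.1, 0.6]],  # class 0: background (assumed)
    [[0.2, 0.5], [0.1, 0.3]],  # class 1: low-density settlement (assumed)
    [[0.1, 0.3], [0.8, 0.1]],  # class 2: high-density settlement (assumed)
]
labels = label_map(scores)
```

Because the operation is applied independently at each pixel, it works for score maps of any height and width, which mirrors the ability of FCNs to process inputs of arbitrary size.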

The overall objective of this paper is to develop a framework for automatically identifying rural settlements in HSR satellite images based on deep learning. Our main contributions are as follows: (1) We introduce a deep FCN method to recognize rural settlements. Specifically, dilated convolutions are used to extract deep features at high spatial resolution. (2) A multi-scale context subnetwork, which adopts the popular squeeze-and-excitation (SE) module [26] to aggregate multi-scale context, is exploited to generate discriminative representations. The proposed deep learning-based rural settlement extraction scheme can flexibly take multi-spectral HSR images as input to distinguish different types of rural settlements.
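The two building blocks named above can be sketched in a few lines. The following is a minimal, dependency-free illustration of a dilated convolution, which enlarges the receptive field without reducing the output resolution, and of a generic SE block in its standard form (global average pooling, two fully connected layers, sigmoid gating). It is not the network proposed in this paper; all function names, weights, and inputs are hypothetical.

```python
import math


def dilated_conv2d(image, kernel, dilation=1):
    """Same-size 2-D cross-correlation with zero padding and an odd-sized square kernel."""
    k = len(kernel)
    half = (k // 2) * dilation
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    # Kernel taps are spaced `dilation` pixels apart.
                    y, x = i + a * dilation - half, j + b * dilation - half
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[a][b] * image[y][x]
            out[i][j] = acc
    return out


def receptive_field(kernel_size, dilation):
    """Spatial extent covered by a single dilated kernel."""
    return dilation * (kernel_size - 1) + 1


def squeeze_excite(feature_maps, w1, w2):
    """feature_maps: C maps of size H x W; w1: (C/r) x C; w2: C x (C/r); biases omitted."""
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(map(sum, fm)) / (len(fm) * len(fm[0])) for fm in feature_maps]
    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid.
    hidden = [max(0.0, sum(wi * zi for wi, zi in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(wi * hi for wi, hi in zip(row, hidden))))
             for row in w2]
    # Scale: re-weight every channel by its gate.
    scaled = [[[g * v for v in row] for row in fm] for fm, g in zip(feature_maps, gates)]
    return scaled, gates


# Hypothetical inputs: a 5 x 5 impulse image and a 3 x 3 kernel of ones.
impulse = [[1.0 if (i, j) == (2, 2) else 0.0 for j in range(5)] for i in range(5)]
ones = [[1.0] * 3 for _ in range(3)]
dense = dilated_conv2d(impulse, ones, dilation=1)   # response confined to the central 3 x 3 block
sparse = dilated_conv2d(impulse, ones, dilation=2)  # response spread over a 5 x 5 extent

# Two hypothetical channels with reduction ratio r = 2 (one hidden unit).
maps = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
scaled, gates = squeeze_excite(maps, w1=[[1.0, 1.0]], w2=[[1.0], [-1.0]])
```

With the same nine weights, the dilated kernel covers a 5 × 5 extent instead of 3 × 3 while the output keeps the input size, which is why dilation suits dense prediction at high spatial resolution. The SE gates lie in (0, 1) and act as learned channel-wise importance weights.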

### **2. Study Area and Data**

In this research, eleven towns of Tongxiang County (120°30′13″E, 30°41′10″N), a typical rural region undergoing rapid rural development and transformation in the Yangtze River Delta of China, were selected as the study area. Tongxiang, located in the Hangjiahu Plain, has a temperate climate with distinct seasonality. Since 2000, several land consolidation projects have been carried out to promote the construction of the new countryside. The construction and renovation of the countryside are still ongoing in Tongxiang, so old, scattered low-rise houses are mixed with uniformly planned residential buildings. This makes the area well suited to examining our proposed method. We preliminarily divide these settlements into two categories. Figure 1 shows examples of the two types of rural settlements in the study area: low-density settlements and high-density settlements.

**Figure 1.** GaoFen-2 image of the Tongxiang study area acquired in July 2016. Examples of (**a**) a low-density rural settlement and (**b**) a high-density rural settlement.


China's GaoFen-2 (GF-2) HSR images were used, comprising four multispectral bands (MSS) with a spatial resolution of 4 m and a panchromatic band (PAN) with a spatial resolution of 1 m. The two images were acquired in July 2016. We also collected 2015 land use data for the study area (provided by the Bureau of Land and Resources, Tongxiang, China) to generate ground truth data.
