**Appendix A**

**Figure A1.** Illustration of how the input RS image evolves into the output population count in ResNet-N, using an example image patch. The activations of the first three feature maps and the last feature map of each network layer are visualized. Principal component analysis (PCA) [77] was used to compress all feature maps of each layer into three RGB channels for visualization. The shallow layers (Conv1 and Conv2) extract concrete features such as texture, shape, and edges from natural landscapes. The deep layers (Conv3, Conv4, and Conv5) then derive informative abstract features from these shallow features for population estimation.
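The PCA compression used for this kind of visualization can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `feature_maps_to_rgb`, the `(channels, height, width)` array layout, and the per-channel min-max scaling to [0, 1] are all assumptions made for the example, and PCA is computed here through a NumPy singular value decomposition.

```python
import numpy as np


def feature_maps_to_rgb(feature_maps: np.ndarray) -> np.ndarray:
    """Compress a (C, H, W) stack of feature maps to an (H, W, 3) image via PCA.

    Hypothetical helper: the layout and the scaling step are assumptions.
    """
    c, h, w = feature_maps.shape
    if c < 3 or h * w < 3:
        raise ValueError("need at least 3 channels and 3 pixels for 3 components")

    # Treat every pixel as one sample with C features.
    pixels = feature_maps.reshape(c, h * w).T.astype(np.float64)  # (H*W, C)

    # Center each feature so that the SVD yields the principal axes.
    pixels = pixels - pixels.mean(axis=0, keepdims=True)

    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(pixels, full_matrices=False)
    projected = pixels @ vt[:3].T  # (H*W, 3)

    # Rescale each component to [0, 1] so it can be shown as a colour channel.
    lo = projected.min(axis=0)
    hi = projected.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    rgb = (projected - lo) / span

    return rgb.reshape(h, w, 3)
```

Applying such a function to the activations of each layer (Conv1 to Conv5) would yield one false-colour image per layer, in which pixels with similar activation patterns receive similar colours.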

**Table A1.** Source and administrative unit level of the census data or total population counts used to adjust the raw population estimates for each year.


**Figure A2.** Movement path of the population center of China from 1985 to 2010.

#### **References**

