**1. Introduction**

Land tenure data contain geospatial, anthropological and socioeconomic attributes, since they build on both the physical delineation of land and the identification of the social relations governing land use, land access and land ownership [1,2]. Collecting land tenure data is, however, neither administratively straightforward nor always technically feasible or financially affordable. Challenges such as limited data availability and data accessibility complicate collection further [3,4]. New data collection technologies, including volunteered geographic information linked to social media, Unmanned Aerial Vehicles (UAVs) and big data mining, may overcome some of these barriers. Yet there is a dearth of methodological reflection on how such geospatial technologies can identify and formalize land tenure relations. What these technologies can currently do includes: (1) underpinning land tenure-enabling environments; (2) mining land tenure data; and (3) transforming land tenure relations [5]. However, the quality of all three depends heavily on complete and full access to the terrain and the data sources. In many cases these basic criteria cannot be guaranteed, leaving land tenure information scarce [1].

A promising yet unexplored technology for deriving socio-legal land tenure information is Earth Observation (EO). The use of EO data has increased significantly in many disciplines, with applications ranging from environmental and regional studies to economics and peace and conflict research (see, for example, [6–9]). More specific to the interest of this paper, there is a growing body of literature on methods to extract and map cadastral boundaries using EO data [10–19]. However, this literature rarely bridges the knowledge gap between social land tenure and spatial descriptions of boundaries. In other words, the spatial descriptions, even when automated or based on machine learning, do not identify the underlying social or legal relations to land, such as effective land ownership, private or communal land use, land access rights or presumed land claims.

Methods for detecting, extracting and identifying land tenure relations require both geometric or topographic characteristics and ground-truth information on land tenure. However, deriving spatially explicit land tenure relations from EO data remains one of the foremost challenges. As a societal institution, land tenure has a great influence on how people decide on land use. Such decisions are observable in land cover changes and in spatio-temporal patterns of land use (inferred from similarities, differences, repetitions or sudden changes in space and time). The dynamics of landscape change are intrinsically linked to land tenure relations and decisions [20–22]. Physical features can be detected and extracted by connecting spectral reflectance values, shapes and texture features to predefined ground components. By sampling and generalizing these connections, one can construct algorithms that detect and predict spatio-temporal patterns with EO data, such as the (rate of) land fragmentation, land ceiling and urban encroachment [16,23,24]. Such spatio-temporal processes could be connected to land tenure information if they are aligned with the automated identification and reconstruction of cadastral boundaries. For example, the morphology of a cadastral boundary is associated with the spatial nature of land tenure in three respects: the physical realm of land interests, the temporal practices of land use rights and the legal nature of boundaries [25].
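To make the link between spectral reflectance and observable land cover change concrete, the following minimal sketch flags pixels whose vegetation signal changed between two acquisition dates. It is an illustration only and not the method developed in this paper: the choice of the Normalized Difference Vegetation Index (NDVI), the function names `ndvi` and `change_mask`, the threshold of 0.2 and the toy 2×2 reflectance grids are all assumptions introduced for the example.

```python
def ndvi(red, nir):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red) for two equally sized grids."""
    result = []
    for red_row, nir_row in zip(red, nir):
        row = []
        for r, n in zip(red_row, nir_row):
            total = n + r
            # Guard against division by zero where both bands are zero.
            row.append((n - r) / total if total != 0 else 0.0)
        result.append(row)
    return result


def change_mask(red_t0, nir_t0, red_t1, nir_t1, threshold=0.2):
    """Flag pixels whose NDVI changed by more than `threshold` (assumed value)."""
    before = ndvi(red_t0, nir_t0)
    after = ndvi(red_t1, nir_t1)
    return [
        [abs(a - b) > threshold for b, a in zip(before_row, after_row)]
        for before_row, after_row in zip(before, after)
    ]


# Toy reflectance values (hypothetical): top row vegetated, bottom row bare at t0.
red_t0 = [[0.1, 0.1], [0.3, 0.3]]
nir_t0 = [[0.5, 0.5], [0.3, 0.3]]
# At t1 the top-right pixel is cleared and the bottom-right pixel is cultivated.
red_t1 = [[0.1, 0.3], [0.3, 0.1]]
nir_t1 = [[0.5, 0.3], [0.3, 0.5]]

mask = change_mask(red_t0, nir_t0, red_t1, nir_t1)
print(mask)
```

A mask of this kind captures only a low-level physical change. Whether a flagged pixel reflects, for instance, a change in use rights or in ownership cannot be read from the spectral signal alone.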

How, then, do we derive the features or characteristics of land tenure if we only have access to physical objects or to spectral changes in objects in time and space? According to [26], land tenure aspects may cross multiple spatially observable boundaries in a given landscape. Additionally, tenure and land right boundaries are not always visible through specific elements in the landscape or through specific spectral reflectance values. One still needs to combine the location of specific landscape elements with alternative data sources, such as agricultural census data at the regional or national scale, and/or household surveys and participatory mapping at the local scale [1,27]. But what if these locally collated datasets are not available? Is it still possible in such cases to rely on EO data only, combined with a set of basic assumptions about the spatial nature of land tenure? We hypothesize that it is, provided that a set of fundamental proxies can be connected to specific documented knowledge on land tenure. This article describes how, and under which conditions, this is possible.

The first challenge in overcoming this problem is to address the degree of semantic information connected to spatial information. When it comes to extracting socio-spatial aspects of land tenure using EO data, formalized and proven semantic rules do not yet exist. More precisely, the rules and assumptions from which a land tenure relation type can be inferred do not yet exist. EO data only distinguish "low-level semantic features" of land cover information, such as physical features, spatial objects and the configuration of ground components. In contrast, land tenure information requires "high-level semantic features" that are connected to knowledge-based information, reflect institutionalized human-land relationships and are based upon the varying human socio-economic activities on land, such as land use and ownership trajectories. In other words, the low-level semantic features provided by direct EO data acquisition methods are insufficient for the derivation of land tenure relations. One needs some sort of socialization of the pixels, i.e., a high-level semantic data collection and interpretation procedure which represents knowledge epitomized by indirect access to EO data. In practice, there is a discrepancy between the levels of detected low-level and high-level semantic features, which is labelled the "semantic gap" [24,28,29]. Therefore, it is important to see how the process of socialization of pixels can take place and how EO data can be (re-)interpreted into semantic land tenure relations with a rational and rigorous methodology. Only then is it possible to identify, bridge and close the semantic gap.

Hence, this paper reviews the challenges posed by the identification of land tenure relations from Earth Observation data. To overcome some of these challenges, we propose a mix of methods and information fusion to identify proxies that may help derive unknown land tenure relations. We illustrate our approach by constructing proxies for land tenure relations over North Korea. The research questions are:


We first present the conceptual foundations of EO data applications for identifying land tenure relations. The next section addresses substantive and methodological considerations. Then, we explore a set of proxies in relation to five land tenure-related questions. Finally, the conclusion gives a brief summary and provides recommendations on how to proceed with this research.
