On-Demand Processing of Data Cubes from Satellite Image Collections with the gdalcubes Library
Abstract
1. Introduction
2. Representing Satellite Imagery as On-Demand Data Cubes with gdalcubes
2.1. Data Cubes vs. Image Collections
- (i) Spatial dimensions refer to a single spatial reference system (SRS);
- (ii) Cells of a data cube have a constant spatial size (with regard to the cube’s SRS);
- (iii) The spatial reference is defined by a simple offset and the cell size per axis, i.e., the cube axes are aligned with the SRS axes;
- (iv) Cells of a data cube have a constant temporal duration, defined by an integer number and a date or time unit (years, months, days, hours, minutes, or seconds);
- (v) The temporal reference is defined by a simple start date/time and the temporal duration of cells;
- (vi) For every combination of dimensions, a cell has a single, scalar (real) attribute value.
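Properties (iii) and (v) mean that locating the cell that contains a given point and time needs only an offset, a cell size, and integer division, with no per-cell coordinate lookup. The following is a minimal sketch of that arithmetic, assuming an upper-left spatial origin and a temporal cell duration restricted to whole days (property (iv) also allows months or years, which this sketch does not handle). The class `RegularCubeGrid` and its fields are hypothetical names for illustration, not part of gdalcubes.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RegularCubeGrid:
    """Hypothetical helper: a regular, axis-aligned data cube grid.

    Spatial reference (property iii): upper-left offset plus a constant cell
    size per axis. Temporal reference (property v): start date plus a
    constant cell duration, simplified here to whole days.
    """
    left: float   # x coordinate of the upper-left corner, in SRS units
    top: float    # y coordinate of the upper-left corner, in SRS units
    dx: float     # cell width, in SRS units
    dy: float     # cell height, in SRS units
    t0: date      # start of the first temporal slice
    dt_days: int  # temporal duration of one cell, in days

    def cell_index(self, x: float, y: float, t: date) -> tuple[int, int, int]:
        """Return the (t, y, x) index of the cell containing a point."""
        ix = math.floor((x - self.left) / self.dx)
        iy = math.floor((self.top - y) / self.dy)  # rows counted from the top
        it = (t - self.t0).days // self.dt_days
        return it, iy, ix

    def cell_center(self, iy: int, ix: int) -> tuple[float, float]:
        """Return the (x, y) coordinates of a cell center."""
        return (self.left + (ix + 0.5) * self.dx,
                self.top - (iy + 0.5) * self.dy)


grid = RegularCubeGrid(left=0.0, top=100.0, dx=10.0, dy=10.0,
                       t0=date(2018, 1, 1), dt_days=8)
print(grid.cell_index(35.0, 85.0, date(2018, 1, 10)))  # (1, 1, 3)
```

Because the grid is fully described by six numbers, the inverse mapping (`cell_center`) is just as cheap, which is what makes such cubes easy to subset and align.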
2.2. Constructing User-Defined Data Cubes from Image Collections
- Spatial reference system;
- Spatiotemporal extent;
- Spatial size and temporal duration of cells (resolution);
- Spatial image resampling method; and
- Temporal aggregation method.
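A data cube view bundles exactly the five parameters listed above, and the size of the resulting cube follows directly from the extent and the resolution. The sketch below shows one way to hold these parameters and derive the number of cells per dimension. The class `CubeView`, its field names, and the choice to round partial cells up are assumptions for illustration; they do not reproduce the gdalcubes API, and a real implementation might instead adjust the extent or the cell size so that it divides evenly.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CubeView:
    """Hypothetical container for the five data cube view parameters."""
    srs: str                   # spatial reference system, e.g. "EPSG:3857"
    left: float                # spatial extent, in SRS units
    right: float
    bottom: float
    top: float
    t0: date                   # temporal extent, both ends inclusive
    t1: date
    dx: float                  # spatial cell size, in SRS units
    dy: float
    dt_days: int               # temporal cell duration, in days (simplification)
    resampling: str = "near"   # spatial resampling method name
    aggregation: str = "mean"  # temporal aggregation method name

    def shape(self) -> tuple[int, int, int]:
        """Number of cells along (t, y, x); partial cells are rounded up."""
        nx = math.ceil((self.right - self.left) / self.dx)
        ny = math.ceil((self.top - self.bottom) / self.dy)
        ndays = (self.t1 - self.t0).days + 1
        nt = math.ceil(ndays / self.dt_days)
        return nt, ny, nx


view = CubeView(srs="EPSG:3857", left=0.0, right=1000.0, bottom=0.0, top=500.0,
                t0=date(2018, 1, 1), t1=date(2018, 1, 31),
                dx=100.0, dy=100.0, dt_days=10, aggregation="median")
print(view.shape())  # (4, 5, 10)
```

Changing only `dx`, `dy`, or `dt_days` yields a coarser or finer cube from the same image collection, which is the point of defining the cube on demand rather than storing it.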
- 1. Allocate and initialize an in-memory chunk buffer for the resulting chunk data (a four-dimensional (bands, t, y, x) array);
- 2. Find all images of the collection that intersect with the spatiotemporal extent of the chunk;
- 3. For all images found:
  - 3.1. Crop, reproject, and resample the image according to the spatiotemporal extent of the chunk and the data cube view, and store the result as an in-memory three-dimensional (bands, y, x) array;
  - 3.2. Copy the result to the chunk buffer at the correct temporal slice. If the chunk buffer already contains values at the target position, update a pixel-wise aggregator (e.g., mean, median, minimum, maximum) that combines the values of all images written to the same data cube cell;
- 4. Finalize the pixel-wise aggregator if needed (e.g., for mean aggregation, divide the per-cell sum by the number n of contributing values).
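The chunk-reading steps above can be sketched with NumPy for the case of mean aggregation, where the aggregator keeps a running sum and a count per cell and divides once at the end. This is a simplified illustration, not the gdalcubes implementation: it assumes each image has already been cropped, reprojected, and resampled to the chunk grid (step 3.1 is omitted), it selects images by acquisition date only instead of a full spatiotemporal intersection query, and the function name `read_chunk` is hypothetical.

```python
from datetime import date

import numpy as np


def read_chunk(images, nbands, nt, ny, nx, t0, dt_days):
    """Assemble one data cube chunk with mean aggregation (simplified sketch).

    `images` is a list of (acquisition_date, array) pairs, where each array
    has shape (nbands, ny, nx) and is assumed to be ALREADY cropped,
    reprojected, and resampled to the chunk grid (step 3.1 is not shown).
    Missing values are NaN.
    """
    # Step 1: allocate buffers; a mean needs running sums and counts.
    total = np.zeros((nbands, nt, ny, nx))
    count = np.zeros((nbands, nt, ny, nx), dtype=np.int64)

    for acquired, arr in images:
        # Step 2 (simplified): keep only images inside the chunk's time range.
        it = (acquired - t0).days // dt_days
        if not 0 <= it < nt:
            continue
        # Step 3.2: add valid pixels to the correct temporal slice.
        valid = ~np.isnan(arr)
        slice_total = total[:, it]  # views into the buffers
        slice_count = count[:, it]
        slice_total[valid] += arr[valid]
        slice_count[valid] += 1

    # Step 4: finalize the mean; cells without any value stay NaN.
    out = np.full(total.shape, np.nan)
    np.divide(total, count, out=out, where=count > 0)
    return out
```

Keeping sums and counts makes the mean a streaming aggregator that needs constant memory per cell; a median, in contrast, would have to retain all values per cell until the finalization step.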
2.3. Data Cube Operations
2.4. The gdalcubes Library
3. Study Cases
3.1. Constructing a Multi-Sensor Data Cube from Precipitation, Vegetation Data, and Land Surface Temperature Data
3.2. Processing Sentinel-2 Time Series
4. Discussion
4.1. Interactive Analyses of Large EO Datasets
4.2. Scalable and Distributed Processing in the Cloud
4.3. Interfaces to Other Software and Languages
4.4. Limitations
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Operator | Description |
---|---|
raster_cube | Create a raster data cube from an image collection and a data cube view |
reduce_time | Apply a reducer function independently over all pixel time series |
reduce_space | Apply a reducer function independently over all spatial slices |
apply_pixel | Apply an arithmetic expression on band values over all pixels |
filter_pixel | Filter pixels with a logical predicate on one or more band values |
join_bands | Stack the bands of two identically shaped cubes in a single cube |
window_time | Apply a reducer function or kernel filter over moving windows for all pixel time series |
write_ncdf | Export a data cube as a netCDF file |
chunk_apply | Apply a user-defined function over chunks of a data cube |
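To make the semantics of some operators in the table concrete, the functions below mimic them on a small in-memory NumPy array laid out as (bands, t, y, x). They are illustrative analogs only, not gdalcubes functions: they ignore chunking, lazy evaluation, and cube metadata, hard-code NDVI as the `apply_pixel` expression, and use a centered, edge-truncated window for `window_time`, which is an assumption about window placement.

```python
import numpy as np

# NumPy analogs of some data cube operators; cube layout is (bands, t, y, x).


def reduce_time(cube, reducer=np.nanmedian):
    """Reduce every pixel time series; returns shape (bands, y, x)."""
    return reducer(cube, axis=1)


def apply_pixel_ndvi(cube, red=0, nir=1):
    """Band arithmetic per cell; returns a single-band cube (1, t, y, x)."""
    r, n = cube[red], cube[nir]
    return ((n - r) / (n + r))[np.newaxis]


def filter_pixel(cube, keep):
    """Set cells to NaN where the boolean (t, y, x) predicate is False."""
    return np.where(keep[np.newaxis], cube, np.nan)


def window_time_mean(cube, half):
    """Centered moving mean over 2*half+1 time steps, truncated at the edges."""
    nt = cube.shape[1]
    out = np.empty(cube.shape, dtype=float)
    for t in range(nt):
        lo, hi = max(0, t - half), min(nt, t + half + 1)
        out[:, t] = np.nanmean(cube[:, lo:hi], axis=1)
    return out


# Two bands (red, nir), three time steps, one pixel.
cube = np.array([[1.0, 1.0, 1.0], [3.0, 1.0, 9.0]]).reshape(2, 3, 1, 1)
ndvi = apply_pixel_ndvi(cube)
masked = filter_pixel(cube, ndvi[0] > 0.3)
```

Each analog returns a new array rather than modifying its input, mirroring how the operators in the table derive a new cube from an existing one.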
Dataset | MOD13A2 | GPM | MOD11A1 |
---|---|---|---|
Selected Variables | NDVI | liquid_accum | LST_DAY |
Spatial Resolution | 1 km × 1 km | 0.1° × 0.1° | 1 km × 1 km |
Area of Interest | global (land only) | global (60° N–60° S, full coverage) | Europe (land only) |
Temporal Resolution | 16 days | daily | daily |
Time Range | 2014-01-01–2019-01-01 | 2014-01-01–2019-01-01 | 2014-01-01–2019-01-01 |
File Format | HDF4 | GeoTIFF (zip compressed) | HDF4 |
SRS | MODIS sinusoidal | Lat/Lon grid | MODIS sinusoidal |
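The three datasets use two different spatial reference systems, so combining them in one data cube requires reprojection (step 3.1 of the chunk algorithm; the library builds on GDAL for this). As a rough illustration of why this matters, the sketch below implements the forward sinusoidal projection on a sphere with radius 6371007.181 m, the sphere used by the MODIS sinusoidal grid, and shows that a 0.1° GPM cell shrinks in projected x toward higher latitudes. The function name is hypothetical, and the sketch ignores datum handling and the inverse transform that a real reprojection would need.

```python
import math

# Sphere radius (meters) of the MODIS sinusoidal grid.
MODIS_SPHERE_RADIUS = 6371007.181


def lonlat_to_modis_sinusoidal(lon_deg: float, lat_deg: float) -> tuple[float, float]:
    """Forward sinusoidal projection on a sphere with central meridian 0°."""
    lon = math.radians(lon_deg)
    lat = math.radians(lat_deg)
    x = MODIS_SPHERE_RADIUS * lon * math.cos(lat)
    y = MODIS_SPHERE_RADIUS * lat
    return x, y


# Projected x-width of one 0.1° cell at the equator and at 60° N.
width_equator = lonlat_to_modis_sinusoidal(0.1, 0.0)[0]   # about 11.1 km
width_60n = lonlat_to_modis_sinusoidal(0.1, 60.0)[0]      # about half of that
```

Because cell widths vary with latitude in one SRS but not the other, neither grid maps one-to-one onto the other, which is why the data cube view fixes a single target SRS and resampling method.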
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Appel, M.; Pebesma, E. On-Demand Processing of Data Cubes from Satellite Image Collections with the gdalcubes Library. Data 2019, 4, 92. https://doi.org/10.3390/data4030092