Overview of `eolearn.core`

eolearn.core is the main subpackage which implements the basic building blocks:

EOPatch,
EOTask,
EONode,
EOWorkflow,
EOExecutor,

and commonly used functionalities.

EOPatch

The first basic object in the package is a data container, called EOPatch.

It is designed to store all types of EO data for a single geographical location.
The EOPatch can contain data (of the same location) for multiple points in time. If the EOPatch contains multiple collections of temporal data, they must share the same temporal axis (the images must correspond to the same time points).
There is no hard limit on how much data a single EOPatch can store, but in practice it should fit into your RAM.

Each EOPatch has an attribute bbox of type sentinelhub.BBox to define its area. The attribute timestamps defines the temporal component of an EOPatch, which is either None (for patches without a temporal dimension) or a list of datetime.datetime objects.
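
The shared temporal axis can be pictured as a simple check: the first dimension of every temporal array has to equal the number of timestamps. The following is a minimal sketch in plain Python. `is_temporally_consistent` is a hypothetical helper, not an eo-learn function, and it works on shape tuples rather than on real arrays.

```python
import datetime as dt

# Five monthly timestamps, standing in for the `timestamps` attribute.
timestamps = [dt.datetime(2017, month, 1) for month in range(1, 6)]


def is_temporally_consistent(shape, timestamps):
    """Return True if the leading (time) dimension matches the timestamps."""
    return len(shape) > 0 and shape[0] == len(timestamps)


print(is_temporally_consistent((5, 100, 100, 13), timestamps))  # True
print(is_temporally_consistent((4, 100, 100, 13), timestamps))  # False
```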

EO data can be divided into categories, called “feature types”, according to the following properties:

`FeatureType`	Type of data	Time component	Spatial component	Type of values	Python object	Shape
DATA	raster	yes	yes	float	`numpy.ndarray`	`t x n x m x d`
MASK	raster	yes	yes	integer	`numpy.ndarray`	`t x n x m x d`
SCALAR	raster	yes	no	float	`numpy.ndarray`	`t x d`
LABEL	raster	yes	no	integer	`numpy.ndarray`	`t x d`
DATA_TIMELESS	raster	no	yes	float	`numpy.ndarray`	`n x m x d`
MASK_TIMELESS	raster	no	yes	integer	`numpy.ndarray`	`n x m x d`
SCALAR_TIMELESS	raster	no	no	float	`numpy.ndarray`	`d`
LABEL_TIMELESS	raster	no	no	integer	`numpy.ndarray`	`d`
VECTOR	vector	yes	yes	/	`geopandas.GeoDataFrame`	Required columns `geometry` and `TIMESTAMP`
VECTOR_TIMELESS	vector	no	yes	/	`geopandas.GeoDataFrame`	Required column `geometry`
META_INFO	anything	no	no	anything	anything	anything

Note: t specifies time component, n and m are spatial components (height and width), and d is an additional component for data with multiple channels.
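
The "Shape" column of the table can be summarized as the number of array dimensions expected for each raster feature type. The sketch below encodes that column in plain Python. `EXPECTED_NDIM` and `has_expected_ndim` are hypothetical names used only for illustration, and the validation that eo-learn itself performs may be stricter (for example, it may also check value types).

```python
# Number of array dimensions expected for each raster feature type,
# read off the "Shape" column of the table.
EXPECTED_NDIM = {
    "DATA": 4,             # t x n x m x d
    "MASK": 4,             # t x n x m x d
    "SCALAR": 2,           # t x d
    "LABEL": 2,            # t x d
    "DATA_TIMELESS": 3,    # n x m x d
    "MASK_TIMELESS": 3,    # n x m x d
    "SCALAR_TIMELESS": 1,  # d
    "LABEL_TIMELESS": 1,   # d
}


def has_expected_ndim(feature_type, shape):
    """Check a shape tuple against the dimensionality listed in the table."""
    return len(shape) == EXPECTED_NDIM[feature_type]


print(has_expected_ndim("DATA", (68, 101, 100, 13)))      # True
print(has_expected_ndim("MASK_TIMELESS", (101, 100, 1)))  # True
print(has_expected_ndim("MASK", (10, 10, 13)))            # False
```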

Let’s start by loading an existing EOPatch and displaying its content (i.e. features):

[1]:

import os

from eolearn.core import EOPatch

INPUT_FOLDER = os.path.join("..", "..", "example_data")
INPUT_EOPATCH = os.path.join(INPUT_FOLDER, "TestEOPatch")

eopatch = EOPatch.load(
    INPUT_EOPATCH, lazy_loading=False  # Set this parameter to True to load data in memory only when first needed
)

eopatch

[1]:

EOPatch(
  bbox=BBox(((465181.0522318204, 5079244.8912012065), (466180.53145382757, 5080254.63349641)), crs=CRS('32633'))
  timestamps=[datetime.datetime(2015, 7, 11, 10, 0, 8), ..., datetime.datetime(2017, 12, 22, 10, 4, 15)], length=68
  mask_timeless={
    LULC: numpy.ndarray(shape=(101, 100, 1), dtype=uint16)
    RANDOM_UINT8: numpy.ndarray(shape=(101, 100, 13), dtype=uint8)
    VALID_COUNT: numpy.ndarray(shape=(101, 100, 1), dtype=int64)
  }
  vector={
    CLM_VECTOR: geopandas.GeoDataFrame(columns=['TIMESTAMP', 'VALUE', 'geometry'], length=55, crs=EPSG:32633)
  }
  label={
    IS_CLOUDLESS: numpy.ndarray(shape=(68, 1), dtype=bool)
    RANDOM_DIGIT: numpy.ndarray(shape=(68, 2), dtype=int8)
  }
  meta_info={
    maxcc: 0.8
    service_type: 'wcs'
    size_x: '10m'
    size_y: '10m'
  }
  scalar_timeless={
    LULC_PERCENTAGE: numpy.ndarray(shape=(6,), dtype=float64)
  }
  scalar={
    CLOUD_COVERAGE: numpy.ndarray(shape=(68, 1), dtype=float16)
  }
  vector_timeless={
    LULC: geopandas.GeoDataFrame(columns=['index', 'RABA_ID', 'AREA', 'DATE', 'LULC_ID', 'LULC_NAME', 'geometry'], length=88, crs=EPSG:32633)
  }
  mask={
    CLM: numpy.ndarray(shape=(68, 101, 100, 1), dtype=uint8)
    CLM_INTERSSIM: numpy.ndarray(shape=(68, 101, 100, 1), dtype=bool)
    CLM_MULTI: numpy.ndarray(shape=(68, 101, 100, 1), dtype=bool)
    CLM_S2C: numpy.ndarray(shape=(68, 101, 100, 1), dtype=bool)
    IS_DATA: numpy.ndarray(shape=(68, 101, 100, 1), dtype=uint8)
    IS_VALID: numpy.ndarray(shape=(68, 101, 100, 1), dtype=bool)
  }
  label_timeless={
    LULC_COUNTS: numpy.ndarray(shape=(6,), dtype=int32)
  }
  data_timeless={
    DEM: numpy.ndarray(shape=(101, 100, 1), dtype=float32)
    MAX_NDVI: numpy.ndarray(shape=(101, 100, 1), dtype=float64)
  }
  data={
    BANDS-S2-L1C: numpy.ndarray(shape=(68, 101, 100, 13), dtype=float32)
    CLP: numpy.ndarray(shape=(68, 101, 100, 1), dtype=float32)
    CLP_MULTI: numpy.ndarray(shape=(68, 101, 100, 1), dtype=float32)
    CLP_S2C: numpy.ndarray(shape=(68, 101, 100, 1), dtype=float32)
    NDVI: numpy.ndarray(shape=(68, 101, 100, 1), dtype=float32)
  }
)

There are several ways to access a feature in the EOPatch.

[2]:

from eolearn.core import FeatureType

# All of these access the same feature:
bands = eopatch.data["BANDS-S2-L1C"]
# or
bands = eopatch[FeatureType.DATA]["BANDS-S2-L1C"]
# or
bands = eopatch[(FeatureType.DATA, "BANDS-S2-L1C")]
# or
bands = eopatch[FeatureType.DATA, "BANDS-S2-L1C"]

type(bands), bands.shape

[2]:

(numpy.ndarray, (68, 101, 100, 13))
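
The last two access forms are identical at the Python level, because `obj[a, b]` passes the tuple `(a, b)` to `__getitem__`. The toy container below shows how a single `__getitem__` can serve both the chained and the tuple style. `ToyPatch` is a hypothetical class written for this illustration and is not how EOPatch is implemented.

```python
class ToyPatch:
    """Toy stand-in for a feature container; NOT eo-learn's EOPatch."""

    def __init__(self):
        self._features = {"data": {"BANDS": [1, 2, 3]}}

    def __getitem__(self, key):
        # obj[ftype, name] and obj[(ftype, name)] both arrive here as a tuple.
        if isinstance(key, tuple):
            feature_type, name = key
            return self._features[feature_type][name]
        # obj[ftype] returns the whole dictionary of that feature type.
        return self._features[key]


patch = ToyPatch()
print(patch["data"]["BANDS"] is patch["data", "BANDS"])    # True
print(patch[("data", "BANDS")] is patch["data", "BANDS"])  # True
```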

Vector features are handled by geopandas:

[3]:

eopatch[FeatureType.VECTOR, "CLM_VECTOR"].head()

[3]:

	TIMESTAMP	VALUE	geometry
0	2015-07-31 10:00:09	1.0	POLYGON ((465181.052 5080254.633, 465181.052 5...
1	2015-08-20 10:07:28	1.0	POLYGON ((465181.052 5080254.633, 465181.052 5...
2	2015-09-19 10:05:43	1.0	POLYGON ((465181.052 5080254.633, 465181.052 5...
3	2015-09-29 10:06:33	1.0	POLYGON ((465181.052 5080254.633, 465181.052 5...
4	2015-12-08 10:04:09	1.0	POLYGON ((465181.052 5080254.633, 465181.052 5...

The bounding box and the timestamps are special features:

[4]:

print(eopatch.timestamps[:5])
print(repr(eopatch.bbox))

eopatch.bbox.geometry  # draws the shape of BBox

[datetime.datetime(2015, 7, 11, 10, 0, 8), datetime.datetime(2015, 7, 31, 10, 0, 9), datetime.datetime(2015, 8, 20, 10, 7, 28), datetime.datetime(2015, 8, 30, 10, 5, 47), datetime.datetime(2015, 9, 9, 10, 0, 17)]
BBox(((465181.0522318204, 5079244.8912012065), (466180.53145382757, 5080254.63349641)), crs=CRS('32633'))

[4]:

(Figure: outline of the EOPatch bounding box.)

A list of all features in an EOPatch can be obtained with:

[5]:

eopatch.get_features()

[5]:

[(<FeatureType.DATA: 'data'>, 'CLP_S2C'),
 (<FeatureType.DATA: 'data'>, 'CLP'),
 (<FeatureType.DATA: 'data'>, 'NDVI'),
 (<FeatureType.DATA: 'data'>, 'BANDS-S2-L1C'),
 (<FeatureType.DATA: 'data'>, 'CLP_MULTI'),
 (<FeatureType.MASK: 'mask'>, 'CLM'),
 (<FeatureType.MASK: 'mask'>, 'IS_DATA'),
 (<FeatureType.MASK: 'mask'>, 'CLM_MULTI'),
 (<FeatureType.MASK: 'mask'>, 'CLM_INTERSSIM'),
 (<FeatureType.MASK: 'mask'>, 'IS_VALID'),
 (<FeatureType.MASK: 'mask'>, 'CLM_S2C'),
 (<FeatureType.SCALAR: 'scalar'>, 'CLOUD_COVERAGE'),
 (<FeatureType.LABEL: 'label'>, 'IS_CLOUDLESS'),
 (<FeatureType.LABEL: 'label'>, 'RANDOM_DIGIT'),
 (<FeatureType.VECTOR: 'vector'>, 'CLM_VECTOR'),
 (<FeatureType.DATA_TIMELESS: 'data_timeless'>, 'DEM'),
 (<FeatureType.DATA_TIMELESS: 'data_timeless'>, 'MAX_NDVI'),
 (<FeatureType.MASK_TIMELESS: 'mask_timeless'>, 'RANDOM_UINT8'),
 (<FeatureType.MASK_TIMELESS: 'mask_timeless'>, 'LULC'),
 (<FeatureType.MASK_TIMELESS: 'mask_timeless'>, 'VALID_COUNT'),
 (<FeatureType.SCALAR_TIMELESS: 'scalar_timeless'>, 'LULC_PERCENTAGE'),
 (<FeatureType.LABEL_TIMELESS: 'label_timeless'>, 'LULC_COUNTS'),
 (<FeatureType.VECTOR_TIMELESS: 'vector_timeless'>, 'LULC'),
 (<FeatureType.META_INFO: 'meta_info'>, 'maxcc'),
 (<FeatureType.META_INFO: 'meta_info'>, 'size_x'),
 (<FeatureType.META_INFO: 'meta_info'>, 'size_y'),
 (<FeatureType.META_INFO: 'meta_info'>, 'service_type')]
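
Because the result is a flat list of (feature type, feature name) pairs, it is easy to regroup. The sketch below groups a few of the pairs by feature type. Plain strings stand in for the `FeatureType` members so that the snippet runs without eo-learn, and `names_by_type` is a name chosen for this illustration.

```python
from collections import defaultdict

# A few (feature type, feature name) pairs; plain strings stand in for
# the FeatureType enum members shown in the output.
features = [
    ("data", "NDVI"),
    ("data", "CLP"),
    ("mask", "CLM"),
    ("mask", "IS_DATA"),
    ("label", "IS_CLOUDLESS"),
]

# Collect the feature names under their feature type.
names_by_type = defaultdict(list)
for feature_type, name in features:
    names_by_type[feature_type].append(name)

for feature_type, names in names_by_type.items():
    print(feature_type, sorted(names))
```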

Let’s create a new EOPatch and store some features inside.

[6]:

import numpy as np

from sentinelhub import CRS, BBox

# Since EOPatch represents geolocated data, it should always have a bounding box
new_eopatch = EOPatch(bbox=BBox((0, 0, 1, 1), CRS.WGS84))

new_eopatch[FeatureType.MASK_TIMELESS, "NEW_MASK"] = np.zeros((68, 10, 13), dtype=np.uint8)

# If temporal features are added to an EOPatch that does not have timestamps (or if the dimensions do not match),
# the user is warned that the EOPatch is temporally ill-defined

new_eopatch.timestamps = eopatch.timestamps
new_eopatch[FeatureType.DATA, "BANDS"] = eopatch[FeatureType.DATA, "BANDS-S2-L1C"]

# The following wouldn't work as there are restrictions to what kind of data can be stored in each feature type
# new_eopatch[FeatureType.MASK, 'NEW_MASK'] = np.zeros((10, 10, 13), dtype=np.uint8)
# new_eopatch[FeatureType.VECTOR, 'NEW_MASK'] = np.zeros((10, 10, 13), dtype=np.uint8)

new_eopatch

[6]:

EOPatch(
  bbox=BBox(((0.0, 0.0), (1.0, 1.0)), crs=CRS('4326'))
  timestamps=[datetime.datetime(2015, 7, 11, 10, 0, 8), ..., datetime.datetime(2017, 12, 22, 10, 4, 15)], length=68
  mask_timeless={
    NEW_MASK: numpy.ndarray(shape=(68, 10, 13), dtype=uint8)
  }
  data={
    BANDS: numpy.ndarray(shape=(68, 101, 100, 13), dtype=float32)
  }
)

It is also possible to delete a feature:

[7]:

del new_eopatch[FeatureType.MASK_TIMELESS, "NEW_MASK"]

new_eopatch

[7]:

EOPatch(
  bbox=BBox(((0.0, 0.0), (1.0, 1.0)), crs=CRS('4326'))
  timestamps=[datetime.datetime(2015, 7, 11, 10, 0, 8), ..., datetime.datetime(2017, 12, 22, 10, 4, 15)], length=68
  data={
    BANDS: numpy.ndarray(shape=(68, 101, 100, 13), dtype=float32)
  }
)

We can save an EOPatch into a local folder. If an EOPatch already exists at the specified location, we have to explicitly allow its features to be overwritten.

[8]:

from eolearn.core import OverwritePermission

OUTPUT_FOLDER = os.path.join(".", "outputs")
os.makedirs(OUTPUT_FOLDER, exist_ok=True)

NEW_EOPATCH_PATH = os.path.join(OUTPUT_FOLDER, "NewEOPatch")

new_eopatch.save(NEW_EOPATCH_PATH, overwrite_permission=OverwritePermission.OVERWRITE_FEATURES)

Let’s load the saved version and compare it with the original:

[9]:

loaded_eopatch = EOPatch.load(NEW_EOPATCH_PATH)

new_eopatch == loaded_eopatch

[9]:

True

Each EOPatch can be shallow or deep copied:

[10]:

new_eopatch.copy()
new_eopatch.copy(deep=True)

[10]:

EOPatch(
  bbox=BBox(((0.0, 0.0), (1.0, 1.0)), crs=CRS('4326'))
  timestamps=[datetime.datetime(2015, 7, 11, 10, 0, 8), ..., datetime.datetime(2017, 12, 22, 10, 4, 15)], length=68
  data={
    BANDS: numpy.ndarray(shape=(68, 101, 100, 13), dtype=float32)
  }
)
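
The general Python distinction between the two kinds of copy can be illustrated with the standard `copy` module. This sketch uses a dictionary of lists as a stand-in for a patch. Whether an EOPatch shallow copy shares its underlying arrays in exactly this way is an assumption here, so treat it as an illustration of the concept rather than of eo-learn internals.

```python
import copy

# A dictionary of mutable lists stands in for a patch holding arrays.
original = {"BANDS": [0.1, 0.2, 0.3]}

shallow = copy.copy(original)   # new container, same inner objects
deep = copy.deepcopy(original)  # new container, duplicated inner objects

original["BANDS"][0] = 9.9      # modify the data in place

print(shallow["BANDS"][0])  # 9.9, the shallow copy shares the list
print(deep["BANDS"][0])     # 0.1, the deep copy is unaffected
```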

EOTask

The next core object is EOTask, which is a single well-defined operation on one or more EOPatch objects.

We can create a new EOTask by creating a class that inherits from the abstract EOTask class:

from eolearn.core import EOTask


class FooTask(EOTask):
    def __init__(self, foo_param):
        """Stores task-specific parameters."""
        self.foo_param = foo_param

    def execute(self, eopatch, *, patch_specific_param):
        # Do what foo does on the EOPatch (using self.foo_param and
        # patch_specific_param) and return it
        return eopatch

In the initialization method we define task-specific parameters.
Each task has to implement the execute method.
The execute method has to be defined so that:
- positional arguments have to be instances of EOPatch,
- other types of arguments should be keyword arguments.
Otherwise the task itself can do anything.
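
The two rules map onto ordinary Python features: an abstract base class forces subclasses to implement `execute`, and a bare `*` in the signature makes every following argument keyword-only. The sketch below shows both with toy classes. `ToyTask` and `ScaleTask` are hypothetical stand-ins that operate on a plain list and do not use eo-learn.

```python
from abc import ABC, abstractmethod


class ToyTask(ABC):
    """Toy abstract base class; a stand-in for EOTask, not the real class."""

    @abstractmethod
    def execute(self, patch, **kwargs):
        """Subclasses must implement this method."""


class ScaleTask(ToyTask):
    def __init__(self, factor):
        # Task-specific parameter, fixed when the task is created.
        self.factor = factor

    def execute(self, patch, *, offset):
        # `offset` comes after `*`, so it can only be passed by keyword.
        return [value * self.factor + offset for value in patch]


task = ScaleTask(factor=2)
print(task.execute([1, 2, 3], offset=1))  # [3, 5, 7]

try:
    task.execute([1, 2, 3], 1)  # positional use of a keyword-only argument
except TypeError as error:
    print("TypeError:", error)
```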

Example of a task that adds a new feature to an existing EOPatch:

[11]:

from typing import Any, Tuple

from eolearn.core import EOTask


class AddFeatureTask(EOTask):
    """Adds a feature to the given EOPatch.

    :param feature: Feature to be added
    :type feature: (FeatureType, feature_name) or FeatureType
    """

    def __init__(self, feature: Tuple[FeatureType, str]):
        self.feature = feature

    def execute(self, eopatch: EOPatch, *, data: Any) -> EOPatch:
        """Returns the EOPatch with added features.

        :param eopatch: input EOPatch
        :param data: data to be added to the feature
        :return: input EOPatch with the specified feature
        """
        eopatch[self.feature] = data

        return eopatch

Let’s see how such a task could be used.

[12]:

eopatch = EOPatch(bbox=BBox((0, 0, 1, 1), CRS.WGS84), timestamps=[f"2017-0{i}-01" for i in range(1, 6)])

add_feature_task = AddFeatureTask((FeatureType.DATA, "NEW_BANDS"))

data = np.zeros((5, 100, 100, 13))

eopatch = add_feature_task.execute(eopatch, data=data)

eopatch

[12]:

EOPatch(
  bbox=BBox(((0.0, 0.0), (1.0, 1.0)), crs=CRS('4326'))
  timestamps=[datetime.datetime(2017, 1, 1, 0, 0), ..., datetime.datetime(2017, 5, 1, 0, 0)], length=5
  data={
    NEW_BANDS: numpy.ndarray(shape=(5, 100, 100, 13), dtype=float64)
  }
)

The majority of eo-learn consists of different EOTasks implementing different operations on EO data.

The list of all EOTasks is available in the documentation.

EONode and EOWorkflow

EOTasks can be joined together into an acyclic processing graph called EOWorkflow. Since eo-learn 1.0 these tasks first have to be wrapped into instances of the EONode class.

Here is a simple example of how an EOWorkflow can be created:

[13]:

from eolearn.core import EONode, EOWorkflow, LoadTask, SaveTask

new_feature = FeatureType.LABEL, "NEW_LABEL"

load_task = LoadTask(path=INPUT_FOLDER)
add_feature_task = AddFeatureTask(new_feature)
save_task = SaveTask(path=OUTPUT_FOLDER, overwrite_permission=OverwritePermission.OVERWRITE_FEATURES)

# Each EONode object defines dependencies on other EONode objects:
load_node = EONode(load_task, inputs=[], name="Load EOPatch")
add_feature_node = EONode(add_feature_task, inputs=[load_node], name="Add a new feature")
save_node = EONode(save_task, inputs=[add_feature_node], name="Save EOPatch")

workflow = EOWorkflow([load_node, add_feature_node, save_node])
# or
workflow = EOWorkflow.from_endnodes(save_node)

# Alternatively, a linear workflow could also be built with a helper function:
# from eolearn.core import linearly_connect_tasks
# nodes = linearly_connect_tasks(load_task, add_feature_task, save_task)
# workflow = EOWorkflow(nodes)
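
The `inputs` arguments define a directed acyclic graph, and any valid execution order must run a node only after all of its inputs. The sketch below derives such an order with the standard library's `graphlib`. It only illustrates the idea of a dependency order. How EOWorkflow orders its nodes internally is not shown in this overview, and the `dependencies` dictionary is a hand-written stand-in for the nodes.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Map each node name to the names of the nodes it depends on,
# mirroring the `inputs` arguments of the three nodes.
dependencies = {
    "Load EOPatch": [],
    "Add a new feature": ["Load EOPatch"],
    "Save EOPatch": ["Add a new feature"],
}

# static_order() yields every node after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
# ['Load EOPatch', 'Add a new feature', 'Save EOPatch']
```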

Let’s display the dependency graph:

[14]:

%matplotlib inline

workflow.dependency_graph()

[14]:

(Figure: dependency graph of the workflow.)

An EOWorkflow is executed by specifying the EOPatch-specific parameters for each node:

[15]:

results = workflow.execute({
    load_node: {"eopatch_folder": "TestEOPatch"},
    add_feature_node: {"data": np.zeros((68, 3), dtype=np.uint8)},
    save_node: {"eopatch_folder": "WorkflowEOPatch"},
})

results

[15]:

WorkflowResults(outputs={}, start_time=datetime.datetime(2023, 8, 28, 15, 19, 54, 733751), end_time=datetime.datetime(2023, 8, 28, 15, 19, 54, 961589), stats={'LoadTask-939b27aa45a511eeb8db-91a8de8b81da': NodeStats(node_uid='LoadTask-939b27aa45a511eeb8db-91a8de8b81da', node_name='Load EOPatch', start_time=datetime.datetime(2023, 8, 28, 15, 19, 54, 733806), end_time=datetime.datetime(2023, 8, 28, 15, 19, 54, 822464), exception_info=None), 'AddFeatureTask-939b2a9b45a511eea69d-e2612971e907': NodeStats(node_uid='AddFeatureTask-939b2a9b45a511eea69d-e2612971e907', node_name='Add a new feature', start_time=datetime.datetime(2023, 8, 28, 15, 19, 54, 825206), end_time=datetime.datetime(2023, 8, 28, 15, 19, 54, 825267), exception_info=None), 'SaveTask-939b2cb545a511eea722-ed1665ca815d': NodeStats(node_uid='SaveTask-939b2cb545a511eea722-ed1665ca815d', node_name='Save EOPatch', start_time=datetime.datetime(2023, 8, 28, 15, 19, 54, 827230), end_time=datetime.datetime(2023, 8, 28, 15, 19, 54, 960678), exception_info=None)}, error_node_uid=None)

A result of a workflow execution is a WorkflowResults object. It contains information about times of each node execution and information about potential errors.
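
As a worked example, the timing information in the output above can be turned into durations with ordinary `datetime` arithmetic. The values below are copied by hand from that output so that the snippet runs on its own. In a live session they would presumably be read from the `start_time`, `end_time` and `stats` fields that appear in the printed result.

```python
import datetime as dt

# Start and end times copied from the WorkflowResults output.
start_time = dt.datetime(2023, 8, 28, 15, 19, 54, 733751)
end_time = dt.datetime(2023, 8, 28, 15, 19, 54, 961589)

# (node name, start, end) triples copied from the NodeStats entries.
node_times = [
    ("Load EOPatch",
     dt.datetime(2023, 8, 28, 15, 19, 54, 733806),
     dt.datetime(2023, 8, 28, 15, 19, 54, 822464)),
    ("Add a new feature",
     dt.datetime(2023, 8, 28, 15, 19, 54, 825206),
     dt.datetime(2023, 8, 28, 15, 19, 54, 825267)),
    ("Save EOPatch",
     dt.datetime(2023, 8, 28, 15, 19, 54, 827230),
     dt.datetime(2023, 8, 28, 15, 19, 54, 960678)),
]

total = (end_time - start_time).total_seconds()
print(f"workflow: {total:.6f} s")

# Duration of each node in seconds.
durations = {name: (end - start).total_seconds() for name, start, end in node_times}
for name, seconds in durations.items():
    print(f"{name}: {seconds:.6f} s")
```

In this run, loading and saving account for almost all of the roughly 0.23 s total, while adding the feature takes well under a millisecond.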

Note:

A difference between executing tasks directly and executing tasks in a workflow is that in a workflow each EOPatch input object will be first shallow-copied before being passed to any task.

EOExecutor

EOExecutor handles the execution and monitoring of EOWorkflows. It enables executing a workflow multiple times and in parallel. At the end, it generates a report containing the summary of the workflow’s execution process.

Let’s execute the previously defined workflow with different arguments:

[16]:

from eolearn.core import EOExecutor

execution_args = [  # EOWorkflow will be executed for each of these 5 dictionaries:
    {
        load_node: {"eopatch_folder": "TestEOPatch"},
        add_feature_node: {"data": idx * np.ones((68, 3), dtype=np.uint8)},
        save_node: {"eopatch_folder": f"ResultEOPatch{idx}"},
    }
    for idx in range(5)
]

executor = EOExecutor(workflow, execution_args, save_logs=True, logs_folder=OUTPUT_FOLDER)

results = executor.run(workers=3)  # The execution will use at most 3 parallel processes

100%|██████████| 5/5 [00:00<00:00, 510.50it/s]
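
The pattern of running one function over a list of argument dictionaries with a bounded number of workers can be sketched with the standard library. Note the substitutions: this sketch uses threads from `concurrent.futures` instead of the parallel processes mentioned above, and `run_once` is a hypothetical stand-in for a single workflow execution. It is not how EOExecutor is implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def run_once(args):
    """Hypothetical stand-in for one workflow execution."""
    return f"saved {args['output_folder']} with value {args['value']}"


# One argument dictionary per execution, as in the list above.
execution_args = [
    {"value": idx, "output_folder": f"ResultEOPatch{idx}"}
    for idx in range(5)
]

# At most 3 executions run concurrently; map() keeps the input order.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(run_once, execution_args))

print(results[0])  # saved ResultEOPatch0 with value 0
```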

Make the report:

[17]:

executor.make_report()

print(f"Report was saved to location: {executor.get_report_path()}")

Report was saved to location: /home/ubuntu/Sinergise/eo-learn/examples/core/outputs/eoexecution-report-2022_02_09-12_38_30/report.html
