Overview of eolearn.core

eolearn.core is the main subpackage which implements the basic building blocks:

  • EOPatch,

  • EOTask,

  • EONode,

  • EOWorkflow,

  • EOExecutor,

and commonly used functionalities.

EOPatch

The first basic object in the package is a data container, called EOPatch.

eopatch

  • It is designed to store all types of EO data for a single geographical location.

  • The EOPatch can contain data (of the same location) for multiple times. If the EOPatch contains multiple collections of temporal data, they must have the same temporal axis (the images must correspond to the same time-points).

  • There is no limit to how much data a single EOPatch can store, but typically it shouldn’t be more than the size of your RAM.

Each EOPatch has an attribute bbox of type sentinelhub.BBox to define its area. The attribute timestamps defines the temporal component of an EOPatch, which is either None (for patches without a temporal dimension) or a list of datetime.datetime objects.

EO data can be divided into categories, called “feature types” according to the following properties:

FeatureType

Type of data

Time component

Spatial component

Type of values

Python object

Shape

DATA

raster

yes

yes

float

numpy.ndarray

t x n x m x d

MASK

raster

yes

yes

integer

numpy.ndarray

t x n x m x d

SCALAR

raster

yes

no

float

numpy.ndarray

t x d

LABEL

raster

yes

no

integer

numpy.ndarray

t x d

DATA_TIMELESS

raster

no

yes

float

numpy.ndarray

n x m x d

MASK_TIMELESS

raster

no

yes

integer

numpy.ndarray

n x m x d

SCALAR_TIMELESS

raster

no

no

float

numpy.ndarray

d

LABEL_TIMELESS

raster

no

no

integer

numpy.ndarray

d

VECTOR

vector

yes

yes

/

geopandas.GeoDataFrame

Required columns geometry and TIMESTAMP

VECTOR_TIMELESS

vector

no

yes

/

geopandas.GeoDataFrame

Required column geometry

META_INFO

anything

no

no

anything

anything

anything

Note: t specifies time component, n and m are spatial components (height and width), and d is an additional component for data with multiple channels.

Let’s start by loading an existing EOPatch and displaying it’s content (i.e. features):

[1]:
import os

from eolearn.core import EOPatch

INPUT_FOLDER = os.path.join("..", "..", "example_data")
INPUT_EOPATCH = os.path.join(INPUT_FOLDER, "TestEOPatch")

eopatch = EOPatch.load(
    INPUT_EOPATCH, lazy_loading=False  # Set this parameter to True to load data in memory only when first needed
)

eopatch
[1]:
EOPatch(
  bbox=BBox(((465181.0522318204, 5079244.8912012065), (466180.53145382757, 5080254.63349641)), crs=CRS('32633'))
  timestamps=[datetime.datetime(2015, 7, 11, 10, 0, 8), ..., datetime.datetime(2017, 12, 22, 10, 4, 15)], length=68
  mask_timeless={
    LULC: numpy.ndarray(shape=(101, 100, 1), dtype=uint16)
    RANDOM_UINT8: numpy.ndarray(shape=(101, 100, 13), dtype=uint8)
    VALID_COUNT: numpy.ndarray(shape=(101, 100, 1), dtype=int64)
  }
  vector={
    CLM_VECTOR: geopandas.GeoDataFrame(columns=['TIMESTAMP', 'VALUE', 'geometry'], length=55, crs=EPSG:32633)
  }
  label={
    IS_CLOUDLESS: numpy.ndarray(shape=(68, 1), dtype=bool)
    RANDOM_DIGIT: numpy.ndarray(shape=(68, 2), dtype=int8)
  }
  meta_info={
    maxcc: 0.8
    service_type: 'wcs'
    size_x: '10m'
    size_y: '10m'
  }
  scalar_timeless={
    LULC_PERCENTAGE: numpy.ndarray(shape=(6,), dtype=float64)
  }
  scalar={
    CLOUD_COVERAGE: numpy.ndarray(shape=(68, 1), dtype=float16)
  }
  vector_timeless={
    LULC: geopandas.GeoDataFrame(columns=['index', 'RABA_ID', 'AREA', 'DATE', 'LULC_ID', 'LULC_NAME', 'geometry'], length=88, crs=EPSG:32633)
  }
  mask={
    CLM: numpy.ndarray(shape=(68, 101, 100, 1), dtype=uint8)
    CLM_INTERSSIM: numpy.ndarray(shape=(68, 101, 100, 1), dtype=bool)
    CLM_MULTI: numpy.ndarray(shape=(68, 101, 100, 1), dtype=bool)
    CLM_S2C: numpy.ndarray(shape=(68, 101, 100, 1), dtype=bool)
    IS_DATA: numpy.ndarray(shape=(68, 101, 100, 1), dtype=uint8)
    IS_VALID: numpy.ndarray(shape=(68, 101, 100, 1), dtype=bool)
  }
  label_timeless={
    LULC_COUNTS: numpy.ndarray(shape=(6,), dtype=int32)
  }
  data_timeless={
    DEM: numpy.ndarray(shape=(101, 100, 1), dtype=float32)
    MAX_NDVI: numpy.ndarray(shape=(101, 100, 1), dtype=float64)
  }
  data={
    BANDS-S2-L1C: numpy.ndarray(shape=(68, 101, 100, 13), dtype=float32)
    CLP: numpy.ndarray(shape=(68, 101, 100, 1), dtype=float32)
    CLP_MULTI: numpy.ndarray(shape=(68, 101, 100, 1), dtype=float32)
    CLP_S2C: numpy.ndarray(shape=(68, 101, 100, 1), dtype=float32)
    NDVI: numpy.ndarray(shape=(68, 101, 100, 1), dtype=float32)
  }
)

There are multiple ways how to access a feature in the EOPatch.

[2]:
from eolearn.core import FeatureType

# All of these access the same feature:
bands = eopatch.data["BANDS-S2-L1C"]
# or
bands = eopatch[FeatureType.DATA]["BANDS-S2-L1C"]
# or
bands = eopatch[(FeatureType.DATA, "BANDS-S2-L1C")]
# or
bands = eopatch[FeatureType.DATA, "BANDS-S2-L1C"]

type(bands), bands.shape
[2]:
(numpy.ndarray, (68, 101, 100, 13))

Vector features are handled by geopandas:

[3]:
eopatch[FeatureType.VECTOR, "CLM_VECTOR"].head()
[3]:
TIMESTAMP VALUE geometry
0 2015-07-31 10:00:09 1.0 POLYGON ((465181.052 5080254.633, 465181.052 5...
1 2015-08-20 10:07:28 1.0 POLYGON ((465181.052 5080254.633, 465181.052 5...
2 2015-09-19 10:05:43 1.0 POLYGON ((465181.052 5080254.633, 465181.052 5...
3 2015-09-29 10:06:33 1.0 POLYGON ((465181.052 5080254.633, 465181.052 5...
4 2015-12-08 10:04:09 1.0 POLYGON ((465181.052 5080254.633, 465181.052 5...

Special features are bounding box and timestamps:

[4]:
print(eopatch.timestamps[:5])
print(repr(eopatch.bbox))

eopatch.bbox.geometry  # draws the shape of BBox
[datetime.datetime(2015, 7, 11, 10, 0, 8), datetime.datetime(2015, 7, 31, 10, 0, 9), datetime.datetime(2015, 8, 20, 10, 7, 28), datetime.datetime(2015, 8, 30, 10, 5, 47), datetime.datetime(2015, 9, 9, 10, 0, 17)]
BBox(((465181.0522318204, 5079244.8912012065), (466180.53145382757, 5080254.63349641)), crs=CRS('32633'))
[4]:
../../_images/examples_core_CoreOverview_8_1.svg

A list of all features in an EOPatch can be obtained with:

[5]:
eopatch.get_features()
[5]:
[(<FeatureType.DATA: 'data'>, 'CLP_S2C'),
 (<FeatureType.DATA: 'data'>, 'CLP'),
 (<FeatureType.DATA: 'data'>, 'NDVI'),
 (<FeatureType.DATA: 'data'>, 'BANDS-S2-L1C'),
 (<FeatureType.DATA: 'data'>, 'CLP_MULTI'),
 (<FeatureType.MASK: 'mask'>, 'CLM'),
 (<FeatureType.MASK: 'mask'>, 'IS_DATA'),
 (<FeatureType.MASK: 'mask'>, 'CLM_MULTI'),
 (<FeatureType.MASK: 'mask'>, 'CLM_INTERSSIM'),
 (<FeatureType.MASK: 'mask'>, 'IS_VALID'),
 (<FeatureType.MASK: 'mask'>, 'CLM_S2C'),
 (<FeatureType.SCALAR: 'scalar'>, 'CLOUD_COVERAGE'),
 (<FeatureType.LABEL: 'label'>, 'IS_CLOUDLESS'),
 (<FeatureType.LABEL: 'label'>, 'RANDOM_DIGIT'),
 (<FeatureType.VECTOR: 'vector'>, 'CLM_VECTOR'),
 (<FeatureType.DATA_TIMELESS: 'data_timeless'>, 'DEM'),
 (<FeatureType.DATA_TIMELESS: 'data_timeless'>, 'MAX_NDVI'),
 (<FeatureType.MASK_TIMELESS: 'mask_timeless'>, 'RANDOM_UINT8'),
 (<FeatureType.MASK_TIMELESS: 'mask_timeless'>, 'LULC'),
 (<FeatureType.MASK_TIMELESS: 'mask_timeless'>, 'VALID_COUNT'),
 (<FeatureType.SCALAR_TIMELESS: 'scalar_timeless'>, 'LULC_PERCENTAGE'),
 (<FeatureType.LABEL_TIMELESS: 'label_timeless'>, 'LULC_COUNTS'),
 (<FeatureType.VECTOR_TIMELESS: 'vector_timeless'>, 'LULC'),
 (<FeatureType.META_INFO: 'meta_info'>, 'maxcc'),
 (<FeatureType.META_INFO: 'meta_info'>, 'size_x'),
 (<FeatureType.META_INFO: 'meta_info'>, 'size_y'),
 (<FeatureType.META_INFO: 'meta_info'>, 'service_type')]

Let’s create a new EOPatch and store some features inside.

[6]:
import numpy as np

from sentinelhub import CRS, BBox

# Since EOPatch represents geolocated data, it should always have a bounding box
new_eopatch = EOPatch(bbox=BBox((0, 0, 1, 1), CRS.WGS84))

new_eopatch[FeatureType.MASK_TIMELESS, "NEW_MASK"] = np.zeros((68, 10, 13), dtype=np.uint8)

# If temporal features are added to an EOPatch that does not have timestamps (or if the dimensions do not match),
# the user is warned that the EOPatch is temporall ill-defined

new_eopatch.timestamps = eopatch.timestamps
new_eopatch[FeatureType.DATA, "BANDS"] = eopatch[FeatureType.DATA, "BANDS-S2-L1C"]

# The following wouldn't work as there are restrictions to what kind of data can be stored in each feature type
# new_eopatch[FeatureType.MASK, 'NEW_MASK'] = np.zeros((10, 10, 13), dtype=np.uint8)
# new_eopatch[FeatureType.VECTOR, 'NEW_MASK'] = np.zeros((10, 10, 13), dtype=np.uint8)

new_eopatch
[6]:
EOPatch(
  bbox=BBox(((0.0, 0.0), (1.0, 1.0)), crs=CRS('4326'))
  timestamps=[datetime.datetime(2015, 7, 11, 10, 0, 8), ..., datetime.datetime(2017, 12, 22, 10, 4, 15)], length=68
  mask_timeless={
    NEW_MASK: numpy.ndarray(shape=(68, 10, 13), dtype=uint8)
  }
  data={
    BANDS: numpy.ndarray(shape=(68, 101, 100, 13), dtype=float32)
  }
)

It is also possible to delete a feature:

[7]:
del new_eopatch[FeatureType.MASK_TIMELESS, "NEW_MASK"]

new_eopatch
[7]:
EOPatch(
  bbox=BBox(((0.0, 0.0), (1.0, 1.0)), crs=CRS('4326'))
  timestamps=[datetime.datetime(2015, 7, 11, 10, 0, 8), ..., datetime.datetime(2017, 12, 22, 10, 4, 15)], length=68
  data={
    BANDS: numpy.ndarray(shape=(68, 101, 100, 13), dtype=float32)
  }
)

We can save EOPatch into a local folder. In case an EOPatch already exists in the specified location, we have to allow to overwrite its features.

[8]:
from eolearn.core import OverwritePermission

OUTPUT_FOLDER = os.path.join(".", "outputs")
os.makedirs(OUTPUT_FOLDER, exist_ok=True)

NEW_EOPATCH_PATH = os.path.join(OUTPUT_FOLDER, "NewEOPatch")

new_eopatch.save(NEW_EOPATCH_PATH, overwrite_permission=OverwritePermission.OVERWRITE_FEATURES)

Let’s load the saved version and compare it with original:

[9]:
loaded_eopatch = EOPatch.load(NEW_EOPATCH_PATH)

new_eopatch == loaded_eopatch
[9]:
True

Each EOPatch can be shallow or deep copied:

[10]:
new_eopatch.copy()
new_eopatch.copy(deep=True)
[10]:
EOPatch(
  bbox=BBox(((0.0, 0.0), (1.0, 1.0)), crs=CRS('4326'))
  timestamps=[datetime.datetime(2015, 7, 11, 10, 0, 8), ..., datetime.datetime(2017, 12, 22, 10, 4, 15)], length=68
  data={
    BANDS: numpy.ndarray(shape=(68, 101, 100, 13), dtype=float32)
  }
)

EOTask

The next core object is EOTask, which is a single well-defined operation on one or more EOPatch objects.

We can create a new EOTask by creating a class that inherits from the abstract EOTask class:

class FooTask(EOTask):

    def __init__(self, foo_param):
        """ Task-specific parameters
        """
        self.foo_param = foo_param

    def execute(self, eopatch, *, patch_specific_param):

        # Do what foo does on EOPatch and return it

        return eopatch
  • In the initialization method we define task-specific parameters.

  • Each task has to implement the execute method.

  • execute method has to be defined in a way that:

    • positional arguments have to be instances of EOPatch,

    • other types of arguments should be keyword arguments.

  • Otherwise the task itself can do anything.

Example of a task that adds a new feature to existing EOPatch:

[11]:
from typing import Any, Tuple

from eolearn.core import EOTask


class AddFeatureTask(EOTask):
    """Adds a feature to the given EOPatch.

    :param feature: Feature to be added
    :type feature: (FeatureType, feature_name) or FeatureType
    """

    def __init__(self, feature: Tuple[FeatureType, str]):
        self.feature = feature

    def execute(self, eopatch: EOPatch, *, data: Any) -> EOPatch:
        """Returns the EOPatch with added features.

        :param eopatch: input EOPatch
        :param data: data to be added to the feature
        :return: input EOPatch with the specified feature
        """
        eopatch[self.feature] = data

        return eopatch

Let’s see how such a task could be used.

[12]:
eopatch = EOPatch(bbox=BBox((0, 0, 1, 1), CRS.WGS84), timestamps=[f"2017-0{i}-01" for i in range(1, 6)])

add_feature_task = AddFeatureTask((FeatureType.DATA, "NEW_BANDS"))

data = np.zeros((5, 100, 100, 13))

eopatch = add_feature_task.execute(eopatch, data=data)

eopatch
[12]:
EOPatch(
  bbox=BBox(((0.0, 0.0), (1.0, 1.0)), crs=CRS('4326'))
  timestamps=[datetime.datetime(2017, 1, 1, 0, 0), ..., datetime.datetime(2017, 5, 1, 0, 0)], length=5
  data={
    NEW_BANDS: numpy.ndarray(shape=(5, 100, 100, 13), dtype=float64)
  }
)

The majority of eo-learn consists of different EOTasks implementing different operations on EO data.

The list of all EOTasks is available in the documentation.

EONode and EOWorkflow

EOTasks can be joined together into an acyclic processing graph called EOWorkflow. Since eo-learn 1.0 these tasks first have to be wrapped into instances of EONode class.

Here is a simple example of how an EOWorkflow can be created:

[13]:
from eolearn.core import EONode, EOWorkflow, LoadTask, SaveTask

new_feature = FeatureType.LABEL, "NEW_LABEL"

load_task = LoadTask(path=INPUT_FOLDER)
add_feature_task = AddFeatureTask(new_feature)
save_task = SaveTask(path=OUTPUT_FOLDER, overwrite_permission=OverwritePermission.OVERWRITE_FEATURES)

# Each EONode object defines dependecies to other EONode objects:
load_node = EONode(load_task, inputs=[], name="Load EOPatch")
add_feature_node = EONode(add_feature_task, inputs=[load_node], name="Add a new feature")
save_node = EONode(save_task, inputs=[add_feature_node], name="Save EOPatch")

workflow = EOWorkflow([load_node, add_feature_node, save_node])
# or
workflow = EOWorkflow.from_endnodes(save_node)

# Alternatively, a linear workflow could also be built with a helper function:
# from eolearn.core import linearly_connect_tasks
# nodes = linearly_connect_tasks(load_task, add_feature_task, save_task)
# workflow = EOWorkflow(nodes)

Let’s display the dependency graph:

[14]:
%matplotlib inline

workflow.dependency_graph()
[14]:
../../_images/examples_core_CoreOverview_29_0.svg

EOWorkflow is executed by specifying EOPatch-related parameters:

[15]:
results = workflow.execute({
    load_node: {"eopatch_folder": "TestEOPatch"},
    add_feature_node: {"data": np.zeros((68, 3), dtype=np.uint8)},
    save_node: {"eopatch_folder": "WorkflowEOPatch"},
})

results
[15]:
WorkflowResults(outputs={}, start_time=datetime.datetime(2023, 8, 28, 15, 19, 54, 733751), end_time=datetime.datetime(2023, 8, 28, 15, 19, 54, 961589), stats={'LoadTask-939b27aa45a511eeb8db-91a8de8b81da': NodeStats(node_uid='LoadTask-939b27aa45a511eeb8db-91a8de8b81da', node_name='Load EOPatch', start_time=datetime.datetime(2023, 8, 28, 15, 19, 54, 733806), end_time=datetime.datetime(2023, 8, 28, 15, 19, 54, 822464), exception_info=None), 'AddFeatureTask-939b2a9b45a511eea69d-e2612971e907': NodeStats(node_uid='AddFeatureTask-939b2a9b45a511eea69d-e2612971e907', node_name='Add a new feature', start_time=datetime.datetime(2023, 8, 28, 15, 19, 54, 825206), end_time=datetime.datetime(2023, 8, 28, 15, 19, 54, 825267), exception_info=None), 'SaveTask-939b2cb545a511eea722-ed1665ca815d': NodeStats(node_uid='SaveTask-939b2cb545a511eea722-ed1665ca815d', node_name='Save EOPatch', start_time=datetime.datetime(2023, 8, 28, 15, 19, 54, 827230), end_time=datetime.datetime(2023, 8, 28, 15, 19, 54, 960678), exception_info=None)}, error_node_uid=None)

A result of a workflow execution is a WorkflowResults object. It contains information about times of each node execution and information about potential errors.

Note:

A difference between executing tasks directly and executing tasks in a workflow is that in a workflow each EOPatch input object will be first shallow-copied before being passed to any task.

EOExecutor

EOExecutor handles the execution and monitoring of EOWorkflows. It enables executing a workflow multiple times and in parallel. At the end, it generates a report containing the summary of the workflow’s execution process.

Execute previously defined workflow with different arguments.

[16]:
from eolearn.core import EOExecutor

execution_args = [  # EOWorkflow will be executed for each of these 5 dictionaries:
    {
        load_node: {"eopatch_folder": "TutorialEOPatch"},
        add_feature_node: {"data": idx * np.ones((10, 3), dtype=np.uint8)},
        save_node: {"eopatch_folder": f"ResultEOPatch{idx}"},
    }
    for idx in range(5)
]

executor = EOExecutor(workflow, execution_args, save_logs=True, logs_folder=OUTPUT_FOLDER)

results = executor.run(workers=3)  # The execution will use at most 3 parallel processes
100%|██████████| 5/5 [00:00<00:00, 510.50it/s]

Make the report:

[17]:
executor.make_report()

print(f"Report was saved to location: {executor.get_report_path()}")
Report was saved to location: /home/ubuntu/Sinergise/eo-learn/examples/core/outputs/eoexecution-report-2022_02_09-12_38_30/report.html