eolearn.ml_tools.train_test_split

Tasks used for train set preparation

class eolearn.ml_tools.train_test_split.TrainTestSplitType(value)[source]

Bases: enum.Enum

An enum defining TrainTestSplitTask’s methods of splitting the data into subsets

PER_PIXEL = 'per_pixel'
PER_CLASS = 'per_class'
PER_VALUE = 'per_value'
class eolearn.ml_tools.train_test_split.TrainTestSplitTask(*args, **kwargs)[source]

Bases: eolearn.core.eotask.EOTask

Randomly assigns each pixel, or each group of pixels, to one of multiple subsets (e.g., test/train/validate).

Input pixels are defined by an input feature (e.g., MASK_TIMELESS with polygon ids, connected component ids, or similar), which groups together pixels with similar properties.

There are three modes of split operation:

  • PER_PIXEL (default), where pixels are assigned to a subset randomly, regardless of their value,

  • PER_CLASS, where pixels of the same value are assigned to the same subset,

  • PER_VALUE, where pixels of the same value are assigned to the same subset consistently across eopatches. In other words, if a group of pixels of the same value lies on multiple eopatches, they are assigned to the same subset in all eopatches. In this case, the seed argument of the execute method is ignored.

Classes are defined by a list of cumulative probabilities, passed as the bins argument, the same way as the bins argument in numpy.digitize. Valid classes are enumerated from 1 onward, and if ignore_values is provided, all pixels with those values are assigned to class 0.

For an 80/20 train/test split, pass bins=[0.8].

For a 60/20/20 train/val/test split, pass bins=[0.6, 0.8].

Splits can also be made into as many subsets as desired, e.g., bins=[0.1, 0.2, 0.3, 0.7, 0.9].
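To make the meaning of cumulative probabilities concrete, the following is a minimal sketch that uses only NumPy and not the task itself. It assumes that a uniform random number in [0, 1) is drawn per pixel and binned with numpy.digitize; the variable names and the use of numpy.random.default_rng are illustrative choices, not the library's implementation.

```python
import numpy as np

# Illustrative sketch: how cumulative probabilities in `bins` map to subsets.
bins = [0.6, 0.8]  # 60/20/20 split

# One uniform draw in [0, 1) per (hypothetical) pixel.
rng = np.random.default_rng(42)
draws = rng.random(100_000)

# numpy.digitize returns 0 for draws below bins[0], 1 for draws in
# [bins[0], bins[1]) and 2 for draws at or above bins[1].
# Adding 1 enumerates the subsets from 1 onward.
subsets = np.digitize(draws, bins) + 1

# Observed fraction of pixels in each subset.
fractions = {k: float(np.mean(subsets == k)) for k in (1, 2, 3)}
print(fractions)
```

The observed fractions approach 0.6, 0.2 and 0.2 as the number of draws grows, which is why each entry in bins is a cumulative boundary and not the size of a subset.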

After the execution of this task, an EOPatch will have a new (FeatureType, new_name) feature in which the value of each pixel identifies the subset (e.g., train, test or validation) the pixel was assigned to.
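The three split modes and the resulting subset mask can be illustrated with plain NumPy arrays standing in for the input feature of two eopatches. This is a sketch under stated assumptions, not the task's actual code: all function names are hypothetical, and seeding a generator with the pixel value is just one possible way to obtain the cross-patch consistency that PER_VALUE describes.

```python
import numpy as np

bins = [0.8]         # 80/20 split, so subsets are 1 and 2
ignore_values = [0]  # assumption: 0 marks pixels outside any polygon

def split_per_pixel(mask, rng):
    """One random draw per pixel, regardless of the pixel value."""
    return np.digitize(rng.random(mask.shape), bins) + 1

def split_per_class(mask, rng):
    """One random draw per unique value within a single patch."""
    out = np.zeros(mask.shape, dtype=np.uint8)
    for value in np.unique(mask):
        out[mask == value] = np.digitize(rng.random(), bins) + 1
    return out

def split_per_value(mask):
    """Draw derived from the value itself, so it repeats in every patch."""
    out = np.zeros(mask.shape, dtype=np.uint8)
    for value in np.unique(mask):
        draw = np.random.default_rng(int(value)).random()
        out[mask == value] = np.digitize(draw, bins) + 1
    return out

def apply_ignore(mask, subsets):
    """Ignored values are not assigned to any subset (class 0)."""
    subsets = subsets.copy()
    subsets[np.isin(mask, ignore_values)] = 0
    return subsets

# Two hypothetical patches that share polygon id 7.
patch_a = np.array([[0, 7, 7], [3, 3, 7]])
patch_b = np.array([[7, 7, 0], [5, 5, 5]])

rng = np.random.default_rng(0)
per_pixel = apply_ignore(patch_a, split_per_pixel(patch_a, rng))
per_class = apply_ignore(patch_a, split_per_class(patch_a, rng))
per_value_a = apply_ignore(patch_a, split_per_value(patch_a))
per_value_b = apply_ignore(patch_b, split_per_value(patch_b))
```

In this sketch, polygon 7 lands in the same subset in per_value_a and per_value_b because its draw depends only on the value 7, whereas split_per_class gives no such guarantee between patches.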

Parameters
  • feature ((FeatureType, feature_name, new_name)) – The input feature out of which to generate the train mask.

  • bins (a float or list of floats) – A list of cumulative probabilities that define the subsets, or a single float giving the fraction of the first subset.

  • split_type (str) – Value split type, either ‘per_pixel’, ‘per_class’ or ‘per_value’.

  • ignore_values (a list of integers) – A list of values to ignore; pixels with these values are not assigned to any subset.

execute(eopatch, *, seed=None)[source]
Parameters
  • eopatch (EOPatch) – input EOPatch

  • seed (numpy.int64) – An argument to be passed to numpy.random.seed function.

Returns

Input EOPatch with the train set mask.

Return type

EOPatch
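The effect of the seed argument can be sketched with NumPy alone. The helper below is hypothetical and not part of eo-learn; it only assumes, as the parameter description states, that the seed is passed to numpy.random.seed before the random draws are made.

```python
import numpy as np

def draw_subsets(shape, bins, seed=None):
    """Hypothetical helper: seed NumPy's global generator, then bin the draws."""
    np.random.seed(seed)  # seed=None reseeds from OS entropy
    return np.digitize(np.random.rand(*shape), bins) + 1

# The same seed reproduces the same assignment of pixels to subsets.
first = draw_subsets((4, 4), [0.6, 0.8], seed=0)
second = draw_subsets((4, 4), [0.6, 0.8], seed=0)
print(np.array_equal(first, second))
```

Fixing the seed makes a split reproducible between runs; as noted for the PER_VALUE mode, the seed is ignored there.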