eolearn.ml_tools.train_test_split

Tasks used for train set preparation

class eolearn.ml_tools.train_test_split.TrainTestSplitType(value)[source]

Bases: enum.Enum

An enum defining TrainTestSplitTask’s methods of splitting the data into subsets

PER_PIXEL = 'per_pixel'
PER_CLASS = 'per_class'
PER_VALUE = 'per_value'
class eolearn.ml_tools.train_test_split.TrainTestSplitTask(input_feature, output_feature, bins, split_type=TrainTestSplitType.PER_PIXEL, ignore_values=None)[source]

Bases: eolearn.core.eotask.EOTask

Randomly assigns each pixel or group of pixels to one of several subsets (e.g., test/train/validate).

When sampling PER_PIXEL the input feature only specifies the shape of the output feature. For PER_CLASS and PER_VALUE the input MASK_TIMELESS feature should group together pixels with similar properties, e.g. polygon ids, connected component ids, etc. The task then ensures that such groups are kept together (so the whole polygon is either in train or test).
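The difference between per-pixel sampling and group-preserving sampling can be illustrated with plain NumPy. This is a minimal sketch, not the task's implementation: the array polygon_ids is a hypothetical stand-in for a MASK_TIMELESS feature, and the 80/20 threshold is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
bins = [0.8]  # 80/20 split, chosen only for illustration

# Hypothetical 3x3 mask of polygon ids (stand-in for a MASK_TIMELESS feature)
polygon_ids = np.array([[1, 1, 2],
                        [3, 3, 2],
                        [3, 4, 4]])

# Per-pixel style: one random draw per pixel; the ids themselves are not used
per_pixel = np.digitize(rng.random(polygon_ids.shape), bins) + 1

# Group-preserving style: one random draw per distinct id, mapped back to pixels
unique_ids, inverse = np.unique(polygon_ids.ravel(), return_inverse=True)
group_subsets = np.digitize(rng.random(unique_ids.size), bins) + 1
per_group = group_subsets[inverse].reshape(polygon_ids.shape)

# Every polygon ends up in exactly one subset
for polygon_id in unique_ids:
    assert np.unique(per_group[polygon_ids == polygon_id]).size == 1
```

In the per-pixel variant a single polygon may be spread over several subsets, whereas the group-preserving variant keeps each polygon whole.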

There are three modes of split operation:

  • PER_PIXEL (default), where pixels are assigned to a subset randomly, regardless of their value,

  • PER_CLASS, where pixels of the same value are assigned to the same subset,

  • PER_VALUE, where pixels of the same value are assigned to the same subset consistently across EOPatches. In other words, if a group of pixels of the same value lies on multiple EOPatches, they are assigned to the same subset in all EOPatches. In this case, the seed argument of the execute method is ignored.
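One way to obtain an assignment that is consistent across EOPatches is to derive the random draw from the pixel value itself rather than from a shared random state. The sketch below assumes this approach for illustration only and does not claim to reproduce the task's internals; subset_for_value, patch_a and patch_b are hypothetical names.

```python
import numpy as np

def subset_for_value(value, bins):
    """Hypothetical helper: map a group id to a subset deterministically."""
    rng = np.random.default_rng(int(value))  # the id itself seeds the generator
    return int(np.digitize(rng.random(), bins)) + 1

bins = [0.6, 0.8]

# Two hypothetical patches that share polygon ids 7 and 9
patch_a = np.array([[7, 7], [9, 5]])
patch_b = np.array([[9, 9], [7, 2]])

assign = np.vectorize(lambda value: subset_for_value(value, bins))
split_a = assign(patch_a)
split_b = assign(patch_b)

# Pixels with id 7 (and with id 9) receive the same subset in both patches
assert split_a[0, 0] == split_b[1, 0]
assert split_a[1, 0] == split_b[0, 0]
```

Because the result in this sketch depends only on the value and on bins, an externally supplied seed would have no effect, which is consistent with the note that seed is ignored in this mode.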

Classes are defined by a list of cumulative probabilities, passed as the bins argument, in the same way as the bins argument of numpy.digitize. Valid classes are enumerated from 1 onward. If ignore_values is provided, all pixels whose value is listed in it are assigned to class 0.

To get an 80/20 train/test split, the bins argument should be provided as bins=[0.8].

To get a 60/20/20 train/val/test split, the bins argument should be provided as bins=[0.6, 0.8].

Splits can also be made into as many subsets as desired, e.g., bins=[0.1, 0.2, 0.3, 0.7, 0.9].
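How cumulative probabilities translate into subset numbers can be checked directly with numpy.digitize. In this sketch the array draws holds hand-picked numbers standing in for uniform random draws, and adding 1 is an assumption made here to match the statement that valid classes start at 1.

```python
import numpy as np

bins = [0.6, 0.8]  # cumulative probabilities for a 60/20/20 split

# Hand-picked stand-ins for uniform random draws in [0, 1)
draws = np.array([0.05, 0.59, 0.6, 0.79, 0.8, 0.99])

# digitize returns 0 below the first edge, 1 between the edges, 2 above the last edge
subsets = np.digitize(draws, bins) + 1
print(subsets)  # [1 1 2 2 3 3]

# Expected share of each subset: the gaps between consecutive cumulative edges
fractions = np.diff([0.0, *bins, 1.0])  # approximately 0.6, 0.2, 0.2
```

A draw that falls exactly on an edge, such as 0.6 or 0.8 above, belongs to the higher-numbered subset.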

After execution each pixel will have a value representing the train, test and/or validation set stored in the output feature.

Parameters
  • input_feature ((FeatureType, feature_name)) – The input feature to guide the split.

  • output_feature – The output feature where to save the mask.

  • bins (a float or list of floats) – Cumulative probabilities of all value classes or a single float, representing a fraction.

  • split_type (TrainTestSplitType) – Value split type, either ‘PER_PIXEL’, ‘PER_CLASS’ or ‘PER_VALUE’.

  • ignore_values (a list of integers) – A list of values in input_feature to ignore, i.e. pixels with these values are not assigned to any subset.
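The effect of ignore_values on the output can be pictured as follows. This is a sketch under assumptions: the input array, the choice of 0 as a no-data value and the way the subsets are drawn are all illustrative and not taken from the task's source.

```python
import numpy as np

rng = np.random.default_rng(1)
bins = [0.8]
ignore_values = [0]  # hypothetical no-data id

# Hypothetical input mask; 0 marks pixels that should stay out of every subset
data = np.array([[0, 0, 5],
                 [6, 6, 5],
                 [0, 7, 7]])

# Start from class 0 everywhere; only non-ignored pixels receive a subset >= 1
output = np.zeros(data.shape, dtype=np.uint8)
valid = ~np.isin(data, ignore_values)
output[valid] = np.digitize(rng.random(valid.sum()), bins) + 1
```

Downstream code can then select training pixels with a comparison such as output == 1 and be sure that ignored pixels are never included.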

execute(eopatch, *, seed=None)[source]
Parameters
  • eopatch (EOPatch) – input EOPatch

  • seed (int or None) – An argument to be passed to the numpy.random.seed function.

Returns

Input EOPatch with the train set mask.

Return type

EOPatch
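Since seed is described as an argument for numpy.random.seed, its role in reproducibility can be illustrated with NumPy alone. The sketch only demonstrates the seeding behaviour of NumPy's global random state; draw_split is a hypothetical helper, and how the task itself consumes that state is not shown here.

```python
import numpy as np

bins = [0.8]

def draw_split(seed, shape=(2, 3)):
    """Hypothetical helper: seed the global state, then draw a per-pixel split."""
    np.random.seed(seed)
    return np.digitize(np.random.rand(*shape), bins) + 1

first = draw_split(seed=42)
second = draw_split(seed=42)

# The same seed reproduces the same assignment
assert np.array_equal(first, second)
```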