eolearn.features.extra.clustering

Module for computing clusters in EOPatch

class eolearn.features.extra.clustering.ClusteringTask(features, new_feature_name, distance_threshold=None, n_clusters=None, affinity='cosine', linkage='single', remove_small=0, connectivity=None, mask_name=None)[source]

Bases: EOTask

Task that computes clusters on selected features using sklearn.cluster.AgglomerativeClustering.

The algorithm produces a timeless data feature in which each cell holds a natural number identifying the cluster it belongs to. Cells marked with -1 do not belong to any cluster: they were either excluded by a mask or removed later because their cluster fell below the ‘remove_small’ threshold.
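The -1 convention described above can be illustrated with a minimal, pure-Python sketch. The helper name `remove_small_clusters` and the nested-list label grid are assumptions made for illustration only; this is not the task's actual implementation, which operates on arrays inside an EOPatch.

```python
from collections import Counter


def remove_small_clusters(labels, remove_small):
    """Hypothetical helper: mark clusters with fewer than `remove_small` cells as -1.

    `labels` is a 2D list of integer cluster ids. Cells that are already -1
    (for example, excluded by a mask) stay -1.
    """
    # Count how many cells each cluster occupies, ignoring unlabelled cells.
    counts = Counter(label for row in labels for label in row if label != -1)
    return [
        [label if label != -1 and counts[label] >= remove_small else -1 for label in row]
        for row in labels
    ]


labels = [
    [0, 0, 1],
    [0, 2, 2],
    [-1, 2, 2],  # the -1 cell stands for a masked-out pixel
]

# Cluster 1 has a single cell, so it is removed with a threshold of 3.
cleaned = remove_small_clusters(labels, remove_small=3)
```

With a threshold of 0 the sketch leaves every cluster untouched, which matches the documented default of `remove_small=0`.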

Class constructor

Parameters:
  • features (Feature) – A collection of features used for clustering. The features need to be of type DATA_TIMELESS

  • new_feature_name (str) – Name of feature that is the result of clustering

  • distance_threshold (float | None) – The linkage distance threshold above which clusters will not be merged. If not None, n_clusters must be None and compute_full_tree must be True

  • n_clusters (int | None) – The number of clusters found by the algorithm. If distance_threshold=None, it will be equal to the given n_clusters

  • affinity (Literal['euclidean', 'l1', 'l2', 'manhattan', 'cosine']) – Metric used to compute the linkage. Can be “euclidean”, “l1”, “l2”, “manhattan”, “cosine”.

  • linkage (Literal['ward', 'complete', 'average', 'single']) – Which linkage criterion to use. The linkage criterion determines which distance to use between sets of observations. The algorithm will merge the pair of clusters that minimizes this criterion.
      - ward minimizes the variance of the clusters being merged.
      - average uses the average of the distances of each observation of the two sets.
      - complete or maximum linkage uses the maximum distance between all observations of the two sets.
      - single uses the minimum of the distances between all observations of the two sets.

  • remove_small (int) – If greater than 0, removes all clusters that have fewer points than “remove_small”

  • connectivity (None | np.ndarray | Callable) – Connectivity matrix. Defines for each sample the neighboring samples following a given structure of the data. This can be a connectivity matrix itself or a callable that transforms the data into a connectivity matrix, such as one derived from kneighbors_graph. If set to None, it uses the graph that has adjacent pixels connected.

  • mask_name (str | None) – An optional mask feature used for exclusion of the area from clustering
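The single, complete, and average linkage criteria listed above differ only in how they reduce the pairwise distances between two sets of observations. The sketch below shows this with plain Python and Euclidean distance; the function names are hypothetical, ward linkage is omitted because it is variance-based rather than a reduction over pairwise distances, and the task's default affinity is 'cosine' rather than Euclidean.

```python
import math
from itertools import product


def pairwise_distances(set_a, set_b):
    """Euclidean distance between every observation in set_a and every one in set_b."""
    return [math.dist(p, q) for p, q in product(set_a, set_b)]


def linkage_distance(set_a, set_b, linkage):
    """Hypothetical helper: distance between two sets under a given linkage criterion."""
    distances = pairwise_distances(set_a, set_b)
    if linkage == "single":
        return min(distances)  # closest pair of observations
    if linkage == "complete":
        return max(distances)  # farthest pair of observations
    if linkage == "average":
        return sum(distances) / len(distances)  # mean over all pairs
    raise ValueError(f"Unsupported linkage in this sketch: {linkage!r}")


set_a = [(0.0, 0.0), (1.0, 0.0)]
set_b = [(3.0, 0.0), (5.0, 0.0)]

single = linkage_distance(set_a, set_b, "single")      # 2.0
complete = linkage_distance(set_a, set_b, "complete")  # 5.0
average = linkage_distance(set_a, set_b, "average")    # 3.5
```

At each step the agglomerative algorithm merges the pair of clusters for which the chosen criterion is smallest, so single linkage tends to chain nearby clusters together while complete linkage favors compact ones.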

execute(eopatch)[source]
Parameters:

eopatch (EOPatch) – Input EOPatch

Returns:

Transformed EOPatch

Return type:

EOPatch