pymatgen.analysis.diffusion.aimd.clustering module

This module implements clustering algorithms to determine centroids, with adaptations for periodic boundary conditions. It can be used, for example, to determine likely atomic positions from MD trajectories.

class Kmeans(max_iterations: int = 1000)

Bases: object

Simple kmeans clustering.

Parameters:

max_iterations (int) – Maximum number of iterations of the k-means algorithm to run.

cluster(points, k, initial_centroids=None)
Parameters:
  • points (ndarray) – Data points as an m×n ndarray, where m is the number of data points and n is the number of features.

  • k (int) – Number of means.

  • initial_centroids (np.array) – Initial guess for the centroids. If None, a randomized array of points is used.

Returns:

centroids are the final centroids, labels gives the index of the assigned centroid for each point, and ss is the final sum of squared distances.

Return type:

centroids, labels, ss
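The loop behind cluster can be sketched in plain Python as follows. This is a minimal illustration of standard k-means (alternating assignment and update steps), not the library's implementation: the names assign and kmeans_sketch, the seed argument, and the use of lists of tuples instead of ndarrays are all assumptions made for this example.

```python
import random


def assign(points, centroids):
    """Label each point with its nearest centroid; also return the sum of squared distances."""
    labels, ss = [], 0.0
    for p in points:
        d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        best = d2.index(min(d2))
        labels.append(best)
        ss += d2[best]
    return labels, ss


def kmeans_sketch(points, k, initial_centroids=None, max_iterations=1000, seed=0):
    """Hypothetical helper: points is a list of coordinate tuples, one per data point."""
    rng = random.Random(seed)
    centroids = list(initial_centroids) if initial_centroids is not None else rng.sample(points, k)
    for _ in range(max_iterations):
        labels, _ = assign(points, centroids)
        new_centroids = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                # Mean of the member points, coordinate by coordinate.
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                # Empty cluster: re-seed it from a randomly chosen data point.
                new_centroids.append(rng.choice(points))
        converged = new_centroids == centroids
        centroids = new_centroids
        if converged:
            break
    # Final assignment so that labels and ss match the returned centroids.
    labels, ss = assign(points, centroids)
    return centroids, labels, ss


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids, labels, ss = kmeans_sketch(points, k=2, initial_centroids=[(0.0, 0.0), (10.0, 10.0)])
```

With the two well-separated pairs above, the first two points share one label and the last two share the other, and each centroid settles at the midpoint of its pair.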

static get_centroids(points, labels, k, centroids)

Each centroid is the mean of the points that carry that centroid’s label. If a centroid is empty (no points carry its label), it is randomly re-initialized.

Parameters:
  • points – List of points

  • labels – List of labels

  • k – Number of means

  • centroids – List of centroids

static get_labels(points, centroids)

For each element in the dataset, choose the closest centroid and make that centroid the element’s label.

Parameters:
  • points – List of points

  • centroids – List of centroids

should_stop(old_centroids, centroids, iterations)

Check for stopping conditions.

Parameters:
  • old_centroids – List of old centroids

  • centroids – List of centroids

  • iterations – Number of iterations thus far.
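A stopping check of this kind typically combines an iteration cap with a test for whether the centroids have stopped moving. The sketch below illustrates that idea under stated assumptions: the function name should_stop_sketch, the tol parameter, and the exact comparison against max_iterations are hypothetical and may differ from what the library does.

```python
import math


def should_stop_sketch(old_centroids, centroids, iterations, max_iterations=1000, tol=1e-8):
    """Return True once the iteration cap is exceeded or the centroids stop moving."""
    if iterations > max_iterations:
        return True
    if old_centroids is None:
        # First pass: there is nothing to compare against yet.
        return False
    # Converged when every coordinate of every centroid is unchanged within tol.
    return all(
        math.isclose(a, b, abs_tol=tol)
        for old, new in zip(old_centroids, centroids)
        for a, b in zip(old, new)
    )
```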

class KmeansPBC(lattice, max_iterations=1000)

Bases: Kmeans

A version of Kmeans that works with periodic boundary conditions (PBC). Both the distance metric and the determination of new centroids have to change. The points supplied should be fractional coordinates.

Parameters:
  • lattice – Lattice

  • max_iterations – Maximum number of iterations to run KMeans.
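To see why the distance metric has to change, consider two points on opposite sides of a cell boundary: their plain Euclidean separation is large, but the nearest periodic image is close. The following is a minimal minimum-image sketch, not the library's method. The function pbc_distance and the lattice_matrix argument (lattice vectors as rows) are assumptions for illustration, and the per-component rounding is only reliable for orthogonal or mildly skewed cells; a general implementation would search neighbouring images.

```python
import math


def pbc_distance(frac_a, frac_b, lattice_matrix):
    """Minimum-image distance between two fractional coordinates.

    lattice_matrix holds the lattice vectors as rows, so that
    Cartesian = fractional . lattice_matrix.
    """
    # Wrap each fractional difference into [-0.5, 0.5].
    delta = [(b - a) - round(b - a) for a, b in zip(frac_a, frac_b)]
    # Convert the wrapped difference to Cartesian coordinates.
    cart = [sum(delta[i] * lattice_matrix[i][j] for i in range(3)) for j in range(3)]
    return math.sqrt(sum(c * c for c in cart))


# Hypothetical 10 Å cubic cell.
cubic = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
d = pbc_distance((0.05, 0.5, 0.5), (0.95, 0.5, 0.5), cubic)
```

Here the two points differ by 0.9 in fractional x, yet the distance through the boundary is about 1 Å rather than 9 Å.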

get_centroids(points, labels, k, centroids)

Each centroid is the mean of the points that carry that centroid’s label. If a centroid is empty (no points carry its label), it is randomly re-initialized.

Parameters:
  • points – List of points

  • labels – List of labels

  • k – Number of means

  • centroids – List of centroids
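Under PBC, a naive average of fractional coordinates fails for clusters that straddle a cell boundary. One common remedy, sketched here as an assumption rather than as the library's exact procedure, is to shift each member to its periodic image nearest a reference point (for example, the cluster's previous centroid), average, and wrap the result back into the unit cell. The name pbc_centroid and the reference argument are hypothetical, and the per-component rounding assumes an orthogonal or mildly skewed cell.

```python
def pbc_centroid(members, reference):
    """Mean of fractional coordinates, unwrapped around a reference point."""
    total = [0.0] * len(reference)
    for p in members:
        for i, (x, r) in enumerate(zip(p, reference)):
            # Shift x by whole lattice translations to the image nearest the reference.
            total[i] += x - round(x - r)
    # Average, then wrap back into [0, 1).
    return [(t / len(members)) % 1.0 for t in total]


members = [(0.9, 0.5, 0.5), (0.2, 0.5, 0.5)]
naive_x = (0.9 + 0.2) / 2  # lands mid-cell, far from both points
periodic = pbc_centroid(members, reference=(0.0, 0.5, 0.5))
```

The naive mean places the x coordinate near 0.55, whereas the periodic mean places it near 0.05, between the two points across the boundary.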

get_labels(points, centroids)

For each element in the dataset, choose the closest centroid and make that centroid the element’s label.

Parameters:
  • points – List of points

  • centroids – List of centroids

should_stop(old_centroids, centroids, iterations)

Check for stopping conditions.

Parameters:
  • old_centroids – List of old centroids

  • centroids – List of centroids

  • iterations – Number of iterations thus far.

get_random_centroid(points)

Generate a random centroid based on points.

Parameters:

points – List of points.

get_random_centroids(points, k)

Generate k random centroids based on points.

Parameters:
  • points – List of points.

  • k – Number of means.
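One simple way to generate centroids "based on points" is to draw existing data points at random, which guarantees that every initial centroid lies in a populated region. The sketch below assumes that strategy. The name random_centroids_sketch, the seed argument, and the choice to sample with replacement are illustrative assumptions, not confirmed details of the library.

```python
import random


def random_centroids_sketch(points, k, seed=None):
    """Pick k initial centroids by drawing data points at random (with replacement)."""
    rng = random.Random(seed)
    return [rng.choice(points) for _ in range(k)]


points = [(0.1, 0.1, 0.1), (0.5, 0.5, 0.5), (0.9, 0.9, 0.9)]
seeds = random_centroids_sketch(points, k=2, seed=42)
```

Because sampling with replacement can return the same point twice, a caller that needs distinct seeds could use random.sample instead.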