pymatgen.analysis.diffusion.aimd.clustering module

This module implements clustering algorithms for determining centroids, adapted for periodic boundary conditions (PBC). One application is determining likely atomic positions from molecular dynamics (MD) trajectories.

class Kmeans(max_iterations: int = 1000)

Bases: object

Simple k-means clustering.

Parameters: max_iterations (int) – Maximum number of iterations to run the k-means algorithm.

cluster(points, k, initial_centroids=None)

Parameters:

points (ndarray) – Data points as an m × n ndarray, where m is the number of data points and n is the number of features (coordinates per point).
k (int) – Number of means.
initial_centroids (ndarray) – Initial guess for the centroids. If None, k points chosen at random from points are used.

Returns:

centroids holds the final centroids, labels gives the index of the assigned centroid for each point, and ss is the final sum of squared distances.

Return type:

centroids, labels, ss
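The iteration behind cluster is not spelled out above. Below is a minimal plain-Python sketch of standard k-means (Lloyd's algorithm: alternate label assignment and centroid updates), assuming points is a list of coordinate tuples and that the loop ends on convergence within a small tolerance or after max_iterations passes. The names kmeans and nearest, and the tol and seed parameters, are hypothetical; this is not pymatgen's implementation, only an illustration that returns the same (centroids, labels, ss) triple.

```python
import math
import random


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, k, initial_centroids=None, max_iterations=1000, tol=1e-9, seed=None):
    """Plain k-means returning (centroids, labels, ss), mirroring cluster().

    Assumption: points is a list of equal-length coordinate tuples.
    """
    rng = random.Random(seed)
    if initial_centroids is None:
        # Default start: k input points sampled without replacement.
        initial_centroids = rng.sample(list(points), k)
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iterations):
        # Assignment step: label every point with its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                # Assumption: an empty cluster is re-seeded at a random input point.
                new_centroids.append(tuple(rng.choice(points)))
        moved = not all(
            math.isclose(a, b, abs_tol=tol)
            for old, new in zip(centroids, new_centroids)
            for a, b in zip(old, new)
        )
        centroids = new_centroids
        if not moved:
            break
    labels = [nearest(p, centroids) for p in points]
    # ss: sum of squared distances from each point to its assigned centroid.
    ss = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return centroids, labels, ss


pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
centroids, labels, ss = kmeans(pts, 2, initial_centroids=[(0.0, 0.0), (5.0, 5.0)])
print(labels)  # [0, 0, 1, 1]
```

Passing initial_centroids makes the run deterministic; without it, the starting centroids, and possibly the final clustering, depend on the random draw.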

static get_centroids(points, labels, k, centroids)

Each centroid is the arithmetic mean of the points that carry that centroid’s label. If a centroid is empty (no points carry its label), it is randomly re-initialized.

Parameters:

points – List of points
labels – List of labels
k – Number of means
centroids – List of centroids
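A plain-Python sketch of the centroid update described above. It assumes the random re-initialization of an empty cluster picks a random input point, which is one reasonable choice and not necessarily the library's. The name get_centroids_sketch is hypothetical, and it drops the centroids argument because this Euclidean update does not need the previous centroids.

```python
import random


def get_centroids_sketch(points, labels, k, rng=None):
    """Recompute k centroids as per-label arithmetic means of points."""
    rng = rng or random.Random()
    new_centroids = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            # Component-wise mean over all points carrying label j.
            new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
        else:
            # Assumption: an empty cluster is re-initialized at a random input point.
            new_centroids.append(tuple(rng.choice(points)))
    return new_centroids


pts = [(0.0, 0.0), (2.0, 0.0), (4.0, 4.0)]
# Every point carries label 0, so cluster 1 is empty and gets re-seeded.
new = get_centroids_sketch(pts, labels=[0, 0, 0], k=2, rng=random.Random(0))
print(new[0])  # (2.0, 1.3333333333333333)
```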

static get_labels(points, centroids)

For each data point, choose the closest centroid and make that centroid’s index the point’s label.

Parameters:

points – List of points
centroids – List of centroids
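The labelling step amounts to an argmin over point-to-centroid distances. A plain-Python sketch, assuming Euclidean distance; get_labels_sketch is a hypothetical name. Ties resolve to the lowest centroid index because list.index returns the first occurrence of the minimum.

```python
import math


def get_labels_sketch(points, centroids):
    """Label each point with the index of its nearest centroid.

    Assumption: plain Euclidean distance, no periodicity.
    """
    labels = []
    for p in points:
        # Distance from this point to every centroid, then take the argmin.
        dists = [math.dist(p, c) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels


labels = get_labels_sketch([(0.1, 0.2), (0.9, 1.1), (0.4, 0.4)], [(0.0, 0.0), (1.0, 1.0)])
print(labels)  # [0, 1, 0]
```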

should_stop(old_centroids, centroids, iterations)

Check for stopping conditions.

Parameters:

old_centroids – List of old centroids
centroids – List of centroids
iterations – Number of iterations thus far.
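The stopping conditions themselves are not listed above. This sketch assumes the two usual ones: the iteration count exceeds a budget, or no centroid coordinate moved by more than a small tolerance since the previous pass. The name should_stop_sketch and the max_iterations and tol keywords are illustrative assumptions.

```python
import math


def should_stop_sketch(old_centroids, centroids, iterations, max_iterations=1000, tol=1e-8):
    """Return True when k-means should stop iterating.

    Assumption: stop on an exhausted iteration budget or on convergence.
    """
    if iterations > max_iterations:
        # Iteration budget exhausted, whether or not the centroids converged.
        return True
    if old_centroids is None:
        # First pass: there is no previous state to compare against yet.
        return False
    # Converged when no centroid coordinate changed by more than tol.
    return all(
        math.isclose(a, b, abs_tol=tol)
        for old, new in zip(old_centroids, centroids)
        for a, b in zip(old, new)
    )


print(should_stop_sketch([(0.0, 0.0)], [(0.0, 1e-12)], iterations=3))  # True
```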

class KmeansPBC(lattice, max_iterations=1000)

Bases: Kmeans

A version of k-means that works with periodic boundary conditions (PBC). Both the distance metric and the centroid update must change to account for periodicity. The supplied points should be fractional coordinates.

Parameters:

lattice (Lattice) – Lattice defining the periodic cell.
max_iterations – Maximum number of iterations to run k-means.
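Why the distance metric has to change: in a 10 Å cubic cell, fractional positions x = 0.05 and x = 0.95 are 9 Å apart by naive subtraction but only 1 Å apart across the periodic boundary. The sketch below computes a minimum-image distance by wrapping the fractional difference into roughly [-0.5, 0.5] and then checking the 27 neighboring images. It assumes the lattice is given as three row vectors a, b, c; frac_to_cart and pbc_distance are hypothetical names, and for extremely skewed cells a 27-image search can still miss the true minimum.

```python
import itertools
import math


def frac_to_cart(frac, lattice):
    """Cartesian vector for fractional coords.

    Assumption: lattice rows are the lattice vectors a, b, c.
    """
    return tuple(sum(frac[i] * lattice[i][d] for i in range(3)) for d in range(3))


def pbc_distance(f1, f2, lattice):
    """Shortest distance between two fractional positions over periodic images."""
    # Wrap the fractional difference into roughly [-0.5, 0.5] first...
    diff = [(b - a) - round(b - a) for a, b in zip(f1, f2)]
    best = math.inf
    # ...then also try neighboring images, which matters for skewed cells.
    for shift in itertools.product((-1, 0, 1), repeat=3):
        cart = frac_to_cart([d + s for d, s in zip(diff, shift)], lattice)
        best = min(best, math.hypot(*cart))
    return best


cubic = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
print(round(pbc_distance((0.05, 0.0, 0.0), (0.95, 0.0, 0.0), cubic), 6))  # 1.0
```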

get_centroids(points, labels, k, centroids)

Each centroid is the arithmetic mean of the points that carry that centroid’s label, with periodic images taken into account. If a centroid is empty (no points carry its label), it is randomly re-initialized.

Parameters:

points – List of points
labels – List of labels
k – Number of means
centroids – List of centroids
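One way to make the centroid update periodic, sketched under stated assumptions: shift each member point by whole lattice vectors to the image nearest the current centroid, average those unwrapped positions, and wrap the result back into [0, 1). Without the unwrapping, points at 0.95 and 0.01 would average to 0.48, in the middle of the cell, instead of 0.98. Here "nearest image" is judged in fractional coordinates, which matches the Cartesian nearest image for orthogonal cells but not necessarily for strongly skewed ones. The names pbc_centroid and wrap are hypothetical.

```python
def wrap(x):
    """Map x into [0, 1); guards the float edge case where x % 1.0 == 1.0."""
    w = x % 1.0
    return 0.0 if w >= 1.0 else w


def pbc_centroid(members, current):
    """Mean of fractional positions, unwrapped around the current centroid.

    Assumption: nearest image chosen per fractional component.
    """
    total = [0.0, 0.0, 0.0]
    for p in members:
        for d in range(3):
            delta = p[d] - current[d]
            # Shift by whole lattice vectors to the image nearest the current centroid.
            total[d] += current[d] + (delta - round(delta))
    # Average the unwrapped positions, then wrap back into the unit cell.
    return tuple(wrap(t / len(members)) for t in total)


c = pbc_centroid([(0.95, 0.5, 0.5), (0.01, 0.5, 0.5)], current=(0.98, 0.5, 0.5))
print(round(c[0], 6), c[1])  # 0.98 0.5
```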

get_labels(points, centroids)

For each data point, choose the closest centroid under the periodic distance and make that centroid’s index the point’s label.

Parameters:

points – List of points
centroids – List of centroids
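A sketch of periodic labelling, assuming an orthorhombic cell given by its three edge lengths so that the minimum image can be found by wrapping each fractional component independently; pbc_labels is a hypothetical name. In the example, the point at x = 0.97 is assigned to the centroid at x = 0.02 (0.5 Å away across the boundary), whereas plain Euclidean distance on the fractional coordinates would pick the centroid at 0.6.

```python
import math


def pbc_labels(points, centroids, lengths):
    """Nearest-centroid labels for fractional coords.

    Assumption: orthorhombic cell with edge lengths (a, b, c).
    """
    def dist(f1, f2):
        # Minimum-image fractional difference, scaled by the cell edge lengths.
        return math.hypot(*(((b - a) - round(b - a)) * L for a, b, L in zip(f1, f2, lengths)))

    labels = []
    for p in points:
        dists = [dist(p, c) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels


labels = pbc_labels(
    [(0.97, 0.5, 0.5), (0.45, 0.5, 0.5)],
    [(0.02, 0.5, 0.5), (0.6, 0.5, 0.5)],
    lengths=(10.0, 10.0, 10.0),
)
print(labels)  # [0, 1]
```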

should_stop(old_centroids, centroids, iterations)

Check for stopping conditions.

Parameters:

old_centroids – List of old centroids
centroids – List of centroids
iterations – Number of iterations thus far.
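The convergence test also needs a periodic comparison: a centroid that moves from 0.9999999 to 0.0000001 has barely moved but has crossed the cell boundary. The sketch below covers only the convergence part, comparing minimum-image fractional differences against a tolerance; pbc_converged and tol are illustrative assumptions, and an iteration cap (the iterations argument above) would be checked separately.

```python
def pbc_converged(old_centroids, centroids, tol=1e-6):
    """True if every centroid moved less than tol, modulo whole lattice vectors.

    Assumption: tolerance applied per fractional component.
    """
    for old, new in zip(old_centroids, centroids):
        for a, b in zip(old, new):
            d = b - a
            # A change of (nearly) a whole lattice vector counts as no movement.
            if abs(d - round(d)) > tol:
                return False
    return True


print(pbc_converged([(0.9999999, 0.5, 0.5)], [(0.0000001, 0.5, 0.5)]))  # True
```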

get_random_centroid(points)

Generate a random centroid based on points.

Parameters: points – List of points.

get_random_centroids(points, k)

Generate k random centroids based on points.

Parameters:

points – List of points.
k – Number of means.
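How the random centroids are generated is described above only as "based on points". The sketch below assumes each coordinate is drawn uniformly between the minimum and maximum of that coordinate over the data (the bounding box of points); random_centroid and random_centroids are hypothetical names, not the library's functions.

```python
import random


def random_centroid(points, rng=random):
    """One random centroid.

    Assumption: drawn uniformly inside the bounding box of points.
    """
    dims = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dims)]
    hi = [max(p[d] for p in points) for d in range(dims)]
    return tuple(rng.uniform(lo[d], hi[d]) for d in range(dims))


def random_centroids(points, k, rng=random):
    """k independent random centroids, each from random_centroid."""
    return [random_centroid(points, rng) for _ in range(k)]


data = [(0.1, 0.2, 0.3), (0.4, 0.9, 0.5)]
cents = random_centroids(data, k=3, rng=random.Random(42))
```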
