megnet.data.graph module

Abstract classes and utility operations for building graph representations and data loaders (known as Sequence objects in Keras). Most users will not need to interact with this module.

class BaseGraphBatchGenerator(dataset_size: int, targets: numpy.ndarray, sample_weights: Optional[numpy.ndarray] = None, batch_size: int = 128, is_shuffle: bool = True)[source]

Bases: keras.utils.data_utils.Sequence

Base class for classes that generate batches of training data for MEGNet. Based on the Sequence class, which is the Keras equivalent of a data loader. Subclasses must implement _generate_inputs(), which generates the lists of graph descriptions for a batch. The process_atom_feature() method and its bond and state counterparts are used to modify the atom, bond, and global (state) features when creating a batch.

Parameters
  • dataset_size (int) – Number of entries in dataset

  • targets (ndarray) – Target values to be predicted for each entry

  • sample_weights (ndarray) – sample weights

  • batch_size (int) – Maximum batch size

  • is_shuffle (bool) – Whether to shuffle the data after each epoch

on_epoch_end()[source]

Code to be executed at the end of each epoch

process_atom_feature(x: numpy.ndarray) numpy.ndarray[source]
Parameters

x (np.ndarray) – atom features

Returns

processed atom features

process_bond_feature(x: numpy.ndarray) numpy.ndarray[source]
Parameters

x (np.ndarray) – bond features

Returns

processed bond features

process_state_feature(x: numpy.ndarray) numpy.ndarray[source]
Parameters

x (np.ndarray) – state features

Returns

processed state features
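The batching behaviour described for this class can be sketched without Keras. The class below is a hypothetical stand-in, not the library's implementation: the name BatchIndexSketch and the seed argument are invented for illustration, and it assumes that shuffling takes place in on_epoch_end().

```python
import math
import random


class BatchIndexSketch:
    """Hypothetical stand-in for the batching logic of a Keras-style Sequence."""

    def __init__(self, dataset_size, batch_size=128, is_shuffle=True, seed=0):
        self.batch_size = batch_size
        self.is_shuffle = is_shuffle
        self.indices = list(range(dataset_size))
        # Seeded only so that the sketch is reproducible (an assumption).
        self._rng = random.Random(seed)

    def __len__(self):
        # Number of batches; the last batch may hold fewer than batch_size entries.
        return math.ceil(len(self.indices) / self.batch_size)

    def __getitem__(self, i):
        # Dataset entries that make up batch i.
        return self.indices[i * self.batch_size:(i + 1) * self.batch_size]

    def on_epoch_end(self):
        # Reorder the entries between epochs when is_shuffle is set.
        if self.is_shuffle:
            self._rng.shuffle(self.indices)


gen = BatchIndexSketch(dataset_size=10, batch_size=4)
batch_sizes = [len(gen[i]) for i in range(len(gen))]
gen.on_epoch_end()
```

A real subclass would turn each batch of indices into graph inputs in _generate_inputs() rather than returning the indices themselves.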

class Converter[source]

Bases: monty.json.MSONable

Base class for atom or bond converters

convert(d: Any) Any[source]

Convert the object d.

Parameters

d (Any) – any object

Returns

converted object

class DummyConverter[source]

Bases: megnet.data.graph.Converter

Dummy converter as a placeholder

convert(d: Any) Any[source]

Dummy convert; returns the input unchanged.

Parameters

d (Any) – input object

Returns

d

class EmbeddingMap(feature_matrix: numpy.ndarray)[source]

Bases: megnet.data.graph.Converter

Convert an integer to a row vector in a feature matrix

Parameters

feature_matrix – (np.ndarray) A matrix of shape (N, M)

convert(int_array: numpy.ndarray) numpy.ndarray[source]

Convert atomic numbers to row vectors in the feature_matrix.

Parameters

int_array – (1d array) integer array of length L

Returns

(matrix) L*M matrix, where L is the length of int_array and M is the number of columns in feature_matrix
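As a minimal sketch of this lookup, assuming it amounts to plain NumPy integer indexing, each integer selects one row of the matrix. The function name embed and the example matrix are hypothetical.

```python
import numpy as np

# Hypothetical feature matrix with N = 4 rows and M = 3 columns.
feature_matrix = np.arange(12, dtype=float).reshape(4, 3)


def embed(int_array, feature_matrix):
    """Return one row of feature_matrix per entry of int_array."""
    return feature_matrix[np.asarray(int_array, dtype=int)]


# L = 3 input integers give an (L, M) = (3, 3) matrix.
rows = embed([2, 0, 2], feature_matrix)
```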

class GaussianDistance(centers: numpy.ndarray = numpy.linspace(0, 5, 100), width=0.5)[source]

Bases: megnet.data.graph.Converter

Expand distances in a Gaussian basis whose functions sit at the given centers and share the given width (0.5 by default).

Parameters
  • centers – (np.array) centers for the Gaussian basis

  • width – (float) width of Gaussian basis

convert(d: numpy.ndarray) numpy.ndarray[source]

Expand distance vector d with the given parameters.

Parameters

d – (1d array) distance array

Returns

(matrix) N*M matrix with N the length of d and M the length of centers
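The expansion can be sketched with NumPy broadcasting. The functional form used here, exp(-(d - c)^2 / width^2), is an assumption: the library's exact normalization may differ. The function name gaussian_expand is hypothetical.

```python
import numpy as np


def gaussian_expand(d, centers, width=0.5):
    """Expand a 1D distance array into an (N, M) matrix of Gaussian features."""
    d = np.asarray(d, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Broadcasting: (N, 1) - (1, M) -> (N, M).
    # Assumed form: exp(-(d - c)^2 / width^2).
    return np.exp(-((d[:, None] - centers[None, :]) ** 2) / width ** 2)


centers = np.linspace(0, 5, 100)  # 100 evenly spaced centers on [0, 5]
expanded = gaussian_expand([0.0, 1.0, 5.0], centers)  # shape (3, 100)
```

Each row peaks at the center closest to the corresponding distance, so a scalar distance becomes a smooth feature vector of length M.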

class GraphBatchDistanceConvert(atom_features: List[numpy.ndarray], bond_features: List[numpy.ndarray], state_features: List[numpy.ndarray], index1_list: List[int], index2_list: List[int], targets: Optional[numpy.ndarray] = None, sample_weights: Optional[numpy.ndarray] = None, batch_size: int = 128, is_shuffle: bool = True, distance_converter: Optional[megnet.data.graph.Converter] = None)[source]

Bases: megnet.data.graph.GraphBatchGenerator

Generate batches of structures with bond distances expanded by a distance converter

Parameters
  • atom_features – (list of np.array) list of atom feature matrices

  • bond_features – (list of np.array) list of bond feature matrices

  • state_features – (list of np.array) list of [1, G] state features, where G is the global state feature dimension

  • index1_list – (list of integer) list of (M, ) one side atomic index of the bond, M is different for different structures

  • index2_list – (list of integer) list of (M, ) the other side atomic index of the bond, M is different for different structures, but it has to be the same as the corresponding index1.

  • targets – (numpy array), N*1, where N is the number of structures

  • sample_weights – (numpy array), N*1, where N is the number of structures

  • batch_size – (int) number of samples in a batch

  • is_shuffle – (bool) whether to shuffle the structure, default to True

  • distance_converter – (Converter) converter for processing the distances

process_bond_feature(x) numpy.ndarray[source]

Convert bond distances into Gaussian-expanded vectors.

Parameters

x (np.ndarray) – input distance array

Returns

expanded matrix

class GraphBatchGenerator(atom_features: List[numpy.ndarray], bond_features: List[numpy.ndarray], state_features: List[numpy.ndarray], index1_list: List[int], index2_list: List[int], targets: Optional[numpy.ndarray] = None, sample_weights: Optional[numpy.ndarray] = None, batch_size: int = 128, is_shuffle: bool = True)[source]

Bases: megnet.data.graph.BaseGraphBatchGenerator

A generator class that assembles several structures (the number is set by batch_size) and forms (x, y) pairs for model training.

Parameters
  • atom_features – (list of np.array) list of atom feature matrices

  • bond_features – (list of np.array) list of bond feature matrices

  • state_features – (list of np.array) list of [1, G] state features, where G is the global state feature dimension

  • index1_list – (list of integer) list of (M, ) one side atomic index of the bond, M is different for different structures

  • index2_list – (list of integer) list of (M, ) the other side atomic index of the bond, M is different for different structures, but it has to be the same as the corresponding index1.

  • targets – (numpy array), N*1, where N is the number of structures

  • sample_weights – (numpy array), N*1, where N is the number of structures

  • batch_size – (int) number of samples in a batch

  • is_shuffle – (bool) whether to shuffle the structure, default to True

class StructureGraph(nn_strategy: Optional[Union[str, pymatgen.analysis.local_env.NearNeighbors]] = None, atom_converter: Optional[megnet.data.graph.Converter] = None, bond_converter: Optional[megnet.data.graph.Converter] = None, **kwargs)[source]

Bases: monty.json.MSONable

This is a base class for converting structures into graphs or model inputs. The methods to be implemented are as follows:

  1. convert(self, structure)

    Converts a structure into a graph dictionary.

  2. get_input(self, structure)

    Converts a structure directly into a model input.

  3. get_flat_data(self, graphs, targets)

    Processes pairs of graphs and targets and outputs the model input list.

Parameters
  • nn_strategy (str or NearNeighbors) – NearNeighbor strategy

  • atom_converter (Converter) – atom converter

  • bond_converter (Converter) – bond converter

  • **kwargs

as_dict() Dict[source]

Serialize to dict.

Returns

(dict) dictionary of information

convert(structure: pymatgen.core.structure.Structure, state_attributes: Optional[List] = None) Dict[source]

Take a pymatgen structure and convert it to an index-type graph representation. The graph will have node, distance, index1, index2, where node is a vector of the atomic numbers (Z) of the atoms in the structure, and index1 and index2 mark the indices of the two atoms forming each bond, separated by distance. For state attributes, you can set structure.state = [[xx, xx]] beforehand; otherwise the algorithm uses the default [[0, 0]].

Parameters
  • structure – (pymatgen structure)

  • state_attributes – (list) state attributes

Returns

(dictionary)
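To make the index-type representation concrete, the sketch below builds such a graph for a small non-periodic toy molecule using only the standard library. It does not use pymatgen or any neighbor-finding strategy from the library. The function name toy_graph, the cutoff value, the "state" key, and the choice to store each bond in both directions are assumptions, and the key names simply follow the description above, so the library's actual keys may differ.

```python
import math


def toy_graph(atomic_numbers, coords, cutoff):
    """Build an index-type graph from Cartesian coordinates (no periodicity)."""
    index1, index2, distance = [], [], []
    for i, ri in enumerate(coords):
        for j, rj in enumerate(coords):
            if i == j:
                continue
            dist = math.dist(ri, rj)
            if dist <= cutoff:
                # One directed bond i -> j, so each pair appears twice (assumption).
                index1.append(i)
                index2.append(j)
                distance.append(dist)
    return {
        "node": list(atomic_numbers),  # atomic number Z of each atom
        "distance": distance,          # one entry per bond
        "index1": index1,              # first atom of each bond
        "index2": index2,              # second atom of each bond
        "state": [[0, 0]],             # default state attributes
    }


# Water-like toy molecule: O at the origin, two H atoms about 0.96 angstrom away.
graph = toy_graph(
    [8, 1, 1],
    [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)],
    cutoff=1.2,
)
```

With this cutoff only the two O-H pairs are bonded; the H-H distance of roughly 1.5 angstrom is excluded.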

classmethod from_dict(d: Dict) megnet.data.graph.StructureGraph[source]

Initialize from a dictionary.

Parameters

d (dict) – dictionary

Returns

StructureGraph object

static get_atom_features(structure) List[Any][source]

Get atom features from structure; may be overridden by subclasses.

Parameters

structure – (Pymatgen.Structure) pymatgen structure

Returns

List of atomic numbers

static get_flat_data(graphs: List[Dict], targets: Optional[List] = None) tuple[source]

Expand the graph dictionaries to form lists of feature and target tensors. This is useful when the model is trained on graphs assembled on the fly.

Parameters
  • graphs – (list of dictionary) list of graph dictionaries, one for each structure

  • targets – (list of float or list) Optional: corresponding target values for each structure

Returns

tuple(node_features, edges_features, global_values, index1, index2, targets)
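The regrouping from a list of per-structure dictionaries into per-field lists can be sketched as follows. This is an illustration, not the library's code: the function name flatten_graphs is hypothetical, and the dictionary keys are assumed for the example.

```python
def flatten_graphs(graphs, targets=None):
    """Regroup a list of graph dictionaries into one list per field."""
    # Assumed key names; the output order follows the documented return tuple.
    keys = ("node", "distance", "state", "index1", "index2")
    fields = tuple([g[k] for g in graphs] for k in keys)
    if targets is None:
        return fields
    return fields + (list(targets),)


graphs = [
    {"node": [8, 1, 1], "distance": [0.96, 0.96], "state": [[0, 0]],
     "index1": [0, 0], "index2": [1, 2]},
    {"node": [1, 1], "distance": [0.74], "state": [[0, 0]],
     "index1": [0], "index2": [1]},
]
nodes, edges, states, index1, index2, y = flatten_graphs(graphs, targets=[-1.2, 0.3])
```

Each returned list has one entry per structure, which is the layout the batch generators above take as their atom_features, bond_features, state_features, index1_list, and index2_list arguments.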

get_input(structure: pymatgen.core.structure.Structure) List[numpy.ndarray][source]

Turns a structure into model input

graph_to_input(graph: Dict) List[numpy.ndarray][source]

Turns a graph into model input.

Parameters

graph (dict) – Dictionary description of the graph

Returns

Inputs in the form needed by MEGNet

Return type

([np.ndarray])

class StructureGraphFixedRadius(nn_strategy: Optional[Union[str, pymatgen.analysis.local_env.NearNeighbors]] = None, atom_converter: Optional[megnet.data.graph.Converter] = None, bond_converter: Optional[megnet.data.graph.Converter] = None, **kwargs)[source]

Bases: megnet.data.graph.StructureGraph

This class uses a shortcut that calls the find_points_in_spheres Cython function in pymatgen. It is orders of magnitude faster than previous implementations.

Parameters
  • nn_strategy (str or NearNeighbors) – NearNeighbor strategy

  • atom_converter (Converter) – atom converter

  • bond_converter (Converter) – bond converter

  • **kwargs

convert(structure: pymatgen.core.structure.Structure, state_attributes: Optional[List] = None) Dict[source]

Take a pymatgen structure and convert it to an index-type graph representation. The graph will have node, distance, index1, index2, where node is a vector of the atomic numbers (Z) of the atoms in the structure, and index1 and index2 mark the indices of the two atoms forming each bond, separated by distance. For state attributes, you can set structure.state = [[xx, xx]] beforehand; otherwise the algorithm uses the default [[0, 0]].

Parameters
  • structure – (pymatgen structure)

  • state_attributes – (list) state attributes

Returns

(dictionary)

classmethod from_structure_graph(structure_graph: megnet.data.graph.StructureGraph) megnet.data.graph.StructureGraphFixedRadius[source]

Initialize from a megnet StructureGraph.

Parameters

structure_graph (StructureGraph) – megnet StructureGraph object

Returns

StructureGraphFixedRadius object

itemgetter_list(data_list: List, indices: List) tuple[source]

Get the elements of data_list at the given indices and return them as a tuple.

Parameters
  • data_list (list) – data list

  • indices – (list) indices

Returns

(tuple)
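A minimal sketch of this kind of selection, assuming it behaves like operator.itemgetter with the single-index case wrapped into a tuple. The function name pick is hypothetical, and the handling of an empty index list is an added assumption.

```python
from operator import itemgetter


def pick(data_list, indices):
    """Return the elements of data_list at the given indices as a tuple."""
    if len(indices) == 0:
        return ()
    picked = itemgetter(*indices)(data_list)
    # itemgetter returns a bare element for a single index; wrap it in a tuple.
    return picked if len(indices) > 1 else (picked,)


selected = pick(["a", "b", "c", "d"], [3, 0])
```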