megnet.data.graph module
Abstract classes and utility operations for building graph representations and data loaders (known as Sequence objects in Keras). Most users will not need to interact with this module.
- class BaseGraphBatchGenerator(dataset_size: int, targets: numpy.ndarray, sample_weights: Optional[numpy.ndarray] = None, batch_size: int = 128, is_shuffle: bool = True)
Bases: keras.utils.data_utils.Sequence
Base class for classes that generate batches of training data for MEGNet. It is based on the Keras Sequence class, which is the Keras equivalent of a data loader. Implementations of this base class must implement _generate_inputs(), which generates the lists of graph descriptions for a batch. The process_atom_feature() method and related methods are used to modify the atom, bond, and global (state) features when creating a batch.
- Parameters
dataset_size (int) – Number of entries in the dataset
targets (ndarray) – Target values to be predicted for each entry
sample_weights (ndarray) – Sample weights
batch_size (int) – Maximum batch size
is_shuffle (bool) – Whether to shuffle the data after each epoch
- process_atom_feature(x: numpy.ndarray) -> numpy.ndarray
- Parameters
x (np.ndarray) – atom features
- Returns
processed atom features
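To illustrate the Sequence protocol that this base class builds on, here is a minimal, Keras-free sketch. The class IndexBatcher and its seed argument are hypothetical and not part of MEGNet; it only mirrors the method names of keras.utils.Sequence (__len__, __getitem__, on_epoch_end) and returns the entry indices of each batch rather than the (inputs, targets) pairs a real generator produces.

```python
import math

import numpy as np


class IndexBatcher:
    """Hypothetical, Keras-free sketch of the Sequence protocol (not MEGNet code)."""

    def __init__(self, dataset_size, batch_size=128, is_shuffle=True, seed=None):
        self.dataset_size = dataset_size
        self.batch_size = batch_size
        self.is_shuffle = is_shuffle
        self.rng = np.random.default_rng(seed)
        self.index = np.arange(dataset_size)  # current ordering of the entries

    def __len__(self):
        # number of batches per epoch; the last batch may be smaller than batch_size
        return math.ceil(self.dataset_size / self.batch_size)

    def __getitem__(self, i):
        # entry indices that make up batch i
        return self.index[i * self.batch_size:(i + 1) * self.batch_size]

    def on_epoch_end(self):
        # reshuffle the entry order between epochs when is_shuffle is set
        if self.is_shuffle:
            self.index = self.rng.permutation(self.index)
```

In a real generator, __getitem__ would look up the features of those entries, assemble them into model inputs, and pair them with the matching targets (and sample weights, if given).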
- class DummyConverter
Bases: megnet.data.graph.Converter
Dummy converter as a placeholder
- class EmbeddingMap(feature_matrix: numpy.ndarray)
Bases: megnet.data.graph.Converter
Convert an integer to the corresponding row vector of a feature matrix.
- Parameters
feature_matrix – (np.ndarray) A matrix of shape (N, M)
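The lookup described above amounts to row indexing into an (N, M) matrix. The sketch below is a hypothetical stand-in (EmbeddingLookup is not a MEGNet class, and naming its method convert is an assumption that merely echoes the converter interface); each integer i selects row i of the matrix.

```python
import numpy as np


class EmbeddingLookup:
    """Hypothetical sketch: map integer IDs to rows of a fixed (N, M) feature matrix."""

    def __init__(self, feature_matrix):
        self.feature_matrix = np.asarray(feature_matrix)

    def convert(self, ints):
        # integer-array indexing returns one length-M row per integer: shape (len(ints), M)
        return self.feature_matrix[np.asarray(ints, dtype=int)]
```

For example, with a 4 x 3 matrix np.arange(12).reshape(4, 3), converting [2, 0] gives the rows [[6, 7, 8], [0, 1, 2]].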
- class GaussianDistance(centers: numpy.ndarray = numpy.linspace(0, 5, 100), width=0.5)
Bases: megnet.data.graph.Converter
Expand distances in a Gaussian basis whose centers are given by centers and whose width is given by width (0.5 by default).
- Parameters
centers – (np.array) centers for the Gaussian basis
width – (float) width of Gaussian basis
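A minimal sketch of a Gaussian distance expansion, assuming the form exp(-(d - c_k)^2 / width^2) for each center c_k. That exact normalization is an assumption (some codes use 2 * sigma^2 in the denominator instead), and gaussian_expand is a hypothetical function, not the GaussianDistance API. The default centers, numpy.linspace(0, 5, 100), and width of 0.5 match the signature above.

```python
import numpy as np


def gaussian_expand(distances, centers=np.linspace(0, 5, 100), width=0.5):
    """Sketch: map each distance d to exp(-(d - c_k)^2 / width^2) for every center c_k."""
    d = np.asarray(distances, dtype=float)
    # broadcast (n, 1) against (1, K) to get an (n, K) matrix of basis values
    return np.exp(-((d[:, None] - centers[None, :]) ** 2) / width ** 2)
```

Each scalar distance becomes a smooth length-100 vector that peaks at the nearest center. A distance of 0.0 gives exactly 1.0 in the first column, and a distance of 2.0 peaks at index 40, whose center is about 2.02.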
- class GraphBatchDistanceConvert(atom_features: List[numpy.ndarray], bond_features: List[numpy.ndarray], state_features: List[numpy.ndarray], index1_list: List[int], index2_list: List[int], targets: Optional[numpy.ndarray] = None, sample_weights: Optional[numpy.ndarray] = None, batch_size: int = 128, is_shuffle: bool = True, distance_converter: Optional[megnet.data.graph.Converter] = None)
Bases: megnet.data.graph.GraphBatchGenerator
Generate batches of structures in which the bond distances are expanded using a distance converter (for example, GaussianDistance).
- Parameters
atom_features – (list of np.array) list of atom feature matrices
bond_features – (list of np.array) list of bond feature matrices
state_features – (list of np.array) list of [1, G] state features, where G is the global state feature dimension
index1_list – (list of integer) list of (M,) arrays giving the atom index on one side of each bond; M is different for different structures
index2_list – (list of integer) list of (M,) arrays giving the atom index on the other side of each bond; M is different for different structures, but it has to be the same as for the corresponding index1.
targets – (numpy array), N*1, where N is the number of structures
sample_weights – (numpy array), N*1, where N is the number of structures
batch_size – (int) number of samples in a batch
is_shuffle – (bool) whether to shuffle the structures, defaults to True
distance_converter – (Converter) converter for processing the distances
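Because the expansion happens while batches are generated, only the structures in the current batch need to be converted. The helper expand_batch_bonds below is a hypothetical sketch of that idea, not MEGNet code. It assumes bond_features holds one (M,) array of raw distances per structure and that the converter is any callable mapping such an array to an (M, K) feature matrix.

```python
import numpy as np


def expand_batch_bonds(bond_features, batch_index, converter):
    """Hypothetical sketch: expand raw bond distances only for the structures in one batch."""
    # bond_features[i] holds the (M,) distances of structure i; M varies between structures.
    # Only the selected structures are converted, so in this sketch the (M, K) expanded
    # arrays are never stored for the whole dataset at once.
    return [converter(np.asarray(bond_features[i], dtype=float)) for i in batch_index]
```

Any distance expansion can be plugged in as the converter, for instance a Gaussian basis expansion like the one described for GaussianDistance.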
- class GraphBatchGenerator(atom_features: List[numpy.ndarray], bond_features: List[numpy.ndarray], state_features: List[numpy.ndarray], index1_list: List[int], index2_list: List[int], targets: Optional[numpy.ndarray] = None, sample_weights: Optional[numpy.ndarray] = None, batch_size: int = 128, is_shuffle: bool = True)
Bases: megnet.data.graph.BaseGraphBatchGenerator
A generator class that assembles several structures (the number is set by batch_size) and forms (x, y) pairs for model training.
- Parameters
atom_features – (list of np.array) list of atom feature matrices
bond_features – (list of np.array) list of bond feature matrices
state_features – (list of np.array) list of [1, G] state features, where G is the global state feature dimension
index1_list – (list of integer) list of (M,) arrays giving the atom index on one side of each bond; M is different for different structures
index2_list – (list of integer) list of (M,) arrays giving the atom index on the other side of each bond; M is different for different structures, but it has to be the same as for the corresponding index1.
targets – (numpy array), N*1, where N is the number of structures
sample_weights – (numpy array), N*1, where N is the number of structures
batch_size – (int) number of samples in a batch
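A common way to assemble several structures into one batch is to treat them as one large disconnected graph. The sketch below is a hypothetical illustration of that general technique, not MEGNet's implementation. It concatenates the atom and bond features, shifts each structure's bond indices by the number of atoms already placed, and records which structure every atom and bond belongs to. The membership arrays are called gnode and gbond in this sketch.

```python
import numpy as np


def merge_graphs(atom_features, bond_features, index1_list, index2_list):
    """Hypothetical sketch: combine several graphs into one disjoint batch graph."""
    atoms, bonds, idx1, idx2, gnode, gbond = [], [], [], [], [], []
    offset = 0  # number of atoms already placed in the batch
    for g, (a, b, i1, i2) in enumerate(zip(atom_features, bond_features, index1_list, index2_list)):
        a = np.asarray(a)
        atoms.append(a)
        bonds.append(np.asarray(b))
        # shift bond endpoints so they point at this graph's rows in the concatenated atom array
        idx1.append(np.asarray(i1, dtype=int) + offset)
        idx2.append(np.asarray(i2, dtype=int) + offset)
        # record which graph each atom and bond belongs to (used for per-graph pooling)
        gnode.append(np.full(len(a), g, dtype=int))
        gbond.append(np.full(len(i1), g, dtype=int))
        offset += len(a)
    return (np.concatenate(atoms), np.concatenate(bonds), np.concatenate(idx1),
            np.concatenate(idx2), np.concatenate(gnode), np.concatenate(gbond))
```

Offsetting the indices is what lets a single set of index1/index2 arrays describe every bond in the batch without mixing atoms from different structures.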
- class StructureGraph(nn_strategy: Optional[Union[str, pymatgen.analysis.local_env.NearNeighbors]] = None, atom_converter: Optional[megnet.data.graph.Converter] = None, bond_converter: Optional[megnet.data.graph.Converter] = None, **kwargs)
Bases: monty.json.MSONable
This is a base class for converting structures into graphs or model inputs. The methods to be implemented are as follows:
- convert(self, structure)
Converts a structure into a graph dictionary.
- get_input(self, structure)
Converts a structure directly into a model input.
- get_flat_data(self, graphs, targets)
Processes (graph, target) pairs and outputs a list of model inputs.
- convert(structure: pymatgen.core.structure.Structure, state_attributes: Optional[List] = None) -> Dict
Take a pymatgen structure and convert it to an index-type graph representation. The graph will have node, distance, index1, and index2, where node is a vector of the atomic numbers (Z) of the atoms in the structure, and index1 and index2 mark the indices of the two atoms forming each bond, which are separated by distance. For state attributes, you can set structure.state = [[xx, xx]] beforehand; otherwise the default [[0, 0]] is used.
- Parameters
state_attributes – (list) state attributes
structure – (pymatgen structure)
- Returns
(dictionary)
- classmethod from_dict(d: Dict) -> megnet.data.graph.StructureGraph
Initialization from dictionary
- Parameters
d (dict) – dictionary
- Returns
StructureGraph object
- static get_atom_features(structure) -> List[Any]
Get atom features from the structure; may be overridden.
- Parameters
structure – (pymatgen.Structure) pymatgen structure
- Returns
List of atomic numbers
- static get_flat_data(graphs: List[Dict], targets: Optional[List] = None) -> tuple
Expand the graph dictionaries into lists of feature and target tensors. This is useful when the model is trained on graphs assembled on the fly.
- Parameters
graphs – (list of dictionary) list of graph dictionaries, one for each structure
targets – (list of float or list) Optional: corresponding target values for each structure
- Returns
tuple(node_features, edges_features, global_values, index1, index2, targets)
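The flattening step can be pictured as turning a list of per-structure dictionaries into parallel per-field lists. The sketch below is a hypothetical stand-in for get_flat_data, not its actual implementation. It assumes each graph dictionary uses the keys "atom", "bond", "state", "index1", and "index2", which you should check against the output of convert in your version. It also assumes targets stays None when no targets are given.

```python
import numpy as np

KEYS = ("atom", "bond", "state", "index1", "index2")  # assumed graph-dict keys


def flatten_graphs(graphs, targets=None):
    """Hypothetical stand-in for get_flat_data: turn graph dicts into parallel lists."""
    # one list per key, each holding one array per structure
    columns = {key: [np.asarray(graph[key]) for graph in graphs] for key in KEYS}
    # targets stay aligned with graphs; None is passed through when no targets are given
    flat_targets = None if targets is None else list(targets)
    return (columns["atom"], columns["bond"], columns["state"],
            columns["index1"], columns["index2"], flat_targets)
```

The resulting lists have the same shape as the per-structure arguments that GraphBatchGenerator expects (atom_features, bond_features, state_features, index1_list, index2_list, targets).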
- class StructureGraphFixedRadius(nn_strategy: Optional[Union[str, pymatgen.analysis.local_env.NearNeighbors]] = None, atom_converter: Optional[megnet.data.graph.Converter] = None, bond_converter: Optional[megnet.data.graph.Converter] = None, **kwargs)
Bases: megnet.data.graph.StructureGraph
This class uses a shortcut that calls the find_points_in_spheres Cython function in pymatgen. It is orders of magnitude faster than previous implementations.
- convert(structure: pymatgen.core.structure.Structure, state_attributes: Optional[List] = None) -> Dict
Take a pymatgen structure and convert it to an index-type graph representation. The graph will have node, distance, index1, and index2, where node is a vector of the atomic numbers (Z) of the atoms in the structure, and index1 and index2 mark the indices of the two atoms forming each bond, which are separated by distance. For state attributes, you can set structure.state = [[xx, xx]] beforehand; otherwise the default [[0, 0]] is used.
- Parameters
state_attributes – (list) state attributes
structure – (pymatgen structure)
- Returns
(dictionary)
- classmethod from_structure_graph(structure_graph: megnet.data.graph.StructureGraph) -> megnet.data.graph.StructureGraphFixedRadius
Initialize from a StructureGraph (megnet.data.graph.StructureGraph) object.
- Parameters
structure_graph (StructureGraph) – StructureGraph object
- Returns
StructureGraphFixedRadius object
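The fast Cython routine is not reproduced here. As a rough illustration of what a fixed-radius neighbor search returns, the sketch below brute-forces all atom pairs over a block of periodic images. It is far slower, roughly O(N^2 (2k+1)^3) for k = n_images, and it is not the find_points_in_spheres API. The function fixed_radius_graph, its arguments, and the output keys "atom", "bond", "state", "index1", and "index2" are assumptions made for illustration. The search only covers integer image shifts in [-n_images, n_images], so n_images must be large enough for the chosen cutoff.

```python
import itertools

import numpy as np


def fixed_radius_graph(cart_coords, lattice, atomic_numbers, cutoff, n_images=1, tol=1e-8):
    """Hypothetical brute-force fixed-radius graph builder (sketch only).

    cart_coords: (N, 3) Cartesian coordinates; lattice: (3, 3) matrix whose rows are
    the lattice vectors. Only image shifts in [-n_images, n_images]^3 are searched.
    """
    cart_coords = np.asarray(cart_coords, dtype=float)
    lattice = np.asarray(lattice, dtype=float)
    index1, index2, distances = [], [], []
    for shift in itertools.product(range(-n_images, n_images + 1), repeat=3):
        offset = np.array(shift, dtype=float) @ lattice  # Cartesian translation of this image
        # diff[i, j] = (position of atom j in this image) - (position of center atom i)
        diff = cart_coords[None, :, :] + offset - cart_coords[:, None, :]
        dist = np.linalg.norm(diff, axis=-1)
        # keep pairs inside the cutoff, dropping each atom's zero-distance self pair
        i_idx, j_idx = np.nonzero((dist > tol) & (dist <= cutoff))
        index1.extend(i_idx.tolist())
        index2.extend(j_idx.tolist())
        distances.extend(dist[i_idx, j_idx].tolist())
    return {
        "atom": list(atomic_numbers),
        "bond": np.array(distances),
        "state": [[0.0, 0.0]],  # default state, mirroring the [[0, 0]] default described above
        "index1": np.array(index1, dtype=int),
        "index2": np.array(index2, dtype=int),
    }
```

For a simple cubic cell with a 3.0 Å lattice constant, one atom, and a 3.1 Å cutoff, this finds the 6 nearest periodic images, each at 3.0 Å. Every bond then connects atom 0 to an image of itself.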