maml.apps.bowsr package

Implementation of the BOWSR paper: Zuo, Yunxing, et al. “Accelerating Materials Discovery with Bayesian Optimization and Graph Deep Learning.” arXiv preprint arXiv:2104.10242 (2021).

Subpackages

  • maml.apps.bowsr.model package
    • EnergyModel
      • EnergyModel.predict_energy()
    • maml.apps.bowsr.model.base module
      • EnergyModel
        • EnergyModel.predict_energy()
    • maml.apps.bowsr.model.cgcnn module
    • maml.apps.bowsr.model.dft module
      • DFT
        • DFT.predict_energy()
    • maml.apps.bowsr.model.megnet module

maml.apps.bowsr.acquisition module

Module implements the new candidate proposal.

class maml.apps.bowsr.acquisition.AcquisitionFunction(acq_type: str, kappa: float, xi: float)

Bases: object

An object to compute the acquisition functions.

static _ei(x: list | np.ndarray, gpr: GaussianProcessRegressor, y_max: float, xi: float, noise: float)

static _gpucb(x: list | np.ndarray, gpr: GaussianProcessRegressor, noise: float)

static _poi(x: list | np.ndarray, gpr: GaussianProcessRegressor, y_max: float, xi: float, noise: float)

static _ucb(x: list | np.ndarray, gpr: GaussianProcessRegressor, kappa: float, noise: float)

calculate(x: list | np.ndarray, gpr: GaussianProcessRegressor, y_max: float, noise: float)

Calculate the value of acquisition function.

  • Parameters
    • x (ndarray) – Query point to be evaluated.
    • gpr (GaussianProcessRegressor) – A Gaussian process regressor fitted to known data.
    • y_max (float) – The current maximum target value.
    • noise (float) – Noise added to acquisition function if noisy-based Bayesian optimization is performed, 0 otherwise.
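As a rough illustration of what the three acquisition types named in this module compute, the sketch below evaluates the textbook upper confidence bound, expected improvement, and probability of improvement from a posterior mean and standard deviation. The function name acquisition_value and its argument layout are hypothetical, the noise term is left out, and the formulas are the standard ones for a maximization problem, which may differ in detail from what AcquisitionFunction does internally.

```python
import math


def _norm_pdf(z: float) -> float:
    """Standard normal probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _norm_cdf(z: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def acquisition_value(acq_type, mean, std, y_max, kappa=2.576, xi=0.0):
    """Textbook acquisition value (hypothetical helper, not the maml API)."""
    if acq_type not in ("ucb", "ei", "poi"):
        raise ValueError(f"Unknown acquisition type: {acq_type}")
    if acq_type == "ucb":
        # Optimistic estimate: mean plus kappa standard deviations.
        return mean + kappa * std
    improvement = mean - y_max - xi
    if std <= 0.0:
        # Degenerate posterior: no uncertainty left at this point.
        if acq_type == "ei":
            return max(improvement, 0.0)
        return 1.0 if improvement > 0.0 else 0.0
    z = improvement / std
    if acq_type == "ei":
        return improvement * _norm_cdf(z) + std * _norm_pdf(z)
    return _norm_cdf(z)  # "poi"
```

Larger kappa or xi values push the search toward unexplored regions; smaller values favor points near the current best.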

maml.apps.bowsr.acquisition._trunc(values: ndarray, decimals: int = 3)

Truncate values to a given number of decimal places.

  • Parameters
    • values (np.ndarray) – Input array.
    • decimals (int) – Number of decimals to keep.
  • Returns Truncated array.
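Truncation differs from rounding in that digits past the cutoff are dropped rather than rounded. A minimal sketch of that idea, assuming truncation toward zero and using a hypothetical helper name (trunc_values) on a plain list instead of an ndarray:

```python
import math


def trunc_values(values, decimals=3):
    """Drop all digits beyond `decimals` decimal places (no rounding)."""
    factor = 10 ** decimals
    return [math.trunc(v * factor) / factor for v in values]


truncated = trunc_values([1.2349, -0.5678, 0.9999])
```

Here 0.9999 becomes 0.999, whereas rounding to three decimals would give 1.0.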

maml.apps.bowsr.acquisition.ensure_rng(seed: int | None = None)

Create a random number generator based on an optional seed. This can be an integer for a seeded rng or None for an unseeded rng.

maml.apps.bowsr.acquisition.lhs_sample(n_intervals: int, bounds: np.ndarray, random_state: RandomState)

Latin hypercube sampling.

  • Parameters
    • n_intervals (int) – Number of intervals.
    • bounds (ndarray) – Bounds for each dimension.
    • random_state (RandomState) – Random state.
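To make the sampling scheme concrete, here is a small standard-library sketch of Latin hypercube sampling: each dimension is split into n_intervals equal-width intervals, one point is drawn inside each interval, and the interval order is shuffled independently per dimension. The name lhs_sample_sketch, the list-of-tuples bounds format, and the use of random.Random in place of a NumPy RandomState are assumptions made for illustration.

```python
import random


def lhs_sample_sketch(n_intervals, bounds, rng):
    """Return n_intervals points; bounds is a list of (low, high) pairs."""
    columns = []
    for low, high in bounds:
        width = (high - low) / n_intervals
        # One uniformly placed value inside each equal-width interval.
        column = [low + (i + rng.random()) * width for i in range(n_intervals)]
        # Shuffle so interval indices are decoupled between dimensions.
        rng.shuffle(column)
        columns.append(column)
    # Transpose per-dimension columns into per-point rows.
    return [list(point) for point in zip(*columns)]


points = lhs_sample_sketch(5, [(0.0, 1.0), (-2.0, 2.0)], random.Random(0))
```

Compared with plain uniform sampling, every interval of every dimension is guaranteed to be covered exactly once.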

maml.apps.bowsr.acquisition.predict_mean_std(x: list | np.ndarray, gpr: GaussianProcessRegressor, noise: float)

Speed up the gpr.predict method by manually computing the kernel operations.

  • Parameters
    • x (list/ndarray) – Query point to be evaluated.
    • gpr (GaussianProcessRegressor) – A Gaussian process regressor fitted to known data.
    • noise (float) – Noise added to standard deviation if test target instead of GP posterior is sampled. 0 otherwise.

maml.apps.bowsr.acquisition.propose_query_point(acquisition, scaler, gpr, y_max, noise, bounds, random_state, sampler, n_warmup=10000)

Strategy used to find the maximum of the acquisition function. It combines cheap random sampling with the ‘L-BFGS-B’ optimization method: n_warmup points are first sampled at random, then L-BFGS-B is run from n_iter random starting points.

  • Parameters
    • acquisition – The acquisition function.
    • scaler – The Scaler used to transform features.
    • gpr (GaussianProcessRegressor) – A Gaussian process regressor fitted to known data.
    • y_max (float) – The current maximum target value.
    • noise (float) – The noise added to the acquisition function if noisy-based Bayesian optimization was performed.
    • bounds (ndarray) – The bounds of candidate points.
    • random_state (RandomState) – Random number generator.
    • sampler (str) – Sampler generating warmup points. “uniform” or “lhs”.
    • n_warmup (int) – Number of randomly sampled points to select the initial point for minimization.
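The warm-up stage of this strategy can be sketched with the standard library alone: draw n_warmup uniform points inside the bounds and keep the one with the highest acquisition value. The helper name best_warmup_point and the toy acquisition function are hypothetical, and the subsequent L-BFGS-B refinement (available, for example, through scipy.optimize.minimize with method="L-BFGS-B") is deliberately omitted.

```python
import random


def best_warmup_point(acquisition, bounds, n_warmup, rng):
    """Return the best of n_warmup uniformly sampled points and its value."""
    best_x, best_val = None, float("-inf")
    for _ in range(n_warmup):
        x = [rng.uniform(low, high) for low, high in bounds]
        val = acquisition(x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val


def toy_acquisition(x):
    """Toy surface with a single maximum at (0.3, -0.5)."""
    return -((x[0] - 0.3) ** 2) - (x[1] + 0.5) ** 2


x_best, val_best = best_warmup_point(
    toy_acquisition, [(-1.0, 1.0), (-1.0, 1.0)], 10000, random.Random(0)
)
```

The point found this way is only a coarse estimate; a local optimizer started from it would typically tighten the result.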

maml.apps.bowsr.optimizer module

Module implements the BayesianOptimizer.

class maml.apps.bowsr.optimizer.BayesianOptimizer(model: EnergyModel, structure: Structure, relax_coords: bool = True, relax_lattice: bool = True, use_symmetry: bool = True, use_scaler: bool = True, noisy: bool = True, seed: int | None = None, **kwargs)

Bases: object

Bayesian optimizer used to optimize the structure.

add_query(x: ndarray)

Add query point into the TargetSpace.

  • Parameters x (ndarray) – Query point.

as_dict()

Dict representation of BayesianOptimizer.

classmethod from_dict(d)

Reconstitute a BayesianOptimizer object from a dict representation of BayesianOptimizer created using as_dict().

  • Parameters d (dict) – Dict representation of BayesianOptimizer.

get_derived_structure(x: ndarray)

Get the derived structure.

  • Parameters x (ndarray) – The input of getting perturbed structure.

get_formation_energy(x: ndarray)

Calculate the formation energy of the perturbed structure. The absolute value is calculated for the practical purpose of maximizing the target function in Bayesian optimization.

  • Parameters x (ndarray) – The input of formation energy calculation.

get_optimized_structure_and_energy(cutoff_distance: float = 1.1)

  • Parameters cutoff_distance (float) – Cutoff for the shortest interatomic distance allowed in a reasonable structure. When cutoff_distance is 0, any structure is considered reasonable.

property gpr()

Returns the Gaussian Process regressor.

optimize(n_init: int, n_iter: int, acq_type: str = 'ei', kappa: float = 2.576, xi: float = 0.0, n_warmup: int = 1000, is_continue: bool = False, sampler: str = 'lhs', **gpr_params)

Optimize the coordinates and/or lattice of the structure by minimizing the model predicted formation energy of the structure. Model prediction error can be considered as a constant white noise.

  • Parameters
    • n_init (int) – The number of initial points.
    • n_iter (int) – The number of iteration steps.
    • acq_type (str) – The type of acquisition function. Three choices are available: ucb (upper confidence bound), ei (expected improvement), and poi (probability of improvement).
    • kappa (float) – Tradeoff parameter used in the upper confidence bound formalism.
    • xi (float) – Tradeoff parameter between exploitation and exploration.
    • n_warmup (int) – Number of randomly sampled points to select the initial point for minimization.
    • is_continue (bool) – Whether to continue the previous run without resetting the GPR.
    • sampler (str) – Sampler generating initial points. “uniform” or “lhs”.
    • **gpr_params – Passthrough.

propose(acquisition_function: AcquisitionFunction, n_warmup: int, sampler: str)

Suggest the next most promising point.

  • Parameters
    • acquisition_function (AcquisitionFunction) – AcquisitionFunction.
    • n_warmup (int) – Number of randomly sampled points to select the initial point for minimization.
    • sampler (str) – Sampler. Options are Latin hypercube sampling (“lhs”) and uniform sampling (“uniform”).

set_bounds(**bounds_parameter)

Set the bound values of the Wyckoff perturbation and lattice perturbation.

set_gpr_params(**gpr_params)

Set the parameters of internal GaussianProcessRegressor.

set_space_empty()

Empty the target space.

property space()

Returns the target space.

maml.apps.bowsr.optimizer.atoms_crowded(structure: Structure, cutoff_distance: float = 1.1)

Identify whether structure is unreasonable because the atoms are “too close”.

  • Parameters
    • structure (Structure) – Pymatgen Structure object.
    • cutoff_distance (float) – The minimum allowed atomic distance.
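The check described here amounts to comparing the shortest interatomic distance with the cutoff. A minimal sketch on Cartesian coordinates follows; the name atoms_crowded_sketch is hypothetical, and periodic images are ignored, which a real implementation working on a pymatgen Structure would need to account for.

```python
import itertools
import math


def atoms_crowded_sketch(cart_coords, cutoff_distance=1.1):
    """True if any pair of atoms is closer than cutoff_distance.

    Assumption: coordinates are Cartesian and periodicity is ignored.
    """
    return any(
        math.dist(a, b) < cutoff_distance
        for a, b in itertools.combinations(cart_coords, 2)
    )


too_close = atoms_crowded_sketch([(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)])
well_spaced = atoms_crowded_sketch([(0.0, 0.0, 0.0), (0.0, 0.0, 1.5)])
```

With cutoff_distance set to 0 the comparison can never succeed, so every structure passes the check.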

maml.apps.bowsr.optimizer.struct2perturbation(structure: Structure, use_symmetry: bool = True, wyc_tol: float = 0.0003, abc_tol: float = 0.001, angle_tol: float = 0.2, symprec: float = 0.01)

Get the symmetry-driven perturbation of the structure.

  • Parameters
    • structure (Structure) – Pymatgen Structure object.
    • use_symmetry (bool) – Whether to use constraint of symmetry to reduce parameters space.
    • wyc_tol (float) – Tolerance for coordinates determined by the Wyckoff symbol.
    • abc_tol (float) – Tolerance for lattice lengths determined by crystal system.
    • angle_tol (float) – Tolerance for lattice angles determined by crystal system.
    • symprec (float) – Tolerance for symmetry finding.
  • Returns
    • wps (list) – WyckoffPerturbations for derivation of symmetrically unique sites. Used to derive the coordinates of the sites.
    • indices (list) – Indices of symmetrically unique sites.
    • mapping (dict) – A mapping dictionary that maps equivalent atoms onto each other.
    • lp (LatticePerturbation) – LatticePerturbation for derivation of the lattice of the structure.

maml.apps.bowsr.perturbation module

Module implements the perturbation class for atomic and lattice relaxation.

class maml.apps.bowsr.perturbation.LatticePerturbation(spg_int_symbol: int, use_symmetry: bool = True)

Bases: object

Perturbation class for determining the standard lattice.

property abc: list[float]

Returns the lattice lengths.

property fit_lattice: bool

Returns whether the lattice fits any crystal system.

property lattice: Lattice

Returns the lattice.

sanity_check(lattice: Lattice, abc_tol: float = 0.001, angle_tol: float = 0.3)

Check whether the perturbation mode exists.

  • Parameters
    • lattice (Lattice) – Lattice in Structure.
    • abc_tol (float) – Tolerance for lattice lengths determined by crystal system.
    • angle_tol (float) – Tolerance for lattice angles determined by crystal system.

class maml.apps.bowsr.perturbation.WyckoffPerturbation(int_symbol: int, wyckoff_symbol: str, symmetry_ops: list[SymmOp] | None = None, use_symmetry: bool = True)

Bases: object

Perturbation class for determining the standard Wyckoff position and generating the corresponding equivalent fractional coordinates.

property fit_site()

Returns whether the site fits any standard Wyckoff position.

get_orbit(p: list | np.ndarray, tol: float = 0.001)

Returns the orbit for a point.

  • Parameters
    • p (list/numpy.array) – Point in fractional coordinates.
    • tol (float) – Tolerance for determining if sites are the same.
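The orbit of a point is the set of distinct images produced by applying every symmetry operation to it. The sketch below illustrates this with operations given as (rotation matrix, translation vector) pairs acting on fractional coordinates; that representation, the helper names, and the per-coordinate periodic comparison are assumptions standing in for the SymmOp objects the class actually receives.

```python
def apply_op(rotation, translation, p):
    """Apply one operation and wrap the result back into [0, 1)."""
    return [
        (sum(r * c for r, c in zip(row, p)) + t) % 1.0
        for row, t in zip(rotation, translation)
    ]


def same_site(p, q, tol):
    """Compare two fractional points, treating 0 and 1 as the same position."""
    for a, b in zip(p, q):
        d = abs(a - b) % 1.0
        if min(d, 1.0 - d) > tol:
            return False
    return True


def get_orbit_sketch(p, symmetry_ops, tol=0.001):
    """Collect the symmetry images of p that differ by more than tol."""
    orbit = []
    for rotation, translation in symmetry_ops:
        image = apply_op(rotation, translation, p)
        if not any(same_site(image, q, tol) for q in orbit):
            orbit.append(image)
    return orbit


identity = ([[1, 0, 0], [0, 1, 0], [0, 0, 1]], [0.0, 0.0, 0.0])
inversion = ([[-1, 0, 0], [0, -1, 0], [0, 0, -1]], [0.0, 0.0, 0.0])
general_orbit = get_orbit_sketch([0.1, 0.2, 0.3], [identity, inversion])
special_orbit = get_orbit_sketch([0.0, 0.5, 0.0], [identity, inversion])
```

A general position yields one image per operation, while a point on a special position maps onto itself and produces a shorter orbit.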

sanity_check(site: Site | PeriodicSite, wyc_tol: float = 0.0003)

Check whether the perturbation mode exists.

  • Parameters
    • site (PeriodicSite) – PeriodicSite in Structure.
    • wyc_tol (float) – Tolerance for coordinates determined by the Wyckoff symbol.

property site()

Returns the site.

standardize(p: list | np.ndarray, tol: float = 0.001)

Get the standardized position of p.

  • Parameters
    • p (list/numpy.array) – Point in fractional coordinates.
    • tol (float) – Tolerance for determining if sites are the same.

maml.apps.bowsr.perturbation.crystal_system(int_number: int)

Method for crystal system determination.

  • Parameters int_number (int) – International number of space group.
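The mapping from international space group number to crystal system follows fixed ranges, so it can be sketched as a simple lookup. The function name crystal_system_sketch and the lowercase string return values are assumptions; the actual function may use a different return format.

```python
def crystal_system_sketch(int_number: int) -> str:
    """Map a space group number (1-230) to its crystal system."""
    if not 1 <= int_number <= 230:
        raise ValueError("Space group number must be between 1 and 230.")
    # Upper bound of the space-group range for each crystal system.
    ranges = [
        (2, "triclinic"),
        (15, "monoclinic"),
        (74, "orthorhombic"),
        (142, "tetragonal"),
        (167, "trigonal"),
        (194, "hexagonal"),
        (230, "cubic"),
    ]
    for upper, name in ranges:
        if int_number <= upper:
            return name
    raise AssertionError("unreachable")


system = crystal_system_sketch(225)
```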

maml.apps.bowsr.perturbation.get_standardized_structure(structure: Structure)

Get standardized structure.

  • Parameters structure (Structure) – Pymatgen Structure object.

maml.apps.bowsr.perturbation.perturbation_mapping(x, fixed_indices)

Perturbation mapping.

  • Parameters
    • x
    • fixed_indices

Returns:

maml.apps.bowsr.preprocessing module

Module implements the scaler.

class maml.apps.bowsr.preprocessing.DummyScaler()

Bases: MSONable

Dummy scaler does nothing.

as_dict()

Serialize the instance into a dictionary.

fit(target: list | np.ndarray)

Fit the DummyScaler to the target.

  • Parameters target (ndarray) – The (mxn) ndarray. m is the number of samples, n is the number of feature dimensions.

classmethod from_dict(d)

Deserialize from a dictionary.

  • Parameters d (dict) – Dictionary containing the class initialization parameters.

inverse_transform(transformed_target: list | np.ndarray)

Inversely transform the target.

  • Parameters transformed_target (ndarray) – The (mxn) ndarray. m is the number of samples, n is the number of feature dimensions.

transform(target: list | np.ndarray)

Transform target.

  • Parameters target (ndarray) – The (mxn) ndarray. m is the number of samples, n is the number of feature dimensions.

class maml.apps.bowsr.preprocessing.StandardScaler(mean: list | np.ndarray | None = None, std: list | np.ndarray | None = None)

Bases: MSONable

StandardScaler follows the sklearn convention, with the addition of a dictionary representation.

as_dict()

Dict representation of StandardScaler.

fit(target: list | np.ndarray)

Fit the StandardScaler to the target.

  • Parameters target (ndarray) – The (mxn) ndarray. m is the number of samples, n is the number of feature dimensions.

classmethod from_dict(d)

Reconstitute a StandardScaler object from a dict representation of StandardScaler created using as_dict().

  • Parameters d (dict) – Dict representation of StandardScaler.

inverse_transform(transformed_target: ndarray)

Inversely transform the target.

  • Parameters transformed_target (ndarray) – The (mxn) ndarray. m is the number of samples, n is the number of feature dimensions.

transform(target: ndarray)

Transform target according to the mean and std.

  • Parameters target (ndarray) – The (mxn) ndarray. m is the number of samples, n is the number of feature dimensions.
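The fit/transform/inverse_transform cycle and the dictionary round trip can be illustrated with a small standard-library stand-in. The class name SimpleStandardScaler, the "mean" and "std" dictionary keys, the use of the population standard deviation, and the replacement of a zero standard deviation by 1.0 are all assumptions for this sketch; the real class is MSONable and its serialized layout may differ.

```python
import statistics


class SimpleStandardScaler:
    """Column-wise standardization on nested lists (illustrative only)."""

    def __init__(self, mean=None, std=None):
        self.mean = mean
        self.std = std

    def fit(self, target):
        columns = list(zip(*target))
        self.mean = [statistics.fmean(col) for col in columns]
        # Population std; a zero std is replaced by 1.0 to avoid division by zero.
        self.std = [statistics.pstdev(col) or 1.0 for col in columns]

    def transform(self, target):
        return [
            [(v - m) / s for v, m, s in zip(row, self.mean, self.std)]
            for row in target
        ]

    def inverse_transform(self, transformed_target):
        return [
            [v * s + m for v, m, s in zip(row, self.mean, self.std)]
            for row in transformed_target
        ]

    def as_dict(self):
        return {"mean": list(self.mean), "std": list(self.std)}

    @classmethod
    def from_dict(cls, d):
        return cls(mean=d["mean"], std=d["std"])


data = [[1.0, 10.0], [3.0, 30.0]]
scaler = SimpleStandardScaler()
scaler.fit(data)
scaled = scaler.transform(data)
restored = SimpleStandardScaler.from_dict(scaler.as_dict()).inverse_transform(scaled)
```

Rebuilding the scaler from its dictionary and inverting the transform recovers the original data, which is the property the as_dict()/from_dict() pair is meant to preserve.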

maml.apps.bowsr.target_space module

Module implements the target space.

class maml.apps.bowsr.target_space.TargetSpace(target_func: Callable, wps: list[WyckoffPerturbation], abc_dim: int, angles_dim: int, relax_coords: bool, relax_lattice: bool, scaler: StandardScaler | DummyScaler, random_state: RandomState)

Bases: object

Holds the perturbations of coordinates/lattice (x_wyckoff/x_lattice) – formation energy (Y). Allows for constant-time appends while ensuring no duplicates are added.

property bounds: ndarray

Returns the search space of parameters.

lhs_sample(n_intervals: int)

Latin hypercube sampling.

  • Parameters n_intervals (int) – Number of intervals.

property params: ndarray

Returns the parameters in target space.

probe(x)

Evaluate a single point x to obtain the value y, then record the pair as an observation.

  • Parameters x (ndarray) – A single point.

register(x, target)

Append a point and its target value to the known data.

  • Parameters
    • x (ndarray) – A single query point.
    • target (float) – Target value.
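Constant-time appends without duplicates are commonly achieved by keying a dict on a hashable form of each point. The sketch below shows that pattern with a hypothetical TinyTargetSpace class; raising KeyError on a repeated register call and returning the cached value from probe are assumptions about sensible behavior, not a description of TargetSpace itself.

```python
class TinyTargetSpace:
    """Stores (point, target) pairs and rejects duplicate points."""

    def __init__(self, target_func):
        self.target_func = target_func
        self.params = []
        self.target = []
        self._cache = {}

    @staticmethod
    def _hashable(x):
        # Tuples of floats can be used as dict keys; lists and arrays cannot.
        return tuple(float(v) for v in x)

    def register(self, x, target):
        key = self._hashable(x)
        if key in self._cache:
            raise KeyError(f"Point {key} is already registered.")
        self._cache[key] = target
        self.params.append(list(key))
        self.target.append(target)

    def probe(self, x):
        key = self._hashable(x)
        if key in self._cache:
            # Already evaluated: reuse the stored value.
            return self._cache[key]
        y = self.target_func(x)
        self.register(x, y)
        return y


calls = []


def toy_target(x):
    calls.append(list(x))
    return -sum(v * v for v in x)


space = TinyTargetSpace(toy_target)
first = space.probe([1.0, 2.0])
second = space.probe([1.0, 2.0])
```

The second probe of the same point does not call the target function again, which matters when each evaluation is an expensive energy prediction.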

set_bounds(abc_bound: float = 1.2, angles_bound: float = 5, element_wise_wyckoff_bounds: dict | None = None)

Set the bound values of the Wyckoff perturbation and lattice perturbation/volume perturbation.

set_empty()

Empty the params and target of the space.

property target: ndarray

Returns the target (i.e., formation energy) in target space.

uniform_sample()

Creates random points within the bounds of the space.

maml.apps.bowsr.target_space._hashable(x)

Ensure that a point is hashable by a Python dict.


© Copyright 2022, Materials Virtual Lab