ChemistryFeaturization.jl
Unified graph building and featurizing for Weave.jl, AtomicGraphNets.jl, and (maybe soon) more!
Documentation is starting to be built in the wiki!
This package is currently focused on bulk systems. For organic molecules, MolecularGraph is recommended. PubChem stores many molecular features for the compounds it catalogs, and that data can be accessed via PubChemCrawler.
Features
Graph-building and featurization from CIF files
- Build graphs (as SimpleWeightedGraphs) from CIF files, using PyCall to call pymatgen functions
- Visualization using GraphPlot: the visualize_graph function in the graph_functions.jl file draws both simpler and more complicated graphs, where the thickness of each connection indicates the weight of that edge in the graph (higher weights for nearer neighbors).
(NB: the syntax shown in the animation is slightly out of date; a new one is to come!)
- Flexible featurization (currently one-hot style) and decoding: choose which features to include, the level of discretization, etc., and directly decode feature vectors to check their values:
julia> features = Symbol.(["Group", "Row", "Block", "Atomic mass", "Atomic radius", "X"])
6-element Array{Symbol,1}:
 :Group
 :Row
 :Block
 Symbol("Atomic mass")
 Symbol("Atomic radius")
 :X

julia> atom_feature_vecs, featurization = make_feature_vectors(features)
[ Info: 16 elements were dropped so that all features are defined.

julia> decode_feature_vector(atom_feature_vecs["Si"], featurization)
Dict{Symbol,Any} with 6 entries:
  Symbol("Atomic mass") => (27.1071, 53.2064)
  Symbol("Atomic radius") => (0.955, 1.19)
  :Group => 14
  :Row => 3
  :Block => "p"
  :X => (1.684, 2.012)
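The graph-building bullets say that edge weights are higher for nearer neighbors. As a rough illustration of that idea only, here is a stdlib-only Julia sketch that turns a matrix of interatomic distances into a weight matrix. The inverse-square weighting, the default cutoff (assumed to be in Å), and the normalization are assumptions for illustration, and neighbor_weights is a hypothetical function, not part of this package's API.

```julia
# Hypothetical sketch: nearer neighbors get larger edge weights.
# Inverse-square weighting, the 8.0 cutoff (assumed Å), and scaling the
# largest weight to 1 are illustrative assumptions, not the package's method.

"""
    neighbor_weights(dists; cutoff = 8.0)

Given a square matrix of interatomic distances, return a weight matrix where
each pair closer than `cutoff` gets weight 1/d^2, rescaled so the largest
weight is 1. Self-pairs and pairs beyond the cutoff get weight 0.
"""
function neighbor_weights(dists::AbstractMatrix{<:Real}; cutoff::Real = 8.0)
    n = size(dists, 1)
    size(dists, 2) == n || throw(DimensionMismatch("distance matrix must be square"))
    w = zeros(Float64, n, n)
    for i in 1:n, j in 1:n
        d = dists[i, j]
        # skip the diagonal, zero distances, and pairs past the cutoff
        if i != j && 0 < d <= cutoff
            w[i, j] = 1 / d^2
        end
    end
    m = maximum(w)
    # rescale so the nearest pair has weight 1 (leave an all-zero matrix as is)
    return m > 0 ? w ./ m : w
end

# Usage: three atoms with pairwise distances 2, 3, and 4
dists = [0.0 2.0 4.0; 2.0 0.0 3.0; 4.0 3.0 0.0]
w = neighbor_weights(dists)
```

Here the closest pair (atoms 1 and 2) gets weight 1.0, and weights fall off with distance; a matrix like this could then be passed to a weighted-graph constructor such as the SimpleWeightedGraphs one.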
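In the decoding example, continuous features such as atomic mass come back as (lower, upper) pairs, which appear to be the bounds of the bin the value was discretized into before one-hot encoding. The Julia sketch below illustrates that encode/decode round trip with stdlib code only. The names bin_edges, onehot_encode, and onehot_decode are hypothetical, not this package's API, and evenly spaced bins are an assumption; the package may place its bins differently.

```julia
# Hypothetical sketch of one-hot binning and decoding for a continuous feature.
# Evenly spaced bins are an assumption for illustration.

"""
    bin_edges(lo, hi, nbins)

Return `nbins + 1` evenly spaced bin edges spanning [lo, hi].
"""
bin_edges(lo, hi, nbins) = collect(range(lo, hi; length = nbins + 1))

"""
    onehot_encode(x, edges)

Return a Bool vector with a single `true` marking the bin that contains `x`.
A value equal to the last edge falls into the last bin.
"""
function onehot_encode(x::Real, edges::AbstractVector{<:Real})
    nbins = length(edges) - 1
    edges[1] <= x <= edges[end] || throw(DomainError(x, "value outside binned range"))
    # index of the last edge <= x, clamped so x == edges[end] maps to the final bin
    i = min(searchsortedlast(edges, x), nbins)
    v = falses(nbins)
    v[i] = true
    return v
end

"""
    onehot_decode(v, edges)

Return the (lower, upper) edges of the bin marked in one-hot vector `v`.
"""
function onehot_decode(v::AbstractVector{Bool}, edges::AbstractVector{<:Real})
    count(v) == 1 || throw(ArgumentError("expected exactly one hot entry"))
    i = findfirst(v)
    return (edges[i], edges[i + 1])
end

# Usage: four bins over [0, 100]; 28.09 (roughly Si's atomic mass) lands in bin 2
edges = bin_edges(0.0, 100.0, 4)
v = onehot_encode(28.09, edges)
onehot_decode(v, edges)  # (25.0, 50.0)
```

Decoding can only recover the bin, not the original value, which is why a coarser discretization gives wider decoded ranges.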
SMILES input
Sean to add...
Requirements
- Julia 1.4+
- packages listed in Project.toml
- In addition, PyCall must have access to the pymatgen package, which can be added using Conda.jl as Conda.add("pymatgen"; channel="conda-forge"), as well as to the rdkit package (Conda.add("rdkit"; channel="conda-forge")).
Future Plans:
- "hybrid" featurizations using features from multiple paradigms if available
- more input file formats? e.g. SELFIES