In the Materials Virtual Lab, our goal is to do ground-breaking materials research. Developing state-of-the-art software frameworks for materials analysis and high-throughput first-principles computations is at the heart of what we do. We also believe that open-source development leads to the sharing of best practices and to more robust materials science analyses and software.
The Materials Virtual Lab is also a key partner of the Materials Project. We are the lead developers of the Python Materials Genomics (pymatgen) materials analysis library, the pymatgen-db MongoDB plugin and the Custodian error detection and management framework. We are also co-developers of the FireWorks scientific workflow software. These packages are described below.
Workshop on MAVRL’s scientific software stack
In 2014, Prof Shyue Ping Ong, LBL Staff Scientist Dr Anubhav Jain and Zhi Deng conducted a half-day workshop on the software stack used by the Materials Virtual Lab. The slides, available through SlideShare below, provide an introduction to the software stack as well as useful usage examples.
Pymatgen (Python Materials Genomics) is a robust, open-source Python library for materials analysis. It currently powers the public Materials Project (http://www.materialsproject.org), an initiative to make calculated properties of all known inorganic materials available to materials researchers. These are some of the main features:
- Highly flexible classes for the representation of Element, Site, Molecule, Structure objects.
- Extensive I/O capabilities to manipulate many VASP (http://cms.mpi.univie.ac.at/vasp/) and ABINIT (http://www.abinit.org/) input and output files and the crystallographic information file (CIF) format. This includes generating Structure objects from VASP input and output. There is also support for Gaussian input files and XYZ files for molecules.
- Comprehensive tool to generate and view compositional and grand canonical phase diagrams.
- Electronic structure analyses (DOS and Bandstructure).
- Integration with the Materials Project REST API.
Find out more at the official pymatgen homepage.
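To make the idea of Site and Structure representations more concrete, here is a minimal standard-library sketch. It is not pymatgen's API: the `ToySite` and `ToyStructure` classes, their methods and the 4.0 Å lattice parameter are all hypothetical, chosen only to illustrate how fractional coordinates, a lattice and a composition relate to one another.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple

Vector = Tuple[float, float, float]


@dataclass
class ToySite:
    """A species placed at fractional coordinates (hypothetical class)."""
    species: str
    frac_coords: Vector


@dataclass
class ToyStructure:
    """A periodic cell: three lattice vectors (as rows) plus a list of sites."""
    lattice: Tuple[Vector, Vector, Vector]
    sites: List[ToySite]

    def cart_coords(self, site: ToySite) -> Vector:
        # Cartesian position = sum over i of frac_i * lattice_vector_i.
        return tuple(
            sum(site.frac_coords[i] * self.lattice[i][k] for i in range(3))
            for k in range(3)
        )

    def composition(self) -> Counter:
        # Count how many sites each species occupies in the cell.
        return Counter(site.species for site in self.sites)

    def volume(self) -> float:
        # Cell volume is |a . (b x c)|.
        a, b, c = self.lattice
        cross = (
            b[1] * c[2] - b[2] * c[1],
            b[2] * c[0] - b[0] * c[2],
            b[0] * c[1] - b[1] * c[0],
        )
        return abs(sum(a[k] * cross[k] for k in range(3)))


# Cubic two-atom cell with an illustrative (not measured) 4.0 Angstrom edge.
toy_cell = ToyStructure(
    lattice=((4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 4.0)),
    sites=[ToySite("Cs", (0.0, 0.0, 0.0)), ToySite("Cl", (0.5, 0.5, 0.5))],
)
```

Calling `toy_cell.cart_coords(toy_cell.sites[1])` converts the body-centre site to Cartesian coordinates, and `toy_cell.composition()` gives the per-cell species counts. The real library offers far richer versions of these objects, so treat this only as a conceptual picture.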
Pymatgen-db is a database add-on for the Python Materials Genomics (pymatgen) materials analysis library. It enables the creation of Materials Project-style MongoDB databases for management of materials data. A query engine is also provided to enable the easy translation of MongoDB docs to useful pymatgen objects for analysis purposes. Find out more at the pymatgen-db page.
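The idea of translating stored documents into analysis objects can be sketched without a database. The example below is an assumption-laden illustration, not the pymatgen-db query engine: `ToyEntry`, `query`, the field names and the energy values are all hypothetical placeholders, and plain dictionaries stand in for MongoDB documents.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Iterator


@dataclass
class ToyEntry:
    """Hypothetical analysis object built from one stored document."""
    formula: str
    energy_per_atom: float

    @classmethod
    def from_doc(cls, doc: Dict[str, Any]) -> "ToyEntry":
        # Pick out only the fields the analysis code needs.
        return cls(formula=doc["formula"], energy_per_atom=doc["energy_per_atom"])


def query(docs: Iterable[Dict[str, Any]], criteria: Dict[str, Any]) -> Iterator[ToyEntry]:
    """Yield an object for every document whose fields equal the criteria."""
    for doc in docs:
        if all(doc.get(key) == value for key, value in criteria.items()):
            yield ToyEntry.from_doc(doc)


# In-memory stand-ins for database documents; the energies are placeholders,
# not calculated values.
docs = [
    {"formula": "Li2O", "energy_per_atom": -1.0, "task_type": "relax"},
    {"formula": "Fe2O3", "energy_per_atom": -2.0, "task_type": "relax"},
    {"formula": "Fe2O3", "energy_per_atom": -2.5, "task_type": "static"},
]

entries = list(query(docs, {"formula": "Fe2O3"}))
lowest = min(entries, key=lambda entry: entry.energy_per_atom)
```

The point of such a layer is that analysis code works with typed objects (`lowest.energy_per_atom`) rather than with raw nested dictionaries.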
Custodian is a simple, robust and flexible just-in-time (JIT) job management framework written in Python. Using custodian, you can create wrappers that perform error checking, job management and error recovery. It has a simple plugin framework that allows you to develop specific job management workflows for different applications.
Error recovery is an important aspect of many high-throughput projects that generate data on a large scale. When you are running on the order of hundreds of thousands of jobs, even an error rate of 1% would mean thousands of failed jobs, which would be impossible to deal with on a case-by-case basis.
The specific use case for custodian is for long running jobs, with potentially random errors. For example, there may be a script that takes several days to run on a server, with a 1% chance of some IO error causing the job to fail. Using custodian, one can develop a mechanism to gracefully recover from the error, and restart the job with modified parameters if necessary.
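The recover-and-restart pattern described above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not Custodian's actual API: `TransientIOError`, `toy_job`, `halve_buffer`, `run_with_recovery` and the `buffer_size` parameter are all hypothetical names, and the "job" fails deterministically so the behaviour is easy to follow.

```python
from typing import Any, Callable, Dict, Tuple

Params = Dict[str, Any]


class TransientIOError(Exception):
    """Hypothetical error standing in for an I/O failure during a long job."""


def toy_job(params: Params) -> str:
    # Pretend the job only succeeds once the buffer is small enough.
    if params["buffer_size"] > 1024:
        raise TransientIOError(f"I/O failure at buffer_size={params['buffer_size']}")
    return f"finished with buffer_size={params['buffer_size']}"


def halve_buffer(params: Params) -> Params:
    # Correction step: return modified parameters without mutating the input.
    return dict(params, buffer_size=params["buffer_size"] // 2)


# Map each known error type to the correction that should be applied.
HANDLERS: Dict[type, Callable[[Params], Params]] = {TransientIOError: halve_buffer}


def run_with_recovery(
    job: Callable[[Params], str],
    params: Params,
    handlers: Dict[type, Callable[[Params], Params]],
    max_errors: int = 5,
) -> Tuple[str, int]:
    """Run job, applying a matching correction and restarting after each error."""
    corrections = 0
    while True:
        try:
            return job(params), corrections
        except Exception as exc:
            handler = handlers.get(type(exc))
            # Give up on unknown errors or once the error budget is spent.
            if handler is None or corrections >= max_errors:
                raise
            params = handler(params)
            corrections += 1


result, n_corrections = run_with_recovery(toy_job, {"buffer_size": 4096}, HANDLERS)
```

Here the job is restarted twice with progressively smaller buffers before it completes. Capping the number of corrections matters in practice: without a limit, a job with an unrecoverable problem would be restarted forever.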
The current version of Custodian also comes with two sub-packages for error handling for the Vienna Ab Initio Simulation Package (VASP) and NWChem calculations. Find out more at the custodian page.
FireWorks is a free, open-source code for defining, managing, and executing scientific workflows, authored by Dr Anubhav Jain of Lawrence Berkeley National Laboratory. It can be used to automate calculations over arbitrary computing resources, including those that have a queueing system. Find out more at the official FireWorks page.
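At its core, a scientific workflow is a set of tasks with dependencies that must run in a valid order. The sketch below illustrates only that ordering idea using Python's standard `graphlib` module (Python 3.9+). It does not use FireWorks or reflect its API: `run_workflow`, the task names and the task functions are hypothetical, and real workflow software adds persistence, queue submission and failure handling on top.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set, Tuple

Results = Dict[str, str]
Task = Callable[[Results], str]


def run_workflow(
    tasks: Dict[str, Task], depends_on: Dict[str, Set[str]]
) -> Tuple[List[str], Results]:
    """Run each task after all of its dependencies, passing earlier results along."""
    results: Results = {}
    order: List[str] = []
    # static_order() yields every task only after the tasks it depends on.
    for name in TopologicalSorter(depends_on).static_order():
        results[name] = tasks[name](results)
        order.append(name)
    return order, results


# Hypothetical three-step calculation chain: relax -> static -> bands.
tasks: Dict[str, Task] = {
    "relax": lambda results: "relaxed structure",
    "static": lambda results: f"energy of {results['relax']}",
    "bands": lambda results: f"band structure from {results['static']}",
}
depends_on: Dict[str, Set[str]] = {
    "relax": set(),
    "static": {"relax"},
    "bands": {"static"},
}

order, results = run_workflow(tasks, depends_on)
```

Declaring dependencies as data, rather than hard-coding the call order, is what lets a workflow manager decide when each step is ready to run.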