“Automatic GROMACS” (a.k.a. AUTOMACS or just amx for short) is a biophysics simulation pipeline designed to help you run scalable, reproducible, and extensible simulations using the popular GROMACS integrator. The documentation walkthrough below describes how the codes work (and how you should use them). The components section describes the codes available in this particular copy of automacs, including any extensions you may have downloaded.
Everything but the kitchen sink can be found by searching the index. Before starting the walkthrough, you must collect additional automacs modules by running the following setup command with the `all` recipe. If this is your first time running automacs on a particular computer, you should also configure the GROMACS paths by running `make gromacs_config`.
make gromacs_config local
make setup all
The following sections explain how you can interact with the codes and outline the relatively minimal set of design constraints required for extending the codes to new use cases. The components section at the end includes the “live” documentation for your extension modules.
1 Concept¶
Automacs is a set of python codes which prepares molecular simulations using common tools orbiting the popular GROMACS integrator. The purpose of this project is to ensure that simulations are prepared according to a standard method which also bundles simulation data with careful documentation. Automacs (hopefully) makes it possible to generate large simulation datasets with a minimum of description and so-called “manual” labor which invites mistakes and wastes time. Automacs codes are meant to be cloned once for each simulation rather than installed in a central location. This means that each simulation has a copy of the code used to create it. The codes are extremely modular, and users can share novel experiments using git and the automacs configuration.
1.1 “Overloaded Python”¶
High-level programming languages often rely on functions which can accept many different kinds of input while producing a consistent result that matches our intuition. This is called overloading. The automacs codes are overloaded in two ways. First, simulation data files and directories for different procedures are organized in a uniform way. These file-naming conventions are described in the framework. Users who follow these rules can benefit from generic functions that apply to many different simulation types. For example, performing restarts or ensemble changes in GROMACS uses a single generic procedure, regardless of whether you are doing atomistic or coarse-grained simulations. Second, the procedure codes are organized to reflect the consistent naming conventions so that they can be used in as many situations as possible. The simulation-specific settings are separated from the generic, modular steps required to build a simulation so that users can simulate a variety of different systems without rewriting any code. In the next section, we will describe how this separation happens.
1.2 Procedures¶
Automacs executes code in an extremely straightforward way: users first request an experiment, and then they run it. After you clone automacs, you can run a simulation with a single make command; the automacs interface consists only of `make` commands and a set of customizations written to simple text files which we will explain shortly. In the following example, we choose to run the `protein` experiment.
make go protein clean
The `make go` command does three things: it clears the data, prepares a new script, and then runs it. We always start by cleaning up the data from a previous run, since all useful data should be archived in a completed copy of automacs. Passing the `clean` flag to `make go` cleans up any old data by calling `make clean sure` for you. The `make prep` command lists all of the available experiments, which are detected according to instructions in the configuration. When you add extra modules to automacs, they typically come with new experiments, which means that `make prep` returns a long list.
MENU
│
├──quick
│ ├── 12.. clear_lipidome
│ ├── 14.. generate_charmm_landscape
│ ├── 15.. generate_lipidome_restraints
│ ├── 16.. generate_lipidome_structures
│ ├── 21.. table
│ ├── 27.. vmd_cgmd_bilayer
│ ├── 28.. vmd_original_for_reference
│ └── 29.. vmd_protein
├──metarun
│ ├── 1... bilayer288
│ ├── 4... bilayer_control_flat_multiply
│ ├── 5... bilayer_control_multiply
│ ├── 6... bilayer_protein_aamd_banana
│ ├── 7... bilayer_protein_aamd_banana_ion_change
│ ├── 13.. enth-martini-demo
│ ├── 17.. lipidome
│ ├── 22.. test-h0
│ ├── 23.. test_helix0_flat
│ ├── 24.. trialanine-demo
│ ├── 25.. ultra
│ └── 26.. ultra2
└──run
├── 2... bilayer_control
├── 3... bilayer_control_flat
├── 8... bilayer_protein_adhesion
├── 9... bilayer_protein_adhesion_aamd
├── 10.. bilayer_protein_topology_only
├── 11.. bilayer_release
├── 18.. martinize
├── 19.. multiply
└── 20.. protein
An experiment identifies the script you wish to run (we sometimes call these “procedures” or alternately “parent scripts”) and how the simulation should be customized. In the example above, we choose the `protein` experiment, which serves as a demonstration of a simple protein-in-water simulation and requires very few extra codes. The `make go` command above calls `make prep protein`, which finds the right procedure script and copies it to `script.py` in the root folder. It also collects the customizations and writes them to an experiment file called `expt.json`, which will be discussed in the next section. The menu of experiments shown above indicates that `protein` is a “run”. This is the standard experiment style; however, we can also construct a “metarun”, which is a sequence of standard procedures, or a “quick” script, which is a very short piece of code. These will be outlined in the experiments section.
New users who wish to see how automacs works can run e.g. `make clean && make go protein` or `make go protein clean` (the latter does not ask for confirmation before deleting data). While this runs, you can take a look at `script.py` to see what the experiment looks like. These scripts always call on the customizations found in individual experiments (like `protein`), which can be viewed in three places. The experiment file `amx/proteins/protein_expts.py` is the source which generates the `expt.json` with a few extra parameters; you can view this file from the terminal, and it is also included in this documentation in the components section. You can also run `make look`, which starts a python terminal holding the `state` variable, which you can read directly (it's a dictionary, but you can use the dot operator like a class to look at e.g. `state.step`). Of these three options, the experiment file is the only place where you should change the parameters. We have combined everything into one step using `make go` to simplify things; however, automacs has a fairly minimal interface, and users can run the automacs scripts with only an `expt.json` file and the associated python modules. Everything else is syntactic sugar.
If you want to skip the sugar and run the codes directly, you can use `make prep protein` to prepare the `expt.json` and `script.py` files and then simply run `python script.py`. If everything is in order, the simulation will run to completion. In this basic use case, automacs has simply organized and executed some code for you. In practice, only the most mature codes run perfectly the first time. To make development easier, and to save a record of everything automacs does, we use `make run` to supervise the execution of `script.py`. We will explain this in detail in the supervised execution section below.
Using automacs is as simple as choosing an experiment, customizing it, and then running it with `make go`. The best practice is to copy and rename an experiment before changing it, so that you don't lose track of which experiments work and which ones still need fine-tuning.
1.2.1 Procedure scripts¶
Procedure scripts (sometimes we call these “parent scripts”) are standard python scripts which must only import a single package into the global namespace.
from amx import *
Using `import *` may be somewhat un-Pythonic, however it allows our scripts to read like an entry in a lab notebook for running a computational experiment, and it generally makes them much more concise. The automacs import scheme does a lot of bookkeeping work for you behind the scenes. It reads the experiment, imports required modules that are attached to your local copy of automacs, and also ensures that all of your codes (functions, classes, etc.) have access to a namespace variable called `state`. This dictionary variable (along with its partners `expt` and `settings`, discussed later) effectively solves the problem of passing information between functions. Any function can read or write to the state, which is carefully passed to new codes and written to disk when the simulation is completed.
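As a sketch of this pattern (the function and key names below are invented for illustration; `state` is the shared namespace distributed by the import scheme):

def build_box():
    # any imported function can record results in the shared state
    state.n_waters = 12000

def summarize():
    # any other function can read them later, with no arguments passed
    print('the system has %d waters'%state.n_waters)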
The most typical script is called `protein.py` and generates an atomistic protein-in-water simulation.
#!/usr/bin/env python
"""
PROTEIN SIMULATION
Atomistic protein in water.
"""
from amx import *
init()
make_step(settings.step)
write_mdp()
if state.pdb_source: get_pdb(state.pdb_source)
else: get_start_structure(state.start_structure)
remove_hetero_atoms(
structure='start-structure.pdb',
out='start-structure-trim.pdb')
gmx('pdb2gmx',
base='vacuum',
structure='start-structure-trim.pdb',
gro='vacuum-alone',
water=settings.water,
ff=settings.force_field,
log='pdb2gmx')
copy_file('system.top','vacuum.top')
extract_itp('vacuum.top')
write_top('vacuum.top')
gmx('editconf',
structure='vacuum-alone',
gro='vacuum',
c=True,d='%.2f'%settings.water_buffer,
log='editconf-vacuum-room')
minimize('vacuum',method='steep')
solvate_protein(
structure='vacuum-minimized',
top='vacuum.top')
minimize('solvate')
counterions(
structure='solvate-minimized',
top='solvate',
ff_includes='ions')
minimize('counterions')
write_structure_pdb(
pdb='start-structure.pdb',
structure='counterions')
write_top('system.top')
equilibrate()
finished(state)
As long as your procedure script leads off with `from amx import *` (or alternately `import amx`), the import magic will import the core automacs functions (which also loads GROMACS) along with any extension modules you request, and will distribute the `state` to all of them. The remainder of the script is just a sequence of functions that generate new configurations, run inputs, and all the assorted paraphernalia for a typical simulation.
1.2.2 Functions¶
The individual functions in an automacs-style procedure typically perform a single, specific task that a user might otherwise perform at the terminal. Some functions can be used to copy files, write topologies, or execute the GROMACS integrator.
One of the most useful functions is called `minimize`, which automates energy minimization in GROMACS: it takes a configuration file (and its topology), generates run inputs, and executes the GROMACS integrator via `mdrun`.
def minimize(name,method='steep',top=None):
"""
Energy minimization procedure.
Minimize a structure found at `name.gro` with topology
specified by the keyword argument `top` (otherwise `name.top`)
according to inputs found in input-<method>-in.mdp and ideally
prepared with :meth:`write_mdp <amx.automacs.write_mdp>`.
Writes output files to `em-<name>-<method>` and writes a
final structure to `<name>-minimized.gro`
"""
gmx('grompp',base='em-%s-%s'%(name,method),
top=name if not top else re.sub('^(.+)\.top$',r'\1',top),
structure=name,log='grompp-%s-%s'%(name,method),
mdp='input-em-%s-in'%method,skip=True)
tpr = state.here+'em-%s-%s.tpr'%(name,method)
if not os.path.isfile(tpr):
raise Exception('cannot find %s'%tpr)
gmx('mdrun',
base='em-%s-%s'%(name,method),
log='mdrun-%s-%s'%(name,method))
shutil.copyfile(
state.here+'em-'+'%s-%s.gro'%(name,method),
state.here+'%s-minimized.gro'%name)
The minimize function has straightforward inputs and outputs, but it also makes use of `state.here`, which holds the path to the current step (a folder) for this simulation. Note that most simulations only require a single step, whereas multi-step procedures might use a handful of steps. The function also expects to find an `mdp` file with the appropriate name, and hence implicitly relies on another function called `write_mdp` to prepare these files. Most functions work this way, so that they can be easily used in many situations. Ideally their docstrings, which are collected in the documentation index, should explain the correct inputs and outputs.
1.2.3 Supervised execution¶
Robust simulation procedures can always be run with `python script.py` once they are prepared; however, automacs includes a useful “supervision” feature that provides several advantages, which are particularly useful for developing code.
- The shared namespace called `state` is saved to a file called `state.json` when the job is complete. All functions that are imported by automacs are decorated with a function that logs their execution to the `state.history` variable.
- Errors are logged to special variables inside of the `state` so that user-developers can correct errors and continue the experiment from the last successful step. The code makes use of Python's internal syntax parser in order to find the earliest change in your code. This is particularly useful when you are adding steps to a procedure which is still under development, because it means that you don't have to repeat the earlier steps. Even if the procedure script located at `script.py` doesn't change, automacs still knows where to continue execution without repeating itself.
- In the event that users wish to “chain” together a sequence of multiple discrete simulation steps, automacs can look back to completed steps (with states saved to e.g. `state_1.json`) in order to access important details about the simulation, including its geometry and composition. Chaining multiple steps requires a “metarun” procedure and uses the alternate `make metarun` command instead of `make run`, but otherwise the execution is the same. The no-repetition feature described in the second item also works when chaining steps together as part of a “metarun”.
Since the second feature above (which we call “iterative reexecution”) is aimed at developers, it is hidden from the user and happens automatically when you repeat a failed simulation. That is, simulations will automatically continue from a failed state when you run `make run` after an error.
The more important function of the shared namespace is that all parameters are automatically available to all imported functions via the `state` dictionary. This saves users from writing interfaces between functions, and it also provides a snapshot of your simulation in `state.json` when it is completed. This is explained further in the settings blocks documentation below.
The remainder of the documentation covers the GROMACS- and automacs-specific Configuration, the command-line interface, and the general Framework for organizing the data. The last part of the documentation, titled components, also provides a “live” snapshot of the documentation for extension modules.
2 Configuration¶
Automacs clearly divides experiment parameters from settings required to run the code. The scientific parameters are placed in experiment files and are designed to be portable. Software locations, hardware settings, and the precise configuration of your copy of automacs are set in two places: one is specific to GROMACS, and the other configures automacs.
2.1 GROMACS¶
Automacs needs to find GROMACS executables at the terminal in order to run your simulations. If you install a single copy of GROMACS for all users, then the default configuration will suffice, but either way, automacs will look for a GROMACS configuration file in one of two locations.
Running `make prep` for the first time causes automacs to check for the configuration. If it can't find one, it throws an error and asks you to run the configure script. If you run `make gromacs_config home`, it will write the example configuration file to a hidden file in your home directory at `~/.automacs.py`. You can override the global configuration with a local one, written to `./gromacs_config.py`, by running `make gromacs_config local` or by copying the file to the automacs root directory yourself. We recommend setting up a global configuration for a particular system and using local copies to customize particular simulations that might require more or less computing power.
These configuration files consist of a single dictionary called `machine_configuration` with keys that should correspond to a portion of the hostname. Any key that uniquely matches the hostname provides the configuration for the simulation (otherwise the `LOCAL` key provides the default configuration). The following example includes an entry for a popular supercomputer in Texas called `stampede`.
machine_configuration = {
#---the "LOCAL" machine is default, protected
'LOCAL':dict(
gpu_flag = 'auto',
),
'stampede':dict(
gmx_series = 5,
#---refer to a string that contains the PBS header flags
cluster_header = stampede_header,
ppn = 16,
walltime = "24:00",
nnodes = 1,
#---many systems use the "_mpi" suffix on the executables
suffix = '',
#---many systems will only run GROMACS binaries through mpi
mdrun_command = '$(echo "ibrun -n NPROCS -o 0 mdrun_mpi")',
allocation = 'ALLOCATION_CODE_HERE',
submit_command = 'sbatch',
),
}
Users can customize the number of processors per node (`ppn`), the number of nodes (`nnodes`), allocation codes, and even the batch submission command so that these jobs run properly on many different machines. These parameters are packaged into a `cluster-continue.sh` file within each step directory when users run `make cluster` on their supercomputing platform. The default configuration provided by `make gromacs_config local` includes a few useful examples. Users can submit `cluster-continue.sh` directly to the queue to continue their jobs. The `extend` and `until` parameters in the machine configuration set the number of additional or total picoseconds to run for; otherwise the jobs will consume all of the available walltime and gently stop just before it's up.
Since each cluster typically has its own PBS header format, users can place these in text files (e.g. `stampede_header` above). Automacs will automatically replace any capitalized text in these headers with the value corresponding to that key in the `machine_configuration` dictionary. For example, the `nnodes = 1` setting causes `NNODES` in the `stampede_header` to be replaced with the number `1`. This replacement strategy makes it easy to choose a specific configuration for new jobs, or to set the default configuration on a machine using `make gromacs_config home` once, without having to bother with it when you create new simulations.
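To illustrate the replacement rule, here is a hypothetical header string of the kind that `cluster_header` might point to. The scheduler flags shown here are illustrative and not part of automacs; the point is that each fully capitalized token matching a `machine_configuration` key is swapped for that key's value before the header is written to `cluster-continue.sh`.

#---a hypothetical cluster header; capitalized tokens match machine_configuration keys
stampede_header = """#!/bin/bash
#SBATCH --nodes=NNODES
#SBATCH --ntasks-per-node=PPN
#SBATCH --time=WALLTIME
#SBATCH -A ALLOCATION
"""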
2.1.1 Versions¶
Many systems have multiple copies of GROMACS which can be loaded or unloaded using environment modules. To load a certain module, you can add it to a string or list in the `module` key in the `machine_configuration`. You can also add `module` commands to the cluster header scripts described above.
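For example, a hypothetical entry might request an environment module alongside the other settings (the hostname and module name below are illustrative):

machine_configuration = {
	'mycluster':dict(
		#---request an environment module before running GROMACS (name illustrative)
		module = 'gromacs/5.1.2',
		ppn = 16,
		),
	}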
2.2 Automacs¶
Automacs can use a number of different extension modules, which can be easily shared with other users by packaging them in git repositories. Most users will opt to automatically clone several extensions at once using the setup procedure described below. Individual extensions can also be added directly to a copy of automacs using a simple command which manipulates the local `config.py` file. This file describes all of the paths that automacs uses, so you are free to store your codes wherever you like. Extensions must be added from git repositories using the `make set` utility, which writes `config.py`.
make set module ~/path/to/extension.git local/path/to/extension
make set module source="https://github.com/username/extension" spot="inputs/path/to/extension"
The `spot` argument is unconstrained; you can store the codes anywhere within the root directory. We prefer to put minor extensions in the `inputs` folder and major extensions directly in the root directory. The `config.py` file will change as you add modules and interface functions. A local copy of your `config.py` is rendered here as part of the live documentation of this copy of automacs.
2.2.1 Setup¶
At the top of the documentation we recommend that users run the `make setup` command. Running e.g. `make setup all` will pull all of the standard modules from their internet sources (typically github, however private repositories are also allowed as long as you use ssh aliases).
Cloning the `proteins` code repository (part of the `protein` and `all` collections) gives you access to the `protein` experiment listed under `make prep`, along with a few other experiments. We recommend using this as an example to get familiar with the automacs framework. Running `make go protein reset` after the setup will run a simulation of the villin headpiece. The starting structure is set by the `pdb_source` key in the protein experiment file. All of the experiment files can be viewed in this documentation by reading the experiments subsections of the components list.
2.2.2 Config file¶
This setup script clones a copy of automacs and generates an initial copy of `config.py` with the bare minimum settings. It then uses `make set` to add extension modules, and to point the code to two command-line interface modules, found in `amx/cli.py` and `inputs/docs/docs.py`, using `make set commands`. The latter is responsible for compiling this documentation and is written to take advantage of the makefile interface.
2.2.3 Paths¶
The `config.py` file describes the rules for finding experiments. Since many extensions may provide many different standalone experiments and test sets, you may have a large list of experiments. Rather than requiring that each experiment has its own file, you can organize multiple experiments into one experiment file. Automacs finds these files according to the `inputs` item in `config.py`. This can be a single string with a path to your experiments file, or a list of paths; any path can contain wildcards. For the most flexibility, you can also set `inputs` to `'@regex^.*?_expts\\.py$'`, where everything after `@regex` is a regular expression for matching any file in the automacs subtree. In this example, we require all experiment files to be named e.g. `my_experiment_group_expts.py`.
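As a sketch, the `inputs` item in a hypothetical `config.py` dictionary might take either form (the paths are illustrative):

#---a list of paths, with wildcards allowed
'inputs':['amx/proteins/*_expts.py','inputs/*/*_expts.py'],
#---or a single regular expression matching experiment files anywhere in the tree
'inputs':'@regex^.*?_expts\\.py$',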
2.2.4 Interfaces¶
Another item in the `config.py` dictionary is called `commands`. It provides explicit paths to python scripts containing command-line interface functions described in the interface section.
2.2.5 Bulky inputs¶
Starting simulations often requires starting configurations such as a protein crystal structure or the initial configuration for a polymer or bilayer. These files tend to be large, and should not be packaged alongside code. You can always place them in their own extension module.
2.3 “Live” Docs¶
This documentation uses the modules list in `config.py` to include the automatic documentation of any extension modules alongside this walkthrough. These are listed in the components section below. Some extensions may only include starting structures or raw data, in which case their documentation will be blank. This scheme ensures that adding codes to your copy of automacs makes it easy to read the accompanying documentation. Each copy of the documentation also serves as a “live” snapshot of the available codes.
3 Interface¶
Automacs execution is controlled almost entirely by text files which hold experiments. There are a few commands that run the experiments, executed by a very peculiar, overloaded `Makefile` which routes user commands to the appropriate python codes using the `makeface` functions. We use this scheme because make is ubiquitous on many computer systems, it often includes automatic completion, and it's easy to remember. The interface is extremely generic: almost any python function can be exposed to the interface. To see which `make` sub-commands are available, simply run `make` without arguments.
make targets
│
├──back
├──clean
├──cluster
├──config
├──docs
├──download
├──flag_search
├──gitcheck
├──gitpull
├──go
├──gromacs_config
├──layout
├──locate
├──look
├──metarun
├──notebook
├──prep
├──prep?
├──qsub
├──quick
├──run
├──set
├──setup
├──upload
└──watch
3.1 Commands¶
As we discussed in the procedures section, users can run experiments using `make go`. To see which experiments are available, use `make prep`, which lists them. These are the two most important commands. The `make go` command will run `make clean sure` if you send it the `reset` flag (this will clear any old data from the directory, so be careful) and then immediately run one of the execution commands, depending on the type of experiment: `make run`, `make metarun`, or `make quick`.
3.2 Additions¶
In the configuration section we mentioned that the `commands` key in `config.py` tells automacs where to find new functions. Any paths set in the `commands` list are scanned by the `makeface` module for python functions. These function names become sub-commands for `make`.
Arguments are passed from the `Makefile` to the python code according to a few simple rules. The functions cannot use `*args` or `**kwargs`, because the interface code performs introspection to send arguments to the function. You can use `key="value"` pairs to specify both arguments and keyword arguments. Sending a single flag (e.g. `sure` in `make clean sure`) will send `sure=True` as a boolean to the function. Order only matters if you are passing positional arguments (for obvious reasons). The safest bet is to use keyword arguments to avoid mistakes; that said, most of these functions are straightforward because they take only a single argument, e.g. `make go`.
Most of the automacs utility functions are found in the command-line interface module and the control module.
3.3 Tricks¶
The experiment files provide execution instructions to automacs depending on their format. These formats are specified in the controlspec code, which determines the control flow for the execution. The remainder of the interface functions are really just helper functions that make some tasks easier. The following list covers a few useful functions.
- `make back` will help you run simulations in the background.
- `make watch` will monitor the newest log file in your simulation.
- `make locate` will help you find a specific function.
- `make upload` will upload files to a cluster.
- `make download` will download your files from the same cluster.
- `make notebook` will generate a Jupyter notebook from an experiment.
4 Framework¶
The automacs codes have developed from a set of BASH, Perl, and Python scripts designed to construct specific simulations. Over time, the automacs developers chose convention over configuration when designing new codes. This means that most functions are designed to be generic, discrete simulation steps create files with coherent naming schemes, and input/output flags for most functions look very similar. This makes the codes more general, and hence easy to apply to new simulations. Most of the variations between simulations are directed by experiments described in this section. Experiments are almost entirely specified by special python dictionaries and strings which are designed for readability.
In this section we describe the experiment files, file naming conventions, and namespaces.
4.1 Experiments¶
A key design feature of automacs is that its computational experiments are specified almost entirely by text. While this text depends on many functions, we have sought to separate generic functions from highly-customized experiments so that users can easily reproduce, modify, and repeat experiments.
In the finding experiments section, we explained that experiments can be located anywhere in the automacs directory tree as long as the `config.py` is set correctly and the experiments are written to scripts suffixed with `_expts.py` which contain only a single dictionary literal. This dictionary adds new experiments to the list (and automacs protects against redundant naming).
We highly recommend that users only create new experiments rather than modifying existing ones. Our experiments have many parameters, and validated experiments in a test set should always be preserved for posterity. There is no limit to the number of experiments you can write, so the best practice is to use clear experiment names and avoid changing already-validated experiments.
4.1.1 Modes¶
The `make prep` command lists all available experiments organized into three modes: run, metarun, and quick. The `make go` command chooses the correct mode and executes it accordingly. Setting the mode is accomplished by including the right keys in your experiment dictionary (this is explained in the control flow section below). Each type has a specific use case.
- Experiments run with `make run` are the standard simulation type. They require a single procedure (or “parent script”) which receives a single `settings` block that contains all of the settings. The `protein` demonstration is the canonical example of a run.
- Experiments which use `make metarun` consist of a sequence of standard “run” experiments. Each step can contain its own settings, and these settings only override the defaults specified by the corresponding run. Each run creates a distinct step in the sequence.
- Quick scripts are executed via `make quick` and bypass `make prep`. Instead of using a parent script, they are executed directly from code embedded in the `quick` parameter of the experiment itself. Quick scripts can be part of a metarun sequence.
The “metarun” method allows you to create a sequence of simulation steps (which we sometimes call “chaining”). Information is passed between steps using `state.json`, which is described below in the posterity section.
4.1.2 Control flow¶
Recall that each experiment is an item in a dictionary literal found in files suffixed with `_expts.py` according to the configuration. Each of the three experiment types described in the previous section must have a specific set of keys, validated by the controlspec code.
The best way to make a new experiment of any type is to copy one that already works. This saves you the effort of parsing the controlspec code, which provides lists of required keys for each experiment type, along with their minor variations. It is designed to be extensible, so you can modify the control flow without too much trouble; in any case, most of the test sets packaged with automacs extension modules include examples of all of the variations.
If you fail to include the right keys, you will receive a specific error message. The standard experiment runs are the easiest; they require the following keys: `['script','extensions','params','tags','settings','cwd']`. The `cwd` key is appended automatically, the `script` key is the relative path to the parent script, and the `settings` block holds the parameters for the experiment. The `extensions` allow your codes to import from other automacs extension modules (this helps eliminate redundancy across the codes). The `tags` are simple metadata used for distinguishing e.g. atomistic and coarse-grained simulations. The `params` key often points to a `parameters.py` file that can be read by `write_mdp`.
4.1.3 Settings¶
Aside from the `script` parameter, which supplies the path to the parent script (e.g. the `protein.py` script described earlier), the `settings` block contains most of the parameters. The following is an example from the `protein` experiment.
step: protein # name of the folder is s01-protein
force field: charmm27 # which gromacs-standard force-field to use (see pdb2gmx list)
water: tip3p # which water model (another question from pdb2gmx)
equilibration: nvt-short,nvt,npt # which equilibration step to use (must have `input-name-in.mdp` below)
pdb source: 1yrf # PDB code for download. overrides the start structure
start structure: None # path to PDB structure or None to use a single PDB in inputs
protein water gap: 3.0 # Angstroms distance around the protein to remove water
water buffer: 1.2 # distance (nm) of solvent to the box wall
solvent: spc216 # starting solvent box (use spc216 from gromacs share)
ionic strength: 0.150 # desired molar ionic strength
cation: NA # name of the cation for neutralizing the system
anion: CL # name of the anion for neutralizing the system
#---INTEGRATOR PARAMETERS generated via parameters.py
mdp_specs:| {
'group':'aamd',
'mdps':{
'input-em-steep-in.mdp':['minimize'],
'input-em-cg-in.mdp':['minimize',{'integrator':'cg'}],
'input-md-nvt-eq-in.mdp':['nvt-protein','nvt-protein',{'nsteps':10000}],
'input-md-nvt-short-eq-in.mdp':['nvt-protein-short',{'nsteps':10000}],
'input-md-npt-eq-in.mdp':['npt-protein',{'nsteps':10000}],
'input-md-in.mdp':{'nsteps':100000},
},
}
The settings block is designed in a format meant to resemble YAML for its readability. Keys and values are separated by a colon, extraneous whitespace is stripped, and everything that follows the colon is interpreted first as python syntax and, if that fails, as a float or a string. Multiline values (see `mdp_specs` above) are noted with `:|` instead of just a colon, and they continue until the leading tab at the beginning of each line is absent. Comments are allowed with hashes. These blocks are interpreted by the `yamlb` function, which also uses the `jsonify` function to check for repeated keys.
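The interpretation rule can be summarized with a small sketch (this illustrates the rule described above, not the actual `yamlb` source):

def interpret_value(raw):
	"""Interpret the text after the colon in a settings line (sketch)."""
	raw = raw.strip()
	# first try to read the value as python syntax (dicts, lists, None, etc.)
	try: return eval(raw)
	except (SyntaxError,NameError): pass
	# then fall back to a float, and finally leave it as a string
	try: return float(raw)
	except ValueError: return raw

interpret_value('1.2')                  # 1.2 (a float)
interpret_value('nvt-short,nvt,npt')    # the string 'nvt-short,nvt,npt'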
All of the settings are passed to `expt.json`, unpacked into the global namespace variable mysteriously labelled `settings`, and thereby exposed to any python functions imported by automacs. Note that the global namespace variable called `state`, described below, will check `settings` if it cannot find a key that you ask for. In that sense, the `settings` are available everywhere in your code. See the state section below for more details.
4.2 Extensions¶
To avoid redundant codes in separate modules, the automacs modules can import codes from other modules. These codes are imported based on a list of globs (https://en.wikipedia.org/wiki/Glob_(programming)) in the `extensions` item of the experiment. These can be paths relative to the root directory that point to other modules. Alternately, users can use syntax sugar to access other modules by the name of their directory. For example, you can add all of the codes in the bilayer module to your experiment by adding `@bilayer/codes/*.py` to your extensions list. The `@bilayer` token will be replaced by the location of the bilayer module according to `config.py`, typically `inputs/bilayers`. The `@module` syntax sugar works in any key in the settings blocks, so your experiments can draw from other extension modules without knowing where they are ahead of time. The path substitutions are handled in a special settings parser.
Any objects imported by the main `amx` module can be overridden by objects in the extension modules by adding their names to the `_extension_override` list at the top of the script. Similarly, objects can be automatically shared between extensions using the `_shared_extensions` list. These lists allow you to write a single code that either changes a core functionality in the main `amx` module or is shared with in-development extension modules. The import scheme is handled almost entirely in `runner/importer.py`, which is omitted from the documentation for technical reasons. One example of a shared extension is the `dotplace` function, which makes sure `gro` file output has aligned decimal places.
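As an illustration of how small these shared helpers can be, here is a hypothetical sketch of a `dotplace`-style formatter (the real function may differ): it writes a float into the fixed-width, three-decimal column format used by gro files so the decimal points line up.

def dotplace(n):
	"""Format a float for fixed-width gro columns so decimals align (sketch)."""
	# gro coordinates use fixed-width columns with three decimal places
	return '%8.3f'%float(n)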
4.3 State¶
In order to transmit key settings and measurements between simulation procedure steps, or between functions in the same procedure, we store them in an overloaded dictionary called the `state`. We use a special `DotDict` class to access dictionary keys as attributes. For this reason, all spaces in the keys are replaced with underscores.
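The idea behind `DotDict` can be sketched in a few lines (a minimal illustration of the behavior described here and below, not the exact amx implementation):

class DotDict(dict):
	"""A dictionary whose keys can also be read and written as attributes (sketch)."""
	def __init__(self,*args,**kwargs):
		# spaces become underscores so every key is a valid attribute name
		base = dict(*args,**kwargs)
		super(DotDict,self).__init__(
			dict([(key.replace(' ','_'),val) for key,val in base.items()]))
	def __getattr__(self,key):
		# absent keys return None instead of raising, as described below
		return self.get(key,None)
	def __setattr__(self,key,val):
		self[key] = val

state = DotDict({'water buffer':1.2})
state.water_buffer          # 1.2
state.step = 's01-protein'  # attribute writes land in the dictionary
state.missing_key           # None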
As we mentioned above, the `state` consults the `settings` when it cannot find a key that you ask for. This means that you can keep simulation parameters sequestered in the `settings` while keeping on-the-fly calculations in the `state`. Everything gets saved to `state.json` at the end.
We recommend accessing settings by using the `state`. In the example experiment above (in the settings block section), the water model is set by the `water` key in settings. You could access it using any of the following syntaxes:
settings['water']
settings.water
settings.get('water','tip3p')
state['water']
state.water
state.q('water','tip3p')
We prefer the last two methods. Use `settings.get` or `state.q` if you wish to set a default in case the parameter is absent. Requesting an absent parameter from `settings` will throw an exception; however, requesting an absent parameter from the `state` always returns `None`. This means that you can write e.g. `if state.add_proteins: ...` to concisely control the execution of your simulation.
4.3.1 Posterity¶
In the introduction to the documentation we described the “supervised execution” of automacs codes. In short, this feature allows you to continue from the last command in a failed execution, but more importantly, it sends the `state` everywhere and saves it to `state.json` when the simulation is finished.
Saving variables¶
These features provide a simple way to create a sequence of simulation steps that depend on each other. These simulations are executed by `make metarun`; sometimes we call this “chaining”. Information can be passed to the next step simply by saving it in the state. For example, you might want to make a small bilayer larger by using the `multiply` function (currently located in the `extras` module). After constructing a simple bilayer, the composition is stored in `state.composition`. In the second step of the metarun, the `multiply.py` parent script can refer to `state.before[-1]` to access a dictionary that holds the previous state. This also includes the settings at `state.before[-1]['settings']`, so you don't need to repeat your settings in the following steps of the metarun. This scheme allows sequential steps to communicate important details about the outcome, geometry, or other features of a simulation.
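For example, a later step in a hypothetical metarun might read the previous step's records like this sketch (the key names follow the bilayer example above; treat the exact structure as an assumption):

from amx import *
init()
# the previous step's state, including its settings, rides along in state.before
previous = state.before[-1]
composition = previous['composition']    # e.g. recorded by the bilayer-building step
previous_settings = previous['settings'] # no need to repeat these in this step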
GROMACS History¶
In addition to saving the previous states, automacs also intercepts any calls to GROMACS commands and logs them in a special variable called `history_gmx`. Users can call e.g. `get_last_gmx_call('mdrun')` to retrieve the inputs and outputs for the most recent call to any gromacs utility, typically `mdrun`. This makes it easy to get the last checkpoint, run input file, or structure.
History¶
A complete record of everything that automacs does is recorded in `state.history`. Every time an automacs function is called, the call is added to this list in a pythonic format, with explicit `*args` and `**kwargs`. This feat is accomplished by the `loud` function, which decorates every imported function except for those named in the `_acme_silence` variable (so that you can silence functions with extremely long arguments). The history is also written to a log file in each step folder, called e.g. `s01-protein/s01-protein.log`.
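A decorator of this kind can be sketched as follows (a minimal illustration of the logging idea, not the actual `loud` source):

import functools

def loud(func,history):
	"""Decorate a function so each call is appended to a history list (sketch)."""
	@functools.wraps(func)
	def wrapper(*args,**kwargs):
		# record the call in an explicit, pythonic format
		call = '%s(%s)'%(func.__name__,', '.join(
			[repr(a) for a in args]+['%s=%r'%(k,v) for k,v in kwargs.items()]))
		history.append(call)
		return func(*args,**kwargs)
	return wrapper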
4.4 Naming conventions¶
While the `state` and `settings` described above are explicit features of automacs that determine its execution, we also follow a number of more implicit rules about preparing the data. These are fairly general, and only serve to make it easy to keep track of all of your data.
4.4.1 Directories¶
In order to ensure that automacs procedures can be re-used in many different situations, we enforce a consistent directory structure. This makes it easy for users to write shorter codes which infer the location of previous files without elaborate control statements. The basic rule is that each procedure gets a separate folder, and that subsequent procedures can find input data from the previous procedure folder.
We find it hard to imagine that users would chain more than 99 steps together, so we name each step with a common convention that includes the step number, e.g. `s01-bilayer` and `s02-large-bilayer`. Many automacs functions rely on this naming structure. For example, the `upload` function is designed to send only your latest checkpoint to a supercomputer to continue a simulation, and thereby avoid sending all of your data to a new system. The step folders also correspond to discrete states of the system, which are backed up to e.g. `state_1.json` when the step is complete. When chaining runs together as part of a metarun, users can access previous states through the history variables, which record what automacs has done so far.
The step name is always provided by the `step` variable in the settings block. To create the directory, each parent script typically calls `make_step(settings.step)` after its required initialization. You will see `state.here` used throughout the codes; it points to the current step directory.
4.4.2 Files¶
Within each procedure directory, we also enforce a file naming scheme that reflects much of the underlying GROMACS conventions. In particular, when simulations are extended across multiple executions, we follow the `md.part0001.xtc` numbering rules. Every time the `mdrun` integrator is invoked, automacs writes individual trajectory, input binary, and checkpoint files. Where possible, it also writes a configuration file at the conclusion of each run.
When we construct new simulations, we also follow a looser set of rules that makes it easy to see how the simulations were built.
- All GROMACS output to the standard output and standard error streams (that is, written to the terminal) is captured and stored in files prefixed with `log-<gromacs_binary>`. In this case we label the log file with the gromacs utility function used to generate it. Since many of these functions are called several times, we also use a name for that part of the procedure. For example, during bilayer construction, the file `s01-bilayer/log-grompp-solvate-steep` holds the preprocessor output for the steepest descent minimization of the water-solvated structure.
- While output streams are routed to log files, the formal outputs from the GROMACS utilities are suffixed with a name that corresponds to their portion of the construction procedure. We use the prefix `em` to denote energy minimization and `md` to denote molecular dynamics. For example, minimizing a protein in vacuum might output files such as `em-vacuum.tpr`, while the NVT equilibration step might be labelled `md-nvt.xtc`.
- Intermediate steps that do not involve minimization or dynamics are typically prefixed with a consistent name. For example, when adding water to a protein or a bilayer, automacs will generate several intermediate structures, all prefixed with the word “solvate”, e.g. `solvate-dense.gro`.
4.4.3 Getting inputs¶
A few protected keywords in the `settings` blocks will copy input files for you. The `sources` list should be a list of folders to copy into the current step, while `files` points to individual files. All paths should be relative to the root directory; however, there is syntax sugar for pointing to extensions.
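A hypothetical settings fragment using these keywords might read as follows (the paths are invented for illustration; `@martini` uses the `@module` sugar described in the extensions section):

sources: ['@martini/library-aamd-structs']
files: ['inputs/my-protein/start-structure.pdb']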
4.5 Simplicity¶
In this section we have described how automacs organizes files. In general the file-naming rules are not absolute requirements for the simulations to complete. Instead, these “rules” have two purposes. First, if you use highly consistent and descriptive naming schemes, then you can easily re-use code in new situations. For example, many of the automacs procedures were developed for atomistic simulations. A few simple name changes along with some extra input files are oftentimes enough to port these procedures to coarse-grained systems or develop more complicated simulations.
The second purpose of our elaborate-yet-consistent naming scheme is to ensure that the data you produce are durable. Careful naming can ensure that future users who wish to study your data will not require an excessive amount of training to understand what it holds. An obvious naming scheme makes it easy to share data, find old simulations, and, more importantly, parse the data with analysis programs once the dataset is complete. The omnicalc analysis package is designed to process data prepared by automacs, and these file-naming rules and saved `state.json` files make it easy for these programs to be used together.
This concludes the automacs walkthrough. Check out the BioPhysCode project for extra calculation codes that are designed to read and interpret simulations produced with automacs. Good luck!
5 Components¶
This section catalogs the codes loaded into the current copy of automacs. It parses the codes according to the local copy of config.py, which configures the connections to external codes.
5.1 proteins¶
The `Proteins` extension is a component of automacs located at `amx/proteins` and sourced from http://github.com/biophyscode/amx-proteins.git.
5.2 bilayers¶
The `Bilayers` extension is a component of automacs located at `inputs/bilayers` and sourced from http://github.com/bradleyrp/amx-bilayers.git.
5.3 charmm¶
The `Charmm` extension is a component of automacs located at `inputs/charmm` and sourced from http://github.com/bradleyrp/amx-charmm.git.
5.4 docs¶
The `Docs` extension is a component of automacs located at `inputs/docs` and sourced from http://github.com/bradleyrp/amx-docs.git.
5.5 extras¶
The `Extras` extension is a component of automacs located at `inputs/extras` and sourced from http://github.com/bradleyrp/amx-extras.git.
5.6 martini¶
The `Martini` extension is a component of automacs located at `inputs/martini` and sourced from http://github.com/bradleyrp/amx-martini.git.
5.7 polymers¶
The `Polymers` extension is a component of automacs located at `inputs/polymers` and sourced from http://github.com/bradleyrp/amx-polymers.git.
5.8 vmd¶
The `Vmd` extension is a component of automacs located at `inputs/vmd` and sourced from http://github.com/bradleyrp/amx-vmd.git.