“Automatic GROMACS” (a.k.a. AUTOMACS or just amx for short) is a biophysics simulation pipeline designed to help you run scalable, reproducible, and extensible simulations using the popular GROMACS integrator. The documentation walkthrough below describes how the codes work (and how you should use them). The components section describes the codes available in this particular copy of automacs, including any extensions you may have downloaded.

Everything but the kitchen sink can be found by searching the index. Before starting the walkthrough, you must collect additional automacs modules by running the following setup command with the all recipe. If this is your first time running automacs on a particular computer, you should also configure the gromacs paths by running make gromacs_config.

make gromacs_config local
make setup all

The following sections explain how you can interact with the codes and outline the relatively minimal set of design constraints required for extending the codes to new use cases. The components section at the end includes the “live” documentation for your extension modules.

1   Concept

Automacs is a set of python codes which prepares molecular simulations using common tools orbiting the popular GROMACS integrator. The purpose of this project is to ensure that simulations are prepared according to a standard method which also bundles simulation data with careful documentation. Automacs (hopefully) makes it possible to generate large simulation datasets with a minimum of description and so-called “manual” labor which invites mistakes and wastes time. Automacs codes are meant to be cloned once for each simulation rather than installed in a central location. This means that each simulation has a copy of the code used to create it. The codes are extremely modular, and users can share novel experiments using git and the automacs configuration.

1.1   “Overloaded Python”

High-level programming languages often rely on functions which can accept many different kinds of input while producing a consistent result that matches our intuition. This is called overloading. The automacs codes are overloaded in two ways. First, simulation data files and directories for different procedures are organized in a uniform way. These file-naming conventions are described in the framework. Users who follow these rules can benefit from generic functions that apply to many different simulation types. For example, performing restarts or ensemble changes in GROMACS uses a single generic procedure, regardless of whether you are doing atomistic or coarse-grained simulations. Second, the procedure codes are organized to reflect the consistent naming conventions so that they can be used in as many situations as possible. The simulation-specific settings are separated from the generic, modular steps required to build a simulation so that users can simulate a variety of different systems without rewriting any code. In the next section, we will describe how this separation happens.

1.2   Procedures

Automacs executes code in an extremely straightforward way: users first request an experiment, and then they run it. After you clone automacs, you can run a simulation with a single make command — the automacs interface consists only of make commands and a set of customizations written to simple text files which we will explain shortly. In the following example, we choose to run the protein experiment.

make go protein clean

The make go command does three things: it clears the data, prepares a new script, and then runs it. We always start by cleaning up the data from a previous run — all useful data should be archived in a completed copy of automacs. Passing the clean flag to make go cleans up any old data by calling make clean sure for you. The make prep command lists all of the available experiments, which are detected according to instructions in the configuration. When you add extra modules to automacs, they typically come with new experiments, which means that make prep returns a long list.

MENU
│
├──quick
│  ├── 12.. clear_lipidome
│  ├── 14.. generate_charmm_landscape
│  ├── 15.. generate_lipidome_restraints
│  ├── 16.. generate_lipidome_structures
│  ├── 21.. table
│  ├── 27.. vmd_cgmd_bilayer
│  ├── 28.. vmd_original_for_reference
│  └── 29.. vmd_protein
├──metarun
│  ├── 1... bilayer288
│  ├── 4... bilayer_control_flat_multiply
│  ├── 5... bilayer_control_multiply
│  ├── 6... bilayer_protein_aamd_banana
│  ├── 7... bilayer_protein_aamd_banana_ion_change
│  ├── 13.. enth-martini-demo
│  ├── 17.. lipidome
│  ├── 22.. test-h0
│  ├── 23.. test_helix0_flat
│  ├── 24.. trialanine-demo
│  ├── 25.. ultra
│  └── 26.. ultra2
└──run
   ├── 2... bilayer_control
   ├── 3... bilayer_control_flat
   ├── 8... bilayer_protein_adhesion
   ├── 9... bilayer_protein_adhesion_aamd
   ├── 10.. bilayer_protein_topology_only
   ├── 11.. bilayer_release
   ├── 18.. martinize
   ├── 19.. multiply
   └── 20.. protein

An experiment identifies the script you wish to run (we sometimes call these “procedures” or “parent scripts”) and how the simulation should be customized. In the example above, we choose the protein experiment, which serves as a demonstration of a simple protein-in-water simulation and requires very few extra codes. The make go command above calls make prep protein, which finds the right procedure script and copies it to script.py in the root folder. It also collects the customizations and writes them to an experiment file called expt.json, which will be discussed in the next section. The menu of experiments shown above indicates that protein is a “run”. This is the standard experiment style; however, we can also construct a “metarun”, which is a sequence of standard procedures, or a “quick” script, which is a very short piece of code. These will be outlined in the experiments section.

New users who wish to see how automacs works can run e.g. make clean && make go protein or make go protein clean (the latter does not ask for confirmation before deleting data). While this runs, you can take a look at script.py to see what the experiment looks like. These scripts always call on the customizations found in individual experiments (like protein), which can be viewed in three places. The experiment file amx/proteins/protein_expts.py is the source which generates the expt.json with a few extra parameters; you can view this file from the terminal, and it is also included in this documentation in the components section. You can also run make look, which starts a python terminal holding the state variable, which you can read directly (it’s a dictionary, but you can use the dot operator like a class to look at e.g. state.step). Of these three options, the experiment file is the only place where you should change the parameters. We have combined everything into one step using make go to simplify things; however, automacs has a fairly minimal interface, and users can run the automacs scripts with only an expt.json file and the associated python modules. Everything else is syntactic sugar.

If you want to skip the sugar and run the codes directly, you can use make prep protein to prepare the expt.json and script.py files and then simply run python script.py. If everything is in order, the simulation will run to completion. In this basic use-case, automacs has simply organized and executed some code for you. In practice, only the most mature codes run perfectly the first time. To make development easier, and to save a record of everything automacs does, we use make run to supervise the execution of script.py. We will explain this in detail in the section supervised execution below.
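
That is, the bare-bones invocation is nothing more than the following pair of commands.

make prep protein
python script.py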

Using automacs is as simple as choosing an experiment, customizing it, and then running it with make go. The best practice is to always copy and rename the experiments to change them so that you don’t lose track of which experiments work, and which ones still need some fine tuning.

1.2.1   Procedure scripts

Procedure scripts (sometimes we call these “parent scripts”) are standard python scripts which must only import a single package into the global namespace.

from amx import *

Using import * may be somewhat un-Pythonic, however it allows our scripts to read like an entry in a lab notebook for running a computational experiment, and it generally makes them much more concise. The automacs import scheme does a lot of bookkeeping work for you behind the scenes. It reads the experiment, imports required modules that are attached to your local copy of automacs, and also ensures that all of your codes (functions, classes, etc.) have access to a namespace variable called state. This dictionary variable (along with its partners expt and settings, discussed later) effectively solves the problem of passing information between functions. Any function can read or write to the state, which is carefully passed to new codes and written to disk when the simulation is completed.
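
For example, a function in any imported extension can read experiment parameters and record results without any explicit argument passing. The following is a minimal sketch; the function name and the box_note key are hypothetical, not part of automacs.

from amx import *  # provides the shared state and settings namespaces

def measure_box():
    """A hypothetical function showing reads and writes to the shared state."""
    # read an experiment parameter (the state falls back to the settings)
    buffer = state.water_buffer
    # record an on-the-fly result for later functions or steps
    state.box_note = 'water buffer is %.2f nm' % buffer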

The most typical script is called protein.py and generates an atomistic protein-in-water simulation.

#!/usr/bin/env python

"""
PROTEIN SIMULATION
Atomistic protein in water.
"""

from amx import *

init()
make_step(settings.step)
write_mdp()
if state.pdb_source: get_pdb(state.pdb_source)
else: get_start_structure(state.start_structure)
remove_hetero_atoms(
    structure='start-structure.pdb',
    out='start-structure-trim.pdb')
gmx('pdb2gmx',
    base='vacuum',
    structure='start-structure-trim.pdb',
    gro='vacuum-alone',
    water=settings.water,
    ff=settings.force_field,
    log='pdb2gmx')
copy_file('system.top','vacuum.top')
extract_itp('vacuum.top')
write_top('vacuum.top')
gmx('editconf',
    structure='vacuum-alone',
    gro='vacuum',
    c=True,d='%.2f'%settings.water_buffer,
    log='editconf-vacuum-room')
minimize('vacuum',method='steep')
solvate_protein(
    structure='vacuum-minimized',
    top='vacuum.top')
minimize('solvate')
counterions(
    structure='solvate-minimized',
    top='solvate',
    ff_includes='ions')
minimize('counterions')
write_structure_pdb(
    pdb='start-structure.pdb',
    structure='counterions')
write_top('system.top')
equilibrate()
finished(state)

As long as your procedure script leads off with from amx import * or alternately import amx, then the import magic will import the core automacs functions (which also loads GROMACS), any extension modules you request, and distribute the state to all of them. The remainder of the script is just a sequence of functions that generate new configurations, run inputs, and all the assorted paraphernalia for a typical simulation.

1.2.2   Functions

The individual functions in an automacs-style procedure typically perform a single, specific task that a user might otherwise perform at the terminal. Some functions can be used to copy files, write topologies, or execute the GROMACS integrator.

One of the most useful functions is called minimize(), which automates the process of performing energy minimization in GROMACS by taking a configuration file (and its topology), generating run inputs and executing the GROMACS integrator via mdrun.

def minimize(name,method='steep',top=None):
  """
  Energy minimization procedure.

  Minimize a structure found at `name.gro` with topology
  specified by the keyword argument `top` (otherwise `name.top`)
  according to inputs found in input-<method>-in.mdp and ideally
  prepared with :meth:`write_mdp <amx.automacs.write_mdp>`.
  Writes output files to `em-<name>-<method>` and writes a
  final structure to `<name>-minimized.gro`
  """
  gmx('grompp',base='em-%s-%s'%(name,method),
    top=name if not top else re.sub(r'^(.+)\.top$',r'\1',top),
    structure=name,log='grompp-%s-%s'%(name,method),
    mdp='input-em-%s-in'%method,skip=True)
  tpr = state.here+'em-%s-%s.tpr'%(name,method)
  if not os.path.isfile(tpr):
    raise Exception('cannot find %s'%tpr)
  gmx('mdrun',
    base='em-%s-%s'%(name,method),
    log='mdrun-%s-%s'%(name,method))
  shutil.copyfile(
    state.here+'em-'+'%s-%s.gro'%(name,method),
    state.here+'%s-minimized.gro'%name)

The minimize function has straightforward inputs and outputs, but it also makes use of state.here, which holds the path to the current step (a folder) for this simulation. Note that most simulations only require a single step, whereas multi-step procedures might use a handful of steps. It also expects to find an mdp file with the appropriate name, and hence implicitly relies on another function called write_mdp to prepare these files. Most functions work this way, so that they can be easily used in many situations. Ideally, their docstrings, which are collected in the documentation index, should explain the correct inputs and outputs.

1.2.3   Supervised execution

Robust simulation procedures can always be run with python script.py once they are prepared, however automacs includes a useful “supervision” feature that provides three advantages that are particularly useful for developing code.

  1. The shared namespace called state is saved to a file called state.json when the job is complete. All functions that are imported by automacs are wrapped in a decorator that logs each execution to the state.history variable.
  2. Errors are logged to special variables inside of the state so that user-developers can correct errors and continue the experiment from the last successful step. The code makes use of Python’s internal syntax parser in order to find the earliest change in your code (see the sketch after this list). This can be particularly useful when you are adding steps to a procedure which is still under development because it means that you don’t have to repeat the earlier steps. Even if the procedure script located at script.py doesn’t change, automacs still knows where to continue execution without repeating itself.
  3. In the event that users wish to “chain” together a sequence of multiple discrete simulation steps, automacs can look back to completed steps (with states saved to e.g. state_1.json) in order to access important details about the simulation, including its geometry and composition. Chaining multiple steps requires a “metarun” procedure and uses the alternate make metarun command instead of make run, but otherwise the execution is the same. The no-repetition feature described above in item two also works when chaining steps together as part of a “metarun”.
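
The comparison in item two can be pictured with a short sketch. This is only an illustration of the idea, not the automacs implementation: parse the old and new copies of script.py and find the first top-level statement that differs.

import ast

def first_divergence(old_source,new_source):
    """
    Illustrative sketch: locate the first top-level statement that differs
    between two versions of a script, so earlier steps need not be repeated.
    """
    old,new = ast.parse(old_source).body,ast.parse(new_source).body
    for num,(this,that) in enumerate(zip(old,new)):
        # ast.dump gives a comparable text form of each statement
        if ast.dump(this)!=ast.dump(that): return num
    return min(len(old),len(new))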

Since the second feature above (which we call “iterative reexecution”) is aimed at developers, it is hidden from the user and happens automatically when you repeat a failed simulation. That is, simulations will automatically continue from a failed state when you run make run after an error.

The more important function of the shared namespace is that all parameters are automatically available to all imported functions via the state dictionary. This saves users from writing interfaces between functions, and also provides a snapshot of your simulation in state.json when it is completed. This is explained further in the settings blocks documentation below.

The remainder of the documentation covers the GROMACS- and automacs-specific Configuration, the command-line interface, and the general Framework for organizing the data. The last part of the documentation, titled components, also provides a “live” snapshot of the documentation for extension modules.

2   Configuration

Automacs clearly divides experiment parameters from settings required to run the code. The scientific parameters are placed in experiment files and are designed to be portable. Software locations, hardware settings, and the precise configuration of your copy of automacs are set in two places: one is specific to GROMACS, and the other configures automacs.

2.1   GROMACS

Automacs needs to find GROMACS executables at the terminal in order to run your simulations. If you install a single copy of GROMACS for all users, then the default configuration will suffice, but either way, automacs will look for a gromacs configuration file in one of two locations.

Running make prep for the first time causes automacs to check for the configuration. If it can’t find one, it throws an error and asks you to run make gromacs_config. If you run make gromacs_config home, it will write the example configuration file to a hidden file in your home directory at ~/.automacs.py. You can override the global configuration with a local one, written to ./gromacs_config.py, by running make gromacs_config local, or by copying the file to the automacs root directory yourself. We recommend setting up a global configuration for a particular system, and using the local copies to customize particular simulations that might require more or less computing power.

These configuration files consist of a single dictionary called machine_configuration with keys that should correspond to a portion of the hostname. Any key that uniquely matches the hostname provides the configuration for the simulation (otherwise the LOCAL key provides the default configuration). The following example includes an entry for a popular supercomputer in Texas called stampede.

machine_configuration = {
  #---the "LOCAL" machine is default, protected
  'LOCAL':dict(
    gpu_flag = 'auto',
    ),
  'stampede':dict(
    gmx_series = 5,
    #---refer to a string that contains the PBS header flags
    cluster_header = stampede_header,
    ppn = 16,
    walltime = "24:00",
    nnodes = 1,
    #---many systems use the "_mpi" suffix on the executables
    suffix = '',
    #---many systems will only run GROMACS binaries through mpi
    mdrun_command = '$(echo "ibrun -n NPROCS -o 0 mdrun_mpi")',
    allocation = 'ALLOCATION_CODE_HERE',
    submit_command = 'sbatch',
    ),
  }

Users can customize the number of processors per node (ppn), the number of nodes (nnodes), allocation codes, and even the batch submission command so that these jobs can run properly on many different machines. These parameters are packaged into a cluster-continue.sh file within each step directory when users run make cluster on their supercomputing platform. The default configuration written by make gromacs_config local includes a few useful examples. Users can submit cluster-continue.sh directly to the queue to continue their jobs. The extend and until parameters in the machine configuration set the number of additional or total picoseconds to run for; otherwise the jobs will consume all of the available walltime and gently stop just before it’s up.

Since each cluster typically has its own PBS header format, users can place these in text files (e.g. stampede_header above). Automacs will automatically replace any capitalized text in these headers with the values corresponding to keys in the machine_configuration dictionary. For example, the nnodes = 1 setting causes NNODES in the stampede_header to be replaced with the number 1. This replacement strategy makes it easy to choose a specific configuration for new jobs, or to set the default configuration on a machine using make gromacs_config home once without having to bother with it when you create new simulations.
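
The substitution itself is easy to picture. The following sketch uses a made-up header, not the real stampede_header, to show the idea.

# a minimal sketch of the capitalized-placeholder substitution
machine_entry = dict(ppn=16,nnodes=1,walltime='24:00')
cluster_header = (
    '#SBATCH --nodes=NNODES\n'
    '#SBATCH --ntasks-per-node=PPN\n'
    '#SBATCH --time=WALLTIME\n')
for key,val in machine_entry.items():
    cluster_header = cluster_header.replace(key.upper(),str(val))
print(cluster_header)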

2.1.1   Versions

Many systems have multiple copies of GROMACS which can be loaded or unloaded using environment modules. To load certain modules, add them as a string or list under the module key in the machine_configuration. You can also add module commands to the cluster header scripts described above.
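
For example, a configuration entry might request a module as follows (the module name here is hypothetical).

machine_configuration = {
  'LOCAL':dict(
    # hypothetical environment module loaded before GROMACS runs
    module='gromacs/2018.2',
    ),
  }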

2.2   Automacs

Automacs can use a number of different extension modules which can be easily shared with other users by packaging them in git repositories. Most users will opt to automatically clone several extensions at once using the setup procedure described below. Individual extensions can also be directly added to a copy of automacs using a simple command which manipulates the local config.py file. This file describes all of the paths that automacs uses, so that you are free to store your codes wherever you like. Extensions must be added from git repositories using the make set utility, which writes config.py.

make set module ~/path/to/extension.git local/path/to/extension
make set module source="https://github.com/username/extension" spot="inputs/path/to/extension"

The spot argument is unconstrained; you can store the codes anywhere within the root directory. We prefer to put minor extensions in the inputs folder, and major extensions directly in the root directory. The config.py file will change as you add modules and interface functions. A local copy of your config.py is rendered here, as part of the live documentation of this copy of automacs.

2.2.1   Setup

At the top of the documentation we recommend that users run the make setup command. Running e.g. make setup all will pull all of the standard modules from their internet sources (typically GitHub, though private repositories are also allowed as long as you use ssh aliases).

Cloning the proteins code repository (part of the protein and all collections) will give you access to the protein experiment listed under make prep along with a few other experiments. We recommend using this as an example to get familiar with the automacs framework. Running make go protein clean after the setup will run a simulation of the villin headpiece. The starting structure is set in the pdb_source key in the protein experiment file. All of the experiment files can be viewed in this documentation by reading the experiments subsections of the components list.

2.2.2   Config file

Setting up a fresh copy of automacs generates an initial config.py with the bare minimum settings. The setup then uses make set commands to add extension modules and to point the code to two command-line interface modules found in amx/cli.py and inputs/docs/docs.py. The latter is responsible for compiling this documentation and is written to take advantage of the makefile interface.

2.2.3   Paths

The config.py file describes the rules for finding experiments. Since extensions may provide many different standalone experiments and test sets, you may accumulate a long list of experiments. Rather than requiring that each experiment have its own file, you can organize multiple experiments into one experiment file. Automacs finds these files according to the inputs item in config.py. This can be a single string holding the path to your experiments file, or a list of paths. Any path can contain wildcards. For the most flexibility, you can also set inputs to '@regex^.*?_expts\\.py$', where everything after @regex is a regular expression for matching any file in the automacs subtree. In this example, we require all experiment files to be named e.g. my_experiment_group_expts.py.
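
For reference, any of the following forms could serve as the inputs item in config.py (the paths are hypothetical).

# a single path, a list with wildcards, or a regular expression
'inputs': 'amx/proteins/protein_expts.py'
'inputs': ['amx/*/*_expts.py','inputs/*/*_expts.py']
'inputs': '@regex^.*?_expts\\.py$'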

2.2.4   Interfaces

Another item in the config.py dictionary is called commands. It provides explicit paths to python scripts containing command-line interface functions described in the interface section.

2.2.5   Bulky inputs

Starting simulations often requires starting configurations such as a protein crystal structure or the initial configuration for a polymer or bilayer. These files tend to be large, and should not be packaged alongside code. You can always place them in their own extension module.

2.3   “Live” Docs

This documentation uses the modules list in config.py to include the automatic documentation of any extension modules alongside this walkthrough. These are listed in the components section below. Some extensions may only include starting structures or raw data, in which case their documentation will be blank. This scheme ensures that adding codes to your copy of automacs will make it easy to read the accompanying documentation. Each copy of the documentation also serves as a “live” snapshot of the available codes.

3   Interface

Automacs execution is controlled almost entirely by text files which hold experiments. A few commands run these experiments, and these commands are executed by a very peculiar, overloaded Makefile that routes user commands to the appropriate python codes using the makeface functions. We use this scheme because make is ubiquitous on many computer systems, it often includes automatic completion, and it’s easy to remember. The interface is extremely generic: almost any python function can be exposed to the interface. To see which make sub-commands are available, simply run make without arguments.

make targets
│
├──back
├──clean
├──cluster
├──config
├──docs
├──download
├──flag_search
├──gitcheck
├──gitpull
├──go
├──gromacs_config
├──layout
├──locate
├──look
├──metarun
├──notebook
├──prep
├──prep?
├──qsub
├──quick
├──run
├──set
├──setup
├──upload
└──watch

3.1   Commands

As we discussed in the procedures section, users can run experiments using make go. To see which experiments are available, use make prep, which lists them. These are the two most important commands. The make go command will run make clean sure if you send it the clean flag (this will clear any old data from the directory, so be careful) and then immediately run one of the execution commands, depending on the type of experiment, using make run, make metarun, or make quick.

3.2   Additions

In the configuration section we mentioned that the commands key in config.py tells automacs where to find new functions. Any paths set in the commands list are scanned by the makeface module for python functions. These function names become sub-commands for make.

Arguments are passed from the makefile to the python code according to a few simple rules. The functions cannot use *args or **kwargs because the interface code performs introspection to send arguments to the function. You can use key="value" pairs to specify both arguments and keyword arguments. Sending a single flag (e.g. sure in make clean sure) will send sure=True as a boolean to the function. Order only matters if you are passing positional arguments (for obvious reasons). The safest bet is to use keyword arguments to avoid mistakes; in practice most functions are straightforward because they take only a single argument (e.g. make go protein).
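
To illustrate these rules, here is a hypothetical interface function (not part of automacs) that could live in one of the paths listed in the commands key.

def greet(name,sure=False):
    """
    A hypothetical make target. Running `make greet name="alice" sure`
    arrives here as greet(name='alice',sure=True) via introspection.
    """
    print('hello %s%s'%(name,'!' if sure else ''))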

Most of the automacs utility functions are found in the command-line interface module and the control module.

3.3   Tricks

The experiment files provide execution instructions to automacs depending on their format. These formats are specified in the controlspec code, which determines the control flow for the execution. The remainder of the interface functions are really just helper functions that make some tasks easier. The following list covers a few useful commands.

  1. make back will help you run simulations in the background.
  2. make watch will monitor the newest log file in your simulation.
  3. make locate will help you find a specific function.
  4. make upload will upload files to a cluster.
  5. make download will download your files from the same cluster.
  6. make notebook will generate a Jupyter notebook from an experiment.

4   Framework

The automacs codes have developed from a set of BASH, Perl, and Python scripts designed to construct specific simulations. Over time, the automacs developers chose convention over configuration when designing new codes. This means that most functions are designed to be generic, discrete simulation steps create files with coherent naming schemes, and input/output flags for most functions look very similar. This makes the codes more general, and hence easy to apply to new simulations. Most of the variations between simulations are directed by experiments described in this section. Experiments are almost entirely specified by special python dictionaries and strings which are designed for readability.

In this section we describe the experiment files, file naming conventions, and namespaces.

4.1   Experiments

A key design feature of automacs is that its computational experiments are specified almost entirely by text. While this text depends on many functions, we have sought to separate generic functions from highly-customized experiments so that users can easily reproduce, modify, and repeat experiments.

In the finding experiments section, we explained that experiments can be located anywhere in the automacs directory tree as long as the config.py is set correctly and the experiments are written to scripts suffixed with _expts.py which only contain a single dictionary literal. This dictionary adds new experiments to the list (and automacs protects against redundant naming).

We highly recommend that users only create new experiments rather than modifying existing ones. Our experiments have many parameters, and validated experiments in a test set should always be preserved for posterity. There is no limit to the number of experiments you can write, so the best practice is to use clear experiment names and avoid changing already-validated experiments.

4.1.1   Modes

The make prep command lists all available experiments organized into three modes: run, metarun, and quick. The make go command chooses the correct mode and executes the experiment accordingly. Setting the mode is accomplished by including the right keys in your experiment dictionary (this is explained in the control flow section below). Each type has a specific use case.

  1. Experiments run with make run are the standard simulation type. They require a single procedure (or “parent script”) which receives a single settings block that contains all of the settings. The protein demonstration is the canonical example of a run.
  2. Experiments which use make metarun consist of a sequence of standard “run” experiments. Each step can contain its own settings, and these settings only override the defaults specified by the corresponding run. Each run creates a distinct step in the sequence.
  3. Quick scripts are executed via make quick. They do not use the make prep command. Instead of using a parent script, they run code embedded in the quick parameter of the experiment itself. Quick scripts can also be part of a metarun sequence.

The “metarun” method allows you to create a sequence of simulation steps (which we sometimes call “chaining”). The information is passed between steps using state.json, which is described below in the posterity section.

4.1.2   Control flow

Recall that each experiment is an item in a dictionary literal found in files suffixed with _expts.py according to the configuration. Each of the three experiment types described in the previous section must have a specific set of keys validated by the controlspec code.

The best way to make a new experiment, of any type, is to copy one that already works. This saves you the effort of parsing the controlspec code. This code provides lists of required keys for each experiment type, along with their minor variations. It is designed to be extensible, so you can modify the control flow without too much trouble, however most of the test sets packaged with automacs extension modules include examples of all of the variations.

If you fail to include the right keys, you will receive a specific error message. The standard experiment runs are the easiest; they require the following keys: ['script','extensions','params','tags','settings','cwd']. The cwd key is appended automatically, the script is the relative path to the parent script, and the settings block holds the parameters for the experiment. The extensions allow your codes to import from other automacs extension modules (this helps eliminate redundancy across the codes). The tags are simple metadata used for distinguishing e.g. atomistic and coarse-grained simulations. The params key often points to a parameters.py file that can be read by write_mdp.
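
Putting this together, a minimal entry in an _expts.py file might look like the following sketch; the experiment name, paths, and tags here are hypothetical, and the settings text is truncated. The cwd key is omitted because it is appended automatically.

{
'my_protein':{
    'script':'protein.py',       # relative path to the parent script
    'params':'parameters.py',    # parameters read by write_mdp
    'extensions':[],             # codes imported from other modules
    'tags':['aamd'],             # metadata e.g. atomistic vs coarse-grained
    'settings':"""
        step: protein
        """,                     # the settings block (truncated here)
    },
}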

4.1.3   Settings

Aside from the script parameter, which supplies the path to the parent script (e.g. the protein.py script described earlier), the settings block contains most of the parameters. The following is an example from the protein experiment.

step: protein                       # name of the folder is s01-protein
force field: charmm27               # which gromacs-standard force-field to use (see pdb2gmx list)
water: tip3p                        # which water model (another question from pdb2gmx)
equilibration: nvt-short,nvt,npt    # which equilibration step to use (must have `input-name-in.mdp` below)
pdb source: 1yrf                    # PDB code for download. overrides the start structure
start structure: None               # path to PDB structure or None to use a single PDB in inputs
protein water gap: 3.0              # Angstroms distance around the protein to remove water
water buffer: 1.2                   # distance (nm) of solvent to the box wall
solvent: spc216                     # starting solvent box (use spc216 from gromacs share)
ionic strength: 0.150               # desired molar ionic strength
cation: NA                          # name of the cation for neutralizing the system
anion: CL                           # name of the anion for neutralizing the system

#---INTEGRATOR PARAMETERS generated via parameters.py
mdp_specs:| {
  'group':'aamd',
  'mdps':{
    'input-em-steep-in.mdp':['minimize'],
    'input-em-cg-in.mdp':['minimize',{'integrator':'cg'}],
    'input-md-nvt-eq-in.mdp':['nvt-protein','nvt-protein',{'nsteps':10000}],
    'input-md-nvt-short-eq-in.mdp':['nvt-protein-short',{'nsteps':10000}],
    'input-md-npt-eq-in.mdp':['npt-protein',{'nsteps':10000}],
    'input-md-in.mdp':{'nsteps':100000},
    },
  }

The settings block is designed in a format meant to resemble YAML for its readability. Keys and values are separated by a colon, extra whitespace is ignored, and everything that follows the colon is interpreted first as python syntax, and if that fails, then as a float or a string. Multiline values (see mdp_specs above) are noted with :| instead of just a colon, and they continue until the leading indentation at the beginning of each line is absent. Comments are allowed with hashes. These blocks are interpreted by the yamlb function, which also uses the jsonify function to check for repeated keys.
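
The interpretation rule can be sketched as follows; this is an illustration of the described behavior, not the yamlb source.

import ast

def interpret_value(raw):
    """Sketch: try python syntax first, then a float, otherwise a string."""
    try: return ast.literal_eval(raw)
    except (ValueError,SyntaxError): pass
    try: return float(raw)
    except ValueError: return raw.strip()

# e.g. '0.150' becomes a float and 'charmm27' stays a string
print(interpret_value('0.150'),interpret_value('charmm27'))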

All of the settings are passed to expt.json, unpacked into the global namespace variable mysteriously labelled settings, and thereby exposed to any python functions imported by automacs. Note that the global namespace variable called state described below will check settings if it cannot find a key that you ask for. In that sense, the settings are available everywhere in your code. See the state section below for more details.

4.2   Extensions

To avoid redundant codes in separate modules, the automacs modules can import codes from other modules. These codes are imported based on a list of globs (https://en.wikipedia.org/wiki/Glob_(programming)) in the extensions item of the experiment. These can be paths relative to the root directory that point to other modules. Alternately, users can use syntax sugar to access other modules by the name of their directory. For example, adding all of the codes in the bilayer module to your experiment can be done by adding @bilayer/codes/*.py to your extensions list. The @bilayer will be replaced by the location of the bilayer module according to config.py, typically inputs/bilayers. The @module syntax sugar works in any key in the settings blocks, so that your experiments can draw from other extension modules without knowing where they are ahead of time. The path substitutions are handled in a special settings parser.

Any objects imported by the main amx module can be overridden by objects in the extension modules by adding their names to the _extension_override list at the top of the script. Similarly, objects can be automatically shared between extensions using the _shared_extensions list. These lists allow you to write a single piece of code that either changes a core functionality in the main amx module or is shared with in-development extension modules. The import scheme is handled almost entirely in runner/importer.py, which is omitted from the documentation for technical reasons. One example of a shared extension is the dotplace function, which makes sure gro file output has aligned decimal places.

4.3   State

In order to transmit key settings and measurements between simulation procedure steps or within functions in the same procedure, we store them in an overloaded dictionary called the state. We use a special DotDict class to access dictionary keys as attributes. For this reason, all spaces in the keys are replaced with underscores.

As we mentioned above, the state consults the settings when it cannot find a key that you ask for. This means that you can keep simulation parameters sequestered in the settings while keeping on-the-fly calculations in state. Everything gets saved to state.json at the end.
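
The following is a minimal sketch of this behavior; the real DotDict in automacs differs in its details.

class DotDict(dict):
    """Sketch: a dictionary with attribute-style access."""
    def __getattr__(self,key):
        # called only when normal attribute lookup fails;
        # absent keys return None, as described below
        return self.get(key,None)
    def __setattr__(self,key,val): self[key] = val

state = DotDict({'water buffer'.replace(' ','_'):1.2})
print(state.water_buffer)   # 1.2
print(state.add_proteins)   # None for an absent key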

We recommend accessing settings by using state. In the example experiment above (in the settings block section), the water model is set by water in settings. You could access it using the following syntax:

  1. settings['water']
  2. settings.water
  3. settings.get('water','tip3p')
  4. state['water']
  5. state.water
  6. state.q('water','tip3p')

We prefer the last two methods. Use settings.get or state.q if you wish to set a default in case the parameter is absent. Requesting an absent parameter from settings will throw an exception, however, requesting an absent parameter from the state always returns None. This means that you can write e.g. if state.add_proteins: ... to concisely control the execution of your simulation.

4.3.1   Posterity

In the introduction to the documentation we described the “supervised execution” of automacs codes. In short, this feature allows you to continue from the last command in a failed execution, but more importantly, it sends the state everywhere and saves it to state.json when the simulation is finished.

Saving variables

These features provide a simple way to create a sequence of simulation steps that depend on each other. These simulations are executed by make metarun — sometimes we call this “chaining”. Information can be passed to the next step simply by saving it in the state. For example, you might want to make a small bilayer larger by using the multiply function (currently located in the extras module). After constructing a simple bilayer, the composition is stored in state.composition. In the second step of the metarun, the multiply.py parent script can refer to state.before[-1] to access a dictionary that holds the previous state. This also includes the settings at state.before[-1]['settings'] so that you don’t need to repeat your settings in the following steps of the metarun. This scheme allows sequential steps to communicate important details about the outcome, geometry, or other features of a simulation.
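
In code, a later step of a metarun might read the previous step like the following sketch (the composition key comes from the bilayer example above).

# inside a later parent script of a metarun
previous = state.before[-1]                  # the entire previous state
composition = previous['composition']        # e.g. the bilayer composition
previous_settings = previous['settings']     # settings from the previous step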

GROMACS History

In addition to saving the previous states, automacs also intercepts any calls to GROMACS commands and logs them in a special variable called history_gmx. Users can call e.g. get_last_gmx_call('mdrun') to retrieve the inputs and outputs for the most recent call to any gromacs utility, typically mdrun. This makes it easy to get the last checkpoint, run input file, or structure.

History

A complete record of everything that automacs does is recorded in state.history. Every time an automacs function is called, it is added to this list in a pythonic format, with explicit *args and **kwargs. This feat is accomplished by the loud function, which decorates every imported function, except for those named in the _acme_silence variable (so that you can silence functions with extremely long arguments). The history is also written to a log file in each step folder called e.g. s01-protein/s01-protein.log.

4.4   Naming conventions

While the state and settings described above are explicit features of automacs that determine its execution, we also follow a number of more implicit rules about preparing the data. These are fairly general, and only serve to make it easy to keep track of all of your data.

4.4.1   Directories

In order to ensure that automacs procedures can be re-used in many different situations, we enforce a consistent directory structure. This makes it easy for users to write shorter codes which infer the location of previous files without elaborate control statements. The basic rule is that each procedure gets a separate folder, and that subsequent procedures can find input data from the previous procedure folder.

We find it hard to imagine that users would chain more than 99 steps together, so we name each step with a common convention that includes the step number (e.g. s01-bilayer, s02-large-bilayer). Many automacs functions rely on this naming structure. For example, the upload function is designed to send only your latest checkpoint to a supercomputer to continue a simulation, and thereby avoid sending all of your data to a new system. The step folders also correspond to discrete states of the system, which are backed up to e.g. state_1.json when the step is complete. When chaining runs together as part of a metarun, users can access previous states by using the history variables which record a history of what automacs has done so far.

The step name is always provided by the step variable in the settings block. To create the directory, each parent script typically calls make_step(settings.step) after its required initialization. You will see state.here used throughout the codes. It points to the current step directory.

4.4.2   Files

Within each procedure directory, we also enforce a file naming scheme that reflects much of the underlying GROMACS conventions. In particular, when simulations are extended across multiple executions, we follow the md.part0001.xtc numbering rules. Every time the mdrun integrator is invoked, automacs writes individual trajectory, input binary, and checkpoint files. Where possible, it also writes a configuration file at the conclusion of each run.

When we construct new simulations, we also follow a looser set of rules that makes it easy to see how the simulations were built.

  1. All GROMACS output to the standard output and standard error streams (that is, written to the terminal) is captured and stored in files prefixed with log-<gromacs_binary>. In this case we label the log file with the gromacs utility used to generate it. Since many of these utilities are called several times, we also include a name for that part of the procedure. For example, during bilayer construction, the file s01-bilayer/log-grompp-solvate-steep holds the preprocessor output for the steepest descent minimization of the water-solvated structure.
  2. While output streams are routed to log files, the formal outputs from the GROMACS utilities are suffixed with a name that corresponds to their portion of the construction procedure. We use the prefix em to denote energy minimization and md to denote molecular dynamics. For example, minimizing a protein in vacuum might output files such as em-vacuum.tpr while the NVT equilibration step might be labelled md-nvt.xtc.
  3. Intermediate steps that do not involve minimization or dynamics are typically prefixed with a consistent name. For example, when adding water to a protein or a bilayer, automacs will generate several intermediate structures, all prefixed with the word “solvate” e.g. solvate-dense.gro.

4.4.3   Getting inputs

A few protected keywords in the settings blocks will copy input files for you. The sources list should be a list of folders to copy into the current step, while files points to individual files. All paths should be relative to the root directory, however there is syntax sugar for pointing to extensions.
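
For example, a settings block might include the following (the paths here are hypothetical).

sources: ['inputs/my-structures']             # folders copied into the current step
files: ['inputs/structures/my-protein.pdb']   # individual files copied into the step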

4.5   Simplicity

In this section we have described how automacs organizes files. In general the file-naming rules are not absolute requirements for the simulations to complete. Instead, these “rules” have two purposes. First, if you use highly consistent and descriptive naming schemes, then you can easily re-use code in new situations. For example, many of the automacs procedures were developed for atomistic simulations. A few simple name changes along with some extra input files are oftentimes enough to port these procedures to coarse-grained systems or develop more complicated simulations.

The second purpose of our elaborate-yet-consistent naming scheme is to ensure that the data you produce are durable. Careful naming can ensure that future users who wish to study your data will not require an excessive amount of training to understand what they contain. An obvious naming scheme makes it easy to share data, find old simulations, and more importantly, parse the data with analysis programs once the dataset is complete. The omnicalc analysis package is designed to process data prepared by automacs, and these file-naming rules and saved state.json files make it easy for these programs to be used together.

This concludes the automacs walkthrough. Check out the BioPhysCode project for extra calculation codes designed to read and interpret simulations produced with automacs. Good luck!

5   Components

This section catalogs the codes loaded into the current copy of automacs. It parses the codes according to the local copy of config.py, which configures the connections to external codes.

5.1   proteins

The Proteins extension is a component of automacs located at amx/proteins and sourced from http://github.com/biophyscode/amx-proteins.git.

5.2   charmm

The Charmm extension is a component of automacs located at inputs/charmm and sourced from http://github.com/bradleyrp/amx-charmm.git.

5.3   docs

The Docs extension is a component of automacs located at inputs/docs and sourced from http://github.com/bradleyrp/amx-docs.git.

5.4   polymers

The Polymers extension is a component of automacs located at inputs/polymers and sourced from http://github.com/bradleyrp/amx-polymers.git.

5.5   vmd

The Vmd extension is a component of automacs located at inputs/vmd and sourced from http://github.com/bradleyrp/amx-vmd.git.