Pond manual

Pond application is a driver that uses aquaduct module to perform further analysis of results from Valve calculations.

Pond can calculate pockets present in the protein and free energy profiles of Master paths.

Pond invocation

Once aquaduct module is installed (see Aqua-Duct installation guide) properly on the machine, Pond is available as pond.py command line tool.

Usage

Basic help of Pond usage can be displayed by following command:

pond.py --help

It should display following information:

usage: pond.py [-h] [-c CONFIG_FILE] [-t THREADS] [-r RESULTS_DIR] [--debug]
               [--debug-file DEBUG_FILE] [--paths-types PATHS_TYPES] [--raw]
               [--raw-master] [--raw-discard-singletons RAW_SINGL]
               [--window-full] [--windows WINDOWS] [--wsize WSIZE]
               [--reference-value REF] [--reference-calc]
               [--reference-radius REF_RADIUS] [--reference-mol REF_MOL]
               [--temperature TEMP] [--gsize GRID_SIZE] [--pockets]
               [--hotspots] [--energy-profile] [--master]
               [--master-radius MASTER_RADIUS] [--master-ctypes MASTER_CTYPES]
               [--io-threshold IO_THRESHOLD] [--path-id PATH_ID]
               [--path-file PATH_FILE] [--path-radius PATH_RADIUS]
               [--path-smooth] [--raw-path] [--extract-path EXTRACT_PATH]
               [--output-file OUTPUT_FILE] [--output-suffix OUTPUT_SUFFIX]

What have I got in my pocket?

optional arguments:
  -h, --help            show this help message and exit
  -c CONFIG_FILE        Config file filename. (default: None)
  -t THREADS            Limit Aqua-Duct calculations to given number of
                        threads. (default: None)
  -r RESULTS_DIR        Path to results directory (default: )
  --debug               Prints debug info. (default: False)
  --debug-file DEBUG_FILE
                        Debug log file. (default: None)
  --paths-types PATHS_TYPES
                        Limit calculations to given paths types, i.e. given
                        molecules. (default: )
  --raw                 Use raw data from paths instead of single paths.
                        (default: False)
  --raw-master          Use raw data from paths instead of single paths, only
                        in master paths calculations. (default: False)
  --raw-discard-singletons RAW_SINGL
                        Discard short scope only segments from raw data.
                        (default: 1)
  --window-full         Return full window if windows is used. (default:
                        False)
  --windows WINDOWS     Number of windows to calculate. (default: 1)
  --wsize WSIZE         Size of window in frames. (default: None)
  --reference-value REF
                        Reference value in [kJ/mol/K]. (default: None)
  --reference-calc      Calculate reference value with scope and reference
                        molecules. (default: False)
  --reference-radius REF_RADIUS
                        Radius of reference in [Å]. (default: 2.0)
  --reference-mol REF_MOL
                        Selection of reference molecules. (default: resname
                        WAT)
  --temperature TEMP    Simulation temperature in [K]. (default: 300.0)
  --gsize GRID_SIZE     Size of grid's cells in in [Å]. (default: 1.0)
  --pockets             Calculate pockets. (default: False)
  --hotspots            Calculates hotspots if pockets are calculated.
                        (default: False)
  --energy-profile      Calculates energy profiles for master paths. (default:
                        False)
  --master              Enables master paths calculation. (default: False)
  --master-radius MASTER_RADIUS
                        Calculate profiles for master paths with given radius
                        in [Å]. (default: 2.0)
  --master-ctypes MASTER_CTYPES
                        Limit calculations to given ctypes. (default: )
  --io-threshold IO_THRESHOLD
                        Percent value of maximal density which will be used to
                        partition pocket into inner and outer instead of mean
                        value. (default: None)
  --path-id PATH_ID     Calculate profiles for specified path ID. (default:
                        None)
  --path-file PATH_FILE
                        Use coordinates from specified CSV file. (default:
                        None)
  --path-radius PATH_RADIUS
                        Calculate profiles for path with given radius in [Å].
                        (default: 2.0)
  --path-smooth         If used path coordinates will be smoothed. (default:
                        False)
  --raw-path            Use raw data from paths instead of single paths. Used
                        for path energy profiles calculations and for
                        extracting raw path. (default: False)
  --extract-path EXTRACT_PATH
                        Extract path coordinates with specified ID. (default:
                        None)
  --output-file OUTPUT_FILE
                        Output CSV filename for extracted coordinates.
                        (default: path_coords.csv)
  --output-suffix OUTPUT_SUFFIX
                        Arbitrary suffix appended to output filenames.
                        (default: )

Options common with Valve

All options related to Molecular Dynamic simulation data, configuration file, and threads have the same meaning as in Valve.

For detailed explanation of the following options see Valve manual:

  • -c CONFIG_FILE - Configuration file name. Pond and Valve should use the same file.

  • -t THREADS - Limits Aqua-Duct calculations to given number of threads.

  • --debug and --debug-file - Allows to get debugging information.

Results directory

Pond produces many results files, especially if windows options are used. For the sake of convenience, option -r allows to provide output directory. If provided directory does not exist it will be created without warning. Also, Pond will not warn if results files overwrites existing files.

Pond calculations options

Other options are used to start or adjust Pond calculations:

  • Pockets

    Option --pockets triggers pocket calculations

  • Hot-spots

    Option --hotspots triggers hot-spot calculations but it also requires --pockets.

  • Energy profiles

    Option --energy-profile triggers free energy profiles calculations for master paths.

Pond’s option --paths-types allows to limit calculations only to paths of certain molecules. For example, --paths-types 'WAT OXY' loads paths of WAT (water molecules) and OXY (oxygen) molecules. All other paths (if any) will not be used.

Free energy estimation

Pond can estimate free energy by using calculated density of traced molecules. It becomes particularly useful and relevant when traced molecules include solvent.

Estimation of free energy is done according to Boltzmann inversion. Similar method was used in Rao,S. et al. (doi:10.1080/19336950.2017.1306163) paper.

Following equation relates free energy with density of molecules:

\[n(z) = C \cdot e^{\left( \frac {-E(z) }{ kT} \right) }\]

Where z is point in the space, preferably along some kind of path, n(z) is density of molecules in point z, C is a normalization constant, E is free energy, k is Boltzmann’s constant, and T is temperature.

One can easily transform the above equation to calculate energy:

\[E(z) = -kT\ln\left(n(z)\right) -kT\ln\left(C\right)\]

Term \(kT\ln\left(C\right)\) does not depend on z and can be determined by assumption that free energy in the bulk of traced molecules (solvent) is zero.

Please note that option --temperature allows to set desired temperature in Kelvins.

Note

Pond returns energy in kJ/mol SI units.

Bulk reference

Value of reference correction can be directly passed to Pond using --reference-value option or can be calculated automatically by analysis of the bulk of traced molecules (solvent) using --reference-calc flag. Following additional options influence calculation of bulk reference:

  1. Radius of the bulk around COG.

    Option --reference-radius allows to set the radius of the sphere within which the bulk will be scanned. The sphere is centered in COG. Radius should be selected in such a way that the sphere does not overlaps with other molecules in the simulated system. Default value is 2.0 Å.

  2. Density of reference molecules.

    Once COG if the bulk and radius are setup, Pond scans trajectory and looks for molecules. Option --reference-mol allows to define molecules for which density should be calculated.

Note

If no --reference-value or --reference-calc options are used Pond skips estimation of free energy.

Raw data

Two types of data can by used by Pond:

  1. Separate paths (including passing paths if any), or

  2. Raw paths saved at stage II.

By default separate paths are used. Option --raw makes Pond to use raw paths in all calculations, whereas --raw-master makes Pond to use raw paths only in estimation of free energy profiles and for pockets and hot-spots separate paths are used.

Note

Best results of pockets calculation can be achieved with separate paths.

Warning

Free energy estimation with separate paths may very likely lead to false results at the borders of the scope.

Windows

Pond performs calculations for entire trajectory and/or for user defined windows:

  • --window-full ensures that results will be calculated for the entire trajectory.

  • --windows allows to set number of windows.

  • --wsize allows to change default size of windows.

By default, windows’ sizes are automatically set in such a way that entire trajectory is covered and windows do not overlap to each other. Option --wsize allows to set size of windows (in frames), therefore, windows can also overlap with each other or can span only selected sections of the trajectory.

For example, for 25 000 snapshots long trajectory one can perform calculations in 5000 frames long windows where an overlap of a consecutive window with a next window is 4000 frames long - see following picture:

../_images/aafig-ab8be19e708a33d3d84dee40940fce788f40e2c6.png

To calculate proper number of windows evenly spanning trajectory one can use following equation:

\[WINDOWS = \frac{TOTAL - WSIZE}{SHIFT} + 1\]

Where WINDOWS is a desired number of windows, TOTAL is total length of trajectory, and SHIFT is WSIZE - OVERLAP.

In the above example TOTAL = 25 000, WSIZE = 5 000, SHIFT = WSIZE - OVERLAP = 5 000 - 4 000 = 1 000. Therefore:

\[WINDOWS = \frac{25 000 - 5000}{1 000} + 1 = 21.\]

And Pond should be called with --windows 21 --wsize 5000 options to get above described windows.

Pockets

Pockets are calculated by analysis of paths found by Valve. A regular grid is constructed spanning all paths. Grid size by default is 1 Å and can be altered with --gsize option.

Note

As for now, it is recommended that multiplicative inverse of grid size is an integer number. Therefore, grid size can be safely set to 1, 0.5, 0.25, 0.1 etc.

Next, number of paths crossing each of the grid cells is calculated over the entire trajectory and divided by the number of frames. This gives averaged density of traced molecules.

Cells with null density are removed from the grid and resulted shape corresponds to maximal area penetrated by traced molecules.

Note

If no estimation of free energy is performed values returned for pockets and hot-spots are density (of traced molecules).

Warning

Estimation of free energy for pockets calculated for non raw data may be unreliable.

Inner & outer pocket

Distribution of densities in the grid has positive skew. This suggest that pocket can be partitioned into areas of different overall distribution of traced molecules.

Indeed, Pond saves two types of pockets depending on the distribution of densities in the grid:

  1. Inner pocket

    This is part of the pocket for which densities are greater than mean value.

  2. Outer pocket

    This is counter part for Inner pocket, i.e. the part of the pocket for which densities are less than mean value.

Hot-spots

Further analysis of distribution of densities in the grid allows to select points of the highest densities. They are considered as hot-spots, i.e. points of particular importance at which traced molecules are attracted or trapped and stays for considerably long time.

Currently hot-spots are detected as far right tail of the distribution of densities in the grid.

Energy profiles

Pond estimates energy profiles of master paths. For each point of the master path density of paths of traced molecules is calculated within sphere of radius set by --master-radius option.

Option --master-ctypes allows to select master paths ctypes for which free energy estimation is calculated.