Configuration file options¶
The Valve configuration file is a plain text file. Its structure is similar to that of INI files commonly used in one of the popular operating systems, and it is compliant with the Python ConfigParser module.
The configuration file comprises several sections, which can be grouped into three categories. Section names are listed under each category.
- Global settings:
global
- Stages options:
traceable_residues
raw_paths
separate_paths
inlets_clustering
analysis
visualize
- Methods options:
smooth
clustering
reclustering
Section global¶
This section holds trajectory data settings and is reserved for other global options that may be added in the future.
Option |
Default value |
Description |
---|---|---|
top |
None |
Path to topology file. Aqua-Duct supports PDB, PRMTOP, and PSF topology files. |
trj |
None |
Path to trajectory file. Aqua-Duct supports NC and DCD trajectory files. |
twoway |
True |
Try to use two-way scanning in the stage II. |
sandwich |
False |
If set |
max_frame |
None |
Maximal frame number to be read from trajectory data. If set |
min_frame |
0 |
Minimal frame number to be read from trajectory data. |
step_frame |
1 |
Step used in reading the trajectory. The default value of 1 means that every frame is read. If it is greater than 1, only every step_frame-th frame is read. |
sps |
True |
Try to store data in single precision storage. |
cache_dir |
None |
Path to the directory for cache data. |
cache_mem |
False |
If set |
Option trj can be used to provide a list of trajectory files separated by the standard path separator: ‘:’ on POSIX platforms and ‘;’ on Windows (see os.pathsep).
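As a minimal sketch, a trj value holding several files could be split into individual paths as shown below. The file names are hypothetical, and this is not necessarily how Valve parses the option internally.

```python
import os


def split_trajectory_list(trj_value):
    """Split a trj option value into individual file paths."""
    # os.pathsep is ':' on POSIX platforms and ';' on Windows.
    return [path for path in trj_value.split(os.pathsep) if path]


# Hypothetical file names, joined with the separator of the current platform.
trj = os.pathsep.join(["part1.nc", "part2.nc", "part3.nc"])
paths = split_trajectory_list(trj)
print(paths)
```

Building the value with os.pathsep keeps the example portable between POSIX and Windows.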
Note
Options top and trj are mandatory.
Note
Options min_frame, max_frame, and step_frame can be used to limit calculations to a specific part of the trajectory. For example, to run calculations for 1000 frames starting from frame 5000, use the following options:
min_frame = 4999
max_frame = 5999
To run calculations for every 5th frame use:
step_frame = 5
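The frame options above belong to the global section together with the mandatory top and trj options. The following sketch shows how such a section could be read with the Python 3 configparser module (the successor of ConfigParser). File names are hypothetical, and the fallback values simply mirror the defaults listed in the table; Valve's own loading code may differ.

```python
import configparser

# Hypothetical configuration; only the option names come from the documentation.
CONFIG_TEXT = """
[global]
top = topology.prmtop
trj = trajectory.nc
min_frame = 4999
max_frame = 5999
step_frame = 5
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

glob = parser["global"]
# Integer options are converted explicitly; fallbacks follow the documented defaults.
min_frame = glob.getint("min_frame", fallback=0)
max_frame = glob.getint("max_frame", fallback=None)
step_frame = glob.getint("step_frame", fallback=1)

print(glob["top"], glob["trj"], min_frame, max_frame, step_frame)
```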
Sandwich¶
Trajectory data can be provided as several files. By default, these files are processed sequentially, making one long trajectory. If option sandwich is used, trajectory files are read as layers. For each layer, the search for traceable residues is done separately (stages I and II), but processing and analysis (stages III, IV, V, and VI) are done for all paths simultaneously. Usage of the sandwich option is hereafter referred to as sandwich mode.
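The difference between the two modes can be illustrated with a toy sketch. It assumes that in sandwich mode each layer keeps its own frame numbering, which is not stated in this document; the frame labels are hypothetical.

```python
from itertools import chain

# Hypothetical trajectory files, each represented as a list of frame labels.
files = [["a0", "a1", "a2"], ["b0", "b1"]]


def sequential(files):
    """Default mode: files are concatenated into one long trajectory."""
    return list(enumerate(chain.from_iterable(files)))


def sandwich(files):
    """Sandwich mode (assumed): each file is a layer with its own frame numbers."""
    return [
        (layer, frame_no, frame)
        for layer, frames in enumerate(files)
        for frame_no, frame in enumerate(frames)
    ]


print(sequential(files))
print(sandwich(files))
```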
Cache¶
Storing coordinates of all paths for very long MD trajectories requires a huge amount of RAM. The user can decide whether aquaduct should store coordinates in memory or in a separate directory. Option cache_mem instructs Valve to store coordinates in RAM; cache_dir stores coordinates in the selected directory. If neither option is set, coordinates are calculated on demand.
Note
If no cache is used (neither memory nor directory), master paths cannot be calculated.
Single precision storage¶
Most calculations in Valve are performed with NumPy. By default, NumPy uses double precision floats. Valve does not change this behavior but has a special option, sps, which forces all data (both internal data kept in RAM and data stored on disk) to be stored in single precision. This saves a lot of RAM and is recommended when you perform calculations for long trajectories and the amount of available RAM is limited.
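The memory saving can be illustrated with a small sketch. To stay dependency-free it uses the standard library array module as a stand-in for NumPy arrays; the number of values is arbitrary.

```python
from array import array

n_values = 3 * 1000  # e.g. x, y, z of 1000 hypothetical coordinates

double_precision = array("d", [0.0] * n_values)  # C double, typically 8 bytes each
single_precision = array("f", [0.0] * n_values)  # C float, typically 4 bytes each

bytes_double = double_precision.itemsize * len(double_precision)
bytes_single = single_precision.itemsize * len(single_precision)

print(bytes_double, bytes_single)
```

On common platforms the single precision buffer takes half the space of the double precision one, at the cost of reduced numerical precision.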
Common settings of stage sections¶
Stages 1-4, which perform calculations, share some common options for execution control and for saving and loading data.
Option |
Default value |
Description |
---|---|---|
execute |
runonce |
Option controls stage execution. It can have one of three possible
values: |
dump |
[dump file name] |
File name of dump data. It is used to save results of calculations or to load previously calculated data - this depends on execute option. Default value of this option depends on the stage and for stages 1 to 4 is one of the following (listed in order):
|
Stages 5-6 also use execute option, however, since they do not perform calculations per se, instead of dump option, they use save.
Option |
Default value |
Description |
---|---|---|
execute |
run |
Option controls stage execution. It can have one of three possible
values: |
save |
[save file name] |
File name for saving results. Default value of this option depends on the stage and for stages 5 and 6 is one of the following (listed in order):
Stage 5 saves Stage 6 can save results in two different ways:
|
Stage traceable_residues¶
Option |
Default value |
Description |
---|---|---|
scope |
None |
Definition of Scope of interest. See also Scope definition. |
scope_convexhull |
True |
Flag to set if Scope is direct or convex hull definition. |
scope_everyframe |
False |
Flag to set Scope evaluation mode. If set |
scope_convexhull_inflate |
None |
Increase (or if negative - decrease) size of the scope convex hull. |
object |
None |
Definition of Object of interest. See also Object definition. |
add_passing |
None |
Definition of molecules that should be added to traced molecules even if they were not present in Object. |
Note
Options scope and object are mandatory.
Stage raw_paths¶
This stage also requires definition of the Scope and Object. If appropriate settings are not given, settings from the previous stage are used.
Option |
Default value |
Description |
---|---|---|
scope |
None |
Definition of Scope of interest. See also
Scope definition. If |
scope_convexhull |
None |
Flag to set if the Scope is direct or convex hull definition. |
scope_everyframe |
False |
Flag to set Scope evaluation mode. If set |
scope_convexhull_inflate |
None |
Increase (or if negative - decrease) size of the scope convex
hull. If |
object |
None |
Definition of Object of interest. See also
Object definition. If |
clear_in_object_info |
False |
If it is set to |
discard_singletons |
1 |
If |
discard_empty_paths |
True |
If set to |
Stage separate_paths¶
Option |
Default value |
Description |
---|---|---|
discard_empty_paths |
True |
If set to |
sort_by_id |
True |
If set to |
discard_short_paths |
20 |
Discards paths that are shorter than the threshold, which is defined as the total number of frames. |
discard_short_object |
2.0 |
Discards paths whose objects are shorter than the threshold, which is defined as total length in metric units. |
discard_short_logic |
or |
If both |
auto_barber |
None |
This option allows to select molecular entity used in Auto
Barber procedure. See also Auto Barber and
|
auto_barber_mincut |
None |
Minimal radius of spheres used in Auto Barber. If a sphere has
radius smaller than this value, it is not used in AutoBarber
procedure. This option can be switched off by setting it to
|
auto_barber_maxcut |
2.8 |
Maximal radius of spheres used in Auto Barber. If a sphere has
radius greater than this value, it is not used in AutoBarber
procedure. This option can be switched off by setting it to
|
auto_barber_mincut_level |
True |
If set |
auto_barber_maxcut_level |
True |
If set |
auto_barber_tovdw |
True |
If set |
allow_passing_paths |
False |
If set |
separate_barber |
True |
Apply AutoBarber for each type of traced molecules separately. |
calculate_coo |
False |
If set |
Stage inlets_clustering¶
Option |
Default value |
Description |
---|---|---|
recluster_outliers |
False |
If set to |
detect_outliers |
False |
If set, detection of outliers is executed. It could be set as a
floating point distance threshold or set to |
singletons_outliers |
False |
Maximal size of a cluster to be considered as outliers. If set to a number greater than 0, clusters of that size are removed and their objects are moved to outliers. See Clustering of inlets for more details. |
max_level |
5 |
Maximal number of recursive clustering levels. |
create_master_paths |
False |
If set to |
master_paths_amount |
None |
Allows to limit number of single paths used for master paths
calculations.
If it is a number
in range |
separate_master |
False |
If set to |
separate_master_all |
True |
If separate_master is used and this option is set |
exclude_passing_in_clustering |
True |
If set to |
add_passing_to_clusters |
None |
Runs the procedure that adds inlets of passing paths to clusters using the Auto Barber method. To enable it, set this option to the molecular entity that will be used by Auto Barber. |
renumber_clusters |
False |
If set |
join_clusters |
None |
This option allows to join selected clusters. Clusters’ IDs
joined with |
inlets_center |
cos |
Selects the center point of inlets. This central point is then used as a reference point in calculations of clusters’ areas and contours. If set to cos, the center of the system, calculated as the average center of the scope area, is used. Alternatively, it can be set to coo, in which case the center of the object area is used. |
clustering_order |
old-school |
Allows changing the order of clustering steps.
|
Stage analysis¶
Option |
Default value |
Description |
---|---|---|
dump_config |
True |
If set to |
calculate_scope_object_size |
False |
If set to |
scope_chull |
None |
Scope convex hull definition used in calculating volume and area. |
scope_chull_inflate |
None |
Increase (or if negative - decrease) size of the scope convex hull. |
object_chull |
None |
Object convex hull definition used in calculating volume and area. |
cluster_area |
True |
If set |
cluster_area_precision |
20 |
Precision of the KDE method used in estimating clusters’ areas. This option controls the number of grid points per square Å used in KDE. Higher values mean better precision. The number of points can be calculated as $P^{2/3}$. |
cluster_area_expand |
2 |
Space occupied by clusters’ points can be expanded before KDE calculation. This option controls the amount, in Å, by which the cluster space is expanded. The average amount of expansion can be calculated as $E^{2/3}$. |
Stage visualize¶
Option |
Default value |
Description |
---|---|---|
split_by_type |
False |
If |
retain_all_types |
False |
If |
all_paths_raw |
False |
If |
all_paths_smooth |
False |
If |
all_paths_split |
False |
If is set |
all_paths_raw_io |
False |
If set |
all_paths_smooth_io |
False |
If set |
all_paths_amount |
None |
Allows to limit number of visualised paths. If it is a number
in range |
simply_smooths |
RecursiveVector |
Option indicates linear simplification method to be used in plotting smooth paths. Simplification removes points which do not (or almost do not) change the shape of smooth path. Possible choices are:
Optionally name of the method can be followed by a threshold
value in parentheses, i.e. |
paths_raw |
False |
If set |
paths_smooth |
False |
If set |
paths_raw_io |
False |
If set |
paths_smooth_io |
False |
If set |
paths_states |
False |
If set |
ctypes_raw |
False |
Displays raw paths in a similar manner as non split all_paths_raw but each cluster type is displayed as a separate object. |
ctypes_smooth |
False |
Displays smooth paths in a similar manner as non split all_paths_smooth but each cluster type is displayed as a separate object. |
ctypes_amount |
None |
Allows to limit number of visualised ctypes. If it is a number
in range |
inlets_clusters |
False |
If set |
inlets_clusters_amount |
None |
Allows to limit number of visualised inlets. If it is a number
in range |
show_molecule |
False |
If set to selection of some molecular object in the system,
for example to |
show_molecule_frames |
0 |
Indicates which frames of the object defined by show_molecule should be displayed. Several frames can be set; in that case, frames are displayed as states. |
show_scope_chull |
False |
If set to selection of some molecular object in the system,
for example to |
show_scope_chull_inflate |
None |
Increase (or if negative - decrease) size of the scope convex hull. |
show_scope_chull_frames |
0 |
Indicates for which frames of the object defined by show_scope_chull the convex hull should be displayed. Several frames can be set; in that case, frames are displayed as states. |
show_object_chull |
False |
If set to a selection of some molecular object in the system, the convex hull of this object is displayed. This works exactly the same way as show_scope_chull but is meant to mark the object shape. It can be achieved by using name * and a molecular object definition plus some spatial constraints, for example those used in the object definition. |
show_object_chull_frames |
0 |
Indicates for which frames of the object defined by show_object_chull the convex hull should be displayed. Several frames can be set; in that case, frames are displayed as states. |
cluster_area |
True |
If set |
cluster_area_precision |
20 |
Precision of the KDE method used in estimating clusters’ areas. This option controls the number of grid points per square Å used in KDE. Higher values mean better precision. The number of points can be calculated as $P^{2/3}$. |
cluster_area_expand |
2 |
Space occupied by clusters’ points can be expanded before KDE calculation. This option controls the amount, in Å, by which the cluster space is expanded. The average amount of expansion can be calculated as $E^{2/3}$. |
Note
Possibly due to limitations of MDAnalysis, only whole molecules can be displayed. If show_molecule is set to backbone, the complete protein will be displayed anyway. This may change in a future version of MDAnalysis and/or aquaduct.
Note
If several frames are selected, they are displayed as states which may interfere with other PyMOL objects displayed with several states.
Note
If several states are displayed, protein tertiary structure data might be lost. This seems to be a limitation of either MDAnalysis or PyMOL.
Clustering sections¶
The default section for the definition of the clustering method is named clustering, and the default section for the reclustering method is named reclustering. All clustering sections share some common options. Other options depend on the method.
Option |
Default value |
Description |
---|---|---|
method |
barber or dbscan |
Name of clustering method. It has to be one of the following: barber, dbscan, affprop, meanshift, birch, kmeans. Default value depends whether it is clustering section (barber) or reclustering section (dbscan). |
recursive_clustering |
clustering or None |
If set to the name of a section that holds clustering method settings, this method is called in the next recursion of clustering. Default value for reclustering is None. |
recursive_threshold |
None |
Sets a threshold that excludes clusters of certain sizes from reclustering. The value of this option comprises an operator and a value. The operator can be one of the following: >, >=, <=, <. The value has to be expressed as a floating point number in the range of 0 to 1. Several definitions can be given, separated by a space character. Only clusters whose size complies with all threshold definitions are submitted to reclustering. |
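The recursive_threshold syntax described above can be sketched as a small parser. This is an illustration only: it assumes that the operator is written directly before the number (for example ">0.1 <=0.9") and that cluster size is expressed as a relative value between 0 and 1. The function names are hypothetical and are not part of Valve.

```python
import operator
import re

OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<=": operator.le,
    "<": operator.lt,
}
# Two-character operators are listed first so that ">=" is not read as ">".
_DEFINITION = re.compile(r"^(>=|<=|>|<)([0-9]*\.?[0-9]+)$")


def parse_thresholds(option_value):
    """Turn a space-separated list of definitions into (function, value) pairs."""
    checks = []
    for definition in option_value.split():
        match = _DEFINITION.match(definition)
        if match is None:
            raise ValueError("Malformed threshold definition: %r" % definition)
        value = float(match.group(2))
        if not 0.0 <= value <= 1.0:
            raise ValueError("Threshold value out of range 0 to 1: %r" % value)
        checks.append((OPERATORS[match.group(1)], value))
    return checks


def complies(relative_size, checks):
    """A cluster is submitted to reclustering only if all definitions hold."""
    return all(check(relative_size, value) for check, value in checks)


checks = parse_thresholds(">0.1 <=0.9")
print(complies(0.5, checks), complies(0.05, checks))
```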
barber¶
Clustering by the barber method is based on the Auto Barber procedure. For each inlet, a sphere is constructed according to the Auto Barber settings of the separate_paths stage or according to parameters given in the clustering section. Next, inlets that form coherent clouds of mutually intersecting spheres are grouped into clusters. The barber method supports the same settings as Auto Barber:
Option |
Value type |
Description |
---|---|---|
auto_barber |
str |
This option allows to select molecular entity used in Auto
Barber procedure. See also Auto Barber and
|
auto_barber_mincut |
float |
Minimal radius of spheres used in Auto Barber. If a sphere has
radius smaller than this value, it is not used to cut. This
option can be switched off by setting it to |
auto_barber_maxcut |
float |
Maximal radius of spheres used in Auto Barber. If a sphere has
radius greater than this value, it is not used to cut. This
option can be switched off by setting it to |
auto_barber_mincut_level |
bool |
If set |
auto_barber_maxcut_level |
bool |
If set |
auto_barber_tovdw |
bool |
If set |
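The grouping step of the barber method, in which inlets whose spheres form coherent clouds of intersecting spheres end up in one cluster, can be sketched as a connected-components search. This is a simplified illustration with a union-find structure, not Valve's implementation; sphere coordinates and radii are hypothetical, and spheres that merely touch are assumed to intersect.

```python
import math


def cluster_spheres(spheres):
    """Label spheres (x, y, z, r) so that chains of intersecting spheres share a label."""
    parent = list(range(len(spheres)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (xi, yi, zi, ri) in enumerate(spheres):
        for j in range(i + 1, len(spheres)):
            xj, yj, zj, rj = spheres[j]
            # Two spheres intersect if their centers are no farther apart
            # than the sum of their radii.
            if math.dist((xi, yi, zi), (xj, yj, zj)) <= ri + rj:
                parent[find(i)] = find(j)

    # Map each root to a consecutive cluster number in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(spheres))]


spheres = [(0, 0, 0, 1), (1.5, 0, 0, 1), (3, 0, 0, 1), (10, 0, 0, 1)]
labels = cluster_spheres(spheres)
print(labels)
```

The first and third spheres do not intersect directly but are joined through the second one, which is what makes the cloud coherent.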
dbscan¶
For a detailed description, see the sklearn.cluster.DBSCAN documentation. The following table summarizes options available in Valve and is a copy of the original documentation.
Option |
Value type |
Description |
---|---|---|
eps |
float |
The maximum distance between two samples for them to be considered as in the same neighborhood. |
min_samples |
int |
The number of samples (or total weight) in a neighborhood for a point to be considered as a core point. This includes the point itself. |
metric |
str |
The metric to use when calculating distance between instances in a feature array. Can be one of the following:
|
algorithm |
str |
The algorithm to be used by the NearestNeighbors module to compute pointwise distances and find nearest neighbors. Can be one of the following:
|
leaf_size |
int |
Leaf size passed to BallTree or cKDTree. |
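As a sketch of how a clustering section with dbscan options might be read and converted to the value types listed in the table, consider the example below. The option values are purely illustrative and the conversion logic is an assumption, not Valve's own code.

```python
import configparser

# Hypothetical section; the numeric values are illustrative only.
CONFIG_TEXT = """
[clustering]
method = dbscan
eps = 1.5
min_samples = 5
"""

# Value types as listed in the dbscan options table.
DBSCAN_TYPES = {
    "eps": float,
    "min_samples": int,
    "metric": str,
    "algorithm": str,
    "leaf_size": int,
}

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)
section = parser["clustering"]

method = section["method"]
# Keep only method-specific options and convert them to their documented types.
kwargs = {
    name: DBSCAN_TYPES[name](value)
    for name, value in section.items()
    if name in DBSCAN_TYPES
}
print(method, kwargs)
```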
affprop¶
For a detailed description, see the AffinityPropagation documentation. The following table summarizes options available in Valve and is a copy of the original documentation.
Option |
Value type |
Description |
---|---|---|
damping |
float |
Damping factor between 0.5 and 1. |
convergence_iter |
int |
Number of iterations with no change in the number of estimated clusters that stops the convergence. |
max_iter |
int |
Maximum number of iterations. |
preference |
float |
Points with larger values of preferences are more likely to be chosen as exemplars. |
meanshift¶
For a detailed description, see the MeanShift documentation. The following table summarizes options available in Valve and is a copy of the original documentation.
Option |
Value type |
Description |
---|---|---|
bandwidth |
Auto or float |
Bandwidth used in the RBF kernel. If |
cluster_all |
bool |
If true, then all points are clustered, even those orphans that are not within any kernel. |
bin_seeding |
bool |
If true, initial kernel locations are not locations of all points, but rather the location of the discretized version of points, where points are binned onto a grid whose coarseness corresponds to the bandwidth. |
min_bin_freq |
int |
To speed up the algorithm, accept only those bins with at least min_bin_freq points as seeds. If not defined, set to 1. |
birch¶
For a detailed description, see the Birch documentation. The following table summarizes options available in Valve and is a copy of the original documentation.
Option |
Value type |
Description |
---|---|---|
threshold |
float |
The radius of the subcluster obtained by merging a new sample and the closest subcluster should be smaller than the threshold. Otherwise a new subcluster is started. |
branching_factor |
int |
Maximum number of CF subclusters in each node. |
n_clusters |
int |
Number of clusters after the final clustering step, which treats the subclusters from the leaves as new samples. By default, this final clustering step is not performed and the subclusters are returned as they are. |
kmeans¶
For a detailed description, see the KMeans documentation. The following table summarizes options available in Valve and is a copy of the original documentation.
Option |
Value type |
Description |
---|---|---|
n_clusters |
int |
The number of clusters to form as well as the number of centroids to generate. |
max_iter |
int |
Maximum number of iterations of the k-means algorithm for a single run. |
n_init |
int |
Number of times the k-means algorithm will be run with different centroid seeds. The final results will be the best output of n_init consecutive runs in terms of inertia. |
init |
str |
Method for initialization, defaults to |
tol |
float |
Relative tolerance with regards to inertia to declare convergence. |
Smooth section¶
Section smooth supports the following options:
Option |
Value type |
Description |
---|---|---|
method |
str |
Smoothing method. Can be one of the following:
|
recursive |
int |
Number of recursive runs of smoothing method. |
window |
int or float |
In window-based methods, defines the window size. In plain |
step |
int |
In step-based methods, defines the size of the step. |
function |
str |
In window-based methods, defines the averaging function. Can be
|
polyorder |
int |
In |
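To illustrate how the window, function, and recursive options could interact in a window-based method, here is a generic moving-window sketch. It is not one of Valve's smoothing methods. It assumes a centered window measured in points, and the function and parameter names merely echo the option names.

```python
def window_smooth(coords, window=5, function=None):
    """Replace each point by the average of the points inside a centered window."""
    if function is None:
        # Plain arithmetic mean as the default averaging function.
        function = lambda values: sum(values) / len(values)
    half = window // 2
    smoothed = []
    for i in range(len(coords)):
        # The window is truncated at both ends of the path.
        lo, hi = max(0, i - half), min(len(coords), i + half + 1)
        chunk = coords[lo:hi]
        smoothed.append(tuple(function([p[k] for p in chunk]) for k in range(3)))
    return smoothed


def smooth_recursively(coords, recursive=1, **kwargs):
    """Apply the smoothing repeatedly, feeding each result into the next run."""
    for _ in range(recursive):
        coords = window_smooth(coords, **kwargs)
    return coords


path = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
print(window_smooth(path, window=3))
print(smooth_recursively(path, recursive=2, window=3))
```

Passing a different callable as function, for example statistics.median, changes the averaging without touching the windowing logic.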