Parameters¶

SEED user-modifiable parameters are contained in three main input files.

The file seed.inp (input_param) contains the most frequently modified input parameters, which are specific to a given SEED run (path and name of structural input files, list of residues forming the binding pocket, switch between polar and apolar docking, …).

The seed.par (par_param) file contains less frequently modified input/output options and parameters for docking, energy evaluation and clustering. Modifying most of these parameters is recommended only for advanced users who wish to fine-tune the energy model.

The seed_kw.par (KW_param) file contains additional parameters that are specified in a keyword-based format rather than a sequential one. This allows more flexibility and easier addition of new parameters. In general, all the newly introduced analysis methods and options will be specified in this file.

Input Parameters¶

Here we define all the parameters of the seed.inp file.

i1: first line: name of parameter file (seed.par)

second line: name of the keyword-based parameter file (seed_kw.par)

i2: name of coordinate file for the receptor (in SYBYL mol2 format)

i3: Binding site residue list.

First line: number of residues in the binding site.

Following lines: residue indices (one per line).

Note that SEED renumbers residues sequentially starting from 1, and the residue index refers to this new numbering: if, for example, ARG38 is the first residue of the protein, its index is 1 and not 38. The SEED residue indices can be retrieved from seed.out after the line Data for the receptor :. To avoid ambiguity, we recommend renumbering the residues starting from 1 in the input file. Binding site metal ions must be included in the list as well.
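The renumbering described above can be illustrated with a minimal Python sketch. The residue labels and the helper name `seed_residue_indices` are hypothetical and not part of SEED; the sketch only assumes that residues are numbered in the order in which they appear in the receptor file.

```python
def seed_residue_indices(original_ids):
    """Map each original residue label to its sequential, 1-based index."""
    return {res_id: i for i, res_id in enumerate(original_ids, start=1)}

# Hypothetical receptor whose first residue is ARG38 and which contains a zinc ion
receptor = ["ARG38", "GLY39", "ASP40", "ZN301"]
index_of = seed_residue_indices(receptor)

# Binding site made of ARG38, ASP40 and the metal ion
binding_site = [index_of[r] for r in ("ARG38", "ASP40", "ZN301")]

# i3 block: number of residues on the first line, then one index per line
i3_block = "\n".join([str(len(binding_site))] + [str(i) for i in binding_site])
print(i3_block)
```

The indices printed here are the sequential ones, so ARG38 appears as 1 rather than 38.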

i4: List of points (e.g. ligand heavy atoms of a known ligand-receptor complex structure) in the binding site used to select polar and apolar receptor vectors which satisfy the angle criterion (see Angle criterion) and discard vectors pointing outside of the binding site.

First line: number of points (0: no removal of vectors using the angle criterion).

Following lines: coordinates of the points (one point per line).

i5: Vectors for the metal ions in the binding site.

Make sure that the residue number of the metal is in the binding site residue list.

First line: total number of coordination points.

Following lines: atom number of metal / x y z of coordination point (vector extremity)

i6: Spherical cutoff for docking:

Coordinates of the center and radius of a sphere: a fragment position is accepted only if the geometric center of the fragment lies inside this sphere. This filter can be disabled by selecting n instead of y as the first value.

y, n / sphere center / sphere radius
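As an illustration of the spherical cutoff, the following sketch checks whether the geometric center of a set of atomic coordinates falls inside a sphere. It is not SEED's implementation; the function names are hypothetical, and treating a center exactly on the sphere surface as accepted is an assumption.

```python
import math

def geometric_center(coords):
    """Unweighted mean of the atomic coordinates."""
    n = len(coords)
    return tuple(sum(c[k] for c in coords) / n for k in range(3))

def passes_sphere_filter(coords, center, radius, enabled=True):
    """Return True if the filter is off (n) or the geometric center is in the sphere."""
    if not enabled:
        return True
    # Assumption: the boundary of the sphere counts as inside
    return math.dist(geometric_center(coords), center) <= radius

# Two-atom toy fragment with geometric center at (1, 0, 0)
fragment = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(passes_sphere_filter(fragment, center=(1.0, 0.0, 0.0), radius=0.5))
print(passes_sphere_filter(fragment, center=(5.0, 0.0, 0.0), radius=0.5))
```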

i7: Fragment library specifications

First line: one character specifying the running mode of SEED: Docking running mode (d) or only Energy evaluation mode (e).

Second line: the first column contains the path of the fragment mol2 file and the second column selects apolar docking, polar docking, or both (a, p, b). A fragment position is accepted if its total energy (according to the fast energy model) is smaller than the cutoff given in the third column. The second clustering is applied to the poses for which the binding energy of the cluster representative is smaller than the cutoff value specified in the fourth column. In summary:

Fragment library filename - apolar docking, polar docking, or both (a, p, b) - energy cutoff in kcal/mol - 2nd clustering cutoff in kcal/mol

Third line: Reading mode, either single or multi. This option is only relevant when using the MPI parallel version and only concerns the way the input mol2 library is read. For the serial version the chosen reading mode is inconsequential, as only one process will be started.

With single, SEED expects a single mol2 input file; molecules are read from this file by the master rank, which dispatches them to the first available rank, balancing the computational load among the processes. This is especially important when running Monte Carlo minimization, as the variance of the running time per molecule can be large.

With the multi option, each rank reads from a separate mol2 file. This requires the user to split the library beforehand into a number of parts equal to the number of ranks. In order to relieve the possible load imbalance, we recommend shuffling the library file before splitting it (scripts are provided). The multi option can be useful when reanalyzing SEED output poses, as each rank writes to a separate output mol2 file, or when running with a limited number of MPI ranks, as with single the master rank only reads and dispatches molecules without doing any computation.
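The shuffle-then-split preparation for the multi reading mode can be sketched as follows. This is not one of the scripts shipped with SEED; the function names are hypothetical, and the sketch assumes that every molecule record in the library starts with the standard `@<TRIPOS>MOLECULE` tag.

```python
import random

MOL_TAG = "@<TRIPOS>MOLECULE"  # record header of the Tripos mol2 format

def split_mol2_records(text):
    """Split a multi-molecule mol2 string into one string per molecule."""
    records, current, seen_tag = [], [], False
    for line in text.splitlines(keepends=True):
        if line.startswith(MOL_TAG):
            if seen_tag:  # a new record starts: close the previous one
                records.append("".join(current))
                current = []
            seen_tag = True
        current.append(line)
    if seen_tag:
        records.append("".join(current))
    return records

def shuffle_and_split(records, n_parts, seed=0):
    """Shuffle the records, then deal them round-robin into n_parts chunks."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return ["".join(shuffled[i::n_parts]) for i in range(n_parts)]

# Toy library of five placeholder records (not valid molecules)
library = "".join(f"{MOL_TAG}\nfrag_{i}\n" for i in range(5))
parts = shuffle_and_split(split_mol2_records(library), n_parts=2)
for rank, part in enumerate(parts):
    print(rank, part.count(MOL_TAG))
```

Each element of `parts` would then be written to its own mol2 file, one per MPI rank. Dealing the shuffled records round-robin keeps the part sizes within one molecule of each other.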

As you do not need to modify all the parameters, and in most cases the default values will give good results, we recommend not writing an input file from scratch but modifying a default template instead. You can do this here through the par_generator.

Parameter File¶

Here we define all the parameters of the seed.par file.

As mentioned, newly introduced analysis methods make use of the keyword-based parameter file seed_kw.par (KW_param). As the keyword-based format (with meaningful defaults) is more flexible and easier to read and write, for each parameter in seed.par we have also defined an equivalent keyword (specified in brackets, with its default value). The keyword-based format can be used to write an intermediate parameter file that can be converted to the corresponding seed.par and seed_kw.par files with the utilities in the Python module seed_param_module.py in the scripts/python_scripts directory.

p1 (prot_diel = 2.0): Dielectric constant of the solute (receptor and fragment)

p2 (kept_vec_ratio = 1.0 1.0): Ratio of kept vectors for docking : polar / apolar

p3 (write_mol2 = n y): Output control for structure files (two values on the same line).

First value: write *_clus.mol2 file (y/n)

Second value: write *_best.mol2 file (y/n)

p4 (write_energy = n y): Output control for energy table files (two values on the same line).

First value: write *_clus.dat summary table file (y/n)

Second value: write *_best.dat summary table file (y/n)

p5 (max_poses = 5 1): Maximum number of saved clusters and poses (two values on the same line).

First value: maximum number of cluster members saved in *_clus* output files. Note that this value determines the maximum number of poses per cluster that go through slow energy evaluation.

Second value: maximum number of poses saved in *_best* output files.

p6 (log_out = ./outputs/seed.out): Filename for output log file. This is the main SEED output file (seed.out).

The docked fragments are saved in the directory ./outputs

p7 (coul_grid = w ./scratch/coulomb.grid): write (w) or read (r) Coulombic grid / grid filename

p8 (vdw_grid = w ./scratch/vanderwaals.grid): write (w) or read (r) van der Waals grid / grid filename

p9 (desol_grid = w ./scratch/desolvation.grid): write (w) or read (r) receptor desolvation grid / grid filename

p10 (bump_check_slow = 2.0 0.89 0.6): Bump checking: used only for slow energy evaluation (three values)

n x atoms = maximum tolerated bumps /

scaling factor for interatomic distance /

severe overlap factor (beta factor in PROTEINS paper)

p11 (bump_check_fast = 1.0): van der Waals energy cutoff (kcal/mol): this is used as bump checking for the fast energy model.

p12 (hbond_geometry = 50.0 100): Angle (deg) and number of points on the sphere around the ideal hydrogen bonding vector direction.

p13 (num_rotations = 72): Number of fragment rotations around each axis.

p14 (angle_criterion = 70.0 10.0 1.2 0.8)

Settings for the reduction of the seeding vectors (four values).

angle_rmin if distance <= (multipl_fact_rmin*minDist)
angle_rmax if distance >= (multipl_fact_rmax*maxDist)
linear dependence (range between angle_rmin and angle_rmax) for other distances
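The distance dependence of the angle cutoff can be written as a short piecewise-linear function. The sketch below is not SEED's code. It assumes that the four values of p14 are, in order, angle_rmin, angle_rmax, multipl_fact_rmin and multipl_fact_rmax, and that minDist and maxDist are supplied by the caller; both points are assumptions.

```python
def angle_cutoff(distance, min_dist, max_dist,
                 angle_rmin=70.0, angle_rmax=10.0,
                 multipl_fact_rmin=1.2, multipl_fact_rmax=0.8):
    """Angle cutoff (deg) as a piecewise-linear function of the distance."""
    lower = multipl_fact_rmin * min_dist
    upper = multipl_fact_rmax * max_dist
    if distance <= lower:
        return angle_rmin
    if distance >= upper:
        return angle_rmax
    # Linear interpolation between angle_rmin and angle_rmax
    t = (distance - lower) / (upper - lower)
    return angle_rmin + t * (angle_rmax - angle_rmin)

# With min_dist = 2.0 and max_dist = 10.0 the linear range spans 2.4 to 8.0
for d in (2.0, 5.2, 9.0):
    print(d, angle_cutoff(d, min_dist=2.0, max_dist=10.0))
```

With the default values the cutoff is permissive (70 degrees) at short distances and strict (10 degrees) at long distances.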

p15 (vdw_probe_radius = 1.83): Van der Waals probe radius for removal of the receptor polar vectors.

p16 (coul_grid_sizes = 1 20.0 0.5): Settings for the Coulombic term in the fast energy model (three values).

1 = distance dependent dielectric / grid margin / grid spacing

p17 (vdw_grid_sizes = 20.0 0.3): Settings for the van der Waals term in the fast energy model (two values).

grid margin / grid spacing

p18 (slow_energy_vdw_cutoff = 12.0 1.0): Settings for the van der Waals accurate energy model (two values).

nonbonding cutoff / grid spacing

Note that the Coulombic cutoff for formal charges is automatically set to 1.3 x van_der_Waals_cutoff

p19 (apolar_k = -0.333333): Multiplicative factor (k) for apolar docking to skip evaluation of electrostatics. The van der Waals energy cutoff is:

k x Number of fragment atoms, including hydrogen atoms
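As a worked example of this cutoff, the sketch below evaluates k times the number of atoms for a fragment of 12 atoms (for instance benzene, with 6 carbon and 6 hydrogen atoms). The function name is hypothetical.

```python
def apolar_vdw_cutoff(n_atoms, k=-0.333333):
    """van der Waals energy cutoff (kcal/mol) for apolar docking: k * n_atoms."""
    return k * n_atoms

# Benzene: 6 carbon atoms + 6 hydrogen atoms = 12 atoms
print(round(apolar_vdw_cutoff(12), 6))
```

With the default k the cutoff is roughly one third of a kcal/mol (negative) per atom, so larger fragments must reach a more favorable van der Waals energy.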

p20 (solv_grid_sizes = 24.0 0.25): Settings for the solvation grid (two values):

grid margin / grid spacing

p21 (water_radius = 1.4, point_density_SAS = 500, solv_diel = 78.5): Settings for the solvation term evaluation (three values):

water radius for solvation / number of points per sphere to generate SAS / solvent dielectric constant

p22 (Hydrophobicity_map = 1.0 1.0 1.4 1.0 1.0): Setting for the Hydrophobicity maps (five values):

point density (A^-2) on the SAS for apolar vectors on the receptor / point density (A^-2) on the SAS for apolar vectors on the fragment / probe radius to generate the SAS for apolar vectors / scaling factor for desolvation / scaling factor for van der Waals interactions

p23 (scaling_factors = 1.0 1.0 1.0 1.0): Scaling factors for fast and also accurate energy evaluation (four values): van der Waals / electrostatic interaction / receptor desolvation / fragment desolvation

Clustering parameters¶

The clustering with GSEAL proceeds in two steps: the first clustering yields large clusters which contain almost overlapping as well as more distant fragments; the second clustering is done on each cluster found in the first clustering to eliminate fragments which are very close in space.

p24: Non-default similarity weight factors (150 atom elements) for GSEAL:

First line: 0 or number of non-default elements

Following lines: list (first element number / second element number / value )

p25 (gseal1 = 0.9 0.4): Parameters for first clustering (overall clustering):

GSEAL similarity exponential factor / cutoff factor

p26 (gseal2 = 0.9 0.9): Parameters for second clustering (to discard redundant positions):

GSEAL similarity exponential factor / cutoff factor

p27 (max_clu_poses = 20): Maximal number of poses to be clustered

p28 (print_level = 100 1): Setting for the amount of information to be written to the output seed.out:

Maximum number of lines to be written in the output file for the sorted energies and the two clustering procedures /

print level (0 = lean, 1 = adds sorting before postprocessing, 2 = adds 2nd clustering).

Force field parameters¶

p29 (vdw_params): Van der Waals radius and energy minimum (absolute value).

First line: number of records

Following lines: each record contains five values:

sequential index / atom type / element number / van der Waals radius / van der Waals energy minimum

p30 (hbond_params)

Hydrogen bond distances between donor and acceptor.
First line: Default distance for all atom and element types.
First block:

First line: number of records
Following lines: element number i / element number j / donor-acceptor distance

Second block:

First line: number of records
Following lines: atom type i / atom type j / donor-acceptor distance

p31 (atomic_weights): List of relative atomic weights.

First line: number of elements (without element 0)

Following lines: element name / element number / atomic weight

Keyword-based parameter file¶

In order to allow more flexibility and easier addition of SEED parameters, we have decided to move from the original sequential format of seed.par to a keyword-based format. For legacy reasons, this only involves the newly added settings, so that an older seed.par can be used as is, without modification or rewriting. The new keyword-based parameters should be specified in the format <keyword> = <value>, for example:

# Additional parameters
do_mc = y # activates MCSA sampling
mc_temp = 500
mc_max_xyz_step = 0.7 0.1

Comments are introduced by # and are ignored. Note that some keywords require multiple values. If the same keyword is repeated multiple times in the file, the last instance is used. The additional keyword-based parameter file, which we refer to as seed_kw.par, should always be present (even if blank) and its path has to be specified in the second line of i1.
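The rules of the keyword-based format (comments after #, one or more values per keyword, last instance wins) can be illustrated with a small reader. This is a sketch and not SEED's own parser; the function name is hypothetical, and splitting multiple values on whitespace is an assumption based on the example above.

```python
def parse_kw_params(text):
    """Parse '<keyword> = <value>' lines into a dict of value lists."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.split()  # a later duplicate overwrites
    return params

example = """\
# Additional parameters
do_mc = y # activates MCSA sampling
mc_temp = 500
mc_max_xyz_step = 0.7 0.1
mc_temp = 300
"""
params = parse_kw_params(example)
print(params)
```

In this example mc_temp appears twice, so only the second value is retained, while keywords missing from the file would fall back to their defaults.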

If a keyword is not specified in the seed_kw.par, its default value will be used. The keywords that can be set are the following (defaults are given in brackets):

Monte Carlo parameters¶

The following parameters are needed for running a Monte Carlo Simulated Annealing (MCSA) minimization of the top poses. This option can be enabled by setting do_mc to y (yes) and adding the following related keywords. If do_mc is set to n (no), all the additional MC parameters in this section play no role. See Monte Carlo Simulated Annealing for further details on MCSA.

do_mc (n): Perform MCSA refinement? (y / n)

mc_temp (0.0): Starting temperature of MC run.

mc_max_xyz_step (0.0, 0.0): Maximum rigid body translation step (in Angstrom): coarse (1st value) and fine (2nd value) moves.

mc_max_rot_step (0.0, 0.0): Maximum rigid body rotation step (in degrees): coarse (1st value) and fine (2nd value) moves.

mc_rot_freq (0.5): MC move set frequencies: frequency \(p\) of rigid body rotation moves (the frequency of rigid body translation moves will be \(q = 1 - p\)).

mc_xyz_fine_freq (0.5): Relative frequency (w.r.t. the number of translation moves) of fine translation moves.

mc_rot_fine_freq (0.5): Relative frequency (w.r.t. the number of rotation moves) of fine rotation moves.

mc_niter (0, 0): Number of steps \(N_{out}\) of the outer MC chain (1st value) and number of steps \(N_{in}\) of the inner MC chain (2nd value).

mc_sa_alpha (1.0): Annealing parameter \(\alpha\).

mc_rseed (-1): Seed for the pseudo-random number generator used by the MC sampler. A value of -1 uses the current CPU time.
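The interplay of the three move frequencies (mc_rot_freq, mc_xyz_fine_freq and mc_rot_fine_freq) can be sketched as a two-stage random choice: first rotation versus translation, then fine versus coarse. This illustrates the definitions above and is not SEED's sampler; the function names are hypothetical, and the explicit seed is fixed only to make the example reproducible.

```python
import random

def move_probabilities(rot_freq=0.5, xyz_fine_freq=0.5, rot_fine_freq=0.5):
    """Overall probability of each (move kind, step size) combination."""
    q = 1.0 - rot_freq  # frequency of translation moves
    return {
        ("rotation", "fine"): rot_freq * rot_fine_freq,
        ("rotation", "coarse"): rot_freq * (1.0 - rot_fine_freq),
        ("translation", "fine"): q * xyz_fine_freq,
        ("translation", "coarse"): q * (1.0 - xyz_fine_freq),
    }

def pick_move(rng, rot_freq=0.5, xyz_fine_freq=0.5, rot_fine_freq=0.5):
    """Draw one move: rotation vs translation first, then fine vs coarse."""
    if rng.random() < rot_freq:
        kind, fine_freq = "rotation", rot_fine_freq
    else:
        kind, fine_freq = "translation", xyz_fine_freq
    size = "fine" if rng.random() < fine_freq else "coarse"
    return kind, size

rng = random.Random(12345)  # fixed seed chosen for this example only
print(move_probabilities(rot_freq=0.4, xyz_fine_freq=0.25, rot_fine_freq=0.5))
print([pick_move(rng) for _ in range(3)])
```

With the default value of 0.5 for all three keywords, each of the four combinations is drawn with probability 0.25.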

Steepest Descent parameters¶

The following parameters are needed for running a steepest descent (SD) minimization of the top poses in rigid-body space. This option can be enabled by setting do_sd to y (yes) and specifying the following relevant keywords (or using the defaults). If do_sd is set to n (no), all the additional SD parameters in this section are ignored. Note that rigid-body SD minimization is performed after the MCSA minimization (if the latter is enabled). See Steepest Descent for further details on SD.

do_sd (n): Perform SD refinement? (y / n)

do_gradient_check (n): Compare analytical and numerical gradients and print them to the log file. This is mainly useful for troubleshooting and debugging.

sd_max_iter (20): Maximum number of SD iterations.

sd_eps_grms (0.02): Stopping threshold on the minimum value of the gradient (\(\| \boldsymbol{\alpha} \circ \nabla U(\mathbf{x}_i) \|\)).

sd_alpha_xyz (0.1): Base increment size for rigid-body translations. Expressed in Angstrom.

sd_alpha_rot (0.01): Base increment size for rigid-body rotations. Expressed in degrees.

sd_learning_rate (0.1): Starting learning rate \(\eta_0\) for SD.
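The stopping criterion on the scaled gradient, \(\| \boldsymbol{\alpha} \circ \nabla U(\mathbf{x}_i) \|\), can be illustrated as follows. The sketch assumes that \(\boldsymbol{\alpha}\) applies sd_alpha_xyz to the three translational gradient components and sd_alpha_rot to the three rotational ones, and that the norm is Euclidean. Both points are assumptions, and the function names are hypothetical.

```python
import math

def scaled_gradient_norm(grad_xyz, grad_rot, alpha_xyz=0.1, alpha_rot=0.01):
    """Euclidean norm of the element-wise product alpha * gradient."""
    scaled = [alpha_xyz * g for g in grad_xyz] + [alpha_rot * g for g in grad_rot]
    return math.sqrt(sum(s * s for s in scaled))

def sd_converged(grad_xyz, grad_rot, eps_grms=0.02, **alphas):
    """True when the scaled gradient norm drops below the threshold."""
    return scaled_gradient_norm(grad_xyz, grad_rot, **alphas) < eps_grms

# Made-up gradient components, for illustration only
print(sd_converged((3.0, 4.0, 0.0), (0.0, 0.0, 0.0)))  # scaled norm 0.5
print(sd_converged((0.1, 0.0, 0.0), (1.0, 0.0, 0.0)))  # scaled norm ~0.014
```

Scaling by \(\boldsymbol{\alpha}\) puts translational and rotational components on a comparable footing before the threshold is applied.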

Parameter File Generator¶

The parameter file generator helps you prepare the input parameter files for a SEED run: seed.inp, seed.par, and seed_kw.par. You can load a template with predefined default values (and CHARMM/CGenFF parameters), edit the user-specific information and save it. The template for seed_kw.par shows example settings for a run with additional MCSA minimization of the poses.

Here you can edit the file with user-specific information. Fields that you must edit are marked with XXXX.
