Running modes

SEED can run in two distinct modes: Docking running mode and Energy evaluation mode that can be enabled by setting the first line of i7 to d or e respectively.

  • In the docking running mode each fragment is docked into the receptor in multiple positions and the best ones (in terms of total binding energy) are returned. This is the usual setting for prospective docking campaigns.
  • In the energy evaluation mode SEED will evaluate the total binding energy of a fragment in the position provided by the input coordinates. This is the typical setting for rescoring poses generated beforehand by SEED or other software.

Docking running mode

In order to efficiently speed up the screening of fragments, SEED relies on a two-step docking workflow we will refer to as postprocess scheme.

In the first step, after the initial alignment of the fragment to a reference frame (pre-alignment of the fragment Pre-alignment of the fragment), the generated poses are screened and filtered according to the fast energy model (Fast docking scheme). Poses passing all the filters are sorted according to their total binding energy and clustered in order to find the most representative ones and discard redundant positions (clustering procedure Clustering). In the second step the best binding modes within each cluster are rescored and ranked with the more accurate solvation model (Accurate rescoring).

Pre-alignment of the fragment

Before starting the docking procedure the fragment is pre-aligned to a reference frame. The first atom of the fragment is put in the origin, the second is put along the positive x axis, the third in the xy plane and the others are aligned accordingly. If the third atom is collinear with the first two, the fourth atom is considered and so on.

As the positioning of points on the Solvent Accessible Surface has a dependency on the absolute orientation of the fragment, the pre-alignment step is necessary to ensure consistency of results when docking (first line of i7 set to d) the same fragment provided in different spatial orientations or when carrying out an energy evaluation (first line of i7 set to e) of a fragment previously docked by SEED. Residual differences may be due to the limited precision of coordinates in usual mol2 files.

Fast docking scheme

The fast docking scheme proceeds by applying the following filters sequentially:

  • Sphere for acceptance of the fragment position. This filter is optional. The user can specify a sphere (y/n for enabling the filter and coordinates of the center and radius in i6) in which the geometric center of the fragment pose must be to be accepted.
  • Van der Waals interaction used as bad contacts detection. The fast van der Waals energy is used to detect clashes: a fragment is discarded if it is less favorable than the energy threshold set in p11.
  • Van der Waals interaction for apolar docking. To increase efficiency, apolar fragments are discarded without evaluation of the electrostatic contribution if the fast van der Waals interaction is less favorable than a threshold value. This value is calculated by multiplying p19 by the number of fragment atoms.
  • Total energy. The fragment position is finally accepted if the total energy (fast model) is smaller than a cutoff given in the third column of i7. The total energy is the sum of the van der Waals interaction energy plus the electrostatic contribution (screened intermolecular energy and protein and fragment desolvation terms). These four terms can be weighted by the scaling factors in p23.

Clustering

The docking of a given fragment (with energy evaluation as described according to the Fast docking scheme) is followed by sorting and clustering. Within each fragment type, poses are first sorted according to binding energy. Poses whose binding energy is lower than a user-specified threshold value (see last point in the Fast docking scheme) are then clustered (a maximal number of positions can be set in p27) using the following similarity criterion between two fragment positions \(A\) and \(B\):

\[\begin{eqnarray} S(A,B)=\frac{S_{AB}}{\max (S_{AA},S_{BB})} & \mbox{ where } & S_{XY}=\sum_{i \in X} \sum_{j \in Y} w_{t_i t_j} \exp (- \gamma r^2_{ij}) \end{eqnarray}\]

where \(r_{ij}\) is the distance between two atoms (\(i \in\) fragment pose \(X\), \(j \in Y\)), \(w_{t_i t_j}\) is a user-controlled matrix whose coefficients reflect the similarity between element types (in most cases a unit matrix is used; otherwise the non-default coefficients have to be set in p24 by giving the number of non-default values on the first line and two element types with the non-default value on the following lines) and \(\gamma\) is a coefficient which acts on the broadness of the distribution of the positions. \(B\) is assigned to the cluster of \(A\) if \(S(A,B)\) is larger than a cutoff value \(\delta\) with \(0 \leq \delta \leq 1\).

The clustering proceeds in two steps:

  1. A first clustering with \(\gamma=0.9\) (first term of p25) and \(\delta=0.4\) (second term of p25) yields large clusters which contain almost overlapping as well as more distant fragments;
  2. A second clustering with \(\gamma=0.9\) (first term of p26) and delta=0.9 (second term of p26) is done on each cluster found in 1. to eliminate fragments which are very close in space.

The second clustering is applied on the positions for which the binding energy of the cluster representative is smaller than a cutoff value specified in the 4th column of i7. A single clustering step with \(\gamma=0.9\) and \(\delta=0.9\) would generate too many small clusters. Hence, the first step is a real clustering while the second step is done only to discard redundant positions.

Accurate rescoring

The \(n\) best binding modes within each cluster (\(n\) is set in p5) are rescored and ranked according to the accurate energy model. Note that it is possible that poses passing the total energy filter during fast docking result in a binding energy above the cutoff (set in the 3rd column of i7) with the accurate model. These poses appears in the docking summary in the seed.out log, but are not written to any other output files of the postprocess scheme.

Energy evaluation mode

SEED allows the energy of a particular fragment position to be evaluated without generating new poses (as opposed to the Docking running mode). The fragment mol2 file (i7) must contain the coordinates of the relevant position, for which the binding energy has to be evaluated.

Minimization

NOTE The possibility to minimize the top poses (those for which the accurate binding energy is evaluated) is still experimental and has not been extensively tested yet!

As the generation of the poses is based matching discrete vectors and rotations by fixed discrete increments there is the possibility to run a rigid body minimization of the final pose (in Docking running mode) or of the input pose (in Energy evaluation mode); the latter can be useful to rescore poses not generated by SEED, which might have unfavourable SEED energies. Running minimiation, in general, will result in better agreement (lower RMSD) with crystal poses and more favourable energy values.

Two different minimization protocols are implemented. A stochastic minimization with Monte Carlo simulated annealing (MCSA, see Monte Carlo Simulated Annealing) and a gradient-based minimization with steepest descent (SD, see Steepest Descent).

Monte Carlo Simulated Annealing

As the generation of the poses is based on rotations by fixed discrete increments there is the possibility to run a Monte Carlo (MC) sampler with simulated annealing (SA) on the poses going through the Accurate rescoring in order to do a stochastic minimization.

The MCSA minimization is based on a double-chain scheme; For every step of the outer chain, the inner MC chain is initialized from the last accepted conformation. The inner chain runs for \(N_{in}\) steps (second value of mc_niter); each new conformation is generated by a rigid MC rotational or translational move and accepted according to the change in the van der Waals contribution to the binding energy \(\Delta\Delta G_{vdW}\).

The final conformation of the inner chain is used as proposed conformation for the outer chain and accepted based on the change in total binding energy \(\Delta\Delta G_{binding}\) (comprising all the terms, i.e.: fragment-receptor van der Waals and Coulomb interactions, desolvation penalties of fragment and receptor). If the proposed conformation is rejected, all the samples generated by the inner chain are rejected.

The outer chain runs for \(N_{out}\) MC steps (first value of mc_niter) and at each steps the temperature is annealed according to:

\[T_i = \alpha T_{i-1}\]

where \(\alpha \leq 1\) (mc_sa_alpha).

For each MC basic move (rigid rotation and rigid translation) two maximum step values can be defined (mc_max_xyz_step for translations and mc_max_rot_step for rotations), allowing for tuning the possibility of coarse or fine deviations from the previous conformation. The relative frequency between rotation and translation moves can be tuned (mc_rot_freq), as well as the relative frequency between coarse and fine moves (mc_xyz_fine_freq for translations and mc_rot_fine_freq for rotations).

The MC minimization can be enabled by setting do_mc to y (yes) and adding the relevant keywords to the seed_kw.par, as detailed in Monte Carlo parameters.

Steepest Descent

It is also possible to run steepest descent (SD) minimization of the poses in rigid-body space, according to the following formula:

\[\mathbf{x}_{i+1} = \mathbf{x}_i - \eta_i \frac{\boldsymbol{\alpha} \circ \nabla U(\mathbf{x}_i)}{\| \boldsymbol{\alpha} \circ \nabla U(\mathbf{x}_i) \|}\]

The coordinate update is scaled by the learning rate \(\eta_i\) (parameter sd_learning_rate) and a base increment size \(\boldsymbol{\alpha}\) corresponding to the specific type of degree of freedom (sd_alpha_xyz for rigid-body translations or sd_alpha_rot for rigid-body rotations). The \(\circ\) symbol denotes the element-wise (Hadamard) product.

The learning rate is iteratively updated to avoid overstepping and increase efficiency, according to the following rule: if \(U_{i+1} > U_{i}\) the new pose is rejected and \(\eta_{i+1} = 0.2 \eta_i\), wherease if \(U_{i+1} \leq U_{i}\) the new pose is accepted and \(\eta_{i+1} = 1.2 \eta_i\).

The iteration stops either when a maximum number of iterations is reached (sd_max_iter) or when \(\| \boldsymbol{\alpha} \circ \nabla U(\mathbf{x}_i) \|\) is smaller than a user-specified threshold (sd_eps_grms).

For the calculation of the gradients of the energies only the van der Waals and screened ligand-receptor electrostatic interactions are considered, for which derivatives can be evaluated analytically. The Born radii and desolvation penalties are assumed to be constant. This approximation is justified by the fact that the desolvation energy surface is very smooth at the level of the small pose changes determined by SD. Once SD has stopped, the full energy is recalculated (including the correct Born radii) and printed to the output.

The SD minimization can be enabled (possibly after an MC minimization run) by setting do_sd to y (yes) and adding the relevant keywords to the seed_kw.par, as detailed in Steepest Descent parameters.