LESSON 1: Creating and Verifying a HASL Model for Steroids



  1. Invoke SYBYL
  2. To begin this lesson, type source $HASL_DEMO at the UNIX C shell (csh) prompt. This script creates a new directory in your account named "hasl4.0S_demo" that contains soft links to the master HASL tutorial directory. Type cd hasl4.0S_demo to change to this new tutorial directory, then enter the command sybyl to start SYBYL.

  3. Create a SYBYL Molecular Spreadsheet with the steroid data set loaded.
  4. From the menubar, choose File, Molecular Spreadsheet, New... and make sure the Data Source is Database. The Database Selection dialog appears. In the Database containing molecules: text box, enter $TA_DEMO and press Search Directory. Select jacs.mdb and press Open. A Molecular Spreadsheet (MSS) should appear with the 21 steroid molecules listed in its rows. Next, load the activity data. From File on the MSS menubar, select Import..., change the Format to Tripos, and press the ... button next to the File: text box. In the any_file dialog, enter $TA_DEMO in the field labeled A name of the existing file and press Return. From the resulting Files: list, select cbg.tripos and press OK. Press Import on the Import JACS dialog. The CBG (Corticosteroid Binding Globulin) data is now loaded into column 1 of the MSS.

  5. Add a "Basic" HASL column to the Molecular Spreadsheet.
  6. Press the AutoFill button on the MSS and select Column from the Option menu. Select HASLMODEL as the New column type and press OK. The Add Column (HASL) dialog contains options for optimizing and customizing the individual molecular lattices that will later be merged into the HASL model. Here we use the simplest option, setting HASL Source: to Use Basic H-Val Parameters. Press the Edit Basic Atomic H-Val Parameters... button and examine this parameter set in the Edit Basic HASL H-Val Parameters dialog. Each atom type has one of three values {+1, 0, -1}, selected with a three-button radio set. You can change these values for the current session, and you also have the options Reset to Program Default, Reset to Personal Default, and Save Current as Personal Default. Do not change anything now; press Cancel to return to the main HASL dialog.
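
    The three-valued Basic scheme and its session-only editing can be pictured as a small lookup table. The sketch below is hypothetical: the atom-type values are illustrative placeholders (not the program's real Basic parameter set), and the class only imitates the dialog's behavior of editing values for the current session and resetting to defaults.

```python
from typing import Dict

# Hypothetical, illustrative values only: NOT the program's actual Basic
# H-Val defaults. They follow the tutorial's rule of thumb
# (+1 acceptor-like, -1 donor-like, 0 hydrophobic).
PROGRAM_DEFAULT: Dict[str, int] = {
    "C.3": 0,    # sp3 carbon
    "C.2": 0,    # sp2 carbon
    "O.2": +1,   # carbonyl oxygen
    "N.am": -1,  # amide nitrogen
}

ALLOWED_HVALS = (-1, 0, +1)


class BasicHValTable:
    """Three-valued H-Val table with session-only edits and a reset."""

    def __init__(self, defaults: Dict[str, int]) -> None:
        self._defaults = dict(defaults)  # kept untouched for resets
        self.current = dict(defaults)    # values used in this session

    def set(self, atom_type: str, value: int) -> None:
        # Mirrors the three-button radio set: only -1, 0 or +1 are legal.
        if value not in ALLOWED_HVALS:
            raise ValueError(f"H-Val must be one of {ALLOWED_HVALS}, got {value}")
        self.current[atom_type] = value

    def reset_to_default(self) -> None:
        # Analogous to "Reset to Program Default".
        self.current = dict(self._defaults)

    def hval(self, atom_type: str) -> int:
        return self.current[atom_type]
```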

    Hydrogen Treatment: controls how the program handles hydrogens. There are up to three options, although one of them, United, is unavailable with the Basic H-Val parameter set. All means every atom, including every hydrogen, is calculated explicitly; Essential means hydrogens on polar atoms (N, O, P, S) are calculated explicitly, while hydrogens on carbon are folded into their "heavy" atoms as united-atom types; United means all hydrogens are folded into their "heavy" atoms as united-atom types. The H-Val Gain Factor is a numerical constant that multiplies H-Vals imported as partial charges, from HINT, or from Molconn-Z. It is important that the resulting values lie roughly between -5 and +5. The H-Val Gain Factor is inactive when using the Basic or Advanced H-Val parameters.
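
    The sketch below illustrates the three hydrogen-treatment options and the gain-factor range check. The atom representation and function names are hypothetical; the sketch only decides which atoms stay explicit and does not model how united-atom H-Vals are actually derived.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

POLAR_ELEMENTS = {"N", "O", "P", "S"}  # polar atoms as listed in the tutorial


@dataclass
class Atom:
    element: str
    heavy_atom: Optional[int] = None  # for hydrogens: index of the bonded heavy atom


def explicit_atoms(atoms: Sequence[Atom], mode: str) -> List[int]:
    """Indices of atoms treated explicitly under 'All', 'Essential' or 'United'.

    Hydrogens that are not returned are assumed to be folded into their
    heavy atom as a united-atom type.
    """
    if mode not in ("All", "Essential", "United"):
        raise ValueError(f"unknown hydrogen treatment: {mode}")
    keep = []
    for i, atom in enumerate(atoms):
        if atom.element != "H" or mode == "All":
            keep.append(i)  # heavy atoms are always explicit
        elif mode == "Essential" and atoms[atom.heavy_atom].element in POLAR_ELEMENTS:
            keep.append(i)  # hydrogen on a polar atom stays explicit
    return keep


def apply_gain(raw_hvals: Sequence[float], gain: float,
               limit: float = 5.0) -> Tuple[List[float], List[float]]:
    """Multiply imported values by the gain factor; report any outside +/-limit."""
    scaled = [gain * v for v in raw_hvals]
    outside = [v for v in scaled if abs(v) > limit]
    return scaled, outside
```

    For a methanol-like fragment (C, O, one H on C, one H on O), All keeps all four atoms, Essential keeps the hydroxyl hydrogen only, and United keeps just the two heavy atoms.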

    The HASL Region: defines the dimensions of the lattice on which all molecules in the set are calculated. By far the easiest way to set it is Pre-Calculate as Union..., which opens the Calculate SYBYL Region Automatically dialog. Set the Spacings for X, Y and Z to 1.5 Å and leave the Margins for X, Y and Z at 4 Å. Enter jacs.rgn as the SYBYL Region File and press OK. After a brief calculation, the SYBYL Region File text box on the main HASL dialog should show jacs.rgn. The last item is the HASL Model File:, which keeps a record of how this and subsequent calculations in the current session were performed. Give it a descriptive name that indicates the data set and other features of the calculation; here, enter jacs.hsl and press OK to AutoFill the HASL column in the MSS. The suggested column heading, HASL2, is fine.
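
    A union region can be sketched as the bounding box of every atom in every molecule, padded by the margin and sampled at the chosen spacing. This is an illustrative sketch, not SYBYL's implementation; in particular, the point-count convention (floor(extent / spacing) + 1 per axis) is an assumption.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def union_region(molecules: Sequence[Sequence[Coord]],
                 spacing: float = 1.5,
                 margin: float = 4.0):
    """Box enclosing every atom of every molecule, padded by `margin` (Å).

    Returns (lower corner, upper corner, lattice points per axis).
    The per-axis point count floor(extent/spacing) + 1 is assumed.
    """
    atoms = [xyz for mol in molecules for xyz in mol]
    lo = tuple(min(a[k] for a in atoms) - margin for k in range(3))
    hi = tuple(max(a[k] for a in atoms) + margin for k in range(3))
    # Small epsilon guards against round-off when an extent is an exact
    # multiple of the spacing.
    counts = tuple(math.floor((hi[k] - lo[k]) / spacing + 1e-9) + 1 for k in range(3))
    return lo, hi, counts
```

    Halving the spacing roughly multiplies the number of lattice points by eight, so the spacing strongly affects run time.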

  7. Verify the HASL Model Using Cross-Validation
  8. The HASL 4.10S program has a built-in routine that cross-validates the developing HASL model as a simple check of its predictive ability. On the SYBYL menubar choose eslc, HASL, Verify HASL Model (Crossvalidation)... In the HASL Verify dialog box, enter the Activity Data Column: (1) and the HASL Column: (2). Iterations: sets the maximum number of convergence cycles allowed for reaching the Error Limit:, so these two settings trade speed against accuracy. For the present case set Iterations: to 200 and Error Limit: to 0.001. (The error limit will probably not be reached in this exercise.) The current HASL Model File: (jacs.hsl) should be displayed. Choose Leave-One-Out as the Validation Type:; note that the Number of Groups then equals the number of rows in the MSS. Turn on Save Cross-Validation results in MSS so that a new column with the predicted CBG data is added to the spreadsheet. Clicking Examine HASL Model Setup... opens the Examine HASL Setup dialog, where you can review the parameter choices made when creating the HASL column. Press OK on the HASL Verify dialog box when you are ready to proceed, then go get a coffee or soft drink: the calculation takes from several minutes to an hour, depending on your computer.

    When the calculation is complete, a dialog reports that the cross-validated r2 is 0.794123. This value estimates how predictive the model is likely to be. In leave-one-out cross-validation, each row in the MSS is predicted by a model built from all of the other molecules, and the cross-validated r2 measures how well these predictions agree with the actual measured values. You can also inspect the individual predictions in Column 3 of the MSS (now labeled HASLCVR3).
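
    Leave-one-out cross-validation can be sketched as follows. The cross-validated r2 here uses the definition common in QSAR work such as CoMFA, q2 = 1 - PRESS/SS, where PRESS is the sum of squared prediction errors and SS the sum of squared deviations from the mean; whether HASL computes its value exactly this way is an assumption. A simple straight-line fit stands in for the HASL model so the example stays self-contained.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Model = TypeVar("Model")


def leave_one_out_q2(xs: Sequence[float], ys: Sequence[float],
                     fit: Callable[[List[float], List[float]], Model],
                     predict: Callable[[Model, float], float]) -> Tuple[float, List[float]]:
    """Predict each row from a model trained on all other rows; return (q2, predictions)."""
    n = len(ys)
    preds = []
    for i in range(n):
        train_x = [x for j, x in enumerate(xs) if j != i]
        train_y = [y for j, y in enumerate(ys) if j != i]
        preds.append(predict(fit(train_x, train_y), xs[i]))
    mean_y = sum(ys) / n
    press = sum((y - p) ** 2 for y, p in zip(ys, preds))  # predictive residual sum of squares
    ss = sum((y - mean_y) ** 2 for y in ys)               # total sum of squares
    return 1.0 - press / ss, preds


# Stand-in model (NOT HASL): ordinary least-squares line y = a*x + b.
def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def predict_line(model: Tuple[float, float], x: float) -> float:
    a, b = model
    return a * x + b
```

    With N rows, this builds N separate models, which is why leave-one-out validation takes much longer than fitting a single model.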

  9. Create the Final HASL Model
  10. This step is essentially the same as the previous one, except that it creates a single HASL model from all of the rows in the table. Note, however, that you can also create models from a subset of the rows. From the SYBYL menubar choose eslc, HASL, Create HASL Model... Enter the Activity Data Column: (1) and HASL Column: (2), and set Iterations: to 300 and Error Limit: to 0.001. Again, the HASL Model File: should be jacs.hsl. Turn on Save HASL results in MSS. A SYBYL NetBatch option is available for this command, but it should not be necessary here. Press OK to begin; the HASL model should take 1 to 4 minutes to converge.
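
    The roles of Iterations: and Error Limit: can be illustrated with a simplified iterative fit. This is a hypothetical sketch, not HASL's actual algorithm: it assumes each molecule's predicted activity is the sum of values at the lattice points it occupies, and it uses a relaxed error-redistribution update; the function name and the `rate` parameter are invented for illustration. The loop stops as soon as every molecule is within the error limit, or when the iteration budget runs out.

```python
from typing import Dict, Hashable, List, Sequence, Set, Tuple


def fit_partial_activities(occupancy: Sequence[Set[Hashable]],
                           activities: Sequence[float],
                           iterations: int = 300,
                           error_limit: float = 0.001,
                           rate: float = 0.5) -> Tuple[Dict[Hashable, float], int, bool]:
    """Simplified additive lattice fit (an assumption, not HASL's published algorithm).

    Returns (point values, cycles used, converged?).
    """
    # Start by sharing each activity evenly over the molecule's points,
    # averaging where several molecules occupy the same point.
    shares: Dict[Hashable, List[float]] = {}
    for points, act in zip(occupancy, activities):
        for p in points:
            shares.setdefault(p, []).append(act / len(points))
    values = {p: sum(s) / len(s) for p, s in shares.items()}

    for cycle in range(1, iterations + 1):
        # Spread each molecule's current error back over its own points.
        for points, act in zip(occupancy, activities):
            err = act - sum(values[p] for p in points)
            for p in points:
                values[p] += rate * err / len(points)
        # Convergence test: worst remaining error over all molecules.
        worst = max(abs(act - sum(values[p] for p in points))
                    for points, act in zip(occupancy, activities))
        if worst < error_limit:
            return values, cycle, True
    return values, iterations, False
```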

  11. Use the HASL Model to Predict the Activity of Some Molecules
  12. The HASL model you have created can be used to predict the activity of molecules not in the learning set. Here we attempt predictions for some of the steroid molecules that Cramer et al. evaluated in the original CoMFA paper. These molecules are named steroid1 through steroid10, as before, and are in the current HASL 4.0S demo directory. First read one of the steroids into a SYBYL molecular area: use File, Read... and select any of the molecules listed in the Read File dialog. Then, on the SYBYL menubar, choose eslc, HASL, Predict Molecule Activity... In the HASL Predict dialog, use Molecule/Atoms... to select the molecule to be predicted (choose All atoms) from the available SYBYL molecular areas. If chosen, the Save Molecular Lattice option writes the molecular HASL to a disk file (File Name:), but this is not normally necessary. The HASL Model File: should be jacs.hsl. Press OK to predict the molecule. This should take only a few seconds, and the result is displayed both in the text window and in a dialog. The actual values, CoMFA predictions (Cramer et al., J. Am. Chem. Soc. 1988, 110, 5959-5967), and HASL predictions for the 10 steroids are shown in the table below.
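
    Conceptually, predicting a new molecule amounts to placing it on the same lattice and adding up the model's partial activities at the points it occupies. The sketch below assumes that model entries are keyed by (lattice point, H-Val), that a point contributes only when the molecule's H-Val there matches, and that each occupied point is counted once; these are simplifying assumptions, not a description of the program's internals.

```python
from typing import Dict, Sequence, Tuple

Point = Tuple[int, int, int]


def to_lattice(xyz: Sequence[float], origin: Sequence[float], spacing: float) -> Point:
    """Snap a coordinate to the index of the nearest lattice point."""
    return tuple(int(round((c - o) / spacing)) for c, o in zip(xyz, origin))


def predict_activity(model: Dict[Tuple[Point, int], float],
                     atoms: Sequence[Tuple[Sequence[float], int]],
                     origin: Sequence[float], spacing: float) -> float:
    """Sum partial activities at occupied (point, H-Val) keys; unknown keys add 0."""
    # A set, so several atoms falling on the same point count only once.
    occupied = {(to_lattice(xyz, origin, spacing), hval) for xyz, hval in atoms}
    return sum(model.get(key, 0.0) for key in occupied)
```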

    Steroid     Activity   CoMFA    HASL
    steroid1    7.512      6.544    7.392
    steroid2    7.553      7.540    7.354
    steroid3    6.779      6.526    6.732
    steroid4    7.200      7.546    6.592
    steroid5    6.144      5.955    5.924
    steroid6    6.247      7.057    7.032
    steroid7    7.120      5.384    6.558
    steroid8    6.817      7.009    6.775
    steroid9    7.688      7.227    7.534
    steroid10   5.797      6.937    7.464
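
    To compare the two sets of predictions, a summary statistic such as the mean absolute error can be computed directly from the table. This is an added illustration, not a result reported in the original lesson.

```python
# Values copied from the table above: (actual, CoMFA, HASL) for steroid1-steroid10.
DATA = [
    (7.512, 6.544, 7.392),
    (7.553, 7.540, 7.354),
    (6.779, 6.526, 6.732),
    (7.200, 7.546, 6.592),
    (6.144, 5.955, 5.924),
    (6.247, 7.057, 7.032),
    (7.120, 5.384, 6.558),
    (6.817, 7.009, 6.775),
    (7.688, 7.227, 7.534),
    (5.797, 6.937, 7.464),
]


def mean_abs_error(pairs):
    """Average of |predicted - actual| over (actual, predicted) pairs."""
    return sum(abs(pred - actual) for actual, pred in pairs) / len(pairs)


comfa_mae = mean_abs_error([(a, c) for a, c, _ in DATA])
hasl_mae = mean_abs_error([(a, h) for a, _, h in DATA])
print(f"CoMFA MAE: {comfa_mae:.3f}")
print(f"HASL MAE: {hasl_mae:.3f}")
```

    Running it prints CoMFA MAE: 0.611 and HASL MAE: 0.440 for these ten molecules; steroid10 is poorly predicted by both methods.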

  13. Map the HASL Model
  14. In effect, HASL partitions the molecular activity among spatially defined grid points, each of which encodes both a character (H-Val) and a share of the activity. To display this as a contour map, the HASL model file must be filtered to select grid points that meet a criterion based on their H-Val. Select eslc, HASL, Make HASL Contour Field... from the SYBYL menubar. The Create HASL Contour Field dialog first specifies the HASL Model File: (jacs.hsl). The Scale with H-Val Factor check box can be used with the Advanced H-Val parameter set and with the Partial Charge, HINT and Molconn-Z parameter sets, where more than one positive (or negative) H-Val may occur at the same grid point; leave it off for this example. The HASL Field Type: selects the filtering criterion used to create the contour field. First select H-Val .gt. 0, enter jacs_gt.cnt as the Contour File:, and press OK. The SYBYL text window shows a histogram of the range and distribution of grid-point values in the jacs_gt.cnt map. In the Map Contour dialog select Display Area: D1, Style: Transparent (if your graphics hardware supports it; otherwise use Dots or Lines), Contour Value: -0.06 and Contour Color: Red, and press Accept. Then enter 0.06 as the Contour Value: and Green as the Contour Color:, and press Accept again. When you press Done, the map is displayed.
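
    The filtering step can be sketched as follows, assuming (hypothetically) that the model is stored as partial activities keyed by (lattice point, H-Val). The sketch does not implement the Scale with H-Val Factor option; when several entries at one point pass the filter, it simply sums them, which is itself an assumption.

```python
import operator
from typing import Dict, Tuple

Point = Tuple[int, int, int]

# The three HASL Field Type: filters offered by the dialog.
FIELD_TYPES = {
    "H-Val .gt. 0": operator.gt,
    "H-Val .eq. 0": operator.eq,
    "H-Val .lt. 0": operator.lt,
}


def contour_field(model: Dict[Tuple[Point, float], float],
                  field_type: str) -> Dict[Point, float]:
    """Keep entries whose H-Val passes the filter; map point -> partial activity."""
    keep = FIELD_TYPES[field_type]
    field: Dict[Point, float] = {}
    for (point, hval), activity in model.items():
        if keep(hval, 0):
            # Unscaled sum if more than one passing entry shares a point.
            field[point] = field.get(point, 0.0) + activity
    return field
```

    Contouring such a field at -0.06 and +0.06 then outlines the regions with strongly negative and strongly positive partial activity.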

    Now add two other map layers: Select eslc, HASL, Make HASL Contour Field...; choose H-Val .eq. 0 and enter jacs_eq.cnt as the Contour File:; press OK. In the Map Contour dialog select Display Area: D2, Contour Value: -0.06, Contour Color: Red-Orange; press Accept; Contour Value: 0.06, Contour Color: Green-Blue; press Accept and then OK. Lastly, make a map with H-Val .lt. 0 and contour the negative values (-0.06) with Orange and the positive values (0.06) with Blue in D3.

    These maps can easily be manipulated with the SYBYL check box icon on the Fast Access menu. Interpretation is straightforward: the stronger positive contours (green, green-blue and blue) mark regions that reinforce activity, while the negative contours (red, red-orange and orange) mark regions where the molecular structure(s) compromise activity. When interpreting the maps, remember that atoms with H-Val greater than zero are generally H-bond acceptors, atoms with H-Val less than zero are generally H-bond donors, and atoms with H-Val equal to zero are hydrophobic.
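
    The reading rules above can be condensed into a small helper. This hypothetical function only restates the rules of thumb given in the lesson; it is not part of SYBYL or HASL.

```python
from typing import Tuple


def interpret_point(hval: float, partial_activity: float) -> Tuple[str, str]:
    """Apply the lesson's rules of thumb to one contoured grid point."""
    # Character from the sign of the H-Val (a general tendency, not a rule).
    if hval > 0:
        character = "H-bond acceptor"
    elif hval < 0:
        character = "H-bond donor"
    else:
        character = "hydrophobic"
    # Effect from the sign of the partial activity at that point.
    if partial_activity > 0:
        effect = "reinforces activity"
    elif partial_activity < 0:
        effect = "compromises activity"
    else:
        effect = "no contribution"
    return character, effect
```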

  15. Notes on Using Other H-Val Parameter Definitions
  16. The alternative methods for defining H-Vals for HASL (in the Add Column (HASL) dialog) provide different bases for building models and are worth exploring. Apart from the choice of HASL Source:, the protocol for running HASL is the same.