CHAPTER 1
General Overview

The Molconn-Z software computes a wide range of topological indices of molecular structure. These indices encode structural information that is useful in relating structure to properties. They include (but are not limited to) the molecular connectivity chi indices, ^mχ_t and ^mχ_t^v; kappa shape indices, ^mκ and ^mκ_α; electrotopological state indices, S_i; hydrogen electrotopological state indices, HES_i; atom type and bond type electrotopological state indices; topological equivalence indices and the total topological index; several information indices, including the Shannon and the Bonchev-Trinajstić information indices; counts of graph paths, atoms, atom types, and bond types; and others.
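As a small illustration of what such an index looks like, the sketch below computes the first-order chi index (the sum over all bonds of (δ_i δ_j)^(-1/2), where δ is the number of non-hydrogen neighbors of an atom) for two hydrogen-suppressed graphs. The function name and the bond-list input are assumptions made for this example only; this is not the Molconn-Z implementation, and the full definitions are given in Chapter 2.

```python
from math import sqrt

def first_order_chi(bonds):
    """First-order chi for a hydrogen-suppressed graph given as (atom, atom) bonds."""
    # delta = number of non-hydrogen neighbors of each atom
    delta = {}
    for a, b in bonds:
        delta[a] = delta.get(a, 0) + 1
        delta[b] = delta.get(b, 0) + 1
    # Sum (delta_i * delta_j)^(-1/2) over every bond in the graph
    return sum(1.0 / sqrt(delta[a] * delta[b]) for a, b in bonds)

# Hypothetical atom numbering for the two C4 skeletons
n_butane = [(1, 2), (2, 3), (3, 4)]   # C-C-C-C
isobutane = [(1, 2), (2, 3), (2, 4)]  # central atom 2 bonded to three carbons

print(round(first_order_chi(n_butane), 4))   # 1.9142
print(round(first_order_chi(isobutane), 4))  # 1.7321
```

The branched isomer gives the smaller value, which is the kind of structural distinction these indices are meant to capture.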

These indices have been widely used in QSAR analyses and in other types of relationships between the structure of molecules and their properties. The definitions and background of the chi, kappa, electrotopological state, and topological equivalence indices are discussed in Chapter 2, along with appropriate references. References to several reviews of the development and use of topological indices are given at the end of this chapter as well as in Chapter 2.

Molconn-Z is set up to be user-friendly and flexible. See Chapter 4 for detailed information. Molecular structure is input as one of several forms of molecular connection table. This new version of the MOLCONN software makes provision for the increasingly popular SMILES code, developed by D. Weininger and available from Daylight Software. For those who have the commonly used commercial database and data entry systems, several data formats are now available in Molconn-Z, including those from ChemDesign, MDL Information Systems, Inc., and ChemDraw. See Chapter 6 for detailed information on the use of each of these input formats. For those who have neither such a commercial system nor facility with SMILES code, Molconn-Z has its own Molconn connection table format, which is carefully described with examples in Chapter 6. In addition, the Molconn-Z source code allows the user to define a custom data format, which is read into Molconn-Z through a user SUBROUTINE called USRFIL. Contact Hall Associates Consulting for information about customizing the USRFIL SUBROUTINE.

In addition to flexible forms of input, Molconn-Z permits flexibility in output. See Chapter 7 for detailed information. There is a standard output listing file (.L) whose contents can be selected in a menu. There is also a standard output file of the computed indices (called the .S file in this manual), which can be used directly as the input to standard statistical packages for subsequent analysis. The .S file includes all the computed indices except the bond type E-state indices. The user may select which records to include in the .S file through a sub-menu of the main menu. See Chapter 5 for detailed information. The bond type E-state indices are output to the .E file. A MENU option turns on computation of the bond type indices and also activates a request for the .E file name at execution time.

A commonly used statistical package is SAS from The SAS Institute in Cary, NC. Because of the widespread use of SAS, the Molconn-Z software provides files which contain the format of the .S file in the language of SAS (SASINPZ.TXT) and also the format of the .E file (SASINPAE.TXT for all the bond types and SASINPOE.TXT for organic bond types only). These files may be directly inserted into a SAS program which can then read the .S file or .E file for analysis. See Appendix III. These files also serve as a guide for the setup of input to other statistical packages.

The general flow of information for Molconn-Z is described in brief form in the flowchart on the last page of this chapter. A more detailed description of this information flow is given in Chapter 4. In general, the user creates molecule connection tables either with a commercial package or, perhaps, simply in an edited text (ASCII) file. The input file holds either the connection tables themselves or the names of the files that contain them. This file is called the .B (for bonds) file in this manual.

The user starts execution of Molconn-Z and is requested to enter the name of the .B file. The user is then given the choice either to use all standard default MENU options or to select and modify any of these options through the interactive use of a menu on the terminal screen. The program then requests a name for the output listing file (.L file), the output index file (.S file), and/or the bond type file (.E file) if any of these files have been selected in the Main MENU. These files, .B, .L, .S, and .E, usually share a common name, such as TOXICITY.B, TOXICITY.L, TOXICITY.S, TOXICITY.E. Once this information has been entered, Molconn-Z performs its tasks and produces the appropriate files. Error messages, if any, are printed in the output listing .L file. See Appendix VII for detailed information.
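The common-name convention can be sketched in a few lines, which may be convenient when preparing many runs from a script. The function and the role labels below are hypothetical helpers for illustration; only the four extensions and the TOXICITY example come from this manual.

```python
# Extensions named in this manual, keyed by illustrative (hypothetical) role labels.
EXTENSIONS = {
    "bonds": ".B",        # input connection tables
    "listing": ".L",      # output listing file
    "indices": ".S",      # output index file
    "bond_estate": ".E",  # bond type E-state index file
}

def run_file_names(base_name):
    """Return the file name for each role, built from one common base name."""
    return {role: base_name + ext for role, ext in EXTENSIONS.items()}

names = run_file_names("TOXICITY")
print(names["bonds"], names["listing"], names["indices"], names["bond_estate"])
# TOXICITY.B TOXICITY.L TOXICITY.S TOXICITY.E
```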

For the highest speed of execution, the user should select NO .L file; then no information is sent to the screen and no .L file is produced. If there are error messages, Molconn-Z creates an .MSG file to contain them. This feature is especially useful for large databases.
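When many runs are made without a .L file, a script can check afterwards whether an .MSG file was produced. The sketch below is a hypothetical helper, not part of Molconn-Z. It assumes that the .MSG file shares the common base name of the run, that it is written to the given directory, and that it is plain text; none of these details are specified in this chapter.

```python
from pathlib import Path

def read_error_messages(base_name, directory="."):
    """Return the lines of <base_name>.MSG, or an empty list if no such file exists.

    Assumption: the .MSG file uses the run's common base name and is plain text.
    """
    msg_path = Path(directory) / (base_name + ".MSG")
    if not msg_path.exists():
        return []  # no .MSG file, so no error messages were reported
    return msg_path.read_text(errors="replace").splitlines()

messages = read_error_messages("TOXICITY")
if messages:
    print(f"{len(messages)} message line(s) found in TOXICITY.MSG")
else:
    print("No TOXICITY.MSG file found")
```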

There are options for two additional output files, if desired. A file may be generated which contains all of the subgraphs of a given order along with some connectivity information. This file could be used in a visual examination of such information or in some subsequent statistical or computational procedure. A file may also be generated which contains the distance matrix.
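For readers unfamiliar with the term, the sketch below builds a distance matrix for a small hydrogen-suppressed graph. It assumes the usual graph-theoretical meaning, in which each entry is the number of bonds on the shortest path between two atoms. The layout of the file written by Molconn-Z is not described here, so the function, the 0-based atom numbering, and the printed form are illustrative assumptions only.

```python
from collections import deque

def distance_matrix(n_atoms, bonds):
    """Topological distance matrix; entry [i][j] is the bond count of the
    shortest path between atoms i and j (atoms numbered from 0)."""
    neighbors = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    matrix = []
    for start in range(n_atoms):
        dist = [-1] * n_atoms  # -1 marks atoms not reachable from start
        dist[start] = 0
        queue = deque([start])
        # Breadth-first search visits atoms in order of increasing distance
        while queue:
            atom = queue.popleft()
            for nxt in neighbors[atom]:
                if dist[nxt] == -1:
                    dist[nxt] = dist[atom] + 1
                    queue.append(nxt)
        matrix.append(dist)
    return matrix

# Isobutane skeleton: atom 1 is the central carbon
isobutane = [(0, 1), (1, 2), (1, 3)]
for row in distance_matrix(4, isobutane):
    print(row)
# [0, 1, 2, 2]
# [1, 0, 1, 1]
# [2, 1, 0, 2]
# [2, 1, 2, 0]
```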

Molconn-Z is designed to run on any mainframe computer or minicomputer, including UNIX platforms, as well as on microcomputers such as a PC or Macintosh. The installation instructions in the main part of this guide refer primarily to mainframe computers and to PCs. See Appendix IV on microcomputers for detailed specifications.

The maximum size of molecule which Molconn-Z can accommodate depends upon the complexity of the molecule and on the memory size of the computer. This size information is stored as array size parameters in the file PARAM.DAT which is discussed in Chapter 3. The default sizes which accompany the Molconn-Z software are 100 nonhydrogen atoms and up to 90 rings and circuits in the molecule. The speed of computation and processing depends upon the computer being used. Customization of these parameters can be arranged with Hall Associates.

General References

1. L. B. Kier and L. H. Hall, Molecular Connectivity in Structure-Activity Analysis, Research Studies Press, John Wiley and Sons, Letchworth, England, (1986).

2. L. B. Kier and L. H. Hall, Molecular Connectivity in Chemistry and Drug Research, Academic Press, New York, (1976).

3. L. H. Hall, "Computational Aspects of Molecular Connectivity and its Role in Structure-Property Modeling" in Computational Chemical Graph Theory, Chap. 8, pp 203-233, D. H. Rouvray, ed., Nova Press, New York (1990).

4. L. B. Kier, "Indexes of Molecular Shape from Chemical Graphs" in Computational Chemical Graph Theory, Volume II, Chap. 6, pp 152-174, D. H. Rouvray, ed., Nova Press, New York (1990).

5. L. H. Hall and L. B. Kier, "The Molecular Connectivity Chi Indexes and Kappa Shape Indexes in Structure-Property Relations", in Reviews of Computational Chemistry, Chap. 9, pp 367-422, Donald Boyd and Ken Lipkowitz, eds., VCH Publishers, Inc. (1991).

6. L. B. Kier and L. H. Hall, "An Atom-Centered Index for Drug QSAR Models", in Advances in Drug Design, Vol. 22, B. Testa, ed., Academic Press, (1992).

7. A. T. Balaban, ed. Chemical Applications of Graph Theory, Academic Press, New York, (1976).

8. N. Trinajstić, Chemical Graph Theory, Vols. I, II, CRC Press, Boca Raton, FL, (1983).

9. R. B. King, ed., Chemical Applications of Topology and Graph Theory, Amsterdam, (1983).

10. D. H. Rouvray, Am. Sci., 61, 729, (1973). The Search for Useful Topological Indices in Chemistry.

11. D. H. Rouvray, Sci. Am., 255, 40, (1986). Predicting Chemistry from Topology.

12. P. J. Hansen and P. C. Jurs, J. Chem. Ed., 65, 574, (1988). Chemical Applications of Graph Theory II: Isomer Enumeration.

13. A. Sabljić and N. Trinajstić, Acta Pharm. Jugosl., 31, 189, (1981). Quantitative Structure-Activity Relationships: The Role of Topological Indices.

General Flowchart
