Software Development in Chemistry. 9th Workshop "Computer in Chemistry", Bitterfeld 1994

World Wide Web (WWW)/Mosaic: A World-wide Information System

Applications in Chemistry

Burkhard Kirste, Heiko Schlichting and Thomas Richter

Fachbereich Chemie, Freie Universität Berlin

Abstract

The information system "World Wide Web" (WWW or W3) is rapidly gaining popularity. WWW is a distributed hypertext and hypermedia system. In contrast to simpler information systems (e.g., "gopher"), it offers nicely formatted pages. With a browser such as "Mosaic", jumping to other documents or displaying images, sound or video is achieved simply by clicking on highlighted text or icons. In chemistry, the following applications are of particular interest: electronic publishing, database and library searches, interactive documents allowing immediate access to 3D views of molecular structures, animations or spectra, and teaching material.

Introduction

"World Wide Web", in short "WWW" or W3, is a worldwide information system based on hypertext documents with multimedia extensions. It was originally developed by Tim Berners-Lee at CERN in Geneva. Early in 1993, our group at the Department of Chemistry of the Freie Universität Berlin was the first at a chemistry department in Germany to launch a WWW server. By now (November 1994), over a dozen WWW servers are running at German chemistry departments. Worldwide there are over 8000 WWW servers, and the number of documents offered exceeds half a million.

First, a short overview of the system, which consists of servers, documents, and browsers, is given. Several examples with particular emphasis on the requirements of chemistry follow, and finally an outlook on future developments is presented.

The use of the system is best demonstrated by a typical example. A hypertext contains active words or phrases which serve as anchors or links to other documents. These active words are clearly marked, in the example by blue color and underlining. If you put the mouse pointer on such an active word and click, the current document is replaced by the document the link refers to. (Alternatively, a new window might pop up.) This behavior is familiar to anyone who has ever used, e.g., the help system of MS Windows on a PC or a similar hypertext system on a Macintosh. Hypertext can be defined as text which is not constrained to be linear (Ted Nelson, 1965).

However, the WWW system is not restricted to text but offers full multimedia support for graphics, pictures, sound and video. In the example, a larger picture pops up if you click on the seal. On a single machine or within a local network, such features are also found on modern multimedia personal computers. One arbitrary example is "Microsoft Encarta", a multimedia encyclopedia on CD-ROM for MS Windows.

The particular feature of WWW is that the documents are accessible worldwide and easily retrievable by anyone who has Internet access. Browsers are available in the public domain for all important types of workstations or personal computers and operating systems (Unix/X Window, MS Windows, OS/2, Macintosh etc.). Line-mode browsers suitable for text terminals (e.g., access via modem) are also available. In principle, every participant in the Internet can also provide information, since servers are available for most platforms. Thus, the construction of a worldwide encyclopedia, a so-called interpedia, is under discussion.

Background: WWW and HTML

HTML (HyperText Markup Language)

The HyperText Markup Language (HTML) Document Type Definition (DTD) is defined in terms of SGML, the ISO Standard Generalized Markup Language. In contrast to, e.g., PostScript, HTML does not describe the appearance of a page in all its details, but only the structure of the document.

HTML provides constructs for headings and highlighting, structured lists, umlauts or special characters, embedded pictures, and anchors. Formatting (markup) is introduced by tags which are enclosed in angle brackets. Note that HTML defines neither the fonts to be used nor the page size. All anchors are defined by the same scheme, regardless of whether the hyperlink refers to plain text, hypertext, PostScript, pictures, sound or video. The document type is derived from the file extension (e.g., txt, html, ps, gif, au or mpeg): the server maps the extension to a MIME type and reports that type to the browser.
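The constructs listed above can be illustrated with a small sample page. The following is only a sketch in Python: the page content, the file names seal.gif and spectrum.ps, and the class name OutlineParser are assumptions made for illustration. The standard-library HTML parser is used merely to show that tags describe structure and that every anchor follows the same HREF scheme.

```python
from html.parser import HTMLParser

# Illustrative HTML page; its text and the local file names are hypothetical.
SAMPLE_HTML = """<HTML>
<HEAD><TITLE>Sample Page</TITLE></HEAD>
<BODY>
<H1>A Heading</H1>
<P>Some <EM>emphasized</EM> text and an umlaut: Universit&auml;t.</P>
<UL>
<LI>First item
<LI>Second item
</UL>
<IMG SRC="seal.gif" ALT="Seal">
<A HREF="spectrum.ps">PostScript version</A>
<A HREF="http://www.chemie.fu-berlin.de/chemistry/index/">Chemistry index</A>
</BODY>
</HTML>
"""


class OutlineParser(HTMLParser):
    """Collects tag names, anchor targets and text of a document."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &auml; into characters.
        super().__init__(convert_charrefs=True)
        self.tags = []
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # The parser reports tag and attribute names in lower case.
        self.tags.append(tag)
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


parser = OutlineParser()
parser.feed(SAMPLE_HTML)
parser.close()

print(parser.tags)   # structural elements in document order
print(parser.links)  # targets of all anchors, whatever their file type
```

Note that the sample contains no font or page-size information; the anchor to a PostScript file is written exactly like the anchor to another hypertext document.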

URL (Uniform Resource Locator)

In general, at least when a document on a different server is to be accessed, the uniform resource locator (URL) of that document must be specified in the HREF attribute of an anchor. The URL serves as a file descriptor in a worldwide network. It first specifies the protocol, e.g., http (the hypertext transfer protocol used by WWW servers), gopher or ftp. The URL then specifies the server and the full path name of the file. It should be mentioned that gopher servers were introduced before WWW servers; they have no hypertext capabilities.
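The three parts of a URL named above can be made visible with a short Python sketch that uses the standard urllib.parse module. The URL is taken from the appendix of this paper; the function name describe_url and the relative file name spectrum.ps are assumptions for illustration only.

```python
from urllib.parse import urljoin, urlsplit


def describe_url(url):
    """Split a URL into protocol, server and path name."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "server": parts.hostname,
        "path": parts.path,
    }


base = "http://www.chemie.fu-berlin.de/chemistry/index/"
info = describe_url(base)
print(info)

# A link to a document on the same server may be given relative to the
# current document; the browser completes it to a full URL.
resolved = urljoin(base, "spectrum.ps")
print(resolved)
```

The second part of the sketch shows why a full URL is required mainly for documents on other servers: a relative reference is resolved against the URL of the document that contains it.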

How to create HTML

In principle, any ASCII editor may be used for creating HTML documents. Macro capabilities are time savers because they allow quick insertion of tags. Alternatively, dedicated HTML editors may be employed which are available for different platforms; for example, HTML assistant for MS Windows, HTML writer for MS Windows, simple HTML editor (Macintosh), HTML editor for the Macintosh or EMACS in HTML mode.

Finally, various utility programs are available for converting documents written with a word processor or as TeX/LaTeX documents (e.g., latex2html, fm2html (from FrameMaker MIF) and rtftohtml). Note that the Rich Text Format, RTF, is an optional output format of, e.g., "Word for Windows".

Browsers

To a large extent, the current popularity of the WWW system is due to the browser "Mosaic" (also known as xmosaic). Besides presenting nicely formatted HTML documents, this browser can spawn various types of viewers (e.g., for pictures, video, sound, PostScript or TeX/LaTeX DVI documents). It offers easy navigation, exporting or printing of documents in several formats (as plain text, PostScript or HTML), annotations etc. Although the present talk focuses on Mosaic, it should be mentioned that various browsers are available for different platforms, e.g., chimera, netscape (mozilla), arena and cello; line-mode browsers are lynx or, from CERN, www.

Servers

Currently we are using the NCSA HTTPD version 1.1 on a Unix workstation. Alternatively, the CERN HTTPD may be employed or the Plexus PERL server. Server software is also available for MS Windows or for the Macintosh.

Applications in Chemistry

A WWW server may be used, of course, to supply information about an institution, e.g., the Department of Chemistry at the Freie Universität Berlin. It is also well-suited for the announcement of talks or courses.

WWW is an excellent medium for providing online documentation of software; three nice examples are the manuals for MolScript, Mathematica and Rasmol. This type of application is easily extended to teaching material in general.

Electronic Publishing

Electronic publishing is a field of utmost importance which may revolutionize the current system of disseminating scientific information. When papers or preprints are deposited on a WWW server, the delays caused by printing and distribution are avoided. For the reader, the best choice is an HTML document, which may contain all kinds of multimedia supplements. However, PostScript documents may also be viewed online, or they can be downloaded and printed locally.

Examples include a paper dealing with the "Simulation of EPR Spectra", a collection of refereed publications by Henry Rzepa, and a preprint collection of the School of Chemistry at Leeds University.

The journal Chemical Physics offers a full-featured automated electronic archive and distribution server for its preprint database. Hypertext abstracts can be viewed online, and the source of the full paper can be downloaded easily. The journals J. Am. Chem. Soc. and Chem. Rev. offer supplementary material on the ACS gopher server. Tables of contents and abstracts are available for the journal Applied Spectroscopy. Springer Journals offers a preview service that requires payment, whereas its tables of contents are distributed free of charge. The first issue of the electronic Journal of Molecular Modeling has been announced for January 1995.

Several online journals and periodicals are available in biology and medicine. True electronic journals are found in the fields of mathematics and physics, e.g., the Electronic Journal of Combinatorics, the Electronic Journal of Differential Equations, the New York Journal of Mathematics and the Journal of Artificial Intelligence Research (also available in print).

It should be emphasized that although the publication of preprints on an institutional WWW server offers a quick means of informing colleagues about new research results, the services of (electronic) publishing companies are still indispensable. Only a professional organization can guarantee quality standards by the established peer review system, authenticity of the documents, proper archiving and continuity of the service. Moreover, there is certainly a need for supplementing the online publication by permanent media such as CD-ROM or printed copies.

Besides papers, conferences are an important means of spreading scientific information. They may also be held electronically, which saves the costs and troubles of a conference but also forgoes its more pleasant features. An example is the First Electronic Computational Chemistry Conference held in November 1994.

Searchable Indexes, Forms, Databases and Clickable Maps

So far, the applications presented are based on "real" hypertext documents. However, by introducing CGI (Common Gateway Interface) scripts or programs, the WWW system offers much more flexibility. For instance, HTML documents can be produced "on the fly", and searchable indexes or forms allow more or less sophisticated access to various types of databases.

A searchable index allows the user to enter a search expression. Three typical examples are: an acronym database (look for abbreviations, e.g., "COSY"), abbreviations of chemical compound names (e.g., "DMSO") and our searchable chemistry index.

Considerably more flexibility is offered by forms. One typical application of forms (using the "post" method) is to supply information, e.g., a message that is mailed to the administrators of a WWW server. An example is our form for the registration of documents related to chemistry, which may be used to enter pointers into our chemistry index. Another typical application (using the "get" method) is to query some kind of database, for example, safety data of chemicals. For this case, we have developed a CGI script and a special-purpose database program which yields HTML output providing further links. Thus, textual explanations for the numbers of the R and S safety phrases are given on clicking on the respective anchors.
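How a "get" query can be turned into an HTML page "on the fly" is sketched below in Python. This is not the CGI script or database program described above, which are not reproduced in this paper; the field name "name", the entry "demo-solvent", the file phrases.html and the function render_page are hypothetical, and the database content is a placeholder rather than real safety data.

```python
from html import escape
from urllib.parse import parse_qs

# Hypothetical miniature database; the entry is a placeholder and does not
# describe a real chemical.
SAFETY_DB = {
    "demo-solvent": {"R": ["11"], "S": ["16"]},
}


def render_page(query_string):
    """Build a CGI response (header, blank line, HTML body) for a query."""
    params = parse_qs(query_string)
    name = params.get("name", [""])[0].strip().lower()
    entry = SAFETY_DB.get(name)

    lines = ["Content-Type: text/html", "", "<HTML><BODY>"]
    if entry is None:
        shown = escape(name) if name else "(empty query)"
        lines.append("<P>No entry found for %s.</P>" % shown)
    else:
        lines.append("<H1>%s</H1>" % escape(name))
        lines.append("<UL>")
        for kind in ("R", "S"):
            for number in entry[kind]:
                # Each number becomes an anchor to its textual explanation.
                lines.append(
                    '<LI><A HREF="phrases.html#%s%s">%s %s</A>'
                    % (kind, number, kind, number)
                )
        lines.append("</UL>")
    lines.append("</BODY></HTML>")
    return "\n".join(lines)


# A server would pass the query in the QUERY_STRING environment variable;
# here a fixed string is used so that the sketch runs on its own.
page = render_page("name=demo-solvent")
print(page)
```

The output begins with a Content-Type header followed by an empty line, as required of a CGI program, and the generated anchors show how the result of a database query can itself provide further links.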

There is an excellent WWW interface available for querying the Brookhaven Protein Data Bank (PDB), making use of a form (PDB Browser). Capabilities for substructure searches via WWW are still lacking, but the development of an appropriate interface should be feasible and might have a big impact on professional online database searches.

Another application, which is not restricted to chemistry, is offered by online library catalogs; in this field, WWW interfaces are gradually replacing the unpleasant "telnet" facilities. Actually, any resources of the WWW server may be used by means of the CGI concept, e.g., there is a form for the conversion of units calling the Unix utility program "units".

A further way of user interaction with a WWW server is provided by clickable maps, allowing the user to point at a particular spot or region of an image to obtain specific information. An obvious application is found in geographical maps, e.g., the map of German WWW servers. Our welcome page allows a quick selection of one of the main topics by clicking on the corresponding keyword or icon. The citric acid cycle represents a nice example of an application in chemistry.
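The principle of a clickable map can be sketched in a few lines of Python. It is assumed here that the browser reports the click position to the server as a query of the form "x,y" (pixel coordinates within the image); the rectangles, the target documents and the function resolve_click are hypothetical and do not describe the maps mentioned above.

```python
# Hypothetical regions of an image, given as (left, top, right, bottom) in
# pixels, each with the document it leads to.
REGIONS = [
    ((0, 0, 99, 49), "chemistry.html"),
    ((100, 0, 199, 49), "library.html"),
]
DEFAULT_TARGET = "welcome.html"


def resolve_click(query):
    """Return the document for a click reported as the text 'x,y'."""
    x_text, y_text = query.split(",")
    x, y = int(x_text), int(y_text)
    for (left, top, right, bottom), target in REGIONS:
        if left <= x <= right and top <= y <= bottom:
            return target
    # Clicks outside all regions lead to a default document.
    return DEFAULT_TARGET


print(resolve_click("10,20"))
print(resolve_click("150,10"))
print(resolve_click("300,300"))
```

Real map configurations may also contain circles or polygons; rectangles are used here only to keep the sketch short.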

Chemical MIME

It has already been mentioned that a browser such as Mosaic is able to spawn external viewers for certain applications. Henry Rzepa has extended this idea to the particular needs of chemistry by introducing chemical MIME (Multipurpose Internet Mail Extensions). An example of an application of chemical MIME is found on our page about the terpene pinene. If the MIME types for chemistry are installed correctly, clicking on the small picture of the 3D molecular model does not spawn a viewer that simply displays a larger static picture; instead, a viewer such as xmol or rasmol is launched, allowing interactive display with options such as rotating the molecule.

Another extension refers to spectra or chromatograms. It is often desirable to expand a certain portion of, e.g., an NMR spectrum. Clearly, a simple magnification of an image stored as pixel graphics would not solve the problem. Instead, the xy coordinates should be supplied to a plot program such as xmgr or xgraph. Again, an example may be found on our page about pinene.
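The advantage of supplying xy coordinates instead of a pixel image can be shown with a minimal Python sketch. A two-column text layout of the data is assumed, and the numbers are synthetic; the functions read_xy and expand_region are hypothetical and merely stand in for what a plot program does when a region is expanded.

```python
# Synthetic xy data in an assumed two-column text layout; the values do not
# come from a measured spectrum.
XY_TEXT = """\
0.0 0.1
0.5 0.3
1.0 2.0
1.5 0.4
2.0 0.1
"""


def read_xy(text):
    """Parse lines of the form 'x y' into a list of (x, y) pairs."""
    points = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        points.append((float(fields[0]), float(fields[1])))
    return points


def expand_region(points, x_min, x_max):
    """Keep only the points whose x value lies within [x_min, x_max]."""
    return [(x, y) for x, y in points if x_min <= x <= x_max]


points = read_xy(XY_TEXT)
region = expand_region(points, 0.5, 1.5)
print(region)
```

Because the original data points are available, the selected region can be replotted at full resolution, which is not possible when only the pixels of an image are enlarged.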

MIME extensions for chemistry must be installed on the server side as well as on the browser side, e.g.,

MIME extension on the server

NCSA-httpd, file conf/mime.types:

chemical/x-pdb                 pdb
chemical/x-xyz                 xyz
chemical/x-mol                 mol
application/x-xy               xy

MIME extension for the browser

Mosaic for Unix, file $HOME/.mailcap:

chemical/x-pdb; rasmol %s
chemical/x-xyz; xmol -readFormat xyz %s
chemical/x-mol; xmol -readFormat alchemy %s
application/x-xy; xmgr %s
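How the two configuration files work together can be traced with the following Python sketch. It copies the entries shown above and resolves a file name first to a MIME type (the server's task) and then to a viewer command (the browser's task). The parsing is deliberately simplified, the function names are hypothetical, and the file names pinene.pdb and pinene.xyz are assumptions; neither NCSA httpd nor Mosaic is claimed to be implemented this way.

```python
import os
import shlex

# Entries copied from the server file conf/mime.types shown above.
MIME_TYPES = """\
chemical/x-pdb                 pdb
chemical/x-xyz                 xyz
chemical/x-mol                 mol
application/x-xy               xy
"""

# Entries copied from the browser file $HOME/.mailcap shown above.
MAILCAP = """\
chemical/x-pdb; rasmol %s
chemical/x-xyz; xmol -readFormat xyz %s
chemical/x-mol; xmol -readFormat alchemy %s
application/x-xy; xmgr %s
"""


def parse_mime_types(text):
    """Map each file extension to its MIME type (server side)."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        mime_type, *extensions = line.split()
        for extension in extensions:
            table[extension.lower()] = mime_type
    return table


def parse_mailcap(text):
    """Map each MIME type to its viewer command template (browser side)."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        mime_type, command = line.split(";", 1)
        table[mime_type.strip()] = command.strip()
    return table


def viewer_command(filename, ext_to_type, type_to_command):
    """Return the command line for viewing a file, or None if unknown."""
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    mime_type = ext_to_type.get(extension)
    template = type_to_command.get(mime_type)
    if template is None:
        return None
    # %s in the template stands for the name of the downloaded file.
    return template.replace("%s", shlex.quote(filename))


ext_to_type = parse_mime_types(MIME_TYPES)
type_to_command = parse_mailcap(MAILCAP)
print(viewer_command("pinene.pdb", ext_to_type, type_to_command))
print(viewer_command("pinene.xyz", ext_to_type, type_to_command))
```

If either table lacks the relevant entry, no viewer command is found, which is why the extensions must be installed on both sides.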

Another example of "chemical MIME" may be found on our pages dealing with amino acids, and there is a nice demonstration of hyperactive molecules by Henry Rzepa and Benjamin Whitaker. Moreover, a viewer such as xmol even allows the display of molecular animations, e.g., reactions or vibrations.

Summary and Outlook

We have shown that the World Wide Web offers fascinating potential for applications in science. It might revolutionize the established scheme for the publication of scientific results in printed media and play an important role in electronic publishing. In addition to faster dissemination of information, it offers features that printed media cannot provide. Thus, 3D coordinates of molecules or xy data of spectra are easily supplied and visualized, yielding a true three-dimensional impression of molecular models, animations, or the chance of a detailed inspection of spectra. Moreover, convenient interfaces for querying all kinds of databases can be provided, and access to, e.g., 3D structures or spectra might be offered. It is conceivable that a "living" online chemistry encyclopedia may be built.

For the sake of commercial providers, several schemes for access restrictions are available (e.g., by host, by site or by password) or under development.

The WWW system, which is still in its early stages, is developing rapidly and growing exponentially with regard to the number of servers, documents and users. However, it must be admitted that several shortcomings and disadvantages are presently encountered. For instance, HTML 2 is rather restrictive; it does not provide support for sub- and superscripts, mathematical formulas or tables. These features will be present in HTML 3 (or HTML+); the first browsers suitable for that format, e.g., arena, are in their test stage.

Another problem is of a technical nature. The promised "information superhighway" is not yet a reality. In practice, access to remote servers is often slow, particularly at peak hours, or temporarily impossible. Unfortunately, quite a few WWW servers have a bad habit of moving their documents around, or the names of servers are changed, so that document URLs change. Therefore it would be advantageous if the present way of explicitly addressing URLs could be replaced by some indirect or symbolic link to an information server.

The third big problem concerns the question of "indexing the Web", i.e., providing means of quickly finding a particular piece of information. In principle, such indexes may be compiled manually, by schemes such as aliweb (list-based), or by robots that traverse the Web automatically (spider-based). Clearly, none of these methods is really satisfactory. For particular purposes such as the development of distributed databases, an alternative to WWW such as Hyper G might be advantageous; however, Hyper G is not scalable, more difficult to implement and less flexible than WWW.
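The spider-based approach to indexing mentioned above amounts to a traversal of the link graph. The Python sketch below performs a breadth-first traversal of a small, entirely hypothetical set of pages held in memory; no network access takes place, and the names WEB and crawl are assumptions. A real robot would fetch each document and extract its anchors instead of consulting a table.

```python
from collections import deque

# Hypothetical miniature "Web": each page is listed with the links it contains.
WEB = {
    "index.html": ["a.html", "b.html"],
    "a.html": ["b.html", "c.html"],
    "b.html": ["index.html"],
    "c.html": [],
}


def crawl(start, get_links, limit=100):
    """Visit pages breadth-first, each page once, up to a page limit."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue and len(order) < limit:
        page = queue.popleft()
        order.append(page)
        for link in get_links(page):
            if link not in seen:  # avoid loops caused by cyclic links
                seen.add(link)
                queue.append(link)
    return order


visited = crawl("index.html", lambda page: WEB.get(page, []))
print(visited)
```

The set of pages already seen is essential, since hypertext links frequently form cycles; the page limit hints at the practical difficulty that the real Web is far too large and changeable to be traversed completely.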

Acknowledgements

We wish to thank all developers of the Web. B.K. gratefully acknowledges financial support by the Fonds der Chemischen Industrie.

Appendix

Useful Reference Material

WWW (World Wide Web) and Mosaic Information
http://www.chemie.fu-berlin.de/outerspace/www-info_e.html
World Wide Web Frequently Asked Questions
http://sunsite.unc.edu/boutell/faq/www_faq.html
Chemistry, World-Wide Web Virtual Library
http://www.chem.ucla.edu/chempointers.html
Hierarchical Internet Chemistry Index
http://www.chemie.fu-berlin.de/chemistry/index/
Yahoo Hierarchical Hotlist - A Guide to WWW
http://akebono.stanford.edu/yahoo/
Book "World Wide Web Unleashed"
John December and Neil Randall, "World Wide Web Unleashed", Sams Publishing, Indianapolis, Indiana, USA, 1994.
http://www.rpi.edu/~decemj/works/wwwu.html
The Mosaic Handbook for the X Window System
Dale Dougherty, Richard Koman and Paula Ferguson, "The Mosaic Handbook for the X Window System", O'Reilly & Associates, Sebastopol, 1994.
http://gnn.com/gnn/bus/ora/catalog/mosx.desc.html
The Mosaic Handbook for Microsoft Windows
Dale Dougherty and Richard Koman, "The Mosaic Handbook for Microsoft Windows", O'Reilly & Associates, Sebastopol, 1994.
http://gnn.com/gnn/bus/ora/catalog/moswin.desc.html
The Mosaic Handbook for the Macintosh
Dale Dougherty and Richard Koman, "The Mosaic Handbook for the Macintosh", O'Reilly & Associates, Sebastopol, 1994.
http://gnn.com/gnn/bus/ora/catalog/mosmac.desc.html

Burkhard Kirste, 1994/11/15