
3. Current Status

A test version of the server http://www.chemie.de/ went public in May 1997. Since then (as of November 1997), both design and functionality have been improved. Because the performance of the database system Postgres (PostgreSQL) proved insufficient, it was replaced by the commercial product Solid.

The collection of links to chemistry-related documents currently comprises about 14000 entries, of which 6000 have been classified manually. The calendar of events, with about 600 entries, is probably the largest of its kind worldwide in the field of chemistry. A mailing list for discussions of interesting problems has been opened. Server statistics can be viewed as a series of diagrams showing the daily number of hits and the volume transferred, as well as the number of hits by domain and for the most popular files.

Two tools are offered, namely a searchable dictionary of acronyms and abbreviations and a tool for the conversion of units.

3.1 How it Works

A dynamically configurable database system is employed for the meta index of chemistry-related Internet documents (the collection of links), the software descriptions, the calendar of events, etc. The database layout can be generated and modified dynamically by means of WWW front ends: input and output masks (forms) are generated automatically, and bilingual descriptions of the attributes are provided (German and English). External users may contribute to the information content of Chemie.DE; registration and user administration are required for that purpose. The integrated database interface PHP/FI provides different views of the data, such as search requests or hierarchical indices.
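The automatic generation of input masks from attribute descriptions can be illustrated with a minimal sketch. It is written in Python rather than PHP/FI and is not the code used by Chemie.DE; the Attribute class, the render_input_mask function, the example attributes and the /insert/ form target are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Attribute:
    """One attribute of a table, with a label per language (hypothetical schema)."""
    name: str
    label_de: str
    label_en: str
    max_length: int = 80


def render_input_mask(table: str, attributes: list[Attribute], lang: str = "en") -> str:
    """Build an HTML form with one text field per attribute."""
    rows = []
    for attr in attributes:
        # Pick the German or English description of the attribute.
        label = attr.label_de if lang == "de" else attr.label_en
        rows.append(
            f"<label>{escape(label)}: "
            f'<input type="text" name="{escape(attr.name)}" '
            f'maxlength="{attr.max_length}"></label><br>'
        )
    body = "\n".join(rows)
    # The form target below is an assumption, not a real Chemie.DE URL.
    return (
        f'<form method="post" action="/insert/{escape(table)}">\n'
        f"{body}\n"
        '<input type="submit">\n'
        "</form>"
    )


# Hypothetical attributes for the calendar of events.
EVENT_ATTRIBUTES = [
    Attribute("title", "Titel", "Title"),
    Attribute("location", "Ort", "Location"),
]
```

Calling render_input_mask("events", EVENT_ATTRIBUTES, "de") would yield the same form with German labels. Because the form is derived from the attribute list, adding an attribute to the layout changes the mask without any hand-written HTML.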

Depending on the category, information may come from different sources. For the link collection, an intelligent robot tracks down chemistry-specific documents on the Internet and stores them locally. A filter program compares the content of these documents with those already stored in the pool database, checking for ambiguities, unvalidated links and missing key information. A pattern recognition program tries to classify the documents according to predefined categories. In this process, meta data, title information and hyperlinks are extracted automatically. Additionally, an index of references is kept for each entry, which may be seen as a measure of the "popularity" of the document. Documents that have been unambiguously classified as relevant to chemistry contribute to the word list used by the pattern recognition system. This process can be controlled by means of local front ends. Thus, new documents, or those which are difficult to classify, can be added to the database semiautomatically. A background process (validator) checks the validity of the database entries at regular intervals and keeps the corresponding meta information up to date.
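The text does not say how the pattern recognition program works internally. As a rough illustration of word-list-based classification, the following Python sketch counts hits against one word list per category and declines to decide when the result is unclear. The category names, the word lists, the thresholds and the function names are all assumptions made for this example, not details of the Chemie.DE system.

```python
import re
from collections import Counter

# Hypothetical word lists. In the system described above, the word list
# grows from documents already classified as relevant to chemistry.
CATEGORY_WORDS = {
    "organic chemistry": {"alkene", "synthesis", "aromatic", "ester"},
    "analytical chemistry": {"spectroscopy", "chromatography", "titration"},
}


def tokenize(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def classify(text, min_hits=2, margin=2):
    """Return the best-matching category, or None if the result is unclear.

    A document counts as unambiguous only if the best category has at
    least min_hits matching words and leads the runner-up by margin.
    """
    counts = Counter(tokenize(text))
    scores = {
        category: sum(counts[word] for word in words)
        for category, words in CATEGORY_WORDS.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (best, best_score), (_, second_score) = ranked[0], ranked[1]
    if best_score >= min_hits and best_score - second_score >= margin:
        return best
    return None  # leave the document for manual classification
```

A document for which classify returns None corresponds to the "difficult to classify" case, which would be handed to a human through the local front ends.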

3.2 Technical Details

The main server is a dual-processor machine with two 200 MHz Pentium Pro CPUs, 128 MB of RAM and two 4 GB disks running in RAID 0 mode. Software: the freely available operating system Linux is used, together with the web (HTTP) server Apache running the CGI wrapper and database interface PHP/FI. The database is the commercial product Solid.

