Appendix I:
Interfacing NMR Facilities, e.g. the NMR Facility at Madison

A promising approach to these problems is to harness recent developments in information technology to build specific tools that facilitate communication and provide access to prior information and distributed expertise. One such model is being pursued at the University of Wisconsin in the NIH NCRR-funded National Magnetic Resonance Facility at Madison (NMRFAM). This collaboration grew from contacts between the group in Biochemistry, which had been setting up a data bank for biomolecular NMR, and groups in the Computer Sciences Department, which had independent research projects underway in distributed computing, scientific experiment management, and visual exploration of very large sets of data and images. Over the past decade, three research activities relevant to the NMR collaboratorium have been pursued: (1) the ZOO Project, which develops computer science tools that enable scientists and engineers to manage experimental data from the desktop; (2) the DEVise Project, which develops tools for the visual exploration of tabular and multimedia information; and (3) the CONDOR Project, which develops high-throughput computing on large collections of distributively owned computing resources.

The basic approach being taken at Wisconsin is to develop a Customized Desktop Experiment Management System (CDEMS) that uses a DEVise viewer to provide a flexible, personal graphical view of the laboratory environment (schema). Early discussions made it apparent that the information technology tools were of immediate and long-term interest to the NMR facility and the associated data bank. Conversely, the complexity of NMR experiments, the computational intensity of NMR data processing and analysis, and the interrelationships and visual properties of NMR and structural biology data presented challenges to the current state of computer technology. The combined groups from Biochemistry and Computer Sciences teamed up on two projects: (1) the NMR-ZOO Project, which is equipping NMRFAM with a desktop experiment management system based on the ZOO tools, and (2) the BMRB Project, which operates BioMagResBank, a publicly accessible repository for biomolecular NMR data.

The concept behind the NMR-ZOO project is that an NMR-specific CDEMS can be used to manage the full life cycle of an NMR experiment: experiment design, data collection, data analysis, interpretation, and report preparation. A laboratory schema represents the useful objects in the environment of each type of investigator: spectrometers, NMR experiments, software, relevant databases, published reports, and so on. Specialized experiment schemas are used to launch software applications, including those associated with NMR data acquisition. Each NMR lab staff member and user will have a personal schema. New users will be given an initial view (schema) containing commonly used objects taken from the laboratory schema, together with objects for managing the data they generate in the facility. For novice users, this can be a simplified schema tailored to the specific experiments to be performed. Users can themselves create or delete objects (of their own design or taken from the laboratory schema) to accommodate the tasks at hand, and can customize graphical displays to fit their own tastes and priorities. The NMR-ZOO tools sit on top of a commercial relational database management system (RDBMS), which keeps a history of all operations and catalogues the locations of primary data and intermediate results. Particular operations, such as a chained series of calculations with associated parameters, can be defined as objects so that they can be launched with simple commands. To make this effective, the metadatabase (the data describing the contents of the database) needs to be expanded to include data dictionaries that describe the objects of interest to different classes of participants. In addition, it will be possible to implement validation tools that run continuous checks for inconsistencies or gaps in the data, or that carry out simulations suggesting additional experiments to perform.
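The idea of defining a chained series of calculations as a launchable object, with every step recorded in the RDBMS, can be sketched as follows. This is a minimal illustration under stated assumptions, not the NMR-ZOO implementation: an in-memory SQLite database stands in for the commercial RDBMS, and the `history` table layout, the `Step` and `ChainedOperation` classes, the toy processing functions, and the output-location strings are all hypothetical.

```python
import json
import sqlite3
from dataclasses import dataclass, field
from typing import Callable

# Assumption: in-memory SQLite stands in for the commercial RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history ("
    " id INTEGER PRIMARY KEY, operation TEXT, params TEXT, output_location TEXT)"
)


@dataclass
class Step:
    """One calculation in a chain, with its fixed parameters (hypothetical)."""
    name: str
    func: Callable[[list, dict], list]
    params: dict = field(default_factory=dict)


@dataclass
class ChainedOperation:
    """A named, reusable chain of steps that can be launched with one call."""
    name: str
    steps: list

    def launch(self, data: list, db: sqlite3.Connection) -> list:
        for i, step in enumerate(self.steps):
            data = step.func(data, step.params)
            # Catalogue where this intermediate result would live (illustrative path).
            location = f"{self.name}/step{i}_{step.name}"
            db.execute(
                "INSERT INTO history (operation, params, output_location) VALUES (?, ?, ?)",
                (step.name, json.dumps(step.params), location),
            )
        db.commit()
        return data


# Toy processing functions standing in for real NMR calculations.
def scale(data, p):
    return [x * p["factor"] for x in data]


def subtract_baseline(data, p):
    return [x - p["offset"] for x in data]


chain = ChainedOperation("demo_chain", [
    Step("scale", scale, {"factor": 2.0}),
    Step("subtract_baseline", subtract_baseline, {"offset": 1.0}),
])
result = chain.launch([1.0, 2.0, 3.0], conn)  # [1.0, 3.0, 5.0]
rows = conn.execute("SELECT operation, output_location FROM history ORDER BY id").fetchall()
```

Storing the parameters alongside each operation is what would let the history later document every assumption and procedure used.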

In implementing the NMR-ZOO, the first step has been to bring key software packages used at NMRFAM under the control of the CDEMS. The goal is a complete set of software packages operating under the CDEMS that can take NMR spectral data from the spectrometers through the full process needed to determine a three-dimensional structure. The Wisconsin team has carried out successful pilot studies indicating that the CDEMS will be able to control data acquisition. ZOO schemas have been developed to model the input and output of NMRPipe, a package for NMR data processing developed at the NIH, and the package is now running under the CDEMS. Software has been developed at NMRFAM to convert the output of NMRPipe to the submatrix format used by Chifit and Felix (Molecular Simulations). The next package to be incorporated into the CDEMS will be Chifit, the NMRFAM package used for automated extraction of chemical shifts, line widths, peak intensities, and phases from 1D to 4D NMR data sets. Automated NMR peak assignment tools (CONTRAST, developed at NMRFAM, and/or other available packages) and APIS, the NMRFAM suite for NOE peak filtering and stereospecific assignment, will also be incorporated into the CDEMS. A ZOO schema has been constructed to model the use of the structure determination package X-PLOR and will be used to bring this package (or, more likely, its successor CNS) into the CDEMS family. The Wisconsin team has also recently devised a way to run X-PLOR under CONDOR.
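Modeling each package's input and output in a schema allows the CDEMS to check that packages can be chained, for example NMRPipe, then the NMRFAM converter, then Chifit. The sketch below assumes a much simpler model than real ZOO schemas: each package consumes and produces a single format label. The labels (`raw_fid`, `nmrpipe_spectrum`, `submatrix`, `peak_table`) and the converter name `pipe_to_submatrix` are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackageSchema:
    """Hypothetical, simplified model of a package's input and output formats."""
    name: str
    consumes: str
    produces: str


# Format labels and the converter name are illustrative, not real identifiers.
NMRPIPE = PackageSchema("NMRPipe", "raw_fid", "nmrpipe_spectrum")
CONVERTER = PackageSchema("pipe_to_submatrix", "nmrpipe_spectrum", "submatrix")
CHIFIT = PackageSchema("Chifit", "submatrix", "peak_table")


def check_pipeline(stages):
    """Return a list of mismatches between consecutive stages (empty if compatible)."""
    problems = []
    for upstream, downstream in zip(stages, stages[1:]):
        if upstream.produces != downstream.consumes:
            problems.append(
                f"{upstream.name} produces {upstream.produces!r}, "
                f"but {downstream.name} expects {downstream.consumes!r}"
            )
    return problems


print(check_pipeline([NMRPIPE, CONVERTER, CHIFIT]))  # []
print(check_pipeline([NMRPIPE, CHIFIT]))  # one mismatch: the converter is missing
```

Catching such a mismatch before launch is one way a schema-driven system could prevent a processing chain from failing partway through.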

For the NMR-ZOO system to become fully operational as a vehicle for active collaboration, another layer needs to be added to support communication and collaborative data visualization. A plan for an Integrated Collaboratory Testbed for Biomolecular NMR has been developed that addresses three needs in collaborative research: (1) data visualization for collaborative analysis and for developing the visual language of biomolecular NMR, (2) organization and capture of the protocols and parameters involved in acquiring, processing, and analyzing data in collaborative projects, and (3) tools that facilitate results checking in ongoing structural NMR investigations. The tools are designed to be flexible and extensible, so that others can add their own software and databases to the network. The technology will be developed through design-testing-feedback-refinement cycles, with testing by NMRFAM staff members and evaluation in real-life scientific collaborations between NMRFAM staff members and external scientists. It was suggested that these tools be made available to the larger NMR Collaboratorium. Wider acceptance of tools of this kind, particularly if they are interoperable, should stimulate their further development and progressively extend the collaborative network into related scientific areas.

The Wisconsin approach promises the following advantages:

  • More efficient use of available NMR resources through organization and automation
  • Faster and more thorough analysis with better evaluation of errors
  • Documentation of all assumptions and procedures used
  • Procedures organized for training of novice spectroscopists
  • Remote access to relevant information, reducing the need to travel to the data collection site
  • Less time required to plan and carry out experiments
  • Clear representation of the roles and contributions of each collaborator and of the status of the project at any point in time
  • Automated checking of the calibration, pulse sequences, and spectrometer settings used in data collection
  • Pipelined data analysis, with jobs farmed out to the most appropriate computers
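Farming jobs out to the most appropriate computers is a matchmaking problem: each job states its requirements and each machine advertises its resources. The sketch below is only loosely in the spirit of CONDOR's matchmaking; it does not use CONDOR's ClassAd language or any CONDOR API. The machine names, resource figures, and the "prefer the eligible machine with the most CPUs" rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    memory_mb: int
    cpus: int
    software: frozenset  # packages installed on this machine


@dataclass
class Job:
    name: str
    min_memory_mb: int
    needs: str  # the software package the job requires


def eligible(job: Job, machine: Machine) -> bool:
    """A machine qualifies if it has enough memory and the required package."""
    return machine.memory_mb >= job.min_memory_mb and job.needs in machine.software


def assign(jobs, machines):
    """Greedy matching: send each job to the eligible machine with the most CPUs.

    Machines are not consumed, so several jobs may share one machine;
    a job with no eligible machine is mapped to None.
    """
    plan = {}
    for job in jobs:
        candidates = [m for m in machines if eligible(job, m)]
        plan[job.name] = max(candidates, key=lambda m: m.cpus).name if candidates else None
    return plan


# Hypothetical machines and jobs.
machines = [
    Machine("ws1", 512, 1, frozenset({"NMRPipe"})),
    Machine("ws2", 1024, 2, frozenset({"Chifit"})),
    Machine("cluster-a", 4096, 8, frozenset({"NMRPipe", "X-PLOR"})),
]
jobs = [
    Job("process", 256, "NMRPipe"),
    Job("peaks", 512, "Chifit"),
    Job("anneal", 2048, "X-PLOR"),
    Job("cns", 1024, "CNS"),
]
plan = assign(jobs, machines)
```

A production scheduler would also track machine load and owner policies; those are deliberately left out to keep the matching rule visible.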

Collaborative interactions are facilitated and recorded in the NMR-ZOO environment. DEVise complements these capabilities by allowing scientists to share visual presentations and to analyze data collaboratively by interacting with those presentations. DEVise can also be used to examine the history of collaborative steps, as recorded by the NMR-ZOO component.

The Wisconsin approach to supporting collaborative visual analysis exploits the fact that a data-intensive visualization in DEVise is a "presentation template" (containing data mappings and visualization details) applied to a collection of datasets. This careful separation has far-reaching consequences:

  • The size of a presentation template is small and independent of the size of the data being presented.
  • Templates can be applied to other similar datasets.
  • Templates can be incorporated into other templates easily.
  • Templates can be shared between users.
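The separation of template from data can be illustrated with a small model. The sketch below is not DEVise's template format; the `Template` class, its field names, and the generated rows are hypothetical. It shows the four properties listed above: the serialized template is small and independent of dataset size, the same template applies to datasets of any size, templates can embed other templates, and the JSON form can be passed between users.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Template:
    """Hypothetical stand-in for a presentation template: mappings only, no data."""
    x: str
    y: str
    color: str = "black"
    children: list = field(default_factory=list)  # embedded sub-templates

    def apply(self, rows):
        """Map each dataset row to a drawable mark, then apply embedded templates."""
        marks = [{"x": r[self.x], "y": r[self.y], "color": self.color} for r in rows]
        for child in self.children:
            marks.extend(child.apply(rows))
        return marks

    def to_dict(self):
        return {"x": self.x, "y": self.y, "color": self.color,
                "children": [c.to_dict() for c in self.children]}

    def to_json(self):
        # Shareable form; its size does not depend on any dataset.
        return json.dumps(self.to_dict())


# Generated placeholder rows of two very different sizes.
small = [{"residue": i, "shift": 8.0, "linewidth": 10.0} for i in range(2)]
large = [{"residue": i, "shift": 8.0, "linewidth": 10.0} for i in range(1000)]

shift_plot = Template("residue", "shift", "blue")
width_overlay = Template("residue", "linewidth", "red")
combined = Template("residue", "shift", "blue", [width_overlay])

marks_small = shift_plot.apply(small)     # 2 marks
marks_large = shift_plot.apply(large)     # 1000 marks, same template
marks_combined = combined.apply(small)    # 4 marks: parent plus embedded overlay
shared = shift_plot.to_json()             # a short string, whatever the data size
```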

In NMR spectroscopy, as in many other scientific domains, a variety of analysis and visualization techniques have been developed for important kinds of datasets, and scientists often need to combine tools based on different techniques. For example, a collaborator may view multidimensional NMR data with Felix (Molecular Simulations), protein sequences with a web browser, and 3D structures with Midas+ or MolScript, and may examine the output of structure determinations with PROCHECK-NMR and AQUA.

The goal being pursued at Wisconsin is to integrate multiple visualization tools, developed both by the Wisconsin group and by others, through a single uniform interface based on DEVise. DEVise already complements domain-specific viewers by allowing specific data objects or data subsets to be viewed with appropriate external viewers, within the context of the larger picture provided by a DEVise visualization. The Wisconsin group plans to build on this extensibility by making it easy to register external viewing and analysis packages with DEVise; to invoke external tools on subsets of the data selected through operations on a visual presentation; and to incorporate the results of those external tools into the presentation.
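The three planned capabilities (register an external tool, invoke it on a visually selected subset, fold its results back into the presentation) can be sketched as a small registry. This is a hypothetical illustration and not DEVise's actual extension mechanism. The `ToolRegistry` class, the `chemical_shifts` data kind, the `summarize_shifts` stand-in for an external package, and the generated rows are all assumptions; the predicate plays the role of a selection made on a visual presentation.

```python
from typing import Callable


class ToolRegistry:
    """Hypothetical registry mapping a kind of data to an external analysis tool."""

    def __init__(self):
        self._tools: dict[str, Callable[[list], dict]] = {}

    def register(self, data_kind: str, tool: Callable[[list], dict]) -> None:
        self._tools[data_kind] = tool

    def invoke_on_selection(self, data_kind: str, rows: list, predicate):
        """Run the registered tool on the rows the user selected."""
        selected = [r for r in rows if predicate(r)]
        return selected, self._tools[data_kind](selected)


def summarize_shifts(rows):
    """Stand-in for an external analysis package."""
    shifts = [r["shift"] for r in rows]
    return {"n": len(shifts), "mean_shift": sum(shifts) / len(shifts) if shifts else None}


registry = ToolRegistry()
registry.register("chemical_shifts", summarize_shifts)

# Generated placeholder data; the predicate mimics selecting residues 2-4 in a view.
rows = [{"residue": i, "shift": 7.0 + 0.5 * i} for i in range(6)]
selected, summary = registry.invoke_on_selection(
    "chemical_shifts", rows, lambda r: 2 <= r["residue"] <= 4
)

# Incorporate the external tool's result into the presentation as an annotation.
presentation = {"annotations": []}
presentation["annotations"].append({"source": "summarize_shifts", **summary})
```

Keying tools by the kind of data, rather than by tool name, is one way to let the interface offer the appropriate viewer for whatever subset the user has selected.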