Appendix I:
Interfacing NMR Facilities, e.g. the NMR Facility at Madison
A promising approach to these problems is to harness recent developments
in information technology to develop specific tools designed to
facilitate communication and provide access to prior information
and distributed expertise. One such model is being pursued at
the University of Wisconsin in the NIH NCRR-funded National Magnetic
Resonance Facility at Madison (NMRFAM). This collaboration grew from contacts
between the group in Biochemistry, who had been setting up a data
bank for biomolecular NMR, and groups in the Computer Sciences
Department, who had independent research projects underway in
the areas of distributed computing, scientific experiment management,
and visual exploration of very large sets of data and images.
Over the past decade, three research activities relevant to the
NMR collaboratorium have been pursued: (1) ZOO, which develops computer
science tools to enable scientists and engineers to manage experimental
data from the desktop; (2) the DEVise Project, which develops tools
for the visual exploration of tabular and multimedia information; and
(3) the CONDOR Project, which develops high-throughput computing on
large collections of distributively owned computing resources.
The basic approach being taken at Wisconsin is to develop a Customized
Desktop Experiment Management System (CDEMS), which utilizes a
DEVise viewer, to provide a flexible personal graphical view of
the laboratory environment (schema). In early discussions it became
apparent that the information technology tools were of immediate
and long-term interest to the NMR facility and associated data
bank. On the other side, the complexity of NMR experiments, the
computational intensity of NMR data processing and analysis, and
the interrelationships and visual properties of NMR and structural
biological data presented challenges to the current state of computer
technology. The combined groups from Biochemistry and Computer
Sciences teamed up on two projects: (1) The NMR-ZOO Project is
equipping NMRFAM with a desktop experiment management system based
on the "ZOO" tools. (2) The BMRB Project operates BioMagResBank,
a publicly accessible repository for biomolecular NMR data.
The concept behind the NMR-ZOO project is that an NMR-specific
customized desktop experiment management system (CDEMS) can be
used to manage the full life cycle of an NMR experiment (experiment
design, data collection, data analysis, interpretation, and report
preparation). A laboratory schema represents useful objects in
the environment of each type of investigator: spectrometers, NMR
experiments, software, relevant databases, published reports,
etc. Specialized experiment schemas are used in launching software
applications, including those associated with NMR data acquisition.
Each NMR lab staff member and user will have their own personal
schema. New users will be provided with an initial view (schema)
with commonly used objects taken from the laboratory schema and
objects for managing the data they generate in the facility. For
novice users, this can be a simplified schema tailored to the
specific experiments to be performed. Users themselves delete
or create objects (of their own design or from the laboratory
schema) to accommodate the tasks to be performed. Users can customize
graphical displays to fit their own tastes and priorities. The
NMR-ZOO tools sit on top of a commercial relational database
management system (RDBMS). The RDBMS keeps a history of all operations
and catalogues the locations of primary data and intermediate
results. Particular operations, such as a chained series of calculations
with associated parameters, can be defined as objects so that
they can be launched with simple commands. In order to make this
effective, the metadatabase (the data underlying the database)
needs to be expanded to include data dictionaries that describe
objects of interest to different classes of participants. In addition,
it will be possible to implement validation tools that provide
running checks for inconsistencies or gaps in the data or can
carry out simulations that may suggest additional experiments
to be performed.
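As an illustration only, the idea of defining a chained series of calculations as an object that can be launched with a simple command, while the database keeps a history of each operation, might be sketched as follows in Python. The names used here (ChainedOperation, the history table, the scale and offset steps) are hypothetical and are not part of the ZOO tools; SQLite stands in for the commercial RDBMS.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical names throughout; this is not the ZOO API.
# A step is (name, function, parameters).
Step = Tuple[str, Callable[[list, dict], list], dict]


@dataclass
class ChainedOperation:
    """A named series of calculations together with their parameters."""
    name: str
    steps: List[Step] = field(default_factory=list)

    def launch(self, data: list, db: sqlite3.Connection) -> list:
        """Run every step in order and log each one to the history table."""
        for step_name, func, params in self.steps:
            data = func(data, params)
            db.execute(
                "INSERT INTO history (operation, step, params) VALUES (?, ?, ?)",
                (self.name, step_name, repr(params)),
            )
        db.commit()
        return data


def open_history_db() -> sqlite3.Connection:
    """Create an in-memory database (a stand-in for the RDBMS)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE history ("
        "id INTEGER PRIMARY KEY, operation TEXT, step TEXT, params TEXT)"
    )
    return db


def scale(data: list, params: dict) -> list:
    return [x * params["factor"] for x in data]


def offset(data: list, params: dict) -> list:
    return [x + params["delta"] for x in data]


def demo():
    db = open_history_db()
    op = ChainedOperation(
        "scale_then_offset",
        [("scale", scale, {"factor": 2.0}), ("offset", offset, {"delta": 1.0})],
    )
    result = op.launch([1.0, 2.0, 3.0], db)
    logged = [row[0] for row in db.execute("SELECT step FROM history ORDER BY id")]
    return result, logged
```

Because the parameters are stored with each logged step, the same history table could later be queried to reproduce or audit a calculation; how the actual system records this is not specified here.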
In implementing the NMR-ZOO, the starting point has been to begin
bringing key software packages used at NMRFAM under operation
of the CDEMS. The goal is to have a complete set of software packages
operating under the CDEMS that can be used to take NMR spectral
data from the spectrometers through the full process needed to
determine a three-dimensional structure. The Wisconsin team has
carried out successful pilot studies that indicate that it will
be possible for the CDEMS to control data acquisition. ZOO schemas
have been developed to model the input and output of NMRPipe,
a package for NMR data processing developed at the NIH, and the
package is now running under the CDEMS. Software has been developed
at NMRFAM to convert the output of NMRPipe to the submatrix format
used by Chifit and Felix (Molecular Simulations). The next package
to be incorporated into CDEMS will be Chifit, the software package
developed at NMRFAM that is used for automated extraction of chemical
shifts, line widths, peak intensities, and phases from 1D to 4D
NMR data sets. Automated NMR peak assignment tools (CONTRAST developed
at NMRFAM and/or other available packages) and the NOE peak filtering
and stereospecific assignment suite APIS developed at NMRFAM will
be incorporated into the CDEMS. A ZOO schema has been constructed
to model the use of the structure determination package X-PLOR
and will be used to bring this package (or more likely its successor,
CNS) into the CDEMS family. The Wisconsin team has recently devised
a way to operate the X-PLOR package under CONDOR.
In order for the NMR-ZOO system to become fully operational as
a vehicle for active collaborations, another shell needs to be
added to fully support communications and collaborative data visualization.
A plan for an Integrated Collaboratory Testbed for Biomolecular
NMR has been developed that addresses needs in collaborative research
for: (1) data visualization for collaborative analysis and development
of the visual language of biomolecular NMR, (2) organization and
capture of protocols and parameters involved in the acquisition,
processing, and analysis of data associated with collaborative
projects, and (3) tools for facilitating results checking in ongoing
structural NMR investigations. The tools to be developed are designed
to be flexible and extensible by others so that they can add their
own software and databases to the network. The technology will
be developed through design-testing-feedback-refinement cycles,
with testing by NMRFAM staff members and evaluation in real-life
scientific collaborations among NMRFAM staff members and external
scientists. It was suggested that these tools be made available
to the larger NMR Collaboratorium. Wider acceptance of tools of
this kind, particularly if they are interoperable, should stimulate
their further development and progressively enlarge the collaborative
network into related scientific areas.
The advantages of the Wisconsin approach promise to be:
- More efficient use of available NMR resources through organization
and automation
- Faster and more thorough analysis with better evaluation of errors
- Documentation of all assumptions and procedures used
- Procedures organized for training of novice spectroscopists
- Remote access to relevant information decreases the necessity
to travel to the data collection site
- Less time required to plan and carry out experiments
- Clear representation of the roles and contributions of each collaborator
and the status of the project at any point in time
- Automated checking of calibration, pulse sequence, and spectrometer
settings used in data collection
- Pipelined data analysis with jobs farmed out to most appropriate
computers
Collaborative interactions are facilitated and recorded in the
NMR-ZOO environment. DEVise complements these capabilities by
allowing scientists to share visual presentations, and to collaboratively
analyze data by interacting with such presentations. In fact,
DEVise can also be used to examine the history of collaborative
steps, as recorded by the NMR-ZOO component.
The Wisconsin approach to supporting collaborative visual analysis
exploits the fact that a data-intensive visualization in DEVise
is a "presentation template" (containing data mappings and visualization
details) applied to a collection of datasets. This careful separation
has far-reaching consequences:
- The size of a presentation template is small and independent of
the size of the data being presented.
- Templates can be applied to other similar datasets.
- Templates can be incorporated into other templates easily.
- Templates can be shared between users.
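The separation of template from data can be illustrated with a minimal Python sketch. It is not DEVise code: the class PresentationTemplate and the column names (h1_ppm, n15_ppm) are assumptions made for the example. The template holds only mappings from visual attributes to column names, so its size does not depend on the dataset, it can be applied to any dataset with the same columns, and it can contain other templates.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PresentationTemplate:
    """Mappings from visual attributes to column names; holds no data.

    Hypothetical class for illustration, not part of DEVise.
    """
    name: str
    mapping: Dict[str, str]                           # e.g. {"x": "h1_ppm"}
    children: Tuple["PresentationTemplate", ...] = ()  # nested templates

    def apply(self, dataset: List[dict]) -> List[dict]:
        """Return one visual record per data row."""
        return [
            {attr: row[col] for attr, col in self.mapping.items()}
            for row in dataset
        ]

    def apply_all(self, dataset: List[dict]) -> Dict[str, List[dict]]:
        """Apply this template and every nested template to the dataset."""
        views = {self.name: self.apply(dataset)}
        for child in self.children:
            views.update(child.apply_all(dataset))
        return views


def demo():
    intensity = PresentationTemplate("intensity_bar", {"height": "intensity"})
    peaks = PresentationTemplate(
        "peak_scatter", {"x": "h1_ppm", "y": "n15_ppm"}, children=(intensity,)
    )
    small = [{"h1_ppm": 8.1, "n15_ppm": 120.5, "intensity": 3.0}]
    large = small * 1000  # same template, much larger dataset
    return peaks.apply(small), len(peaks.apply(large)), sorted(peaks.apply_all(small))
```

Sharing a template between users then amounts to passing along the small mapping object rather than the data it is applied to.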
In NMR spectroscopy, as in many other scientific domains, a variety
of analysis and visualization techniques have been developed for
important kinds of datasets. Scientists often need to use a combination
of tools based on different techniques. For example, a collaborator
may view multidimensional NMR data with Felix (Molecular Simulations),
protein sequences with a web browser, 3D structures with Midas+
or MolScript, and output from structure determinations with PROCHECK-NMR
and AQUA.
The goal being pursued at Wisconsin is to integrate the use of
multiple visualization tools, developed by them and by others,
through a single uniform interface based on DEVise. DEVise already
complements domain-specific viewers by enabling specific data
objects or data subsets to be viewed using appropriate external
viewers, within the context of a larger picture provided by a
DEVise visualization. The Wisconsin group plans to build upon
this extensibility by making it easy to (1) register external viewing
and analysis packages with DEVise, (2) invoke external tools on subsets
of the data selected through operations on a visual presentation,
and (3) incorporate the results of the external tools into the presentation.
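The three planned steps (registering a tool, invoking it on a selected subset, and incorporating its results) might be sketched in Python as follows. This is an assumption-laden illustration, not the DEVise interface: ViewerRegistry, count_peaks, and the h1_ppm column are hypothetical, and an ordinary Python function stands in for an external viewing or analysis package.

```python
from typing import Callable, Dict, List


class ViewerRegistry:
    """Maps a data type to the external tool that handles it (hypothetical)."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[[List[dict]], dict]] = {}

    def register(self, data_type: str, tool: Callable[[List[dict]], dict]) -> None:
        """Step 1: register an external viewing or analysis package."""
        self._tools[data_type] = tool

    def invoke(self, data_type: str, rows: List[dict],
               selected: Callable[[dict], bool]) -> dict:
        """Step 2: run the registered tool on the user-selected subset."""
        subset = [row for row in rows if selected(row)]
        return self._tools[data_type](subset)


def count_peaks(rows: List[dict]) -> dict:
    """Stand-in for an external analysis package."""
    return {"n_peaks": len(rows)}


def demo() -> dict:
    registry = ViewerRegistry()
    registry.register("peak_list", count_peaks)

    rows = [{"h1_ppm": 7.2}, {"h1_ppm": 8.4}, {"h1_ppm": 9.1}]
    presentation = {"rows": rows, "annotations": {}}

    # The lambda plays the role of a selection made on the visual presentation.
    result = registry.invoke("peak_list", rows, lambda r: r["h1_ppm"] > 8.0)

    # Step 3: fold the tool's output back into the presentation.
    presentation["annotations"].update(result)
    return presentation["annotations"]
```

In a real deployment the registered callable would more likely launch a separate program and parse its output; that detail is left out because the source does not describe it.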