SCIENTIFIC FRONTIER I:
Beyond the Genome
The human genome will be sequenced by the year 2005 and with it
the sequence of all human proteins, the proteome, will become
available (on the order of 10^5 sequences). Moreover, other genomes
are being sequenced, including those of several plants and bacteria.
Protein sequences, however,
provide little insight into protein function unless there is sequence
homology to a protein of known function. The structure of a protein
in its active, folded state results from a delicate balance of
non-covalent interactions within the protein itself and between
the protein and its environment. The energies involved are small
and difficult to measure or compute. Consequently, experimental
determinations of the exact geometry of an active site, the disposition
of solvent molecules, the amplitudes of molecular motions bringing
functional groups into close proximity, etc., are all critically
important to understanding protein function. Thus, the challenge
for structurally characterizing the proteome is compounded by
the need for many high resolution structures and for detailed
descriptions of dynamics. This information will have a broad impact
on fundamental biology, medicine, and biotechnology. Structures
will be used to understand the molecular basis of disease, to
develop diagnostics and therapies, to understand how toxins act
and how to combat them, to guide drug design and pesticide
development, and to support enzyme engineering, bioremediation,
and the development of industrial catalysts and sensors, among other uses.
Over the next decade the already large ratio of the number of
known protein sequences to known three dimensional structures
will increase dramatically. Homology modeling will reduce the
size of the problem even though such an approach will not yield
high resolution models of structure or dynamics. It is estimated
that the number of distinct protein domain folds is between 400
and 8,000 depending, in part, on the definition of a distinct
fold. Generating a representative set of structures covering
all of these folds, so that homology modeling can approximate
the remaining structures, will require determining between 3,000
and 10,000 structures. Currently, about 400 three-dimensional
structures are being determined annually by the sum of all techniques,
with only a fraction giving new folds. The rate of generation
of new structures must accelerate to even begin to keep up with
the sequence information being generated now.
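
As a rough illustration of the gap described above, the arithmetic can be sketched in Python. The function name and the assumption of a constant annual rate are ours, not the report's. Because only a fraction of new structures represent new folds, the result should be read as an optimistic lower bound on the time required.

```python
def years_to_reach(target_structures, structures_per_year):
    """Years needed to determine `target_structures` at a constant annual rate.

    Hypothetical helper for illustration; assumes every structure counts
    toward the target, which overstates real progress on new folds.
    """
    if structures_per_year <= 0:
        raise ValueError("structures_per_year must be positive")
    return target_structures / structures_per_year


CURRENT_RATE = 400               # structures per year, all techniques (from the text)
TARGET_RANGE = (3_000, 10_000)   # size of the representative set (from the text)

for target in TARGET_RANGE:
    years = years_to_reach(target, CURRENT_RATE)
    print(f"{target:>6} structures at {CURRENT_RATE}/year: {years:.1f} years")
```

At the current rate, the representative set would take roughly 7.5 to 25 years to complete, even under this generous assumption.
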
NMR is playing an increasingly important role in characterizing
the structure and dynamics of proteins, complementing results
obtained by other techniques, most notably X-ray crystallography.
In 1995 about one quarter of the new structures came from NMR
studies, which yield time-averaged representations of the molecules
in aqueous solution, often at physiological temperatures. These
conditions are arguably closer to those of the native functional
state than the conditions in a crystal.
In addition, NMR can provide information on the hydration of biomolecules
that is critical for maintaining the functional state. It can
also provide unique insights into the stabilities of different
regions of the protein and information on the nature of unstructured
states and conformational ensembles of inherently flexible structures.
Thus, NMR is ideally suited to the study of protein folding, considered
by many to be the major unsolved problem in structural biology
and perhaps the key to protein structure prediction.
NMR has tremendous potential for providing a structural and dynamic
basis for understanding the way a protein sequence translates
to a folded conformation, but some aspects lie beyond what is
directly encoded by the genome. A great many proteins are
post-translationally modified in some way: for signaling, for
altering stability or activity, or for marking proteins for
disposal. These modifications, including phosphorylation and
glycosylation, can be monitored by NMR to achieve a functional
understanding of their role.
Even though membrane proteins represent 30% of the proteome, relatively
little is known about the structure of these proteins, because
of their resistance to crystallization. Solid-state NMR can yield
time-averaged structures of proteins in the fluid membrane environment,
the milieu which is critically important for the function of membrane
proteins. The precise distance and orientational constraints provided
by solid-state techniques yield high resolution structures for
this important class of proteins. In addition, with the recent
development of transverse relaxation-optimized spectroscopy (TROSY),
it may be possible to use solution methods and GHz NMR fields
to solve the structures of membrane proteins reconstituted into
small vesicles. While the limits of this approach are not yet
defined, success in this arena would constitute a major breakthrough.
NMR is also uniquely suited for the determination of both the
rates and types of molecular motion in solution (where isotropic
global motions occur) and in the solid state (characterized by
anisotropic global motions). In the search for correlations between
dynamics and function, it is important to know not only the time
scale of the motion but also other details, such as the axis about
which the motion occurs, its amplitude, whether the process is
diffusional or discontinuous, and whether motions at
adjacent sites are correlated. NMR can provide this information
in exquisite detail so that unifying functional correlations can
be elucidated.
The limitations of solution NMR spectroscopy as a method for macromolecular
structure determination are steadily receding: the molecular weight
limit for complete structures is approaching 50,000 daltons, concentrations
as low as 200 micromolar can be analyzed, and the quality of NMR
structures is continually improving. Great enhancements
over this performance are just on the horizon. Residual dipolar
interactions in weakly aligned systems have provided the first
absolute (i.e. relative to the laboratory reference frame) structural
constraints for solution NMR. The structural dependence of isotropic
chemical shifts and other NMR observables can be calculated through
quantum chemical methods better today than ever before, leading
to improved structural constraints. Such improvements will lead
to increased precision and accuracy in the structures determined
by solution NMR. In addition to these improvements in structural
constraints, higher magnetic fields, such as those proposed for the
NMRC, will make optimal use of TROSY experiments for improved
resolution, extending NMR to backbone structures of proteins
with molecular weights perhaps as high as 100,000 daltons. New
approaches to selective isotope labeling, which take advantage
of metabolic pathways in the cell cultures producing protein samples
or use novel chemical procedures such as stereospecific
deuteration, will improve resonance selection. Moreover, new methods
in molecular biology for splicing protein fragments are becoming
available, so that labeled domains in a natural abundance background
can be produced, creating opportunities to focus
structural efforts on specific domains in larger structures.
Higher fields, high-temperature superconducting probes, and low-temperature
coils and preamplifiers are all leading to major improvements
in sensitivity. Presently, protein concentrations of hundreds
of micromolar are needed for NMR experiments. This solubility
requirement is a limiting factor for many solution NMR studies
and is analogous to the crystallographer's problem of obtaining
suitable crystals. The developments mentioned above, in combination
with larger sample volumes, will reduce the protein concentrations
needed to tens of micromolar, and the problems of aggregation
that have led to "poorly behaved" samples for NMR characterization
will become less frequent.
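
To make the concentration requirement concrete, the conversion from molar to mass concentration can be sketched as follows. The function name is hypothetical. Pairing 200 micromolar with a 50,000 dalton protein is an illustrative assumption that combines figures quoted separately in this section, and 20 micromolar is our stand-in for "tens of micromolar".

```python
def mass_concentration_mg_per_ml(conc_micromolar, mol_weight_daltons):
    """Convert a molar concentration to a mass concentration in mg/mL.

    micromol/L * g/mol = microgram/L; dividing by 1e6 gives g/L,
    which is numerically identical to mg/mL.
    """
    if conc_micromolar < 0 or mol_weight_daltons <= 0:
        raise ValueError("concentration must be >= 0 and molecular weight > 0")
    return conc_micromolar * mol_weight_daltons / 1e6


MOL_WEIGHT = 50_000  # daltons; illustrative choice, not a measured sample

for conc in (200, 20):  # micromolar: present-day level vs. assumed future level
    mg_per_ml = mass_concentration_mg_per_ml(conc, MOL_WEIGHT)
    print(f"{conc:>4} micromolar of a {MOL_WEIGHT} Da protein = {mg_per_ml:.1f} mg/mL")
```

For this hypothetical protein, 200 micromolar corresponds to 10 mg/mL and 20 micromolar to 1 mg/mL, which suggests why a tenfold drop in the required concentration would ease solubility and aggregation problems.
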
With the implementation of an NMR Collaboratorium operating over
the Next-Generation Internet, most of the work of spectroscopic
assignment and data analysis could be carried out by individual
research groups at remote locations. The development of automated
assignment software, and indeed the development of complete turnkey
packages for both assignment and three-dimensional structure analysis,
would facilitate access to the most advanced NMR technology by
relatively non-expert personnel. As has been the experience of
the synchrotron community (Appendix II), this increased access
will increase the rate at which protein structures are solved
using NMR techniques. The simultaneous development of high-resolution
protein structures, together with improved computational techniques
for structure refinement, can be expected to result in a new generation
of structures with much higher information content. These will
then form the basis for better calculations aimed at a more fundamental
understanding of how proteins function.
Thus, we foresee that NMR spectroscopy will play an increasingly
important role in protein structure determination, both from the
perspective of being a major contributor to the development of
the protein structure database and from the standpoint of providing
unique information about protein structure and dynamics, leading
to an understanding of both mechanistic and kinetic functional attributes.
Membrane protein structure and dynamics will be determined in
both micellar and lipid bilayer environments. The next-generation
GHz (and higher) NMR technology, when available to the broad research
community, will significantly increase the impact of NMR in structural
biology research beyond the genome.