SCIENTIFIC FRONTIER III:
Gene Regulation

The process of converting a gene in a chromosome to a functional protein is carried out by the tremendously sophisticated molecular machinery of the cell (see Figure 2). Since the levels of the protein product are critical to proper cell function, the overall throughput for each gene is regulated, and this regulation occurs at all levels from accessibility of the gene in the chromosome through release of the mature protein and its ultimate degradation.

The expression of specific genes at various times in the life cycle of the cell leads to differentiation, and also allows a cell to respond to environmental stresses. However, loss of regulation for even a single gene product can lead to a metabolic disorder. For example, most of the primary causes of cancer are believed to involve the breakdown of normal regulatory steps. For these reasons, the understanding of gene regulation has become one of the focal points of molecular biology in the past decade. Although tremendous progress has been made, much remains to be learned before the full potential can be achieved for developing methods to regain control of aberrant processes, to treat diseases, and to manipulate cells for new functions in biotechnology.

As the molecular players in the complex and interconnected regulatory pathways become identified, it has been natural to ask how they act at the molecular level. Here too, despite the wonderful progress resulting from the synergistic efforts of molecular biology, X-ray crystallography and NMR spectroscopy, much more remains to be done. We have now learned how some of the first defined DNA binding motifs, such as the helix-turn-helix and zinc fingers, provide an interface to DNA for sequence-specific recognition. At the same time came the finding that this process is complex: the facts that some protein side chains in the interface are disordered and that water molecules act as bridges in recognition were not anticipated and are still not well understood. Elucidation of how proteins modulate transcriptional activity has led to additional surprises: it has been found that DNA bending and other distortions frequently occur in response to protein binding, and that large assemblies of proteins form and interact with the primary machine, the polymerase.

Some regulatory proteins enhance the rate of complex formation, while others cause the DNA at the promoter to shift from the closed state to the open state, whence transcription can actually begin. Many of the proteins involved have multiple domains, some to interact with DNA, others to regulate their own assembly, and yet others to contact the additional proteins involved. It has been found that these domains are often flexibly linked (a scientific contribution largely credited to NMR, which is sensitive to local dynamics in flexible regions), which makes elucidation of their structures more difficult. Indeed, some domains have been found to be completely unfolded random chains in the absence of their interacting partners (either DNA or protein), while upon assembly these domains become ordered. Only a small fraction of the myriad protein-protein interactions required for regulation has been characterized.

Once the primary transcript has been made, it must be processed to give the mature message: it is spliced by huge protein-RNA assemblies to remove noncoding regions, with alternative splicing sometimes occurring to control protein function. Proteins recognize specific messages to act as genetic switches analogous to the repressors on genes, but little is yet known about how the correct messenger RNAs are recognized. Proteins add caps and poly-A sequences to messages, preparing them for translation and controlling the rate at which they are degraded. Other proteins change the availability of key structures for translation of the message, yet another process subject to regulation by the cell. Though many proteins involved in these steps have been identified, very few have been structurally characterized.

Proteins, once synthesized, must fold, some needing the help of chaperonins to prevent aggregation. They are transported in the cell and processed to add anchors, active site metals, or carbohydrates, all of which are regulated processes. Some are carried through the cell and enter the nucleus; others are taken to the surface and released to the surroundings.

All of these processes of regulation have in common the need for specific recognition of one molecule by another, and often the ability to then execute a chemical or mechanical function. The molecules involved have proven to be challenging as structural targets. A significant number of individual domain structures have been solved, but the real action comes with the interaction of these domains, which produces assemblies whose size is at the upper end of what has been feasible to study by solution NMR. Higher fields and new probe technology have improved sensitivity, and wide-scale use of uniform stable isotope labeling with ¹⁵N, ¹³C and ²H, together with multidimensional spectroscopy, has steadily expanded the molecular weight range that can be studied. A remarkable combination of current developments now stands to extend this range significantly, so that many more of these complexes can be studied. The use of selective isotope labeling provides an approach to reduce spectral complexity in large complexes while retaining key probes for intermolecular interactions. New methods for chemical synthesis, in vitro translation and metabolic labeling of proteins will provide many more options than were previously available. Methods for enhancing and measuring dipolar couplings will provide new restraints to improve the calculation of accurate structures, and will solve some of the problems that have plagued nucleic acid structure determinations. Higher magnetic fields will continue to improve spectral resolution and sensitivity, the latter of which, in spite of steady gains, has always been a limiting factor. Higher fields will now play a further special role in transverse relaxation-optimized spectroscopy (TROSY), which can yield sharp resonances even for very large complexes of proteins or nucleic acids. The process of solving structures of important domains and watching their assembly into relevant complexes can now be realized to a far greater extent than ever before.

The processes involved in gene regulation should not be viewed as isolated or unique in the cell. Many other key biological functions, including recognition and repair of the genome (which in fact couples to gene regulation), replication, and recombination, as well as many signaling processes, rely on the formation of multicomponent complexes that raise the same issues for NMR analysis. The ability to deal with larger complexes by NMR will facilitate work in all of these other areas.