Consensus sequence for HMG1-like DNA binding domains
Author(s) - David Kolodrubetz
Publication year - 1990
Publication title - Nucleic Acids Research
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 9.008
H-Index - 537
eISSN - 1362-4954
pISSN - 0305-1048
DOI - 10.1093/nar/18.18.5565
Subject(s) - biology , dna , sequence (biology) , consensus sequence , computational biology , genetics , dna sequencing , base sequence , dna binding protein , dna binding site , evolutionary biology , gene , transcription factor , gene expression , promoter
Jantzen et al. (1) recently described the sequence of human hUBF, a sequence-specific DNA-binding protein required to activate ribosomal RNA gene transcription. The deduced protein sequence of hUBF contained three domains, called HMG Boxes 1, 2 and 3, each of which shared >35% identity with the DNA-binding domain in the middle of the chromatin-associated High Mobility Group protein 1 (HMG1) (2). Although HMG1 is not a sequence-specific DNA-binding protein, it does bind preferentially to single-stranded DNA. Therefore, Jantzen et al. (1) hypothesized that the 'HMG Box' represents a new class of DNA-binding domains. We would like to draw attention to three other proteins whose sequences consist entirely of HMG Boxes; this has allowed derivation of a consensus 'HMG Box' sequence. Two of these HMG Box proteins, NHP6A and NHP6B, are moderately abundant nuclear proteins in Saccharomyces cerevisiae (3). They are both ~11 kDa and have >40% sequence identity with the middle segment of the 27-kDa HMG1 protein. Like HMG1, NHP6A and NHP6B bind preferentially to single-stranded DNA with no apparent sequence specificity (D. Kolodrubetz, unpublished result). The third protein, LG1, has been isolated and cloned from the unicellular organism Tetrahymena thermophila (4). LG1 is a small protein, ~11.6 kDa, which shares 40% identity with the middle region of HMG1. LG1 is interesting because it is a chromatin-associated protein found only in the transcriptionally active macronucleus. The sequence analysis program ALIGN (3) was used to determine the optimal alignments between each pair of eight proteins or protein segments: NHP6A, NHP6B, LG1, the middle segment of HMG1, the amino terminus of HMG1 (which is homologous to its middle segment), and the three HMG Boxes from hUBF (Fig. 1). A consensus sequence was derived from these alignments.
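The text does not state the rule used to derive the consensus from the aligned sequences. As a rough illustration only, the sketch below applies a simple column-wise majority rule to a set of pre-aligned, equal-length sequences and writes `x` where no residue reaches a threshold. The function name `consensus`, the `min_fraction` threshold of 0.5, the `-` gap character, and the toy sequences are all assumptions made for this example; they are not real HMG Box sequences and this is not necessarily the procedure behind Fig. 1.

```python
from collections import Counter


def consensus(aligned, min_fraction=0.5):
    """Column-wise majority consensus of pre-aligned sequences.

    Assumption: a residue is reported only if it occupies at least
    `min_fraction` of the rows in its column; otherwise 'x' is written.
    Gaps ('-') are never reported as the consensus residue.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    result = []
    for column in zip(*aligned):
        residues = [c for c in column if c != "-"]
        if not residues:
            result.append("x")
            continue
        residue, count = Counter(residues).most_common(1)[0]
        # The denominator is the number of sequences, gaps included.
        result.append(residue if count / len(column) >= min_fraction else "x")
    return "".join(result)


# Hypothetical toy alignment (made-up residues, for illustration only).
toy_alignment = ["PKRPL", "PKKPL", "PEHAL", "PKDPM"]
print(consensus(toy_alignment))  # PKxPL
```

In the toy alignment the third column has four different residues, so no residue reaches the threshold and the position is reported as `x`.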
Although a number of the pairwise alignments between the eight polypeptides showed only 26% identity, the consensus sequence shared at least 37% identity with each of them (an x does not count as an identity). When the consensus sequence was used to search the protein databases of the Protein Identification Resource, the proteins discussed here all had alignment scores at least twice as large as those of any other proteins; this demonstrates the validity of the consensus sequence. The requirement for any of the conserved amino acids in DNA binding is unproven, but it is noteworthy that positions 11 and 34, which are 100% conserved, are both proline. HMG1 contains a high percentage of alpha-helix (2), and prolines are strong helix breakers most often found at turns of the polypeptide backbone. Thus it is interesting to speculate that these two prolines are important for the correct positioning of adjacent alpha-helices. The potential importance of a helix-turn-helix is reminiscent of motifs found in other DNA-binding proteins (5, 6), but the sequence of the HMG Box consensus is not significantly related to the other motifs. Mutational analyses of the positions with conserved amino acids are necessary to prove their importance in DNA binding.
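The identity figures quoted above follow the rule that an `x` in the consensus is never counted as a match. A minimal sketch of that calculation is given below. The function name `percent_identity`, the choice of the full aligned length as the denominator, and the toy strings are assumptions for illustration; the text does not specify the denominator, and the strings are not real HMG Box sequences.

```python
def percent_identity(consensus_seq, sequence):
    """Percent identity between a consensus and an aligned sequence.

    Positions where the consensus is 'x' never count as identities.
    Assumption: the denominator is the full aligned length, so 'x'
    positions lower the score rather than being skipped.
    """
    if len(consensus_seq) != len(sequence):
        raise ValueError("consensus and sequence must be aligned to equal length")
    matches = sum(
        1
        for cons, res in zip(consensus_seq, sequence)
        if cons != "x" and cons == res
    )
    return 100.0 * matches / len(consensus_seq)


# Hypothetical toy strings (made-up residues, for illustration only).
print(percent_identity("PKxPL", "PKRAL"))  # 60.0
```

Here three of the five positions match (P, K and L), the `x` position is ignored as a potential match, and the fourth position differs, giving 60%.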