Basic helix-loop-helix (bHLH) proteins are a family of transcription factors found in almost all eukaryotes, from yeast to plants and humans. They play important roles in regulating developmental processes such as neurogenesis, myogenesis and cardiac development in animals [1].
The bHLH domain is about 60 amino acids long and consists of a DNA-binding basic region followed by two alpha helices separated by a loop [1]. The word "basic" refers to the chemistry of the motif (its basic amino acid residues), not to structural simplicity. Usually, one helix is smaller and, owing to the flexibility of the loop, folds back and packs against the other helix, allowing dimerization. The larger helix typically contains the DNA-binding region. The helical structures allow interaction with other bHLH proteins: bHLH transcription factors can homodimerize or heterodimerize with other bHLH transcription factors and form a wide variety of dimers, each with specific functions [2].
bHLH transcription factors bind to a consensus sequence called the E-box, CANNTG. The canonical E-box is CACGTG (palindromic), but some bHLH transcription factors, particularly those in the bHLH-PAS family, bind to related non-palindromic sequences similar to the E-box. Nineteen amino acids in the domain are highly conserved in organisms from yeast to humans. bHLH proteins have important roles in the development, cell proliferation and differentiation of vertebrates. However, relatively little is known about their evolution and diversification [1], [3].
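As a rough illustration of the E-box consensus described above, the following Python sketch scans a DNA string for CANNTG hexamers and reports whether each hit is palindromic (equal to its own reverse complement, as CACGTG is). The function names and the example sequence are hypothetical and chosen only for illustration. The sketch assumes an unambiguous A/C/G/T alphabet and does not attempt to model the E-box-like, non-consensus sites bound by some bHLH-PAS proteins.

```python
import re

# E-box consensus CANNTG, where N is any base.
# The lookahead lets overlapping matches be reported as well.
EBOX_PATTERN = re.compile(r"(?=(CA[ACGT]{2}TG))")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_eboxes(seq: str):
    """Return (position, hexamer, is_palindromic) for each CANNTG match.

    The reverse complement of any CANNTG hexamer also fits CANNTG,
    so scanning one strand is enough to locate sites on both strands.
    """
    seq = seq.upper()
    hits = []
    for match in EBOX_PATTERN.finditer(seq):
        hexamer = match.group(1)
        is_palindromic = hexamer == reverse_complement(hexamer)
        hits.append((match.start(), hexamer, is_palindromic))
    return hits


if __name__ == "__main__":
    # Hypothetical sequence containing one canonical and one other E-box.
    for hit in find_eboxes("TTCACGTGAACAGATGTT"):
        print(hit)
    # (2, 'CACGTG', True)
    # (10, 'CAGATG', False)
```

Positions are 0-based offsets into the input string. A match only shows that the hexamer fits the consensus; it says nothing about whether a given bHLH dimer actually binds there.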
"These clades are classified into five major groups based on their basic DNA-binding patterns (Atchley and Fitch 1997; Ledent and Vervoort 2001). Group A proteins bind to the hexanucleotide CAGCTG E-box and include proteins such as Lyl, Twist, dHand, Achaete-Scute, Atonal, MyoD, and E12. Group B proteins bind to the CACGTG E-box, also known as G-box in plants, and include Srebp, Tfe, Myc, Mad, Mxi1, Cbf1, ESC, R, and G-box. Group C proteins, including Sim, Trh, and Ahr, have an uncharacteristic basic region and contain a pair of PAS repeats, which facilitates dimerization with other PAS-containing proteins. Group D proteins lack the basic DNA binding region and act as dominant negative regulators of other bHLH proteins and include Id and Emc. Recently an additional group E has been described that includes Gridlock, E(spl), Hey, and Hairy (Ledent and Vervoort 2001). This latter group contains proline or glycine residues within the basic region and shows a preference to bind the sequence CACGNG (Steidl et al. 2000; reviewed by Fisher and Caudy 1998)." [4].
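The binding preferences stated in the quoted classification can be written down as a small lookup. The Python sketch below is only an illustration under that assumption: the function and variable names are hypothetical, and only groups A, B and E are included because the quote gives no target sequence for group C and describes group D as lacking the basic DNA-binding region.

```python
import re

# Hexamer preferences as stated in the quoted classification [4].
# N in CACGNG is any base; groups C and D are omitted (see lead-in).
GROUP_PATTERNS = [
    ("A", re.compile(r"CAGCTG")),
    ("B", re.compile(r"CACGTG")),
    ("E", re.compile(r"CACG[ACGT]G")),
]


def candidate_groups(hexamer: str):
    """Return the bHLH groups whose stated preference matches the hexamer."""
    hexamer = hexamer.upper()
    return [group for group, pattern in GROUP_PATTERNS
            if pattern.fullmatch(hexamer)]


if __name__ == "__main__":
    print(candidate_groups("CAGCTG"))  # ['A']
    print(candidate_groups("CACGTG"))  # ['B', 'E']
    print(candidate_groups("CACGCG"))  # ['E']
    print(candidate_groups("CATATG"))  # []
```

Note that CACGTG matches both the group B preference and the group E pattern CACGNG, so a hexamer alone does not determine which group of proteins binds it.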
Table 1. Helix-loop-helix transcription factors: Protein families, functions, and motifs [5]
| Protein family | Included proteins | Grouping: E-box | Grouping: Murre et al. | LZ (leucine zipper) | Function |
|---|---|---|---|---|---|
| AC-S | ac, sc, ase, l’sc, mash, ash | A | II | | Neurogenesis; determination of neuronal precursors |
| ATONAL | atonal, lin-32, math1, neuroD | A | | | Neurogenesis |
| DELILAH | delilah | A | | | Differentiation of epidermal cells into muscle |
| dHAND | dhand, ehand, hxt, hed | A | | | Cardiac morphogenesis; trophoblast cell development |
| E12/Da | e12, e47, itf, pan, G12, me2, da | A | I | | Neurogenesis, sex determination; regulation of myogenesis |
| HEN | hen, helhel | A | | | Neurogenesis |
| LYL | lyl, scl, nscl, tal | A | | | Hematopoietic proliferation and differentiation |
| MYOD | myod1, myogenin, myf5, myf6 | A | II | | Myogenesis |
| NEX | nex-1, rat4 | A | | | Neurogenesis |
| TWIST | twist, ec2, paraxis, scleraxis, dermo | A | | | Specification of mesoderm lineages |
| ARNT | arnt | B | | | Regulation of aryl hydrocarbon receptor activity |
| CBF | cbf-1 | B | | | Centromeric binding and chromosomal segregation |
| ESC | esc1 | B | | | Sexual differentiation in yeast |
| G-Box | G-box | B | | | |
| HAIRY | hlhm, hairy, hes, deadpan, e(spl) | B | VI | | Neurogenesis; segmentation |
| MAD | mad, mxi1 | B | IV | Yes | Regulation of cell proliferation |
| MYC | c-myc, n-myc, l-myc, max | B | III | Yes | Cell proliferation, differentiation; oncogenesis |
| INO | ino2, ino4 | B | | | Phospholipid synthesis |
| PHO4 | pho4, nuc1 | B | | | Phosphate regulation in yeast |
| R | r, delila | B | | | Regulation of anthocyanin pigmentation |
| SREBP | srebp, add1, hlh106 | B | | Yes | Sterol synthesis; adipocyte determination |
| TFE | tfe3, tfeb, mi | B | III | Yes | Activates transcription in immunoglobulin heavy chain enhancer |
| USF | usf, spf1, namalwa | B | | Yes | Upstream stimulatory factor; insulin enhancer |
| SIM | sim, trh, ahr | C | | | Central nervous system midline lineage regulation; tracheal cell induction |
| ID | id, heira, emc, hlh462 | D | V | | Negative inhibition of DNA binding; myogenesis, neurogenesis |
| CENPBR | cenpbr | ? | | | Centromeric binding protein |
| AP-4 | ap-4 | ? | | Yes (2) | Enhance viral and cellular gene activation |
References
[1]: S. Jones, "An overview of the basic helix-loop-helix proteins," Genome Biology, vol. 5, no. 6, p. 226, 2004.
[2]: https://pfam.xfam.org/family/PF00010
[3]: V. Ledent and M. Vervoort, "The basic helix-loop-helix protein family: comparative genomics and phylogenetic analysis," Genome Research, vol. 11, no. 5, pp. 754-770, 2001.
[4]: M. J. Buck and W. R. Atchley, "Phylogenetic analysis of plant basic helix-loop-helix proteins," Journal of Molecular Evolution, vol. 56, no. 6, pp. 742-750, 2003.
[5]: W. R. Atchley and W. M. Fitch, "A natural classification of the basic helix-loop-helix class of transcription factors," Proceedings of the National Academy of Sciences of the United States of America, vol. 94, no. 10, pp. 5172-5176, 1997.