Ancient Airs: A Musical Reading of Protein Structure
We're going to do something like a close reading on a small group of proteins that bind calcium. Proteins are linear texts composed of an alphabet of 20 different amino acids. They are also like self-folding origami, and assume a specific configuration in three-dimensional space. This form expresses the biological meaning of a protein's text, as the juxtaposition of words and phrases produces meaning in a poem. Proteins work by touch -- their forms invite interaction with other molecules, perhaps to catalyze a chemical reaction or to recognize a hormone. The calcium binding proteins may be activated or inactivated when a calcium ion attaches to them.
Listening to proteins is a very direct way to explore their patterns. Like music, proteins are composed of strings of phrases and themes that may repeat themselves both within a protein and across time. New protein texts are written over old ones. Proteins change over time, and an ancestral protein can generate a family of descendants in which changes in the original text result in a subtle refolding into a form that serves different functions than the ancestral form. We'll look at four protein sequences, representing two proteins with different functions, drawn from three species: humans, European eels, and sea anemones. The two proteins we'll listen to are Calmodulin and Troponin C, and to compare them we'll use a common musical algorithm.
Why use music to analyze protein structure? There are two aspects of protein structure that translate into music very well.
- Individual musical tones can identify specific amino acids. An important characteristic of amino acids is their water solubility. When proteins fold into their functional shape, the less soluble amino acids move to the interior while the more soluble ones are exposed on the surface. We can convert "solubility scales" to musical scales to hear how the different types of amino acids are distributed within the protein.
- Click here to see the amino acid sequences of Calmodulin and Troponin C.
- Music can represent folding patterns. When proteins fold, they do so in combinations of several basic folding patterns. One of these is the alpha helix -- a structure that looks like a slinky, and which is the dominant folding pattern seen in Calmodulin and Troponin. We can identify the parts of the protein that form helices by setting them in a specific instrumental voice.
- Click here to see the helices in Calmodulin and Troponin C.
Calmodulin is truly an "ancient air." It is found in all eukaryotes, i.e. organisms whose cells have a nucleus, unlike bacteria. This makes it at least a billion years old. It mediates the activation of many cellular processes. As old as it is, it is a remarkably conserved protein: it is very similar in different species and, by inference, not much different from its ancestral form. Click here to read more about human calmodulin and troponin genes.
Troponin is a muscle protein. We will be listening to Troponin CS, the form found in fast-contracting skeletal muscle fibers. Like calmodulin, this form has four active calcium binding sites.
In both proteins, each of the four calcium binding sites is a sequence of 12 amino acids, flanked by two helical sections of about a dozen amino acids on each side. In the diagram to the left, the green ball represents calcium in the loop formed by a calcium binding site.
We'll begin our musical exploration with the calcium binding sites of Calmodulin. The scale for most of the examples in this presentation is a whole-tone scale, with low-solubility amino acids represented by the lower pitches. Using the whole-tone scale means that there is no privileged key center.
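As a rough sketch of how such a scale could be built (this is not the BioSon implementation), the Python below ranks the 20 amino acids by Kyte-Doolittle hydropathy, assumed here as a stand-in for the solubility scale, and assigns whole-tone steps upward from an arbitrary base note. The names `KD`, `whole_tone_scale`, and `base_midi`, and the alphabetical tie-break, are illustrative assumptions.

```python
# Kyte-Doolittle hydropathy values (assumed stand-in for the article's
# "solubility scale"; a higher value means less water soluble).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def whole_tone_scale(base_midi=36):
    """Map each amino acid to a MIDI note on a whole-tone scale.

    The least soluble residue gets the lowest pitch; each later rank
    is one whole tone (2 semitones) higher.
    """
    ranked = sorted(KD, key=lambda aa: (-KD[aa], aa))  # ties: alphabetical
    return {aa: base_midi + 2 * rank for rank, aa in enumerate(ranked)}


def sequence_to_pitches(sequence, scale):
    """Translate a one-letter amino acid string into a list of MIDI notes."""
    return [scale[aa] for aa in sequence]


scale = whole_tone_scale()
# "DKDG" is a toy fragment used only to exercise the mapping.
print(sequence_to_pitches("DKDG", scale))
```

Twenty residues on a whole-tone scale span a little over three octaves, which is one reason a reduced scale can be easier on the ear.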
Three regions of the protein will be represented by different instruments. The calcium binding sites are played by harp, the flanking alpha helical regions are played by a string ensemble, and the introductory and linking regions are played by flute. The second helix of each calcium binding domain begins within the calcium binding site, so you'll hear a brief overlap of harp and strings. You will hear each of the four 12-note calcium binding sequences separately, and then the four domains that include the calcium binding sites together with their left and right flanking helical regions.
The Calcium-Binding Motifs of Calmodulin [2:30]
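The instrument assignment described above can be sketched as a lookup from annotated regions to voices. The region coordinates below are made up for illustration and are not the real calmodulin annotation; `INSTRUMENT` and `voices` are hypothetical names.

```python
# Region kind -> instrument, following the assignment in the text.
INSTRUMENT = {"site": "harp", "helix": "strings", "linker": "flute"}


def voices(length, regions):
    """Return the instruments sounding at each sequence position.

    regions: list of (start, end, kind) with half-open ranges.
    Positions covered by no region default to flute (intro and linkers).
    Overlapping regions produce more than one instrument.
    """
    result = []
    for pos in range(length):
        kinds = {kind for start, end, kind in regions if start <= pos < end}
        result.append(sorted(INSTRUMENT[k] for k in kinds) or ["flute"])
    return result


# Toy annotation: helix, 12-residue site, then a helix that starts
# three residues before the site ends (hypothetical coordinates).
toy_regions = [(5, 17, "helix"), (17, 29, "site"), (26, 38, "helix")]
for pos, instruments in enumerate(voices(40, toy_regions)):
    print(pos, instruments)
```

Positions 26 to 28 in the toy annotation carry both harp and strings, which corresponds to the brief overlap you hear where the second helix begins inside the binding site.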
You can hear the similarities among the four calcium binding domains. Now listen to the four calcium sites played together, first one site, then the first and second sites together. The third and fourth sites are then added in sequence. Since the pitches are assigned according to solubility, amino acids with similar properties will have pitches close together, and a substitution of one amino acid for a similar one will sound as a close interval. Amino acids with greatly different solubilities will sound further apart.
Calmodulin's Four Calcium Binding Sites [1:00]
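To make the idea of close and distant intervals concrete, here is a minimal sketch that measures the distance between two residues on a whole-tone scale. The ordering string is an assumption (Kyte-Doolittle hydropathy with alphabetical tie-breaks), not necessarily the scale used in the recordings, and `interval_in_semitones` is a hypothetical helper.

```python
# Amino acids ordered from least to most water soluble. This order is an
# assumption (Kyte-Doolittle hydropathy, ties broken alphabetically).
SOLUBILITY_ORDER = "IVLFCMAGTSWYPHDENQKR"


def interval_in_semitones(aa_1, aa_2, order=SOLUBILITY_ORDER):
    """Distance between two residues: 2 semitones per rank step."""
    return 2 * abs(order.index(aa_1) - order.index(aa_2))


print(interval_in_semitones("I", "V"))  # similar residues: 2
print(interval_in_semitones("D", "E"))  # similar residues: 2
print(interval_in_semitones("I", "R"))  # very different residues: 38
```

A swap of isoleucine for valine moves the note by a single whole tone, while a swap of isoleucine for arginine jumps more than three octaves.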
At about the midpoint of the sequence, the protein breaks into two similar halves, indicating that the sequence may be a product of gene duplication, or possibly even quadruplication from an original calcium binding unit of 35-40 amino acids. To compare the similarities between the two halves of the protein, we'll hear them together as a duet. The two halves have been aligned so that the two sets of calcium sites play simultaneously.
Calmodulin Half Molecule Comparison: [1:00]
For an ancient protein, calmodulin has changed little from species to species. To demonstrate the amount of conservation in the protein, you will hear sequences of two widely separated species: human and sea anemone. These are the most distantly related animals I could find in the protein databases. The two species come from lineages that had separated from each other by the time of the Cambrian explosion 550 million years ago, and perhaps even earlier. The sequence includes the four calcium sites and the three sequences that link them, and will play in unison except at points where there are different amino acids in the sequence of the two species. How many differences do you hear?
Human/Sea Anemone Calmodulin [1:35]
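The unison-with-differences playback can be sketched as follows: two aligned sequences yield one note where they agree and two notes where they differ. The strings below are short toy fragments, not the real human or sea anemone sequences, and `duet` and `count_differences` are hypothetical names.

```python
def duet(seq_a, seq_b):
    """Pair up two aligned sequences position by position.

    Returns a one-element tuple where the residues match (unison)
    and a two-element tuple where they differ.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    return [(a,) if a == b else (a, b) for a, b in zip(seq_a, seq_b)]


def count_differences(events):
    """Count the positions where the two voices split apart."""
    return sum(len(event) == 2 for event in events)


# Toy fragments that differ at one position only.
events = duet("DKDGDG", "DKDGNG")
print(events)
print(count_differences(events))
```

Each two-element tuple would then be rendered as two simultaneous pitches, so the number of splits you hear equals the number of substitutions.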
The calmodulins have been held tightly to their original sequence with little variance between even distantly related species. However, other members of the Calmodulin gene family are less conserved. Troponin C is thought to be a molecular descendant of the same ancestral protein that gave rise to calmodulin, but its function is different from calmodulin's, and one protein will not substitute for the other. Look at the structures and sequences of Calmodulin and Troponin C again to compare the two.
The overall structure of Troponin C is exactly like that of Calmodulin: 4 Ca++ binding sites, each flanked by two regions of alpha helix. Troponin C is slightly larger, with 159 amino acids, as compared with 148 for Calmodulin.
Troponin CS is much less conserved than Calmodulin is. Although the Troponin CS of humans is similar to that of other mammals, there is more divergence between the human protein and that of other vertebrates. Since the basic structural elements of Troponin and Calmodulin are similar, we'll go straight to a species comparison of two Troponin Cs to demonstrate the extent of conservation. The two species we'll hear are human and European eel, members of lineages separated for at least 450 million years. Again you will hear the four calcium sites in the harp and the three helical segments in strings. How many differences do you hear, and how are they distributed?
Human/European Eel Troponins [1:32]
We've heard two Calmodulins, which were very similar, and two Troponin Cs, which were much more different. Now we want to ask a different evolutionary question: how different is troponin from calmodulin? The gene structure of the two is virtually identical, so the two molecules appear to have had a common ancestor. Yet they have different functions. Next we'll compare Calmodulin and Troponin C, both from humans.
We'll start with a comparison of just the four calcium binding sites.
Comparison of Troponin and Calmodulin Calcium Sites [1:00]
Now we'll hear the complete proteins. Troponin is a slightly bigger protein than calmodulin. To correct for the length difference, the two sequences have been aligned relative to the 4 calcium sites. Troponin has 7 additional amino acids at the beginning of the chain that are not represented in the music, 3 additional amino acids at the end of the middle linking sequence, which you will hear, and 1 additional amino acid at the end, which you will also hear. Again, listen to the distribution of the differences between the two sequences.
Human Troponin/Calmodulin [1:37]
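The length bookkeeping behind this alignment is easy to check. The short sketch below uses only the numbers given in the text; the variable names are illustrative.

```python
CALMODULIN_LENGTH = 148  # amino acids, as stated earlier

# Extra troponin residues relative to calmodulin, per the alignment above.
EXTRA = {
    "start of chain (not played)": 7,
    "end of middle linker (played)": 3,
    "end of chain (played)": 1,
}

troponin_length = CALMODULIN_LENGTH + sum(EXTRA.values())
troponin_notes_played = troponin_length - EXTRA["start of chain (not played)"]

print(troponin_length)        # 159, matching the stated size of Troponin C
print(troponin_notes_played)  # 152
```

So the troponin voice carries four more notes than the calmodulin voice: three at the end of the middle linker and one at the very end.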
Now, since troponin is more variable in general than calmodulin is, how can we sort out differences due to species variation from differences between the proteins? These proteins would have diverged from each other before the evolution of the vertebrates about 500 MYA, so differences between vertebrate species would be more recently acquired than differences between the proteins.
One thing we can do is to reset the musical scale to accommodate substitutions between amino acids with similar solubilities: amino acids with similar properties now get the same pitch. So the next thing we'll hear will be the two troponins again, but this time using a reduced scale of 9 tones (instead of 20), and set in a pentatonic scale. Note the differences and their distribution.
Human/Eel Troponins: reduced scale [1:45]
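A reduced scale of this kind can be sketched by grouping residues and giving each group one pentatonic pitch. The text does not say which amino acids share a pitch, so the nine groups below are a hypothetical grouping (neighbors in Kyte-Doolittle hydropathy), and the major pentatonic pattern and base note are likewise assumptions.

```python
# Nine hypothetical groups, ordered from least to most water soluble.
GROUPS = ["IVL", "FCM", "A", "GTS", "WY", "P", "H", "DENQ", "KR"]

# Semitone offsets of a major pentatonic scale within one octave.
PENTATONIC = [0, 2, 4, 7, 9]


def reduced_scale(base_midi=48):
    """Map every amino acid to one of nine pentatonic MIDI notes."""
    scale = {}
    for index, group in enumerate(GROUPS):
        octave, degree = divmod(index, len(PENTATONIC))
        pitch = base_midi + 12 * octave + PENTATONIC[degree]
        for aa in group:
            scale[aa] = pitch
    return scale


scale = reduced_scale()
# Similar residues now sound identical, so a conservative swap is silent.
print(scale["I"] == scale["V"])  # True
print(scale["K"] == scale["R"])  # True
print(scale["I"] == scale["K"])  # False
```

With this mapping, only substitutions that cross a group boundary are audible, which is what lets the reduced scale filter out conservative changes.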
Differences between the same protein in two different species are often substitutions using amino acids whose properties are similar. However, we would expect to hear more divergent substitutions between proteins whose function is different, since a difference in function would require at least some refolding in the protein. Now we'll listen again to the Troponin/Calmodulin differences, this time set in the reduced scale. Since we're listening to an evolving protein, I've "decorated" this last example by the addition of a high repeating tone as a sort of clock tick.
Human Troponin/Calmodulin: reduced scale [1:49]
You probably noticed more amino acid differences in the second half of the protein than in the first half. However, the human/eel troponin differences are also concentrated in the second half of the protein by a factor of nearly 4:1. Whereas the two troponins differ predominantly in the second half of the sequence, troponin and calmodulin differ from each other in both halves. Another thing we can hear in this combination is a ghost of the common ancestor: patterns that we might not hear in the two individual molecules, but which emerge when both are played together. Listen again for this "ghost" sequence.
Repeat: Human Troponin/Calmodulin: reduced scale [1:49]
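The comparison of how differences fall across the two halves can be sketched as a simple count. The sequences below are toy strings chosen only to exercise the function, not real troponin or calmodulin data, and `differences_by_half` is a hypothetical helper.

```python
def differences_by_half(seq_a, seq_b):
    """Count mismatches in the first and second half of an alignment.

    Both sequences must already be aligned to the same length.
    Returns (first_half_count, second_half_count).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    midpoint = len(seq_a) // 2
    mismatches = [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]
    first = sum(i < midpoint for i in mismatches)
    return first, len(mismatches) - first


# Toy alignment: one mismatch in the first half, three in the second.
print(differences_by_half("ACDEFGHIKL", "ACDEYGHLRM"))  # (1, 3)
```

A strongly lopsided pair of counts corresponds to the pattern described for the two troponins, while roughly even counts correspond to the troponin/calmodulin comparison.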
Much of the second half of these proteins (the more variable half) is encoded in the penultimate exon of their respective genes. If the four-unit Ca++ binding protein is descended from an early two-unit package, then one of the copies seems to be under less constraint than the other.
A common mechanism of evolutionary change is to fiddle with the extra copies of multiple-copy genes. In the human genome, the Calmodulin/Troponin C combination is represented by five genes. Three of these encode identical calmodulin sequences (chromosomal locations: 14q, 19q, 2p). The other two encode the two different Troponins: Troponin CS and Troponin CC (chromosomal locations: CS-20q, CC-3p). The five genes are all on different chromosomes. One of the two proteins is also under less evolutionary constraint than the other. There seem to be lots of ways you can be Troponin, but basically one way to be Calmodulin.
Finally, just to celebrate the proteins of the Calmodulin family, here is a piece based on another member of the family: the jellyfish photoprotein Aequorin, which emits light when it binds calcium. Its partner in the jellyfish, green fluorescent protein, became famous recently when its gene was transferred to a monkey. The piece begins and ends with renditions in several voices of one of Aequorin's Ca++ binding sites. The full sequence begins with the key change and is repeated twice. Calmodulin makes a brief appearance during the second reiteration of the Aequorin sequence.
All musical sequences created using SoftStep and BioSon software from Algorithmic Arts.