Furthermore, MLH might begin to address the mystery raised by Charles Darwin (1871, 878): "As neither the enjoyment nor the capacity of producing musical notes are faculties of the least direct use to man in reference to his ordinary habits of life, they must be ranked amongst the most mysterious with which he is endowed." MLH suggests that language and music "could have arisen due to the occurrence of an ancestral stage that was neither linguistic nor musical but that embodied the shared features of modern-day music and language" (Brown 2000, 277).
I am not proposing any worked-out theory of music cognition in what follows. The time has not yet come for developing MLH into anything like an empirically significant theory. My hope is that, once the proposal is seen to be empirically plausible and relatively immune from conceptual objections, theory building in this obscure but intrinsically interesting area, by people more directly engaged in linguistics and music theory, might take a definite course, in contrast to the rather partial and disjoint efforts currently in view, as we will presently see.
In fact, to anticipate, it might be possible to entertain an even stronger hypothesis, according to which the computational principles already discovered in the language case may be used to explain aspects of musical competence.
If that makes preliminary sense, then we may be spared the trouble of discovering at least the basic form of some of the fundamental principles of musical computation independently and (entirely) afresh.
6.1.1. Evidence
There is no conclusive evidence that supports any version of MLH.
Nonetheless, over the years, some evidence has accumulated from a variety of largely independent directions. Even if individual pieces of evidence do not directly support MLH, the cumulative body of evidence does seem to encourage further theoretical inquiry into MLH. The most interesting pieces of evidence may be listed as follows.
Language and Music 191.
Prima facie, it seems that music, like human language, is a completely universal, species-specific capacity. Infants a few months old can detect clause boundaries in language (Karmiloff-Smith 1992, 37) as well as distinguish between consonant and dissonant notes in a melody (Schellenberg and Trehub 1996; Trehub 2003; Hauser and McDermott 2003).
Furthermore, there is striking convergence between language and music in stages of maturation, including critical periods (Fitch, Hauser, and Chomsky 2005 and references therein). As with language, every culture develops at least vocal music, even in adverse environmental conditions, independently of race, gender, cultural achievement, and the like. The Sami people of Lapland developed extremely complex vocal music, although they did not develop instrumental music because of the scarcity of material at such latitudes (Krumhansl et al. 2000).
Archaeologists have discovered flutes made from animal bones by Neanderthals living in Eastern Europe (Kunej and Turk 2000). Although the interpretation of the discovery is controversial (Fitch 2006), it is instructive to look at some of its implications for the issue at hand (see also Mithen 2005). The discovery consists of a perforated thigh bone of a young cave bear; the bone was buried in layer 8 at the Divje babe I cave site in Slovenia. The authors report that, prior to this find, the earliest known bone flute had been radiocarbon dated to 36,000 years and assigned to the Upper Paleolithic. Layer 8 has been radiocarbon dated to an interval from 43,000 to 45,000 years, and is assigned to the Middle Paleolithic. The authors claim that the holes in the bone could not have been made by an animal: strong evidence is given that the holes were made by humans with stone tools designed for that purpose. Not surprisingly, the authors note, the spread of "technology" in those days took a long time, perhaps tens of thousands of years.
They also note that a more common method of making flutes could have been the use of hollowed bark, which is much easier to handle. Since bark is less durable, any wooden flutes, if there were any, would have been lost by now. Using bones with special tools could have been a later method (Fitch 2006), perhaps to obtain a sharper, more durable tone; we can only guess. Further, it is a plausible conjecture that, given the effort needed to fashion appropriate instruments, instrumental music could have emerged much later than vocal music. This would place the emergence of music even further back in time.
Estimates of the emergence of language vary widely, some tracing it as far back as 100,000 years. However, according to one respectable estimate (Holden 1998), the faculty of language emerged about 40,000 years ago. It is most interesting that recent estimates also trace the sudden increase in human brain size to about 100,000 years ago (Striedter 2006, cited in Chomsky 2007, 19).
Chapter 6.
Next, outside of language, music is the only (other) cognitive system for which a fairly detailed generative theory has been proposed on the model of research in generative grammar: the generative theory of tonal music, GTTM (Lerdahl and Jackendoff 1983). Although the theory is focused on Western tonal music, the structures it isolates suggest an underlying basis for musical experience that looks invariant across a vast range of music in that genre. Some parts of the theory were subsequently verified against actual audience responses (Jackendoff 1992; Krumhansl 1995). Subsequent theoretical and experimental work has now yielded a theory some of whose predictions can be quantified (Lerdahl 1996, 2001).
Notice that I am using the emergence of GTTM itself as possible evidence for MLH, without supposing that, as a theory, GTTM illustrates MLH. The idea is that we get some glimpse of the general properties of an object from the sort of theoretical moves it responds to (see section 1.1). In this case, the object appears to be formal in character.
Further, experimental research on various aspects of music cognition, across a wide spectrum of musical traditions and cultures, is beginning to provide evidence for the "parametric" nature of musical organization within a narrow range. Researchers have identified "a core set of psychological principles underlying melody formation whose relative weights appear to differ across musical styles" (Krumhansl et al. 2000, 13-14). For example, the paper just cited reports that, in studies on melodic expectancy and tonal hierarchies, considerable agreement was found between listeners from within the music's cultural context and listeners from outside it. Thus, "the inexperienced listeners were able to adapt quite rapidly to different musical systems" (p. 14).
Finally, there is some evidence that "tonal syntax is closely analogous to the part of language we call grammar," as Carol Krumhansl interprets a recent study on Broca's area of the brain (Maess et al. 2001; also Patel 2003). In an attempt "to localize the neural substrates that process music-syntactic incongruities," Maess and his colleagues studied brain processes "elicited by harmonically inappropriate chords occurring within a major-minor tonal context." They found that such chords elicited an early effect "localized in Broca's area and its right-hemisphere homologue, areas involved in syntactic analysis during auditory language comprehension."
This suggests "that these areas are also responsible for an analysis of incoming harmonic sequences, indicating that these regions process syntactic information that is less language-specific than previously believed."
Turning to some other properties of the language system, it is plausible to assume that musical systems satisfy some of the well-known general properties of language, such as unboundedness and weak external control (Chomsky 2002). Apparently, every musical system consists of a small set of notes with a universal core. Informally speaking, these notes are compiled over and over again to generate progressively more complex objects such as chords, phrases, passages, movements, and so on. The generation of complex objects is unbounded and countable (Brown 2000, 273; Fitch 2006). I return to this topic later for extensive discussion.
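The pairing of "unbounded" with "countable" has a simple formal core, which a short sketch can make concrete. Assuming a hypothetical seven-note inventory, and treating musical objects as flat note sequences rather than the hierarchical chords, phrases, and passages the text has in mind, the Python below lists every finite sequence shortest first: the list never ends (unboundedness), yet each sequence turns up at a definite finite position (countability). The names `NOTES`, `enumerate_sequences`, and `position` are illustrative inventions, not part of any theory of music.

```python
from itertools import count, islice, product

# Hypothetical finite inventory; nothing hinges on the particular notes.
NOTES = ("C", "D", "E", "F", "G", "A", "B")


def enumerate_sequences(inventory):
    """Yield every finite non-empty sequence over `inventory`, shortest first.

    There is no longest sequence, so the generator never stops.
    """
    for length in count(1):
        yield from product(inventory, repeat=length)


def position(seq, inventory):
    """Return the 0-based index at which `seq` appears in enumerate_sequences."""
    k = len(inventory)
    # Skip past all strictly shorter sequences (k**1 + ... + k**(n-1) of them).
    offset = sum(k**length for length in range(1, len(seq)))
    # Within one length, product() orders sequences like base-k numerals.
    rank = 0
    for note in seq:
        rank = rank * k + inventory.index(note)
    return offset + rank


first_ten = list(islice(enumerate_sequences(NOTES), 10))
print(first_ten[7:])                     # [('C', 'C'), ('C', 'D'), ('C', 'E')]
print(position(("G", "A", "B"), NOTES))  # 293
```

Because `position` computes a finite index for any sequence whatsoever, the set of sequences is countable even though no bound can be placed on their length.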
The entire system is totally "internal" in the sense that there seems to be little external control on the form and development of the relevant cognitive structures. As with language, music leads to music; that is, the primary tonal data that trigger the musical system are a product of the musical system itself. As far as I can tell, children do not develop musical competence by listening to birds, even if some birds are viewed as musical.
As noted, the system is at once universal and "parametric" in a general way. As Chomsky notes, these are rather rare and surprising properties for cognitive systems, including most human ones. So, if they are available (only) for language, music, and a small class of other systems, some unifying explanation is called for.
Additionally, introspective evidence, for what it is worth, suggests that, unlike vision (to which I return), simultaneous access to musical and linguistic systems is at least difficult.3 At one extreme, it is nearly impossible both to sing and to listen to someone at the same time. However, this piece of evidence alone is not persuasive, since the difficulty could arise not from a conflict in the computational system but from a conflict in the same (auditory) channel of information. Turning, then, to two different channels, it is also extremely difficult to sing and read something with equal efficiency, unless it is the score we are singing from. It is likewise difficult to listen to music while reading something; we are unlikely to bring a book to a concert. It is also quite difficult to listen to music while thinking (hard) about something else. Introspective evidence also suggests that listening to music does not affect purely "visual reading," such as attending to the shapes of letters, spacing, and so on, as in proofreading.
Problems begin when we attend to syntactic properties such as agreement and clause boundaries; the music is simply "switched off" at that point.
Needless to say, the difficulties compound as the complexity of the piece of music, and of the object of reading or thinking at issue, increases. This last point suggests that, other things being equal, the systems compete for the same computational resources.4
6.1.2. What the Evidence Means
None of the pieces of evidence just cited directly supports MLH, although their salience with respect to MLH varies. The fact that both linguistic and musical syntactic abilities begin to show up in early infancy is not decisive, since many other abilities, such as the ability to recognize faces or to detect and express emotions, may also show up at about the same stage. It would be difficult to hold that the ability to recognize faces influenced the emergence of language. Similarly, the sudden increase in brain size might have led to the emergence of a variety of other abilities, for example, advanced use of the digits for tool-making. On the basis of current evidence, there seems to be little connection between tool-making and the ability to hum.
Although the emergence of a generative theory of music (GTTM) soon after that of language is interesting, it proves little by itself. As Lerdahl and Jackendoff (1983) explicitly observe, their work shows little similarity between language and music with respect to core linguistic systems such as phonology, syntax, and semantics; the parallels, if any, are to be found in areas such as rhythmic and prosodic structure, which are generally viewed as not restricted to language, or even to humans (Ramus et al. 2000; Fitch 2006). Lerdahl and Jackendoff suggest that the parallels with linguistic theory lie more in the methodology and style of inquiry, and, according to Jackendoff (1992), the "style" extends to domains such as vision and social cognition, which fall beyond the scope of MLH. It follows that, if MLH is to hold, GTTM is not likely to be a theory that instantiates it.
In my opinion, the most compelling evidence is the introspective evidence, which seems to filter out cognitive systems other than the hominid set under consideration. But then, as emphasized, the evidence is just introspective, and people's judgments are likely to vary. We must learn more from controlled experimentation on musical ability in various grades of autism, in children with specific language impairment, and so on, to find out exactly which musical capacity, if any, remains unimpaired along with varieties of linguistic impairment, and vice versa. In any case, it is hard to see what exactly to look for in the suggested cases unless we already have something like a theoretical framework in hand.
The point applies, perhaps more clearly, to neural evidence (Mukherji 1990). For example, the neural evidence regarding Broca's area cited above could just mean that we have been wrong in identifying the resources of this area too narrowly. Also, following Patel 2003, it is unclear to me whether the neural evidence regarding Broca's area directly explains musical competence or whether it brings out certain general patterns of acoustic processing shared by language and music. Again, it is hard to see how to distinguish definitively between these alternative interpretations of neural data without the guidance of a theory.
Finally, the suggestion that music, like language, is unbounded and only weakly subject to external control is more a proposal than a statement of fact; it depends crucially on what we mean by "unboundedness" and "weak control," and on which properties are in fact jointly satisfied. I will thus spend considerable time getting clear about the properties of unboundedness and weak control.
MLH does look more promising with respect to the cumulative body of evidence cited above. When we consider the full basket of evidence, individual pieces might be viewed as reinforcing each other. Thus, neural evidence that Broca's area could be processing both musical and linguistic syntactic information supports the introspective evidence concerning difficulties of simultaneous access. These two pieces of evidence together support the evidence of a surprising match of critical periods in early infancy within the narrow domain of phrase boundaries for both music and language. The growing body of evidence then aligns with the evolutionary evidence regarding the almost simultaneous emergence of the two systems, perhaps due to the increase in brain size. Following this direction, to me the most promising aspect of the evidence noted for music is that, except for arithmetic, logic, and the like, I do not know of any other human nonlinguistic cognitive system, not to speak of nonhuman systems, in which so many languagelike properties cluster.
Consider the visual system. Recall that an obvious general property of language is that it is a formal, articulated system. That is, it is a system of perceptually distinguishable signs that individually and collectively express the information encoded in the representations associated with the signs. This contrasts sharply with the visual system, which is a "passive" system; it is not a system of signs at all.5 No doubt, visual representations can be described in combinatorial terms (Marr 1982; Hoffman 1998). But the system itself is not a "language"; we use signs to describe its structure (see section 7.3). Another general property of language is that it is a system of discrete infinity, as noted. In the absence of a system of "expressions," it is unclear how to determine the magnitude of what the visual system "generates."
Most importantly, environmental conditions strongly influence the properties of perceptual systems; they only weakly influence, if at all, the properties of the language and musical systems. It is well known that the sensory systems of organisms, including the visual system, degenerate when the environmental conditions that sustained those systems are no longer present. In the other direction, these systems adjust to changed environmental conditions by developing or amplifying alternative sensory properties.
The blind mole rat (Spalax ehrenbergi) illustrates the point. As the species moved underground millions of years ago, its eyes atrophied and became subcutaneous (David-Gray et al. 1998). It was naturally assumed that its visual system had become completely dysfunctional, but recent studies have shown that only those parts of the brain that support image formation have atrophied. The eye and the other parts of the brain have continued to develop an auditory system suited to the perception of vibratory stimuli (Bronchti et al. 2002).6 The phenomenon simply does not apply to the language system, since the only "external" condition it has to meet is the linguistic "environment," not the physical properties of the world: the linguistic "environment," the source of primary linguistic data, is a product of the language system itself.
Furthermore, the familiar argument from the poverty of the stimulus suggests that the initial human language system, the faculty of language, ought to be simple and uniform across the species; this must also be the case with the visual system. But the visual system is not only uniform across the species, like the language system; unlike the language system, the states that it can attain are largely uniform as well, pathology aside.
The states that the language system can attain, however, vary widely, as thousands of human languages and dialects testify. Within the species, the language system is thus parametric; the visual system is not. Moreover, given the "master-eye" hypothesis, the human visual system may not be restricted to the species at all; the language system, in contrast, is largely unique to the species in major respects.
It is not surprising, therefore, that there is no common core in the specific operating principles of the two systems: the language system has nothing like the rigidity principle of the visual system, and the visual system does not seem to require anything like internal Merge. Possibly, the remark extends to the principles by which these systems are acquired. Jenny Saffran (2002) makes the interesting observation that strategies of statistical learning of predictive dependencies, for example, the ability to predict phrase boundaries from incoming streams, may extend beyond natural languages to include "artificial languages" and music. These strategies do not seem to apply to the visual modality when the input is presented simultaneously rather than serially, as is typically the case with vision.
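Statistical learning of this kind is commonly modeled in terms of transitional probabilities between adjacent elements, TP(x → y) = count(xy) / count(x), with unit boundaries posited where the probability dips. The Python below is a minimal sketch of that idea, not a model of the experiments: the three made-up "words," their syllables, and the function names are hypothetical, and the boundary rule (a strict local minimum in TP) is only one simple choice among several. What it does show is that the procedure is defined entirely over serial order: it needs to know which element follows which.

```python
from collections import Counter


def transitional_probabilities(stream):
    """Estimate TP(x -> y) = count(x followed by y) / count(x followed by anything)."""
    pair_counts = Counter(zip(stream, stream[1:]))
    # Count x only where it has a successor, so the TPs out of each x sum to 1.
    first_counts = Counter(stream[:-1])
    return {(x, y): n / first_counts[x] for (x, y), n in pair_counts.items()}


def segment(stream):
    """Split `stream` wherever the TP between neighbors is a strict local minimum."""
    tp = transitional_probabilities(stream)
    tps = [tp[pair] for pair in zip(stream, stream[1:])]  # tps[j]: stream[j] -> stream[j+1]
    units, current = [], [stream[0]]
    for i in range(1, len(stream)):
        here = tps[i - 1]
        left = tps[i - 2] if i >= 2 else float("inf")
        right = tps[i] if i < len(tps) else float("inf")
        if here < left and here < right:
            units.append(current)  # dip in predictability: close the current unit
            current = []
        current.append(stream[i])
    units.append(current)
    return units


# Hypothetical trisyllabic "words"; within a word each syllable fully predicts the next.
WORDS = {"A": ["tu", "pi", "ro"], "B": ["go", "la", "bu"], "C": ["bi", "da", "ku"]}
order = "ABCACBABCBAC"  # each word is followed by more than one other word
stream = [syllable for label in order for syllable in WORDS[label]]

print(["".join(unit) for unit in segment(stream)])
```

Nothing in `segment` refers to syllables as such; a list of note names would be segmented by exactly the same rule, which is the sense in which such strategies might extend to music.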
Finally, when we are discussing whether two cognitive systems differ in their computational principles, it is natural to ask whether there is any conflict in their operations. Introspective evidence seems to suggest that, unlike the music case, there is no obvious conflict in the simultaneous operation of the linguistic and visual systems, which enables us to report on what we see. We can change visual fields, and zoom in and out of them, while continuing to report on all these changes at near-perfect efficiency. Heart surgeons and sports commentators are able to give running commentaries on the intricate, rapidly changing scenarios in front of them, including their own actions. In fact, on this line of reasoning, our ability to talk about what we see would seem to require that the systems of visual and linguistic computation be separate.7 Other things being equal and pending more controlled experimentation, it is most likely that the linguistic and visual systems access different computational systems; it is hard to find a dimension along which to place the visual system on a par with the linguistic system. The music system, in contrast, seems to satisfy all those "languagelike" conditions that the visual system fails to satisfy.
6.2. Strong Musilanguage Hypothesis
The net result seems to be that the cumulative body of evidence demands a theory along MLH lines, so that we are able to furnish a unified account of otherwise disjoint and inadequate individual pieces of evidence. What are the prospects of giving a theoretical shape to MLH? The simplest answer would be to show that the same syntactic system underlies the capacity to generate unbounded sequences in both music and language. Ideally, the structuring principles already discovered in the language case will constitute such a system.
In what follows, I will concentrate on the syntactic framework proposed by Noam Chomsky in the Minimalist Program, without ruling out that other frameworks (for example, that of Richard Kayne 1994) may be relevant. For now, it seems to me that Chomsky's framework has direct implications for music. In that sense, the ability to cover cognitive systems other than language may well be a criterion for choosing among various syntactic frameworks.
As a first step in that direction, we will require that the principles of linguistic organization not be specific to language. Under the current minimalist conception (Chomsky 1995b), the core linguistic system consists of two things: a recursive operation, Merge, and some principles of computational efficiency (PCE). We saw in some detail that these abstract principles and the operation that constitute the computational system of human language, CHL, are not linguistically specific.
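In the minimalist literature, Merge is usually stated as binary set formation: Merge(X, Y) = {X, Y}. The Python below is a minimal sketch of that definition using frozensets, offered only to make vivid the point that the operation itself mentions nothing linguistic. The helper `depth`, the note names, and the variable names are hypothetical illustrations, not a proposal about musical syntax, and the sketch ignores labels, features, and the principles of computational efficiency altogether.

```python
def merge(x, y):
    """External Merge as binary set formation: Merge(x, y) = {x, y}.

    frozenset is unordered (so {x, y} == {y, x}) and hashable, so the
    output can itself be merged again.
    """
    return frozenset({x, y})


def depth(obj):
    """Embedding depth: atoms have depth 0; a set is one deeper than its deepest member."""
    if isinstance(obj, frozenset):
        return 1 + max(depth(member) for member in obj)
    return 0


# Merge does not care whether its inputs are words or (hypothetical) note names.
vp = merge("danced", merge("with", merge("a", "girl")))
chords = merge(merge("D", "F#"), merge("G", "B"))

# Re-applying Merge to its own output yields ever deeper objects:
# nothing in the operation sets an upper bound.
obj = "C"
for _ in range(10):
    obj = merge(obj, "E")

print(depth(vp), depth(chords), depth(obj))  # 3 2 10
```

The set encoding is chosen because it builds in hierarchy without linear order, which is how Merge is standardly characterized; any ordering of the output would have to come from elsewhere.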
Could these principles be involved in musical organization as well? If the answer is positive, then CHL is the sole computational system of music and language. This will count as the strongest version of the musilanguage hypothesis (SMH). SMH continues to be (just) a hypothesis; an empirically significant theory ensuing from SMH is nowhere in sight. Detailed empirical and theoretical research is needed to show that the principles of CHL, or some (abstract) version of them, in fact explain properties of musical organization.8
I have been trying to promote MLH, and now SMH, as an attempt to make a variety of insights, mysteries, and individual pieces of evidence converge. Nevertheless, a number of conceptual or foundational issues need to be addressed before SMH is allowed to get off the ground. For the rest of this chapter I will be concerned with what I consider to be conceptual issues of immediate interest. To be sure, I am aware that broad philosophical or musicological objections may be raised against the very idea of a computational theory of music (Scruton 1997), just as they were raised against linguistic theory in the past. As with linguistic theory, such objections can only be addressed by simply pursuing an otherwise plausible theoretical framework, so they cannot be of immediate interest.9
Two general issues seem to require pressing attention before SMH is seriously entertained: (i) Is music a system of sound-meaning correlations at all? (ii) Is music recursive in the sense in which language is?
6.2.1. Music and Meaning
Not everyone is convinced that music is a symbol system in the right sense. One may even doubt whether music is a system of symbols at all; that is, music may be viewed as nothing but a system of sounds. The underlying idea is that a sound is a symbol if it has a meaning, and the relation between sound and meaning is largely arbitrary. For example, Bertrand Russell (1919) held that a symbol "stands for" something else. The traditional way for a linguistic symbol to stand for something is for the sound of the word to be associated with a concept or an object, typically both. Since we cannot associate a musical sound with either a concept or something in the world (Fitch 2006; Boghossian 2007), musical sounds are not symbols.
Notice that the objection makes it virtually impossible to inquire whether a certain noise or mark on paper is a symbol unless it is very much like a linguistic mark; that is, we cannot meaningfully ask whether there are symbol systems other than language (Raffman 1993, 40-41). More significantly, on the Russellian account, it is difficult to assign any theoretical salience to the notion of a linguistic symbol. From what we can judge now, word-concept and word-object relations may fail to be theoretically salient; as we saw at length in chapters 3 and 4, we can even doubt whether these relations obtain for natural languages.
Suppose we do doubt this; that still does not prevent us from entertaining some concept of meaning within grammatical theory itself. To recall, when a set of lexical items enters the grammatical system, computation begins.
If the computation does not crash, a relationship is established between the phonetic form (PF) and the logical form (LF) such that the pair ⟨PF, LF⟩ captures the traditional idea of language as a system of sound-meaning correlations (Chomsky 1995b). In that sense, an LF is an organization of symbols, but the semantic content of LF does not include either denotational or conceptual information. At LF, all phonetic features have been stripped away and only semantic and formal features remain, but the semantic features are not interpreted at LF, since LF is the output of grammar;10 only formal features, including features such as person, number, and gender, play a computational role. So the LF-structure should be viewed as consisting only of these purely structural items. This must be the case whatever the output of narrow syntax turns out to be: LF, SEM, or just an interpretable phase. As proposed above, LF-information is best viewed as captured in the output of FLI systems rather than in the output of the computational system. Even there, we do not know, say, what the gender feature "female" means; we just know that the feature has to agree.
In effect, grammatical theory provides compelling evidence for postulating some (as yet unclear) notion of internal significance of a sequence of symbols (McGilvray 2005). I do not see why this restricted notion of meaning or significance cannot apply to music. From this perspective, consider again the LF-representation of the sentence every boy danced with a girl. As we saw, the sentence is ambiguous in two ways, and the ambiguity can be represented at LF as follows:
(121) (a) Representation: [IP [every boy]_i [e_i danced with a girl]]
Interpretation: For every boy x, x danced with a girl.
(b) Representation: [IP [a girl]_j [IP [every boy]_i [e_i danced with e_j]]]
Interpretation: A girl y is such that for every boy x, x danced with y.
Since representations (121a) and (121b) carry lexical information about quantifiers, nouns, verbs, and so forth, they are linguistic expressions par excellence; in particular, the scope distinctions are forced by linguistically specific properties of the expression, as we saw. I am not suggesting that musical representations look like these.11 Nevertheless, I wish to draw attention to some general features of this example which, in my opinion, are available beyond language. First, (121a) and (121b) are structurally distinct in that the relative positions of the symbolic objects in them differ. Second, these structural differences are directly related to how a representation is to be interpreted. Third, the interpretations make no reference to what the world is like, to the beliefs of the people interpreting them, to the vagaries of the associated culture, and the like. In fact, in order to differ, the interpretations do not require that there be an "external" world at all.
Turning to music, Diana Raffman observes that musicians are typically concerned with structural issues, such as whether a given phrase ends at a certain E-natural because that note prepares a modulation to the dominant (Raffman 1993, 59). Three possibilities arise: the phrase ends before the E-natural, the phrase ends at the E-natural, or the phrase extends beyond the E-natural. As anyone familiar with music knows, these structural variations make substantial differences to the interpretation of music.
Depending on the group of notes at issue and the location of the group in a passage, some of the structural decisions may even lead to bad music.
This is because these decisions often make a difference to how a given sequence of notes is to be resolved. Any moderately experienced listener can tell the differences phenomenologically, though explaining them explicitly requires technical knowledge of music (such as the notion of modulation to the dominant).
Lerdahl and Jackendoff's work (1983), and especially Jackendoff and Lerdahl 2006, shows how different groupings impose different hierarchies on musical surfaces, such that each hierarchical organization gets linked to a specific interpretation of the surface. For example, if phrase boundaries are marked with pauses or interludes, grouping the same sequence of notes, with pitch and meter fixed but the location of the pauses varying, creates very different musical surfaces (Jackendoff and Lerdahl 2006, 2.1). The existence of delineable grouping structures explains why composers and performers spend much time "marking" a score to show exactly how they wish a sequence of notes to be grouped. The practice is explicit in musical traditions that use scores, but it can be observed in any tradition, for example, by attending its training sessions.
Training involves attention to the pitch of individual notes and to how notes are to be organized.
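Raffman's three options for the E-natural, and Jackendoff and Lerdahl's point about shifting pauses, can both be modeled in a toy way by holding the note sequence fixed and varying only where phrases end. The Python below is a minimal sketch under that assumption: the melody, the function `group`, and the boundary encoding are hypothetical, and real GTTM grouping structure is hierarchical and governed by preference rules, which this flat segmentation ignores.

```python
def group(notes, boundaries):
    """Split a fixed note sequence into phrases.

    Each index b in `boundaries` means a phrase ends just before notes[b].
    """
    cuts = [0, *sorted(boundaries), len(notes)]
    return [notes[start:end] for start, end in zip(cuts, cuts[1:])]


# Hypothetical melody; pitches and order never change below.
melody = ["A", "B", "C", "D", "E", "F", "G", "A"]
e = melody.index("E")  # position of the E-natural in question

readings = {
    "ends before the E": group(melody, [e]),
    "ends at the E": group(melody, [e + 1]),
    "extends beyond the E": group(melody, [e + 2]),
}

for label, phrases in readings.items():
    print(label, phrases)
```

The same eight notes appear in every reading; only the grouping differs. That is all the sketch is meant to show: the difference lies in structure, not in the notes themselves.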
When the music becomes complex and begins to tax memory and attention, various devices are used to highlight the salient properties of tonal organization. These include emphasis, typically by suitable ornamentation; organization of the music into delineable cycles such as the rondo; display of the unity of larger sections by cadences; exploitation of the cyclic features of the accompanying beat; and so on. The description is obviously very incomplete, but it is pretty clear that, in some sense, there is nothing else to music. Interpretations in music are sensitive solely to the formal properties of representations. That does not make musical sounds any less symbolic.
6.2.2. Themes from Wittgenstein
We just saw that grammatical theory postulates the notion of "internal significance" of a sequence of symbols; the postulation perhaps extends to musical symbolism. What is internal significance? The topic is large and currently rather obscure, though, as noted, it has to be faced eventually. For now, I will only make some brief remarks to indicate the issues involved. In my opinion, a fruitful way of doing so is to examine some puzzling remarks of Ludwig Wittgenstein, since Wittgenstein's observations on language led him directly to draw parallels with the significance of musical expressions. Even there, I will avoid the dangerous ground of Wittgenstein exegesis and attend only to some of his assorted remarks.
It is interesting that Wittgenstein's concern with a joint study of language and music spanned his entire philosophical career, despite radical changes in his philosophical position. In his earliest work, Tractatus Logico-Philosophicus (1922), for instance, he held a "picture theory" of meaning in which the internal organization of linguistic symbols was viewed as establishing relations between language and the world: "A proposition is a picture of reality: for if I understand a proposition, I know the situation that it represents" (1922, 4.021). Strangely, he extended the idea to music: "A gramophone record, the musical idea, the written notes, and the