178.
Chapter 5.
The Shortest Derivation Condition (SDC) says that, of two converging derivations, the one with fewer steps is preferred. It is thus clearly a least-effort principle. Indeed, Kitahara (1997) suggested that Procrastinate can be derived from the least-effort SDC; it follows, in any case, that Procrastinate is not independently needed. Although the least-effort spirit of SDC is well taken, the condition looks problematic as formulated, since it requires that two or more converging derivations be compared for their length.
Comparing derivations "globally" after they are over is a hugely costly affair, hardly becoming of a least-effort principle. The natural solution is to block multiple derivations altogether by keeping derivations short, so that alternative derivations do not get the chance to branch out, as it were (Boeckx 2006). This is achieved by reducing syntactic domains and the operations on them: cyclicity. In that sense, Merge-driven single-cyclic derivations by phase implement the spirit of SDC without invoking it; as a principle, SDC is not required.
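The cost argument can be made concrete with a toy count. The sketch rests on simplifying assumptions that are not in the text. A "derivation" is identified with one distinct binary bracketing of the input items. Phases have a fixed, hypothetical size (`phase_size`). A completed phase enters the next phase as a single unit. Under these assumptions, a global comparison faces a search space that multiplies with the number of items. A cyclic derivation only ever chooses among the few alternatives inside the current phase, so the counts add up instead.

```python
from math import comb


def catalan(n: int) -> int:
    """Number of distinct binary bracketings of n + 1 leaves."""
    return comb(2 * n, n) // (n + 1)


def global_candidates(num_items: int) -> int:
    # Toy assumption: every binary structure over all items is one completed
    # derivation that a global SDC would have to compare with the others.
    return catalan(num_items - 1)


def cyclic_candidates(num_items: int, phase_size: int) -> int:
    # Toy assumption: alternatives are settled inside each phase before the
    # next phase starts; a finished phase counts as one unit afterwards.
    total = 0
    built = 0
    while built < num_items:
        chunk = min(phase_size, num_items - built)
        units = chunk + (1 if built > 0 else 0)
        total += catalan(units - 1)
        built += chunk
    return total


print(global_candidates(12))     # 58786
print(cyclic_candidates(12, 3))  # 17 (2 + 5 + 5 + 5)
```

The numbers carry no empirical weight. They show only why "keeping derivations short" removes the need to compare whole derivations.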
This leaves the principles FI and MLC. As noted, Full Interpretation requires that no illegible objects appear at the interfaces (for more, see Lasnik and Uriagereka 2005, 105-106). The Minimal Link Condition requires "shortest move", in that an element cannot move to a target if another element of the same category occurs between the first element and the target. MLC thus imposes a condition of relativized minimality: "relativized" because minimal links are defined with respect to specific categories such as WP. Both FI and MLC are clearly least-effort conditions. As with last resort and SDC, the question is whether they are to be stated as specific principles. As Chomsky (1995b, 267-268) observes, if we state MLC as a principle, then it can only be implemented by inspecting whether another derivation has shorter links. Similar observations apply to FI if it is formulated as an output condition for choosing the most economical representation. In each case, the problem of globality arises.
As with last resort, the natural solution is to think of these conditions as enforced directly in the system. Thus, the reflex of MLC obtains by simply barring anything but shortest move (Reinhart 2006, 22). The reflex of FI obtains by rejecting structures with uninterpretable elements within CHL, as noted. Empirically, we know that these conditions on movement and representations are obeyed in a vast range of cases. To that extent, these conditions are empirical generalizations. If, however, evidence is found that, say, the minimal link condition on movement is apparently violated, we try not to withdraw the least-effort condition. Instead, we explain the anomaly by drawing on other factors (Boeckx 2006, 104). Beyond empirical generalization, then, the least-effort conditions act as general constraints on the system.
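What it means for these conditions to be "enforced directly" rather than checked by comparing derivations can be sketched as follows. This is a toy model under assumptions of our own. Candidate elements are listed from closest to farthest from the target. The category labels ("D", "wh") and the names `Item`, `attract`, and `satisfies_fi` are hypothetical. Because the search stops at the first element of the right category, a longer link is never generated, so there is nothing to compare it against.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Item:
    """A toy syntactic element: a label, a category, and unchecked features."""
    label: str
    category: str
    uninterpretable: set[str] = field(default_factory=set)


def attract(path: list[Item], category: str) -> Item | None:
    """MLC reflex: return the closest item of the requested category.

    `path` is ordered from closest to farthest. Only the first match can be
    returned, so a longer link is simply never built. Items of other
    categories do not block (minimality is relativized to the category).
    """
    for item in path:
        if item.category == category:
            return item
    return None


def satisfies_fi(items: list[Item]) -> bool:
    """FI reflex: reject any structure still carrying uninterpretable features."""
    return all(not item.uninterpretable for item in items)


path = [Item("John", "D"), Item("who", "wh"), Item("what", "wh")]
print(attract(path, "wh").label)                        # who
print(satisfies_fi([Item("John", "D", {"Case"})]))      # False
```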
To sum up, it is reasonable to say that the last resort condition obtains in CHL in a general way. The least-effort condition seems to obtain more specifically in three subconditions that implement the effects of SDC, MLC, and FI. These subconditions are, respectively, restricted cyclicity, the condition on movement, and the condition on representations. To emphasize, neither the last resort condition nor the least-effort condition is stated as a specific economy principle. During computation, we expect the last resort and least-effort conditions to obtain throughout, ensuring that the computational system, CHL, meets the conditions of optimal design in its operations.
5.2. CHL and Linguistic Specificity
Intricate and sometimes elaborate computations take place in CHL as information from the extremely complex lexicon of human languages enters the system. Even so, CHL itself consists of just Merge operating under last-resort and least-effort conditions, and apparently nothing else.15 To put it differently, linguistic information is essentially stored in the lexicon. Working on it, CHL generates symbolic objects at the interfaces, which are then interpreted by the relevant external systems. As we saw, some of these systems are likely to be linguistically specific. Is CHL itself, or rather its components, linguistically specific?
I am raising this question because, once we reach the austere design of CHL under the Minimalist Program, it is difficult to dispel the intuition that the system is functioning "blindly", just to sustain efficient productivity. There is a growing sense that, as the description of the human grammatical system gets progressively simpler, the terms of description also get progressively less linguistically specific.
Let us say that a principle/operation P of a system Si is nonspecific if P makes no reference to Si-specific categories. Suppose that the collection of such Ps is sufficient for describing a major component of Si, enough for us to reach some nontrivial view of the entire system. Recall that, with respect to the language system, we earlier called such a principle a "purely computational principle" (PCP) (section 2.2). It is the "purely computational" nature of the functioning of CHL that gives rise to the intuition of (the relevant notion of) nonspecificity.
Intuitively, to say that P is purely computational is to say that the character of P, and hence its formulation, is such that its application need not be tied to any specific system Si. In that sense, P could be involved in a system Sj that is (interestingly) different from the system Si in which P was originally found. It could turn out, of course, that only Si has P, since only Si requires P, even if P's formulation is nonspecific. That is, it could be that there is no need for P anywhere else; in that case, however, the nonspecific character of P remains unexplained. So the idea really is that, if a computational system other than the language system required P, then P must be nonspecific. It does not follow from this statement alone that there are other computational systems requiring P. That is an empirical issue, but it interestingly opens up only when the collection of Ps in Si begins to look non-Si-specific.
Until very recently, linguists, including Chomsky, held a very different view of the language system. The GLOW manifesto, which represents the guiding spirit and motivation of current linguistic work, states explicitly that "it appears quite likely that the system of mechanisms and principles put to work in the acquisition of the knowledge of language will turn out to be a highly specific 'language faculty'" (Koster, Riemsdijk, and Vergnaud 1978, 342). In general, Chomsky consistently held that, even if the "approaches" pursued in linguistic theory may be extended to study other cognitive systems, the principles postulated by the theory are likely to be specific to language: "There is good reason to suppose that the functioning of the language faculty is guided by special principles specific to this domain" (Chomsky 1980, 44; also Chomsky 1986, xxvi). Notice that this view was expressed during the G-B period, which promoted a strongly modular view of the organization of language (Boeckx 2006, 62-66), as we saw. The point of interest here is that the idea of linguistic specificity was advanced for the principles and operations that constitute the computational system of human languages.
Nonetheless, I am asking whether the elements of FL are dedicated to language alone, or whether there is some motivation for thinking that significant parts of FL might apply beyond language. I suggest that the most reasonable way to pursue this motivation, if at all, is to focus on the combinatorial part of the system and ask whether some of its central principles and operations could be used for other cognitive functions. The term "CHL" is therefore to be understood as a rigid designator that picks out a certain class of computational principles and operations, notwithstanding the built-in qualification regarding human language. So far, however, I am thinking of CHL as restricted to language and some other human cognitive systems. These are especially the systems that may be viewed as "language-like", that is, ones likely to require P to a first approximation. In this formulation of the issue, the human specificity of these systems is not denied, although the domain specificity of some of their central organizing principles is questioned.
The formulation arises out of the intuition that, besides language, there are many other cognitive domains where combinatorial principles seem to play a central role: arithmetic, geometry, music, logical thinking, and the interpretation of syntactic trees, maps, and other graphic representations (Casati and Varzi 1999; Roy 2007), to name a few. If the elements of FL are used elsewhere at all, they are quite likely to reappear in some of these domains; that is the step of generalization I have in mind. Relatedly, the proposal might enable us to make better sense of the architecture of the language faculty sketched earlier, in which domain-specific FLI systems are viewed as separate from the core computational system itself. If language is a distinct cognitive domain, we expect some linguistically specific effects to cluster somewhere, while the computational system itself effects bare productivity in other domains as well.
For that to happen, the computational system itself needs to be linguistically nonspecific.
To my knowledge, there has been little discussion of whether, and to what extent, the principles actually discovered in the study of language can be extended to other cognitive domains. Clearly, the issue under discussion here arises only for those cognitive domains for which a fairly abstract body of principles is already in hand. In other words, if the principles postulated for a cognitive domain are too directly tied to the phenomena they cover, then their very form will resist generalization across phenomenal domains. For example, questions of generalization could not have been interestingly asked of the system of rules discussed in the Aspects model of language (Chomsky 1965). For the cognitive domains under consideration here, it is generally acknowledged that a sufficiently abstract formulation has been reached, if at all, for only a few domains, including language, and only very recently. Thus, given the lack of sufficient progress in the study of other "languagelike" cognitive domains, the question that concerns us here has not been routinely asked.
Postponing Chomsky's current and very different views on this issue to chapter 7, I will propose that a significant component of the language system, under suitable abstractions, consists wholly of purely computational principles. The proposal requires a study of the organization of grammar, principle by principle, to see whether it is valid. Since we have just traced the development of grammatical theory, this seems the right place to pursue the proposal, while grammatical theory is still fresh in our minds, by way of a quick review of what we have seen so far. I will discuss the significance of the issue, including its empirical motivation, in the chapters that follow.
To recapitulate (section 2.2), we may think of four kinds of rules and principles that a linguistic theory may postulate: language-specific rules (LSR), construction-specific rules (CSR), general linguistic principles (GLP), and purely computational principles (PCP). It is obvious that, for the issue of nonspecificity, only PCPs count. From that point of view, the four kinds of rules basically form two groups: linguistically specific (LSR, CSR, GLP) and linguistically nonspecific (PCP). If PCP is empty, then the language system is entirely specific. If PCP is nonempty but "poor", then the issue of nonspecificity is uninteresting beyond language. Thus, the real question is: is PCP rich? In other words, how much of the working of CHL can be explained with PCPs alone?
As we saw, the principles-and-parameters framework (P&P) postulates that rules of the first two kinds, that is, LSR and CSR, may be totally absent from linguistic theory. In these terms, a linguistic theory under the P&P framework postulates just two kinds of principles, GLP and PCP.
However, the framework alone is clearly not enough for our purposes, since it allows both GLP and PCP. Therefore, unless a more abstract scheme is found within the P&P framework in which PCPs at least predominate, no interesting notion of nonspecificity can emerge. The issue is obviously one of degree: the more PCPs (and the fewer GLPs) there are in CHL, the more nonspecific it is. The task, then, is to examine the short internal history of the P&P framework itself to see whether a move towards progressively PCP-dominated conceptions of CHL can be discerned. As noted, CHL has two components: some principles and one operation. I discuss these in turn.
5.2.1. Principles
Recall the organization of grammar in G-B theory, schematically represented in figure 2.1. The following principles are postulated in that grammar: the Projection principle, X-bar, the θ-criterion, the Case filter, the principles of Binding, the empty category principle, subjacency, the chain condition, and Full Interpretation, among others. Let us now see how the principles postulated by G-B theory fall under the suggested categories of GLP and PCP. The classification is going to be slightly arbitrary; we will see that this does not affect the general argument.
The projection principle stipulates that lexical information is represented at all syntactic levels, to guarantee that input information is not lost to the system. Any computational system requires that none of the representations that encode information are lost to the system until a complete interpretation is reached. However, the formulation of the projection principle mentions the linguistic notion of lexical information. This suggests an intermediate category of principles; call it "quasi-PCPs" (Q-PCP): linguistically specific in formulation, but PCP in intent.
X-bar is a universal template, with parametric options, that imposes a certain hierarchy among syntactic categories. Again, it stands to reason that any computational system will require some notion of hierarchy if a sequence of its elements is to meet conditions of interpretation. Still, it is not obvious that every symbol system must have the rather specific hierarchy of specifiers, heads, and complements captured in X-bar theory. In that sense, the principle falls somewhere between GLP and Q-PCP. Given the uncertainty, let us assume the worst case, that X-bar theory is GLP.
θ-theory seems linguistically specific in that it is exclusively designed to work on the S-selectional properties of predicates. The θ-criterion ("each argument must have a θ-role"), the main burden of this theory, is phrased in terms of these properties. But what does the criterion really do, computationally speaking? As we saw (section 2.3), two kinds of information are needed to precisely determine the relations between the arguments projected at d-structure: an enumeration of arguments, and the order of arguments (Chomsky, Huybregts, and Riemsdijk 1982, 85-86). Thinking of thematic roles as lexical properties of predicates, the θ-criterion checks whether elements in argument position do have this lexical property. To the extent that the θ-criterion accomplishes this task, it is a PCP. Yet, as noted, it is phrased in GLP terms. In my opinion, it ought to be viewed as a Q-PCP.
The Case filter ("each lexical NP must have Case"), the main burden of Case theory, is linguistically specific in exactly the same way: it cannot be phrased independently of linguistically specific properties. Yet, as with the θ-criterion, the Case filter serves a purely computational purpose, namely to check the ordering part of the set of arguments; as we saw, the system does not care which lexical NP has which Case as long as it has a Case. In that sense, it is a Q-PCP as well.
Binding theory explicitly invokes such linguistically specific categories as anaphors, pronominals, and r-expressions to encode a variety of dependency relations between NPs. It is implausible to think of, say, musical quantifiers, anaphors, and pronominals, just as it makes no sense to look for Subject-Object asymmetries in music. Notice that the problem is not that other symbol systems may lack dependency relations in general; they cannot. The issue is whether they have relations of this sort. Similar remarks apply to the Empty Category Principle (ECP). These are, then, GLPs.
This brings us to the principle of Full Interpretation (FI) and Bounding theory. Bounding theory contains the Subjacency principle, which stipulates the legitimate "distance" for each application of Move-α. These distances are defined in terms of bounding nodes, which in turn are labeled with the names of syntactic categories such as NP or S. Abstracting over the particular notion of bounding nodes, Subjacency is an economy principle that disallows anything but the "shortest move"; as such, it is not linguistically specific in intent, and it is a Q-PCP. Finally, the principle of Full Interpretation does not mention linguistic categories at all in stipulating that every element occurring at the levels of interpretation must be interpretable at that level; in other words, all uninterpretable items must be deleted. FI, then, is a PCP.
Notice that most of the principles cluster at the inner levels of representation: d-structure and s-structure. Moreover, the principles discussed are a mixed bag of GLPs, Q-PCPs, and a PCP, predominantly Q-PCPs in my opinion. In this scenario, although PCP is nonempty, it is poor; hence, the system is not really nonspecific. But the predominance of Q-PCPs, and the relatively meager set of GLPs, suggest that there are large PCP-factors in the system that are concealed under their linguistic guise. If these factors are extracted and explicitly represented in the scheme, G-B theory can turn into one that is more suitable for nonspecificity. I will argue that the scheme currently under investigation in the Minimalist Program may be profitably viewed in that light.
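The classification just proposed can be tallied as simple bookkeeping. This is a sketch only. The category labels come from the discussion above, while the dictionary keys and the name `GB_PRINCIPLES` are illustrative. X-bar is entered as GLP on the "worst case" assumption. The chain condition is left out because it was not assigned a category.

```python
from __future__ import annotations

from collections import Counter

# Classification of G-B principles as argued in the text (illustrative keys).
GB_PRINCIPLES = {
    "Projection principle": "Q-PCP",
    "X-bar": "GLP",  # worst-case assumption
    "theta-criterion": "Q-PCP",
    "Case filter": "Q-PCP",
    "Binding": "GLP",
    "ECP": "GLP",
    "Subjacency": "Q-PCP",
    "Full Interpretation": "PCP",
}


def tally(principles: dict[str, str]) -> Counter:
    """Count how many principles fall under each category."""
    return Counter(principles.values())


counts = tally(GB_PRINCIPLES)
print(counts.most_common())  # [('Q-PCP', 4), ('GLP', 3), ('PCP', 1)]
```

The tally reflects the verdict above: PCP is nonempty but poor (one member), and Q-PCPs predominate.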
As noted, the Minimalist Program is more directly motivated by the assumption that FL has optimal design. Two basic concepts, legibility conditions and conceptual necessity, are introduced to capture this assumption. On the one hand, we saw that the intermediate levels of d- and s-structure are eliminable from the system on grounds of conceptual necessity. On the other, we saw that most of the complex array of principles of G-B theory was clustered on these inner levels. With the elimination of these levels, MP enforced a drastic reordering of principles.
Recall that we viewed X-bar theory, Binding theory, and ECP as GLPs. CHL, the computational system in MP, (arguably) does not contain any of them. Further, the projection principle, Subjacency, θ-theory, and Case theory were viewed as Q-PCPs. While the projection principle as such is no longer required in the system, Case theory is basically shifted to the lexicon. θ-theory, Binding theory, and ECP are shifted to an external cluster of FLI systems, as we saw.
This leaves Subjacency, a Q-PCP. Q-PCPs are essentially PCPs under linguistic formulation. This raises the possibility, as noted, that just the PCP-factor may be extracted from them and explicitly represented in the system. The MP principle Minimal Link Condition (MLC) serves that purpose with respect to Subjacency. Similar remarks apply to the G-B condition on chains. This condition was first replaced by an economy condition called the Shortest Derivation Condition (SDC), which requires that, when there are competing derivations, the derivation with the fewest steps is chosen (Chomsky 1995b, 130, 182); as we saw, the condition was then implemented directly by restricting the domain of operation. Thus, the only G-B principle that is fully retained for CHL in MP is Full Interpretation (FI), a PCP.
In sum, insofar as the G-B principles are concerned, all linguistically specific factors have either been removed from the system in MP or been replaced by economy conditions. As I attempted to show, all the principles of MP have been factored out of those of G-B; that is, no fundamental principle has been added to the system in MP. The general picture, therefore, is that CHL in MP is predominantly constituted of PCPs.
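The fate of each G-B principle in MP, as traced above, can be summarized as a lookup table. This is an illustrative sketch. The wording of each "fate" string paraphrases the text, and the names `GB_TO_MP`, `retained`, and `replaced` are hypothetical. The table makes the point checkable: exactly one principle survives unchanged, and the only "replacements" are economy conditions factored out of G-B principles.

```python
# How each G-B principle fares in MP, paraphrasing the discussion above.
GB_TO_MP = {
    "X-bar theory": "not contained in CHL",
    "Binding theory": "shifted to FLI systems",
    "ECP": "shifted to FLI systems",
    "Projection principle": "no longer required",
    "Case theory": "shifted to the lexicon",
    "theta-theory": "shifted to FLI systems",
    "Subjacency": "replaced by MLC",
    "Chain condition": "replaced by SDC, then by cyclicity",
    "Full Interpretation": "retained",
}

retained = [name for name, fate in GB_TO_MP.items() if fate == "retained"]
replaced = sorted(
    name for name, fate in GB_TO_MP.items() if fate.startswith("replaced")
)

print(retained)  # ['Full Interpretation']
print(replaced)  # ['Chain condition', 'Subjacency']
```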
The preceding discussion of MP is not exhaustive. Let us also grant that the rendition of some of the individual principles and operations, regarding the presence or absence of linguistically specific factors in them, could be contentious. Yet, plainly, when compared to the G-B framework, the overall picture is one of greater generality and abstraction away from linguistic specificity. Recall that the only issue currently under discussion is whether we can discern a progressively PCP-dominated conception of CHL.
5.2.2. Displacement
A variety of objections may be raised against the picture. A general objection is that, granting that successive phases of linguistic theory do show a movement from GLPs to PCPs, PCPs are to be understood in the context of linguistic explanation (only).
The objection is trivially true if its aim is to draw attention to a certain practice. There is no doubt that these PCPs were discovered while linguists were looking only at human languages. We need not have entered the current exercise had someone also discovered them in the course of investigating music or arithmetic. But the future of a theoretical framework need not be permanently tied to its initial object of investigation. As Chomsky observed in the past, a sufficiently abstract study of a single language, say, Hidatsa, can throw light on the entire class of human languages, and hence on FL. This observation cannot be made if it is held that the non-Hidatsa-specific principles that enter into an explanation of Hidatsa cannot be extended to Hindi because Hindi was not on the original agenda.
No doubt, the laws and principles postulated by a theory need to be understood in their theoretical context. For example, the notions of action and reaction as they occur in Newton's force-pair law ("every action has an equal and opposite reaction") have application only in the context of physical forces, even if the law does not mention any specific system. We cannot extend its application to, say, psychological or social settings such as two persons shaking hands. Global limits on theoretical contexts, however, do not prevent theoretical frameworks from evolving and enlarging within those limits. The force-pair law does not apply to social situations, but it does apply to a very large range of phenomena, in some cases perhaps beyond Newton's original concerns. For instance, the law has immediate application to static phenomena like friction, but it also applies to dynamical phenomena such as jet propulsion. So the question of whether the principles of CHL apply to other cognitive systems is more like asking whether the force-pair law applies to jet propulsion than whether it applies to people shaking hands. The burden is surely on the linguist now to tell us exactly what the boundaries of the linguistic enterprise are.
A specific objection to the picture arises as follows. As we saw in some detail earlier, human languages sometimes require that an element be interpreted in a position different from where it is sounded. John and the book receive identical interpretations in markedly different structures such as John read the book and the book was read by John. It is the task of a transformational generative grammar to show the exact mechanism by which the element the book moves from its original semantic position to the front of another structure without altering semantic interpretation. A basic operation, variously called Move-α or Affect-α in G-B, and Move, Attract, or Internal Merge in MP, implements displacement. We saw all this.
Now the objection is that nothing is more linguistically specific than the phenomenon just described. A major part of CHL is geared to facilitating instances of movement in an optimal fashion. Thus, even if the requirement of optimality leads to PCPs, the reason why they are there is essentially linguistic. In that sense, the phenomenon of displacement could be viewed as blocking any clear conception of the nonspecificity of the computational system. To contest this, I will outline a number of directions suggesting that the issue of displacement (hopefully) breaks down into intelligible options that are compatible with the general picture of nonspecificity.
First, suppose displacement is specific to human languages. In that case, the general picture will not be disturbed if the phenomenon is linked to other linguistically specific aspects of the system. From one direction, that seems to be the case. We saw that the lexicon, which is a collection of features, is certainly linguistically specific in the sense under discussion here. One of the central ideas in MP is that the lexicon contains uninterpretable features such as Case. Since the presence of these features at the interfaces violates FI, CHL wipes them out during computation. The operation that executes this complex function is Move. Move is activated once uninterpretable features enter CHL; displacement is entirely forced by elements that are linguistically specific.16 There are several ways of conceptualizing this point within the general picture.
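The claim that Move is forced entirely by linguistically specific features can be pictured in a toy derivation. The sketch is ours, not a model from the literature. Each Move step checks and deletes one uninterpretable feature, and feature-checking is collapsed into a single deletion. The labels "Case", "C4", and "E4", as well as the names `LexItem` and `run_moves`, are hypothetical. Input that carries no uninterpretable features, such as note-like tokens, never triggers Move at all.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LexItem:
    label: str
    # Uninterpretable features (e.g. Case) that must not reach the interfaces.
    uninterpretable: set[str] = field(default_factory=set)


def run_moves(items: list[LexItem]) -> list[tuple[str, str]]:
    """Return the Move steps a toy derivation needs.

    Move fires only for an item carrying an uninterpretable feature, and each
    application checks and deletes one such feature, so FI holds at the end.
    """
    steps = []
    for item in items:
        for feature in sorted(item.uninterpretable):
            steps.append((item.label, feature))
        item.uninterpretable.clear()  # deleted once checked
    return steps


print(run_moves([LexItem("John", {"Case"}), LexItem("read")]))  # [('John', 'Case')]
print(run_moves([LexItem("C4"), LexItem("E4")]))                # []
```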
If Move is an elementary operation in the system, then we may think of this part of the system as remaining inert until linguistically specific information enters the system. The rest of CHL will still be needed for computing non-linguistic information, as Merge is activated to form complex syntactic objects. In effect, only a part of the system will be put to general use, and the rest will be reserved for language. Chomsky (1988, 169) says exactly the same thing for arithmetic: "We might think of the human number faculty as essentially an 'abstraction' from human language, preserving the mechanism of discrete infinity and eliminating the other special features of language." Alternatively, Move may be viewed not as an elementary operation but as a special case of existing operations. There are suggestions in which Move is viewed as specialized Merge (Kitahara 1997; Chomsky 2001a). As we saw in some detail, an even simpler view is that Move is simply internal Merge. So if you have (external) Merge, you have (internal) Merge for free.
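The "for free" point can be sketched with Merge as bare binary set formation, a common way of rendering it, here reduced to a toy. The helper names `merge`, `terms`, and `internal_merge` are hypothetical. Internal Merge calls exactly the same `merge` function. The only difference is that its second argument is a term already inside the object rather than something taken from the lexicon. Nothing in `merge` refers to linguistic categories; it accepts any hashable tokens.

```python
from __future__ import annotations

from collections.abc import Hashable


def merge(a: Hashable, b: Hashable) -> frozenset:
    """Merge(a, b) = {a, b}: unordered binary set formation."""
    return frozenset({a, b})


def terms(obj: Hashable) -> set:
    """obj itself plus every constituent contained in it."""
    found = {obj}
    if isinstance(obj, frozenset):
        for part in obj:
            found |= terms(part)
    return found


def internal_merge(obj: frozenset, term: Hashable) -> frozenset:
    """Re-merge a term of obj with obj: the very same operation as external Merge."""
    if term not in terms(obj):
        raise ValueError("internal Merge requires a term of the object")
    return merge(term, obj)


dp = merge("the", "book")
tp = merge("was", merge("read", dp))   # external Merge only
moved = internal_merge(tp, dp)         # {dp, tp}: dp now occurs twice
print(dp in moved, dp in terms(tp))    # True True
```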
Second, we may ask whether displacement is in fact linguistically specific. We saw a CHL-internal reason for displacement, triggered by uninterpretable features. However, there is another reason for displacement.
As noted, external systems impose certain conditions on the form of expressions at the interfaces. For example, (efficient) semantic interpretation often requires that items be placed at the edge of a clause to effect a variety of phenomena such as topicalization, definiteness, and the like. The elimination of uninterpretable features takes an element exactly to where it receives, say, a definiteness or quantifier interpretation.
Suppose that linguistic notions such as topicalization, definiteness, and so on ("edge" phenomena) are viewed as special cases of more general notions such as focus, highlight, continuity, and the like. Could it then be that the external systems that enforce these conditions are not themselves linguistically specific, at least in part? If so, then these parts could be viewed as enforcing conditions on structures that are met in different ways by different cognitive systems, in terms of the internal resources available there. For example, language meets these conditions by drawing on the uninterpretable features specifically available in the human lexicon; as we will see, music could be enforcing similar deleting operations with the "unstable" feature of notes that occur in a musical progression. This would make the implementation of displacement specific to the cognitive system in action, but the phenomenon of displacement itself need not be viewed as specific to any of them. In any case, the issue seems to be essentially empirical in character; we just need to know more.
6 Language and Music
It seems that whenever there is an urge to talk about languagelike systems, people typically mention music, arithmetic, and logic, among others. After characterizing "hominization" as a process that includes the acquisition of language, music, mathematics, and logic, Derek Bickerton (2000, 161-162) thinks that it would be bizarre to suppose that "each of these capacities had a separate and independent birth."1 According to Bickerton, the supposition would be bizarre because these "traits" are essentially unique to humans, yet it is "entirely beyond belief" that the variety of "unconnected capacities of this magnitude" could have emerged in the short period of time that has elapsed since the hominid line split from the rest of the primates. It is more likely, then, that these capacities have a common origin. Call the suggested list of capacities the "hominid set."
Since much of Bickerton's paper is concerned with the computational or syntactic aspects of language, I will assume that the preceding concerns about common origin also have the same thrust: the computational principles that underlie human linguistic competence, or some (abstract) version of them, could be implicated in the domains of music, mathematics, and logic as well.
The list of capacities just mentioned is intuitive and obviously incomplete; if the hominid set is to denote a natural category, what falls under it cannot be stipulated. Extensive theoretical and empirical inquiry is needed to determine which cognitive systems in fact satisfy the early a prioristic formulation.
6.1. Musilanguage Hypothesis
I will focus on the cognitive system of music to see whether something like a "musilanguage hypothesis" (MLH), that music and language share underlying syntactic properties, makes sense. The choice of music, as against, say, arithmetic, is interesting since arithmetic is generally taken to be "derived" from language anyway. Informally, people often speak of music as a language too, sometimes even as a universal language of mankind.2 Such superficial impressions aside, language and music seem to be very different cognitive systems marked by domain-specific properties; for example, language stores lexical information, while music stores tonal information. We will note many other differences as we proceed (section 7.2).
In that sense, the proposed generalization that cognitive systems other than language might share syntactic properties with language will be significant if it extends to the music case.