Scuola Superiore di Lingue Moderne per Interpreti e Traduttori
University of Bologna
Paper presented at 6th Jornada de Corpus, UPF, Barcelona, May 1998
As Widdowson (1989) has pointed out, the relationship between linguistics and applied linguistics is not simply a matter of applying linguistic theory to language teaching. Applied linguistics has its own concerns and its own criteria of relevance, in the light of which the methods and findings of linguistics need to be interpreted. Corpus linguistics, which over the last thirty years has come to have a significant impact on linguistic thinking, poses precisely these problems of interpretation. Leech (1992: 106) defines a corpus as "a helluva lot of text, stored on a computer". From an applied linguistic perspective, the central question is whether there is a hell of a lot the teacher and learner can do with one.
If we look back at the uses which have so far been made of computer corpora in language teaching, we can distinguish two main lines of approach. The first, which we might term a behind-the-scenes approach, has seen corpora used by publishers and researchers in developing syllabuses, materials and reference works for language learning - typically by focussing on the most frequent items and uses of those items to be found in corpora. This approach has been particularly influential in the production of reference works: in the wake of the pioneering COBUILD project, all the principal learner dictionaries of English now proclaim themselves to be `corpus-based'. There have also been various initiatives in the design of syllabuses and of classroom materials which have drawn on corpus data as a means of selecting and grading their linguistic content (e.g. Willis 1990, Willis and Willis 1987, Mindt 1997).
The behind-the-scenes approach has generally been characterised by the use of very large corpora and of sophisticated software, whose development has required massive financial investment and considerable linguistic and computational expertise. What is now the Bank of English, developed under the COBUILD project, and the largest corpus of contemporary English, already ran to some 20 million words in the mid-eighties and now exceeds 300 million - a quantity which is probably approaching the lifetime linguistic experience of the average person. The size and complexity of such resources, along with the need to protect commercial investment, have meant that large corpora have only been accessible to a limited group of researchers, with the relationship between the corpus and end-users - classroom teachers and learners - being mediated and controlled by experts. The end-user has only had access to the products of corpus analysis, and not to the processes which give rise to them. Thus publishers and researchers have stated that products based on corpus analyses are descriptively superior, but the end-user has had no possibility of performing these analyses and verifying their superiority directly. Only in certain ESP applications have smaller corpora been used to draft syllabuses for particular domains (e.g. Flowerdew 1993), potentially allowing for replication.
The second approach, which we may term the on-stage approach, has instead attempted to bring corpora and corpus analysis directly into the teaching and learning environment. Its principal exponent, Tim Johns (based, like the COBUILD project, at the University of Birmingham), has coined the term "data-driven learning" to describe a discovery procedure where learners inductively derive and deductively apply generalisations by categorising data from corpora (Johns 1991). This procedure finds a justification in recent work in second language acquisition theory, which highlights the effectiveness of inductive learning from multiple examples (Ellis 1996, Skehan 1998), and it also fits with many of the premises of communicative language teaching, since it promotes a schematic view of linguistic knowledge and of language use (Aston 1995). Data-driven learning lends itself both to work where the teacher provides concordance data for learners to analyse, as in Johns' "classroom concordancing" model (1991, 1994), and to work where learners extract data from the corpus for themselves, be this in the classroom or in self-access contexts (Jordan 1992). It can also give rise to a range of communicative activities by providing "reasoning gaps" (Prabhu 1987) which learners must bridge, as they agree on how to interrogate the corpus, how to identify regularities, and how to interpret findings (Bernardini 1997; forthcoming).
While providing learning opportunities of a theoretically valid nature, on-stage corpus use has tended, given the limited financial and technical resources of the average educational institution, to be based on relatively small corpora (of a few hundred thousand words at most: Flowerdew 1996) which have lacked the careful design of the large research corpora which dominate behind-the-scenes uses. This means that generalisations made from them are likely to be of limited value. For instance, one of the few published small corpora, MicroConcord Corpus A (Murison-Bowie 1993), consists of newspaper articles drawn from one year's issues of The Independent. While it may tell us something about that year of that newspaper, it will not allow reliable conclusions to be drawn about newspaper language in general, and obviously not about uses in other registers. The limited size and opportunistic construction of such corpora make them an inherently weaker basis for generalisation than large research corpora, particularly as far as features which are relatively uncommon and/or unevenly dispersed are concerned.
The divide between the behind-the-scenes and on-stage approaches is currently diminishing, however. Thanks to changes in policy, and the growth of computer networking, large research corpora are now becoming more generally accessible, and consequently available for on-stage use. It is now possible to consult two large corpora of English over the Internet, at relatively low cost and using relatively straightforward software, offering greater reliability for on-stage work in the classroom or in self-access, with better documentation of less common features across a wider range of texts and text-types. On the one hand, this allows generalisations derived from small corpora to be tested and broadened, and on the other, as I hope to demonstrate, it allows for a greater variety of learning activities. In this paper I illustrate some on-stage uses of the British National Corpus (BNC), which is now freely available in Europe for non-commercial research purposes - including research by teachers and language learners.
The BNC consists of approximately 100 million words of contemporary British English, taken from over 4100 texts of different types, spoken and written: the spoken component, in the form of transcriptions, runs to 10% of the total (for more details on the composition of the corpus, see Aston and Burnard 1998). The corpus is marked up with information as to the nature, source and structure of each text, and each word is annotated to show its part-of-speech. All this additional information is given in SGML (Standard Generalised Markup Language) tags between angle brackets, thereby distinguishing the markup from the words of the text itself. The complexity of the markup underlies much of the BNC's potential for on-stage use in language pedagogy, and I shall therefore begin by briefly describing it. (NOTE: For formatting reasons, SGML tags in the HTML version of this paper are shown between square rather than between angle brackets, except in the .gif images in figures 1 and 4 below.)
Figure 1 shows some of the principal features of written text documents in the BNC,
each of which corresponds to one written text.
Figure 1
Each document is marked up as a single [bncDoc] element, which contains a [header] element
and a [text] element. The [header] contains information about the text - bibliographic details
concerning its source, and its categorisation along such parameters as domain (topic),
medium (published or unpublished, book or periodical, etc.), type of author, etc. - while
the [text] element contains the text itself, which is divided into [div0] elements representing
major sections, such as the chapters of a book or the articles in a newspaper, in turn
divided into [div1] elements representing sub-sections, in turn divided into [div2] elements,
and so on. Each of these divisions may contain a [head] element (the section heading), and
a series of [p] elements representing paragraphs. These [head] and [p] elements must
contain a series of [s] (sentence) elements, which are in turn made up of words ([w]
elements) and punctuation ([c] elements).
Figure 2 illustrates the low-level structure of part of a written text.
Figure 2
[div2 complete=Y org=SEQ r=bx] [head type=MAIN] [s n=1202]
[w PRP]IN [w AT0]THE [w NN1]BEGINNING[c PUN]&hellip [/head]
[p] [s n=1203] [w AT0]The [w NN1]word [w NN2]jeans
[w VVZ]originates [w PRP]from [w AT0]the [w NN1]place
[w NN1]name [w NN1-NP0]Genoa [w AVQ-CJS]where
[w NN2]sailors [w PRP]from [w AT0]the [w NN1]port
[w VVD-VVN]hit [w PRP]on [w AT0]the [w NN1]idea [w PRF]of
[w VVG]making [w NN2]trousers [w PRP]from [w AT0]the
[w AJ0]sturdy [w NN1]sailcloth[c PUN].
It shows the beginning of a [div2]
element (the values of whose attributes show that it is
complete and sequentially organised). This section starts with
a main heading that contains the 1202nd sentence in this text.
This sentence contains a preposition (IN), a definite
article (THE), a singular common noun
(BEGINNING) and an ellipsis (three dots). The heading
ends at this point (a slash following the opening bracket
marks the end of the element in question). It is followed by a
new paragraph, which begins with the 1203rd sentence, which
begins with the definite article, and so on. All this
information need not of course be displayed, and for many
purposes it will be more convenient to view it on the screen
as in Figure 3:
IN THE BEGINNING... The word jeans originates from the place name Genoa where sailors from the port hit on the idea of making trousers from the sturdy sailcloth. Denim is the name of the blue woven cloth first made in the French town of Nimes. So now you know!
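The step from the marked-up form of Figure 2 to the plain display of Figure 3 can be sketched in a few lines of Python. This is only an illustration, assuming the square-bracket notation used in this version of the paper; the function name strip_markup and the treatment of the &hellip entity as three dots are assumptions made for the example, and this is not how the BNC access software itself works.

```python
import re

# A fragment in the square-bracket notation of Figure 2.
SAMPLE = (
    "[div2 complete=Y org=SEQ r=bx] [head type=MAIN] [s n=1202] "
    "[w PRP]IN [w AT0]THE [w NN1]BEGINNING[c PUN]&hellip [/head] "
    "[p] [s n=1203] [w AT0]The [w NN1]word [w NN2]jeans "
    "[w VVZ]originates [w PRP]from [w AT0]the [w NN1]place"
)


def strip_markup(marked_up: str) -> str:
    """Return only the words and punctuation of a marked-up fragment."""
    # Remove every tag, i.e. anything between square brackets.
    text = re.sub(r"\[[^\]]*\]", "", marked_up)
    # Assumption: the ellipsis entity is displayed as three dots.
    text = re.sub(r"&hellip;?", "...", text)
    # Collapse the whitespace left behind by the removed tags.
    return " ".join(text.split())


print(strip_markup(SAMPLE))
```

The sketch discards the markup altogether; a display routine could equally keep some of it, for instance starting a new line at each [p] element.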
The typical structure of spoken text documents is similar,
and is shown in Figure 4.
Figure 4
The [header] here
also contains information concerning the various participants
in the interaction, and it is followed by an [stext] (spoken
text) element. The latter consists of [div] elements
representing different events or conversations, which in turn
consist of [u] (utterance) elements representing turns at
talk. Spoken texts may also contain non-verbal features such
as laughter and coughing, and paralinguistic ones, such as
shifts in voice quality, pauses, cut-offs and overlaps. They
may also contain indications of unclear segments and editorial
omissions in the transcript. Figure 5 shows an extract from a
spoken text: for each utterance, the speaker is identified by
the value of the who attribute on the [u] element, and
the beginnings (and endings) of mutually overlapping sections
are marked by [ptr] elements whose t attributes share
the same value.
Figure 5
[u who=PS6M6] [s n=079] [w CJC]And [w PNP]I [w PNP]I
[w VVB]mean [w PNP]it [w VBD]was [w AV0]absolutely
[w AJ0]gorgeous[c PUN]. [s n=080] [w PNI]Everything[c PUN],
[w AT0]the [w NN2]railings [w PNP]you [w VVB]know[c PUN],
[w AVQ-CJS]when [w PNP]they [w VVB]put [w AVQ-CJS]when
[w PNP]they [w VVD]painted [w AT0]the [w NN2]railings[c PUN],
[w AT0]the [w AJ0]burned [w AT0]the [w AJ0]old [w NN1]paint
[w AVP]off[c PUN], [ptr t=KNHLC00U] [w AT0]the [w AJ0]new
[w NN1]paint [w AVP-PRP]on [ptr t=KNHLC00V][c PUN]. [/u]
[u who=PS6M7] [s n=081] [ptr t=KNHLC00U] [w ITJ]Ah [w ITJ]yes
[ptr t=KNHLC00V] [w ITJ]yes [w ITJ]yes [w ITJ]yes
[w ITJ]yes[c PUN]. [/u] [u who=PS6M6] [s n=082]
[w AV0]Now[c PUN], [w AJ0]old [w NN1]paint [ptr t=KNHLC00W]
[w AV0]just [w AV0]straight [w AVP]on [w AJ0-NN1]top
[ptr t=KNHLC00X] [/u] [u who=PS6M7] [ptr t=KNHLC00W] [unclear]
[ptr t=KNHLC00X] [/u] [u who=PS6M6] [s n=083]
[w ITJ]Aye[c PUN]. [s n=084] [w PNP]It [w AV0]just [w PNP]it
[w VVZ]looks [w AJ0]terrible[c PUN]. [/u]
Figure 6 shows a simplified display of
this extract: the carets correspond to the [ptr] elements marking the beginnings and endings of overlapping sections.
{PS27J}: And I I mean it was absolutely gorgeous. Everything, the railings you know, when they put when they painted the railings, the burned the old paint off, ^ the new paint on ^. {PS27K}: ^ Ah yes ^ yes yes yes yes. {PS27J}: Now, old paint ^ just straight on top ^ {PS27K}: ^ (...) ^ {PS27J}: Aye. It just it looks terrible.
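A simplified display of this kind can be derived mechanically from the markup. The following Python sketch assumes the square-bracket notation of Figure 5 and treats each [ptr] element as a caret and [unclear] as "(...)"; the function name render_spoken is hypothetical. Speaker codes are taken from the who attribute of the fragment supplied, so the output shows whatever codes that fragment contains.

```python
import re

# Two utterances from Figure 5, in square-bracket notation.
SAMPLE = (
    "[u who=PS6M6] [s n=082] [w AV0]Now[c PUN], [w AJ0]old [w NN1]paint "
    "[ptr t=KNHLC00W] [w AV0]just [w AV0]straight [w AVP]on [w AJ0-NN1]top "
    "[ptr t=KNHLC00X] [/u] [u who=PS6M7] [ptr t=KNHLC00W] [unclear] "
    "[ptr t=KNHLC00X] [/u]"
)


def render_spoken(marked_up):
    """Return one display line per [u] element."""
    lines = []
    for who, body in re.findall(r"\[u who=(\w+)\](.*?)\[/u\]", marked_up, flags=re.S):
        body = re.sub(r"\[ptr [^\]]*\]", "^", body)  # overlap boundaries
        body = body.replace("[unclear]", "(...)")    # unclear segments
        body = re.sub(r"\[[^\]]*\]", "", body)       # all remaining tags
        lines.append("{%s}: %s" % (who, " ".join(body.split())))
    return lines


for line in render_spoken(SAMPLE):
    print(line)
```

The order of the substitutions matters: pointers and unclear segments are converted before the remaining tags are deleted, since otherwise they would be lost along with the rest of the markup.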
This detailed markup makes the BNC a very flexible instrument. The encoding of text structure means that it is possible to search not only for all the occurrences and co-occurrences of words or phrases in the corpus as a whole, but also for ones in certain structural positions (for instance co-occurrences within the same sentence, occurrences at the beginning/end of paragraphs/utterances, or following a pause), as well as ones with particular part-of-speech values. Similarly, the information in the header allows the user to restrict a search to certain texts or types of text, or to the speech or writing of certain participants or categories of participants. As we shall see, this can provide material for a wide variety of activities.
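To make the idea of a structurally restricted search concrete, here is a minimal Python sketch which finds sentence-initial words with a given part-of-speech code. It assumes the square-bracket notation of Figures 2 and 5; the function names and the regular expressions are assumptions made for illustration, and do not reproduce the query language of any actual BNC software.

```python
import re

# Two sentences from the end of Figure 5.
SAMPLE = (
    "[s n=083] [w ITJ]Aye[c PUN]. "
    "[s n=084] [w PNP]It [w AV0]just [w PNP]it [w VVZ]looks "
    "[w AJ0]terrible[c PUN]."
)

# A [w CODE] or [c CODE] tag followed by the form it annotates.
TOKEN = re.compile(r"\[(?:w|c) ([A-Z0-9-]+)\]([^\[\s]+)")


def sentences(marked_up):
    """Return each [s] element as a list of (code, form) pairs."""
    chunks = re.split(r"\[s n=\d+\]", marked_up)[1:]
    return [TOKEN.findall(chunk) for chunk in chunks]


def sentence_initial(marked_up, code):
    """Return the forms which open a sentence and carry the given code."""
    return [s[0][1] for s in sentences(marked_up) if s and s[0][0] == code]


print(sentence_initial(SAMPLE, "PNP"))  # personal pronouns opening a sentence
```

The same list-of-sentences representation would support the other restrictions mentioned above, such as co-occurrence within a single sentence.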
Perhaps the most obvious way in which the BNC can be used by teachers and learners is as a reference tool in text production or reception, specifically during activities of writing, reading, and translation. Given its size and variety, the corpus can frequently provide solutions to specific problems which may emerge, as an alternative and/or complement to conventional reference tools such as dictionaries, grammars, and encyclopaedias. However, the user needs to think carefully about how to formulate the necessary queries and how to interpret the data provided, as the examples below will illustrate.
As well as providing information about the frequency of particular forms, the corpus can provide information about the frequency of particular collocates. This can cast light on synonym use. Suppose a learner is uncertain whether s/he should talk of pursuing or chasing an objective. A search for forms of pursue (pursue, pursued, pursues, pursuing) occurring within a span of nine words on either side of objective or objectives, finds 100 solutions, whereas an equivalent query for forms of chase (chase, chased, chases, chasing) finds only one. The difference here seems large enough to dispel all doubt as to the more appropriate choice. However this inference still depends on the appropriacy of the query for the purpose at hand - for example, whether it is appropriate to include the alternative forms of the lemma (chase can be a noun as well as a verb), and to use a span of nine words. To take another example, if we are concerned to discover whether we might better describe a man as beautiful or handsome, the latter is much more common as a collocate of man within a span of two words, but much less so within a span of nine words. The reason is that beautiful is a much more frequent word than handsome in the corpus as a whole, and therefore more likely to appear in the non-adjacent context (Aston and Burnard 1998: 82-84). In the case of pursue and chase, the marked difference in frequency is still present with smaller spans.
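The effect of the span on a collocation query can be shown with a small Python sketch. The function below counts the occurrences of a node word which have at least one of a set of collocate forms within a given number of words on either side. The function name, the toy sentence and the simple whitespace tokenisation are all assumptions made for illustration; the sentence is invented and is not BNC data.

```python
def count_cooccurrences(tokens, node_forms, collocate_forms, span=9):
    """Count node occurrences with a collocate within `span` words either side."""
    lowered = [t.lower() for t in tokens]
    nodes = {f.lower() for f in node_forms}
    collocates = {f.lower() for f in collocate_forms}
    hits = 0
    for i, token in enumerate(lowered):
        if token in nodes:
            # Words to the left and right of the node, excluding the node itself.
            window = lowered[max(0, i - span):i] + lowered[i + 1:i + 1 + span]
            if any(w in collocates for w in window):
                hits += 1
    return hits


# Invented example sentence, not taken from the corpus.
TOKENS = ("the committee pursued its objectives with vigour "
          "while the dog chased a cat").split()
OBJECTIVE = ["objective", "objectives"]
PURSUE = ["pursue", "pursued", "pursues", "pursuing"]
CHASE = ["chase", "chased", "chases", "chasing"]

for span in (9, 2):
    print(span,
          count_cooccurrences(TOKENS, OBJECTIVE, PURSUE, span),
          count_cooccurrences(TOKENS, OBJECTIVE, CHASE, span))
```

With a span of nine, chased is counted as co-occurring with objectives even though the two are unrelated in the sentence; with a span of two it is not. This is the same kind of effect as that noted for beautiful and handsome.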
As well as specific collocates, the corpus can highlight
syntactic and semantic patternings. Discussing the respective
economic prospects of teachers, interpreters, and translators,
one student came up with the sentence Translators earn far
and away the least. Recourse to the BNC showed that there
were 73 occurrences of far and away, and that the
majority preceded superlatives, confirming this colligational
pattern. Looking more closely at the behaviour of the
expression in a random 30 concordance lines, however, revealed
that patterning was semantic as well as syntactic. Far and
away was almost always used to intensify adjectives and
adverbs with positive connotations - having a positive
semantic prosody, in Sinclair's (1991) terms (Figure 7).
Figure 7
in a full League season was to remain far and away the best by any Palace goalkeeper for over ha
a proudly acknowledged agency, and is far and away the most successful PR exercise (perhaps the
free and unfree peasants.These formed far and away the largest group in the population of Europ
enry I's time, that for 1129 - 30, is far and away the earliest royal account to survive in any
tgun. First point: netted rabbits are far and away more saleable. There is no shot in them, the
re twelve different types although by far and away the most common are called `liberty caps", s
h domestic market remains to this day far and away the largest consumer of Champagne.|
es, their vice-chancellors and deans. Far and away the most important powers, however, are thos
unfortunately it looks a bit messy.| Far and away the most interesting aspect of this guitar i
uple, who worked together in the film Far and Away, are billed as Tinseltown's most romantic co
nd which in the long run had made her far and away the most loved of all the members of the Roy
possibly just down the road! It is by far and away the best single-source reference on this eve
the West End premiere of their film, Far And Away.| The following day they trotted off to Laur
nearest rival, Tesco, they've become far and away the most popular places to do the weekly sho
ty at the top level then Wright is by far and away ahead.|`Scoring at this level is not a one-se
hich Britain underwent in the 1980s?| Far and away the most important point is that the museums
aspire to the red jersey of Wales was far and away the most dashing thing you could do. Burton
onships, but because it would have by far and away the largest European economy outside the EEC.
ondon market - in 1972 - but has been far and away the most consistently successful. It was hel
enjoyed a lucrative tourist trade as far and away the most popular resort of pilgrimage, the s
erty, virtually accounts for what was far and away the greatest personal estate owned by any com
If there is a local one, then that is far and away the best place to go, otherwise there is no
next bend and a Fly-Drive package is far and away the most convenient and comfortable way to s
matter. The Chancellor of Germany, by far and away, in economic terms, the most powerful countr
ly, shows that the United Kingdom has far and away more undertakings with more than 1,000 emplo
. Biffen), whom I certainly regard as far and away the most successful Leader of the House in a
n. Member for Chesterfield, which was far and away the most interesting part of the debate -^ A
into the fort with all its comforts. Far and away superior to those we had at our base RAF Hin
ular intervals. It was, of course, by far and away a situation too good to last and in time, gaz
to add to his laurels.| He has had by far and away his best season since moving to Newmarket fr
Furthermore, in these citations far and away occurred with verbs with
stative meanings - be, have, remain, become, form, etc. - unlike
the more process-oriented earn of the student's proposal. Overall, in this case the corpus
data turned out not to support the option proposed by the learner. The analysis did, however, suggest
some possible alternatives. This student managed to reformulate her sentence as Interpreters are far
and away the most highly paid. This was a result of her recognising quite complex and abstract
syntactic and semantic patterns in the data - as well as of discounting irrelevant instances, such as
those where Far and away is the title of a film.
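For readers unfamiliar with how a concordance such as Figure 7 is produced, the following Python sketch builds keyword-in-context lines for a phrase and draws a random sample of them. It is a bare illustration over a plain text string, with hypothetical function names and an invented example text, and makes no claim about how the software used for this paper works.

```python
import random
import re


def kwic(text, phrase, width=30):
    """Return one line per occurrence, with `width` characters of context."""
    lines = []
    for match in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so that the keyword forms a column.
        lines.append(f"{left:>{width}}{match.group(0)}{right}")
    return lines


def random_sample(lines, k, seed=0):
    """Return at most k lines chosen at random (seeded for repeatability)."""
    if len(lines) <= k:
        return list(lines)
    return random.Random(seed).sample(lines, k)


# Invented example text, not taken from the corpus.
TEXT = "It is by far and away the best. Far and Away is a film."
for line in random_sample(kwic(TEXT, "far and away", width=10), 30):
    print(line)
```

Because the match ignores case, the film title is retrieved along with the adverbial uses, which is exactly the kind of irrelevant instance the learner has to discount.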
Another area where the corpus can provide evidence of appropriacy is with respect to register -
though again careful thought may be necessary in designing queries and interpreting results. Should a
learner writing an essay describe the probability of a plane crash as pretty unlikely? Or is such
an expression too informal? Figure 8 shows the respective frequencies of pretty as an adverb in
the whole corpus, in spoken texts, and in written texts from two different groupings of subject
domains - on the one hand Imaginative and Leisure, on the other the remaining BNC domain categories
(Arts, Belief and thought, Commerce and finance, Natural science, Applied science, Social science, World
affairs) - a grouping we would expect to be generally more formal.
Figure 8
                                            occurrences   million words   occurrences/million words
whole corpus                                       4322             100                          43
spoken                                             1110              10                         111
written (imaginative and leisure domains)          2249              30                          75
written (other domains)                             963              60                          16
Comparing the numbers of occurrences of the adverb
pretty with the total numbers of words for each category, we find less use
in writing than in speech, and the least in the more formal written domains,
suggesting that the learner might be advised to avoid it in formal writing. Examining
a random concordance of pretty as an adverb in the formal domain group enables
this generalisation to be refined somewhat (Figure 9); pretty seems often used where,
for some reason, the discourse shifts to a less formal, more conversational style - in direct
speech and authorial asides, for example - and along with other markers of informality, such as
contracted forms, first person singular pronouns, etc.
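The normalisation behind the last column of Figure 8 is simple arithmetic: the number of occurrences divided by the size of the category in millions of words. A short Python sketch, using the figures from the table (the variable and function names are merely illustrative):

```python
# (occurrences, size in million words), as given in Figure 8.
FIGURE_8 = {
    "whole corpus": (4322, 100),
    "spoken": (1110, 10),
    "written (imaginative and leisure domains)": (2249, 30),
    "written (other domains)": (963, 60),
}


def per_million(occurrences, million_words):
    """Occurrences per million words of text."""
    return occurrences / million_words


RATES = {label: per_million(*values) for label, values in FIGURE_8.items()}

for label, rate in RATES.items():
    print(f"{label:45s}{rate:6.0f}")
```

Raw counts alone would be misleading here: the "other domains" group has almost as many occurrences as the spoken component, but in six times as much text.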
Figure 9
be replaced. Mr Morton, said he was `pretty confident" that would not happen. | If a d
until eleven o'clock everything went pretty well, When just as you start thinking to you
n Tillage or pasture, and the Country pretty fully inhabited, it cannot be desirable that
. However, Mr. Danse, the Vicar, was pretty shrewd and was able to strike a deal with Si
he evidence, as audience studies have pretty conclusively shown. Indeed it is now uncont
ites. All the internal organs looked pretty normal to the naked eye. There were some gr
e sounds but leaving the overall feel pretty much unchanged. | In RMS/Soft Knee mode th
47 Backchat Mat Coward on PR - pretty ridiculous 47 Forteana Paul Sieveking
he intention is to deceive and we are pretty hard on that." | Caterham is still technic
cks, holographic jewellery - it looks pretty much like a theme park. For the shoot, Coli
s. Other people have to do something pretty dramatic for us to notice. Putting a case m
aining and I rated the whole thing as pretty good." | An Ideal Husband | | Ivan Wate
. Only property above a minimum (and pretty exorbitant) price may be purchased. You mig
Ross is keeping price and performance pretty much under cover, though there is talk of th
of johs I ought to do - I used to be pretty thorough - and there are things I haven't go
and cross country captain. I got on pretty well with Reg Witter, the games and PE maste
achievement to say that his theory is pretty weird all the same. It has to be to get the
who refused to use soap, and that was pretty horrible. (Both used to lie in baths hoping
pt that in all probability it will be pretty well apparent to the reader quite soon who i
given a sentence. Otherwise I got on pretty well, had a laugh: you had to. I don't thin
of writing, sent one on later so it's pretty safe to assume that the trial was free. Don
ck but, given a steady hand, it works pretty well with a mouse. Once drawn, of course, t
ur advertisers to put on record that, pretty well without exception, they have a lively a
used in the whole of world war two. Pretty well the entire post-biblical civilian infra
greater than their share value shows pretty clearly how much value corporate managers ca
time when the advertising cupboard is pretty bare. | Banbridge 21 CIYMS 8 | | On a w
binding declaration was dismissed as "pretty irrelevant" by a UK government official.) ^
We can also, en passant, notice the recurrence of pretty well as a
collocation - almost 20% of the occurrences of pretty as an adverb in these domains, as may be
discovered using the SARA Collocation option. In comparison to a dictionary, here the corpus offers far
more subtle information, potentially proposing variables which may have been missed in the learner's
original formulation of the problem. In order for these to emerge, however, the learner must not expect
the answer to simply leap out of the corpus: s/he needs to reflect on appropriate criteria to
distinguish particular text-types, and to browse wider contexts than the single concordance line to
distinguish particular discursive styles with confidence. Pretty may hold other surprises for the
learner, as we shall see below.
The example in Figure 10 comes from a headline in the Financial Times:
Figure 10
Profit warnings hit Tokyo markets
Collapse of Falichi Corp rekindles fears in banking sector
One problem here for the learner may be understanding the meaning of rekindles in this context. A
randomly selected concordance of forms of the verb rekindle (Figure 11), of which there are a
total of 147 occurrences in the corpus, shows that it is typically used metaphorically, as it is in this
example.
Figure 11
gerac on Saturday.|Rugby Union: Young rekindles Waspish spirit||By BARRIE FAIRALL||Wasps....
or, said: `We have the opportunity to rekindle Liverpool's spirited sea-faring tradition an
dy Derby on June 3.| Just as the race rekindled Classic hopes for Stoute, the flame was snuf
ss close to Explorers, which hoped to rekindle pride in the old customs, language and tradi
begin again.How is extinguished fire rekindled?It evaporates in a gaseous form from the Ear
you..." Her voice faded as her words rekindled memories.| The trimphone extension warbled u
ack on track again."|`We hope it will rekindle the atmosphere of old, not just on the field
treet star Chris Quinten is trying to rekindle his career- by appearing in panto.| He will
-riding Norwich. Kendall said: `We've rekindled the fans" hope and belief and eased their ap
o be restated. There is now a need to rekindle the idea that teaching is a vocation which m
60 Kennedy-Nixon debates is enough to rekindle the exaggerated sense of urgency then felt t
was born in post-war Europe could be rekindled, larger and brighter in a post-cold-war worl
d memories, some of which he hopes to rekindle if his plans for a visit next year come to f
picture-postcard thatched cottage is rekindling some very happy memories|| Home for novelist
restaurant. She hoped Angus wanted to rekindle their love affair, as she did.| Rules was de
mpt to shed his diplomatic veneer and rekindle memories of his early rough and tumble North
as good as died for him. That thought rekindled his fury, briefly. However dreadful his task
t long enough to sate his desires and rekindle her expectations. And so this grumbling thre
he one hand, such action would simply rekindle the international outcry that resulted in th
e it was extinguished. Our task is to rekindle it. Will you not help me?"| He paused again,
progressed many old friendships were rekindled and new ones formed with `cross fertilisatio
his intention to evict her, that had rekindled the dream in the first place.|`Then I'll say
iser Brendan Foster tipped his pal to rekindle memories of his glory days in his new event.
es from the Gulf, and thereby avoided rekindling the debate about the constitutionality of de
REKINDLE AN AGE OF ELEGANCE| Here they are! The fines
reaks suggest that some may have been rekindled from underground smoulderings dating from at
irty years on, a book on Joe Meek has rekindled interest in Britain's first independent pop
o early in the campaign.| In order to rekindle the title dream, the restoration of confiden
iverpool's players and supporters can rekindle the Auxerre spirit in front of a sell-out 38
situation that could cause stress and rekindle bitter feelings."| Single parents will be ob
The kinds of things that are rekindled are emotional states -
hope(s), interest, memories and the like. The connotations of rekindle seem
generally positive, but there are enough negative examples to suggest that this prosody is not constant
- we also find bitter feelings, fury and international outcry, for example.
The positive semantic prosody for rekindle emerges strongly if we examine the occurrences
with its most frequent collocate, memories (Figure 12).
Figure 12
oleaxed."Paul added that it rekindled memories of a Borussia Moenchengladbach v Inter Milan
nversations, many happy and formative memories can be rekindled.| Or they may wish to discu
e dead. The sudden rekindling of past memories and passion for the man she had been about t
er voice faded as her words rekindled memories.| The trimphone extension warbled urgently f
acticality, interlaced with many fond memories, some of which he hopes to rekindle if his p
y loaned by Mr. E. Roberts) rekindled memories of the last down `Cornishman" which ran on S
ed his diplomatic veneer and rekindle memories of his early rough and tumble North Country
. His pizza slices certainly rekindle memories of the good old days in football... they tas
he metropolis and beyond, to rekindle memories of times past.| Early arrivals heard one of
htness about it as well. It rekindles memories of those old-fashioned Hollywood romances of
dan Foster tipped his pal to rekindle memories of his glory days in his new event.| Eight y
o 302 all out in 47 balls to rekindle memories of their Cup disaster last month when they l
ar.| The 12-strong cast will rekindle memories of the Andrews Sisters, Tommy Handley, Rita
ly, that the programme would rekindle memories of the singles holiday in Torremolinos or th
Nearly all the 15 citations suggest nostalgia, with a revival of happy/fond
memories of the good old days/times past/glory days. The same nostalgic sense seems present in a
number of the other citations in Figure 11 - the atmosphere of old, for instance.
On the other hand, nostalgia would hardly seem to be at issue where negative emotions are involved.
Looking at these instances (Figure 13),
Figure 13
al" about the BMA's backdown to avoid rekindling the controversy.They are keenly aware the BM
eek after Aldershot were wound up and rekindle fears for several Fourth Division clubs faci
at the heart of Europe".|It will also rekindle suspicions among the Euro-sceptical wing of
vements of people, exacerbated by the rekindling of the civil war between the north and the s
ted, and the media hype threatened to rekindle itself. As if frightened by more unwanted ex
ease with which the nationalists have rekindled historical resentment and traditional chauvi
peacekeeping operation in Croatia and rekindle the flames there. The position of all minori
60 Kennedy-Nixon debates is enough to rekindle the exaggerated sense of urgency then felt t
gnalled its intention to press ahead, rekindling the fury of the country's 4,300 mostly white
guise of financial conglomerates has rekindled this debate. Nine types of conflict of inter
tion that texts be used in such a way rekindled related anxieties. But the issue was now rai
as good as died for him. That thought rekindled his fury, briefly. However dreadful his task
nce that German nationalism should be rekindled at the very time we're about to reduce our t
he one hand, such action would simply rekindle the international outcry that resulted in th
th every opportunity in the world for rekindling those ugly sparks of revolution.| Thank God
smissal of Elise as a mere client had rekindled all her misgivings. And yet Luke's presence,
into gear, and the glint in his eyes rekindled the unwelcome wildfire in her veins.| She no
tality - afraid that the memory would rekindle some private pain. He spared us both by refe
es from the Gulf, and thereby avoided rekindling the debate about the constitutionality of de
gh-Pemberton issued his warning about rekindling inflation, Downing Street abruptly changed i
situation that could cause stress and rekindle bitter feelings."| Single parents will be ob
what they seem to have in common is the
position of the speaker, who takes a detached or even ironic stance with
respect to the feelings described. From this perspective, the
Financial Times headline can perhaps be seen as taking a
certain distance from the emotions of the Tokyo stock market, and one
wonders whether the same sub-editor would have used the expression
rekindles fears to describe events in the City of London.
Interestingly, a similar distancing appears present in some of the
apparently positive examples: returning to the examples of rekindle
memories, and looking at a rather larger context, we can see that
some of these too appear to be ironic (Figure
14).
Figure 14
Mig Romerez did not even recognise a football when I showed
him one but his exotic appearance should be enough to impress
those bumpkins in `The Tip" crowd. His pizza slices certainly
rekindle memories of the good old days in football... they
taste like Dubbin.
Eldorado pitched itself to the tabloids as a `sun, sea, sex
and sangria" story. Hungry hacks were flown out to the set to
experience the four S's for themselves. It was hoped that the
Bonkidorm and Costa del Bonk set would tune in avidly, that
the programme would rekindle memories of the singles holiday
in Torremolinos or the villa trip to El Capistrano.
Overall, rekindle seems to be
used in two contrasting ways: either the speaker/writer can
identify with the (positive) emotional states described, or
they can distance themselves from them. It is this second use
which would explain its occurrence with negatively as well as
positively connotated events. Such contrasting uses appear to
be found with many clichéd expressions, and corpus examples
can provide a useful way of helping learners appreciate them,
and hence to decide the connotations of particular cases.
Like that of pretty, such a study of rekindle goes in many ways beyond solving a problem in interpreting or producing a specific text. It comes closer to a second type of corpus use, in which a particular linguistic feature or group of features is studied for its own sake, in order to learn how it works in the language. The aim of such study is not to rival the work of the professional lexicographer or grammarian, but to deepen understanding of the feature or features in question through personal discovery. For further examples, we may return to the concordance of rekindle memories (Figure 12 above), where we find several features which could be of interest. For instance, most learners will be familiar with the expression be fond of, but how many will feel comfortable with the attributive use of the adjective, as in fond memories (line 5)? By selecting a random sample of occurrences of fond, and then sorting them by the part-of-speech of the word which follows, we can group those where fond is followed by a noun, and then investigate the frequencies of particular nouns as collocates in this position (Figure 15).
It emerges that memories is quite the most frequent noun to follow fond, some way ahead of farewell, farewells, memory and parents (this order is maintained even if we group the collocates semantically, including fathers, mothers, and other relatives with parents). Numerically, the 70 occurrences of fond memory/ies and 42 of fond farewell/s suggest that these forms may be worth memorising by the learner as fixed expressions. The corpus helps the learner not only to identify the most common uses of the feature being studied, but also to decide which may be worth learning and which not.
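The sort-and-count procedure described for fond can be sketched in a few lines of Python. This is only an illustration, not the way SARA itself works: the function name following_nouns is hypothetical, the tags are assumed to follow the BNC's C5 tagset (where noun tags begin with NN), and the sample sentences are invented rather than taken from the corpus.

```python
from collections import Counter

def following_nouns(tagged_tokens, node, noun_prefix="NN"):
    """Count the nouns that immediately follow `node` in a POS-tagged token list."""
    counts = Counter()
    # Pair each token with its right-hand neighbour.
    for (word, _tag), (next_word, next_tag) in zip(tagged_tokens, tagged_tokens[1:]):
        if word.lower() == node and next_tag.startswith(noun_prefix):
            counts[next_word.lower()] += 1
    return counts

# Invented toy sample (not BNC data), tagged in an assumed C5 style.
sample = [
    ("I", "PNP"), ("have", "VHB"), ("fond", "AJ0"), ("memories", "NN2"),
    ("of", "PRF"), ("it", "PNP"), (".", "PUN"),
    ("She", "PNP"), ("is", "VBZ"), ("fond", "AJ0"), ("of", "PRF"),
    ("him", "PNP"), (".", "PUN"),
    ("A", "AT0"), ("fond", "AJ0"), ("farewell", "NN1"), (".", "PUN"),
    ("Fond", "AJ0"), ("memories", "NN2"), ("remain", "VVB"), (".", "PUN"),
]

noun_counts = following_nouns(sample, "fond")
for noun, frequency in noun_counts.most_common():
    print(noun, frequency)
```

Filtering on the tag of the following word is what separates attributive uses (fond memories) from predicative ones (fond of), which is the distinction the learner is invited to notice.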
Another expression in the concordance of rekindle
memories which learners may not know is glory days.
While its denotation is easy enough to understand, the 42
occurrences in the corpus provide less predictable information
as to its contexts of use (Figure 16).
Figure 16
| PEOPLE AND PLACES | | John's glory days | | SOCCER player John Groves fears a broke
the club has had a brief taste of the glory days but now is immersed in the worst crisis in i
it Brooklands Today The circuit's glory days live on A Week in a Bentley Brooklands To
since they won the FA Cup back in the glory days of 1947. | Certainly not the army of suppo
din. He should be fit. | Swindon's glory days in the FA Cup were a long time ago while Cam
tico Alberto" banners stored from the glory days of three years ago, when he won ten times in
its he cries when he sees film of the glory days in Italy when Gazza was ready to become the
man to Bobby Robson in Ipswich Town's glory days in the early 80s, is wanted by Sunderland.
Elland Road - just like it was in the glory days of super manager Don Revie and hard-man skip
itor of Sounds during its late '70s glory days. More importantly, he had also run a pub, w
ources and the backing of Fiat, their glory days are in the past. Last won the Constructors'
would have missed out on all Rovers' glory days of promotion, Wembley and Europe. | The wa
king! | | All-original hits from the glory days of pop, plus a FREE ALBUM of classical Elvis
John Dawes Room, with pictures of the glory days, the great man is optimistic. `I shall be v
ouness has struggled to recapture the glory days at Anfield. | He was suspended for five ma
d his pal to rekindle memories of his glory days in his new event. | Eight years after sett
arly twenties, and well remember the `glory days" of Newcastle United with their world class
s deserve my loyalty, says Bassett GLORY DAYS... Dave with the FA Cup won at Wimbledon
have to be Bruce Springsteen singing `Glory days - well, they pass you by..." | Steve. |
is own name for Stewart. In 1971 the glory days returned as Stewart won six of the eleven ro
et its no coincidence that during our Glory Days we had the same players year in year out. ^
eegan's return. | And the man whose glory days at Goodison included League championship, FA
supporters brought up in the pre-1968 glory days are mostly content to support the White Rose
allowing in memories of the long-gone glory days. (OK, so I'm guilty of psychic breaking and
ults cannot compare with those of the glory days of 1989, but nobody was complaining. `It's
y side desperately keen to revive the glory days of the late 80s. | The influence of Kilken
support of the whole village." ^ GLORY DAYS: Ice star John Curry in 1976 | LAST OF T
Apart from being the title of a song by Bruce Springsteen (one of the innumerable
snippets of encyclopaedic knowledge that may be picked up from the BNC), glory days seems
principally to refer to the past successes of sportsmen or sports teams, being primarily associated with
sports journalism as a genre (bar the odd case from music journalism, where we have a similar meaning of
group triumph). There is no occurrence in speech. Further queries indicate that its form is as
fixed as its context, there being no cases of intervening modifiers between the two words, no
instances of glory day, and only the occasional glory nights and glory years.
The advantage of using the corpus in this manner is that the learner is encouraged to investigate variation of form and function in relation to the dispersion of a feature in the language as a whole, rather than simply in relation to a specific context, as in contingent reference use. The learner who investigates the relationship between the use of an item and sociolinguistic factors may come to appreciate the importance of a range of situational variables. For instance, as well as indicating what kinds of texts an expression is used in, the corpus can also reveal what kinds of users employ it. In the spoken component of the BNC, there are 109 occurrences of the word navy. Of these, 51 are produced by male speakers, and 44 by female speakers (the remainder being by speakers whose identity is uncertain). As the total amounts of speech by male and female speakers in the BNC are very similar (note 3), these figures might suggest that the word is fairly equally used by both sexes. However, when we distinguish between the nautical and colour senses of the word, we find a clear distinction, with the colour sense far more common in utterances by women. Or, to return to the example of pretty, we find that the word pretty is rather more frequent in men's speech (730 occurrences) than in women's (514) (note 4). However, when we compare its use as an adjective and as an adverb, we find that 40% of female use is adjectival, whereas only 7% of male use is. This means that overall, pretty as an adverb occurs over twice as often in men's speech as in women's (681 vs 306 occurrences), while pretty as an adjective is over four times as common in women's speech as in men's (208 vs 49 occurrences). The learner who wishes (not) to conform to gender stereotypes might draw her/his own conclusions as to whether and how to use pretty in speech.
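The arithmetic behind the figures for pretty can be checked with a short Python sketch. The counts are those quoted in the text; the function name breakdown is hypothetical, and the sketch assumes, as the quoted figures imply, that every occurrence not counted as an adjective is an adverb.

```python
# Occurrences of "pretty" in the spoken BNC, as quoted in the text.
counts = {
    "male": {"total": 730, "adjective": 49},
    "female": {"total": 514, "adjective": 208},
}

def breakdown(total, adjective):
    """Split a total into adjectival and adverbial uses.

    Assumption: every non-adjectival occurrence is adverbial.
    """
    return {
        "adjective": adjective,
        "adverb": total - adjective,
        "adjective_share": adjective / total,
    }

male = breakdown(**counts["male"])
female = breakdown(**counts["female"])

# How much more often each use occurs in one group than in the other.
adverb_ratio = male["adverb"] / female["adverb"]
adjective_ratio = female["adjective"] / male["adjective"]

print(f"male: {male['adverb']} adverbial, {male['adjective_share']:.0%} adjectival")
print(f"female: {female['adverb']} adverbial, {female['adjective_share']:.0%} adjectival")
print(f"adverb, men vs women: {adverb_ratio:.2f}")
print(f"adjective, women vs men: {adjective_ratio:.2f}")
```

Comparing raw counts in this way is only reasonable because the two groups produce similar amounts of speech; with unequal samples the counts would first need to be normalised.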
Similar comparisons can be made for different age groups: we find, for instance, that wicked has negative connotations for older speakers, but positive connotations for younger ones. And while such comparisons will of course not always provide relevant distinctions, discovering that they sometimes do may encourage learners to think about when they might be relevant, and to refine their use of the corpus accordingly.
A corpus like the BNC lends itself to browsing, rather as one might browse in a bookshop or library. Rather than just focussing on a single problem or feature, the user can explore serendipitously, passing freely from one curiosity to the next. In the last section we used the concordance of rekindle as a starting point for studies of other features, such as fond and glory days, which that concordance happened to contain. In their turn, these investigations might have led on to studies of further features - a brief taste and recapture, for instance, which are collocates of glory days (Figure 16). Virtually any concordance will throw up potential curiosities of this kind, and the SARA Browser option, which allows the user to scan the entire source text from which a citation is taken, further increases the opportunities for discovering them.
In these examples, exploration of the corpus is syntagmatic, in the sense that attention shifts from a previously searched-for feature to one in its context. However, it is also possible to explore paradigmatically, shifting attention from a previously searched-for feature to a feature or features which are formally or semantically related to it. An investigation of pretty as an adverb invites comparison with pretty as an adjective, the two uses turning out, as we saw in the last section, to have very different distributions across male and female speakers. At the single word level, comparisons of this kind may be prompted by the listings provided from the corpus index of forms and of the parts of speech associated with them. For instance, searching for forms of the verb budge using the SARA Word Query option will display a list of all the word-forms in the corpus beginning with the letters budg - including not only budge, budged, budges, and budging but also budgerigar and budgerigars. A curious learner might be inclined to investigate these words, just as s/he might be inclined to investigate near synonyms of budge which come to mind, such as shift, or antonyms, such as stand. Formal and semantic association may prompt investigation of similar phrases as well as words: the concordance of rekindle memories (Figure 12 above) might stimulate not only an investigation into the collocate times past, but also a comparison with past times - which turns out to lack the former's nostalgic connotations. Other strands of serendipitous investigation include possible variants of a phrase (are there instances of glory weeks?), as well as varying positions in text structure (in headings, at the beginning/end of paragraphs/utterances), in different text-types or speaker-types, and even in specific texts or speakers.
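The kind of listing produced by a query on the letters budg can be imitated with a prefix lookup over a sorted list of word-forms. The Python sketch below is an assumption about how such a lookup might be done, not a description of SARA's index: the function name forms_with_prefix is hypothetical, and the list contains only the forms mentioned in the text plus two invented non-matching entries.

```python
from bisect import bisect_left

def forms_with_prefix(sorted_forms, prefix):
    """Return every word-form in an alphabetically sorted list that starts with `prefix`."""
    start = bisect_left(sorted_forms, prefix)  # first position not before the prefix
    matches = []
    for form in sorted_forms[start:]:
        if not form.startswith(prefix):
            break  # sorted order means no later form can match
        matches.append(form)
    return matches

# Forms cited in the text, plus "buddy" and "bulge" as invented non-matches.
word_forms = sorted([
    "budge", "budged", "budges", "budging",
    "budgerigar", "budgerigars", "buddy", "bulge",
])

listing = forms_with_prefix(word_forms, "budg")
print(listing)
```

Because the match is purely formal, the listing brings together items with no semantic connection, which is exactly what may prompt the curious learner to look further.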
Once the learner has mastered the software, and realised the different kinds of information the corpus can provide, a combination of paradigmatic and syntagmatic exploration can become a routine leisure activity which proceeds happily and profitably for hours at a time.
The examples so far discussed have all supposed that the focus of interrogation will involve linguistic features of some kind. However, other, non-linguistic approaches are also possible: as we have seen, the BNC can provide large quantities of encyclopaedic information. A search for the name of a person or place will usually throw up a range of interesting facts, from Manchester to Masoch. Corpora can also illustrate cultural stereotypes and prejudices, as Stubbs (1996) has pointed out. A learner might try searching for Irish or Kraut, or comparing the collocates of man and woman, or the use of racial, tribal and ethnic (Krishnamurthy 1996).
It is equally possible to use the corpus for less serious purposes, looking, for instance, for
occurrences of one's own name, or of expressions related to topics of particular personal interest
(beer, sex, linguistics, etc.). It can even act as a kind of oracle. By searching for occurrences of
sentence-initial phrases (My problem is ..., What I want is ..., Why don't you
...), for instance, one can then examine their continuations for suitable advice. Occurrences of
questions (How are you?, What's for dinner?) can similarly be examined for the responses
which follow. Figure 17 shows a random selection of instances of one such oracular question.
Figure 17
ble, `Do you love me?" With his wife he had known precisely where he was. No marriage had begu
ugars do you love me?" | `A million pounds." And he'd bounce me on my bed and make a `little
Do... do you love me?" | Her head was bent and her words were hardly audible above the noise o
| `Do you love me?" | `Yes, I do, John, with all my heart." | `That settles it." | The
`And do you love me?" | He did not answer this question. | `Oh Angel - my mother says she k
| `Do you love me?" he murmured, his mouth exploring her ear. | She nodded dumbly. He held
| `Do you love me?" Andy asks, looking up at him. | `Of course I love you," John says. |
er. `Do you love me?" | `No." | `That's right. Killed anyone lately?" | `Three last nigh
s. ` Do you love me?" | He stood very still for a few seconds, a faint frown lining his foreh
ly. `Do you love me?" | Caroline jerked her hand back, and Nicolo caught it and held it in hi
ly. `Do you love me?" | `Yes," she said, `of course I do. I love you with all my heart." |
s? " Do you love me? There was a loud scream of affirmation, and it was only then, as the audi
s? " Do you love me? Surely he didn't need to ask. The audience of willing females had shoute
s? " Do you love me? Shelley held on tight to the seatbelt, and looked sideways at Miguel. Th
s? " Do you love me? And Shelley shook her head to clear it. Was she so very tired that she c
, but do you love me?" | The words were what she had longed to hear, and she stayed silent, sa
me. Do you love me?" | She came to life, put her arms round his neck, and stroked his hair.
much do you love me? (.) That much? Okay. (.) You're only having little bits. (.) You're no
Rach do you love me? (.) Do you love mummy? (.) Do you love nanny? (.) No! [laugh]
The responses to it provide many opportunities for discussion - learners could at the very least
debate which response they would (not) prefer, engaging in significant amounts of communicative
interaction in the process. The concordance also displays a number of features which might warrant
further serendipitous exploration (for example, the phrase with all my heart), as well as
potentially stimulating the user to find out more about certain situations by browsing the source texts.
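The oracular use of the corpus, in which a question is searched for and its continuations examined, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name continuations and the width parameter are hypothetical, and the text is a short string assembled from two citations in Figure 17 rather than the corpus itself.

```python
import re

def continuations(text, phrase, width=40):
    """Return the stretch of text that follows each occurrence of `phrase`."""
    results = []
    for match in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
        following = text[match.end():match.end() + width]
        results.append(following.strip())
    return results

# Two citations from Figure 17, joined into one string for illustration.
text = (
    '`Do you love me?" | `Yes, I do, John, with all my heart." | '
    '`And do you love me?" | He did not answer this question.'
)

responses = continuations(text, "do you love me?")
for response in responses:
    print(response)
```

The same function could be applied to sentence-initial phrases such as My problem is, where the continuation supplies the advice rather than the response.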
The intriguing nature of this last concordance finds little echo in the literature as a whole, which reveals relatively little enthusiasm for the idea of giving teachers and learners direct access to large corpora. One particularly sceptical observation is the following:
[...] simply dumping 200 million words of corpus data in front of people isn't going to be much help for most teachers and students. It takes time, commitment and some good software tools to become really expert in the analysis of this type of material. (Clear 1996: 27)
In this paper I have illustrated four ways in which I believe learners can, with practice, use large corpora productively. This is not of course to say that large corpora constitute a panacea for all ills or for all learners, and Clear's comment raises a series of questions which merit serious reflection.
There is, I think, little doubt that a corpus like the BNC can only be used profitably by fairly advanced adult learners - as well as, of course, by teachers. The linguistic complexity of many citations and their relative unpredictability, given the limited context available in a one-line concordance display and the variety of texts contained in the corpus, mean that it is much more difficult to make sense of concordance lines than to consult a learner dictionary, grammar, or textbook. However, understanding does not necessarily have to be complete in order to be of value, and learners can in most cases be left free to select those citations they are best able to make sense of. Unlike professional linguists, they are under no obligation to account for all of the data, or to do so in a manner which meets linguistic criteria of descriptive adequacy. Learning a language proceeds by progressive approximations, and partial generalisations derived from limited data are essential to that process, always provided that their partial nature is recognised (Aston 1995). What seems important is that the learner can make enough sense of the data, and draw conclusions of sufficient relevance, to maintain interest and motivation, so that interpretative skills have the opportunity to improve with practice. Working in pairs or small groups may help in these respects.
While large corpora are rich resources, there are nonetheless limits to the kinds of information which can be obtained from them. These depend largely on the design and encoding of the corpus, and the software used to interrogate it. While an excellent source of lexical information, the BNC, for example, can only really be used to study a limited set of grammatical patterns, namely those which have distinctive lexical correlates. While it is easy enough to find all the occurrences of enjoy, and to sort them according to the part-of-speech category of the following word, it is impossible to find all cases of verbs followed by a gerund, since the SARA index does not include part-of-speech categories such as "all verbs" or "all V-ing forms". And not all lexical correlates are sufficiently unambiguous to allow them to be used in queries: any search for restrictive relative clauses would drown the user in irrelevant data, given the number of other uses of wh- pronouns and of that in the language (not to mention the impossibility of identifying relative clauses with pronoun deletion, as in the man I saw). Particular semantic and pragmatic categories (doubt, cognisance, disagreements, summaries, etc.) are difficult to locate for the same reason. Nor is the BNC the place to study many features of spoken discourse: transcripts are orthographic, paralinguistic features are only roughly indicated, and situational description is limited. This means, for example, that while one can compare speech by men and by women, one cannot compare speech to women and to men.
A large mixed corpus is also inappropriate for the study of highly specific text-types or genres, any one of which is unlikely to be adequately represented, and may not be recognisable from the encoding. There are very few business letters in the BNC, just as there are very few service encounters, and those wishing to explore their specific conventions would do better to compile a small corpus including only texts of those types. It should also be borne in mind that the BNC contains contemporary British English: those interested in other geographical or historical varieties should look elsewhere - though they might still want to use the BNC to carry out contrastive analyses.
Large corpora are complex, as are the software programmes to interrogate them. It takes time and practice to learn how to formulate queries which will effectively find what one is looking for, without omitting too many relevant instances or including too many irrelevant ones. Our experience with undergraduate learners of English at Bologna University is that they need a minimum of eight hours hands-on instruction and a similar amount of individual practice in order to feel reasonably at ease with SARA and avoid the more obvious pitfalls in its use. The required training is not simply a matter of learning about the corpus and the software, but also one of learning how to learn from them. Learners will need practice in recognising patterns of collocation, colligation, semantic preference and semantic prosody, in hypothesising possible formal variants, and in watching out for associations with particular positions, texts or text-types, or users and user-types. They may also need to learn to make and to value partial generalisations of a relatively low-level nature. They may, for instance, need to learn to notice that the most typical thing to be rekindled is memories, rather than attempting a blanket generalisation to "past feelings" which hides this specific fact (Aston 1997). And they must learn to handle numbers, understanding what frequencies and differences may be significant, not so much in a statistical sense as in the more general one of always asking whether the numbers are large enough to warrant inferences.
Training takes time and energy, as does corpus use. There have as yet been no empirical studies to show whether corpus-aided activities of the kinds I have outlined here are worth the investment in terms of results. From a theoretical perspective, however, it can be hypothesised that the on-stage use of a large corpus might have the benefits listed below, whose extent would seem to make further research and experimentation desirable.
3. 307539 utterances by female speakers, 304278 utterances by male speakers.