WHAT YOU HEAR ISN’T WHAT YOU SEE: THE REPRESENTATION AND COGNITION OF FAST MOVEMENTS IN HINDUSTANI MUSIC
Wim van der Meer, Dept. of Musicology, University of Amsterdam, The Netherlands, wvdm[at]me.com
Suvarnalata Rao, National Centre for the Performing Arts, Mumbai 400021, suvarnarao[at]hotmail.com
Keywords: visual representation, melography, ornamentation, pitch perception.
In Hindustani music the space ‘between the notes’ is often more important than the discrete notes themselves. With the help of melography, and more particularly the use of advanced models of pitch perception in computer software, we can actually ‘see’ the precise forms of meend and other aspects of pitch bending. Both musicians and musicologists generally agree that this graphic representation does far better justice to the music than staff notation or sargam notation. However, there are also serious limitations, especially when we look closely at rapid ornamentations or tans. From perception studies of the seventies it is known that the ear starts averaging out rapidly oscillating pitches at speeds of about 6 movements per second. In this paper, we demonstrate the limitations of this hypothesis and attempt to formulate a model of how the visual representation corresponds to the perceived note patterns in the case of fast movements involving murki and gamak.
When melography started in 1949 it could barely represent the simplest monophonic melodies (Seeger 1951, 1957, 1962, Cohen 2005, Moore 1974). Moreover, the melograph (fig. 1) was a cumbersome instrument and, since the demand for it was limited, it remained a rather expensive tool for musicologists, a brand of academics not particularly known for their spending powers anyway. It was only with the rise of computers that melography received a new impetus. Computers, especially personal computers, had a wide range of applicability, so a dedicated machine was no longer necessary for melography.
The first non-dedicated pitch-trackers were probably made in the early 1980s, mostly by music engineers in sonology departments. These were people using Commodore, Atari and Apple computers for the construction of sounds, composing, creating scores, generating music automatically and so on. Pitch tracking programs were made in the small hours and rarely used beyond a single computer. One such set-up was constructed at the NCPA laboratory in Mumbai by the French engineer Bernard Bel (1983). It consisted of dedicated filtering hardware attached to an Apple II computer (note that even these very simple personal computers cost thousands of dollars, while a mini-computer had a price-tag of tens of thousands). The computer did no more than register the number of zero-crossings per second. This method would now be considered obsolete, but for vocal music it worked well.
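The zero-crossing principle can be sketched in a few lines. The following is a minimal modern reconstruction, not Bel's actual code; the signal and sample rate are illustrative:

```python
import numpy as np

def zero_crossing_pitch(signal, sample_rate):
    """Estimate fundamental frequency by counting zero-crossings.

    Works only for clean, monophonic signals (as the early set-ups
    assumed): each period of a simple waveform crosses zero twice,
    so f0 is roughly crossings / (2 * duration).
    """
    signal = np.asarray(signal, dtype=float)
    # A sign change between consecutive samples marks a zero-crossing.
    crossings = np.count_nonzero(np.diff(np.signbit(signal)))
    duration = len(signal) / sample_rate
    return crossings / (2.0 * duration)

sr = 11000                        # the sort of sampling rate used at the time
t = np.arange(sr) / sr            # one second of audio
tone = np.sin(2 * np.pi * 220 * t + 0.1)
print(zero_crossing_pitch(tone, sr))   # close to 220 Hz
```

For a voice with a strong fundamental this is adequate; any overtone-rich or accompanied signal produces extra crossings and hence spurious pitch values, which is why the NCPA set-up needed its filtering hardware.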
Fig. 1: The Seeger Melograph Model C
The really interesting thing, however, was that once the series of measurements had been captured by the computer all sorts of calculations became possible, such as the creation of ‘tonagrams’ that showed the tonal backbone of a piece of music (fig. 2). At the NCPA laboratory hundreds of recordings were processed in this manner (Meer 2000, Rao 2000).
Fig. 2: Tonagram of the raga Ahir Bhairav
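A tonagram is in essence a duration-weighted histogram of the pitch track, folded into one octave above the tonic. The sketch below shows the idea; the tonic frequency, bin width and toy pitch track are illustrative assumptions, not the NCPA implementation:

```python
import numpy as np

def tonagram(pitch_hz, tonic_hz=240.0, bins_per_octave=120):
    """Fold a pitch track into a histogram of cents above the tonic.

    pitch_hz : frame-wise pitch estimates in Hz (NaN = unvoiced frame).
    Returns (bin_centres_in_cents, counts): the 'tonal backbone',
    i.e. how much time the melody spends at each scale position.
    """
    pitch = np.asarray(pitch_hz, dtype=float)
    voiced = pitch[~np.isnan(pitch)]
    # Convert to cents above the tonic and fold into a single octave.
    cents = 1200.0 * np.log2(voiced / tonic_hz) % 1200.0
    counts, edges = np.histogram(cents, bins=bins_per_octave, range=(0, 1200))
    centres = (edges[:-1] + edges[1:]) / 2.0
    return centres, counts

# Toy pitch track: mostly Sa (the tonic) with a short visit to Pa (the fifth).
track = [240.0] * 30 + [360.0] * 10 + [float('nan')] * 5
centres, counts = tonagram(track)
print(centres[np.argmax(counts)])  # the most-visited bin sits at Sa, near 0 cents
```

Because steady notes contribute many frames and transitory movements few, the peaks of such a histogram bring out exactly the ‘backbone’ notes of the raga.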
At the same time phoneticians were tackling the problem from their side, but since they were using costly mainframes and minicomputers (e.g. the MicroVAX) the programs were not used on a wide scale. It was, however, in these programs (e.g. LVS) that recent ideas on pitch perception were implemented. Soon the new models of pitch perception were implemented in programs that ran on cheap computers, one of them written by Meer (PitchXtractor) for early Macintoshes. Since these computers ran at 4 MHz (as compared to a couple of GHz nowadays), extracting the pitch-line of a minute of music took about one hour of processing. To get the best pitch-line it was paramount to fine-tune a number of parameters (e.g. range and type of voice), again extending the processing time enormously. And yet the sound-files were mainly monophonic aif files of 11 or 22 kHz with an 8-bit resolution.
Taking advantage of the rapid increase of processing power of personal computers and building on the experience of other programmers in the Netherlands, Paul Boersma and David Weenink started developing PRAAT in 1992. In the course of the years this has become the standard tool for many phoneticians and musicologists.
One of the major problems of all pitch extraction programs has been the messiness of pitch lines and the glitches (fig. 3). Pitch lines can look messy for several reasons:
- the recording has more than a single voice (accompaniment)
- the recording settings are bad (clipping, noise, very low level)
- there is echo (or sympathetic strings)
- there is wow and flutter
- the voice itself flutters a lot
- the program can’t handle the sound events
The fifth point, the fluttering of the voice itself, comes as something of a disappointment to many singers when they realize how irregular their pitch lines look. Possibly this is one of the reasons why Western singers use so much vibrato: it masks the roughness of the voice.
Fig. 3: Glitches in the middle portion of the audio signal
It is, however, the last point, the inability of the program to handle particular sound events, that we want to draw special attention to. Especially in fast movements with rapid pitch change, the algorithm for calculating the fundamental pitch often makes mistakes, mostly by an octave or a fifth. Over the years three factors have contributed to improving pitch lines: better recordings (multi-track recordings with strict separation of instruments), higher resolution (e.g. 44 kHz, 16 bit instead of 22 kHz, 8 bit) and better algorithms. Subsidiary to the improved algorithms is the fine-tuning of the analysis parameters (see http://www.fon.hum.uva.nl/praat/manual/Sound__To_Pitch__ac____.html).
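The octave and fifth errors are easy to understand from the algorithm itself. The following is a bare-bones autocorrelation pitch estimator in the same family as Praat's method, stripped of Praat's windowing and interpolation refinements; the floor and ceiling arguments play the same role as Praat's pitch floor and ceiling parameters:

```python
import numpy as np

def autocorrelation_pitch(frame, sample_rate, floor_hz=75.0, ceiling_hz=600.0):
    """Crude autocorrelation pitch estimate for one analysis frame.

    The true period shows up as a peak in the autocorrelation, but a
    peak at twice the period (an octave down) is almost as strong,
    which is why octave and fifth errors occur and why the floor and
    ceiling parameters must be tuned to the voice being analysed.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
    ac = ac / ac[0]                       # normalise so lag 0 equals 1
    lo = int(sample_rate / ceiling_hz)    # shortest admissible period
    hi = int(sample_rate / floor_hz)      # longest admissible period
    lag = lo + np.argmax(ac[lo:hi])       # best period within the search band
    return sample_rate / lag

sr = 44100
t = np.arange(int(0.04 * sr)) / sr        # one 40 ms analysis frame
frame = np.sin(2 * np.pi * 220 * t)
print(autocorrelation_pitch(frame, sr))   # close to 220 Hz
```

Raising the floor above 110 Hz here would exclude the octave-error candidate at twice the period altogether, which is exactly how fine-tuning the parameters cleans up the pitch line.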
As a result, we can now have very clear and detailed pitch lines that are a far cry from the vague images of the past (compare fig. 4 below to fig. 6 of the same fragment).
Fig. 4: Old-fashioned ‘sonogram’ (same sound file as in fig. 6)
Pitch perception and movement
The improved imagery of melodic movement however opens up a whole new can of worms. Take a look at the graph below (fig 5):
Fig. 5: Heavy gamakas, Parveen Sultana, raga Ahir Bhairav
The sargam would read SrmSrGmPSrSSSGmPDDP (with a lot of gamak). Evidently, as long as we were studying the steady, long drawn notes of a slow alap, the auditory and the visual experience seemed to match, but in the study of faster movements the connection becomes enigmatic.
Oscillation and vibrato
One of the leads in trying to solve this puzzle is to be found in the perception of vibrato. It was established in 1978 (Sundberg 1999: 201) that a vibrato is perceived as a steady pitch at the average of the upward and downward movement. In other words, although the pitch actually goes up and down (usually by a semitone to a whole tone on either side of the average), we perceive a steady pitch. For this to happen the vibrato has to move up and down at a certain rate, usually 6-10 movements per second. Obviously, if the vibrato were very fast, say 30 times per second, we would start hearing a low tone. On the other hand, if the vibrato is slowed down to about one or two movements per second, we clearly start distinguishing a rising and falling pitch, as in andol. (Some authors have erroneously thought that the perceived pitch of an andol would be the average of its pitches, or the average between the highest and the lowest point, e.g. Levy 1982: 89, 107-9, 163-4.) This can easily be demonstrated by slowing down a vibrato. See the following graph (fig. 6) for a typical Indian vibrato from second 4 to 5 (Western vibratos tend to have a larger amplitude).
Fig. 6A: Parveen Sultana: vibrato from sec 4 – 5.
Fig. 6B: The vibrato at the end in slow motion
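The averaging model can be expressed numerically: given a pitch track that oscillates around a centre at vibrato rate, the perceived pitch is, on this model, simply the mean of the track over whole cycles. A sketch with made-up vibrato parameters (a 7 Hz rate and a half-semitone excursion, both assumptions for illustration):

```python
import numpy as np

def perceived_vibrato_pitch(pitch_track_hz):
    """On the averaging model, the perceived pitch of a vibrato is
    the mean of the oscillating pitch track. This holds only at
    vibrato rates of roughly 6-10 movements per second; slower
    oscillations are heard as rising and falling pitch (andol)."""
    return float(np.mean(pitch_track_hz))

# A 7 Hz vibrato around 240 Hz, half a semitone (50 cents) either side.
rate_hz, centre_hz, depth_cents = 7.0, 240.0, 50.0
t = np.linspace(0.0, 1.0, 1000, endpoint=False)   # one second, whole cycles
cents = depth_cents * np.sin(2 * np.pi * rate_hz * t)
track = centre_hz * 2.0 ** (cents / 1200.0)       # cents back to Hz
print(perceived_vibrato_pitch(track))             # very close to 240 Hz
```

Note that the averaging is done over the pitch contour, not the waveform; the point of the examples below is precisely that this simple rule breaks down for directed fast movements.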
If we compare this to a typical andol (fig. 7) we notice that the oscillation takes place much more slowly, and as a result we actually hear the pitch creeping up and down, including the in-between movements.
Fig. 7: Uday Bhawalkar, Darbari kanada.
Being able to slow down a sound is a true marvel of modern computer programs. In the past we would slow down a piece of music by playing a tape at a lower speed, but that of course brought down the pitch proportionally, turning it into a very unnatural sound. There were some cassette recorders that could slow a recording down whilst maintaining the same pitch, but they produced an extremely choppy, stuttering sound. Even today, few computer programs produce a really acceptable sound quality (we have been using Amazing Slowdowner).
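The basic idea behind pitch-preserving slow-down can be sketched as a granular overlap-add: short windowed grains are read from the input at a small hop and written to the output at a larger one, so the signal lasts longer while each grain keeps its original waveform and hence its pitch. This is a naive illustration, not how Amazing Slowdowner works; its grain and hop sizes are arbitrary choices:

```python
import numpy as np

def ola_time_stretch(signal, factor, grain=2048, hop_out=1024):
    """Naive granular time-stretch: slow a signal down by `factor`
    without changing its pitch, by overlap-adding Hann-windowed
    grains read from the input at a proportionally smaller hop.
    Without grain alignment (WSOLA) or a phase vocoder this sounds
    slightly choppy, which is exactly the artefact of the early
    pitch-preserving cassette recorders described above."""
    signal = np.asarray(signal, dtype=float)
    hop_in = int(round(hop_out / factor))      # smaller read hop = slower output
    window = np.hanning(grain)
    n_grains = (len(signal) - grain) // hop_in + 1
    out = np.zeros((n_grains - 1) * hop_out + grain)
    for k in range(n_grains):
        g = signal[k * hop_in:k * hop_in + grain] * window
        out[k * hop_out:k * hop_out + grain] += g
    return out

sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 240 * t)        # one second at 240 Hz
slow = ola_time_stretch(tone, factor=2.0)
print(len(slow) / len(tone))              # roughly 2: twice as long, same pitch
```

The choppiness of naive implementations matters for our purposes: anything the stretching process adds or smears is one more layer between the graph and what we hear at reduced speed.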
The observation that we start hearing a fluctuating pitch when a vibrato is slowed down opens up yet another can of worms, for it proves that when we slow down the sound we hear things differently: we hear things that we did not hear at normal speed. Of course this was to be expected, although many people who transcribe music with the help of a slow-down system either seem not to know this or prefer to act as if they don't.
When we listen to the fragment shown in figure 8 at one-fourth the normal speed, we clearly notice how the gamakas lose all of their clarity. What is most intriguing, however, is whether there is any way to correlate the image with the perceived sound. We would expect the perceived pitches to be the average of a full sine-like movement, much as in the perception of vibrato. A close look at the gamakas in figure 8 shows, however, that the matter isn’t quite that simple:
Fig. 8A: The gamakas of fig. 5 analysed in relation to the perceived pitches (slowed down 1.5 x).
Fig. 8B: Fig 8A slowed down by a factor 2 (=3 x slower than the original)
It is immediately obvious here that the upward peaks (indicated by thick arrows) of m, P and D (twice) are NOT the average of a full sine-like movement but instead simply touch the intended note at their highest point. By contrast, some of the notes that are not at the extreme of an upward movement do seem to be the result of an averaging of the sine curve, see for instance the first S, the second r, the first G, etc. And most surprisingly, the note that comes before the extreme position conforms neither to the peak level nor to the average level; rather, it takes on an intermediary position.
Evidently, the direction in which the movement goes and the ‘thrust’ of the gamakas play a decisive role in the way we perceive the sequence of pitches. In the fragment of figure 9 this becomes very evident:
Fig. 9A: Kishori Amonkar, Miyan ki Malhar, m- mRSNSR-
Fig. 9B: Fig 9A slowed down by a factor 2
The rapid short tan has all the perceived notes in the upper section of the curves, and the structure of the movement could best be described as /m/R/S/N/S/R.
Various types of attacks on notes, such as kan and murki, also show that there is no steadfast rule for the perceived pitch in fast movements. A small fragment from a recording made at the NCPA by the talented young singer Yashaswi Sathe in the raga Kedar may serve as an example (fig. 10):
Fig. 10: Yashaswi Sathe: raga Kedar, kan and murki
The first point to be noted is that the large swing before PP is really perceived as an approach from below (written as “/”). Both the N and the D simply stand on the edges of the sine wave: there is no overshooting. In the following NSR there is only a very slight rising of the sines. After the extended S it gets very tricky. How the double murki is really perceived is a matter of opinion, but the perceived D’s (note position 9) are obviously not really touched by the dipping curve. Much more salient is the transition from N to P at the end, in which we can very clearly hear a D, whilst it is hardly visible!
Interpretation of computer graphs representing fast movements is certainly not self-evident. Perception studies dating back to the seventies suggest that the ear starts averaging rapidly oscillating pitches at speeds of about 6 movements per second. However, the examples cited in this paper clearly demonstrate that this model cannot be translated to fast movements like tan or murki. Here, on the one hand we clearly distinguish a series of discrete pitches, but on the other hand we can notice a considerable degree of ‘over- (or under-)shooting’ of pitches, especially when there is gamak.
Further, the pitch perception model that explains vibrato is not applicable to fast ascending and descending movements. Sometimes the average of the oscillating pitch is perceived, but in many other situations the extremes of the oscillation give the perceived pitch. Sometimes even the extreme does not really come close to the perceived note; instead it seems merely to ‘hint’ at it.
It is tempting to think of making a typology of ‘tonemes’ or gamakas: a classification of the smallest meaningful units of melodic movement with the oscillating effect of gamak. Such typologies have been made both in the past (e.g. in the Sangitaratnakara) and in recent times (e.g. Gopalam's 1991 study of South Indian gamakas), but evidently both were based on instrumental techniques (where one can “see” the actual technique in terms of fingering). It would be of tremendous interest to extend this to vocal techniques and the corresponding graphic representations.
The present study is no more than a preliminary incursion into the problems fast movements pose when we try to match visual representation and aural perception. Hopefully, in the near future, the compilation of a much greater body of data will help us to arrive at more meaningful formulations regarding such movements in contemporary Hindustani music.
Bel, Bernard, ‘Musical Acoustics: Beyond Levy’s “Intonation of Indian Music”’, ISTAR Newsletter no. 2, 1984, p. 7-12.
Bel, Bernard, ‘Pitch Perception and Pitch Extraction in Melodic Music’, ISTAR Newsletter no. 3-4, 1985, p. 54-59.
Boersma, Paul & Weenink, David (2005). Praat: doing phonetics by computer (Version 4.3.29) [Computer program 1992-2005]. Retrieved November 11, 2005, from http://www.praat.org/
Cohen, Dalia and Ruth Katz: ‘Melograph’, Grove Music Online ed. L. Macy (Accessed 15 November 2005), http://www.grovemusic.com
Deutsch, Diana (ed.), The Psychology of Music, 2d. ed., Academic Press, San Diego 1999.
Gopalam, Sharada, Facets of Notation in South Indian Music, Sundeep Prakashan, Delhi 1991.
Levy, Mark, Intonation in North Indian Music, New Delhi: Biblia Impex, 1982.
Moore, Michael, ‘The Seeger Melograph Model C’, in: Selected Reports in Ethnomusicology, Vol. II, No. 1, University of California, Los Angeles 1974, p. 2-12.
Meer, Wim van der, PitchXtractor [Computer program 1988-1991].
Meer, Wim van der, ‘Theory and Practice of Intonation in Hindusthani Music’, in: The Ratio Book, ed. C. Barlow, Köln:Feedback Papers, 2000, pp. 50-71.
Rao, Suvarnalata, Acoustical Perspective on Raga-Rasa Theory, New Delhi: Munshiram Manoharlal Pub. Pvt. Ltd., 2000.
Seeger, Charles, ‘An Instantaneous Music Notator’, Journal of the International Folk Music Council, Vol. 3 (1951), pp. 103-106.
Seeger, Charles, ‘Toward a Universal Music Sound-Writing for Musicology’, Journal of the International Folk Music Council, Vol. 9 (1957), pp. 63-66.
Seeger, Charles, ‘The Model “Melograph”. A Progress Report’, Journal of the International Folk Music Council, Vol. 14 (1962), p. 168
Sundberg, Johan, ‘The Perception of Singing’, in: Deutsch, Diana (ed.), The Psychology of Music, 2d. ed., Academic Press, San Diego 1999, pp. 171-214.