Interactive use of audio programmes

From WikiEducator


Interactive audio in education and training: Applications of CD-ROM

Interactive Audio refers to a technology involving the combined use of audio with the simultaneous (and synchronous) display of computer generated visuals, such as text, graphics and photographs. Being just one of many alternative approaches to Computer Aided Instruction (CAI), interactive audio occupies a position on the wide spectrum of educational technologies, somewhere between conventional CAI and interactive video (using videodisc). Unlike conventional CAI, which is silent, and videodiscs, which rely to a large extent upon the impact of high quality video, interactive audio depends mainly upon audio for conveying information. Audio can also be used to convey additional meaning, moods, and emotions through the intonations of voice and background sound effects. These three technologies (namely conventional CAI, interactive audio, and interactive video) each have a specific role to play in the tasks of teaching and learning. Knowing the attributes of all these approaches, and selecting the one which is most appropriate for a given situation, is a challenge for all producers of computer-controlled educational programs. This paper describes the development of interactive audio, the types of projects which are well suited to this technology, and some delivery systems suitable for interactive audio. Special reference is made to optical disc technologies, involving Compact Disc Read Only Memory (CD-ROM), and some recent projects involving the use of this technology are described.

Development of interactive audio

(a) Phase One

The first interactive audio projects described by Shaw (1987-A, 1987-B) used a computer-controlled tape player as the source of audio. Successful programs were developed with this system, but the limitations were: a. slow access time to specified points on the tape, and b. difficulties in reproducing the audio tapes. Although the content of the tapes could be copied accurately, the relative position of material on the copied tapes was never the same as on the master, and the tacho-counter of the tape player would consistently misinterpret the search or location commands from the computer.

(b) Phase Two

A commercially available voice card was then used to enable natural voice and other sounds to be digitised and recorded onto the hard disc of the computer. A variety of successful projects using this technology were developed for both educational and industrial applications (for example Shaw, 1988-E). These digitally recorded sounds were reproduced under program control, together with the simultaneous display of text and pictures on the computer monitor. Digital audio stored on magnetic disc enabled fast, reliable and "random" access to speech files and other audio files stored on the disc. Although the hardware component was inexpensive, the voice card being approximately $200, the audio consumed disc space at the rate of 4 kbytes per second. Typical programs containing, say, 30 minutes of audio therefore consumed at least 7 megabytes of the hard disc. This space consideration, the awkwardness of transferring large programs, and the relatively poor quality of audio reproduced through the voice card were all factors which led our development team to adopt an alternative approach, namely the use of optical disc technology.
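The storage figure quoted above can be checked with a short calculation. The following is only a sketch: it assumes that 1 kbyte is 1000 bytes and 1 megabyte is 1,000,000 bytes, and the function name is illustrative rather than part of any program described in this paper.

```python
def audio_storage_bytes(rate_bytes_per_sec: int, minutes: float) -> int:
    """Return the disc space used by audio recorded at a constant data rate."""
    return int(rate_bytes_per_sec * minutes * 60)


# 4 kbytes per second (assumed to mean 4000 bytes/s) for 30 minutes of audio.
size = audio_storage_bytes(4_000, 30)
print(size / 1_000_000, "megabytes")  # 7.2 megabytes
```

At this rate 30 minutes of speech occupies about 7.2 megabytes, which agrees with the "at least 7 megabytes" stated in the text.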

(c) Phase Three

The first CD-ROM pressed in Australia (January, 1988) featured an interactive audio program called "The Foetal Heart". This program was produced by the Centre for Research and Development at FIT, and the same group has since produced three more CD-ROM products featuring interactive audio. These programs, released at recent conferences (Shaw 1988-F, 1988-G), feature the following:

a. A Talking Dictionary of Medical Terminology: Unit One, Anatomy and Physiology

b. Spelling for Technologists: Unit One, General Vocabulary

c. Learning Japanese: Unit One, Basic Sounds (Romaji and Hiragana)

These programs also access a variety of photographs taken with the aid of a video camera, and interfaced with the computer through "frame-grabbing" facilities. The resulting images can be stored on hard disc or CD-ROM.

(d) Phase Four

Programs have also been developed in which sounds have been accessed through a combination of CD-ROM and the digital voice card. One important application of this dual technique involves teaching students how to pronounce basic sounds of the Japanese language. Authentic pronunciations of these sounds, by a native speaker of the language, are delivered to the student from a recently produced CD-ROM, while photographs of the corresponding symbols are also displayed on the monitor. The student is then asked to read the row of symbols currently displayed; these attempts are recorded through the voice card and temporarily stored in the computer's RAM. Next the "correct" basic sounds are replayed from the CD-ROM, followed immediately by a replay of the student's attempt, which is recalled from RAM and played back through the voice card. This process can be repeated by the student until a self-judged level of competency is achieved. The final student attempt is also stored uniquely on hard disc, so that a tutor can check on the student's progress at a later date. Combining the use of sounds and pictures from a CD-ROM, with sounds recorded through a voice card, offers a new and potentially effective way of learning foreign languages.
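The listen, record and compare cycle described above can be outlined as a simple loop. This is a hypothetical sketch only: the hardware operations (CD-ROM playback, voice card recording, hard disc storage) are passed in as ordinary functions, and every name here, including the `max_rounds` limit, is an assumption made for illustration and not the code used in the FIT programs.

```python
from typing import Callable, Optional


def practice_session(
    play_model: Callable[[], None],        # play the native speaker's sounds (CD-ROM)
    record_attempt: Callable[[], bytes],   # record the student via the voice card
    play_attempt: Callable[[bytes], None], # replay the student's recording
    is_satisfied: Callable[[], bool],      # the student's own judgement
    save_final: Callable[[bytes], None],   # keep the last attempt for the tutor
    max_rounds: int = 10,                  # assumed safety limit, not from the paper
) -> Optional[bytes]:
    attempt = None
    for _ in range(max_rounds):
        play_model()                 # model pronunciation, symbols on screen
        attempt = record_attempt()   # attempt is held in memory only
        play_model()                 # "correct" sounds replayed
        play_attempt(attempt)        # followed immediately by the student's attempt
        if is_satisfied():
            break
    if attempt is not None:
        save_final(attempt)          # only the final attempt goes to hard disc
    return attempt


# Stand-in functions so the sketch can be run without any audio hardware.
log = []
attempts = iter([b"try-1", b"try-2"])
verdicts = iter([False, True])
final = practice_session(
    play_model=lambda: log.append("model"),
    record_attempt=lambda: next(attempts),
    play_attempt=lambda a: log.append(a),
    is_satisfied=lambda: next(verdicts),
    save_final=lambda a: log.append(("saved", a)),
)
```

In this run the student repeats the exercise twice, and only the second recording is passed to `save_final`.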

Attributes of CD-ROM

As a delivery system for interactive audio, CD-ROM technology offers a variety of significant advantages, some of which are outlined below:

(a) Capacity and audio quality

A compact disc can store 72 minutes of high fidelity audio, 144 minutes of FM quality audio, or 4 hours of AM quality audio. Sampling rates in the analogue to digital conversion process specify the bandwidth of the recording, the density of data, and ultimately the quality of audio at the time of its reproduction. If even lower sampling rates are used, such as the 4 kbyte/sec used by the voice card described earlier, then at least 38 hours of audio could be stored on one disc.
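The figure of 38 hours can be reproduced approximately by dividing disc capacity by the data rate. The sketch below assumes the 550 megabyte capacity quoted in the following section, and treats 1 megabyte as 1,000,000 bytes and 4 kbyte/sec as 4000 bytes per second; the function name is illustrative.

```python
def playing_time_hours(capacity_bytes: int, rate_bytes_per_sec: int) -> float:
    """Return the hours of audio that fit on a disc at a constant data rate."""
    return capacity_bytes / rate_bytes_per_sec / 3600


hours = playing_time_hours(550_000_000, 4_000)
print(f"{hours:.1f} hours")  # 38.2 hours
```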

(b) Sharing space - audio, photographs and text on CD-ROM

Rather than completely filling a CD-ROM disc with digital audio, as indicated in the previous paragraph, the disc can be partitioned so that the total available memory of 550 megabytes is shared between audio, text files, numerical databases, and digitised photographs (colour or black and white). Text and numerical databases are the typical contents of most CD-ROMs, and these data types account for the great majority of discs commercially available. However, the latest programs produced by FIT contain a mixture of both audio and a database of photographs. The proportion of disc space allocated to each medium is simply a function of individual project requirements. For example, the CD-ROM called "A Talking Dictionary of Medical Terminology" contains mostly audio, with a companion set of 80 photographs used to enhance certain explanations within the lesson. On the other hand, a project under development for the Museum requires 8,000 photographs to be archived onto one disc, with a short audio description associated with each picture.

As in the case of digitised audio, photographs can be digitised with varying degrees of resolution, resulting in picture files which have a considerable range in memory storage requirements. For example, a photograph which has a resolution of 512 x 512 pixels on a computer monitor and uses a maximum of 24 bits for colour, and a further 8 bits for graphics overlays, would require at least one megabyte of memory. On the other hand, a photograph which has a resolution of only 256 x 200 pixels and uses 256 colours or shades of grey would produce a file size of only 51 kbyte. Compression algorithms can further reduce the size of these picture files. Compression factors varying from 5 to 100 or more are possible, depending upon the complexity of the original picture and the particular algorithm used to reduce the file size.

Producers should, however, be aware that accessing a picture file from CD-ROM or from hard disc requires a finite time, which is further increased if a compressed file needs to be decompressed prior to display. During the presentation of this paper, the display of colour and black and white photographs, which use different amounts of memory, will be demonstrated. These uncompressed pictures are called from disc and displayed within one or two seconds.
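The picture file sizes quoted in this section follow directly from the resolution and the number of bits stored per pixel. The sketch below is illustrative only; it assumes uncompressed storage with no file header, and that 256 colours or grey shades correspond to 8 bits per pixel.

```python
def image_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Return the size of an uncompressed image, ignoring any file header."""
    return width * height * bits_per_pixel // 8


large = image_bytes(512, 512, 24 + 8)  # 24 colour bits plus 8 overlay bits
small = image_bytes(256, 200, 8)       # 256 colours or shades of grey

print(large)  # 1048576 bytes, i.e. one megabyte
print(small)  # 51200 bytes, i.e. about 51 kbyte
```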

(c) Reliability, cost and production considerations

Laser read optical devices, such as CD-ROM and videodiscs, are remarkably reliable as delivery systems. The accuracy and speed of searching for particular data on these discs is a major attribute worth re-emphasising. Other factors, such as cost of production and delivery hardware, are also relevant when evaluating the available technologies. A comparison between interactive audio and interactive video was presented by Shaw (1987-D) using a set of criteria involving, among other things, cost, ease of production and overall production time. Our three most recent CD-ROM projects have taken an average of approximately four weeks each to produce. Rather than debating the merits of one technique over another, it is more appropriate to emphasise an earlier comment, namely that producers and educational technologists must decide which technique is appropriate for the particular task in hand. Matching the attributes of a technology with the project objectives must always remain a prime consideration.

(d) Pseudo video using CD-ROM

Distinguishing between interactive audio and interactive video simply on the basis of video capability is no longer an easy process. It is now possible to incorporate simple video-style motion using still-frame photographs stored on hard disc or small format optical discs like CD-ROM. The announcement of recent display systems referred to as CD-I (Compact Disc Interactive) and DVI (Digital Video Interactive) means that photographic images can now be stored and recalled from discs, other than videodiscs, with sufficient speed to satisfy the requirements of "video". Sophisticated algorithms which compress, and later reconstruct, the images have been developed elsewhere, and educational technologists await the introduction of these techniques. In the meantime, the Centre for Research and Development at FIT has created pseudo video from a series of photographic images in some experimental projects. Earlier it was stated that digitised photographs, which fill most of the available space on screen, can be displayed in about one second. It follows that a smaller section of that image (say 50 x 40 pixels) can be "refreshed" at a much faster rate, thereby raising the possibility of pseudo video or "pixilation" within part of the existing image. At least two possible methods of generating this video effect can be described as follows:

a. Two sequential images are compared, and only the differences are used to quickly update the currently displayed image.

b. A known or defined area of the photograph is specified, and sequential pictures refresh only this particular section of the image (without checking for changes which could occur elsewhere in the image).

So far most of our experimental projects have used the latter technique, and some useful simulations of video have been produced. For example, a micrometer was photographed numerous times, with slight alterations of the position of the barrel and jaws occurring between each photograph. When operating in the replay mode, the program displays the first photograph completely, but subsequent images refresh only the barrel and jaws of the micrometer. The resulting effect is the appearance of a micrometer being operated as if photographed under normal video conditions. In another application, photographs of the Japanese characters (Hiragana) are displayed on the computer monitor. Sequential photographs allow each character to be written to screen step by step, thereby showing the stroke order used when writing these complicated characters. In this example different (but predetermined) areas of the photographs are refreshed to create the illusion of video.
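The two refresh methods listed above can be contrasted in a small sketch. It is not the code used in the experimental projects: frames are modelled here as plain lists of pixel rows, and the function names and the (left, top, width, height) region format are assumptions made for illustration.

```python
from typing import List, Tuple

Frame = List[List[int]]  # rows of pixel values


def diff_update(screen: Frame, nxt: Frame) -> int:
    """Method (a): compare two images and write only the pixels that differ."""
    written = 0
    for y, row in enumerate(nxt):
        for x, value in enumerate(row):
            if screen[y][x] != value:
                screen[y][x] = value
                written += 1
    return written


def region_update(screen: Frame, nxt: Frame, region: Tuple[int, int, int, int]) -> int:
    """Method (b): overwrite a predefined rectangle without any comparison."""
    left, top, width, height = region
    for y in range(top, top + height):
        screen[y][left:left + width] = nxt[y][left:left + width]
    return width * height


# A 6 x 4 pixel picture in which two pixels change inside the region
# (1, 1, 2, 2) and one pixel changes outside it.
first = [[0] * 6 for _ in range(4)]
second = [row[:] for row in first]
second[1][1] = 9
second[2][2] = 9
second[0][5] = 7

screen_a = [row[:] for row in first]
screen_b = [row[:] for row in first]
print(diff_update(screen_a, second))                    # 3 pixels written
print(region_update(screen_b, second, (1, 1, 2, 2)))    # 4 pixels written
print(screen_b[0][5])                                   # 0, change outside region is missed
```

Method (b) avoids the cost of comparing every pixel, which suits subjects such as the micrometer where the moving area is known in advance, but it ignores any change that falls outside the chosen region.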

Relevant project types

Consistent with the view that one particular technology should not be forced upon any or every project, this concluding section nominates some types of projects which are particularly well suited to Interactive Audio (as well as to the CD-ROM delivery system). Projects in which audio is critical to learning a particular skill have been identified in medical education (especially Nurse Education) where students endeavour to learn sounds associated with particular body functions. For example, listening to and counting the foetal heart beat, and determining if the foetus is healthy, was the focus of an early program. Similarly, the measurement of blood pressure has been successfully taught with the aid of an interactive audio program (Shaw and Spratling, 1987-E). Learning any foreign language (or learning English as a second language) is a field particularly well suited to interactive audio. Authentic sounds and the corresponding symbols can be delivered to students (from the CD-ROM), and student responses can be recorded through a digital voice card. Fast response times, and the option to hear repetitions of a word or sentence, can encourage the student to practise the pronunciations of new or difficult words. Learning to spell difficult English words, or simply words which appear phonetically similar (such as affect and effect), is another task well suited to interactive audio. Even a talking dictionary, which explains and pronounces difficult jargon, is an appropriate task for interactive audio.

(2) Types of computer programs for self-study; their use and limitations:

Type 1. Vocabulary with text, sound and illustrations: knowledge of the script is required, sounds may be difficult to imitate, and illustrations may not convey the correct sense.

Type 2. Interactive programs for grammar: these can only be of the yes/no selection type in different formats, and it is difficult to deviate from the standard answer in a social context.

Type 3. Speech recognition with built-in pronunciation of some words and a facility to compare the user's voice with a model: accuracy is questionable, the built-in voice may be difficult to imitate, and wrong clues can misguide the learner.

Type 4. Listening and viewing material: useful for enriching soft skills.

Limitation of self-study: Such material gives no opportunity for real-life interaction with other individuals. As a result, diffidence towards the new language persists and fluency cannot be achieved easily. All such self-learning programs are suitable for beginners, to expose them to a new language, and for business executives who do not have time for detailed study.

Classical method:

In the classical method of learning, vocabulary is acquired step by step along with syntax and grammar under a teacher’s guidance. This takes time but the quality of learning is superior.


This paper has outlined some of the recent developments in interactive audio, and has described how CD-ROM can be used as an effective and efficient delivery system for the process. Enhancement of interactive audio programs through the display of photographs was described, and some consideration was given to methods of using such photographs to generate an effect similar to video.