Design and Construction of a Virtual Environment
for Japanese Language Instruction

by Howard Rose



Chapter II
Review of the Literature on Language Learning

Alternative Methods of Language Instruction

Over the last decade there has been an explosion of interest in learning Japanese in the United States. Japanese is now being offered not only at the post-secondary level, but also in many high schools, junior high schools and even elementary schools across the country. The Modern Language Association reports a 94.9% increase (from 23,454 to 45,717) in college students studying Japanese between 1986 and 1990 (cited in Aida, 1994, p.155). Current programs range from a progressive, immersion style of instruction in a Eugene, Oregon elementary school, to the nationwide STAR satellite system, which brings Japanese instruction to over 10,000 public school students.
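The reported percentage can be checked directly from the two enrollment figures. The following is a minimal sketch in Python; the function name percent_increase is an illustrative assumption and does not come from the cited report.

```python
def percent_increase(before: int, after: int) -> float:
    """Return the percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


# Enrollment figures quoted above (cited in Aida, 1994, p.155).
enrollment_1986 = 23_454
enrollment_1990 = 45_717

increase = percent_increase(enrollment_1986, enrollment_1990)
print(f"{increase:.1f}% increase")
```

Rounded to one decimal place, the result agrees with the 94.9% figure in the text.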

The United States government established the STAR program in 1986 to send satellite Japanese courses into public schools unable to afford to hire their own instructors. The program has grown to reach students nationwide and now emanates from two sites: Spokane, Washington and Lincoln, Nebraska. Students in the STAR program have access to instructors and tutors via a WATS telephone line, and once or twice a semester they are linked directly to the instructor on the live broadcast via a two-way audio connection. These programs have greatly increased public access to Japanese language instruction. But access to instruction does not solve all the would-be Japanese speaker's woes.

Unfortunately, learning Japanese and other non-Indo-European languages can be very difficult for Americans. One research study cited attrition rates as high as 80% occurring in Japanese language programs on a regular basis (Mills, Samuels & Sherwood, 1987, cited in Samimy & Tabuse, 1992, p.390). For comparison, while American students require an average of 720 hours of instruction to reach oral skills proficiency level 3 in French or Spanish, from 2,400 to 2,760 hours are required to achieve the same fluency in Japanese or Chinese (Liskin-Gasparro, 1982, cited in Samimy & Tabuse, 1992, p. 390). Because the general instructional approaches for teaching Japanese are not significantly different from those used in teaching Romance languages, it would be difficult to attribute the gap between these learning times wholly to pedagogy. We must remember that Japanese and English do not share common linguistic roots and have dramatically different writing systems, and that the average American has only limited exposure to the Japanese language. However, there may be pedagogical solutions which could at least mitigate some of the significant impediments preventing many American students from succeeding at Japanese.

A number of studies (Samimy & Tabuse, 1992; Lai, 1994; Aida, 1994; Ganschow et al., 1994; Horwitz et al., 1986) have indicated a strong relationship between student attitudes toward learning a foreign language and student performance. Samimy and Tabuse (1992) studied the effect of attitudinal and learner characteristic variables on students of Japanese. Their study surveyed 70 beginning Japanese students in a midwestern university: 58 undergraduates, 8 graduates and 4 others. The experimental group included 36 females and 34 males who took 50-minute classes 5 times each week over an entire year. Instruction was carried out with the audio-lingual approach, stressing teacher-centered oral drill and practice and a strong emphasis on explicit grammatical instruction. An attitude questionnaire was designed to measure (a) three situation affective variables (risk taking, sociability, discomfort), (b) motivational types and strength of motivation, (c) attitude, (d) concern for grades, and (e) students' personal background with Japanese. The study concluded that motivation and attitudinal factors are critical in predicting success. Classroom behavior factors, risk taking, and discomfort were also found to be determinants of students' final grades. The study also noted a significant decrease in student motivation from fall to spring.

A similar correlation between attitude and success was found in a related study of students of English in Hong Kong. Lai (1994) studied learner confidence levels in using English and the factors which accounted for variances. Lai gave a questionnaire to a sample of 487 Form Four students from eleven secondary schools in Hong Kong. Three were government schools, six government subsidized schools and two private schools. The overall population of students in the study ranged in academic achievement from the 99th to 46th percentile. An English teacher in each school selected one "average" class in Form Four to complete the questionnaire. The questionnaire focused on five aspects of communication including student perception of patterns, opportunities for communication in and out of class, and student perception of self-confidence in expression.

The results of the survey paint a rather negative picture of language classroom environments. Students spent the majority of their time passively listening to their teacher, and rarely had the opportunity to speak. Nearly half (46.4%) said they had never or seldom given long answers to teachers' questions. In addition, 73% of students reported that they communicated infrequently with their peers in English. Lai found that the majority of learners lacked confidence in communicating in English in the classroom, which she attributes to poor self-esteem, anxiety, and unfavorable patterns and opportunities for classroom communication. These findings may be attributed at least partially to artifacts of cultural norms and student/teacher relationships within Chinese society, rather than the difficulties of learning English, a language very different from Chinese. Even so, the teacher-centered audio-lingual method is still in wide use in the United States, and does not vary significantly from the instructional methods described in Lai's study.

Horwitz et al. (1986) have also researched the general effect of attitude on performance among foreign language learners. They hypothesize that anxiety may be seen as a combination of three components: (a) apprehension or shyness about communicating with others, (b) fear of negative social evaluation in various situations speaking a second language, and (c) fear of test failure. Horwitz et al. (1986) developed the Foreign Language Classroom Anxiety Scale (FLCAS) to measure student self-esteem and attitudes towards learning a second language. The FLCAS has been demonstrated to have a significant correlation with individuals' language success under a number of conditions (Williams, 1991; Ganschow et al., 1994; Aida, 1994). For example, Aida (1994) reports test-retest reliability levels for the FLCAS used with Japanese language students (r = .80, p < .01) which very closely match those reported by Horwitz et al. (1986) using the FLCAS with Spanish language students (r = .83, p < .01).

The effectiveness of grammar-translation methods of language pedagogy has come increasingly into question since the 1960's. The early 1950's brought the rise of the audio-lingual method, characterized by listen and repeat drills and the familiar "language lab" approach of programmed instruction. When put into practice, audio-lingual teaching methods often degenerated into mere grammar and translation exercises (Hammond, 1988). As a rejection of both grammar-translation and the audio-lingual method, a number of new approaches to language instruction have been pioneered with an emphasis on communicative competency. One well known example is Asher's Total Physical Response (TPR) strategy (Asher et al., 1974; Asher, 1966; Kunihira & Asher, 1965). TPR is a direct assimilation method in which the meaning of the target language is conveyed through physical demonstration. TPR does not use any form of translation into the first language. TPR studies report improved understanding, attitude and retention when students take a physically active role in learning language.

Kunihira and Asher (1965) tested the effect of the total physical response strategy on retention with 88 volunteer college students. Students were chosen on the basis of having no prior Japanese training, no foreign language fluency and not being language majors. All volunteers took the Modern Language Aptitude Test (MLAT), and a mental ability test selected from the American College Testing program. The tests were administered before the study to ensure homogeneous groups. The selected students were randomly assigned to 4 groups of 10 males and 12 females each. Only 67 of the 88 students completed the training.

One of the four groups received the TPR treatment; the other three received three different control treatments. The TPR group listened to a tape and mimicked the actions of the instructor. There was no translation into English. In the first, eight-minute session, students heard single words such as "Sit. Run. Walk." and responded by acting out those commands. After the eight minutes of training, subjects were individually given a retention test. Twenty-four hours later they were retested and then received ten-and-a-half minutes more training. The second session used more complex language at this level: "Walk to the door and then run to the chair." During the ten-and-a-half minute training the students responded to about 40 different utterances and retention was tested at the end of the session. During the third session of seven-and-a-half minutes, complexity was expanded: "Walk to the desk and put down the pencil and book. Pick up the paper, book and pencil and sit on the chair." Students' retention was measured with a comprehension test where they physically responded to 16 different utterances. Two weeks later, retention was tested again. Only 16 of the initial 22 students completed TPR training (final N=16).

Each of the control groups experienced similar training with the following exceptions: C1 (final N=15) listened to the same tape as the experimental group but did not execute a physical response; they merely watched the instructor. C2 (final N=18) sat and heard an English translation after each Japanese phrase. C3 (final N=18) sat and heard the Japanese but read an English translation after each Japanese utterance.
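The enrollment and completion figures reported for the four groups can be cross-checked against one another. The sketch below is only a bookkeeping aid written in Python; the variable names are illustrative assumptions, and the numbers are those quoted above from Kunihira and Asher (1965).

```python
# Design figures as summarized in the text above.
GROUPS = 4
MALES_PER_GROUP = 10
FEMALES_PER_GROUP = 12

# Final N reported for each group after attrition.
final_n = {"TPR": 16, "C1": 15, "C2": 18, "C3": 18}

enrolled_per_group = MALES_PER_GROUP + FEMALES_PER_GROUP
total_enrolled = GROUPS * enrolled_per_group
total_completed = sum(final_n.values())

# Share of each group's starting members who finished the training.
completion_rate = {group: n / enrolled_per_group for group, n in final_n.items()}

print(f"enrolled: {total_enrolled}, completed: {total_completed}")
for group, rate in completion_rate.items():
    print(f"{group}: {rate:.0%} completed")
```

The totals match the 88 volunteers and 67 completers stated earlier, and the per-group rates show that attrition was spread across all four groups rather than confined to the TPR treatment.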

All testing of the experimental group required students to respond to Japanese commands by acting them out physically. The only exception was on the final test, after two weeks with no training, where experimental group subjects were required to write their responses in English rather than respond physically. All control group tests required students to write their answers in English.

Mean performance scores were compared among treatment groups. A one-tailed t-test was performed for each classification of utterance: single-word, short, long, and novel. The results showed the experimental group scoring significantly higher than all the control groups on nearly all retention tests of short, long and novel utterances. This was true for all four levels of language complexity. Interestingly, control groups did not show significant differences in retention amongst themselves.

As a secondary observation, Kunihira and Asher noted that the MLAT and ACT tests were poor predictors of performance for the experimental group, but correlated closely with control group performance. Also, the scores of experimental group members tended to cluster compactly near the maximum, rather than showing the wide variance in performance found among the control groups. Retention in the experimental group was significantly higher than in all three other groups. Kunihira and Asher noted that TPR seemed most effective in more difficult areas of speech: long and novel utterances. The authors imply that TPR instruction results in more flexible links between the input students receive and their own output. These claims are not empirically substantiated.

In summary, the total physical response strategy is based on the following premises: 1) coupling physical activity with commands facilitates direct assimilation of language (Asher et al., 1974); 2) this direct approach to language assimilation seems to facilitate long term retention (Kunihira & Asher, 1965). Secondary benefits include higher-level performance by more students and reports of improved confidence and attitude toward foreign language learning (Kunihira & Asher, 1965). More research is necessary, however, to substantiate these claims.

Asher's initial findings have been supported by a small group of studies (Wolge & Jones, 1982; Asher et al., 1974; Asher, 1976; Hammond, 1988; Gary, 1975) using subjects of different age ranges and modifying various testing parameters.

Omaggio (1986) criticizes TPR's extensive use of imperatives as being limited to what is easily reproduced and as relying too heavily on the bizarre and fun nature of the activity. Omaggio warns that unless TPR is supplemented with other types of practice, students will have little opportunity to internalize natural language used for authentic purposes. Baltra (1992) points out that many adults may find TPR activities too degrading or improper to participate in. Subsequent theorists have modified Asher's methods to try to capitalize on the obvious strengths of TPR. I also note that I have yet to see a complete description applying TPR to more advanced language learning. It would seem that while a TPR approach can have benefits for beginning students, students will probably outgrow this strategy eventually.

Terrell's Natural Approach (NA) is an attempt to build a more generalizable teaching method on the foundation of TPR and communicative competency (Terrell, 1986). Terrell adopted Asher's TPR techniques because he found them effective, particularly in the early stages of language learning. NA describes three stages of language acquisition: comprehension (preproduction), early speech (one-word responses) and speech emergence (sentence production). Thus NA, like TPR, recognizes the need for a "silent period" of delayed oral practice, where students absorb language without the stress of audio-lingual-type listen and repeat drills. The proposed merit of a silent period is supported by other researchers (Mangubhai, 1991; Atherton, 1993; Gary, 1975; Winitz & Reeds, 1973).

Terrell's approach is a comprehensive curriculum of communicative games, such as role plays or solving puzzles, which inspire students to communicate in the new language. Speech is motivated by the task and the environment, as opposed to the listen-repeat drills of audio-lingual teaching. Communicative activities in NA are designed to help students develop concrete associations between experience-based meaning and linguistic forms (Terrell, 1986). Terrell explicitly intended that NA should reduce the psychological tension and anxiety experienced by beginning language learners. He stressed the need to make language learning enjoyable in order to diminish the stress.

Gary (1975) conducted a study which supports the instructional effectiveness of delayed oral practice in the initial stages of second language learning. Gary taught 85 25-minute Spanish lessons to 50 lower elementary school English speakers over a five-month period. While both groups experienced the same amount of listening practice, oral practice was present for the control group from the first day of instruction throughout the experiment. Oral practice was totally absent in the experimental group during the first 14 weeks (Phase 1) and in the first half of the daily lessons for the final seven weeks (Phase 2). Student performance was measured by tests of comprehension, oral production and attitude given daily to each group, and administered to each student at the end of the 14th and 22nd week. Analysis of test scores using a one-tailed sign test showed the experimental group excelled in comprehension of both commands and questions. The experimental group also scored slightly higher on oral production tests, though the differences were not statistically significant. The results imply that delayed oral practice benefits listening comprehension, and is at least as effective in developing oral production as methods which emphasize speaking from the outset. Gary also found that students in the experimental group reported less anxiety on attitude surveys.
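For readers unfamiliar with the statistic, a one-tailed sign test asks how likely it is that one group would come out ahead in at least the observed number of paired comparisons if each comparison were a fair coin flip. The sketch below illustrates the general technique in Python; it is not a reconstruction of Gary's analysis. The function name and the example counts are hypothetical, and how Gary formed the comparisons is not described here.

```python
from math import comb


def sign_test_one_tailed(n_plus: int, n_minus: int) -> float:
    """Return P(X >= n_plus) for X ~ Binomial(n_plus + n_minus, 0.5).

    n_plus counts comparisons favoring the experimental group and
    n_minus those favoring the control group. Ties are assumed to
    have been dropped before counting.
    """
    n = n_plus + n_minus
    favorable = sum(comb(n, k) for k in range(n_plus, n + 1))
    return favorable / 2 ** n


# Hypothetical counts for illustration only (NOT data from Gary, 1975):
# 8 comparisons favor the experimental group, 2 favor the control group.
p_value = sign_test_one_tailed(8, 2)
print(f"one-tailed p = {p_value:.4f}")
```

Because the test uses only the direction of each difference and not its size, it makes few assumptions about the score distributions, but it is also less sensitive than tests which use the magnitudes.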

Unfortunately, Gary's study lacks sufficient academic rigor for its results to be generalized widely. First, there was no attempt to assess the base knowledge or characteristics of the learners at the outset. Second, there is not enough information about the instructional or evaluation methods to draw any firm conclusions. It is regrettable that a fundamental question such as this has received such sparse attention from language researchers.

Approaches which emphasize communicative competency such as TPR and the NA have attracted significant criticism for their dismissal of explicit grammar instruction. Higgs and Clifford's (1982) article warning about the danger of "fossilized language" among students in competency programs is still widely cited. Higgs and Clifford claim that programs which overly stress communication do so at the cost of linguistic proficiency. The result is that many students emerge with large vocabularies but poor grammar execution. They assert that after about four semesters of instruction these grammatical errors fossilize and actually become incurable, no matter how much subsequent instruction the student receives. Higgs and Clifford refer to this as the "2/2+ syndrome", because the students will never surpass level 2 competency. It is important to remember, however, that Higgs and Clifford present absolutely no data to substantiate their claims.

Hammond (1988) attempted to clarify the controversy over linguistic and communicative instructional methods by conducting a study at two universities over four semesters with 60 sections of first-semester Spanish students. Eight sections of an experimental group received instruction according to the NA; 52 control group sections were taught according to a modified grammar-translation methodology which included deductive grammar instruction. No attempts were made to control socio- or psycho-linguistic variables of the student groups. All students took the same standard mid-term and final exams. Comparison of both mid-term and final test scores showed that the experimental group had higher mean scores than the control group for both tests over all four semesters. Hammond claims his results indicate a consistent, statistically significant advantage for the communicative approach that is conclusive enough to refute Higgs and Clifford's argument. However, students in the experimental groups were required to complete the same written translation homework assignments as the control group. Thus Hammond's results are confounded by students receiving a somewhat arbitrary amount of traditional instruction. These exercises were apparently not modified to fit the communicative approach. Unfortunately, Hammond's results are not as conclusive as one might hope and offer insufficient evidence to prove or disprove Higgs and Clifford at this time (Celce-Murcia & Hiles, 1988). The controversy between grammatical and communicative approaches continues today.

In summary, communicative competency approaches, such as TPR and the NA, seem to be effective alternatives to the grammar-translation method. TPR and NA derive both their strengths and limitations from a common reliance on physical activity and demonstration. While physical rehearsal does seem to help students both understand and recall language, the severe restrictions on what can be demonstrated, experienced and responded to within a typical language classroom have greatly limited the applicability and proliferation of these methods, particularly TPR. Unfortunately, the debate over the value of explicit grammar instruction has yet to produce a consensus. There is a great need for more solid research in this area to settle such controversies.

The dynamic, kinesthetic nature of virtual reality offers the opportunity to build on the successes of TPR and the NA, without the limitations of the physical classroom. A virtual language learning environment would make a TPR classroom virtually limitless, and far more complex and intellectually appealing than being repeatedly told to "Sit down" and "Stand up." Virtual reality could also become the testbed for settling long-standing controversies around communicative competency methods by enabling a highly controllable and consistent experiential learning environment.

Virtual Environments in Education

Virtual environments, or virtual reality (VR), can be understood as a logical evolution from a long line of computer technologies. Although there is a significant history of computer applications for the purpose of language instruction, no record is available of an immersive virtual reality system in current use or even under development. Taylor (1992), however, gives a forward looking account of the potential uses of VR in language learning. Taylor suggests that virtual reality will be a powerful opportunity for students to create their own educational virtual environments, and specifically mentions the potential for the technology to extend the benefits of TPR (Taylor, 1992, p.72).

Winograd's SHRDLU (1972) was an early attempt at creating an interactive computer environment before the coming of virtual reality technology. SHRDLU is an artificial world where geometric objects are displayed on a computer monitor, and can be manipulated by a user typing in textual commands. For example, the world was programmed to respond to commands such as, "Put the red cube on top of the blue cube." While SHRDLU was an important example of early artificial intelligence programming, the system lacked a method for manipulating objects in a direct and natural fashion. Textual command input is far too limited and presents too many linguistic barriers to language learners, particularly beginners.

Building on Winograd's beginnings, John Higgins (1985) developed the John and Mary/Grammarland programs. Higgins created structured environments in which the user can manipulate and interact with graphic characters, John and Mary, on screen. Higgins' goal was to create a system capable of exchanging meaningful messages in natural English with a student, to see whether this could aid language acquisition. His dialog-stimulator programs generate questions, find answers to questions, obey commands, and can assimilate new knowledge. Unfortunately, Higgins appears not to have conducted extensive follow-up research based on Grammarland which would indicate whether this approach is instructionally beneficial.

This past decade has seen great attention to and activity in developing instructional multimedia software for personal computers. This development reflects the current capabilities of the technology to deliver audio, video and textual information on a single machine. A range of multimedia programs for teaching foreign languages, including Japanese, is commercially available. Current programs cover a range of areas including conversation, reading Japanese characters and cultural awareness. There are no commercially available software programs which incorporate immersive technologies such as virtual reality.

One group of programs exploits sound digitizers and voice signal displays. These systems display visual representations of the student's speech patterns and allow the student to compare them with those of model native speakers. Such displays have long been applied in the fields of speech pathology and phonetics and are becoming more widely applied in foreign language teaching. Molholt (Molholt et al., 1988; Molholt, 1988) has written extensive articles documenting his application of digitized sound and computer displays to teach pronunciation of individual sounds (segmental features) as well as sentence patterns (suprasegmental features). Molholt concludes that real-time, spectrographic displays of native speakers are less frustrating and more productive than conventional methods of correcting students' pronunciation (Molholt, 1990). Proponents of the visualization technology cite research indicating that visualization of intonation patterns significantly enhances judgment and pattern recognition (Leon & Martin, 1972). Other studies indicate that visualization helps students recreate proper intonation (James, 1976), and that audio/visual feedback is superior to audio-only presentation (de Bot, 1980). Thus there is evidence which indicates that voice recognition systems are helpful when they provide direct feedback and support for the learner.

Such findings suggest that the voice training process used in Zengo Sayu potentially provides valuable directed feedback and support for learners. For example, as students train the computer to understand their voices, they are presented with a visual display of the sound wave they produce. Unfortunately, limited resources and access to software prevented development of a fully functional, robust voice recognition system in the current version of Zengo Sayu. Therefore, much of the work which could feasibly be incorporated into the computer system is still carried out by a teacher monitoring the voice training process. In the future, audio support systems could be built into the computer system to enable students to identify and correct their mistakes in real time from within the virtual environment.

Another genre of multimedia application for language learning is the interactive narrative. A la Rencontre de Philippe, developed at the Massachusetts Institute of Technology, is one of the better examples of this genre, using a laser disk to deliver high quality audio and moving video. The story line of the program places the student in Paris with the task of helping Philippe find a new apartment, and focuses the learner on completing specific tasks along the route. The five possible endings allow students to plot their own way through the story line and the linguistic content. The task-based nature of the plot requires students to interact with and process the information they are presented.

Murray (1990) mentions A la Rencontre de Philippe as one of the few videodiscs which have been evaluated in classroom use. She notes that different students use the learning materials in significantly different ways according to their own learning styles. Murray divides students into two groups by learning style: 1) look-before-leap learners, who make intense, painstaking use of online help before making any moves, and 2) leap-before-look learners, who use online help only minimally and prefer to work by trial and error and guessing. Murray raises the important point that "future videodisc development should encourage more of the student to follow the leap then look pattern -- to try to master language in context and to be comfortable with less than complete comprehension in order to concentrate on using language in a goal-centered way." (Murray, 1990, p. 12). While Murray seems to offer this suggestion based more on intuition than on empirical data, her sentiments capture the essence of a larger trend which applies interactive technology according to the communicative language teaching approach found in Terrell's Natural Approach.

An example of a desktop multimedia application developed specifically for Japanese language teaching is Nihongo Partner, developed by the Technical Japanese Program at the University of Washington (Kato & Rose, 1994). Nihongo Partner allows students to access digitized video clips which illustrate specific verbal and nonverbal points of language and culture. One unique aspect of the program is that students can view each conversation from either a third person perspective, or from the first person perspective of each of the speakers. Thus one goal of Nihongo Partner is to allow the student to experience the conversation firsthand, to the extent that this is possible on a computer monitor. Nihongo Partner also allows students to record their own voices and compare their speaking with the model. One limitation of this approach is that it requires students to monitor and correct their own speaking errors, which not all students may be capable of doing.

While each of the programs presented above has its unique strengths, each also has limitations. For example, multimedia programs are typically limited to using sight and sound, even though the array of human senses through which we understand and learn is far more robust than the limited interactions embodied in watching a screen and pressing buttons. Desktop multimedia applications do not develop full body, kinesthetic learning as immersive virtual environments can.

From a historical perspective, virtual reality has emerged as a way of surpassing the limitations of conventional computing. The high costs associated with the technology kept it almost entirely bound up in the hands of military researchers for the first two decades of its development. Early virtual reality applications focused on training and performing limited skill sets under specific, controlled conditions, such as virtual flight simulators. Sophisticated military applications continue to be developed, such as high-technology simulators which project virtual images of landscapes or the target to seek and destroy. The super cockpit simulator developed at Wright-Patterson Air Force Base (Furness, 1988; 1986) is an example of one such virtual system. The super cockpit projects virtual images of flight instrument displays and all other relevant information directly into the pilot's helmet. This information is ordered, appears and disappears automatically in response to various conditions and situations. A three dimensional sound generator presents audio information to the pilot which correlates to the three dimensional virtual environment. Spatialized audio enables the pilot to receive and prioritize audio information based on cues such as volume, position and relative motion.

The active nature of this system improves pilot performance by enabling him to concentrate on only the most crucial information. Research using the super cockpit shows virtual reality systems are useful for training psycho-motor skills used in flying aircraft. The positive results of these training examples are encouraging, but raise the question of how effective VR might be in more general educational applications, particularly those which are less emotionally engaging than combat simulation.

Advances in computing technology over the last decade have dropped the price of virtual reality hardware to the point where it is far more broadly accessible. Virtual reality has recently begun to gain attention for use in conventional education. Winn and Bricken (1992) have proposed an interesting virtual environment to teach elementary algebra. They believe that virtual environments are an optimal instructional medium because the axioms and behaviors of algebra can be built right into the virtual world (Winn & Bricken, 1992). Their approach follows the constructivist paradigm that learning best takes place when the learner is in control of the process. Winn and Bricken's key premise is that students will first use the virtual environment to develop an understanding of the concepts of algebra, and then proceed to the level of symbols and how they can be used to represent algebraic abstractions. In conventional instruction, students are taught abstractions and symbol manipulation first, before they have developed more general understanding.

Objects in the virtual algebra world would be given innate behaviors which mimic the behaviors of variables in algebraic equations. For example, "If a student fails to change the sign of a term as it moves from one side of an equation to the other, the rules of algebra might be programmed to apply in one of three ways:

1) The term could "float back" to where it came from, indicating that the student had made a mistake without revealing what the mistake was;
2) the sign of the term could be changed by the program, indicating that a mistake had occurred, what it was, and what the correct transformation is; or
3) the program could allow the student to make the mistake without correcting it on the assumption that ultimate failure to solve the equation would lead the student to "debug" what had occurred." (Winn & Bricken, 1992, p. 13).
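The three feedback options quoted above can be made concrete with a minimal sketch. It is an illustration only and is not drawn from Winn and Bricken's design, since the algebra world itself has not been built. The names Feedback, Equation, move_term and is_balanced are hypothetical, and the equation is simplified to lists of signed numbers on each side.

```python
from dataclasses import dataclass
from enum import Enum


class Feedback(Enum):
    """Hypothetical names for the three options in the quoted passage."""
    FLOAT_BACK = 1    # option 1: term returns, mistake is not explained
    AUTO_CORRECT = 2  # option 2: program changes the sign itself
    ALLOW = 3         # option 3: mistake stands, student must debug later


@dataclass
class Equation:
    left: list   # signed numeric terms on the left of the equals sign
    right: list  # signed numeric terms on the right of the equals sign


def is_balanced(eq):
    """True when both sides of the equation sum to the same value."""
    return sum(eq.left) == sum(eq.right)


def move_term(eq, index, student_flipped_sign, policy):
    """Move eq.left[index] to the right side under a feedback policy.

    Returns a (new_equation, message) pair; the input is not mutated.
    """
    term = eq.left[index]
    remaining = eq.left[:index] + eq.left[index + 1:]
    if student_flipped_sign:
        # Correct move: the term changes sign as it crosses sides.
        return Equation(remaining, eq.right + [-term]), "ok"
    if policy is Feedback.FLOAT_BACK:
        # The term floats back; the equation is left unchanged.
        return Equation(list(eq.left), list(eq.right)), "term floated back"
    if policy is Feedback.AUTO_CORRECT:
        # The program applies the sign change and says what it did.
        return Equation(remaining, eq.right + [-term]), "sign was corrected"
    # ALLOW: the unflipped term is accepted and the error goes unreported.
    return Equation(remaining, eq.right + [term]), "ok"


# Example: starting from 5 + 3 = 8, the student drags the 3 across
# the equals sign without changing its sign.
start = Equation([5, 3], [8])
after_allow, _ = move_term(start, 1, False, Feedback.ALLOW)
```

Under the third option the resulting equation, 5 = 8 + 3, no longer balances, which is the kind of eventual failure the quoted passage expects the student to trace back to its source.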

This algebra world has yet to be built, but the model does serve to illustrate the potential educational uses and benefits of VR. Winn and Bricken (1992, pp. 17-18) list six promising aspects of VR in education.

Byrne (1996) created a virtual environment for teaching high school level chemistry. Her study compared virtual reality to PC-based multimedia and a passive video presentation with high school students in Seattle, Washington. Byrne's findings showed significant learning gains using both interactive technologies, a virtual environment and multimedia, when compared with passive video and a control group. She found, however, that students using the multimedia treatment performed significantly better than all other groups. These results seem to indicate that interactivity is more powerful at stimulating learning than passive viewing, but they do not show any advantage for the immersive virtual environment over flat-screen presentations in multimedia. However, Byrne cites the students' lack of experience navigating and manipulating a virtual environment as a significant handicap in comparison to the students' highly developed skills at working with desktop computers. Other limitations of the study, such as short treatment times and the low graphics quality of the computer system used, suggest that these preliminary findings, while important, still leave the instructional value of virtual environments uncertain.

Part of the motivation for developing educational virtual environments comes in response to shortcomings of conventional schooling. It is widely stated that school experiences often fail to match the expectations of the real world (Duffy & Jonassen, 1992). Numerous researchers (Resnick, 1987; Brown, Collins & Duguid, 1989; Sherwood, Kinzer, Hasselbring & Bransford, 1987) have cited a gross disparity between what occurs in the classroom and what occurs in the working world after graduation. The inauthenticity of educational practice and testing procedures is a major reason why many children fail to transfer school-based learning.

Work has begun to take virtual reality applications and technology out of the laboratory and into public school classrooms. The Virtual Reality Roving Vehicle (VRRV) Program at the Human Interface Technology Laboratory explored the educational efficacy of immersive virtual environments to teach specific curriculum goals (Winn, 1995). The premise behind the VRRV Program was to develop educational virtual environments which would provide engaging, highly interactive and authentic learning tasks. Rose (1995) outlines an authentic assessment methodology for VR where a virtual environment becomes simultaneously the students' learning ground and the testbed to measure their progress in a meaningful and demonstrable way.

The VRRV Program engaged in two types of activities with children: 1) exposing them to immersive virtual environments created to teach specific curriculum objectives, and 2) guiding them through the construction of their own virtual environments related to various curriculum content. Findings of the VRRV Program show significant levels of learning taking place for students who built their own virtual environments. However, because the world building process involves a variety of VR-related and non-VR-related components, it would be difficult to attribute all the students' learning gains directly to their experiences under the head-mounted display. As yet the VRRV Program has produced too little data to draw firm conclusions regarding the educational benefits of simply experiencing immersive environments.

The literature does contain other accounts of educational prototypes being used for instruction. Loftin et al. (1993) created a virtual physics laboratory. This virtual world was designed to address students' misconceptions about phenomena such as the nature of mass, acceleration and momentum. Using input from a data-glove and controls to vary such environmental states as gravity direction and magnitude, students performed tasks such as measuring the period of pendulums for different lengths and different magnitudes of gravity, measuring the average rate of energy loss of falling objects, and comparing trajectories of projectiles. No statistical data on learning in the virtual world are reported. The researchers' anecdotal observations indicated a high level of student attention and motivation. These preliminary results have encouraged a follow-up study with pre-college and college level students to establish the efficacy of using VR in education (Loftin et al., 1993).

Table 1 summarizes the perceived advantages of virtual environments compared to conventional, flat screen computer systems.

Table 1: Advantages of Virtual Environments Compared to Conventional, Flat Screen Computer Systems.

| Conventional Computer Environments | Virtual Environments |
|---|---|
| Visual presentation is limited to 2-dimensions. | Immersive environment is highly visual, affording a sense of physically experiencing a virtual space. |
| Stereo audio does not convey spatial information. | Spatial audio: 3-dimensional sound matches and enhances spatial perception in the environment. |
| Interaction via keyboard and mouse | Natural interaction using the whole body. The user moves about objects in a 3-dimensional space, and extends her hand to grab and move them. Kinesthetic learning can potentially aid retention and recall. |
| Presents text-based information | Information is presented via the character and behavior of virtual objects. Stories are told through virtual actors rather than words. The acquisition of these stories is direct. |
| Allows for limited types of real-time collaboration over a network | Multi-participant environments can be simultaneously experienced by any number of participants. Teachers and students can communicate verbally and visually, or collaborate in 3-dimensional space. |

The above studies exemplify the types of inquiry which will be necessary to make virtual environments useful educational tools. Winn and Bricken's assertion that three-dimensional computer generated environments can be educationally powerful seems intuitively plausible to VR researchers. Zengo Sayu is intended to test the strength of both the immersive and interactive aspects of virtual environments for teaching a purely cognitive skill: speaking a foreign language.

A Profile of the Language Learner

Following the literature on effective language instruction and the educational potentials of virtual environments, let us turn to a final question: How can a virtual environment help students become better language learners? Brown (1987) suggests a number of characteristics which help students become successful language learners. Table 2 shows these positive characteristics along with the corresponding attributes of virtual environments which could support and encourage successful learning and study strategies. The attributes of the virtual environment listed in Table 2 are based on the literature presented above.

It is important to acknowledge that the assertions in Table 2 are currently based primarily on theory and the intuition of researchers working in VR. Many of these basic suppositions have yet to be verified by empirical studies. It can be difficult to draw broad generalizations from research on a single application because virtual environments are complex systems, the sum total of nearly infinite design and instructional variables. The current body of VR literature is still too incomplete to adequately guide development of educational environments.

Table 2: Positive Characteristics of Language Learners and the Attributes of Virtual Environments to Support Learners' Needs

| Positive Learner Characteristics | VE Supports Learners' Needs |
|---|---|
| Willing and accurate guesser | VE can encourage both deductive and inductive problem solving approaches. |
| Uninhibited | Self-paced and individualized nature of VE remove some causes of inhibition. |
| Attends to form | Instructional controls and directed feedback within the environment focus attention on correct language forms. |
| Monitors own speech and the speech of others | Voice recognition system training and usage encourages vocal consistency and self-monitoring. |
| Attends to meaning | All interactions in the world are based in meaning, rather than abstractions of grammar or linguistics. |
| An active approach to the learning task | VE is highly interactive. |
| Strategies of experimentation and planning | VE strongly invites experimentation. |
| Constantly searching for meaning | All aspects of the VE can be closely controlled to support or remove clues to the student. Dynamically changing context forces the student to re-evaluate existing knowledge in terms of the new situation to create meaning. |
| Willingness to practice | VE offers opportunity for unlimited, self-initiated practice. |
| Willingness to use the language in real communication | VE can mimic actual performance conditions for students to practice and gain confidence in preparation for real-world communication. |
| Developing the target language more and more as a separate reference system | VE promotes students' development of a separate reference system rooted directly in meaning and experience. |

Summary of the Literature

Interest in Japanese language learning has increased dramatically in recent years. Unfortunately, the vast majority of Americans who try to learn Japanese give up their study rather quickly, due in part to the time required to master the language, but also due to factors of attitude and anxiety. Studies have shown a strong correlation between attitude and success at learning a foreign language. Many people have tried to vary the approach to instruction in order to improve student performance and attitude. Asher's Total Physical Response (TPR) strategy is one such method, based on command forms of language to which students respond physically. A basic instructional strategy of TPR is that students respond to commands and are not expected to parrot the teacher's utterances. TPR studies seem to indicate that the method has some effectiveness, but it is also criticized as a rather limited and incomplete instructional approach.

Terrell's Natural Approach (NA) expands on the basic notions of TPR such as recognition of the need for a silent period and the use of many TPR techniques. The NA is a far more robust instructional method with a strong emphasis on communicative skill building. The NA expressly does not teach grammar. Many have challenged the effectiveness of communicative competency instructional methods, most notably Higgs and Clifford. They assert that grammar instruction is necessary to avoid fossilized grammatical errors. Higgs and Clifford present no data to substantiate their assertions, however. Unfortunately, neither side of this debate has been able to generate conclusive evidence to settle this matter.

Educational technologies such as multimedia software on personal computers are being developed as both instructional tools and practice partners for language learners. While these systems can be highly interactive and engaging, they do not develop full body, kinesthetic learning to aid retention and recall as an immersive virtual environment can.

Virtual Reality is an emerging technology which demonstrates great promise in education, though there is still a shortage of strong research on which to base firm conclusions. VR is assumed to be a promising medium because it is immersive, motivating, a realistic approximation of the real world and a flexible format for simulation and exploration. Learners can create their own knowledge in a constructivist fashion; that is, students are allowed to build their own knowledge structure of a given domain according to self-initiated and self-directed study. The virtual system can support learners with guidance, interaction and dynamic feedback. VR lets students develop an understanding based on physical experience before attempting to tackle abstract or symbolic representations.

We currently lack a robust empirical framework to guide the development of virtual learning environments. How can virtual environments best support and enhance learning in foreign languages, as well as other domains? Are immersion and natural interaction really the key advantages to VR over other media, or are there other definitive characteristics which have not even been identified yet? How can virtual environments help students develop and perform in a rather loosely structured knowledge domain like foreign language learning? Without further research to answer such questions, the intuitively perceived advantages of virtual learning environments will continue to go untapped.




Human Interface Technology Laboratory