
Assessment and Evaluation

Three broad measures of the effectiveness of distance education are generally examined: (1) student outcomes, such as grades and test scores; (2) student attitudes about learning through distance education; and (3) overall student satisfaction with distance learning (Phipps & Merisotis, 1999, p. 2). Diaz (2000) stated that researchers have long sought to determine whether distance education can provide the same level of academic excellence as courses taught in traditional modes.

Two recent reports that have prompted debate about educational technology and distance education research are Russell's (1997) "The No Significant Difference Phenomenon" and Phipps and Merisotis's (1999) critique "What's the Difference? A Review of Contemporary Research on the Effectiveness of Distance Learning in Higher Education." Also entering the fray have been Ehrmann (1998) with "What Outcomes Assessment Misses" and Brown and Wack (1999) with their "The Difference Frenzy and Matching Buckshot with Buckshot."

Russell compiled an annotated bibliography of papers, articles, and research dating back to 1928, with the general conclusion that outcomes of students using technology at a distance are similar to outcomes of students in conventional classrooms. Although acknowledging that technology is not neutral, he explained why study after study found that using technology neither improves nor diminishes instruction:

The truth lies in the fact, often acknowledged but ignored, that students are not alike. Individual differences in learning styles dictate that technology will facilitate learning for some, but will probably inhibit learning for others, while the remainder experience no significant difference. Therefore, when lumping all the students together into a fictional "mass," those who benefit from the technology are balanced by a like number who suffer; when combined with the no-significant-difference majority, the conglomerate yields the widely reported "no significant difference" results. (1997a, p. 1)
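Russell's balancing argument can be illustrated with simple arithmetic. The sketch below is hypothetical: the group sizes (20, 20, and 60 students) and the eight-point effect are assumed purely for illustration and do not come from Russell's bibliography or from any study.

```python
from statistics import mean

# Hypothetical per-student score changes attributable to the technology.
# Group sizes and effect sizes are illustrative assumptions, not data.
helped = [+8.0] * 20      # students whose learning style suits the technology
hindered = [-8.0] * 20    # a like number whose learning is inhibited
unaffected = [0.0] * 60   # the no-significant-difference majority


def average_effect(*groups):
    """Pool all groups into one 'mass' and return the mean score change."""
    pooled = [change for group in groups for change in group]
    return mean(pooled)


pooled_effect = average_effect(helped, hindered, unaffected)
subgroup_effects = {
    "helped": mean(helped),
    "hindered": mean(hindered),
    "unaffected": mean(unaffected),
}
```

Under these assumptions the pooled average is zero even though 40 percent of the students experienced a substantial effect, which is the pattern Russell describes.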

Phipps and Merisotis responded that "a closer look at the evidence suggests a more cautious view" (p. 1). After reviewing research conducted in the 1990s, including experimental, descriptive, correlational, and case studies, they concluded that the overall positive findings on learning outcomes and on student and faculty perceptions were suspect because of these design flaws: (1) Much of the research does not control for extraneous variables and therefore cannot show cause and effect; (2) most of the studies do not use randomly selected subjects; (3) the validity and reliability of the instruments used to measure student outcomes and attitudes are questionable; and (4) many studies do not adequately control for the feelings and attitudes of the students and faculty, what educational research refers to as "reactive effects." These include the novelty effect, which accounts for greater motivation and a disposition to forgive technology problems, and the John Henry effect, which describes the extra effort both participants and instructors will expend consciously or unconsciously in making a novel learning situation work.

Further, they identified these research gaps: (1) The research has tended to emphasize student outcomes for individual courses rather than a total academic program; (2) the research does not account for differences among students; (3) the research does not adequately explain the higher drop-out rates of distance learners; (4) the research does not consider how students' different learning styles relate to the use of particular technologies; (5) the research focuses on the impact of individual technologies rather than on the interaction of multiple technologies; (6) the research does not include a theoretical or conceptual framework; and (7) the research does not adequately address the effectiveness of digital libraries (pp. 3-6).

Phipps and Merisotis noted that distance education, "once a poor and unwelcome stepchild," has become increasingly prominent in the academic community (p. 29). However, the pace of innovation has overtaken educators' understanding of practical uses and of how technology can enhance the teaching and learning process. From their research, they observed that technology cannot replace the human factor without "significant quality losses" and that learning tasks, learner characteristics and motivation, and the instructor affect student outcomes more than technology does (p. 31).

For them, important research questions remain regarding the necessary skills students need, adequate technical support, and, importantly, access issues, especially given the prohibitive cost of purchasing a computer and software for some students. Nonetheless, flawed research or not, studying the effectiveness of distance education has had the positive result of focusing attention on pedagogy, "the rising tide that lifts all boats."

Diaz (2000), Ehrmann (1998), Saba (1998), and Brown and Wack (1999) have all questioned the premise of the basic research question, Is distance education as good as or better than traditional education? The first difficulty is in defining "traditional" and "distance." Second, the variations across disciplines, courses, teachers, and students confound comparisons. As Diaz stated, educational modalities are framed within their unique contexts (p. 2).

Diaz also took issue with comparing conventional and distance education because of the inherent assumption that "traditional" is the ideal mode or "gold standard" of educational delivery against which all other forms of "alternative" education should be measured. Promoting a student-centered constructivist approach, he called for research questions to shift from methods or technologies to student characteristics that facilitate success within a particular modality. Noting that student characteristics are in constant flux, he argued that "the usual requirements for broad generalization in research may need to be abandoned in favor of a model that continuously monitors student characteristics and determines which characteristics facilitate favorable outcomes" (p. 5).

Brown and Wack (1999) responded to Phipps and Merisotis's criticism of distance education research efforts. They noted that in the "complex, messy world of teaching and learning" (p. 3) there is "as much difference between two teachers doing, purportedly, the same thing in conventional classes as there is between two teachers doing different things" (p. 5). That complexity, therefore, explains why comprehensive, clear evidence is rarely attainable and why efforts to compare conventional and distance classes are problematic. In their view, distance education research has unjustly faced a higher burden of proof than other scientific and educational research projects.

On the other hand, Lockee, Burton, and Cross (1999) proposed that distance education research has largely been biased to show positive results:

Stakeholders desire to prove that participants in distance-delivered courses receive the same quality of instruction off-campus as those involved in the "traditional" classroom setting. However, the desire to prove that the quality of such distributed offerings is equal to the quality of on-campus programming often results in comparison of achievement between the two groups of student participants. Statistically, such a research design almost guarantees that the desired outcome will be attained-that indeed distance learners perform as well as campus-based students. (p. 34)

Citing Kanner, Runyon, and Desiderato's (19XX) wry observation that "intensive television sessions were no more detrimental to classroom learning than face-to-face instruction," they pointed out that "no significant difference" findings carry the implicit conviction that distance learners are engaged in an equally rigorous instructional experience (p. 35). Moreover, they cautioned against the positive twist given to "no difference" findings, comparing the failure to reject the null hypothesis to a legal finding of not guilty, which does not mean innocent.
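The caution against reading "no difference" as "equivalent" can be made concrete with a small calculation. The following is a minimal sketch under stated assumptions: the group means, standard deviation, and sample sizes are invented for illustration, the function name is hypothetical, and a normal approximation stands in for the t distribution that a real analysis of small samples would use.

```python
from math import sqrt
from statistics import NormalDist


def compare_groups(mean_distance, mean_campus, sd, n_per_group, alpha=0.05):
    """Two-sided z-test on the difference between two group means.

    Assumes equal group sizes and a common standard deviation (both
    hypothetical simplifications). Returns (significant, confidence_interval).
    """
    diff = mean_distance - mean_campus
    std_error = sd * sqrt(2.0 / n_per_group)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    significant = abs(diff / std_error) > z_crit
    interval = (diff - z_crit * std_error, diff + z_crit * std_error)
    return significant, interval


# Invented figures: a 5-point gap, standard deviation of 15,
# and only 10 students per group.
significant, interval = compare_groups(70.0, 75.0, sd=15.0, n_per_group=10)

# The same invented gap with 200 students per group.
significant_large, interval_large = compare_groups(70.0, 75.0, sd=15.0, n_per_group=200)
```

With the small assumed sample, the test fails to reject the null hypothesis, yet the confidence interval still includes gaps of more than ten points against the distance learners. The same assumed gap is flagged as significant once each group has 200 students, so a "no difference" result from a small study is closer to "not guilty" than to "innocent."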

Like Brown and Wack (1999), Lockee, Burton, and Cross (1999) noted the difficulty in conducting formal comparative research on distance education and conventional programs because of confounding variables. They further found the continuing media comparison debate "futile," referring to Clark's (1983, 1985) analysis of media effects and his controversial claim that "computers make no more contribution to learning than the truck which delivers groceries to the market contributes to improved nutrition in a community" (1985, p. 259). Clark maintained that any achievement gains in a technology-based program were attributable to other factors, including changes in instructional strategies and reactive effects, such as the novelty or John Henry effects.

Thus, Lockee, Burton, and Cross recommended alternatives, including qualitative and quantitative longitudinal studies for patterns in specified variables; developmental studies of a program's design, development or evaluation; and analyzing distance learners' experiences. Ehrmann (1998, 1999) likewise called for changing assessment strategies, noting that "local evaluative studies have produced important insights that have reduced risks, guided policy and shaped policy" (1999, p. 1).

Ehrmann stated that evaluation practices commonly assess educational outcomes of a particular program by looking at how well the average student achieved goals. The tests and inquiries have typically been framed to report achievement rather than "being forced to look at and talk about failure" (1998, p. 1). However, only by detecting problems can institutions improve programs. [Around 1980, Lee Cronbach said that evaluation information, when it is used, is almost always formative. That is, it is improvements that matter. I probably have the exact quote somewhere, perhaps in Permissible Computing, but it isn't here.] Noting that "ends matter, but so do means," he characterized the pervasive quantitative-outcomes-assessment-only approach as "radically incomplete" (p. 2). He asked, "How often can faculty members do an experiment that is so carefully designed that the design can rule out all 'extraneous' factors and enable valid inferences about the technology's distinctive role?" (p. 3). Probably never, he answered. Therefore, he urged higher education institutions also to undertake qualitative evaluations that inquire into how well the technology, user behavior, and learner and other outcomes function together. These studies must be conducted regularly to "discover whether good practices are on the increase and whether problems seem on their way to being solved" (1999, p. 6).

Qualitative evaluations can help institutions understand the relationship among technology, user behavior, and learner and other outcomes. Ehrmann defined the technology of a program as its "hardware, software, and social technology" (1998, p. 2). Saying "all education is local," he emphasized the importance of knowing "what is happening right here, right now, this year, with these people" (p. 3). With that knowledge, institutions can begin to evaluate practices that are working and those that are failing.

That knowledge must also be shared with faculty, who tend to blame problems on themselves, Ehrmann stated. "We sell faculty on the technology and teach the rudiments, but we don't prepare them for problems they might encounter as part of the teaching activity" (p. 11). He suggested faculty need practice in "simulators," which could be case studies, group discussions, role playing, or a practice class. By discussing why problems occurred and brainstorming possible solutions, faculty gain valuable insights and learn that although academic disciplines differ, teaching and learning dilemmas are comparatively universal. 

Dillon and Gunawardena (1995) also proposed that instructors' attitudes toward technology-mediated distance learning systems be included in evaluations, a view consistent with Fulk, Schmitz, and Steinfield's (1990) Social Influence Model of technology use. According to the model, factors such as work group norms and co-worker and supervisor attitudes and behaviors can positively or negatively affect attitudes, media use, and choices. Given the social forces that influence individual and group behavior (Bandura, 1986; Salancik & Pfeffer, 1978; Schmitz & Fulk, 1991; Fulk, 1993), Webster and Hackley (1997) proposed that an organization's members are expected to develop coordinated patterns of behavior based on observations of each other's behaviors, the consequences of behaviors, and emotional reactions (p. ).

It is detecting those patterns, whether among faculty, students, or staff, that Ehrmann believed to be crucial. In "How (Not) to Evaluate a Grant-Funded Technology Project," Ehrmann said, "It's always a good idea to periodically step back and ask, 'What is this initiative? What is it for? Where is it going?' The answers may well have changed in surprising ways over a period of months" (199?, p. 3). Rather than falling into the trap of the "rapture of technology" and erroneously assuming the technology is the innovation, evaluators must also look at changes in the staff and their skills, changes in infrastructure, and changes in the student market.

Because different learners will use technology differently "with qualitatively different and somewhat unpredictable consequences," what matters, Ehrmann contended, is "how good the learning outcome is (within a broad range of possibilities)" (p. 4). A unique uses evaluation begins "with a broad search for important outcomes, good and bad, and then assesses each case one at a time. . . . Then the evaluator considers what they imply, collectively, about the innovation" (p. 4). Underscoring the importance of the narrative in teaching and learning, Ehrmann said institutions can begin to "build up a story of the role that the technology is playing or failing to play in strategies in which you are interested" (1998, p. 5).