Public policy, research and online learning

Ubiquity, Volume 2003, Issue August, August 1-31, 2003 | By Stephen Downes


E-learning is more than a new way of doing the old thing. Its outcomes can't be measured by the traditional process.

I recently attended a meeting of planners and policy analysts here in Ottawa and so I have some thoughts fresh in my mind.

We were presented with a talk suggesting that what decision-makers need is an account of e-learning that shows that it is (either or both):

-- more effective than traditional learning

-- more efficient than traditional learning

In other words, the demand appears to be for (mostly quantitative) comparative studies. These studies would be used to justify (partially in hindsight) the large investment in e-learning.

On the one hand, this is a justified retrenchment. As with any large-scale public investment, it is reasonable and rational to ask whether we are receiving any sort of return. We need to be able to show that e-learning has had a positive impact, and we need to explain why money spent in this arena is well spent.

But, in my mind, across-the-board comparative studies miss the purpose and impact of e-learning. Policy guided by such studies will paint a misleading picture of the field, and such studies may be used to justify divestment on the strength of questionable results.

In our meeting we were presented with preliminary results of a study developed by Charles Ungerleider and Tracey Burns using the methodology proposed by the Campbell Collaboration. The credo of the Campbell Collaboration is that research ought to be based on evidence. It is hard to dispute such an assertion, especially for me, a die-hard empiricist.

But the Campbell Collaboration goes further. Drawing from the methodology of the Cochrane Collaboration used in the field of medicine, the idea of the Campbell process is to systematically evaluate a set of empirical studies in which the impact of a single variable or intervention is compared across a study group and a control group.

Thus, in the case of e-learning, the methodology would propose that we study two groups of students, one group in which e-learning had been introduced, another in which e-learning had not been introduced, and measure the impact of e-learning against a common metric. In the case of learning, the metric would typically be the students' test scores.

My previous experience with this sort of study was informed by my work with Tim van Gelder, who applied a similar testing and evaluation regime against a group of students studying critical thinking with the aid of a specific piece of software, ReasonAble. By controlling for students' prior knowledge and experience, van Gelder was able to show an improvement in attainment using the software in comparison with students in other environments taking the same pre- and post-instruction tests.

I think that this is a good result. However, even such a test, rigorous and transparent though it may have been, does not meet Campbell Collaboration criteria because of the absence of a blind control group (evaluations, conducted by a testing center in Los Angeles, were blind). And in my mind, this is an appropriate application of a Campbell-like methodology. But that said, it must be observed that the results of such a test are extremely limited in their applicability.

The idea behind a Campbell-like methodology is that it measures a single intervention (or a small, easily defined cluster of interventions) against a known and stable environment. Thus, what we see in the van Gelder example is a case where students taking a first-year philosophy class are measured. Aside from the use of the digital tool, other variables are held constant: the class is scheduled, it is held in a physical location, it has an instructor, a teaching assistant and a certain number of students, it employs a standard curriculum, and the instruction is paced.

In other words, what is happening is that the use of the new innovation is tested according to its capacity to do the old thing. It is like testing electronic communications as a means of reducing the price of a stamp, or of reducing the time it takes for a piece of mail to travel through the postal service from Ottawa to Vancouver. It is like evaluating airplanes on their capacity to move through the Rocky Mountain tunnels more efficiently than trains or on the basis of whether they will float in a rough sea.

The Campbell Collaboration methodology works exceptionally well in a static situation. The medical analogue has shown some success because, in many instances, the conditions and objectives before and after the treatment of a disease are static. But even in medicine, the applicability is limited. One area in which the Cochrane Collaboration was employed, for example, was in procedures for the treatment of war wounds. Certain interventions may be tested and improvements in the treatment identified. But the most effective means of treating war wounds -- stop having wars -- falls beyond the bounds of the Cochrane methodology. The only successful practices identified presuppose that a war is taking place, and consequently, the most effective remedy fails to gain any empirical support despite the existence of a substantial research program.

During the meeting I remarked that one of the characteristics of a Campbell Collaboration approach is that its conclusions are contained in the question. No doubt this may have been interpreted as a political statement, and perhaps there were ways I could have said it more carefully in the two minutes I was allotted through the course of the day. But the statement is accurate, if not from a political stance, then from a scientific stance. Studies following a Campbell Collaboration methodology are instances of what Kuhn calls "normal science." The answer is contained in the question in the sense that the overall methodology -- the paradigm -- is assumed as a constant background in which the experimentation takes place.

In the field of learning, "normal science" consists -- as it did for the van Gelder study -- of classroom-based instruction, the success of which is informed by testing. In the field of education as a whole, normal science consists of sets of courses and programs offered by traditional institutions in a mostly paced manner, the success of which is informed by (increasingly popular) standardized tests.

The problem with measuring e-learning in such an environment is that what counts as teaching and what counts as learning is held to be static both before and after the intervention. To put it a bit crassly, the standardized tests presuppose that if a student is able to demonstrate the acquisition of a body of attained knowledge -- reading, writing, mathematics, to name a few -- then the student has demonstrated that he or she has learned. The mechanics and the methodology of the traditional system are geared toward the production of students who have attained and retained this knowledge, and who have attained it in a certain way.

But as I have urged through numerous papers and talks, e-learning is fundamentally different from traditional learning. It is not merely a new way of doing the old thing. Not only does it create a new methodology, it creates new -- and unmeasurable, by the traditional process -- outcomes. In particular, I identify three major ways in which e-learning cannot be compared to traditional instruction:

1. Its Newness

Traditional learning is well entrenched, while e-learning is brand new. Many of the tools and techniques involved in e-learning have not been developed yet, so in an important sense comparative studies are attempting to measure something that does not exist. Moreover, the use of those tools that do exist is not informed by appropriate methodologies.

The first generation of e-learning applications did little more than transfer the method and structure of the traditional classroom to an online environment. Products like WebCT, for example -- originally designed as a set of course tools, hence its name -- are designed to deliver a pre-designed and often paced collection of educational content in a traditional instructor-student mode of delivery. Though this approach was a marketing success, appealing as it does to those entrenched in a traditional model of delivery, it is far from clear that this is the most appropriate use of new technologies for learning.

The use of computers in the classroom is similarly suspect. We heard from Angela McFarlane that, although computers have become nearly ubiquitous in schools, they are seldom used by students for learning, and what learning does occur is learning how to use the tool itself. It is little surprise, then, that her studies show little, if any, positive correlation between the use of computers in school and achievement on standardized tests.

McFarlane also observed that students use computers three times as much at home as they do in school. Moreover, she noted, the degree of usage at home was positively correlated with educational achievement (as defined by test scores). So what are we to make of this? The only positive correlation to be found was the result of factors completely outside the evaluation parameters, and it was revealed only because the Campbell methodology was not followed. How is it that computer use at home correlates with higher test scores? What are students doing? And, even more importantly, what are students learning -- if anything -- over and above what could be detected by standardized tests?

2. Differing Objectives

When we talk about the use of computers in learning, the purpose of this use is assumed to be the same as the purpose of traditional teaching. But there are clear instances in which the use of new technologies goes beyond what we sought to attain in the classroom.

Valerie Irvine sketched a number of such objectives. The use of online learning means that various groups -- people living in rural communities, people who are disabled, people who must work for a living -- are able to obtain access to an education previously unavailable to them. The only reasonable comparison that could be made is between online learning and nothing at all. This, of course, once again breaks out of the Campbell methodology, because there are no constant variables to be maintained.

Indeed, to go a bit beyond Irvine's point, the use of online learning to extend accessibility could be viewed, strictly on the basis of test achievement, as a bad thing. It is at least possible that these new students, because they have been away from learning for so long, will actually score lower on tests than their more privileged counterparts. Thus, the overall impact of e-learning would be a reduction in average test scores.

Just as hospitals that cater only to the healthy will appear to be more successful, and just as schools that demand strict admission standards will appear to produce more educated students, so also a system that caters only to those with no barriers to their education will appear to be more successful. When a significant group of people is eliminated from the test by the testing methodology, the results are skewed.
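The arithmetic behind this skew can be sketched in a few lines. The scores below are invented purely for illustration (they are not data from the meeting or from any study); the only point is that adding a lower-scoring group of newly reached learners lowers the overall average even though no existing student does any worse.

```python
# Hypothetical illustration of a composition (selection) effect:
# extending access adds a group whose scores start lower, so the
# system-wide mean falls although no existing student's score changes.

def mean(scores: list[float]) -> float:
    """Arithmetic mean of a non-empty list of scores."""
    return sum(scores) / len(scores)

# Invented scores for students who already had access (no barriers).
existing = [80.0, 75.0, 70.0, 75.0]
# Invented scores for newly reached learners (rural, disabled, working).
new_learners = [60.0, 65.0]

before = mean(existing)                 # average of the original group only
after = mean(existing + new_learners)   # average once access is extended

print(f"{before:.2f} -> {after:.2f}")   # prints "75.00 -> 70.83"
```

A test-score comparison would record this as a decline, while the change consists entirely of people who previously had no education at all now being counted.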

Moreover, it is not clear that the educational outcomes produced via the use of computers are the same as those produced in the classroom. As I remarked above, the traditional classroom and the standardized test measure the attainment and retention of knowledge. I would argue that online learning fosters and promotes a different type of learning: the capacity to "think for oneself." I think that online learning produces a sense of empowerment in people, draws out the shy and gives a voice to those who would not speak, helps people find information on their own, encourages creativity and communication, and helps people develop a stronger sense of personal identity.

I draw out this list because these are properties that I have seen reflected in various reports of the use of e-learning and ICT in general. Such effects are not measured by standardized testing, however, and it is not clear to me that any Campbell study could reveal them. For one thing, how would we accomplish such a feat? Would we seize the students' computers from their homes for the duration of their primary studies in order to obtain a control group? And how does one measure a stronger sense of identity? It is not possible to find even a correlation, much less a causal connection. Even if a student is barred from computer ownership, the total environment -- the media, the behaviour of his or her connected friends -- will have an impact on the outcome.

3. Scope and Domain

The purpose of a Campbell study is to measure the impact of an intervention in a specific domain. However, it is likely that studies from beyond a specific domain will be relevant to the current result.

For example, I suggested that usability studies would have a direct bearing on the evaluation of the effectiveness of e-learning. In a usability study, the ease with which a group of people can attain a certain result is measured. These measurements are then used to determine whether a given application or system is "usable," that is, can be used in the manner intended.

It would seem immediate and intuitive that usability would be a significant variable to be considered when evaluating the effectiveness of e-learning. There is no shortage of articles and discussion list posts lamenting the poor usability of e-learning applications. If these applications, which are known and measured to be unusable, are being used in the evaluation of e-learning, then is the failure of e-learning in such a context to be attributed to the general uselessness of e-learning, or to the particular application? We see no such distinction being made.

In a similar manner, there have been numerous studies of online communities over the years, beginning with Turkle's "Life on the Screen" and including the work of Rheingold and Figallo. These studies, in addition to not even remotely following Campbell methodology, are again completely outside the domain of online learning. Nor could they ever follow it, because the provision of an education by means of an online community involves the imposition not of a single intervention, but of a whole series of interventions.

I have commented on the use of online communities in the traditional educational setting before. Characteristically, such communities consist of a limited number of members, usually the students of a particular class. Moreover, such communities typically have a starting point and an end point; when the class is over in June, the community, after a life of 8 months, is disbanded. Further, such communities are artificially contrived, grouping together a set of members who have no more in common than their age or their enrollment in a given class. Any measurement of a community in such a setting is bound to be a failure because the constraints of the traditional classroom -- required in order to conduct a single-variable study -- have doomed the community to failure, a failure that can be predicted from research outside the domain of education.

The comment was made by several participants that the methodology employed in the physical sciences, as instantiated in the Cochrane Collaboration, cannot be carried over to the field of education (McFarlane commented that it is losing credibility even in the physical sciences). Cognitive phenomena, such as learning, are not the same as physical phenomena. Indeed, some physical phenomena are not the sort of things that can be studied using the physical sciences.

In my own opinion, the theoretical basis for this assertion lies in the nature of the physical or cognitive phenomena being studied. In a classical physical experiment, as mentioned above, the system being studied is controlled. The effect of external variables is minimized or eliminated. Only the impact of a single intervention is studied.

Such a system works well for what might be described as causal chains -- A causes B which in turn causes C. But in certain environments -- the human immune system appears to be one; the world of microphysics appears to be another -- the classic physics of the causal chain breaks down. What we are presented with instead is what might more accurately be described as a network rather than a chain; everything impacts everything. Multiple, and perhaps unrelated, sets of circumstances produce the same result. Even the concept of the "same result" is questionable, because the existence of condition P in network N might be something completely different from the existence of condition P in network N1.

Because of the nature of the network, general principles along the lines of "If A then B" are impossible to adduce. Indeed, attempts to observe such principles by preserving a given network structure produce artificial -- and wrong -- results. The very conduct of a traditional empirical study in such an environment changes the nature of the environment, and does so in such a way as to invalidate the results. In a network, in order to produce condition B, it might be necessary to create condition A+C in one case, C+E in another, and G+H in another. This is because the prior structure of the network will be different in each instance.

Human cognition is a network phenomenon at two levels. One level has always existed; the other is a new development, at least to a degree. In the individual, human cognition has always been a network phenomenon. Mental states consist of the activation and interaction of neural states. These neural states exist in combination; none exists in isolation from the others. Even when we identify something that is the "same" -- the belief that "Paris is the capital of France" for example -- this sameness is instantiated differently in different individuals, both physically (the actual combination of neural activations is different) and semantically (what one person means, or recollects, about "Paris" is different).

The second level, greatly enhanced and extended by ICT, is the interaction of humans with each other. Though at the community level there was a certain sense in which every person was related to every other (especially in smaller communities), this has never been true at anything like a global level. However, the much denser degree of interactions between humans fostered by new communications technologies -- including but not limited to the Internet -- has created a global dynamic that is very similar, in terms of logic and structure, to the cognitive dynamic.

The very idea of isolating a social phenomenon -- such as education -- and measuring a specific intervention -- such as e-learning -- should be questioned. The education of an individual, the education of a society: neither exists without impacting and being impacted by a range of other phenomena. The impact of an intervention in education, therefore, might not be realized in educational attainment at all.

In the last 30 seconds of my comments I tried to draw this out. Probably it was, unfortunately, simply interpreted as a dogmatic anti-testing stance. But the essence of my comment was that the tests we use to evaluate the impact of an education are not appropriate because they measure the wrong sort of things, because they are, indeed, representative of the cause-effect view that an educational intervention will have an educational result.

This is not so. In the meeting I highlighted two clear impacts of education outside the field of measurement. One was the correlation, well established, between the level of educational attainment and the person's income level. Another, less well established, was the correlation between the level of education in a society and the crime rate in the society. I could suggest more examples. Greater educational attainment, for example, may correlate with improved health. Greater educational attainment could correlate with an increase in the number of inventions and (heaven forbid) patents.

None of these would be captured by a traditional research program. Moreover, a traditional research program will also abstract out the impact of non-educational factors at the input end. For example, it has been suggested that the best way to improve educational attainment in the United States is to ensure that every child has a hot lunch. It has been suggested that healthy students learn better. Perhaps the road to educational attainment in the United States lies through its social welfare and health care systems.

Where Do We Go from Here?

First, we can stop doing some of the wrong things. In particular, we should stop thinking of an education as something that is attained by, and benefits, an individual in isolation from the rest of the social network in which he or she resides. This is not to disclaim the importance of personal responsibility, and personal achievement, inherent in education. But it is to place these in an appropriate context.

The valuation, therefore, of "intervention A produced result B" studies (common in the educational literature) should be significantly reduced. Such studies are not producing reliable or useful research data, and in some cases, are producing harmful results. Just as the anecdote is not a reliable ground on which to infer a statistical generalization, so also a controlled study is not a reasonable ground on which to infer a network phenomenon.

Second, stop seeking -- and believing -- generalizations. That is not to say that nothing can be learned about the properties and behaviours of networks. It is to say that the sorts of generalizations obtained by single-variable experimentation will say nothing about the properties and behaviours of networks. Money invested in such research is, in my opinion, lost money. The generalizations that result may cause more harm than good.

As an example of this, consider the various business and industrial organization strategies that have been all the rage in recent years. We have progressed through TQM, open concept offices, flattened hierarchies, entrepreneurial sub-units, and more. Significant evidence was brought to light testifying to the success of each of these management strategies. Yet none of them achieved the wide-ranging success promised by their proponents, and in some cases, the upheaval brought companies to the brink of ruin. Why would something that worked so well in one company work so poorly in another? Companies -- like individuals and societies -- are networks, organized in various structures internally, and subject to a wide range of influences externally.

There is no "magic bullet" that improves companies. If there were, it would have been discovered somewhere in the proliferation of business strategies. Just so, there is no "magic bullet" that improves education. This should be obvious! An intervention that will work at Ashbury Collegiate will have mixed results at Osgoode Township and may be a disaster at Davis Inlet. You cannot generalize across isolated entities and phenomena in networks.

Third, reconsider the criteria for success. Very narrowly defined criteria for success are not only misleading, they are dangerous. They indicate that failure has occurred when it has not; they indicate that success has occurred when it has not. For example: any traditional evaluation of my education would conclude that it was a failure. I was even included in a post-graduate study (University of Calgary students, 1986). The last (and I assume final) time the impact of my university education was measured, about ten years ago, I was living in poverty, having been able to eke out only the most meagre living on the strength of my philosophy degrees. But significantly, to me, my education had been a success even at that time, because the quality of my life (my mental life, if not my standard of living) had been greatly enhanced.

The fact is, there will never be a nice neat set of criteria for success in education, no more than there will be a nice neat set of criteria for what constitutes the "good life" or for what constitutes "moral purity." Such assessments are extremely subjective. In education, they are currently the subject of political and cultural debate, not scientific or educational rationale. Even something as simple as "knowing how to read" is subject to the push and pull of different interests; in the seminar we heard that, from the point of view of the phonics industry, some people could not "read" because they were shown to be unable to pronounce new words (in other words, because they were not proficient in phonics). From another point of view, comprehension, not pronunciation, may be viewed as crucial to reading. From yet another point of view, the capacity to infer and reason is the basic criterion for literacy.

Success in education, like any network phenomenon, can only be defined against a context. What is achieved can only be measured against what the achiever sought to obtain. Since it is unlikely that society as a whole, let alone every individual, seeks to attain nothing more than high test scores, high test scores are to that extent an inadequate measure of achievement. A person who drops out of high school in order to become a successful rock musician should be judged an educational success; a person who makes the honour roll but lives a frustrated and unfulfilled life should be considered an educational failure.

The criteria for measuring the success of an education must be derived from multiple sources. We already hear from business and industry that their criteria -- the usefulness and productivity of future employees -- may not be predicted by success in academia. No doubt an individual's account of success -- whether he or she is happy, fulfilled, rich -- will vary from person to person. The social criteria for success -- better health, lower crime -- vary from time to time and from one political and social group to another.

Success, in other words, is not a micro phenomenon. It is not identified by a set of necessary and sufficient conditions in an individual. It is, rather, a macro phenomenon. It is identified by structural properties in the network as a whole. Such structural properties do not depend on the specific nature of any given entity on the network, but rather are emergent properties of the network as a whole (for example: the appearance of Jean Chretien's face on a TV screen does not depend on whether any given pixel is light or dark -- it is an emergent property, recognizable not by studying the pixels but rather by studying the patterns produced by the pixels).

Fourth, adopt methodologies and models appropriate to the study of network phenomena. I discussed several of these approaches on my Web page. We need to enter a research regime in which we are comfortable discussing multiple realities, alternative models of society, macrophenomena amidst micro-chaos. We need to begin to look at education as only a part of a larger, multi-disciplinary approach to understanding social and cognitive phenomena. We need to abandon, indeed, the idea that there are even specific disciplines in which isolated research may take place. Just as it is now current to attach economic values to social-political phenomena, we need to begin attaching values from a wide range of schools of thought to other disciplines.

To take this a step further, we need to reconsider the language and logic used to describe such phenomena. In my own work I adduce associations between similar, but disparate, sets of phenomena. That is to say, I do not try to trace causal structures, but rather, I attempt to observe patterns of perception. You might say that I attempt to identify truths in cultural, social and political phenomena in much the same way you attempt to identify Jean Chretien on a TV screen. In order to do this, however, I need to employ new terminology and new categories not appropriate within the context of any given field of study, just as your description of a certain set of phenomena as "Jean Chretien's ear" has no counterpart in the language of pixels and cathode ray tubes.

Finally, fifth, we need to reconsider the locus of control. The very idea of evidence-based public policy assumes that an external intervention at some point in the network can produce some observable, and presumably desired, results. There is no reason to believe this, and indeed, good reason to believe the contrary. It is like supposing that, by the stimulation of a few neurons, one can create the knowledge that "Paris is the capital of France." But neurons, and mental phenomena, do not work that way. At best, we can produce the mental equivalent of physical phenomena -- the sensation of toast burning, for example. We can produce the sensation. But we cannot produce the articulation, the realization that it is "toast burning." A person who has never experienced toast burning would describe exactly the same phenomenon in a different way.

The only measurable impact of an intervention will be at the macro level. We already see this in economics. We cannot, for example, evaluate the impact of an increase in interest rates by changing them for the City of Grande Prairie and observing the results. For one thing, it is not possible to isolate Grande Prairie from the world, remote though it may be. Raise the rates in Grande Prairie and residents will borrow from banks in Dawson Creek; lower them and residents from Dawson Creek will travel to Grande Prairie. Moreover, the effect of interest rates in Vancouver impacts the level of employment in Grande Prairie, since the cities are connected by a variety of trade, cultural and other associations.

The corollary of this assertion is that micro-interventions will not be measurable at all, at least, not in any meaningful way. It follows, therefore, that it is misguided to attempt to intervene at this level. For example, it may be proposed to adopt one particular type of educational intervention -- the use of ReasonAble, say -- in a pilot program and to measure its educational impact. But the impact, if any, will be swamped by the variety of external variables. A policy approach, therefore, that directs research at the micro level will produce no useful research results.

Thus, interventions at the macro level should not attempt to determine how they are instantiated at the micro level. Nor should they be described in such terms.

Concretely: a macro level intervention might be described as "increasing the connectivity of Canadians." The manner in which this intervention is implemented is not further described; it becomes the prerogative of each individual to respond to this intervention in his or her own way. Nor is it limited to some pre-selected "best" or "most appropriate" set of projects, nor is an attempt made to isolate the environment in which the intervention takes place. Nor are individual applications of the initiative evaluated, because the evaluation itself will skew the manner in which the intervention was instantiated (thus creating another element of micro-control). Rather, we would ask, as a result of the connectivity program, first, did the level of connectivity increase, and second, are there any other derivable phenomena, such as a reduction in unemployment, which appear to be associated with this phenomenon.

Obviously, experimentation at the society-wide level is a risky undertaking. Such experimentation should be preceded by simulation and modelling in order to reduce the risk of an experimental failure. To a certain degree, we should try to learn from similar experiences in other jurisdictions. In the areas where we lead the world, we should try to learn from similar experiences in different domains. What can education, for example, learn from the decades-long intervention created by the ParticipAction program?

To Conclude

The introduction of ICT to the educational environment -- and to society at large -- has produced a paradigm change in the study of social and educational phenomena. This is not something that I am advocating; I am not proposing this as an instrument of public policy. It has already happened; as McFarlane commented, students are already voting with their feet. Learning is already occurring outside the classroom; the Internet has already transformed the study (and play) habits of the young. It has transformed an area of endeavour that once could be understood (if it could be understood at all) through traditional cause-and-effect empirical science to something that must be understood only through a quite different perspective and methodology.

The idea that we can control any individual element of this new environment must be abandoned. To the extent that we could ever control such effects, this is no longer the case, and we are deluding ourselves if we believe we can derive new knowledge from the study of isolated events. Even the criteria for success -- what counts as a successful methodology in education, what counts as a failure -- can be measured only via system-wide phenomena beyond the narrow domain of educational attainment itself.

To the extent that we wish to improve society -- and the very concept of public policy presupposes that we do -- we must base our initiatives not on narrow and misleading studies conducted in artificial environments, but on modelling and analogy from similar circumstances in different domains and different environments. We need to form as clear a picture (or set of pictures) of society as a whole as we can, and to understand the inter-relationships between and across sectors, across disciplines, and to form policy on that basis, rather than on the now-illusory premise of a magical wand that will foster universal happiness.

This is what I tried to say in my two minutes.

For more information and writing by Stephen Downes, see Stephen's Web - Knowledge - Learning - Community
