Topical Paper on Relationship Between Experimental Design and Control of Threats to External and Internal Validity

 

Experimental designs are set up to study possible cause-effect relationships among variables. They stand in contrast to correlational studies, which examine the relationships among variables without necessarily trying to establish which variable causes another. Correlational studies can be done as field studies (i.e., the phenomena can be studied as they occur in their natural environment), but causal studies usually have varying degrees of artificial constraints imposed on them, interrupting the natural sequence or flow of events.

Experimental designs fall into two categories: experiments done in an artificial or contrived environment, known as lab experiments, and those done in the natural environment in which activities regularly take place, known as field experiments.

CONTROL. When a cause-effect relationship is to be clearly established between an independent and a dependent variable of interest, then all other variables that might contaminate or confound the relationship have to be tightly controlled. In other words, the possible effects of other variables on the dependent variable have to be in some way accounted for, so that the actual causal effects of the investigated independent variable on the dependent variable can be determined. It is also necessary to manipulate the independent variable so that the extent of its causal effects can be established. The controls and manipulations are best done in an artificial setting called the laboratory, where the causal effects can be tested. When artificial controls and manipulations are introduced to establish cause-effect relationships, we have laboratory experimental designs, also known as lab experiments.

When we postulate a cause-effect relationship between two variables, X and Y, it is possible that some other factor, say A, might also influence the dependent variable Y. In this case, it will not be possible to determine the extent to which Y happened or varied only because of X and to what extent Y was additionally influenced by the presence of the other factor A. For instance, a Human Resources Development manager might arrange for special training in a new word processing language for a set of newly recruited secretaries to prove to the V.P. (the boss) that such training would cause them to learn faster. However, some of the new secretaries might learn faster also because they have had previous experience with another word processing language. In this case, the manager cannot prove that the special training alone caused faster learning, since the previous experience with another word processing language is a contaminating factor. If the true effect of the training on learning is to be assessed, then the learners' previous experience with another language has to be controlled. This might be done by not including in the experiment those who already know some kind of word processing. This is what we mean by having to control for contaminating factors.

MANIPULATION OF THE INDEPENDENT VARIABLE. In order to examine the causal effects of an independent variable on a dependent variable, certain manipulations need to be tried. Manipulation simply means that we create different levels of the independent variable to assess the impact on the dependent variable. For example, if we want to test the theory that deep knowledge of various manufacturing technologies is caused by the rotation of employees on the production line and being exposed to the various systems over a two-week period, then we can manipulate the independent variable, "rotation and exposure." That is, one group of production workers can be rotated and exposed to all the systems during a two-week period, one group of workers could be exposed and rotated partially during the two weeks (i.e., being exposed to only half of the manufacturing technologies), and the third group could continue to do what they are currently doing without any special rotation and exposure. By measuring the deep knowledge of these groups both before and after the manipulation (also known as the "treatment"), it would be possible to assess the extent to which the treatment caused the effect, after controlling for the contaminating factors. If deep knowledge is indeed caused by rotation and exposure, the results would show that the third group had the lowest increase in deep knowledge, the second group had some significant increase, and the first group had the greatest gains!
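As a minimal sketch of the measurement logic above, the effect of each level of the treatment can be assessed by comparing pretest and posttest means per group. All group labels and scores here are invented purely for illustration:

```python
# Hypothetical pre/post "deep knowledge" scores for the three rotation groups.
# Gain = posttest mean - pretest mean; larger gains suggest a larger treatment effect.

def mean(xs):
    return sum(xs) / len(xs)

# (pretest scores, posttest scores) per group -- invented numbers
groups = {
    "full rotation":    ([40, 45, 38, 42], [70, 74, 69, 72]),
    "partial rotation": ([41, 39, 44, 40], [55, 53, 58, 54]),
    "no rotation":      ([42, 40, 43, 39], [44, 42, 45, 41]),
}

for name, (pre, post) in groups.items():
    gain = mean(post) - mean(pre)
    print(f"{name}: mean gain = {gain:.1f}")
```

If rotation and exposure indeed cause deep knowledge, the fully rotated group should show the largest gain and the unrotated group the smallest, as in these illustrative numbers.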

CONTROLLING THE CONTAMINATING OR "NUISANCE" VARIABLE. One way of controlling for a contaminating or "nuisance" variable is to match the various groups: identify the confounding characteristics and deliberately spread them evenly across the groups. For instance, if sixty members are to be divided into four groups and twenty of them are women, then each group will be assigned five women, so that the effects of gender are distributed across the four groups. Likewise, age and experience can be matched across the four groups, such that each group has a similar mix of individuals in terms of gender, age, and experience. Because the suspected contaminating factors are matched across the groups, we can be comfortable in saying that variable X alone causes variable Y, if such is the finding after the experiment.
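The matching procedure can be sketched as a round-robin deal within each stratum of the confounder. The roster and the single confounder (gender) here are hypothetical, standing in for the sixty-member, four-group example above:

```python
# Matching on one confounder: shuffle within each stratum, then deal round-robin
# so every group receives the same number from each stratum (5 women, 10 men).
import random

random.seed(7)  # fixed seed only to make the sketch reproducible
members = [{"id": i, "female": i < 20} for i in range(60)]  # 20 women, 40 men

women = [m for m in members if m["female"]]
men = [m for m in members if not m["female"]]
random.shuffle(women)
random.shuffle(men)

groups = [[] for _ in range(4)]
for i, m in enumerate(women):
    groups[i % 4].append(m)   # 5 women per group
for i, m in enumerate(men):
    groups[i % 4].append(m)   # 10 men per group

for k, g in enumerate(groups, 1):
    print(f"group {k}: {len(g)} members, {sum(m['female'] for m in g)} women")
```

The same round-robin deal can be repeated within strata defined by age bands or experience levels to match on those factors as well.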

Another way of controlling for contaminating variables is to assign the sixty members randomly (i.e., with no predetermination) to the four groups. That is, every member would have a known and equal chance of being assigned to any of the four groups. For instance, we might throw the names of all sixty members into a hat and draw them out. The first fifteen names drawn may be assigned to the first group, the second fifteen to the second group, and so on; alternatively, the first person drawn might be assigned to the first group, the second person to the second group, and so on. Thus, in randomization, both the process by which individuals are drawn (everybody has a known and equal chance of being drawn) and the assignment of each individual to any particular group are random. By randomly assigning members to the groups, we distribute the confounding variables among the groups equally. That is, the controlled variables of age, sex, and previous experience will have an equal probability of being distributed among the groups. The process of randomization would ideally ensure that each group is comparable to the others, and that the effects of age, sex, and previous experience are controlled. That is, each group will have some members with more experience mingled with those who have less or none, and all groups will have members of varying age and sex composition. Randomization thus ensures that if these variables do indeed have a contributory or confounding effect, that effect is distributed across the groups and thereby controlled. We cannot now say that the cause-effect relationship has been confounded by the "nuisance" variables, because they have been controlled through the process of randomly assigning members to the groups. Here we have high internal validity, or confidence in the cause-effect relationship.
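The names-in-a-hat procedure is, in effect, a shuffle followed by a deal into four groups of fifteen. A minimal sketch with a hypothetical roster:

```python
# Simple randomization: shuffle the sixty names (the "hat"), then assign
# the first fifteen drawn to group 1, the next fifteen to group 2, and so on.
import random

random.seed(42)  # fixed seed only to make the sketch reproducible
names = [f"member_{i}" for i in range(60)]  # hypothetical roster

hat = names[:]            # copy, so the original roster is untouched
random.shuffle(hat)       # every ordering of the sixty names is equally likely
groups = [hat[i:i + 15] for i in range(0, 60, 15)]

for k, g in enumerate(groups, 1):
    print(f"group {k}: {len(g)} members")
```

Because every permutation of names is equally likely, each member has an equal chance of landing in any group, which is exactly the condition randomization requires.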

A field experiment, as the name implies, is an experiment done in the natural environment in which events normally occur, with treatments given to one or more groups. Thus in the field experiment, even though it may not be possible to control all of the nuisance variables because members cannot be randomly assigned to groups or matched, treatments can still be manipulated. Control groups can also be set up in field experiments. The experimental and control groups in a field experiment could be made up of the people working at several plants within a certain radius, or from different shifts in the same plant, or formed in some other way. Any cause-effect relationship found under these conditions would have wider generalizability to other similar production settings, even though we may not be sure to what extent the piece rates alone were the cause of the increase in production, because some of the other confounding variables could not be controlled.

EXTERNAL VALIDITY. The above discussion raises the issue of external versus internal validity. External validity refers to the extent to which the results of a causal study are generalizable to other settings; internal validity refers to the degree of our confidence in the causal effects, that is, that variable X causes variable Y. Field experiments have more external validity (i.e., the results may be more generalizable to other similar organizational settings), but they have lesser internal validity (i.e., we cannot be certain of the extent to which variable X alone caused variable Y). Note that in the lab experiment the reverse is true: internal validity is high but external validity is rather low. In other words, in lab experiments we can be sure that variable X causes variable Y because we have been able to keep the other confounding exogenous variables under control, but we have so tightly controlled several variables to establish the cause-effect relationship that we do not know to what extent the results of our study can be generalized, if at all, to field settings. Since the lab setting does not reflect the "real world," we do not know to what extent the lab findings validly represent the reality in the outside world.

TRADE-OFF BETWEEN INTERNAL AND EXTERNAL VALIDITY. There is thus a trade-off between internal and external validity. If we want high internal validity, we should be willing to settle for lower external validity, and vice versa. To ensure both types of validity, researchers usually try first to test the causal relationships in a tightly controlled artificial or lab setting, and once the relationship has been established, they try to test it in a field experiment. Lab experimental designs in the management area have thus far been used to sort out, among other things, gender differences in leadership styles, managerial aptitudes, and so on. However, gender differences and other factors found in lab settings are frequently not found in field studies. These problems of external validity usually limit the use of lab experiments in the management area, even though they are frequently used as a first step to establish cause-effect relationships before extending causal theories to field settings. Because field experiments often have unintended consequences, such as personnel becoming suspicious and rivalries and jealousies being created among departments, researchers conduct field experiments only infrequently.

FACTORS AFFECTING INTERNAL VALIDITY. Even the best designed lab studies could be influenced by factors that might affect the internal validity of the lab experiment. That is, some confounding factors might still be present that could offer rival explanations as to what is causing the dependent variable. These possible confounding factors pose a threat to internal validity. The seven major threats to internal validity are the effects of history, maturation, testing, instrumentation, selection, statistical regression, and mortality.

History Effects. Certain events (or factors) that would have an impact on the independent variable-dependent variable relationship might unexpectedly occur while the experiment is in progress, and this history of events would confound the cause-effect relationship between two variables, thus affecting the internal validity.

Maturation Effects. Cause-effect inferences can be contaminated by the effects of the passage of time—another uncontrollable variable. Such contamination is called maturation effects. The maturation effects are a function of the processes—both biological and psychological—operating within the respondents as a result of the passage of time. Examples of maturation processes could include growing older, getting tired, feeling hungry, and getting bored. In other words, there could be a maturation effect on the dependent variable purely because of the passage of time.

Testing Effects. Frequently, to test the effects of a treatment, subjects are given what is called a pretest. That is, first a measure of the dependent variable is taken (the pretest), then the treatment is given, and after the treatment a second test, called the posttest, is administered. The difference between the posttest scores and the pretest scores is then attributed to the treatment. However, the very fact that respondents were exposed to the pretest might influence their responses on the posttest, which would adversely affect internal validity.

Instrumentation Effects. Instrumentation effects are another source of threat to internal validity. These effects might arise because of a change in the measuring instrument between pretest and posttest, and not because of the treatment's differential impact at the end. In organizations, instrumentation effects in experimental designs are possible when the pretest is done by the experimenter, treatments are given to the experimental groups, and the posttest on measures such as performance is done by different managers. Thus, instrumentation effects also pose a threat to internal validity in experimental designs.

Selection Bias Effects. The threat to internal validity could also come from improper or unmatched selection of subjects for the experimental and control groups. For example, suppose a lab experiment is set up to assess the impact of the working environment on employees' attitudes toward work, and one of the experimental conditions is to have a group of subjects work in a room with a stench for about two hours. An ethical researcher would disclose this condition to prospective subjects, some of whom may decline to participate; those who do agree may differ systematically from those who refuse, so that the groups are no longer comparable and selection bias contaminates the results.

Statistical Regression. The effects of statistical regression occur when the members chosen for the experimental group have extreme scores on the dependent variable to begin with. For instance, if a manager wants to test whether he can increase the "salesmanship" repertoire of the sales personnel through Dale Carnegie-type programs, he should avoid choosing those with extremely low or extremely high abilities for the experiment. This is because we know from the laws of probability that those with extreme scores have a greater probability of scoring closer to the mean on the posttest, quite apart from any effect of the treatment. This phenomenon of extreme scores tending to move closer to the mean is known as "regression toward the mean" (statistical regression).
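Regression toward the mean can be demonstrated with no treatment at all: if observed scores are true ability plus random noise, the lowest pretest scorers will, on average, score higher on a posttest simply because their noise is unlikely to be extreme twice. A small simulation, with all parameters invented for illustration:

```python
# Simulate pretest and posttest scores as (true ability + independent noise),
# with NO treatment between them, and watch the lowest pretest scorers "improve".
import random

random.seed(1)
N = 10_000
ability = [random.gauss(50, 10) for _ in range(N)]
pretest = [a + random.gauss(0, 10) for a in ability]
posttest = [a + random.gauss(0, 10) for a in ability]  # fresh noise, no treatment

# Select the ~10% with the most extreme (lowest) pretest scores
cutoff = sorted(pretest)[N // 10]
low = [i for i in range(N) if pretest[i] <= cutoff]

pre_mean = sum(pretest[i] for i in low) / len(low)
post_mean = sum(posttest[i] for i in low) / len(low)
print(f"low scorers: pretest mean {pre_mean:.1f} -> posttest mean {post_mean:.1f}")
```

The low scorers' posttest mean moves noticeably back toward the overall mean of 50 even though nothing was done to them, which is exactly the "improvement" that would be misattributed to a treatment in a poorly designed study.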

Mortality. Another confounding effect on the cause-effect relationship comes from the mortality, or attrition, of the members in the experimental or control group or both as the experiment progresses. When the group composition changes over time across the groups, comparison between the groups becomes difficult, because those who dropped out of the experiment may confound the results. Thus mortality can also lower the internal validity of an experiment.

FACTORS AFFECTING EXTERNAL VALIDITY. Whereas internal validity raises questions about whether it is the treatment alone or some extraneous factor that causes the effects, external validity raises issues about the generalizability of the findings to other settings. For instance, the extent to which the experimental situation differs from the settings to which the findings are expected to be generalized is directly related to the degree of threat it poses to external validity. Maximum external validity can be obtained by ensuring that the experimental conditions are as close and compatible as possible to the real-world situation. It is in this sense that field experiments have greater external validity than lab experiments. That is, the effects of the treatment can be generalized to other settings similar to the one where the field experiment was conducted. Threats to external validity can be combated by creating experimental conditions that are as close as possible to the situations to which the results of the experiment are to be generalized.

WHEN ARE EXPERIMENTAL DESIGNS NECESSARY? Before embarking on research studies using experimental designs, it is essential to consider whether experimental designs are necessary at all, since they involve special efforts and varying degrees of interference with the natural flow of events. Some questions that need to be addressed are the following:

1. Is it necessary to identify causal relationships, or would tracing the correlates that account for the variance in the dependent variable be enough? If the latter would do, experimental designs are not really needed.

2. If it is important to identify causal relationships, is there a greater need for internal validity, external validity, or both? If internal validity alone is important, a carefully designed lab experiment would be the answer; if generalizability is the more important criterion, then a field experiment would be called for; if both are equally important, then a lab study should be undertaken first, followed by a field experiment.

3. Is cost an important factor in the study? If cost is a primary consideration, would a less sophisticated rather than a more sophisticated experimental design do?

Several commonly used experimental designs can be examined to determine the extent to which they guard against the seven factors that could contaminate the internal validity of experimental results.

Pretest and Posttest Experimental Group Design. An experimental group (without a control group) may be given a pretest, exposed to a treatment, and then given a posttest to measure the effects of the treatment. However, testing and instrumentation effects might contaminate the internal validity. If the experiment is extended over a period of time, history effects and maturation effects may also confound the results.

Posttests Only with Experimental and Control Groups. Some experimental designs are set up with an experimental group and a control group, the former being exposed to a treatment and the latter not. The effects of the treatment are studied by assessing the difference in the outcomes—that is, the posttest scores of the experimental and control groups. There are at least two possible threats to validity in this design. If the two groups are not matched or randomly assigned, selection bias could contaminate the results. That is, differential recruitment of the persons making up the two groups would confound the cause-effect relationship. Mortality (the dropout of individuals from groups) can also confound the results, and thus pose a threat to internal validity.

Pretest and Posttest Experimental and Control Group Design. In this design, two groups—one experimental and the other control—are both given the pretest and the posttest. The only difference between the two groups is that the former is exposed to a treatment whereas the latter is not. Measuring the difference between the differences in the post- and pretest scores of the two groups gives the net effect of the treatment. Both groups have been given both the pre- and posttests, and both groups have been randomized; thus we can expect that the history, maturation, testing, and instrumentation effects have been controlled. Mortality could, however, pose a problem in this design.
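The "difference between the differences" is simple arithmetic. With hypothetical group means:

```python
# Net treatment effect = (experimental gain) - (control gain).
# The control group's gain absorbs history, maturation, and testing effects,
# which both groups experienced; subtracting it isolates the treatment.

exp_pre, exp_post = 40.0, 62.0   # experimental group means (invented)
ctl_pre, ctl_post = 41.0, 48.0   # control group means (invented)

exp_gain = exp_post - exp_pre    # treatment + history + maturation + testing
ctl_gain = ctl_post - ctl_pre    # history + maturation + testing only
net_effect = exp_gain - ctl_gain # attributed to the treatment
print(f"net treatment effect = {net_effect:.1f}")
```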

Solomon Four-Group Design. To gain more confidence in internal validity in experimental designs, it is advisable to set up two experimental groups and two control groups for the experiment. One experimental group and one control group can be given both the pretest and the posttest. The other two groups will be given only the posttest. Here the effects of the treatment can be calculated in several different ways. To the extent that we come up with almost the same results in each of the different calculations, we can attribute the effects to the treatment. This increases the internal validity of the experimental design and its results. This design, known as the Solomon four-group design, is probably the most comprehensive and the one with the least problems with internal validity.
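With hypothetical group means, the several calculations mentioned above might look as follows; close agreement among the estimates is what strengthens the attribution of the effect to the treatment:

```python
# Solomon four-group design (invented means):
#   E1, C1: pretest + treatment (E1 only) + posttest
#   E2, C2: posttest only (treatment for E2 only)

E1_pre, E1_post = 40.0, 62.0   # experimental, pretested
C1_pre, C1_post = 41.0, 48.0   # control, pretested
E2_post = 61.0                 # experimental, not pretested
C2_post = 47.0                 # control, not pretested

est1 = (E1_post - E1_pre) - (C1_post - C1_pre)  # difference in gains
est2 = E2_post - C2_post                        # posttest-only comparison (no testing effect)
est3 = E1_post - C1_post                        # posttest comparison of pretested groups

print(f"estimates of treatment effect: {est1:.1f}, {est2:.1f}, {est3:.1f}")
```

Here the three estimates (15.0, 14.0, 14.0) nearly coincide, so the effect can be attributed to the treatment with confidence; a large gap between the pretested and unpretested comparisons would instead signal a testing effect.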

Simulation. An alternative to lab and field experimentation currently being used in research is simulation. Simulation uses a model-building technique to determine the effects of changes, and computer-based simulations are becoming popular in research. A simulation can be thought of as an experiment conducted in a specially created setting that very closely resembles the natural environment in which events usually occur. In that sense, the simulation lies somewhere between a lab and a field experiment, insofar as the environment is artificially created but not far different from "reality." Cause-effect relationships are better established in experimental simulations where the researcher has greater control.