Program Evaluation Studies
TK Logan and David Royse
A variety of programs have been developed to address social problems such as drug addiction, homelessness, child abuse, domestic violence, illiteracy, and poverty. The goals of these programs may include directly addressing the problem origin or moderating the effects of these problems on individuals, families, and communities. Sometimes programs are developed to prevent something from happening such as drug use, sexual assault, or crime. These kinds of problems and programs to help people are often what attracts many social workers to the profession; we want to be part of the mechanism through which society provides assistance to those most in need. Despite low wages, bureaucratic red tape, and routinely uncooperative clients, we tirelessly provide services that are invaluable but also at various times may be or become insufficient or inappropriate. But without conducting evaluation, we do not know whether our programs are helping or hurting, that is, whether they only postpone the hunt for real solutions or truly construct new futures for our clients. This chapter provides an overview of program evaluation in general and outlines the primary considerations in designing program evaluations.
Evaluation can be done informally or formally. We are constantly, as consumers, informally evaluating products, services, and information. For example, we may choose not to return to a store or an agency again if we did not evaluate the experience as pleasant. Similarly, we may mentally take note of unsolicited comments or anecdotes from clients and draw conclusions about a program. Anecdotal and informal approaches such as these generally are not regarded as carrying scientific credibility. One reason is that decision biases play a role in our “informal” evaluation. Specifically, vivid memories or strongly negative or positive anecdotes will be overrepresented in our summaries of how things are evaluated. This is why objective data are necessary to truly understand what is or is not working.
By contrast, formal evaluations systematically examine data from and about programs and their outcomes so that better decisions can be made about the interventions designed to address the related social problem. Thus, program evaluation involves the use of social research methodologies to appraise and improve the ways in which human services, policies, and programs are conducted. Formal evaluation, by its very nature, is applied research.
Formal program evaluations attempt to answer the following general question: Does the program work? Program evaluation may also address questions such as the following: Do our clients get better? How does our success rate compare to those of other programs or agencies? Can the same level of success be obtained through less expensive means?
222 PART II • QUANTITATIVE APPROACHES: TYPES OF STUDIES
What is the experience of the typical client? Should this program be terminated and its funds applied elsewhere?
Ideally, a thorough program evaluation would address more complex questions in three main areas: (1) Does the program produce the intended outcomes and avoid unintended negative outcomes? (2) For whom does the program work best and under what conditions? and (3) How well was a program model developed in one setting adapted to another setting?
Evaluation has taken an especially prominent role in practice today because of the focus on evidence-based practice in social programs. Social work, as a profession, has been asked to use evidence-based practice as an ethical obligation (Kessler, Gira, & Poertner, 2005). Evidence-based practice is defined in different ways, but most definitions include using program evaluation data to help determine best practices in whatever area of social programming is being considered. In other words, evidence-based practice includes using objective indicators of success in addition to practice or more subjective indicators of success.
Formal program evaluations can be found on just about every topic. For instance, Fraser, Nelson, and Rivard (1997) have examined the effectiveness of family preservation services; Kirby, Korpi, Adivi, and Weissman (1997) have evaluated an AIDS and pregnancy prevention middle school program. Morrow-Howell, Becker-Kemppainen, and Judy (1998) evaluated an intervention designed to reduce the risk of suicide in elderly adult clients of a crisis hotline. Richter, Snider, and Gorey (1997) used a quasi-experimental design to study the effects of a group work intervention on female survivors of childhood sexual abuse. Leukefeld and colleagues (1998) examined the effects of an HIV prevention intervention with injecting drug and crack users. Logan and colleagues (2004) examined the effects of a drug court intervention as well as the costs of drug court compared with the economic benefits of the drug court program.
Basic Evaluation Considerations
Before beginning a program evaluation, several issues must be considered. These issues are decisions that are critical in determining the evaluation methodology and goals. Although you may not have complete answers to these questions when beginning to plan an evaluation, these questions help in developing the plan and must be answered before an evaluation can be carried out. We can sum up these considerations with the following questions: who, what, where, when, and why.
First, who will do the evaluation? This seems like a simple question at first glance. However, this particular consideration has major implications for the evaluation results. Program evaluators can be categorized as being either internal or external. An internal evaluator is someone who is a program staff member or regular agency employee, whereas an external evaluator is a professional, on contract, hired for the specific purpose of evaluation. There are advantages and disadvantages to using either type of evaluator. For example, the internal evaluator probably will be very familiar with the staff and the program. This may save a lot of planning time. The disadvantage is that evaluations completed by an internal evaluator may be considered less valid by outside agencies, including the funding source. The external evaluator generally is thought to be less biased in terms of evaluation outcomes because he or she has no personal investment in the program. One disadvantage is that an external evaluator frequently is viewed as an “outsider” by the staff within an agency. This may affect the amount of time necessary to conduct the evaluation or cause problems in the overall evaluation if agency staff are reluctant to cooperate.
CHAPTER 13 • PROGRAM EVALUATION STUDIES 223
Second, what resources are available to conduct the evaluation? Hiring an outside evaluator can be expensive, while having a staff person conduct the evaluation may be less expensive. So, in a sense, you may be trading credibility for less cost. In fact, each methodological decision will have a trade-off in credibility, level of information, and resources (including time and money). Also, the amount and level of information as well as the research design will be determined, to some extent, by what resources are available. A comprehensive and rigorous evaluation does take significant resources.
Third, where will the information come from? If an evaluation can be done using existing data, the cost will be lower than if data must be collected from numerous people such as clients and/or staff across multiple sites. So having some sense of where the data will come from is important.
Fourth, when is the evaluation information needed? In other words, what is the timeframe for the evaluation? The timeframe will affect costs and design of research methods.
Fifth, why is the evaluation being conducted? Is the evaluation being conducted at the request of the funding source? Is it being conducted to improve services? Is it being conducted to document the cost-benefit trade-off of the program? If future program funding decisions will depend on the results of the evaluation, then a lot more importance will be attached to it than if a new manager simply wants to know whether clients were satisfied with services. The more that is riding on an evaluation, the more attention will be given to the methodology and the more threatened staff can be, especially if they think that the purpose of the evaluation is to downsize and trim excess employees. In other words, there are many reasons an evaluation may be considered, and these reasons may have implications for the evaluation methodology and implementation.
Once the issues described above have been considered, more complex questions and trade-offs must be addressed in planning the evaluation. Specifically, six main issues guide and shape the design of any program evaluation effort and must be given thoughtful and deliberate consideration.
1. Defining the goal of the program evaluation
2. Understanding the level of information needed for the program evaluation
3. Determining the methods and analysis that need to be used for the program evaluation
4. Considering issues that might arise and strategies to keep the evaluation on course
5. Developing results into a useful format for the program stakeholders
6. Providing practical and useful feedback about the program strengths and weaknesses as well as providing information about next steps
Defining the Goal of the Program Evaluation
It is essential that the evaluator has a firm understanding of the short- and long-term objectives of the evaluation. Imagine being hired for a position but not being given a job description or informed about how the job fits into the overall organization. Without knowing why an evaluation is called for or needed, the evaluator might attempt to answer a different set of questions from those of interest to the agency director or advisory board. The management might want to know why the majority of clients do not return after one or two visits, whereas the evaluator might think that his or her task is to determine whether clients who received group therapy sessions were better off than clients who received individual counseling.
In defining the goals of the program evaluation, several steps should be taken. First, the program goals should be examined. These can be learned through examining official program documents as well as through talking to key program stakeholders. In clarifying the overall purpose of the evaluation, it is critical to talk with different program “stakeholders.” Scriven (1991) defines a program stakeholder as “one who has a substantial ego, credibility, power, futures, or other capital invested in the program. . . . This includes program staff and many who are not actively involved in the day-to-day operations” (p. 334). Stakeholders include both supporters and opponents of the program as well as program clients or consumers or even potential consumers or clients. It is essential that the evaluator obtain a variety of different views about the program. By listening and considering stakeholder perspectives, the evaluator can ascertain the most important aspects of the program to target for the evaluation by looking for overlapping concerns, questions, and comments from the various stakeholders. However, it is important that the stakeholders have some agreement on what program success means. Otherwise, it may be difficult to conduct a satisfactory evaluation.
It is also important to consult the extant literature to understand what similar programs have used to evaluate their outcomes as well as to understand the theoretical basis of the program in defining the program evaluation goals. Furthermore, it is critical that the evaluator works closely with whoever initiated the evaluation to set priorities for the evaluation. This process should identify the intended outcomes of the program and which of those outcomes, if not all of them, will be evaluated. Taking the evaluation a step further, it may be important to include the examination of unintended negative outcomes that may result from the program. Stakeholders and the literature will also help to determine those kinds of outcomes.
Once the overall purpose and priorities of the evaluation are established, it is a good idea to develop a written agreement, especially if the evaluator is an external one. Misunderstandings can and will occur months later if things are not written in black and white.
Understanding the Level of Information Needed for the Program Evaluation
The success of the program evaluation revolves around the evaluator’s ability to develop practical, researchable questions. A good rule to follow is to focus the evaluation on one or two key questions. Too many questions can lengthen the process and overwhelm the evaluator with too much data that, instead of facilitating a decision, might produce inconsistent findings. Sometimes, funding sources require only that some vague, undefined type of evaluation be conducted. The funding sources might neither expect nor desire dissertation-quality research; they simply might expect “good faith” efforts when beginning evaluation processes. Other agencies may be quite demanding in the types and forms of data to be provided. Obviously, the choice of methodology, data collection procedures, and reporting formats will be strongly affected by the purpose, objectives, and questions examined in the study.
It is important to note the difference between general research and evaluation. In research, the investigator often focuses on questions based on theoretical considerations or hypotheses generated to build on research in a specific area of study. Although program evaluations may focus on an intervention derived from a theory, the evaluation questions should, first and foremost, be driven by the program’s objectives. The evaluator is less concerned with building on prior literature or contributing to the development of practice theory than with determining whether a program worked in a specific community or location.
There are actually two main types of evaluation questions. There are questions that focus on client outcomes, such as, “What impact did the program have?” These kinds of questions are addressed by using outcome evaluation methods. Then there are questions that ask, “Did the program achieve its goals?” “Did the program adhere to the specified procedures or standards?” or “What was learned in operating this program?” These kinds of questions are addressed by using process evaluation methods. We will examine both types of evaluation approaches in the following sections.
Process Evaluation
Process evaluations offer a “snapshot” of the program at any given time. Process evaluations typically describe the day-to-day program efforts; program modifications and changes; outside events that influenced the program; people and institutions involved; culture, customs, and traditions that evolved; and sociodemographic makeup of the clientele (Scarpitti, Inciardi, & Pottieger, 1993). Process evaluation is concerned with identifying program strengths and weaknesses. This level of program evaluation can be useful in several ways, including providing a context within which to interpret program outcomes and allowing other agencies or localities wishing to start similar programs to benefit without having to make the same mistakes.
As an example, Bentelspacher, DeSilva, Goh, and LaRowe (1996) conducted a process evaluation of the cultural compatibility of psychoeducational family group treatment with ethnic Asian clients. As another example, Logan, Williams, Leukefeld, and Minton (2000) conducted a detailed process evaluation of the drug court programs before undertaking an outcome evaluation of the same programs. The Logan et al. study used multiple methods to conduct the process evaluation, including in-depth interviews with the program administrative personnel, interviews with each of five judges involved in the program, surveys and face-to-face interviews with 22 randomly selected current clients, and surveys of all program staff, 19 community treatment provider representatives, 6 randomly selected defense attorney representatives, 4 prosecuting attorney representatives, 1 representative from the probation and parole office, 1 representative from the local county jail, and 2 police department representatives. In all, 69 different individuals representing 10 different agency perspectives provided information about the drug court program. Also, all agency documents were examined and analyzed, observations of various aspects of the program process were conducted, and client intake data were analyzed as part of the process evaluation. The results were all integrated and compiled into one comprehensive report.
What makes a process evaluation so important is that researchers often have relied only on selected program outcome indicators such as termination and graduation rates or number of rearrests to determine effectiveness. However, to better understand how and why a program such as drug court is effective, an analysis of how the program was conceptualized, implemented, and revised is needed. Consider this example: say one outcome evaluation of a drug court program showed a graduation rate of 80% of those who began the program, while another outcome evaluation found that only 40% of those who began the program graduated. Suppose also that the graduates of the second program were more likely to be free from substance use and criminal behaviors at the 12-month follow-up than the graduates from the first program. A process evaluation could help to explain the specific differences in factors such as selection (how clients get into the programs), treatment plans, monitoring, program length, and other program features that may influence how many people graduate and stay free from drugs and criminal behavior at follow-up. In other words, a process evaluation, in contrast to an examination of program outcome only, can provide a clearer and more comprehensive picture of how drug court affects those involved in the program. More specifically, a process evaluation can provide information about program aspects that need to be improved and those that work well (Scarpitti, Inciardi, & Pottieger, 1993). Finally, a process evaluation may help to facilitate replication of the drug court program in other areas. This often is referred to as technology transfer.
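The drug court comparison above can be made concrete with a little arithmetic. The sketch below is only illustrative: the 80% and 40% graduation rates come from the example in the text, but the follow-up counts are hypothetical numbers chosen for the illustration, and the function name `outcome_summary` is likewise an assumption. It shows how a program with the lower graduation rate can still look better once follow-up results are taken into account.

```python
# Hypothetical counts for illustration only. The graduation rates (80% and 40%)
# mirror the example in the text; the follow-up counts are invented.

def outcome_summary(entrants, graduates, graduates_clean_at_followup):
    """Summarize one program's results at graduation and at follow-up."""
    if entrants <= 0 or graduates <= 0:
        raise ValueError("entrants and graduates must be positive")
    return {
        # Share of people who began the program and went on to graduate
        "graduation_rate": graduates / entrants,
        # Share of graduates free from drug use and crime at follow-up
        "success_among_graduates": graduates_clean_at_followup / graduates,
        # Share of everyone who began the program who is doing well at follow-up
        "success_per_entrant": graduates_clean_at_followup / entrants,
    }

# First program: 80 of 100 entrants graduate; 24 graduates doing well at 12 months
program_a = outcome_summary(entrants=100, graduates=80, graduates_clean_at_followup=24)
# Second program: 40 of 100 entrants graduate; 28 graduates doing well at 12 months
program_b = outcome_summary(entrants=100, graduates=40, graduates_clean_at_followup=28)

for name, summary in (("First program", program_a), ("Second program", program_b)):
    print(name, {key: round(value, 2) for key, value in summary.items()})
```

Numbers like these describe what happened but not why; explaining the gap between the two programs is the job of the process evaluation.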
A different but related process evaluation goal might be a description of the failures and departures from the way in which the intervention originally was designed. How were the staff trained and hired? Did the intervention depart from the treatment manual recommendations? Influences that shape and affect the intervention that clients receive need to be identified because they affect the fidelity of the treatment program (e.g., delayed funding or staff hires, changes in policies or procedures). When program implementation deviates significantly from what was intended, this might be the logical explanation as to why a program is not working.
Outcome or Impact Evaluation
Outcome or impact evaluation focuses on the targeted objectives of the program, often looking at variables such as behavior change. For example, many drug treatment programs may measure outcomes or “success” by the number of clients who abstain from drug use. Questions always arise, though. For instance, an evaluation might reveal that 90% of those who graduate from the program abstain from drug use 30 days after the program was completed. However, only 50% report abstaining from drug use 12 months after the program was completed. Would key stakeholders involved all consider that a success or failure of the program? This example brings up three critical issues in outcome evaluations.
One of the critical issues in outcome evaluations is related to understanding for whom the program works best and under what conditions. In other words, a more interesting and important question, rather than just asking whether a program works, would be to ask, “Who are those 50% of people who remained abstinent from drug use 12 months after completing the program, and how do they differ from the 50% who relapsed?” It is not unusual for some evaluation questions to need a combination of both process and impact evaluation methodologies. For example, if it turned out that results of a particular evaluation showed that the program was not effective (impact), then it might be useful to know why it was not effective (process). In such cases, it would be important to know how the program was implemented, what changes were made in the program during the implementation, what problems were experienced during the implementation, and what was done to overcome those problems.
Another important issue in outcome evaluation has to do with the timing of measuring the outcomes. Outcome effects are usually measured after treatment or postintervention. These effects may be either short term or long term. Immediate outcomes, or those generally measured at the end of the treatment or intervention, might or might not provide the same results as one would get later in a 6- or 12-month follow-up, as highlighted in the example above.
The third important issue in outcome evaluation has to do with what specific measures were used. Is abstinence, for example, the only measure of interest, or is reduction in use something that might be of interest? Refraining from criminal activity or holding a steady job may also be an important goal of a substance abuse program. If we only measure abstinence, we would never know about other kinds of outcomes the program may affect.
These last two issues in outcome evaluations have to do with the evaluation methodology and analysis and are addressed in more detail below.
Determining the Methods and Analysis That Need to Be Used for the Program Evaluation
The next step in the evaluation process is to determine the evaluation design. There are several interrelated steps in this process, including determining the (a) sources of data, (b) research design, (c) measures, (d) analysis of change, and (e) cost-benefit assessment of the program.
Sources of Data
Several main sources of data can be used for evaluations, including qualitative information and quantitative information.
Qualitative Data Sources
Qualitative data sources are often used in process evaluations and might include observations, analysis of existing program documents such as policy and procedure manuals, in-depth interview data, or focus group data. There are, however, trade-offs when using qualitative data sources. On the positive side, qualitative evaluation data provide an “in-depth” snapshot of various topics such as how the program functions, what staff think are the positive or negative aspects of the programs, or what clients really think of the overall program experiences. Reporting clients’ experiences in their own words is a characteristic of qualitative evaluations.
Interviews are good for collecting qualitative or sensitive data such as values and attitudes. This method requires an interview protocol or questionnaire. These usually are structured so that respondents are asked questions in a specific order, but they can be semistructured so that there are fewer topics, and the interviewer has the ability to change the order based on a “reading” of the client’s responses. Surveys can request information of clients by mail, by telephone, or in person. They may or may not be self-administered. So, besides considering what data are desired, evaluators must be concerned with pragmatic considerations regarding the best way in which to collect the desired data.
Focus groups also offer insight into certain aspects of the program or program functioning; participants add their input, and input is interpreted and discussed by other group members. This discussion component may provide an opportunity to uncover information that might otherwise remain undiscovered such as the meaning of certain things to different people. Focus groups typically are small informal groups of persons asked a series of questions that start out very general and then become more specific. Focus groups are increasingly being used to provide evaluative information about human services. They work particularly well in identifying the questions that might be important to ask in a survey, in testing planned procedures or the phrasing of items for the specific target population, and in exploring possible reactions to an intervention or a service.
On the other hand, qualitative studies tend to use small samples, and care must be used in analyzing and interpreting the information. Furthermore, although both qualitative and quantitative data are subject to method bias and threats to validity, qualitative data may be more sensitive to bias depending on how participants are selected to be interviewed, the number of observations or focus groups, and even subtleties in the questions asked. With qualitative approaches, the evaluator often has less ability to account for alternative explanations because the data are more limited. Drawing strong conclusions about representativeness, validity, and reliability is more difficult with qualitative data compared to something like an average rating of satisfaction across respondents (a quantitative measure). Yet, an average rating does not tell us much about why participants are satisfied with the program or why they may be dissatisfied with other aspects of the program. Thus, it is often imperative to use a mixture of qualitative and quantitative information to evaluate a program.
Quantitative Data Sources
Two main types of quantitative data sources can be used for program evaluations: secondary data and original data.
Secondary Data. One option for obtaining needed data is to use existing data. Collecting new data often is more expensive than using existing data. Examining the data on hand and already available always is a good first step. However, the evaluator might want to rearrange or reassemble the data, for example, dividing it by quarters or combining it into 12-month periods that help to reveal patterns and trends over time. Existing data can come from a variety of places, including the following:
Client records maintained by the program: These may include a host of demographic and service-related data items about the population served.
Program expense and financial data: These can help the evaluator to determine whether one intervention is much more expensive than another.
Agency annual reports: These can be used to identify trends in service delivery and program costs. The evaluator can compare annual reports from year to year and can develop graphs to easily identify trends with clientele and programs.
Databases maintained by the state health department and other state agencies. Public data such as births, deaths, and divorces are available from each state. Furthermore, most state agencies produce annual reports that may reveal the number of clients served by program, geographic region, and on occasion, selected sociodemographic variables (e.g., race or age).
Local and regional agencies. Planning boards for mental health services, child protection, school boards, and so forth may be able to furnish statistics on outpatient and inpatient services, special school populations, or child abuse cases.
The federal government. The federal government collects and maintains a large amount of data on many different issues and topics. State and national data provide benchmarks for comparing local demographic or social indicators to national-level demographic or social indicators. For instance, if you were working as a cancer educator whose objective is to reduce the incidence of breast cancer, you might want to consult cancercontrolplanet.cancer.gov. That Web site will furnish national-, state-, and county-level data on the number of new cancer cases and deaths. By comparison, it will be possible to determine if the rate in one county is higher than the state or national average. Demographic information about communities can be found at www.census.gov.
Foundations. Certain well-established foundations provide a wealth of information about problems. For example, the Annie E. Casey Foundation provides an incredible Kids Count Data Book that provides an abundance of child welfare-related data at the state, national, and county level. By using their data, you could determine if infant mortality rates were rising, teen births were increasing, or high school dropouts were decreasing. You can find the Web site at www.aecf.org.
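Two of the manipulations of existing data mentioned above, regrouping records by quarter and comparing a local rate with a state benchmark, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the monthly intake counts, the case counts, and the population figures are all hypothetical, and the function names `quarterly_totals` and `rate_per_100k` are invented for the example.

```python
from collections import defaultdict

def quarterly_totals(monthly_counts):
    """Combine monthly counts into quarterly totals.

    monthly_counts maps (year, month) to a count; the result maps
    (year, quarter) to the sum of the three months in that quarter.
    """
    totals = defaultdict(int)
    for (year, month), count in monthly_counts.items():
        quarter = (month - 1) // 3 + 1  # months 1-3 -> 1, 4-6 -> 2, and so on
        totals[(year, quarter)] += count
    return dict(totals)

def rate_per_100k(cases, population):
    """Express a raw case count as a rate per 100,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * 100_000 / population

# Hypothetical monthly client intakes for one year of agency records
monthly_intakes = [30, 28, 35, 40, 42, 38, 25, 27, 30, 45, 50, 48]
intakes = {(2023, month): count for month, count in enumerate(monthly_intakes, start=1)}
print(quarterly_totals(intakes))

# Hypothetical county and state figures for new cases in one year
county_rate = rate_per_100k(cases=66, population=50_000)
state_rate = rate_per_100k(cases=7_800, population=6_500_000)
print(f"County: {county_rate:.1f} per 100,000; state: {state_rate:.1f} per 100,000")
print("County rate is above the state rate" if county_rate > state_rate
      else "County rate is at or below the state rate")
```

Converting counts to rates before comparing matters because a small county will nearly always report fewer raw cases than a state, whatever its underlying risk.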
If existing data cannot be used or cannot answer all of the evaluation questions, then original data must be collected.
Original Data Sources. There are many types of evaluation designs from which to choose, and no single one will be ideal for every project. The specific approach chosen for the evaluation will depend on the purpose of the evaluation, the research questions to be explored, the hoped-for or intended results, the quality and volume of data available or needed, and staff, time, and financial resources.
The evaluation design is a critical decision for a number of reasons. Without the appropriate evaluation design, confidence in the results of the evaluation might be lacking. A strong evaluation design minimizes alternative explanations and assists the evaluator in gauging the true effects attributable to the intervention. In other words, the evaluation design directly affects the interpretation that can be made regarding whether an intervention should be viewed as the reason for change in clients’ behavior. However, there are trade-offs with each design in the credibility of information, causality of any observed changes, and resources. These trade-offs must be carefully considered and discussed with the program staff.
Quantitative designs include surveys, pretest-posttest studies, quasi-experiments with nonequivalent control groups, longitudinal designs, and randomized experimental designs. Quantitative approaches transform answers to specific questions into numerical data. Outcome and impact evaluations nearly always are based on quantitative evaluation designs. Also, sampling strategies must be considered as an integral part of the research design. Below is a brief overview of the major types of quantitative evaluation designs. For an expanded discussion of these topics, refer to Royse, Thyer, Padgett, and Logan (2005).
Research Design
Cross-Sectional Surveys
A survey is limited to a description of a sample at one point in time and provides us with a “snapshot” of a group of respondents and what they were like or what knowledge or attitudes they held at a particular point in time. If the survey is to generate good generalizable data, then the sampling procedures must be carefully planned and implemented. A cross-sectional survey requires rigorous random sampling procedures to ensure that the sample closely represents the population of interest. A repeated survey is similar to a cross-sectional study but collects information at two or more points in time from the same respondents. A repeated (longitudinal) survey is effective at measuring changes in facts, attitudes, or opinions over a course of time.
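As one illustration of the random sampling mentioned above, the sketch below draws a simple random sample from a sampling frame so that every listed client has the same chance of selection. It is a minimal example, not a full sampling plan: the client identifiers are hypothetical, the function name `simple_random_sample` is an assumption, and designs such as stratified or cluster sampling would need additional steps.

```python
import random

def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Draw a simple random sample without replacement.

    sampling_frame is a list of unit identifiers (for example, client IDs).
    Passing a seed makes the draw reproducible, which helps document
    how the sample was selected.
    """
    if sample_size > len(sampling_frame):
        raise ValueError("sample_size cannot exceed the size of the sampling frame")
    rng = random.Random(seed)
    return rng.sample(sampling_frame, sample_size)

# Hypothetical sampling frame: 500 current clients identified by ID
frame = [f"client-{number:03d}" for number in range(1, 501)]

# Select 50 clients to receive the survey
sample = simple_random_sample(frame, 50, seed=13)
print(len(sample), "clients selected; first five:", sample[:5])
```

A sample drawn this way represents the population only to the extent that the sampling frame is complete and those selected actually respond.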
Pretest-Posttest Designs (Nonexperimental)
Perhaps the most common quantitative evaluation design used in social and human service agencies is the pretest-posttest. In this design, a group of clients with some specific problem or diagnosis (e.g., depression) is administered a pretest prior to the start of intervention. At some point toward the end or after the intervention, the same instrument is administered to the group a second time (the posttest). The one-group pretest-posttest design can measure change, but the evaluator has no basis for attributing change solely to the program. Confidence about change increases and the design strengthens when control groups are added and when participants are randomly assigned to either a control or experimental condition.
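As a rough illustration of how change might be summarized in a one-group pretest-posttest design, the sketch below computes the mean change score and a paired t statistic. The function name and the depression scores are hypothetical, invented only for this example, and the statistic describes change without showing that the program caused it.

```python
import math
import statistics

def paired_change(pretest, posttest):
    """Summarize change for one group measured twice with the same instrument."""
    if len(pretest) != len(posttest) or len(pretest) < 2:
        raise ValueError("need matched pre/post scores for at least two clients")
    # Change score for each client: posttest minus pretest.
    diffs = [post - pre for pre, post in zip(pretest, posttest)]
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean change (sample standard deviation / sqrt(n)).
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    # Paired t statistic; undefined when every client changed by the same amount.
    t_stat = mean_diff / se if se > 0 else None
    return {"n": len(diffs), "mean_change": mean_diff, "t": t_stat}

# Hypothetical depression scores for six clients (lower is better).
pre = [22, 30, 27, 25, 31, 28]
post = [18, 24, 26, 20, 27, 21]
summary = paired_change(pre, post)
print(summary)
```

A negative mean change here would indicate lower scores at posttest, but without a comparison group the evaluator cannot rule out other explanations for the drop.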
Quasi-Experimental Designs
Also known as nonequivalent control group designs, quasi-experiments generally use comparison groups whereby two similar groups are selected and followed for a period of time. One group typically receives some program or benefit, whereas the other group (the control) does not. Both groups are measured and compared for any differences at the end of some time period. Participants used as controls may be clients who are on a waiting list, those who are enrolled in another treatment program, or those who live in a different city or county. The problem with this design is that the control or comparison group might not, in fact, be equivalent to the group receiving the intervention. Comparing Ocean View School to Inner City School might not be a fair comparison. Even two different schools within the same rural county might be more different than similar in terms of the learning milieu, the proportion of students receiving free lunches, the number of computers and books in the school library, the principal’s hiring practices, and the like. With this design, there always is the possibility that whatever the results, they might have been obtained because the intervention group really was different from the control group. However, many of these issues can be addressed, either by collecting the relevant information and controlling for it in the statistical analysis or, at the least, by taking it into account when interpreting the results. Even so, this type of study does not provide proof of cause and effect, and the evaluator always must consider other factors (both known and measured and unknown or unmeasured) that could have affected the study’s outcomes.
Longitudinal designs are a type of quasi-experimental design that involves tracking a particular group of individuals over a substantial period of time to discover potential changes due to the influence of a program. It is not uncommon for evaluators to want to know about the effects of a program after an extended period of time has passed. The question of interest is whether treatment effects last. These studies typically are complicated and expensive in time and resources. In addition, the longer a study runs, the higher the expected rate of attrition from clients who drop out or move away. High rates of attrition can bias the sample.
Randomized Experimental Designs
In a true experimental design, participants are randomly assigned to either the control or treatment group. This design provides a persuasive argument about causal effects of a program on participants. The random assignment of respondents to treatment and control groups helps to ensure both groups are equivalent across key variables such as age, race, area of residency, and treatment history. This design provides the best evidence that any observed differences between the two groups after the intervention can be attributed to the intervention, assuming the two groups were equal before the intervention. Even with random assignment, group differences preintervention could exist, and the evaluator should carefully look for them and use statistical controls when necessary.
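One way to picture random assignment followed by a check for preintervention differences is the sketch below. The client identifiers, the ages, and both function names are hypothetical; a real evaluation would compare several baseline variables and would likely use formal statistical tests rather than a single difference in means.

```python
import random
import statistics

def randomly_assign(client_ids, seed=None):
    """Shuffle clients and split them into treatment and control groups."""
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    shuffled = list(client_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

def baseline_gap(groups, baseline):
    """Difference in group means on one baseline variable (treatment minus control)."""
    treatment_mean = statistics.mean(baseline[cid] for cid in groups["treatment"])
    control_mean = statistics.mean(baseline[cid] for cid in groups["control"])
    return treatment_mean - control_mean

# Hypothetical clients with a made-up baseline variable (age in years).
ages = {f"client_{i:02d}": 20 + (i * 7) % 30 for i in range(20)}
groups = randomly_assign(ages.keys(), seed=13)
gap = baseline_gap(groups, ages)
print(len(groups["treatment"]), len(groups["control"]), round(gap, 2))
```

A large gap on a key variable would be a signal to add that variable as a statistical control in the outcome analysis.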
One word of warning about random assignment is that key program stakeholders often view random assignment as unethical, especially if they view the treatment program as beneficial. One outcome of this difficulty of accepting random assignment is that staff might have problems not giving the intervention they believe is effective to specific needy clients or to all of their clients instead of just to those who were randomly assigned. If they do succumb to this temptation, then the evaluation effort can be unintentionally sabotaged. The evaluator must train and prepare all of those individuals involved in the evaluation to help them understand the purpose and importance of the random assignment. Random assignment, more than any other procedure, provides the evidence that the treatment really does benefit the clients.
Sampling Strategies and Considerations
When the client population of interest is too large to obtain information from each individual member, a sample is drawn. Sampling allows the evaluator to make predictions about a population based on study findings from a set of cases. Sampling strategies can be very complex. If the evaluator needs the type of precision afforded by a probability sample in which there is a known level of confidence and margin of error (e.g., 95% confidence, plus or minus 3 percentage points), then he or she might need to hire a sampling consultant. A consultant is particularly recommended when the decisions about the program or intervention are critical such as in drug research or when treatments could have potentially harmful side effects. However, there is a need to recognize the trade-offs that are made when determining sampling strategy and sample size. Large samples can be more accurate than smaller ones, yet they usually are much more expensive. Small samples can be acceptable if a big change or effect is expected. As a rule, the more critical the decision, the larger (and more precise) the sample should be.
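To make the "95% confidence, plus or minus 3 percentage points" example concrete, the sketch below applies the standard sample-size formula for estimating a proportion, n = z² p(1 − p) / e². It assumes simple random sampling from a large population and the most conservative proportion (p = 0.5); the function name is hypothetical, and a sampling consultant would adjust for design effects, nonresponse, and finite populations.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence=0.95, p=0.5):
    """Minimum n so a proportion estimate falls within +/- margin at the given confidence.

    Assumes simple random sampling from a large population (no finite
    population correction). p = 0.5 gives the largest, most cautious n.
    """
    # Two-sided critical value from the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

n_three_points = sample_size_for_proportion(0.03)  # plus or minus 3 points
n_five_points = sample_size_for_proportion(0.05)   # plus or minus 5 points
print(n_three_points, n_five_points)  # 1068 385
```

The comparison shows the cost trade-off described above: tightening the margin of error from 5 to 3 percentage points nearly triples the number of cases required.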
There are two main categories of sampling strategies from which the evaluator can choose: probability sampling and nonprobability sampling. Probability sampling imposes statistical rules to ensure that unbiased samples are drawn. These samples normally are used for impact studies. Nonprobability or convenience sampling is less complicated to implement and is less expensive. This type of sampling often is used in process evaluations.
With probability sampling, the primary idea is that every individual, object, or institution in the population under study has a chance of being selected into the sample, and the likelihood of the selection of any individual is known. Probability sampling provides a firm basis for generalizing from the sample to the population. Nonprobability samples severely reduce the evaluator’s ability to generalize the results of the study to the larger population.
The evaluator must balance the need for scientific rigor against convenience and often limited resources when determining sample size. If a major decision is being based on data collected, then precision and certainty are critical. Statistical precision increases as the sample size increases. When differences in the results are expected to be small, a larger sample guards against confounding variables that might distort the results of a treatment.
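The simplest probability sample is a simple random sample, in which every member of the population has the same known chance of selection. The sketch below is only an illustration: the client roster and the function name are hypothetical, and real studies often use stratified or cluster designs in which selection probabilities differ across members but are still known.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; each has selection probability n / N."""
    members = list(population)
    rng = random.Random(seed)  # a seed makes the draw reproducible
    sample = rng.sample(members, n)
    inclusion_probability = n / len(members)
    return sample, inclusion_probability

# Hypothetical roster of 500 client case numbers.
roster = [f"case_{i:03d}" for i in range(500)]
sample, prob = simple_random_sample(roster, 50, seed=7)
print(len(sample), prob)  # 50 0.1
```

Because the selection probability is known for every case, results from the sample can be generalized to the roster with a quantifiable margin of error, which a convenience sample does not allow.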
Measures
The next important method decision is to determine how best to measure the variables of interest needed to answer the evaluation questions. These will vary from evaluation to evaluation, depending on the questions being asked. In one project, the focus might be on the outcome variable of arrests (or rearrests) so as to determine whether the program reduced criminal justice involvement. In another project, the outcome variable might be number of hospitalizations or days of hospitalization.
Once there is agreement on the outcome variables, objective measures for those variables must be determined. Using the example of the drug court program above, the decisions might include the following: How will abstinence be measured? How will reduction in substance use be measured? How will criminal behavior be measured? How will employment be measured? This may seem simple at first glance, but there are two complicating factors. First, there are a variety of ways to measure something as simple as abstinence. One could measure it by self-report or by actually giving the client a drug test. When looking at reduction of use, the issue of measurement becomes a bit more complicated. This will likely need to be self-report and some kind of comparison (either the same measures must be used with the same clients before and after the program [this being the best way] or the same measure must be used with a control group of some kind [like program dropouts]).
The second complicating factor in measurement is determining what other constructs need to be included to better understand “who benefits from the program the most and under what circumstances” and how those constructs are measured. Again, using the drug court program as an example, perhaps those clients who are most depressed, have the most health problems, or have the most anxiety do worse in drug court programs because the program may not address co-occurring disorders. If this is the case, then it will be important to include measures of depression, anxiety, and health. However, there are many different measures for each of these constructs, and different measures use different timeframes as points of reference. In other words, some depression measures ask about 12-month periods, some ask about 2-week periods, and some ask about 30-day periods.
Is one instrument or scale better than another for measuring depression? What are the trade-offs relative to shorter or longer instruments? (For example, the most valid instrument might be so long that clients will get fatigued and refuse to complete it.) Is it better to measure a reduction in symptoms associated with a standardized test or to employ a behavioral measure (e.g., counting the number of days that patients with chronic mental illness are compliant with taking their medications)? Is measuring attitudes about drug abuse better than measuring knowledge about the symptoms of drug addiction? Evaluators frequently have to struggle with decisions such as these and decide whether it is better to use instruments that are not “perfect” or to go to the trouble of developing and validating new ones.
When no suitable instrument or available data exist for the evaluation, the evaluator might have to create a new scale or at least modify an existing one. If an evaluator revises a previously developed measure, then he or she has the burden of demonstrating that the newly adapted instrument is reliable and valid. Then, there are issues such as the reliability of data obtained from clients. Will clients be honest in reporting actual drug and alcohol use? How accurate are their memories?
A note must be made here about a special case of program evaluation: evaluating prevention programs. Evaluation of prevention programs is especially challenging because the typical goal of a prevention program is to prevent a particular problem or behavior from developing. The question then becomes, “How do you measure something that never occurs?” In other words, if the prevention program is successful, the problem will not develop, but it is difficult to determine with any certainty that the problem would have developed in the first place absent the prevention program. Thus, measures become very important as well as the design (such as including a control group).
Evaluators use a multitude of methods and instruments to collect data for their studies. A good strategy is to include multiple measures and methods if possible, especially when random assignment is not possible. That way, one can possibly look for convergence of conclusions across methods and measures.
Analysis of Change
After the data are collected, the evaluator is faced with a sometimes difficult question of how to determine whether change had occurred. And, of course, there are several considerations within this overall decision as well. One of the first issues to be decided is what the unit of analysis will be.
The unit of analysis refers to the person or things being studied or measured in the evaluation of a program. Typically, the basic unit of analysis consists of individual clients but also may be groups, agencies, communities, schools, or even states. For example, an evaluator might examine the effectiveness of a drug prevention program by looking for a decrease in drug-related suspensions or disciplinary actions in high schools in which the program was implemented. In that instance, schools are the primary unit of analysis. Another evaluator might be concerned only with the attitudes toward drugs and alcohol of students in one middle school; in that situation, individuals would be the unit of analysis. The smallest unit of analysis from which data are gathered often is referred to as a case. The unit of analysis is critical for determining both the sampling strategy and the data analysis.
The analysis will also be determined by the research design such as the number of groups to be analyzed, the type of dependent variable (categorical vs. continuous), the control variables that need to be included, and whether the design is longitudinal. The literature on similar program evaluations is also useful to examine so that analysis plans can consider what has been done in the past. The analysis phase of the evaluation is basically the end product of the evaluation activities. Therefore, a careful analysis is critical to the evaluation, the interpretation of the results, and the credibility of the results. Analysis should be conducted by somebody with adequate experience in statistical methods and statistical assumptions, and limitations of the study should be carefully examined and explained to program stakeholders.
Cost-Benefit Analysis
While assessing program outcomes is obviously necessary to gauge the effectiveness of a program, a more comprehensive understanding of program “success” can be attained by examining program costs and economic benefits. In general, economic costs and benefits associated with specific programs have received relatively limited attention. One of the major challenges in estimating costs of some community-based social programs is that standard cost estimation procedures do not always reflect the true costs of the program. For example, a drug court program often combines both criminal justice supervision and substance abuse treatment in a community-based environment. And in order for drug court programs to work effectively, they often use many community and outside agency resources that are not necessarily directly paid for by the program. For example, although the drug court program may not directly pay for the jail time incurred as part of client sanctions, jail time is a central component in many drug court programs. Thus, jail costs must be considered a drug court program cost.
A comprehensive economic cost analysis would include estimates of the value of all resources used in providing the program. When resources are donated or subsidized, the out-of-pocket cost will differ from the opportunity cost of the resources for a given program. Opportunity costs take into account the forgone value of an alternative use for program resources. Other examples of opportunity costs for the drug court program may include the time and efforts of judges, police officers, probation officers, and prosecutors.
Including costs for which the program may not explicitly pay presents an interesting dilemma. The dilemma primarily stems from the trade-off in presenting only out-of-pocket expenditures for a program (thus the program will have a lower total cost) or accurately reflecting all of the costs associated with the program regardless of whether those costs are paid out of pocket (implying a higher total program cost). Furthermore, when agencies share resources (e.g., shared overhead costs), the correct proportion of these resources that are devoted specifically to a program must be properly specified. To date, there has been little discussion in the literature about estimating the opportunity cost of programs beyond the out-of-pocket costs. Knowing which costs to include and what value to place on certain services or items that are not directly charged to the program can be complicated.
A comprehensive analysis of economic benefits also presents challenges. The goal of an economic benefit analysis is to determine the monetary value of changes in a range of program outcomes, mainly derived from changes in client behavior as a result of participating in the program. When estimating the benefits of a program such as drug court, one of the most obvious and important outcomes is the reduction in criminal justice costs (e.g., reduced incarceration and supervision), and these are traditionally the only sources of benefits examined in many drug court evaluations. However, drug court programs often have a diverse set of goals in addition to reducing criminal justice costs. For example, drug court programs often focus on helping the participants become more productive in society. This includes helping participants take responsibility for their financial obligations such as child support. In addition, employment is often an important program goal for drug court clients. If the client is working, he or she is paying taxes and is less likely to use social welfare programs. Thus, the drug court program potentially reduces several different categories of costs that might have accrued had program participants not received treatment. These “avoided” costs or benefits are important components to a full economic evaluation of drug court programs.
So, although the direct cost of the program usually is easily computed, the full costs and the benefits are more difficult to convert into dollars. For example, Logan et al. (2004) found that the average direct cost per drug court treatment episode for a graduate was $3,319, whereas the opportunity cost per episode was $5,132. These differences in costs due to agency collaboration highlight the importance of clearly defining the perspective of the cost analysis. As discussed earlier, the trade-off in presenting only out-of-pocket expenditures for a program or accurately reflecting all of the costs associated with the program regardless of whether those costs are paid out of pocket is an important distinction that should be considered at the outset of every economic evaluation. On the benefit side of the program, results suggest that the net economic benefit was $14,526 for each graduate of the program. In other words, this translates to a return of $3.83 in economic benefit for every dollar invested in the drug court programs for graduates. Obviously, those who dropped out of the program before completing did not generate as large of a return. However, results suggest that when both graduates and terminators were examined together, the net economic benefit of any drug court experience amounted to $5,446 per participant. This translates to a return of $2.71 in economic benefit for every dollar invested in the drug court programs.
When looking at the cost-benefit analysis of programs or comparing these costs and benefits across programs, it is important to keep in mind that cost-benefit analysis may be done very differently, and a careful assessment of the methods must be undertaken to ensure comparability across programs.
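The relationship between the graduate figures can be checked with a short calculation. The sketch below assumes that the reported return per dollar is the gross benefit (net benefit plus cost) divided by the opportunity cost per episode; that reading is an inference from the reported numbers, not a method stated in the text, and the function name is hypothetical. The cost per participant for graduates and terminators combined is not given here, so only the graduate figures are worked through.

```python
def benefit_cost_ratio(net_benefit, cost):
    """Dollars of gross economic benefit per dollar of cost.

    Assumption (not stated in the source): gross benefit = net benefit + cost.
    """
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (net_benefit + cost) / cost

# Figures reported for graduates: net benefit $14,526, opportunity cost $5,132.
graduate_ratio = benefit_cost_ratio(net_benefit=14526, cost=5132)
print(round(graduate_ratio, 2))  # 3.83
```

Under this assumption the result matches the reported return of $3.83 per dollar for graduates. Substituting the lower direct cost of $3,319 would give a different ratio, which is why the perspective of the cost analysis needs to be defined at the outset.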