Must be 250 words, in APA format, with 3 scholarly sources (2 should be from the required readings attached).
Read Chapter 2 of the Bachman textbook (the files are numbered 2(1-9) in the attachments), Chapter 2 of the Mosher textbook, and the article “Evidence-Based Policing” before discussing the following prompts:
- Describe the major elements of evidence-based policing.
- How do the elements relate to research in criminal justice?
- In what areas is evidence-based policing already being used?
- What are the steps you would take to convince a department not using evidence-based policing to use it?
The Mismeasure of Crime
TWO: THE HISTORY OF MEASURING CRIME
There were, of course, crimes before statisticians occupied this territory, but it may be doubted whether there were crime rates.
—Porter (1995, p. 37)
The 20th century has been referred to as “the first measured century” (Public Broadcasting System, 2000). During the past 100 years or so, U.S. citizens became “the most energetic measurers of social life that ever lived. … They pioneered the measurement of facets of American life that had never been systematically counted before, such as crime, love, food, fun, religion, and work” (Caplow, Hicks, & Wattenberg, 2000). Just a few examples will illustrate the range of topics measured: In 1900, approximately 1 in 6 infants in the United States died before his or her first birthday, compared to 1 in 141 in 1999; less than 1 out of 10 men was unmarried in 1900, compared to 1 out of 3 in 1997; only 13% of U.S. adolescents completed high school in 1900, compared with 83% in 1998; in 1904, there were 69 state and federal prisoners per 100,000 population, whereas the corresponding rate was 698 in 1999.
It is certainly true that the measurement of social phenomena has become more sophisticated and extensive over time. Today, the general public is bombarded with information on an ever-expanding range of topics, yet such measurement has a long history, dating to the beginning of Greek and Roman civilizations. More to the point here, although the first national police statistics in the United States were not published until 1930, alternative measures of crime appeared much earlier, both in the United States and in other nations.
In this chapter, we discuss the development of statistics on crime, both cross-nationally and historically. We first address official statistics on crime, examining their sources and how they were often used uncritically by social scientists and individuals who wrote articles in the popular media to comment on crime and its causes. We then move to a discussion of how a growing realization of the inadequacies of these official statistics led to the development of alternative measures of crime and delinquency. Several social scientists became concerned about how official crime data were generated, noting that uncritical analyses of these statistics could result in misleading conclusions regarding the causes of crime. In particular, they noted problems related to the so-called dark figure of crime, that is, crimes committed by individuals that were not recorded in the official data. These concerns led first to the development of self-report studies of deviant and criminal behavior in the 1940s, followed by the emergence of victimization studies in the 1960s. The development and use of these alternative measures of crime led to important theoretical and policy debates within the discipline of criminology and, some would argue, to a fundamental shift in the focus of criminology as a discipline.
THE EARLY HISTORY OF MEASURING SOCIAL PHENOMENA AND CRIME
Number, weight, and measure are the foundations of all exact science; neither can any branch of human knowledge be held advanced beyond its infancy which does not, in some way or other, frame its theories or correct its practice by reference to those elements. What astronomical records or meteorological registers are to a rational explanation of the movement of the planets or of the atmosphere, statistical returns are to social and political philosophy. They assign, at determinate intervals, the numeric values of the variables which form the subject matter of its reasonings, or at least of such “functions” of them as are accessible to direct observation; which it is the business of sound theory so to analyze or to combine as to educe from them those deeper seated elements which enter into the expression of general laws. (Herschel, as cited in Duncan, 1984, p. 97)
Perhaps the earliest example of social measurement consists of censuses (taken from the Latin word censere, meaning to tax or assess) of the population. These have existed in one form or another for thousands of years; there are records of taxpaying homes recorded in China as far back as 2275 BC, and Egyptians registered their citizens from as early as 1400 BC (Storey, 1997). The early Roman census process required individuals to declare their age, family, and property holdings, which allowed the administration to record and rank the jurisdiction’s human and property resources. These early censuses were primarily used to determine the number of males available to fight in the military and for tax purposes; the data were not generally used for public policy making, as is common in the current period.
In the 1700s, the purpose of census taking shifted to the creation of a statistical database for studying social and economic trends and, in some cases, developing policies based on these trends. The first census in the United States, conducted in 1790, was different from censuses in other countries that preceded it in that it was an important part of government and was required by the Constitution (see Exhibit 2.1). The data for this initial U.S. census were collected by 16 federal marshals, who experienced considerable difficulty in enumerating the population because residents lived in widely dispersed rural areas.
In many cases, the marshals had to use word of mouth to find out about the existence of households in remote areas. Additional problems included a lack of cooperation on the part of residents, who were often suspicious of the questions being asked (History of the United States Census, 2000). Interestingly, in the first U.S. Census, blacks were counted as only three-fifths of a person, and Native Americans were not counted at all. The latter were not counted until the 1870 census—the Superintendent of the census justified their inclusion, stating, “An Indian not taxed should, to put it on the lowest possible ground, be reported in the census just as truly as the vagabond or pauper of the white or colored race. The fact that he sustains a vague political relation is no reason why he should not be recognized as a human being in a census which counts even the cattle and horses of the country” (U.S. Census Office, 1872, as cited in Seltzer & Anderson, 2001, p. 489).
In addition to census taking in several countries, social data were collected in the context of periodic surveys conducted by social scientists. For example, in the 19th century, the British social activists and philanthropists Henry Mayhew and Charles Booth conducted extensive surveys of England’s population. Booth’s survey sought to investigate “the numerical relation which poverty, misery, and depravity bear to the regular earnings and comparative comfort, and to describe the general conditions under which each class lives” (as cited in Biderman & Reiss, 1967).
The first national crime statistics, based on judicial data, were published in France in 1827 (covering the year 1825). These early crime statistics were part of the moral statistics movement that emerged in several Western nations in the 1800s. They were also very much a result of the belief that quantitative techniques being applied to measure phenomena in the physical world could also be applied to the measurement of human phenomena. These data were used in the earliest studies of the spatial and temporal distributions of crime as well as for analyses of the sex, age, income, education, and occupation of criminals.
In France, Adolphe Quetelet, who had originally worked in the field of astronomy, was one of the early moral statisticians who believed that it was possible to uncover the types of laws and regularities in social phenomena that were emerging from scientific explorations in the natural world. Writing in the 1800s, Quetelet was one of the first commentators to recognize the so-called dark figure of crime. He noted, “All we possess of statistics of crime and misdemeanors would have no utility at all if we did not tacitly assume that there is a nearly invariable relationship between offenses known and adjudicated and the total sum of offenses committed” (as quoted in Sellin & Wolfgang, 1964). According to Quetelet (as cited in Coleman & Moynihan, 1996), this dark figure was related to the seriousness of the crime but also to “the activity of justice in reaching the guilty, on the care with which the latter will take in hiding themselves, and on the repugnance which wronged individuals will feel in complaining, or on the ignorance in which perhaps they will be concerning the wrong which has been done to them” (p. 5).
Quetelet observed a marked consistency in the French judicial statistics: there was 1 accused person for every 4,463 inhabitants over the period he examined, and for every 100 accused, there were 61 individuals “condemned” (in prison). He observed a similar consistency in the ratio between crimes recorded and crimes prosecuted for Belgium between 1826 and 1830, where the ratio was 1 accused person for every 5,031 inhabitants. These data led Quetelet to conclude that the ratio of known to unknown offenses was fairly constant over time. However, he was also aware that this ratio of recorded to actual crime would differ according to offense type. He noted that “in a well organized society where the police [are] active and justice is rightly administered, this ratio, for murders and assassinations, will be nearly equal to unity … [but] when we look to thefts and offenses of smaller importance, the ratio will become very small, and a great number of offenses will remain unknown, either because those against whom they are committed do not perceive them, or do not wish to prosecute the perpetrator” (p. 82).
In his attempt to explain these crime rates, Quetelet (1842) conducted a number of analyses that focused on a variety of factors. Similar to some current theories of crime, he noted the relationship between the consumption of alcohol and violent crime: “Of 1,129 murders committed in France during the space of four years, 446 have been in consequence of quarrels and contentions in taverns; which would tend to show the fatal influence in the use of strong drinks” (p. 96). Quetelet also emphasized the importance of poverty and relative social inequality: “[These factors] give rise to crime, particularly if those who suffer are surrounded by materials of temptation, and are irritated by the continual aspect of luxury and inequality of fortune, which renders them desperate” (pp. 88–89). He asserted that this impact of inequality was greater in urban areas: “The great cities … present an unfavorable subject, because they possess more allurements to passions of every kind, and because they attract people of bad character, who hope to mingle with impunity in the crowd” (pp. 88–89).
Quetelet (1842) also examined the impact of racial composition on crime, noting that France’s population was composed of three different races—the Celtic, German, and Pelasgian—that were concentrated in different regions of the country. He argued that the Pelasgian race, located primarily in the southern portion of France, was “particularly addicted to crimes against persons,” whereas members of the Germanic race were most likely to be involved in property crimes, apparently because individuals from this group were commonly engaged in “the frequent use of strong drinks” (p. 90).
Similar to contemporary analyses of crime, Quetelet (1842) also focused on the correlates of crime, with particular attention to gender and age. He noted that in France, there were 26 women for every 100 men accused of crimes against property, compared to 16 women for every 100 men accused of crimes against persons (see Exhibit 2.3). He argued that these differences were attributable to the fact that women were “more under the influence of sentiments of shame and modesty, as far as morals are concerned; their dependent state, and retired habits, as far as occasion or opportunity is concerned; and their physical weakness, as far as the facility of acting is concerned” (p. 91). Noting that women were more likely to commit serious violent offenses against intimates as opposed to strangers, Quetelet asserted, “They can only conceive and execute guilty projects on individuals with whom they are in the greatest intimacy; thus, compared with man, her assassinations are more often in her family than out of it” (p. 91).
With respect to the relationship between age and involvement in crime, Quetelet (1842) argued that “of all the causes which influence the development of the propensity to crime, or which diminish that propensity, age is unquestionably the most energetic” (p. 92). Similar to current explanations of the age correlate of crime, he suggested that crimes against property were more likely to be committed by those in the younger age groups, whereas crimes against persons evidenced a later age peak; his data from France indicated that these crimes peaked at age 25.
More generally, Quetelet’s early studies of the numerical consistency of crimes stimulated theoretical discussions on the causes of crime, contributing in particular to theories that focused on the relative importance of free will versus social determinism in explaining the criminal behavior of individuals. If crime were determined by social forces, then explanations of crime that invoked the free will of individuals were not plausible. He argued that “every thing which pertains to the human species considered as a whole, belongs to the order of physical facts; the greater the number of individuals, the more does the influence of individual will disappear, leaving predominance to a series of general facts, dependent on causes by which society exists and is preserved” (p. 90).
Quetelet was active for nearly half a century in the attempt to measure social phenomena statistically, and much of his work is thus generally known. Other important and, for their time, innovative studies preceded his work but have not received comparable attention. For example, M. de Guerry Champneuf, who served as director of criminal affairs for the Ministry of Justice in France from 1821 to 1835, conducted extensive analyses of crime for 86 departments in France. Similar to Quetelet, Guerry argued that crime rates were determined by larger societal, as opposed to individual-level, factors. He also recognized the important distinction between types of crime, creating categories for classifying crimes against property versus crimes against the person, with 17 offenses in each category, and calculating age- and sex-specific crime rates for each. In contrast to Quetelet, Guerry based his measurement of crime on the number of persons accused of crime, as opposed to convictions. He believed that using convictions, which depended on the decisions and whims of juries, would result in a biased portrayal of the nature and extent of crime (Coleman & Moynihan, 1996). Guerry (as cited in Elmer, 1933) also highlighted what he thought to be the important correlates and causes of crime, noting, “There is the influence of climate, and there is the influence of seasons, for whereas the crimes against persons are always more numerous in the summer, the crimes against property are more numerous in the winter” (p. 64).
Other researchers in the moral statistics tradition included Georg Mayr (as cited in Bonger, 1916), who, like Guerry, questioned the use of conviction statistics as measures of crime: “The immorality of a people is determined not by the number of individuals convicted, but by the number of crimes committed; else that people would be most moral in which no offender ever let himself be caught, even if more crimes were committed there than elsewhere.” Corne (as cited in Bonger, 1916), using court statistics as his data source, attributed an increase in crime that occurred in France between the years 1849 and 1853 to a “better organization of the police” (p. 48). Also in the mid-1800s, Rettich (as cited in Bonger, 1916) criticized existing criminal statistics in Germany and the purported causes of crime that were derived from these data because they did not take into account what we now refer to as white-collar crime: “The worst offenses against property are not committed by the hungry. The merchant who goes into a fraudulent bankruptcy, the banker who embezzles deposits, the worldling who forges drafts, have all taken the step into crime from a life, if not of abundance, at least of competence” (p. 67).
Judicial statistics were first collected in England in 1805, and more standardized judicial statistics regarding indictments and convictions for indictable (more serious) offenses were collected annually in that country beginning in 1834. Commentators on these statistics emphasized the importance of exercising caution in interpreting them. Morrison (1892) argued that it was not possible to determine whether crime was increasing or decreasing in England, due to the fact that crime statistics were handled in an “erratic and haphazard manner” (p. 950). He also noted that a primary cause of increases in crime was changes in legislation that added offenses to the criminal code, whereas at the same time, decreases in crime could be attributed to the “abolition of old penal laws, and the greater reluctance of the public and the police to set the law in motion against trivial offenders.” The influence of legislative changes on crime rates was also emphasized by du Cane (1893), who noted that offenses against the British Education Act (which required parents to send their children to school), a law that did not exist prior to 1870, totaled over 96,000 in 1890: “Few people [however] would say that ‘crime’ was increasing and civilization demoralising us because we now compel parents to send their children to school” (p. 486). Du Cane thus argued that an uninformed comparison of crime rates in England over the 1870 to 1890 period might conclude that crime had increased, when in reality the increase was due to an expansion in the definition of crime.
The earliest statistics published on a statewide basis in the United States were judicial statistics from the state of New York in 1829; statewide prison statistics were first collected in Massachusetts in 1834. By 1905, 25 states had enacted legislation providing for statistics on the number of people prosecuted and convicted in their courts (Pepinsky, 1976).
Prior to the development of the Uniform Crime Report (UCR) system in the United States in 1930, the closest thing to national crime statistics were data on individuals committed to jails, houses of correction, and penitentiaries (see Exhibit 2.4). These statistics were compiled by the Census Bureau beginning in the 1850s; collection continued in the years 1860, 1870, 1880, and 1890, with separate enumerations in 1904, 1910, 1923, and 1933. The data included the name of the prisoner, the date of their commitment, their sex, race, age, marital status, country of birth (of prisoner and their father), number of years living in the United States (for prisoners who were born in other countries), ability to speak English (and language spoken if not English), ability to read and write, occupation before committed, offense for which committed, the nature of sentence, the term of sentence, and the amount of fine imposed, if any (Hill, 1912). Due to limitations in the way the data were collected within jurisdictions, these data were not particularly useful for measuring levels of crime, let alone for the purposes of making comparisons across jurisdictions. Further, as Hill (1912) noted, “The Bureau [of the Census] fully recognizes that without further analysis of these totals … [they] possess no value as a measure of criminality, since they include every degree and variety of offense from disorderly conduct to murder in the first degree” (p. 32).
Federal government attention to more refined criminal statistics began in 1870, when Congress passed a law creating the Department of Justice. One section of this legislation provided that it was “the duty of the Attorney-General to make an annual report to Congress . . . [on] the statistics of crime under the laws of the United States, and, as far as practicable, under the laws of the several states” (as cited in Maltz, 1977). However, as Maltz (1977) noted, this provision was basically ignored by law enforcement officials, and it fell into almost immediate disuse.
In the 1920s, certain jurisdictions and states conducted surveys of their criminal justice systems. The first of these was conducted in the city of Cleveland in 1922, and surveys at the state level followed in Illinois, New York, Pennsylvania, California, Virginia, Georgia, Minnesota, and Oregon (Robinson, 1933). Although these studies provided some important insights into the administration of criminal justice, they too were not particularly useful for the purposes of measurement and cross-jurisdictional comparisons.
THE DEVELOPMENT OF UNIFORM CRIME REPORTS IN THE UNITED STATES
The statistics compiled by federal bureaucracies enforcing criminal justice are so deficient and incomparable as to render impossible the answering of a single important question about crime. . . . Statistics of crime . . . are of little value because of the lack of uniformity in the definitions of crime, because of the close relation between police work and politics, because of the lack of comparability among categories employed in reporting, because of varying police practices, and because of the absence of centralized reporting. (Tibbits, 1932, p. 963)
Despite attempts by the Census Bureau to collect crime statistics based on law enforcement data as early as 1907, it was not until 1930 that such statistics became available at a national level. In 1927, a committee of the International Association of Chiefs of Police (IACP) was formed to examine the feasibility of collecting uniform crime records (IACP, 1929). In this period, most crime reports produced by state and municipal agencies were virtually useless for comparative purposes. Definitions of crime were not uniform across jurisdictions or even within states, there were no centralized reporting procedures, law enforcement policies varied across jurisdictions, and crime statistics were frequently used for political purposes. In the 1920s, aside from Massachusetts, no state published any statistics on the total number of arrests, and no state released statistics on crimes known to the police. Virtually the only sources of information in the field of police statistics at this time were the annual reports of individual city police departments, with only 14 cities publishing data covering the more serious offenses that were reported to them (Maltz, 1977).
Exhibit 2.5, based on data compiled by Monkkonen (1994), reveals some of the vagaries associated with cross-jurisdictional and longitudinal comparisons of early police arrest data. In 1880, arrests for drunkenness offenses ranged from a low of 1,630 per 100,000 population in Cincinnati, Ohio, to a high of 5,098 in New Haven, Connecticut. By 1915, Boston’s drunkenness arrest rate had almost doubled, to 8,208, whereas Cleveland, which had a drunkenness arrest rate of 2,191 per 100,000 population in 1880, saw a decline to 378 in 1915. The large variations in drunkenness arrest rates over time and across jurisdictions are primarily related to differences in law enforcement activity toward such offenses, rather than to differences in the actual number of individuals who were intoxicated. Although they are not as likely to be influenced by police activity, the comparative data on homicide arrests are also interesting to consider. In 1880, Louisville, Kentucky, had the highest homicide rate, at 16.2 per 100,000, whereas New Haven, Connecticut, had a rate of 0.0 (meaning there were no homicides in New Haven in that year). By 1915, San Francisco’s homicide rate had nearly doubled from its 1880 rate, to 29.1 per 100,000, and St. Louis saw its rate increase sixfold, from 4.6 to 27.6 per 100,000.
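The cross-jurisdictional comparisons in Exhibit 2.5 rest on simple per-capita arithmetic: raw counts are converted to rates per 100,000 population so that cities of very different sizes can be compared. A minimal sketch (the function name and the counts and populations below are illustrative inventions, not Monkkonen's actual data):

```python
def rate_per_100k(count: int, population: int) -> float:
    """Convert a raw arrest or offense count into a rate per 100,000 residents."""
    return count / population * 100_000

# Two hypothetical cities with the same raw count but different populations
# produce very different rates once standardized:
print(rate_per_100k(820, 50_000))    # 1,640 per 100,000
print(rate_per_100k(820, 400_000))   # 205 per 100,000
```

Standardizing by population in this way is what makes the drunkenness-arrest figures for Cincinnati, New Haven, Boston, and Cleveland comparable in the exhibit; the substantive problem the chapter raises is that the counts themselves reflect enforcement activity, not the amount of drunkenness.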
One of the primary motivations for the establishment of the UCR program was to counter media-generated crime waves (O’Brien, 1985), and when the IACP began deliberations on the collection of crime data, there were debates about which specific crime statistics would be the most useful to collect. Some believed that the number of arrests made by the police would be most useful, but apparently the views of individuals such as August Vollmer, chief of police in Berkeley, California, ultimately held sway. Vollmer maintained that the number of arrests would constitute a false and inadequate measure of crime because these would be subject to potential bias on the part of the police. Vollmer argued that the only dependable data would be the actual number and types of complaints received by law enforcement officials (Maltz, 1977). Hence, the UCR data were based on crimes known to the police, that is, crimes that were reported to the police by the public.
As a result of the efforts of the IACP, the first monthly report of offenses known to the police was published in January 1930. The association continued monthly publication of these reports until September 1930, when the work was taken over by the Federal Bureau of Investigation (FBI) of the U.S. Department of Justice. In 1931, official crime reports were received from 1,127 cities and towns having a combined population of nearly 46 million, which represented approximately 80% of the population of the United States (Tibbits, 1932).
Even in the initial years of the UCR, the records were subject to checks for reliability by the FBI, and data from several jurisdictions were eliminated due to irregularities. In addition, some police departments were reluctant to compile and publish reports on the volume of crime in their jurisdictions due to concerns that the data would be used by the public to negatively evaluate their performance (Leonard, 1954). The limitations of these crime data were in fact recognized by the agency that published and disseminated them. Included in the 1931 UCR publication (as cited in Biderman & Reiss, 1967) was the statement, “If it took the highly centralized English government 66 years to get its famous and highly efficient police to report correctly crimes known to the police, it is evident that it will take many years before our decentralized and nonprofessional police forces can be induced to make trustworthy reports of crimes known to the police” (p. 3). Similarly, one issue of the UCR, released in May 1931 (as cited in Maltz, 1977), stated that “wide divergences in the total number of particular crimes for various cities of approximately the same population may in some cases not be indicative of a variance in the amount of crime in those cities but may be charged to inadequate record systems or a lack of understanding of the classification on the part of some officials” (p. 38). In subsequent UCRs, caveats such as “in publishing the reports sent in by the chiefs of police in different cities, the Department of Justice does not vouch for their accuracy” (Federal Bureau of Investigation, 1931) were included.
Critics were also aware of the potential misuses of crime statistics by police departments in order to generate additional funding. As the 1930 Wickersham Commission (National Committee on Law Observation and Enforcement, as cited in Maltz, 1977) noted, “It takes but little experience of such criminal statistics as we have in order to convince that a serious abuse exists in compiling them as a basis for requesting appropriations or for justifying the existence of or urging expanded powers and equipment for the agencies in question” (p. 36).
In the early UCR, offenses were classified under 26 separate headings according to their general common-law definitions, and these were then categorized into two broad categories. Seven crimes were selected for inclusion in the initial UCR index. These were murder and nonnegligent manslaughter, rape, robbery, aggravated assault, burglary, larceny (theft), and motor vehicle theft. These offenses were chosen for inclusion because (1) they constituted offenses that were most likely to be reported to the police, (2) police investigations of such incidents could easily establish that a crime had actually occurred, (3) these crimes occurred in all geographical areas of the United States, (4) they occurred with sufficient frequency to provide an adequate basis for comparison between jurisdictions, and (5) they were serious by their nature and volume (O’Brien, 1985).
Although the number of law enforcement agencies reporting to the UCR increased over time, the practices and procedures of the program essentially remained unchanged until 1958, when a number of modifications were implemented. Included among the changes were the removal of manslaughter by negligence from the criminal homicide category, the limiting of rape offenses to forcible rape, and the exclusion of thefts of property valued at less than $50 from the larceny category. One effect of these changes is that historical and cross-jurisdictional comparisons of the number, distribution, and rates of these offense classes cannot reasonably be made for the years prior to 1958.
While it is true that improvements were made in the UCR from the time of its implementation in 1930, critics still pointed to problems associated with making cross-jurisdictional and historical comparisons of crime rates. Beattie (1960) noted the difficulties in making cross-state comparisons, first using the example of the ratio of crimes against the person to crimes against property across a number of states. In California, the ratio of crimes against the person to crimes against property was 1 to 7; in New York, 1 to 6.5; in Ohio, 1 to 7. However, the ratio of crimes against the person to crimes against property was 1 to 2 in North Carolina, 1 to 3 in Mississippi, 1 to 22 in Rhode Island, and 1 to 50 in Vermont. Beattie suggested that these vast disparities indicated that states were using entirely different practices in reporting offenses known to the police.
Moving to the level of city comparisons, Beattie (1960) used the example of Akron and Canton, adjacent metropolitan areas in Ohio. In 1958, Akron reported three times the forcible rape rate of Canton and more than three times the aggravated assault rate, leading Beattie to conclude, “It is just not conceivable that crime rates in these metropolitan areas could vary as indicated by these published figures. It is much more likely that the disparities are due to differences in methods of accounting from crimes reported to the police” (p. 55). Beattie further asserted that Los Angeles, which had an efficient police department that collected complete records, was falsely identified as having a high crime rate compared to other cities that were not characterized by similar standards of high-quality record keeping. He also pointed to difficulties in classifying crimes into the various index categories, noting, for example, that with larceny-theft there was no standard basis for assessing the value of stolen property. As such, comparisons of theft statistics across jurisdictions could be extremely misleading, if not completely meaningless.
The misuse and misrepresentation of UCR data have a long history. In an interesting example of this phenomenon, on January 1, 1962, the popular magazine U.S. News and World Report published an interview with J. Edgar Hoover, head of the FBI (“Who’s to Blame,” 1962). Hoover asserted that crime was becoming more serious and frequent in the United States. In support, he noted that between 1950 and 1960, “serious crime” had increased by 98%, although the population of the United States grew by only 18% over the same period. More specifically, Hoover invoked the rather questionable practice of combining all crimes in the crime index to claim that “in 1960, more than 7,700 police departments of this country reported 1,861,300 murders, forcible rapes, robberies, aggravated assaults, burglaries, automobile thefts, and larcenies of $50 or more.” The bulk of this total was, however, composed of more minor crimes such as larcenies. Hoover attributed these increases in crime to a decline in parental authority and control of children as well as to a more general deterioration in moral standards in the United States. Presaging a view that is popular in explanations of crime today, Hoover also argued that there was a relationship between exposure to violence in the media and crime: “The highly suggestive, and at times, offensive scenes, as well as the frequent portrayal of violence and brutality on television screens and in motion pictures, are bound to have an adverse effect on young people” (p. 3).
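Hoover's juxtaposition of a 98% rise in crime counts with only 18% population growth quietly conflates counts with rates. Dividing the indexed counts by the indexed population shows that the per-capita increase, while real, was considerably smaller than 98%. A quick check (indexing both series to 100 in 1950 is our own illustrative device; only the two percentages come from the text):

```python
def pct_change(new: float, old: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100

# Index crime counts and population to 100 in 1950, then apply the
# growth figures Hoover cited: crime +98%, population +18%.
crime_1950, pop_1950 = 100.0, 100.0
crime_1960 = crime_1950 * 1.98
pop_1960 = pop_1950 * 1.18

rate_change = pct_change(crime_1960 / pop_1960, crime_1950 / pop_1950)
print(round(rate_change, 1))  # 67.8 -- well below the 98% rise in raw counts
```

The per-capita rise is still substantial, so the arithmetic alone does not refute Hoover; but it illustrates why commentators who quote raw counts without adjusting for population growth overstate changes in crime.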
THE DARK FIGURE AND ADDITIONAL PROBLEMS WITH CRIME STATISTICS
[A criminologist] studies the criminals convicted by the courts and is then confounded by the growing clamor that he is not studying the criminal at all, but an insignificant proportion of non-representative and stupid unfortunates who happened to become enmeshed in technical legal difficulties. (Tappan, 1947, p. 96)
Although official data from the UCR and other sources were commonly used by journalists, criminologists, and other social scientists to comment on crime trends and the causes of crime, many commentators began to recognize the potential weaknesses of these data. Beattie (1941) noted that police statistics were manipulable for political purposes and hence questionable in their validity: “Traditionally, police departments are anxious to make a good showing in their annual figures, and there is, therefore, a natural tendency to record and report those facts which show a good administrative record on the part of the department” (p. 21). Vold (1935), commenting on an alleged crime wave in St. Paul, Minnesota, in the early 1930s, suggested that “it has been impossible for the present writer to determine whether this represents an actual increase in serious crime in this part of the country, or merely much needed improvement in police statistics” (p. 802). Sellin (1931) also argued that crime statistics primarily reflected the activity of law enforcement personnel and therefore could not be accepted as indicating particular trends in crime. Indeed, along these lines, Sellin is perhaps best known for his suggestion that “in general, it may be said that the value of a crime rate for index purposes is in reverse ratio to the procedural distance from the commission of the crime and the recording of it as a statistical unit” (p. 346).
Early 20th-century criminologists also recognized that crime rates could be affected by the policies and practices of individual police departments. For example, in London, England, a change in recording practices was implemented in 1932 whereby citizens’ reports to the police of thefts that were not previously recorded in official data were now included. This change resulted in an increase in indictable (more serious) offenses in London, from approximately 26,000 in 1931 to 83,000 in 1932 (Radzinowicz, 1945), more than a threefold increase. Similarly, a change in police administration in New York City in 1950, which led to a modification of recording practices, resulted in increases of more than 400% in robberies, 700% in larcenies, and more than 1,300% in burglaries over a one-year period (Brantingham & Brantingham, 1984).
One of the most influential studies that drew attention to the dark figure of crime and the possibility of criminal justice system biases in generating police and judicial statistics was Robison’s (1936) study of delinquent youth in New York. Robison challenged the common use of juvenile court statistics in examinations of delinquency and was particularly critical of the delinquency area technique adopted by the Chicago School sociologists Shaw and McKay. Shaw and McKay (1931), relying on juvenile court referral data in a number of U.S. cities, found that the highest rates of delinquency occurred in neighborhoods characterized by rapid population change, poor housing, poverty, tuberculosis, adult crime, and mental disorders. In addition, they found that delinquency rates were highest in inner-city or core areas of cities but declined with the distance from the center of the city. They emphasized the importance of social disorganization in explaining how youth in these areas became involved in delinquency. This disorganization was manifested in the alienation of children from their parents and adult institutions, resulting in detachment from informal social controls that would normally produce conformity in children.
Robison (1936) took issue with Shaw and McKay’s studies. She argued that the method used by these sociologists not only was invalid for measuring the extent and nature of juvenile delinquency but also was ineffective for generating theoretical explanations of delinquency and policies for delinquency prevention. The Chicago School studies and others like them had implicitly assumed that the extent of delinquency could be measured accurately through the identification of apprehended delinquents—a questionable assumption, at best. The studies also failed to sufficiently differentiate between different types of delinquency by generally treating truancy, stealing, and malicious mischief as similar behaviors.
Robison (1936) challenged the commonly held notion that there was a fundamental association between poverty, race or ethnicity, and delinquency and claimed instead that these relationships were due to the differential treatment of individuals of different socioeconomic status and racial or ethnic backgrounds by criminal justice system officials. She argued that the customs of diverse nationality and cultural groups had an impact on which youth would be labeled officially delinquent; therefore, variation in the behavior of parents and authorities when confronted with “troublesome children” was potentially more important in determining official rates of delinquency than real differences in the proportion of delinquents in these groups. Robison further noted, “Is it not rather a human tendency to regard less critically the behavior of the children who live on the right side of the tracks than that of the urchins who, surprised in suspicious activity, react with an almost reflex furtiveness? The policeman is more prone to suspect the poor man’s child of theft and the rich man’s child of a prank” (p. 30).
Robison’s (1936) study combined administrative records of delinquent behavior from a cross section of public schools, family agencies, and agencies that cared for neglected children in New York City. The inclusion of these additional data resulted in an increase in the total number of delinquents, from 7,090 (the figure derived from juvenile court records) to 15,898 children who were under care for the commission of delinquent behavior in 1930. In her analysis of these and other data, she pointed to racial and ethnic differences in the identification and processing of delinquency cases. Referring specifically to official juvenile court data, which indicated that there were seven Catholic youth for each Protestant youth processed and three Catholic to every Jewish youth, Robison noted that “the same behavior by a boy in a Jewish family, no matter where he lives, has evidently not the same chance of being labeled delinquent and referred to the court as that of a boy in an Italian family. Apparently, a misbehaving boy in a Protestant family has even less chance of being referred as a delinquent, either to the court or an unofficial agency, than the Italian or Jewish boy, and with Negro boys the chances are still different” (pp. 195–196). In short, Robison’s research suggested that reliance on official data to study juvenile delinquency would result in misleading theories about its causes and misguided policy solutions to the juvenile delinquency problem.

Another prominent critic of official crime data was Edwin Sutherland (1947), who made a number of important points regarding the weaknesses in these data. Sutherland was one of the first to note that, for the purposes of comparison, crime statistics needed to be calculated in proportion to the population or some relevant base.
He suggested that crime rates should be corrected for variations in the age, sex, race, and urban-rural composition of populations. Underlining the importance of using relevant bases in calculating rates, Sutherland provided the example of an increase in convictions for violations of motor vehicle legislation in Michigan from 1,566 in 1912–1913 to 27,794 in 1931–1932. He noted that the number of automobiles in Michigan had increased from 54,366 to 1,230,980 over the same period; using the number of motor vehicles instead of population figures for calculating the conviction rate for these offenses indicated that there had in fact been a significant decline in the conviction rate over the 20-year period.
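Sutherland’s correction amounts to simple arithmetic. The sketch below is illustrative only (the function name and the per-1,000 scaling are our own choices, applied to the Michigan figures cited above); it shows why the raw count and the base-adjusted rate can move in opposite directions:

```python
# Illustrative sketch of Sutherland's point: a count is only meaningful
# relative to an appropriate base. Figures are the Michigan motor vehicle
# numbers cited in the text; the per-1,000 scaling is our own convention.
def rate_per_1000(events, base):
    """Events per 1,000 units of the chosen base (e.g., registered autos)."""
    return events / base * 1000

# Convictions for motor vehicle violations, and registered automobiles
convictions_1912, autos_1912 = 1_566, 54_366
convictions_1931, autos_1931 = 27_794, 1_230_980

early_rate = rate_per_1000(convictions_1912, autos_1912)  # about 28.8 per 1,000 autos
late_rate = rate_per_1000(convictions_1931, autos_1931)   # about 22.6 per 1,000 autos

# Raw convictions rose roughly 18-fold, yet the rate per automobile declined.
print(f"{early_rate:.1f} -> {late_rate:.1f} per 1,000 automobiles")
```

The same logic underlies the standard practice of reporting crime rates per 100,000 population rather than as raw counts.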
Sutherland (1947) also noted that the way the crime index was constructed could result in misleading interpretations of changes in crime over time. Focusing on homicide, he noted that the increase in such offenses was primarily due to homicides by negligence, especially in the form of killings related to automobile use. He asserted that the number of homicides by negligence was approximately equal to the number of other criminal homicides. Because the total number of homicides had decreased while the proportion of homicides due to negligence had increased from the 1930s to the 1940s, it followed that the crime of murder had decreased significantly.
Sutherland (1940) is best known for his contention that explanations of crime were invalid because the official statistics they were based on did not include white-collar criminals (p. 4). Defining white-collar crime as “a crime committed by a person of respectability and high social status in the course of his occupation” (p. 4), Sutherland included in this category members of the medical profession—who, he alleged, illegally sold narcotics, provided fraudulent reports and testimony in accident cases, and split fees—and disreputable business and professional men who were “quacks, ambulance-chasers, bucketshop operators, dead-beats, and fly-by-night swindlers” (p. 4). Although Sutherland’s identification of white-collar crime did not lead to changes in how crime was officially counted in the United States, his assertion that crime was committed by individuals in all social classes inspired the work of labeling theorists such as Edwin Lemert (1951) and indirectly contributed to the development of self-report measures as alternatives to official measures of crime.
Some commentators also suggested that crimes committed by females, as well as by middle- and upper-class individuals, were underrepresented in official statistics. As early as 1932, a conference sponsored by the White House recognized that official juvenile court data underestimated the extent of female delinquency. One contributor to this debate was psychologist Otto Pollack. As Coleman and Moynihan (1996) noted, “While Sutherland had seen the dark figure wearing a collar and tie, Otto Pollack claimed he had seen the dark figure wearing a dress” (p. 11). Pollack’s 1951 study of female criminality, while presenting a controversial and rather misguided biopsychological theory of female crime, made an important contribution to the debate on the validity of crime statistics by focusing on the underrepresentation of females in official data. Pollack argued that female criminality was concealed by the underreporting of offenses committed by women, the lower detection rates of female offenders compared to male offenders, and the greater leniency shown to women by officials in the criminal justice system, including police, prosecutors, and judges.
The recognition that there was a significant amount of crime that was not recorded in official statistics, whether committed by middle- and upper-class individuals or females, led to the development of the first important alternative measure of crime and delinquency, that is, self-report studies.
EARLY SELF-REPORT STUDIES
Discussions of the development of self-report methodology in criminology have typically focused on the pioneering studies of researchers such as Wallerstein and Wyle, Porterfield, and Short and Nye. Important precursors to self-report studies of crime, however, were more general developments in the science of survey research, including developments in political opinion polling (Igo, 2007) and specific refinements in questioning people about their involvement in deviant behavior. Early forerunners of self-report studies of crime and deviance were, in fact, studies that examined the sexual behaviors of adults and college youth, many of which focused on sexually deviant behaviors. These studies demonstrated that people would answer questions about private and potentially embarrassing behaviors. For example, in 1897, Havelock Ellis published his then controversial book Sexual Inversion, which was based on interviews with people in London and Paris regarding their sexual attitudes and behaviors and revealed “the vast, tangled jungle of sex activities which flourish in human bodies” (“Studies,” 1936, p. 7). Similarly, Yamamoto Senji conducted surveys of sexuality in Japan in the 1920s, and around the same time, anthropologist Margaret Mead was examining sexual practices and mores of peoples in diverse cultures (Igo, 2007).
One of the first studies of sexual behavior in the United States was conducted at the University of Missouri in the late 1920s as part of a college course on the family. Included in the survey administered to students were questions about their view of sexual relations, both premarital and extramarital; whether fear, religious convictions, pride, or other forces inhibited their sexual desires; and opinions on divorce and alimony. Although the results of this study were apparently never published, the reactions to it are notable. Two of those who conducted the survey were dismissed from the University of Missouri, and another was suspended for one year. In justifying these actions, a member of the executive board of the university stated, “The time has come for a crusade against such discussions and such literature; for ridding our schools of those who cannot distinguish between legitimate research and cesspool delving and for barring from local libraries volumes which reek with sex appeal” (Literary Digest, 1929, p. 27).
As attitudes toward sexual issues became somewhat more liberal in the United States following World War II, several other studies of sexual behavior emerged. In a study of 613 students in Texas, Porterfield and Salley (1946) queried their subjects about their views and behaviors with respect to premarital sex, among other things. Focusing on gender differences in these behaviors, Porterfield and Salley noted that 58.5% of the pre-college men and 59% of the college men in their sample reported involvement in premarital sex, compared to only 1 in 137 (less than 1%) of the women surveyed.
The best known and most widely publicized of these surveys of sexual behavior and attitudes were those published by Alfred Kinsey and the Institute for Sex Research in the late 1940s and early 1950s (see Exhibit 2.6). While teaching a course on marriage and the family at Indiana University, Kinsey, originally trained as a biologist at Harvard University, began to question his students about topics such as their age at first premarital intercourse, their frequency of sexual activity, and the number of sexual partners they had. He was eventually given funding to conduct more detailed studies of sexual behavior by the Committee for Research in Problems of Sex, a Rockefeller-funded grant-giving body that operated under the umbrella of the National Research Council. Operating from the premise that “there cannot be sound clinical practice, or sound planning of sex laws, until we understand more adequately the mammalian origins of human sexual behavior” (Kinsey, Pomeroy, Martin, & Gebhard, 1953, p. 8), Kinsey’s project focused on examining sexual behaviors such as masturbation, nocturnal emissions, heterosexual petting, premarital sexual intercourse, homosexual contact, and animal contact “using orgasm as the primary unit of measurement” (Igo, 2007, p. 192). The project resulted in interviews with approximately 18,000 males and females, ranging in age from 2 to 90 years old, from a variety of educational, occupational, and religious backgrounds.
Kinsey (Kinsey et al., 1953) believed that the best method for obtaining information on these issues was to conduct personal interviews with subjects: “We have elected to use personal interviews rather than questionnaires because we believe that face-to-face interviews are better adapted for obtaining such personal and confidential material as may appear in a sex history” (p. 58). He felt that it was easier for interviewers to establish a rapport with subjects through this method and that personal interviews made it possible to adapt the wording of each question to the vocabulary and experience of each subject. Recognizing that respondents would be more truthful if they were assured of the confidentiality of their answers, Kinsey and his interviewers recorded answers on special sheets printed with a grid. Kinsey informed his respondents that the information was being recorded using unintelligible codes that only he and his two colleagues would be able to understand. On the other hand, he recognized that respondents might lie in personal interviews, so he embedded a number of checks into his interview schedules to detect individuals who were not being truthful. For instance, questions always placed the burden of denial on the person being interviewed. Instead of asking “Have you ever engaged in masturbation?,” the interviewer would ask “When did you first masturbate?” (Igo, 2007). If contradictions in answers were revealed, subjects were asked to explain them. If they refused to do so, the interview was terminated, and the information from it was not used (Bullough, 1998).
Perhaps somewhat unbelievably in the context of the sensitive questions being asked, Kinsey (Kinsey et al., 1953) claimed that “unlike the experience of those engaged in public opinion and some other surveys, we found no difficulty in getting our subjects to answer all of the questions in an interview. In the course of 14 years, there have not been more than half a dozen subjects who have refused to complete the records after they had once agreed to be interviewed” (p. 45).
The average interview encompassed approximately 300 questions and required one and one-half to two hours, but for some respondents who had extensive sexual experience, the number of questions extended to 500 or more, and in one case an interview took 17 hours to complete (Igo, 2007).
Kinsey and his associates were aware of reliability and validity problems in their data, and they used a number of techniques to address these issues. “Retakes,” or re-interviews, in which the same questions were asked of the respondents, were conducted with 124 males and 195 females 18 months (in most cases) after the original interview had taken place. For the adult females in the sample, the retakes modified the lifetime incidence of reported sexual behavior by less than 2%; for adult males, there was no activity for which the retakes modified the incidences calculated on the original histories by as much as 3%. However, when examining reported frequencies of sexual behavior, there was less reliability. On seven out of the nine items reported by females and on eight out of the nine items reported by males, fewer than 70% of the subjects provided identical responses in the re-interview. As an additional check on reliability, Kinsey examined data from 706 pairs of spouses in the sample. He found that the number of identical responses ranged from 39% on the maximum frequency of coitus in any single week of the marriage to 99% on the use of the male superior position during coitus.
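The test-retest checks described above reduce to computing the share of paired answers that are identical across the two interviews. A minimal sketch follows; the function name and the response data are invented for illustration and do not reproduce Kinsey’s actual items or coding:

```python
# Hypothetical illustration of a test-retest reliability check: the
# percentage of subjects giving identical answers in the original
# interview and the "retake." Responses below are invented.
def percent_identical(original, retake):
    """Share of paired responses that match exactly, as a percentage."""
    matches = sum(1 for a, b in zip(original, retake) if a == b)
    return matches / len(original) * 100

first_interview = ["yes", "no", "no", "yes", "no"]
retake_interview = ["yes", "no", "yes", "yes", "no"]
print(percent_identical(first_interview, retake_interview))  # 80.0
```

On this kind of measure, Kinsey’s retakes showed high agreement on whether a behavior had ever occurred, but, as noted above, much lower agreement on how frequently it occurred.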
Although perhaps not well known, Kinsey’s original interview protocol actually included five measures of penis size: estimated erect penis length, measured flaccid penis length, measured erect penis length, measured flaccid penis circumference, and measured erect penis circumference. Obtaining such information in an interview setting would obviously be problematic, so the measurements were performed after the interview, with participants mailing their measurements to the Kinsey Institute using standard response cards and preaddressed stamped envelopes—44% of the males returned the cards (Bogaert & Hershberger, 1999).
As previously noted, Kinsey broke sexual practices into six types: masturbation, nocturnal emissions, heterosexual petting, heterosexual intercourse, homosexual outlets, and animal contacts; and he correlated these with 12 factors: sex, race (although it is notable that he ultimately excluded African Americans from his final analysis, probably because he had not collected enough sexual histories from members of this group), marital status, age, age at adolescence, educational level, occupation, occupation of parent, rural-urban background, religion, religious adherence, and geographic origin (Igo, 2007).
One of the standard methods of organizing data in this period was to create seven-point scales to classify behaviors, and Kinsey used a similar scale to classify individuals as homosexual or heterosexual. He did not trust individuals’ self-classification as homosexual or heterosexual, so his only objective indicator was self-reports of the type of sexual activity that resulted in the respondent experiencing an orgasm. Although the use of this measure indicated that most in his sample were exclusively heterosexual, it also implied that homosexuality was just another form of sexual activity—a revolutionary assertion in that historical period, as well as one that resulted in the most serious attacks on Kinsey and his data.
Kinsey’s other highly controversial findings challenged prevailing beliefs about the asexuality of women. He found that 40% of the females he interviewed had experienced orgasm within the first few months of marriage, 67% within the first six months, and 75% by the end of the first year. However, he also reported instances in which women failed to achieve orgasm after 20 years of marriage (Bullough, 1998).
Kinsey’s studies and data have had a lasting impact—in 1977, the National Gay Task Force used his findings on the prevalence of homosexuality in the 1940s and 1950s to pronounce that 10% of the population was homosexual (Igo, 2007). In addition, a 1999 study (Bogaert & Hershberger) used Kinsey’s data on penis size and homosexuality and reported that the average erect penis size was 6.46 inches for homosexual men and 6.14 inches for heterosexual men. The authors concluded, “In the largest sample of its kind in the world, homosexual men were found to report larger penises (in both length and circumference) than did heterosexual men. Moreover, the effects were not due to possible confounding factors such as height, weight, or education” (p. 218). Bogaert and Hershberger (1999) did concede, however, that the differences could be due to reporting bias, “In particular, homosexual men may be more likely than heterosexual men to exaggerate the size of their penises to conform to an ideal standard of sexual attractiveness” (p. 219).
Although critics have questioned several of Kinsey’s methods and theories and asserted that he was attracted to sex research by his own private masochistic sexual obsessions and shame over his homosexual impulses (Igo, 2007), to suggest that his studies were revolutionary is by no means overstating the case. His 1948 book, Sexual Behavior in the Human Male, which sold for $6.50 and was more than 800 pages long, was published by W.B. Saunders, a respected publishing company that specialized in medical texts. Due to the anticipated popular appeal of the book, the company ordered 25,000 copies, rather than the usual 2,000 or 3,000; these quickly sold out (Schwarz, 1997), and sales of the book eventually reached nearly 250,000 (Igo, 2007). As Igo (2007) noted, Kinsey “practically became a brand name himself, useful in selling pajamas and housewares, not to mention magazines” (p. 207). Kinsey was further popularized in a 2004 biographical film, with actor Liam Neeson playing the role of Kinsey.
Kinsey’s findings that premarital and extramarital relations, homosexuality, oral sex, masturbation, and a host of other sexual practices were far more common than most people believed were groundbreaking. However, it is his method of data collection that is more important for our purposes. The Kinsey studies were among the first to show that people would report involvement in “deviant” activities. Other studies, conducted by researchers who were contemporaries of Kinsey, were revealing a similar willingness to report socially questionable behaviors via self-administered questionnaires. One example is the large and sophisticated study of college students’ drinking behavior conducted in the late 1940s by researchers at Yale University’s Center of Alcohol Studies (Straus & Bacon, 1953). The study covered 27 colleges in the United States, which were selected to represent a number of different types of institutions, including public, private, and sectarian institutions; coeducational and single-sex institutions; institutions with white and black students; urban and rural institutions; those with large and small enrollments; and institutions in different regions of the country. The researchers administered questionnaires to a total of 17,000 college students, and of the 16,300 who filled out the questionnaire, 96.6% were used in the analyses.
A few months after their survey operations had begun, Straus and Bacon issued a press release announcing the survey and describing its procedures and purposes. They noted how the popular media’s reaction to this study tended to trivialize their efforts. The study was referred to as “Booze Kinsey,” and an article in the New Haven, Connecticut, Journal Courier in 1949 suggested, “Yale, for some odd reason (maybe they haven’t got enough work to do in New Haven) would like to amass a flock of statistics. The snoopers would like to know if rich kids drink more than poor kids, or if the sons of teetotalers lush it up more than the scions of soaks. … Every so often I despair of the work that Satan finds for the idle professional hands to do” (as cited in Straus & Bacon, 1953, pp. 42–43). The researchers asserted, however, that it was important to conduct this study “against the background of stereotypes, conflicts, problems, and changing patterns of drinking and control” (Straus & Bacon, 1953, p. 45).
The questionnaire used by Straus and Bacon contained items about, among other things, students’ religious affiliation, family characteristics, frequency of drinking, age when first “tight,” “times high, tight, drunk, and passed out,” and their opinions about the association between drinking and sexual behavior.
Straus and Bacon (1953) were aware of the potential reliability and validity problems associated with asking students about their alcohol consumption. Similar to Kinsey, they engaged in extensive checks of each questionnaire to eliminate responses that indicated insincerity or inconsistency. They noted that “about 100 students made humorous or sarcastic comments. Most of the latter were male students from one school where a member of the faculty assisting with the distribution of the forms invited attempts at humor with a joking remark at the start” (p. 4). They also realized that there would be variation in the students’ knowledge of some issues, such as family income and the drinking practices of their parents. Answers to other questions about so-called measurable facts, such as frequency of drinking, amounts consumed on each occasion, or number of times intoxicated, were dependent on memory and other factors in perception that could vary significantly from individual to individual.
Although there is not sufficient space here to enter into a detailed discussion of their results, several of the findings from this study are worth noting. Straus and Bacon (1953) found that 20% of the males and 39% of the females in the study identified themselves as abstainers from alcohol. Male students were much more likely to report that they had become tight from drinking alcohol than females (see Exhibit 2.7). Consumption of alcohol for most of the students took place in homes or public places such as restaurants, taverns, bars, or night clubs, as opposed to college dorm rooms and other on-campus sites. Beer was the most common alcoholic drink consumed by males, and wine was more commonly consumed by females. Straus and Bacon also noted that most students began drinking before they entered college. Of those who drank, approximately half had begun drinking by the age of 17. The reasons for drinking varied somewhat according to the type of beverage consumed, although these were generally related to issues of sociability.
As survey methodology in general, and self-report methodology in particular, progressed in sophistication and acceptability, researchers began to focus more on the measurement of criminal and deviant behavior. One of the pioneering studies in this genre was that of Porterfield (1946), who compared the self-reported delinquent behavior of 337 college students with that of 2,049 “alleged delinquents” who had appeared in the juvenile court in Fort Worth, Texas. Porterfield noted that those in the officially delinquent sample had been charged with a total of 55 specific offenses, ranging from “shooting spit wads at a wrestling match” (p. 205) to murder. However, the study also revealed a significant amount of delinquent activity on the part of the sample of college students. “One well-adjusted ministerial student said he had indulged 27 of the 55 offenses” (p. 205), and a few of the college students allegedly even confessed to committing murder. Presaging the findings of self-report studies conducted in the 1960s and later years, however, Porterfield acknowledged that although the offenses of college students were as serious as those committed by the official delinquents, they were probably not committed as frequently. Porterfield (1946) surmised that the official delinquents had been labeled as such due to inherent biases in the operations of the criminal justice system and characterized the juvenile delinquent as “a friendless young person who does not live in a good home or in a college dormitory … but who has offended some part of a rather peevish and irresponsible community, and been charged with the necessity for being responsible and other than peevish himself” (p. 205).
Murphy, Shirley, and Witmer (1946) similarly questioned the validity of official data on juvenile delinquency and took as their point of departure the fact that a considerable number of juveniles who broke the law did not appear in official criminal statistics. Studying a group of 114 officially delinquent boys, they grouped delinquency into three categories of seriousness: (1) violations of city ordinances, including such offenses as shining shoes or vending without a license, street ball playing, hopping streetcars, swimming or fishing in forbidden places, and violating curfew laws; (2) minor offenses, involving behaviors such as truancy, petty stealing, trespassing, running away from home, and sneaking into movies; and (3) more serious offenses, involving acts such as breaking and entering, larceny-theft, assault, drunkenness, and sex offenses.
To measure the extent to which their subjects engaged in these offenses, Murphy, Shirley, and Witmer (1946) consulted with youth case workers and reviewed each youth’s self-reported delinquency to determine whether he had engaged in these offenses rarely (1 to 3 offenses per year), occasionally (4 to 9 offenses per year), or frequently (more than 10 offenses per year). Based on these measures, they estimated that the 114 boys had committed a minimum of 6,416 infractions of the law during a five-year period (covering the ages between 11 and 16). Supporting their contention that there existed a large dark figure of unrecorded delinquency, the researchers noted that only 95 of these violations became a matter of official complaint. The authors noted the implications of their study for official measures of crime: “So frequent are the misdeeds of youth that even a moderate increase in the amount of attention paid to it by law enforcement authorities could create a semblance of a ‘delinquency wave’ without there being the slightest change in adolescent behavior” (p. 696).
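The scale of the dark figure in this study can be made concrete with the two numbers reported above. The short calculation below is illustrative only (the variable names are our own):

```python
# Dark-figure arithmetic for the Murphy, Shirley, and Witmer (1946) study:
# of at least 6,416 self-reported law violations by 114 boys over five
# years, only 95 drew an official complaint.
total_infractions = 6_416
official_complaints = 95

official_share = official_complaints / total_infractions * 100
print(f"{official_share:.1f}% of violations became official")  # about 1.5%
```

In other words, roughly 98.5% of the violations in this sample never appeared in official records, which is precisely the gap that self-report measures were designed to capture.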
Wallerstein and Wyle (1947) conducted a self-report study of delinquent behavior using a sample of 1,698 adult men and women in New York City, focusing on the delinquent behavior these subjects had committed before they reached the age of 16. The mailed questionnaire listed 49 separate offenses, and 99% of their sample reported committing at least 1 delinquent act. Men admitted to an average of 18 crimes and women to an average of 11. Perhaps even more surprising, 64% of the males and 29% of the females in the sample reported committing at least 1 of the 14 felonies included in the list of offenses.
Further developments in self-report methodology were associated with the work of James F. Short, Jr. and Ivan Nye (Nye & Short, 1957; Nye, Short, & Olson, 1958; Short & Nye, 1957–1958). These researchers began their project with a critique of existing criminological theories that had examined the relationship between juvenile delinquency and socioeconomic status based on official data. Similar to the earlier observations of Robison (1936), they asserted that studies using court records, police files, and other official measures were adequate for measuring official delinquency but were unreliable as indexes of delinquent behavior in the general population.
In one of their studies, Short and Nye (1957–1958) administered a questionnaire to samples of the general population of adolescents who were attending school and a sample of official delinquents who were institutionalized in training schools. The questionnaire included a total of 23 items, and from those, they created a delinquency index that consisted of seven items. Respondents were asked if they had committed the following acts since beginning grade school: (1) defied parents’ authority (to their face), (2) taken little things (worth less than $2) you didn’t want or need, (3) driven a car without a driver’s license or permit, (4) skipped school without a legitimate excuse, (5) bought or drank beer, wine, or liquor (including drinking at home), (6) purposely damaged or destroyed public or private property that did not belong to you, and (7) had sexual relations with a person of the opposite sex. For each one of the items, involvement in the behavior was divided into four categories: (1) did not commit the act, (2) committed the act once or twice, (3) committed the act several times, and (4) committed the act very often.
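The seven-item index described above can be illustrated as a small scoring routine. The items and the four response categories follow the text; the ordinal 0–3 coding, the function name, and the example respondent are assumptions added for illustration, not part of Short and Nye's published procedure.

```python
# Illustrative sketch of Short and Nye's seven-item delinquency index.
# Item wording and the four frequency categories follow the chapter;
# the 0-3 ordinal coding and summation are assumptions for illustration.

ITEMS = [
    "defied parents' authority (to their face)",
    "taken little things (worth less than $2) you didn't want or need",
    "driven a car without a driver's license or permit",
    "skipped school without a legitimate excuse",
    "bought or drank beer, wine, or liquor (including drinking at home)",
    "purposely damaged or destroyed public or private property",
    "had sexual relations with a person of the opposite sex",
]

# The four involvement categories, coded as ordinal values.
CATEGORIES = {
    "did not commit": 0,
    "once or twice": 1,
    "several times": 2,
    "very often": 3,
}

def index_score(responses):
    """Sum the ordinal codes across the seven items
    (0 = no reported delinquency, 21 = 'very often' on every item)."""
    if len(responses) != len(ITEMS):
        raise ValueError("expected one response per index item")
    return sum(CATEGORIES[r] for r in responses)

# A hypothetical respondent:
example = ["once or twice", "did not commit", "did not commit",
           "several times", "once or twice", "did not commit",
           "did not commit"]
print(index_score(example))  # 4
```

Treating the categories as an ordinal scale like this is one common convention for such checklists; Short and Nye's own analyses worked from the category frequencies themselves.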
In analyses of their data, Nye, Short, and Olson (1958) found only weak relationships between delinquency and social class. For example, they noted that heterosexual relations were most frequently engaged in by lower-class boys in their sample, but purposely damaging or destroying property was committed most frequently by upper-class boys and girls. However, the researchers did recognize the limitations of their study in that not all adolescents were in school. School dropouts, who would not have been questioned in their surveys, may well have been more delinquent than those in school and may have been disproportionately concentrated in the lower class.
Short and Nye were also aware of potential reliability and validity problems in measuring crime and delinquency through self-reports. They were especially concerned that the institutionalized delinquents they included in their samples would attempt to manipulate the interview situation. To assess the extent of lying on the questionnaires, Short and Nye (1957–1958) included a number of trap questions, which were designed to identify both overreporting and underreporting of delinquency. They argued that if respondents indicated they had never told a lie and had never disobeyed their parents, they were presenting an over-conformist image; respondents who were identified as such were excluded from the subsequent analyses. On the other hand, some non-institutionalized respondents in their study reported that they had committed all the offenses on the checklist; such individuals were also excluded because it was believed that “such a person would not be at large” (p. 209). Such attention to the potential methodological weaknesses in these early self-report studies has been of considerable value to those who continue to conduct research using this methodology.
Despite the inclusion of relatively trivial offenses, Short and Nye’s work and other early self-report studies were important both methodologically and substantively. They demonstrated that people would report having committed delinquent acts and that the alleged negative relationship between social class and delinquent behavior was not as strong as extant criminological theory purported it to be. As Hindelang, Hirschi, and Weis (1981) noted, “Much like the Kinsey studies before them, the Short/Nye studies revolutionized ideas about the feasibility of using survey procedures with a hitherto taboo topic. They also eventually led to a revolution in thinking about the substance of the phenomenon itself” (p. 23).
Erickson and Empey (1963) also took issue with the use of official data to measure delinquency. They argued that such behavior was not an attribute per se, but was instead a phenomenon that was distributed along one or more continua. Their study involved personal interviews with males aged 15 to 17 in Utah and included four subsamples: (1) 50 high school boys who had never appeared in court, (2) 30 boys who had been to court once, (3) 50 repeat offenders who were on probation, and (4) 50 incarcerated offenders. Erickson and Empey suggested that the face-to-face interview method was the most effective in uncovering delinquent behavior because it allowed interviewers to elicit more complete and reliable data, especially given a lack of literacy among some of their subjects. They also asserted that this method allowed for more accurate estimates of the frequency of involvement in delinquent behavior than the standard method of having subjects respond to predetermined categories such as “none,” “a few,” or “a great many times.” However, interviewers in the Erickson and Empey study encountered problems because some respondents were reluctant to reveal their involvement in offenses and some more habitual offenders had committed the offenses so frequently that they could not accurately estimate the number of times.
Similar to the findings of previous self-report studies, Erickson and Empey (1963) noted that the number of violations admitted to by their respondents was tremendous. Three types of offenses were most common: theft (a total of 24,199 offenses), traffic violations (23,946), and the purchase and drinking of alcohol (21,698). In more than 90% of the cases, these offenses were undetected and not acted on by any official agency. Although the amount of hidden delinquency in this sample was significant, they found that boys who had been labeled as officially delinquent had committed a far greater number of delinquent acts than those who had not been so labeled.
Gold’s (1966) self-report study began by noting that in one Michigan city, boys who lived in the poorer section of town and were apprehended by police were more likely to be officially labeled delinquent than boys from the wealthier sections of town who were involved in the commission of the same types of offenses. This study relied on interviews with 522 boys and girls, 13 to 16 years old, living in the school district who were matched with interviewers on the basis of race and sex. The self-report instrument contained a total of 51 questions that asked youth about offenses they had committed during the previous three years. Aware of the possibility that his respondents might conceal their delinquent activities, Gold also interviewed a criterion group of 125 young people for whom he had already collected reliable information on their delinquency from official data. In addition, he interviewed peers of the respondents in order to gather independent information on delinquent acts they had witnessed or that had been described to them by the respondents.
Using these data, Gold (1966) concluded that 72% of the youth provided self-reported delinquency information that was consistent with the information provided by their peers, 17% were identified as “outright concealers,” and the remaining 11% were “questionables.” More important for measures of delinquency and for theories of delinquent behavior, and in contrast to several of the earlier self-report studies, Gold concluded that crime was, in fact, inversely related to social status. The lower-status young people in this study were found to commit delinquent acts more frequently than higher-status adolescents. However, this relationship existed only among boys.
An additional development with respect to reliability checks of self-reports of deviant behavior was associated with the work of Clark and Tifft (1966). These researchers conducted a study of 45 white males enrolled in a sociology course at a Midwestern U.S. university in which respondents were asked to report the frequency of commission of a number of delinquent behaviors. The subjects were also given a polygraph test to check the veracity of their reports. Clark and Tifft found that self-reports of delinquent behavior were accurate when a wide range of behaviors was considered simultaneously, but that there was differential validity on specific questionnaire items.
The first National Youth Survey (NYS), conducted in 1967 (Williams & Gold, 1972), drew on interviews and official records of 847 boys and girls, aged 13 to 16 years. In this study, respondents were given the following instructions: “Here is a set of things other kids have told us they have done. Which of them have you done in the past three years, whether you were caught or not?” (p. 213). Respondents were asked to report whether they had ever engaged in the activity, whether they had done it just once in the previous three years, or whether they had done it more than once. Williams and Gold found that 88% of the teenagers they interviewed confessed to committing at least one chargeable offense in the three years prior to the interview. They concluded that “if the authorities were omniscient and technically zealous, a large majority of American 13- to 16-year olds would be labeled juvenile delinquents” (p. 213).
The Williams and Gold study was one of the first to focus in some detail on racial differences in self-reported delinquency. The researchers discovered that black females were not more frequently or seriously delinquent than white females, and black boys were not more frequently delinquent than white boys. However, black males reported more involvement in more serious forms of delinquency than whites. For example, when involved in theft, blacks stole more expensive items, and when involved in assaults, they tended to inflict more serious injury.
Further refinements and developments in self-report methodology have certainly occurred since the 1960s, and these will be discussed in Chapter 4. But these early studies established self-reports as an alternative method of studying crime and delinquency, and, as noted, they called into question the purported negative relationship between social class and involvement in crime.
VICTIMIZATION SURVEYS
Another method of measuring crime that arose in response to a recognition of limitations in official data is the victim survey. Although not commonly identified as such in the criminological literature, one of the first studies of this nature was conducted by Kirkpatrick and Kanin (1957), who investigated sexual aggressiveness in dating relationships on a university campus. Questionnaires were distributed to 291 females in 22 university classes, asking them about their experiences with males in dating relationships. Of the respondents, 55.7% reported that they had been “offended at least once during the academic year at some level of erotic intimacy,” 20.9% had experienced forceful attempts at intercourse, and 6.2% had experienced “aggressively forceful attempts at sexual intercourse in the course of which menacing threats or coercive infliction of pain were employed” (p. 56).
More generally, the primary motivation for the development of comprehensive national surveys of crime victims in the United States was the recognition of the limitations in official measures of crime. In response to apparently escalating crime rates and urban unrest in several U.S. cities in the late 1960s, the President’s Commission on Law Enforcement and the Administration of Justice (1968) was impaneled in 1965 to develop policy and recommendations concerning the crime problem. The President’s Commission noted that “one of the most neglected subjects in the study of crime is its victims” (p. 3) and found that much of the information needed to formulate policy recommendations with respect to the rising crime problem was not available. The Commission also noted that official statistics on crime were problematic because many offenses were not reported to the police, and a number of administrative and organizational factors were believed to affect the reporting of these statistics in particular jurisdictions.
Both the deficiencies of official data and developments in the methodology of large-scale sample surveys provided the President’s Commission with the impetus for developing victimization surveys. The initial efforts involved three separate studies: a pilot study in Washington, D.C.; a second-stage study in three U.S. cities; and a national survey.
The Washington, D.C. pilot study was conducted during the spring of 1966. Working from a probability sample of homes in three police precincts, 511 interviews were completed with individuals who were asked to report whether they had been victims of crimes since New Year’s Day, 1965. This study contributed to the methodological sophistication of subsequent victimization studies. It also demonstrated that household surveys would provide a different picture of crime than that derived from police statistics. Depending on the type of crime, the pilot study revealed that there were from 3 to 10 times as many criminal incidents reported by victims than were recorded in official data (President’s Commission on Law Enforcement and the Administration of Justice, 1968).
The second-stage study was designed to elicit criminal victimizations experienced by businesses and organizations in selected high-crime areas in Boston, Chicago, and Washington, D.C., and to measure household victimizations among residents in Boston and Chicago. This study similarly found that there was much more crime than was reported in official statistics.
The third victimization study sponsored by the President’s Commission was a national survey conducted by the National Opinion Research Center, in which one respondent was interviewed in each of 10,000 households. This study revealed that approximately twice as many incidents of personal violence, and more than twice as many individual property victimizations, were estimated to have occurred than were recorded in the UCR. These data indicated that although many people experienced crime, many chose not to report it to the police. This survey also examined nonreporting of victimization and the reasons individuals did not report offenses they had experienced. Nonreporting was found to vary across offenses, ranging from a high of 90% for consumer frauds to a low of 11% for automobile thefts. Most of those who did not report their victimizations to the police felt either that the incident was private or that the police could not do anything about the offense (President’s Commission on Law Enforcement and the Administration of Justice, 1968).
The National Opinion Research Center victimization study reported a number of findings that were of interest to criminologists and policy makers. For example, the highest rates of victimization were found in the lower-income groups, nonwhites were victimized disproportionately by all index crimes except theft over $50, and the rates of victimization for men were almost three times higher than those for women.
Early reports also recognized the need for caution in the interpretation of victimization data, however, and pointed to a number of methodological problems. For example, in reference to the higher rates of burglary, larceny, and auto theft committed against men, the President’s Commission suggested that this was primarily an artifact of the survey methodology, whereby offenses committed against the household were assigned to the head of the household, which, in most cases, was a male. There was also recognition of the problems of telescoping and recency effects in the interview data. For instance, Biderman (1967) noted that the distribution of incidents reported for the national survey had a bulge at the beginning of the 12-month period for which the respondents were asked to report, as well as a larger bulge at the recent end of the distribution. These bulges suggested that respondents were remembering some crimes that they had been the victims of before the 12-month reference period and that they were more likely to recall crimes that they had experienced recently. The studies also revealed that people interviewed about crimes affecting their households mentioned incidents they had experienced personally in considerable disproportion to incidents affecting others who lived with them—that is, they were more likely to report their own victimization experiences as opposed to those of their family members. As Biderman (1967) noted, the inefficiency of asking about others’ victimization experiences was underscored by the finding that there was not a positive relationship between the number of individuals in the household and the number of incidents reported by the respondents.
The initial victimization surveys also revealed an interesting finding with respect to the education levels of respondents: Those who were college educated reported more frequent victimization experiences than others. But as Biderman (1967) noted, this may have been an artifact related to the productivity of respondents: Those with higher levels of education may have had better memories of events.
Although these initial victimization surveys were thus characterized by several methodological problems that will be explored in more detail in Chapter 5, they were important in establishing the victimization survey as an alternative to official measures of crime and in revealing that far more crime occurred than was recorded in official data.
SUMMARY AND CONCLUSIONS
Accurate accounts of various social phenomena have been important to policy makers and citizens alike since at least early Greek and Roman times. Still, the 20th century may well have been the first measured century because more and more numbers were used to characterize more and more facets of our lives. Historically, measures of crime and delinquency are among those that have the greatest potential for generating controversy and debate.
As described in this chapter, the earliest measures of crime were derived from official statistics. Concern over the reliability and validity of official counts led to the development of self-report and victimization measures—the earliest examples of which likewise evidenced limitations. Nonetheless, those pioneering official, self-report, and victimization studies served as the basis for more recent developments and refinements in crime measurement.
Chapters 3, 4, and 5 critically examine each of these data sources—official statistics, self-report studies, and victimization surveys—in turn, with the goal of identifying their role in providing accurate measurements of crime and delinquency.
Ideas in American Policing
Evidence-Based Policing
By Lawrence W. Sherman
July 1998
Ideas in American Policing presents commentary and insight from leading criminologists on issues of interest to scholars, practitioners, and policymakers. The papers published in this series are from the Police Foundation lecture series of the same name. Points of view in this document are those of the author and do not necessarily represent the official position of the Police Foundation.

©1998 Police Foundation and Lawrence W. Sherman. All rights reserved.
Lawrence W. Sherman is professor and chair of the Department of Criminology and Criminal Justice at the University of Maryland at College Park. He was the Police Foundation’s director of research from 1979 to 1985.
Abstract
The new paradigm of “evidence-based medicine” holds important implications for policing. It suggests that just doing research is not enough and that proactive efforts are required to push accumulated research evidence into practice through national and community guidelines. These guidelines can then focus in-house evaluations of what works best across agencies, units, victims, and officers. Statistical adjustments for the risk factors shaping crime can provide fair comparisons across police units, including national rankings of police agencies by their crime prevention effectiveness. The example of domestic violence, for which accumulated National Institute of Justice research could lead to evidence-based guidelines, illustrates the way in which agency-based outcomes research could further reduce violence against victims. National pressure to adopt this paradigm could come from agency-ranking studies, but police agency capacity to adopt it will require new data systems creating “medical charts” for crime victims, annual audits of crime reporting systems, and in-house “evidence cops” who document the ongoing patterns and effects of police practices in light of published and in-house research. These analyses can then be integrated into the NYPD Compstat feedback model for management accountability and continuous quality improvement.
Most of us have thought of the statistician’s work as that of measuring and predicting . . . but few of us have thought it the statistician’s duty to try to bring about changes in the things that he [or she] measures.
—W. Edwards Deming
Of all the ideas in policing, one stands out as the most powerful force for change: police practices should be based on scientific evidence about what works best. Early in this century, Berkeley Police Chief August Vollmer’s partnership with his local university helped generate this idea (Carte and Carte 1975), which was clearly derived from that era’s expansion of the scientific method into medicine, management, agriculture, and many other fields (Cheit 1975). While science had greater initial impact in those other professions during the first half of the century, policing in recent decades has been moving rapidly to catch up. However, any assessment of this idea in modern policing must begin with an accurate benchmark: catching up to what? More complete evidence on the linkage between research and practice suggests a new paradigm for police improvement and for public safety in general: evidence-based crime prevention.
For years, Sherman (1984, 1992) and others have used medicine as the exemplar of a profession based upon strong scientific evidence. Sherman has praised medicine as a field in which practitioners have advanced training in the scientific method and keep up-to-date with the most recent research evidence by reading medical journals. He has cited the large body of randomized controlled experiments in medicine—now estimated to number almost one million in print (Sackett and Rosenberg 1995)—as the highly rigorous scientific evidence used to guide medical practices. He has suggested that policing should therefore be more like medicine.
Sherman was right about the need for many more randomized experiments in policing, but wrong about how much medicine was really based on scientific research. New evidence shows that doctors resist changing practices based on new research just as much as police do, if not more so. Closer examination reveals medicine to be a battleground between research and practice, with useful lessons for policing on new ways to promote research. Those lessons come from a new strategy called “evidence-based medicine,”[1] “widely hailed as the long-sought link between research and practice” (Zuger 1997) to solve problems like the following (Millenson 1997, 4, 122, 131):

• An estimated 85 percent of medical practices remain untested by research evidence.
• Most doctors rarely read the 2,500 medical journals available, and instead base their practice on local custom.
• Most studies that do guide practice use weak, non-randomized research designs.
Medicine, in fact, seems just as resistant to the use of evidence to guide practice as are fields with lower educational requirements, such as policing. The National Institutes of Health (NIH) Consensus Guidelines are a case in point. NIH convenes advisory boards to issue to physicians recommendations that are based on intensive reviews of research evidence on specific medical practices. These recommendations usually receive extensive publicity, and are reinforced by mailings of the guideline summaries to some one hundred thousand doctors. But according to a RAND evaluation, doctors rarely change their practices in response to publication of these guidelines (Kosecoff et al. 1987, as cited in Millenson 1997). Thus three years after research found that heart attack patients treated with calcium antagonists were more likely to die, doctors still prescribed this dangerous drug to one-third of heart attack patients. Eight years after antibiotics were shown to cure ulcers, 90 percent of ulcer patients remained untreated by antibiotics (Millenson 1997, 123–25).
Evidence Cops
The struggle to change medical practice based on research evidence has a long history, with valuable implications for policing. In the 1840s, Ignaz Semmelweis found evidence that maternal death in childbirth could be reduced if doctors washed their hands before delivering babies. He then tried to apply this research to medical practice in Vienna, which led to his being driven out of town by his boss, the chief obstetrician. Hundreds of thousands of women died because the profession refused to comply with his evidence-based guidelines for some forty years. The story shows the important distinction between merely doing research and attempting to apply research to redirect professional practices.

One way to describe people who try to apply research is the role of “evidence cop.” More like a traffic cop than Victor Hugo’s detective Javert, the evidence cop’s job is to redirect practice through compliance rather than punishment. While this job may be as challenging as herding cats, it still consists of pointing professionals to practice “this way, not that way.” As in all policing, the success rate for this job varies widely. Fortunately, the initial failures of people like Semmelweis paved the way for greater success in the 1990s.

[1] The term “evidence” in this monograph refers to scientific, not criminal, evidence.
Consider Scott Weingarten, M.D., of Cedars-Sinai Hospital in Los Angeles. As director of the hospital’s Center for Applied Health Services Research, Weingarten is an evidence-cop-in-residence. His job is to monitor what the 2,250 doctors are doing to patients at the hospital and to detect practices that run counter to recommendations based on research evidence. He does this through prodding rather than punishment, convening groups of doctors who treat specific maladies to discuss the research evidence. These groups then produce their own consensus guidelines for practices that become hospital policy. Thirty-five such sets of guidelines were produced in Weingarten’s first four years on the job (Millenson 1997, 120).
What NIH, Weingarten, and the 1995 founders of the new journal called Evidence-Based Medicine are all trying to do is to push research into practice. Just as policing has become more proactive at dealing with crime, researchers are becoming more proactive about dealing with practice. This trend has developed in many fields, not just medicine. Increased pressure for “reinventing government” to focus on measurable results is reflected in the U.S. Government Performance and Results Act of 1993 (GPRA), which requires all federal agencies to file annual reports on quantitative indicators of their achievements. Education is under growing pressure to raise test scores as proof that children are learning, which has led to increased discussion of research evidence on what works in education (Raspberry 1998). And the U.S. Congress has required that the effectiveness of federally funded crime prevention programs be evaluated using “rigorous and scientifically recognized standards and methodologies” (House 1995, sec. 116). All this sets the stage for a new paradigm for making research more useful to policing than it has ever been before.
Key Questions

In suggesting a new paradigm called evidence-based policing, there are four key questions to answer: What is it? What is new about it? How does it apply to a specific example of police practice? How can it be institutionalized?
What is it?
Evidence-based policing is the use of the best available research on the outcomes of police work to implement guidelines and evaluate agencies, units, and officers. Put more simply, evidence-based policing uses research to guide practice and evaluate practitioners. It uses the best evidence to shape the best practice. It is a systematic effort to parse out and codify unsystematic “experience” as the basis for police work, refining it by ongoing systematic testing of hypotheses.
Evaluation of ongoing operations has been the crucial missing link in many recent attempts to improve policing. If it is true that most police work has yet to go “beyond 911” (Sparrow, Moore, and Kennedy 1990), the underlying reason may be a lack of evaluation systems that clearly link research-based guidelines to outcomes. It is only with that addition that policing can become a “reflexive” or “smart” institution, continuously improving with ongoing feedback.
The basic premise of evidence-based practice is that we are all entitled to our own opinions, but not to our own facts. Yet left alone to practice individually, practitioners do come up with their own “facts,” which often turn out to be wrong. A recent survey of 82 Washington State doctors found 137 different strategies for treating urinary tract infections (Berg 1991). No doubt the same result could be found for handling domestic disturbances. A study evaluating the accuracy of strep throat diagnoses based on unstructured examination by experienced pediatricians found it far inferior to a systematic, evidence-based checklist used by nurses. The mythic power of subjective and unstructured wisdom holds back every field and keeps it from systematically discovering and implementing what works best in repeated tasks.
A prime example of the
power of systematic, ongoing
evaluations comes again from
medicine. In 1990, the New York
State Health Department began
to publish death rates for
coronary bypass surgery grouped
by hospital and individual
surgeon. This action was
prompted by research showing
that while the statewide average
death rate was 3.7 percent, some
doctors ran as high as 82 percent.
Moreover, after adjusting for the
risk of death by the pre-operation
condition of the patient caseload,
patients were 4.4 times more
likely to die in surgery at the least
successful hospitals than at the
best hospitals. Despite enormous
opposition from hospitals and
surgeons, these data were made
public, revealing a strong practice
effect: the more operations
doctors and hospitals did each
year, the lower the risk-adjusted
death rate. Using this clear
correlation to push low-frequency
surgeons and hospitals out of this
business altogether, hospitals
were able to lower the death rate
in these operations by 40 percent
in just three years (Millenson
1997, 195).
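The risk-adjustment logic behind these figures can be sketched as an observed-versus-expected comparison (indirect standardization). The hospital names and counts below are hypothetical; only the method mirrors the New York reporting described above.

```python
# Risk-adjusted outcome comparison via an observed/expected ratio.
# All figures are hypothetical; 3.7% is the statewide average cited above.

def risk_adjusted_rate(observed_deaths: int, expected_deaths: float,
                       overall_rate: float) -> float:
    """Scale the overall rate by how a provider's observed deaths
    compare with the deaths expected from its caseload risk."""
    return (observed_deaths / expected_deaths) * overall_rate

OVERALL_RATE = 0.037  # statewide average death rate (3.7 percent)

hospitals = {
    "Hospital A": {"observed": 12, "expected": 20.0},  # better than case mix predicts
    "Hospital B": {"observed": 30, "expected": 15.0},  # worse than case mix predicts
}

for name, h in hospitals.items():
    adj = risk_adjusted_rate(h["observed"], h["expected"], OVERALL_RATE)
    print(f"{name}: risk-adjusted rate {adj:.2%}")
```

A ratio above the overall rate flags a provider doing worse than its patient mix predicts, which is the kind of comparison that revealed the 4.4-fold gap between the least and most successful hospitals.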
Evidence-based policing is
about two very different kinds of
research: basic research on what
works best when implemented
properly under controlled
conditions, and ongoing
outcomes research about the
results each unit is actually
achieving by applying (or
ignoring) basic research in
practice. This combination creates
a feedback loop (fig. 1) that
begins with either published or
in-house studies suggesting how
policing might obtain the best
effects. The review of this
evidence can lead to guidelines
taking law, ethics, and community
culture into account.

[Figure 1. Evidence-Based Policing: a feedback loop running from the Literature and In-House studies to Best Evidence, then to Guidelines, Outputs, and Outcomes, with outcomes feeding back into the evidence base.]

These
guidelines would specify
measurable “outputs,” or
practices that police are asked to
follow. Their varying degrees of
success at delivering those
outputs can then be assessed by
tracking risk-adjusted
“outcomes,” or results over a
reasonably long follow-up period.
These outcomes may be defined
in several different ways: offenses
per 1,000 residents, repeat
victimizations per 100 victims,
repeat offending per 100
offenders, and so on. The
observation that some units are
getting better results than others
can be used to further identify
factors associated with success,
which can then be fed back as
new in-house research to refine
the guidelines and raise the
overall success level of the agency.
Such research could also be
published in national journals or
at least kept in an agency
database as institutional memory
about success and failure rates for
different methods.
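The outcome measures named above are simple rates over a base population; a minimal sketch, with hypothetical counts, makes the arithmetic explicit:

```python
# Outcome rates as defined above, computed from hypothetical counts.

def rate_per(events: int, base: int, per: int) -> float:
    """Events per `per` units of the base population."""
    return per * events / base

offenses, residents = 430, 52_000  # offenses per 1,000 residents
repeats, victims = 36, 410         # repeat victimizations per 100 victims

print(f"{rate_per(offenses, residents, 1_000):.1f} offenses per 1,000 residents")
print(f"{rate_per(repeats, victims, 100):.1f} repeat victimizations per 100 victims")
```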
What is new about it?
Skeptics may say that there is
nothing new in evidence-based
policing, and that other
paradigms already embrace these
principles. On closer examination,
however, we will see that no
other paradigm contains the
principles for its own
implementation. No other
paradigm contains a principle for
both changing practices and
measuring the success of those
changes with risk-adjusted
outcomes research (like bypass
surgery death rates). No other
paradigm—not even NYPD’s
Computerized Crime Comparison
Statistics (Compstat) strategy
(Bratton with Knobler 1998)—
uses scientific evidence to hold
professionals accountable for
results in peer-reviewed and even
public discussions of outcomes
evidence.
Evidence-based policing is
clearly different from, but very
helpful to, all three present
paradigms of policing. Incident-
specific policing, or 911
responses, currently lacks any
outcomes measure except time
out of service. Police officers who
take too much time to handle a
call are sometimes accused of
shirking and are urged by
supervisors to work faster.2 But
no one tracks the rate of repeat
calls by officer or unit to see how
effective the first response was in
preventing future problems.
Evidence-based policing could
use such outcomes to justify
longer time spent on each call on
the basis of an officer’s average
results, rather than issuing a
crude demand that he or she stay
within an average time limit. It
could also place much more
emphasis on learning how to deal
with each call most effectively
and preventively, a question that
currently gets little attention.
Community policing,
however defined, is not clearly
linked to evidence about
effectiveness in preventing crime.
It is much more about how to do
police work—a set of outputs—
than it is about desired results, or
outcomes. Working with the
community and listening to and
respecting community members
are all important elements of the
paradigm. But that paradigm
alone has been easy for many
officers to ignore. Adding the
accountability systems from the
paradigm of evidence-based
policing could actually make
police far more active in working
with the community.
Problem-oriented policing is
clearly the major source for
evidence-based policing.

2 This sounds oddly like the pressure
for drive-in, drive-out childbirth health
insurance now barred by federal law.

Herman
Goldstein’s writings (1979,
1990), as well as John Eck and
William Spelman’s SARA model
(1987), clearly emphasize
assessment of problem-solving
responses as a key part of the
process. Yet there is no clear
statement about the use of
scientific evidence either in
selecting strategies for responding
to problems or in monitoring the
implementation and results of
those strategies (Sherman 1991).
Reports on problem-oriented
policing have so far produced
little evidence either from
controlled tests or outcomes
research. Because the paradigm
stresses the unique characteristics
of each crime pattern, problem-
oriented policing has not been
used to respond to highly
repetitive situations like domestic
assaults or disputes. Few
comparisons of different methods
for attacking the same problem
have been developed. Few officers
are even held accountable for not
implementing a problem-solving
plan they have agreed to
undertake. Problem-oriented
policing has clearly revolutionized
the way many police think about
their objectives, moving them
away from a narrow focus on
each incident to a broader focus
on patterns and systems. But in
the absence of pressure from an
evidence-based approach to
evaluating success and
management accountability,
problem-oriented policing has
been kept at the margins of
police work.
NYPD’s Compstat strategy
(Bratton with Knobler 1998) has
pushed the results accountability
principle farther than ever before,
but it has not used the scientific
method to assess cause and effect.
Successful managers are
rewarded, but successful methods
are not pinpointed and codified.
What evidence-based policing
adds to these paradigms is a new
principle for decision making:
scientific evidence. Most police
practice, like medical practice, is
still shaped by local custom,
opinions, theories, and subjective
impressions. Evidence-based
policing challenges those
principles of decision making and
creates systematic feedback to
provide continuous quality
improvement in the achievement
of police objectives (see Hoover
1996). Hence the inspiration for
this paradigm is not only
medicine and its randomized
trials, but also the principles of
quality control in manufacturing
developed by Walter Shewhart
(1939) and W. Edwards Deming
(1986). These principles were
initially rejected by U.S. business
leaders, but were finally embraced
in the 1980s after Japanese
industries used them to far
surpass U.S. manufacturers in the
quality of their products.
What makes both policing
and medicine different from
manufacturing, of course, is the
far greater variability in the raw
material to be processed—human
beings. That is what gives the
gold standard of evaluation
research, the randomized
controlled trial, both its strength
and its limitations. The strength
of the research design, pioneered
in policing by the Police
Foundation, is its ability to
reduce uncertainty about the
average effects of a policy on vast
numbers of people. The
limitation of the research design
is that it cannot escape variability
in treatments, responses, and
implementation.
The variability of treatments
in policing is much like that in
surgery, which stands in sharp
contrast to pharmaceuticals.
While the chemical content of
medical drugs is almost always
identical, the procedural content
of surgery varies widely. Similarly,
the style and tone each officer
brings to a citizen encounter
varies enormously and can make a
big difference in the outcome of
a specific case. Dosage, timing,
and follow-up of both drugs and
police work can vary widely in
practice.
Even holding treatment
constant, there is evidence that
both patients and offenders
respond to treatments with wide
variations. Some of these
responses, allergic reactions, can
kill some people with treatments
that cure most others. Offenders
are known to vary in their
responses to police actions by
individual, neighborhood, and
city. And implementation of new
practices based on controlled
experiments in both medicine and
policing varies according to how
well research is communicated,
how much information is created
about whether practices actually
change, and how much
reinforcement there is for the
change, both positive and
negative.
Evidence-based policing
assumes that experiments alone
are not enough. Putting research
into practice requires just as
much attention to
implementation as it does to
controlled evaluations. Ongoing
systems for researching
implementation can close the
feedback loop to create the
principle of industrial quality
improvement.
How does it apply to a specific
example of police practice?
The policing of domestic
violence offers a clear illustration
of what is new about the
evidence-based paradigm.
Domestic violence has been the
subject of more police practices
research than any other crime
problem. The research has
arguably had little effect on
police practice, at least by the
new standards of evidence-based
medicine. Yet the available
evidence offers a fair and
scientifically valid approach for
holding police agencies, units,
and officers accountable for the
results of police work, as
measured by repeated domestic
violence against the same victims.
The National Institute of
Justice (NIJ) and the Police
Foundation have provided
policing with extensive
information on what works to
prevent repeated violence. The
research has also shown that, like
surgery, police practices vary
greatly in their implementation.
These variations in practice cause
varying results for repeat
offending against victims. Even
holding practice constant,
responses to arrest vary by
offender, neighborhood, and city.
Finally, research shows very poor
compliance with mandatory arrest
guidelines after they are adopted
(Ferraro 1989).
There are many varieties of
arrest for misdemeanor domestic
violence. The offender may or
may not be handcuffed, arrested
in front of family and neighbors,
given a chance to explain his
version of events to the police, or
treated with courtesy and
politeness. Do these variations on
the theme of arrest make a
difference? They should,
according to the “defiance”
theory of criminal sanction effects
(Sherman 1993). And they did in
Milwaukee, according to
Raymond Paternoster and his
colleagues (1997). The
Milwaukee evidence reveals that
controlling for other risk factors
among some 800 arrested
offenders, those who felt they
were not treated in a procedurally
fair and polite manner were
60 percent more likely to commit
a reported act of domestic
violence in the future (fig. 2).
[Figure 2. Repeat Domestic Violence and Police Fairness: repeat-violence rates of 25% for offenders who felt fairly treated vs. 40% for those who did not. Source: Paternoster et al. 1997.]

This finding suggests three ways
to push research into practice:
1) change the guidelines for
making domestic violence arrests
to include those elements that
would enable offenders to
perceive more “procedural
justice”; 2) hold police
accountable for using these
guidelines by comparing rates of
repeat victimization associated
with different police units; and
3) compute these rates using
statistical adjustments for the pre-
existing level of recidivism risks.
The NIJ research provides
other evidence for ways that
police can reduce repeat
offending in misdemeanor
domestic violence. Rather than a
one-size-fits-all policy, the
evidence suggests specific guide-
lines to be used under different
conditions. Offenders who are
absent when police arrive—as
they are in some 40 percent of
cases—respond more effectively
to arrest warrants than offenders
who are arrested on the scene
(Dunford 1990). Offenders who
are employed are deterred by
arrest, while offenders who are
unemployed generally increase
their offending more if they are
arrested than if they are handled
in some other fashion (Pate and
Hamilton 1992; Berk et al. 1992;
Sherman and Smith 1992).
Offenders who live in urban areas
of concentrated poverty commit
more repeat offenses if they are
arrested than if not, while
offenders who live in more
affluent areas commit fewer
repeat offenses if they are arrested
(Marciniak 1994). All of these
findings could be changed by
further research, but for the
moment they are the best
evidence available.
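Read as decision rules, these findings could be encoded as a condition-specific guideline. The mapping below is an illustrative sketch of the cited findings (Dunford 1990; Pate and Hamilton 1992; Marciniak 1994), not an operational policy:

```python
# Illustrative guideline lookup based on the findings summarized above.
# The rule set is a sketch, not an operational policy.

def suggested_response(offender_present: bool, employed: bool,
                       poverty_area: bool) -> str:
    if not offender_present:
        # Absent offenders respond better to warrants (Dunford 1990).
        return "seek arrest warrant"
    if employed and not poverty_area:
        # Arrest deters employed offenders in more affluent areas.
        return "arrest on scene"
    # Arrest tends to increase repeat offending for unemployed offenders
    # and in areas of concentrated poverty.
    return "consider non-arrest alternative"

print(suggested_response(offender_present=False, employed=True, poverty_area=False))
```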
This research evidence could
support guidelines for policing
domestic violence that differed by
neighborhood and absence or
presence of the offender. It could
also support guidelines about
listening to suspects’ side of the
story before making arrest
decisions and generally treating
suspects with courtesy. Other
evidence, such as the extremely
high-risk period for repeat
victimization in the first days and
weeks after the last police
encounter (Strang and Sherman
1996), could be used to fashion
new problem-oriented strategies.
Most important, the existing
research can be used to create a
fair system for evaluating police
performance on the basis of risk-
adjusted outcomes. That evidence
(fig. 3) shows that the likelihood
of a repeat offense is strongly
linked to the number of previous
offenses each offender has.
Once the risk of repeat
offending can be predicted with
reasonable accuracy, it becomes
possible to use those predictions
as a benchmark for police
performance. Just as in the bypass
surgery death rates in New York,
the outcomes of policing can be
controlled for the risk level
inherent in the caseload they face.

[Figure 3. Risk of Repeat Domestic Assault by Priors (Milwaukee Domestic Violence Experiment): the percentage of repeat assaults rises with the number of prior offenses, from 42% to 75%.]
Using a citywide database of all
domestic assaults, now running
over ten thousand cases per year
in cities like Milwaukee, a model
can be constructed to assess the
risk of repeat offending in each
case. The overall mix of cases in
each police precinct or for each
officer can generate an average
risk level for that caseload. Each
police patrol district can then be
evaluated according to the actual
versus predicted rate of repeat
offending each year (fig. 4). All
patrol districts in the city can
then be compared on the basis of
their relative percentage
difference between expected and
actual rates of repeat domestic
assault (fig. 5).
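A minimal sketch of that observed-versus-expected comparison, with a hypothetical caseload layout, might look like this:

```python
# Compare each precinct's observed repeat-offending rate with the rate
# expected from its caseload's predicted risks. All data are hypothetical.

def expected_rate(case_risks: list) -> float:
    """Mean predicted repeat-offense probability for a caseload."""
    return sum(case_risks) / len(case_risks)

def percent_difference(observed: float, expected: float) -> float:
    """Positive = more repeats than predicted; negative = fewer."""
    return 100.0 * (observed - expected) / expected

precincts = {
    # precinct: (per-case predicted risks, observed repeat rate)
    "PCT 1": ([0.42, 0.48, 0.60, 0.75, 0.42], 0.25),
    "PCT 2": ([0.42, 0.42, 0.48, 0.48, 0.60], 0.55),
}

ranking = sorted(
    (percent_difference(observed, expected_rate(risks)), name)
    for name, (risks, observed) in precincts.items()
)
for diff, name in ranking:
    print(f"{name}: {diff:+.0f}% vs. expected")
```

Precincts with large positive differences are doing worse than their caseload risk predicts; large negative differences flag units whose methods may be worth codifying.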
By constructing information
systems for this kind of outcome
research, police departments can
focus on an objective that has
previously been measured only in
major experiments. Making the
goal of policing each domestic
assault the outcome of a reduced
repeat offending rate rather than
the output of whether an arrest is
made would have several effects.
One is that crime prevention
would get greater attention than
retribution for its own sake.
While not everyone would
welcome that, it is consistent with
at least some police leaders’ view
of the purpose of the police as a
crime prevention agency (Bratton
with Knobler 1998).

[Figure 4. Observed vs. Expected Risk of Repeat Domestic Violence: an observed repeat rate of 25% against an expected rate of 50%.]

[Figure 5. Observed vs. Expected Ranking by Precinct: the percentage difference between observed and expected repeat rates for five precincts, ranging from roughly –50% to +150%.]

Another effect would be to seek out and
even initiate more research on
what works best to prevent
domestic violence. In the world
as we now know it, no one in
policing—from the police chief to
the rookie officer—has any direct
incentive to reduce repeat
offending against known victims.
No one in policing is held
accountable for accomplishing, or
even measuring, that objective.
As a result, no one knows
whether repeat victimization rates
get better or worse from year to
year. Using outcomes evidence to
evaluate performance would make
police practices far more victim-
centered, the top priority being
that of preventing any further
assaults.
How can it be
institutionalized?
The strongest claim about
evidence-based policing is that it
contains the principles of its own
implementation. The principles of
using evidence both to change
and evaluate practice can be
applied to a broad institutional
analysis of implementation. Thus
while the changes described
above would have to occur one
police agency at a time, there are
certain national forces that can
help start the ball rolling. This
can be seen, for example, in
national rankings of big-city
police agencies, as well as national
mandates for improving police
data systems to provide better
evidence. Yet even such external
pressures will not succeed
without internal evidence cops to
import, apply, and create research
evidence.
No institution is likely to
increase voluntarily its
accountability except under
strong external pressure. It is
unlikely that evidence-based
policing could be adopted by a
police executive simply because it
appears to be a good idea. The
history of evidence-based
medicine and education strongly
suggests that professionals will
only make such changes under
external coercion. Nothing seems
to foster such pressure as much as
performance rankings across
agencies (Millenson 1997;
Steinberg 1998). Just as various
public performance measures
allow stockbrokers to rank
publicly-held corporations and
provide those companies with
strong incentives for better
results, public information about
police performance would create
the strongest pressure for
improvement.3
One example of how the
major city police departments
could be ranked on performance
can be found in their homicide
rates, which already receive
extensive publicity. What these
statistics lack, however, is any
scientific analysis of expected risk.
Police performance has nothing
to do, at least in the short run,
with the social, economic,
demographic, and drug market
forces that help shape a city’s
homicide rate. While police
performance may also affect those
homicide rates, the other factors
must be taken into account.
Using risk-adjusted homicide
rates provides one indication of
how well a police department
may be doing things like
confiscating illegal weapons,
patrolling hot spots, regulating
violent taverns and drug markets,
and monitoring youth gangs.
While the basic research literature
would increasingly provide a
source of guidance for taking
initiatives against homicide, a
risk-adjusted outcomes analysis
(fig. 6) would indicate how well
that research had been put into
practice.4

3 The 1919 results of the first
national rankings of hospitals were
deemed so threatening that the American
College of Surgeons decided to burn the
report immediately in the furnace of
New York's Waldorf-Astoria Hotel
(Millenson 1997, 146).
If a credible national research
organization would produce such
“league rankings” among big-city
police departments each year (like
the U.S. News & World Report
rankings of colleges and
universities), the predictable
result in the short term would be
attacks on the methodology used.
That is, in fact, what continues to
go on in New York with the
death rates in surgery. But the
New York rankings have spread
to other states, and consumers
have found them quite valuable.
Doctors—and police—may also
find rankings very valuable in the
long run. Both professions should
enjoy greater public respect as
they get better at producing the
results their consumers want.
The more seriously
performance indicators influence
the fate of organizations, the
more likely they are to be
subverted. Recent examples
include the U.S. Postal Service in
West Virginia, where an elaborate
scheme to defeat the on-time
mail delivery audit was recently
alleged (McAllister 1998). Other
examples include teachers helping
students to cheat on their answers
to national achievement tests and,
of course, police departments
under-reporting crime. The New
York City police have removed
three commanders in the past five
years for improperly counting
crime to make their performance
look better (Kocieniewski 1998),
and several chiefs of police
elsewhere have been convicted on
criminal charges for similar
conduct.
Quite apart from pressures to
corrupt data, criminologists have
long known that police crime
reporting is not reliable, with the
possible exception of homicide.
No two agencies classify crime
the same way. The same event
may be called an aggravated
assault in one agency and a
“miscellaneous incident” in
another. The recent FBI decision
to drop Philadelphia from the
national crime reporting program
was not an isolated action. In
1988, the FBI quietly dropped
the entire states of Florida and
Kentucky. Since the FBI lacks
resources to do on-site audits in
each police agency every year,
these examples are just the tip of
a very big iceberg. There are
already rising suspicions of police
manipulation of crime data as
crime rates fall in many cities.

4 While many of the basic risk factors
would be computed from Census data
that could be out of date by the middle
of each decade, other risk data can be
derived from annually updated sources,
such as the NIJ ADAM data on drug
abuse among arrestees. Unemployment,
school dropout, teen childbirth, and
infant mortality data are also available
annually for each city and could help
predict the expected rate of homicide.

[Figure 6. Homicide by City, Actual vs. Predicted (hypothetical data): the percentage difference between actual and predicted homicide rates for New York, Baltimore, Chicago, Los Angeles, and Dallas, ranging from roughly –50% to +60%.]
More serious pressure from
national rankings would threaten
data integrity even more.
One viable solution to this
problem is a federal requirement
for police departments to retain
CPA firms to produce annual
audits of their reported crime
data. This requirement could be
imposed as a condition for
receiving federal funds, just as
many other federal mandates have
already done. Anticipating court
challenges about unfunded
mandates (such as the Brady Bill),
Congress could also provide
funds to pay for the audits.
Crime counting standards could
be set nationally by the
accounting profession in
collaboration with the FBI.
Alternatively, each state legislature
could require (or even fund)
these audits as a means of
assuring fairness in performance
rankings of police departments
within the state. State agencies
such as the criminal justice
statistical centers could also
produce such rankings as a
service to taxpayers. States already
have the option of spending
federal funds on such a purpose
under the broad category of
evaluation funds.
In the process of revitalizing
crime data integrity, there would
be great value in reorganizing
police data systems. Most
important would be the creation
of a “medical chart” for each
crime victim. Like computerized
patient records, this chart would
show the diagnosis (offense
description) for each incident a
victim presents to a police agency,
perhaps anywhere in the state.
The chart would also show what
police did in response, everything
from taking an offense report to
arresting an offender whose
release date from prison is also
kept, updated, in the
computerized victim chart. This
information tool could help
develop many proactive police
methods for preventing repeat
victimization. Allowing officers to
use these data to keep their own
private “batting averages” for
repeat victimization (even
without adjusting for risk) may
encourage them to become
involved and committed to doing
a better job at preventing crime.
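One way such a "batting average" could be computed from victim charts, sketched here with a hypothetical incident layout:

```python
# A per-officer "batting average" for repeat victimization: the share of
# victims an officer handled whose charts show no further incident.
# The incident data layout is hypothetical.
from collections import defaultdict

# (officer_id, victim_id) pairs, one per handled incident.
incidents = [
    ("off-1", "v-1"), ("off-1", "v-2"), ("off-1", "v-2"),  # v-2 is revictimized
    ("off-2", "v-3"), ("off-2", "v-4"),
]

counts = defaultdict(lambda: defaultdict(int))
for officer, victim in incidents:
    counts[officer][victim] += 1

def batting_average(victim_counts: dict) -> float:
    """Fraction of an officer's victims with exactly one incident."""
    no_repeat = sum(1 for n in victim_counts.values() if n == 1)
    return no_repeat / len(victim_counts)

for officer in sorted(counts):
    print(f"{officer}: {batting_average(counts[officer]):.2f}")
```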
Better records are also needed
about what police do about crime
according to certain patterns of
offenses. “Medical charts” for
violent taverns, frequently robbed
convenience stores, and other hot
spots where most crime occurs
would be very useful for ongoing
problem-oriented policing
attempts to reduce repeat
offending at those places. Similar
records could be kept about a
pattern of crimes spread out
across a wider area, such as
automatic teller machine
robberies. If officer teams or units
identify these places or patterns as
crime targets and designate a
control group, these medical
charts can become the basis for
estimating how much crime each
police unit has prevented.
Computers can also help
police officers to implement
practice guidelines. Medical
computer systems now offer
recommended practice guidelines
in response to a checklist of data,
as well as warning when drug
prescriptions fall outside
programmed parameters of
disease type and dosage. The use
of hand-held computers to advise
officers in the field and to provide
instant quality control checks may
not happen soon, but the growth
of police research may make it
inevitable in the long run.
Doctors are not expected to keep
large amounts of research data in
their heads, nor even medical
guidelines for each diagnosis.
Computers will not replace good
judgment, but they can clearly
enhance it.
Federal rules could also
require police departments to
appoint a certified police
criminologist (either internally or
in partnership with a university or
research organization), who
would become the agency’s
evidence cop. Like Scott
Weingarten of Cedars-Sinai, the
departmental criminologist would
be responsible for putting
research into practice, then
evaluating the results. Whether
the criminologist is actually an
employee or a university professor
working in partnership with the
police may not matter as much as
the role itself. The criminologist
could help develop more effective
guidelines for preventing repeat
offending, and could develop
expected versus actual repeat
offending data by offense type for
each police district or detective
unit. A criminologist could add
the scientific method to the
NYPD Compstat process
(Bratton with Knobler 1998),
providing statistics at each
meeting on each patrol district’s
crime trends and patterns (or
even its complaints against police
officers) in relation to the
district’s risk level. Building the
capacity to import, apply, and
create evidence within each police
agency may be an essential
ingredient in the success of this
paradigm.
We may also find that the
traditional distance between
researchers and police officials
shrinks when researchers provide
more immediate managerial
information. Criminologists have
long refused to provide police
managers with data on particular
officers, deeming it contrary to
the ethics of basic research
(Hartnett 1998). By finally
providing the data in a
scientifically reasonable format,
criminologists may become far
more effective at pushing research
into practice.
Criminologists can also act on
the finding that doctors tend to
change practices based on
personal interaction and repeated
computerized feedback, and not
from conferences, classes, or
written research reports
(Millenson 1997, 127–30).
Similar findings have been
published about the effectiveness
of agricultural extension services,
in which university scientists visit
farms and show farmers new
techniques for improving their
crop yields. They echo a Chinese
proverb: Tell me and I will
forget; show me and I will
remember; involve me and I will
understand.
The one test of this principle
in policing to date is Alex Weiss’s
(1997) research on how police
departments adopt innovations.
Based on a national survey of
police chiefs and their top aides,
Weiss discovered that telephone
calls from agency to agency
played a vital role in spreading
new ideas. While written reports
may have supplemented the
phone calls, word-of-mouth
seems to be the major way in
which police innovations are
communicated and adopted.
Weiss’s study suggests the
great importance of gathering
more evidence on evidence. The
empirical question for research is,
what practices work best to
change practices? This inherently
reflexive posture may lead us to
empirical comparisons of the
effectiveness of, for example, NIJ
conferences, mass mailings of
research-in-brief reports, or new
one-on-one approaches. One
example of the latter would be
proactive telephone calls to police
agencies around the U.S. made
by present or former police
officers; callers could be trained
by research organizations to
describe new research findings. If
national consensus guidelines for
practice were developed by panels
of police executives and
researchers, the callers could
communicate those as well. Other
approaches worth testing might
include field demonstrations in
police technique. This training
would not be based on
experience, as is the current Field
Training Officer system, but
rather it would be based on
evidence that the method being
demonstrated has been proven
effective in reducing repeat
offending.
Conclusion
The test of this paradigm’s
results is not whether it is
adopted this year or in twenty
years. As Lord Keynes has
suggested, the influence of ideas
may be far more glacial than
volcanic. The pressure for better
measures of results is in the spirit
of the age, and police cannot
long escape it. All this paper does
is add one inch to the glacier, so
that we can say of policing what
Dr. William Mayo of the Mayo
Clinic said of his profession
almost a century ago: “The glory
of medicine is that it is constantly
moving forward, that there is
always something more to learn.”
References
Berg, A.O. 1991. Variations
among family physicians’
management strategies for
lower urinary tract infections
in women: A report from the
Washington Physicians’
Collaborative Research
Network. Journal of the
American Board of Family
Practice (September–
October): 327–30.
Berk, Richard A.; Alec Campbell;
Ruth Klap; and Bruce
Western. 1992. The deterrent
effect of arrest in incidents of
domestic violence: A Bayesian
analysis of four field
experiments. American
Sociological Review 57: 698–
708.
Bratton, William, with Peter
Knobler. 1998. Turnaround:
How America’s top cop
reversed the crime epidemic.
New York: Random House.
Carte, Gene, and Elaine Carte.
1975. Police reform in the
United States: The era of
August Vollmer. Berkeley:
University of California Press.
Cheit, Earl. 1975. The useful arts
and the liberal tradition. New
York: McGraw-Hill.
Deming, W. Edwards. 1986. Out
of the crisis. Cambridge:
Massachusetts Institute of
Technology, Center for
Advanced Engineering Study.
Dunford, Franklyn. 1990.
System-initiated warrants for
suspects of misdemeanor
domestic assault: A pilot
study. Justice Quarterly 7:
631–53.
Eck, John, and William Spelman.
1987. Problem-solving:
Problem-oriented policing in
Newport News. Washington,
D.C.: Police Executive
Research Forum.
Ferraro, Kathleen J. 1989.
Policing woman battering.
Social Problems 36: 61–74.
Goldstein, Herman. 1979.
Improving policing: A
problem-oriented approach.
Crime and Delinquency 25:
236–58.
———. 1990. Problem-oriented
policing. New York: McGraw-
Hill.
Hartnett, Susan. 1998. Address
to the Third National
Institute of Justice
Conference on Police-
Research Partnerships,
February.
Hodge, Melville H. 1990. Direct
use by physicians of the TDS
Medical Information System.
In A History of Medical
Informatics, edited by
Bruce I. Blum and Karen
Duncan. New York: ACM.
Kosecoff, Jacqueline, et al. 1987.
Effect of the National
Institutes of Health
Consensus Development
Program on physician
practice. Journal of the
American Medical Association
258 (November 20): 2708–
13.
McAllister, Bill. 1998. A “special”
delivery in West Virginia:
Postal employees cheat to
beat rating system.
Washington Post, 10 January,
A1.
Marciniak, Elizabeth. 1994.
Community policing of
domestic violence:
Neighborhood differences in
the effect of arrest. Ph.D.
diss., University of Maryland.
Millenson, Michael L. 1997.
Demanding medical excellence:
Doctors and accountability in
the information age. Chicago:
University of Chicago Press.
Office of Technology Assessment
of the Congress of the United
States. 1983. The impact of
randomized clinical trials on
health policy and medical
practice. Background paper
OTA-BP-H-22. Washington,
D.C.: Government Printing
Office.
Pate, Antony M., and Edwin E.
Hamilton. 1992. Formal and
informal deterrents to
domestic violence: The Dade
County Spouse Assault
Experiment. American
Sociological Review 57: 691–
98.
Paternoster, Ray; Bobby Brame;
Ronet Bachman; and
Lawrence W. Sherman. 1997.
Do fair procedures matter?
Procedural justice in the
Milwaukee Domestic Violence
Experiment. Law and Society
Review.
Raspberry, William. 1998. Tried,
true and ignored. Washington
Post, 2 February, A19.
Sackett, David L., and William
M.C. Rosenberg. 1995. On
the need for evidence-based
medicine. Health Economics 4:
249–54.
Sherman, Lawrence W. 1984.
Experiments in police
discretion: Scientific boon or
dangerous knowledge? Law
and Contemporary Problems
47, no. 4: 61–81.
———. 1992. Policing domestic
violence: Experiments and
dilemmas. New York: Free
Press.
———. 1993. Defiance,
deterrence and irrelevance: A
theory of the criminal
sanction. Journal of Research
in Crime and Delinquency 30:
445–73.
——— and Douglas A. Smith.
1992. Crime, punishment and
stake in conformity: Legal
and informal control of
domestic violence. American
Sociological Review 57.
Shewhart, Walter A. 1939.
Statistical methods from the
viewpoint of quality control.
Edited by W.E. Deming.
Lancaster, Pennsylvania:
Graduate School of the U.S.
Department of Agriculture.
Sparrow, Malcolm; Mark Moore;
and David Kennedy. 1990.
Beyond 911: A new era for
policing. New York: Basic
Books.
Steinberg, Jacques. 1998. Public
shaming: Rating system for
schools: Some states are
finding that humiliation leads
to improvement. New York
Times, 7 January, A19.
Strang, Heather, and Lawrence
W. Sherman. 1996. Predicting
domestic homicide. Paper
presented to the American
Association for the
Advancement of Science,
January, in Baltimore,
Maryland.
U.S. Congress. House. 104th
Congress, 1st sess., H. Rept.
104-387, sec. 116.
Weiss, Alexander. 1992. Diffusion
of innovations in police
departments. Ph.D. diss.,
Northwestern University.
Zuger, Abigail. 1997. New way
of doctoring: By the book.
New York Times, 16
December, C1.
1201 Connecticut Avenue, NW, Washington, DC 20036
(202) 833-1460 • Fax: (202) 659-9149 • e-mail: pfinfo@policefoundation.org
ABOUT THE POLICE FOUNDATION
The Police Foundation is a private, independent, not-for-profit organization dedicated to
supporting innovation and improvement in policing through its research, technical assistance, and
communications programs. Established in 1970, the foundation has conducted seminal research in
police behavior, policy, and procedure, and works to transfer to local agencies the best new
information about practices for dealing effectively with a range of important police operational
and administrative concerns. Motivating all of the foundation’s efforts is the goal of efficient,
humane policing that operates within the framework of democratic principles and the highest
ideals of the nation.
BOARD OF DIRECTORS
Chairman
William G. Milliken
President
Hubert Williams
Freda Adler, PhD
Lee P. Brown, PhD
William H. Hudnut III
W. Walter Menninger, MD
Victor H. Palmieri
Henry Ruth
Stanley K. Sheinbaum
Alfred A. Slocum
Sally Suchil
Kathryn J. Whitmire
POLICE FOUNDATION OFFICE OF RESEARCH
David Weisburd, PhD
Senior Research Scientist
Rosann Greenspan, PhD
Research Director
Patrick R. Gartin, PhD
Senior Research Associate
David G. Olson, PhD
Senior Research Associate
Edwin E. Hamilton, MA
Senior Research Analyst
Michael Clifton, MA
Research Associate
Jennifer C. Nickisch, MA
Research Associate
Justin Ready, MA
Research Associate
Annette C. Miller, MA
Research Assistant
Rachel Dadusc, BA
Administrative Assistant