
Appendix A: The Effects of the Number of Time Periods on the Validity of Evaluation Conclusions

To understand the importance of examining a large number of time periods, consider the following hypothetical example. The data here were created using a random number generator, so none of the fluctuations are systematic. This series illustrates how we can be deceived by randomness, particularly if we look at very short time intervals. All the charts that follow are from the same series.
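
The footnote at the end of this appendix describes how the series was built: a constant level with random fluctuations around it. As a concrete illustration, here is a minimal Python sketch of one way to generate such a series. The level of 101 events per period comes from the text; the normal noise distribution, its spread, and the fixed seed are assumptions made for the example.

```python
# A sketch of one way to generate a purely random series like the one used
# in this appendix. The constant level of 101 events per period is taken
# from the text; the noise distribution and its spread are assumptions.
import numpy as np

rng = np.random.default_rng(42)   # fixed seed so the sketch is reproducible

N_PERIODS = 40    # length of the full series shown in Figure A4
LEVEL = 101       # constant level of the problem (from the text)
NOISE_SD = 8      # assumed spread of the random fluctuations

series = LEVEL + rng.normal(0, NOISE_SD, size=N_PERIODS)
RESPONSE_PERIOD = 20   # hypothetical response placed at the series center
```

Every value in `series` is noise around a flat level, so any apparent trend or drop in the charts that follow is, by construction, random.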

Figure A1 shows the results of a pre-post evaluation where measures of the problem are taken just before and just after a response (time periods 19 and 20 in the series). The conclusion we would draw from this chart is that the problem experienced a moderate decline following the response.

The next chart (Figure A2) shows periods 12 through 20 of the series, so there are now eight periods before the response and one after the response. The additional time periods provide an opportunity to examine the trend in the problem leading up to the response. The straight line shows this trajectory. The extension of the trajectory to period 20 allows a comparison of what we might expect if the response were not implemented (the trajectory) to the actual level of the problem.

Figure A1: Two-period Pre-post Design
Figure A2: Nine-period Time Series Design (with projected trajectory of problem)

We can see plainly that the problem was trending downward prior to the response, so not all of the drop in the problem following the response can be attributed to the response. Nevertheless, it appears that there was a greater drop in the problem following the response than we would have expected due to the trend alone.

The periods prior to the response help establish the trajectory of the problem time series. Here we focused exclusively on the overall trend, but it is also possible to look for seasonal cycles and other recurring fluctuations.
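
To make this concrete, the following sketch continues the simulated series from above: it fits a straight line to the eight pre-response periods and projects it forward, as Figures A2 and A3 do. An ordinary least-squares line is an assumption here, since the text does not say how its trend line was fitted.

```python
# Fit a linear trajectory to the pre-response periods and project it past
# the response, continuing the `series` sketch above. Least squares via
# np.polyfit is an assumed choice of fitting method.
import numpy as np

pre = np.arange(12, 20)     # pre-response periods 12..19 (1-indexed)
post = np.arange(20, 28)    # post-response periods 20..27

slope, intercept = np.polyfit(pre, series[pre - 1], deg=1)
projected = slope * post + intercept    # expected levels absent a response

# Gap between the projected trajectory and what actually happened.
print((projected - series[post - 1]).round(1))
```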

Extending the data to periods after the response helps determine the stability of the response's effect. Does the response continue to be effective, driving the problem further down? Or does it wear off, allowing the problem to rebound? This is shown in Figure A3, which depicts an additional seven periods following the response. The same trend line, fitted to the data prior to the response, is now projected out eight time periods after the response.

Figure A3: Sixteen-period Time Series Design (with projected trajectory of problem)

We see that the problem rebounded and then oscillated around the same trend line. So, at best, the response was only temporarily helpful.

The value of a very long time series cannot be overstated. Too often police agencies show only a few time periods even though their computer systems contain data for many more. Note how our interpretation of the trend changes when we look at the entire 40-period series from which these three charts were extracted. This is shown in Figure A4.

Figure A4: Forty-period Time Series Design (with average number of events per period)


Whereas Figure A3 suggests a downward trend, Figure A4 shows that this was an illusion; in fact, the longer-term trend is flat. The underlying randomness of the data becomes much more apparent: the series oscillates around an average of 101 events per period (dotted line). Further undermining our confidence in the response, we can see at least two other intervals with declines like the one we see after the response. So what we thought was a decline due to the response may very well be a temporary fluctuation due to normal variation in the problem.
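
Continuing the sketch, the contrast Figure A4 draws can be checked directly: a trend fitted to all 40 periods of pure noise around a constant level will tend toward a slope of zero, and the series mean will sit near the true level, whatever a short window suggests.

```python
# Compare the short-window trend with the trend over all 40 periods, as
# Figure A4 does. Continues the earlier sketch (`series`, `slope`).
import numpy as np

all_periods = np.arange(1, N_PERIODS + 1)
long_slope, _ = np.polyfit(all_periods, series, deg=1)

print(f"pre-response window slope: {slope:.2f}")
print(f"full 40-period slope:      {long_slope:.2f}")   # near zero
print(f"overall average:           {series.mean():.1f} events per period")
```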

The lesson here is that it is easy to be deceived by randomness, particularly when analyzing crime over short time periods. The police, the public, the news media, and elected officials are all susceptible to this deception because comparing this month to last month, or this year to last year, is so common. Using multiple measures and a longer time series are reasonable guards against this sort of deception.

Unlike real data, where one is never quite sure of the cause, with this intentionally random data we know with absolute certainty that the variation around the average of 101 events per period is random.† This includes the periods just before and after the response. The example illustrates how easily random fluctuations in data can be misinterpreted as meaningful changes. It is worth noting that a significance test applied to a pre-post design might actually suggest that the drop is not due to random variation. This is because the randomness affects the entire series, while the pre-post design examines only a small part of it.
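
This caveat about significance tests can also be sketched. The text does not name a specific test, so a two-sample t-test on narrow pre- and post-response windows is assumed here; the point is that such a test never sees the rest of the series.

```python
# A significance test confined to narrow pre/post windows can flag a "real"
# drop in data that is random by construction. A two-sample t-test (scipy)
# is an assumed choice; the text does not specify a test.
from scipy import stats

pre_window = series[11:19]     # periods 12-19, before the response
post_window = series[19:27]    # periods 20-27, after the response

t_stat, p_value = stats.ttest_ind(pre_window, post_window)

# Depending on the random draw, p_value can dip below 0.05 even though
# every fluctuation here is random; the full series reveals the noise.
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```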

† We created this data series by setting a constant level for the problem and then using a random number generator to provide the fluctuations around this level. We placed the beginning of the hypothetical response at the center of the series.
