Now I'm just an uneducated bumpkin, but it seems like you've tried to answer two questions here but mistakenly labeled them as one (until you back off in the last paragraph of "Analysis"):
1. Do police give more tickets at the end of the month?
2. If they do, is it because of quotas?
The first is a statistical question, the second is much more complex. I can think of a few very good reasons why numbers would spike at the beginning of the month, plummet in the fat middle then spike at the end. None of them really have much to do with statistics directly.
I think this would be compelling if you'd compare stats for years in which "quotas" (whatever they decide to call them) were policy against the stats from years which they weren't.
For example, in Massachusetts, it's illegal to have ticket quotas.
Of course, all that does is prevent them from admitting so publicly. Anecdotal, but a friend of a friend claimed that he goes out some nights and has to write a certain amount of tickets.
Not so anecdotal, a collection of news stories related to Mass ticket quotas:
Circa 1992 I read a book titled something like "How to Speed and Get Away With It". It was written by a retired state trooper. One of the many things he mentioned was that the middle of the month is the best time to speed because troopers tended to fill their quotas early and late.
The most useful piece of advice, though, was on how to pull over. Turns out that every time a cop pulls somebody over, he's worried about getting smooshed by traffic or shot by a loon. His advice, which I've followed faithfully, is to pull waaaay over, so the cop can park close to traffic and create a bubble of protected space. Then you turn on the interior lights, roll down the windows, turn off the car, throw the keys on the dash, and hold the steering wheel at the top with both hands.
Authority figures like respect, and this display of considerate compliance has worked wonders for me.
I think you're right in that the question of the title is really two questions. I tried to make that distinction clear in the analysis and conclusion sections.
I didn't actually try to find information on quota policies (specifically for Baltimore and Maryland) until I was writing the analysis section. I was expecting to be able to find a detailed description of policies by state (or something helpful) on Wikipedia, but that didn't really pan out [1]. I wasn't able to find any reliable information elsewhere either. I think the taboo attached to the subject may keep information about policies pretty far under wraps.
That's an interesting idea, but I don't think that there were enough speed cam records to do that. If you notice the graphs before and after I discarded them, they are pretty much identical.
I don't understand all the gymnastics to correct for the fact that months end on different days (30, 31, etc.) and all the effort put into dealing with days by number.
For a given month, you can determine the last day. You can also determine the last weekday.
So just do a series that for each month shows "average weekday citations, non automatic" and "final weekday citations, non automatic" (and maybe "first weekday citations, non automatic").
This is exactly my point - the quotas are monthly not daily, but he's doing the wrong kind of work. He's putting his data into a daily format and then trying to extrapolate monthly information.
The right way to solve this problem is to find the exact thing you want - the precise last weekday of the month, the precise first weekday, and a precise weekday average - which is trivial with any decent date/time library. These would be the RIGHT kind of gymnastics. Tallying information by day and then running some kind of regression to approximate last-day traffic is a waste when you can just GET the last-day traffic.
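For what it's worth, here is a minimal sketch of that approach using only the Python standard library. It assumes the citations have already been filtered to non-automatic ones and tallied into a hypothetical `daily_counts` mapping of `datetime.date` to ticket count; that name and shape are my assumption, not something from the post.

```python
import calendar
from collections import defaultdict
from datetime import date, timedelta


def first_weekday(year, month):
    """Return the first Monday-Friday date of the month."""
    day = date(year, month, 1)
    while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        day += timedelta(days=1)
    return day


def last_weekday(year, month):
    """Return the last Monday-Friday date of the month."""
    last_dom = calendar.monthrange(year, month)[1]
    day = date(year, month, last_dom)
    while day.weekday() >= 5:
        day -= timedelta(days=1)
    return day


def monthly_series(daily_counts):
    """Per month: first-weekday count, last-weekday count, weekday average.

    daily_counts is a hypothetical {date: ticket_count} mapping.
    """
    weekday_counts = defaultdict(list)
    for day, count in daily_counts.items():
        if day.weekday() < 5:
            weekday_counts[(day.year, day.month)].append(count)

    series = {}
    for (year, month), counts in sorted(weekday_counts.items()):
        series[(year, month)] = {
            # .get() returns None when that date is missing from the data
            "first_weekday": daily_counts.get(first_weekday(year, month)),
            "last_weekday": daily_counts.get(last_weekday(year, month)),
            "weekday_average": sum(counts) / len(counts),
        }
    return series
```

Comparing `last_weekday` against `weekday_average` month by month gives the series described above without any regression step.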
Looking at the last plot, you can't conclude that police give out more tickets at the end of the month, unless you also conclude that police like to give out tickets around the 8th, and don't like to give out tickets around the 11th of each month.
The problem is that there are no error bars. If there were, they'd probably show that there is no significant evidence for the hypothesis. The error of the mean of the last plot is probably comparable to the signal itself.
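A rough sketch of how those error bars could be computed, assuming a hypothetical `daily_counts` mapping of `datetime.date` to ticket count (my naming, not the post's). It reports the mean and the standard error of the mean for each day of the month.

```python
from collections import defaultdict
from datetime import date
from math import sqrt
from statistics import mean, stdev


def day_of_month_error_bars(daily_counts):
    """Map day-of-month -> (mean count, standard error of the mean)."""
    groups = defaultdict(list)
    for day, count in daily_counts.items():
        groups[day.day].append(count)

    bars = {}
    for dom, counts in sorted(groups.items()):
        if len(counts) < 2:
            continue  # the sample standard deviation needs two or more values
        bars[dom] = (mean(counts), stdev(counts) / sqrt(len(counts)))
    return bars
```

If the bars for the last few days overlap the bars for mid-month days, the apparent end-of-month bump is hard to tell apart from noise.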
I appreciate your concern! I didn't really draw any conclusions from the data, as I mentioned in the "Conclusions" sections. This wasn't meant to be hard science, just some fun with data.
I did make an update with a few, more conventional statistics as well as an adjustment based on day of the week (see "Updates"): http://robert.io/posts/4.html
Indeed. Whenever I see an article like this, I immediately start skimming for discussion of variance and confidence intervals. Without them, you can't really know what is going on.
It would be interesting to adjust the data to compensate for day of the week, as well as for the dates that only happen in a few months out of the year. I wonder if the 31sts in that year were all in the middle of the week, or were all on weekends?
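The weekday question is easy to check directly. A small sketch follows; the choice of 2010 as the example year is my assumption, so swap in whichever year the graph covers.

```python
import calendar
from datetime import date

DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")


def weekdays_of_31sts(year):
    """Return {date: weekday name} for every 31st in the given year."""
    result = {}
    for month in range(1, 13):
        if calendar.monthrange(year, month)[1] == 31:
            day = date(year, month, 31)
            result[day] = DAY_NAMES[day.weekday()]
    return result


if __name__ == "__main__":
    for day, name in weekdays_of_31sts(2010).items():
        print(day.isoformat(), name)
```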
However the 7th, 14th, 21st and 28th days aren't always the end of a week (Friday, or Sunday?).
If we make the assumption that it doesn't matter, and the quotas are 'rolling', then surely the same could be said about the monthly quota not ending at the actual end of the month?
If you look at the graphs, the first two days and the 7th day of the month are higher than average. Same for the second week; the 8th & 9th and the 14th days of the month are higher than average.
Also, the 7th, 14th, 21st, and 28th days are all above average.
I can't speak about all police but I've asked this question to my father who works with the RCMP. He has been in charge of traffic sections for years. They don't have a quota, not in the usual sense, but they do keep track of data on tickets, including how many tickets each member is giving out. They don't require them to give out X tickets per month, but what they do look for is members who are not giving out around the same amount of tickets as their peers. This usually indicates they are not patrolling and doing their job.
Another thing they can look at with this data is where they are giving out tickets. If they notice one officer has all his tickets in one location which is known for high accident rates then they know that person is "fishing" rather than actually doing active patrolling.
Isn't giving out tickets in a high accident area exactly what should happen? Tickets in a low accident area must surely be less likely to change accident rates?
This is an anecdote from me, but that's all most people in this thread have, so... One day I was sitting in an empty parking lot with my girlfriend eating pizza on the tailgate and watching traffic go by. A cop came along and ticketed a driver for rolling through the stop sign. As soon as the cop turned off her lights, another car rolled through the stop sign, and she lit that driver up as well. This continued for probably 5-10 drivers before the officer came up to us and asked if we were enjoying the show. She said she hopes everyone involved learned a lesson about stopping completely at stop signs. I offered her a slice of pizza, she declined, and we drove off.
I guess my take-away from that was, cops can give out tickets in short order if people are consistently breaking the law right in front of them. I wouldn't necessarily punish an officer for giving out many tickets in a high-offending area, just based on what I saw that day. Sometimes people in an area need to have the rules reinforced. For every driver that got pulled over, 20 other drivers were witness.
My take would be that if there's an intersection where people are doing that many rolling stops, then it's probably perfectly safe. In my neighborhood people do that all the time at certain spots as long as they're the only one near the intersection.
If those people were driving safely, then ticketing them for a technical violation of the law probably isn't helping anything. The point of tickets is to punish the unsafe drivers so that they learn a lesson and have a record that makes it easier to reform or weed out the really bad ones.
So a cop that spends all day writing tickets on technical violations probably should get a talking to from their superiors. They're wasting their time and ticking off solid citizens while actual unsafe behavior gets a pass.
Isn't the answer "statistically, we can't reject the null hypothesis, because the variation near the end of the month wasn't bigger than the month-wide variation"? Getting some actual standard deviations might help.
But I think the data is a little more conclusively inconclusive than a mere "I'm not sure I can tell anything from the data".
Well out of the 7 times a 31st appears, 2 of them are holidays, Halloween and New Year's Eve. I would have guessed that this might increase ticket issuance, but I suppose it's possible it could have a negative impact as well.
I'm playing around with the data[1] and it looks like you may be on to something. October 31, 2010 (Halloween) had 122 tickets, while August 31, 2010 had 531!
Do union-required holiday pay rates cause the PD to only staff for minimum requirements? Perhaps the PD simply staffs at minimum levels because most officers want those days off? Do holidays cause so many "real" crimes that the police don't have time to write tickets?
I'd guess that at least for the New Year's Eve 31st, there'd be more staff rostered on than usual, but probably fewer than normal doing regular traffic patrol. I suspect Halloween is similar. I've got no data or evidence to back that up (and even less for Halloween, since I'm not in the US), but I'd probably choose to remove those two days as "exceptions". Filtering out other known public holidays might improve the signal to noise ratio of the data too...
In Australia long weekends usually trigger double demerits and an increase in visible police, especially roadside breath testing. I don't know what it does to the ticket rates but the effect on every driver I know personally is to be extra cautious, which would reduce ticket rates.
Maybe, there's a possibility of increased ticket rates too due to the extra driving km done by people "going away for the long weekend", and by drivers being less experienced on the routes they're driving (compared to regular commuter traffic who know exactly where the speed/redlight cameras are, and are usually in heavy enough traffic to not be able to exceed the speed limit).
I wonder if there's local (NSW, Australia - for me) tickets-by-day data available to analyse?
I think your analysis is slightly flawed - you have no anchor / point of reference.
Maybe a better question would be: for every hour worked, how many tickets are issued? In other words, normalize against the number of patrol men/women active for a day or an hour.
E.g. more police on patrol == more tickets, but that does not mean each police man/woman is biased to issue more tickets near the end of the month to meet their quota.
Also consider counting backwards, e.g. end of the month = 0, 1 day before end of month = 1, 2 days before end of month = 2, etc.
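The counting-backwards idea might look something like the sketch below. The `daily_counts` mapping of `datetime.date` to ticket count is a hypothetical input of my own naming. Normalizing by patrol hours would need staffing data that, as far as this thread shows, isn't in the dataset, so it is left out.

```python
import calendar
from collections import defaultdict
from datetime import date


def days_before_month_end(day):
    """0 for the last day of the month, 1 for the day before, and so on."""
    last_dom = calendar.monthrange(day.year, day.month)[1]
    return last_dom - day.day


def average_by_days_before_end(daily_counts):
    """Average ticket count grouped by distance from the end of the month."""
    groups = defaultdict(list)
    for day, count in daily_counts.items():
        groups[days_before_month_end(day)].append(count)
    return {offset: sum(counts) / len(counts)
            for offset, counts in sorted(groups.items())}
```

With this indexing the 28th of February and the 31st of January both land in bucket 0, so every month contributes to the "last day" bar.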
1. It looks like the total number of tickets in 2009 and 2010 is about 10% that of 2011. I'm guessing that there weren't actually ten times as many tickets given in 2011, so either the data is incomplete (as the author suggested), or there was a typo. If the data is incomplete, I'd suggest normalizing to the 2011 totals; otherwise, the 3-year average doesn't make much sense.
2. The scale of the "normalized" difference graphs (showing "Actual - Expected"). The formula given is
(actual - expected) / total * 1000 = normalized number
If this is the case, then since the scale goes to about +/- 5, the differences are very small (less than 1% away from what you'd expect!). But from eyeballing the data, that doesn't seem right.
1) Yes, there were far fewer tickets in the data for 2009 and 2010 than 2011.
2) The fact that the normalized numbers were so small was very unintuitive to me at first too, but the important thing to realize is that in that formula, you're dividing the difference, not the actual value for the given day, by the total number for the year. When I first ran those numbers I was so confused by the output. I was originally thinking that I'd normalize it by saying "X percent of the total for that year," but since I was working with the differences, and not the actual values, the numbers were too small a fraction.
Either that or I made some huge mistake in my logic...
WRT the use of standard deviations, like I said in the post, I'm not a statistician, so I wasn't really sure what the canonical way of normalizing data was. I pretty much just made one up. Thanks for pointing that out. I'll look into using standard deviation for the next one. :)
In that case, you should divide by "expected", so you get a percentage difference for each day. (Normalizing by the total for the year doesn't make sense: imagine that there were 300 days per month instead of 30. Your numbers would be divided by 10 again, but the data you want to visualize would stay the same.)
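To make the difference concrete, here is a small sketch comparing the formula quoted above with the divide-by-expected version. The numbers in the example are invented purely for illustration.

```python
def normalized_by_total(actual, expected, total):
    """The formula quoted above: (actual - expected) / total * 1000."""
    return (actual - expected) / total * 1000


def percent_difference(actual, expected):
    """Difference from expectation as a percentage of the expectation."""
    return (actual - expected) / expected * 100


if __name__ == "__main__":
    # Hypothetical figures: 36,500 tickets in the year, 100 expected on a
    # given day, 120 actually written.
    print(normalized_by_total(120, 100, 36_500))
    print(percent_difference(120, 100))
```

The first value depends on how many days feed into the yearly total, while the second only depends on the day in question, which is the point being made above.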
I made an update using standard deviation and Z-score charts instead of my homebrew normalization function (see "Update"): http://robert.io/posts/4.html
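For readers following along, a generic Z-score calculation looks roughly like this. It is a textbook sketch, not the code behind the update, and it uses the population standard deviation, which is my assumption about the convention.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return how many standard deviations each value sits from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return [0.0 for _ in values]  # no spread, so nothing stands out
    return [(value - mu) / sigma for value in values]
```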
It doesn't make sense to look at an averaged plot like this without error bars. Just comparing 2009 and 2010 there seems to be a huge amount of variation.
Even better than error bars would be box plots of the data corresponding to each day of the month. You can easily produce such plots with R.
It bothers me that in some months, the 27th is 3 days from the end of the month, but in some months, the 27th is 4 days from the end. Could this mess up the graph?
The 27th appears in every month, but the time remaining to meet your (possible) quota would be different for different months.
I was originally thinking that the drop on the 31st could be explained by the last day of the month not counting for quotas, so they could be tallied or something. Then I realized that the last day of the month varies, and that if that was the cause of the massive drop on the 31st, it would have made a bigger impact on the 30th.
I might do an update with some of the suggestions here, including a graph of tickets / days until end of month. [1]
Is there any evidence in regard to the nature of quota-esque incentives? Are the measurement periods monthly vs. say, every two weeks? One would assume that, if quotas exist, they would have some periodicity, but there certainly are other schemes.
It would be interesting to see weekdays vs weekends. Houston works the freeways pretty hard during the week, and not so much on the weekend. Or, at least that's how it seems.
Also, re quotas, from my cop buddy (constable) "Of course there are no quotas, that's ridiculous. What they do, is tell you to attend the meeting of the commissioner's court; during which the commissioners bemoan the budget woes and discuss which county employees may have to be laid off if things don't improve, etc. etc." He said that usually inspires the troops, and nobody ever says the word "quota."
The way an actual statistician would try to answer this question would be to first describe a "null" hypothesis, called H0, then talk about the probability of seeing this set of results, or an even more extreme set, if H0 were in fact the case.
H0: The police do not change their ticket collection strategies at the end of the month
H1: They try to collect more tickets at the end of the month than the other days of the month
H0 would describe some distribution for the numbers you are seeing. This is where you can put all of your assumptions about how things are - so that you ultimately end up with a model that generates a certain distribution.
Under this model, there's going to be a certain probability of seeing this result, or a more extreme result.
If that probability is less than, say, 5%, then this is saying that you'd have less than a 5% chance of seeing numbers this extreme given that there isn't in fact any conspiracy to collect more tickets.
In such a case, you might then "reject" the null hypothesis in favour of H1.
If the probability was higher than 5%, you might say that there is no "significant" evidence in favour of H1.
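One assumption-light way to get that probability is a permutation test, sketched below. This is my choice of technique rather than the only option. Under H0 the "end of month" label carries no information, so shuffling the labels shows how often a gap at least as large as the observed one appears by chance. The two input lists are hypothetical daily ticket counts.

```python
import random
from statistics import mean


def permutation_p_value(end_counts, other_counts, n_shuffles=10_000, seed=0):
    """One-sided p-value for 'end-of-month days have a higher mean count'."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = mean(end_counts) - mean(other_counts)
    pooled = list(end_counts) + list(other_counts)
    k = len(end_counts)

    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        # Pretend the first k shuffled values are the end-of-month days.
        diff = mean(pooled[:k]) - mean(pooled[k:])
        if diff >= observed:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)
```

A result below 0.05 would be the "reject H0" case described above. The test treats days as interchangeable, so weekday and holiday effects should be removed first.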
It could be that they have to get in the paperwork before the 31st so it can be manually processed. So the last day to them would be a day or two before. Also the beginning of the month makes sense because you'd want to dole out a bunch in the first week so you could coast a little in the middle.
That was my first thought, but then I think we'd see a larger drop on the 30th as well, since that's the last day of the month sometimes too. Someone suggested doing a graph with "days until end of month" on the X-axis, which might be interesting.
You need to compare the variations with the expected variation for a Poisson distribution with the observed rate to judge whether they are significant or likely to arise from noise. Just looking at the deviations in absolute number isn't telling you anything.
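As a rough sketch of that comparison: for a Poisson process with rate λ per day, the standard deviation of a daily count is √λ, so a deviation can be expressed in units of √λ. The function names are mine, and the Poisson model itself is an assumption that the dispersion figure helps sanity-check.

```python
from math import sqrt
from statistics import mean, variance


def poisson_check(counts):
    """Return (rate, Poisson-expected std dev, dispersion index).

    A dispersion index (sample variance / mean) near 1 is what a Poisson
    process would give; values well above 1 mean the counts vary more than
    Poisson noise alone would explain.
    """
    rate = mean(counts)
    return rate, sqrt(rate), variance(counts) / rate


def poisson_z(count, rate):
    """Deviation of one day's count from the rate, in units of sqrt(rate)."""
    return (count - rate) / sqrt(rate)
```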
Good analysis. Let me comment that it would be improved by investigating quarterly or yearly quota, or controlling for budget deficits (e.g. if 2010 has a large expected budget deficit, likely in September 2009 tickets begin being written with greater frequency).
It would be interesting to ask the Baltimore PD what incentives are given to officers to write speeding tickets. [I bet they claim none] Then, ask some cops on the street what the real incentive system is like. From there, use the data to see how cops respond to both the real and official incentives.
I wonder if ticket writing happens when there's nothing better to do. It would be interesting to see the correlation between arrests for "real" stuff and traffic tickets. Are there differences in pay rates based on days/times worked? Do more tickets get written when rates are low vs. 2.5x pay on a holiday? Does the PD only staff for necessities when rates are higher or when officers most desire time off?
I'm surprised I haven't read something yet about Steven Levitt looking at this data. This provides such incredible insight into the impact of incentives and lets us compare the probability that the official incentive policy is perceived as accurate by the officers vs. the likely real but unwritten incentives.
It'd probably be better to first ask cops, and then ask officials. If cops give examples, and officials say none, then you can ask officials about the examples cops have. But ask them plainly first. Give them a chance to be honest with you. Or hang themselves.
Thanks. There is so much more that could be explored here. It's difficult to really come up with an answer with so many different factors, which is why I'm reluctant to point the finger at quotas in my conclusion.
I'd be interested to see what the distribution of automated tickets is compared to police-issued tickets. The automated tickets provide a baseline of random lawlessness. Do police issue more tickets on the same days that more automated tickets are issued?
There is a strong weekly cycle to human behavior. I would suggest starting by measuring that cycle, then subtracting it out of the data before computing your day of the month distribution.
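A minimal sketch of that two-step adjustment, assuming a hypothetical `daily_counts` mapping of `datetime.date` to ticket count. Subtracting the per-weekday mean is the simplest way to remove the cycle and is my assumption; a multiplicative adjustment would be another reasonable choice.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def remove_weekly_cycle(daily_counts):
    """Subtract each weekday's average count from every day's count."""
    by_weekday = defaultdict(list)
    for day, count in daily_counts.items():
        by_weekday[day.weekday()].append(count)
    weekday_mean = {wd: mean(counts) for wd, counts in by_weekday.items()}
    return {day: count - weekday_mean[day.weekday()]
            for day, count in daily_counts.items()}


def day_of_month_profile(residuals):
    """Average the weekday-adjusted residuals for each day of the month."""
    by_dom = defaultdict(list)
    for day, residual in residuals.items():
        by_dom[day.day].append(residual)
    return {dom: mean(values) for dom, values in sorted(by_dom.items())}
```

Any end-of-month bump that survives in `day_of_month_profile(remove_weekly_cycle(daily_counts))` is not just an artifact of which weekdays the last days happened to fall on.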
That was a very poor analysis. I am not a statistician either but I have experience in analysing large pools of data, and you would typically look at a number of other factors, such as day of the week, impact of weekends, impact of seasons, error range, standard deviation in order to get a hint of what is out of the average or not. Just plotting data and comparing bars does not qualify, even remotely, as an analysis.
On the other hand when you apply any non mathematical (ie. not standard deviation, etc) segmentation on the dataset you are artificially fitting the data.
The correct approach would be to search for patterns over time in the dataset (frequency analysis) and see what turns up.
Don't arbitrarily segment the data into week-sized blocks because you 'think' there might be a data pattern in there.
I see this sort of 'inspiration based' analysis in web analytics all the time, and it's complete nonsense.
Look for patterns in the data, don't look at the world and try to fit the data to it. You'll end up with stupid and statistically invalid results.
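One simple stand-in for that kind of pattern search is autocorrelation: score every lag and see which ones stand out, without deciding in advance that 7 or 30 days matter. This is a sketch of one possible technique, not a full frequency analysis, and `series` is assumed to be a list of daily ticket counts in date order with no gaps.

```python
from statistics import mean


def autocorrelation(series, lag):
    """Correlation of the series with itself shifted by `lag` days."""
    if lag <= 0 or lag >= len(series):
        return 0.0
    mu = mean(series)
    centered = [value - mu for value in series]
    denominator = sum(c * c for c in centered)
    if denominator == 0:
        return 0.0  # a constant series has no pattern to find
    numerator = sum(centered[i] * centered[i + lag]
                    for i in range(len(series) - lag))
    return numerator / denominator


def strongest_lags(series, max_lag, top=3):
    """Return the `top` lags with the highest autocorrelation."""
    scored = [(autocorrelation(series, lag), lag)
              for lag in range(1, max_lag + 1)]
    return [lag for _, lag in sorted(scored, reverse=True)[:top]]
```

If lags near 28 to 31 days show up alongside the obvious 7-day lag, that would be evidence of a monthly rhythm found by the data rather than imposed on it.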
I see your point but I do not fully agree with your view. There are so many times where we see patterns, correlations emerging from the data, which make no sense whatsoever. Statistical correlations can happen by chance (by definition) and something sticking out does not mean that something is happening. You would need to repeat the analysis on different pools of data to prove that there is indeed something you are missing.
By looking at established segments, at least you know where to look for things that MAY make sense, and then derive hypotheses.
If not, you end with bullcrap analysis we see every single day in literature where researchers find a "link between vegetarians and personality disorder" or things like that, without even probing the fact such links may be coincidental at best.
The problem is that mathematically when you partition your data into sub-set and analyse those subsets you are no longer analysing the original data set.
Your data is now composed of the original set of data [n1, n2, n3... nn] + a NEW DATA SET [s1, s2, s3 ... sn] which is your selection criteria.
This is often masked by the fact that your selection data set is a function f(x) ("we'll just split them up by week and partition by gender"), so it doesn't look like you're actually combining two datasets, but you are.
This new set of data S, biases the result of the analysis, and the more complex it is, the closer you are to over-fitting to find the 'golden correlation' you're looking for in your data.
Humans are pattern matching machines. We see patterns in random noise, and hear voices in random tones. It's easy for us to see patterns (or what we think are patterns) in data sets, but it's much harder to statistically substantiate those patterns as distinct from randomness.
It's provable, I imagine, that any data pattern can be found in a data set if we have sufficiently complex selection criteria for the samples, and sufficiently random raw data.
Don't do this.
If you do, at the very least acknowledge that you have selectively modified the original dataset and specifically test your partitioning rules.
It's much, much better to search the dataset for a target pattern, and then investigate the partitioning rules that your search matches against (because you can examine the partitioning rules for complexity and data content, and easily spot over-fitting).
What I meant when I said, "I'm not a statistician" is "I have no clue what I'm doing, but I want to see what the data says." I went in with no experience, so I just went with what made sense to me. Several people made some good suggestions though, so I did do a little self-study and made an update that uses standard deviation and Z-scores, and adjusts for days of the week (see "Updates"): http://robert.io/posts/4.html
Has there been any research on average miles driven vs. the day of the month? I would speculate that people who live paycheck to paycheck might drive more at the beginning of the month (after payday when they can afford to fill up on gas and run errands like buying groceries) than at the end of the month. I'm not sure if this would have a significant impact on the data.
You also need to consider that most wage earners have more discretionary money available at the end of the month. Some of this money will be spent on alcohol derived entertainment, and a consequent increase in the number of DUIs. I wonder if different types of violations are more common by time of month.
I think the explanation that monthly tickets, arrests and convictions are easy numbers for managers to rate the 'productivity' of police is enough to explain such behavior. If you're at all career-minded as a police officer, or you're getting shit from your manager, you'll probably initiate such behavior.
I'd like to see some error bars or confidence intervals on these plots. If you just compare the data from two different years there seems to be a lot of variation. Some error bars would help you decide if the variation between days of the month is meaningful, or if it can be explained by noise.
"It’s also worth noting that a quota system wouldn’t really explain the drop on the 31st...".
Maybe on the 28th or so they panic and start issuing tickets. And by the 31st they've made their quota and relax. They don't want to wait to the last day and risk it.
Or maybe these results mean nothing (more likely)!
While it's interesting to read about the individual cases of unofficial quotas, it'd be much more interesting to see a larger study done on multiple police forces over a significant amount of time to see how widespread the practice is. I only say this as this thread is full of anecdotal evidence, and the skeptics question's responses seem to be much the same.
Well it's not necessarily anecdotal evidence. It's anecdotal in that it proves that only some forces in some countries have a quota system, but that is enough to answer the question: Some definitely do. It certainly doesn't answer the question for all known police forces in all countries, but I don't think that's a reasonable expectation.
An end of month bump doesn't really surprise me... for an ex-cop's viewpoint into how police officers operate, Dale Carson's book (Arrest-Proof Yourself) is an illustrative read.
Also, the data could drop low on the 31st because not all months have 31 days so when tickets are distributed evenly over months, that day does not easily match up with 28 (being the definite last day of a month)...then 29, 30 and finally 31 being the least used.
That was actually compensated for in the "Correcting for more frequent dates" section. The first graph of that section shows what you would expect the distribution to look like, since not all months have the same number of days.
The following graphs are based on variation from that expectation.
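For anyone wanting to reproduce that expectation, a sketch of one way to compute it is below. It assumes tickets would be spread evenly over calendar days if nothing interesting were going on, which is my reading of the section and not the post's actual code.

```python
import calendar
from collections import Counter


def expected_share_by_day_of_month(years):
    """Fraction of all calendar days that fall on each day-of-month number."""
    occurrences = Counter()
    for year in years:
        for month in range(1, 13):
            days_in_month = calendar.monthrange(year, month)[1]
            for dom in range(1, days_in_month + 1):
                occurrences[dom] += 1
    total_days = sum(occurrences.values())
    return {dom: n / total_days for dom, n in sorted(occurrences.items())}
```

Multiplying each share by the year's ticket total gives the "expected" bar for that day number, and the 31st comes out lowest simply because it occurs in only seven months.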
But the analysis did not account for the variable number of weekend/weekday days on each day of the month. I don't know how meaningful this is, but it would be worth it to inspect this separately and then weigh it in if needed.