justanotherpizza

This is excellent work. Data-driven, evidence-based research from undisputed sources. Quality! I tip my hat to you, sir.

connornm777

Good stuff. I've been busy with studies lately but have wanted to do something more technical like this. I'm stealing your data and code, and I'll try to independently verify your results. I'll pm you if I get around to it soon.

Do you happen to have a write-up of how you calculated that confidence interval? Did you just calculate a mean and standard deviation assuming a normal distribution from pre-2006, and use that to claim post-2006 is a 3-4 sigma event?

Pizza_agent

Hey,

No, the results would be messed up if you did it that way. The calculation was more complex. There are 2 ways of doing it.

1) The goal is to get the "change" for each state between pre-2006 and post-2006 and compare those changes to each other. As I said above, I used the "number of missing children per capita", which I called "density". That means you get the number of children from the database for the pre-2006 time range (22 years), then you get the number for post-2006. You do this for each state, of course; you can achieve it with 2 SQL queries. Now you have to calculate the "density", for which you need the number of inhabitants of each state. I imported that data from the cited source; the data was from 2015.
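The two queries could look roughly like the sketch below. The table name `missing_children`, its columns, and the rows are all made up for illustration, since the real database schema is not shown in this thread; the year ranges 1984-2005 and 2006-2013 are the 22-year and 8-year frames used here.

```python
import sqlite3


def counts_by_state(conn, first_year, last_year):
    """Number of missing-children records per state within a year range."""
    rows = conn.execute(
        "SELECT state, COUNT(*) FROM missing_children "
        "WHERE year_missing BETWEEN ? AND ? GROUP BY state",
        (first_year, last_year),
    )
    return dict(rows.fetchall())


# Hypothetical in-memory table with made-up rows, only to show the queries.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE missing_children (state TEXT, year_missing INTEGER)")
conn.executemany(
    "INSERT INTO missing_children VALUES (?, ?)",
    [("AA", 1990), ("AA", 2007), ("AA", 2010),
     ("BB", 1985), ("BB", 2001), ("BB", 2012)],
)

pre_counts = counts_by_state(conn, 1984, 2005)   # pre-2006 frame, 22 years
post_counts = counts_by_state(conn, 2006, 2013)  # post-2006 frame, 8 years
```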

And now you might see the first problem: the population grew between pre-2006 and post-2006, so you can't use just one value for both time frames, or the "density" would be wrong. The difference between the time ranges is about 16.8%, so you have to correct the data by this value when you calculate the density.

The second problem is that the time frames are not of equal length. The pre-2006 time frame is 2.75 times longer than post-2006, and obviously its numbers would be higher by this factor, so you have to take that into account and correct the data as well.

And there is a third problem: the post-2006 time frame contains more "yet to be found" children than the other time frame, because less time has passed. The long-term numbers for each year suggest that around 30% of the children missing in post-2006 will still be found in the coming years; I discussed this topic in my paper. That's why you multiply the post-2006 density by 0.7. You can play with this number; even 0.5 would still be a >3 sigma result at the end...

Now you have comparable densities for both time frames. You just subtract them, then compare the states to each other: calculate the mean value, the standard deviation, and the Virginia offset from the mean value. By the way, a possible countrywide increase in "missing children" would not affect your calculations, since it would take place in every state.
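As a rough sketch of the steps in 1), assuming the three corrections are applied as plain scaling factors: the state codes and counts below are made up, and two details are my own assumptions rather than anything stated in this thread, namely that the 2015 population stands in for the post-2006 frame, and that the state being tested is included in the mean and standard deviation.

```python
from statistics import mean, stdev

# Correction factors quoted in the post above.
POPULATION_GROWTH = 1.168  # population about 16.8% larger post-2006
LENGTH_RATIO = 2.75        # pre-2006 frame is 2.75 times longer
STILL_MISSING = 0.7        # about 30% of post-2006 cases expected to be found


def density_change(pre_count, post_count, population_2015):
    """Corrected post-2006 density minus corrected pre-2006 density."""
    # Assumption: the 2015 population stands in for the post-2006 frame,
    # and the pre-2006 population is scaled down by the growth factor.
    pre_population = population_2015 / POPULATION_GROWTH
    pre_density = pre_count / LENGTH_RATIO / pre_population
    post_density = post_count * STILL_MISSING / population_2015
    return post_density - pre_density


def sigma_offset(values, target):
    """Distance of values[target] from the mean, in standard deviations."""
    # Assumption: the target state is included in the mean and in the
    # sample standard deviation.
    data = list(values.values())
    return (values[target] - mean(data)) / stdev(data)


# Made-up numbers for hypothetical states: (pre count, post count, population).
states = {
    "AA": (110, 40, 4_000_000),
    "BB": (90, 35, 3_500_000),
    "CC": (200, 70, 8_000_000),
    "DD": (60, 90, 5_000_000),
}
changes = {name: density_change(*row) for name, row in states.items()}
print(sigma_offset(changes, "DD"))
```

With only a handful of states the offset cannot get very large when the tested state is part of the sample, so the choice to include or exclude it matters and should be stated with any result.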

2) A different way to get the results is to compare not the "density changes" but just the density of each state. Here you don't need to correct the data as mentioned above. You just calculate the post-2006 density and compare the states; you don't need pre-2006 data. To improve the results, you should compare only the states in the eastern USA and not the whole USA, which is what I did. The reason is that there is a clear difference between the eastern and western USA. In general, the western USA has more missing children per capita; it's easy to see on a colored USA map like the one I used in my paper. I don't know the reason; it could be some ethnic or racial differences between the states. But it's pretty clear that it's best to compare states similar to Virginia, like New Jersey etc. I compared the whole eastern USA because I didn't want to be accused of "cherry picking". And if you look at the eastern USA, all the states but Virginia are pretty equal. I excluded the states below 2,000,000 inhabitants, since the time frame is too short and you get too much noise. As expected, you don't get the same result here as in 1), but you still get a >3 sigma event.
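A minimal sketch of 2), assuming the post-2006 counts and populations are already loaded: the state codes, the numbers, and the `east` set are made up, because the exact list of eastern states is not given in this thread. Only the 2,000,000 population cutoff comes from the description above.

```python
from statistics import mean, stdev

MIN_POPULATION = 2_000_000  # smaller states were excluded as too noisy


def post_densities(rows, region):
    """Post-2006 missing children per capita for the larger states in region."""
    return {
        state: count / population
        for state, (count, population) in rows.items()
        if state in region and population >= MIN_POPULATION
    }


def sigma_offset(values, target):
    """Distance of values[target] from the mean, in standard deviations."""
    # Assumption: the target state is part of the mean and sample deviation.
    data = list(values.values())
    return (values[target] - mean(data)) / stdev(data)


# Made-up numbers for hypothetical states: (post-2006 count, population).
rows = {
    "AA": (40, 4_000_000),
    "BB": (35, 3_500_000),
    "CC": (70, 8_000_000),
    "DD": (90, 5_000_000),
    "EE": (12, 900_000),     # dropped: below the population cutoff
    "WW": (150, 6_000_000),  # dropped: not in the hypothetical region
}
east = {"AA", "BB", "CC", "DD", "EE"}  # placeholder for the eastern states

densities = post_densities(rows, east)
print(sigma_offset(densities, "DD"))
```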

As you can see, there are pros and cons to 1) and 2). In 1) the general number of missing children doesn't matter, which is why you don't really need an east/west comparison like in 2). Further, the distribution should be better, and you are able to spot time-related events. In 2) you don't need to correct the data, it's easier to calculate the results, and you are able to see non-time-related events. Overall, in this case I somewhat prefer 1) over 2).

zoupSER

A little more in-depth; however, we have known about the higher rate in Virginia. And it is suspect.

Thanks for looking into this further.

Pizza_agent

Yes, I know, the higher Virginia numbers are well known. But the problem was that these Virginia numbers are misleading, since they contain all currently missing children, and 95% of missing children are found within 3 years. Some guy even called the Virginia police department and asked them about the high numbers. They claimed it was just because Virginia reports much faster on every case and about 90-95% of them are found; that's the reason for the high numbers.

If you take that into account and remove the children who went missing in the last 3 years, Virginia is only no. 13 countrywide in missing children per capita for the last 30 years. So the police claim seems to be legit, but in fact it's bs. If you look closer, there is an increase since 2006. For instance, in Fairfax County, between 1984 and 2005 (22 years) there is only one child missing. But between 2006 and 2013 (8 years) there are 12 children missing.