3 Apr 2019

Exploratory Analysis on the Project

I tried several histograms, scatter plots, bar charts and box plots to find a visually meaningful pattern in the data. However, looking only at the count of stops per race, gender and age does not reveal anything. So I decided to dig deeper into the columns below:

 


I calculated 2 scores:

  • DIS: Driver Impact Score
  • PRS: Police Response Score
I read several articles* about crimes, punishments and police rights/authorizations in both states to come up with weighted scores for what the driver did and how the police responded. Then, I analyzed these scores per race, age and gender.
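To make the idea of a weighted score concrete, here is a minimal sketch of how DIS and PRS could be computed for a single stop. The post does not publish the actual weights or column names, so the event labels, the weight values and the `violations` / `police_actions` fields below are all hypothetical placeholders.

```python
# Hypothetical weights: the post does not publish the actual weights derived
# from the referenced articles, so these labels and values are placeholders.
DIS_WEIGHTS = {"speeding": 1.0, "equipment": 0.5, "dui": 3.0}
PRS_WEIGHTS = {"warning": 0.5, "citation": 1.0, "search": 2.0, "arrest": 3.0}


def weighted_score(events, weights):
    """Sum the weights of the events recorded for one stop.

    Events that have no weight (unknown labels) contribute 0.
    """
    return sum(weights.get(event, 0.0) for event in events)


def score_stop(stop):
    """Return (DIS, PRS) for one stop given as a dict (hypothetical schema)."""
    dis = weighted_score(stop["violations"], DIS_WEIGHTS)
    prs = weighted_score(stop["police_actions"], PRS_WEIGHTS)
    return dis, prs


example_stop = {"violations": ["speeding"], "police_actions": ["citation", "search"]}
print(score_stop(example_stop))  # (1.0, 3.0)
```

Keeping the weights in plain dictionaries makes it easy to re-weight and re-run the whole analysis if a different reading of the articles suggests other values.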


I plotted a scatter diagram of Driver Impact Score vs Police Response Score for both cities. Then, I tried to fit a simple linear regression to each. Even though I observe some differences in the fitted line parameters, the plots did not give a clear visual insight because the data are imbalanced in terms of race and scores, as can be seen below:

Los Angeles, CA

Columbus, OH
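The simple linear regression mentioned above can be sketched as an ordinary least-squares fit of PRS on DIS, run once per city so the slopes and intercepts can be compared. The toy score lists are illustrative only and are not taken from either dataset.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy scores for illustration only; fit each city separately and compare.
toy_dis = [0, 1, 2, 3]
toy_prs = [1, 3, 5, 7]
print(fit_line(toy_dis, toy_prs))  # (2.0, 1.0)
```

On Python 3.10+, `statistics.linear_regression(xs, ys)` returns the same slope and intercept; the hand-written version just makes the formula visible. With imbalanced data, a handful of dense clusters can dominate `sxy`, which is one reason the fitted lines alone were not very telling.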


Hence, I decided to dig into the correlations, AND BINGO! I think I found some correlations between police behavior and race in Columbus!

Here is the evidence:

Clue 1:

Unlike the racial correlation matrix for Los Angeles, the one for Columbus shows a positive correlation between Hispanic drivers and the Driver Impact Score. However, I am much more interested in the negative correlation between Black drivers and the Police Response Score AND the positive correlation between White drivers and the Police Response Score.
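One way to get a race-vs-score correlation like the ones in this matrix is to turn each race into a 0/1 indicator column and compute its Pearson correlation with the score (the point-biserial correlation). This is a sketch under that assumption; the post does not say exactly how its matrix was built, and the toy data are illustrative only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def race_score_correlations(races, scores):
    """Correlate a 0/1 indicator for each race with a score (point-biserial)."""
    return {
        race: pearson([1 if r == race else 0 for r in races], scores)
        for race in sorted(set(races))
    }


# Toy data for illustration only.
toy_races = ["black", "white", "black", "white"]
toy_prs = [1.0, 3.0, 1.0, 3.0]
print(race_score_correlations(toy_races, toy_prs))  # {'black': -1.0, 'white': 1.0}
```

A negative value for a race means stops of that group tend to have a lower score than the rest; it says nothing yet about whether the difference is statistically significant.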



Clue 2:

When we check the box plot of Police Response Score per race, Black drivers stand out from the other races, as can be seen below.



My aim is to analyze further whether this difference is statistically significant or not.
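One possible way to check significance is a two-sided permutation test on the difference in mean Police Response Score between Black drivers and all other drivers. This is a swapped-in technique chosen for illustration, not the method the post commits to; the input lists (for example a hypothetical `black_prs` and `other_prs`) would come from the Columbus data.

```python
import math
import random


def mean(values):
    # math.fsum is exactly rounded, so the result does not depend on order.
    return math.fsum(values) / len(values)


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabelling under the null hypothesis
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction: the p-value is never reported as exactly 0.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value
```

Usage would look like `diff, p = permutation_test(black_prs, other_prs)`. A permutation test makes no normality assumption about the scores, which matters here because weighted scores built from a few discrete events are far from normal. With 150K+ rows, subsampling or a vectorized version may be needed for speed.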


Side project on d3


While analysing the histogram of the data, I noticed a clear decreasing trend in the stop count in Los Angeles. During the same period, the police budget decreased, too. The Los Angeles stop count trend is highly consistent with the US police services budget trend, while this is not the case for Columbus, as shown below.


Los Angeles City is one of the biggest contributors to US police spending, so it is directly affected by the national budget. I have not tested this statistically; the match between the two trends is visually clear. So, as the side project on d3, I will show stop counts and estimated budgets over time.
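A d3 time-series chart needs the stop counts and budget figures aligned by period. Here is a minimal sketch that counts stops per year and joins them with a yearly budget series, producing JSON records that a d3 page could load (for example with `d3.json`). The date format, the record field names and the budget numbers are assumptions for illustration; the real budget estimates are not reproduced here.

```python
import json
from collections import Counter


def yearly_series(stop_dates, budget_by_year):
    """Join yearly stop counts with a yearly budget series.

    stop_dates: date strings starting with the year, e.g. "2014-03-01".
    budget_by_year: {year: budget}; years missing from it get None (null in JSON).
    """
    counts = Counter(int(date[:4]) for date in stop_dates)
    return [
        {"year": year, "stops": counts[year], "budget": budget_by_year.get(year)}
        for year in sorted(counts)
    ]


# Toy inputs for illustration only; real budget figures are not reproduced here.
toy_dates = ["2013-01-05", "2013-06-01", "2014-02-02"]
toy_budget = {2013: 100.0}
print(json.dumps(yearly_series(toy_dates, toy_budget)))
# [{"year": 2013, "stops": 2, "budget": 100.0}, {"year": 2014, "stops": 1, "budget": null}]
```

Emitting `null` for missing budget years (rather than 0) lets the chart show a gap instead of a misleading drop.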

*References


Project Outcome!


I prepared an infographic website for the project. 


Also, here is my d3 showreel, which shows the stop counts in the two cities and the US budget over time. Los Angeles City is one of the biggest contributors to US police spending, so it is directly affected by the national budget, while this is not the case for Columbus.



Project Definition



The final definition of my project:

I have 2 datasets:

1- Police stops in Los Angeles City, CA between 2013 and 2015 (350K+ rows)

2- Police stops in Columbus City, OH between 2013 and 2015 (150K+ rows)

I aim to make a comparative analysis between the two cities on the following:
  1. A significant correlation between driver/police behavior & race/gender.
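A first step toward the comparative analysis is to average a score per race in each city and take the difference for races present in both. The record layout (`race`, `prs` keys) and the toy values are hypothetical, chosen only to show the shape of the comparison.

```python
from collections import defaultdict


def mean_score_by_race(records, score_key):
    """Average one score (e.g. "prs") per driver race."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for record in records:
        totals[record["race"]] += record[score_key]
        counts[record["race"]] += 1
    return {race: totals[race] / counts[race] for race in sorted(totals)}


def compare_cities(city_a, city_b, score_key):
    """Difference in mean score (city_a minus city_b) for races present in both."""
    means_a = mean_score_by_race(city_a, score_key)
    means_b = mean_score_by_race(city_b, score_key)
    return {race: means_a[race] - means_b[race] for race in means_a if race in means_b}


# Toy records for illustration only.
los_angeles = [{"race": "black", "prs": 2.0}, {"race": "white", "prs": 2.0}]
columbus = [{"race": "black", "prs": 1.0}, {"race": "white", "prs": 3.0},
            {"race": "hispanic", "prs": 2.0}]
print(compare_cities(columbus, los_angeles, "prs"))  # {'black': -1.0, 'white': 1.0}
```

Races that appear in only one city are skipped rather than compared against zero, since the two datasets do not necessarily use the same race categories.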


Early Attempts

I had 2 datasets:

1- Police stops in Los Angeles City between 2013 and 2015 (350K+ rows)

The data contain the driver's race, age and gender, but no information about the police officer.
They also record the violation, stop outcome, search outcome, whether contraband was found and whether the driver was arrested.

2- Weather dataset of Los Angeles City between 2013 and 2015

The data contain temperature, humidity, precipitation, wind speed and wind direction.


At the beginning, I aimed to use these datasets to find:
  1. A significant correlation between driver/police behavior & race/gender.
  2. A significant correlation between driver/police behavior & weather conditions.
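Relating stops to weather requires joining the two datasets on date. Here is a minimal sketch that counts stops per day and pairs each count with one daily weather field, keeping only days present in both sources. The "YYYY-MM-DD" date format and the dictionary layout of the weather data are assumptions, not the actual dataset schema.

```python
from collections import Counter


def daily_stops_vs_weather(stop_dates, weather_by_date, field="temperature"):
    """Inner-join per-day stop counts with one daily weather field.

    stop_dates: one "YYYY-MM-DD" string per stop (hypothetical format).
    weather_by_date: {"YYYY-MM-DD": {"temperature": ..., ...}}.
    Days missing from either side are dropped.
    """
    counts = Counter(stop_dates)
    days = sorted(day for day in counts if day in weather_by_date)
    return [(day, counts[day], weather_by_date[day][field]) for day in days]


# Toy inputs for illustration only.
toy_stops = ["2013-01-01", "2013-01-01", "2013-01-02", "2013-01-03"]
toy_weather = {"2013-01-01": {"temperature": 15.0}, "2013-01-02": {"temperature": 18.5}}
print(daily_stops_vs_weather(toy_stops, toy_weather))
# [('2013-01-01', 2, 15.0), ('2013-01-02', 1, 18.5)]
```

The resulting (day, count, value) triples can be fed directly into a scatter plot or a correlation calculation.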
Here are my exploratory work and findings:

  1. Visually, I could not find any correlation between driver/police behavior & weather conditions.
  2. Even though there is a correlation between driver/police behavior and age, it is not enough: my purpose is to capture racism or sexism!
  3. And in Los Angeles, I can say that there is no significant evidence that race has an impact on driver and/or police behavior.
You can find the correlation maps of all my data. I also circled the uncorrelated pairs where I had expected some correlation to dig into!