The following map displays the spatial distribution of dog licenses and dog bite incidents across various zip codes in New York City from 2015 to 2022. The color fill of the map indicates the number of dog licenses in each zip code, while the dots represent the total number of dog bite incidents in each area.
The map suggests that dog bite incidents are spatially clustered rather than evenly spread, which is consistent with positive spatial autocorrelation. Certain zip codes, particularly in Midtown Manhattan, stand out as high-frequency areas for such incidents.
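The clustering impression can be checked numerically with Moran's I, which is positive when neighboring areas have similar values and negative when they alternate. The sketch below uses base R only, with made-up counts and a hypothetical adjacency matrix for four areas arranged in a row. It is not the analysis behind the map; for the real zip codes, the neighbor weights would have to be derived from the zip code polygons.

```r
# Moran's I for a numeric vector x and a spatial weights matrix w,
# where w[i, j] > 0 means areas i and j are neighbors.
morans_i = function(x, w) {
  stopifnot(is.matrix(w), nrow(w) == length(x), ncol(w) == length(x))
  z = x - mean(x)                      # deviations from the mean
  n = length(x)
  (n / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Hypothetical adjacency: four areas in a row (1-2, 2-3, 3-4 are neighbors)
w = matrix(0, nrow = 4, ncol = 4)
w[cbind(c(1, 2, 3), c(2, 3, 4))] = 1
w = w + t(w)                           # make the matrix symmetric

# Made-up bite counts, for illustration only
clustered   = c(10, 12, 30, 33)        # similar values sit next to each other
alternating = c(10, 30, 10, 30)        # neighbors always differ

morans_i(clustered, w)                 # positive: spatial clustering
morans_i(alternating, w)               # -1: perfect alternation
```

A value near zero would indicate no spatial pattern; judging whether an observed value is significant would additionally need a permutation or analytic test.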
Additionally, the dog licensing data indicates that areas with a higher number of dog licenses, such as zip codes 10025, 10024, and 10314, typically have higher numbers of dog bite incidents. Therefore, there may be a positive correlation between the number of dog licenses and dog bite incidents.
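A rough way to quantify such a relationship is a correlation coefficient computed over zip codes. The following is a minimal sketch, assuming a data frame with one row per zip code and columns named `License_Count` and `Bite_Count`; the helper `correlate_counts` and the toy values are hypothetical and are not taken from the NYC data.

```r
# Pearson (linear) and Spearman (rank-based) correlation between
# license counts and bite counts, one row per zip code.
correlate_counts = function(df) {
  pearson  = cor.test(df$License_Count, df$Bite_Count, method = "pearson")
  spearman = cor.test(df$License_Count, df$Bite_Count,
                      method = "spearman", exact = FALSE)
  c(pearson = unname(pearson$estimate), spearman = unname(spearman$estimate))
}

# Made-up example data, for illustration only
toy = data.frame(
  zip_code      = c("A", "B", "C", "D", "E", "F"),
  License_Count = c(200, 450, 900, 1500, 2800, 4100),
  Bite_Count    = c(12, 20, 31, 38, 70, 66)
)

result = correlate_counts(toy)
print(result)
```

Spearman's coefficient is less sensitive to a few zip codes with very large counts, so reporting both gives a more robust picture than Pearson's alone.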
To further investigate the relationship between dog license counts and dog bite incidents, we created a scatter plot to visualize their correlation. Each zip code was treated as a single observation, with the number of dog licenses and dog bite incidents from 2014 to 2022 compared for each zip code, as shown in the figure below.
library(dplyr)
library(ggplot2)
library(plotly)

# Zip codes that belong to New York City
valid_zipcodes = pull(ny_zip_codes, zip_code)

# Count bite incidents per zip code
bites_by_zip = dog_bites_clean |>
  group_by(zip_code) |>
  summarise(Bite_Count = n())

# Count licenses per zip code
licenses_by_zip = dog_licensing_df |>
  group_by(zip_code) |>
  summarise(License_Count = n())

# Join the two counts, then keep only valid zip codes present in both data sets
zip_data = merge(bites_by_zip, licenses_by_zip, by = "zip_code", all = TRUE) |>
  mutate(zip_code = as.factor(zip_code)) |>
  filter(zip_code %in% valid_zipcodes) |>
  filter(!is.na(License_Count), !is.na(Bite_Count))

# Scatter plot with a fitted linear regression line, made interactive with plotly
scatter_plot_interactive = ggplotly(
  ggplot(data = zip_data, aes(x = License_Count, y = Bite_Count)) +
    geom_point(color = "blue", alpha = 0.6) +
    geom_smooth(method = "lm", formula = y ~ x, color = "red") +
    labs(title = "License vs. Bite", x = "License Count", y = "Bite Count") +
    theme_minimal()
)

scatter_plot_interactive
Overall, the positive slope of the regression line indicates a positive association between the number of licenses and the number of bite incidents across zip codes. At low license counts the points cluster tightly around the line, so bite counts vary little there and the linear fit is comparatively good. At high license counts the points are more dispersed, meaning that bite counts are harder to predict from license counts alone. Lastly, the confidence band, shown in gray, gradually widens once the license count exceeds roughly 3000, reflecting greater uncertainty in the fitted line where fewer zip codes are available to constrain it.
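The widening of the confidence band can be reproduced outside the plot. The sketch below fits the same kind of linear model to simulated data (not the NYC data) and compares the width of the 95% confidence interval for the fitted line at three license counts. The variable names mirror the columns used above, and the simulation parameters are arbitrary assumptions chosen so that most observations have low license counts.

```r
set.seed(1)  # simulated data, for illustration only

# Most simulated zip codes have few licenses; a handful have many
License_Count = round(rexp(150, rate = 1 / 800))
Bite_Count    = rpois(150, lambda = 5 + 0.03 * License_Count)
sim_data      = data.frame(License_Count, Bite_Count)

# Same model that geom_smooth(method = "lm") fits
fit = lm(Bite_Count ~ License_Count, data = sim_data)

# Confidence interval for the fitted mean at low, high, and very high counts
new_x = data.frame(License_Count = c(500, 3000, 6000))
ci    = predict(fit, newdata = new_x, interval = "confidence", level = 0.95)

ci_width = unname(ci[, "upr"] - ci[, "lwr"])
print(data.frame(new_x, ci_width))
```

For a simple linear regression the interval width grows with the distance between the queried license count and the mean license count in the data, so the band is narrowest where most zip codes lie and widest at the sparse upper end.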