This section investigated dog bite incidents temporal trends and the correlation between dog bite incidents and dogs’ demographics. By looking more closely at the relation between variables and dog bite incidents, we hope to capture potential trends in the dog bites dataset.
dog_bites_trend_plot = dog_bites_clean |>
mutate(
year = as.factor(year),
month = month(date_of_bite),
day_of_month = day(date_of_bite),
common_date = mdy(paste(month, day_of_month, "2020"))
) |>
group_by(common_date, year) |>
summarise(bites_count = n(), .groups = "drop") |>
ggplot(aes(x = common_date, y = bites_count,
group = year, color = year,
text = paste("Count:", bites_count, "<br>Year:", year))) +
geom_point(alpha = .3) +
geom_smooth(se = FALSE, span = 0.5) +
labs(
title = "Dog Bite Trends by Month",
x = "Month",
y = "Number of Bites"
) +
theme_minimal() +
theme(
text = element_text(size = 15),
axis.text.x = element_text(angle = 60, hjust = 1, size = 10)
) +
scale_color_viridis_d(end = .8) +
scale_x_date(date_breaks = "1 month", labels = function(x) format(x, "%b"))
plotly_trend <- ggplotly(dog_bites_trend_plot, tooltip = "text")
plotly_trend
The scatter plot above illustrated the number of dog bite incidents across months for years 2015 to 2021. For most years, dog bite incidence peaked in the summer (June, July, and August), possibly due to the increase of outdoor activities in the summer when the weather permits and when students are on break.
However, for the year 2020, dog bite incidence dipped in May, and did not rise up to the similar level as previous years, even for months June, July, and August. This unusual trend could be attributed to the prevalence of COVID-19 in the US in 2020. During that period, public health measures like social distancing and quarantine were in place, which significantly reduced people’s outdoor activity.
Note that for the year 2016, the fitted line plateaued during the summer instead of showing an obvious peak. This was likely due to extreme heat in the summer of 2016 that marked the 7th highest daily temperature at all time in New York City. Under this temperature, it was reasonable that people reduced outdoor activity and therefore suffered from less dog bite injuries.
daily_bites <- dog_bites_clean |>
group_by(year, date_of_bite) |>
summarise(bite_count = n(), .groups = 'drop')
hole_trend = ggplot(daily_bites, aes(x = date_of_bite, y = bite_count)) +
geom_point(aes(color = as.factor(year(date_of_bite))), alpha = 0.5, size = 1) +
geom_smooth(se = FALSE, method = "loess", span = 1, linetype = "solid", color = "pink") +
labs(
title = "Daily Dog Bite Trends (Overall Smooth Line)",
x = "Year",
y = "Number of Bites",
color = "Year"
) +
scale_x_date(date_breaks = "1 year", labels = scales::date_format("%Y")) +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5),
axis.text.x = element_text(angle = 45, hjust = 1)
)+
scale_color_viridis_d(end = .8)
ggplotly(hole_trend)
To look at a bigger picture, we visualized the number of daily dog bite incidents across the years 2015 to 2021. The scatterplot showed a consistent trend as described above where the number of dog bite incidents fluctuates from winter to summer. The fitted smooth line depicted the general temporal trend. From the start of 2015 to the ends of 2018, the daily dog bite incidence stayed around 9 cases, then slowly declined to 7 cases from the start of 2019 to the end of 2021. This decline could also be attributed to COVID outbreak from 2019 to 2021, where people spent less time outdoor.
bites_by_year_month = dog_bites_clean |>
count(year, month, name = "Bite_Count") |>
mutate(hover_text = paste("Year:", year, "<br>Month:", month, "<br>Count:", Bite_Count))
heatmap_interactive = plot_ly(
data = bites_by_year_month,
x = ~month,
y = ~year,
z = ~Bite_Count,
type = "heatmap",
colors = c("white", "red"),
text = ~hover_text,
hoverinfo = "text"
) |>
layout(
title = "Dog Bites by Month & Year",
xaxis = list(title = "Month"),
yaxis = list(title = "Year"),
colorbar = list(title = "Count for Dog Bites")
)
heatmap_interactive
The interactive heatmap also showed similar trend. The darker the shade of the block, the more dog bite incidents were recorded in the respective month and year. In most years, incidents concentrated during the summer from June to August. Like what we inferred from the previous visualizations, all blocks in 2020 observed a very light shade because of the pandemic. The heatmap showed a more drastic comparison between 2020 and the other years, especially April 2020 where the shade was almost white.
top_breeds = dog_bites_clean |>
group_by(breed) |>
summarise(count = n()) |>
arrange(desc(count)) |>
filter(!is.na(breed) & breed != "unknown") |>
slice_head(n = 10)
dog_bites_top10 = dog_bites_clean |>
filter(breed %in% top_breeds$breed) |>
mutate(
breed = factor(breed),
gender = factor(gender),
spay_neuter = factor(spay_neuter),
borough = factor(borough)
)
label(dog_bites_top10$breed) = "Top 10 Breeds"
label(dog_bites_top10$gender) = "Gender"
label(dog_bites_top10$spay_neuter) = "Neuter Status"
label(dog_bites_top10$borough) = "Borough"
table1 = table1(~ breed + gender + spay_neuter + borough,
data = dog_bites_top10,
overall = "Total")
table1
Total (N=16302) |
|
---|---|
Top 10 Breeds | |
Bull | 4760 (29.2%) |
Chihuahua | 648 (4.0%) |
Maltese | 376 (2.3%) |
Mixed | 5167 (31.7%) |
Poodle | 476 (2.9%) |
Shepherd | 825 (5.1%) |
Shih Tzu | 735 (4.5%) |
Terrier | 402 (2.5%) |
Unknown | 2425 (14.9%) |
Yorkshire | 488 (3.0%) |
Gender | |
F | 2637 (16.2%) |
M | 6552 (40.2%) |
U | 7113 (43.6%) |
Neuter Status | |
FALSE | 12013 (73.7%) |
TRUE | 4289 (26.3%) |
Borough | |
Bronx | 2780 (17.1%) |
Brooklyn | 3679 (22.6%) |
Manhattan | 3451 (21.2%) |
Other | 635 (3.9%) |
Queens | 4372 (26.8%) |
Staten Island | 1385 (8.5%) |
The table above displayed how dog bite incidence was distributed across different levels and groups in dog demographic features like breed, gender, neuter status, and the borough where the incident occurred (likely the borough they reside). The top demographic features that contributed the most dog bite incidents are being mixed, male, un-neutered, and active in Queens.