Shiyu Zhang, sz3319, Weiqi Liang, wl3011, Zhenkun Fang, zf2352, Zeqi Li, zl3545
While dogs are a common part of many households, the increasing number of dog bite incidents has become a significant public health concern. In 2022, dog bites ranked among the top 15 causes of nonfatal emergency department visits across all age groups. This project investigates the factors contributing to dog bite incidents, focusing on dog-specific traits such as breed, age, gender, and spay/neuter status, along with their geographical distribution and trends over time. In this project, we aim to identify hotspots and examine the relationship between dog characteristics and bite frequency. The findings will help inform data-driven strategies to prevent dog bites and enhance public safety.
We began by exploring trends in NYC dog bite incidents over time, along with their associations with dog demographics and spay/neuter status. We were interested in the following:
Which dog breeds are most frequently associated with dog bite incidents?
Has the incidence rate of dog bites shown a decline over the years?
Is the incidence rate of dog bites influenced by the dog’s gender and neuter status?
As our project evolved, we delved into more in-depth questions:
Is the incidence rate of dog bites associated with the number of registered dog licenses?
Can we utilize a regression model to investigate the spatial correlations between dog bite incidents and dog licenses?
Which model is most suitable for analyzing the cumulative impact of various factors on dog bite incidents?
The dataset used to analyze and visualize dog bite cases in New York City includes information on factors such as sex, breed, and age. The data, covering the period from 2015 to 2022, were sourced from NYC OpenData. They are collected from reports received online, by mail, by fax, or by phone through 311 or the NYC DOHMH Animal Bite Unit. Each record represents a single dog bite incident.
Another dataset used in this project documents active dog licenses in New York City for a given year. This data is derived from the DOHMH Dog Licensing System, which facilitates the application and renewal of dog licenses. The dataset is also sourced from NYC OpenData.
The data was cleaned using RStudio/RMarkdown to meet the following criteria:
Every observation required a unique identifier (unique_id).
Every observation required non-missing data for breed, age, gender, borough, zip_code, spay_neuter, and date_of_bite.
All continuous variables were numeric and stored as such.
Dates must be saved in date format.
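The cleaning criteria above can be sketched in a few lines of Python. This is an illustration only: the original cleaning was done in RStudio/RMarkdown, and the function name `clean_records`, the ISO date format, and the sample rows are all assumptions made for the example.

```python
from datetime import datetime

# Field names taken from the cleaning criteria listed above.
REQUIRED_FIELDS = ["unique_id", "breed", "age", "gender", "borough",
                   "zip_code", "spay_neuter", "date_of_bite"]


def is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def clean_records(records, date_format="%Y-%m-%d"):
    """Keep complete, uniquely identified records and parse their dates.

    The date format is an assumption; the raw data may use another one.
    """
    seen_ids = set()
    cleaned = []
    for record in records:
        # Drop records with any missing required field.
        if any(is_missing(record.get(field)) for field in REQUIRED_FIELDS):
            continue
        # Drop records whose identifier was already seen.
        if record["unique_id"] in seen_ids:
            continue
        # Drop records whose date cannot be parsed; store dates as date objects.
        try:
            parsed = datetime.strptime(record["date_of_bite"], date_format).date()
        except (ValueError, TypeError):
            continue
        seen_ids.add(record["unique_id"])
        cleaned.append({**record, "date_of_bite": parsed})
    return cleaned


# Hypothetical rows: one valid, one with a blank zip code, one duplicate id.
base = {"unique_id": "1", "breed": "Mixed", "age": "3", "gender": "M",
        "borough": "Queens", "zip_code": "11355", "spay_neuter": "false",
        "date_of_bite": "2018-06-01"}
rows = [base, {**base, "unique_id": "2", "zip_code": ""}, dict(base)]
cleaned = clean_records(rows)
print(len(cleaned), cleaned[0]["date_of_bite"])
```

Only the first row survives: the second has a blank zip code and the third repeats an identifier.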
The dog bite dataset we utilized includes the following variables:
UniqueID: Unique dog bite case identifier.
DateOfBite: Date bitten.
Species: Animal type (Dog).
Breed: Breed type.
Age: Dog’s age at time of bite. Numbers with ‘M’ indicate months.
Gender: Sex of dog. M=Male, F=Female, U=Unknown.
SpayNeuter: Surgical removal of dog’s reproductive organs. True (reported to DOHMH as Spayed or Neutered), False (Unknown or Not Spayed or Neutered).
Borough: Dog bite borough. ‘Other’ indicates that the bite took place outside New York City.
ZipCode: Dog bite zip code. Blank ZipCode indicates that information was not available.
The NYC dog licensing dataset we utilized includes the following variables:
AnimalName: User-provided dog name (unless specified otherwise).
AnimalGender: M (Male) or F (Female) dog gender.
AnimalBirthYear: Year dog was born.
BreedName: Dog breed.
ZipCode: Owner zip code.
LicenseIssuedDate: Date the dog license was issued.
LicenseExpiredDate: Date the dog license expires.
Extract Year: Year the data was extracted.
Our exploratory analysis is segmented into three areas of focus: dog bites by breed, gender and neuter status, and borough. We also conducted a more in-depth analysis focusing on trends in NYC dog bite incidents over time and the correlation between dog bite incidents and NYC dog licensing.
The pie chart above shows the 10 dog breeds that contributed the most dog bite incidents. Excluding the mixed and unknown categories, bulls contributed the most incidents (29.2%), followed by shepherds (5.06%) and shih tzus (4.51%).
Among dog bite records with known gender, incidents were more than twice as frequent for male dogs as for female dogs, for both neutered and un-neutered dogs. Neutering status made only a slight difference: neutered female dogs contributed slightly more incidents than un-neutered female dogs, while un-neutered male dogs contributed slightly more incidents than neutered male dogs. It is worth noting that a significant portion of the dog bite records were missing the dog’s gender, and that most dogs in this unknown-gender group appeared to be neutered.
Looking at license counts across boroughs, Manhattan has the most licensed dogs, followed by Brooklyn, Queens, the Bronx, and Staten Island. Looking at dog bite incidents across these boroughs, however, Queens reported the most incidents, followed by Manhattan, Brooklyn, the Bronx, and lastly Staten Island. The mismatch between these two sets of bars suggests that Queens may be a hotspot for dog bite incidents. Relative to license counts, however, the Bronx has the highest bite-to-license ratio (7.19%), ahead of Queens (5.54%), as the table below shows.
Borough | Bite Count | License Count | Bite to License Ratio (%) |
---|---|---|---|
Bronx | 3782 | 52627 | 7.19% |
Brooklyn | 4985 | 135596 | 3.68% |
Manhattan | 5270 | 182398 | 2.89% |
Queens | 5773 | 104288 | 5.54% |
Staten Island | 1872 | 44477 | 4.21% |
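The ratio column in the table above is simply bites divided by licenses, expressed as a percentage. A minimal Python sketch reproduces it from the table's counts; the function name `bite_to_license_ratio` is made up for illustration, and the original analysis was done in R.

```python
# (bite count, license count) per borough, copied from the table above.
borough_counts = {
    "Bronx": (3782, 52627),
    "Brooklyn": (4985, 135596),
    "Manhattan": (5270, 182398),
    "Queens": (5773, 104288),
    "Staten Island": (1872, 44477),
}


def bite_to_license_ratio(bites, licenses):
    """Return reported bites per 100 licensed dogs (a percentage)."""
    return 100.0 * bites / licenses


ratios = {name: bite_to_license_ratio(bites, licenses)
          for name, (bites, licenses) in borough_counts.items()}

# Rank boroughs from the highest ratio to the lowest.
ranked = sorted(ratios, key=ratios.get, reverse=True)
for name in ranked:
    print(f"{name}: {ratios[name]:.2f}%")
```

Ranking by ratio rather than by raw count puts the Bronx first and Manhattan last, which is why normalizing by license count matters when looking for hotspots.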
Across all years, there is a noticeable seasonal trend, with incidents peaking during the summer months (June to August) and declining in the colder months (November to January). While the general patterns remain consistent, some variability exists, with years like 2020 and 2021 showing slightly lower bite rates compared to earlier years, such as 2015-2017. This trend suggests a potential link between warmer weather, increased outdoor activities, and higher human-dog interactions, contributing to a rise in dog bite incidents during summer.
We created a scatter plot to visualize the correlation between dog license counts and dog bite incidents. Each zip code was treated as a single observation, with the number of dog licenses and dog bite incidents from 2014 to 2022 compared for each zip code, as shown in the figure below.
Overall, the positive slope of the regression line indicates a positive correlation between the number of licenses and the number of bite incidents. In zip codes with fewer licenses, the data points are denser, suggesting that bite counts vary less and the model may fit better there. Conversely, in zip codes with many licenses, the data points are more dispersed, indicating that bite counts are less predictable when license counts are high. Lastly, the confidence interval, depicted in gray, gradually widens once the license count exceeds 3000, indicating increased uncertainty in the model’s predictions for zip codes with more licenses.
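To make the idea of a fitted regression line concrete, here is a minimal least-squares sketch in Python. It is not the zip-code-level fit described above: since the zip-code data are not reproduced here, it uses the five borough totals from the table in the previous section purely as illustrative input, and the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, plus Pearson r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


# Borough-level totals (Bronx, Brooklyn, Manhattan, Queens, Staten Island).
# Illustrative input only; the report's scatter plot uses zip codes instead.
licenses = [52627, 135596, 182398, 104288, 44477]
bites = [3782, 4985, 5270, 5773, 1872]

slope, intercept, r = fit_line(licenses, bites)
print(f"slope={slope:.4f}, intercept={intercept:.1f}, r={r:.2f}")
```

A positive slope and a positive correlation coefficient correspond to the upward-sloping line described above. With only five points, this borough-level fit says nothing about the spread seen at the zip-code level.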
To assess whether the spatial distribution of dog license issuance and dog bite incidents in New York City exhibits spatial correlations and clustering characteristics, this study calculated both the global and local Moran’s I for these variables during the research period.
For dog bite incidents from 2015 to 2021, the observed global Moran’s I is 0.35, well above the expected value of -0.005 under no spatial autocorrelation, and the p-value is close to 0. The distribution of dog bite incidents in New York City therefore exhibits significant positive spatial autocorrelation.
Metric | Value |
---|---|
Observed Moran’s I | 0.350 |
Expected Moran’s I | -0.005 |
Standard Deviation | 0.064 |
P-value | 0.000 |
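For readers unfamiliar with the statistic, the following Python sketch computes global Moran’s I on a toy example. The binary adjacency weights and the four-region "chain" layout are assumptions chosen for clarity; the weighting scheme used in the analysis above is not stated, and the original computation was done in R.

```python
def global_morans_i(values, weights):
    """Global Moran's I for values observed at n regions.

    weights[i][j] is the spatial weight between regions i and j
    (here 1 for neighbours, 0 otherwise, with a zero diagonal).
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Weighted sum of cross-products of deviations over all region pairs.
    cross = sum(weights[i][j] * deviations[i] * deviations[j]
                for i in range(n) for j in range(n))
    squared = sum(d * d for d in deviations)
    return (n / total_weight) * (cross / squared)


def expected_morans_i(n):
    """Expected value of Moran's I under no spatial autocorrelation."""
    return -1.0 / (n - 1)


# Four regions in a row; each region neighbours the one next to it.
chain_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = global_morans_i([1, 2, 3, 4], chain_weights)    # similar neighbours
alternating = global_morans_i([1, 4, 1, 4], chain_weights)  # dissimilar neighbours
print(clustered, alternating, expected_morans_i(4))
```

Values that change smoothly across neighbours give a positive I, while values that alternate give a negative I. An observed value of 0.35 against an expected value near zero therefore points to clustering.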
Dog licenses show a similar pattern across New York City. The global Moran’s I of 0.29 suggests that similar values tend to cluster spatially, indicating that certain adjacent zip code areas may exhibit concentrations of dog ownership or areas where few dogs are kept.
Metric | Value |
---|---|
Observed Moran’s I | 0.293 |
Expected Moran’s I | -0.005 |
Standard Deviation | 0.064 |
P-value | 0.000 |
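As a rough check on the two tables above, the observed, expected, and standard deviation values can be combined into a standardized score. This assumes the reported p-values come from a normal approximation, which the text does not state; the symbol \(z\) is introduced here only for illustration.

\[z = \frac{I_{\text{obs}} - \mathrm{E}[I]}{\mathrm{SD}(I)}, \qquad z_{\text{bites}} = \frac{0.350 - (-0.005)}{0.064} \approx 5.55, \qquad z_{\text{licenses}} = \frac{0.293 - (-0.005)}{0.064} \approx 4.66\]

Both scores are far above 1.96, the usual two-sided 5% threshold, which is consistent with the p-values rounding to 0.000.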
Local Moran’s I is used to further explore specific areas exhibiting spatial clustering. The results show that the zip codes 10029, 10035, 11217, and 11237 on the map all exhibit significant high-high clustering for both dog bite incidents and licenses, with neighboring areas also showing trends of high concentrations of dog licenses. This indicates that a higher number of dogs in these areas is a non-negligible factor contributing to the increase in dog bite incidents. Additionally, there is a significant high-value clustering of dog bite incidents in the northern part of New York City, without a corresponding high concentration of dog ownership.
To measure the combined effects of year, dog demographic characteristics (i.e., gender and whether neutered) and the Borough they belong to on the incidence of dog bites, we first conduct a negative binomial regression analysis using records of dog bite incidents in New York.
The negative binomial regression model is a type of count model designed for dependent variables that can only take non-negative integer values. Based on the Spatial Correlations section, the study area meets the three prerequisites for negative binomial regression: the dog bite incidents are not independent, exhibiting spatial clustering; the dependent variable shows overdispersion; and the frequency of dog bite incidents is sufficiently low. We therefore assume that the dependent variable follows a negative binomial distribution.
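The overdispersion requirement can be stated compactly. In one common parameterization of the negative binomial distribution (an assumption here, since the text does not say which one was used), the symbols \(\mu_i\) for the mean and \(\theta\) for the dispersion parameter give:

\[Y_i \sim \mathrm{NB}(\mu_i, \theta), \qquad \mathrm{E}(Y_i) = \mu_i, \qquad \mathrm{Var}(Y_i) = \mu_i + \frac{\mu_i^2}{\theta}\]

The variance always exceeds the mean, unlike in a Poisson model where the two are equal, and the Poisson case is recovered as \(\theta \to \infty\).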
After utilizing Lasso regression to select main effect variables, we employed two different regression models and compared their outcomes.
\[\ln(\hat y_i) = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_m X_{im} + \xi\]
The results of the main effects model fitting are as follows:
Year: Some years (e.g., 2017, 2019) show statistically significant coefficients, suggesting that the number of bite incidents varies significantly across years. year2021 (coefficient = -0.28710, p < 0.001) indicates a significant decrease in bite counts compared to year2015.
Month: Some months show no significant effect on bite counts (e.g., month2: p = 0.97; month3: p = 0.61). Several months (e.g., May, June, August) have significant coefficients, indicating seasonality in bite incidents.
Gender: Male dogs (genderM) are associated with a significantly higher bite count, as indicated by the large positive coefficient.
Spay/Neuter: Neutered dogs are associated with a lower bite count, with a statistically significant negative coefficient.
Boroughs: Bite counts vary across boroughs, with Staten Island showing a significant negative association compared to the reference borough.
To further improve the model fit, interaction terms between variables were added. After using stepwise reduction to simplify the model, the variables of the final negative binomial regression model are as follows:
Main Effect | Two-Way Interaction Effect |
---|---|
year | year * spay_neuter |
month | year * borough |
gender | month * spay_neuter |
spay_neuter | gender * spay_neuter |
borough | gender * borough |
| | spay_neuter * borough |
First, the intercept is 3.83 with a p-value near zero, indicating that the baseline level of the response variable (on a logarithmic scale) is significant when other variables are not considered. Among the main effects, the years 2016, 2018, and 2019 show a significant increase in the log of event counts compared to 2015. Male dogs are associated with a 10.59-fold increase in the incidence of dog bite events compared to female dogs, calculated as \(e^{2.36}\). The coefficient for neutering, 0.3960, indicates a significant reduction in the log count of bite incidents for neutered dogs compared to unneutered dogs. Geographically, the coefficients for Manhattan and Staten Island are 0.72 and 0.48, respectively, showing significantly lower event counts compared to the Bronx. Regarding interaction terms, the interactions between neutering status and the years 2018, 2020, and 2021 show a significant reduction.
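Because the model is fit on the log scale, a coefficient is turned into a multiplicative effect on the expected count by exponentiating it. The short Python sketch below reproduces the 10.59 figure quoted above from the male-dog coefficient of 2.36; the function name `incidence_rate_ratio` is chosen for illustration.

```python
import math


def incidence_rate_ratio(coefficient):
    """Convert a log-scale count-model coefficient to a multiplicative effect."""
    return math.exp(coefficient)


# Coefficient for male dogs reported in the text above.
male_ratio = incidence_rate_ratio(2.36)
print(round(male_ratio, 2))
```

Under this reading, a coefficient of zero corresponds to a ratio of 1 (no change), positive coefficients to ratios above 1, and negative coefficients to ratios below 1.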
To address the model fitting issue, a generalized linear mixed model is employed. The GLMM is structured as follows:
\[g(\mathrm{E}(\vec y)) = \mathrm{X}\vec \beta + \mathrm{Z}\vec b + \varepsilon\]
The results are interpreted in two parts.
Random Effects:
Borough: Variance: 0.1297, standard deviation: 0.3601. Indicates moderate variability between boroughs in the baseline bite count.
Fixed Effects:
Year: 2017 has a positive coefficient (0.15336, p < 0.001), suggesting a higher bite count compared to 2015, while 2020 and 2021 have strong negative coefficients (-0.27884 and -0.28669, p < 0.001), indicating a substantial reduction in bite counts. This could relate to external factors like the COVID-19 pandemic.
Month: April has a positive coefficient (0.12734, p = 0.026), suggesting higher bite counts. May, June, and July all have p < 0.001, which may reflect a seasonal peak during summer.
Gender: Male dogs have a strong positive coefficient (0.88482, p < 0.001), indicating a significantly higher bite count compared to female dogs.
Spay/Neuter Status: Neutered dogs have significantly lower bite counts (-0.11231, p < 0.001).
Finally, this study compared the fit, model complexity, interpretability, and residual diagnosis results of three regression models.
Model | AIC | Null Deviance | Residual Deviance | Random Effects | Description |
---|---|---|---|---|---|
model | 7403.4 | 3914.5 | 1549.7 | None | Negative binomial regression with main effects only |
filter_model | 7045.8 | 4944.9 | 1437.1 | None | Negative binomial regression with main effects and interaction terms |
glmm_model | 7428.7 | NA | 1551.0 | Borough | Negative binomial mixed model with random intercept for Borough |
Overall, the filter_model with interaction terms provided the best fit, with the lowest AIC (7045.8) and residual deviance (1437.1) of the three models.
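The comparison can be reproduced directly from the table above. The Python sketch below picks the model with the lowest AIC and reports how far the others trail it; the dictionary layout and variable names are illustrative, and the values are copied from the table.

```python
# AIC and residual deviance per model, copied from the comparison table above.
model_fits = {
    "model": {"aic": 7403.4, "residual_deviance": 1549.7},
    "filter_model": {"aic": 7045.8, "residual_deviance": 1437.1},
    "glmm_model": {"aic": 7428.7, "residual_deviance": 1551.0},
}

# Lower AIC indicates a better trade-off between fit and complexity.
best_name = min(model_fits, key=lambda name: model_fits[name]["aic"])
best_aic = model_fits[best_name]["aic"]

# AIC difference of each model relative to the best one.
delta_aic = {name: round(fit["aic"] - best_aic, 1)
             for name, fit in model_fits.items()}

for name, delta in sorted(delta_aic.items(), key=lambda item: item[1]):
    print(f"{name}: delta AIC = {delta}")
```

The other two models trail filter_model by several hundred AIC points, far more than the few points usually treated as a meaningful difference.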
Overall, we found that dog bite incidents shared similar patterns regarding top breeds, locations, gender, and spay/neuter status from 2015 to 2021. According to our analysis, the number of dog bite incidents exhibited a decreasing trend over the observed period, which aligns with the results of other studies.1 Several factors may contribute to this downtrend. One possible explanation is increased public awareness and education on responsible pet ownership, which could lead to better management of dogs and reduced aggressive behavior. Additionally, stricter enforcement of leash laws and regulations in urban areas may have minimized opportunities for uncontrolled interactions between dogs and the public. Another contributing factor could be the rise in spay/neuter programs, as research suggests that neutered dogs are less likely to display aggressive tendencies. Finally, as the NYC Department of Health and Mental Hygiene (DOHMH) began its emergency department syndromic surveillance system in November 2001 in response to bioterrorism preparedness and animal bite incidents, we believe improvements in data reporting in recent years could also influence the observed decline.2
It is notable that pit bulls account for a significant percentage of reported dog bites, despite comprising a small proportion of all registered dogs. This leads us to question what might be unique about this breed. It also aligns with other studies, indicating that Pit Bull-type breeds have accounted for a significant portion of dog-related maulings and fatalities over the past three decades.3 Based on our research, pit bulls are often trained by their owners to exhibit aggressive behaviors, including fighting and attacking. Additionally, pit bulls are frequently associated with illegal dog fighting rings, where they are bred to be larger, stronger, and more powerful than average.4
Based on our study, neutered dogs were associated with fewer reported bite incidents. This could be due to several factors. Neutering is known to reduce certain aggressive behaviors in dogs, such as territoriality, dominance, and mating-related aggression, which are often linked to biting incidents. Additionally, neutered dogs may be more likely to live in controlled and stable environments, as owners who choose to neuter their pets often demonstrate greater responsibility in managing their behavior. It is also possible that neutering reduces hormone-driven impulses that may trigger biting behavior. While these explanations are plausible, further research would be needed to fully understand the relationship between neutering and the reduced frequency of dog bite incidents.
For our modeling analysis, we selected the model with interaction terms as our final predictive model. This choice was made because the interaction effects between variables provided a more nuanced understanding of how different factors, such as location, time, and other dog-specific or environmental characteristics, combine to influence the likelihood of dog bite incidents. By accounting for these interactions, the model can better capture the complex relationships between variables and improve the accuracy of predictions.
In the future, this model could serve as a valuable tool to inform communities, policymakers, and public health officials about areas and time periods with a higher risk of dog bite incidents. For example, the model could help identify specific zip codes or neighborhoods where dog bites are more likely to occur during certain months, days of the week, or times of day. Such insights could be used to develop targeted interventions, such as public awareness campaigns, stricter enforcement of leash laws, or educational programs promoting responsible pet ownership. Ultimately, leveraging predictive modeling in this way has the potential to enhance public safety and reduce the frequency of dog bite incidents.
Despite the insights achieved by the project, there are several limitations of note. The age variable contains a substantial amount of data that is either unreasonable, inconsistent, or difficult to interpret, making it unsuitable for meaningful analysis. This could include missing values, unrealistic age entries, or ambiguous data that lack clarity or reliability. Retaining such a variable could introduce noise into our analysis, skew the results, or lead to incorrect conclusions. After careful consideration, we determined that excluding the age variable was the most appropriate decision to maintain the integrity and accuracy of our findings.
Additionally, when generating the map of dog bite incidents, we were unable to pinpoint specific locations using the street variable due to limitations in the data. Instead, we mapped each dog bite incident based on the zip_code, which provides a less precise representation of the exact locations. This limitation reduces the spatial accuracy of the map and may obscure finer-scale patterns, such as clustering of incidents on certain streets or neighborhoods. Future improvements in data collection, including more consistent and accurate recording of street-level information, would enhance the spatial analysis and allow for a more detailed understanding of dog bite incident distribution.
In the Residuals vs. Leverage plot, several influential points were identified. These points warrant further investigation as they may significantly impact the model’s performance. Examining these influential observations could help identify potential data issues, such as outliers or errors, or reveal areas where the model may require further refinement to better account for underlying patterns in the data.