Shiyu Zhang, sz3319, Weiqi Liang, wl3011, Zhenkun Fang, zf2352, Zeqi Li, zl3545



Motivation

While dogs are a common part of many households, the increasing number of dog bite incidents has become a significant public health concern. In 2022, dog bites ranked among the top 15 causes of nonfatal emergency department visits across all age groups. This project investigates the factors contributing to dog bite incidents, focusing on dog-specific traits such as breed, age, gender, and spay/neuter status, along with their geographical distribution and trends over time. In this project, we aim to identify hotspots and examine the relationship between dog characteristics and bite frequency. The findings will help inform data-driven strategies to prevent dog bites and enhance public safety.



Initial Questions

We began by exploring trends in NYC dog bite incidents over time, along with their associations with dog demographics and spay/neuter status. We were interested in the following:

  1. Which dog breeds are most frequently associated with dog bite incidents?

  2. Has the incidence rate of dog bites shown a decline over the years?

  3. Is the incidence rate of dog bites influenced by the dog’s gender and neuter status?



Evolution of Questions

As our project evolved, we delved into more in-depth questions:

  1. Is the incidence rate of dog bites associated with the number of registered dog licenses?

  2. Can we utilize a regression model to investigate the spatial correlations between dog bite incidents and dog licenses?

  3. Which model is most suitable for analyzing the cumulative impact of various factors on dog bite incidents?



Data

The dataset used to analyze and visualize dog bite cases in New York City includes information on factors such as the dog’s sex, breed, and age. The data, covering the period from 2015 to 2022, were sourced from NYC OpenData. Reports are received online, by mail, by fax, or by phone through 311 or the NYC DOHMH Animal Bite Unit. Each record represents a single dog bite incident.

Another dataset used in this project documents active dog licenses in New York City for a given year. This data is derived from the DOHMH Dog Licensing System, which facilitates the application and renewal of dog licenses. The dataset is also sourced from NYC OpenData.

The data was cleaned using RStudio/RMarkdown to meet the following criteria:

The dog bite dataset we utilized includes the following variables:

The NYC dog licensing dataset we utilized includes the following variables:



Exploratory Analysis

Our exploratory analysis covers three areas: dog bites by breed, by gender and spay/neuter status, and by borough. We also conducted a more in-depth analysis of trends in NYC dog bite incidents over time and of the correlation between dog bite incidents and NYC dog licensing.

Dog Bites vs. Breed

The pie chart above shows the 10 dog breeds that accounted for the most dog bite incidents. Excluding the mixed and unknown categories, bulls accounted for the largest share of incidents (29.2%), followed by shepherds (5.06%) and shih tzus (4.51%).

Dog Bites vs. Gender & Spay/Neuter Status

Among dog bite records with a known gender, incidents were more than twice as frequent for male dogs as for female dogs, for both neutered and un-neutered dogs. Neuter status made only a slight difference: neutered female dogs accounted for slightly more incidents than un-neutered female dogs, while un-neutered male dogs accounted for slightly more incidents than neutered male dogs. It is worth noting that a substantial portion of the records had no gender recorded, and that most dogs in this unknown-gender group appeared to be neutered.

Dog Bites vs. Borough

Manhattan has the most licensed dogs, followed by Brooklyn, Queens, the Bronx, and Staten Island. Queens, however, reported the most dog bite incidents, followed by Manhattan, Brooklyn, the Bronx, and Staten Island. The mismatch between these two sets of bars suggests that Queens may be a hotspot for dog bite incidents.

Dog Bite to License Ratio by Borough

| Borough | Bite Count | License Count | Bite to License Ratio (%) |
|---|---|---|---|
| Bronx | 3782 | 52627 | 7.19% |
| Brooklyn | 4985 | 135596 | 3.68% |
| Manhattan | 5270 | 182398 | 2.89% |
| Queens | 5773 | 104288 | 5.54% |
| Staten Island | 1872 | 44477 | 4.21% |
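
The ratio column can be reproduced directly from the two count columns. The following is a minimal Python sketch for illustration only (the project itself was carried out in R); the dictionary and function names are our own, and the counts are copied from the table above.

```python
# Bite and license counts per borough, copied from the table above.
counts = {
    "Bronx": (3782, 52627),
    "Brooklyn": (4985, 135596),
    "Manhattan": (5270, 182398),
    "Queens": (5773, 104288),
    "Staten Island": (1872, 44477),
}


def bite_to_license_ratio(bites, licenses):
    """Return reported bites per 100 licensed dogs, rounded to two decimals."""
    return round(100 * bites / licenses, 2)


ratios = {borough: bite_to_license_ratio(*pair) for borough, pair in counts.items()}

# Print boroughs from the highest ratio to the lowest.
for borough, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{borough:<14}{ratio:>6.2f}%")
```

Ranked by this ratio rather than by raw counts, the Bronx comes first and Queens second. License counts are only a proxy for the number of dogs, so the ratio should be read with that caveat in mind.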

Dog Bites vs. Dog Licensing

We created a scatter plot to visualize the correlation between dog licenses and dog bite incidents. Each zip code was treated as a single observation, and its number of dog licenses was compared with its number of dog bite incidents from 2014 to 2022, as shown in the figure below.

Overall, the positive slope of the regression line indicates a positive correlation between the number of licenses and the number of bite incidents. Where license counts are low, the data points cluster tightly around the line, suggesting that bite counts vary less and that the model may fit better in these areas. Where license counts are high, the points are more dispersed, so bite counts are harder to predict. Lastly, the confidence band, shown in gray, widens once the license count passes 3000, indicating greater uncertainty in the model’s predictions for areas with many licenses.
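
The regression line in such a scatter plot is an ordinary least-squares fit. The sketch below shows how the slope, intercept, and Pearson correlation are computed; it is written in Python for illustration, `fit_line` is a name we introduce, and the zip-code totals are hypothetical values, not the project data.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, pearson_r)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    pearson_r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, pearson_r


# Hypothetical zip-code totals, for illustration only.
licenses = [200, 500, 900, 1500, 2400, 3600]
bites = [15, 30, 40, 85, 90, 160]

slope, intercept, r = fit_line(licenses, bites)
print(f"slope={slope:.4f}, intercept={intercept:.2f}, r={r:.3f}")
```

A positive slope together with a positive `r` corresponds to the pattern described above. Because few zip codes have very high license counts, the fitted line is less constrained there, which is one reason a confidence band widens at the upper end.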



Spatial Correlations of Bite Incidents and Licenses

To assess whether the spatial distribution of dog license issuance and dog bite incidents in New York City exhibits spatial correlations and clustering characteristics, this study calculated both the global and local Moran’s I for these variables during the research period.
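
As a reminder of what the global statistic measures, here is a minimal Python sketch of Moran’s I for a handful of zones. It is an illustration under simplifying assumptions: the four zones, their values, and the binary adjacency weights are hypothetical, and the function names are ours. The project’s actual weights matrix is not described in this report.

```python
def morans_i(values, weights):
    """Global Moran's I for zone values and a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Weighted sum of cross-products of deviations between zone pairs.
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    denom = sum(d * d for d in dev)
    return (n / total_weight) * (cross / denom)


def expected_morans_i(n):
    """Expected value of Moran's I under no spatial autocorrelation."""
    return -1 / (n - 1)


# Four hypothetical zones in a row; each zone neighbors the adjacent ones.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = morans_i([10, 9, 2, 1], adjacency)    # similar values side by side
alternating = morans_i([10, 1, 9, 2], adjacency)  # high and low values interleaved
print(round(clustered, 3), round(alternating, 3))
```

Values above the expected value indicate that similar values sit next to each other; values below it indicate that neighbors tend to differ.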

For dog bite incidents from 2015 to 2021, the observed global Moran’s I is 0.35, well above the expected value of -0.005, and the p-value is close to 0. The distribution of dog bite incidents in New York City therefore exhibits significant positive spatial autocorrelation.

Global Moran’s I Test of Dog Bite Incidents

| Metric | Value |
|---|---|
| Observed Moran’s I | 0.350 |
| Expected Moran’s I | -0.005 |
| Standard Deviation | 0.064 |
| P-value | 0.000 |

Similarly, we examined spatial autocorrelation in the distribution of dog licenses across New York City. The global Moran’s I of 0.29 suggests that similar values tend to cluster spatially: adjacent zip code areas tend to share either high or low concentrations of dog ownership.

Global Moran’s I Test of Dog Licenses

| Metric | Value |
|---|---|
| Observed Moran’s I | 0.293 |
| Expected Moran’s I | -0.005 |
| Standard Deviation | 0.064 |
| P-value | 0.000 |
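
The reported p-values can be checked roughly against the observed value, expected value, and standard deviation in the two tables. The Python sketch below assumes a normal approximation and a one-sided test for positive autocorrelation; the report does not state which test was used, so this is a plausibility check rather than a reproduction.

```python
import math


def moran_z_and_p(observed, expected, sd):
    """Z-score and one-sided upper-tail p-value under a normal approximation."""
    z = (observed - expected) / sd
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p


# Values copied from the two global Moran's I tables above.
bites_z, bites_p = moran_z_and_p(0.350, -0.005, 0.064)
licenses_z, licenses_p = moran_z_and_p(0.293, -0.005, 0.064)

print(f"bite incidents: z={bites_z:.2f}, p={bites_p:.2e}")
print(f"dog licenses:   z={licenses_z:.2f}, p={licenses_p:.2e}")
```

Both z-scores are well above 4, so under these assumptions the p-values round to 0.000 at three decimals, consistent with the tables.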

Local Moran’s I is used to further explore specific areas exhibiting spatial clustering. The results show that the zip codes 10029, 10035, 11217, and 11237 all exhibit significant high-high clustering for both dog bite incidents and licenses, with neighboring areas also showing high concentrations of dog licenses. This indicates that a higher number of dogs in these areas is a non-negligible factor contributing to the increase in dog bite incidents. Additionally, there is a significant high-value cluster of dog bite incidents in the northern part of New York City without a corresponding high concentration of dog ownership.



Modeling Dog Bite Incidents

To measure the combined effects of year, dog demographic characteristics (i.e., gender and neuter status), and borough on the incidence of dog bites, we first conducted a negative binomial regression analysis using records of dog bite incidents in New York.

The negative binomial regression model is a type of count model designed for dependent variables that can only take non-negative integer values. Based on the Spatial Correlations section, the data meet three prerequisites for negative binomial regression: the dog bite incidents are not independent and exhibit spatial clustering; the dependent variable shows overdispersion; and the frequency of dog bite incidents is sufficiently low. We therefore assume that the dependent variable follows a negative binomial distribution.
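
Overdispersion means that the variance of the counts exceeds their mean, which a Poisson model cannot accommodate. A quick informal check is the variance-to-mean ratio, sketched below in Python. The counts are hypothetical and `dispersion_ratio` is a name we introduce; a formal assessment would use the fitted model rather than raw counts.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Sample variance divided by the mean; values well above 1 suggest overdispersion."""
    return variance(counts) / mean(counts)


# Hypothetical bite counts per group, for illustration only.
hypothetical_counts = [3, 0, 12, 5, 1, 27, 8, 2, 0, 19]

ratio = dispersion_ratio(hypothetical_counts)
print(f"variance / mean = {ratio:.2f}")
```

A Poisson variable has a ratio near 1, so a ratio far above 1 is one informal signal that a negative binomial model is the more suitable choice.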

After utilizing Lasso regression to select main effect variables, we employed two different regression models and compared their outcomes.

\[\ln(\hat{y}_i) = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_m X_{im} + \xi\]

The results of the main effects model fitting are as follows:

Year: Some years (e.g., 2017, 2019) show statistically significant coefficients, suggesting that the number of bite incidents varies significantly across years. The coefficient for year2021 (-0.28710, p < 0.001) indicates a significant decrease in bite counts compared to year2015.

Month: Some months show no significant effects on bite counts (e.g., month2: p = 0.97; month3: p = 0.61). Several months (e.g., May, June, August) have significant coefficients, indicating seasonality in bite incidents.

Gender: Male dogs (genderM) are associated with a significantly higher bite count, as indicated by the large positive coefficient.

Spay/Neuter: Neutered dogs are associated with a lower bite count, with a statistically significant negative coefficient.

Boroughs: Bite counts vary across boroughs, with Staten Island showing a significant negative association compared to the reference borough.

To further improve the model fit, interaction terms between variables were added. After using stepwise reduction to simplify the model, the variables of the final negative binomial regression model are as follows:

Final Negative Binomial Regression Model

| Main.Effect | Double.Interaction.Effect |
|---|---|
| year | year * spay_neuter |
| month | year * borough |
| gender | month * spay_neuter |
| spay_neuter | gender * spay_neuter |
| borough | gender * borough |
| | spay_neuter * borough |

First, the intercept is 3.83 with a p-value near zero, indicating that the baseline level of the response variable (on a logarithmic scale) is significant when other variables are not considered. Among the main effects, the years 2016, 2018, and 2019 show a significant increase in the log count of events compared to 2015. Male dogs, compared to female dogs, are associated with a 10.59-fold higher incidence of dog bite events, calculated as \(e^{2.36}\), which may reflect stronger aggression. The coefficient for neutering, 0.3960, indicates a significant reduction in the log count of bite incidents for neutered dogs compared to unneutered dogs. Geographically, the coefficients for Manhattan and Staten Island are 0.72 and 0.48, respectively, showing significantly lower event counts compared to the Bronx. Regarding interaction terms, the interactions between neutering status and the years 2018, 2020, and 2021 show significant reductions.
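
Because the model uses a log link, a coefficient is converted to a multiplicative effect on the expected count by exponentiating it. The short Python sketch below applies this to two coefficients quoted in the text: 2.36 for male dogs in the final model and -0.28710 for year2021 in the main effects model. The function names are ours.

```python
import math


def rate_ratio(coefficient):
    """Incidence rate ratio implied by a log-link regression coefficient."""
    return math.exp(coefficient)


def percent_change(coefficient):
    """Percent change in the expected count implied by the coefficient."""
    return 100 * (rate_ratio(coefficient) - 1)


male_irr = rate_ratio(2.36)              # male vs. female, final model
year2021_irr = rate_ratio(-0.28710)      # 2021 vs. 2015, main effects model

print(f"male dogs: {male_irr:.2f} times the expected count")
print(f"year 2021: {percent_change(-0.28710):.1f}% change in the expected count")
```

In a model with interaction terms, a main-effect ratio such as the one for male dogs applies at the reference levels of the interacting variables, so it should not be read as a single overall effect.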

To address the model fitting issue, a generalized linear mixed model is employed. The GLMM is structured as follows:

\[g(\mathrm{E}(\vec{y})) = \mathrm{X}\vec{\beta} + \mathrm{Z}\vec{b} + \varepsilon\]

Result interpretation should be divided into the following two parts.

Random Effects:

Borough: variance 0.1297, standard deviation 0.3601. This indicates moderate variability between boroughs in the baseline bite count.

Fixed Effects:

Finally, this study compared the fit, model complexity, interpretability, and residual diagnosis results of three regression models.

Model Comparison Table

| Model | AIC | Null Deviance | Residual Deviance | Random Effects | Description |
|---|---|---|---|---|---|
| model | 7403.4 | 3914.5 | 1549.7 | None | Negative binomial regression with main effects only |
| filter_model | 7045.8 | 4944.9 | 1437.1 | None | Negative binomial regression with main effects and interaction terms |
| glmm_model | 7428.7 | NA | 1551.0 | Borough | Negative binomial mixed model with random intercept for Borough |

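Overall, filter_model, the negative binomial model with interaction terms, provided the best fit, with the lowest AIC (7045.8) and the lowest residual deviance (1437.1) of the three models.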
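
One simple way to read the AIC column is to express each model’s AIC as a difference from the smallest one. The Python sketch below does this with the values from the comparison table; the variable names are ours.

```python
# AIC values copied from the model comparison table above.
aic = {
    "model": 7403.4,
    "filter_model": 7045.8,
    "glmm_model": 7428.7,
}

best = min(aic, key=aic.get)
# Difference from the best model; 0.0 marks the best model itself.
delta_aic = {name: round(value - aic[best], 1) for name, value in aic.items()}

for name, delta in sorted(delta_aic.items(), key=lambda item: item[1]):
    print(f"{name:<13} delta AIC = {delta:>6.1f}")
```

Differences of several hundred AIC units favor filter_model clearly, although AIC compares relative fit only and says nothing about whether any of the models fits well in absolute terms.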



Discussion

Overall, we found that dog bite incidents shared similar patterns regarding top breeds, locations, gender, and spay/neuter status from 2015 to 2021. According to our analysis, the number of dog bite incidents exhibited a decreasing trend over the observed period, which aligns with the results of other studies [1]. Several factors may contribute to this downtrend. One possible explanation is increased public awareness and education on responsible pet ownership, which could lead to better management of dogs and reduced aggressive behavior. Additionally, stricter enforcement of leash laws and regulations in urban areas may have minimized opportunities for uncontrolled interactions between dogs and the public. Another contributing factor could be the rise in spay/neuter programs, as research suggests that neutered dogs are less likely to display aggressive tendencies. Finally, as the NYC Department of Health and Mental Hygiene (DOHMH) began its emergency department syndromic surveillance system in November 2001 in response to bioterrorism preparedness and animal bite incidents, we believe improvements in data reporting in recent years could also influence the observed decline [2].

It is notable that pit bulls account for a significant percentage of reported dog bites, despite comprising a small proportion of all registered dogs. This leads us to question what might be unique about this breed. It also aligns with other studies, which indicate that pit bull-type breeds have accounted for a significant portion of dog-related maulings and fatalities over the past three decades [3]. Based on our research, pit bulls are often trained by their owners to exhibit aggressive behaviors, including fighting and attacking. Additionally, pit bulls are frequently associated with illegal dog fighting rings, where they are bred to be larger, stronger, and more powerful than average [4].

Based on our study, neutered dogs were associated with fewer reported bite incidents. This could be due to several factors. Neutering is known to reduce certain aggressive behaviors in dogs, such as territoriality, dominance, and mating-related aggression, which are often linked to biting incidents. Additionally, neutered dogs may be more likely to live in controlled and stable environments, as owners who choose to neuter their pets often demonstrate greater responsibility in managing their behavior. It is also possible that neutering reduces hormone-driven impulses that may trigger biting behavior. While these explanations are plausible, further research would be needed to fully understand the relationship between neutering and the reduced frequency of dog bite incidents.

For our modeling analysis, we selected the model with interaction terms as our final predictive model. This choice was made because the interaction effects between variables provided a more nuanced understanding of how different factors, such as location, time, and other dog-specific or environmental characteristics, combine to influence the likelihood of dog bite incidents. By accounting for these interactions, the model can better capture the complex relationships between variables and improve the accuracy of predictions.

In the future, this model could serve as a valuable tool to inform communities, policymakers, and public health officials about areas and time periods with a higher risk of dog bite incidents. For example, the model could help identify specific zip codes or neighborhoods where dog bites are more likely to occur during certain months, days of the week, or times of day. Such insights could be used to develop targeted interventions, such as public awareness campaigns, stricter enforcement of leash laws, or educational programs promoting responsible pet ownership. Ultimately, leveraging predictive modeling in this way has the potential to enhance public safety and reduce the frequency of dog bite incidents.



Limitations

Despite the insights achieved by the project, there are several limitations of note. The age variable contains a substantial amount of data that is either unreasonable, inconsistent, or difficult to interpret, making it unsuitable for meaningful analysis. This could include missing values, unrealistic age entries, or ambiguous data that lack clarity or reliability. Retaining such a variable could introduce noise into our analysis, skew the results, or lead to incorrect conclusions. After careful consideration, we determined that excluding the age variable was the most appropriate decision to maintain the integrity and accuracy of our findings.

Additionally, when generating the map of dog bite incidents, we were unable to pinpoint specific locations using the street variable due to limitations in the data. Instead, we mapped each dog bite incident based on the zip_code, which provides a less precise representation of the exact locations. This limitation reduces the spatial accuracy of the map and may obscure finer-scale patterns, such as clustering of incidents on certain streets or neighborhoods. Future improvements in data collection, including more consistent and accurate recording of street-level information, would enhance the spatial analysis and allow for a more detailed understanding of dog bite incident distribution.

In the Residuals vs. Leverage plot, several influential points were identified. These points warrant further investigation as they may significantly impact the model’s performance. Examining these influential observations could help identify potential data issues, such as outliers or errors, or reveal areas where the model may require further refinement to better account for underlying patterns in the data.



References

  1. https://link.springer.com/article/10.1186/s40621-020-00281-y

  2. https://journals.sagepub.com/doi/abs/10.1177/003335491212700208

  3. https://pubmed.ncbi.nlm.nih.gov/33136964/

  4. https://atlantaadvocate.com/legal-guides/dog-bites/pitbull-dog-attacks/