class: center, middle, inverse, title-slide

# 🔥
## Using the Twitter and Google APIs to Track Fires in NYC
### Amanda Dobbyn --- class: inverse ## Quick About Me <br> .left-column[ **Day job**: ultimate frisbee player **For fun**: Data Scientist at [Earlybird Software](http://www.earlybird.co/), former co-organizer of [R-Ladies Chicago](https://rladieschicago.org/) **GitHub**: [@aedobbyn](https://github.com/aedobbyn) **Website**: https://dobb.ae **Twitter**: [@dobbleobble](https://twitter.com/dobbleobble) ] .right-column[![](./img/fris.jpg)] --- ## This Talk - We will use Twitter and Google Maps APIs to figure out when and where fires happen in NYC -- <br> - Great use case for the `drake` package! -- <img src="./img/happy_drake.jpg" height="200" align="right"> -- <br> <br> <br> <br> **Caveats** This analysis relies on the [rtweet](https://github.com/mkearney/rtweet) and [ggmap](https://github.com/dkahle/ggmap) packages. To be able to run it in full you'll need a [Twitter API access token](https://rtweet.info/articles/auth.html) and [Google Maps Geocoding API key](https://developers.google.com/maps/documentation/geocoding/intro#Geocoding). -- <br> <br> All code, slides, and data on [GitHub](https://github.com/aedobbyn/nyr-2019). Use for whatever you like! --- class: blue-light <!-- background-image: url("https://static01.nyt.com/images/2018/12/29/nyregion/28xp-explosion-sub-print/28xp-explosion-sub-facebookJumbo.jpg) --> ## Why Fires? Remember the [crazy blue light](https://twitter.com/NYCFireWire/status/1078478369036165121) in NYC from late December? -- <p align="left" style="padding-right: 20%;"> <img src="./img/blue_light.jpg" height="350px"> </p> -- <br> ## 😱 😱 😱 --- .pull-right[![](./img/nyc_fire.jpg)] <br> The Twitter account that let us know that this wasn't in fact aliens is [NYCFireWire](https://twitter.com/NYCFireWire). 
--

<br>

Normally they just tweet out fires and their locations in a more or less predictable pattern:

<br>

--

Before February: `<borough> ** <some numbers> ** <address> <description of fire>`

After February: `<borough> *<type of fire>* Box <digits> <address> <description of fire>`

<br>

We can use their tweets to get some info on where and when fires happen in NYC.

???

I'll illustrate a way you might want to use `drake` with something that's close to home for us. What if we were constructing an analysis of these tweets and wanted to make sure our pipeline worked end-to-end, but didn't want to re-run parts of it unless we needed to?

---

## The Pipeline

1. Pull in tweets, either a big initial batch or any new ones that show up

--

2. Extract addresses from the tweets (🎶 regex time 🎶)

--

3. Send addresses to the Google Maps API to grab their latitudes and longitudes

--

4. Profit

--

<br>

All functions are defined in [`R/didnt_start_it.R`](https://github.com/aedobbyn/nyr-2019/blob/master/R/didnt_start_it.R) in this repo, which we'll source in now.

```r
source(here::here("R", "didnt_start_it.R"))
```

---

### Sans `drake`

```r
# Get tweets
fires <-
  get_tweets(
    n_tweets_seed = 3000,
    output_path = here("data", "raw", "lots_o_fires.csv"),
    write_out = TRUE
  )

# Pull out addresses
addresses <- pull_addresses(fires)

# Geocode
lat_long <- get_lat_long(addresses)

# Sum up n fires by lat-long combo
fire_sums <- count_fires(lat_long)

# Plot fires on a map
plot_fire_sums(fire_sums)
```

---

## `drake`'s Main Idea

--

[`drake`](https://github.com/ropensci/drake) is a workflow manager for your R code.

--

In a complex analysis pipeline, it makes changing your code easier.

--

<br>

`drake` loves changes.
-- <p align="left"> <img src="https://media.giphy.com/media/JFawGLFMCJNDi/giphy.gif" alt="ilovechanges" height="300px"> </p> --- ## `drake`'s Main Idea -- When something changes that makes the most recent results **out-of-date**, `drake` rebuilds *only* things that need to be rebuilt, so that -- *what gets done stays done*. <p align="left" style="padding-right: 20%;"> <img src="./img/drake_pitch.svg" height="300px"> </p> -- Created and maintained by [Will](https://twitter.com/wmlandau) [Landau](https://github.com/wlandau) and friends. --- class: inverse ## Better Workflows <br> Does your analysis directory look like this? -- .pull-left[ `01_import.R` `02_clean.R` `03_deep_clean.R` `04_join.R` `05_analyze.R` `06_analyze_more.R` `07_report.Rmd` ] -- .pull-right[ <br> #### What's bad about this? <br> **It doesn't scale well** <br> Which you know if you've tried to add another intermediate step or reorganize your subdirectories. ] --- #### Your pipeline depends on -- - You keeping file names up-to-date and sourcing things in the right order -- - You knowing when the input data changes -- - You knowing which objects and functions are used by which other objects and functions <!-- - Explicitly saving intermediate data representations --> -- <br> #### If something breaks -- - Can you be sure about where it broke? -- - Do you know which intermediate data stores are up to date? -- - Do you need to re-run the entire pipeline again? -- .pull-right[ <p align="right"> <img src="./img/tired_drake.jpeg"> </p> ] --- ## Nice features of `drake` .pull-left[ 1) Tidy **dataframe** shows how pieces in your pipeline fit together ] -- <br> .pull-right[ 2) **Dependency graph** of all inputs and outputs ] <br> -- .pull-left[ 3) Great for iteration and **reproducibility**, especially if used with git ] <br> -- .pull-right[ 4) Automated parallel and distributed computing ] <br> -- .pull-left[ 5) It's all in R, so no writing config files! 
🎉 ] <!-- .pull-right[![](./img/iris_dependency_graph.jpg)] --> <!-- .pull-right[![](./img/mtcars_dependency_graph.jpg)] --> --- ## Dependency Graph <img src="./img/simple_drake_vis.jpg" style="padding-left:0px;"> --- class: inverse ## A Few Pieces of `drake` Vocab <br> > **Targets** are the objects that drake generates; <br> -- > **Commands** are the pieces of R code that produce them. <br> -- > **Plans** wrap up the relationship between targets and commands into a workflow representation: a dataframe. <br> ??? one column for targets, and one column for their corresponding commands. --- ## More on Plans Plans are like that top-level script that runs your entire pipeline. <br> ```r source("01_import.R") source("02_clean.R") ... source("06_analyze_more.R") final <- do_more_things(object_in_env) write_out_my_results(final) ``` <br> --- *But*, a plan **knows about the dependencies** in your code. -- <img src="./img/drakes_plan.jpg" style="padding-left:0px;"> --- ## How to `drake` -- <br> 1) Store functions and any packages you need to load in a file `funs.R` -- 2) Store a `drake` **plan** in another file ```r plan <- drake_plan( cleaned_data = clean_my(raw_data), results = analyze_my(cleaned_data), report = report_out_my(results) ) ``` -- 3) **Run** the plan ```r make(plan) ``` --- ## What `drake` does -- ```r plan <- drake_plan( cleaned_data = clean_my(raw_data), results = analyze_my(cleaned_data), report = report_out_my(results) ) ``` -- `drake_plan` stores your plan as targets and commands in a dataframe. 
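
To make the "plan as a table of targets and commands" idea concrete, here is a minimal base R sketch of how dependencies can be read off the commands themselves. This is only an illustration, not `drake`'s implementation (its real dependency detection is more thorough). The `outdated_after()` helper is a hypothetical name made up for this example.

```r
# Commands from the example plan, stored as unevaluated expressions
commands <- list(
  cleaned_data = quote(clean_my(raw_data)),
  results      = quote(analyze_my(cleaned_data)),
  report       = quote(report_out_my(results))
)

# For each target, every name (functions and objects) its command mentions
deps <- lapply(commands, all.names, unique = TRUE)

# Hypothetical helper: which targets go stale if `changed` changes?
# A target is stale if its command mentions the changed thing
# or mentions another target that is already stale.
outdated_after <- function(changed, deps) {
  stale <- character(0)
  repeat {
    hit <- names(deps)[vapply(
      deps,
      function(d) any(d %in% c(changed, stale)),
      logical(1)
    )]
    new <- setdiff(hit, stale)
    if (length(new) == 0) break
    stale <- c(stale, new)
  }
  stale
}

outdated_after("analyze_my", deps)
# "results" "report"
```

Changing `analyze_my` leaves `cleaned_data` alone but invalidates `results` and everything downstream of it.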
-- ```r plan ## # A tibble: 3 x 2 ## target command ## <chr> <expr> ## 1 cleaned_data clean_my(raw_data) ## 2 results analyze_my(cleaned_data) ## 3 report report_out_my(results) ``` --- ## What `drake` does ```r plan ## # A tibble: 3 x 2 ## target command ## <chr> <expr> ## 1 cleaned_data clean_my(raw_data) ## 2 results analyze_my(cleaned_data) ## 3 report report_out_my(results) ``` -- ```r make(plan) ``` -- **First run** of `make(plan)`: `drake` runs the plan from scratch -- <br> **Thereafter**: `drake` will only rebuild targets that are out of date, and everything downstream of them --- ## What makes a target become out of date? 1) A trigger is activated -- 2) Something used to generate that target *or one of its upstream targets* has changed -- ```r plan <- drake_plan( cleaned_data = clean_my(raw_data), * results = analyze_my(cleaned_data), report = report_out_my(results) ) ``` `drake` knows that `results` depends on the object `cleaned_data` and the function `analyze_my()` because those are both part of the command used to generate `results`. <br> -- **So, if `cleaned_data` changes or `analyze_my` changes, `results` is out of date.** --- ## Where is all this stuff stored? <br> #### **targets** -- In a hidden `.drake` directory, or cache, in your project's root. [More on storage.](https://ropensci.github.io/drake/articles/storage.html) -- <p align="left"> <img src="./img/drake_cache.jpg" height="180px"> <figcaption style="margin-left: 20%;">Spot the cache among the hidden dirs.</figcaption> </p> -- <br> `loadd()` loads targets from the cache into your R session. -- `clean()` cleans the cache. (You can recover a cache if you clean it by accident.) <br> --- ## Where is all this info stored? 
<br> #### **dependencies** -- `drake` **hashes** a target's dependencies to know when one of those dependencies changes -- <p align="left"> <img src="./img/drake_cache_hashes_small.jpg" height="150px"> <figcaption style="margin-left: 20%;">Inside the data subdir of the .drake cache</figcaption> </p> -- and creates a `config` list that stores a dependency graph (`igraph` object) of the plan along with a bunch of other things. -- You can access all of this with `drake_config()`. ??? You can check that the cache is there with `ls -a`. You have [control](https://ropensci.github.io/drake/articles/storage.html#hash-algorithms) over the hashing algorithm used, location of the cache, etc. --- ### Without `drake` ```r # Get tweets fires <- get_tweets( n_tweets_seed = 3000, output_path = here("data", "raw", "lots_o_fires.csv"), write_out = TRUE ) # Pull out addresses addresses <- pull_addresses(fires) # Geocode lat_long <- get_lat_long(addresses) # Sum up n fires by lat-long combo fire_sums <- count_fires(lat_long) # Plot fire sums on a map of NYC plot_fire_sums(fire_sums) ``` --- ## With `drake` ```r plan <- drake_plan( # Get a seed batch of tweets seed_fires = get_tweets(), # Reup if there are new tweets fires = target( command = get_tweets(tbl = seed_fires), trigger = trigger(condition = there_are_new_tweets) ), # Extract addresses from tweets addresses = pull_addresses(fires), # Send to Google for lat-longs lat_long = get_lat_long(addresses), # Sum up n fires per lat-long combo fire_sums = count_fires(lat_long), # Plot fires on a map of NYC plot = plot_fire_sums(fire_sums) ) ``` --- <img src="./img/drake_out_of_date.jpg" style="padding-left:0px;"> --- <br> ```r plan ## # A tibble: 6 x 3 ## target command trigger ## <chr> <expr> <expr> ## 1 seed_fires get_tweets() … NA … ## 2 fires get_tweets(tbl = seed_fi… trigger(condition = there_are_new_t… ## 3 addresses pull_addresses(fires) … NA … ## 4 lat_long get_lat_long(addresses) … NA … ## 5 fire_sums count_fires(lat_long) … NA 
… ## 6 plot plot_fire_sums(fire_sums… NA … ``` -- <br> #### Yay, let's get some 🔥s! --- ## Getting Tweets The main function we'll use is `rtweet::get_timeline`. -- Which returns a looooot of stuff. ```r get_timeline("NYCFireWire") ## # A tibble: 100 x 88 ## user_id status_id created_at screen_name text source ## <chr> <chr> <dttm> <chr> <chr> <chr> ## 1 560024… 11272103… 2019-05-11 13:54:28 NYCFireWire Quee… thefi… ## 2 560024… 11272065… 2019-05-11 13:39:18 NYCFireWire Broo… thefi… ## 3 560024… 11271434… 2019-05-11 09:28:31 NYCFireWire Quee… thefi… ## 4 560024… 11270184… 2019-05-11 01:11:58 NYCFireWire Manh… thefi… ## 5 560024… 11269896… 2019-05-10 23:17:16 NYCFireWire Quee… thefi… ## 6 560024… 11269743… 2019-05-10 22:16:28 NYCFireWire @Cal… Twitt… ## 7 560024… 11269220… 2019-05-10 18:48:52 NYCFireWire Broo… thefi… ## 8 560024… 11269218… 2019-05-10 18:47:51 NYCFireWire Broo… thefi… ## 9 560024… 11268908… 2019-05-10 16:44:59 NYCFireWire Toda… Twitt… ## 10 560024… 11268633… 2019-05-10 14:55:39 NYCFireWire Quee… thefi… ## # … with 90 more rows, and 82 more variables: display_text_width <dbl>, ## # reply_to_status_id <chr>, reply_to_user_id <chr>, ## # reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>, ## # favorite_count <int>, retweet_count <int>, hashtags <list>, ## # symbols <list>, urls_url <list>, urls_t.co <list>, ## # urls_expanded_url <list>, media_url <list>, media_t.co <list>, ## # media_expanded_url <list>, media_type <list>, ext_media_url <list>, ## # ext_media_t.co <list>, ext_media_expanded_url <list>, ## # ext_media_type <chr>, mentions_user_id <list>, ## # mentions_screen_name <list>, lang <chr>, quoted_status_id <chr>, ## # quoted_text <chr>, quoted_created_at <dttm>, quoted_source <chr>, ## # quoted_favorite_count <int>, quoted_retweet_count <int>, ## # quoted_user_id <chr>, quoted_screen_name <chr>, quoted_name <chr>, ## # quoted_followers_count <int>, quoted_friends_count <int>, ## # quoted_statuses_count <int>, quoted_location <chr>, ## 
# quoted_description <chr>, quoted_verified <lgl>, ## # retweet_status_id <chr>, retweet_text <chr>, ## # retweet_created_at <dttm>, retweet_source <chr>, ## # retweet_favorite_count <int>, retweet_retweet_count <int>, ## # retweet_user_id <chr>, retweet_screen_name <chr>, retweet_name <chr>, ## # retweet_followers_count <int>, retweet_friends_count <int>, ## # retweet_statuses_count <int>, retweet_location <chr>, ## # retweet_description <chr>, retweet_verified <lgl>, place_url <chr>, ## # place_name <chr>, place_full_name <chr>, place_type <chr>, ## # country <chr>, country_code <chr>, geo_coords <list>, ## # coords_coords <list>, bbox_coords <list>, status_url <chr>, ## # name <chr>, location <chr>, description <chr>, url <chr>, ## # protected <lgl>, followers_count <int>, friends_count <int>, ## # listed_count <int>, statuses_count <int>, favourites_count <int>, ## # account_created_at <dttm>, verified <lgl>, profile_url <chr>, ## # profile_expanded_url <chr>, account_lang <chr>, ## # profile_banner_url <chr>, profile_background_url <chr>, ## # profile_image_url <chr> ``` --- ## Getting Tweets The main function we'll use is `rtweet::get_timeline`. Which returns a looooot of stuff. <br> ![](./img/jackjack.jpg) --- ## Getting Tweets Wowza. -- <br> We'll wrap that up in a function that: <br> -- - Pulls in tweets, either the first big batch or any new ones that show up -- <br> - Writes the result out to a file --- ## Grabbing Tweets <br> **First Big Batch** - `get_seed_tweets` grabs a batch of tweets *or* reads in seed tweets from a file if the file exists -- <br> **New Ones** - `get_more_tweets` checks if there are new tweets and, if so, pulls in the right number of them -- ??? 
- If neither file nor `tbl` is supplied as arguments, grabs an initial *seed* batch of tweets - If either is supplied, checks for new tweets and grabs them if any - Spits out the latest to the same file --- <br> #### Let's get some tweets -- ```r get_tweets(n_tweets_seed = 15) ## # A tibble: 15 x 4 ## text status_id created_at screen_name ## <chr> <chr> <dttm> <chr> ## 1 Queens *10-75* Box 9137. 1… 11272103653… 2019-05-11 09:54:28 NYCFireWire ## 2 Brooklyn *Collapse* Box 34… 11272065503… 2019-05-11 09:39:18 NYCFireWire ## 3 Queens *All Hands* Box 637… 11271434369… 2019-05-11 05:28:31 NYCFireWire ## 4 Manhattan *CO Emergency* B… 11270184767… 2019-05-10 21:11:58 NYCFireWire ## 5 Queens *All Hands* Box 540… 11269896098… 2019-05-10 19:17:16 NYCFireWire ## 6 @CalFireNews 'GZ Hero', di… 11269743118… 2019-05-10 18:16:28 NYCFireWire ## 7 Brooklyn *All Hands* Box 2… 11269220659… 2019-05-10 14:48:52 NYCFireWire ## 8 Brooklyn *10-75* Box 2769.… 11269218122… 2019-05-10 14:47:51 NYCFireWire ## 9 Today's Plaque Dedication … 11268908894… 2019-05-10 12:44:59 NYCFireWire ## 10 Queens *10-75* Box 6293. F… 11268633761… 2019-05-10 10:55:39 NYCFireWire ## 11 Queens *10-75* Box 1245. 3… 11268432764… 2019-05-10 09:35:47 NYCFireWire ## 12 Queens *10-75* Box 1245. 3… 11268376898… 2019-05-10 09:13:35 NYCFireWire ## 13 Brooklyn *Confined Space* … 11268370577… 2019-05-10 09:11:04 NYCFireWire ## 14 Staten Island *All Hands* … 11268150478… 2019-05-10 07:43:37 NYCFireWire ## 15 Manhattan *All Hands* Box … 11268140296… 2019-05-10 07:39:34 NYCFireWire ``` --- A closer look at just the text of the tweets: -- <table> <tbody> <tr> <td style="text-align:left;"> Queens *10-75* Box 9137. 110-11 72nd Ave, . L-151 transmitting 10-75, fire 7th floor. 7 story multiple dwelling </td> </tr> <tr> <td style="text-align:left;"> Brooklyn *Collapse* Box 3432. Stillwell Ave &amp; Ave W.. Bn-43 using 2x2 for a localized collapse, does not appear to affect structural intrgrity. Request NYC DOB. 
</td> </tr> <tr> <td style="text-align:left;"> Queens *All Hands* Box 6375. 42-02 Marathon Pkwy, . Fire in a private dwelling </td> </tr> <tr> <td style="text-align:left;"> Manhattan *CO Emergency* Box 1238. 103 W 96th St. Div.3 has high CO levels in the day care in the basement of a 6 story. 600ppm of CO. Incident duration 8 hours. ConEd searching for the source. </td> </tr> <tr> <td style="text-align:left;"> Queens *All Hands* Box 5409. 217-41 Hollis Ave, . Fire 2nd floor private dwelling. </td> </tr> <tr> <td style="text-align:left;"> Bronx *66-75-3049* 1449 Commonwealth ave. Attic fire private dwelling. </td> </tr> <tr> <td style="text-align:left;"> Manhattan *66-75-0755* 330 E 39 St. Fire in the duct work 3rd floor. 10-77(HiRise Residential). E-16/TL-7 1st due </td> </tr> <tr> <td style="text-align:left;"> Manhattan 10-77* 66-75-2017* 70 Little West St x 2nd Pl. BC01 has a fire on the 7th floor in the laundry area. </td> </tr> <tr> <td style="text-align:left;"> Bronx *66-75-2251* 2922 3rd Avenue at Westchester Avenue, Battalion 14 transmitting a 10-75 for a fire on the 4th floor of a 6 story commercial building. Squad 41 First Due </td> </tr> <tr> <td style="text-align:left;"> Brooklyn **77-75-0270** 330 Bushwick Avenue Near McKibbin Street, Fire on the 4th Floor </td> </tr> <tr> <td style="text-align:left;"> Brooklyn *77-75-0855* 899 Hancock St. Fire top floor 3 story </td> </tr> <tr> <td style="text-align:left;"> Brooklyn **77-75-0855** 899 Hancock Street Near Howard Avenue, All hands going to work for fire I’m the top floor </td> </tr> <tr> <td style="text-align:left;"> Bronx *66-75-3937* 3840 Orloff Av. Fire 4th floor. </td> </tr> <tr> <td style="text-align:left;"> Staten Island *MVA/PIN* Box 1744- 490 Harold St off Forest Hill Rd. Hurst tool in operation. 
</td> </tr> <tr> <td style="text-align:left;"> Queens 99-75-6810 111-15 227 St BC-54 using all hands for a fire in a pvt dwelling </td> </tr> </tbody> </table> --- ## Extracting Addresses Next step is to pull out addresses. <br> These come in two parts: **borough** and **street**. <br> <table> <tbody> <tr> <td style="text-align:left;"> Queens *10-75* Box 9137. 110-11 72nd Ave, . L-151 transmitting 10-75, fire 7th floor. 7 story multiple dwelling </td> </tr> <tr> <td style="text-align:left;"> Brooklyn *Collapse* Box 3432. Stillwell Ave &amp; Ave W.. Bn-43 using 2x2 for a localized collapse, does not appear to affect structural intrgrity. Request NYC DOB. </td> </tr> <tr> <td style="text-align:left;"> Queens *All Hands* Box 6375. 42-02 Marathon Pkwy, . Fire in a private dwelling </td> </tr> <tr> <td style="text-align:left;"> Manhattan *CO Emergency* Box 1238. 103 W 96th St. Div.3 has high CO levels in the day care in the basement of a 6 story. 600ppm of CO. Incident duration 8 hours. ConEd searching for the source. </td> </tr> <tr> <td style="text-align:left;"> Queens *All Hands* Box 5409. 217-41 Hollis Ave, . Fire 2nd floor private dwelling. </td> </tr> </tbody> </table> --- ~ After some fun w/ regexes ~ -- ```r pull_box_address <- function(x) { x %>% str_extract("Box [0-9]+.+[,\\.]") %>% str_extract("\\..+?\\.") %>% str_remove_all("[,\\.]") %>% str_trim() } ``` -- for a given tweet <table> <tbody> <tr> <td style="text-align:left;"> Queens *10-75* Box 9137. 110-11 72nd Ave, . L-151 transmitting 10-75, fire 7th floor. 7 story multiple dwelling </td> </tr> </tbody> </table> -- <br> we can pull just the street address out. 
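
To see what each regex step contributes, here is a step-by-step trace of the same patterns on that tweet. It is a sketch in base R (so it runs without `stringr`); `first_match()` is a hypothetical stand-in for `str_extract()` on a single string.

```r
tweet <- paste(
  "Queens *10-75* Box 9137. 110-11 72nd Ave, .",
  "L-151 transmitting 10-75, fire 7th floor. 7 story multiple dwelling"
)

# Hypothetical helper: first match of `pattern` in a single string
first_match <- function(x, pattern) {
  regmatches(x, regexpr(pattern, x, perl = TRUE))
}

# 1. Greedy: from "Box <digits>" through the LAST comma or period
step1 <- first_match(tweet, "Box [0-9]+.+[,\\.]")
# "Box 9137. 110-11 72nd Ave, . L-151 transmitting 10-75, fire 7th floor."

# 2. Lazy: from the first period to the next period after it
step2 <- first_match(step1, "\\..+?\\.")
# ". 110-11 72nd Ave, ."

# 3. Drop the punctuation, then trim the leftover whitespace
street <- trimws(gsub("[,\\.]", "", step2))
street
# "110-11 72nd Ave"
```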
-- ```r a_box_address %>% pull_box_address() %>% kable(col.names = NULL) ``` <table> <tbody> <tr> <td style="text-align:left;"> 110-11 72nd Ave </td> </tr> </tbody> </table> --- ## Getting Addresses ```r get_tweets(max_id = old_tweet_id) %>% * pull_addresses() %>% select(text, street, borough, address) %>% kable() ``` <table> <thead> <tr> <th style="text-align:left;"> text </th> <th style="text-align:left;"> street </th> <th style="text-align:left;"> borough </th> <th style="text-align:left;"> address </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Bronx *66-75-3049* 1449 Commonwealth ave. Attic fire private dwelling. </td> <td style="text-align:left;"> 1449 Commonwealth ave </td> <td style="text-align:left;"> The Bronx </td> <td style="text-align:left;"> 1449 Commonwealth ave, The Bronx </td> </tr> <tr> <td style="text-align:left;"> Manhattan *66-75-0755* 330 E 39 St. Fire in the duct work 3rd floor. 10-77(HiRise Residential). E-16/TL-7 1st due </td> <td style="text-align:left;"> 330 E 39 St </td> <td style="text-align:left;"> Manhattan </td> <td style="text-align:left;"> 330 E 39 St, Manhattan </td> </tr> <tr> <td style="text-align:left;"> Manhattan 10-77* 66-75-2017* 70 Little West St x 2nd Pl. BC01 has a fire on the 7th floor in the laundry area. </td> <td style="text-align:left;"> 70 Little West St x 2nd Pl </td> <td style="text-align:left;"> Manhattan </td> <td style="text-align:left;"> 70 Little West St x 2nd Pl, Manhattan </td> </tr> <tr> <td style="text-align:left;"> Bronx *66-75-2251* 2922 3rd Avenue at Westchester Avenue, Battalion 14 transmitting a 10-75 for a fire on the 4th floor of a 6 story commercial building. 
Squad 41 First Due </td> <td style="text-align:left;"> 2922 3rd Avenue at Westchester Avenue </td> <td style="text-align:left;"> The Bronx </td> <td style="text-align:left;"> 2922 3rd Avenue at Westchester Avenue, The Bronx </td> </tr> <tr> <td style="text-align:left;"> Brooklyn **77-75-0270** 330 Bushwick Avenue Near McKibbin Street, Fire on the 4th Floor </td> <td style="text-align:left;"> 330 Bushwick Avenue Near McKibbin Street </td> <td style="text-align:left;"> Brooklyn </td> <td style="text-align:left;"> 330 Bushwick Avenue Near McKibbin Street, Brooklyn </td> </tr> <tr> <td style="text-align:left;"> Brooklyn *77-75-0855* 899 Hancock St. Fire top floor 3 story </td> <td style="text-align:left;"> 899 Hancock St </td> <td style="text-align:left;"> Brooklyn </td> <td style="text-align:left;"> 899 Hancock St, Brooklyn </td> </tr> <tr> <td style="text-align:left;"> Brooklyn **77-75-0855** 899 Hancock Street Near Howard Avenue, All hands going to work for fire I’m the top floor </td> <td style="text-align:left;"> 899 Hancock Street Near Howard Avenue </td> <td style="text-align:left;"> Brooklyn </td> <td style="text-align:left;"> 899 Hancock Street Near Howard Avenue, Brooklyn </td> </tr> <tr> <td style="text-align:left;"> Bronx *66-75-3937* 3840 Orloff Av. Fire 4th floor. </td> <td style="text-align:left;"> 3840 Orloff Av </td> <td style="text-align:left;"> The Bronx </td> <td style="text-align:left;"> 3840 Orloff Av, The Bronx </td> </tr> <tr> <td style="text-align:left;"> Staten Island *MVA/PIN* Box 1744- 490 Harold St off Forest Hill Rd. Hurst tool in operation. 
</td> <td style="text-align:left;"> NA </td> <td style="text-align:left;"> Staten Island </td> <td style="text-align:left;"> Staten Island </td> </tr> <tr> <td style="text-align:left;"> Queens 99-75-6810 111-15 227 St BC-54 using all hands for a fire in a pvt dwelling </td> <td style="text-align:left;"> NA </td> <td style="text-align:left;"> Queens </td> <td style="text-align:left;"> Queens </td> </tr> </tbody> </table>

---

## Getting Lat and Long

Last step of the main pipeline!

--

**Geocoding** = getting latitude and longitude from an address.

The [`ggmap`](https://www.rdocumentation.org/packages/ggmap/versions/2.6.1/topics/geocode) package exposes this feature of the [Google Maps](https://cloud.google.com/maps-platform/) API.

<br>

--

`ggmap::geocode` accepts a string and returns a dataframe of `lon` and `lat`.

--

```r
(sherlock <- ggmap::geocode("221B Baker Street, London"))
##          lon      lat
## 1 -0.1585557 51.52377
```

---

#### Where's Sherlock?

```r
london <- get_map("london", zoom = 13)
ggmap(london)
```

![](index_files/figure-html/unnamed-chunk-24-1.png)<!-- -->

---

#### Where's Sherlock?

--

```r
ggmap(london) +
  geom_point(data = sherlock,
             aes(x = lon, y = lat),
             color = "blue", size = 10)
```

![](index_files/figure-html/unnamed-chunk-25-1.png)<!-- -->

## 👋

---

#### Where's Sherlock?

<br><br>

<p>
<img src="https://media.giphy.com/media/EVjAANNjkMBKE/giphy.gif" height="300px" align="middle">
</p>

--

<br>

<div align="middle">
<h2> 👋 </h2>
</div>

---

<br>

Now we can stick geocoding into our pipeline.
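
Since the real `geocode()` call needs an API key, here is an offline sketch of the contract the pipeline relies on: address strings in, one row of `lon`/`lat` per address out, and `NA` coordinates when there is nothing usable to look up. `fake_geocode()` and `known_places` are hypothetical stand-ins built from the two results shown in these slides. They are not part of `ggmap`.

```r
# Hypothetical lookup table standing in for the Google Maps API
known_places <- data.frame(
  address = c("221B Baker Street, London",
              "899 Hancock Street Near Howard Avenue, Brooklyn"),
  lon = c(-0.1585557, -73.91986),
  lat = c(51.52377, 40.67302),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for geocode(): NA coordinates where nothing matches
fake_geocode <- function(addresses) {
  idx <- match(addresses, known_places$address)
  data.frame(lon = known_places$lon[idx], lat = known_places$lat[idx])
}

tweets <- data.frame(
  address = c("899 Hancock Street Near Howard Avenue, Brooklyn", NA),
  stringsAsFactors = FALSE
)

# Missing addresses become "" so the lookup returns NA rather than erroring
tweets$address[is.na(tweets$address)] <- ""

# Bind coordinates back onto the rows they came from
located <- cbind(tweets, fake_geocode(tweets$address))
located
```

The row with no address survives with `NA` coordinates, so a later step can decide whether to drop it.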
<br> ```r get_lat_long <- function(tbl) { tbl %>% mutate( address = case_when( is.na(address) ~ "", # Gives an NA in lat and long response df TRUE ~ address ), l_l = address %>% * geocode() %>% # <-- doing the work here list() ) %>% unnest() %>% select(address, lat, lon, created_at, text) %>% rename(long = lon) } ``` --- #### Let's get some geo info <br> -- ```r full_tweets %>% sample_n(1) %>% # Random tweet pull_addresses() %>% * get_lat_long() %>% select(text, address, lat, long) %>% kable() ``` <table> <thead> <tr> <th style="text-align:left;"> text </th> <th style="text-align:left;"> address </th> <th style="text-align:right;"> lat </th> <th style="text-align:right;"> long </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Brooklyn **77-75-0855** 899 Hancock Street Near Howard Avenue, All hands going to work for fire I’m the top floor </td> <td style="text-align:left;"> 899 Hancock Street Near Howard Avenue, Brooklyn </td> <td style="text-align:right;"> 40.67302 </td> <td style="text-align:right;"> -73.91986 </td> </tr> </tbody> </table> -- <br> ## 👍 👍 👍 --- ## Counting Up `count_fires` sums up the total number of fires per `lat`-`long` combo <br> ```r count_fires <- function(tbl) { tbl %>% drop_na() %>% count(lat, long) } ``` <br> -- so we can plot them on a map (thanks again, `ggmap`) --- ### Running our whole plan... 
<br><br>

```r
plan
## # A tibble: 6 x 3
##   target     command                     trigger
##   <chr>      <expr>                      <expr>
## 1 seed_fires get_tweets()              … NA                                 …
## 2 fires      get_tweets(tbl = seed_fi… trigger(condition = there_are_new_t…
## 3 addresses  pull_addresses(fires)     … NA                                 …
## 4 lat_long   get_lat_long(addresses)   … NA                                 …
## 5 fire_sums  count_fires(lat_long)     … NA                                 …
## 6 plot       plot_fire_sums(fire_sums… NA                                 …
```

---

<img src="./img/drake_up_to_date.jpg" style="padding-left:0px;">

---

Using about a year's worth of tweets (~3k):

--

```r
plot_fire_sums(fire_sums, output_path = NULL)
```

![](index_files/figure-html/unnamed-chunk-30-1.png)<!-- -->

<!-- <p> -->
<!-- <img src="./img/fire_sums_plot.png"> -->
<!-- </p> -->

---

<img src="https://media.giphy.com/media/5VMNcCxVBibZK/giphy.gif" height="300px">

--

<br>

...time for...

<br>

--

#### ⚡ Fire trivia lightning round! ⚡

(Code for these in [analysis.R](https://github.com/aedobbyn/nyr-2019/blob/master/R/analysis.R))

---

#### ⚡ Fire trivia lightning round! ⚡

--

<br>

*On what day of the week are fires most common?*

--

<br>

Answer: **Wednesday**

<table>
<thead>
<tr> <th style="text-align:left;"> Day of Week </th> <th style="text-align:right;"> N </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> Wed </td> <td style="text-align:right;"> 347 </td> </tr>
<tr> <td style="text-align:left;"> Mon </td> <td style="text-align:right;"> 327 </td> </tr>
<tr> <td style="text-align:left;"> Tue </td> <td style="text-align:right;"> 312 </td> </tr>
<tr> <td style="text-align:left;"> Fri </td> <td style="text-align:right;"> 306 </td> </tr>
<tr> <td style="text-align:left;"> Thu </td> <td style="text-align:right;"> 303 </td> </tr>
<tr> <td style="text-align:left;"> Sat </td> <td style="text-align:right;"> 289 </td> </tr>
<tr> <td style="text-align:left;"> Sun </td> <td style="text-align:right;"> 286 </td> </tr>
</tbody>
</table>

---

#### ⚡ Fire trivia lightning round! ⚡

--

<br>

*Which borough has the most fires?*

--

<br>

Answer: **Brooklyn**

<table>
<thead>
<tr> <th style="text-align:left;"> Borough </th> <th style="text-align:right;"> N </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> Brooklyn </td> <td style="text-align:right;"> 633 </td> </tr>
<tr> <td style="text-align:left;"> Queens </td> <td style="text-align:right;"> 538 </td> </tr>
<tr> <td style="text-align:left;"> Manhattan </td> <td style="text-align:right;"> 426 </td> </tr>
<tr> <td style="text-align:left;"> The Bronx </td> <td style="text-align:right;"> 385 </td> </tr>
<tr> <td style="text-align:left;"> Staten Island </td> <td style="text-align:right;"> 135 </td> </tr>
</tbody>
</table>

---

#### ⚡ Fire trivia lightning round! ⚡

<br>

*Which borough has the most fires per capita?*

--

<br>

Answer: **Staten Island**

<table>
<thead>
<tr> <th style="text-align:right;"> Borough </th> <th style="text-align:right;"> N </th> <th style="text-align:right;"> Population </th> <th style="text-align:right;"> Fires Per Person </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:right;"> Staten Island </td> <td style="text-align:right;"> 135 </td> <td style="text-align:right;"> 476,179 </td> <td style="text-align:right;"> 0.0002835 </td> </tr>
<tr> <td style="text-align:right;"> The Bronx </td> <td style="text-align:right;"> 385 </td> <td style="text-align:right;"> 1,432,132 </td> <td style="text-align:right;"> 0.0002688 </td> </tr>
<tr> <td style="text-align:right;"> Manhattan </td> <td style="text-align:right;"> 426 </td> <td style="text-align:right;"> 1,628,701 </td> <td style="text-align:right;"> 0.0002616 </td> </tr>
<tr> <td style="text-align:right;"> Brooklyn </td> <td style="text-align:right;"> 633 </td> <td style="text-align:right;"> 2,582,830 </td> <td style="text-align:right;"> 0.0002451 </td> </tr>
<tr> <td style="text-align:right;"> Queens </td> <td style="text-align:right;"> 538 </td> <td style="text-align:right;"> 2,278,906 </td> <td style="text-align:right;"> 0.0002361 </td> </tr>
</tbody>
</table>

<br> <br>

(Population stats scraped from [citypopulation.de](https://www.citypopulation.de/php/usa-newyorkcity.php))

---

#### ⚡ Fire trivia lightning round! ⚡

*What time of day do fires typically happen?*

--

Answer:

--

<!-- .pull-left[![](./img/fires_by_hour.png)] -->

<img src="./img/fires_by_hour.png" style="padding-left:0px;">
<!-- style="padding-right:100px;" -->

(Daily sunrise and sunset times scraped from [timeanddate.com](https://www.timeanddate.com/sun/usa/new-york).)

---

class: inverse

<br>

## Thanks!

<img src="https://media.giphy.com/media/AyXYkGy0LQWhG/giphy.gif" height="450" align="center" style="padding-left:0px;">