A Hurricane and a Ruler

Nico Carvajal
7 min read · Feb 5, 2021

By early evening on Sept. 19, 2017, we were hunkered down inside a concrete house, two families anxiously awaiting what destiny had in store for us. We felt we were due: it had been almost two decades since our last serious hurricane, and only two weeks since a close encounter with Hurricane Irma, which “spared” us (although part of the island was still without power) but levelled the nearby Virgin Islands. Even with this one coming at an odd angle and aiming straight at us like a mad bull with sharp horns, we kept our faith that we could squeak by another hurricane season without major damage.

Living in ‘Hurricane Alley’, we knew this wasn’t our first rodeo. When I was in elementary school, these drills seemed to be as common (and fun!) as ‘snow days’ in New York. Most were false alarms; after close grazes, we thanked our good fortune and the “Cordillera Central” (a mountain range that crosses the middle of the island from east to west) for deterring most of the storms and sparing us from the worst. And so we believed that every hurricane would be deterred by our savior mountain range (we always prepared for the worst, though). Even our last memorable hurricane at the time (Hurricane Hugo) could’ve been a lot worse, had it not turned north at the last minute (the eye of the hurricane grazed the northeast corner of the island).

That is, until 1998 when Hurricane Georges flouted the mountain range and crossed the island in a straight line from East to West. Georges was only a Cat1; as we contemplated the destruction (many lost their homes and most of us were left without power for over a month), we shuddered at what would happen if a Cat4 or Cat5 attempted the same feat.

Nineteen years and two days later, we would find out. After two unusually calm decades, which combined had fewer close calls than the prior decade, we were coming to the realization that our luck had run out. In the early morning of Sept. 20, 2017, with our immediate needs and safety under control, I couldn’t help but feel awe and admiration for the power of Mother Nature. The aftermath of Hurricane Maria is a story for another day.

The Motivation

Lately, a common narrative in the news has been that hurricanes are getting bigger and more numerous. This goes against my intuition: in Puerto Rico, 2017 ended the longest (19 years) hurricane-free streak I had witnessed in my short lifespan (though with a bang!). I remember monster Hurricane Luis in the 1990s, and stories about Hurricane San Felipe in 1928 are still part of the lore of the island, so “increasing number of hurricanes” just didn’t feel correct. Combine this with my admitted skepticism of the press’ motives and my newly acquired Python and Statistics skills from Lambda School, and I decided to check this out for myself.

The Data

The data came directly from NOAA and contains information about hurricanes since 1851; the format’s description can be found here. For each hurricane, the data consists of one row with identifying information (the Primary Row), followed by rows with what was known about the hurricane at a specific date and time (the Data Rows), up until the next Primary Row. So the first order of business was to put the data in a workable format.

Raw data from NOAA

To achieve this, I added a column holding a unique identifier (called id-name, a concatenation of the id and time columns). The identifier was copied from each Primary Row and filled down through the subsequent rows, so that every observation carries the id-name of its hurricane. A new dataframe (df_hur) was then created with the following features for each hurricane:

  • id-name: unique identifier used to group the data
  • year: year the hurricane was recorded
  • dec: Decade, denoted by the first three digits of the year
  • max_wind: maximum wind speed recorded for the hurricane
  • min_press: minimum pressure recorded for the hurricane
  • lf: 1 if the hurricane made landfall, 0 if not
  • wind_speed_lf: wind speed recorded at the moment of landfall (NaN if the hurricane did not make landfall)

The mean or median speed for individual hurricanes was not recorded because it is highly dependent on the number of observations, as well as on when the observations were taken for each hurricane (i.e. a hurricane whose observations were mostly taken at the tail end of its lifecycle will have significantly lower mean and median wind speeds). The result was 1,893 unique hurricanes recorded, of which 1,016 made landfall at some point.
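The reshaping described above can be sketched roughly as follows. This is a minimal illustration, not the code behind this article: the sample rows are made up, the short column names (id, time, record, status, lat, lon, wind, press) are my own labels, and I am assuming that the hurricane's name sits in the second column of a Primary Row, that a record value of "L" marks landfall, and that negative numbers are missing-value sentinels. The "-" separator in id-name and the use of the strongest landfall for wind_speed_lf are also assumptions.

```python
import pandas as pd

# Made-up sample in the Primary Row / Data Row layout (not real observations).
RAW = """\
AL011990, ALPHA, 3,
19900801, 0000,  , TS, 15.0N, 60.0W, 40, 1005,
19900801, 0600, L, HU, 16.0N, 61.0W, 70, 985,
19900801, 1200,  , TS, 17.0N, 62.0W, 55, 995,
AL021991, BRAVO, 2,
19910905, 0000,  , TD, 20.0N, 50.0W, 30, 1010,
19910905, 0600,  , TS, 21.0N, 51.0W, 45, 1002,
"""

# Hypothetical column labels; Primary Rows only fill the first three.
COLS = ["id", "time", "record", "status", "lat", "lon", "wind", "press"]


def parse(raw: str) -> pd.DataFrame:
    """Split each line on commas and pad short (Primary) rows with blanks."""
    records = []
    for line in raw.strip().splitlines():
        fields = [f.strip() for f in line.split(",")][: len(COLS)]
        fields += [""] * (len(COLS) - len(fields))
        records.append(fields)
    return pd.DataFrame(records, columns=COLS)


df = parse(RAW)

# Primary Rows start with a basin code such as "AL"; Data Rows start with a date.
is_primary = df["id"].str[:2].str.isalpha()

# Build id-name on Primary Rows only, then fill it down over the Data Rows.
df["id-name"] = (df["id"] + "-" + df["time"]).where(is_primary).ffill()

# Keep only the observations and convert the numeric columns.
obs = df[~is_primary].copy()
obs["year"] = obs["id"].str[:4].astype(int)
for col in ("wind", "press"):
    values = pd.to_numeric(obs[col])
    obs[col] = values.where(values >= 0)  # negative sentinels become NaN
obs["is_lf"] = obs["record"].eq("L")

# One row per hurricane.
df_hur = obs.groupby("id-name", sort=False).agg(
    year=("year", "first"),
    max_wind=("wind", "max"),
    min_press=("press", "min"),
    lf=("is_lf", "max"),
).reset_index()
df_hur["lf"] = df_hur["lf"].astype(int)
df_hur["dec"] = df_hur["year"] // 10  # first three digits of the year

# Wind speed at landfall (strongest landfall if there are several), NaN otherwise.
lf_wind = obs[obs["is_lf"]].groupby("id-name")["wind"].max()
df_hur["wind_speed_lf"] = df_hur["id-name"].map(lf_wind)

print(df_hur)
```

Taking the maximum and minimum per hurricane, rather than a mean, keeps each summary independent of how many observations happen to exist for that storm.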

Unique Hurricanes Dataframe (tail)

Removing hurricanes whose wind speed never surpassed 50mph left us with 1,363 hurricanes, 549 of which made landfall. These were grouped by decade with the following features:

  • max_wind: maximum wind speed recorded for a hurricane in the decade
  • mean_of_ws: the mean of the maximum wind speed recorded for each hurricane in the decade
  • min_press: the minimum pressure recorded for a hurricane in the decade
  • mean_min_press: the mean of the minimum pressure recorded for each hurricane in the decade
  • No_of_Hurricanes: number of hurricanes recorded in the decade
  • No_of_lf_Hurricanes: number of hurricanes with recorded landfall in the decade
  • mean_ws_at_lf: the mean of the maximum wind speed recorded at the moment of landfall for each hurricane that made landfall in the decade
  • lf_perc: the percentage of hurricanes that made landfall in the decade

Information about Hurricanes for each decade 1850–2019
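For readers who want to try the decade grouping themselves, here is one way it could look in pandas. The five hurricanes below are invented for illustration, and the name df_dec is hypothetical; only the column names are taken from the feature lists in this article.

```python
import numpy as np
import pandas as pd

# Invented per-hurricane rows (not NOAA data), using the article's column names.
df_hur = pd.DataFrame({
    "id-name": ["A", "B", "C", "D", "E"],
    "dec": [199, 199, 199, 200, 200],
    "max_wind": [45, 80, 120, 60, 100],
    "min_press": [1005, 980, 940, 995, 960],
    "lf": [0, 1, 0, 1, 1],
    "wind_speed_lf": [np.nan, 70, np.nan, 55, 90],
})

# Drop hurricanes whose wind speed never surpassed 50.
strong = df_hur[df_hur["max_wind"] > 50]

# One row per decade, built with named aggregations.
df_dec = strong.groupby("dec").agg(
    max_wind=("max_wind", "max"),
    mean_of_ws=("max_wind", "mean"),
    min_press=("min_press", "min"),
    mean_min_press=("min_press", "mean"),
    No_of_Hurricanes=("id-name", "count"),
    No_of_lf_Hurricanes=("lf", "sum"),
    mean_ws_at_lf=("wind_speed_lf", "mean"),  # NaN rows are skipped by mean
)
df_dec["lf_perc"] = 100 * df_dec["No_of_lf_Hurricanes"] / df_dec["No_of_Hurricanes"]

print(df_dec)
```

Because pandas skips NaN values when averaging, mean_ws_at_lf only reflects the hurricanes that actually made landfall in each decade.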

The Results

The first test checked for a relationship between the decade and both the number and the maximum wind speed of hurricanes (that is, whether the number and intensity of hurricanes have been increasing over the decades). As expected, the answer was a clear yes: the OLS analysis showed an increase of 2.81 hurricanes per decade, with a p-value of .002. Similar results were found for the maximum wind speed recorded in each decade, which saw an increase of 2.64mph per decade with a p-value that rounds to .000!
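A test of this kind can be sketched in a few lines. I am using scipy.stats.linregress here as a stand-in for whichever OLS tool you prefer, and the counts are made up purely to show the mechanics, so the output will not match the 2.81 figure above.

```python
from scipy import stats

# Made-up numbers for illustration only; these are not the NOAA counts.
decades = [0, 1, 2, 3, 4, 5, 6, 7]          # decade index (0 = first decade)
counts = [50, 55, 53, 60, 62, 61, 68, 70]   # hurricanes recorded per decade

# Simple ordinary least squares fit: counts = intercept + slope * decade.
fit = stats.linregress(decades, counts)

print(f"slope   = {fit.slope:.2f} hurricanes per decade")
print(f"p-value = {fit.pvalue:.4f}")

# Null hypothesis: the slope is zero (no trend across decades).
reject_null = fit.pvalue < 0.05
print("reject the null hypothesis" if reject_null else "fail to reject the null hypothesis")
```

The slope is the estimated change per decade, and the p-value is for the null hypothesis that the true slope is zero. The same call works for the maximum wind speed per decade by swapping in that column.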

These results may, or may not, sound surprising, but their compelling conclusions deserve to be analyzed further. The first thought that comes to my mind is: how were hurricanes tracked and recorded in the late 1800s and early 1900s? Indeed, the data description notes that the data is “far from being complete and accurate for the entire century and a half” and that uncertainty and biases may become more pronounced the farther back in time the data goes.

Given that nowadays maximum wind speeds are measured by satellites and by cool planes flown by pilots with considerably more nerve than me, it is safe to assume that both the number of hurricanes and their maximum wind speeds were underestimated in the early records. Many hurricanes that never made landfall where a human population was present were never recorded. For those that were recorded, the maximum wind speed, which is sometimes reached for only a few hours in the middle of the ocean, was not witnessed by anyone.

The Ruler

According to Wittgenstein’s Ruler, a philosophical razor I recently came across, when you’re measuring a table with a ruler, you’re also using the table to measure the ruler. As an example (when taken literally), if you’re in a place that’s at room temperature, but the thermometer unexpectedly marks 23 degrees, the measurement might be telling you that the thermometer is in Celsius (or broken). Would there be a way to correct the early rulers? Could we correct the units to give us a more accurate representation?

We’re going to attempt to do that by testing for a relationship (using the ordinary least squares method again) between the percentage of hurricanes that made landfall and the decade. If the reason many hurricanes were missed is that they never made landfall (at least where there were humans to record them), we expect this percentage to go down as we move into the future. The null hypothesis is that there will be no predictable change in hurricane landfall percentages. Unfortunately, just by plotting the graph, we can tell this is a no-go. There is no clear relationship, and if there is one, it is slightly positive. With a p-value of .795, we fail to reject the null hypothesis.

Percentage of recorded hurricanes that made landfall for a given decade

Note that the 1970s had 0 hurricanes making landfall, which denotes a gap in the data. I confirmed this by finding at least one hurricane in the decade (Celia) that is not marked as making landfall when, in fact, it did.

Further Research

So the hurricane landfall percentage per decade did not succeed in ‘calibrating’ our ruler. What other methods could we use? Playing with this data further may reveal other ways to correct our biases. For example, we could look for other correlations between our variables that would be transferable to prior periods when instrumentation was not as accurate.

The National Climate Assessment either tried and gave up or did not think it worth bothering. In any case, they make no assertions regarding hurricane trends before the 1970s because of the limitations in the data explained above. They do conclude that hurricane activity has significantly increased since the 1980s (which our data also shows).

Remember my intuition about the two “quiet” decades I mentioned before (2000–2017)? Well, each one had around 50% more hurricanes than the “turbulent” 1990s. Guess intuition ain’t always right. I’m sure somebody’s got a philosophical razor for that which I still haven’t learned about…
