The time on the clock
Yesterday Ollie, a student here at the Master in Applied Data Science, had a very good question about local times.
The very short answer is you don’t simply mix local times in R. Not sure about the political message of this 😼.
Ollie is working on a dataframe with a lot of datetimes, all expressed in UTC. The datetimes are from different places around the world. Given the nature of the problem he is solving (something to do with bee keeping!), he wants to know what time it was on the wall clock in those places.
There’s a direct solution — thanks to lubridate — but the path we take to solve the task offers an interesting view on datetime vectors (namely, POSIXct
vectors). And let me play with purrr
and rap
.
The roads goes ever on
Let’s walk. We don’t even get to the door without our walking sticks.
library(tidyverse) # for data wrangling
library(rap) # an alternative to purrr
library(vctrs) # we'll see way later
library(lubridate) # for dates and stuff
The road goes ever on, and it is a time of lies and intrigues: we build a test dataset based on the lakers dataset from lubridate
.
lakers contains two columns, date
and time
, and we need to unite()
them to obtain a proper datetime
column. We pretend all those datetimes are UTC time.
example_df <- lakers %>%
tbl_df() %>%
unite(datetime, date, time, sep = " ") %>%
mutate(utc_time = datetime %>% ymd_hm(tz = "UTC"),
timez = sample(OlsonNames(),
size = nrow(.),
replace = TRUE)) %>%
select(utc_time, timez)
The result is a suitable narrow tibble with a UTC time and a (random) timezone.
example_df
#> # A tibble: 34,624 x 2
#> utc_time timez
#> <dttm> <chr>
#> 1 2008-10-28 12:00:00 America/Moncton
#> 2 2008-10-28 11:39:00 America/Nipigon
#> 3 2008-10-28 11:37:00 Antarctica/Syowa
#> 4 2008-10-28 11:25:00 MET
#> 5 2008-10-28 11:23:00 Brazil/DeNoronha
#> 6 2008-10-28 11:22:00 Pacific/Wallis
#> 7 2008-10-28 11:22:00 Etc/Greenwich
#> 8 2008-10-28 11:22:00 US/Eastern
#> 9 2008-10-28 11:00:00 Factory
#> 10 2008-10-28 10:53:00 Etc/GMT-5
#> # … with 34,614 more rows
what-the-heck o’clocl
Looking into its tools bag, Ollie found with_tz()
: the lubridate function to convert datetime from one time zone to another. That looks exactly like what we need, doesn’t it? So, let’s try to use it!
😠take 1:
Let’s see if we can just use it like it is. We use mutate
to create the new variable, and cross our fingers.
example_df %>%
mutate(local_time = with_tz(time = utc_time, tzone = timez))
#> Error in as.POSIXlt.POSIXct(x, tz) : invalid 'tz' value
What the banana! Invalid time zone
?? That can’t be! We used the time zones straight out of OlsonNames()
, so there’s no way they can be wrong. So what? mmm, maybe it’s not vectorized?
😠take 2:
Let’s try to use purrr::map()
— or a suitable version — to work on every row and get what we need.
example_df %>%
mutate(local_time = map2(utc_time,timez,with_tz))
#> # A tibble: 34,624 x 3
#> utc_time timez local_time
#> <dttm> <chr> <list>
#> 1 2008-10-28 12:00:00 America/El_Salvador <dttm [1]>
#> 2 2008-10-28 11:39:00 Asia/Beirut <dttm [1]>
#> 3 2008-10-28 11:37:00 Greenwich <dttm [1]>
#> 4 2008-10-28 11:25:00 Pacific/Ponape <dttm [1]>
#> 5 2008-10-28 11:23:00 Libya <dttm [1]>
#> 6 2008-10-28 11:22:00 Asia/Ulan_Bator <dttm [1]>
#> 7 2008-10-28 11:22:00 GMT+0 <dttm [1]>
#> 8 2008-10-28 11:22:00 America/Belize <dttm [1]>
#> 9 2008-10-28 11:00:00 Asia/Gaza <dttm [1]>
#> 10 2008-10-28 10:53:00 Indian/Mayotte <dttm [1]>
#> # … with 34,614 more rows
Looks promising, but we got back a list instead of just a datetime — and it’s not purrr error: we did not specify any output format, so it probably just throw everything in a big list. But this are all datetimes. Alas!, there’s not map_...()
function specific to datetimes.
No biggie, we can try to unnlist()
example_df %>%
mutate(local_time = map2(utc_time,timez,with_tz)) %>%
unnest()
#> # A tibble: 34,624 x 3
#> utc_time timez local_time
#> <dttm> <chr> <dttm>
#> 1 2008-10-28 12:00:00 America/El_Salvador 2008-10-28 12:00:00
#> 2 2008-10-28 11:39:00 Asia/Beirut 2008-10-28 11:39:00
#> 3 2008-10-28 11:37:00 Greenwich 2008-10-28 11:37:00
#> 4 2008-10-28 11:25:00 Pacific/Ponape 2008-10-28 11:25:00
#> 5 2008-10-28 11:23:00 Libya 2008-10-28 11:23:00
#> 6 2008-10-28 11:22:00 Asia/Ulan_Bator 2008-10-28 11:22:00
#> 7 2008-10-28 11:22:00 GMT+0 2008-10-28 11:22:00
#> 8 2008-10-28 11:22:00 America/Belize 2008-10-28 11:22:00
#> 9 2008-10-28 11:00:00 Asia/Gaza 2008-10-28 11:00:00
#> 10 2008-10-28 10:53:00 Indian/Mayotte 2008-10-28 10:53:00
#> # … with 34,614 more rows
What the kerfuffle! That’s just the UTC time back again. That’s NOT what we want. Silly purrr.
😠take 3:
Let’s try something different, something new and wonderful! Romain Francois just twitted about rap, a way of doing mutate + map in one call, and you can specify the format of the outputs using vctrs. vctrs has the appropriate constructor for the kind of ouput we want:
Using the notation of rap (here’s there’s a nice introduction) we can do this:
example_df %>%
rap(local_time = new_datetime() ~ with_tz(utc_time, timez))
#> # A tibble: 34,624 x 3
#> utc_time timez local_time
#> <dttm> <chr> <dttm>
#> 1 2008-10-28 12:00:00 Brazil/DeNoronha 2008-10-29 01:00:00
#> 2 2008-10-28 11:39:00 America/Lima 2008-10-29 00:39:00
#> 3 2008-10-28 11:37:00 Africa/Ouagadougou 2008-10-29 00:37:00
#> 4 2008-10-28 11:25:00 Pacific/Kosrae 2008-10-29 00:25:00
#> 5 2008-10-28 11:23:00 Australia/Melbourne 2008-10-29 00:23:00
#> 6 2008-10-28 11:22:00 America/Porto_Velho 2008-10-29 00:22:00
#> 7 2008-10-28 11:22:00 America/Argentina/Tucuman 2008-10-29 00:22:00
#> 8 2008-10-28 11:22:00 Antarctica/South_Pole 2008-10-29 00:22:00
#> 9 2008-10-28 11:00:00 America/Argentina/La_Rioja 2008-10-29 00:00:00
#> 10 2008-10-28 10:53:00 Etc/GMT-10 2008-10-28 23:53:00
#> # … with 34,614 more rows
Oh, that’s not UTC time anymore.
The left-handside of the formula (the stuff at the left of ~) is the kind of output we want. The right-handside of the formula (the stuff at the right of ~) is something very similar to map()
, but we can use directly the variable names and not those pesky purrr names (.x
, .y
, ..1
, ..2
, …: purrr uses this shorthand names to define its lambda functions arguments).
But! Wait! Take a second look! It’s not UTC time, but it’s not even the local time: the difference between utc_time
and time
is always the same, no matter what timez
says. That’s not good 💔.
Got the number 13 tatooed on my neck
The issue here is with the POSIXct
, and thus datetimes, encoding. Datetimes are stored as vectors and the timezone is an attribute of that vector. That means that we can’t mix datetimes with different timezones: we either get an error, or they are promoted to same timezone. So, no luck here.
The issue was not purrr nor rap, but the way datetimes are treated. A compromise solution was to accept a list column back: that is, skipping the unnest()
step or not specifying anything in the left-handside of the formula for rap()
(rap(local_time = ~ with_tz(utc_time, timez))
). We don’t get to see the datetime printed nicely to screen, but we can still operate on those (more or less comfortably).
The happy encoding
Don’t shed no tear: this story has an happy ending.
Oh, you were here for the short story and the direct solution? Well, in lubridate
there exists a function, local_time()
, which returns the time on the clock at a specified time zone. And it turns out that local_time
is vectorized, so you don’t need to use map()
nor rap()
:
example_df %>%
mutate(local_time = local_time(utc_time, timez, units = "hours"))
#> # A tibble: 34,624 x 3
#> utc_time timez local_time
#> <dttm> <chr> <time>
#> 1 2008-10-28 12:00:00 Atlantic/St_Helena 12.000000 hours
#> 2 2008-10-28 11:39:00 America/Grand_Turk 7.650000 hours
#> 3 2008-10-28 11:37:00 Asia/Kolkata 17.116667 hours
#> 4 2008-10-28 11:25:00 Pacific/Efate 22.416667 hours
#> 5 2008-10-28 11:23:00 Europe/Stockholm 12.383333 hours
#> 6 2008-10-28 11:22:00 Antarctica/Vostok 17.366667 hours
#> 7 2008-10-28 11:22:00 America/Lower_Princes 7.366667 hours
#> 8 2008-10-28 11:22:00 Australia/West 20.366667 hours
#> 9 2008-10-28 11:00:00 America/Edmonton 5.000000 hours
#> 10 2008-10-28 10:53:00 Pacific/Port_Moresby 20.883333 hours
#> # … with 34,614 more rows
Here you are Ollie! 😄