Enriching data sets is a common activity in data science projects: new data sources such as demographic, social media, geospatial and weather data are brought in to provide better insight into specific scenarios. At Elastacloud we have seen model performance improve with the addition of weather data on several projects in the Energy and Healthcare industries. This week I had the opportunity to use the darksky R package to obtain two years of hourly weather data for a city in South East England. This weather data may improve the performance of an RNN model I am developing for a transport project. The darksky R package provides programmatic access to the Dark Sky API, which serves current and historical weather conditions worldwide. The first 1000 API requests you make each day are free of charge; every request beyond that costs $0.0001.
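To put that pricing in context, here is a minimal sketch of the daily cost arithmetic. The function name daily_cost and its arguments are my own for illustration, and the figure of roughly 730 requests assumes that one historical request returns a full day of hourly observations, so treat it as an estimate rather than a quote.

```r
# Hypothetical helper: cost in USD of the requests made in a single day,
# using the free quota and per-request price quoted above.
daily_cost <- function(n_requests, free_quota = 1000, price_per_request = 0.0001) {
  billable <- pmax(n_requests - free_quota, 0)  # only requests beyond the quota are charged
  billable * price_per_request
}

# Assumption: one request per day of history, so two years is about 730 requests.
requests_needed <- 2 * 365

daily_cost(requests_needed)  # 0, because this fits inside the free daily quota
daily_cost(5000)             # 4000 billable requests, about 0.4 USD
```

Under those assumptions a two-year pull for a single location fits inside one day's free allowance.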
The darksky R package consists of six main functions, and use of the main function (get_forecast_for) is illustrated below. One handy pointer, highlighted to me by Andrew Booth, is that errors are often encountered when the date is passed directly to the function in a form other than UNIX time. To convert POSIXct dates to UNIX time, simply convert the variable to numeric type, for example with as.numeric().
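As a rough sketch of the conversion described above, the snippet below builds one POSIXct timestamp per day and converts them to UNIX time. The date range and the coordinates are placeholders of my own, not the ones used in the project. The API call is guarded so that it only runs when the darksky package is installed and a key is set in the DARKSKY_API_KEY environment variable, and I am assuming the returned object exposes the hourly data as `$hourly`.

```r
# One timestamp per day over an illustrative two-year window (placeholder dates).
days <- seq(as.POSIXct("2016-01-01 00:00:00", tz = "UTC"),
            as.POSIXct("2017-12-31 00:00:00", tz = "UTC"),
            by = "day")

# POSIXct stores seconds since 1970-01-01 UTC, so as.numeric() gives UNIX time.
unix_days <- as.numeric(days)

# Only call the API when the package and an API key are available.
if (requireNamespace("darksky", quietly = TRUE) &&
    nzchar(Sys.getenv("DARKSKY_API_KEY"))) {
  # Placeholder latitude and longitude, not the project's actual location.
  one_day <- darksky::get_forecast_for(51.5, 0.5, unix_days[1])
  # Assumption: the hourly observations are returned as a data frame in $hourly.
  print(head(one_day$hourly))
}
```

Looping over unix_days and row-binding the hourly data frames would then give the full two-year series.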
An example of the data set and different types of weather variables obtained is shown below.