Project Template for Data Science in R

Every time you start a new Data Science project, you need to create several folders to hold its different inputs and outputs. For each project, you have to decide where to store your datasets, your scripts, your images, and so on, and also how to name all these folders. That’s fine if you are working on your own, as you have room for creativity, but when you are working in a team or collaborating with more people, this creativity can become a bit of a nightmare.

Then you start thinking how nice it would be to have a fixed structure for all your projects, so that no matter who is working on a project, everyone knows exactly where to find everything.

In this article, I would like to introduce you to an R package called “ProjectTemplate” that I find quite useful for this purpose. It will not only help you organize the files in your project, but it will also load all the R packages and datasets your project needs.

Note: Make sure that the R installation used by RStudio is version 2.7 or higher, otherwise the package may not work as expected. You can see your R version in the startup message of the R console.
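If the startup message has already scrolled away, the version can also be checked from the console. This is a minimal sketch using base R only; the variable name meets_minimum is just illustrative.

```r
# Print the full version string of the running R session.
print(R.version.string)

# Compare the running version against the minimum mentioned above.
meets_minimum <- getRversion() >= "2.7.0"
print(meets_minimum)
```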

The first step is the installation of this package. You can simply follow the instructions provided at: http://projecttemplate.net/installing.html
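In practice, the installation usually comes down to installing the package from CRAN and attaching it. The sketch below assumes you have an internet connection and that the CRAN mirror given in repos is reachable from your machine; any other mirror works as well.

```r
# Install from CRAN only if the package is not already available.
if (!requireNamespace("ProjectTemplate", quietly = TRUE)) {
  install.packages("ProjectTemplate", repos = "https://cloud.r-project.org")
}

# Attach the package so its functions can be called directly.
library(ProjectTemplate)
```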

For this example, we will use the IRIS dataset. You can get some information about this dataset and the CSV from: https://archive.ics.uci.edu/ml/datasets/iris
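As an alternative to downloading the file, R ships with a built-in copy of this dataset, so a CSV can also be produced locally. This is only a sketch: the file name iris.csv is an assumption used in the rest of the examples, and the built-in column names may differ from those in the downloaded file.

```r
# Write R's built-in iris data frame to a CSV file in the current
# working directory. "iris.csv" is an illustrative file name.
write.csv(datasets::iris, file = "iris.csv", row.names = FALSE)

# Quick sanity check: read the file back and look at its dimensions.
iris_check <- read.csv("iris.csv")
print(dim(iris_check))
```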

Creating your first project using the ProjectTemplate package:

1. Start a new session in RStudio and open an R script.

File > New File > R Script

2. In your script, set your working directory using setwd() and check that you got it right using getwd(). For example:
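A minimal sketch of this step could look as follows. The folder ~/projects is a hypothetical location, not one prescribed by the package; replace it with any folder on your machine.

```r
# Hypothetical parent folder for all projects; adjust to your setup.
project_root <- file.path(path.expand("~"), "projects")

# Create the folder if it does not exist yet, then move into it.
dir.create(project_root, showWarnings = FALSE, recursive = TRUE)
setwd(project_root)

# Confirm that the working directory is the one we expect.
print(getwd())
```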

3. Then, we can create our project in the working directory that we have chosen. We have two options when creating a project: the minimal version or the full version.

Let’s create both projects and see what they look like. For the minimal version, we will do the following:
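A sketch of the call for the minimal version is shown below. It assumes a recent release of ProjectTemplate, where the layout is selected with the template argument; older releases used minimal = TRUE instead, so check ?create.project for the version you have installed. The project name RProjectMin is the one used in this article.

```r
library(ProjectTemplate)

# Create a project with the minimal folder layout inside the current
# working directory. On older package versions the equivalent call is
# create.project("RProjectMin", minimal = TRUE).
create.project("RProjectMin", template = "minimal")
```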

If we explore the folder called RProjectMin, we will see that ProjectTemplate has automatically generated the following folders:
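The generated layout can also be inspected from the R console instead of the file browser. This sketch assumes RProjectMin was created in the current working directory; the exact entries you see may vary with the package version.

```r
# List the files and folders that ProjectTemplate generated at the
# top level of the minimal project.
entries <- list.files("RProjectMin")
print(entries)
```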

If we instead do the following:
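A sketch of the call for the full version: when no layout is specified, create.project() falls back to its default, which is the full template. The project name RProjectFull is the one used in this article.

```r
library(ProjectTemplate)

# No layout argument is given, so the default (full) layout is used.
create.project("RProjectFull")
```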

ProjectTemplate uses the full structure by default and automatically generates the following folders:
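To see what the full version adds on top of the minimal one, the two folder lists can be compared directly. This is a sketch that assumes both RProjectMin and RProjectFull exist in the current working directory.

```r
# Top-level folders of each project (names only, no nested folders).
min_dirs  <- list.dirs("RProjectMin",  recursive = FALSE, full.names = FALSE)
full_dirs <- list.dirs("RProjectFull", recursive = FALSE, full.names = FALSE)

# All folders of the full project.
print(full_dirs)

# Folders that exist only in the full project.
print(setdiff(full_dirs, min_dirs))
```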

Details of the meaning and purpose of the different folders can be found at: http://projecttemplate.net/architecture.html. You will also find a README.md file inside each folder giving a general explanation of the goal of that particular folder.

Finally, we copy our IRIS dataset into the project’s data folder and load the project. This loads all the data into memory automatically, so we can start working with it right away.
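The copy can be done in the file browser or from R. The sketch below assumes that the dataset was saved as iris.csv in the current working directory and that RProjectMin was created there too; both names are illustrative.

```r
# Copy the CSV into the data folder of the minimal project.
# file.copy() returns TRUE when the copy succeeded and FALSE otherwise.
copied <- file.copy(
  from = "iris.csv",
  to   = file.path("RProjectMin", "data", "iris.csv")
)
print(copied)
```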

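If you try to load the project directly from the working directory you set at the beginning, you may get an error, because that directory is not itself a ProjectTemplate project.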

So, we have to set our working directory to the path of the ProjectTemplate directory, that is, the RProjectMin or RProjectFull folder that we already created. For example, if we have added the data to the data folder inside the RProjectMin folder, we can do the following:
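A minimal sketch of this step, assuming the current working directory is the parent folder in which RProjectMin was created:

```r
library(ProjectTemplate)

# Move into the project folder so that ProjectTemplate can find the
# project's configuration and data folders.
setwd("RProjectMin")

# Load the project: this reads the files in the data folder into memory.
load.project()
```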

Then you will see that all your data has been loaded into memory automatically:
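One way to check this from the console is sketched below. It assumes the file in the data folder was called iris.csv, in which case ProjectTemplate names the resulting data frame after the file, that is, iris. Since R also ships a built-in dataset with the same name, the copy loaded from the data folder takes precedence in your workspace.

```r
# List the objects currently available in the workspace.
print(ls())

# Peek at the first rows and the structure of the loaded dataset.
print(head(iris))
str(iris)
```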

The purpose of this package is to work with a specific dataset rather than with datasets in general. If you want to develop a solution that applies to many datasets, you may find it more useful to create your own package instead of using ProjectTemplate. You can find more info at http://projecttemplate.net/packages.html