Dec 8, 2017

Towards Developing a Shiny App



Shiny is an R package that makes it easy to build interactive web apps straight from R. I came across Shiny apps early this year (2017) whilst looking at various ways of productionizing Data Science. However, back then I had only gone through online examples and tutorials. If you have any experience with the host of tutorials online, I’m sure you will have realised that working through tutorials and solving real-life problems are two entirely different things. For one, real-life problems throw up issues that you may never encounter when following a tutorial.


Last week, I was given the task of building a Shiny app for a client on an ongoing project at work. One option was to look through the myriad of solutions online and simply tweak some of them to suit my needs. Another option was to start from scratch and build step by step. The latter gives you a robust, custom-made solution, but in a fast-paced environment like ours, option 1 with a little blend of option 2 was the way forward! In this post, I will highlight the steps I followed in going from start to finish (well, close to finish 😊).


Step 1: Have an idea of what you want to present

Basically, it’s always good to at least know what you want to present to the client in terms of visualizations, tables, etc. How do you want to present the data, how many tabs do you want? How many visualizations on each tab? This will be mostly dependent on what you believe the client will love to see based on your interactions with the client and other team members. It is a good idea to scribble a wireframe down on paper. The wireframe is simply a draft design of what goes where for each tab. If you’ve got time and skills, you can use any of the fancy tools like Microsoft Visio or Gliffy for doing up your wireframe. Check a sample of mine below :D



Make as many notes as possible on your wireframes. The web is filled with ideas for your app. Look through them for inspiration. Don’t be afraid to play around. Note that this isn’t your final version.


Step 2: Organise your work

This is a follow up on the previous step and some of you may have already accomplished this in Step 1. So, you could call this step 1b if you want.


If your project involves multiple outputs that may not fit on one page or that have different formats (images, tables, etc.), it is advisable to put them in their own separate tabs. Tabs in Shiny are like separate pages for your displays. You may want to display what is most required first. Clients tend to love visualizations, so these should go in the first (opening) tab. It’s very important to organise this part of your work before starting. You can still move things around as you go, but this isn’t as straightforward as you may think and is something you will get the hang of with practice.
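To make the idea of tabs concrete, here is a minimal sketch of a two-tab layout. The tab titles, output IDs and the use of the built-in `cars` dataset are placeholders I have assumed for illustration; they are not from the client project.

```r
library(shiny)

# Hypothetical two-tab layout: tab titles and output IDs are placeholders.
ui <- fluidPage(
  titlePanel("Project dashboard"),
  tabsetPanel(
    # The opening tab holds the visualisation the client sees first.
    tabPanel("Visualisations", plotOutput("overview_plot")),
    # A second tab keeps the tabular output on its own page.
    tabPanel("Tables", tableOutput("summary_table"))
  )
)

server <- function(input, output, session) {
  output$overview_plot <- renderPlot({
    plot(cars)  # built-in dataset used as stand-in data
  })
  output$summary_table <- renderTable({
    head(cars)
  })
}

app <- shinyApp(ui = ui, server = server)
# In an interactive R session, launch with: runApp(app)
```

Reordering the `tabPanel()` calls changes which tab opens first, which is the simple case of moving things around later.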


Step 3: Templates are your best friend!

Unless you have all the time in the world and you are prepared to spend hours trying to fix simple problems, start developing your app using templates. If you have the time, then feel free to work through tutorials first. However, if you are on the clock like I was (and still am), use templates.


That said, it is important to first learn how shiny works. Every Shiny app is composed of two parts: a web page that shows the app to the user (ui), and a computer that powers the app (server). The ui or user interface is what the users of the app will see and interact with. The server part is the engine that does the work in the background. I will speak more on this in another post when I break shiny down into its functionalities. For now, let’s go back to how useful templates can be!
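As a rough illustration of the ui/server split, the sketch below wires one input to one output. The input and output IDs (`n`, `rows`) and the `cars` dataset are assumptions chosen only to keep the example self-contained.

```r
library(shiny)

# ui: what the user sees and interacts with.
ui <- fluidPage(
  sliderInput("n", "Number of rows to show", min = 1, max = 10, value = 5),
  tableOutput("rows")
)

# server: the engine that reacts to inputs and fills in the outputs.
server <- function(input, output, session) {
  output$rows <- renderTable({
    head(cars, input$n)  # re-runs whenever the slider value changes
  })
}

app <- shinyApp(ui = ui, server = server)
# In an interactive R session, launch with: runApp(app)
```

Everything the user touches lives in `ui`; everything that computes lives in `server`, and the two are linked only through the IDs.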


For my project, the first operational object I wanted in my app was a way to load my data into Shiny. Most of the templates out there on the internet have pre-loaded .csv files (or other files), but I wanted a bit of flexibility so that users can upload their own file before doing any analytics. Luckily for me, I did not have to search too far to find this. As with most Shiny objects, two pieces of code are required: one is the interface (written into the ui section) and the other is the engine that is called when the user clicks on the upload button.





If you are familiar with R code, you will immediately notice that the server side of the code is basically an R script to read the file. In the ui section, you can define which file types to restrict the user to. In my case, I want to read in RDS files so I use the readRDS() function. If you want it to read .csv and other files, simply add these to the ‘accept’ argument in the user interface and use the appropriate read-in function on the server side. I have assigned my file to a variable ‘df’ to make it re-usable. You don’t have to do this but it’s convenient.
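The upload pattern described above can be sketched roughly as follows. This is not the exact code from my app: the input ID `file1`, the output ID `preview`, and the assumption that the uploaded RDS file contains a data frame are all illustrative.

```r
library(shiny)

ui <- fluidPage(
  # 'accept' restricts the file picker to .rds files.
  fileInput("file1", "Upload an RDS file", accept = ".rds"),
  tableOutput("preview")
)

server <- function(input, output, session) {
  # 'df' is a reactive so every output can re-use the uploaded data.
  df <- reactive({
    req(input$file1)                # wait until a file has been uploaded
    readRDS(input$file1$datapath)   # datapath is the temporary copy on the server
  })

  output$preview <- renderTable({
    head(df())  # assumes the RDS file holds a data frame
  })
}

app <- shinyApp(ui = ui, server = server)
# In an interactive R session, launch with: runApp(app)
```

Wrapping the read in `reactive()` is one way to get the re-usable ‘df’: any output that calls `df()` updates automatically when a new file is uploaded.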


This is just one example of where using a template has been useful. Don’t expect to find a template that perfectly matches your needs. You may have to tweak it here and there or use a combination of templates. In my case, for example, the load-file code I found only allowed loading of csv files and used the read.csv() function. I had to edit this to suit what I wanted to achieve. Going forward, you can search the internet and books for sample code to help you achieve your aim.
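If you would rather accept both file types than swap one reader for the other, a small helper can pick the reader from the file extension. This is a sketch under my own assumptions; `read_uploaded` is a hypothetical name, not a Shiny function.

```r
# Hypothetical helper: choose the reader based on the uploaded file's extension.
# 'path' is where the file lives; 'name' is the original file name.
read_uploaded <- function(path, name = path) {
  ext <- tolower(tools::file_ext(name))
  switch(ext,
    rds = readRDS(path),
    csv = read.csv(path, stringsAsFactors = FALSE),
    stop("Unsupported file type: ", ext)
  )
}

# Inside a Shiny server this might be called as (assuming an input ID of 'file1'):
# df <- reactive({
#   req(input$file1)
#   read_uploaded(input$file1$datapath, input$file1$name)
# })
```

On the ui side, the matching change would be to list both extensions in the ‘accept’ argument, for example `accept = c(".rds", ".csv")`.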


Step 4: Iterate, iterate, iterate!

As you develop your app, you will always feel the need to make changes here and there, especially when you see other apps with functionality you didn’t think of. Feel free to iterate. I say that with a warning: be sure you have an idea of what you want to accomplish from the start (Step 1), and complete this idea, or at least get 90% of it done, before you start making additional changes. If you are like me, you will always want to keep improving your design, and with so many beautiful designs out there, you can end up iterating continuously and delaying the project. This is why Steps 1 and 2 are so important. Ensure you know what the client wants. Make sure you achieve at least 90% of that, then you can make improvements to the design.


Step 5: Don’t forget to tidy up

As with all coding projects, remember to tidy up your code and design when you are done. Add more comments (I expect that you’ve been commenting whilst coding; here you can add more) and improve the aesthetics: colours, logos and icons (there is an icon library available for Shiny).

In my next post, I will discuss the more technical details of building a Shiny App. Look out for that!


