Nov 6, 2018

Deploying machine learning models with the Microsoft Azure ML service


Edited: Nov 6, 2018

Microsoft have recently released an updated version of their Azure Machine learning service. At Elastacloud we have been using AML since the first release to deploy machine learning models to the cloud. AML provides a platform to develop, train, test, deploy, manage, and track machine learning models but it is mostly the deploy and manage part that Elastacloud have made use of, so my article is going to focus on these aspects.


Based on feedback from the community Microsoft have made sweeping changes to the service which essentially mean it is a new product. Some of the major changes that users will have to adjust to are: -


  • No workbench

  • Don’t need individual experimentation and modelmanagement accounts, just a single workspace

  • New Python SDKs

  • New Azure Machine Learning CLI extension


My view on the removal of the workbench is neutral, I never used it previously other than to launch the CLI. My understanding is that it was an underused tool, with most people preferring to develop their models in an IDE such as VS Code or even in Jupyter Notebooks.


Possible use cases for the Microsoft Azure ML service

There are two new Python SDKs; the Machine learning SDK and the Data prep SDK. The ML SDK, in Microsoft’s words, “is used by data scientists and AI developers to build and run machine learning workflows upon the Azure Machine Learning service”.


In a recent Elastacloud project we have deployed a number of predictive machine learning models, as a web service, for a customer using the new AML service. I used the Machine learning SDK to complete this task and after just a few teething problems I found that the SDK was easy to work with and certainly felt like a tool that made me more productive.


One of the requirements for this web service required some consideration on how to best create the service; different models should be loaded and used to generate the predictions based on the day of the week. This requirement arises because our customer always needs forecasts for the next two days and the next two business days, meaning that on a Friday, for example, they want forecasts for the next two days (Saturday and Sunday) and the next two business days (Monday and Tuesday) whereas on a Monday they only need them for the next two business days (Tuesday and Wednesday).


Therefore, we have three different models (one day ahead, two days ahead, weekend) deployed to the same service, with multiple dependencies (e.g. *.py files). Azure ML made the creation of the service very easy, as demonstrated in the code examples below.


Creating a Docker image


Models are deployed in Docker images to Azure Kubernetes Service (AKS) or Azure Container Instances (ACI). The code excerpt below shows how simple it is to create this image in only two lines; image_config contains the required file alongside the optional dependencies and a conda .env file. The image is then built with ContainerImage.create where the already registered models are provided.

Creating the docker image

Deploying the service


Once we have a successfully built Docker image we can deploy it to AKS or ACI as a web service. This, again, is very easy to do with the SDK as shown in the image below. We only need to define a configuration, with AksWebservice.deploy_configuration() (gives default configuration), then use the deploy_from_image method, providing our Docker image as one of the arguments.

Creating the AKS web service

And so we have successfully deployed our machine learning models as a web service! Now we can get the scoring URI (aks_service.scoring_uri) and the access keys (aks_service.get_keys) and start making requests to our machine learning models.

New Posts
  • Extreme Gradient Boosting (xgboost) is a very fast, scalable implementation of gradient boosting that has taken the data science world by storm, with xgboost regularly online data science competitions and use at scale across different industries. Xgboost was originally developed by Tiangi Chen and is renowned for execution speed and model performance. I have recently been conducting some experiments with xgboost for the Renewables AI products. Below I show how to run a simple regression type tree and linear based model. Thereafter we go on to explore grid search and random search with xgboost. But first a little background information… Boosting is what gives xgboost it’s state of the art performance. Boosting is not a specific machine learning algorithm, but a concept that can be applied to set of machine learning algorithms, hence boosting is known as a meta algorithm. Essentially, xgboost is an ensemble method, used to convert many weak learners (models performing slightly better than chance) to a strong learner. This is achieved via boosting, where a set of weak learners on subsets of the data is iteratively learnt. Each weak learner is weighted according to performance. Thereafter, each weak learner’s predictions are combined and multiplied by their weight to obtain a final weighted prediction, which is better than any of the individual predictions themselves. The Python API is capable of running the xgboost on regression and classification problems, using decision tree and linear learners. Below we apply xgboost to regression type problem using a tree-based learner. Decision trees are an iterative contruction of binary decisions (one decision at a time) until a stopping criterion is met (ie. The majority of one decision split consists of one category/value or another). Individual trees tend to overfit (low bias, high variance), hence perform well on training data but don’t generalise as well, hence ensemble methods are useful in this scenario. Notice as this is a regression type problem we use the loss function "reg:linear", whereas for a classification problem we would use "reg:logistic" or "binary: logistic" depending on whether you are interested in the class or the probability of the class. A loss function maps the difference between the actual and predicted values - we aim to find the model with the lowest loss function. import numpy as np import pandas as pd from sklearn.metrics import r2_score import xgboost as xgb from sklearn.metrics import mean_squared_error from xgboost import plot_tree X_train = pd.read_csv("X_train.csv") Y_train = pd.read_csv("Y_train.csv") X_test = pd.read_csv("X_test.csv") Y_test = pd.read_csv("\Y_test.csv") list(X_train) list(X_test) #################################### # XGBoost Decision Tree ################################### xg_reg = xgb.XGBRegressor(objective='reg:linear', n_estimators=10, seed= 123),Y_train) #XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1, # colsample_bytree=1, gamma=0, learning_rate=0.1, max_delta_step=0, # max_depth=3, min_child_weight=1, missing=None, n_estimators=10, # n_jobs=1, nthread=None, objective='reg:linear', random_state=0, # reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=123, # silent=True, subsample=1) preds = xg_reg.predict(X_test) rmse = np.sqrt(mean_squared_error(Y_test, preds)) print("RMSE: %f" % (rmse))#RMSE: 164.866642 r2 = r2_score(Y_test, preds) # Plot the first tree xgb.plot_tree(xg_reg,num_trees=0) In the following code we apply xgboost to a regression type problem using linear based learners. ##################################### # XGBoost Linear Regression ##################################### DM_train = xgb.DMatrix(data=X_train, label=Y_train) DM_test = xgb.DMatrix(data=X_test, label=Y_test) params = {"booster":"gblinear", "objective":"reg:linear"} xg_reg = xgb.train(params = params, dtrain=DM_train, num_boost_round=10) preds = xg_reg.predict(DM_test) rmse = np.sqrt(mean_squared_error(Y_test, preds)) print("RMSE: %f" % (rmse)) #RMSE: 169.731848 r2 = r2_score(Y_test, preds) Like many other algorithms, performance can be enhanced by tuning the hyperparameters. Below shows an example of a xgboost grid search. Grid search can be quite computationally expensive as we exhaustively search over a given set of hyperparameters, and pick the best performing hyperparameters. For example, if we have 2 hyperparameters to tune and 4 possible values for each parameter, that’s 16 possible parameter configurations. An alterative to grid search is random search, where you can define how many models/iterations to try before stopping. During each iteration, the algorithm randomly selects a value in the range specified for each hyperparameter. import pandas as pd import xgboost as xgb import numpy as np from sklearn.model_selection import GridSeachCV housing_data = pd.read_csv("ames_housing_trimmed_processed.csv") X, y = housing_data[housing_data.columns_tolist()[:-1]], housing_data[housing_data.columns.tolist()[-1]] housing_dmatrix = xgb.DMatrix(data=X, label=y) gbm_param_grid = {'learning_rate':[0.01,0.1,0.5,0.9], 'n_estimators': [200], 'subsample': [0.3,0.5,0.9]} gbm = xgb.Regressor() grid_mse = GridSearchCV(estimator = gbm, param_grid = gbm_param_grid, scoring = 'neg_mean_squared_error', cv = 4, verbose = 1),y) print("Best parameters found: ", grid_mse.best_params_) print("lowest RMSE: ", np.sqrt(np.abs(grid_mse.best_score_))) A quick overview of the hyperparameters that can be tuned for tree based models: - eta/learning rate (how quickly the model fits residual error using additional base lase learners -gamma; minimum loss reduction to create new tree split -lambda: L2 regularisation on lead weights -alpha: L1 regularisation on leaf weights -max depth: how big a tree can grow -subsample: Percentage of sample that can be used for any given boosting round -colsample_tree: the fraction of features that can be called on during any boosting round (ranges from 0-1) An overview of hyperparameters that can be tuned for linear learners: -lambda: L2 regularisation on weights -alpha: L1 regularisation on weights -lambda_bias: L2 regularisation term on bias Another useful blog posts related to xgboost can be found here . Happy experimenting!
  • I have had the opportunity to speak at various DataScience MeetUp events in Nottingham, Loughborough and London. Typically, rather than use the usual PowerPoint presentation, I prefer running LIVE codes as this gives the audience the confidence that my codes work and is repeatable. The challenge with this for me however is that I tend to do most of my preparation at the office and end up using my personal laptop for the eventual presentation. This means that I run a 'office computer' - github repo - personal computer back and forth triangle. Not until a colleague in the office introduced me to azure notebooks (thanks Darren!). Microsoft Azure Notebooks is a free service that provides Jupyter Notebooks along with supporting packages for R, Python and F#. Using this notebooks is easy! All one needs is a free account at Azure notebooks uses libraries for grouping notebooks. For example, I now have a library for my MeetUp events based on location Once you are signed in and you've created a new library by clicking the '+New Library' button, you can use the '+New' button in the library environment to create a notebook by clicking the 'Item type' drop box. At the moment, it supports Python(2.7, 3.5, 3.6), R and F#. If you are fussy about organisation, it also allows you to create a folder instead and then create your files. In addition to creating new files, you can also load files from a URL or from your computer. That's not all! The notebook file on azure has a cool slide presentation feature called the RISE Slideshow. This RISE Slideshow is a notebook extension which allows you to use it for presentations. To enable Slide mode: In your notebook click View/Cell Toolbar/Slide Show For each cell select its type and hierarchy on the right hand side To start the presentation, click the "graph" icon (shown above) on the main toolbar.  Use left/right/up/down to navigate slides. This automatically turns your notebook into a slide presentation! The coolest thing about this azure notebooks is it's all on the cloud. This means that you can present using any computer/laptop (even if there's no python installed). And because of this, you do not need to pip install any library on the local machine. It's all done on the cloud Note that your packages will only be available for the lifetime of your notebook server and your notebook server will typically shutdown after 1 hour of inactivity. Azure Notebooks also lets you auto-setup your environment if you have a pip requirements file. There's so much to it and I'll just let you explore! I hope this is as useful for you as it was (and still is) for me. Let me know if you've used this before and what you like or do not like about it. For suggestions on other platforms especially for beginners to use for Python/R, see this post by Laura Da Silva
  • I just rounded up a project with a client that centred around Natural Language Processing. One of the things I enjoy about working at Elastacloud is the diversity of projects on a week to week basis. One week you may be working on building models for optimising energy usage and another week you could find yourself building a model for expanding abbreviations. What this does is that it helps to keep you on your toes with technology and gives one a new challenge every week. Back to my reason for today’s post. The Natural Language Processing Kit (NLTK) library in Python is a gem! If you are working on any NLP project using Python, the NLTK library should be your anchor. With 50 corpora and lexicons, 9 stemmers, and dozens of algorithms and modules that can be used for tokenization, stemming, building n-grams, naïve Bayes, k-means, EM classifiers, and so much more, the NLTK is definitely a go-to for anyone working on any NLP project. That said, the NLTK has a steep learning curve. If you are however looking for a quick, easy to learn library, I will recommend TextBlob. It is built on the shoulders of NLTK and Pattern and offers some features from the NLTK such as sentiment analysis, pos-tagging, noun phrase extraction, etc. It’s pretty much easy to use! Say we have two strings: With TextBlob, we can do simple things like subset the data, convert all to lower/upper case or concatenate both sentences One can also break down multiple sentences into individual sentences or words (tokens) and select a particular token One of the problems we had to solve on the project involved working with data in context. In NLP problems, this could mean breaking your sentences/phrases into n-grams. An n-gram is a contiguous sequence of n items from a given sample of text or speech. The items can be phonemes, syllables, letters, words or base pairs according to the application. TextBlob makes it easy to do this. Say we want to convert string3 into an n-gram where n=3, Very easy! Or 2-grams To make it more interesting, we can add find out the sentiments of our sentences. The sentiment property returns a namedtuple of the form Sentiment(polarity,subjectivity) . The polarity score is a float within the range [-1.0, 1.0]. The subjectivity is a float within the range [0.0, 1.0] where 0.0 is very objective and 1.0 is very subjective. This implies that the sentence "Life is fun!" with a polarity of 0.375 is a positive statement and more of a factual statement (objective) than a personal opinion (subjective). TextBlob has so much more capabilities (including spelling correction and translation) which you can find here . One final one I would like to show is the spell check. Given a string (string4), TextBlob can give suggestions as to correct spelling for one of the misspelt words. Feel free to play with the TextBlob() library and it's 'big brother' - NLTK library. Have fun doing so!