Machine Learning: Finding the secrets in your data

TL;DR: Machine Learning (ML) finds patterns in your data to make intelligent predictions about the future state of your system. Perhaps one of the most challenging aspects of ML is accessing and manipulating the right data. Our two main tips are:

1. Store everything (delete it at a later date if it’s not needed).

2. Start small; don’t try to answer everything at once.

What is ML?

In an ML project, (large) data sets are repeatedly transformed and analyzed in intelligent ways to spot complex patterns that would be too difficult for a human analyst to see. Historically this was achievable only by technology giants such as IBM or Google, and put to use by retail giants such as Walmart, Tesco and Costco.

Cloud-based analytics platforms such as the Azure ML Studio, supported by the surge in popularity of open-source tools such as the R programming language, open up the world of Machine Learning to more than just a few major enterprises. As a result, an increasing number of smaller businesses can now create valuable predictive solutions to drive sales, gain a better understanding of their business, make applications and processes more effective, and instill intelligence into applications that were once simplistic.

The machine learning workflow diagram.

Azure ML Studio is a tool that supports the machine learning workflow and enables predictive models to be created, tested, exposed and consumed.

We start by understanding and clearly defining the question we want to answer, and identifying the relevant data sources (if we have them). Next we clean the raw data and add, join and manipulate additional data. Finally, we use this data to train multiple predictive models using a selection of suitable machine learning algorithms.

The various models are scored and evaluated against each other to determine which model provides the most accurate answer to our original question.
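As a rough illustration of this train, score and compare step, here is a minimal sketch in plain Python. It is not Azure ML Studio code: the sales figures, the two candidate models (a constant-average baseline and a straight-line fit) and all function names are assumptions made up for this example.

```python
from statistics import mean

# Hypothetical toy data: (day_number, units_sold). Not from a real system.
history = [(1, 12), (2, 14), (3, 15), (4, 18), (5, 19), (6, 22), (7, 23), (8, 26)]

# Hold back the most recent rows so the models are scored on data they never saw.
train, test = history[:6], history[6:]


def fit_mean_model(rows):
    """Baseline: always predict the average of the training targets."""
    average = mean(y for _, y in rows)
    return lambda x: average


def fit_linear_model(rows):
    """Ordinary least-squares straight line through the training points."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in rows) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def mean_absolute_error(model, rows):
    """Average size of the prediction error; lower is better."""
    return mean(abs(model(x) - y) for x, y in rows)


# Train every candidate on the same data, then score each on the held-out rows.
candidates = {"mean": fit_mean_model(train), "linear": fit_linear_model(train)}
scores = {name: mean_absolute_error(model, test) for name, model in candidates.items()}
best_name = min(scores, key=scores.get)

print(scores, "-> best:", best_name)
```

The point of the sketch is the shape of the process rather than the models themselves: every candidate sees the same training data, is judged on the same unseen data, and the one with the lowest error is carried forward.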

This model is then deployed to a web service that can be consumed by an application. The application can continuously retrain the existing model with newly gathered data to keep its predictions improving and on trend. As a rule, the more relevant data it gets, the more accurately it can predict. Pretty powerful stuff!

So Where Do You Start?

It’s obvious that Machine Learning has a lot of potential and you can probably see how this could really improve your business. But where do you start? This is a question that we have been answering for some of our partners.

The place to start is to ask the right question. We begin with a simple question, such as “What product will a customer want to buy?”, or “What is the probability that this order will be returned?”, or even “What are the chances this order may get stolen, so that additional insurance can be sold to the customer?”. We take this simple question and, through a process of discovery and co-creation, we produce a ‘solution statement’ which accurately defines the scope, context, and validity of the required predictive model.

The next step is to gather the data that might help us answer these questions.

Preparing your data

Preparing your data is by far the biggest challenge in a Machine Learning project; expect 50-80% of the time spent on the project to be consumed by this. Azure ML Studio has some great capabilities when it comes to cleaning data and metadata, but it is not a silver bullet.

The likelihood is that your data is in a form that works with your application(s), in a location that makes sense for your application(s). That is as it should be, but we need to get it into a place where Azure ML Studio can use it. We do this by using data pumps.

Data pumps extract the data from various sources, transform and “de-normalise” it, and then load it into appropriate data stores. This is also referred to as Extract, Transform, and Load (ETL). We use Azure Data Factory to perform this process.

Azure Data Factory can move and process data from and to a wide range of data sources, whether they are in Azure or on-premises. Pipelines are arranged to process all the data we need, and duplicate it into flat, denormalized data stores that make it easy for Azure ML Studio to consume. At this stage we can also pull in data from external sources, such as weather data, to be used as factors when training the predictive model.
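To make “de-normalising” concrete, the sketch below flattens three small related tables into one wide row per order and attaches an external weather factor. It is plain Python rather than an Azure Data Factory pipeline, and every table, field name and value in it is hypothetical.

```python
# Hypothetical normalised source tables, as an application database might hold them.
customers = {
    1: {"name": "Avery", "region": "North"},
    2: {"name": "Blake", "region": "South"},
}
products = {
    10: {"title": "Kettle", "category": "Kitchen"},
    11: {"title": "Lamp", "category": "Lighting"},
}
orders = [
    {"order_id": 100, "customer_id": 1, "product_id": 10, "quantity": 2, "date": "2017-03-01"},
    {"order_id": 101, "customer_id": 2, "product_id": 11, "quantity": 1, "date": "2017-03-02"},
    {"order_id": 102, "customer_id": 1, "product_id": 11, "quantity": 3, "date": "2017-03-03"},
]

# Hypothetical external factor, keyed by date. One date is deliberately missing.
weather_by_date = {"2017-03-01": "rain", "2017-03-02": "sun"}


def denormalise(orders, customers, products, weather_by_date):
    """Return one flat row per order with the related fields copied in."""
    flat_rows = []
    for order in orders:
        customer = customers[order["customer_id"]]
        product = products[order["product_id"]]
        flat_rows.append(
            {
                "order_id": order["order_id"],
                "date": order["date"],
                "quantity": order["quantity"],
                "customer_region": customer["region"],
                "product_category": product["category"],
                # Fall back to a marker value when the external source has a gap.
                "weather": weather_by_date.get(order["date"], "unknown"),
            }
        )
    return flat_rows


flat_rows = denormalise(orders, customers, products, weather_by_date)
for row in flat_rows:
    print(row)
```

Each flat row repeats customer and product details that a normalised database would store once. That duplication is deliberate: a training tool can read the rows directly without performing any joins.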

If we don’t currently have the data that we need, we plan and implement a way to capture it as soon as is feasible. The mantra we employ is: if in doubt, store it - it’s easier to delete it later than it is to interpolate it. Once we have the data in an appropriate place and in the right format, we select the appropriate machine learning algorithms and begin to train and fine-tune the models.

Start off small

We have been working with a number of our clients to bring the power of machine learning to their products. This could seem like a daunting task - however, with the emphasis on achieving marginal gains, we introduce it in a focused and controlled way.

In one (anonymised) example, we were tasked with predicting the answer to a simple question: “How many products of a given type will we sell in the specified date range?”.

The first problem was that the data didn’t exist; the existing applications could only tell you the status “now”. Using Azure Data Factory, daily data exports were scheduled that captured this data in a de-normalised format. Embedded Power BI was then used to display these reports in the existing reporting application.
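The idea of turning a “now”-only view into history can be sketched as a dated snapshot that is appended on every run. This is an illustrative Python stand-in for the scheduled export, not the Azure Data Factory configuration used on the project; the field names, product types and figures are all hypothetical.

```python
from datetime import date

# In practice this would be a file or table that persists between runs;
# a list in memory keeps the sketch self-contained.
history = []


def take_snapshot(current_state, snapshot_date, history):
    """Append a dated copy of the current state so it is never overwritten."""
    for product_type, units_sold_to_date in current_state.items():
        history.append(
            {
                "date": snapshot_date.isoformat(),
                "product_type": product_type,
                "units_sold_to_date": units_sold_to_date,
            }
        )
    return history


def series_for(history, product_type):
    """Return the (date, value) trend for one product type."""
    return [
        (row["date"], row["units_sold_to_date"])
        for row in history
        if row["product_type"] == product_type
    ]


# Two hypothetical daily runs. The application only ever exposes the latest values.
take_snapshot({"kettles": 40, "lamps": 15}, date(2017, 3, 1), history)
take_snapshot({"kettles": 46, "lamps": 15}, date(2017, 3, 2), history)

print(series_for(history, "kettles"))
```

Because every row carries its own date, the same store can feed a trend report today and serve as training data for a predictive model later.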

The data sets are still relatively small, but the ability to have historical reporting, with the added benefit of being able to visualise trends, adds massive value to the business. The potential of the predictive capabilities that will soon be realised thanks to ML is clear, not only in this one small area but across the whole business!

So, what now?

Widespread use of Machine Learning and Artificial Intelligence is anticipated. It is going to affect many aspects of our lives and disrupt the status quo for many businesses, but it doesn’t have to be scary. The technology can be leveraged incrementally, reducing the financial barriers and minimising business risk. If a business doesn’t take advantage of this technology, its competitors will.

If you’re interested in finding out how your business can use machine learning to stay ahead, get in touch for a chat.