Big Data in Manufacturing

Big Data has become a category of software technology developed specifically for the storage, transformation and analysis of very large datasets. It is also associated with the “3 V’s”: volume, variety and velocity.

Instances of ‘Big Data’ are exceptionally large datasets, those that extend into billions of rows and parameters. This data can be structured or unstructured. Unstructured data is information that does not have a pre-defined model or is not organised in a pre-defined manner. It is more likely to consist of text or files, and the resulting irregularities and ambiguities make processing more difficult. Structured data, on the other hand, is fielded in databases or semantically tagged in documents. Because of the sheer volume and potential complexity, a number of tools have emerged that have led to growth in the field of data science. A wide range of hardware and infrastructure also supports Big Data processing; for example, data can be held in Data Lakes in the cloud, allowing it to be processed by clusters.

Although the term Big Data is seen as a buzzword in some contexts, the value of data in the manufacturing industry is difficult to overstate. It can enable step-changes in performance across the whole organisation. It is not exclusively about technology; it is about your business. Today it is imperative that manufacturing organisations have a company-wide strategy to acquire and utilise data from the shopfloor to the boardroom. Well-utilised data is directly responsible for addressing long-standing business challenges in the industry.

But where do the opportunities lie specifically? And what are manufacturers of all sizes, from all corners of the industry, doing with data? Although much manufacturing data is specific to manufacturing, these organisations are similar to other commercial enterprises: they hold business and operational data about their partners, suppliers, distributors, finances, inventory, products, marketing and human resources. This extends to data collected via sensors in the products themselves.

As one of the fundamental components of the Industry 4.0 paradigm, data is discussed within the manufacturing industry regularly. Data is not a new concept to manufacturing, at least not when it is treated as evidence to identify, quantify and ultimately improve processes rather than as a purely technological concept. This evidence-based approach initially centred on the ‘lean’ and ‘six-sigma’ practices, normally attributed to Shewhart and Deming, that shaped manufacturing as a field. The tools and techniques found particular application in the Toyota Production System (TPS), embodied in the broader “Toyota Way”. TPS is now over 70 years old and promotes a set of techniques that continuously improve the ratio of value-adding to non-value-adding activities. These were intended to increase the production rate and quality rate, reduce costs and stabilise processes (reduce variability). Whilst the ideas are fully established and ubiquitous in industry, the execution varies significantly.

The introduction of the Industrie 4.0 paradigm, and IT in general, serves as a digital framework for collecting the raw material for these activities: observational data that measures and quantifies processes automatically and continuously. Traditionally, this data would be used in techniques of applied statistics and mathematics. For part dimensions, history and process variability, for example, Statistical Process Control (SPC) charts were used. To measure the duration of a process, a lean practitioner would actively observe it with a stopwatch. Gaining a broad contextual understanding of the manufacturing system would require careful modelling to identify patterns and relationships. These steps still exist in the digital approach, but most are automatic, and the practitioner turns the output of the informatics platform directly into insights. This means that continuous improvement powered by Industrie 4.0 is not slowed down by the up-front collection of data for a particular project; the practitioner simply accesses data that already exists.
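As a rough illustration of the SPC charts mentioned above, the sketch below computes control limits for an individuals (X) chart from a baseline of part dimensions and flags new measurements that fall outside them. The function names and the sample measurements are hypothetical, invented purely for illustration; the constant 2.66 is the usual individuals-chart factor (3 divided by d2 = 1.128 for a moving range of two points).

```python
from statistics import mean


def individuals_control_limits(baseline):
    """Return (lower, centre, upper) limits for an individuals (X) chart.

    `baseline` is a list of measurements taken while the process was
    believed to be stable (an assumption of this sketch).
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline measurements")
    centre = mean(baseline)
    # Moving range: absolute difference between consecutive measurements.
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    # 2.66 = 3 / d2, with d2 = 1.128 for a moving range of two points.
    spread = 2.66 * mean(moving_ranges)
    return centre - spread, centre, centre + spread


def out_of_control(measurements, limits):
    """Return the indices of measurements outside the control limits."""
    lower, _, upper = limits
    return [i for i, x in enumerate(measurements) if x < lower or x > upper]


# Hypothetical part diameters in millimetres.
baseline_mm = [10.02, 9.98, 10.01, 9.99, 10.00, 10.03, 9.97, 10.01]
new_parts_mm = [10.00, 10.02, 10.25, 9.99]

limits = individuals_control_limits(baseline_mm)
flagged = out_of_control(new_parts_mm, limits)
print("limits:", limits)
print("out-of-control indices:", flagged)
```

In an informatics platform the baseline and new measurements would arrive automatically from the shopfloor rather than being typed in, which is the point the paragraph above makes about up-front data collection.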

Biopharmaceuticals are a great example of data-driven manufacturing informatics and process monitoring. The products range from blood compounds to hormones and vaccines. The raw inputs of these production processes are live, genetically engineered cells, and by the end of production over 200 variables have been continuously monitored. Whilst all areas of manufacturing must deal with variation in quality, some have greater demands; the use of concessions in aerospace is one such example. By centralising all the relevant data, important causal relationships can be elicited far more easily. In the case of biopharmaceuticals the business impact can be dramatic because the yield is so sensitive to process variability.
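One simple way to start looking for such relationships, once the data is in one place, is to rank the monitored process variables by how strongly each correlates with yield. The sketch below does this with a Pearson correlation. The variable names and batch values are hypothetical, and a strong correlation is only a candidate for further investigation, not proof of a causal relationship.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant variable carries no correlation information
    return cov / sqrt(var_x * var_y)


def rank_by_yield_correlation(batches, yield_key="yield"):
    """Rank process variables by absolute correlation with yield."""
    yields = [batch[yield_key] for batch in batches]
    variables = [key for key in batches[0] if key != yield_key]
    scores = {
        name: pearson([batch[name] for batch in batches], yields)
        for name in variables
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical batch records; a real process may monitor over 200 variables.
batches = [
    {"temperature_c": 36.5, "ph": 7.0, "yield": 0.90},
    {"temperature_c": 37.0, "ph": 7.2, "yield": 0.85},
    {"temperature_c": 37.5, "ph": 7.0, "yield": 0.80},
    {"temperature_c": 38.0, "ph": 7.2, "yield": 0.75},
]

for name, r in rank_by_yield_correlation(batches):
    print(f"{name}: r = {r:+.2f}")
```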

One of the main use cases of information in continuous improvement is the identification and understanding of root causes. In fact, this activity has its own established method, Root Cause Analysis (RCA). The “root cause” refers to the underlying event or factor that results in a part being classified as “non-conformant” in relation to the design specification. Put as a question, it is essentially “why is this part not within specification, and how has this happened?”. One of the main elements of the method is the principle of the “5 whys”: the application of rational and objective thought in order to identify and understand the underlying causation.

Following this, a solution is implemented, typically a refinement of the manufacturing processes themselves, or perhaps of a management or supervisory process. In some cases the cause may be outside of the organisation's direct control: a defective raw material or part from a supplier, for example. This might trigger the creation of a validation process to check supplied raw materials and input parts. Over time, as these improvements are established, processes become more stable and the quality level increases. It is, however, an iterative process of marginal gains.

There can be statistical regularities that support the process of identifying causal relations, e.g. determining the principal components of N dimensions. The “Big Data” paradigm takes this concept and combines it with more advanced statistical approaches such as anomaly detection. This method holds a model of normal behaviour together with a threshold; when an observation deviates from the model by more than the threshold, it is classified as anomalous.
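The threshold idea can be sketched in a few lines. The example below is a minimal illustration that assumes the "model of normal behaviour" is simply the mean and standard deviation of readings taken during normal operation, with the threshold expressed as a number of standard deviations. The class name, the default threshold of 3.0 and the sensor readings are all hypothetical; production systems typically use richer models.

```python
from statistics import mean, stdev


class ThresholdAnomalyDetector:
    """Flags readings that deviate too far from a learned normal level."""

    def __init__(self, threshold=3.0):
        # Threshold in standard deviations (an assumed default).
        self.threshold = threshold
        self.mu = None
        self.sigma = None

    def fit(self, normal_readings):
        """Learn the model of normal behaviour from known-good readings."""
        self.mu = mean(normal_readings)
        self.sigma = stdev(normal_readings)
        if self.sigma == 0:
            raise ValueError("normal readings must show some variation")
        return self

    def score(self, reading):
        """Distance from normal, measured in standard deviations."""
        return abs(reading - self.mu) / self.sigma

    def is_anomalous(self, reading):
        """True when the reading breaches the threshold."""
        return self.score(reading) > self.threshold


# Hypothetical spindle temperatures (degrees C) during normal operation.
normal = [50.1, 49.8, 50.3, 49.9, 50.0, 50.2, 49.7, 50.0]
detector = ThresholdAnomalyDetector(threshold=3.0).fit(normal)

for reading in [50.4, 51.0]:
    status = "anomalous" if detector.is_anomalous(reading) else "normal"
    print(f"{reading}: score {detector.score(reading):.1f} -> {status}")
```

Keeping the fitted model separate from the threshold means the sensitivity can be tuned later without relearning what normal looks like.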

What does "Big Data" mean?

Steps for Big Data in Manufacturing Organisations