An Introduction to Machine Learning
Since the 1960s, computer scientists have tried to teach intelligence to computers and most of this work falls under the field of artificial intelligence. Machine learning is a subfield of artificial intelligence which aims to teach machines to recognise and model patterns in data. Although machine learning has had many waves of popularity since then, its performance has only caught up with the hype and expectations over the past couple of years. This has been enabled by new model architectures such as transformers and the availability of mass compute power and data. This article provides an introduction to machine learning, the different types of machine learning models and how to build your own machine learning models.
What is Machine Learning?
Machine learning is a type of artificial intelligence (AI) in which algorithms improve their performance on a task by learning from data rather than by following hand-written rules. Through pattern recognition and decision-making, computers that use machine learning may be able to perform tasks that they were not explicitly programmed to do.
How Machine Learning Works
Machine learning consists of three elements: data, models, and an algorithm.
- Data
Data is a collection of inputs and outputs that a programmer wants an algorithm to work with. Data cannot do anything by itself, but in the right framework, it may be useful. For example, a dataset might include customer records containing underlying patterns that are useful for optimising pricing or purchasing.
- Models
Models represent the relationship between input data and its output. There are different kinds of models, such as linear regression models for predicting continuous values or neural network models for modelling complex, non-linear patterns in the data.
- Algorithm
The algorithm is an implementation of the model and describes how the machine is trained. It defines the model’s parameters and, as the model is trained on the data, attempts to reduce the difference between the predicted outputs and the actual outputs.
The process by which a machine learning algorithm models the underlying patterns in the data is referred to as the training of the machine learning model. During this step, a portion of the data set is withheld from training so that the fitted model can be tested on data it has not seen. The parameters of the algorithm are fine-tuned during the training and testing phase to minimise overfitting on the training data and maximise performance on the test data.
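The three elements above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a production recipe: the data is synthetic, the model is a straight line, the algorithm is gradient descent, and the names `predict`, `train`, `learning_rate` and `epochs` are chosen purely for this example.

```python
import random

def predict(x, w, b):
    """Model: a straight line relating input x to a predicted output."""
    return w * x + b

def mean_squared_error(points, w, b):
    """Average squared gap between predicted and actual outputs."""
    return sum((predict(x, w, b) - y) ** 2 for x, y in points) / len(points)

def train(points, learning_rate=0.05, epochs=2000):
    """Algorithm: gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(points)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (predict(x, w, b) - y) * x for x, y in points) / n
        grad_b = sum(2 * (predict(x, w, b) - y) for x, y in points) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Data: synthetic points scattered around y = 3x + 1 (illustrative values only).
rng = random.Random(0)
points = [(i / 10, 3.0 * (i / 10) + 1.0 + rng.gauss(0, 0.1)) for i in range(50)]

# Withhold every fifth point for testing; train on the rest.
test_points = points[::5]
train_points = [p for i, p in enumerate(points) if i % 5 != 0]

w, b = train(train_points)
print(f"w={w:.2f}, b={b:.2f}")
print(f"train MSE={mean_squared_error(train_points, w, b):.4f}")
print(f"test MSE={mean_squared_error(test_points, w, b):.4f}")
```

If the test error is much higher than the training error, that is usually a sign of overfitting on the training data.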
Machine Learning vs Generative AI
Machine learning relies on predictions that are based on data, while generative AI attempts to create new content by mirroring the underlying properties of existing data.
Both of these techniques can provide value and can work together to create an autonomous, predictive, generative system.
Types of Machine Learning
There are three main types of machine learning: supervised learning, unsupervised learning, and reinforcement learning. Regardless of their differences, all three techniques rely on data to work.
Supervised Learning
Supervised learning involves training a computer using labelled data (each input corresponds to an output).
During the training process, a machine learning algorithm learns a mapping from inputs to outputs, which it then uses to make predictions on new inputs.
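As a small sketch of learning from labelled data, the example below uses a nearest-neighbour classifier, one of the simplest supervised techniques. The customer features, the labels and the function name `predict_1nn` are hypothetical and chosen only for illustration.

```python
import math

def predict_1nn(labelled_examples, point):
    """Return the label of the labelled example closest to the new point."""
    _, nearest_label = min(
        labelled_examples, key=lambda example: math.dist(example[0], point)
    )
    return nearest_label

# Labelled data: (monthly_spend, visits_per_month) -> customer segment.
labelled = [
    ((20.0, 1.0), "occasional"),
    ((25.0, 2.0), "occasional"),
    ((200.0, 8.0), "regular"),
    ((220.0, 10.0), "regular"),
]

print(predict_1nn(labelled, (30.0, 2.0)))
print(predict_1nn(labelled, (210.0, 9.0)))
```

In practice the features would usually be scaled first, so that one large-valued feature such as spend does not dominate the distance.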
Unsupervised Learning
Unsupervised learning is similar to supervised learning, but the algorithm is given no corresponding output data.
In this case, the algorithm has to rely on techniques that do not need output data, such as association and clustering. For example, a supermarket might want to run a clustering algorithm on its customer records to identify similar groups of customers.
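The supermarket example can be sketched with k-means, a common clustering algorithm. This is a simplified one-dimensional version with made-up spend figures and hand-picked starting centroids, all of which are assumptions for illustration only.

```python
def kmeans_1d(values, starting_centroids, iterations=10):
    """Group numbers around centroids by repeated assign-and-update steps."""
    centroids = list(starting_centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Update step: move each centroid to the mean of its cluster
        # (a centroid with an empty cluster stays where it is).
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical weekly spend per customer; no labels are provided.
weekly_spend = [12, 15, 14, 95, 102, 98, 13, 101]
centroids, clusters = kmeans_1d(weekly_spend, starting_centroids=[0, 50])
print(centroids)
print(clusters)
```

The algorithm is never told which customers are "low" or "high" spenders; the two groups emerge from the data alone.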
Reinforcement Learning
Lastly, reinforcement learning allows a machine to make its own choices, but the programmer rewards the “correct” choices. When exposed to enough reward scenarios, a machine can learn to make more correct choices over time. Reinforcement learning is often used in conjunction with other models to improve the training of these models.
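A minimal sketch of this reward loop is the epsilon-greedy strategy for a multi-armed bandit, a standard textbook setting. The reward probabilities, the `epsilon` value and the function name `run_bandit` are assumptions made for this example.

```python
import random

def run_bandit(reward_probabilities, steps=5000, epsilon=0.1, seed=0):
    """Learn which action pays off most by trying actions and observing rewards."""
    rng = random.Random(seed)
    n_actions = len(reward_probabilities)
    counts = [0] * n_actions      # how often each action was chosen
    values = [0.0] * n_actions    # running average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: try a random action
        else:
            action = max(range(n_actions), key=lambda a: values[a])  # exploit
        # The environment rewards the choice with a fixed hidden probability.
        reward = 1.0 if rng.random() < reward_probabilities[action] else 0.0
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
    return values, counts

values, counts = run_bandit([0.2, 0.5, 0.8])
print(values)
print(counts)
```

Over enough steps, the action with the highest reward probability should end up being chosen most often, which is the "more correct choices over time" behaviour described above.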
The Benefits of Machine Learning
Machine learning can unlock unique insights from an organisation’s data, automate repetitive tasks and increase productivity. Below are a few examples of how machine learning can benefit companies.
Task automation
With enough data, machine learning algorithms can learn how to perform certain tasks. This means that these tasks can be automated by an algorithm. For example, with enough customer support data, a machine learning algorithm can automate the responses to generic and frequently asked questions.
Pattern recognition
Patterns may exist in your data which may or may not be known by your organisation. Using machine learning to model these patterns will allow you to make systematic decisions based on this data. For example, an organisation’s sales may have seasonal patterns which, if modelled correctly, could allow the organisation to better manage its resources.
Predictions
Machine learning models are masters of prediction: given input data, they can predict an outcome. This can allow an organisation to plan better and unlock unique opportunities. For example, a model can be trained to predict whether a customer is likely to buy an add-on, which will allow a company to optimise how it delivers upsell opportunities to these customers.
Informed decision-making
Once a machine learning model has come to conclusions, a person can take that information and use it to make future decisions. For example, procurement leaders use TextMine to better understand the impact of renewals on their financial budgets.
Anomaly detection
Machine learning models can detect unusual or fraudulent patterns in data. For instance, if there are some strange transactions coming from a person’s bank account, a machine learning model can flag them so that the theft can be stopped.
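To make the idea concrete, here is a simple statistical baseline rather than a trained model: it flags amounts that sit far from the median, using the modified z-score based on the median absolute deviation. The transaction amounts, the 3.5 threshold and the function name `flag_anomalies` are assumptions for illustration.

```python
import statistics

def flag_anomalies(amounts, threshold=3.5):
    """Return the amounts whose modified z-score exceeds the threshold."""
    median = statistics.median(amounts)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = statistics.median([abs(amount - median) for amount in amounts])
    if mad == 0:
        return []  # at least half the values equal the median; score is undefined
    return [
        amount for amount in amounts
        if 0.6745 * abs(amount - median) / mad > threshold
    ]

# Hypothetical card transactions in pounds.
transactions = [42, 38, 45, 40, 39, 41, 43, 950]
print(flag_anomalies(transactions))
```

The median is used instead of the mean because a single extreme transaction can inflate the mean and standard deviation enough to hide itself in a small sample.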
Language understanding
Machine learning models can be applied to natural language processing (NLP), which analyses and interprets human language. Interestingly, neural networks and generative models can be used in this case to understand the language and apply context to it.
Continuous learning and adaptation over time
The most fascinating aspect of machine learning is that it is adaptable. As the model takes on new information, it can make changes to its parameters to ensure that it is keeping up with the changes to the underlying patterns in the data.
How To Build Your Own Machine Learning Model
Organisations can now build their own machine learning models either from scratch or with software like EvoML. However, in order to understand whether your problem can be solved with machine learning, it’s important to follow these steps.
Define the problem
Knowing that you want to build a machine learning model is important, but equally important is defining the problem that you hope the model will solve. This can be achieved by defining the questions you are looking to answer. How you define these questions will impact the model’s performance and the availability of data. It’s important to simplify the questions where possible, as framing them more simply can, in certain cases, increase the amount of data which is available.
Collect and prepare training data
Most organisations will have data sitting in internal databases, software platforms or cloud warehouses. Organisations may wish to enrich their data with external data, and this can be done manually, with web scrapers, or by purchasing third-party data sets. It is important to ensure that the data is clean, unbiased and ready to be used in your model’s training regime. Unstructured data like documents can be structured into tabular data for further analysis.
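As a rough sketch of turning unstructured text into tabular rows, the example below pulls fields out of short invoice sentences with a regular expression. The document wording, the field names and the pattern are all hypothetical; real documents usually need far more robust extraction.

```python
import re

# Hypothetical document snippets with a predictable wording.
documents = [
    "Invoice INV-001 issued to Acme Ltd for 1200.50 GBP",
    "Invoice INV-002 issued to Globex for 310.00 GBP",
    "Meeting notes with no invoice details",
]

PATTERN = re.compile(
    r"Invoice (?P<invoice_id>\S+) issued to (?P<customer>.+) "
    r"for (?P<amount>\d+\.\d+) (?P<currency>[A-Z]{3})"
)

def to_row(text):
    """Return a dict of extracted fields, or None if the text does not match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    row = match.groupdict()
    row["amount"] = float(row["amount"])  # store the amount as a number
    return row

# Keep only the documents that produced a row.
rows = [row for row in map(to_row, documents) if row is not None]
for row in rows:
    print(row)
```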
Split into training/testing sets
The data that you obtain should be split into training and testing sets. This is important to ensure that the predictive power of the trained model generalises to data it has not seen before. An 80/20 split is a common starting point but can be varied based on the data set size and the nature of the problem you are looking to solve.
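A shuffled 80/20 split can be sketched as follows. The function name `train_test_split`, its parameters and the fixed seed are assumptions for this example; libraries such as scikit-learn provide their own helpers for the same job.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle the rows and return (training_rows, testing_rows)."""
    shuffled = list(rows)                   # copy so the input is left untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed makes the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

records = list(range(100))  # stand-in for 100 data rows
train_rows, test_rows = train_test_split(records)
print(len(train_rows), len(test_rows))
```

Shuffling before splitting matters when the data is ordered, for example by date or by customer, although time-series problems often call for a chronological split instead.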
Pick a machine learning algorithm
Consider your problem and pick the most appropriate machine learning algorithm. For example, complex machine learning tasks with large amounts of available data may need a neural network or deep learning approach. However, neural networks are not natively explainable in the way that linear machine learning algorithms are. It is therefore important to consider factors like explainability depending on your use case.
Train the model and check its performance
During the training phase, the model will try to fit its parameters to the training data and an optimisation algorithm will see how the model performs on the test data. The optimisation algorithm will iterate through the model’s parameters until its precision, accuracy, or mean squared error is acceptable.
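The three metrics mentioned above can be computed by hand as in the sketch below. The label and prediction values are made up for illustration, and treating `1` as the positive class is an assumption of this example.

```python
def accuracy(actual, predicted):
    """Share of predictions that match the actual labels."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def precision(actual, predicted, positive=1):
    """Share of positive predictions that were actually positive."""
    actual_for_positive_predictions = [
        a for a, p in zip(actual, predicted) if p == positive
    ]
    if not actual_for_positive_predictions:
        return 0.0  # no positive predictions were made
    hits = sum(a == positive for a in actual_for_positive_predictions)
    return hits / len(actual_for_positive_predictions)

def mean_squared_error(actual, predicted):
    """Average squared difference, used for continuous predictions."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Classification example: 1 = "will buy the add-on", 0 = "will not".
labels = [1, 0, 1, 1, 0, 0, 1, 0]
guesses = [1, 0, 0, 1, 1, 0, 1, 0]
print(accuracy(labels, guesses))    # 6 of 8 correct -> 0.75
print(precision(labels, guesses))   # 3 of 4 positive predictions correct -> 0.75

# Regression example with continuous values.
print(mean_squared_error([2.0, 3.5, 5.0], [2.5, 3.5, 4.0]))
```

Which metric counts as "acceptable" depends on the problem: precision matters when false positives are costly, while mean squared error only applies to continuous outputs.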
Deploy for predictions
Once the model has met all of the criteria that you have set for it, it’s time to deploy it into production. When deploying models into production, it’s important to monitor the model’s performance on fresh data, as this data may be completely different from what the model saw during training. If this is the case and the model performs badly, consider retraining the model on more data.
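One simple way to monitor a deployed model is to compare its error on fresh data with the error measured at test time. The sketch below assumes a hypothetical rule of thumb (retrain when the fresh error exceeds the baseline by 50%); the function name, the `tolerance` value and the error figures are illustrative only.

```python
def needs_retraining(baseline_error, fresh_errors, tolerance=1.5):
    """Flag when the mean error on fresh data exceeds baseline_error * tolerance."""
    fresh_mean = sum(fresh_errors) / len(fresh_errors)
    return fresh_mean > baseline_error * tolerance

baseline = 0.10  # error measured on the test set before deployment

print(needs_retraining(baseline, [0.09, 0.11, 0.12]))  # close to baseline
print(needs_retraining(baseline, [0.25, 0.30, 0.20]))  # well above baseline
```

The right tolerance depends on how costly a bad prediction is, so it is worth agreeing on it before the model goes live.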
Conclusion
Machine learning algorithms are now widely used within organisations even if their applications are not always known. They have a lot of potential to unlock productivity and innovation as more and more executives become aware of the value they can create. Organisations wishing to unlock the value of the unstructured data in their documents can consider using platforms like TextMine. If this is the case, feel free to get in touch with a member of our team for a demo!
About TextMine
TextMine is an easy-to-use data extraction tool for procurement, operations, and finance teams. TextMine encompasses 3 components: Legislate, Vault and Scribe. We’re on a mission to empower organisations to effortlessly extract data, manage version controls, and ensure consistent access across all departments. With our AI-driven platform, teams can effortlessly locate documents and collaborate seamlessly across departments, making the most of their business data.