Alkymi CTO on Lessons Learned Implementing Active Learning
Over 11,000 people attended the Amazon Web Services (AWS) Summit in New York, where they had the opportunity to learn how Alkymi uses active learning to enable self-automation. In his talk, "ML on AWS + Kubernetes," Alkymi’s CTO Steven She walked the audience through how Alkymi leverages AWS for machine learning.
How Alkymi uses active learning for document understanding
In the financial services industry, data often arrives as documents sent via email. A business analyst's typical workflow includes sifting through those documents to locate critical information, then copying and pasting it into a structured data store, such as Excel. This is an arduous, risky process that can leave data unused or wrong.
Alkymi uses computer vision to identify the basic components of the document, such as tables and charts, and Natural Language Processing (NLP) and machine learning to understand the intention of the text and summarize its content. The computer vision step works much like that of a self-driving car: it applies labels to elements of the page, such as charts, tables, and paragraphs.
Next, a selection model employs active learning to understand what the user is looking for and interactively trains the product to extract that data. The system identifies low-confidence documents and sends them to an annotation queue, where they are presented to the user to label. That information is retained using SageMaker. Alkymi uses SageMaker on Kubernetes, allowing for real-time training, deployment, and prediction loops between the user and the model. In seconds, the newly learned information is applied to the subsequent unlabeled data, and the user immediately benefits by saving time. The result is a more accurate model that requires less labeled data.
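The step that routes low-confidence documents to an annotation queue can be illustrated with a minimal sketch of uncertainty sampling. This is not Alkymi's implementation: the function names, the 0.8 threshold, and the sample probabilities are all hypothetical, and the sketch assumes the model returns a list of class probabilities for each document.

```python
from typing import Dict, List


def confidence(probabilities: List[float]) -> float:
    """Treat the highest class probability as the model's confidence."""
    return max(probabilities)


def build_annotation_queue(
    predictions: Dict[str, List[float]],
    threshold: float = 0.8,  # hypothetical cutoff, chosen for illustration
) -> List[str]:
    """Return IDs of documents whose confidence falls below the threshold,
    ordered so the least confident document is labeled first."""
    low_confidence = [
        (confidence(probs), doc_id)
        for doc_id, probs in predictions.items()
        if confidence(probs) < threshold
    ]
    low_confidence.sort()  # ascending confidence
    return [doc_id for _, doc_id in low_confidence]


# Hypothetical model output: class probabilities per document.
predictions = {
    "doc_a": [0.95, 0.05],
    "doc_b": [0.55, 0.45],
    "doc_c": [0.70, 0.30],
}

queue = build_annotation_queue(predictions)
print(queue)  # ['doc_b', 'doc_c']
```

In a full loop, the labels collected for the queued documents would be fed back into training, and the refreshed model would re-score the remaining unlabeled documents.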
From the presentation abstract:
“The process of automating workflows with machine learning models often requires a significant amount of labeled data. Acquiring this data can be a costly and time-consuming process. Active learning is a type of machine learning that reduces the amount of labeled data required by allowing the model to select which examples will be labeled. In this talk, we describe the challenges and solutions Alkymi has encountered while implementing active learning on AWS using Kubernetes and SageMaker.”