Tech Corner August 20, 2021
The move towards digitalization has had a profound impact on business. Companies had to handle new data sources, and the resulting volumes of data, as efficiently and cost-effectively as possible. They quickly realized that manual data processing could not keep up, and many turned to a hybrid approach that combines human effort with templates.
When you work with highly structured sources such as claims forms, templates can be a straightforward and effective way to extract data, especially at high document volumes. With templates, you essentially draw a frame around the location of the content you want to extract. Because structured documents like forms have fixed fields where the information is housed, a template can reliably pull the details without missing a beat.
What’s critical to understand here is that the location and size of the template “frame” always remain the same, meaning that only the information that fits into this frame will be captured. Think of it as a filter, programmed very precisely to look for specific information in certain places on a file or document.
For example, a rule programmed into a template-based system could be: look at the top left corner of the PDF to capture the sender’s address. The template would look for text with a certain number of lines (say, four), and anything that matched these parameters would be copied into the designated application or repository.
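To make the “frame” idea concrete, here is a minimal Python sketch of such a rule. It assumes the page has already been converted into text lines with positions (in points, with y growing downward); the `TextBox` and `TemplateField` names and the frame coordinates are hypothetical, not part of any particular product.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TextBox:
    """One line of text on a page and the position of its top-left corner."""
    text: str
    x: float
    y: float


@dataclass
class TemplateField:
    """A fixed frame on the page; only text inside it is captured."""
    name: str
    x0: float
    y0: float
    x1: float
    y1: float
    expected_lines: int

    def extract(self, page: list[TextBox]) -> str | None:
        # Keep only the lines that fall inside the frame, ordered top to bottom.
        inside = sorted(
            (b for b in page
             if self.x0 <= b.x <= self.x1 and self.y0 <= b.y <= self.y1),
            key=lambda b: b.y,
        )
        # The rule also checks the shape of the content: exactly N lines.
        if len(inside) != self.expected_lines:
            return None
        return "\n".join(b.text for b in inside)


# Hypothetical rule: the sender's address is four lines in the top-left corner.
sender_address = TemplateField("sender_address", 0, 0, 200, 100, expected_lines=4)

page = [
    TextBox("Acme Corp", 20, 20),
    TextBox("1 Main St", 20, 35),
    TextBox("Springfield", 20, 50),
    TextBox("USA", 20, 65),
    TextBox("Invoice #1234", 400, 20),  # outside the frame, so ignored
]

print(sender_address.extract(page))
```

The rule knows nothing about what an address is; it only knows where one is expected and how many lines it should have.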
Templates may have been getting a bad rap lately, but they can be very cost-effective and perfectly adequate. Of course, it ultimately depends on the type of documents you are trying to extract data from.
As we said, templates are great for extracting structured data. But the reality is that unstructured data makes up more than 80% of all enterprise data, and it is growing at a rate of 65% a year.
There’s enormous value in unstructured data. Think of all the emails, Word documents, Excel spreadsheets, and PDFs. Think of the graphics, diagrams, and charts you find in research reports. Small wonder that financial professionals spend so much time poring over them manually to extract meaning. There’s gold in those hills.
And unless you have the right tools to analyze unstructured data, you are missing a whole lot of information that could be valuable to you and your clients. So, you need a solution that allows you to eliminate all this time-intensive data collection and build a central repository of real-time, auditable data to improve client reporting and investment decision-making.
Unfortunately, solutions that rely solely on templates are missing out on all the rich unstructured data we just talked about.
Templates would work fine if you had only one client and that client used a standard invoicing document. But add another client to the mix, one who lists the amount invoiced in the top left-hand corner of the document, where the template expects to find the address. Or suppose your one client changes its letterhead. What would the template-based system do then? It would still extract whatever falls inside the frame, but that might not be the information you’re looking for.
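A minimal sketch shows why this failure is silent. It assumes each page is a list of (text, x, y) tuples with y growing downward, and a hypothetical frame drawn around the first client’s address; the same frame happily returns the second client’s invoiced amount as if it were an address.

```python
from __future__ import annotations

# Hypothetical frame (x0, y0, x1, y1 in points, y growing downward) that was
# drawn around the first client's sender address.
ADDRESS_FRAME = (0, 0, 200, 100)


def extract_frame(page: list[tuple[str, float, float]]) -> str:
    """Return whatever text falls inside the fixed frame, top to bottom."""
    x0, y0, x1, y1 = ADDRESS_FRAME
    inside = [(y, text) for text, x, y in page
              if x0 <= x <= x1 and y0 <= y <= y1]
    return "\n".join(text for _, text in sorted(inside))


first_client = [("Acme Corp", 20, 20), ("1 Main St", 20, 35),
                ("Amount invoiced: $5,000", 400, 700)]
second_client = [("Amount invoiced: $12,500", 20, 20),
                 ("Globex Ltd", 400, 20), ("9 Elm Rd", 400, 35)]

print(extract_frame(first_client))   # the address, as intended
print(extract_frame(second_client))  # the amount: no error, just the wrong field
```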
One way around this would be to program enough rules into the template to allow for variations in content. But writing rules for every possible exception (the amount invoiced might be at the top of the page, at the bottom, or in the middle) would quickly become a time-consuming undertaking.
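A rough sketch of where this leads, assuming a hypothetical rule table keyed by layout and a US Letter page split into thirds: every layout needs its own hand-written entry, and a layout nobody has seen yet simply has no rule.

```python
from __future__ import annotations

PAGE_HEIGHT = 792.0  # US Letter height in points (assumed page size)

# Hypothetical rule table: which third of the page holds the invoiced amount
# for each known layout. Every new client or letterhead needs another entry.
AMOUNT_REGION = {
    "client_a": "bottom",
    "client_b": "top",
    "client_c": "middle",
}


def region_of(y: float) -> str:
    """Map a vertical position (y growing downward) to a third of the page."""
    if y < PAGE_HEIGHT / 3:
        return "top"
    if y < 2 * PAGE_HEIGHT / 3:
        return "middle"
    return "bottom"


def extract_amount(layout_id: str, page: list[tuple[str, float]]) -> list[str]:
    """Return the lines in the region that the rule table names for this layout."""
    if layout_id not in AMOUNT_REGION:
        raise LookupError(f"no rule for layout {layout_id!r}; someone has to write one")
    wanted = AMOUNT_REGION[layout_id]
    return [text for text, y in page if region_of(y) == wanted]


page_c = [("Globex Ltd", 40.0), ("Amount: $900", 400.0), ("Thank you", 750.0)]
print(extract_amount("client_c", page_c))
```

Even this toy version only narrows the search to a region; real rule sets also have to cope with multi-page documents, tables, and letterhead changes, which is where the maintenance cost grows.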
There’s another way: adding intelligence to data extraction. Technologies such as computer vision, machine learning, and natural language processing allow financial professionals to automatically capture, extract, and process data from unstructured sources such as emails, PDFs, scanned documents, tweets, and handwritten contracts. The technology can then pass the output to humans for review and approval, or automatically advance it to the next stage of the process.
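The post does not say how the review-or-advance decision is made; one common pattern is a confidence threshold. The sketch below assumes the extraction step reports a confidence score per field, and the 0.90 cut-off is an arbitrary illustration rather than a recommended value.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # assumed model score in [0, 1]


REVIEW_THRESHOLD = 0.90  # hypothetical cut-off; tune per field and risk appetite


def route(fields: list[ExtractedField]) -> tuple[list[ExtractedField], list[ExtractedField]]:
    """Split fields into (advance automatically, send to human review)."""
    auto = [f for f in fields if f.confidence >= REVIEW_THRESHOLD]
    review = [f for f in fields if f.confidence < REVIEW_THRESHOLD]
    return auto, review


fields = [
    ExtractedField("invoice_number", "1234", 0.98),
    ExtractedField("amount", "$12,500", 0.72),
]
auto, review = route(fields)
print([f.name for f in auto], [f.name for f in review])
```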
Unlike template-based systems, these AI-powered solutions can locate and extract the desired data in a file without being told where to find it. It could be on page 1 or page 10; it could be in a chart or a table. It doesn’t matter, because the technology is not looking for a location; it is looking for the data itself, using its context.
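As a deliberately simplified illustration of searching by context rather than by position, the sketch below finds an amount by the label next to it, wherever that label appears in the text. It is a regular expression, not the machine learning described here, and the label list is a hypothetical example.

```python
from __future__ import annotations

import re

# Toy stand-in for context-aware extraction: match the value that follows a
# label signalling its meaning, regardless of where on the page it appears.
AMOUNT_PATTERN = re.compile(
    r"(?:amount invoiced|amount due|total)\s*[:\-]?\s*\$?\s*([\d,]+(?:\.\d{2})?)",
    re.IGNORECASE,
)


def find_amount(text: str) -> str | None:
    """Return the first amount that follows a known label, or None."""
    match = AMOUNT_PATTERN.search(text)
    return match.group(1) if match else None


top_of_page = "Amount invoiced: $12,500\nGlobex Ltd\n9 Elm Rd"
bottom_of_page = "Acme Corp\n1 Main St\nConsulting services\nTotal $5,000.00"
print(find_amount(top_of_page), find_amount(bottom_of_page))
```

The same function handles both layouts because it keys on the label, not the coordinates; a trained model generalizes this idea to labels and phrasings nobody listed in advance.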
Such solutions also learn on the job. The more data that passes through these systems, the more accurate they become.
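Real systems improve by retraining models on reviewed data. The toy sketch below only illustrates the feedback loop itself: when a reviewer confirms a value the extractor missed, the label in front of it is remembered, so the next document using that label is handled automatically. The `LabelLearner` class and its methods are hypothetical, and this is not machine learning.

```python
from __future__ import annotations


class LabelLearner:
    """Toy feedback loop: reviewer corrections teach new labels for one field."""

    def __init__(self, labels: set[str]) -> None:
        self.labels = {label.lower() for label in labels}

    def extract(self, text: str) -> str | None:
        """Return the value of the first "label: value" line with a known label."""
        for line in text.splitlines():
            label, sep, value = line.partition(":")
            if sep and label.strip().lower() in self.labels:
                return value.strip()
        return None

    def learn_from_correction(self, text: str, correct_value: str) -> None:
        """Remember the label that preceded the value a reviewer confirmed."""
        for line in text.splitlines():
            label, sep, value = line.partition(":")
            if sep and value.strip() == correct_value:
                self.labels.add(label.strip().lower())


learner = LabelLearner({"amount due"})
doc = "Client: Globex Ltd\nBalance payable: $12,500"
print(learner.extract(doc))  # None: this label is not known yet
learner.learn_from_correction(doc, "$12,500")
print(learner.extract(doc))  # now found automatically
```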
Leading solutions use these technologies to transform raw emails and documents from any enterprise source into targeted, structured data that is delivered continuously to your chosen destination. They typically let you use REST APIs to pull emails and other documents directly from email systems, content management systems, and more, and they support dozens of enterprise file formats.
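As a small, standard-library-only illustration of turning a raw email into a structured record, the sketch below parses a single-part plain-text message with Python’s `email` package and pulls out a few fields. The field names and the “Amount due” pattern are assumptions for illustration; this is not how any particular product or API works.

```python
from __future__ import annotations

import json
import re
from email import message_from_string

RAW_EMAIL = """\
From: ops@globex.example
To: ap@acme.example
Subject: Invoice 1234
Content-Type: text/plain

Please find our invoice attached. Amount due: $12,500
"""

# Hypothetical pattern for one targeted field.
AMOUNT_DUE = re.compile(r"amount due:\s*\$?([\d,]+(?:\.\d{2})?)", re.IGNORECASE)


def email_to_record(raw: str) -> dict[str, str | None]:
    """Turn a single-part plain-text email into a flat, structured record."""
    msg = message_from_string(raw)
    if msg.is_multipart():
        raise ValueError("this sketch only handles single-part messages")
    body = msg.get_payload()
    match = AMOUNT_DUE.search(body)
    return {
        "sender": msg["From"],
        "subject": msg["Subject"],
        "amount_due": match.group(1) if match else None,
    }


record = email_to_record(RAW_EMAIL)
print(json.dumps(record, indent=2))  # ready to hand to a downstream system
```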
If your department mostly deals with standardized documents, such as W-2s or 1099s, a template-based approach might be sufficient. But the reality is that most data sources are messy and complex, and they are always changing: new clients, new types of financial instruments and products, new sources of data about investments… the list goes on.
The bottom line is, even if you have a template that’s been doing a pretty good job, adding an intelligent solution will help you address all those data sources that are still being processed manually.
Try our Alkymi solution for free to see how easy it can be to address your unstructured data challenges.