Document data extraction
Vault is an AI-powered document data extraction solution.
Vault uses cutting-edge large language model and knowledge graph technology to structure the unstructured data in your documents.
Vault automatically detects a document's important terms
Vault can detect a document's type and structure and understand its substance.
Specify custom document data extraction tasks
Vault can track the data points that are most important and relevant to your business.
Automate document data extraction now!
Drag and drop files, or connect Vault to popular file storage systems, to convert your organisation's documents into a searchable database.
How we built Vault
Vault is powered by a large language model trained on thousands of contracts and financial documents, which enables it to accurately extract key information from your business-critical documents. TextMine's large language model is self-hosted, so your data stays within TextMine and is not sent to any third party. Moreover, Vault is flexible: it can process documents it has not seen before and respond to custom queries.
Large language model data extraction vs OCR
Optical character recognition (OCR) is a great technology for extracting text from PDF documents and is a core part of Vault's document import module. However, Vault's document data extraction is not limited to matching keywords. Because Vault understands what the text means, it treats a start date the same way as a commencement date and can distinguish a start date from an end date. Moreover, Vault can infer implicit information from a document. For example, it can judge whether a job title in an employment contract is junior or senior, which helps determine whether the remaining terms and conditions of employment are reasonable.
What kinds of document data extraction can Vault do?
Vault's large language model has been fine-tuned to extract key data from a broad range of documents and contracts, including, but not limited to, supplier agreements, invoices, order forms, consultancy agreements, NDAs and tender agreements. The list of document types Vault has been trained on grows every week and can be expanded to meet specific client requirements. Get in touch if you would like to discuss your specific needs and document data extraction requirements!
Watch a video of Vault extracting data from documents
Which use cases can Vault solve?
Vault can augment new or existing document data extraction workflows in a wide range of domains, including, but not limited to, procurement, compliance and transport. To find out more, visit our solutions section.
TextMine is also suitable for other industries
TextMine offers document data extraction, table extraction and question answering for documents in healthcare, construction, manufacturing and property. Get in touch if you would like to discuss your document data extraction use cases!