How can I dynamically read an invoice using Tesseract?

6 days ago 5

I am building an invoice reader and am having trouble reading the actual invoice dynamically. The goal is to upload a picture, or a series of pictures, of a physical invoice (I'm starting with a single picture and will extend it to accept multiple later). Based on keywords the user enters, the matching data is extracted into a list that can be sent to an API. I have no plans to build the API itself; I just want to prototype this project.

So far I have a file uploader that extracts the text from the invoice. From there the user can enter a series of keywords and press a button to start extracting the requested invoice data. The user then selects, via a drop-down, whether the extractor should read the data horizontally or vertically (this doesn't work yet, but I can fix it later). After that, the user presses a button to extract the data based on the chosen direction and keywords.

My problem is that I don't know how to read an invoice dynamically. For example, the table in the invoice I'm testing with is laid out horizontally. That would be fine if each keyword (Item, Quan, etc.) had only one word under it, but the Item column has several entries under it, each made of multiple words, and that throws off my word count. I can't come up with a way for the program to infer how many words to store before moving on to the next keyword without overengineering it.
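One way to avoid counting words at all is to use the word positions the OCR reports instead of the plain text. The sketch below is a hypothetical illustration: `OcrWord` and `splitRowByHeaders` are made-up names, and it assumes you have already copied each word's text and left/right pixel edges out of the OCR result (tesseract.js reports word-level results with a `bbox` of `x0`, `y0`, `x1`, `y1`; check your version's docs for how to request them) and grouped the words of one table row together.

```typescript
// Hypothetical flattened word shape (not a tesseract.js type): the word's text
// plus its horizontal extent in pixels, copied from the OCR word's bbox.
interface OcrWord {
  text: string;
  x0: number; // left edge
  x1: number; // right edge
}

// Splits one table row into cells by horizontal position instead of word count.
// Column boundaries sit halfway through the gap between adjacent header words,
// and each word goes to the column that contains its horizontal center.
function splitRowByHeaders(headers: OcrWord[], rowWords: OcrWord[]): Record<string, string> {
  if (headers.length === 0) return {};
  const sorted = [...headers].sort((a, b) => a.x0 - b.x0);

  // boundaries[i] is the right edge of column i; the last column is open-ended.
  const boundaries = sorted.slice(0, -1).map((h, i) => (h.x1 + sorted[i + 1].x0) / 2);

  const cells: string[][] = sorted.map(() => []);
  const byPosition = [...rowWords].sort((a, b) => a.x0 - b.x0);
  for (const word of byPosition) {
    const center = (word.x0 + word.x1) / 2;
    let col = boundaries.findIndex((b) => center < b);
    if (col === -1) col = sorted.length - 1; // right of every boundary
    cells[col].push(word.text);
  }

  // Multi-word cells (e.g. long item descriptions) are joined back together.
  const row: Record<string, string> = {};
  sorted.forEach((h, i) => {
    row[h.text] = cells[i].join(" ");
  });
  return row;
}
```

With made-up coordinates for headers `Item`, `Quan` and `Price`, a row such as `Blue widget (large) 2 19.99` comes back as `{ Item: "Blue widget (large)", Quan: "2", Price: "19.99" }` no matter how many words the description has. This assumes the text in each column stays within its own horizontal band, which is usually true for printed invoice tables.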

One idea so far is to split up the keywords and have the user assign each one a characteristic, such as "number" or "word", so that I can easily cut up and divide the words under it. I don't like requiring the extra user input, but it doesn't seem too unreasonable once you consider the time saved: you would only need to configure the program once per invoice layout, and then you could pass any number of invoices of the same type through it. I've seen the same approach in production inventory software. Still, this feels overengineered. Are there any better ideas?
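The per-column idea above can be sketched fairly compactly. This is a hypothetical illustration, not an existing API: `ColumnSpec`, `parseRow` and the `"text"`/`"number"` kinds are invented names, and it assumes each table row comes out of the OCR as a single line with the columns in the configured order.

```typescript
// Hypothetical per-column setting the user would configure once per invoice layout.
type ColumnKind = "text" | "number";

interface ColumnSpec {
  keyword: string;
  kind: ColumnKind;
}

// Treats tokens like "2", "19.99", "$19.99" or "1,200" as numbers.
const isNumber = (token: string): boolean => /^\$?\d[\d,]*(\.\d+)?$/.test(token);

// Parses one OCR line (e.g. "Blue widget (large) 2 $19.99") against the column template.
// "number" columns consume exactly one numeric token; "text" columns consume every
// token up to the next numeric one. Returns null if the line doesn't fit the template.
function parseRow(line: string, columns: ColumnSpec[]): Record<string, string> | null {
  const tokens = line.trim().split(/\s+/).filter(Boolean);
  const row: Record<string, string> = {};
  let i = 0;

  for (const col of columns) {
    if (col.kind === "number") {
      if (i >= tokens.length || !isNumber(tokens[i])) return null;
      row[col.keyword] = tokens[i++];
    } else {
      const words: string[] = [];
      while (i < tokens.length && !isNumber(tokens[i])) words.push(tokens[i++]);
      if (words.length === 0) return null;
      row[col.keyword] = words.join(" ");
    }
  }

  // Leftover tokens mean the line has more fields than the template expects.
  return i === tokens.length ? row : null;
}
```

Returning `null` for non-matching lines also filters out headers, subtotals and other non-table text. The approach breaks down if a description itself contains a number (a model number, for instance) or if two text columns sit next to each other, since the first text column would swallow both.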

Invoice example: [image]

Extraction function that parses text data:

```typescript
// textResult, orientation and keywords come from component state;
// isNumeric is a helper defined elsewhere (not shown).
const extractData = () => {
  if (!textResult) return [];

  const dataList: { keyword: string; value: string }[] = [];
  console.log('Extract Start');

  if (orientation === 'row') {
    const lines = textResult
      .split('\n')
      .map((l) => l.trim())
      .filter(Boolean);

    lines.forEach((line, lineIndex) => {
      const words = line.split(/\s+/);
      console.log(`Line ${lineIndex}:`, words);

      keywords.forEach((keyword) => {
        const lowerKeyword = keyword.toLowerCase();
        const keywordIndex = words.findIndex((w) => w.toLowerCase() === lowerKeyword);
        if (keywordIndex !== -1) {
          console.log(`Found keyword "${keyword}" at index ${keywordIndex} in line ${lineIndex}`);

          // Everything after the keyword until the next numeric (or end of line)
          const valueWords: string[] = [];
          for (let i = keywordIndex + 1; i < words.length; i++) {
            if (isNumeric(words[i])) break;
            valueWords.push(words[i]);
          }

          const value = valueWords.join(' ');
          console.log(`Extracted value for "${keyword}":`, value);
          dataList.push({ keyword, value });
        }
      });
    });
  } else {
    // Column-wise: treat the whole text as one stream of words
    const words = textResult.split(/\s+/).map((w) => w.trim());
    console.log('Words for column:', words);

    keywords.forEach((keyword) => {
      const lowerKeyword = keyword.toLowerCase();
      const startIndex = words.findIndex((w) => w.toLowerCase() === lowerKeyword);
      if (startIndex === -1) {
        dataList.push({ keyword, value: '' });
        console.log(`Keyword "${keyword}" not found`);
        return;
      }
      console.log(`Keyword "${keyword}" found at index ${startIndex}`);

      // Everything after the keyword until the next keyword (or end of text)
      const valueWords: string[] = [];
      for (let i = startIndex + 1; i < words.length; i++) {
        if (keywords.some((k) => k.toLowerCase() === words[i].toLowerCase())) break;
        valueWords.push(words[i]);
      }

      const value = valueWords.join(' ');
      console.log(`Extracted value for "${keyword}":`, value);
      dataList.push({ keyword, value });
    });
  }

  console.log('Final Extracted Data:', dataList);
  console.log('--- Extract Data End ---');
  return dataList;
};
```
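The row branch of the function above calls an `isNumeric` helper that isn't shown in the question. A minimal sketch of what it might look like, assuming amounts use `$` as the currency symbol and commas as thousands separators (adjust the stripped characters for other locales):

```typescript
// Assumed helper: true for tokens such as "2", "19.99", "-3" or "$1,250.00".
// Strips "$" and "," first, then requires an optional minus sign, digits,
// and an optional decimal part.
function isNumeric(token: string): boolean {
  const cleaned = token.replace(/[$,]/g, "");
  return /^-?\d+(\.\d+)?$/.test(cleaned);
}
```

Using a regular expression rather than `Number(token)` avoids treating inputs like `"1e5"` or `"0x10"` as amounts, which OCR noise can occasionally produce.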