OCR and AI combined
Get a feel for what Coda Brain may be like
In a previous blog post from June 13th, 2024, I discussed Rickard’s OCR pack.
This post explores the enhanced capabilities of the pack. The most striking improvement is its ability to create new rows based on predefined values in the column names. You’ll work with two tables: one for storing PDFs and another for holding the extracted information from each PDF. These tables are linked through the prompt logic.
This logic allows you to, for example, scan contracts to identify signatories and signing dates, even with low-quality text and vague images (like signatures). Instead of manual extraction by a clerk or intern, this pack automatically creates a new row in a table for each contract and populates the columns with the extracted information. This exemplifies efficient data storage in Coda, maintaining a clear row-by-row structure and keeping data relationships intact.
The Pack’s Setup
The pack has been updated to be simpler and more powerful. Instead of using the AI provided by Coda or a specific pack (like Gemini), you now select your preferred engine and pay only for what you use, through a simple and affordable pay-as-you-go model.
The pack’s structure is shown below. It offers 5 easy-to-understand options.
Numbers 6 and 7 relate to the table storing the PDFs. We want to track when the button is pressed. Once pressed and executed, the button label changes to “extracted” and the button itself gets disabled.
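The run-once behaviour of the button can be modelled in a few lines. This is a minimal Python sketch, not Coda formula code and not the pack's own implementation; the `PdfRow` class, the `pressed_at` field, and the pre-click label "Extract" are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class PdfRow:
    """Hypothetical stand-in for one row of the table that stores PDFs."""
    file_name: str
    pressed_at: Optional[datetime] = None  # assumed tracking column

    @property
    def button_label(self) -> str:
        # "extracted" comes from the article; "Extract" is an assumed default.
        return "extracted" if self.pressed_at else "Extract"

    @property
    def button_disabled(self) -> bool:
        # The button is disabled as soon as a press has been recorded.
        return self.pressed_at is not None

    def press(self, now: datetime) -> bool:
        """Record the click once; return False if the row was already processed."""
        if self.button_disabled:
            return False
        self.pressed_at = now
        return True
```

Tracking the press in the row itself means the label and the disabled state both derive from a single value, so they cannot drift apart.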
The button
The pack is executed via a button. The button asks for a few variables:
1. User linked to payment: You receive 100 free credits to test the pack.
2. Token: This is a unique code sent to you by Rickard via email after you make a purchase. I bought 4000 credits for $20. This token is hidden for security.
3. Column storing the PDF file: This specifies where your PDFs are located.
4. Prompt: This is divided into two parts:
   - Part A references a prompt stored in my prompt table (more on this later).
   - Part B ensures a link between the PDF table and the table containing the extracted information.
5. AI type: Different options are available, and you can choose the one that best suits your needs. For testing, I opted for the “mini” version, which is more affordable than GPT-4o (the “o” stands for “omni”). Only the GPT versions support adding rows, which is the option we are after.
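As a quick worked example of the pay-as-you-go arithmetic: the purchase mentioned above (4000 credits for $20) works out to half a cent per credit. The sketch below only restates that division. How many credits a given PDF consumes is not stated here, so `credits_used` is a hypothetical input.

```python
from decimal import Decimal

PURCHASE_PRICE_USD = Decimal("20")  # from the article
PURCHASED_CREDITS = 4000            # from the article


def price_per_credit() -> Decimal:
    """Price of a single credit in US dollars."""
    return PURCHASE_PRICE_USD / PURCHASED_CREDITS


def cost_usd(credits_used: int) -> Decimal:
    """Cost of a run, given a hypothetical number of credits consumed."""
    return price_per_credit() * credits_used
```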
The prompt
Let’s delve into the prompt, the heart of this pack. The OCR process reads all the data from the file stored on the server and can handle files of up to 150 pages. It then sends this data to the AI server, which extracts the information you request. This request is based on the column names you define in your prompt. The AI then populates the corresponding columns with the relevant data. This is the “4A” part of the prompt, which references the table shown below.
The prompt’s content provides specific instructions to the AI, guiding it on where to place the extracted data within your Coda table. It acts like a roadmap, telling the AI how to navigate the information and organize it correctly.
The “4B” section, highlighted in the blue box, instructs the AI to not only execute the prompt logic but also to capture the name of the PDF’s table and insert it into the designated column. This setup allows for a clear link between the two tables, which is crucial for working with structured data in Coda.
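To make the two-part prompt idea concrete, here is a minimal Python sketch of how such a prompt could be assembled from column names and how a structured reply could become a row. It is not the pack's actual implementation: the function names, the JSON reply format, the `Source PDF` link column, and the sample values are all assumptions for illustration, and no real AI service is called.

```python
import json
from typing import Dict, List


def build_prompt(columns: List[str], link_column: str, pdf_name: str) -> str:
    """Assemble a two-part prompt: what to extract (A) and how to link back (B)."""
    part_a = (
        "Extract the following fields from the document and return them "
        "as a JSON object with exactly these keys: " + ", ".join(columns) + "."
    )
    part_b = f'Also set the key "{link_column}" to "{pdf_name}".'
    return part_a + "\n" + part_b


def response_to_row(raw: str, columns: List[str], link_column: str) -> Dict[str, str]:
    """Turn the model's JSON reply into a row, keeping only the expected columns."""
    data = json.loads(raw)
    expected = columns + [link_column]
    # Missing keys become empty strings so every row has the same shape.
    return {name: str(data.get(name, "")) for name in expected}


if __name__ == "__main__":
    cols = ["Signatory", "Signing date"]  # column names from the contract example
    print(build_prompt(cols, "Source PDF", "contract-01.pdf"))
    # A canned, made-up reply standing in for the AI response.
    canned = '{"Signatory": "Jane Doe", "Signing date": "2024-06-13", "Source PDF": "contract-01.pdf"}'
    print(response_to_row(canned, cols, "Source PDF"))
```

Driving both the request and the row shape from the same list of column names is what keeps the extracted table consistent: any key the model adds beyond those columns is ignored, and any it omits shows up as an empty cell.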
Getting a feel for how Coda Brain will operate
This pack offers a glimpse into the future of Coda with Coda Brain. For those unfamiliar, here’s a quick overview:
Coda Brain is an AI-powered application within Coda that can access all your workspace information, including both structured and unstructured data within your docs. It also connects to external data sources through packs, like Google Drive, Dropbox, or even company-specific packs (e.g., for time sheets and paid time off). This makes all your data accessible in one central location.
Furthermore, Coda Brain respects your workspace access rights. If you don’t have permission to view a specific document or a Dropbox folder containing sensitive information (like promotions), you won’t be able to access it through Coda Brain.
Essentially, Coda Brain grants you access to information, generates responses based on that information, and adheres to established permissions.
When you click the button to process a file, the extracted information is neatly organized in a table according to your specified column names. This streamlined process, achieved with a structured prompt, offers a glimpse into the exciting possibilities of Coda Brain.
While the current functionality may be less extensive than what Coda Brain will eventually offer, it provides a similar sense of excitement and efficiency. The ability to extract data from files and have it automatically appear in a structured table is truly remarkable.
I highly recommend that all Coda makers explore this pack. I have no affiliation with the pack’s creator and receive no benefits for endorsing it. My enthusiasm stems solely from the opportunity to unlock valuable data that was previously hidden within files. This pack makes that data readily accessible with a simple click.
Give it a try and see the magic for yourself!
I hope this article was informative and helpful. Did it help you solve a problem you likely would not have solved otherwise? What about a donation?
My name is Christiaan, and I regularly blog about Coda. While this article is free, my professional services (including consultations) are not, but I’m always happy to chat and explore potential solutions. You can find my free contributions in the Coda Community and on X. The Coda Community is a fantastic resource for free insights, especially when you share a sample doc.
More about Coda AI and Coda Brain: