Inject hyperlinks in Coda
Turn your transcript interactive
A while ago I wrote my first blog on how to extract the time stamps from a messy, AI-generated transcript of any YouTube video. Last week I noticed an error in my regex, which I corrected with the version below that also handles a hyphen.
.RegexExtract("([A-Za-z\-]+)|([^A-Za-z]+)", "gi")
Coda your YouTube transcripts
Using RegexExtract() to get a searchable text — part I
huizer.medium.com
The next exercise describes how we relate time stamps to hyperlinked values.
This third and last blog is about replacing the time stamps with the hyperlinked values. Below you see the hyperlinks I related to the time stamps. You can see that 4 min 44 is equal to 284 seconds (4 × 60 + 44). I explained this in the previous blog.
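To make the conversion concrete, here is a minimal sketch in Python (not Coda formula language) of how a time stamp maps to seconds and then to a link. The function names and the `VIDEO_ID` placeholder are hypothetical; the `t=…s` query parameter is the usual way to open a YouTube video at a given second.

```python
def timestamp_to_seconds(stamp: str) -> int:
    """Convert 'm:ss' or 'h:mm:ss' into a number of seconds."""
    seconds = 0
    for part in stamp.split(":"):
        # Each step to the right is worth 60 times less than the previous one.
        seconds = seconds * 60 + int(part)
    return seconds


def youtube_link(video_id: str, stamp: str) -> str:
    """Build a link that opens the video at the given time stamp."""
    return f"https://www.youtube.com/watch?v={video_id}&t={timestamp_to_seconds(stamp)}s"


# VIDEO_ID is a placeholder, not a real video.
print(youtube_link("VIDEO_ID", "4:44"))
```

The same arithmetic works for videos longer than an hour, because every extra colon-separated part simply multiplies the running total by 60 again.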
Before I started this series, I had already figured out how to inject hyperlinked time stamps. The formula ran fine on short texts, but broke the doc on longer videos (1 hour or more). The expensive part was the looping it required to evaluate each item. The logic is not for beginners, but it is good to follow:
- turn the text into a list that also isolates the time stamps, which are often glued to the words (regex below).
- evaluate each item using ForEach()
- filter the items
- if an item is equal to a time stamp, use the item and glue the link to it.
- if the item is just a normal item, leave it as it is
- glue all items together and turn the string into a sentence.
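The steps above can be sketched in Python. This is an illustration of the pattern under my own assumptions, not the Coda formula itself: the simplified time stamp regex, the markdown-style `[label](url)` link format and the names `inject_links` and `base_url` are all hypothetical.

```python
import re

# Simplified time stamp pattern: optional hours, then minutes:seconds.
TIME_STAMP = re.compile(r"(?:\d+:)?\d+:\d+")


def to_seconds(stamp: str) -> int:
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def inject_links(text: str, base_url: str) -> str:
    # 1. Turn the text into a list; the capturing group keeps the time
    #    stamps as separate items, even when they are glued to words.
    items = re.split(f"({TIME_STAMP.pattern})", text)
    result = []
    # 2. Evaluate each item (the ForEach() step).
    for item in items:
        item = item.strip()
        if not item:
            continue  # 3. Filter out empty items.
        if TIME_STAMP.fullmatch(item):
            # 4. A time stamp: glue the link to it.
            result.append(f"[{item}]({base_url}&t={to_seconds(item)}s)")
        else:
            # 5. A normal item: leave it as it is.
            result.append(item)
    # 6. Glue all items together.
    return " ".join(result)


print(inject_links("intro0:05hello world4:44next part",
                   "https://www.youtube.com/watch?v=VIDEO_ID"))
```

Every item is visited once, so the cost grows with the length of the transcript. That is harmless in Python, but it gives a feel for why the same loop becomes heavy inside a doc formula on a one-hour video.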
This pattern looks simple because you see it working, but believe me, it is not. It took me far more time than anticipated to make it work.
It is also an expensive pattern, and it remained too heavy even after I broke the text into smaller parts and replaced the time stamps with hyperlinked time stamps using a button. The main advantage of this button-driven process is that there is no longer a formula active in the column. Nevertheless, the doc became slow and, worse, I could not integrate related parts into one canvas cell via a button. Too much data, even when adding a second part of only about 200 words. This is not good at all and goes against the intuition that you can use canvas cells as pages.
The canvas can handle it; it was the button that refused to execute. The alternative, integrating data via a formula in a table, worked, but isn’t good either. In the template I present the complete text on one page, which is nice.
My approach failed due to button limitations. It annoyed me for a while and I decided to try something else. Instead of putting data into a canvas cell, why not distribute lines over rows? I got the inspiration from how YouTube does it.
We need something similar in Coda.
Alternative approach
Putting a large text in a canvas and then splitting, evaluating, filtering, replacing and gluing it together does not work. Large text can be distributed over rows instead. It looks a bit like what you see below, and even better in the template.
YouTube keeps its phrases rather short, as you can see, and the AI is not doing a perfect job yet. We may need some stronger AI to improve the text, but that is a concern for later.
We get the above result by executing the code below. I admit, this is (again) not easy, but it is worth paying attention to because of the pattern.
01 The lines
In terms of code this is maybe the easiest part, but getting the idea was the hardest. It is about splitting the text at the time stamps and distributing the text over as many lines as there are time stamps.
The If statement relies completely on named functions. We start with the regex to split the text into parts. This regex was difficult for me. My first setup failed because it did not take [ ] into account, and I had to test quite a few variations before this one worked. Just before publishing this blog I noticed another omission that I had to take care of: text between ( ) needs to remain together. Edit on Thursday, May 30: I noticed issues with the capture groups for values like 46% and 30 years, so I had to fix that as well. Regex has a reputation for something….
thisRow.rawTranscript.RegexExtract("(-?\d+(?:\.\d*)?\s*\%)|([A-Za-z\-]+)|(?:\d+:)?\d+:\d+|(?<!\w)\([^)]+\)(?!\w)|\[(.*?)\]|(-?\d+(?:\s*[a-zA-Z]+){1,2})", "gi")
I share this code to make it easier for you to integrate this in your own work. It is a valuable code snippet. You can say thank you by leaving a positive comment.
I already shared the regex to extract time stamps.
If an item is a time stamp, we return a line break; if not, we return the text as it is. We thus leave the time stamps out of the outcome. At the end we glue the parts together via Join(" ").
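As a rough illustration of this step, here is a Python sketch (again not the Coda formula). It uses a simplified time stamp regex instead of the full regex above, and the name `transcript_to_lines` is hypothetical.

```python
import re

# Simplified time stamp pattern: optional hours, then minutes:seconds.
TIME_STAMP = re.compile(r"(?:\d+:)?\d+:\d+")


def transcript_to_lines(raw: str) -> list[str]:
    # Split the raw text so that every time stamp becomes its own item.
    items = re.split(f"({TIME_STAMP.pattern})", raw)
    # A time stamp turns into a line break, other text stays as it is.
    parts = ["\n" if TIME_STAMP.fullmatch(item) else item.strip() for item in items]
    # Glue the parts together with a space, like Join(" ").
    joined = " ".join(parts)
    # Each line break now marks the start of a new line; drop empty lines.
    return [line.strip() for line in joined.split("\n") if line.strip()]


for line in transcript_to_lines("0:00hello and welcome0:05today we look at Coda0:12let's start"):
    print(line)
```

The time stamps themselves disappear from the result; only their positions survive as the boundaries between lines.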
02 Adding rows
Once we have the lines, we need a mechanism to distribute the lines over rows. We use ForEach(), and since we will end up with a table with many videos, we need a key to keep them apart.
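The idea of one row per line plus a key per video can be sketched in Python as follows. The table is modelled as a plain list of dictionaries, and the column names `video`, `position` and `text` are assumptions for the sake of the example, not the columns of the template.

```python
def add_rows(table: list[dict], video_key: str, lines: list[str]) -> list[dict]:
    # One row per line; the key tells us which video a row belongs to.
    for position, line in enumerate(lines, start=1):
        table.append({"video": video_key, "position": position, "text": line})
    return table


table: list[dict] = []
add_rows(table, "video-A", ["hello and welcome", "today we look at Coda"])
add_rows(table, "video-B", ["another video"])

# The key lets us filter the rows of a single video.
print([row["text"] for row in table if row["video"] == "video-A"])
```

Without the key, the lines of all videos would end up in one undivided pile of rows.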
03 Modifying rows
The last part is the most difficult. We don’t want to add rows; we only want to modify the existing rows. The risk is that every row ends up with only the latest result. We avoid that outcome by evaluating row by row, applying Nth() directly on the table.
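A small Python sketch may help to see the difference between overwriting all rows with the last value and pairing the nth row with the nth value. It is an analogy for the row-by-row approach, not the Coda button formula; the column names and the function `modify_rows` are hypothetical.

```python
def modify_rows(table: list[dict], video_key: str, new_lines: list[str]) -> list[dict]:
    # Take only the existing rows of this video, in their current order.
    rows = [row for row in table if row["video"] == video_key]
    # Pair the nth row with the nth new value; no rows are added.
    for index, row in enumerate(rows):
        if index < len(new_lines):
            row["text"] = new_lines[index]
    return table


table = [
    {"video": "video-A", "text": "old 1"},
    {"video": "video-A", "text": "old 2"},
    {"video": "video-B", "text": "untouched"},
]
modify_rows(table, "video-A", ["new 1", "new 2"])
print([row["text"] for row in table])
```

Each row receives its own value because the position is looked up per row, and the row count stays the same.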
You can retype this complete solution (and get inspiration from the related blogs) and it will do the job. In the template below I polished the outcome a bit to make it look better.
The crucial insight remains that the canvas field and buttons have limitations we do not see when working with tables and simple rows. I did not test it, but I guess we can store 100K plus rows like this. Translated to video, that is about 300 rows per 10 minutes, or 30 rows per minute. 100,000 rows divided by 30 gives us about 3,300 minutes, roughly 55 hours of video. That is not endless, but a good start.
This solution is interesting from a technical perspective. It is less practical than initially assumed, but when you want to check out one video quickly, the page below helps.
A few days after publishing the first version (Monday, May 26) I got the idea to clear the raw transcript and replace it with what you see below.
I distributed a large text as you see below and it went remarkably fast:
The story continues. I got an email from Coda pointing to the filter on the view of the table via the controller _relVideo. They simply stopped the filter from working, and this full stop happened late in my evening, at 22h07.
I don’t understand this behavior nor do I like it.
This was the email.
Hello,
Our system has detected that your doc has formula(s) configured in a way that will negatively impact the performance of your doc.
In an effort to improve your doc performance, we have disabled the view filter formula on the following table
What did I do?
I changed the filter to what you see below.
You can indicate the first item in the list, and this one shows IfBlank().
To be continued.
I hope you enjoyed this article. If you have questions, feel free to reach out. Though this article is free, my work (including advice) is not, but there is always room for a chat to see what can be done. You can find my free contributions in the Coda Community and on Twitter. The Coda Community provides great insights for free once you add a sample doc.
My name is Christiaan and I blog about Coda. Since the summer of 2023 often (but not only) about how to Coda with AI. The latest major Coda AI update was on Dec 7, 2023. With the announcement of Snowflake as partner on April 10, I expect to see a new Coda AI logic put in place before the 2024 summer holidays. The current implementation is not sustainable.
Why I focus on Coda AI you can read here: ⤵️
May 15, 2024:
All the AI features we are starting to see appear — lower prices, higher speeds, multimodal capability, voice, large context windows, agentic behavior — are about making AI more present and more naturally connected to human systems and processes. If an AI that seems to reason like a human being can see and interact and plan like a human being, then it can have influence in the human world. This is where AI labs are leading us: to a near future of AI as coworker, friend, and ubiquitous presence. I don’t think anyone, including OpenAI, has a full sense of all of the implications of this shift, and what it will mean for all of us