Coda your YouTube transcripts

Using RegexExtract() to get a searchable text — part I

Christiaan Huizer
7 min read · Apr 11, 2024

I was taking notes while watching a YouTube video, using the Coda YouTube pack. As you can see below, the pack brings in some relevant data that helps me structure my notes.

what the YouTube pack gives you

The pack is made and maintained by Coda.

the functions

Transcripts

Although the YouTube API has the option to get the transcripts, Coda does not use this function yet. Note also that the last update was a year ago. I am not even sure Coda is motivated to add this feature, because you bring in a serious volume of text per video. That said, the transcripts help me with my note taking and are searchable via the search bar at the top left.

The solution we offer in this blog is a temporary fix. The real fix should be an updated pack that makes the manual copy-paste unnecessary and links the timestamps directly to the video.

Manual work

The first action is to copy and paste the transcript into a canvas column. It looks like the fragment below.

0:00- There is a famous story about Einstein that he used to, you know, go, think, think, think, and then go for a walk.0:06And like he would whistle sometimes. So I remember the first time I heard this story, I thought, hmm, how interesting.0:12What coincidence that he, this came to him when he was whistling. But in fact it’s not. This is how it works in some sense,0:19that you have to prepare for it, but then it happens when you stop thinking actually.0:24

The idea is to press a button to get the following:

This requires a few steps, and I am going to restrict myself to isolating the timestamps. We need a regex to split the text and isolate each timestamp. It took me a few iterations to get it right, because at first I did not see that there are two types of timestamps, HH:MM:SS and MM:SS. The regex below handles both variations.

Regex to get the time stamps

Here’s a breakdown of the expression:

  • (?:\d+:)?: This is a non-capturing group that matches zero or one occurrence of one or more digits followed by a colon. This allows for timestamps with or without leading hours.
  • \d+: This matches one or more digits, representing the minutes (MM).
  • :: This matches a literal colon character (":").
  • \d+: This matches one or more digits, representing the seconds (SS).
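To see the pattern at work outside Coda, here is a minimal Python sketch. The full expression is assembled from the four pieces listed above, which is an assumption because the original formula is only shown as an image. The first two timestamps come from the transcript fragment; the HH:MM:SS entry is a made-up line added to exercise the optional hours group.

```python
import re

# Pattern assembled from the pieces in the breakdown above (assumed to be
# the complete expression): optional "hours:" group, then minutes:seconds.
TIMESTAMP = re.compile(r"(?:\d+:)?\d+:\d+")

# Fragment in the style of the pasted transcript. The last sentence with an
# HH:MM:SS stamp is hypothetical, added only to show the second variant.
sample = (
    "0:00- There is a famous story about Einstein."
    "0:06And like he would whistle sometimes."
    "1:02:15A later remark in a long video."
)

# The group is non-capturing, so findall returns the whole match each time.
timestamps = TIMESTAMP.findall(sample)
print(timestamps)  # ['0:00', '0:06', '1:02:15']
```

Because the hours group is optional, the same expression picks up both MM:SS and HH:MM:SS without a second pass.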

With only timestamps we won’t get far; we need to integrate these into the text. That is what you see below:

thisRow.transcripts.RegexExtract("[a-zA-Z]+", "gi")

Here I am skipping the logic that drives this replacement, to focus on the button we need to handle large texts. The visuals below show the steps to turn timestamps into video URLs with a start time.
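This is not the Coda logic from the visuals, but the underlying idea can be sketched in Python: convert a timestamp to seconds and append it to the video URL as a start time. The function names and the placeholder `VIDEO_ID` are hypothetical, and the sketch assumes the common `watch?v=...&t=...s` URL form.

```python
def timestamp_to_seconds(stamp: str) -> int:
    """Convert 'MM:SS' or 'HH:MM:SS' into a number of seconds."""
    seconds = 0
    # Each part to the right is worth 60 times less than the one before it.
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def video_url_at(video_id: str, stamp: str) -> str:
    """Build a watch URL that starts playback at the given timestamp."""
    start = timestamp_to_seconds(stamp)
    return f"https://www.youtube.com/watch?v={video_id}&t={start}s"


# VIDEO_ID is a placeholder, not a real video.
print(timestamp_to_seconds("0:19"))        # 19
print(timestamp_to_seconds("1:02:15"))     # 3735
print(video_url_at("VIDEO_ID", "1:02:15"))
```

Handling both formats in one loop avoids a separate branch for videos longer than an hour.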

The button

The real issue is not the regex, although for many it is likely a barrier. The harder problem is splitting the large text into parts a button can handle. With these transcripts we easily end up with a large body of text. The podcasts I listen to sometimes run past three hours.

In a calm conversation, people typically speak somewhere between 100 and 180 words per minute. There can be some variation depending on the specifics of the conversation and the people involved.

  • Slower conversational speech: 100–130 words per minute
  • Average conversational speech: 150–180 words per minute

Keep in mind that this is just an average, and some people naturally speak faster or slower than others. The video you see in the example contains over 34,000 words (spoken rather fast). That is too much for a button to handle at once.

Breaking the text into parts

We have to break this text into parts. We first count the words (by splitting on white space) and we assume that a button can handle 200 words without any problem. Those 200 words are a bit more than one minute of fast speech. A one-hour video will thus result in at most 60 rows and a three-hour video in at most 180 rows. These are numbers Coda can handle with ease. Below is the formula I developed, which is the solution we need.

splitting the large text into parts

How it works:

  • we split the transcript into words based on a white space
  • we define the size of a group of words, in the example we go for 200
  • we calculate via sequence the breakpoints: 1–200, 200–400, 400–600, etc.
  • we slice the text using the breakpoints and subtract one to avoid overlap
  • we add rows and turn the parts into readable texts with regex
  • we add the thisRow value to enable chaining in the target table
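The splitting steps above can be sketched in Python as a rough analogue of the Coda formula, not a copy of it. The function name `split_into_parts` and the generated test transcript are hypothetical; the group size of 200 is the one used in this post.

```python
import math


def split_into_parts(transcript: str, size: int = 200) -> list[str]:
    """Split a transcript into parts of at most `size` whitespace-separated words."""
    words = transcript.split()
    # Breakpoints 0, 200, 400, ... Python slices exclude the end index, so
    # consecutive slices do not overlap and no "minus one" is needed here.
    breakpoints = range(0, len(words), size)
    return [" ".join(words[start:start + size]) for start in breakpoints]


# Hypothetical stand-in for a pasted transcript: 450 numbered words.
transcript = " ".join(f"word{i}" for i in range(450))
parts = split_into_parts(transcript)

print(len(parts))                        # 3
print([len(p.split()) for p in parts])   # [200, 200, 50]
print(math.ceil(34000 / 200))            # 170 rows for a 34,000-word transcript
```

Each part would become one row in the target table, so a 34,000-word transcript lands at roughly 170 rows, in line with the estimate of at most 180 rows for three hours.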

These steps result in what you see below. There is a bit more than one minute per row and we counted 207 words. That is because the regex cleaned the parts and separated time stamps from text.
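One possible way to separate timestamps from text within a single part is to split on the timestamp pattern while keeping the matches. The Python sketch below is an assumption about how such a cleanup could work, not the regex used in the doc; `to_segments` is a hypothetical helper, and the input is a shortened version of the fragment shown earlier.

```python
import re

# Same timestamp pattern as in the breakdown, wrapped in a capturing group
# so that re.split keeps the timestamps in its result.
STAMP = re.compile(r"((?:\d+:)?\d+:\d+)")


def to_segments(part: str) -> list[tuple[str, str]]:
    """Return (timestamp, text) pairs for one part of the transcript."""
    pieces = STAMP.split(part)
    # pieces = [text before first stamp, stamp1, text1, stamp2, text2, ...]
    return [
        (pieces[i], pieces[i + 1].strip(" -"))  # drop stray spaces and dashes
        for i in range(1, len(pieces) - 1, 2)
    ]


part = (
    "0:00- There is a famous story about Einstein."
    "0:06And like he would whistle sometimes."
)
for stamp, text in to_segments(part):
    print(stamp, "|", text)
```

Once timestamps and text are separate, the timestamp no longer sticks to the neighbouring word, which would explain why the word count per row can end up slightly above 200.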

resulting in a table with many rows

This exercise is difficult. Not so much because of the regex, but because of breaking the raw transcript into readable parts. It has helped me to admit that coding Coda can be hard, can take time. The moment you accept this instead of believing it is always easy for others, you can start moving forward (faster) because you no longer have the feeling of failing.

From here I have a few more steps to execute:

  • inject the hyperlinks; regex alone will not help here…
  • wrap up the parts and turn them into one big text again that links back to the video; we need a button again

This is for a next blog (if I find the time).

The role of AI

There is none (yet). Too many characters, but what is too many? I asked a few times, but Coda did not answer, so we don’t know. My idea was to improve the transcript using AI. As said, it did not work on the total transcript, and it does not make sense on the parts (per row) because you miss context. You cannot reference all rows at once either, because then again you have too many characters.

AI is everywhere in your doc, often when you neither need nor want it, but when you really need it, it does not work. That disappoints me. Maybe the Snowflake collaboration will solve this issue as well.

Doc for note taking

There is a doc updated in 2020 on how to take notes and keep track of important moments in the YouTube video.

The main principle is that you have a timer based on the function Now(). You press a button when something interesting happens. The button glues the time to the video URL via a concatenate(), which makes it possible to jump directly to the interesting parts related to your notes. This logic gave me the idea to replace the timestamps in the transcript with hyperlinked timestamps.

The doc feels very much like a 2020 solution, judging by the syntax, but it works fine. The button logic with the time is a smart find.

My name is Christiaan and I blog about Coda. Since the summer of 2023, often (but not only) about how to Coda with AI to support organisations dealing with texts and templates. The latest major Coda AI update was on Dec 7, 2023.

Why I focus on Coda AI you can read here: ⤵️

I hope you enjoyed this article. If you have questions, feel free to reach out. Though this article is free, my work (including advice) won’t be, but there is always room for a chat to see what can be done. You can find my free contributions in the Coda Community and on Twitter.

Coda comes with a set of building blocks, like pages for infinite depth, tables that talk to each other, and buttons that take action inside or outside your doc, so anyone can make a doc as powerful as an app (source).

Not to forget: the Coda Community provides great insights for free once you add a sample doc.


Christiaan Huizer

I write about Coda.io, AI and (HR) planning challenges. You find blogs for beginners and experienced makers. I publish about once per week. Welcome!