Push hyperlinks in your Coda texts
An advanced find & replace logic partly based on Regex
I got a question to support an advanced find & replace logic. In the doc a dirty string had he to be replaced by a clean string. The dirtiness was the result of mistyping, misspelling, and or failed manuel cleaning efforts. While the standard string is composed of some letters and some numbers, it got glued to dots, commas, underscores, brackets and so on.
Unstructured data vs structured data
When I work in tables and I have a first list of items and I have a second list of items, I can compare both lists. In case the second column is not yet a list, I can turn it into a list by splitting the text into words using a white space. The logic to get there you find below:
This is simple and based on the assumption that all words are separated by a white space. The result is a list of items this list we can evaluate. However and here it becomes tricky, the moment a word is glued to a dot or a comma it is seen as one item. As you see below the glued items are not recognized. I only get a the Carrefour item back. That is not good enough.
When you test with ContainsText()
and Find()
you will find out — like I did — that these functions are not made to evaluate a list of items, only one item.
In the world of unstructured data, as you see above the standard functions are not strong enough. Even my favorite function Contains()
, allowing you to evaluate a list, only works when there is perfect match.
How to proceed? I reached out for help and contacted Federico Stefanato. You should know that it is mainly due to his support in the Coda community when I started, that I dared to dive into the Coda realm. He is not only super skilled, but also friendly and supportive. Merci Federico!
I told him about my problem, shared a doc and he had look. He came back with one solution and one suggestion. I’ll share both with you.
The solution Federico presented
Below you see the occurrences of the supermarkets in the text that contains supermarket names glued to dots and commas.
How did Federico Stefanato get here? He applied two times regexReplace()
. The first time to normalize the list of stores removing possible white spaces, this is a kind of standard procedure. However here we could also us Substitute()
. It is the second time (in red) that stands out. What happens is the following:
"[^A-Za-z0-9]"
: is the regular expression pattern. The ^
symbol inside the square brackets indicates negation, meaning that the pattern matches any character that is not within the character class. The character class [A-Za-z0-9]
includes all uppercase and lowercase letters (A-Z, a-z) and all digits (0-9). We remove the dots, the commas and so forth to only keep the words. The last part “ “
is there to replace the values that do not fit the pattern with nothing.
So, in essence, this function replaces all non-alphanumeric characters with nothing, effectively removing them from the text.
Since I work a lot with the French language, I would use this variation:
regexReplace("[^A-Za-z0-9àâçéèêëîïôœùûüÿÀÂÇÉÈÊËÎÏÔŒÙÛÜŸ]", "")
While Spanish speaking people would prefer an other flavor.
regexReplace("[^A-Za-z0-9ñáéíóúüÑÁÉÍÓÚÜ]", "")
These patterns you put in your code snippet. In a few years, we have an AI active in our code editors that guides us, improves code and takes care of these variation. Until then, we paste them in the RegexReplace().
The result is that we can isolate the names of the supermarkets in our text. This was the solution Federico presented.
Hyperlinking the supermarkets
It is important to understand that functions as RegexReplace()
and Substitute()
flatten the output and thus do not support Hyperlink()
. When working on this problem I first assumed I did something wrong, but this how it works and we have to find a work around.
In the previous part we used RegexReplace()
to clean the items in the list and we could relate the supermarkets to these cleaned items. One possible output is that we can count per supermarket how many times they appear and we can hyperlink them. The exercise below is a finger exercise before we engage in the main challenge.
Finger exercise
We start with our previous work:
- we normalize the names of the shops
- we normalize the items in text as list due to the split on white space
Normalisation is an important part of the work when dealing with unstructured data. The previously explained RegexReplace()
is required to get the names of the shops.
The result of this finger exercise is that we can hyperlink the names of the stores we found.
Our next and main challenge is to integrate the hyperlinked items back into the text.
Hyperlinks in text
As long as we only need to get data out of a string, the layout of the string is not our concern. As you have seen, the regexReplace()
cleaned the words and in combination with the split, we turned words turned into items. These items we could compare , or in formal terms — evaluate — and next we could link them to the URL.
The challenge we face is to keep the text as it is, while replacing the items — even when glued to commas, dots and so on. I took me a while to figure this out, but here we go.
- We normalize the names of the stores by removing possible white spaces before or after the names of the stores
- We turn the words into a list of raw items based on
Split()
andForEach()
. Raw means includes punctuation, no Regex yet. - We take this second list of items and normalize these raw items using
regexReplace()
. We name this results as a list of normalized items - We use an if statement that allows us to run over each normalized item — since the foreach is still open — and we check if the normalized item contains the normalized store names
- in case we have match we use a filter to get the hyperlinked value.
- in case (fall back) there is no match, we use the normal raw items
- once done, we apply a join to bring the text nicely together.
The result is a text with hyperlinks in it. When you follow this pattern it won’t work 100%, because one detail is missing and this missing link took me days to see. An expensive insight as you can imagine.
Below two outcomes and it is the second one with the blue frame we need. For a long time I only had the left version.
The downside of this approach is:
- we need a
ForEach()
to evaluate all items. This is okay to a certain limit. A stronger solution should be based on aRegexExtract()
. So far this is an unsupported Coda function, one that would solve quite some issues when properly implemented. - We no longer have the commas and the dots glued to the store names, that damages the original texts a tiny bit.
- it does not work in lists (bullet, number) and other formatting
In general it works fine in texts you write to explain related subjects.
Why does this matter?
I observe two scenarios, but I guess there are many more.
Company wiki
You want to reference certain terms, but while writing you don’t want to ask the writer to add manually these links, also not because you want to be able to update the links automatically when needed.
Legal documents
You not only want to read about objects and people, but understand their properties. It can be achieved in two ways. We reference a URL like https://huizer.be or we can reference a row in a table that contains interesting back ground information.
My name is Christiaan and blog about Coda. Since the summer of 2023 mainly about how to Coda with AI to support organisations dealing with texts and templates. Why I focus on Coda AI you can read here: ⤵️
I hope you enjoyed this article. If you have questions feel free to reach out. Though this article is for free, my work (including advice) won’t be, but there is always room for a chat to see what can be done. You find my (for free) contributions to the Coda Community and on Twitter.
Coda comes with a set of building blocks ー like pages for infinite depth, tables that talk to each other, and buttons that take action inside or outside your doc ーso anyone can make a doc as powerful as an app (source).
Not to forget: the Coda Community provides great insights for free once you add a sample doc.