
Quickly setting up a CDC pipeline with Estuary and Lassoo

Setting up a change-data-capture (CDC) pipeline can be a real pain. So after finding out about Estuary, I thought I'd see if I could stand one up quickly, and more importantly, without help from the technical folks on the team.


Background

A few weeks back, my co-founder Max wrote a post entitled Simplifying Real-Time Data Streaming with Debezium and PostgreSQL Logical Replication. When we shared it on Reddit and other forums, several commenters pointed out that they found a Debezium setup, whether paid or open source, overly complex, especially for people who hadn't done much of this kind of work before.

Through interactions with various data folks on LinkedIn, I came across Dave Yaffe’s posts and his company, Estuary. Estuary offers a managed CDC and ETL pipeline product. I thought Estuary might be worth looking at as an alternative to using Debezium/Confluent to stream behavioral data from Lassoo to Snowflake. Exploring this alternative is relevant because some of our customers are looking to move away from that setup.

Setting up my CDC pipeline

After setting up my Estuary account, I was excited to create a CDC pipeline for behavioral data from Lassoo. What’s key to point out here is that I’m not a data engineer who would typically be tasked with this work. I aimed to show how easy it is to complete this setup for someone like me, whose creds are more akin to a ‘technical marketer.’

To start, I aimed to stream the captured data to a Google Sheet, a ubiquitous destination anyone could use to test this out. I appreciated this because I didn’t have to worry about setting up a more complex destination for this test. That said, once I saw how easy it was to stream the data to my Sheet, I also set up a Snowflake database as a destination to replicate a typical customer scenario.

Let’s go through my process.

Setting up the Source

The first step was creating a Capture source, which for Lassoo is PostgreSQL, our underlying data store.


Within Lassoo, I then added the Estuary IP to my whitelist and copied the relevant credentials into the Estuary Endpoint Config. This included the publication and slot names provided by Lassoo.

Estuary Endpoint Config Details
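
For context, here is a rough sketch of what that publication and replication slot typically look like on the PostgreSQL side. Lassoo provisioned these for me, so I never ran anything like this myself; the connection string, publication name, and slot name below are hypothetical placeholders, and Python with psycopg2 is used purely for illustration.

```python
# Illustrative only: Lassoo provisioned the publication and slot for me.
# All names here (lassoo_pub, estuary_slot, the table list) are placeholders.
import psycopg2

conn = psycopg2.connect("host=localhost dbname=lassoo user=postgres")
conn.autocommit = True  # run the DDL and slot creation outside an explicit transaction
with conn.cursor() as cur:
    # A publication tells Postgres which tables to include in logical replication.
    cur.execute("CREATE PUBLICATION lassoo_pub FOR TABLE person, ecommerce;")
    # A replication slot tracks how far the consumer (Estuary) has read in the WAL,
    # so changes aren't discarded before they've been captured.
    cur.execute("SELECT pg_create_logical_replication_slot('estuary_slot', 'pgoutput');")
conn.close()
```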

Once that was done and validated, Estuary presented a list of the available Lassoo tables. I chose the ‘person’ and ‘ecommerce’ tables for my test and published my Capture.


Setting up the Destination

With my Source configured, the next step was to stream the data somewhere. As I mentioned in the intro, I chose a Google Sheet primarily because it is so accessible. I followed the steps to set my Sheet up as a Destination in Estuary, which involved authorizing access to my Google account and providing the link to my Sheet.


Testing the Stream

Once that was complete, I opened my Sheet and saw a new tab with the name of the table I chose in my setup.
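
If you would rather check the destination programmatically than eyeball it in the browser, a few lines of Python with the gspread library will do it. This is not part of the Estuary setup; it assumes a Google service account that has been shared on the Sheet, and the Sheet title and tab name below are placeholders standing in for my test setup.

```python
# A quick sanity check on the destination Sheet; not part of the Estuary flow.
# The Sheet title and worksheet name below are placeholders from my test.
import gspread

gc = gspread.service_account(filename="service-account.json")
sheet = gc.open("Lassoo CDC test")  # the Sheet used as the Destination
rows = sheet.worksheet("person").get_all_records()

print(f"{len(rows)} rows captured so far")
if rows:
    print(rows[-1])  # the most recent record Estuary wrote
```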


With the historical data Lassoo had already collected now loaded into my Sheet, I was excited to test the actual capture of changes.

I opened my demo site in an incognito browser, creating a new session. I started clicking around and entered some personal details to see how they would be streamed to my Sheet.


Once Lassoo processed my new events, I saw the data in my Sheet. You can see the mc_tester@michael.com email I used to help identify the new visitor.

In a real-world scenario, I could do much more with Estuary to transform the data before streaming it. For example, I might have wanted to send over only people who have an email address or who meet a recency, frequency, or monetary value threshold. Alternatively, I could do the filtering or transformations in my downstream tools.
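
To make that concrete, here is the kind of filter I had in mind, sketched in Python. The field names (email, total_spend) are hypothetical; the actual columns depend on which Lassoo tables you capture, and in practice you might express this in Estuary or as SQL in the warehouse instead.

```python
# A sketch of the filtering described above; field names are hypothetical.
def worth_syncing(person: dict, min_spend: float = 100.0) -> bool:
    """Keep visitors who have an email address or clear a monetary threshold."""
    has_email = bool(person.get("email"))
    meets_monetary = float(person.get("total_spend") or 0) >= min_spend
    return has_email or meets_monetary

captured = [
    {"email": "mc_tester@michael.com", "total_spend": 0},
    {"email": None, "total_spend": 250.0},
    {"email": None, "total_spend": 12.50},
]
print([p for p in captured if worth_syncing(p)])  # drops the last record
```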

Going a Step Further

I was excited about the ease of setting up my pipeline to stream behavioral data from Lassoo without relying on technical folks. So, I figured I’d take it a step further and try to do the same with Snowflake. This is a pretty common destination for the data collected by Lassoo users.

The first thing I did was create a Snowflake trial account, leaving all permissions and configuration at their defaults. The only setup required was creating a new database in Snowflake to serve as the destination for my data.
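
For anyone who prefers code over clicking, the rough equivalent of that one step with the snowflake-connector-python package looks like the sketch below. The account, user, and database names are placeholders, not my actual trial account details.

```python
# Roughly equivalent to the single manual step: create an empty database
# for Estuary to write into. Account, user, and database names are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="myorg-myaccount",
    user="TRIAL_USER",
    password="********",
)
conn.cursor().execute("CREATE DATABASE IF NOT EXISTS LASSOO_CDC")
conn.close()
```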

Next, I added the Snowflake credentials in Estuary when setting up my new Destination and connected it to the same source I had used for the Sheets stream: my Lassoo database.


When I hit publish for my new Destination, Estuary created all the relevant tables in Snowflake. I then opened the Snowflake database and table where I expected to see my streamed data, and BOOM, it was there waiting for me.


To test that my new events were streamed to Snowflake, I returned to my demo site to generate fresh data. I expected these events and visitor properties to show up in both my Sheet and Snowflake - precisely what happened.
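
If you want to verify the fresh events without clicking through the Snowflake UI, a query like the one below does the job. The database, schema, table, and column names here are assumptions based on my setup; Estuary named the actual tables after the Lassoo collections I captured.

```python
# Check that the new visitor from my incognito session landed in Snowflake.
# Database, schema, table, and column names are assumptions from my setup.
import snowflake.connector

conn = snowflake.connector.connect(
    account="myorg-myaccount",
    user="TRIAL_USER",
    password="********",
    database="LASSOO_CDC",
    schema="PUBLIC",
)
cur = conn.cursor()
cur.execute("SELECT * FROM person WHERE email = %s", ("mc_tester@michael.com",))
for row in cur.fetchall():
    print(row)
conn.close()
```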

Summary

Obviously, this is an extremely simple example. In the wild (aka production), issues always pop up. What I’m trying to get across here is that even with some basic skills, I could set up a streaming pipeline in just a few minutes. This starkly contrasts with some of the horror stories about setting up similar CDC pipelines with paid and open-source tools - even for simple testing cases like mine.




Michael Lieberman, Co-founder @ Lassoo. Startup guy with multiple exits and a love for product and marketing.