Summary
Data engineering is a multi-faceted practice that requires integration with a large number of systems. This often means working across multiple tools to get the job done, which can impose a significant productivity cost due to the number of context switches. Rivery is a platform designed to reduce this incidental complexity and provide a single system for working across the different stages of the data lifecycle. In this episode CEO and founder Itamar Ben Hemo explains how his experiences in the industry led to his vision for the Rivery platform as a single place to build end-to-end analytical workflows, including how it is architected and how you can start using it today for your own work.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Atlan is a collaborative workspace for data-driven teams, like GitHub for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets & code, Atlan enables teams to create a single source of truth for all their data assets, and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker and more. Go to dataengineeringpodcast.com/atlan today and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $3000 on an annual subscription
- Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days or even weeks. By the time errors have made their way into production, it’s often too late and damage is done. Datafold built automated regression testing to help data and analytics engineers deal with data quality in their pull requests. Datafold shows how a change in SQL code affects your data, both on a statistical level and down to individual rows and values before it gets merged to production. No more shipping and praying, you can now know exactly what will change in your database! Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold.
- Are you looking for a structured and battle-tested approach for learning data engineering? Would you like to know how you can build proper data infrastructures that are built to last? Would you like to have a seasoned industry expert guide you and answer all your questions? Join Pipeline Academy, the world's first data engineering bootcamp. Learn in small groups with like-minded professionals for 9 weeks part-time to level up in your career. The course covers the most relevant and essential data and software engineering topics that enable you to start your journey as a professional data engineer or analytics engineer. Plus we have AMAs with world-class guest speakers every week! The next cohort starts in April 2022. Visit dataengineeringpodcast.com/academy and apply now!
- Your host is Tobias Macey and today I’m interviewing Itamar Ben Hemo about Rivery, a SaaS platform designed to provide an end-to-end solution for Ingestion, Transformation, Orchestration, and Data Operations
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what Rivery is and the story behind it?
- What are the primary goals of Rivery as a platform and company?
- What are the target personas for the Rivery platform?
- What are the points of interaction/workflows for each of those personas?
- What are some of the positive and negative sources of inspiration that you looked to while deciding on the scope of the platform?
- The majority of recently formed companies are focused on narrow and composable concerns of data management. What do you see as the shortcomings of that approach?
- What are some of the tradeoffs between integrating independent tools vs buying into an ecosystem?
- How is the Rivery platform designed and implemented?
- How have the design and goals of the platform changed or evolved since you began working on it?
- What were your criteria for the MVP that would allow you to test your hypothesis?
- How has the evolution of the ecosystem influenced your product strategy?
- One of the interesting features that you offer is the catalog of "kits" to quickly set up common workflows. How do you manage regression/integration testing for those kits as the Rivery platform evolves?
- What are the most interesting, innovative, or unexpected ways that you have seen Rivery used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Rivery?
- When is Rivery the wrong choice?
- What do you have planned for the future of Rivery?
Contact Info
- @ItamarBenHemo on Twitter
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, Podcast.__init__ to learn about the Python language, its community, and the innovative ways it is being used.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
Links
- Rivery
- Matillion
- BigQuery
- Snowflake
- dbt
- Fivetran
- Snowpark
- Postman
- Debezium
- Snowflake Partner Connect
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. Have you ever woken up to a crisis because a number on a dashboard is broken and no one knows why? Or sent out frustrating Slack messages trying to find the right dataset? Or tried to understand what a column name means? Our friends at Atlan started out as a data team themselves and faced all this collaboration chaos. They started building Atlan as an internal tool for themselves. Atlan is a collaborative workspace for data-driven teams, like GitHub for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets and code, Atlan enables teams to create a single source of truth for all of their data assets and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker, and more.
Go to dataengineeringpodcast.com/atlan today. That's A-T-L-A-N, and sign up for a free trial. If you're a data engineering podcast listener, you get credits worth $3,000 on an annual subscription. When you're ready to build your next pipeline and want to test out the projects you hear about on the show, you'll need somewhere to deploy it. So check out our friends over at Linode. With their managed Kubernetes platform, it's now even easier to deploy and scale your workflows or try out the latest Helm charts from tools like Pulsar, Pachyderm, and Dagster. With simple pricing, fast networking, object storage, and worldwide data centers, you've got everything you need to run a bulletproof data platform.
Go to dataengineeringpodcast.com/linode today. That's L-I-N-O-D-E, and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. Your host is Tobias Macey. And today, I'm interviewing Itamar Ben Hemo about Rivery, a SaaS platform designed to provide an end-to-end solution for ingestion, transformation, orchestration, and data operations. So, Itamar, can you start by introducing yourself?
[00:02:10] Unknown:
Yes. Thank you so much, and thank you for having me to bring some value to the data engineering community who are listening to us. Do you remember how you first got involved in the area of working with data? Yeah. Actually, it's a long time ago. I've always been passionate about the power of good data and good insight. I've been in this space for more than 20 years, building data warehouses for a living since 2002. In 2007, I founded my first company, called Vision BI. It's a company that became a market leader as a BI consultancy in Israel. We used to manage many of the largest big data projects of the past decade.
We built a lot of tools and utilities, from data quality to Python frameworks, but we also kept the focus on bringing great, innovative solutions into this space. I think the best example is how we brought Snowflake to Israel in the very early days, and we became one of their best partners in EMEA. A few years later, alongside my role in Israel, I wanted to do something in a much bigger market. So I opened my second company, and with the acquisition by Keyrus, a global data consulting company, this became what we call the Keyrus North America division.
I led this organization from zero to over a hundred engineers across several states, several sites in North America. In this company we did things a bit differently, mainly working with enterprises, and we got great recognition from market leaders: with Alteryx we were partner of the year globally, and with Tableau, emerging partner of the year in the Americas. I think the fact that I've worked with hundreds of companies around the world, helping them to create their optimal data processes and data management stack, is actually what brought us to create Rivery, where we basically try to take our deep knowledge in data processing into a native SaaS service.
[00:04:12] Unknown:
Can you give a bit of an overview about what it is that you're building there now and what it is about this particular space of simplifying the work of managing ETL processes and delivering it as a service that is keeping you interested and motivates you to spend your time and energy there?
[00:04:28] Unknown:
Yeah. Absolutely. So we started Rivery; I'm the CEO and cofounder at Rivery. My cofounders in the company worked with me for many years at the previous company, and we came up with the idea to build Rivery right after completing several big data projects in the Israeli tech industry. At that time, we tried to find a kind of modern data stack in this space. We ran all our projects in Python. What we found was that the legacy ETL players, such as Informatica and Talend, owned this market, and even then you saw tools like Matillion that came with the cloud, but for us it was still the same functionality. Very similar: the same capabilities with the same persona, the BI developers, which we think is changing a lot now. Especially, we were faced with a lack of ability to scale integrations with volume.
So this is what brought us to the decision to build our first MVP. The first MVP of Rivery was four or five connectors with what we call orchestration on top of it, a logic layer that runs ELT as a native SaaS. We started on Google BigQuery, and from that point we expanded to Snowflake and other cloud data warehouses. The idea, what we really try to do here, is to build a modern SaaS ELT to avoid the need to rebuild the same scaling processes, to maintain APIs, Python templates, and so on. We ran this practice for two years, 2018 and 2019, as a bootstrap. And in December 2019, once we saw that we were actually winning against the new players in this space, we decided to spin off the technology and take this company from a bootstrap into a venture company.
So we raised our seed round in December 2019. In January 2020, just before COVID, we were five folks in Tel Aviv, myself in New York. But from that point, we never stopped. We grew the business a lot. Now it's hundreds of worldwide customers and over 75 employees today. So Rivery is a very simple SaaS ELT. The way that we see it, it is the core and the backbone of the modern data stack. We handle the ingestion, transformation, orchestration, even some use cases for reverse ETL, and encapsulate everything in a kind of development lifecycle. With Rivery's approach to data management, the focus is on the valuable modeling and the insight that you can get out of the data.
We want to eliminate a lot of the challenges in pipeline maintenance, and we used to call it, maybe too naively: focus on insight, we will do the rest.
[00:07:28] Unknown:
As far as the overall goals of the Rivery platform and the company that you're building around it, you mentioned that you want to simplify the work of managing ELT processes so that people don't have to redevelop the same things over and over. And I'm curious if you can talk to the target personas that you have in mind as you're developing Rivery and some of the ways that you think about the different interaction points and workflows that you're trying to support for each of those different personas?
[00:08:00] Unknown:
Yeah. It's a very good question. Maybe let's start with the primary goal. The primary goal is to be the go-to solution that gives people the flexibility to manage ELT in the best way that they see it, whether that's a no-code or low-code way for analysts or BI people, or for data engineers who require more heavy-code processes like Python. We believe that once you bring a platform that provides the two angles, it frees them to invest their time and energy in creating new data models, better analysis of the data in new ways, and, ultimately, providing their organization with the insight they need in the shortest time and most efficient way.
You mentioned the personas. This is really interesting, because as the use cases around data grow and more and more companies grow, and data becomes the heart of every organization, we see different personas that we hadn't met before. Businesses, for example, need Rivery to accelerate their time to value or time to insight and, of course, support their scale. But with the rise of data inside the organization, you see personas like software engineers or CTOs in startups, marketing people, folks in the enterprise, or product teams that need to improve their data platform, all the way to data analysts or BI developers that need the infrastructure to deliver their insight via the cloud solution.
All of these are what we call the data citizens. But when we want to present the product and its value on a daily basis, the best way for us to show it is data engineers. Think about it: data engineers are the people who need to manage ELT processes in the most efficient way. You don't want them to spend time and effort on external APIs or maintaining database schemas according to the hundreds of sources they're getting data from. And they also don't want to be in control 24/7 of the computing resources required to run at their scale. So we are helping them with these capabilities.
I think that different roles in the organization might use different tools, and maybe this is why we see many tools in this space. But the thing is, and again, I have 20 years of experience here, everyone is using the same dataset. Maybe you're an analyst, but you're still using the same dataset. You're a data scientist; you're running on the same dataset in Snowflake or in BigQuery or whatever. So this is why we want to be able to cater to people who need both the no-code and the heavy code. And this is why we launched, for example, the Python, because you must work on the same dataset in order to have reliability and scale your business.
[00:11:09] Unknown:
And as far as determining the scope of what you were trying to achieve with Riverie, I'm curious what you use as both positive and negative sources of inspiration for understanding what to try to encompass in this project as well as how best to implement it and integrate it with existing tools and workflows?
[00:11:31] Unknown:
I think that we are positive people; we're thinking only about the positive. So think about Lego, for example. With Rivery it's a fun experience, a tool that you can play with, but at the same time it's a modular platform that can be shaped and used in infinite ways. We wanted to give data engineers full flexibility and endless possibilities when working with the product. I cannot think about negative inspiration, but I have one. If you think about our space, the legacy players did all-in-one. But with the shift to cloud, and especially to SaaS, they missed something. They missed the new persona. They missed the infinite scale, the ease of use, and basically everything that you want to see in a SaaS. My inspiration is to build more capabilities in one tool, so you're not necessarily running to buy four or five tools, but at the same time, to make it best of breed for every function in the organization.
[00:12:41] Unknown:
As far as the approach that you've taken, it's definitely in contrast to where a lot of the industry has been focusing, particularly in the past couple of years, with all the attention that's gone into the, quote, unquote, modern data stack of having these very narrowly defined tools and platforms that are intended to be composed together to form your full end to end flow. And I'm wondering what you see as the shortcomings of that approach of having these very narrow slices of functionality that are optimized for a particular use case, and then you have to do the picking and choosing across the entire life cycle of the data?
[00:13:22] Unknown:
This is a really great question, and specifically to ask me on this one, because as I see it, this is our differentiation. A revolution doesn't happen in one or two years; a revolution takes ten years to complete. So this is why I see this as a differentiation: data stacks are getting increasingly complex and harder to manage because of the amount of tooling that we are seeing today. And while it's great for companies to specialize on a key pain point that can be solved with one niche data stack, they often lack a broader vision of the ecosystem and what's required to get things done. So dealing with many tools, the way that I see it, across dozens or even hundreds of pipelines, means there is a higher chance of dealing with broken pipelines. In addition, think about the business models or the pricing models: having to deal with multiple business models from the different tools makes it nearly impossible to predict the cost.
And technically monitoring your data ecosystem as a whole, it's really hard to get there. What's more, and this is something that I think people are not yet paying a lot of attention to, there is the security risk of involving five or six tools where you are basically transferring credentials, the most sensitive credentials for accessing the data in the organization, between several SaaS services. So overall, from opening a ticket to monitoring everything and getting the most cost-effective business, I think it's really hard for the organization. They don't always see it at the beginning of the revolution. Sometimes you buy a tool that does the EL, and then you need another API that they don't support. So if you don't have the capability now, you need to buy another tool or you need to go to code.
In our case, we believe that we want to give them what we are doing in the right way, and it just works, but at the same time give them the possibility to grow. And I will give an example with dbt, which is a really amazing solution that built a lot of the community around it. We see many cases of people using Rivery that decide not to use our transformation engine and choose dbt instead, which is amazing. But again, it's a second tool in the stack; it's not like five, six, seven tools, one for every feature you need. I will try to be polite and say that a lot of it is a marketing message, not necessarily the daily reality. There are a lot of hypergrowth companies that got a lot of attention, which is great, and they did a very good job in terms of self-service.
But at the end of the day, this marketing message pushed the market to think that this is the reality, that this is the only way to manage data. We think a bit differently about that one.
[00:16:32] Unknown:
Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days or even weeks. By the time errors have made their way into production, it's often too late and the damage is done. Datafold built automated regression testing to help data and analytics engineers deal with data quality in their pull requests. Datafold shows how a change in SQL code affects your data, both on a statistical level and down to individual rows and values, before it gets merged to production.
No more shipping and praying. You can now know exactly what will change in your database. Datafold integrates with all major data warehouses as well as frameworks such as Airflow and dbt and seamlessly plugs into CI workflows. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold. In terms of the functionality that Rivery provides, what do you see as the either specific instances or categories of tools that it will replace, and what are the components of a data platform that it aims to supplement?
[00:17:45] Unknown:
Our core capabilities are ingestion, transformation, and orchestration. In ingestion, we are sometimes replacing and sometimes competing head to head with tools like Fivetran. So in ingestion, definitely, we have strong competition there. They were before us, so this is why they are more mature than us in terms of selling or revenue. But in terms of the product, definitely, we are giving a very good fight there. In terms of transformation, we believe that people need to choose the way they run the transformation. If you choose to work with our Python data frames, great. If you want to choose Snowpark UDF functions, amazing, do it. We will orchestrate.
We embedded this in our solution; the same for dbt, the same for Python, and the same for SQL. With Rivery, we want to give you all the capabilities to work with any tool. So in transformation, the way that I see it, it's not replacing anyone; it's actually working together with them. In terms of orchestration, we see competition with tools like Airflow, and I think our engine is strong enough to compete in this space. Those are the core capabilities. From that point, what we are trying to do is leverage each one of the components to be best of breed. When it comes to ingestion, for example, we don't support only the predefined reports. We have 180-plus native connectors, very much like Fivetran; I think the overlap is around 85%, which is great. But in our case, when you need an additional API or an additional source you want to get data from, you don't need to go to code. You can go to code and write Python, but you can also use what we call our action, or custom API, function. It's more like Postman capabilities for getting data, mainly for data ingestion purposes.
We invested a lot in the CDC to compete on hyperscale database replication, whether it's from cloud or from on-prem, for companies that move to the cloud. So this is the ecosystem that we see.
[00:19:58] Unknown:
As far as the actual Rivery platform itself, can you talk through some of the design and architecture of how you went about implementing it?
[00:20:06] Unknown:
The platform is built as multitask runners that get metadata and know how to handle these tasks by the definition of the metadata they receive, whether it's for the transformation or for the ingestion. Everything is dynamic and built as dynamic tasks, all built for the cloud, of course. The platform runs on Kubernetes, so we have the ability to scale up and scale down automatically, in the most efficient way. Maybe it's good to focus on three core engines that we built, that we are using as the foundation of our product. The first one is the CDC; I touched on this one. We built our real-time engine from day one, and we built it in stages. In the beginning, it started with multi-table replication, like the legacy players such as Talend or other tools used to do, a select statement on top of the databases. But we did a nice wizard so that you can get all the tables right away.
Then we saw that people need to manage the incrementals and everything, so we moved to the second version, which was CDC. We built an engine that includes some of the capabilities of Debezium. We also, by the way, contribute to this great open source project. But recently, in the past few months, we actually took a step up, because we saw that at scale there were some challenges in Debezium that weren't so good for us. So what we did is build our own core CDC engine, in order to reduce or eliminate the limitations that we faced when we built the solution on top of Debezium.
The second thing in the technology that I think differentiates Rivery a lot is the multi-tenancy. Everything in Rivery starts from a single tenant, which we call an environment in the platform. Every account in Rivery is built from one to n environments, or tenants. Each tenant includes completely separate content: a separate computing engine behind the scenes, users, variables, workflows, connections, the metadata engine; everything runs on a single tenant. Why is it so good and so strong? Aside from the fact that you can scale, it actually supports the data engineers and architects when they want to build their solution to enterprise scale, meaning governance and deployment.
This serves the engineers' use case of building their development, QA, and production environments, or just building a sandbox from scratch and deploying the entire solution to a new branch or a new sandbox. We see that SI partners want to shift their business into what they call fully managed services. We see a lot of use cases of SI partners that took this solution and built it multi-tenant, or multi-environment. Every account where they are managing the data workflows is actually a single tenant, but they have the full account with the templates, so now they can deploy their practices and their data models between their accounts.
Once the deployment is complete, they can make tailored modifications in the single tenant. These are really strong capabilities. The third engine is about how we manage the APIs and the ingestion. To be honest, every data engineer will tell you that building against an API and getting the data is not such a complex mission. The biggest challenge is how to do it at scale, how to get great performance, and, of course, to have the built-in validations to get the most qualified datasets. One of the most important decisions that we made in the early days was that we wanted to build everything in house and support the ingestion A to Z, even though it sometimes seems hard, because you're on a sales call and you support only 60 or 70 percent of the APIs that the customer needs. We still insisted on developing everything and not using third parties. People used to tell me, hey, take CData drivers and embed them inside your solution; it will scale your business. I told them, yeah, I agree. But eventually I'm so happy, because the fact that we didn't take this shortcut helped us in three ways. One, we are independent to support any required improvement when the APIs change, and that happens pretty frequently.
We know how to address any change, and we don't depend on any other vendor. So we know that we can improve everything, performance and quality, with our own engineers, without requiring the involvement of other vendors. Second is the security. Security is a really important thing for us, and all the responsibility is ours: we are holding the most critical credentials managed in our system. So we need to make sure they are managed well, encrypted, and held to our highest security standards. The third way is the bootstrap. As I mentioned, we were bootstrapped for a bit over two years. We didn't have a lot of money, and this actually forced us to take our development ability to the edge so that we could speed up development. The result is that we exposed what we call the Action River, which is the custom connectors.
You can build a custom API connector with this feature, with no code, because we use it internally and we just exposed it to our audience. Over 60% of our customers are using it, and it's been on air just a year and a half, which means hundreds of customers are using it. And the fact that we built a strong infrastructure helped us close the gap, from zero to 190-plus connectors, like our main competitor in that space, which is Fivetran. The idea is that we will be able to scale more and more, because we believe the infrastructure we built behind the scenes of this API management is more mature and, let's say, helps us do it really fast.
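To make the metadata-driven task runner pattern described above concrete, here is a minimal sketch in Python. It is purely illustrative: the names, task kinds, and dispatch structure are assumptions for explanation, not Rivery's actual internals.

```python
# Minimal sketch of a metadata-driven task runner, as described in the
# interview. All names and structures here are hypothetical, not Rivery's code.
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class TaskSpec:
    kind: str    # e.g. "ingestion" or "transformation"
    config: dict # connector settings, SQL statements, etc.

def run_ingestion(config: dict) -> None:
    # A real runner would invoke a connector; here we only show the dispatch.
    print(f"pulling from {config['source']} into {config['target_table']}")

def run_transformation(config: dict) -> None:
    print(f"running SQL: {config['sql']}")

# The runner knows nothing about specific pipelines; behavior is driven
# entirely by the metadata, so a new task is just new data.
HANDLERS: Dict[str, Callable[[dict], None]] = {
    "ingestion": run_ingestion,
    "transformation": run_transformation,
}

def execute(task: TaskSpec) -> None:
    HANDLERS[task.kind](task.config)

if __name__ == "__main__":
    pipeline = [
        TaskSpec("ingestion", {"source": "salesforce", "target_table": "raw.accounts"}),
        TaskSpec("transformation", {"sql": "CREATE TABLE stg.accounts AS SELECT ..."}),
    ]
    for task in pipeline:
        execute(task)
```

Because each handler is stateless and driven by metadata, runners like this can be treated as interchangeable workers, which fits the Kubernetes-based automatic scale-up and scale-down described above.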
[00:26:27] Unknown:
In terms of the overall design and goals of the platform, you mentioned that you went through a few different stages of development and iteration. And I'm wondering how the early ideas and assumptions about what you wanted to do with the platform, what customers wanted out of it, how well they've stood up, and how they have evolved in the process of going to market and working with your design partners?
[00:26:51] Unknown:
We designed the platform to deal with the simple problem of maintaining data pipelines. But as time moved on, we started to see more and more requirements from our users, including the other core processes of the ELT, like the orchestration and the transformations, and within transformation we added, for example, the Python. So, like the examples I gave you with the APIs or the CDC, it's not that every time we launch a feature we immediately see results. We are trying to build it together with our customers, because we are customer obsessed, and our customers and our users are the most important thing for us. For every simple MVP, we want to make it just work, and we do it with design partners. We always pick several customers to be a kind of private preview and design partners for every new feature, because we want to get the feedback right away. We don't want them to wait until we develop the feature to its full capabilities and then see that there's no value. I think we did it. We had the luck that we were a bootstrap, so that was in our DNA.
If I'm thinking about 2018, we had about ten customers at that time, like Yellowhead, WalkMe, Dynamic Yield, even AB InBev, a huge company, or Minute Media. Based on their demand and their requirements, we modified the platform. And guess what? All of them are still great partners. I cannot even call them customers; they're really true partners for us. As a startup, we want to make sure that we adapt to market changes and demand. Our product features come via our customers, so testing new product ideas is key for us.
That way we can quickly react, sometimes drop a feature, and improve it before going live with everyone. Again, the Python, or how we developed the CDC or the environments, those things came from our audience, from the customers.
[00:29:04] Unknown:
Particularly given the fact that you were at least initially bootstrapped, I'm wondering what you used as your criteria for determining what the initial product scope needed to be to ensure that you could prove out the ideas that you had and how well they would work in real world situations while still being achievable in a reasonable amount of time?
[00:29:26] Unknown:
First of all, we are not bootstrapped anymore. Just to clarify, we are funded; we are actually now closing our B round, a massive B round. As I mentioned before, maybe it sounds repetitive, but it's a balance. We have our own roadmap. We came from this industry; we've spent 20 years on this mission of building data management, so we have our own vision. But at the same time, we are working with hundreds of customers, and it's a luxury, a real luxury. We have a self-service, pay-as-you-go practice. Just in the two months since we launched it, over 40 accounts are using these capabilities, and we see how their consumption grows. What we are investing a lot in is how to make their lives and their onboarding much easier. So everything that we are doing stays really close to the usage.
This is our biggest advantage. And by the way, not only ours: it's an advantage of any SaaS platform, like Snowflake or BigQuery. Because you are a SaaS and you have all the logs and everything, you don't need to jump on a ticket and ask the customer to send you a log. You see everything, and you know how to proactively help them get things done. So we are trying to be a data company that helps our customers with the data that we have. Something that failed for one customer in about a minute, we are going to see for other customers, so we can prevent it upfront.
[00:30:58] Unknown:
Another interesting element of your business and your product is that being a managed service, it has a different path to adoption than what a lot of organizations, particularly in recent years, have been doing with the open source, bottom up, engineer led adoption strategy. And I'm curious how you've been thinking about your approach if you're aiming for bottom up adoption by getting in with engineers to solve a particular pain point or if you're looking to go more top down by working with senior leadership and engineering management to help to solve problems at more of the organizational level.
[00:31:35] Unknown:
I think that if I needed to build a company again, maybe I would choose to build it as open source and then jump into the commercial side of things. But we are selling to the data engineer, to the individual who actually works with the platform. So the way that we solve it, since we are not open source, is that two or three months ago we launched a pay-as-you-go program, where you can start really small, say dozens of dollars a month if your consumption is pretty low. And we are going to take it another step further. In the next two weeks, you are going to see our pricing model and how it works well with this. I won't give all the details, but this is a big improvement that we are making as a product-led growth company, a company that sells its software to data engineers and individuals.
About open source and the community: we do believe, and we're already having these discussions inside the company, in taking some of the capabilities that we have and making them open source. We believe in this. As I mentioned, we contributed to Debezium at the time we used it. We want to contribute some elements that we have in the platform back to the community. We've done it, but not yet in the way that we want to, and we see ourselves doing it and growing this exposure in the coming months.
[00:33:09] Unknown:
Another interesting aspect of building a business in the data ecosystem at all is the rapid rate of change and evolution that it's been going under. And I'm curious how your focus on being an end to end option for data engineers and data teams watching the surrounding ecosystem go through its own shifts of kind of reinvention and self discovery has either validated or influenced the product direction that you're focused on?
[00:33:40] Unknown:
I will take it in three parts. One is data as product. We strongly believe in this: moving from data teams that we count as service-oriented inside the organization to teams that act like product teams, creating an internal or external product that relies on data. The increased support for the data stack development lifecycle, for example, the multiple environments and the deployment, is basically how we brought those foundations and capabilities into this approach. We really believe in data as product, and to get there, you need to have scaled deployment, environment management, APIs, a CLI, everything that you expect to see in a development lifecycle.
The second thing is that we see a wider range of people and personas involved in the data team. There is no one size fits all, and we need to support different types of personas. You can be an analyst or BI person who gets everything via the UI, doing low code; low code we count as SQL, but you can get APIs, you can run SQL, and the product builds tables and schemas automatically, which is great, or you can use dbt in that case. But for the data engineers and data scientists, we want to empower them, and this is why we came up with an industry first: Python as a standard step in the pipeline.
No one in the industry is doing it. Many folks who are real experts in this space were impressed when they saw what we built there: in one workflow, you can get data from ingestion, run transformation with whatever preferred tool you choose, and then you have all the tables and the structure in the data warehouse, let's say Snowflake. But now, in the same workflow, in the same pipeline, you get the dataset from Snowflake, send it into a data frame, run machine learning, and bring it back from the data frame to Snowflake, and from that point to the endpoint applications. So we believe it is our responsibility in this market, especially as the enterprises grow more and more, to have tools that can scale commercially, with education and all the capabilities.
Last but not least is the end to end. We talked a lot about that, the end to end versus the multiple tools. Again, the market is pushed by a lot of messaging: for this, you need to use this tool, and for this mission, you need to use that one. With the right balance of breadth and depth in our functionality, we believe that we can fully handle the foundation, which is ingestion and transformation, and even some other, more complex use cases where you need Python. But we are not so fanatic as to do everything, and we give you the possibility to use everything with Rivery.
At the same time, you can get the enterprise capabilities like reliability, agility, scalability, and security. So I think that we are trying to give you everything, but, again, with the freedom to connect everything. Since we have an API, since we are a CLI company, you can definitely connect and integrate your other tools into this stack. But we believe that this is the stack that is going to lead the industry for the next couple of years when it comes to data operations, or data as product, as we spoke about before.
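As a concrete illustration of the in-pipeline Python step described above, here is a generic sketch of the warehouse-to-data-frame-to-warehouse round trip using the Snowflake Python connector, pandas, and scikit-learn. The table names, columns, and model are invented for illustration; this shows the general pattern, not Rivery's actual API.

```python
# Generic sketch of the "warehouse -> data frame -> ML -> warehouse" step
# described above. Tables, columns, credentials, and the model are all
# hypothetical; this is the pattern with common libraries, not Rivery's API.
import pandas as pd
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas
from sklearn.linear_model import LogisticRegression

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="ANALYTICS_WH", database="ANALYTICS", schema="PUBLIC",
)

# 1. Pull the curated dataset out of the warehouse into a data frame.
df = pd.read_sql("SELECT CUSTOMER_ID, TENURE, SPEND, CHURNED FROM STG_CUSTOMERS", conn)

# 2. Run the machine learning step inside the same pipeline.
features = df[["TENURE", "SPEND"]]
model = LogisticRegression().fit(features, df["CHURNED"])
df["CHURN_SCORE"] = model.predict_proba(features)[:, 1]

# 3. Write the scored rows back so downstream steps and endpoint
#    applications can consume them.
write_pandas(conn, df[["CUSTOMER_ID", "CHURN_SCORE"]],
             table_name="CUSTOMER_CHURN_SCORES", auto_create_table=True)
```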
[00:37:32] Unknown:
As far as the actual workflow and getting started with Rivery and incorporating it into your day to day efforts of building and managing data systems, I'm wondering if you can just talk through the overall steps of saying, I need to be able to build this new data workflow that's ingesting from, whether it's an API or a set of databases or multiple different sources, load it all into a set of tables in my data warehouse or my data lake, and then orchestrate the transformations for being able to provide a resulting database view for downstream consumption, and just all of the capabilities and touch points along the way for data engineers, and any collaboration capabilities for being able to manage that handoff from preparing the data to being consumed by downstream analysis?
[00:38:23] Unknown:
It's actually very easy getting into the product. What you will see immediately is the ability to start building your pipeline within minutes. Basically, you can choose one of the native connectors, or you can go with what we call kits, which are data model templates built by people like you, the data engineers, or mainly by our partners, who took specific, repeatable use cases and put in the predefined, pre-engineered data models. So you can use the kits, or you can use your connector and run transformations. If you're familiar with dbt, bring dbt in. If you're familiar with SQL, you run the transformation in SQL.
And if you are running Python, that's great. You don't need to know how to bring up a machine or build a Spark cluster and everything. You can just define a data frame, which is basically one definition, the name of the data frame, and that's it. Once you have this in the platform, you just need to define the computing engine you want for this purpose. So the product itself is self-service. Believe it or not, data engineers get into the trials, and we see it as a fact that the majority of them succeed in building their first river within minutes. This is a KPI that we measure ourselves on as a company; we have what we call sign-up workflows.
We measure it every day with Slack alerts: how long it took them to succeed in building their river, and what their overall experience is. It's definitely our main focus as a PLG company.
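To give a feel for the shape of such a workflow, here is a hypothetical, simplified "river" definition. Every key and value is invented for illustration; this is not Rivery's real configuration format.

```python
# Hypothetical, simplified "river" definition illustrating the workflow shape
# described above: pick a connector (or kit), choose a transformation style,
# declare the compute. None of these keys are Rivery's actual format.
river = {
    "name": "salesforce_to_snowflake",
    "source": {"connector": "salesforce", "objects": ["Account", "Opportunity"]},
    "target": {"warehouse": "snowflake", "schema": "RAW"},
    "transformations": [
        {"type": "sql", "statement": "CREATE OR REPLACE TABLE STG.ACCOUNTS AS SELECT ..."},
        {"type": "python", "entrypoint": "score_accounts.py"},  # runs on managed compute
    ],
    # The user only names the engine size; no Spark cluster to build by hand.
    "compute": {"engine": "managed", "size": "small"},
    "schedule": "0 * * * *",  # hourly
}
```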
[00:40:10] Unknown:
One of the interesting features that you have is a catalog of kits for being able to manage that reusability and avoid rework like you were mentioning at the beginning of the conversation, being able to use these to get common workflows set up quickly and have a catalog of functionality for the organization. And I'm wondering if you can talk through how you manage regression and integration testing for those kits as the Rivery platform evolves, or as the goals and governance structures for an organization change once they've started to adopt and become more sophisticated with Rivery?
[00:40:50] Unknown:
First of all, we have a process for this. We have a dedicated team, the kits team inside of Rivery, and they are basically engineers that work to build the kits, sometimes with customers, sometimes with partners. And from that point, we maintain them. Even today, I saw in Slack that we launched Intercom as a native connector, which is great. The second message was: okay, now we can remove the kits that we built for our customers so far that used the Action River, the custom connector, because we have everything native. Our kits include both the business logic and the technical setup required for the workflows.
The beautiful thing is that, first of all, everyone can build kits. You can build kits internally; they don't necessarily have to go to our public library. A kit can include the SQL scripts, the data models, the Python scripts, other tools like a dbt package, the source and target connection definitions, pipelines, workflows, everything. Because you have all the capabilities, it's doable, and this is how we came to this. But on top of this, it's really important to note that while the business logic is custom written for each kit, the underlying functionality uses our standard out-of-the-box functionality.
That means we don't need to update every kit, and kit builders don't need to update everyone, when the underlying connections or components get updated. The most amazing thing with kits is that, first, every user can build one. It can be an internal kit that you deploy between your tenants. We encourage the community, companies and individuals, to build their kits in the library, mainly because we want to build a community; it's not a question, we are going to build a community around it. We are going to monetize it so that everyone, companies and individuals in the space, can bring their know-how into the kits. And from that point, they will get part of the revenue that we see, based on consumption. If you think about that, this will be the community version of Rivery.
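One way to picture the property just described, custom business logic layered over standard, centrally tested components, is a regression check along these lines. This is a hypothetical sketch, not Rivery's actual test suite.

```python
# Hypothetical regression check for kits, illustrating the idea above: kits
# carry custom business logic but may only reference the platform's standard,
# centrally tested connectors and step types. Not Rivery's actual code.
SUPPORTED_CONNECTORS = {"salesforce", "intercom", "facebook_ads", "postgres"}
SUPPORTED_STEP_TYPES = {"sql", "python", "dbt"}

def validate_kit(kit: dict) -> list:
    """Return a list of problems; empty means the kit still fits the platform."""
    problems = []
    connector = kit["source"]["connector"]
    if connector not in SUPPORTED_CONNECTORS:
        problems.append(f"unknown connector: {connector}")
    for step in kit.get("transformations", []):
        if step["type"] not in SUPPORTED_STEP_TYPES:
            problems.append(f"unknown step type: {step['type']}")
    return problems

def test_all_kits_use_standard_components():
    # A real suite would iterate over every kit in the catalog on each release.
    kit = {
        "source": {"connector": "intercom"},
        "transformations": [{"type": "sql"}, {"type": "dbt"}],
    }
    assert validate_kit(kit) == []
```

Because a kit only references standard components, upgrading a connector means re-running checks like this across the catalog rather than hand-editing each kit.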
[00:43:19] Unknown:
Are you struggling with broken pipelines, stale dashboards, missing data? If this resonates with you, you're not alone. Data engineers struggling with unreliable data need look no further than Monte Carlo, the leading end-to-end data observability platform. Trusted by the teams at Fox, JetBlue, and PagerDuty, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo monitors and alerts for data issues across your data warehouses, data lakes, dbt models, Airflow jobs, and business intelligence tools, reducing time to detection and resolution from weeks to just minutes.
Monte Carlo also gives you a holistic picture of data health with automatic end-to-end lineage from ingestion to the BI layer directly out of the box. Start trusting your data with Monte Carlo today. Go to dataengineeringpodcast.com/montecarlo to learn more. In terms of the ways that people have been using Rivery and applying it to their particular organizations and problems, what are the most interesting or innovative or unexpected ways that you've seen it used?
[00:44:26] Unknown:
Yeah. I will mention three cases. One day, I got a call; I think it was 2018, and we were a really small team. It was a Fortune 500 company; it's on our website, and everyone knows this company. They were the biggest sponsor of the World Cup that year. They came to us, called me on my mobile; my mobile number was on the website at that time. The guy was the global director of marketing at this giant, and he called me and said: look, in about one month, we have the World Cup. We are the main sponsor. The CEO is going to be there, 40 people, it's going to be a war room, five agencies. Okay, what do you do? Then he told me: I need to build a digital data lake. We tried to do it internally for many months, and the high frequency of updates from the APIs breaks what we build every time. And, actually, in exactly those weeks, Facebook dramatically changed their APIs because of all the compliance and regulation.
So I told him: yeah, we will do our best. We helped them build a really amazing digital data lake. Within one month, we actually built eight APIs just to support their needs, and it was one of the most successful projects there. The second use case that also surprised me was Emaar; the case study is also on our website. Emaar Properties is a real estate company, but not only real estate; it's the experience of shopping and malls and ecommerce in Dubai. They came to us from Snowflake Partner Connect, if you're familiar with it; we were one of the first five SaaS platforms there. So they came into Rivery, and they wanted some small use case around the experience in their stores and malls, the engagement with their consumers.
They started with that, and after three months, they called us, or sent us an email: we need to do a security check. And, you know, I was afraid; the thing you're afraid of the most is security. We asked them in the email: yes, no problem, but did something happen? No, no, it's really amazing. We've been working with the product for three months, and what we see is that we want to expand the use case to replace our legacy engines overall when it comes to database replication. At that time, they moved to Snowflake, and they built a big migration from on-prem databases to Snowflake, with, of course, Salesforce integrated and everything. Those are the kinds of things you don't expect as a five-person company, to work with these giants.
And this, I think, was the reason I believed that, look, something big is going on in this space. You must be responsible for those customers. You need the money to grow. You need to develop this engine in a more efficient way, and this is why we decided to do the spin-off. Lately, we see a lot of use cases. We have a hedge fund here on the East Coast that built an A-to-Z solution for their portfolio of companies, and they are using Rivery for the ingestion, and dbt packages for the transformation.
They were one of our partners, and now they've added the machine learning piece with the Python. So they have the full solution to basically empower their portfolio of companies, and we were able to build maybe the most comprehensive pipelines, managed and orchestrated by Rivery, but, again, including other tools in the stack. If you have a favorite tool, bring it in and use it, especially if it's open source.
[00:48:09] Unknown:
In your own experience of building and scaling this company and working with your customers, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:48:20] Unknown:
So every day you have a challenge, because you want to be better. Every day time is running, time flies, and we are putting in a lot of effort to make Rivery the industry standard. And it's not easy, because we came a bit late when you check the timelines, but we encourage the community to check this out and see whether we are much better than the competitors. Give us a shot. I think that this is the majority of the focus, because when you are doing the right things, look, our churn rate is less than 2%.
It's really low. And you can imagine that this 2% were a few customers that were really low volume, where we never succeeded in growing. Sometimes we lost a champion, sometimes it's whatever, but we are working day and night to make our customers happy. We believe that with the efforts we are putting into the PLG, the new pricing model that we are about to share in a very few weeks, and the entire onboarding process, the exposure of the company will be high, and we have a lot of credibility. We have a strong partnership with Snowflake, as I mentioned: embedded in Partner Connect, premier partnerships, and also the validation process. We were one of the earliest companies to do the same with Google, and we are now doing the same with AWS, being a part of their marketplace.
Among all of these efforts, I should mention the strong, really ambitious project that we did with Databricks. They launched their Delta Lake with SQL on top of Databricks. They definitely saw value, and they came to us and told us: we want to put you in as one of the early adopters of Partner Connect. There were only three players there; we were one of the first. We are working day and night, first of all for our customers and prospects, and then working really, really hard to get the exposure out there. Yes, it's not easy, because you're competing with giants, but if you're good, you have the confidence that you will get there sooner or later, but better sooner.
[00:50:32] Unknown:
And for people who are looking for a way to manage their full extract, load, transform orchestration, what are the cases where Rivery is the wrong choice and maybe they're better suited with a different fully vertically integrated solution or actually picking and choosing and using multiple different tools for each of those different stages?
[00:50:55] Unknown:
So as I mentioned, we want to be best of breed in each one of the capabilities, but we're the wrong choice if you want to do this mission non-cloud, or even more so, non-SaaS. Sometimes we have a prospect that comes in and says: wow, it's amazing, we like it, but we want to buy the software; we want to install the software on our own servers. No, it's not doable. I guess that Snowflake had the same challenge when moving customers from Teradata or Oracle. We all owe a lot to Snowflake for the way they insisted on keeping their solution a SaaS.
We are SaaS. We are not on-prem. We are not a software company. We are SaaS, and what we are doing day and night is the experience of a SaaS. So if you are not ready to use a SaaS ELT, I think it's better not to call us, let's say, because it would be a wrong decision for them; we will insist on running as a SaaS with all of the required security, whether it's virtual private links or SSH, SSO, everything that we have in place. We are certified and compliant with everything: HIPAA or SOC 2 or anything that you need. You want to see our architecture, how we encrypt the data, and how we manage our security? Excellent, we will help you get there. It's all up on our website and in our blogs and our content. And if you need to speak with an engineer, we will speak with an engineer, but we will never take the amazing solution that we built here as a SaaS into a mode where it's going to be installed software, because we won't be good at that one. As you continue to
[00:52:36] Unknown:
build and grow and work with your customers, what are the things you have planned for the near to medium term of Rivery?
[00:52:42] Unknown:
So I think I touched on that when we spoke about the kits. We want to continue to build the community: basically, like-minded people that are as passionate about data as we are. We want them to build more kits, and it's not just building anything; it's basically building a community, a collaboration between companies and between individuals. And for that, we're also going to find the right monetization model in order to make sure that we drive the growth of this. The second piece is improving the integration, or working on integration, with the enterprise data catalogs, companies like Collibra or Alation.
When we come to an enterprise, the first question is how they can get the metadata that exists in Rivery, or the lineage that exists in Rivery: how can I see it in my tool? Right now they can do it with the Action, the custom API that we have, or via our API or CLI engine, or by getting the logs directly. But the goal is to have a one-click integration with those tools. We are not going to replace the enterprise data catalog; a data catalog is not only about the ETL piece, it's all over the place in data. So we want to help them get the data that exists in our repository and empower their users to see how beautiful the lineage looks when you're running in Rivery.
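A sketch of what such an integration looks like today, before the one-click version: poll the ELT vendor's REST API for lineage metadata and forward it to the catalog. The endpoint, payload fields, and auth header below are invented for illustration; this is not Rivery's actual API.

```python
# Hypothetical sketch of pulling pipeline lineage from an ELT vendor's REST
# API to feed an enterprise data catalog, as described above. The endpoint,
# payload fields, and token are invented; this is not Rivery's real API.
import requests

BASE_URL = "https://api.example-elt-vendor.com/v1"
HEADERS = {"Authorization": "Bearer <api-token>"}

def fetch_lineage(pipeline_id: str) -> dict:
    """Fetch source-to-target lineage edges for one pipeline."""
    resp = requests.get(f"{BASE_URL}/pipelines/{pipeline_id}/lineage",
                        headers=HEADERS, timeout=30)
    resp.raise_for_status()
    return resp.json()

def push_to_catalog(lineage: dict) -> None:
    # A real integration would map these edges onto the catalog's own
    # ingestion API (e.g. Collibra or Alation); here we just print them.
    for edge in lineage.get("edges", []):
        print(f"{edge['source']} -> {edge['target']}")

if __name__ == "__main__":
    push_to_catalog(fetch_lineage("salesforce_to_snowflake"))
```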
[00:54:17] Unknown:
Are there any other aspects of the Rivery platform or the overall space of building data products and offering them as a service that we didn't discuss yet that you'd like to cover before we close out the show?
[00:54:30] Unknown:
I believe that down the road, as part of data as product, we will see more native integrations with machine learning tools. Because tools like Rivery, and Fivetran, and the other players that give you the flexibility and the self-service experience, by definition push people to analyze and know their data better. And I think this trend will push for more machine learning and AI tools, so that down the road there will be native integrations between the backend tools, like Rivery or other ETL tools, into
[00:55:10] Unknown:
their brain, which is the machine learning or collaboration tools that exist out there. Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as a final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:55:31] Unknown:
So I think that with companies like ours and the other emerging technologies, we are trying to cover all the gaps that you see in the market. I don't think that there is one specific unique gap right now; I don't want to say everything is solved. I think that what we are trying to do with the Python is actually the first step, and I guess that other companies will follow us on this one and be able to bring the Python into the data pipelines and have everyone working on the same dataset. Building a fragmented solution is not the right way when you're speaking about a single source of truth, or thinking about the right way to build the data warehouse or the data processes in the organization. So I think that a lot of companies will follow us. I think that in the next five years, you won't see the same picture as you see now. You won't see so many tools. What you will see is consolidation.
You will see that even a company like Fivetran, I believe, won't look the same. It will maybe buy other companies and reduce the number of players that you see in this industry, going back to the place where you have one, two, or three data stacks. I don't believe that every feature deserves to be, let's say, an industry.
[00:56:50] Unknown:
Alright. Well, thank you very much for taking the time today to join me and share the work that you've been doing at Rivery. It's definitely a very interesting platform and a great product for people who are looking for a more unified flow for their data lifecycles. So I appreciate all the time and energy that you and your team have put into that, and I hope you enjoy the rest of your day. Thank you so much, Tobias. Thank you. I really enjoyed this session. Thank you so much. Thank you for listening. Don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com to learn about the Python language, its community, and the innovative ways it is being used.
And visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Sponsor Messages
Interview with Itamar Ben Hemo: Introduction and Background
Building Rivery: Concept and Early Development
Target Personas and Use Cases for Rivery
Design and Architecture of Rivery
Core Capabilities: Ingestion, Transformation, Orchestration
Customer Feedback and Iterative Development
Adoption Strategy: Bottom-Up vs. Top-Down
Workflow and Collaboration in Rivery
Community and Kits: Building Reusable Data Models
Challenges and Lessons Learned
When Rivery is Not the Right Choice
Future Plans for Rivery
Closing Remarks and Final Thoughts