Summary
Data engineering is a multi-faceted practice that requires integration with a large number of systems. This often means working across multiple tools to get the job done, which can impose a significant productivity cost due to the number of context switches. Rivery is a platform designed to reduce this incidental complexity and provide a single system for working across the different stages of the data lifecycle. In this episode CEO and founder Itamar Ben Hemo explains how his experiences in the industry led to his vision for the Rivery platform as a single place to build end-to-end analytical workflows, including how it is architected and how you can start using it today for your own work.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Atlan is a collaborative workspace for data-driven teams, like GitHub for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets & code, Atlan enables teams to create a single source of truth for all their data assets, and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker and more. Go to dataengineeringpodcast.com/atlan today and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $3000 on an annual subscription
- Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days or even weeks. By the time errors have made their way into production, it’s often too late and damage is done. Datafold built automated regression testing to help data and analytics engineers deal with data quality in their pull requests. Datafold shows how a change in SQL code affects your data, both on a statistical level and down to individual rows and values before it gets merged to production. No more shipping and praying, you can now know exactly what will change in your database! Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold.
- Are you looking for a structured and battle-tested approach for learning data engineering? Would you like to know how you can build proper data infrastructures that are built to last? Would you like to have a seasoned industry expert guide you and answer all your questions? Join Pipeline Academy, the world's first data engineering bootcamp. Learn in small groups with like-minded professionals for 9 weeks part-time to level up in your career. The course covers the most relevant and essential data and software engineering topics that enable you to start your journey as a professional data engineer or analytics engineer. Plus we have AMAs with world-class guest speakers every week! The next cohort starts in April 2022. Visit dataengineeringpodcast.com/academy and apply now!
- Your host is Tobias Macey and today I’m interviewing Itamar Ben Hemo about Rivery, a SaaS platform designed to provide an end-to-end solution for Ingestion, Transformation, Orchestration, and Data Operations
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what Rivery is and the story behind it?
- What are the primary goals of Rivery as a platform and company?
- What are the target personas for the Rivery platform?
- What are the points of interaction/workflows for each of those personas?
- What are some of the positive and negative sources of inspiration that you looked to while deciding on the scope of the platform?
- The majority of recently formed companies are focused on narrow and composable concerns of data management. What do you see as the shortcomings of that approach?
- What are some of the tradeoffs between integrating independent tools vs buying into an ecosystem?
- How is the Rivery platform designed and implemented?
- How have the design and goals of the platform changed or evolved since you began working on it?
- What were your criteria for the MVP that would allow you to test your hypothesis?
- How has the evolution of the ecosystem influenced your product strategy?
- One of the interesting features that you offer is the catalog of "kits" to quickly set up common workflows. How do you manage regression/integration testing for those kits as the Rivery platform evolves?
- What are the most interesting, innovative, or unexpected ways that you have seen Rivery used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Rivery?
- When is Rivery the wrong choice?
- What do you have planned for the future of Rivery?
Contact Info
- @ItamarBenHemo on Twitter
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, Podcast.__init__ to learn about the Python language, its community, and the innovative ways it is being used.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
Links
- Rivery
- Matillion
- BigQuery
- Snowflake
- dbt
- Fivetran
- Snowpark
- Postman
- Debezium
- Snowflake Partner Connect
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. Have you ever woken up to a crisis because a number on a dashboard is broken and no one knows why? Or sent out frustrating Slack messages trying to find the right dataset? Or tried to understand what a column name means? Our friends at Atlan started out as a data team themselves and faced all this collaboration chaos. They started building Atlan as an internal tool for themselves. Atlan is a collaborative workspace for data-driven teams, like GitHub for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets and code, Atlan enables teams to create a single source of truth for all of their data assets and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker, and more.
Go to dataengineeringpodcast.com/atlan today. That's A-T-L-A-N, and sign up for a free trial. If you're a data engineering podcast listener, you get credits worth $3,000 on an annual subscription. When you're ready to build your next pipeline and want to test out the projects you hear about on the show, you'll need somewhere to deploy it. So check out our friends over at Linode. With their managed Kubernetes platform, it's now even easier to deploy and scale your workflows or try out the latest Helm charts from tools like Pulsar, Pachyderm, and Dagster. With simple pricing, fast networking, object storage, and worldwide data centers, you've got everything you need to run a bulletproof data platform.
Go to dataengineeringpodcast.com/linode today. That's L-I-N-O-D-E, and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. Your host is Tobias Macey. And today, I'm interviewing Itamar Ben Hemo about Rivery, a SaaS platform designed to provide an end-to-end solution for ingestion, transformation, orchestration, and data operations. So, Itamar, can you start by introducing yourself?
[00:02:10] Unknown:
Yes. Thank you so much, and thank you for having me to bring some value to the data engineering community who are listening to us. Do you remember how you first got involved in the area of working with data? Yeah. Actually, it's a long time ago. I've always been passionate about the power of good data and good insight. I've been in this space for more than 20 years, building data warehouses for a living since 2002. In 2007, I founded my first company, called Vision BI. It's a company that became a market leader as a BI consultancy in Israel. We used to manage many of the largest big data projects of the past decade.
We built a lot of tools and utilities, from data quality to Python frameworks, but we also kept the focus on bringing great, innovative solutions into this space. I think the best example is how we brought Snowflake to Israel in the very early days, and we became one of their best partners in EMEA. A few years later, alongside my role in Israel, I wanted to do something in a much bigger market. So I opened my second company, and with the acquisition by Keyrus, a global data consulting company, this became what we call the Keyrus North America division.
I led this organization from zero to over a hundred engineers across several states, several sites in North America. In this company we did things a bit differently, mainly working with enterprises, and we got great recognition from market leaders: with Alteryx we were partner of the year globally, and with Tableau, emerging partner of the year in the Americas. I think the fact that I've worked with hundreds of companies around the world, helping them to create their optimal data processes and data management stack, is actually what brought us to create Rivery, where we basically try to take our deep knowledge in data processing into a native SaaS service.
[00:04:12] Unknown:
Can you give a bit of an overview about what it is that you're building there now and what it is about this particular space of simplifying the work of managing ETL processes and delivering it as a service that is keeping you interested and motivates you to spend your time and energy there?
[00:04:28] Unknown:
Yeah. Absolutely. So we started Rivery; I'm the CEO and cofounder at Rivery. My cofounders in the company worked with me for many years at the previous company, and we came up with the idea to build Rivery right after completing several big data projects in the Israeli tech industry. At that time, we tried to find a kind of modern data stack in this space. We ran all our projects in Python. What we found was that the legacy ETL players, such as Informatica and Talend, owned this market, and even then you saw tools like Matillion that came with the cloud, but for us it was still the same functionality. Very similar: the same capabilities with the same persona, the BI developers, which we think is changing a lot now. Especially, we were faced with a lack of ability to scale integrations with volume.
So this is what brought us to the decision to build our first MVP. The first MVP of Rivery was four or five connectors with what we call orchestration on top of it, a logic layer that runs ELT as a native SaaS. We started on Google BigQuery, and from that point we expanded to Snowflake and other cloud data warehouses. The idea, what we really try to do here, is to build a modern SaaS ELT to avoid the need to rebuild the same scaling processes, to maintain APIs, Python templates, and so on. We ran this practice for two years, 2018 and 2019, as a bootstrap. And in December 2019, once we saw that we were actually winning against the new players in this space, we decided to spin off the technology and take this company from a bootstrap into a venture company.
So we raised our seed round in December 2019. In January 2020, just before COVID, we were five folks in Tel Aviv, myself in New York. But from that point, we never stopped. We grew the business a lot. Now it's hundreds of worldwide customers and over 75 employees today. So Rivery is a very simple SaaS ELT. The way that we see it, it is the core and the backbone of the modern data stack. We handle the ingestion, transformation, orchestration, even some use cases for reverse ETL, and encapsulate everything in a kind of development lifecycle. With Rivery's approach to data management, the focus is on the valuable modeling and the insight that you can get out of the data.
We want to eliminate a lot of the challenges in pipeline maintenance, and we used to call it, maybe too naively: focus on insight, we will do the rest.
[00:07:28] Unknown:
As far as the overall goals of the Rivery platform and the company that you're building around it, you mentioned that you want to simplify the work of managing ELT processes so that people don't have to redevelop the same things over and over. And I'm curious if you can talk to the target personas that you have in mind as you're developing Rivery and some of the ways that you think about the different interaction points and workflows that you're trying to support for each of those different personas?
[00:08:00] Unknown:
Yeah. It's a very good question. Maybe let's start with the primary goal. The primary goal is to be the go-to solution that gives people the flexibility to manage ELT in the best way that they see it, whether that's a no-code or low-code way for analysts or BI people, or for data engineers who require more heavy-code processes like Python. We believe that once you bring a platform that provides the two angles, it frees them to invest their time and energy in creating new data models, better analysis of the data in new ways, and, ultimately, providing their organization with the insight they need in the shortest time and most efficient way.
You mentioned the personas. This is really interesting, because as the use cases around data grow and more and more companies grow, and data becomes the heart of every organization, we see different personas that we hadn't met before. Businesses, for example, need Rivery to accelerate their time to value or time to insight and, of course, support their scale. But with the rise of data inside the organization, you see personas like software engineers or CTOs in startups, marketing people, folks in the enterprise, or product teams that need to improve their data platform, all the way to data analysts or BI developers that need the infrastructure to deliver their insight via the cloud solution.
All of these are what we call the data citizens. But when we want to present the product and its value on a daily basis, the best way for us to show it is data engineers. Think about it: data engineers are the people who need to manage ELT processes in the most efficient way. You don't want them to spend time and effort on external APIs or maintaining database schemas according to the hundreds of sources they're getting data from. And they also don't want to be in control 24/7 of the computing resources required to run at their scale. So we are helping them with these capabilities.
I think that different roles in the organization might use different tools, and maybe this is why we see many tools in this space. But the thing is, and again, I have 20 years of experience here, everyone is using the same dataset. Maybe you're an analyst, but you're still using the same dataset. You're a data scientist; you're running on the same dataset in Snowflake or in BigQuery or whatever. So this is why we want to be able to cater to people who need both the no-code and the heavy code. And this is why we launched, for example, the Python, because you must work on the same dataset in order to have reliability and scale your business.
[00:11:09] Unknown:
And as far as determining the scope of what you were trying to achieve with Riverie, I'm curious what you use as both positive and negative sources of inspiration for understanding what to try to encompass in this project as well as how best to implement it and integrate it with existing tools and workflows?
[00:11:31] Unknown:
I think that we are positive people; we're thinking only about the positive. So think about Lego, for example. With Rivery it's a fun experience, a tool that you can play with, but at the same time it's a modular platform that can be shaped and used in infinite ways. We wanted to give data engineers full flexibility and endless possibilities when working with the product. I cannot think about negative inspiration, but I have one. If you think about our space, the legacy players did all-in-one. But with the shift to cloud, and especially to SaaS, they missed something. They missed the new persona. They missed the infinite scale, the ease of use, and basically everything that you want to see in a SaaS. My inspiration is to build more capabilities in one tool, so you're not necessarily running to buy four or five tools, but at the same time, to make it best of breed for every function in the organization.
[00:12:41] Unknown:
As far as the approach that you've taken, it's definitely in contrast to where a lot of the industry has been focusing, particularly in the past couple of years, with all the attention that's gone into the, quote, unquote, modern data stack of having these very narrowly defined tools and platforms that are intended to be composed together to form your full end to end flow. And I'm wondering what you see as the shortcomings of that approach of having these very narrow slices of functionality that are optimized for a particular use case, and then you have to do the picking and choosing across the entire life cycle of the data?
[00:13:22] Unknown:
This is a really great question, and specifically to ask me on this one, because as I see it, this is our differentiation. A revolution doesn't happen in one or two years; a revolution takes ten years to complete. So this is why I see this as a differentiation: data stacks are getting increasingly complex and harder to manage because of the amount of tooling that we are seeing today. And while it's great for companies to specialize on a key pain point that can be solved with one niche data stack, they often lack a broader vision of the ecosystem and what's required to get things done. So dealing with many tools, the way that I see it, across dozens or even hundreds of pipelines, means there is a higher chance of dealing with broken pipelines. In addition, think about the business models or the pricing models: having to deal with multiple business models from the different tools makes it nearly impossible to predict the cost.
And technically monitoring your data ecosystem as a whole, it's really hard to get there. What's more, and this is something that I think people are not yet paying a lot of attention to, there is the security risk of involving five or six tools where you are basically transferring credentials, the most sensitive credentials for accessing the data in the organization, between several SaaS services. So overall, from opening a ticket to monitoring everything and getting the most cost-effective business, I think it's really hard for the organization. They don't always see it at the beginning of the revolution. Sometimes you buy a tool that does the EL, and then you need another API that they don't support. So if you don't have the capability now, you need to buy another tool or you need to go to code.
In our case, we believe that we want to give them what we are doing in the right way, and it just works, but at the same time give them the possibility to grow. And I will give an example with dbt, which is a really amazing solution that built a lot of the community around it. We see many cases of people using Rivery that decide not to use our transformation engine and choose dbt instead, which is amazing. But again, it's a second tool in the stack; it's not like five, six, seven tools, one for every feature you need. I will try to be polite and say that a lot of it is a marketing message, not necessarily the daily reality. There are a lot of hypergrowth companies that got a lot of attention, which is great, and they did a very good job in terms of self-service.
But at the end of the day, this marketing message pushed the market to think that this is the reality, that this is the only way to manage data. We think a bit differently about that one.
[00:16:32] Unknown:
Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days or even weeks. By the time errors have made their way into production, it's often too late and the damage is done. Datafold built automated regression testing to help data and analytics engineers deal with data quality in their pull requests. Datafold shows how a change in SQL code affects your data, both on a statistical level and down to individual rows and values, before it gets merged to production.
No more shipping and praying. You can now know exactly what will change in your database. Datafold integrates with all major data warehouses as well as frameworks such as Airflow and dbt and seamlessly plugs into CI workflows. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold. In terms of the functionality that Rivery provides, what do you see as the either specific instances or categories of tools that it will replace, and what are the components of a data platform that it aims to supplement?
[00:17:45] Unknown:
Our core capabilities are ingestion, transformation, and orchestration. In ingestion, we are sometimes replacing and sometimes competing head to head with tools like Fivetran. So in ingestion, definitely, we have strong competition there. They were before us, so this is why they are more mature than us in terms of selling or revenue. But in terms of the product, definitely, we are giving a very good fight there. In terms of transformation, we believe that people need to choose the way they run the transformation. If you choose to work with our Python data frames, great. If you want to choose Snowpark UDF functions, amazing, do it. We will orchestrate.
We embedded this in our solution; the same for dbt, the same for Python, and the same for SQL. With Rivery, we want to give you all the capabilities to work with any tool. So in transformation, the way that I see it, it's not replacing anyone; it's actually working together with them. In terms of orchestration, we see competition with tools like Airflow, and I think our engine is strong enough to compete in this space. Those are the core capabilities. From that point, what we are trying to do is leverage each one of the components to be best of breed. When it comes to ingestion, for example, we don't support only the predefined reports. We have 180-plus native connectors, very much like Fivetran; I think the overlap is around 85%, which is great. But in our case, when you need an additional API or an additional source you want to get data from, you don't need to go to code. You can go to code and write Python, but you can also use what we call our action, or custom API, function. It's more like Postman capabilities for getting data, mainly for data ingestion purposes.
We invested a lot in the CDC to compete on hyperscale database replication, whether it's from cloud or from on-prem, for companies that move to the cloud. So this is the ecosystem that we see.
[00:19:58] Unknown:
As far as the actual Rivery platform itself, can you talk through some of the design and architecture of how you went about implementing it?
[00:20:06] Unknown:
The platform is built as multitask runners that get metadata and know how to handle these tasks by the definition of the metadata they receive, whether it's for the transformation or for the ingestion. Everything is dynamic and built as dynamic tasks, all built for the cloud, of course. The platform runs on Kubernetes, so we have the ability to scale up and scale down automatically, in the most efficient way. Maybe it's good to focus on three core engines that we built, that we are using as the foundation of our product. The first one is the CDC; I touched on this one. We built our real-time engine from day one, and we built it in stages. In the beginning, it started with multi-table replication, like the legacy players such as Talend or other tools used to do, a select statement on top of the databases. But we did a nice wizard so that you can get all the tables right away.
Then we saw that people need to manage the incrementals and everything, so we moved to the second version, which was CDC. We built an engine that includes some of the capabilities of Debezium. We also, by the way, contribute to this great open source project. But recently, in the past few months, we actually took a step up, because we saw that at scale there were some challenges in Debezium that weren't so good for us. So what we did is build our own core CDC engine, in order to reduce or eliminate the limitations that we faced when we built the solution on top of Debezium.
The second thing in the technology that I think differentiates Rivery a lot is the multi-tenancy. Everything in Rivery starts from a single tenant, which we call an environment in the platform. Every account in Rivery is built from one to n environments, or tenants. Each tenant includes completely separate content: a separate computing engine behind the scenes, users, variables, workflows, connections, the metadata engine; everything runs on a single tenant. Why is it so good and so strong? Aside from the fact that you can scale, it actually supports the data engineers and architects when they want to build their solution to enterprise scale, meaning governance and deployment.
This serves the engineers' use case of building their development, QA, and production environments, or just building a sandbox from scratch and deploying the entire solution to a new branch or a new sandbox. We see that SI partners want to shift their business into what they call fully managed services. We see a lot of use cases of SI partners that took this solution and built it multi-tenant, or multi-environment. Every account where they are managing the data workflows is actually a single tenant, but they have the full account with the templates, so now they can deploy their practices and their data models between their accounts.
Once the deployment is complete, they can make tailored modifications in the single tenant. These are really strong capabilities. The third engine is about how we manage the APIs and the ingestion. To be honest, every data engineer will tell you that building against an API and getting the data is not such a complex mission. The biggest challenge is how to do it at scale, how to get great performance, and, of course, to have the built-in validations to get the most qualified datasets. One of the most important decisions that we made in the early days was that we wanted to build everything in house and support the ingestion A to Z, even though it sometimes seems hard, because you're on a sales call and you support only 60 or 70 percent of the APIs that the customer needs. We still insisted on developing everything and not using third parties. People used to tell me, hey, take CData drivers and embed them inside your solution; it will scale your business. I told them, yeah, I agree. But eventually I'm so happy, because the fact that we didn't take this shortcut helped us in three ways. One, we are independent to support any required improvement when the APIs change, and that happens pretty frequently.
We know how to address any change, and we don't depend on any other vendor. So we know that we can improve everything, performance and quality, with our own engineers, without requiring the involvement of other vendors. Second is the security. Security is a really important thing for us, and all the responsibility is ours: we are holding the most critical credentials managed in our system. So we need to make sure they are managed well, encrypted, and held to our highest security standards. The third way is the bootstrap. As I mentioned, we were bootstrapped for a bit over two years. We didn't have a lot of money, and this actually forced us to take our development ability to the edge so that we could speed up development. The result is that we exposed what we call the Action River, which is the custom connectors.
You can build a custom API connector with this feature, with no code, because we use it internally and we just exposed it to our audience. Over 60% of our customers are using it, and it's been on air just a year and a half, which means hundreds of customers are using it. And the fact that we built a strong infrastructure helped us close the gap, from zero to 190-plus connectors, like our main competitor in that space, which is Fivetran. The idea is that we will be able to scale more and more, because we believe the infrastructure we built behind the scenes of this API management is more mature and, let's say, helps us do it really fast.
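To make the metadata-driven task runner pattern described above concrete, here is a minimal sketch in Python. It is purely illustrative: the names, task kinds, and dispatch structure are assumptions for explanation, not Rivery's actual internals.

```python
# Minimal sketch of a metadata-driven task runner, as described in the
# interview. All names and structures here are hypothetical, not Rivery's code.
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class TaskSpec:
    kind: str    # e.g. "ingestion" or "transformation"
    config: dict # connector settings, SQL statements, etc.

def run_ingestion(config: dict) -> None:
    # A real runner would invoke a connector; here we only show the dispatch.
    print(f"pulling from {config['source']} into {config['target_table']}")

def run_transformation(config: dict) -> None:
    print(f"running SQL: {config['sql']}")

# The runner knows nothing about specific pipelines; behavior is driven
# entirely by the metadata, so a new task is just new data.
HANDLERS: Dict[str, Callable[[dict], None]] = {
    "ingestion": run_ingestion,
    "transformation": run_transformation,
}

def execute(task: TaskSpec) -> None:
    HANDLERS[task.kind](task.config)

if __name__ == "__main__":
    pipeline = [
        TaskSpec("ingestion", {"source": "salesforce", "target_table": "raw.accounts"}),
        TaskSpec("transformation", {"sql": "CREATE TABLE stg.accounts AS SELECT ..."}),
    ]
    for task in pipeline:
        execute(task)
```

Because each handler is stateless and driven by metadata, runners like this can be treated as interchangeable workers, which fits the Kubernetes-based automatic scale-up and scale-down described above.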
[00:26:27] Unknown:
In terms of the overall design and goals of the platform, you mentioned that you went through a few different stages of development and iteration. And I'm wondering how the early ideas and assumptions about what you wanted to do with the platform, what customers wanted out of it, how well they've stood up, and how they have evolved in the process of going to market and working with your design partners?
[00:26:51] Unknown:
We designed the platform to deal with the simple problem of maintaining data pipelines. But as time moved on, we started to see more and more requirements from our users, including the other core processes of the ELT, like the orchestration and the transformations, and within transformation we added, for example, the Python. So, like the examples I gave you with the APIs or the CDC, it's not that every time we launch a feature we immediately see results. We are trying to build it together with our customers, because we are customer obsessed, and our customers and our users are the most important thing for us. For every simple MVP, we want to make it just work, and we do it with design partners. We always pick several customers to be a kind of private preview and design partners for every new feature, because we want to get the feedback right away. We don't want them to wait until we develop the feature to its full capabilities and then see that there's no value. I think we did it. We had the luck that we were a bootstrap, so that was in our DNA.
If I'm thinking about 2018, we had about ten customers at that time, like Yellowhead, WalkMe, Dynamic Yield, even AB InBev, a huge company, or Minute Media. Based on their demand and their requirements, we modified the platform. And guess what? All of them are still great partners. I cannot even call them customers; they're really true partners for us. As a startup, we want to make sure that we adapt to market changes and demand. Our product features come via our customers, so testing new product ideas is key for us.
That way we can quickly react, sometimes drop a feature, and improve it before going live with everyone. Again, the Python, or how we developed the CDC or the environments, those things came from our audience, from the customers.
[00:29:04] Unknown:
Particularly given the fact that you were at least initially bootstrapped, I'm wondering what you used as your criteria for determining what the initial product scope needed to be to ensure that you could prove out the ideas that you had and how well they would work in real world situations while still being achievable in a reasonable amount of time?
[00:29:26] Unknown:
First of all, we are not bootstrapped anymore. Just to clarify, we are funded; we are actually now closing our B round, a massive B round. As I mentioned before, maybe it sounds repetitive, but it's a balance. We have our own roadmap. We came from this industry; we've spent 20 years on this mission of building data management, so we have our own vision. But at the same time, we are working with hundreds of customers, and it's a luxury, a real luxury. We have a self-service, pay-as-you-go practice. Just in the two months since we launched it, over 40 accounts are using these capabilities, and we see how their consumption grows. What we are investing a lot in is how to make their lives and their onboarding much easier. So everything that we are doing stays really close to the usage.
This is our biggest advantage. And by the way, not only ours: it's an advantage of any SaaS platform, like Snowflake or BigQuery. Because you are a SaaS and you have all the logs and everything, you don't need to jump on a ticket and ask the customer to send you a log. You see everything, and you know how to proactively help them get things done. So we are trying to be a data company that helps our customers with the data that we have. Something that failed for one customer in about a minute, we are going to see for other customers, so we can prevent it upfront.
[00:30:58] Unknown:
Another interesting element of your business and your product is that being a managed service, it has a different path to adoption than what a lot of organizations, particularly in recent years, have been doing with the open source, bottom up, engineer led adoption strategy. And I'm curious how you've been thinking about your approach if you're aiming for bottom up adoption by getting in with engineers to solve a particular pain point or if you're looking to go more top down by working with senior leadership and engineering management to help to solve problems at more of the organizational level.
[00:31:35] Unknown:
I think that if I needed to build a company again, maybe I would choose to build it as open source and then jump into the commercial side of things. But we are selling to the data engineer, to the individual who actually works with the platform. So the way that we solve it, since we are not open source, is that two or three months ago we launched a pay-as-you-go program, where you can start really small, say dozens of dollars a month if your consumption is pretty low. And we are going to take it another step further. In the next two weeks, you are going to see our pricing model and how it works well with this. I won't give all the details, but this is a big improvement that we are making as a product-led growth company, a company that sells its software to data engineers and individuals.
About open source and the community: we do believe, and we're already having these discussions inside the company, in taking some of the capabilities that we have and making them open source. We believe in this. As I mentioned, we contributed to Debezium at the time we used it. We want to contribute some elements that we have in the platform back to the community. We've done it, but not yet in the way that we want to, and we see ourselves doing it and growing this exposure in the coming months.
[00:33:09] Unknown:
Another interesting aspect of building a business in the data ecosystem at all is the rapid rate of change and evolution that it's been going under. And I'm curious how your focus on being an end to end option for data engineers and data teams watching the surrounding ecosystem go through its own shifts of kind of reinvention and self discovery has either validated or influenced the product direction that you're focused on?
[00:33:40] Unknown:
I will take it in three parts. One is data as product. We strongly believe in this: moving from data teams that we count as service-oriented inside the organization to teams that act like product teams, creating an internal or external product that relies on data. The increased support for the data stack development lifecycle, for example, the multiple environments and the deployment, is basically how we brought those foundations and capabilities into this approach. We really believe in data as product, and to get there, you need to have scaled deployment, environment management, APIs, a CLI, everything that you expect to see in a development lifecycle.
The second thing is that we see a wider range of people and personas involved in the data team. There is no one size fits all, and we need to support different types of personas. You can be an analyst or BI person who gets everything via the UI, doing low code; low code we count as SQL, but you can get APIs, you can run SQL, and the product builds tables and schemas automatically, which is great, or you can use dbt in that case. But for the data engineers and data scientists, we want to empower them, and this is why we came up with an industry first: Python as a standard step in the pipeline.
No one in the industry is doing it. Many folks who are real experts in this space were impressed when they saw what we built there: in one workflow, you can get data from ingestion, run transformation with whatever preferred tool you choose, and then you have all the tables and the structure in the data warehouse, let's say Snowflake. But now, in the same workflow, in the same pipeline, you get the dataset from Snowflake, send it into a data frame, run machine learning, and bring it back from the data frame to Snowflake, and from that point to the endpoint applications. So we believe it is our responsibility in this market, especially as the enterprises grow more and more, to have tools that can scale commercially, with education and all the capabilities.
Last but not least is the end to end. We talked a lot about that, the end to end versus the multiple tools. Again, the market is pushed by a lot of messaging: for this, you need to use this tool, and for this mission, you need to use that one. With the right balance of breadth and depth in our functionality, we believe that we can fully handle the foundation, which is ingestion and transformation, and even some other, more complex use cases where you need Python. But we are not so fanatic as to do everything, and we give you the possibility to use everything with Rivery.
At the same time, you can get the enterprise capabilities like reliability, agility, scalability, and security. So I think that we are trying to give you everything, but, again, with the freedom to connect everything. Since we have an API, since we are a CLI company, you can definitely connect and integrate your other tools into this stack. But we believe that this is the stack that is going to lead the industry for the next couple of years when it comes to data operations, or data as product, as we spoke about before.
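As a concrete illustration of the in-pipeline Python step described above, here is a generic sketch of the warehouse-to-data-frame-to-warehouse round trip using the Snowflake Python connector, pandas, and scikit-learn. The table names, columns, and model are invented for illustration; this shows the general pattern, not Rivery's actual API.

```python
# Generic sketch of the "warehouse -> data frame -> ML -> warehouse" step
# described above. Tables, columns, credentials, and the model are all
# hypothetical; this is the pattern with common libraries, not Rivery's API.
import pandas as pd
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas
from sklearn.linear_model import LogisticRegression

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="ANALYTICS_WH", database="ANALYTICS", schema="PUBLIC",
)

# 1. Pull the curated dataset out of the warehouse into a data frame.
df = pd.read_sql("SELECT CUSTOMER_ID, TENURE, SPEND, CHURNED FROM STG_CUSTOMERS", conn)

# 2. Run the machine learning step inside the same pipeline.
features = df[["TENURE", "SPEND"]]
model = LogisticRegression().fit(features, df["CHURNED"])
df["CHURN_SCORE"] = model.predict_proba(features)[:, 1]

# 3. Write the scored rows back so downstream steps and endpoint
#    applications can consume them.
write_pandas(conn, df[["CUSTOMER_ID", "CHURN_SCORE"]],
             table_name="CUSTOMER_CHURN_SCORES", auto_create_table=True)
```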
[00:37:32] Unknown:
As far as the actual workflow and getting started with Rivery and incorporating it into your day to day efforts of building and managing data systems, I'm wondering if you can just talk through the overall steps of saying, I need to be able to build this new data workflow that's ingesting from, whether it's an API or a set of databases or multiple different sources, load it all into a set of tables in my data warehouse or my data lake, and then orchestrate the transformations for being able to provide a resulting database view for downstream consumption, and just all of the capabilities and touch points along the way for data engineers, and any collaboration capabilities for being able to manage that handoff from preparing the data to being consumed by downstream analysis?
[00:38:23] Unknown:
It's actually very easy getting into the product. What you will see immediately is the ability to start building your pipeline within minutes. Basically, you can choose one of the native connectors, or you can go with what we call kits, which are data model templates built by people like you, the data engineers, or mainly by our partners, who took specific, repeatable use cases and put in the predefined, pre-engineered data models. So you can use the kits, or you can use your connector and run transformations. If you're familiar with dbt, bring dbt in. If you're familiar with SQL, you run the transformation in SQL.
And if you are running Python, that's great. You don't need to know how to bring up a machine or build a Spark cluster and everything. You can just define a data frame, which is basically one definition, the name of the data frame, and that's it. Once you have this in the platform, you just need to define the computing engine you want for this purpose. So the product itself is self-service. Believe it or not, data engineers get into the trials, and we see it as a fact that the majority of them succeed in building their first river within minutes. This is a KPI that we measure ourselves on as a company; we have what we call sign-up workflows.
We measure it every day with Slack alerts: how long it took them to succeed in building their river, and what their overall experience is. It's definitely our main focus as a PLG company.
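To give a feel for the shape of such a workflow, here is a hypothetical, simplified "river" definition. Every key and value is invented for illustration; this is not Rivery's real configuration format.

```python
# Hypothetical, simplified "river" definition illustrating the workflow shape
# described above: pick a connector (or kit), choose a transformation style,
# declare the compute. None of these keys are Rivery's actual format.
river = {
    "name": "salesforce_to_snowflake",
    "source": {"connector": "salesforce", "objects": ["Account", "Opportunity"]},
    "target": {"warehouse": "snowflake", "schema": "RAW"},
    "transformations": [
        {"type": "sql", "statement": "CREATE OR REPLACE TABLE STG.ACCOUNTS AS SELECT ..."},
        {"type": "python", "entrypoint": "score_accounts.py"},  # runs on managed compute
    ],
    # The user only names the engine size; no Spark cluster to build by hand.
    "compute": {"engine": "managed", "size": "small"},
    "schedule": "0 * * * *",  # hourly
}
```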
[00:40:10] Unknown:
One of the interesting features that you have is a catalog of kits for being able to manage that reusability and avoid rework like you were mentioning at the beginning of the conversation, being able to use these to get common workflows set up quickly and have a catalog of functionality for the organization. And I'm wondering if you can talk through how you manage regression and integration testing for those kits as the Rivery platform evolves, or as the goals and governance structures for an organization change once they've started to adopt and become more sophisticated with Rivery?
[00:40:50] Unknown:
First of all, we have a process for this. We have a dedicated team, the kits team inside of Rivery, and they are basically engineers that work to build the kits, sometimes with customers, sometimes with partners. And from that point, we maintain them. Even today, I saw in Slack that we launched Intercom as a native connector, which is great. The second message was: okay, now we can remove the kits that we built for our customers so far that used the Action River, the custom connector, because we have everything native. Our kits include both the business logic and the technical setup required for the workflows.
The beautiful thing is that, first of all, everyone can build kits. You can build kits internally; they don't necessarily have to go to our public library. A kit can include the SQL scripts, the data models, the Python scripts, other tools like a dbt package, the source and target connection definitions, pipelines, workflows, everything. Because you have all the capabilities, it's doable, and this is how we came to this. But on top of this, it's really important to note that while the business logic is custom written for each kit, the underlying functionality uses our standard out-of-the-box functionality.
That means we don't need to update every kit, and kit builders don't need to update everyone, when the underlying connections or components get updated. The most amazing thing with kits is that, first, every user can build one. It can be an internal kit that you deploy between your tenants. We encourage the community, companies and individuals, to build their kits in the library, mainly because we want to build a community; it's not a question, we are going to build a community around it. We are going to monetize it so that everyone, companies and individuals in the space, can bring their know-how into the kits. And from that point, they will get part of the revenue that we see, based on consumption. If you think about that, this will be the community version of Rivery.
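One way to picture the property just described, custom business logic layered over standard, centrally tested components, is a regression check along these lines. This is a hypothetical sketch, not Rivery's actual test suite.

```python
# Hypothetical regression check for kits, illustrating the idea above: kits
# carry custom business logic but may only reference the platform's standard,
# centrally tested connectors and step types. Not Rivery's actual code.
SUPPORTED_CONNECTORS = {"salesforce", "intercom", "facebook_ads", "postgres"}
SUPPORTED_STEP_TYPES = {"sql", "python", "dbt"}

def validate_kit(kit: dict) -> list:
    """Return a list of problems; empty means the kit still fits the platform."""
    problems = []
    connector = kit["source"]["connector"]
    if connector not in SUPPORTED_CONNECTORS:
        problems.append(f"unknown connector: {connector}")
    for step in kit.get("transformations", []):
        if step["type"] not in SUPPORTED_STEP_TYPES:
            problems.append(f"unknown step type: {step['type']}")
    return problems

def test_all_kits_use_standard_components():
    # A real suite would iterate over every kit in the catalog on each release.
    kit = {
        "source": {"connector": "intercom"},
        "transformations": [{"type": "sql"}, {"type": "dbt"}],
    }
    assert validate_kit(kit) == []
```

Because a kit only references standard components, upgrading a connector means re-running checks like this across the catalog rather than hand-editing each kit.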
[00:43:19] Unknown:
Are you struggling with broken pipelines, stale dashboards, missing data? If this resonates with you, you're not alone. Data engineers struggling with unreliable data need look no further than Monte Carlo, the leading end-to-end data observability platform. Trusted by the teams at Fox, JetBlue, and PagerDuty, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo monitors and alerts for data issues across your data warehouses, data lakes, dbt models, Airflow jobs, and business intelligence tools, reducing time to detection and resolution from weeks to just minutes.
Monte Carlo also gives you a holistic picture of data health with automatic end-to-end lineage from ingestion to the BI layer directly out of the box. Start trusting your data with Monte Carlo today. Go to dataengineeringpodcast.com/montecarlo to learn more. In terms of the ways that people have been using Rivery and applying it to their particular organizations and problems, what are the most interesting or innovative or unexpected ways that you've seen it used?
[00:44:26] Unknown:
Yeah. I will mention three cases. One day, I got a call; I think it was 2018, and we were a really small team. It was a Fortune 500 company; it's on our website, and everyone knows this company. They were the biggest sponsor of the World Cup that year. They came to us, called me on my mobile; my mobile number was on the website at that time. The guy was the global director of marketing at this giant, and he called me and said: look, in about one month, we have the World Cup. We are the main sponsor. The CEO is going to be there, 40 people, it's going to be a war room, five agencies. Okay, what do you do? Then he told me: I need to build a digital data lake. We tried to do it internally for many months, and the high frequency of updates from the APIs breaks what we build every time. And, actually, in exactly those weeks, Facebook dramatically changed their APIs because of all the compliance and regulation.
So I told him: yeah, we will do our best. We helped them build a really amazing digital data lake. Within one month, we actually built eight APIs just to support their needs, and it was one of the most successful projects there. The second use case that also surprised me was Emaar; the case study is also on our website. Emaar Properties is a real estate company, but not only real estate; it's the experience of shopping and malls and ecommerce in Dubai. They came to us from Snowflake Partner Connect, if you're familiar with it; we were one of the first five SaaS platforms there. So they came into Rivery, and they wanted some small use case around the experience in their stores and malls, the engagement with their consumers.
They started with that, and after three months, they called us, or sent us an email: we need to do a security check. And, you know, I was afraid; the thing you're afraid of the most is security. We asked them in the email: yes, no problem, but did something happen? No, no, it's really amazing. We've been working with the product for three months, and what we see is that we want to expand the use case to replace our legacy engines overall when it comes to database replication. At that time, they moved to Snowflake, and they built a big migration from on-prem databases to Snowflake, with, of course, Salesforce integrated and everything. Those are the kinds of things you don't expect as a five-person company, to work with these giants.
And this, I think, was the reason I believed that, look, something big is going on in this space. You must be responsible for those customers. You need the money to grow. You need to develop this engine in a more efficient way, and this is why we decided to do the spin-off. Lately, we see a lot of use cases. We have a hedge fund here on the East Coast that built an A-to-Z solution for their portfolio of companies, and they are using Rivery for the ingestion, and dbt packages for the transformation.
They were one of our partners, and now they've added the machine learning piece with the Python. So they have the full solution to basically empower their portfolio of companies, and we were able to build maybe the most comprehensive pipelines, managed and orchestrated by Rivery, but, again, including other tools in the stack. If you have a favorite tool, bring it in and use it, especially if it's open source.
[00:48:09] Unknown:
In your own experience of building and scaling this company and working with your customers, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:48:20] Unknown:
So every day you have a challenge, because you want to be better. Every day time is running, time flies, and we are putting in a lot of effort to make Rivery the industry standard. And it's not easy, because we came a bit late when you check the timelines, but we encourage the community to check this out and see whether we are much better than the competitors. Give us a shot. I think that this is the majority of the focus, because when you are doing the right things, look, our churn rate is less than 2%.
It's really low. And you can imagine that this 2% were a few customers that were really low volume, where we never succeeded in growing. Sometimes we lost a champion, sometimes it's whatever, but we are working day and night to make our customers happy. We believe that with the efforts we are putting into the PLG, the new pricing model that we are about to share in a very few weeks, and the entire onboarding process, the exposure of the company will be high, and we have a lot of credibility. We have a strong partnership with Snowflake, as I mentioned: embedded in Partner Connect, premier partnerships, and also the validation process. We were one of the earliest companies to do the same with Google, and we are now doing the same with AWS, being a part of their marketplace.
Among all of these efforts, I should mention the strong, really ambitious project that we did with Databricks. They launched their Delta Lake with SQL on top of Databricks. They definitely saw value, and they came to us and told us: we want to put you in as one of the early adopters of Partner Connect. There were only three players there; we were one of the first. We are working day and night, first of all for our customers and prospects, and then working really, really hard to get the exposure out there. Yes, it's not easy, because you're competing with giants, but if you're good, you have the confidence that you will get there sooner or later, but better sooner.
[00:50:32] Unknown:
And for people who are looking for a way to manage their full extract, load, transform orchestration, what are the cases where Rivery is the wrong choice and maybe they're better suited with a different fully vertically integrated solution or actually picking and choosing and using multiple different tools for each of those different stages?
[00:50:55] Unknown:
So as I mentioned, we want to be best of breed in each one of the capabilities, but we're the wrong choice if you want to do this mission non-cloud, or even more so, non-SaaS. Sometimes we have a prospect that comes in and says: wow, it's amazing, we like it, but we want to buy the software; we want to install the software on our own servers. No, it's not doable. I guess that Snowflake had the same challenge when moving customers from Teradata or Oracle. We all owe a lot to Snowflake for the way they insisted on keeping their solution a SaaS.
We are SaaS. We are not on-prem. We are not a software company. We are SaaS, and what we are doing day and night is the experience of a SaaS. So if you are not ready to use a SaaS ELT, I think it's better not to call us, let's say, because it would be a wrong decision for them; we will insist on running as a SaaS with all of the required security, whether it's virtual private links or SSH, SSO, everything that we have in place. We are certified and compliant with everything: HIPAA or SOC 2 or anything that you need. You want to see our architecture, how we encrypt the data, and how we manage our security? Excellent, we will help you get there. It's all up on our website and in our blogs and our content. And if you need to speak with an engineer, we will speak with an engineer, but we will never take the amazing solution that we built here as a SaaS into a mode where it's going to be installed software, because we won't be good at that one. As you continue to
[00:52:36] Unknown:
build and grow and work with your customers, what are the things you have planned for the near to medium term of Rivery?
[00:52:42] Unknown:
So I think I touched on that when we spoke about the kits. We want to continue to build the community: basically, like-minded people that are as passionate about data as we are. We want them to build more kits, and it's not just building anything; it's basically building a community, a collaboration between companies and between individuals. And for that, we're also going to find the right monetization model in order to make sure that we drive the growth of this. The second piece is improving the integration, or working on integration, with the enterprise data catalogs, companies like Collibra or Alation.
When we come to an enterprise, the first question is how they can get the metadata that exists in Rivery, or the lineage that exists in Rivery: how can I see it in my tool? Right now they can do it with the Action, the custom API that we have, or via our API or CLI engine, or by getting the logs directly. But the goal is to have a one-click integration with those tools. We are not going to replace the enterprise data catalog; a data catalog is not only about the ETL piece, it's all over the place in data. So we want to help them get the data that exists in our repository and empower their users to see how beautiful the lineage looks when you're running in Rivery.
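A sketch of what such an integration looks like today, before the one-click version: poll the ELT vendor's REST API for lineage metadata and forward it to the catalog. The endpoint, payload fields, and auth header below are invented for illustration; this is not Rivery's actual API.

```python
# Hypothetical sketch of pulling pipeline lineage from an ELT vendor's REST
# API to feed an enterprise data catalog, as described above. The endpoint,
# payload fields, and token are invented; this is not Rivery's real API.
import requests

BASE_URL = "https://api.example-elt-vendor.com/v1"
HEADERS = {"Authorization": "Bearer <api-token>"}

def fetch_lineage(pipeline_id: str) -> dict:
    """Fetch source-to-target lineage edges for one pipeline."""
    resp = requests.get(f"{BASE_URL}/pipelines/{pipeline_id}/lineage",
                        headers=HEADERS, timeout=30)
    resp.raise_for_status()
    return resp.json()

def push_to_catalog(lineage: dict) -> None:
    # A real integration would map these edges onto the catalog's own
    # ingestion API (e.g. Collibra or Alation); here we just print them.
    for edge in lineage.get("edges", []):
        print(f"{edge['source']} -> {edge['target']}")

if __name__ == "__main__":
    push_to_catalog(fetch_lineage("salesforce_to_snowflake"))
```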
[00:54:17] Unknown:
Are there any other aspects of the Rivery platform or the overall space of building data products and offering them as a service that we didn't discuss yet that you'd like to cover before we close out the show?
[00:54:30] Unknown:
I believe that down the road, as part of data as product, we will see more native integrations with machine learning tools. Because tools like Rivery, and Fivetran, and the other players that give you the flexibility and the self-service experience, by definition push people to analyze and know their data better. And I think this trend will push for more machine learning and AI tools, so that down the road there will be native integrations between the backend tools, like Rivery or other ETL tools, into
[00:55:10] Unknown:
their brain, which is the machine learning or collaboration tools that exist out there. Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as a final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:55:31] Unknown:
So I think that with companies like ours and the other emerging technologies, we are trying to cover all the gaps that you see in the market. I don't think that there is one specific unique gap right now; I don't want to say everything is solved. I think that what we are trying to do with the Python is actually the first step, and I guess that other companies will follow us on this one and be able to bring the Python into the data pipelines and have everyone working on the same dataset. Building a fragmented solution is not the right way when you're speaking about a single source of truth, or thinking about the right way to build the data warehouse or the data processes in the organization. So I think that a lot of companies will follow us. I think that in the next five years, you won't see the same picture as you see now. You won't see so many tools. What you will see is consolidation.
You will see that even a company like Fivetran, I believe, won't look the same. It will maybe buy other companies and reduce the number of players that you see in this industry, going back to the place where you have one, two, or three data stacks. I don't believe that every feature deserves to be, let's say, an industry.
[00:56:50] Unknown:
Alright. Well, thank you very much for taking the time today to join me and share the work that you've been doing at Rivery. It's definitely a very interesting platform and a great product for people who are looking for a more unified flow for their data lifecycles. So I appreciate all the time and energy that you and your team have put into that, and I hope you enjoy the rest of your day. Thank you so much, Tobias. Thank you. I really enjoyed this session. Thank you so much. Thank you for listening. Don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com to learn about the Python language, its community, and the innovative ways it is being used.
And visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Sponsor Messages
Interview with Itamar Ben Hemo: Introduction and Background
Building Rivery: Concept and Early Development
Target Personas and Use Cases for Rivery
Design and Architecture of Rivery
Core Capabilities: Ingestion, Transformation, Orchestration
Customer Feedback and Iterative Development
Adoption Strategy: Bottom-Up vs. Top-Down
Workflow and Collaboration in Rivery
Community and Kits: Building Reusable Data Models
Challenges and Lessons Learned
When Rivery is Not the Right Choice
Future Plans for Rivery
Closing Remarks and Final Thoughts