Summary
At the core of every data pipeline is a workflow manager (or several). Deploying, managing, and scaling that orchestration can consume a large fraction of a data team's energy, so it is important to pick something that provides the power and flexibility that you need. SaaSGlue is a managed service that lets you connect all of your systems, across clouds and physical infrastructure, and spanning all of your programming languages. In this episode Bart and Rich Wood explain how SaaSGlue is architected to allow for a high degree of flexibility in usage and deployment, their experience building a business with family, and how you can get started using it today. This is a fascinating platform with an endless set of use cases and a great team of people behind it.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- RudderStack’s smart customer data pipeline is warehouse-first. It builds your customer data warehouse and your identity graph on your data warehouse, with support for Snowflake, Google BigQuery, Amazon Redshift, and more. Their SDKs and plugins make event streaming easy, and their integrations with cloud applications like Salesforce and ZenDesk help you go beyond event streaming. With RudderStack you can use all of your customer data to answer more difficult questions and then send those insights to your whole customer data stack. Sign up free at dataengineeringpodcast.com/rudder today.
- We’ve all been asked to help with an ad-hoc request for data by the sales and marketing team. Then it becomes a critical report that they need updated every week or every day. Then what do you do? Send a CSV via email? Write some Python scripts to automate it? But what about incremental sync, API quotas, error handling, and all of the other details that eat up your time? Today, there is a better way. With Census, just write SQL or plug in your dbt models and start syncing your cloud warehouse to SaaS applications like Salesforce, Marketo, Hubspot, and many more. Go to dataengineeringpodcast.com/census today to get a free 14-day trial.
- Your host is Tobias Macey and today I’m interviewing Rich and Bart Wood about SaaSGlue, a SaaS-based integration, orchestration, and automation platform that lets you fill the gaps in your existing automation infrastructure
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what SaaSGlue is and the story behind it?
- I understand that you are building this company with your 3 brothers. What have been the pros and cons of working with your family on this project?
- What are the main use cases that you are focused on enabling?
- Who are your target users and how has that influenced the features and design of the platform?
- Orchestration, automation, and workflow management are all areas that have a range of active products and projects. How do you characterize SaaSGlue’s position in the overall ecosystem?
- What are some of the ways that you see it integrated into a data platform?
- What are the core elements and concepts of the SaaSGlue platform?
- How is the SaaSGlue platform architected?
- How have the goals and design of the platform changed or evolved since you first began working on it?
- What are some of the assumptions that you had at the beginning of the project which have been challenged or changed as you worked through building it?
- Can you talk through the workflow of someone building a task graph with SaaSGlue?
- How do you handle dependency management for custom code in the payloads for agent tasks?
- How does SaaSGlue manage metadata propagation throughout the execution graph?
- How do you handle the myriad failure modes that you are likely to encounter? (e.g. agent failure, network partitions, individual task failures, etc.)
- What are some of the tools/platforms/architectural paradigms that you looked to for inspiration while designing and building SaaSGlue?
- What are the most interesting, innovative, or unexpected ways that you have seen SaaSGlue used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on SaaSGlue?
- When is SaaSGlue the wrong choice?
- What do you have planned for the future of SaaSGlue?
Contact Info
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links
- SaaSGlue
- Jenkins
- Cron
- Airflow
- Ansible
- Terraform
- DSL == Domain Specific Language
- Clojure
- Gradle
- Polymorphism
- Dagster
- Martin Kleppmann
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. When you're ready to build your next pipeline and want to test out the projects you hear about on the show, you'll need somewhere to deploy it. So check out our friends over at Linode. With their managed Kubernetes platform, it's now even easier to deploy and scale your workflows or try out the latest Helm charts from tools like Pulsar, Pachyderm, and Dagster. With simple pricing, fast networking, object storage, and worldwide data centers, you've got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today. That's L-I-N-O-D-E, and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show.
We've all been asked to help with an ad hoc request for data by the sales and marketing team. Then it becomes a critical report that they need updated every week or every day. Then what do you do? Send a CSV via email? Write some Python scripts to automate it? But what about incremental sync, API quotas, error handling, and all of the other details that eat up your time? Today, there is a better way. With Census, just write SQL or plug in your dbt models and start syncing your cloud warehouse to SaaS applications like Salesforce, Marketo, HubSpot, and many more. Go to dataengineeringpodcast.com/census today to get a free 14-day trial. Your host is Tobias Macey. And today, I'm interviewing Rich and Bart Wood about SaaSGlue, a SaaS-based integration, orchestration, and automation platform that lets you fill the gaps in your existing automation infrastructure. So, Rich, can you start by introducing yourself?
[00:01:48] Unknown:
Sure. Yeah. My name is Rich Wood. I've been developing software for too long, 20 plus years in the financial services industry and big data, mostly building distributed systems and now currently
[00:02:03] Unknown:
helping to found SaaSGlue. And Bart, how about yourself? Basically the same thing, except oil and gas, financial, and health care. And going back to you, Rich, do you remember how you first got involved in the area of data management?
[00:02:14] Unknown:
So I started my career in the hedge fund industry building real-time distributed trading systems, you know, dealing with really high-volume, low-latency stock market data. And in about 2008, when the hedge fund industry wasn't doing so well, I switched to the big data industry. So for about 10 years, I've been building distributed, fault-tolerant, scalable data pipelines.
[00:02:42] Unknown:
And, Bart, do you remember how you got involved in data management? Rich turned me onto it.
[00:02:47] Unknown:
So I have a lot more experience with graphics and front end and more distributed real time systems, but not as much data pipelines. So
[00:02:55] Unknown:
So you've both been working on the SaaSGlue platform, and you've cocreated it along with your 2 other brothers. And so before we get too much into SaaSGlue, I'm just curious if you can give some flavor of what inspired you to actually found a business with family and how well that's been going for you, particularly as 4 brothers?
[00:03:15] Unknown:
Yeah. So that's kind of an interesting story. So the brothers are Jack, Jay, Rich, and myself, Bart. And about 3 years ago, Jay and a friend had an idea for just a simple script runner that could be run remotely, and Rich got on it. And once Rich got on it, I got on it, and then Jack came a little bit later. And we started off with something simple, and then Rich turned it into something that was much better design wise. And at first, I was a little bit worried about how well it would work with 4 brothers; it's like putting, you know, 4 cats in a pillow and shaking it. Right?
So this is gonna go good. But, actually, it's worked out really, really well because Rich has a different skill set than me, and same with Jay and Jack. And so we complement each other really well. And so maybe a miracle will happen, and we will still be friends in 3 or 4 years. So
[00:04:14] Unknown:
And we added another family member, actually. Oh, man. I forgot. Yeah. My daughter-in-law, Aubrey, who's also an MIT graduate, is our CFO. She does our accounting and taxes and all that kind of stuff.
[00:04:27] Unknown:
So you just went against all the conventional wisdom of don't mix business and family, and so far, it's actually been working out for you. Alright. So far, it's going really well. And that brings us to the actual SaaSGlue platform itself. So I'm wondering if you can just give a bit of a description about what it is that you're building there and some of the story behind it. You've dug into it a little bit, but maybe add a bit more color about sort of the motivation for it and sort of how it came to be. We were discussing it with one of our customers the other day,
[00:04:57] Unknown:
and he used a term that we had not used before that I think describes it really well. And he called it automation as a service. We probably don't wanna use the acronym for that one. But, really, at its core, it's a remote script runner. So there's an API component. You can store your scripts, your code, your workflows with the API. And when you kick them off, the code is actually delivered to your compute through an agent, and the agent runs the code wherever it exists on whatever host machine it's running on. And kind of the impetus of it was, you know, I've been working with distributed data systems for a long time, and I saw people reinventing the same stuff over and over and over again.
And one of the big things is I saw people torturing Jenkins to, you know, build distributed automated systems. And I said, if people are using Jenkins, which was never intended for that purpose, to do automation, there's a big hole in that space. So that's really how it got started.
[00:06:11] Unknown:
In terms of the main use cases that you're focused on, you mentioned that some of the inspiration comes from the big data ecosystem. But from looking at it, the actual applications are much more generic than that. But I'm wondering where you're actually directing your main efforts in terms of the use cases that you're enabling and who you're targeting as your end users and just how all of that focus influences the feature set and design of the platform.
[00:06:38] Unknown:
Our main target has been data engineers, roles that Rich was kind of in. So large data pipelines, complicated. But as we started showing it to people, and as we started using the product ourselves over several months and years, it's been an evolution, and we found that it's much more open ended than that. And we found that we can easily tie together existing tools for build pipelines quite easily. As we've used the product ourselves, we actually deploy SaaSGlue with SaaSGlue itself. And we found that as we've had to do several tasks, we've often started doing it kind of with a different mindset, the old mindset of how we used to do things with our old companies.
And we would spend, like, an hour or 2 starting to go down that route, and then we'd say, hey, stupid. Guess what? SaaSGlue is that much easier here. So even for us, it's been a mind shift. And so that's kind of a long rambling answer to say that our target audience, you know, who could actually use it, is still kind of open ended, which makes it kind of a hard product to describe and get out there. Our main target audience is data engineers, but the product is pretty open ended. It can do a lot of other things that we're even learning ourselves. Yeah. And just to follow up on that, you know, a lot of the automation
[00:07:57] Unknown:
or some of the automation products that are out there right now were products that were built for a company, right, to meet the needs of that company in that environment. Whereas with SaaSGlue, we built it from the ground up as a general purpose automation tool. So it's really more of a platform, which does, on the one hand, make it really powerful. On the other hand, it makes it a little bit difficult to say, what is it? Some of the specific use cases that people are using it for: cloud based cron. Right? So cron is simple until it's not. Right? It's really easy to set up, but then you start having to worry about dependencies between different cron jobs, failover, logging, notification, that kind of stuff. With SaaSGlue, you get all that kind of scaffolding for free, and you have a central place that you can manage all your cron jobs rather than, you know, having cron proliferation with, you know, cron jobs that have gone all over the network and nobody really knows where they are. Another use case that people have used it for is software build pipelines.
Obviously, there are a lot of software build tools out there. Sometimes you run into a wall with those tools, and SaaSGlue can sort of fill in those gaps. And, obviously, the big one is data pipelines.
[00:09:14] Unknown:
As far as the actual orchestration, automation, and workflow management space that you're working within, that's something that's obviously been around for decades at this point. There are a huge number of players with different areas of focus and different industries or verticals that they're targeting or specific languages or tool chains that they're looking to integrate with. I'm wondering if you can just give your characterization of where SaaSGlue sits in that overall space and some of the ways that you're seeing it integrated into data platforms specifically?
[00:09:48] Unknown:
That is a challenge of this market. You know, it's new. It's fragmented. There are just tons of tools out there. At one point in sort of the life cycle of this project, when it was pretty far along, we were trying to decide, how do we explain this to people? What is it that it does? And Bart came up with this concept that really what it is is glue. It's so flexible and so good at integrating with other software applications and services that it's a great way to sort of glue those things together. So many of the trends right now are low code, no code. Right? You know, there's this concept of leaky abstractions, right, where, you know, you hit a wall with that solution and you need to go to a lower level of abstraction.
And SaaSGlue can sort of tie those things together, and so Bart came up with the name: hey, it's SaaS based, it's glue, SaaSGlue. Another way you can think of it is like a backbone for a service based architecture where you only need to write the code or supply the application and then allocate the compute for each service. And then SaaSGlue is sort of the backbone that ties them all together. And, you know, why is it glue? What makes it so flexible? And we'll talk about the architecture a little more further on, but there are no firewall exceptions required. Right? With a server based automation architecture, you're almost always going through firewalls if you wanna automate something outside of the local network where the server exists.
It's completely unopinionated when it comes to programming language. So you can use any programming language with the software. So that makes it very, very easy to integrate with other
[00:11:35] Unknown:
tools. Yeah. The glue concept is important. We don't really see anything that does this out there, and there are a lot of fantastic tools out there for build pipelines and data engineering, like Airflow or Ansible or Terraform or stuff like that. This isn't meant to replace those. It's meant to augment them and be able to create a central place for monitoring them and kicking them off and hooking them up together at runtime. We've looked around a lot, and we don't really see a competitor
[00:12:01] Unknown:
because we didn't see anything else that does that. It's sort of the Python of the cloud, where it's not the best at anything, but it's the second best at everything.
[00:12:11] Unknown:
You are correct.
[00:12:12] Unknown:
Excellent. Yeah. Excellent. We hope to change that going forward, but that's probably pretty accurate.
[00:12:20] Unknown:
And as far as the core elements and concepts, you've mentioned a little bit about the agents and the fact that there's, you know, the backbone layer that helps with the orchestration aspect. But I'm wondering if you can just talk through some of the main concepts that you've built the SaaSGlue platform on top of and things that customers need to be thinking about as they're deciding how they want to deploy SaaSGlue and integrate it into their systems?
[00:12:44] Unknown:
Yeah. So, I mean, Rich mentioned when you hit a wall with your tool that has a DSL. Right? So a lot of tools have, like, you know, configuration by DSL, whether it's XML or something like that. Where if you run up against that wall, and you're like, I need to do something else that's more dynamic, SaaSGlue is very imperative. Right? Our most important point was design philosophy, really. Because we said, let's do the parts that nobody cares about and they're not getting paid to do, which is the distribution, the central monitoring, having a record of what you ran yesterday, seeing what happened with it, you know, a central place for cron, security, notifications.
Notifications. Yeah. And so we thought when a data engineer or a build engineer or a developer is getting paid to do something, they're not getting paid to do these things. You know, the companies that are paying them don't really care about this. They just kind of expect it to happen. But a lot of people out there, they have to kinda reinvent that part. And so one of the most important design philosophies for us was to say, let's do the boring stuff that people don't wanna do, and let's cut it off there and not take it too far where we're starting to get into, you know, the guts of what people are trying to do in the business logic. Right? And so we've tried to keep it simple in the design. As far as architecture, you are correct. The API on the back end is, you know, the core of the product, as well as the agent, which took a long time to develop, to make sure that it was really robust, those things working in tandem with each other. And then we use technologies like message queuing and, you know, databases, things like that that are pretty standard, and standard security things like OAuth. So yeah. And just one thing to follow up on that as far as, you know, what users
[00:14:33] Unknown:
would expect as far as, you know, prepping to use it and that kind of stuff, you can literally be, you know, running your first script in SaaSGlue in 5 minutes. There's no server to set up. There's no plugins. You, you know, sign up on the web console, you download the agent, you get your security tokens, put them in a config file, run the agent, it connects up. You can see the agent, you know, heartbeating in the web console, and now you can run scripts on it. There's no firewall exceptions because there's no incoming connection to the agent.
The agent connects over HTTPS to the API and a message queue. So that's how code and instructions are delivered to the agent. So there's no firewall configuration.
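To make the outbound-only architecture described above easier to picture, here is a minimal sketch of a polling agent loop in Python. It is not the actual SaaSGlue agent (which also uses a message queue rather than simple polling), and the endpoint paths, payload fields, and token handling are all hypothetical; the point is simply that every connection is initiated by the agent over HTTPS, so no inbound firewall rules are needed.

```python
# Illustrative sketch only, not the real SaaSGlue agent. Endpoint paths,
# payload fields, and the token are invented for this example.
import subprocess
import time

import requests

API_URL = "https://api.example.com"      # hypothetical control-plane URL
TOKEN = "access-token-from-config-file"  # hypothetical credential


def poll_for_work():
    """Ask the control plane for the next task; returns None when idle."""
    resp = requests.get(
        f"{API_URL}/agent/next-task",
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json() or None


def report_result(task_id, returncode, output):
    """Send the task outcome back over the same outbound HTTPS channel."""
    requests.post(
        f"{API_URL}/agent/results",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"task_id": task_id, "returncode": returncode, "output": output},
        timeout=30,
    )


while True:
    task = poll_for_work()
    if task is None:
        time.sleep(5)  # idle heartbeat interval
        continue
    # The delivered payload is just a script plus an interpreter to run it with.
    proc = subprocess.run(
        [task["interpreter"], "-c", task["script"]],
        capture_output=True,
        text=True,
    )
    report_result(task["id"], proc.returncode, proc.stdout)
```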
[00:15:20] Unknown:
And in terms of the evolution of the project, I'm wondering what were some of the initial ideas or assumptions that you had going into it and some of the ways that those have been challenged or invalidated as you hit reality and started working with end users and some of the ways that that has shifted the design and architectural concepts that you've ended up with?
[00:15:42] Unknown:
Yeah. That's a good question. As far as, like, the overall goal, honestly, we're software developers. Right? And we always look for things that make our lives easier. Right? And we're always skeptical when somebody claims to be able to make our lives easier. Right? We hear those claims all the time, and we're right to be skeptical about that. But that was really the goal with this thing: we wanted to make something that would make life easier. You know, people start out with a simple thing, and then they end up having to add, you know, all the scaffolding that Bart talked about earlier. We said, hey. What if they could just focus on the value that they're getting paid to add, right, rather than all this other stuff? So that was the goal, and that hasn't changed. And that's built into the DNA of the project because it's a usage based payment model. If people don't like it, people don't use it, you know, it's of no benefit to us either.
So as far as, like, design changes: I think one of our first beta users was running it on a Linux box, but running it from a remote location. Right? And we hadn't considered that. Right? We figured, oh, they're gonna put the agent in a folder and run it from that folder. They put it in a central folder and ran it on a bunch of machines from that folder, and that didn't work. Right? So, you know, there have been things like that where, you know, we're like, wow. We never thought somebody would use it in that way, and then we had to, you know, adjust the product to, you know, handle those scenarios.
But that's not really a design change. That's more, like, almost like a bug fix. As far as design changes, at first, I wrote the agent to talk to the API through a message queue, and then we changed that to talk to the API directly, you know, using API endpoints. There were some security advantages to that. The database structure, we started out with deeply nested database structures, and we quickly found that it was much better to use shallow structures. So that was kind of the big design change. The testing framework has evolved a lot over the course of the project. But, you know, overall, this is the culmination of years of experience doing this kind of stuff. So there haven't been any, like, you know,
[00:18:00] Unknown:
earth-shattering design changes. It's been a fun project to work on because this is the first project I've ever worked on where I am the target user. Rich is, somewhat. And we show it to our friends. And we say, what do you not like? What would make your life easier? The main goal is to make people's lives easier and developers happy, and we've stuck to that pretty closely.
[00:18:20] Unknown:
As far as the actual workflow of building a task graph with SaaSGlue, you mentioned that the initial setup is you sign up, you get the agent, you get the security token, and you're off to the races. But as you're actually starting to build out more complex systems or building out a directed acyclic graph where you might have branches and convergence points, how do you think about the design of that workflow and what's involved in actually developing and iterating on the overall structure of a, you know, end to end workflow?
[00:18:54] Unknown:
So as we've kind of, you know, alluded to before, our goal here was to make it so that the only hard thing that you need to do with SaaSGlue is your code. Right? We wanted to make the rest of it really easy. You know, designing DAGs with SaaSGlue is super easy. It's clicking and, you know, typing names and stuff. So we have a web console with a workflow designer that's job centric. Right? So you have a job that consists of multiple tasks, and you, you know, create a task by clicking a button, giving it a name. Now it shows up in the visual workflow designer. Each task is a box in the designer.
We wanted to make it mobile friendly, so we don't attempt to, like, redraw the thing, you know, which some tools do; it doesn't seem to ever end very well. But we do show, like, the hierarchy. So you can click on a button on an individual task, and it'll show you all the upstream dependencies. You can have zero to many dependency relationships between all the tasks in a DAG. Obviously, we do check to make sure there are no cyclical dependencies. And one of the really cool features is conditional path routing.
So your script can output a route code, and you can use those route codes within the graph to say, hey, task 5 should only execute if task 4 finishes with a particular route code.
[00:20:21] Unknown:
Go to, you know, step 4 or 5 or something like that. And it could be based on regular expressions as well. And one other thing I wanna say real quick is that you don't have to actually use the visual designer. As people become kind of power users of this thing, it's meant to be programmed with itself, and everything that you can do in the web console can be done through an API for our users. And, in fact, a lot of the demos that we've created so far, the more useful ones, end up using the SaaSGlue API internally to dynamically generate jobs. So if you wanted to, like, dynamically generate an Airflow workflow in SaaSGlue and kick it off, you could do that pretty easily.
And so anything that we describe in the UI, you can do in the API. So
[00:21:02] Unknown:
Yeah. In fact, you can design an entire complex job with task dependencies and embed the scripts and everything right in a single JSON document. Upload that to SaaSGlue through the API, and it'll just run the whole thing. So you can create jobs dynamically from code, upload it to the API, and it'll just run it. As far as the actual
[00:21:25] Unknown:
jobs themselves, so there's the agent that actually executes things. And as a result, because it's just executing a binary on the system, you're able to run it with any different programming language or mix and match languages. And I'm curious how you handle things like dependencies for a Python script that somebody specifies in the web API or in the web UI, or any sort of packaging concerns that people should be thinking about as they're deploying these different tasks to the boxes that the agents are running on, and just where you see your responsibilities in terms of being able to execute these scripts and where the responsibility of the developer lies for deploying their code, you know, handling versioning of the job specifications, and just the, you know, build/test/release cycle for working with SaaSGlue.
[00:22:11] Unknown:
So for dependencies, if you're talking about dependencies that are, like, files, like zip files or maybe JAR files or something like that, we do have a place in SaaSGlue, under the concept of artifacts, so you can upload and then reuse them. But if you're talking about, like, oh, I'm going to run a Python script and I need to make sure that I ran a pip install, or for Node.js you wanted npm, or for Java maybe you wanted Maven or Gradle or something like that, the cool thing is that the scripts themselves can do that. And jobs are composed of possibly hundreds of tasks and steps and scripts, and the scripts themselves can do that on the agents that they're running on. And so when you get it down to the task level, which is running on an agent, you could have a job that's orchestrating hundreds of agents in the system.
Those agents themselves are able to use whatever dependency management system you're using for that job. If you're running a Clojure job, you know, using Clojure on the JVM, you could run a Gradle install as one of your steps before running your actual code, just to make sure that your dependencies exist on that particular agent.
[00:23:18] Unknown:
Just to put a fine point on that, a task can consist of multiple steps that all run in the same runtime environment. So your first step could be installing your dependencies with a shell script. The next step could be a Node.js script that, you know, now has those required dependencies there. But for the most part, like you said, we're not doing this for the developers. Right? It's up to them to make sure that they set up their environment before trying to run something.
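As a rough illustration of the idea that steps in a task share one runtime environment, the sketch below runs two steps in the same working directory: the first installs a dependency into a local virtualenv, and the second uses it. This is a generic pattern sketch, not SaaSGlue's actual task runner, and the step contents are made up.

```python
# Minimal sketch of steps sharing one runtime environment; not SaaSGlue's runner.
import os
import subprocess
import tempfile

workdir = tempfile.mkdtemp(prefix="task-")  # shared environment for all steps
venv_python = os.path.join(workdir, ".venv", "bin", "python")

steps = [
    # Step 1: a shell step that installs dependencies into a local virtualenv.
    ["bash", "-c", f"cd {workdir} && python3 -m venv .venv && .venv/bin/pip install requests"],
    # Step 2: a Python step that relies on the dependency installed in step 1.
    [venv_python, "-c", "import requests; print(requests.__version__)"],
]

for step in steps:
    result = subprocess.run(step, cwd=workdir, capture_output=True, text=True)
    print(result.stdout, end="")
    if result.returncode != 0:  # stop the task on a failed step
        raise SystemExit(result.stderr)
```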
[00:23:48] Unknown:
Are there any useful patterns that you've seen come out of your end users for how they handle things like dependencies and deployment or testing of the tasks, and how they work with the interdependencies across the graph?
[00:24:02] Unknown:
A common pattern all over the place is that the first steps in that task are to install dependencies. And so that's almost always how they're written. So if you look at any step that's doing anything kind of interesting, what'll happen is that you'll have some reusable scripts that you just keep reusing all over the place to make sure that your dependencies exist on those machines.
[00:24:23] Unknown:
There are a lot of code reuse mechanisms in SaaSGlue. You can inject a script from your script library into another script using SaaSGlue-specific syntax. It'll basically inject the script into the other script at runtime, and we'll talk about metadata a little bit further on. But yeah. Which is an interesting way of implementing polymorphism. Right? You can have, you know, three different scripts for different methods of compression that all have the same method names, right, for compressing and uncompressing. And so you can decide, you know, at design time or even run time which method you wanna use, and the calling script is just calling, you know, the unzip method.
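The polymorphism pattern Rich describes can be pictured with a small, self-contained Python example: several interchangeable implementations expose the same method names, and the calling code picks one at design time or run time. This is plain Python for illustration, not SaaSGlue's script-injection syntax.

```python
# Interchangeable implementations behind one set of method names.
import bz2
import gzip


class GzipCompressor:
    def compress(self, data: bytes) -> bytes:
        return gzip.compress(data)

    def decompress(self, data: bytes) -> bytes:
        return gzip.decompress(data)


class Bz2Compressor:
    def compress(self, data: bytes) -> bytes:
        return bz2.compress(data)

    def decompress(self, data: bytes) -> bytes:
        return bz2.decompress(data)


# The "calling script" only knows the method names, not the implementation.
IMPLEMENTATIONS = {"gzip": GzipCompressor(), "bz2": Bz2Compressor()}


def archive(payload: bytes, method: str = "gzip") -> bytes:
    compressor = IMPLEMENTATIONS[method]  # chosen at run time
    return compressor.compress(payload)


print(len(archive(b"hello world" * 100, method="bz2")))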
[00:25:12] Unknown:
Yeah. I was gonna ask about the sort of reuse patterns and, in particular, the possibility of having sort of a script marketplace.
[00:25:30] Unknown:
That's the biggest next step that we wanna do. So we spent a lot of time trying to make the product really robust and working. But our next huge goal is something that we call SaaSipes, which are SaaSGlue recipes. We wanna build a community. Actually, that's the work that I was working on from a few weeks ago up until now: the ability to import and export scripts easily between authors. We are trying to do more meetups. And when we do the meetups, we create SaaSipes that participants can automatically download with a click of a button, you know, and import into their own environment. Because when you sit down and you start using SaaSGlue, honestly, you're gonna need to go through some kind of tutorial to use it correctly.
Right? It's not as easy as, like, logging on to Gmail for the first time. Right? It does take some work, and the community, with, you know, good Medium articles, this is what we're focused on right now: to build that up and to make it very easy for people to share with each other, or for us to share with people on Medium blogs, different ways of doing things. You wanna spin up Terraform? Rich has created a bunch of really good scripts for data pipelines and how to set up environments dynamically. He's got a really cool Twitter stream data engineering pipeline demo.
We've got a bunch left, but what we would love to see is SaaSGlue as a delivery mechanism and a way for people to be able to share scripts with each other on how to do difficult things. Yeah. And eventually,
[00:26:59] Unknown:
you know, we'll add GitHub integration to that so you can, you know, export to GitHub, import from GitHub, sync with GitHub, and then, you know, at some point, you know, ratings. Right? So you can look at SaaSipes, either ones that we create or that other people create, and see descriptions and ratings and say, okay, yeah, that would be really useful for me. You know? There's some job that, you know, deploys a whole Kubernetes environment with EKS on AWS. Rather than spending, you know, days or weeks, you know, reading through documentation to figure out how to do that, you import the SaaSGlue job and you run it. And this is where SaaSGlue will be able to compete with these no code, low code solutions, except the solutions are not gonna be developed and dictated by, you know, single companies. They're gonna be developed by the community and rated by the community.
[00:27:54] Unknown:
As far as the actual existing integrations, I know that you have built-in capability for being able to do things like deploy to AWS Lambda. And you mentioned wanting to have prebuilt recipes for things like setting up an EKS cluster. I'm wondering what were some of the initial systems that you wanted to have deep integrations with and how you thought about prioritizing those as you were building out the SaaSGlue platform and working with the initial set of customers.
[00:28:22] Unknown:
I'm not sure if we really talked about this in too much detail yet, but we wanted to make it easier for people to run scripts that don't need agents. So in SaaSGlue, you can run a job with tasks that run in AWS Lambda, and you don't have to have a Lambda account. We'll run it in Lambda for you and pipe the output from CloudWatch back into SaaSGlue. So you can actually run tasks in Lambda immediately and see the outputs in real time and have a log of it in SaaSGlue. We're thinking about doing that for GCP and Azure compute. It's not that hard for us to do. As far as deep integrations with other products, what we really wanna do is create SaaSipes for them. But the cool thing is, when we create SaaSipes of how to run Terraform or Ansible (Ansible's a really good complementary product with SaaSGlue as well), we don't actually change the product at all. And so all we're doing is just showing how to hook it up. And so the only integrations that SaaSGlue will know about, that we had to program for: we already have AWS Lambda hooked up. We will hook up GCP and Azure compute as people want it. It's not that hard for us to do. But as far as deep integrations with other products, we're not gonna do that. We'll create recipes and articles and reusable scripts to show how to do it, but it's not part of SaaSGlue itself.
[00:29:42] Unknown:
Yeah. In fact, Bart has really added a lot of discipline to this project. And, you know, there were times when I thought of a solution that would be customer specific, right, or specific to some product or something like that. And Bart would say, no. We're not doing that. We're not doing that. This is a platform.
[00:30:02] Unknown:
Right? It's because I'm lazy.
[00:30:05] Unknown:
Well, those are part of the virtues of programmers: arrogance, laziness, and hubris. Right?
[00:30:12] Unknown:
We got
[00:30:13] Unknown:
oh, we got them all. We've touched on metadata a little bit, and we've touched on some of the aspects of being able to propagate information between tasks. And I'm wondering if you can talk through how you're actually implementing the information sharing across different stages of a task graph, and particularly for data workflows, how you handle, you know, handing off the different data artifacts between those different stages.
[00:30:40] Unknown:
This is a key feature of SaaSGlue, and it enables some really, really powerful code reuse patterns. Imagine, you know, if you've got an organization that has grown by acquisition, which so many have, and you've got, you know, little pods of code in different languages, you know, all over the place, and you're integrating those things across networks. You probably have all sorts of applications and, you know, code written in different languages for, you know, authenticating with an API and getting a JWT token. Right? You know, we've built tons of SaaSGlue stuff for running SaaSGlue. Right? So we've been our biggest user, you know, for over a year now. We have one piece of code to connect to an API and get a JWT token. It's a Python script. So in any of our SaaSGlue jobs that are interacting with an API requiring a JWT token, the first task in the job runs a Python script, which goes out to the API, authenticates, gets a JWT token, and then outputs that as a runtime variable. And this is kind of what we call the metadata in SaaSGlue: runtime variables.
So what happens is that variable goes back to the API, and it's stored with the job as metadata.
[00:31:58] Unknown:
As part of that executing job's context.
[00:32:00] Unknown:
Yes. Exactly. Thanks, Bart. So future tasks can then leverage that JWT token that was generated in the first task. So now you can leverage that JWT token in JavaScript, you know, in a C# application, right, in any other code language, in any other task running in any environment. So that's what I'm talking about as far as a powerful code reuse pattern. So how does that work? So metadata can be defined either at design time or at run time. So you can create team level variables or job level variables with the SaaSGlue console, but you can also write your scripts to send a specially formatted string to standard out. In the background, the agent is tailing that standard out and looking for those patterns, and that string is gonna include a variable name and value.
And it's gonna extract those key value pairs, that metadata, and upload it to the API. So now that state is gonna be available to other tasks in the job, including the current task that's running. So if the task fails and you restart it, it's gonna have the metadata that it had previously saved. So, you know, as far as, like, saving state to be able to restart a process that failed where it left off, you can use metadata to do that as well. The way metadata is consumed in a script is you use a SaaSGlue-specific syntax as a placeholder for a runtime variable.
Immediately before the script is executed, the agent will inject the current value of that runtime variable into the script before execution, which means that you can utilize this metadata, these runtime variables, in scripts written in any language.
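A rough sketch of that flow, using invented markers (the real SaaSGlue stdout and placeholder syntax may differ): a producing script prints a specially formatted line, a small agent-side routine harvests it with a regular expression, and the value is substituted into a later script before that script runs.

```python
# Sketch of the runtime-variable flow with made-up "##VAR" and "@name@" markers;
# these are not SaaSGlue's actual syntax.
import re
import subprocess

VAR_PATTERN = re.compile(r"^##VAR (\w+)=(.*)$", re.MULTILINE)  # hypothetical marker


def run_and_collect(script: str, metadata: dict) -> str:
    """Run a Python script, then harvest any ##VAR lines from its stdout."""
    out = subprocess.run(["python3", "-c", script], capture_output=True, text=True).stdout
    metadata.update(dict(VAR_PATTERN.findall(out)))
    return out


def inject(script: str, metadata: dict) -> str:
    """Replace @key@ placeholders with the current runtime-variable values."""
    for key, value in metadata.items():
        script = script.replace(f"@{key}@", value)
    return script


metadata = {}

# Task 1: authenticate once and publish the token as a runtime variable.
task1 = 'print("##VAR jwt_token=eyJhbGciOi-example")'
run_and_collect(task1, metadata)

# Task 2: could be written in any language; the placeholder is filled in before it runs.
task2 = 'print("calling API with token @jwt_token@")'
print(run_and_collect(inject(task2, metadata), metadata))
```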
[00:33:52] Unknown:
Yeah. And what's cool is that you're obviously not gonna serialize a gigabyte of data as a SaaSGlue variable in your standard out. Don't do that. It won't work. Right? But you can, you know, do, you know, 500 KB, something like that, as an upper bound. You can do fairly large strings. And so what you can do is you can actually serialize JSON structures. And so let's say that you're running a job that is spitting out, you know, 150 pieces of metadata of where you generated data, in maybe S3 or something like that, or on-prem. And, normally, you'll probably have links to that data of where you generated something. Right? And you can serialize out, like, a large payload of, you know, hundreds of fields with links to them that will be available to downstream tasks running in the same job context. So if you have a job that takes, like, a day or 2 or a week or something like that, and you're starting to, like, generate all of this, you know, data of where you've created it, you can do that with this mechanism. We're not gonna be making it easier to transfer terabytes of data across networks magically, but we do allow the metadata to be passed around quite easily.
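Building on the sketch above, a modest JSON payload, say links to the objects a task wrote to S3, can be serialized into a single runtime variable as long as it stays well under the rough size ceiling mentioned here. The marker format and bucket names below are placeholders, not SaaSGlue's real syntax.

```python
# Serialize a dictionary of output locations as one runtime variable.
import json

outputs = {
    f"partition_{i:03d}": f"s3://example-bucket/run-42/part-{i:03d}.parquet"
    for i in range(150)
}

payload = json.dumps(outputs, separators=(",", ":"))
assert len(payload.encode()) < 500_000, "keep serialized metadata small"
print(f"##VAR output_locations={payload}")  # downstream tasks can json.loads() this
```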
[00:35:02] Unknown:
Touching on the thought that some of these jobs might last for weeks or so, I'm wondering how you handle things like the various failure modes that you're definitely going to run into, with network partitions or failed agents or, you know, machines completely crashing that the agent is running on, or, you know, maybe intermittent API outages on the backplane where the agent is trying to integrate with that, and just some of the ways that you have iterated on and worked through those different failure modes and the ways that you're building resiliency into the overall end to end platform.
[00:35:36] Unknown:
Yeah. You know, I said that metadata was a key feature. This is probably the key feature. SaaSGlue is the culmination of a lot of years of experience of dealing with all these different scenarios. So, you know, with a distributed event driven system like this, of course, there are a huge number of things that can go wrong, particularly in the agent environment, which we have no control over whatsoever. And I'd classify the things that can go wrong into, you know, things you can control and things you can't. And we can control the code. We can control the hosting environment of the API, and we've got, you know, extensive unit tests and great logging to help us do root cause analysis when something goes wrong. We know when something goes wrong. But talking about the things we can't control, the agent getting disconnected is a great example.
And it's actually been an enormous amount of work just creating an environment so that we can simulate all of the things that can go wrong, and we've done that using Docker. Right? There are some great Docker tools out there for simulating, you know, a machine gets disconnected, a machine crashes. So we have unit tests that actually run through those simulations. And, you know, we run the agent on our laptops, and you shut down your laptop, you know, how many times a day, and, you know, so we've tested this out in sort of the harshest conditions. And, you know, at this point, the agent is extremely reliable at just waiting patiently till the network comes up and reconnecting. So in production, if the agent loses network connectivity,
it's gonna continue doing whatever it was doing. If it's running scripts that don't require network connectivity, they're gonna continue running just fine. When the network comes back, it's gonna reconnect to the API and the message queue, tell the API everything that happened while it was offline,
[00:37:33] Unknown:
and then continue accepting new jobs. And we've worked hard on this: if you have a job that takes, like, a day to run, it's on, like, an 800 node Spark cluster or something like that, and you have a partial failure, Rich has done a lot of awesome work so that you don't have to re-execute the whole thing from scratch; you can fix the problem and continue. So that was a huge key feature, that if you have failures in the middle of a large pipeline, you can fix the issues and continue without having to start over again
[00:38:05] Unknown:
if you've designed it well. But with SaaSGlue, that is possible if things have been set up properly. Yeah. I was just gonna say it's not magic. Right? Your code has to support it. In a Spark job, obviously, you can't restart a Spark job in the middle. But right. If, for example, you're, you know, processing a huge file with, you know, millions of records incrementally, right? And you've got to process 50,000,000 records and it crashes at 40,000,000. It'd be nice to know that you got up to record 40,000,000 when it crashed, and that you could restart it at record 40,000,001 rather than going back to the beginning. Yeah. And you do have to
[00:38:44] Unknown:
program your script like that,
[00:38:46] Unknown:
but it's completely possible to do that. RudderStack's smart customer data pipeline is warehouse-first. It builds your customer data warehouse and your identity graph on your data warehouse with support for Snowflake, Google BigQuery, Amazon Redshift, and more. Their SDKs and plugins make event streaming easy, and their integrations with cloud applications like Salesforce and Zendesk help you go beyond event streaming. With RudderStack, you can use all of your customer data to answer more difficult questions and send those insights to your whole customer data stack. Sign up for free at dataengineeringpodcast.com/rudder today.
And one of the other things that's interesting to think about in this overall workflow management space, particularly because you have it architected where the agent is responsible for contacting the backplane rather than the other way around, is how you handle things like latency-sensitive workloads where you want to be able to process things end to end within a certain time frame, or if that's something that you are just completely punting on because of the way that you've architected the system.
[00:39:49] Unknown:
Yeah. So for a lot of jobs, we hadn't really thought about that too much, like a real-time system, which Rich has done some of; I have, too. As far as how quickly things will execute when you kick off a job, usually it's on the order of maybe an upper bound of 200 to 300 milliseconds, but that's just given the network speeds across, you know, America, right, or the world. We don't really have any mechanism to make sure it's that fast right now. And we do run in AWS Northern Virginia. So if you have a colo there and that's where your stuff is, it's gonna run pretty fast. If you're running your stuff in Siberia, that'll probably be slower. So
[00:40:32] Unknown:
But it is event driven, and each part of the system on our side is scalable. Right? So it's basically gonna run as fast as you have capacity. One thing, though, to point out with, like, time sensitive sorts of things: there are a lot of really, really powerful workflow management features in SaaSGlue. So with a job, you can define a misfire grace time. Right? So if this job doesn't start within, you know, x number of seconds from the scheduled time, don't run it. Right? If maybe you had a few, you know, failures, right, and you've got several jobs queued up, there's a coalesce feature where you can say just run one. Like, you don't wanna run 5 database backups because it failed the previous 4 times. Right?
You can say, I only want one instance of this job to run at a time. You can also say a hundred. Right? So you can control the parallelism of jobs. And so that's useful if, for example, you have a data pipeline where the data has to be processed in order. Right? You're having files coming into a bucket. They have to be processed in order. You can say, I only want one instance of this job to run at a time. And if an instance fails, then pause further job creation and send yourself a notification. So you get a notification, it failed. You go in, you fix it, then you restart it, and then it begins processing the jobs in order.
So there are a lot of workflow management features that help with those kinds of things.
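The scheduling controls Rich lists here, misfire grace time, coalescing queued-up runs, and capping concurrent instances, are general scheduler concepts rather than anything SaaSGlue-specific, so a short APScheduler snippet can make them concrete. SaaSGlue's own option names and defaults may differ.

```python
# The same three knobs on an ordinary cron-style job, shown with APScheduler.
from apscheduler.schedulers.blocking import BlockingScheduler


def backup_database():
    print("running database backup")


scheduler = BlockingScheduler()
scheduler.add_job(
    backup_database,
    "cron",
    minute="*/15",
    misfire_grace_time=60,  # skip a run that can't start within 60s of its schedule
    coalesce=True,          # if several runs queued up while down, run just one
    max_instances=1,        # never run two backups concurrently
)
scheduler.start()
```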
[00:42:10] Unknown:
I wouldn't build a real-time trading system on this.
[00:42:14] Unknown:
No. It's not for that. Me neither.
[00:42:19] Unknown:
And one of the other things that's worth digging into is, with those different task dependencies, obviously, you can say this job will only execute after this other job. But particularly when you get into the space of reusing job components or different individual tasks and recombining them into different workflows, how do you think about things like the dependencies of the shape of the data or the specifics of what one node is going to output and the inputs that are required for the other node, and just being able to specify any of those data informed dependencies between those different stages?
[00:42:54] Unknown:
We don't, really. And I think you're referring to Dagster's capabilities.
[00:42:59] Unknown:
Is that right? Yeah. So something along the lines of being able to say that this step is going to output something of this type and then being able to say this other step requires something of that same type and basically bringing kind of programmatic concepts of types to the workflow level, not necessarily in the exact same sense, but saying, you know, this is going to output a file, and this requires a file, or this is going to output a string, and this requires a string.
[00:43:24] Unknown:
What you can do is, when you run a script, it can output on standard out the results of that script with an SGO, a SaaSGlue out-string. And depending on what the script created, it can be very dynamic in what it spits out there. And the task dependencies can be based on regular expression pattern matching. Okay? So, you know, task A could spit out something dynamically that could then be fed into task B, C, or D depending on the data type, depending on that regular expression pattern matching. Cool? So you can do it very dynamically.
You're not gonna get static analysis up front right now. But you brought up Dagster when we talked to you a while ago. We're looking at that, and we would love to maybe incorporate it somehow and leverage that. We don't have any static analysis as far as I know, Rich. Correct me if I'm wrong.
[00:44:17] Unknown:
No. No. And Dagster is a little bit of a different model. If I understand correctly, it's closer to, like, a Spark model where you have a bunch of nodes that are interconnected, sort of working together to, you know, perform some data pipeline type job. So if that sort of thing is really critical for a customer, you know, I'd say use Dagster for it and just kick it off with SaaSGlue.
[00:44:39] Unknown:
Right? Schedule it with SaaSGlue. It's more of a dynamic thing. So at runtime, you can make runtime decisions on what to run based off of outputs of tasks in a job execution. But you won't get static job analysis.
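A minimal sketch of the regex-based routing described above: a task emits a route code on standard out, and each downstream task declares a pattern, with only matching tasks running next. The marker and field names are invented for illustration and are not SaaSGlue's actual out-string syntax.

```python
# Route a job's next step based on a code emitted by the previous task.
import re
import subprocess

# Task A decides where the flow should go based on what it found.
task_a = 'print("##ROUTE ok:rows=0")'  # e.g. it discovered an empty input file
stdout = subprocess.run(["python3", "-c", task_a], capture_output=True, text=True).stdout
route = re.search(r"##ROUTE (\S+)", stdout).group(1)

# Downstream tasks declare which route codes they respond to.
downstream = {
    "load_warehouse": r"^ok:rows=[1-9]\d*$",  # only run when rows were produced
    "notify_empty": r"^ok:rows=0$",           # run when the input was empty
    "page_oncall": r"^error:.*$",             # run on any error route
}

for task_name, pattern in downstream.items():
    if re.match(pattern, route):
        print(f"running {task_name}")
```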
[00:44:53] Unknown:
You've mentioned that this is sort of the culmination of your career's worth of experience. But in the process of designing and building the system, I'm curious if there are any specific points of inspiration that you've looked to for both positive and negative inspiration for how to approach building this system?
[00:45:11] Unknown:
For me, I started out realizing how many times I fought against frameworks and how they're really awesome until they're not. And the best frameworks that I've ever used know their place, and they don't go too far and too deep into things. I've had a design philosophy of simplicity in trying to make things as powerful as possible, but not to get into people's kitchen.
[00:45:33] Unknown:
I was at a conference, I don't know, 10 or 15 years ago where the speaker talked about service based architectures. I can't remember his name right now. I'm very sorry to him about that. But that was really impactful to me, you know, where you can have these decoupled services that sort of work together. Martin Kleppmann has written some stuff about online event processing, and we've incorporated some of those concepts. One sort of pivotal thing for me in my career, which actually led to a lot of these concepts, was the real-time distributed stock trading systems that I developed.
In order to get the kind of performance that I needed, I developed a TCP/IP based communication layer, right, for distributing data. Actually, side note, the biggest bottleneck is serializing and deserializing data, and I was able to structure a library to minimize that cost. So when I moved over to big data and I was building distributed data systems, I started out using that same model. But I found that when you have brittle connections like that, there's a huge amount of work that goes into making sure that those connections stay up. And if you ever have to go through a firewall, then it becomes months of getting IT involved.
So Bart actually had done some stuff with queuing, right, with message queues. And, you know, we started talking about it, and I said, you know, I'm gonna try that. And I rearchitected my data pipelines using an event driven architecture with message queues, and it made the system so much more stable. Now, if one component goes down, you don't have to worry about a domino effect where all the other connections are messed up. Right? You don't have to worry so much about those connections. The individual components can sort of work independently.
[00:47:31] Unknown:
And as you have been working with the platform and both using it to build the platform, but also as you've been working with other customers, I'm curious, what are some of the most interesting or innovative or unexpected ways that you've seen the platform used?
[00:47:45] Unknown:
We never even considered this use case, but our first customers use it as a patch management tool. So they'll have hundreds of computers that they have to patch, and they use SaaSGlue for that. We never considered that. Somebody was talking to us about using it for IoT devices. One person was talking about using it for control software in factories, and we're like, really? And so it's kinda neat to see that it is a lot more open ended than we had considered.
[00:48:12] Unknown:
Yeah. One interesting use case is you can deploy code as a service, like a Windows service. Right? So this is code that's supposed to run all the time. It could be, like, connecting to a message queue and, you know, implementing some logic whenever an event is raised. In SaaSGlue, that's as easy as flagging the task as an auto restart task. So if the host machine running it goes down, it will automatically restart the task on another available qualified agent. So this is really interesting in ephemeral compute environments, right, where you have this service, you want it to be running all the time, but it's running on ephemeral compute that could go away. You can even use SaaSGlue to scale these up and down: I want 5 of these services running, or 10. So that's an interesting use case. Oh, yeah. And we forgot to mention that with SaaSGlue, when you spin up EC2 instances or ephemeral compute,
[00:49:07] Unknown:
baked into that setup is that you can tear them down once they haven't been active for a while. So you won't get dinged with an Amazon charge a month later when you forgot you spun up, you know, a 100 machines or something as part of the job.
[00:49:20] Unknown:
As you've been building the system and using it, I'm curious, what are some of the most interesting or unexpected or challenging lessons that you've both learned in the process?
[00:49:29] Unknown:
It's taken far longer than we thought.
[00:49:33] Unknown:
It always does.
[00:49:35] Unknown:
We we thought it's after a month, we can get this thing going. No.
[00:49:41] Unknown:
Yeah. We completely underestimated how hard it would be to design, and the design phase took a long time of simplifying and reiterating things to get it down to its simplest form, and the development was much more difficult than we had considered. So, ditto. But fun. It was fun to work on, but yeah, harder than we thought. In all of our free time for, like, 3 years, it was really fun.
[00:50:11] Unknown:
So it sounds like this has been mostly a bootstrapped adventure for you.
[00:50:15] Unknown:
Yeah. Oh, yeah.
[00:50:17] Unknown:
And so for people who are excited and they think this is the best thing since sliced bread, what are the cases where SassGlue is the wrong choice?
[00:50:25] Unknown:
I mean, you're not gonna run a web service on it. We've determined we're not gonna build a real-time stock trading application. It's not for stateful things that are running most of the time. This is for batch. Right? For complicated jobs, for cron, that kind of stuff. You know, there's a huge number of things that you would not use it for: a web service, you know; don't install your database to run full time on it. Right?
[00:50:52] Unknown:
And as you continue to build out and iterate on the product, you've mentioned a couple of things that you have planned for the future. But what are some of the overall goals that you have for the near to medium term?
[00:51:04] Unknown:
We wanna build up a community, and we wanna start developing our own SaaSGlue recipes, or SaaSipes. We started doing meetups, which we think are really fun, and we go and we find out what are people having a hard time with. And we want that feedback from people. We really wanna see how far this thing can go. We wanna see how much we can do with it. We learn the tool ourselves, and we find that we can do things with it that we hadn't expected before. And so we wanna get it out there. We wanna hear what people are having problems with. We wanna work with them to automate it and see if SaaSGlue can help with that. We love to show it to our friends and people that we know and other people that we're introduced to to see, you know, how this thing can help them.
[00:51:52] Unknown:
Yep. You know, just going back to our goal from the beginning: it's really a tool that's gonna help developers make their lives easier. So going forward, you know, we wanna hear feedback from developers. Right? What's working for you? What's not? You know, how can it be better? And then implementing that feedback in the product.
[00:52:12] Unknown:
Are there any other aspects of the SaaSGlue platform or the experiences that you've had building it and turning it into a business that we haven't discussed yet that you'd like to cover before we close out the show?
[00:52:23] Unknown:
It's been fun. Challenging fun. You know, we both love to work on it, and
[00:52:32] Unknown:
we love to see people using it. And I guess that's it, really. Just thank you for giving us the opportunity to talk about it. We really appreciate this. Absolutely.
[00:52:42] Unknown:
Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you each add your preferred contact information to the show notes. And as a final question, I'd like to get each of your perspectives on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:52:58] Unknown:
You know, honestly, I mean, based on what we've done, we think there's a big hole in the automation market in terms of a pure SaaS-based solution that doesn't require setting up a server, installing plugins, setting up all kinds of firewall access, and that kind of stuff. I mean, we've got SaaS services for everything. Why not automation?
[00:53:23] Unknown:
When you look at data engineering, the tools once you've got the data are really powerful and quite advanced. But we didn't see a ton of help out there, especially as a SaaS solution, for collecting your data and moving it around and driving those things in data engineering. But you could fill the Library of Congress with what we don't know. So there are probably 30 people out there like, why don't you use product X? Well, we didn't know about it. But, yeah, we didn't see that there was a ton of automated help out there to collect the data, which is the unfun, unsexy part of data, you know. But we thought that that was critical.
[00:54:09] Unknown:
So I want to thank you both very much for taking the time today to join me and share the work that you're doing on SaaSGlue. It's definitely a very interesting product and one that I intend to take a look at and start experimenting with myself. So thank you for all of the time and effort you've put into that and the work that you're doing to simplify the job of integrating across the, you know, multitude of services that we all have to deal with. I definitely look forward to seeing where you end up taking the platform, and I hope you enjoy the rest of your day. Thank you so much for having us on. Yes. Thank you. For people listening, don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com, to learn about the Python language, its community, and the innovative ways it is being used.
And visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Sponsor Message
Interview with Rich and Bart Wood
Founding a Business with Family
Overview of SaaSGlue Platform
Target Audience and Use Cases
Positioning in the Market
Core Concepts and Architecture
Evolution and Design Changes
Building and Iterating Workflows
Handling Dependencies and Packaging
Patterns for Dependencies and Reuse
SaaSGlue Recipes and Community
Initial Integrations and Prioritization
Metadata and Information Sharing
Handling Failures and Resiliency
Latency Sensitive Workloads
Data Informed Dependencies
Inspirations and Design Philosophy
Interesting Use Cases
Lessons Learned
When SaaSGlue is the Wrong Choice
Future Goals
Closing Remarks and Contact Information
Biggest Gaps in Data Management Tools
Final Thanks and Outro