Data management is hard at any scale, but working in the context of an enterprise organization adds even greater complexity. Infoworks is a platform built to provide a unified set of tooling for managing the full lifecycle of data in large businesses. By lowering the barrier to entry with a graphical interface for defining data transformations and analysis, it makes it easier to bring domain experts into the process. In this interview, Infoworks co-founder and CTO Amar Arsikere explains the unique challenges faced by enterprise organizations, how the platform is architected to provide the needed flexibility and scale, and how a unified platform for data improves the outcomes of the organizations using it.
Your data platform needs to be scalable, fault tolerant, and performant, which means that you need the same from your cloud provider. Linode has been powering production systems for over 17 years, and now they’ve launched a fully managed Kubernetes platform. With the combined power of the Kubernetes engine for flexible and scalable deployments, and features like dedicated CPU instances, GPU instances, and object storage you’ve got everything you need to build a bulletproof data pipeline. If you go to dataengineeringpodcast.com/linode today you’ll even get a $100 credit to use on building your own cluster, or object storage, or reliable backups, or… And while you’re there don’t forget to thank them for being a long-time supporter of the Data Engineering Podcast!
Ascend.io, the data engineering company, provides the flex-code data platform for autonomous pipelines that frees data teams to spend more time innovating. Data pipelines are the backbone of modern data systems. However, data engineers are overburdened with building and maintaining brittle pipelines, which creates a backlog that prevents data analysts and data scientists from accessing critical information. The Ascend Unified Data Engineering Platform removes these bottlenecks and enables teams to create self-service data pipelines that dynamically adapt to changes in data, code, and environment.
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, a 40Gbit public network, fast object storage, and a brand new managed Kubernetes platform, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. And for your machine learning workloads, they’ve got dedicated CPU and GPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
- Free yourself from maintaining brittle data pipelines that require excessive coding and don’t operationally scale. With the Ascend Unified Data Engineering Platform, you and your team can easily build autonomous data pipelines that dynamically adapt to changes in data, code, and environment — enabling 10x faster build velocity and automated maintenance. On Ascend, data engineers can ingest, build, integrate, run, and govern advanced data pipelines with 95% less code. Go to dataengineeringpodcast.com/ascend to start building with a free 30-day trial. You’ll partner with a dedicated data engineer at Ascend to help you get started and accelerate your journey from prototype to production.
- Your host is Tobias Macey and today I’m interviewing Amar Arsikere about the Infoworks platform for enterprise data operations and orchestration
- How did you get involved in the area of data management?
- Can you start by describing what you have built at Infoworks and the story of how it got started?
- What are the fundamental challenges that often plague organizations dealing with "big data"?
- How do those challenges change or compound in the context of an enterprise organization?
- What are some of the unique needs that enterprise organizations have of their data?
- What are the design or technical limitations of existing big data technologies that contribute to the overall difficulty of using or integrating them effectively?
- What are some of the tools or platforms that Infoworks replaces in the overall data lifecycle?
- How do you identify and prioritize the integrations that you build?
- How is Infoworks itself architected and how has it evolved since you first built it?
- Discoverability and reuse of data is one of the biggest challenges facing organizations of all sizes. How do you address that in your platform?
- What are the roles that use Infoworks in their day-to-day?
- What does the workflow look like for each of those roles?
- Can you talk through the overall lifecycle of a unit of data in Infoworks and the different subsystems that it interacts with at each stage?
- What are some of the design challenges that you face in building a UI-oriented workflow while providing the necessary level of control for these systems?
- How do you handle versioning of pipelines and validation of new iterations prior to production release?
- What are the cases where the no-code, graphical paradigm for data orchestration breaks down?
- What are some of the most challenging, interesting, or unexpected lessons that you have learned since starting Infoworks?
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
- Thank you for listening! Don’t forget to check out our other show, Podcast.__init__, to learn about the Python language, its community, and the innovative ways it is being used.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email firstname.lastname@example.org with your story.
- To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.
- Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat
- Google BigTable
- Apache Spark
- Apache Hadoop
- Data Partitioning
- Apache NiFi
- Change Data Capture
- Slowly Changing Dimensions
- Snowflake DB
- Data Catalog