StreamNative Brings Streaming Data To The Cloud Native Landscape With Pulsar

This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

Support the show!

11 May 2020

StreamNative Brings Streaming Data To The Cloud Native Landscape With Pulsar - E132

0:00/0:00

Share on social media:

Summary

There have been several generations of platforms for managing streaming data, each with their own strengths and weaknesses, and different areas of focus. Pulsar is one of the recent entrants which has quickly gained adoption and an impressive set of capabilities. In this episode Sijie Guo discusses his motivations for spending so much of his time and energy on contributing to the project and growing the community. His most recent endeavor at StreamNative is focused on combining the capabilities of Pulsar with the cloud native movement to make it easier to build and scale real time messaging systems with built in event processing capabilities. This was a great conversation about the strengths of the Pulsar project, how it has evolved in recent years, and some of the innovative ways that it is being used. Pulsar is a well engineered and robust platform for building the core of any system that relies on durable access to easily scalable streams of data.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, a 40Gbit public network, fast object storage, and a brand new managed Kubernetes platform, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. And for your machine learning workloads, they’ve got dedicated CPU and GPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
You monitor your website to make sure that you’re the first to know when something goes wrong, but what about your data? Tidy Data is the DataOps monitoring platform that you’ve been missing. With real time alerts for problems in your databases, ETL pipelines, or data warehouse, and integrations with Slack, Pagerduty, and custom webhooks you can fix the errors before they become a problem. Go to dataengineeringpodcast.com/tidydata today and get started for free with no credit card required.
Your host is Tobias Macey and today I’m interviewing Sijie Guo about the current state of the Pulsar framework for stream processing and his experiences building a managed offering for it at StreamNative

Interview

Introduction
How did you get involved in the area of data management?
Can you start by giving an overview of what Pulsar is?
- How did you get involved with the project?
What is Pulsar’s role in the lifecycle of data and where does it fit in the overall ecosystem of data tools?
How has the Pulsar project evolved or changed over the past 2 years?
- How has the overall state of the ecosystem influenced the direction that Pulsar has taken?
One of the critical elements in the success of a piece of technology is the ecosystem that grows around it. How has the community responded to Pulsar, and what are some of the barriers to adoption?
- How are you and other project leaders addressing those barriers?
You were a co-founder at Streamlio, which was built on top of Pulsar, and now you have founded StreamNative to offer Pulsar as a service. What did you learned from your time at Streamlio that has been most helpful in your current endeavor?
- How would you characterize your relationship with the project and community in each role?
What motivates you to dedicate so much of your time and enery to Pulsar in particular, and the streaming data ecosystem in general?
- Why is streaming data such an important capability?
- How have projects such as Kafka and Pulsar impacted the broader software and data landscape?
What are some of the most interesting, innovative, or unexpected ways that you have seen Pulsar used?
When is Pulsar the wrong choice?
What do you have planned for the future of StreamNative?

Contact Info

LinkedIn
@sijieg on Twitter
sijie on GitHub

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don’t forget to check out our other show, Podcast.__init__ to learn about the Python language, its community, and the innovative ways it is being used.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com) with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers
Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat