Summary
The past year has been an active one for the timeseries market. New products have been launched, more businesses have moved to streaming analytics, and the team at Timescale has been keeping busy. In this episode the TimescaleDB CEO Ajay Kulkarni and CTO Michael Freedman stop by to talk about their 1.0 release, how the use cases for timeseries data have proliferated, and how they are continuing to simplify the task of processing your time oriented events.
Introduction
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute.
- Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.
- To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media.
- Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat
- Your host is Tobias Macey and today I’m welcoming Ajay Kulkarni and Mike Freedman back to talk about how TimescaleDB has grown and changed over the past year
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you refresh our memory about what TimescaleDB is?
- How has the market for timeseries databases changed since we last spoke?
- What has changed in the focus and features of the TimescaleDB project and company?
- Toward the end of 2018 you launched the 1.0 release of Timescale. What were your criteria for establishing that milestone?
- What were the most challenging aspects of reaching that goal?
- In terms of timeseries workloads, what are some of the factors that differ across varying use cases?
- How do those differences impact the ways in which Timescale is used by the end user, and built by your team?
- What are some of the initial assumptions that you made while first launching Timescale that have held true, and which have been disproven?
- How have the improvements and new features in the recent releases of PostgreSQL impacted the Timescale product?
- Have you been able to leverage some of the native improvements to simplify your implementation?
- Are there any use cases for Timescale that would have been previously impractical in vanilla Postgres that would now be reasonable without the help of Timescale?
- What is in store for the future of the Timescale product and organization?
Contact Info
- Ajay
- Mike
- Website
- @michaelfreedman on Twitter
- Timescale
- Website
- Documentation
- Careers
- timescaledb on GitHub
- @timescaledb on Twitter
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links
- TimescaleDB
- Original Appearance on the Data Engineering Podcast
- 1.0 Release Blog Post
- PostgreSQL
- RDS
- DB-Engines
- MongoDB
- IOT (Internet Of Things)
- AWS Timestream
- Kafka
- Pulsar
- Spark
- Flink
- Hadoop
- DevOps
- PipelineDB
- Grafana
- Tableau
- Prometheus
- OLTP (Online Transaction Processing)
- Oracle DB
- Data Lake
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. When you're ready to build your next pipeline, you'll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40 gigabit network, all controlled by a brand new API, you've got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch, and join the discussion at dataengineeringpodcast.com/chat.
Your host is Tobias Macey, and today I'm welcoming Ajay Kulkarni and Mike Freedman back to talk about how TimescaleDB has grown and changed over the past year. So, Mike, could you start by introducing yourself? Hi, I'm Mike Freedman. I'm the cofounder and CTO of Timescale, and I'm also a professor of computer science at Princeton University. And Ajay, how about yourself? I'm Ajay Kulkarni. I'm also a cofounder, and the CEO of Timescale.
[00:01:14] Unknown:
And a fun fact is that Mike and I go back over 20 years, to when we were roommates in our freshman year of college. We actually met the first week of college, and we even remember the exact moment we met, which I think is pretty crazy. And for anybody who hasn't listened to the first interview where you guys showed up on the Data Engineering Podcast, there will be a link in the show notes for people who want to catch up. Can you just give a quick refresher of what TimescaleDB is? Of course. Timescale is an open source time series database. In particular, we are the leading open source time series database that supports full SQL. We're engineered on top of Postgres, so we look and feel just like Postgres, and we also inherit Postgres' storied rock-solid reliability and, of course, its massive ecosystem. That gives us the largest ecosystem of any time series database, despite being only two years old. We launched less than two years ago and have raised financing from top VCs since then, including Benchmark, NEA, Two Sigma Ventures, and a variety of angels. Since we last spoke, some new things: we've surpassed one million downloads.
Our production users include companies like Comcast; Cree, which is a lighting company; and LAIKA, which is an animation studio like Pixar. We're even deployed by the European Space Agency for a new solar orbiter mission that they're working on. And I think some of the newest news is that we found out recently that we are the most requested extension on AWS RDS, and the most requested feature on Azure Postgres. So I think this all speaks to not just the product we've built, but also the broad
[00:02:56] Unknown:
adoption and popularity that it's seen. Yeah, I know that last time we spoke, I had been asking about the availability of Timescale as an extension in RDS. So with the level of support that you're seeing on that front, hopefully it will come to fruition in the near future so that I can add you to some of my databases rather than having to spin up EC2 boxes and run it myself. Yeah, a lot of that is out of our control because we don't run RDS.
[00:03:22] Unknown:
But what I will say is, and we can talk more about this later in the interview as well, fairly soon we'll have a managed offering, a fully managed Timescale offering that you'll be able to use just like RDS on AWS.
[00:03:38] Unknown:
And when you first launched, time series workloads were already widespread and used in a number of different contexts, and there were some existing databases on the market for that type of workload. But over the past year and a half, the need for time series data and its prevalence has exploded beyond what it was two years ago. So I'm curious what you have seen in terms of shifts or new entrants to the market, and just how the overall ecosystem of time series data has grown over the time that you've been working on Timescale?
[00:04:17] Unknown:
Yeah. When we started the company, our main hypothesis, or at least the itch that we needed to scratch, was that there's a lot of time series data, but none of the existing time series databases supported SQL. So it's really this intersection of time series and SQL, and in particular Postgres. And just in the past year, the trends have grown, to be honest, even faster than we expected. Time series continues to grow as the fastest growing category of databases.
Postgres continues to grow, and I think for the second year in a row it is the fastest growing database according to DB-Engines. Number two is Mongo, and if you look at Mongo's stock, you can see how well Mongo is doing. Postgres has no company behind it, quote unquote, but Postgres is apparently growing even faster. So we continue to see that. I think what's been interesting with time series is that initially the premise was that there's a lot of time series data in IT operations and DevOps, and a lot of time series in IoT. And we're seeing that, but even more than we expected. The DevOps world is growing faster than folks expected. And in IoT, whether it's industrial machines or sensor data or oil and gas rigs or historian data or some of these new IoT devices,
it's all grown. The data volumes are growing faster than we expected. So it's been really cool, and that explains a lot of the adoption we're seeing. Time series has also managed to crop up in places that we did not expect. We see companies deploying us as part of their ML pipelines, as part of their web and mobile eventing data, as part of their security analysis. Network monitoring is something we're seeing a lot of users from that we did not expect when we started this company. So it's all been good things. In terms of the market, probably the most interesting news has been, as of a few months ago, when AWS at re:Invent announced their own fully managed time series offering called Timestream.
And I think that's been interesting for a couple of reasons. Number one, it totally validated the market. When the news came out, a lot of people asked us, oh, is this going to hurt you guys? And it's actually only helped us. We've only gotten more attention from potential users. We've actually had leading database analysts cold call us: hey, we're having a lot of conversations lately where Timestream comes up and Timescale comes up, can you talk? And we're like, yeah, of course. And in fact, we've also had a lot of inbound investor interest as well. So I think that's been very validating. But the really interesting part about Timestream is, when it first came out, it was kind of funny, because we saw the name and we were like, wait, Timestream?
They really called it that? Because, you know, we're Timescale. And when we looked at some of their marketing, we were like, wow, it kind of mimicked, to put it mildly, our own marketing and our own messaging, and some of the things we talk about in our blog posts, on our website, and in the talks we give that are available on YouTube. Even though their product is very different, I think a lot of the messaging they adopted, they adopted from us. So when that news came out, we all found it very flattering that a company with essentially unlimited resources like Amazon, when launching this new time series product, felt the need to copy us, while we're still a 30-to-40-person startup. We found that really flattering and
[00:08:08] Unknown:
pretty funny. Yeah, it's definitely interesting the ways that AWS can influence what people think they might need for their given workflows. And as you said, having them launch a managed time series data storage system definitely validates the niche that you are working to fill. Another reason that I think time series has become more prevalent is, as you mentioned, customer analytics and those types of data, but also a much bigger shift from batch-oriented Hadoop-style workloads to more continuous streaming analytics powered by systems such as Spark or Flink, and things like Pulsar and Kafka. There's this view of needing to have everything be real time and up to date, and having, as you mentioned, a SQL interface on top of that time series data makes it much easier for people who don't necessarily have as much technical acumen as the data engineering or data science team to still have self-service access to that data with a minimal amount of processing necessary.
[00:09:19] Unknown:
Yeah, and I think that's one of the advantages we've always heard from customers about our approach, compared to some of the others that were building their own custom query languages. As you point out, not only is SQL broadly adopted, but you have this existing large install base that speaks SQL natively. So it's not just that people are trained to use SQL as one of the largest data access languages, but also visualization layers and existing legacy applications speak it. You could take this time series data, join it with other business data and metadata, and then use it across your entire organization. And related to your question before, one of the things we've seen is that a lot of the existing time series databases came out of this IT or DevOps world, where they were narrowly focused on that use case. What we're seeing is that as more and more companies start to use their time series data as part of their digital transformation, they need a much richer use of this time series data than those previous solutions, which had been
[00:10:27] Unknown:
built only for IT ops, really offered. And they get that through the flexibility that we have at Timescale. And just to close out the AWS thread: I think what's really interesting is, you're right, AWS has done a great job building, to some degree, the whole cloud or hosted industry, and they do direct people. What's happened is that people saw Timestream and thought, oh, we need something like that for our time series workloads. But then they looked at the pricing and how Timestream works, and if you go look at it, it's actually quite expensive. It's not really designed for operational workloads. It's this kind of weird serverless thing; maybe it's built on Kinesis or something, we're not exactly sure. So people have said, oh, we have operational workloads, and that consumption model doesn't work for us. And then they've come to us, because we're more of a real database company, and I think that's really worked out. Plus, the advantage of working with a company like Timescale is that it's at its core cloud agnostic, so you're not tied to AWS. You can essentially have workloads on AWS or Azure or GCP or whatever you want. I think that's been a driver. And also, having the AWS managed offering
[00:11:46] Unknown:
limits some of the capacity for doing local development. So being able to install Timescale on a developer's machine to build out new features or test out new capabilities, and then launch that to whatever their production environment is, simplifies the overall life cycle of building the features and products that actually leverage that time series data. And over the past year since we last spoke, I'm wondering what has changed in terms of the focus and features of the TimescaleDB project
[00:12:18] Unknown:
and the company that you've built around it? Yeah, I think last year the big push was getting towards 1.0, which we launched in September, October. And 1.0 is a really kind of funny thing for us, because typically databases become 1.0 when they're production ready. But being engineered on top of Postgres, we've been production ready since essentially month three, pretty early on, from version 0.1 I think. So 1.0 was really just our stake in the ground and a public recognition saying: hey, we're not just production ready, but also enterprise ready. I think the focus for this year, which I'll let Mike talk about in a minute, has been the next phase of the database, both in terms of capabilities but also consumption models. Yeah, I think there are two main
[00:13:09] Unknown:
capabilities that we've been working on and have delivered part of. Talking about the database itself, there have really been two large focuses. One has been a whole automation framework that we've built into the database. Typically you consume a database by sending queries and inserts, but now we've built this rich automation framework inside the database, whereby the database will, in the background, continuously implement data retention policies, reorder your data in certain ways to make your queries much more efficient, and allow you to do continuous aggregations, where you continuously roll up from raw data to aggregated data, say from raw data to one-minute intervals to one-hour intervals, to make queries much faster.
And we deliver that in a way where you can set up a very simple policy initially, and the database will continuously do this in the background. Some of this automation framework and its capabilities have already been released, and we're releasing new capabilities, particularly around the continuous aggregations, over the next couple of months. The other big focus from a product engineering perspective has been building out our scale-out clustering capabilities. Right now, we offer clustering in the form of a single primary with multiple replicas, and people regularly deploy that both for high availability and fault tolerance and as a way to scale out read capabilities. So you might, for example, give a read replica to a data science team or an analytics team, where they can hit it with all their random expensive queries that might scan the entire database, while you continue to drive high write rates at the primary.
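As a rough illustration of the automation framework Mike describes, here is a hedged SQL sketch using hypothetical table and column names; the function names and view options shown reflect the TimescaleDB 1.x era and have changed in later releases:

```sql
-- Hypothetical sensor table, turned into a hypertable partitioned on time
CREATE TABLE conditions (
    time        TIMESTAMPTZ      NOT NULL,
    device_id   TEXT             NOT NULL,
    temperature DOUBLE PRECISION
);
SELECT create_hypertable('conditions', 'time');

-- Continuous aggregate: the database keeps one-hour rollups of the raw
-- rows up to date in the background (1.x syntax; later versions use
-- CREATE MATERIALIZED VIEW instead)
CREATE VIEW conditions_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS bucket,
       device_id,
       avg(temperature) AS avg_temp,
       max(temperature) AS max_temp
FROM conditions
GROUP BY bucket, device_id;

-- Retention policy: automatically drop raw chunks older than 90 days
SELECT add_drop_chunks_policy('conditions', INTERVAL '90 days');
```

Dashboards can then query `conditions_hourly` instead of scanning the raw rows, which is what makes the rolled-up queries cheap.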
But what we'll be releasing throughout this year is the ability to scale out the number of primaries you have, so you can build out to the hundreds-of-terabytes or petabyte scale, as opposed to today, where we generally see people in the tens-of-terabytes scale. One more thing that I think is worth mentioning, because it came up earlier in this conversation, is
[00:15:19] Unknown:
that we fully recognize that, not all, but a lot, maybe most, database workloads are moving to the cloud. And like you alluded to, not everyone wants to manage a database on EC2 or on their own VMs. So our managed offering, which has essentially been in beta, and over the next few months you'll see more announcements around this, but one of the main focuses has been bringing to general availability a fully managed Timescale cloud that lets you do the things you want to do, which is insert data, query data, analyze data, use the power of SQL, and lets you offload the things folks don't really want to bother with, which is backups, replication, just operating the database. And the way we're thinking about this managed offering is to have something that's not tied to one cloud provider, but is available on multiple clouds.
But again, we'll have more to announce in the next few months. So last fall, you launched your 1.0 release, and you talked a bit about the
[00:16:29] Unknown:
criteria for establishing that milestone. Around that same time frame, the PipelineDB project also hit 1.0, and their focus is on doing these continuous aggregations in memory on top of Postgres, for high volume workloads where you don't necessarily need access to the raw data, but you want to be able to continuously query the aggregated data points. And you were saying, Mike, that you're introducing some of the automation around Timescale for continuously computing these aggregates and roll-ups. I'm wondering how that plays into the calculus that somebody might be doing as to whether they would want to use PipelineDB versus TimescaleDB for some of these aggregated statistics.
[00:17:18] Unknown:
Yeah, we're definitely familiar with the PipelineDB product, and we think it's very good for the use case it came out to serve. I think the great thing about their 1.0 was that they moved from a Postgres fork to a Postgres extension, and we've certainly heard requests from people looking at using both types of products. I think the main way to see these as complementary rather than competing is that, as you point out, PipelineDB is really designed for workloads where you have a really high volume of incoming data, but you don't want to store all the raw data. So typically the way you deploy it is that you buffer a lot of these incoming data points in memory into little micro-batches, and then on each micro-batch you update just an aggregate that's stored as a view in the database. One of the main ways we're different is that we're much more of a traditional transactional database, where you can use us to store your raw data as well as do all the types of transactions that you associate with a relational database.
And yet also, in the background, this new automation capability lets you continuously create aggregates. One of the reasons you use aggregates is to power dashboards: for example, if you're trying to compute a statistic over a week of data with millions of data points, you don't always want to touch millions of data points at query time to compute that one metric. You want to power it from the pre-aggregated data. In that notion of allowing queries on pre-aggregated data, it's similar to PipelineDB. I think one of the ways PipelineDB is different is that they're really looking at cases where you might be pushing, I think sometimes for the simpler cases, tens of millions of metrics per second on a single machine, where I think right now we're at about a million metrics per second or so on a single machine, because they're not actually storing the raw data. And in a lot of cases there's this long history of various streaming algorithms or sketching algorithms, where you're computing only an approximation of the actual results, because you're, again, never really looking to store the raw data. I would also add, just on a nontechnical side, that we are
[00:19:43] Unknown:
also big fans of the PipelineDB team and the work they're doing, and we think it's great not just for the Postgres ecosystem, but for data management in general.
[00:19:53] Unknown:
And going back to your 1.0 release, I'm curious what were the most challenging aspects of establishing and reaching that milestone, and some of the outcomes that you've seen post-1.0 in terms of general recognition or curiosity
[00:20:13] Unknown:
of the product and the work that you're doing at Timescale? Yeah, to be perfectly honest, I think the hardest part for us was where to draw the line and say, this is 1.0. Because typically, companies say, until we're production ready, we're not 1.0, but we had been production ready at that point for over a year. So it was almost like scope creep: where do we internally draw the line and say, okay, this is 1.0, and everything after that is not 1.0? Some of that was pretty clear, around some usability things we wanted in 1.0. Other things were less clear, because they were capabilities, and for some of them it was, hey, you know what, this can come in 1.1 and that's fine. But the key thing is we did want to get it out the door and essentially establish that, yes, this is production and enterprise ready. In terms of what's beyond 1.0,
[00:21:00] Unknown:
I'll let Mike take that. Yeah, the other interesting thing about doing a 1.0 release is that one of the things we always thought was very important from the start, given that Timescale is implemented as an extension on Postgres, was enabling as close to the full capabilities of Postgres as makes sense for a time series database. If you think about what other time series databases offer, they say, well, we'll be SQL-ish, which means a very simple SELECT statement that maybe can only order by time and can't do anything else. That's what a SQL-ish language looks like for some people. For us, part of the 1.0 process was really to cover the broad scope of every combination of what Postgres allows you to do. And given that it's a 20-year-old database, it often lets you do similar things in multiple different ways. So, I remember at one point we had a bug which was caused by somebody having a trigger inside a subquery of a common table expression inside an upsert query.
And when you compare this against a SQL-ish language that only allows you to order by time, the level of complexity and maturity in what we wanted to deliver was much larger. But also, in order to declare that this all works, there's just a long tail of things that we needed
[00:22:29] Unknown:
to meticulously go through and check. I think no other time series database has that problem, because no other database can actually do anything close to that. And for us, it was like, oh, we can do 90% of it right now, but we've really got to work out this 10% corner case. And there were quite a few of those that we managed to iron out before 1.0.
[00:22:51] Unknown:
And in terms of the contract that you're putting forth with that 1.0 release, is your intent to say that from this point forward there is a baseline set of features that you will always support, or at least until you deprecate them in 2.0? And what are some of the things that were most requested as far as having that level of commitment to future support?
[00:23:18] Unknown:
You know, related to what I was talking about with this long tail: our focus with 1.0 was really to give people the sense that they already know how to use the database, from both the user's perspective and the operations perspective. It really looks and feels like Postgres, even though it has a lot more capabilities, both from a scaling and performance perspective related to time series data, and also new time series manipulations that Postgres didn't have before.
We actually view that as a more critical part of 1.0 than the next stage of special features related to time series, like clustering and automation. And in one of the next releases, you'll see new capabilities around time series analytics for gap filling, LOCF (last observation carried forward), and interpolation, which should be out in a couple of weeks. So we really viewed 1.0 as giving people this familiarity, which we will maintain going forward, as opposed to necessarily new features.
[00:24:28] Unknown:
And as you have continued to build and grow Timescale and work with your customers, what have you found to be some of the most challenging types of time series workloads, and some of the differences in requirements
[00:24:45] Unknown:
of Timescale and the underlying Postgres, as far as what those workloads dictate? So one of the interesting things we've found, particularly being a general purpose database, is that we see a lot more varied use cases than you might if you were specifically focused on something like DevOps. IoT, for example, is one use case we see a lot. But even within IoT, you actually see very different data patterns, which often mean a different way of using the database. I'll give you a couple of examples.
So one common IoT paradigm is that you have a lot of homogeneous devices: the same device deployed in thousands or tens of thousands of locations, collecting data all at the same rate. That dictates one type of data model: you put all of this data in a single hypertable, and you can predefine your schemas up front, because all of your devices are very similar, and that's the natural thing to do. Another use case is that you have a broader system with different types of sensors collecting different data at different offsets. That again leads to a different type of data model inside your system, and there are different ways you could use Timescale to store it, either by defining different sensor types or by using our support for almost NoSQL-style data through JSONB. So what we find is that even in an area that sounds like, quote, IoT, there are many different types of deployments that lead to different ways of using the database. Thankfully, because Timescale is incredibly flexible in the way you can use it, it really doesn't dictate a specific storage paradigm or data model. It allows people to adapt it to the use case.
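The two IoT data-model styles Mike contrasts can be sketched in SQL roughly as follows; the schemas are hypothetical, invented here purely for illustration:

```sql
-- Style 1: a homogeneous fleet -> a predefined wide schema in one hypertable
CREATE TABLE sensor_data (
    time      TIMESTAMPTZ      NOT NULL,
    device_id TEXT             NOT NULL,
    temp      DOUBLE PRECISION,
    humidity  DOUBLE PRECISION
);
SELECT create_hypertable('sensor_data', 'time');

-- Style 2: heterogeneous sensors -> a semi-structured JSONB payload,
-- giving an almost NoSQL-style model inside Postgres
CREATE TABLE events (
    time    TIMESTAMPTZ NOT NULL,
    source  TEXT        NOT NULL,
    payload JSONB
);
SELECT create_hypertable('events', 'time');

-- JSONB fields can still be filtered and aggregated in plain SQL
SELECT time_bucket('15 minutes', time) AS bucket,
       avg((payload->>'pressure')::float) AS avg_pressure
FROM events
WHERE payload ? 'pressure'
GROUP BY bucket;
```

Both styles land in hypertables, so partitioning, retention, and aggregation work the same way in either model.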
[00:26:41] Unknown:
And as far as the initial assumptions and approaches that you built into Timescale when you first began working on it, what are some of the ones that have held true throughout, and which ones have you had to reconsider as you were presented with these different use cases and challenges?
[00:27:02] Unknown:
Yeah. I think the number one thing that has held true throughout is this notion that people like SQL. People know SQL, SQL is easy to use, and it's not just for transactional workloads; even for time series, the versatility, flexibility, and ubiquity of SQL is something that people really wanted. And really no time series database before Timescale did that. Some people thought it was heretical when we launched, but we thought it was true then, and I think it's even more true now. Probably related to that, one thing that has been disproven, and we've been pleasantly surprised by this, is our assumption about who wants SQL. We thought SQL was a query language for, say, software developers building applications or folks in IoT, and we just assumed that folks in the DevOps and IT monitoring spaces didn't like SQL, that they wanted something more bespoke and custom. We've been pleasantly surprised to find that there are actually quite a few use cases within the IT operations world where people need SQL. Especially at larger companies, you have assets that are generating time series data, and then you have metadata that describes those assets, and you need to join that data without denormalizing it. Another use case is that you have all this IT operations data that maybe you're displaying in a dashboard like Grafana, but you also need to make that same data available to the business side of the house, whether for budgeting, capacity planning, or strategic reasons. The business side of the house uses Tableau, so you need a SQL interface to that same data.
We've seen that quite often, and in particular you see it with Prometheus. Prometheus is great for monitoring for the DevOps team, but as soon as you have multiple Prometheus instances, what's your long term store? And if you want to make that data available to the rest of the organization, for either business needs or data science needs, how do you make it available? The answer there, and I think we've been surprised by this, has been SQL.
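The join-without-denormalizing pattern described above is a plain SQL join between a hypertable and a regular metadata table. A minimal sketch with hypothetical `metrics` and `assets` tables; `time_bucket` is TimescaleDB's time-bucketing function:

```sql
-- metrics: a hypertable of per-asset measurements
-- assets:  an ordinary relational table of asset metadata
SELECT time_bucket('5 minutes', m.time) AS bucket,
       a.region,
       avg(m.cpu_usage) AS avg_cpu
FROM metrics m
JOIN assets a USING (asset_id)
WHERE m.time > now() - INTERVAL '1 day'
GROUP BY bucket, a.region
ORDER BY bucket;
```

Because the metadata lives in its own table, updating an asset's region is a single-row update rather than a rewrite of historical time series rows.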
[00:29:10] Unknown:
And that's where we've seen some usage that we did not expect. Since we last spoke, there have been a couple of major releases of Postgres in the form of 10.x and 11.x, which brought new capabilities around partitioning and some time-series-style data manipulation in vanilla Postgres. I'm wondering how those changes and evolutions of the underlying project have impacted your development of Timescale from an architectural standpoint, whether there's any code you've been able to discard because it's now implemented in vanilla Postgres, and whether there have been changes in perception about the necessity of Timescale for certain workloads.
[00:30:04] Unknown:
Yeah. First of all, one of the nice things about being built as an extension is that when Postgres 10 came out last fall, we released support for it just a few weeks later, and about a month or two ago we launched support for Postgres 11. We can take advantage of many of the capabilities those releases offer. For example, with Postgres 11 we now support greater forms of parallelization across our multiple internal chunks, and we can use its just-in-time compilation for larger queries. Taking advantage of the parallelization and JIT compilation has really been a big win, and Postgres 11 support was definitely one of the most requested features for Timescale recently.
As of yet, we haven't changed the way we do internal partitioning to reuse any of the support that has been built natively into Postgres 10 or 11. In the future we might look at rearchitecting, maybe when Postgres 12 comes out, although there are other very interesting things on the Postgres roadmap that we're actively watching; for example, the new pluggable storage backend APIs look very interesting to us. One particular reason is that the scale at which we do partitioning in Timescale is just much greater than what most people see in traditional Postgres workloads, where you might have tens or hundreds of partitions, whereas we see people with tens of thousands. We did spend some time looking at the native implementation, but we came to the conclusion that the performance optimizations we have built are better suited to
[00:32:01] Unknown:
the scale and needs of many of the time series workloads we see. We love all the improvements that are happening within core Postgres, and as mentioned, we benefit from some of them. But the truth is that the problem the Postgres community is solving is a different problem than the one Timescale is solving. Postgres is still looking to be the number one OLTP database, and while it's growing very quickly, there's still a long way to go to replace all of the Oracle legacy workloads. The roadmap to do that is very different from what we're building, which is the best time series database that supports SQL. Obviously there's some overlap, but especially as we grow, you'll find that we bring a lot of functionality that just doesn't make sense in Postgres core. In fact, there are certain things we will prioritize that it would not make sense for Postgres to prioritize, because you just don't need those types of capabilities in an OLTP database.
[00:33:03] Unknown:
But for time series, they're critical. I think these are complementary. The fact that Postgres built an extension framework means that companies and projects like ours have the ability to take Postgres into new areas where it otherwise wouldn't have seen use. We certainly benefit from the Postgres database, and I think the broader Postgres community also benefits from projects and companies like ours taking Postgres into use cases where it otherwise wouldn't be used.
[00:33:36] Unknown:
And looking forward, you've talked a bit about some of the features, products, and capabilities that you have in the works, but what are some of the other things that people can look forward to in the future of the Timescale product and organization?
[00:33:51] Unknown:
Yeah. Well, first, just to reiterate the ones we've listed so far: we've been focusing on a few major initiatives. Number one is the scale-out version of Timescale; right now we scale out reads, but this will scale out writes as well, across multiple nodes. Number two has been the managed offering, which we'll announce soon. Number three has been automation around data lifecycle management, whether that's aggregations, retention, or other performance-based optimizations, plus more things around analytics, like gap filling, LOCF, and interpolation.
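The gap filling, LOCF, and interpolation features mentioned here correspond to the `time_bucket_gapfill`, `locf`, and `interpolate` functions that shipped in TimescaleDB's 1.x series; a minimal sketch against a hypothetical `conditions` hypertable (note that `time_bucket_gapfill` requires a bounded time range in the `WHERE` clause so it knows which empty buckets to generate):

```sql
SELECT time_bucket_gapfill('1 hour', time) AS bucket,
       -- carry the last observed value forward into empty buckets
       locf(avg(temperature))        AS temp_locf,
       -- linearly interpolate between surrounding observed values
       interpolate(avg(temperature)) AS temp_interp
FROM conditions
WHERE time > now() - INTERVAL '1 day'
  AND time < now()
GROUP BY bucket
ORDER BY bucket;
```

Buckets with no raw readings appear in the result anyway, filled by whichever policy (LOCF or interpolation) the column uses.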
But those are all features. The key thing we're finding, and where Timescale is headed, is this: the company was launched on a simple idea, which was that time series data is growing and it needs a SQL database. Pretty early on, based on all the broad adoption we saw, we formed what at that point we thought was a really crazy idea, though now it's becoming less and less crazy. The idea is that time series databases are growing not just because of DevOps, IoT, ML, or usage data, but because fundamentally all data is time series data. Every data point has a timestamp.
And we would argue that by not storing your data in its raw timestamp form, you're actually throwing information away, because it's only by storing data in a time series format that you can see how your data is changing over time: how your users are interacting over time, how your vehicles and assets are moving across time and space. I'm not saying that you need a time series database for everything; of course there will be some workloads where you need a key-value store, and some where you actually need a distributed OLTP database. But your workloads are fundamentally time series in nature, and as storage gets cheaper and cheaper and computation becomes more and more powerful, there's tremendous value in storing your existing data in its raw time series format for analytical purposes.
So you can not just analyze the past and monitor the present, but even predict the future. One thing you'll find, not just this year but over the next few years, in terms of how this company is going to grow, is that we're really helping not just with existing time series workloads but with other workloads where converting them to their raw time series format could open up opportunities to drive efficiencies and launch new products in really meaningful and impactful
[00:36:57] Unknown:
ways. And are there any other aspects of the Timescale product and company, or of time series data, that we didn't cover yet that you think we should discuss before we close out the show?
[00:37:10] Unknown:
I would say the company is growing very quickly. Roughly in the past year we've almost quadrupled our headcount, and I think that growth is going to continue. A year ago we were predominantly in New York, but now our team is distributed everywhere from the West Coast, with folks in Portland, San Francisco, and LA, to the East Coast and Europe, with teams in Stockholm, in Germany, even in Macedonia and Moscow. So one thing about Timescale is that we're growing very quickly, and we are hiring across the board: really strong engineers, but also really strong business folks who are passionate about databases and open source. So,
[00:38:05] Unknown:
if you're anywhere in those time zones, we'd love to talk to you. Alright. Well, for anybody who does want to get in touch and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as the final question, I would like to get your perspective on what you see as the biggest gap in the tooling or technology that's available for data management today.
[00:38:28] Unknown:
Yeah. I think we're really at an inflection point, or really a crossroads, if you will, in the database and data management world. More and more workloads are moving to the cloud, and at the same time you have the rise of open source. In a kind of contradictory way, the rise of the public clouds has actually been a threat to the rise of open source. You can see this as folks like Amazon launch proprietary products on the backs of open source communities without really contributing back to those communities; the products they're launching are not open source, they're proprietary. So I think the missing thing in the community is this: how do we marry the rise of the cloud with the rise of open source? And if the answer is multi-cloud, I don't think anyone has been able to provide a really strong multi-cloud story yet. I know some vendors do it, but that's a really big question mark in my mind. Mike, do you have anything to add? Yeah. If we think about different use cases, we definitely see a lot in IoT, and one of the interesting things about IoT is that it goes everywhere from the edge to the cloud,
[00:39:47] Unknown:
where people are looking to build more pipelines. Historically, you might just transmit data from the edge to something like a data lake where you run non-real-time reporting. But now people want to operationalize the data, so they deploy a time series database in the cloud to make that more real time. We're also seeing them deploy it at the edge, because it's not cost effective, or for a variety of other reasons, to constantly stream all the data back to the cloud. So we're now beginning to enter a plane of: how do we do data management all the way from the edge to the cloud, where you have lots of different instances of your database that each hold only a subset of your data? In general, there is not really great tooling or a great framework out there for thinking about data lifecycle management, and really just the control and security of data, across this big hybrid platform.
[00:40:53] Unknown:
Alright. Well, thank you both for taking the time again today to talk to me about how Timescale has been progressing over the past year, and thank you for all of the effort that you're putting into the product. It's definitely very compelling and one that I've been keeping a close eye on for a while now. So thank you for that, and I hope you both enjoy the rest of your day. Yeah. Appreciate that. Thank you for having us. Thanks a lot for the time.
Introduction and Guest Welcome
Overview of TimescaleDB
Managed Offering and Market Trends
AWS Timestream and Market Validation
SQL and Time Series Data
Recent Developments and Features
Automation and Clustering Capabilities
Challenges and Outcomes of 1.0 Release
Challenging Time Series Workloads
Impact of Postgres 10 and 11 on Timescale
Future of Timescale
Company Growth and Hiring