Real-time data streaming is shaking up how we handle and process data. In our first episode, Tim Berglund, VP of Developer Relations at Confluent, breaks down the basics, from what data streaming is to its game-changing role in modern tech—perfect for anyone ready to dive in.
Real-time data streaming is shaking up everything we know about modern data systems. If you’re ready to dive in but unsure where to begin, no worries. That’s why we’re here.
Our first episode breaks down the basics of data streaming—from what it is, to its pivotal role in processing and transferring data in a fast-paced digital environment. Your guide is Tim Berglund, VP of Developer Relations at Confluent, where he and his team work to make streaming data and its emerging toolset accessible to all developers.
You’ll learn:
About the Guest:
Tim Berglund serves as the VP of Developer Relations at Confluent, where he and his team work to make streaming data and its emerging toolset accessible to all developers. He is a regular speaker at conferences and a presence on YouTube explaining complex technology topics in an accessible way.
Guest Highlights:
“The basic intellectual habit that you have in building a data streaming system isn't first, ‘What are the things?’ But it's, ‘What is happening?’”
“With batch processing, I’ve got my data in a pile—I know where it starts, where it ends, and I can work through it. With streaming, it’s not a pile—it’s a pipe.”
“The future of data streaming is real-time everything—flows, insights, and actions. There’s no more ‘take the data here and think about it later.’ The insight is now, ready to be consumed by anyone who needs it. Businesses built on this model can respond to the world as it changes, right away."
Episode Timestamps:
(01:44) - Tim’s Journey in Data Streaming
(14:35) - Data Streaming 101: Unlocking the Power of Data
(38:56) - The Playbook: Tools & Tactics for Data Streaming
(49:00) - Voices from the World of Data Streaming
(53:35) - Quick Bytes
(57:10) - Top 3 Takeaways
Links & Resources:
Our Sponsor:
Your data shouldn’t be a problem to manage. It should be your superpower. The Confluent Data Streaming Platform transforms organizations with trustworthy, real-time data that seamlessly spans your entire environment and powers innovation across every use case. Create smarter, deploy faster, and maximize efficiency with a true Data Streaming Platform from the pioneers in data streaming. Learn more at confluent.io.
Joseph: [00:00:00] Welcome to Life is But a Stream, the podcast for tech leaders who need real time insights. I'm Joseph Morais, Technical Champion and Data Streaming Evangelist here at Confluent. My goal, helping leaders like you harness data streaming to drive instant analytics, enhance customer experiences, and lead innovation.
Joseph: Today, I'm talking to Tim Berglund, vice president of developer relations at Confluent. Tim works to make data streaming and its evolving tool set more accessible to developers everywhere. In fact, you might know Tim from… [Clips plays].
Joseph: Today, we're diving into the fundamentals of data streaming: what it is, how it works, and why it's becoming such a powerful tool for modern data systems. And just a reminder: this episode, episode 2, and episode 3 are our foundational episodes, where we analyze and discuss what data streaming, stream processing, data integration, and data governance are. In future episodes, past one through three, we're going to be talking to our customers about all the great things they've built with the data streaming platform.
Joseph: But in this episode, we'll explore key concepts like real time event processing, the difference between streaming and batch processing and take a closer look at some of the use cases that are shaping the way businesses handle data today. But first, a quick word from our sponsor. [00:02:00]
Ad: Your data shouldn't be a problem to manage. It should be your superpower. The Confluent Data Streaming Platform transforms organizations with trustworthy, real time data that seamlessly spans your entire environment and powers innovation across every use case. Create smarter, deploy faster, and maximize efficiency with the true data streaming platform from the pioneers in data streaming.
Joseph: Welcome back. Joining me now is Tim Berglund, Vice President of Developer Relations at Confluent. How are you today, Tim?
Tim: Hey, Joseph, doing well. Thanks for having me on the show.
Joseph: Awesome. So even though this is the first show, I'm going to immediately go off of our topic questions, right? And I just want to tell you a little bit.
Tim: Nobody even knows what that flow is yet, but you're ready.
Joseph: You're right. I like to get in immediately and break things, right? That's just how I do. I'm an engineer at heart. So, I started here four and a half [00:03:00] years ago. And we had a training system. We've gone through so many, I don't even know what the name of it is anymore, but there was this intro to Apache Kafka and it was you.
Joseph: And I was like a brand new partner SE. And I saw you talk about data streaming in such a positive and interesting way. I didn't really know that technologists could talk about tech like that. So I was like super excited to work at a company that had this guy. And then sadly, Tim left us and no, I'm not talking to a poltergeist right now.
Joseph: Tim just went to another company, and I was a little, you know, I was a little disappointed, but of course, you know, where you went, they're a great partner in adjacent technology. So I turned that disappointment into the desire to bring some of the content you brought to the company to Confluent in your stead.
Joseph: And that style has really inspired the way I've conducted myself as a PSE, and now as Technical Champion, getting to this role and the [00:04:00] show. So I'm completely honored to go full circle and have you as my first guest. I just wanted to say that and make you blush a little bit, and I've succeeded in doing that. So, yeah.
Tim: Yeah. Anybody watching the video version, maybe you saw me blush. Maybe you saw my Christmas sweater. I assume this is going to launch after the holidays; as we are recording this, it is sort of mid-December, and I've got my Star Wars ugly Christmas sweater on. Representing. Anyway—
Joseph: Thank you, like, immensely for being the first guest. But we have a more important thing to talk about other than how much we like each other. We've got to talk about data streaming, right? So let's jump right into it. What do you and your team do here at Confluent, besides making mesmerizing content?
Tim: Well, um, it's our job to help developers, and by that we mean just all technical people, right? We're not super picky about operators or architects or, you know, meaningful distinctions like that. We just mean technical people.
Tim: It's our mission to help developers [00:05:00] adopt and succeed with the technologies of what we at Confluent call the data streaming platform. That includes things like Kafka, Flink to an increasing degree as we move forward, and Apache Iceberg. Technologies like these are things that only technical people use directly, and it's our job to help the world of developers know that these solutions exist, how to get started with them, and how to, you know, move through what I call the developer journey successfully with those things.
Joseph: Right, it makes a lot of sense. I mean, what we provide here at Confluent, it's not simple stuff, right? It might be a very simple use case, but once you start to get into enterprise scale and things like that, it's very difficult to know how to tie it all together, especially if you're trying to do it open source. So I really appreciate everything you and your team do. So, for the people [00:06:00] viewing us that don't know: what does Confluent do?
Tim: Okay. Confluent makes, well, we'll start with the cloud: you know, a fully managed version of that platform. Now, that platform is also available in self-managed form if, you know, you just really like tarballs, or you can't run in the cloud, or, like, whatever.
Tim: There are a bunch of valid reasons, technical and business reasons, for that thing to exist. So there's the platform and there's the cloud product: Kafka, Flink, a soon-to-be Iceberg interface as we record this, all those things in a fully managed form. And, I'll just say, you know, the purpose of our discussion today isn't to walk through the product, but there are a bunch of other goodies that you're going to need.
Tim: And either you're going to do maybe the wrong thing and build those things yourself, or you're going to get them from somewhere, and those all exist as an integrated part of, we'll just say, Confluent.
Joseph: Yeah, we have all the data [00:07:00] streaming flavors that you could want. Um, so who are our customers and maybe more importantly, who aren't our customers?
Tim: Well, you know, our customers traditionally, I think, hail from all kinds of industries, right? Financial services always tend to be early adopters of data technologies, and that was certainly the case here, but there's people in manufacturing, there's people in retail, there's people in the digital-native technology space. Anywhere you go. It sounds like, you know, everybody's our customer. That's not literally true, but if you have data and you're operating above a mom-and-pop scale, right? Every business has data, and if you are a, you know, sole-proprietor, flower-shop, corner-store kind of thing, well, you know, you'll have Stripe and you'll maybe [00:08:00] use QuickBooks or something, if you've made life choices that have led you there. You've got that kind of SaaS stuff. You're not going to use Confluent. You're not going to use Kafka. You don't build software. But if you're a company that does, if you're big enough to write your own software, then there's a story there.
Tim: You need to understand what's going on in your world with the data that characterizes your business, and that sort of analytical angle on data streaming is going to become important to you; either it is now, or it will be. And you have applications, a set of operational concerns, right?
Tim: Programs that help people run your business, or, increasingly, programs that sort of are your business in this, you know, emerging software-defined-business kind of space. And increasingly, for the application architectures that emerge there, say microservices broadly, you know, that's the thing. [00:09:00] A data streaming platform has emerged as really the successful substrate on which to build that kind of operational estate.
Joseph: That's, that's a great way of putting it. You know, I would say this pretty often, especially in my PSE role, that not all use cases in tech are data streaming use cases. But all companies have data streaming use cases and even to go to your example of like the florist, right, that individual has a data streaming use case, right?
Joseph: They probably want to know, like, I have a new order; I want that instantaneously. They're probably not going to be the ones, like you said, working with Confluent directly, but they're going to be a consumer of some other technology that is likely built there.
Tim: For example, just to use one that is quite public about their use of data streaming: suppose their website's on Wix, and people can place orders, and they want, like, up-to-date analytics of what their customers are doing on their little mom-and-pop Wix website, you know, corner-florist kind of thing with an online operation, not a giant business. Well, guess what, folks? [00:10:00] There's Confluent in there. There's all kinds of data streaming going on in there.
Joseph: Yeah, that's a good one. We're going to talk about data streaming engineers later, but does that mean all Wix users are data streaming engineers?
Tim: No, no, it doesn't. Most Wix users are people making a website and putting plugins in it.
Joseph: But you have the power of data streaming.
Tim: Yeah, and that's the thing, you know, what we do here is infrastructure fundamentally. Yeah. And, you know, infrastructure, think of plumbing in your home. And what we do is a lot like very smart plumbing. Most days, if you're not a plumber, you're not touching the plumbing, right?
Tim: Like if you're touching the plumbing, that's a bad day.
Joseph: That's true.
Tim: And so if you are that sort of mom and pop small scale business where you don't write software, you don't want to think about it, you just want to get things done. You want to take care of your customers. There's plumbing back there and there's plumbers who care.
Tim: And that's to a large extent, it's my job to keep the plumbers up to date on how to build things and what the right [00:11:00] fittings are, and you know, there's this new press fit technology coming out, you should use this, all that kind of stuff.
Joseph: That's a really great analogy, Tim. I appreciate that.
Tim: It's funny when, you know, what I do in developer relations always requires some explanation. I can't just say sales or engineering or something, you know. It always requires a little bit of a story. And when I talk to people who work in the trades, it's much easier that, like everybody can understand the plumbing analogy, but it's very visceral.
Tim: Even if you're not a plumber, you're some other kind of trades person that just clicks right away. Like, oh yeah, there's, there's a, you know, these influencers that tell us about plumbing and all that kind of stuff. Right? It makes sense.
Joseph: The data streaming plumbing influencer. I like that. That's a direction I need to go now. I follow some plumbing influencers on Instagram, you know; they're pretty great. So Tim, when were you first introduced to data streaming and event-driven architecture? And as a bonus, have you or any of your past teams ever had the pleasure of running open source Kafka yourself?
Tim: Take those two questions separately. I came to Confluent the first time in 2017. And I [00:12:00] would like to say that I did that because I'm so good at picking horses and I'm such an astute technology analyst that I said, “Oh, this is going to be huge. This is heralding a paradigm shift in the way software and data architectures are built. And these are trends that will last for generations.
Tim: I need to get some.” Nope, it wasn't that. It was still a good reason, though. You know, I saw alignment with the executive team at this fairly young startup, between what they wanted out of a developer relations effort and what I wanted to do. And I'm like, okay, we could do things. And about six months later, I started to realize, “Oh, wait, no, this is huge.
Tim: This is really huge.” So it was then that, you know, the impact of, or the reality of the paradigm shift started to dawn on me. And I still, you know, I'm getting my mind wrapped around what it means. I think I have a pretty good understanding now. But, you know, this is a generational shift in the way data is dealt with the way software is built.[00:13:00]
Tim: Kafka is, as I called it, a substrate, or you could say it's at the center of that story, and the pieces that are emerging on top of that: Flink, Iceberg, Schema Registry. These are other components of that platform that, you know, are part of that story. And so, yeah, I didn't see it right away. It kind of dawned on me over a little while.
Joseph: That's cool, 'cause I would've thought for sure, Tim, that you came in the door having run data streams for years. Like, I really did. But I guess, based on the timing and the size of the company, Kafka was pretty nascent at the time, so that makes sense. Yeah. Well, you made a good bet, so good job.
Tim: And do we use open source Kafka? Yes, but it's not like we operate applications or production data pipelines. I mean, we use open source Kafka to build demos for things to show people how they can build applications and data pipelines and so forth.
Tim: So yeah, we have our hands on it. That's one of the [00:14:00] noteworthy things about DevRel at Confluent, and about where DevRel fits into the picture strategically in companies like ours: we are people who actually spend our time on, and advocate for, the open source foundations. That's an interesting thing to the business, because there's this community of developers globally who aren't yet asking who Confluent is and how to succeed with Confluent.
Tim: They're saying, “Oh, I heard about like, okay, I know Kafka. I heard about Flink. It looks like Flink is becoming a big deal. How do I know more about Flink? Another thing to learn. I need to keep on top of this.” That's the question folks are asking. So we're there answering that question.
Joseph: Yeah, it was kind of a leading question. So I had a startup before I came to work for even Amazon, and before Confluent. Our stack was Kafka and Samza, which is actually another project that Jay was a contributor to. I was like, I used to work inside of Jay Kreps's mind; it was kind of weird. But I used to deploy ZooKeeper and Kafka using SaltStack, which I don't think anyone even knows anymore. It's certainly, [00:15:00] it was like a Puppet/Chef competitor. I don't know if anyone uses it, but—
Tim: Maybe, maybe an also ran. I mean, I haven't heard of it,
Joseph: …but I did it, and it worked, but it was a pain. So I'm glad neither of us has to do that anymore, because frankly, there are better ways to run data streaming now, right? So now that we know you a bit better, Tim, and what you do, for our next segment let's unravel the mysteries of how data streaming forms the backbone of modern event-driven architectures. So, what exactly is data streaming? And how does streaming data differ from other types of data?
Tim: Yeah, yeah. There's a lot of ways to come at this. Think of, you know, what you know as a traditional database. We don't need to get fancy there, or strip it down to first principles or anything. Just a database. It's a place [00:16:00] where you put data, and the data is going to sit there until you need it.
Tim: And this would also apply to a data lake, which I think is not really a database, but it's a place where you put stuff and it sits there. You might come back to it; you might not; we'll see. That tends to create kind of an intellectual pattern, a way of looking at the world where you think about the world as things: what are the things that are there?
Tim: And if you're a technical person who has done this, if you've built an application in kind of the old client-server, monolith style, you know, you typically start with a database schema. That's like asking, what are the things, and how are they related? Now, that application processes events; things happen, and that application deals with them. But those are secondary, and the data infrastructure underlying all that is a place where you put things [00:17:00] and they stay there. They're things. So, the streaming approach, the data streaming approach: you know, data is also stored.
Tim: Okay? Data gets put somewhere and remembered. But the shape of the thing, the basic semantics of how the infrastructure works and what things are like, is that things are happening. And so you don't put data there to go back to it sometime later; data flows through the thing. And when a new event or piece of data or message or thing happens, the assumption is, well, you deal with that right away.
Tim: And you know, there's elements of both in both. Okay, you can do these things at small scale with conventional databases. You can do event-driven things, and you can use data streaming platforms to store data. It's even a good idea. But again, the shape of the thing is that events are happening, and we will think about them, process them, make sense of them, act on them right away, not put them away to come back to later. [00:18:00] What that means is that the basic intellectual habit that you have in building a data streaming system isn't first, “What are the things?”
Tim: But it's, “What is happening?” So it's the event that comes to the fore, ahead of the thing. Now again, in the old-style, data-at-rest sorts of systems, things still happen and you have to deal with that. And I assure you, in a data streaming system, the thingness of those events, that is, their schema, you know, what it is that they represent in the world, is incredibly important. And for success at scale, tools to manage schema, validate schema, manage changes, understand lineage, and publish catalogs of descriptions of things that you can wrap your own automation around... like, you will not succeed if you don't either buy or build something like that to build a system like this at scale. So the thing matters, but you start with the happening, and then you get to the thing.
Joseph: Love [00:19:00] that answer. So you mentioned databases and you mentioned event-driven architecture. There are also these things like message queues that I'm sure our audience is very familiar with, especially if they've been in the industry long enough. How does what Confluent does differ from, say, a message queue?
Tim: Yeah, from the old style. So there's the semantics of the, you know, underlying data structure itself. Let's start there. A queue is this thing that you put things in; they come out in the order that you put them in, and when you take a thing out of it, it's gone, right? You've consumed it.
Tim: So you act on it, and maybe there's transactions and a bunch of fancy stuff, so if you take something out and it doesn't work out, you fail, it'll stay, whatever, you know. But when you successfully take something out of a queue, that's it: poof, it's gone. You can't get it back. So, the underlying data structure of a data streaming platform is a log, which [00:20:00] is a thing that kind of looks like a queue, in that things, you know, basically come out in the order you put them in.
Tim: When you read them, they're still there. So this is this persistent substrate of data that, you know, say there's some stream of events like orders coming in. I might be interested in writing a service that helps process payments or something on that. Somebody else might want to detect fraud.
Tim: Somebody else might want to look at customer loyalty programs or something. And these are all different services, different programs that different people can write, all consuming from that same stream of events. In a queue, you can't do that, because once you read it, it's gone. And, you know, sometimes a queue is what you want.
Tim: There's actually work going on in open source Kafka right now to give it sort of a queue mode. You can make it act like a queue, because, you know, sometimes that's a good thing. But nobody ever built a company on that foundation. And in fact, in the early-to-mid-[00:21:00]2000s, stuff that was going on with what we then called service-oriented architecture used queues like that, with kind of centralized smart routing and processing of the data. And, you know, like everything, there were people who had success with it, but it didn't stick.
Tim: It didn't leave a good taste in everybody's mouth, for reasons we don't have to get into, but that isn't it. You know, in another 30 years, people will still be talking about event-driven architecture. Maybe it'll be an old and busted idea by then, and, you know, there'll be younger people telling grandpa what it's like—
Joseph: How to migrate off of it. Right.
Tim: Right, right, right. That could be in another generation or so, but they won't be building systems on that. They aren't now. No, they're not.
Joseph: Exactly. It's like, how do we get off of that, if we have it, as quickly as possible? So to kind of summarize what you have: there's the ordering, there's the high throughput, but then you [00:22:00] also have persistence, like a database.
Tim: Yeah, yeah. It's a log that you write things to, and you remember them for as long as you want.
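[Editor's note] The queue-versus-log distinction Tim draws here can be sketched in a few lines of plain Python; no Kafka involved, and the topic and consumer names are purely illustrative:

```python
from collections import deque

# A queue: taking a message out removes it, so only one consumer ever sees it.
queue = deque(["order-1", "order-2", "order-3"])
first = queue.popleft()                       # "order-1" is gone for everyone
assert list(queue) == ["order-2", "order-3"]

# A log: records are appended and stay put; each consumer just tracks
# its own offset, so many services can read the same stream independently.
log = ["order-1", "order-2", "order-3"]
offsets = {"payments": 0, "fraud-detection": 0}

def consume(consumer: str) -> str:
    """Read the next event for this consumer without deleting it."""
    event = log[offsets[consumer]]
    offsets[consumer] += 1
    return event

assert consume("payments") == "order-1"
assert consume("payments") == "order-2"
# fraud-detection still sees every event from the beginning:
assert consume("fraud-detection") == "order-1"
assert log == ["order-1", "order-2", "order-3"]  # nothing was deleted
```

In real Kafka, the per-consumer-group offset bookkeeping is handled by the platform, but the semantics are the same: reading does not destroy the record.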
Joseph: Yeah. And that's why, I think, you know, I might be wrong about this, correct me if I'm wrong, but that's why they called it Kafka, right? Because he liked the writer—
Tim: I don't know. I know that Jay Kreps, the guy who wrote it, is a fan of the author. I don't know if that was the reason, but, I mean, Samza is the last name of the guy in The Metamorphosis.
Joseph: See, it all comes back to Franz.
Tim: So it's just, you know, certain people have affection for certain modernist authors. And you know, the world has changed.
Joseph: I’m going to have to see what I get. If I can ultimately get Jay and Michelle, I want to ask them that question. So, can you provide some practical examples of data streaming in action? Maybe a use case or two that most of the audience can relate to, or even that affects them on a regular basis? And bonus: should that have been effect or affect?[00:23:00]
Tim: Oh, that's with an A.
Joseph: Affect. Thank you. I'll never get that right.
Tim: So if it's the word with an ‘E’ and it has a direct object, then it's like causing it to be, or something like that. If it's just acting on something, that's ‘affect.’ Yeah, so let's go with credit card fraud detection. That's probably the easiest. It's real.
Tim: There are real, you know, fintech companies that actually do this with actual Kafka. And I imagine a number of them are Confluent customers. But yeah, so you've got this stream of purchases coming in this time of year, you know, again, Christmas sweater, certainly lots of purchases happening in my household.
Tim: That's great. Love, love me some Christmas. But, um, you know, these purchases are coming through and you need to know for each one. Is it fraudulent? And you'd like to know right now. You'd like to be able to stop the transaction. Um, [00:24:00] because the, you know, the credit card company is on the hook for that.
Tim: The consumer is not. And so, yeah, that event happens. And now there's a bunch of computation I have to do. There's things I need to know about a card number, the purchase history, there's all this kind of enrichment that has to happen. It's a fairly sophisticated thing that needs access to a lot of context and the real time event.
Tim: So there is this kind of whole platform. You don't just have a pipe of data. There's a lot of pieces of infrastructure that need to exist to make that happen. And now, you know, your phone wiggles five seconds later and says, “Is this you? You know, you just tried to buy an SD card in Paris. Does that sound right?”
Tim: True story. I did that once and it was really me, and my credit card company was like, no, no. I had actually left all my SD cards at home. It's terrible. Here we are.
Joseph: I really have had to do that, yeah. And that's a dimension that you didn't initially mention: location, right? 'Cause if you had a successful card swipe on the East Coast of the United [00:25:00] States, and then a card swipe five minutes later in the middle of Europe, unless you are an X-Man, you can't physically do that. So something's up, and let's at least flag it to make sure the customer knows. That's a really good use case.
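[Editor's note] The impossible-travel check Joseph describes is, at its core, a simple rule a stream processor can apply to each swipe as it arrives: compare the distance from the previous swipe to the time elapsed. A toy sketch in plain Python; the threshold and field names are illustrative, and a real system would enrich each event with the cardholder's previous location from keyed state:

```python
from dataclasses import dataclass

@dataclass
class Swipe:
    card: str
    city: str
    km_from_last_swipe: float      # distance from this card's previous swipe
    minutes_since_last: float      # time since that previous swipe

MAX_KMH = 1000.0  # roughly airliner speed; anything faster is "impossible travel"

def is_impossible_travel(s: Swipe) -> bool:
    """Flag a swipe whose implied travel speed no cardholder could achieve."""
    if s.minutes_since_last <= 0:
        return True
    implied_speed_kmh = s.km_from_last_swipe / (s.minutes_since_last / 60.0)
    return implied_speed_kmh > MAX_KMH

# East-coast swipe, then Europe five minutes later: ~6,000 km in 5 minutes.
assert is_impossible_travel(Swipe("4242", "Paris", 6000.0, 5.0))
# Same city an hour later: perfectly plausible.
assert not is_impossible_travel(Swipe("4242", "Boston", 10.0, 60.0))
```

In production, a rule like this would be just one signal feeding a fraud score, alongside the purchase-history enrichment Tim mentions.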
Joseph: This is a good one. I'm sure you talk about this all the time. What is the difference between batch processing and real time streaming and bonus, why do you think so many organizations struggle to migrate from batch to real time processing?
Tim: So it's like the difference between a database and a data streaming platform. It'll sound kind of the same, but with batch processing, I kind of have my data in a pile. It's all there. I know where it starts. I know where it ends. Like I might know how many things are in it, how many records, or if it's a thing that breaks down into records, I could know how many there are because there's a first one and there's a last one.
Tim: And I'm going to go punch through them and do some kind of analysis. Maybe I'll just be [00:26:00] computing an average or something. Maybe it'll be some big machine learning model, or I'm dumping them into an AI language model, whatever. But I've got the stuff there, and I can start working on it and go through each unit of work, and then I'll finish and there won't be any more data. So it ends.
Tim: With streaming, that means I've got access not to a pile of data, but basically to a pipe, where events are impinging on me. And so that work that I'm doing might be the same work. The computation over these records, over these events, could be the very same thing, but I don't know when the last one is.
Tim: There isn't really a notion of there being a last one until, you know, the heat death of the universe, or I suppose the sun. Or the business shuts down, or, you know, something like that. There's something kind of sad about there being a last event in an event-driven pipeline or a data [00:27:00] streaming pipeline.
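[Editor's note] Tim's pile-versus-pipe distinction can be made concrete with a toy example: the same average computed once over a finite batch, and incrementally over an unbounded stream of events. Plain Python, illustrative only:

```python
# Batch: the data is a pile with a known beginning and end.
pile = [4.0, 8.0, 6.0]
batch_average = sum(pile) / len(pile)  # len() exists because the pile ends

# Streaming: the data is a pipe; there is no "last one", so the answer
# has to be maintained incrementally, updated as each event arrives.
class RunningAverage:
    def __init__(self) -> None:
        self.count = 0
        self.total = 0.0

    def on_event(self, value: float) -> float:
        """Fold one event into the state and emit the current answer."""
        self.count += 1
        self.total += value
        return self.total / self.count

avg = RunningAverage()
results = [avg.on_event(v) for v in pile]  # same computation, one event at a time
assert results == [4.0, 6.0, 6.0]
assert results[-1] == batch_average        # agrees with the batch answer so far
```

The work is identical, as Tim says; what changes is that the streaming version must carry its state forward and emit an up-to-date answer on every event.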
Tim: And why is it hard? Because it's new. Literally the entire history of commercial computing, from the 1950s until just recently, has been centered around batch processing. Now, somebody's going to hear me say that, and they're going to say, “Well, actually, Tim, I built an event…” Yes, yes, of course.
Tim: There have always been systems that respond to events. Some of them have specialized in that, but it's always been bespoke and unusual. It hasn't been the way everybody does it. And if you were a person who built that 20 years ago, well, you were a part of an unusual thing and you know it was hard because you had to do it all yourself and now it's a thing that the ordinary enterprise developer and architect is reckoning with because it's the new paradigm and that ordinary enterprise developer and architect probably didn't start working a year ago.
Tim: They probably have a few years on the clock, and those years were spent doing it the old batch way, or, in terms of [00:28:00] application architecture, if not batch then what I call the client-server or monolith way. We'll call those two sides of the same coin, you know, data architecture and application architecture, and that's all everybody has spent their career doing.
Tim: And it was hard at first when you were a new developer, and you had to get good at that. There were other, more experienced people helping you get good at it. Now everybody's saying, "Wait, no, it should be streaming and event-driven—a completely different way of getting things done."
Tim: And then we don't have that shared body of experience and comfort with the model because it's new. I wasn't a software developer writing code for the enterprise on PCs in the '80s, but I think that's the nearest historical equivalent. You know, I just spent 10 or 15 years of my career being a mainframe developer.
Tim: Now I have to learn how PCs work, LANs, and this junky database server thing. Why is my data over there? UIs are all different. That would be a very different experience for that person with a mature skill set as a mainframe developer in 1986, pivoting to client-server. Well, we're all doing that now with so much more software and a much bigger community of developers.
Joseph: The body of work, that collective body of work, is really an interesting thing to dive into because you're right. When Kafka first hit GitHub, if you were to look at it, there were no practical examples. If you said, "I want to build around this," there were just a handful. Does it scale? Don't know—probably. A million improvements, which we call KIPs, needed to eventually be built. It didn’t do replication then, you know—
Tim: Until like three years later, right?
Joseph: Yeah, you reminded me—when you were talking about someone doing event-driven architecture like 20 years ago, it reminded me of when I was a kid. I really liked comic books, but it wasn’t popular. I was like a nerd because of that.
Tim: Me too. Yeah. But now all the cool kids read comic books. You were just ahead of your time.
Joseph: Yeah, we want to make data streaming engineers the cool kids. That's what we have to do. That's kind of your job, and I'm sure you'll do it well. So, we've talked around it, and I think we've hinted at it, but I'd like to ask you a direct question about it. Can you explain the concept of event-driven architecture and how it ties into streaming data?
Tim: Sure. This, again, is a hot potato because different people are going to describe it differently. This is not the only possible account of it. But what I think of is building an application as a number of small programs that interact. Microservices is another word for this.
Tim: It’s a fine word. [00:31:00] If you don’t like it, don’t use it. But instead of one big, giant program where all the code is in one codebase, with a database at the center of the world, a user interface on the outside of it, maybe some APIs and message queues—whatever—it’s this one big thing. You break that up into pieces that are small enough for each one to fit into your head.
Tim: Because that whole application, after a while, one person really can't comprehend it all. So you break it into pieces that are small enough for one developer to internalize. Those pieces then, of course, need to talk to each other, because it's an integrated application, and what they exchange is events.
Tim: So, there is some sort of messaging substrate. We’ll just call it topics in Confluent Cloud—that would be fine. Kafka topics or whatever you want to call them. The application, you know, the services that you’re deploying to people throughout your company or exposing to consumers, or whatever it is, are a bunch of little pieces, each one with a small unit of work to do. [00:32:00]
Tim: It consumes from a topic, does the work, and produces a result—maybe to a new topic, maybe puts it into some kind of data store for reading, maybe keeps some keys in memory for exposure to an API that feeds a user interface, something like that. You know, there are all kinds of things that service might do, but the application is now composed of a bunch of those little programs, exchanging events with one another.
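The consume-do-work-produce loop Tim describes can be sketched in a few lines. This is a toy illustration, not a real deployment: in-memory queues stand in for Kafka topics, and the names (`orders`, `enriched_orders`, `pricing_service`) are purely hypothetical.

```python
from queue import Empty, Queue

# Stand-ins for Kafka topics; a real system would use a broker and
# the consumer/producer APIs instead of in-memory queues.
orders = Queue()           # "input topic"
enriched_orders = Queue()  # "output topic"

def pricing_service(in_topic: Queue, out_topic: Queue) -> None:
    """One small service: consume an event, do one unit of work, produce a result."""
    while True:
        try:
            event = in_topic.get_nowait()
        except Empty:
            break  # a real consumer would keep polling forever
        event["total"] = round(event["qty"] * event["unit_price"], 2)
        out_topic.put(event)

# Produce a couple of order events, then let the service process them.
orders.put({"order_id": 1, "qty": 3, "unit_price": 9.99})
orders.put({"order_id": 2, "qty": 1, "unit_price": 24.50})
pricing_service(orders, enriched_orders)

results = []
while not enriched_orders.empty():
    results.append(enriched_orders.get())
print(results)
```

In a real deployment each service would be its own small deployable program, and `enriched_orders` could feed the next service in the chain, a data store, or an API, exactly as Tim describes.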
Joseph: You see microservices and Docker containers, which are another example of tech that came along with the rise of event-driven architecture. If you were trying to couple data streaming with a monolith, it doesn't really make a lot of sense, because the monolith is kind of talking inside of itself.
Joseph: So that’s another one of those things that had to come along to make it tenable for most organizations to build EDA. It wasn’t just Kafka. It wasn’t just stream processing. It was things like containerization that made all of this—building these microservices—a reality and not...
Tim: I think the coincident rise of those two technologies reinforced one another and helped. But if you remember, you know, say 2007, I think it was when EC2 went live... Yes. And, you know, we tell stories like, "Oh, if a company does lots of business around the holidays, they could just scale up a bunch of servers and then spin them down with an API at the end of the holiday season." Like, "Ah, that’s so cool." You know, Joseph, I wasn’t building applications in 2007 that you could do that with. Okay, that’s nonsense. Nobody was.
Joseph: Nothing scaled like that.
Tim: There was no horizontal scaling.
Joseph: Are you crazy? What?
Tim: You know, that's just not how monoliths scale. And that sort of promissory note that we were living with for a long time, easy-to-use EC2, was tremendously valuable and useful. It wasn't like it was a sham or anything. It's just that that story was really [00:34:00] promissory in nature. Now you're starting to have success with services-based, microservices-based architectures, and I think the event-driven component, frankly the rise of Kafka as part of that, has made that possible. And you needed containerization to come in there to really make that "turn a dial and your application scales" story come true.
Joseph: It’s kind of wild. Like, even if you had an app that could horizontally scale in 2006, EC2 only had public networking. There was no VPC—every single instance had a public IP and a key that you could log into. It’s just crazy to think they were ever able to sell that, let alone turn it into the monster service that it is.
Tim: Over and over again, you know, there’s an idea, and we work on building that idea. It’s very hard to build things. Yeah. So we build a crappy version of it, and maybe it enjoys some success, maybe it doesn’t. I mean, if you look at the development of the telegraph, there were, in the 30 or 40 years prior to Samuel Morse, a bunch of, in retrospect, kind of stupid ideas. You look at them and you’re like, "Why are we trying to do this?" Because it’s really hard to come up with new ideas.
Joseph: Push the bubble of what is known a little bit, right? You can't push it too hard.
Tim: There’s a great story—I need to remember the name—but the development of a mechanical clock that could keep time at sea for tracking longitude. Basically, this was before there was GPS, and you needed clocks for that. There was one guy who, there was this prize in Britain, the Royal Society, a £5,000 prize or something, a lot of money. His first clock was this two-foot by two-foot thing, and it worked—except for this one issue. The next one was kind of the same size, and it worked, but you couldn’t rock it back and forth, and that didn’t really work. The next one was a little smaller.
Tim: And then the one that won was the size of a baseball. [00:36:00] So you build a few kind of sucky versions of a thing before you really know how to solve the problem. We see that over and over again with any kind of significant innovation. It's just that hard. And so, yeah, that was EC2 too, because it's hard to make things.
Joseph: Yeah, I mean, absolutely. Once innovation reaches that level of maturity, that’s when things start to get really exciting. And that’s kind of where I feel we are in the data streaming world. I think this leads into our next question really nicely. So, Tim, how do you think organizations will use data streaming in the future?
Tim: I think the end state and maturity there looks like full adoption throughout the stack, or throughout the growing parts of the tech stack, and a business whose operational DNA is also oriented around real-time results, real-time insights, real-time actions. [00:37:00] And so now, what we see is people are building their first or second data streaming solution right now.
Tim: And it’s hard, and it’s new, and folks are powering through it. Confluent—I sound like a shill when I say this—but Confluent has, for 10 years, been at the technological and intellectual forefront of that problem. What we were building 10 years ago didn’t look like what we’re building now.
Tim: Okay, very, very much simpler. You know, back then we were talking about, "Hey, it's faster than Hadoop." We've been providing that intellectual and technological leadership. Most people are taking advantage of that by building their first system, and you don't rip out all your old stuff and replace it with event-driven stuff unless you'd like a new job because you got fired. Right? That just... [00:38:00] that didn't happen when relational databases were new, or when client-server systems were new. It was a net-new system being built alongside the old one.
Tim: Okay, and the old ones remain, but I think they eventually get chipped away and replaced. Modernizing mainframe applications—there are still some mainframes that run and are very important—but, you know, you modernize those and you want to move off them. So, I think the future for data streaming looks like a comprehensive adoption of that architectural style for new applications, for most of the applications that matter.
Tim: Real-time data flows, real-time insights. There’s no notion of ETL happening, where you take the data from here, stick it over there, and go think about it later. No, no, no. The insight is now, and it’s available to be consumed by anybody who wants to know what’s going on. So, a business that’s built like that—and if your systems look like that—then operationally, you know, you can debate cart and horse, but at least you have the opportunity operationally to create a culture where you can respond to the changing world right away, and you know what’s going on. So, the future is right away. [00:39:00]
Joseph: So the future is data streaming ubiquity.
Tim: Yeah.
Joseph: Yeah.
Tim: I think so. And again, in 20, 30 years, we’ll be there, probably. And then there might be some new paradigm that you and I can’t envision, that nobody can envision yet, that’ll push that out. Maybe it’ll take longer. I don’t know. I mean, these things don’t last forever, but this is, in my opinion, in my view, a generation-long kind of work.
Joseph: So, our next segment, Tim, is the playbook, where you, yes you, dish out your winning strategies for getting old, tired, unmoving [00:40:00] data in motion. So, I’ve heard about data streaming engineers. Surely they must be in the playbook. Can you help us better understand who data streaming engineers are?
Tim: Yeah, data streaming engineers are these technical people—I’ll call them developers. Now, your title might be architect if you're a very hands-on coding architect. That’s better, right? That’s a good thing. But, you know, I’ll say developers who are working with data streaming technologies.
Tim: And this is a big umbrella that now includes application developers and, you know, what we'll call data pipeline people, data engineers. Those, yeah, start to have more in common, because there's common infrastructure that both are using, and that common infrastructure is kind of starting to dominate the work that gets done. The line [00:41:00] is blurring between the operational and analytical estates.
Tim: I think that’s a consequence of what events and data streaming do. They start to bring those two things together. You know, there are still distinct areas of specialization within data streaming engineering. It’s like saying software developer. I mean, come on, there are all kinds of software developers.
Tim: Sure. Even data engineer—there are all kinds of different specialties within data engineering. But a data streaming engineer is this developer who works with data streaming technologies like Kafka, Flink, and, you know, potentially Iceberg as this unifying thing between the operational and analytical estates.
Tim: Whatever other pieces emerge in the future, you know, you'd indirectly mentioned real-time analytics, which I've been working in. Some part of your stack is going to provide that sort of insight. So, yeah, a data streaming engineer is a person who works with these technologies, whether you're on the application or pipeline side. It's you.
Joseph: I like that you left room for future technology because I think, you know, as we get [00:42:00] closer and closer to that data streaming ubiquity, there are going to be new tools or technologies that emerge that will be part of that toolkit that we can’t predict today.
Tim: Landing on a habitable planet, and you don’t have satellites to take pictures of it for some reason—this is my story, work with it—and you’re just exploring the landscape, not knowing what you’re going to find. That’s where we are. Now, we know some things. We know some things very well about that landscape, but there is new territory and new terrain to uncover. There are new infrastructure components in this new stack that I can’t tell you what they are yet, but I bet if you ask me in two years, I’ll be able to tell you.
Joseph: Yeah, absolutely. Yes. Somebody's committing something to GitHub right now. It's going to change all of our lives. Yeah.
Tim: Or it's an existing thing, you know, with a user base and a business, and we just don't know yet that it's really going to emerge. We're figuring those things out.
Joseph: Like we did with Flink, so what are the top three tools a data streaming engineer should have in their toolbox? Or conversely, tools that they should avoid, like, you know, batch processing. [00:43:00]
Tim: Ah, okay. I’m thinking actual infrastructure tools. So, we’re gonna go with Kafka. You’ve got to know the basics. We’ll go with Flink. It’s definitely time to know the basics. That’s not exclusive; there are plenty of folks having a lot of success with Kafka Streams. That one might well be in your toolbox, and you might be very excited about that. Also, as I look to the next year, I’ve mentioned Iceberg a few times.
Tim: I think it’s going to play a more important role here as we deal with this bifurcation between operational and analytical data. It’s a 50-year-old problem. We’ve come up with a number of ways to solve it, but none of them has really quite seemed right. They’ve worked, but they’ve had a lot of tradeoffs, friction, and so forth.
Tim: And I think we’re in a place where Iceberg could be a part of unifying that divide in a potentially new way. So, [00:44:00] definitely Kafka, definitely Flink. If you’re looking for new things to learn, I think it should be Iceberg. And if you’re a Java developer, knowing Kafka Streams isn’t going to hurt you.
Joseph: Excellent. Can you share a specific tactic that has improved a customer's data streaming journey?
Tim: This would be kind of more architectural and interfacing with business stakeholders.
Joseph: Yeah, I just wasn't sure if anything that technical ever bubbled up to you.
Tim: Yeah, and this is almost a cliche, but we always need to remind ourselves: you need to find the pain. You need to find something that hurts, something that visibly leads to bad results for the business. So, if you're looking to become a data streaming engineer, maybe you're past the stage where you're working directly on engineering tasks, but you still have influence over parts of the data architecture and you're aware of what's happening.
Tim: Okay, this is new. This is a path to major results and success, deploying a system like this. Well, hopefully you already know, but if not, you've got to look for what hurts. There might be systems still running mainframe code from 1982 [00:45:00] that never got migrated onto a PC and then onto a web server and then onto a Kafka consumer or whatever, you know, because it's fine. It doesn't hurt. It's working great. You need to find the things that hurt. [00:45:30] And those things probably hurt because they don't change fast enough and they don't tell you about the changing world fast enough. That makes them good candidates for, broadly, the set of technologies that we call data streaming.
Joseph: Yeah, that's a great tactic. I mean, especially since I've worked for many enterprises of all sizes, and getting anything to change is not easy, right? So if you're an aspiring engineer, you're like, 'Hey, I found this. How can I make the business actually steer the giant ship a few degrees to the right?' Pain is a good starting point in asserting that you can make it go away.
Tim: And in defense of that corporate conservatism, it should be hard to change things. [00:46:00] You shouldn't just change them, that's—
Joseph: True.
Tim: People like you and me, you know, we'll get excited about the new thing because it's new, and we're like, "Let's do this!" and that's great. That's fine, I have hobbies for that. The company has customers to take care of and shareholders to provide value for. So the default is, "No, don't change it unless it hurts."
Joseph: Yeah, absolutely. I mean, like you said, that’s how you get fired—changing things without actually showing value to the business or blowing things up. So, here's another one, a bit architectural, but I'd still like you to take a crack at it: In your experience, how do customers evaluate and select new tools for their data stack, specifically when moving toward an event-driven architecture and data streaming? How do they balance factors like scalability, cost, and integration with existing systems?
Tim: Yeah, um, everybody's got kind of a slightly different bake-off spreadsheet of what those things are, and [00:47:00] that's contextual enough that you sort of have to do that. Like, scalability is good. Everyone talks about scalability, but it means different things for different people. You'll know what your upper bound is.
Tim: You know, is this going to be a real challenge for a vendor to scale to your level because you're just so big, or what? Right? So scalability is going to matter. Integration with other systems: you're going to know what those systems are, at least for the system that you're building. You know what you need to connect to, and you can go look and say, "Oh, okay, these guys say they've got hundreds of connectors, but they've actually got like five. Or, wait, these guys have like 80 connectors that I could run in the cloud here. I haven't even heard of some of these things, and all my stuff is there." So that's, I think, the easy stuff. You know what those things are. You're going to look for them, do the work, and qualify the vendor.
Tim: You know, you'd like confidence when it comes to that. Um, the advice I give broadly when it comes to technology selection is, you know, one of the first things—maybe this makes sense, given that it's me—but you've got to look at the community around these things. Um, you know, at a certain point, a vendor gets to a size where you can have reasonable certainty that the vendor will be around, in some form, as long as you are.
Tim: You also want to know, um, is there a collection of people globally using similar technology or the same technology? Does that thing have a true community around it? Um, and with Confluent, you know, yeah, Confluent was started by the people who originally created Kafka. That’s true, but there's this global community of people developing Kafka and creating different compatible versions of Kafka.
Tim: You know, it's about doing stream processing on Kafka with things like Flink and Kafka Streams. There are all these people in this global community. So, I tend to try to point people to where the action is, and that action is always going to involve more than one vendor. Now, that vendor has to give you what you want, and they have to be solid and big enough that you feel confident betting, if not the business, at least this application on them.
Tim: And so that vendor really matters. But there are questions beyond just that vendor. When it comes to something foundational like data infrastructure, you really need to make sure there’s a global community around it. And again, I've not seen many more thriving, vibrant communities than the one within data streaming.
Joseph: Couldn't agree more. So, for our next segment, we've heard Tim’s winning strategies and general musings on how great data streaming is. But as nice as Tim appears to be, don’t just take his or even my word for it. We’re going to watch a quick clip, and Tim, I’d love to get your reaction. Just a reminder for our audience: this is episode one. Episodes two and three are part of our foundational series for Life Is But a Stream, where we talk about and establish what data streaming, stream processing, data integration, and governance are.
Joseph: So, in future episodes, we're going to have our customers who have actually built great things with it. But I want to make sure you understand all the pieces of what we call the data streaming platform before we get into that. So, for today and in the next two episodes, we're going to feature at least one video from one of our customers. For our audience that's just listening to us, I'll set the clip up. In this clip, you'll be hearing from Virta, a European company that provides smart EV charging services for over a thousand companies across many industries. And here's how data streaming helps. [00:51:00]
Confluent Customer: We are a global company. We work on electric vehicle charging. Virta's goal is to create a more sustainable future by making electric vehicle charging profitable for companies and also accessible for people.
Confluent Customer 2: We grew quickly and we were handling a lot of data from thousands of electric chargers. The amount of data was unmanageable, with millions of messages per hour. We required a core that can handle this amount of data. That's what led us to data streaming using Confluent.
Confluent Customer: Each service in our platform provides different types of data. Data streaming helps different services consume the data and bring it all together, so it's not siloed.
Confluent Customer: We are now thinking more in terms of data products, which means creating streams of governed real time data. That's then in turn [00:52:00] usable for different purposes.
Confluent Customer 2: Working with Confluent has made our life a lot easier. We can reliably process about 45 million messages per hour. And billing is based on our usage and throughput, which is more cost-effective than self-hosting Kafka. Anyone working with Kafka knows that managing and maintaining a Kafka cluster is a very difficult task. Confluent managed Kafka grows with your requirements, so you don't need to think about whether your cluster can handle this amount of data; it automatically expands with your usage.
Tim: The key thing I see from Virta there is that managing data infrastructure is probably not their job. Their job is to serve their actual customers, right? To help people charge EVs, and, you know, all the problems that I'm sure [00:53:00] are manifold around all of that. If you're providing a feature in your service or application to help someone charge their car better or cheaper, you're providing differentiated value, making your business more valuable to that customer.
Tim: And that's a good thing. Making the plumbing work better is not. You could have a competitor who's also in the business of making EV charging better, but you don't differentiate yourself from that competitor just by making your data infrastructure better, or by simply operating it and keeping it up and running.
Tim: Um, that's not the business you're in. It's very tempting, right? It might be interesting, might be fun. But managing data infrastructure is only your job if you are a data infrastructure provider, like we are. For everyone else, caring for your customers is the priority. Don't build infrastructure components and don't operate infrastructure unless, [00:54:00] you know, there's someone like a government telling you that you have to, maybe for reasons like data sovereignty or something related to your business. It's just not what your job is.
Joseph: Right. If somebody else can run some other critical piece of infrastructure, and, going back to earlier in our conversation, you trust that vendor, let them do that, so you can be the best at EV charging or whatever it is, the thing that you do. I like that. Now, onto maybe our most important segment. We've discussed what data streaming is, talked about strategies and who a data streaming engineer is, and we've heard from one of our customers on how data streaming is transforming their organization. Now it's time for the real hard-hitting stuff: our data streaming meme of the week. Yeah, I thought this was perfect considering you represent DevRel, so– [00:55:00]
Tim: There you go. Exactly. Yeah. And we love memes. I mean, people love memes. If you don't love memes, I... I don’t wanna say you're not a person, but you're in the wrong era. I don’t understand. Again, from an operational perspective, okay. Because your developers are gonna have to understand how to work with Kafka, even if you use Confluent, because those are the APIs that are exposed for that layer. You'll have to know how Flink SQL works, you know, because that's the API for stream processing.
Tim: These are fine, um, but do you want to have to know how to turn all the dials to make the thing work in production? You know, the folks who started early, back when there were no options, that's where you get these really seasoned operational teams who are so good at it, and they don't have a reason to pivot because they're just the best in the world at it. And that's cool. I like to hang out with those people.
Joseph: They're rare.
Tim: That's not most companies. And so you don't want to have to learn all that stuff just to be able to operate in production. You shouldn't have to; that's just a misallocation of resources. [00:56:00]
Joseph: I like that. Confluent helps you focus on just the pieces you need to build that differentiated business logic, whatever you need to do to, you know, charge EVs faster and better. Right? Before we let you go, Tim, we're going to do a lightning round. Bite-sized questions — that's B-Y-T-E — with bite-sized answers. Like cocktails, but schema-backed and serialized. Are you ready? Okay. What's something you hate about information technology?
Tim: That's a loaded question. There's a lot.
Tim: I'll just say computers.
Joseph: Just computers? I love it. Okay. We're going to move to tablets only in 2025. What's the last piece of media you streamed?
Tim: Um, it was a YouTube tutorial on FreeCAD 1.0. I do some 3D printing, and so—
Joseph: Very appropriate.
Tim: Pivoting from Fusion 360. And now, everybody knows.
Joseph: [00:57:00] Perfect. What's a hobby that you enjoy that helps you think differently about working with data across a large enterprise?
Tim: It's an easy one: hardware hacking and firmware development. It's not all about moving data; sometimes it's about computation on data, decision-making on data, but there's always data moving around. And, you know, whether that's Kafka topics, data in a topic moving over a VPC with the consumer on the other end, or a couple of wires with a serial peripheral interface operating at five megahertz, and how long can the wires get before you have to put trans..., you know, you're just thinking about data moving, um, all the time, and I love it.
Joseph: All around you, data is moving. I like that. Can you name a book or resource that has influenced your approach to building event-driven architecture or implementing data streaming?
Tim: Not so much event-driven architecture, but in general, um, I want to say the book that is emerging as the most influential technical book I've read is Implementation Patterns by Kent Beck. [00:58:00] Not even his best-known book—I think it's highly underrated or under-read, as far as I know. But it's really a book where the purpose—the underlying message—is that you have to create systems that are understandable for other people.
Tim: Um, he's talking about code—you need to write code so that it can be read by people. I mean, making code that the compiler can read and execute correctly, you know, that's comparatively easy. Making it so that other people can understand it—that's hard. So, building systems in a way that, as it were, loves your neighbor—that's had a huge impact on me.
Joseph: I love it. What's your advice for a first time chief data officer or someone with the equivalent impressive title?
Tim: Yeah, same thing as before—find the pain. What is your boss mad about, scared about, or anxious about that falls within your domain? Find some way that you can, within three months, make that not hurt as much.
Joseph: I love that. Okay. So any final [00:59:00] thoughts or anything to plug Tim?
Tim: Yeah, the technical folks on your team, or you, if you're a technical person, should go to developer.confluent.io. That's Confluent Developer, the home of Confluent Developer Relations on the web. Lots of great video courses, executable tutorials, all kinds of helpful resources there. Links to meetups, everything you might want to know to move forward as a data streaming engineer. And coming next year, there will be a certification program. Certified Data Streaming Engineer—you want that, right?
Joseph: Well, thank you so much, Tim. As you can tell from the way I set this up with my personal history, it really is an honor to have you as a guest, especially for this inaugural episode. And for the audience, that's right, you, particularly you: please stick around, because after this I'm going to give you my top three takeaways in two minutes.
Joseph: That was just a fantastic conversation with Tim. So many of his [01:00:00] answers and insights really surprised me—he truly was the perfect guest for the first episode. One of the key takeaways that I think is really important for anyone diving into the data streaming journey—whether for the first time or at a company where things have been running the same way for a long time—is the idea of finding the pain. I think that's the perfect way to create a meaningful callback. Your ability to make the business case, to actually take on the effort of building something new or migrating off of a long-running system, starts with identifying that pain point.
Joseph: So, find what everyone hates and then figure out a way to utilize data streaming to fix it. Another big surprise for me—wow. I could not believe that Tim wasn’t huge into data streaming before coming to Confluent. Then again, considering the timing, it does make sense. But he has always spoken so passionately about the [01:01:00] subject.
Joseph: I just assumed that he was, like, an industry expert before he got here. Turns out, he’s just really good at talking about technology. He came to work for an awesome company, and now he gets to talk about data streaming—and that’s to all of our benefit. What I really liked about that use case Tim talked about with financial services and fraud detection is…
Joseph: I think it really is a universal use case, right? Anyone who has ever used a bank, especially if you've been around long enough, has probably noticed that things have gotten a lot faster in banking. It used to take weeks to get a loan application approved or denied; now it's happening the same day or the next day. That's also powered by data streaming. The reason it used to take so long is that these processes were done in multiple batches. If you look at those improvements, along with the specific use case Tim talked about with fraud analysis, it's just a really great way to explain data streaming, right?
Joseph: You have all these transactions—they’re [01:02:00] constantly moving, thousands or even millions per second—and we need a way to identify outliers and anomalies. Data streaming and stream processing, which we’ll be talking about heavily in episode two (so please do come back for that!), are really the foundation of making that possible. It’s an exciting use case that gets your mind churning about what’s possible. That’s it for this episode of Life Is But A Stream. Thanks again to Tim for joining us, and thanks to you for tuning in. As always, we’re brought to you by Confluent. If you’d like to connect, find me on LinkedIn, tell a friend or coworker about us, and subscribe to the show so you never miss an episode. We’ll see you next time!
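The outlier-spotting idea Joseph describes can be sketched in a few lines. This is a toy illustration and an assumption on my part, not how any real fraud system works: it keeps a running mean and standard deviation over the transaction stream (Welford's online algorithm, so no batch pass over a "pile" of data is needed) and flags any amount whose z-score exceeds a threshold. The threshold and the sample amounts are arbitrary.

```python
import math

def streaming_outliers(amounts, threshold=3.0):
    """Flag amounts whose z-score against the running mean/std exceeds threshold.

    Uses Welford's online algorithm, so statistics update one event at a
    time, the way a stream processor would, rather than in a batch.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    flagged = []
    for amount in amounts:
        if n >= 2:
            std = math.sqrt(m2 / (n - 1))
            if std > 0 and abs(amount - mean) / std > threshold:
                flagged.append(amount)
        # fold the new event into the running statistics
        n += 1
        delta = amount - mean
        mean += delta / n
        m2 += delta * (amount - mean)
    return flagged

# Mostly ordinary card transactions with one wild outlier.
txns = [12.5, 9.99, 15.0, 11.2, 13.8, 10.5, 9500.0, 12.0]
print(streaming_outliers(txns))  # the 9500.0 transaction gets flagged
```

A production system would run logic like this inside a stream processor such as Flink, keyed per cardholder, but the shape of the computation, updating state event by event as data flows past, is the same.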