Life Is But A Stream

LIVE from Current NOLA: Scaling Streaming in the AI Era

Episode Summary

Recorded from the expo hall at Current New Orleans, Joseph Morais and Adi Polak break down keynote insights, trends in Apache Kafka®, Apache Flink®, and agentic AI, plus voices from the community shaping the future of real-time data streaming.

Episode Notes

From the heart of the expo hall at Current NOLA, this special episode drops you into the conversations, energy, and breakthroughs as they happened.

Host Joseph Morais and co-host Adi Polak talk with data streaming leaders and community voices to unpack how teams are using Apache Kafka® and Apache Flink® to power low-latency, AI-ready applications, covering patterns from usage-based billing and hybrid operations to cost efficiency and streaming governance. You'll also hear how shift-left processing and governance make AI-ready data possible at scale.

Plus, we break down key launches, including Queues for Kafka, Confluent Private Cloud, the Real-Time Context Engine for AI, and more.

You’ll learn:

The Hosts:

The Guests:

Guest Highlights: 
“If your company is shifting to be more nimble with monetization and especially if you're switching to more of a consumption based model, then the underlying systems have to be real time to support that.” — Cosmo Wolfe, Metronome

“So many customers out there…they are running in Amazon and in their data center and somewhere else. They're all looking for ways to monitor and manage all these different clusters in one way.” — Kevin Balaji, Confluent 

“There's a reason we call it time travel—because we're going back to a prior point in history to replay data and do something else with it.” — Scott Haines, O’Reilly Author and OSS Educator

Episode Timestamps: 
01:00 — Cosmo Wolfe, CTO, Metronome 
17:10 —  Kevin Balaji, Sr. Director, Product Marketing, Confluent
28:40 — Adam Bellemare, Principal Technologist, Confluent
43:15 — Sam Barker, Principal Software Engineer, IBM
50:30 — Olena Kutsenko, Staff Developer Advocate, Confluent
01:05:00 — Scott Haines, O’Reilly Author and OSS Educator

Dive Deeper into Data Streaming:

Links & Resources:

Our Sponsor:  
Your data shouldn’t be a problem to manage. It should be your superpower. The Confluent data streaming platform transforms organizations with trustworthy, real-time data that seamlessly spans your entire environment and powers innovation across every use case. Create smarter, deploy faster, and maximize efficiency with a true data streaming platform from the pioneers in data streaming. Learn more at confluent.io.

Episode Transcription

0:01:56.5 Joseph Morais: Welcome to Life Is But a Stream, the web show for tech leaders who need real-time insights. I am Joseph Morais, your host and Technical Champion at Confluent, and today we've got something special. Recently, Confluent took over New Orleans for the biggest data streaming event of the year. The energy was unreal. Man, I think about all the amazing product announcements, keynotes, new features in Confluent Cloud, all the sessions, the party, the excitement. Literally hundreds of builders and innovators came together to dig into real-time generative AI, Apache Kafka, Apache Flink, and much more. All the good stuff. But here is the good news: even if you couldn't make it, you are not missing out. We brought Current to you in today's episode, straight from Louisiana. In this episode you are going to hear from amazing tech leaders sharing breakthroughs in data streaming, real-time architectures, and more. Let's jump in. 

0:00:58.0 Joseph Morais: Let's get into it. I am joined by Cosmo Wolfe, Chief Technology Officer of Metronome. How are you doing, Cosmo? 

0:01:05.4 Cosmo Wolfe: I'm doing well, it's fun to be here and all that, and Current so far, so good. 

0:01:10.7 Joseph Morais: And is this your first time at the event?

0:01:12.9 Cosmo Wolfe: For me, personally, it is, but you know, Metronome has sent people the last couple of years. This is my personal first time. 

0:01:21.2 Joseph Morais: So, I know it's crazy that it's your first time, but you were up on the keynote stage. How was that? 

0:01:27.7 Cosmo Wolfe: Yeah. It's a crazy way to start my Current experience, I feel like, but the keynote was a lot of fun. I think the stage was really cool, the slides were really, really good, and the content across the board, from Confluent and also the guests, was really cool to see. 

0:01:46.9 Joseph Morais: Excellent. Were you nervous at all? 

0:01:48.2 Cosmo Wolfe: I think, not really. I was more nervous in preparing for it. 

0:01:52.1 Joseph Morais: Yeah. That's usually how it is. 

0:01:53.3 Cosmo Wolfe: Yeah. Once you get up there, it's just like, well, here I am. 

0:01:55.8 Joseph Morais: Yeah. It's time to do it. All right, well, let's talk more about Metronome. So, you are solving real-time consumption-based billing at Metronome. Why does this problem inherently demand a data streaming approach from day one, and what's the biggest mistake companies make when they try to solve this with traditional batch systems? 

0:02:12.2 Cosmo Wolfe: Yeah. This is actually something that I talked about on the keynote stage a lot, which is: as the industry moves away from simple business models like licenses and subscriptions to more complex variable pricing that's more closely aligned with value, like consumption or outcome-based pricing, or really anything where the month-to-month, day-to-day, hour-by-hour spend is changing, that forces a lot of challenges that companies have to build into the product experience, and also their go-to-market experience, and they are all real-time data challenges. For example, if you have variable cost like Confluent, users are going to want to know how much they're spending on Confluent, and that has to be up to date. Maybe not to the second, but it can't just be that at the end of the month I get a surprise bill and learn whether I spent $1,000 or $100,000. I need to know, closer to real time, how much I'm spending. 

0:03:04.7 Cosmo Wolfe: And then, similarly, if you use any of the big AI products, which are also Metronome customers, for example, you may use their ability to limit spending. So you can say, I want to make an API key for our marketing team, and I only want them to be able to spend this amount per day on that key. 

0:03:18.8 Joseph Morais: Sure. 

0:03:18.8 Cosmo Wolfe: So I can have those cost controls in place. That requires even more real-time data, because if I say I only want to spend $100 or $1,000 on my API key, and let's say I have a delay of a day or an hour in the data, maybe I burn past that and spend $2,000. 

0:03:38.3 Joseph Morais: Like some spike that I wasn't aware of. 

0:03:39.9 Cosmo Wolfe: Yeah. And so that's all kind of user experience; monetization is part of your product. But even just downstream of that, think about Confluent sales reps knowing when they should be engaging customers to upsell. They want to see, like, hey, this customer is using more or less this month than I expected. That needs to be a real-time process as well. So basically, across the board, if your company is shifting to be more nimble with monetization, and especially if you are switching to more of a consumption-based model, then the underlying systems have to be real-time to support that. And so you have to use something built in-house on top of Kafka or Confluent, or, you know, better yet, just use Metronome so we handle all that billing for you. But either way, you're becoming a real-time data company. 
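
The pattern Cosmo describes, continuously folding usage events into a running spend figure so caps can be enforced in near real time, can be sketched in a few lines of Python. This is a toy illustration of the idea, not Metronome's actual implementation; the key names and limits are invented for the example.

```python
# Toy sketch of real-time spend tracking (illustration only, not Metronome's code).
# Each usage event carries an API-key identifier and a dollar cost; we keep a
# running total per key and report whether the key is still under its cap.

from collections import defaultdict

class SpendTracker:
    def __init__(self, limits):
        self.limits = limits              # key -> dollar cap
        self.totals = defaultdict(float)  # key -> accumulated spend

    def process(self, event):
        """Consume one usage event; return True while the key is within its cap."""
        key, cost = event["key"], event["cost"]
        self.totals[key] += cost
        return self.totals[key] <= self.limits.get(key, float("inf"))

tracker = SpendTracker({"marketing-api-key": 100.0})
tracker.process({"key": "marketing-api-key", "cost": 60.0})            # within cap
allowed = tracker.process({"key": "marketing-api-key", "cost": 55.0})  # $115 > $100 cap
```

The point of the streaming version is that `process` runs per event as the event arrives, so the cutoff can fire within seconds of the cap being crossed, rather than an hour or a day later when a batch job notices.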

0:04:23.1 Joseph Morais: That makes a lot of sense, and consumption-based billing is inherently real-time, so whatever system you're using, and hopefully it is Metronome, has to be able to match that requirement. So, you said... Actually, we did an episode before, so I'm pulling from that. You said "trust is the product." In a streaming context, that hinges on every single stream being processed correctly. How do you build and maintain trust in the data stream itself, when getting it wrong even for a moment can break a customer relationship?

0:04:50.6 Cosmo Wolfe: Yeah, it's a good question. There's a couple different aspects of that, I think. One is, for our customers, the people who are trusting Metronome to build their monetization infrastructure, they need to have deep trust in us that our systems are working. Or, if our systems aren't working, that we will tell them, right? And a lot of people, I think, when they think about infrastructure and trust in infrastructure, for example, like we trust Confluent to run our data streaming, people think that just means how many nines of uptime you have. But what really matters, I think, is, first of all, yes, you have a lot of nines of uptime, but also, when problems inevitably happen, it's easy for me to figure out, is this a problem on the Metronome side or the Confluent side? And where is this problem? Which requires that kind of transparency and trust in your vendors. 

0:05:36.4 Cosmo Wolfe: And then an additional layer of that is the trust that the end user has that their invoice or their bill is going to be correct. It's pretty easy to reconcile as a user, you know, am I paying the right amount for a seat-based business? I look at roughly how many employees I have, I multiply it by the seat cost, and I make sure that that's roughly right. But for usage-based, it's a very opaque black box. And so as a service provider, if you're charging consumption- or usage-based pricing, you are pushed to offer a lot of transparency into where my usage is occurring, where my credits are going, how I am spending money on your platform. And so that pushes you to have a lot more cost-explorer-esque experiences. Like, I think probably most people who are listening to this have used the AWS Cost Explorer before, or actually the Confluent cost explorer, honestly. 

0:06:25.7 Cosmo Wolfe: And those are some form of way to build that customer trust. So I can see, like, hey, I spent $10,000 on Confluent last month, $8,000 of that was on this cluster, and it was broken down by these topics or whatnot. Having that really high-cardinality visibility into spend is critical for building that end-user trust.

0:06:44.2 Joseph Morais: Absolutely. So you mentioned AI in answering my previous question. Of course, there are going to be more questions about that, I apologize. So the new wave, of course, is AI, with unpredictable token-based usage. How does this kind of bursty, high-volume event data break traditional billing? And what new demands does it place on your data streaming platform to process it accurately?

0:07:04.3 Cosmo Wolfe: I think in some ways, people would love to just treat billing as a batch problem, right? Like, your spiky token usage shouldn't really matter if you can just look at, I don't know, a BI query at the end of the month, once a month. The problem is that it just doesn't work. You need to have, again, we kind of talked about this, this mid-month visibility into usage. And so you end up not being able to treat it as a batch problem. And then you have a real-time system, where it's in some sense much harder to handle those spikes, because you have to ask, what does it look like when my customers' usage greatly increases? Which is a great problem to have. 

0:07:43.7 Cosmo Wolfe: You want your systems to work when usage increases on your platform. But in a streaming manner, now you have to make sure you can handle that in real time and you can scale these like pretty stateful systems up and down because it's not just like you're looking at each event individually and then throwing it away. You're really accumulating each event into at the end of the day, an invoice or a usage statement or some visibility into that spend.

0:08:05.7 Cosmo Wolfe: And so you have to build these complex stateful streaming systems, and you kind of hit on this, but at least for Metronome, we have to make sure we exactly-once process everything end to end. For some companies, if your cost per event is very low, like maybe an AI company where an inference is a hundredth of a cent, that's fine. You can probably do at-most-once or at-least-once and it's going to round out to be roughly the same. But we have customers where individual events are thousands of dollars, right? And if one event is missing from an invoice, people will notice. And so we have to make sure that the systems can handle very high scale and very real-time data, but also make sure they're exactly correct end to end.
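
One common building block behind the end-to-end exactly-once behavior Cosmo describes is idempotent processing keyed on a unique event ID, so that an at-least-once transport can redeliver an event without it double-counting on an invoice. A minimal Python sketch of that idea follows; it is an illustration of the technique, not Metronome's actual pipeline, and the event IDs and amounts are invented.

```python
# Idempotent invoice accumulation: redelivered events (at-least-once delivery)
# are detected by event ID and skipped, so each dollar is counted exactly once.

class InvoiceAccumulator:
    def __init__(self):
        self.seen = set()   # IDs of events already applied
        self.total = 0.0    # running invoice total in dollars

    def apply(self, event):
        if event["id"] in self.seen:
            return False              # duplicate delivery; ignore it
        self.seen.add(event["id"])
        self.total += event["cost"]
        return True

inv = InvoiceAccumulator()
inv.apply({"id": "evt-1", "cost": 2500.0})
inv.apply({"id": "evt-1", "cost": 2500.0})  # redelivery: not counted again
inv.apply({"id": "evt-2", "cost": 1200.0})
```

In a real system the `seen` set and the total would live in durable, partitioned state (and Kafka's transactional producer/consumer APIs offer exactly-once semantics at the transport layer), but the invariant is the same: applying the same event twice must not change the invoice.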

0:08:46.4 Joseph Morais: Right. So the high quality, that control, that stream processing, but at scale. And that's really the important crux, especially given how fast Metronome has grown. So let's dig into your architecture. For those who might still try to solve this with a traditional database, what is the fundamental advantage of a streaming-first architecture for a problem like real-time billing?

0:09:07.2 Cosmo Wolfe: I would say that if you can solve it with a database, feel free to tell me how. 

0:09:12.8 Joseph Morais: What database are you using? 

0:09:14.2 Cosmo Wolfe: At the end of the day, you are storing either these individual raw events, of which there are, you know, for Metronome, hundreds of billions, or you're rolling them up somewhere before you load them into storage. And so if you are storing the individual raw events and you can still service the real-time use cases on top of that, like, for example, cutting users off when they hit a threshold, that's great. But what it implies is you have high data volume, high-cardinality data in a database that you're serving with high query volume. So a fairly reasonable architecture you could imagine is loading all these events into Iceberg or something using Tableflow and then querying that Iceberg table. But you then lose the ability to get those real-time insights, because you can't service these low-latency, high-QPS queries against Iceberg. And so you end up moving more... Honestly, I'm not trying to push the Confluent talking points too hard here, but you end up moving further left.

0:10:15.1 Cosmo Wolfe: You're computing the invoice further upstream or at least constituent parts of that invoice so you can do some more real-time cutoffs and calculations. And there's a lot... Again, some businesses don't need that real-timeness necessarily. They're like, oh, it's fine if it's an hour or a day late. 

0:10:31.5 Joseph Morais: Sure.

0:10:31.7 Cosmo Wolfe: Which is like, even then, I think, a lot of the challenges that people get into with pricing and packaging and building monetization infrastructure in general come from designing for what they have today, not what's in the future. And actually, I think the streaming-versus-not-streaming question is somewhere we see this less; people usually recognize that it makes sense in a streaming manner. But even just, what features of a pricing model do you support? Like, I was talking to Sean, Confluent's Chief Product Officer, backstage, about how doing product-specific credits, so credits that work just for a specific suite of products, is something that Confluent had not originally built into their billing system and then later had to because of that complexity. That's hard to do, and they ended up using Metronome. 

0:11:18.7 Cosmo Wolfe: And I think, building a system for the current state versus the future state, you'll end up in lots of local maxima, like maybe just using offline data, so you lose that real-timeness, or not supporting a business model, which ends up costing your business revenue in the future, that kind of thing.

0:11:31.0 Joseph Morais: Yeah, I think what I really take away from that answer is the building for the future, not for today, and I think a lot of people get stuck in that. They're like, hey, this POC works, I'm good, but will this work when you grow 5x, 10x, 100x if you're that successful?

0:11:44.1 Cosmo Wolfe: And just to add to that, because I agree 100%, put yourself in the shoes of someone that's happening to, right? Your POC worked, you went to market with your company or whatever your new product line is, and now it's growing like crazy. What do you want to be working on? Presumably not retooling any sort of internal process, whether that be billing or deal desk or anything. You want to be selling to customers and building the product. And so it's a good problem to have if your billing system is falling over because you're growing too fast, but it's in some ways a very bad problem to have too, because now you're pulling engineers off of capturing the market moment, and instead they're making PDFs. And it's like, why are we doing this? So really making sure that you don't get caught in that trap, I think, is very critical.

0:12:31.4 Joseph Morais: So speaking of scale, your platform famously handles 10 terabytes of data per hour. When your data stream hits that kind of scale, where do things typically break? And can you walk us through the main bottleneck you hit and how you used Kafka Streams to engineer your way out of it?

0:12:46.7 Cosmo Wolfe: We've sort of been stream-native from the beginning for those reasons that I mentioned. I think it was not realistic for us to start with just loading the events into Postgres or something. So a lot of the challenges that we hit as we grew to that scale are not challenges that streaming necessarily solved. Rather, it's just that we scaled streaming to a high scale, so we had to figure out how to do that. 

0:13:10.0 Joseph Morais: It was about starting with streaming. 

0:13:11.2 Cosmo Wolfe: Exactly. So, one example is we use Kafka Streams relatively heavily. And for those of you who don't know, Kafka Streams does a lot of instance-level state storage in RocksDB on the hosts. And we had to invest a lot in making it fast to restore if a host shut down uncleanly or somehow the consumer group got into a bad state. That's something where if you're not dealing with the level of data that we are, it might be okay because they'll do a restore where they're consuming from the change log topics and kind of rebuilding good state, which is fine for small volumes but could take literally days for us. And so we had to really engineer to make sure that couldn't happen. So that's the kind of challenge that I think we had to address as we scaled out to be able to handle the kind of world scale that we operate at.
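
For Kafka Streams specifically, one standard mitigation for the long changelog restores Cosmo mentions is keeping warm standby copies of each store's RocksDB state on other instances. The properties below are real Kafka Streams configuration keys; the values are illustrative, not Metronome's actual settings.

```properties
# Keep a warm replica of each state store on another instance, so a failed
# task can fail over to the standby instead of replaying the changelog topic.
num.standby.replicas=1

# How many records a client may lag behind the changelog and still be
# considered caught up when tasks are assigned.
acceptable.recovery.lag=10000
```

With `num.standby.replicas=0` (the default), an unclean shutdown forces a full restore from the changelog, which, as Cosmo notes, can take days at sufficiently large state sizes.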

0:13:57.7 Joseph Morais: Yeah, that scale is very impressive. And it comes back to the fact that you scaled for the future of Metronome, not for day one. Now, last time we talked, you mentioned that you developed a unified template application. How does this template make it easier for your developers to build new features on top of Kafka without each of them needing deep stream processing expertise?

0:14:18.2 Cosmo Wolfe: Yeah, I mean, I think it is sort of as you'd expect, in that we don't want everyone to reinvent the wheel, and importantly, come up with wheels of different shapes, when they're building their Streams app. So the template application just has good defaults, so that people on product teams or delivery teams can spin up a Streams app without having to learn how to set up all the scaffolding and whatnot. Because we're so streaming-centric, we want to make it easy for more people to build streaming features without, again, reinventing the wheel every time.

0:14:58.4 Joseph Morais: Okay. So your platform delivers second-level latency for billing insights. What does that kind of speed, driven by continuous data streaming, unlock for your customers? And what can a finance team do with that live insight that's impossible with a daily batch report?

0:15:12.9 Cosmo Wolfe: The biggest thing would be entitlement control. So, imagine these AI labs: you want to offer the ability to say, I only want a user to be able to make 10 Claude Opus calls a day. How frequently are they making calls? You need second-level control of that. And so it's less about the individual, I guess you could call them features, that maybe a finance team or a go-to-market team or an engineering team wants. It's more about the whole shape of the business model. Like, if you couldn't offer that, and you had to just say, if you have a Claude account you can access all of our models, Claude accounts would be a fundamentally different-shaped product, and probably less broadly used, I would guess, because they'd probably be more expensive, because the individual account would have to cover the big spenders, basically. 

0:15:58.2 Joseph Morais: Right.

0:15:58.4 Cosmo Wolfe: Whereas being able to put those in-product gates in place allows you a lot more flexibility in how you offer your products, and in cutting them off as well. Same with fraud, honestly. I'm sure Confluent deals with this; most infra SaaS companies and AI companies deal with a lot of fraud, because you're selling a valuable service. Someone's going to try to figure out how to mine Bitcoin on a broker, or use AI without paying as much for it. So you want to be able to only dish out usage of your product that you expect to get paid for. For example, every time there's $100 of unbilled usage, collecting on that before you let the user use more of the product or spin up more clusters or make more calls. That kind of fraud control is another important thing that a lot of finance teams need in these low-margin infra and AI businesses.
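
The per-day call entitlement Cosmo describes (capping a user at N model calls per day) reduces to a per-key counter that resets on a day boundary, checked on every event as it streams in. Here is a toy Python sketch of that gate; it is an assumption-laden illustration, not any vendor's actual entitlement system.

```python
# Toy per-day entitlement gate: allow at most max_calls per key per UTC day.
import datetime

class DailyCallGate:
    def __init__(self, max_calls):
        self.max_calls = max_calls
        self.counts = {}   # key -> (date, calls made that day)

    def allow(self, key, now=None):
        """Check and record one call; return False once today's cap is hit."""
        today = (now or datetime.datetime.now(datetime.timezone.utc)).date()
        day, n = self.counts.get(key, (today, 0))
        if day != today:              # day rolled over: reset the counter
            day, n = today, 0
        if n >= self.max_calls:
            return False              # entitlement exhausted for today
        self.counts[key] = (day, n + 1)
        return True

gate = DailyCallGate(max_calls=2)
first = gate.allow("user-1")
second = gate.allow("user-1")
third = gate.allow("user-1")   # over the daily cap
```

Because the check runs per event, the gate closes on the very call that would exceed the cap, which is exactly the second-level control a day-late batch job cannot provide.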

0:16:48.6 Joseph Morais: That makes a lot of sense. Well, thank you for your time today, Cosmo. It's great to actually talk to you again. We appreciate you being here at Current, being part of the keynote, and building a great product that we get to use and benefit from. So thank you so much for the time. I really appreciate it. 

0:17:02.2 Cosmo Wolfe: Yeah. Thanks for having me. Thanks for having me at Current. I think you guys are throwing a great event, and I'm always excited to chat about Metronome and Confluent. 

0:17:08.5 Joseph Morais: All right. 

0:17:09.4 Cosmo Wolfe: Cool. 

0:17:09.7 Joseph Morais: Thank you so much.

0:17:10.2 Cosmo Wolfe: Yeah.

0:17:10.7 Joseph Morais: And joining me now, he's just come on over: Kevin Balaji, Senior Director of Product Marketing here at Confluent. Thanks for joining me, Kevin.

0:17:20.9 Kevin Balaji: Good to be here, Joseph.

0:17:22.1 Joseph Morais: So let me just get this in front of me. So first of all, how's the event going for you?

0:17:26.9 Kevin Balaji: It's exciting to be here. The whole Kafka, the whole Flink community, all data streaming in one place. It's really exciting to be here.

0:17:32.7 Joseph Morais: All right. Let's talk about some of the announcements we had today at the keynote. So Sean, our Chief Product Officer, was talking about how energized the community is, with the transition of Kafka being complete with the launch of the latest version, Apache Kafka 4.1. And he announced Queues for Kafka. Why is it so important to build queuing directly into Kafka? And what big problem does this solve for developers who might otherwise use separate systems?

0:17:59.1 Kevin Balaji: Yeah. If you think about most organizations out there, Kafka has been this great produce-and-consume-once, exactly-once type of messaging system. But for a lot of these other asynchronous workloads and things like that, there are the traditional message queues, the queuing systems; that's where those workloads kind of come from. And so the typical organization out there has had this need to have Kafka, which they want for so many of the things they're doing with real-time data, but they still have these queuing systems that they have to run in parallel. 

0:18:26.1 Kevin Balaji: Queues for Kafka kind of puts a lot of that to bed. Queues for Kafka really makes it so that you can have all your queuing and all your asynchronous workloads, as well as these exactly once type of workloads, all in one place. And so if you're an organization, we know how many people are using Kafka in the world. It just makes it that much better. More of your things just come into one messaging system. You don't have to think about multiple fragmented experiences.

0:18:48.8 Joseph Morais: Right. I got a message queue over here. I got a data stream platform over here. It's all here. 

0:18:52.4 Kevin Balaji: It's all in one spot.

0:18:53.4 Joseph Morais: So Confluent's mission is to make data streaming ubiquitous. Sean talked about this, but cost is a major barrier. He pointed out that networking alone can be up to 80% of streaming costs. How does Confluent's Kora engine, with features like auto-scaling and diskless Kafka, directly attack that massive overhead?

0:19:12.1 Kevin Balaji: Yeah, I think there's a lot of ways. I think we've really optimized every layer of the cost stack here. I mean, we did a study of customers who are self-managing, and we found that even at peak capacity, their infrastructure was over-provisioned by more than half.

0:19:25.8 Joseph Morais: That's crazy.

0:19:26.6 Kevin Balaji: So even in the best case, you're wasting half of your infrastructure all the time. So Kora's auto-scaling capabilities, just by default, save you half your infrastructure costs right there. But then, as you say, networking really can be this big killer on top of it. Private network interfaces, this new private networking type that we have, it's powered by the AWS ENI interface. It's the same private networking that everyone's used to out there, but for 50% less, because of how we're able to build it, right? So that cuts through that. Diskless Kafka: all those expensive inter-availability-zone charges go away. And so it's really optimizing every layer of the stack. And as the customer, you don't actually have to think about any of this.

0:20:03.5 Kevin Balaji: You're just thinking about, okay, I have this workload. I'm going to find the right cluster type for me, and all this stuff that I talked about is just going to be there behind the scenes. You just get the cost savings directly.

0:20:11.6 Joseph Morais: Yeah, that's amazing. So Sean noted that half the world's infrastructure is still on-prem. It's amazing. I wonder if it's even higher than that, which often leads to cluster sprawl. We announced Confluent Private Cloud. How does this make a self-managed on-prem environment feel and operate like a true cloud-native service?

0:20:30.4 Kevin Balaji: I mean, I'll take a bit of a risk here, and if you've got to edit this out, you can edit this out, right? 

0:20:34.1 Joseph Morais: Yeah, yeah, yeah. 

0:20:36.1 Kevin Balaji: Which is, When I first joined Confluent, we pivoted pretty hard on this. We've got to be a cloud company, and we've got to be cloud-first and all that. My first assignment at Confluent... Hey, Batchy. 

0:20:45.5 Joseph Morais: We got Batchy! 

0:20:46.3 Kevin Balaji: Batchy, what's going on? 

0:20:48.1 Joseph Morais: Hey! 

0:20:48.4 Kevin Balaji: What's going on? I love Batchy. 

0:20:49.7 Joseph Morais: This is the good Batchy, who's figured out how to be not our data problem, our data monster, but instead, our best friend. It's so nice to see you, Batchy. Bring it in. See, anything can happen here. So getting back to your answer. 

0:21:08.4 Kevin Balaji: As I was saying, my first assignment at Confluent was like, put the word cloud in as many things as you can. But half the world's infrastructure runs on-prem and it's still true for Confluent for sure. And we see a lot of these, especially these bigger regulated industries, you have these big platform teams and they run Kafka for a whole bunch of business units. Or you have a managed service provider that runs Kafka for a whole bunch of clients. And they need these capabilities that we already have in the managed cloud. We've been doing this for a long time and we've solved these problems for our managed cloud customers. 

0:21:38.3 Kevin Balaji: And is there any way we can bring some of that IP to private environments? And that's why we're really excited about Confluent Private Cloud. It gives kind of those cloud-native building blocks with things like intelligent replication. So you can have that 10x faster performance you have in Confluent Cloud. Multi-tenancy will come over time and that'll help with, again, the scaling and fleet management and stuff like that. Again, we already have that in the cloud. We're able to give a lot of that now to customers that have to be in private environments.

0:22:01.4 Kevin Balaji: And that's very exciting.

0:22:02.2 Joseph Morais: That makes a lot of sense. And I'm a network engineering nerd at heart, so just knowing that we have this unification, because hybrid is a reality. I think it's not going to go away. The cloud providers might not feel that way, but making that hybrid experience easier, I think, is absolutely crucial.

0:22:17.7 Kevin Balaji: Definitely.

0:22:18.2 Joseph Morais: So, for companies that, to continue the hybrid theme, are running clusters on-prem and in multiple clouds, we announced the GA of Unified Stream Manager. Is this the single pane of glass that platform teams have been asking for to manage their entire complex footprint?

0:22:33.3 Kevin Balaji: The early access period for Unified Stream Manager was as crowded as any we have had at Confluent. 

0:22:40.2 Joseph Morais: Everyone wanted it. 

0:22:40.9 Kevin Balaji: Everyone wanted it, because the truth is there are so many customers out there running in Amazon and in their data center and somewhere else doing another thing. And they're all looking for ways to monitor and manage all these different clusters in one way. Is there one place I can see all my metrics? Is there one place I can apply a policy that will then go to every single thing in my streaming estate? And there are so many people who are really excited about that. I'm excited about Unified Stream Manager. The way we built it is really cool: you're just sending metadata to Unified Stream Manager, so all your customer data stays exactly in your environment. So if you're worried about security and things like that, you can still apply those principles, and Unified Stream Manager is going to make your life easier in how you manage and monitor things.

0:23:22.2 Joseph Morais: I love it; again, those hybrid folks must be loving these announcements. So a central theme was shifting processing and governance left. In simple terms, why is it so important to build this context at the source of the stream, rather than doing everything downstream or having every downstream team do that work themselves? 

0:23:41.7 Kevin Balaji: Yeah, I mean, so many companies out there have been trying to solve the quote-unquote data mess for a long time. Batchy represents, in some ways, that data mess. It's because, no matter what platform or tool has come out, a lot of times the enrichment of that data, making it meaningful for whatever application or system or thing you're doing, happens at the actual application layer, or in the lakehouse, or in the actual thing you're doing. This is more of a paradigm shift. This is, you know, hey, is there a way that from the moment data is created, we can actually process and govern and enrich it into something that's trustworthy and reusable, to where we can now stream once and use that in many different places? 

0:24:23.9 Joseph Morais: Right.

0:24:24.9 Kevin Balaji: And I'm excited about the IP we're building in Confluent to make that happen, with things like Flink and with TableFlow and governance. That makes this whole concept actually practical: you can stream the data once, you can process it and govern it in real time. If something changes upstream, it's just going to take care of that, and the data is going to get to the places it needs to, and you're going to know it got there. And we're building security and RBAC and all that, so the right things have the right access whenever they need it. So I think it's a really great shift, and the customers that we've seen adopt that approach are seeing a lot of savings. They're seeing fewer data errors downstream. They're saving on compute costs because they're not doing that work downstream. So it's been a great thing.

0:25:00.8 Joseph Morais: Yeah, it's pretty incredible to enforce governance and processing at ingestion one time and then to be able to use it many times. I think that's what everyone wants. They want to squeeze all the juice out of the lemon. So let's talk about the payoffs of shifting left, starting with AI. So Sean announced Confluent Intelligence, noting that AI systems need both real-time data and historical context. How does this new platform actually deliver that trustworthy real-time context to an AI model?

0:25:27.1 Kevin Balaji: Yeah, so again, if you stream everywhere and you're able to shift processing and governance left, that means the data streaming platform has the real-time, contextualized, trustworthy data that you want. And the key is, how can we get it to our different AI systems that are out there? Saying that AI systems need data, I think that's not a hard sell for people. Everybody knows that. And that AI needs to be real-time, even that's not a really hard sell. The question is, how do we get the real-time data to our AI systems? And I think that's where the Real-Time Context Engine is going to be a really powerful thing. One of the things in Confluent Intelligence is we're going to take the same streamed, processed data that you have. 

0:26:00.4 Kevin Balaji: It's going to enrich it into these materialized low-latency caches that are suitable for production serving for an AI system. And it's exposed through a fully managed MCP server. So the person building the AI system doesn't need to know anything about the Kafka and Flink stuff behind the scenes. 

0:26:13.0 Joseph Morais: That's the best.

0:26:13.3 Kevin Balaji: They're just going to write some MCP queries and some stuff there, and their AI stuff is going to get the real-time data that they need. And that's one of the ways that Confluent Intelligence is going to help you, again, shift left and bring that real-time trust to AI. 

0:26:25.5 Joseph Morais: Make the data good. Make it easy to access. That's everything. That's the core challenge. So the other big payoff of shifting left is analytics. So Sean announced TableFlow's new GA support for Delta Lake. How does this let an analyst use real-time Kafka streams as if they were just tables in their lakehouse? And why is this new upsert feature so critical to that?

0:26:44.2 Kevin Balaji: Yeah, I'm really excited about TableFlow. Again, we're talking now about shifting left for analytics. And it's the same real-time data, again; how do we get it to our different analytical engines, data warehouses, and so on? TableFlow makes that push-button simple. So we're taking the same real-time data, we're materializing it as Iceberg or Delta Lake tables, and it's going to be in object storage, so it's not expensive to store all your historical data. And that, again, makes it practical. And it works very natively with those analytical engines. And the things we've announced today, like upserts and dead-letter queues, make this whole thing production-usable. 

0:27:16.3 Kevin Balaji: So imagine now you have your entire table of customer records and all the streams that are coming in. If a record doesn't fit your schema, it goes into a dead-letter queue. And if something changes upstream, the upsert functionality is automatically going to rebuild the table. And so you have this kind of ready-to-go silver table for your lakehouse that's always going to be on. And that makes it, again, very usable for production use cases in the lakehouse. 
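
The upsert-plus-dead-letter-queue flow Kevin describes can be sketched roughly in plain Python. This is a toy model, not TableFlow's actual implementation; the field names and the schema check are invented for illustration:

```python
# Rough sketch of the upsert + dead-letter-queue flow described above.
# All names are illustrative; this is not TableFlow's real implementation.

REQUIRED_FIELDS = {"customer_id", "email"}  # assumed schema for the example

def apply_stream(events, table=None, dlq=None):
    """Fold a change stream into a keyed table; divert non-conforming records."""
    table = {} if table is None else table
    dlq = [] if dlq is None else dlq
    for event in events:
        if not REQUIRED_FIELDS <= event.keys():
            dlq.append(event)          # doesn't fit the schema -> dead-letter queue
            continue
        # Upsert: a later record for the same key replaces the earlier one,
        # so the materialized table always reflects the latest upstream state.
        table[event["customer_id"]] = event
    return table, dlq

events = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 1, "email": "a+new@example.com"},  # update -> overwrites
    {"customer_id": 2},                                # missing field -> DLQ
]
table, dlq = apply_stream(events)
print(table[1]["email"])  # a+new@example.com
print(len(dlq))           # 1
```

The point is the division of labor: conforming records keep the table current without any manual rebuild, while bad records are quarantined instead of silently corrupting downstream consumers.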

0:27:37.1 Joseph Morais: Absolutely. So let's tie it all together. Confluent is making streaming cost-effective, managing hybrid sprawl, and shifting governance left. If a company truly embraces this stream-once-use-everywhere model, what is the biggest single transformation they'll see in their business? 

0:27:52.8 Kevin Balaji: Data is going to become a first-class citizen throughout their business. So if you embrace this approach of making streaming ubiquitous, and we're trying to make that happen. We want streaming to be cost-effective for whatever it is you do, whether you're going from A to B or you're building some crazy AI agent, that doesn't matter to us, right? We want it to be cost-effective. If you do that, you're able to process that data once, use it in so many places, AI, analytics, real-time experiences, the whole thing. It's really going to unlock a new way that developers can build with data, and it's going to help them because they're not worrying about getting access to some source system, they're not worrying about, oh, I've got to manage some Kafka cluster somewhere else. They're getting data, and they're going to build with it, and it's just going to make everyone's life so much easier.

0:28:32.4 Joseph Morais: Data is a first-class citizen. That is an amazing way to end this conversation. Kevin, thank you so much for joining me. I appreciate everything you do.

0:28:39.5 Kevin Balaji: Awesome. Have a good one.

0:28:41.1 Joseph Morais: All right, so we're back, and I am joined by Adam Bellemare, principal technologist, who just got out of a book signing. How was that?

0:28:50.6 Adam Bellemare: It makes you realize how little you write with your hands anymore these days, especially if you're in computer technologies. 

0:28:58.7 Joseph Morais: That's true.

0:28:59.1 Adam Bellemare: Wrist is a little stiff. It was about 151 or 152 copies. We signed one or two extras for some of those latecomers. 

0:29:07.1 Joseph Morais: That's amazing.

0:29:07.8 Adam Bellemare: But, very good. It went very well. 

0:29:09.0 Joseph Morais: I had to write some notes like a week ago, and I realized I hate doing this. Somebody give me a computer or a phone. Can I actually grab your book right now? I want to show everybody. 

0:29:17.1 Adam Bellemare: Yes, for sure. 

0:29:18.0 Joseph Morais: So, we're here talking about building event-driven microservices, and this is the second edition, so let's get right into it. Again, congratulations on the second edition. It's been about five years since the first one. What key shifts in the industry or common challenges have you observed that prompted this update now, and what's the core problem event-driven microservices solve that feels even more relevant today? 

0:29:42.7 Adam Bellemare: Right, so in terms of a second edition, with technology, it's basically the endless march of time.

0:29:51.5 Joseph Morais: I like that, endless march of time.

0:29:53.1 Adam Bellemare: So, I started this one in 2017. 

0:29:56.1 Joseph Morais: Okay.

0:29:57.7 Adam Bellemare: And I completed it in 2019. That's already...

0:30:01.0 Joseph Morais: So, that's the first edition? 

0:30:01.8 Adam Bellemare: Yeah, the first edition, so that's a good six years ago now. Since then, of course, lots of changes have occurred. If I had to summarize them neatly, I would say cloud computing has gotten even cloudier.

0:30:16.0 Joseph Morais: That sure has.

0:30:17.7 Adam Bellemare: Many more systems are moving to cloud, and then there's impacts on that, such as the rise of more fully managed services, hosting of services, and so forth. People are less afraid to put their services into the cloud and use things like, for example, S3-compatible databases to store their data. 

0:30:38.5 Joseph Morais: Right.

0:30:38.7 Adam Bellemare: So, all of this has had a big effect on streaming, mostly because we've unlocked the ability to store data indefinitely at any size. For streaming systems from maybe 20 years ago, and even when I started in 2017, that was a very uncommon pattern. That's one thing. SQL streaming has also come a long way in general. 

0:31:00.1 Joseph Morais: Sure. 

0:31:02.7 Adam Bellemare: There's still no unified streaming SQL standard, but there are many different flavors of it. That's a big component, mostly because a lot of stream processing can be written very simply and quickly with SQL streaming, or streaming SQL, I should say. Then a third reason, which is less exciting than the rest, is just basic updating of content. Personally speaking, I've gotten much better as a writer. 

0:31:35.9 Joseph Morais: Okay.

0:31:38.5 Adam Bellemare: So, the first edition of this book was my first book. 

0:31:41.5 Joseph Morais: Okay. 

0:31:41.4 Adam Bellemare: And the publisher warned me, and they said, try not to go over 250 pages or you'll burn yourself out. There was a lot of stuff left on the cutting room floor, so to say. 

0:31:51.9 Joseph Morais: Okay. 

0:31:52.5 Adam Bellemare: A lot of pages left on the cutting room floor. 

0:31:53.8 Joseph Morais: You still had things to say.

0:31:55.0 Adam Bellemare: I still had things to say. And so, I wanted to include that, which made it a natural extension for the second edition. I was like, ah, I can finally put all that other stuff in there that I wanted. This one here is topping out at about 440 pages, 430 pages. 

0:32:10.0 Joseph Morais: I hope you didn't burn yourself out.

0:32:11.3 Adam Bellemare: I did not burn myself out. No. It was a very lovely experience.

0:32:13.8 Joseph Morais: So, the rise of the cloud, obviously, between the two editions, we had the pandemic, which drove cloud adoption even faster. Then, of course, you had more to say. It's fantastic you got your full message out there this time. A central theme in the book is how this architecture decouples what a service is from how it accesses data, creating a better data communication structure across an organization. Can you unpack that idea? Why is treating event systems as a shared communication layer so fundamental? 

0:32:41.3 Adam Bellemare: Yeah. Okay so, one of the reasons, I guess there's a couple of ways I could look at this. One of the things that's challenging about computers, like computing systems in general, is basically eventually you will reach a scale where your business grows so much that it can't possibly fit just inside one computer anymore. That's a very good problem to have. This is not a bad thing. This is a fantastic thing. Good job, you're doing great.

0:33:12.6 Adam Bellemare: Now what this means, though, is as you want to use different tools or different technologies, and especially with the rise and continued expansion of cloud services, there are a lot of services that simply need your data, but they don't want your database, and you can't install them on your database, so to say. So the data communication layer, in my mind, is this means of communicating, of transferring data from one service to another, that's a first-class citizen, that's something that's purpose-built.

0:33:47.2 Joseph Morais: Yeah, that's an important point. 

0:33:48.5 Adam Bellemare: And not just sort of added on after. 

0:33:50.7 Joseph Morais: Right.

0:33:51.5 Adam Bellemare: If you're using a request-response system backed by, let's say, a relational database, you can of course ask it for data and it can give it to you, but it's not a very scalable means, because what you're doing is telling that database and the system that owns that data: okay, you may have created this data and you might need it for your own operational use cases, but we also need you to serve the performance requirements and request patterns of all these other systems that are going to plug into it. That's a set of requirements that isn't actually directly related to serving your core business functionality, but rather this add-on. 

0:34:32.4 Adam Bellemare: And some people just say, well, those are scaling problems, for example, but the crux of the matter is that if the data that that service creates is very important, it's very likely to be needed elsewhere in your business by definition. And the idea of using event streams to do this is that you can publish it once, read it as many times as you need, as often as you need, wherever you need it, and if you want to know if you're up to date on that data, you just listen to the event stream.

0:34:59.9 Adam Bellemare: So it's a very neat and tidy isolation of the operational systems that make that data and every other system that might need it, but doesn't necessarily want to or doesn't have the ability to couple on that original source system.
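
The publish-once, read-as-many-times-as-you-need idea Adam describes can be modeled as an append-only log with per-consumer offsets, which is essentially what a Kafka topic gives you. A minimal sketch, with all names invented for illustration:

```python
# Minimal sketch of "publish it once, read it as many times as you need":
# an append-only event log plus an independent offset per consumer.
# Names are illustrative only; Kafka topics provide this at scale.

class EventStream:
    def __init__(self):
        self._log = []       # append-only event log
        self._offsets = {}   # consumer name -> next offset to read

    def publish(self, event):
        self._log.append(event)   # the producer writes exactly once

    def poll(self, consumer):
        """Each consumer reads independently, at its own pace."""
        start = self._offsets.get(consumer, 0)
        events = self._log[start:]
        self._offsets[consumer] = len(self._log)
        return events

stream = EventStream()
stream.publish({"order_id": 1})
stream.publish({"order_id": 2})

print(stream.poll("billing"))    # both events
print(stream.poll("analytics"))  # the same two events, read independently
stream.publish({"order_id": 3})
print(stream.poll("billing"))    # only the event published since its last poll
```

The operational system that creates the data only ever writes to the log; every other system reads from it at its own offset, which is the isolation Adam is describing.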

0:35:16.3 Joseph Morais: It's the democratization of data, and I think that the middle plane's so important. So I have an operator's background. So the idea that things are point-to-point and knowing the brittleness that entails just absolutely kills me. So knowing that we have a system like that is crucial. So part three of your book explores a range of implementation options, basic producer-consumer, lightweight frameworks like Kafka Streams, heavy ones like Flink and Spark, streaming SQL. Without getting lost in the weeds, what high-level advice do you have for teams trying to select the right kind of tool for their specific event-driven task?

0:35:53.8 Adam Bellemare: My advice is always keep it as simple as possible. Use the minimal amount of technology you need to get the job done, with the caveat being that you're still using something that you could extend in the future or also that is, let's say, commonly available. I'm a big fan of off-the-shelf components, open-source projects. I basically don't want to write software if I don't have to. 

0:36:18.9 Joseph Morais: Of course or run a system you don't have to run. 

0:36:20.9 Adam Bellemare: Or run a system you don't have to. Exactly. So perhaps a deeper answer would be, a lot of it depends on what frameworks and code you're already familiar with. Very, very rarely will a company, in fact I don't know of any company that does this, start up and at their first meeting say, all right, we're going to do everything in fine-grained microservices. Usually they don't do that. Usually they start with something like a monolithic database. And one of the great things about a monolithic database is you have ready access to all the data within it. So it's very easy to take data that you already have, mix it together, and build new functions. The database acts as a data communication layer for that singular monolith system, which is why it's such a great pattern. 

0:37:07.3 Joseph Morais: Right.

0:37:08.7 Adam Bellemare: That being said, microservice adoption usually comes later. Once you've already accrued some tech debt, grown to a scale where you need to start building finer-grained, purpose-built systems, and need to integrate with other services, SaaS services, for example. And at that point, the technology you're already using should be your number one consideration. If you're a big Python shop, look into the basic Python consumers. 

0:37:33.7 Joseph Morais: Sure. 

0:37:34.2 Adam Bellemare: If you're a big Java shop, like some of the companies I've worked for in the past, we looked more into Kafka Streams, which comes with Kafka, but we also looked at Spark. This was in the day, though, where you would still write your Spark against RDDs, like Spark 1.1, 1.2, 1.3. 

0:37:54.8 Joseph Morais: Oh... 

0:37:55.4 Adam Bellemare: There goes the book. But basically what we're doing there is we're using tech we're familiar with, and we're trying to figure out how we can repurpose it to extend where we are without having to reinvent a whole package of things all at once. 

0:38:09.2 Joseph Morais: Right.

0:38:09.8 Adam Bellemare: So, there are all the microservice principles, like container isolation, monitoring, logging; I call it the microservice tax, and I'm not alone in that. But the exact technology you're going to use is going to be related to what you already have and how much time you have to adopt something new. So short, small, incremental steps work really well. And this is also where streaming SQL is very useful, because it's easy to write SQL, and it's easy to get some of the advantages of managing and processing these streams that you may not get if you have to adopt more complicated tooling.

0:38:53.0 Joseph Morais: Right, and then you can get folks like database administrators finally kind of speaking the language that they already know, but interacting with data streaming, so the accessibility is a great piece of it. So if an architect or developer can only take away one core principle or piece of advice from the entire book to guide their event-driven journey, and as succinctly as you can make it, what would that essential takeaway be?

0:39:19.3 Adam Bellemare: Over the years, I would have answered this question differently. 

0:39:22.2 Joseph Morais: Okay.

0:39:22.7 Adam Bellemare: Depending on, usually, your own biases. What was the thing you were hitting that made you the most frustrated at the time? And you're going to say, that's the principle I would always stick by. Honestly, so what is it now, eight years later, going on nine years later, since I first started writing this? Use schemas.

0:39:42.2 Joseph Morais: Use schemas, governance is key.

0:39:43.8 Adam Bellemare: Yeah, just use schemas, make your data well-defined. Unstructured data is a horrible, horrible lie.

0:39:53.0 Joseph Morais: It's easy, but bad.

0:39:54.5 Adam Bellemare: It's easy to write and it's easy to sell, but it's not easy to deal with. And I speak from direct experience, about 10 years as a data engineer having to deal with that. What you can get away with in the batch world, you can't get away with in the streaming world. You basically spread the problem out to every single consumer, and they all interpret it differently. They're using their best efforts, but they'll interpret things slightly differently, or they may parse a string in a slightly different way because they're trying to extract, maybe, the underlying integers or floats from there, for example. And then you end up with your data diverging. It can be quiet, you may not notice, and then one day your customer is angry at you because one report says one thing, but perhaps the amount of money that you've billed them says another thing. 

0:40:42.9 Joseph Morais: Yes.

0:40:43.3 Adam Bellemare: And now you have to deal with this lost trust. 

0:40:46.0 Joseph Morais: Reconciliation. 

0:40:46.9 Adam Bellemare: Right. So, if you use a schema, something like Protobuf or Avro or JSON Schema, which I've not personally used, but I've used the other two, you will save yourself a lot of headaches, a lot of problems. And even if you make mistakes on other things, which will happen, your data will still be a lot cleaner and clearer and easier to understand, and you will have far fewer headaches than you would otherwise have. So, one thing: use schemas.
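
Adam's "use schemas" advice boils down to validating data at the producer so that every consumer isn't left guessing. In production you'd reach for Avro, Protobuf, or JSON Schema with a schema registry; this plain-Python sketch, with all field names invented, just shows the shape of the idea:

```python
# Hedged sketch of "use schemas": declare the shape once and validate at the
# source, so consumers never have to parse free-form strings themselves.
# Real systems would use Avro/Protobuf/JSON Schema; names here are invented.

from dataclasses import dataclass

@dataclass
class PaymentEvent:
    customer_id: int
    amount_cents: int   # integers, not free-form strings like "$12.34"
    currency: str

def parse_payment(raw):
    """Reject malformed input at the producer instead of at every consumer."""
    try:
        return PaymentEvent(
            customer_id=int(raw["customer_id"]),
            amount_cents=int(raw["amount_cents"]),
            currency=str(raw["currency"]),
        )
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError(f"schema violation: {exc}") from exc

good = parse_payment({"customer_id": "7", "amount_cents": 1234, "currency": "USD"})
print(good.amount_cents)  # 1234

try:
    # A free-form money string is exactly the divergence risk Adam describes:
    # each consumer would parse "$12.34" slightly differently.
    parse_payment({"customer_id": 7, "amount_cents": "$12.34", "currency": "USD"})
except ValueError as e:
    print("rejected:", e)
```

Rejecting the bad record once, at the source, is what prevents the quiet per-consumer divergence and the angry-customer reconciliation Adam warns about.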

0:41:15.6 Joseph Morais: Use schemas, folks. Adam, thank you so much for the time. Thanks for updating your book. I think this is going to be absolutely crucial for data streaming engineers today and well into the future. This is a hot commodity, folks.

0:41:28.8 Adam Bellemare: Thanks again.

0:41:28.9 Joseph Morais: Thanks again. Appreciate it. 

0:41:29.7 Adam Bellemare: Lovely to be here.

0:41:31.9 Adi Polak: Hi there! So good to see you! 

0:41:34.3 Joseph Morais: So good to see you. So, joining me now, and about to take over the show for me, is Adi Polak, our Director of Developer Relations. I'm so happy I get to work with you so often now.

0:41:46.3 Adi Polak: Likewise, I feel so fortunate to work with you. It's an amazing experience.

0:41:51.0 Joseph Morais: So we kind of like to represent different sides of the event here. I'm kind of the Confluent side, getting all excited about the announcements, and Adi helps represent the community. So just before I let you go, tell me, what's your experience so far at Current this year?

0:42:06.1 Adi Polak: Well, first of all, we sparkle.

0:42:07.7 Joseph Morais: I love this.

0:42:08.3 Adi Polak: It's New Orleans. I mean, it's a combination of voodoo, jazz, fantastic food, and lots and lots of sparkles. So that's the exciting part. It's definitely different vibes. You feel the energy. Everyone here is always super smiling, like extreme, very large, big smiles.

0:42:25.9 Joseph Morais: This is definitely the sparkliest Current we've had.

0:42:27.9 Adi Polak: Yes, definitely the sparkliest Current we've had. And, you know, just the people, the energy, the conversations, everything was amazing so far. And people are so excited about data streaming. And it's kind of, you know, I'm excited about data streaming.

0:42:43.6 Joseph Morais: Me too.

0:42:44.2 Adi Polak: But then when I hear more people excited about data streaming, I was like, oh, I'm not alone. It's not just me.

0:42:50.2 Joseph Morais: This cheering section is getting big now. 

0:42:52.4 Adi Polak: Exactly. It's like a huge community out there that all they want to do, all they want to talk about, is optimizing stream processing and optimizing Kafka and thinking about schema registry and, you know, from Protobuf to Avro to all this good stuff. It's fun. 

0:43:06.6 Joseph Morais: We're all learning about data streaming. So, Adi, I'm going to leave the show in your very capable hands. Thank you.

0:43:11.5 Adi Polak: Thank you so much, Joseph. 

0:43:14.7 Adi Polak: Hello. So good to see you. How are you doing?

0:43:18.4 Sam Barker: Very glad to be here.

0:43:19.6 Adi Polak: Welcome, welcome. Do you want to introduce yourself? Say a couple of words?

0:43:24.1 Sam Barker: I guess so.

0:43:25.9 Adi Polak: Let's do it.

0:43:27.1 Sam Barker: I'm Sam Barker. I'm a principal software engineer at IBM, but possibly more importantly for this audience, I'm a contributor to the Kroxylicious project. So we're a fully open source Apache Kafka proxy.

0:43:38.9 Adi Polak: So one thing you didn't mention is that you speak at Current, which is very exciting. I don't know if you know that, but we had multiple hundreds of submissions. It's a highly, highly competitive space. There is a program committee of volunteers, and what we do is we read through all the talks. We try to understand the substance, what the community will want, and what will help people grow in their careers. So super excited to have you speak. Maybe you want to say a couple of words about your talk today? 

0:44:09.4 Sam Barker: Yeah. I'm here to represent the Kroxylicious project, which is the fully open source proxy for Apache Kafka. And it was a really interesting chance to give everyone an idea of what's possible to do in front of a Kafka cluster and what difference that can make to your deployments and your workloads, and an opportunity for you to add that little bit of extra magic that makes it all work and all hang together.

0:44:31.8 Adi Polak: Right. So, how would people start using it today?

0:44:35.6 Sam Barker: Well, I'd like to think it's nice and easy.

0:44:38.1 Adi Polak: Okay.

0:44:39.5 Sam Barker: It just drops in, it's fully transparent between client and broker. So we just drop it in. You can run it on bare metal, you can run it in a container, however you want to do your networking. We've got an operator for Kubernetes so that it can just drop in and be deployed. Configuring all the networking and addressing can be a bit interesting, but we've done a lot of the hard work on that side as well with the operator. So you just need to tell it, here's your upstream cluster, and what do clients connect to you as? And it'll make everything else transparent.

0:45:07.3 Adi Polak: Super cool. So if I have a Kafka deployment, okay, let's say I have 5,000 topics, you know, we're going to scale. It's a number we see, of course, out there in the wild, for all of you building data streaming solutions. And let's say I want to start using the project. Of course, it's open source, right?

0:45:27.1 Sam Barker: Yep.

0:45:27.5 Adi Polak: Is there a specific license to it today? 

0:45:29.8 Sam Barker: We're Apache 2 licensed. You can come along, use it, derive from it, do what you like with it. We'd love you to come back and help us out and talk to us about what issues you have, but no, there's no obligation for you to do anything else with it. 

0:45:39.8 Adi Polak: Got it. 

0:45:40.2 Sam Barker: So... 

0:45:40.8 Adi Polak: Very cool. So if I want to implement it in my organization, how do I get started?

0:45:46.5 Sam Barker: Well, watch my session tomorrow, because that's the full context of what I'm talking about. So we've got two ways. We've got a couple of pre-built filters, one of which is a solution for encryption at rest. And so you just drop that in. Potentially, there is no config change for your clients. They just connect. We've proxied the whole of the Kafka protocol. So your clients just connect as they would have done anyway. Your brokers don't know anything about this happening either. Effectively, you just re-point DNS at us rather than your Kafka broker. And if you're using our encryption-at-rest solution, you just deploy that, a couple of lines in a config file, and you're away. 

0:46:20.5 Sam Barker: If you want to write your own filters, inject your own custom magic, then you need to check out our Maven archetype that will generate you a sample filter. And you implement a couple of Java interfaces to say which messages you're interested in and the runtime will call you back whenever it sees one of those and you implement your magic in there.
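
The callback-style filter model Sam describes, register for the message types you care about and let the proxy runtime call you back, can be sketched loosely in Python. The real project is Java and proxies the actual Kafka protocol; every name below is illustrative, and base64 merely stands in for where a real cipher would sit:

```python
# Loose, hypothetical analogue of a proxy filter runtime: filters see only
# the message types they registered for; everything else passes through
# untouched. None of these names come from the real (Java) project.

import base64

class Proxy:
    def __init__(self, upstream):
        self.upstream = upstream   # the "broker": just a callable here
        self.filters = {}          # message type -> transform function

    def on(self, msg_type, fn):
        """Register a filter for one message type (the 'interface' a filter implements)."""
        self.filters[msg_type] = fn

    def handle(self, msg_type, payload):
        fn = self.filters.get(msg_type)
        if fn is not None:
            payload = fn(payload)  # runtime calls the filter back
        return self.upstream(msg_type, payload)  # transparent to the client

stored = {}
def broker(msg_type, payload):   # stand-in for the upstream Kafka broker
    stored[msg_type] = payload
    return "ok"

proxy = Proxy(broker)
# Stand-in for an encryption-at-rest filter. NOTE: base64 is NOT encryption;
# it only marks where a real cipher such as AES-GCM would be applied.
proxy.on("produce", lambda p: base64.b64encode(p))

print(proxy.handle("produce", b"secret"))   # the client just sees a normal "ok"
print(stored["produce"] == b"secret")       # False: transformed before the broker
```

The client and broker both behave as if the proxy weren't there, which is the "transparent, drop-in" property Sam emphasizes; the filter is the only component that knows a transformation happened.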

0:46:37.3 Adi Polak: Okay. And how do I take it to scale? So let's say I have one topic, I can manage that, but how do I actually, you know, I tested it, right? 

0:46:45.6 Sam Barker: Yeah.

0:46:46.5 Adi Polak: I validated, made sure it works, did some POC, dumped it in staging on a couple of topics, but how do I actually make sure it's going to operate at the large scale that I need?

0:46:57.8 Sam Barker: So we're a stateless proxy. 

0:47:01.2 Adi Polak: It's stateless. 

0:47:02.0 Sam Barker: We're just trying to add that auto-scaling layer in front of the Kafka brokers, injecting a lot of what we see as valuable where you'd want a server-side interceptor, which we can't do in Kafka for a variety of good reasons. It would be a real performance bottleneck, and scaling the brokers is obviously very difficult. 

0:47:17.4 Adi Polak: Right.

0:47:17.3 Sam Barker: You can put it out in the proxy tier, you can scale that with as much CPU as you need, but there's no coordination between nodes. So hopefully it will just scale to as many instances as you need to make it work for your 5,000 topics, or one topic doing a lot of heavy encryption. 

0:47:31.6 Adi Polak: Yeah.

0:47:32.2 Sam Barker: But you know, we're an open source project. Scaling is not something we've spent a ton of time worrying about. If that's a problem space that's interesting to you, come along and talk to us, point out where we're not right yet, and we'll happily work with you to fix it.

0:47:44.5 Adi Polak: I appreciate you. Yeah. I think that's always the interesting part about open source. We can all collaborate, no matter what problem we're solving right now, and just build good engineering practices around that. And you mentioned that it's stateless and it's a proxy. So stateless makes my life super easy because I don't need to manage cache or anything related to that.

0:48:05.2 Sam Barker: Exactly.

0:48:06.7 Adi Polak: On the proxy side, should I deploy it as part of my Kafka brokers' machines, or is it a different deployment where I need other machines?

0:48:16.3 Sam Barker: That's the $64,000 question, and one I can't answer for you. It all depends on what you want to do with the proxy. We support both forward and reverse proxy topologies. If you're encrypting your data at rest, putting it in the same cluster as your brokers is probably too late. It's the wrong side of the trust boundary. So you want it close to your applications. There's a bunch of other use cases where it makes sense to be deployed in the same cluster as your brokers, operating on their network, in their time zone or whatever, however you want to view it. We'll support you wherever you are. We've had a bunch of interest in deploying as a sidecar as well. Not something we as a project have tackled yet. We think it works. 

0:48:50.7 Adi Polak: Okay.

0:48:50.4 Sam Barker: If that's something that's interesting to you, we've had a couple of people turn up in the community and say, does it? Well, you've got to tell us. You've got to help us here. We're really keen to have more people involved on the community side and helping us figure out where it would be. But we think it would work there too. So we'll suit the deployment topology you want.

0:49:06.8 Adi Polak: That's fantastic. And how is Current so far for you?

0:49:10.9 Sam Barker: It's awesome. It's really great to be here, see all the energy around Kafka again, hearing Apache Kafka mentioned at the keynote is great to hear. So, you know, it's really powerful. It's good fun.

0:49:20.7 Adi Polak: Anything that's kind of like stuck with you from the keynote today? 

0:49:24.2 Sam Barker: Talking about diskless clusters and that kind of real engagement around shipping data around at variable latency and things, that was really engaging to me. I've been following a lot of the work that's been going on upstream about diskless clusters and how that's coming to the open source version, and hearing that validated, and a lot of stuff from Confluent about how that's going to be really powerful as well.

0:49:44.8 Adi Polak: Yeah, I agree. The innovation is super exciting, and the fact that we can do it now, I believe it's going to be a game changer. You can do SSD or you can do other types of storage if you wish to. It really depends on the architecture that you need, the requirements, the latency, and so on.

0:50:01.6 Sam Barker: Mixing those within the same cluster, though, that's where it gets really interesting and really powerful. You're not having to think about in advance, where do I need this data? What's the requirement? I can tune it in production.

0:50:11.7 Adi Polak: Fantastic.

0:50:12.2 Sam Barker: And that's where it's really powerful.

0:50:13.3 Adi Polak: All right. Well, thank you so much for stopping by. It was a pleasure to have you. I'm looking forward to your talk tomorrow. I'm sure it's going to be awesome and I'll get to learn about all the cool stuff that you're building. So thank you.

0:50:22.8 Sam Barker: Thank you very much.

0:50:23.7 Adi Polak: All right. Wow, that was a lot of very cool stuff I just learned about this new technology. I didn't have a chance to play with it before, but one thing I know: I definitely want to try it out. So, coming up next, Olena Kutsenko. Hello there, Olena. 

0:50:39.6 Olena Kutsenko: Hi, Adi.

0:50:40.4 Adi Polak: So good to see you.

0:50:41.3 Olena Kutsenko: So nice to see you as well.

0:50:43.1 Adi Polak : Wow, you're having a good day. I can tell by the smile. 

0:50:48.0 Olena Kutsenko : Yes. I really like the energy of the place. So I am so grateful to be here.

0:50:52.7 Adi Polak : So tell us more. I mean, there's some people out there at home watching us. So it's like, okay, the energy of the place. So, you know, what's... Give us the truth.

0:51:01.6 Olena Kutsenko : It means that watching from home is good, but there is a bit of missing out on interacting with people here. Also seeing the keynote live. Also getting a signed copy of Adam's book, Building Event-Driven Microservices, something to read on the way back home.

0:51:19.7 Adi Polak : Can I have a... Wow, okay.

0:51:22.6 Olena Kutsenko : You need to get your own copy. There is a line here nearby.

0:51:26.7 Adi Polak : I saw it's a very long line.

0:51:28.1 Olena Kutsenko : That's because it's a good book.

0:51:29.6 Adi Polak : This is a very thick book, too. 

0:51:30.9 Olena Kutsenko : Yes. 

0:51:31.1 Adi Polak : It's like we're talking 460 pages.

0:51:34.1 Olena Kutsenko : It'll be enough for the way back to Berlin for me. Definitely.

0:51:37.9 Adi Polak : Awesome. Awesome. Did you read the first one? Did you get a chance?

0:51:41.6 Olena Kutsenko : I do believe so. So actually, I'm a minimalist person. I really like audio books and podcasts. But the number of technical books I have at home is piling up. And I really like those because I like to make some notes with a pen and kind of like, you know, just the old fashioned way. So I do have Adam's book, the previous one.

0:52:04.0 Adi Polak : Okay. 

0:52:04.9 Olena Kutsenko : But it will definitely be a refresher to read the new one. And also I think a lot of things have changed. 

0:52:10.1 Adi Polak : Right. 

0:52:10.3 Olena Kutsenko : The whole topic is actually what the conversation is like today compared to some years ago, totally different. I'm really looking forward to attending as many sessions as I can fit today, which will be actually quite a challenge. Like, you know, I feel like we need a time travel machine here, like some DeLorean.

0:52:29.3 Adi Polak : I know. There's so many great sessions. I was like, how do I pick? And then you have your own session. It's like, no, I want to go to the other session.

0:52:37.1 Olena Kutsenko : I was talking yesterday to some speakers and they're like, okay, we have the session at 4:00 P.M., but I really want to be at another session which happens at the same time as my session. And I was like, okay, can I leave the audience? Like, could you wait for me here, because I'm going to watch that other thing happening at the same time? So yes, it's a dilemma. But do you have your list of things you must watch?

0:53:00.7 Adi Polak : Wow, definitely yes. I have a long list of things I must watch. The challenge is like the first day is always extremely packed. 

0:53:07.2 Olena Kutsenko : Yes. 

0:53:07.3 Adi Polak : But then tomorrow, after the keynote, I'll have more time. 

0:53:11.5 Olena Kutsenko : It's also nice that the sessions are recorded, because that's where I'm actually catching up with everything later. 

0:53:17.4 Adi Polak : Yes. 

0:53:18.6 Olena Kutsenko : Because yeah it's impossible to see... Like not physically possible to stretch the time limits. Yes.

0:53:23.8 Adi Polak : We do what we can. You know, we only have 48 hours here, so that's it.

0:53:30.0 Olena Kutsenko : Exactly. What topic excites you the most?

0:53:33.4 Adi Polak : Okay. Streaming Agent Demo was really cool, the keynote. 

0:53:38.2 Olena Kutsenko : Yes.

0:53:38.3 Adi Polak : I was blown away. We were part of the thinking process and the designing process and the strategy in the beginning. But to see something like that actually go live, and you can now use it on Confluent Cloud, that was fascinating, because it's something that we need, it's something that people talked about. And now it's there. And I don't know if people know that, but developing software takes time.

0:54:02.4 Olena Kutsenko : It does.

0:54:03.6 Adi Polak : And they did it super fast. Like within a couple of months, boom, it's there. It's out. It's amazing. 

0:54:12.0 Olena Kutsenko : That's very true. And yeah, Streaming Agents, the AI topics, there are actually like two groups for me of the things which I'm interested in, because I want to catch up on all the latest AI stuff. And there are a number of interesting things, including A2A and how we are using it with Flink, how we are using other capabilities of the agents together with streaming data. But also I want to catch up with what is new in the latest Apache Kafka, the consumer group rebalancing protocol. And all this kind of old fashioned stuff and new modern things. And then I think that when we put those together, we can build really impressive stuff with those technologies.

0:54:56.1 Adi Polak : The rebalancing, queues for Kafka is a big change. 

0:54:59.1 Olena Kutsenko : Yes. It's also on my list. 

0:55:01.0 Adi Polak : It's a big change. I'm very curious how the architecture is going to develop because, so a couple of years ago, I don't know if everyone knows that, but there was some thought around not having queues as part of the architecture to support other criteria. And now as the world changes, we have better machines, we have faster networking. It's like, okay, now we can support queues and still have this extreme low-latency capability and great scale and throughput, and do all these things that we couldn't have done like five or 10 years ago. And now it's possible because of the power of compute. And that's exciting.

0:55:39.0 Olena Kutsenko : Yeah. It's like one thing enables another thing. And here the power of compute is like, okay, so actually you can use queues with Kafka. Maybe it's not such a bad idea as we were thinking before.

0:55:50.9 Adi Polak : Exactly. 

0:55:51.8 Olena Kutsenko : Yes. 
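For anyone following along at home, the distinction Adi and Olena are drawing can be sketched in a few lines. This is a hypothetical, stdlib-only toy, not the actual Apache Kafka client API: a classic consumer group assigns each partition to exactly one consumer, while queue-style "share group" semantics (in the spirit of KIP-932, Queues for Kafka) let any consumer pull the next available record.

```python
from collections import deque

# Illustrative sketch only; names are made up, not the real Kafka API.

def consumer_group_assignment(partitions, consumers):
    """Classic consumer group: each partition is owned by exactly one
    consumer, so parallelism is capped at the partition count."""
    return {p: consumers[i % len(consumers)] for i, p in enumerate(partitions)}

class SharePartition:
    """Queue-style semantics: any consumer may acquire the next available
    record, so more consumers than partitions can still make progress."""
    def __init__(self, records):
        self._available = deque(records)

    def acquire(self):
        """Hand out the next unclaimed record, or None if empty."""
        return self._available.popleft() if self._available else None

# With 2 partitions and 3 consumers, the classic model leaves c2 idle:
owners = consumer_group_assignment(["p0", "p1"], ["c0", "c1", "c2"])
assert "c2" not in owners.values()

# A share partition hands records to whichever consumer asks next:
sp = SharePartition(["r0", "r1", "r2"])
assert [sp.acquire() for _ in range(3)] == ["r0", "r1", "r2"]
```

The real protocol adds acquisition locks, acknowledgement, and redelivery; this toy only shows why "consumers can outnumber partitions" is the headline change.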

0:55:54.5 Adi Polak : And you're giving a talk.

0:55:55.5 Olena Kutsenko : I'm giving a talk today at 5:00 P.M. that will actually be live streamed for anyone who is missing us in person. And I'm going to talk about agents. But my talk on agents will be more like a gamification of things.

0:56:12.4 Adi Polak : Oh, tell me more. Tell me more.

0:56:13.3 Olena Kutsenko : Have you played Stardew Valley?

0:56:15.0 Adi Polak : Yes. Many years back. I won't give up my age.

0:56:17.7 Olena Kutsenko : Oh my God, Stardew Valley is amazing. And I took that idea and I was thinking, okay, so Stardew Valley is like an old-fashioned game where everything, like those conversations between characters, they're scripted. If you play it for the nth time, it will be the same conversation all over again. And it's boring. How about we introduce a bit of AI and a bit of Flink and a bit of Kafka to make those conversations dynamic so that they can evolve, they remember what they're talking about, because we are using context. We do things with context. And this is a big conversation again: how do we make sure we are using the right data at the right time? And those conversations evolve. The life of the agents, or like the villagers in Stardew Valley, evolves. Actually, I set a goal for them to organize a conference. And when they talk to each other, they are convinced that it's very important. And then there is a, I mean, it's a kind of Stardew Valley type of conference.

0:57:15.3 Adi Polak : Yes.

0:57:15.9 Olena Kutsenko : But all of that together with streaming data, Kafka, Flink, some LLMs, some vector databases. That will be in my talk. So for anyone who is curious to... 

0:57:30.7 Adi Polak : Yes, to watch it. It's going to be live streamed. If you have a streaming pass, you know, go watch it. If you don't, just go get one. It's free.

0:57:36.9 Olena Kutsenko : Yeah, exactly.

0:57:37.7 Adi Polak : It's free. I know. People don't know that. But yeah, it's completely free. Go do that. This is super creative. I want to say that this is super creative because, you know, with LLMs and generally like generative AI, one of the main questions that people ask is, what are the actual use cases that I can trust? And I think the gaming industry is a huge opportunity, like taking conversations between the folks in the village. And it's like, now let's make it a little bit more interesting than the scripted one we had before. Because as a gamer, you know, you kind of remember all these conversations and they sort of become a little bit boring. It's like, okay, I want something that is more enticing, or maybe you can even tailor it for each player and user. It's like, what is your preference? What is the type of things that you like? Maybe you like watching this TV show.

0:58:27.7 Adi Polak : And so all the village is now going to talk about this specific TV show that the players love. And it gives us just so many opportunities business wise to think like, how can we make our games better, more personalized for the end user?

0:58:42.6 Olena Kutsenko : Exactly. And it's the simulation of, for example, here for the game it sounds fun, but you're right. It's actually quite applicable for business use cases. And one of the examples, which is a couple of months old, from Microsoft, I believe, where they built their communication between agents actually for quite a serious use case: how to help doctors find better diagnoses and kind of make it efficient, make it cheap, kind of on a budget scale. And they actually proved that if you have a group of agents, each agent actually takes a particular role.

0:59:20.5 Olena Kutsenko : So some agents make sure that the diagnosis is correct, or at least that it aligns with all the symptoms. And other agents make sure that everything is structured, the decision-making process. Some agents focus on the budget aspect so that it can scale, and some other roles as well. And then those agents have to communicate with each other to make the decision, and to make several iterations where they can agree on the outcome. And this is where it's again simulating that kind of conversation. To be honest, it still kind of sounds to me like really, really fun, but it's quite a business case which actually can help. I think it can help humanity in general: if you can have better access to doctors, or to this kind of diagnosis process, it will be really, really good.

1:00:14.7 Adi Polak : I love it. It's brilliant. I do wonder how they coordinate them. My geeky side, my geeky side is like, but how do we code this? 

1:00:24.8 Olena Kutsenko : Exactly. There's actually a whole paper on that. But the thing is, that was the whole idea, and the research paper they did was just to prove the concept. But how do we make it production ready? If you want to make it like that, so that you are efficiently coordinating the agents, you're not losing any data, the system is resilient, it opens a lot of challenges.

1:00:53.3 Adi Polak : Right.

1:00:54.4 Olena Kutsenko : And I think this is actually these two days, there were a lot of conversations about those challenges. How do we solve those streaming use cases and use the context, remembering important things, forgetting not important things, within the conversation with agents.

1:01:12.5 Adi Polak : There's so many, I mean, it opened up like generative AI opened up so many opportunities for us business wise. And then, there's kind of like, I always like to call it like the underline of tech. It's like the things people don't like to talk about because it's not exciting or it's, you know, it's not hype right now. But this is something that we all know from creating software for production. It's like, you need logs, you need to be able to have deterministic fallbacks for whatever is happening in your system. You need to make sure you have replays when you need those replays, guardrails, observability. 

1:01:51.0 Adi Polak : Because agentic systems, essentially, it's a completely probabilistic black box. It's even, you know, harder than what it used to be, you know, like the state of the art machine learning, classics like prediction, anomaly detection, and so on. And now it's completely random because, you know, we're changing the temperature a little bit, and we're getting this randomness effect in. So it feels more, it gets more creative. It gets, you know, more autonomous. And yet when we want to take that and put it into a production system, we need to understand what would a mistake cost us, and are we willing to endure that cost?

1:02:30.1 Adi Polak : Like in games, maybe, you know, some text is wrong there, then the villagers start talking Spanish, and we don't speak Spanish, let's say. And it's some glitch in the LLM, that's okay. We can restart the game, you know, we can give them some feedback and go back. But what about the healthcare system? This is where I'm like, and budgeting and things like that, we really need those deterministic fallbacks. We really need software that we can trust. Trustworthy AI, essentially.

1:02:59.7 Olena Kutsenko : I totally agree, because some industries don't forgive errors. Like, you cannot really make an error. And actually, okay, we're talking about healthcare, but there are some industries which we run into every day, like finance, or manufacturing, like the car industry; you cannot really make errors there, because fixing those will be quite costly. And sometimes, if you make an error, it's a life-or-death question.

1:03:28.3 Adi Polak : Right. I don't know if you know this, but in San Francisco, and not only San Francisco, the Bay Area, and now also SFO, like the airport in San Francisco, there's this car called Waymo. It's completely autonomous. You can book it through the app. It will come pick you up with your name and the favorite color that you choose. It uses NFC technology, so once you put your phone next to the door, it immediately opens up and you can go in. And then you sit down, it asks you if you're ready. You click like, yes, I'm ready, and it drives. No driver, no one sits in the front. You just go. 

1:04:06.5 Olena Kutsenko : How can I try it out? I really want to do this. 

1:04:09.9 Adi Polak : So first of all, come to SFO. I'll share the link to the app. And yeah, I mean, okay, so you say, how do I try it out? I asked my parents-in-law. I was like, I'll take you on a Waymo. They were like, no, no, no, no, no, no, no. Please don't. We're okay with normal cars.

1:04:32.8 Olena Kutsenko : But to be honest, I must admit that with those cars, I am super curious. I want to try it. I usually don't travel as far as San Francisco, but maybe I should do it just for that.

1:04:41.3 Adi Polak : All right, you will. I'll take you. I promise.

1:04:43.6 Olena Kutsenko : But then I heard recently that they are thinking that we can fly without pilots in the airplane. And that made me think that I'm on the side of your relatives.

1:04:58.4 Adi Polak : That's next. Olena, thank you so much for stopping by. I'm super excited for your talk. And yeah, I'll see you around.

1:05:04.4 Olena Kutsenko : See you around.

1:05:05.1 Adi Polak : Cheers. Wow. So much goodness. The book is amazing. If you didn't get a copy yet, you should definitely consider getting one. It's the second edition. There are a lot of good things that were already built into it. And now I'm super excited to bring in my second, my second? We're already at third. My third guest. Hello there, buddy. So good to see you.

1:05:24.7 Scott Haines: Good to see you too.

1:05:25.5 Adi Polak : I love the green bracelet.

1:05:26.9 Scott Haines: Thank you. I got it on Bourbon Street yesterday for $40. 

1:05:30.1 Adi Polak : Ohh!

1:05:31.8 Scott Haines: It's a very good deal.

1:05:32.5 Adi Polak : Tell me more, it looks really nice.

1:05:35.2 Scott Haines: I'm very nice when I talk to people. And somebody came up and they're like, so I got hosed, kind of. It's like, oh, cool. This is nice. Thank you so much for this free gift. And it's like, no, money. Give me money. And so that was the start of my day yesterday.

1:05:50.4 Adi Polak : Okay.

1:05:50.6 Scott Haines: But now I'm wearing it because I paid money for it. And I'm like, I kind of like it. It's nice.

1:05:53.2 Adi Polak : It's really nice. It suits you.

1:05:55.5 Scott Haines: It's plastic, but I could lie and say it's actually jade, which is nice.

1:05:59.5 Adi Polak : It's a nice green.

1:06:00.5 Scott Haines: Brings me peace.

1:06:01.2 Adi Polak : It's a nice green.

1:06:02.7 Scott Haines: Yeah. 

1:06:03.1 Adi Polak : And you had a packed house this morning.

1:06:04.6 Scott Haines: Thank you.

1:06:05.1 Adi Polak : So I don't know if you know, Scott. Scott, maybe you can say a couple of words about yourself.

1:06:07.9 Scott Haines: Yeah. So my name is Scott Haines. I've been in the streaming space for 18 years. I've gone through like every kind of major streaming platform over the years, starting with Storm really, really way back in the day. Kafka was kind of almost pre-release, pre-1.0 when I was at Yahoo. So there was a lot of kind of sharing of data and everything else in the Yahoo days, like 2012 to 2015 when I was there.

1:06:35.2 Adi Polak : The exciting days of Yahoo.

1:06:36.3 Scott Haines: Yeah. It was like that. It was also like the days of the NoSQL revolution and all these things that we've kind of gone to, gone away from, come back to. And, you know, now we're back at tables and tables, which is great. So, but yeah, I write for fun and I make hot sauce.

1:06:52.7 Adi Polak : Yeah. Tables. I like tables. I also want to say that Scott is a published author. He's published a bunch of books for O'Reilly, like Delta Lake: The Definitive Guide and a bunch of other stuff. If you didn't get a chance to read it, you should. It's...

1:07:08.7 Scott Haines: Thank you. 

1:07:09.1 Adi Polak : Yeah, it's a really great technology. And Scott is a great author and there's so much to learn from you. So thank you for putting your knowledge onto paper.

1:07:15.7 Scott Haines: Oh, thank you. You as well. I think it's an interesting thing. Like we, I've been talking to a lot of people about just writing in general and there's a lot you put into it. There's a lot of work that goes into it. And then like, you'll find a lot of people that it's like, oh, hey, I read that book and it changed my perspective. And you're like, yes. Like, literally, if that's like the legacy you leave, it's wonderful. So thank you also for your books as well.

1:07:36.4 Adi Polak : Oh, my pleasure. It's sweat and tears and a full labor of love. You know, people put so much effort into it.

1:07:43.6 Scott Haines: So lots of long nights, you don't get weekends, but you also learn a lot along the way, which is great.

1:07:48.7 Adi Polak : Exactly. It's a good opportunity to dive deeper into spaces that you don't always get to have on a day-to-day basis.

1:07:54.6 Scott Haines: Yeah. It gives you, I don't know. It's interesting too. It's almost like, I think about it kind of as meditative a little bit. It's like you spend so much time focusing on something that you can really just kind of, you think about like the onion type analogies and everything else. It's like, you really kind of go down to the core of some idea. Right. And it makes it really easy to kind of, you know, share that with other people, share your mental models, and then teach them things that are like potentially a cool trick that came out of nowhere. It's like, hey, this might work. Right. I don't know. I hope it does. I hope it works for you. It worked for me. Right. 

1:08:22.9 Adi Polak : Yeah. 

1:08:22.9 Scott Haines: And that's a lot of fun as well.

1:08:24.6 Adi Polak : It's interesting to me that you call it cool tricks, but in reality, it's experience, and it's highly respected, and it's a highly paid-for skill, data streaming, just so you know. So it's a great space to be in. How is Current so far for you?

1:08:41.9 Scott Haines: Current's been awesome. I'm an introvert, so it's like this morning we had the talk, got to literally open things up before the keynote, got to go to the keynote, set up with you as well, which is awesome, and then I've just been kind of mingling and talking with people and understanding. It's interesting. It's like we're so kind of far into the streaming game, or it feels like it, but a lot of companies have not got there yet.

1:09:04.9 Adi Polak : Right.

1:09:05.2 Scott Haines: And so it's an interesting kind of—it's interesting to have those conversations and talk with people about what it takes, what the steps are to go from a kind of batch-oriented mindset to something where it's like there's a continuous flow of data, things will go wrong, like engineer for that, engineer for things to go wrong. It's not going to be a surprise. It's just part of the natural behavior of all these systems, and so it's interesting when some people get it, some people kind of glaze over, and that's okay. It's all just different parts of that learning path.

1:09:35.6 Adi Polak : Right. It's very interesting. I can also say, you know, my background is in AI, and then I moved to the analytics side of the house, and then I progressed into streaming. And I must say that, for me, the best way to go about it was actually to forget a little bit what I knew before from the analytics world and just embrace what comes, and learn all the patterns, learn the best practices, the architecture, how things work. Because I think this is when you actually understand why decisions were made the way that they're made. In the analytics space, decisions are made to reach a specific criterion, which is: I don't want to scan my whole data lake, database, lakehouse, whatever storage I have. I want to pinpoint specific data from there, I want to have an efficient query, I want to have great indexing and so on. And in streaming, it's actually not like that. You've got to go through all the data that's streamed into your system. So that blew my mind.

1:10:35.5 Scott Haines: Yeah, I think it's also interesting, too, because you think about things like in batch, right, and you have the ability to go back. Time doesn't matter, and so it's like, cool. I think it's one thing I talk to a lot of people about where it's like there's a reason we call it time travel because we're going back to a prior point in history to essentially replay data, do something else with that, but it's also interesting because you have to really be concerned about time when your data is actually flowing because we talk about outages and things like that, but it's like in a lot of cases, your outage means there's a vacancy of data. It's an empty set for a certain amount of time if you haven't thought things through, and in a lot of times in batch, you can always go back to a source that exists in some place, but there's that kind of like ephemeralness of streaming data, which is I think a harder part for people to kind of understand.
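The "time travel" Scott describes, going back to a prior point in a retained log and re-consuming forward, can be sketched with a stdlib-only toy (purely illustrative; a real Kafka client would use its offsets-for-timestamp lookup rather than a Python list):

```python
import bisect

# Toy append-only log of (timestamp, record) pairs, kept in arrival order.
LOG = [(100, "a"), (105, "b"), (112, "c"), (120, "d")]

def replay_from(log, ts):
    """Return every record at or after timestamp ts: 'travel back'
    to that point in history and re-consume forward from there."""
    timestamps = [t for t, _ in log]
    start = bisect.bisect_left(timestamps, ts)  # first offset with t >= ts
    return [rec for _, rec in log[start:]]

assert replay_from(LOG, 106) == ["c", "d"]
assert replay_from(LOG, 100) == ["a", "b", "c", "d"]
```

The point Scott makes about outages follows directly: replay only works while the log still retains that window, which is the "ephemeralness" batch systems do not have to think about.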

1:11:22.3 Scott Haines: It's like, you know, I think unbounded tables was a really good way to explain it to a lot of people, and then even from the keynote this morning, right? It's like data comes over time, and it's sort of stacked on this table, and I think for that, people kind of understand this conveyor belt of information, but it's a lot harder in practice when it's like there's a lot of anxiety, right? Right. And one of the things when I was at Nike, teaching teams how to use, how do I do streaming correctly? How do I work with Kafka correctly? How do I not break my brokers? How do I not just blow the bill, right? There's all these things that are hard, but it's like if you find a safe space for people, I always try and do that, and it's like you can screw up here, and that's completely fine. It doesn't matter. No one cares. Once it gets to production, it's like we can run game days.

1:12:09.6 Scott Haines: We can try it. We can fail, but we fail in that safe spot, and for a lot of teams, it's like the aha moment where it's like you're not going to get like, no one's going to get mad at you for failing in a test. It's like no one's going to get mad at you for trying beforehand, and that leads to a lot of really cool best practices and patterns. I want to call it the dumb pattern I showed this morning.

1:12:28.7 Adi Polak : Yeah. Tell me more about that.

1:12:29.7 Scott Haines: Yeah, so eventual extraction is sort of like eventual lazy execution. It's like, I have data. It's in a proper format. I will talk about proper formats. It's structured data. It can be Avro. It can be Protobuf. But we know it's good.

1:12:44.5 Adi Polak : Okay.

1:12:44.5 Scott Haines: A lot of times, that's up to, essentially, your ingestion APIs or any kind of client that you're building for Kafka for your producers. If good data is flowing into your system, you don't have to try as hard. And if I could do one thing well and give everybody one piece of advice, it's: do the upfront work. Don't try and fix it after the data's landed, because a lot of times it's too hard to go back and do anything else with it at that point in time. If you know it's good, you don't need it yet, or if you want to go back, so Jay was talking about this, any kind of redriving, reprocessing takes a long time. Going back to historic data takes a long time. But certain formats make it really easy for you to kind of reach in and get to the specific columns of data that you want in a much more beneficial way. So if you think about Arrow for data in flight that's columnar, we didn't really have that.

1:13:37.4 Scott Haines: Like Parquet, it's like, well, we have that once stuff lands. And so we're going from row-oriented data, whether it's Avro, whether it's JSON, whether it's Protobuf, anything else like that, and then, oh, now it's columnar once it's been landed. And so this kind of takes us to Tableflow as well, where it's like, well, what if you had the ability to go back in time, but you could also very optimally pull data back out as a column of very specific information that you want? And I feel like, I don't know, nowadays that ends up being something that's really good to have, that saves you a lot of time and effort, but it also requires structure. And so that's kind of where it's like, well, things can be easy. Things can be, you know, there's a lot of different things.
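The row-versus-column distinction Scott is drawing can be made concrete in a few lines of plain Python. This is only an illustration of the two layouts, not of Arrow or Parquet themselves:

```python
# Row-oriented layout (like Avro/JSON/Protobuf events on the wire):
# each record travels as a complete unit.
rows = [
    {"user": "a", "amount": 10, "country": "US"},
    {"user": "b", "amount": 25, "country": "DE"},
    {"user": "c", "amount": 7,  "country": "US"},
]

# Column-oriented layout (like Parquet/Arrow once data has landed):
# each field is stored contiguously.
cols = {
    "user":    ["a", "b", "c"],
    "amount":  [10, 25, 7],
    "country": ["US", "DE", "US"],
}

# Pulling one column out of a row store means touching every record...
amounts_from_rows = [r["amount"] for r in rows]
# ...while a column store hands you exactly the values you asked for.
amounts_from_cols = cols["amount"]

assert amounts_from_rows == amounts_from_cols == [10, 25, 7]
```

Same data either way; the layout just decides whether "give me one column" is a scan over everything or a single contiguous read, which is the trade Scott's read/write/retrieval question is about.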

1:14:19.8 Scott Haines: But it's like if you want to have something easy, it's like you're going to optimize for something, right? Do I care about reading? Do I care about writing? Or do I care about retrieval? And I think for a lot of systems nowadays, it's like, I kind of want it all. 

1:14:32.2 Adi Polak : Right.

1:14:32.7 Scott Haines: It's like best of all worlds. 

1:14:33.7 Adi Polak : We're spoiled. 

1:14:34.3 Scott Haines: Yeah, we are spoiled. But even for an agent and things like that, I want to be able to, if I'm going to go get data, and I want to go get that data from the screen. 

1:14:43.5 Adi Polak : Right.

1:14:44.9 Scott Haines: It's very nice to know where it is.

1:14:46.5 Adi Polak : Yeah. 

1:14:46.6 Scott Haines: So, yeah, I don't know.

1:14:48.2 Adi Polak : And there's one thing you mentioned: there's a wishful thinking that I'll have good data, and the schema will not break. There won't be any major changes. And all the upstream applications that generate that data will have the same schema all the time, and will be super coordinated. And that's wishful thinking.

1:15:12.5 Scott Haines: Yeah, that's not reality. 

1:15:13.6 Adi Polak : Put it that way.

1:15:13.4 Scott Haines: Well, I think it's interesting too, because there's been a lot more kind of conversation recently, or it's just the algorithm feeding me information. A lot of people are like, well, immutability doesn't matter, but also very specific mutability matters. It's like, if I think about it, say you think about type widening.

1:15:34.2 Scott Haines: The direction in which I do type widening matters a lot more. But if I'm changing a field from a numeric field to a string type, I'm going to break everything. And so there are certain things where there are rules and behavior, and there are invariants in that whole entire system.

1:15:49.1 Adi Polak : Yeah.

1:15:49.5 Scott Haines: And that I think is a harder thing to teach as well. It's a harder thing to kind of think about and frame well for streaming, which is why we have protocols.

1:15:57.7 Adi Polak : Right. Just to help some folks, maybe some folks don't know what type widening is. But essentially, when we have different formats in the data world, like Avro and Parquet and so on, there are some rules around which types we can widen without necessarily breaking the schema. So for example, an int can become a long or a float in Avro. I'm looking for other examples. It's like bytes can become a string. I don't remember everything by heart, but I do remember there's a bunch of them.

1:16:25.9 Scott Haines: There are a couple of edge cases as well. For floating point, you can change the precision. But I think some of the other ones are a little bit more complicated. And it's also like, I think that that standard is still kind of evolving. So what works in Iceberg might not work in Delta. And these are things that I feel like are going to get solved over the next year anyway, because I think all of this stuff is going to go into Parquet anyway, and then it solves the problem for everybody else, which is always great.
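The widening rules Adi and Scott are gesturing at can be written down directly. Here is a sketch of the promotion table from the Avro specification's schema-resolution rules (writer type on the left, the wider reader types allowed on the right); the helper function is our own illustration, not a library API:

```python
# Avro schema resolution: a reader may use a "wider" type than the writer.
# Promotion table per the Avro specification (writer -> allowed reader types).
AVRO_PROMOTIONS = {
    "int":    {"long", "float", "double"},
    "long":   {"float", "double"},
    "float":  {"double"},
    "string": {"bytes"},
    "bytes":  {"string"},
}

def is_compatible(writer_type: str, reader_type: str) -> bool:
    """True if data written as writer_type can be read as reader_type."""
    if writer_type == reader_type:
        return True
    return reader_type in AVRO_PROMOTIONS.get(writer_type, set())

# Widening an int field to long is safe...
assert is_compatible("int", "long")
# ...but narrowing, or switching numeric to string, breaks readers,
# which is the "direction matters" point from the conversation:
assert not is_compatible("long", "int")
assert not is_compatible("int", "string")
```

As Scott notes, table formats layer their own (still-evolving) widening rules on top of this, so what a given Iceberg or Delta version accepts may differ from the Avro table above.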

1:16:56.4 Adi Polak : Scott, thank you so much for stopping by. Thank you for sharing your brains. Thank you for a great talk this morning. And thank you for being a grand friend and a partner for Current. It was a pleasure hanging out and watching the keynote together with you.

1:17:13.3 Scott Haines: Awesome. Thanks for having me on.

1:17:14.6 Adi Polak : Of course. See you later. All right. Well, that was Scott Haines. Those were super cool conversations, I must say. I hope you at least got a good glimpse, or the whole conversation. He's super experienced, 18 years in the industry, and has been doing data streaming since forever. And now...

1:17:34.5 Joseph Morais: I'm back. 

1:17:34.9 Adi Polak : He's back. Look at this co-host. 

1:17:39.4 Joseph Morais: Adi, thank you so much. I know we don't play favorites here, but who was the favorite person you talked to today?

1:17:43.9 Adi Polak : We don't play favorites here. I'll say Plushie, Batchy. Sorry, I meant Batchy. There is a scroll there, I don't know if you can see it, Stranger Things. I'll take these as my favorites. 

1:17:55.5 Joseph Morais: Okay, there you go. So, your favorite is the swag. I get it. It's great swag. Thank you so much for taking over for me. Were there any things that came out of those conversations in particular that were just...

1:18:08.4 Adi Polak : Yes. I mean, first of all, sometimes it feels like data streaming is new, but I just had a great conversation with Scott and someone's like, yeah, I've been doing it for 18 years. I was like, if we call that thing new, well, 18 years is kind of a long time. But definitely what happens now is, in the industry, we see more and more people get into data streaming. And we see it here in the room, the excitement on stages and in talks and hallway conversations and in the expo hall among other folks as well. So, yeah, it's great to see it's growing.

1:18:37.4 Joseph Morais: Even though he's been doing it for 18 years, it still feels like the industry is brand new. So we still got a lot of work to do.

1:18:43.7 Adi Polak : Yes, we do.

1:18:45.2 Joseph Morais: Thank you so much.

1:18:45.7 Adi Polak : Thank you so much for having me. It was a pleasure.