Life Is But A Stream

Ep 2 - Processing Without Pause: Continuous Stream Processing and Apache Flink®

Episode Summary

We’re diving even deeper into the fundamentals of data streaming to explore stream processing—what it is, the best tools and frameworks, and its real-world applications.

Episode Notes

We’re diving even deeper into the fundamentals of data streaming to explore stream processing—what it is, the best tools and frameworks, and its real-world applications.

Our guests, Anna McDonald, Distinguished Technical Voice of the Customer at Confluent, and Abhishek Walia, Staff Customer Success Technical Architect at Confluent, break down what stream processing is, how it differs from batch processing, and why tools like Flink are game changers.

You’ll learn:

About the Guests:

Anna McDonald is the Distinguished Technical Voice of the Customer at Confluent. She loves designing creative solutions to challenging problems. Her focus is on event-driven architectures, reactive systems, and Apache Kafka®.

Abhishek Walia is a Staff Customer Success Technical Architect at Confluent. He has years of experience implementing innovative, performance-driven, and highly scalable enterprise-level solutions for large organizations. Abhishek specializes in architecting, designing, developing, and delivering integration solutions across multiple platforms. 

Guest Highlights:

“Flink is more approachable because it blends approaches together and says, ‘If you need this, you still can use this.’ It's the most powerful at this point.” - Abhishek Walia

“If you're somebody who's ever gone from normal to eventing, at some point you probably would have gone, ‘When does [the data] stop?’ It doesn't stop.” - Anna McDonald

“Start with a fully managed service. That's probably going to save a lot of cycles for you.” - Abhishek Walia

Episode Timestamps

*(01:35) - Anna & Abhishek’s Journey in Data Streaming

*(12:30) - Data Streaming 101: Stream Processing

*(26:30) - The Playbook: Tools & Tactics for Stream Processing

*(50:20) - Voices from the World of Data Streaming

*(56:13) - Quick Bytes

*(58:57) - Top 3 Takeaways

Links & Resources:

Our Sponsor:  

Your data shouldn’t be a problem to manage. It should be your superpower. The Confluent Data Streaming Platform transforms organizations with trustworthy, real-time data that seamlessly spans your entire environment and powers innovation across every use case. Create smarter, deploy faster, and maximize efficiency with a true Data Streaming Platform from the pioneers in data streaming. Learn more at confluent.io.

Episode Transcription

0:00:00.2 Joseph Morais: Welcome to Life Is But A Stream, the web show for tech leaders who need real time insights. I'm Joseph Morais, technical champion and data streaming evangelist at Confluent. My goal, helping leaders like you harness data streaming to drive instant analytics, enhance customer experiences and lead innovation. Today I'm talking to Abhishek Walia and Anna McDonald. Anna is the Distinguished Technical Voice of the Customer here at Confluent and Abhishek is one of our great Staff Customer Success Technical Architects. Our first three episodes of Life Is But A Stream explore the fundamentals of data streaming. If you haven't checked out episode one yet, we highly recommend you pause this episode and come back after you've listened. Trust me, you don't want to miss out. In this episode we'll explore the basics of stream processing, everything you need to know, no matter your level of expertise. You'll hear us talk about different tools and strategies and hear real world examples of data streaming in action. But first, a quick word from our sponsor.

0:01:01.7 Ad Read: Your data shouldn't be a problem to manage. It should be your superpower. The Confluent data streaming platform transforms organizations with trustworthy real time data that seamlessly spans your entire environment and powers innovation across every use case. Create smarter, deploy faster and maximize efficiency with the true data streaming platform from the pioneers in data streaming.

0:01:32.8 Joseph Morais: Welcome back. Joining me now is Abhishek and Anna from Confluent. Abhishek is a Staff Customer Success Technical Architect and Anna is our Distinguished Technical Voice of the Customer. How are you both doing today? 

0:01:36.0 Anna McDonald: Excellent. Go Bills, let's do it.

0:01:50.5 Joseph Morais: How are you, Abhishek? 

0:01:51.9 Abhishek Walia: I'm doing fine as well, thank you.

0:01:53.8 Joseph Morais: Fantastic. Well, you know, I do this, you know, this is our second episode, but I have this thing where I immediately kind of go away from the questions and just kind of speak my mind. So I'm going to do that. I just want to thank you both for how much time you spend with our customers and how much you nurture them and, you know, really help them get through their challenges. So I was a TAM back at AWS, so I know exactly what you guys do for a living and I really appreciate how much you kind of, you know, make sure that our customers are in fact successful. So, you know, I want to introduce you both to the audience. So starting with you, Anna, what does your role entail at Confluent?

0:02:29.5 Anna McDonald: Yeah. So basically what you find is, it's unfortunate, well, it's kind of a rule of life. The more time you spend away from something the less natural it is. And I don't want to say disconnected, because I think it does a disservice to the people who try their best to stay connected. But if you don't live something every day, it's amazing. You know, they call it a diminishing skill or a diminishing point of view. And so one of the things that we know in the field is we spend every day, all day with customers. It's what we do. We live it, we eat it, we breathe it. And so it's impossible to not understand their point of view. So being the technical voice of the customer and in our office, what we do is we bring that point of view everywhere into Confluent, whether it's engineering, whether it's product. You know, we work with our customer advisory boards, we, you know, advise on use cases, and it really just makes it so what we're doing is solving our customers problems in the right way and we're bringing value to customers every day to make sure that what we're doing is usable, it's awesome for them. And we're working together as a team, so that's kind of what we do.

0:03:36.4 Joseph Morais: You're ensuring that voice is heard at every level of the company.

0:03:39.3 Anna McDonald: Correct. And I think the technical part is important too, because it's not just, "Hey, they need to do this," it's, "Oh, well, they also need to do it with this networking type," or, "These are the configurations you have to have because their use case demands it." Things like, how fresh is my data? How often does this happen? And you really have to have an understanding of the engineering side from Kafka internally, all of our platform and also customer engineering side. So it's kind of a unique role and I kind of like it.

0:04:05.8 Joseph Morais: Yeah, it's a great role for you, Anna. Quick, can you clear something up for me? So you're a Buffalo Bills fan? 

0:04:12.2 Anna McDonald: Huge. Bills Mafia.

0:04:13.7 Joseph Morais: Is the Buffalo Bill, is that like a really cute buffalo named Bill, or is it a guy named Bill who was really into buffaloes?

0:04:20.7 Anna McDonald: No, it's neither.

0:04:21.9 Joseph Morais: Okay.

0:04:22.5 Anna McDonald: Like Buffalo Bills, or you can call them Bisons. It's spelled with a Z. Look it up, I'm from western New York, that's how we say it, deal with it. So, you know, yeah, that's what it is, it's a pack of us. That's why they say circle the wagons, right? No one circles the wagons like the Buffalo Bills, baby.

0:04:39.7 Joseph Morais: I love it. Thank you so much for that. So, Abhishek, you're a CSTA. And some people might mistakenly think that your job involves a lot of naps, but I know that's not the truth. Tell me, tell the audience what you do here at Confluent.

0:04:50.5 Abhishek Walia: Well, I wish that included a lot of naps. That would be just salary and naps, that would be awesome, right? Who wouldn't want it? But yeah, I am a Customer Success Technical Architect working with Confluent for, I think, more than five and a half years now. The job basically at this point is working with the customers, like what Anna said, right. We make sure that the customers are successful, that's the biggest priority. And they are happy, in that order, right? They have to be successful first, they'll automatically be happy next. That's basically our job, to make sure that they're successful and they're happy. And if they have any needs, they're looking for something, there's a requirement from them which might happen in the next three to six to nine months, we are the ones who take that in and make sure that the product teams and the engineering teams hear about it and are able to create a roadmap according to that. If there is something missing, we should be able to deliver that in a succinct time frame for the customer. And that's what our job is.

0:05:51.9 Joseph Morais: Anna, in your words, who are our customers and who aren't our customers? 

0:05:56.5 Anna McDonald: So I think that's a good question. I wanted to go to the field from engineering because the world is my customer. There's nothing that I'm not interested in really when it comes to like real world use cases. I often say, like, my value to the world is like pretty low. If the power gets cut, there's like a global darkness, what have I done? I don't make mattresses. You know what I mean? Like, it's rough. And so the closest I can get with the skills the guy gave me is to help those people, the people that are making 3D objects. Like to, you know, kind of power the world. So I think, you know, you run a risk if you, when I look at customers, I look at industries. These industries have facets, they have characteristics. There are of course, individual differences, but by and large they move as one, some of them are lagging, some of them are leading. And in that way, everyone is our customer because we have customers across every industry. And if we have a new customer that's in a single industry and we focus on them at the expense of, okay but what does everybody do? Right? Like, is that normal? That's always what to ask ourselves, right? 

0:07:08.7 Anna McDonald: Is to weigh that, you know, the good of everybody, all these customers against a single individual. Especially when it comes to what are we doing in Kafka and Apache Kafka, right? Or Kafka Streams or any of our open source, you know, Flink, these libraries, like we really want to improve those for the masses. And so I think if you consider the world, your customer, as a source of information, not necessarily, like, are we doing a ton, like going above and beyond for people who aren't paying for things like my heat and power, speaking of power. Right? You're focusing on serving those people obviously, first and foremost. But I think you need to also consider the world as a data point.

0:07:49.5 Joseph Morais: Yeah, I really like that response. We're making data streaming accessible to anybody, regardless of what you do. I think that's a very good take. So when were you both introduced to data streaming and event driven architecture and a bonus, have you or any of your past teams ever had the pleasure of running open source Kafka yourself? Let's start with you, Abhishek.

0:08:07.6 Abhishek Walia: Yeah, I was introduced to data streaming and it's a pretty interesting thing because when I was working on this, I never realized it was actually event streaming, event sourcing, those kinds of architecture models. And when we started using it, it was like, yeah, this is awesome. Everything is available right out of the box and you didn't have to contact different teams and different, again, anyways, this was about seven years ago with my previous employer and we started looking into ingesting big data streams into Kafka. And basically we started with a smaller ESP which was not able to handle the load, eventually it fell over and we kind of knew that we needed something like Kafka to manage all that, the ingest power that it gives you. I kind of touched Kafka a little bit before this, but Confluent was like, "Hey, yeah, you're dealing with Kafka, and the ancillary things around that." But before that I only used Kafka, but as soon as I joined Confluent I was like, "Oh, this ecosystem is much bigger than just Kafka. It's not just Kafka."

0:09:16.6 Joseph Morais: Right. You start with event driven architecture, then you get the data streaming, I guess those are correlated. And then you start working yourself up to DSP.

0:09:24.3 Abhishek Walia: Exactly.

0:09:25.3 Joseph Morais: So Anna, how about you? When were you first introduced to data streaming and event driven architecture? 

0:09:28.4 Anna McDonald: I mean event driven architecture has been around forever. So that was like a way, way back. And I'm old, I'm 46, you know. Well, I'm proud of that man. Like that's awesome, I can't wait. Like I get excited every time I get a gray hair. So I'm old school, but I would say the fact like Kafka makes it possible to do eventing in an elegant way.

0:09:49.1 Joseph Morais: Yes.

0:09:49.8 Anna McDonald: In the real world. So you in like a greenfield environment, if you're going to pay people to you know, sit around and like architect and you have no brownfields that, like you've never had a mainframe, you have enough, tons of time. Like, yeah, great, that's not how people live. You've got like 75 order systems because your company bought, you know, it's a mess. And I was always just kind of like, I would sit there and I'd go, "Yeah, that's great. Except like, well, here's reality." And when I met Kafka, and I would say that was around maybe late 2016, early 2017, I was like, "Oh wait, this has legs." Like it was the first time I saw something that one could take the data volume, right. If you're going to pump like every, like we truncated tables every night and reinserted, try doing that with CDC WISPs by the way, no keys. So I don't want to hear anyone complain like, you know, prime, we didn't have keys and we did it. It's like I'm like, you know that old woman back in my day. But you could take sources from like eight different databases, pump them all together right? And then skillfully create these events, so it didn't matter all of your baggage, you could do it slowly, could do it when it made sense. And I think to me that was really a game changer in believing and seeing a way that eventing could really benefit and be used in the real world.

0:11:08.9 Joseph Morais: You know, I know what you're saying. You know, most, I think most of our customers are brownfield. Like there's a lot of great startups and things like that and they have the advantage of starting green, but even they have to kind of be concerned about future state because if you're a really successful startup, you may someday acquire a company and suddenly you have all that brownfield baggage. So you really have to future proof your data streaming architectures to take into account, hey, I may have legacy systems. I need to introduce this someday, maybe not. But you certainly don't want to put yourself in a position that you can't do that, right?

0:11:41.8 Anna McDonald: Look, everybody started as greenfield.

0:11:45.8 Joseph Morais: Right, that's true. It's true.

0:11:47.3 Abhishek Walia: Like I said, right. The ESP piece, we started with the standard ESP, it was not able to scale and we realized that pretty quickly. So I mean, a lot of times people just don't realize that the scale they're going to get to, like you just mentioned, right? The startup world, they'll have a minuscule scale and suddenly they realize that there is so much influx coming in that now the scale is much bigger. Those are the times when you realize, oh, I should have probably thought of this a little better or a while before.

0:12:18.8 Joseph Morais: Yeah. Undoing those choices is non trivial. I mean, almost at any scale, but certainly at some of the enterprise scale that we deal with here at Confluent with our customers. So now that we know you both better, Anna, I'd like for you to help break down the fundamentals of stream processing, building off of what we did in episode one with Tim where we talked about data streaming. So Anna, can you explain for those listening and watching what stream processing is and how it differs from batch processing? 

0:12:45.5 Anna McDonald: Yeah, the way I like to say it, and I just had a really great conversation with my best friend Matthias J. Sax, Dr. Matthias, as he's known today, about this. And one of the things in eventing, I think it's a concept that people find super hard to get your head around if you're like an object oriented person, is there's no pause. Like in data streaming, there's no pause, you can't ever be certain. And when you think about a batch job, and I'm old, like I legitimately have two pagers and two flip phones in like a basket because it's fun to keep them from being on call. Like batch jobs, like you're throwing that in cron, there's a huge pause. That's why you have time, you know. And so I think to me the big difference in mindset and how you architect and how you plan is the fact that there's no pause, there's no pause in stream processing. You're continually ingesting data. And so it's not like I do all this stuff, then I move to the next step, then I move to the next step, then I'm, and that can be incredibly difficult to get your head around.

0:13:47.0 Anna McDonald: And to me, my mind, there are still use cases which are very difficult to event stream or use stream processing on because of the complexity. And not to say that there isn't a future for those, I think, you know, with some fiddling, you know, there's some things on the horizon that could make that, you know, much, much easier and more elegant. But, you know, that's the main difference to me. And I know that probably sounds very abstract, but if you're somebody who's ever gone from normal to eventing, at some point you probably would have gone, when does it stop? It doesn't stop. I need it to stop, so.

0:14:20.9 Joseph Morais: I like that. That key is, there's...

0:14:22.4 Anna McDonald: You'll understand what I mean.

0:14:24.0 Joseph Morais: Yeah, right. Just, can I get a pause? Like, can you get a second so I could do something? 

0:14:27.5 Anna McDonald: Correct.

0:14:28.4 Joseph Morais: No, that's the whole point. You have to build your system to handle that pauseless world, right? Just a deluge of events.

0:14:35.3 Anna McDonald: Correct.

0:14:36.0 Joseph Morais: That's the key to real time, to building things out of real time, though. So for our viewers and our listeners who may be a bit more familiar with data streaming but not stream processing, how does stream processing differ from processing in a consumer-producer application using those standard data streaming APIs?

0:14:55.4 Anna McDonald: Yeah, I mean, I think the difference is that, I'm a consumer, all I'm doing is ingesting information, right? I'm just getting some information and doing something with it. If I'm a producer, all I'm doing is producing data, I'm giving data away, here it is, look at it, it's beautiful, right? If I'm doing, usually when we say stream processing, what we mean is I'm consuming data, I'm doing a transformation to it, and then I'm giving it back to the world. So it's a coupling. And that's usually the definition that people say, it's really that mutability in the middle where I'm enhancing, I'm enriching, I might be subtracting, I might be cleansing something, removing PII. It's not always additive, but there is a transformation that occurs before I'm sharing that back out to the world.
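
To make the consume-transform-produce pattern Anna describes concrete, here is a minimal Kafka Streams sketch: read from one topic, apply a transformation, write to another. The topic names and the crude PII-masking rule are hypothetical placeholders, not details from the episode.

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class MaskPiiApp {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "mask-pii-app");        // hypothetical app id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder broker
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            // Consume, transform (here: mask card-like 16-digit numbers), and produce back out.
            KStream<String, String> raw = builder.stream("orders-raw");
            raw.mapValues(value -> value.replaceAll("\\b\\d{16}\\b", "****"))
               .to("orders-clean");

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }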

0:15:44.4 Joseph Morais: Now, can you share a real world example of stream processing in action? I know you probably have a bunch.

0:15:49.8 Anna McDonald: I do. I have one favorite one. Maybe I use it too much, but it's still my favorite because it's like Darth Vader, you know, those stream, you know, like sleep apnea masks. I think that's a great one because, you know, it's gotten to the point now where people are not only, you know, having this information collected that can go with your consent automatically to your doctor from a durable medical device, right? And we know that, you know, people say, "Sleep apnea, hahaha," but it can cause heart problems, it really, really can. And so these types of, you know, adjustments, these medical devices can really save people's lives. And understanding that information in real time, streaming that information, knowing what's going on is huge. It's huge for you know, a number of the pattern matching all of these things. And that wouldn't have been possible before, so I think that's a real world example. One of my other favorite ones is, I kid you not, there, and I can't say who there is a department of transportation in a state that can detect if somebody drops a couch on, like the expressway? Yeah, totally.

0:16:54.2 Anna McDonald: So they have sensors built up and they can tell, tell if there is like a, like basically they can tell if there's a decrease in speed. Now it could be for something else, like an accident, you know, da da da. But my favorite example ever was they were like, well we knew you did a good job when we detected that somebody threw like a huge love seat, like a leather love seat in the middle of the expressway, because everyone was like slowing down and going around it. I thought that was awesome.

0:17:19.1 Joseph Morais: Oh, I love that. Like I could imagine seeing a traffic warning saying heavy traffic ahead, you know, couch in the.

0:17:25.1 Anna McDonald: Correct.

0:17:25.8 Abhishek Walia: I see out of the box thinking, if nothing else.

0:17:28.5 Anna McDonald: Yes, exactly, yes. So, so I mean it's everywhere that you could possibly imagine.

0:17:35.3 Joseph Morais: So I really like your first example. I actually do have sleep apnea, I use a CPAP machine so for, for any of our viewers and listeners, our devices, at least from this particular manufacturer, they can phone home. I think they have like 2G, 3G access and they send data, opt in data to our doctors, you know, to make sure, are you wearing it? Is it effective? Are you still having events in the middle of the night and then your doctor can remotely make changes to that. So it's really a fantastic example. So I appreciate that one, Anna. Can you explain how real time stream processing helps organizations out compared to traditional processing methods? 

0:18:12.7 Anna McDonald: Yeah, absolutely. So traditionally, and I'm going to use her real name, there's a lady named Ethel who like gets into work every morning and she's like what went wrong? Like who do I need to call today? Now, you get an alert on your phone instead of talking to the lovely Ethel. I mean that's the difference is, you know, for example, like something simple like banking, you know, like setting an alert if a payment or withdrawal goes through over a certain threshold, then getting that alert in real time. You said checkbooks, man. You know, like, you know, you'd have to go to a bank teller or an ATM, you know, to try to figure this stuff out. I am an introvert, so talking to people is always exhausting. You know, you're welcome, I'm on this. But the, yeah, it also allows you to have, like, flexibility in your life. Like, I remember when ATMs, and I use that example a lot like ATMs first came out. You don't even have to go to your bank branch anymore, right? Like, what if it's far away? It's the same, you know, if you have, if you, you're busy or you want to do like, you know, go, you got kids, you got something, you can check these things.

0:19:16.6 Anna McDonald: You're not going to get, like have fraud that goes on for a month because, oh, I couldn't make it to the bank today to check my balance or do, you know, it's about bringing value to your life, where you are and where you're living. I will say that I think that it also should, and this is something I recommend to everyone, it's a new year, new you. It should give you more time to also unplug. So there's a tendency of people to say, "I can do everything everywhere. I'm going to keep doing everything everywhere without taking a break." You know, like, for me personally, I like to go antique shopping into, like, Goodwill, and I like to find weird stuff and look up the history of it. It's nice, it's relaxing. So make sure you're, like, using those gains to bring some, like, zen into your life. It's very important. That's my advice.

0:20:12.1 Joseph Morais: That's really good. I never thought of that, stream processing is like making our life's tasks more efficient, right? So we get time back so we can do the more fun things like.

0:20:19.9 Anna McDonald: Yeah, do something good with it, though. Don't do more, more, more, more, more, more, more, more, more. Until you unplug.

0:20:26.6 Joseph Morais: Right. You're like, "This is efficient, let me take some time back for myself." I like that. That's a very good mantra. So, Anna, what is the, again, this show is for someone who may have no experience. So this may seem like a very lay question, but I still want to ask it anyway. For anyone who's here exploring all of this for the first time, what is the role of event driven architecture in stream processing?

0:20:46.9 Anna McDonald: Yeah, I don't think it is though. Like, that's a good, it's, yeah, it's not an easy question to answer always, we get asked that, Abhishek knows this people, you know, they mix the terms up, which is fine also, like, what are you, the grammar police? As long as people understand the concepts, you're good. But the difference between, and it's not even a difference like data streaming. It's just data that's streaming in, it's coming, now that data could be events or it could be something else, it could just be observations, eventing. The easiest way to tell if something's an event is, it's something that happened. It's in past tense, simple English. So when you look at a message, is it telling you that something happened in simple English or is it just a payload? Is it just an observation? Is it just a re, you know, that's the difference. An event driven architecture is a way to, now that we can observe in real time the world, we gotta be able to do something. How do you find a construct that lets you do something with all of this data? Now that I know what's happening, like, it's kind of big brothery, but now that I can watch everyone sleep at night, I should probably do something about it. So that's where event driven architecture comes in.

0:22:00.6 Joseph Morais: Would it be fair to say event is really data plus a timestamp? 

0:22:04.9 Anna McDonald: No.

0:22:05.6 Joseph Morais: Okay.

0:22:06.1 Anna McDonald: It's really like, well, it's more like, for example, to use the sleep apnea thing. Right? It's more like, you could say, like, let's say that there's a range of good. We want to fluctuate, we want to make sure that this is, you know, you look at oxygen saturation, you can look at like a lot of things, right? 

0:22:21.4 Joseph Morais: Sure.

0:22:21.6 Anna McDonald: Breaths per minute, like how long you've gone without breathing. I think that's one of the ones is like, you know, the interval in which you do not take a breath. And they measure that by oxygen saturation, I think. Please feel free to correct me if I'm wrong, I'm probably wrong.

0:22:34.9 Joseph Morais: You're onto it. Basically there's this like a...

0:22:37.7 Anna McDonald: Yeah, I'm not a doctor.

0:22:38.0 Joseph Morais: It's like how many [0:22:38.8] ____. I think you have, there's some that are obstructive, some are central, like.

0:22:42.1 Anna McDonald: Yeah, see, it's complex. I'm just making that way.

0:22:44.8 Joseph Morais: Probably an apnea show, though.

0:22:48.5 Anna McDonald: Correct, Right? So the way that you detect that, is by sending these measurements.

0:22:51.5 Joseph Morais: Yes.

0:22:52.1 Anna McDonald: Now, an event: you would look at those measurements and you would go, "Oh, we are outside our bounds."

0:22:58.3 Joseph Morais: Yes.

0:23:00.1 Anna McDonald: I'm going to send an outside of bounds event. Data coming in. I'm able to do events. Now that's not to say you can't throw events directly, you can, but that's an example of the difference. One is the data, the other are the events that you're grasping from those data. Now, you know, not to say you couldn't do that on firmware, right? Not as flexible does, you know, it's probably not as flexible. Now you can make changes, probably more reliable, there's durable medical, you know. Maybe that's some of the adjustments we make. But whoever's responsible, the event is based on what's happened and what, and that sometimes comes from data streaming, if not an application directly. So that's the way that I look at it. Because what else would you do with all that data? You know what I mean? You need to do something with it.
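
A rough sketch of the data-versus-events distinction Anna draws: a plain Kafka consumer reads raw readings without pause and publishes a past-tense event only when a reading falls outside its bounds. The topic names and the oxygen-saturation threshold are illustrative assumptions, not details from the episode.

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class BoundsEventService {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "bounds-detector");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
                 KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                consumer.subscribe(List.of("oxygen-readings"));
                while (true) { // there is no pause: readings keep arriving
                    for (ConsumerRecord<String, String> reading : consumer.poll(Duration.ofMillis(500))) {
                        double spo2 = Double.parseDouble(reading.value());
                        if (spo2 < 90.0) { // hypothetical threshold: the reading is outside bounds
                            // The observation is just data; the event records that something happened.
                            producer.send(new ProducerRecord<>("bounds-exceeded-events",
                                    reading.key(), "SaturationDroppedBelowThreshold:" + spo2));
                        }
                    }
                }
            }
        }
    }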

0:23:41.1 Joseph Morais: Absolutely. Actionable data means action. So tell us about Kafka Streams. Anna, was KStreams the original stream processing option for Kafka? 

0:23:49.6 Anna McDonald: I mean, I think yes, I would go absolutely, yes, it was absolutely done with the intent of, here is a simple way to do event streaming. According to the Kafka protocol, inheriting all the goodness of Kafka along with some of the things that in some use cases are not so good, because you're bound by partition. And I think the goal of it was EOS, for example, transactions. The first citizen of transactions was intended to be KStreams, that's how we get exactly-once semantics. And I think it's been hugely popular. I would say the right way to ask it is what you said with Kafka, you can always use external frameworks like Akka, Akka Streams has been out there for a very long time as well. There were other frameworks but I would say, you know, Kafka Streams is the direct relative to Kafka. Now that being said too, it's only in Java, right? And so, you know, I don't think it would be fair for people to, like, you know, it's not an opinion that I have now. Do I love Kafka Streams? Yes. Did we win an Oscar? Which I believe is right back there, for, I believe it says, best streaming library in the world, and no, I'm sure I didn't order that. Also yes.

0:25:04.6 Anna McDonald: But it's a streaming library, it's a JAR file, that's all it is, you know what I mean? And it makes it so accessible. That's where my heart is. When people get started, it's an easy way to get that mindset shift and then you, if you need other tools for other things, you can build on that. But I think it's amazing, you know, and again, it's just a JAR file.
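
Because Kafka Streams is just a library dependency, the exactly-once semantics Anna mentions is switched on with a single configuration entry. A minimal fragment that would sit in the configuration block of a Streams application like the sketch shown earlier; the application id and broker address are placeholders:

    // Kafka Streams ships as a plain JAR; there is no separate cluster to operate.
    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "payments-app");       // hypothetical app id
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder broker
    // One line turns on exactly-once processing (EOS) for the whole topology:
    props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);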

0:25:21.5 Joseph Morais: Right? Easy way to get started and get used to building things without pauses, I like that.

0:25:25.9 Anna McDonald: Yeah. Absolutely. And that is 100% true. That's usually where you figure they're like, "But wait." You're like, "Ah, but there is no waiting."

0:25:33.0 Joseph Morais: No. So how do you think organizations will use stream processing in the future? Do you have any visionary use cases in mind that we can geek out about? 

0:25:40.5 Anna McDonald: I mean. Well, I think that there can, there are probably some really wonderful advances if done correctly and locally that can help with medical care. And we've already seen that where we see, you know, AI detecting cavities, for example. There are places where there is a very low per capita rate of dentists. And so, you know, and teeth, I mean that could, your mouth is like the gateway to a healthy body. It's just true. People brush and floss. But you know, there are places all over the world where medical care is in short supply or unreachable. And so those types, anything you could do with remote diagnosis, I think would be awesome and huge. I think that would be wonderful.

0:26:29.9 Joseph Morais: So our next segment is called The Playbook, where in this case Abhishek is going to dish out some winning strategies for getting old, tired, unmoving data in motion. So I'm very eager to have your insight here. So Abhishek, we're going to focus on you. How do frameworks like ksqlDB and Apache Flink differ in their approach to stream processing versus say KStreams? 

0:26:52.5 Abhishek Walia: Oh, that's a heavy one. Let's start with the simplest one. The development interface, for example, is very different for both of them. One, like what Anna said, it's a JAR file. It's an embeddable JAR in your Java code. So you write Java code just like you normally do and you run that, and it just does the stream processing in the background with the Java code itself. Flink and KSQL, for example, are in a different realm altogether where they run as clusters, you run jobs there, they do the job, they emit out the result set or whatever it is. The whole premise is that for me personally, this is one of the biggest differences from a development perspective, that one is your regular native development interface, as long as you work with Java, of course. For others they are like frameworks which are deployed somewhere else, clustered compute, for lack of a better term, they do that. Architecturally, I think they are a lot different as well. Like for example, yeah, the state management. For example KStreams, KSQL, both rely on Kafka for their actual persistent stores. Now both of them use RocksDB underneath as well for local storage.

0:28:16.0 Abhishek Walia: But again the actual storage is in Kafka. So if something goes wrong, something goes kaput, in that deployed code, when a new instance comes up, it feeds back data from that Kafka topic, rehydrates itself and starts working again. Simple, there are no additional components. All you need is Kafka and you're good to go, basically that's it. If it's something like Flink, it has its own internal storage, so it does its own checkpointing, watermarking, storage, stuff, short retention, all that stuff is done in Flink. It works with Kafka because it has connectors to Kafka, the source, sink, both, but it's its own framework and its own deployable where it works. So little bit of a difference. It's a major one, but again from an operational perspective that becomes a bigger thing to understand, right? That you need extra storage on the Flink side, you need additional compute nodes, all that stuff for KSQL and Flink, whereas for KStreams you're just running it as a Java job, that's it, that's all.
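
To illustrate the operational difference Abhishek describes: Kafka Streams rebuilds its local RocksDB state from changelog topics in Kafka, while a Flink job snapshots its own state to storage that the Flink cluster manages. A rough Flink sketch, with the checkpoint interval, storage path, and trivial pipeline as illustrative assumptions:

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class CheckpointedJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // Flink owns its durable state: snapshot operator state every 60 seconds.
            env.enableCheckpointing(60_000);
            // Where snapshots land is Flink's concern, not Kafka's (path is a placeholder).
            env.getCheckpointConfig().setCheckpointStorage("file:///tmp/flink-checkpoints");
            // A trivial pipeline stands in for real sources, transformations, and sinks.
            env.fromElements(1, 2, 3).print();
            env.execute("checkpointed-job");
        }
    }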

0:29:20.5 Joseph Morais: Yeah, it's always in the detail. So let's spend a little bit more time on Flink. Do you think our squirrel friend makes stream processing any more accessible to users? 

0:29:30.5 Abhishek Walia: Yeah, I personally believe, I think so. This is where I love having Flink as one of those contrasting frameworks where it combines streaming, it combines to an extent batch processing as well into the whole ecosystem. And another thing which it does, the huge part about Flink which drives me towards it, is Java, and we have been talking about this, right? KStreams, Java only. KSQL has been SQL only, SQL-like only. Whereas Flink on the other hand has a Java API, it has a Python API, it has a SQL interface. So it has multiple places where you can say, "Hey, okay, my biz devs can just use SQL so they can go into that particular realm." The actual developers who love working in Java, Python, they can just go to that other side of it where they can work through the native APIs and get the same job done in a different fashion, which is much more controlled on their end. But again, this is majorly, for me, I feel where Flink is a little more approachable, because it blends both of those approaches together and says, "Hey, okay, if you need this, you still can use this, if you need this, you can still use this." And that's where I think it's the most powerful at this point.
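
A small illustration of the blend Abhishek describes: a Flink SQL statement can be defined and run from Java, while Python and the Table or DataStream APIs work against the same tables. The table definition, topic, and query below are made-up examples and assume the Kafka SQL connector is on the classpath:

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class FlinkSqlSketch {
        public static void main(String[] args) {
            TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

            // Declare a Kafka-backed table (names, topic, and format are placeholders).
            tEnv.executeSql(
                "CREATE TABLE orders (order_id STRING, amount DOUBLE, ts TIMESTAMP(3)) " +
                "WITH ('connector' = 'kafka', 'topic' = 'orders', " +
                "'properties.bootstrap.servers' = 'localhost:9092', " +
                "'scan.startup.mode' = 'earliest-offset', 'format' = 'json')");

            // Analysts can stay in SQL; Java or Python developers can use the Table API instead.
            tEnv.executeSql("SELECT order_id, amount FROM orders WHERE amount > 100").print();
        }
    }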

0:30:53.0 Joseph Morais: Well, squirrels are usually pretty flexible, so it makes sense. It was a very appropriately picked mascot. And to kind of, you know, just to add, I agree. I mean there's something very powerful to say, hey, your back end engineers, your data scientists, even DBAs, all of them can interact with it how they want, right? Do they want to use SQL queries for the DBAs, do they want to write things in Python maybe for the data scientists? And if you still want to work with, you know, the Table API in Java, you could do that. It's very powerful to have one thing that makes data accessible to all those different personas. So are there any specific use cases where someone should consider KStreams over Flink or vice versa?

0:31:30.4 Abhishek Walia: One of the big ones that come to my mind is, and this is a little bit more involved here, right? And I think I was talking to Anna about this a while ago as well. The event time is something very different. The way time works for Flink and KStreams is very different. So those can kind of change the way you look at use cases, right? For example, the way Flink can actually do its aggregation, the windowing and all that stuff, is by wall clock time, right? It does use wall clock time to close all those windows, aggregation and stuff like that. Whereas for Kafka Streams, by default it uses the event time for doing all that stuff. So it needs an event to close a specific window, otherwise it does not close it. So there are use cases which are driven towards that. There are use cases driven towards the wall clock time. So pick your choice according to what your needs are, and that is, this is very specific. Not everybody, when they start using it, right, realizes that. But once you quickly start using it and getting into the deep end, you'll start realizing, oh, this is a little different. I'll have to maybe tweak my use case a little bit to work around that stuff.
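
On the Kafka Streams side of the timing difference Abhishek calls out, a windowed aggregation is driven by the timestamps carried on the records, so a window's result only moves forward as new events advance stream time. The topology fragment below would slot into a Streams application like the earlier sketch; the topic names and window size are illustrative:

    StreamsBuilder builder = new StreamsBuilder();
    builder.stream("page-views", Consumed.with(Serdes.String(), Serdes.String()))
           .groupByKey()
           // Windows are defined over record (event) timestamps, not the wall clock.
           .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
           .count()
           // Results only advance when later events push stream time forward.
           .toStream((win, count) -> win.key() + "@" + win.window().start())
           .to("page-view-counts-per-5min", Produced.with(Serdes.String(), Serdes.Long()));
    // A Flink processing-time window, by contrast, closes when the wall clock reaches the
    // window boundary, even if no further events ever arrive.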

0:32:44.9 Joseph Morais: Right. So when you really get into like the critical requirements for your use case, you start to look at, you know, those slight differences and figure out which one is the right one for my use case.

0:32:53.1 Abhishek Walia: Yeah. And the other big one. Right. And so this is where another thing for Kafka Streams, it was designed as a lightweight, pretty lightweight Java library, that's it. It was supposed to work with your Java code to get you into data streaming pretty easily, make it work with Kafka as easy as possible and get your job done. Whereas Flink had a compute intensive infrastructure in mind, it was built for compute, small, large, extra large, doesn't matter. The whole point was that hey, I'll give you a compute instance where you can do whatever you like. If you give me data, I'll give you the compute and get the output as necessary. So there is a little bit of a difference in the way they are optimized as well for use cases.

0:33:44.5 Joseph Morais: So this next question is for both of you and we'll start with you, Abhishek. Can you share a specific tactic that has significantly improved the customer stream processing journey? 

0:33:55.7 Abhishek Walia: The easiest one to pick here is: POC as early as possible. Once you do the POC right, you'll realize what the immediate points of issues, contention points might be for you and how to resolve them, move forward. And once you get into the actual implementation stage or the design stage, you will basically start with much faster speed. And you just go, "Hey, okay, check, check, check. We've already done this, this is already done. We can move forward." The other big one for me, and I love telling my customers this, is observability. If you start it from the get go, you'll be set for life.

0:34:34.3 Joseph Morais: Yes.

0:34:34.8 Abhishek Walia: If you don't, you will be in a painful state when you will realize it and then the implementation is going to be much harder as well, at that point of time.

0:34:43.6 Joseph Morais: It's surprising how many people build things out without having metrics in mind. That's a really great one.

0:34:46.5 Abhishek Walia: Exactly.

0:34:50.4 Anna McDonald: I would say, I mean you're 100% right. The first game that you start playing, the first question you ask as soon as you event, is the game of where's my message. You're like ten microservices in, "Where, where's my message? Oh no." So being able to know that and get that answer quickly is very important. I would say the one thing that I've seen is I always say allow yourself to automate and accept and accommodate your manual processes naturally. And what I mean by that is have an error router, have a queue. And people always say things like, "Well, you're never going to be able to enumerate every single possibility." Some things you may not be able to automatically solve now, some things you don't, you don't know what you don't know. But if you build an error routing service or a service that can flag these things up, then as you build these profiles and you go, "Oh well this is easy, I just have to resubmit this" or, "This is easy, I can do X," you can grab that event and then build a process to automate it. The ones that you can't, pick them up, open a ticket. I've seen, you know, that is still an acceptable thing to do these days. At least you know, at least you don't have to go figure it out, then open a ticket.

0:36:00.3 Anna McDonald: At least you've got an error message with context, like it's a beautiful thing. Right? And so that's the thing I think to me I've seen people do early and be very successful with, because they're able to show this continuous improvement to the old process. Well, think about this. Before this we had to open a ticket and it took 24 hours. Anything you can resolve now automatically based on that event you threw, right, business value. So while at the same time anything you can't, you still have a way to do it. So that would be, that's kind of like a, I was looking for one that I've seen that is not as obvious as some of the other ones I could name. And that is one I think, you know, error routing service. Embrace it, love it, have a schema, make sure everybody has to use it.
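
One way to read Anna's error-router advice as code: when a record fails processing, publish it to an error topic with enough context (original topic, offset, error message) that easy cases can later be replayed automatically and the rest can open a ticket. A rough sketch; the topic name and header keys are hypothetical, not a standard:

    import java.nio.charset.StandardCharsets;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class ErrorRouter {
        private final KafkaProducer<String, String> producer;

        public ErrorRouter(KafkaProducer<String, String> producer) {
            this.producer = producer;
        }

        // Called from the catch block of the main processing loop.
        public void route(ConsumerRecord<String, String> failed, Exception cause) {
            ProducerRecord<String, String> error =
                    new ProducerRecord<>("payments-errors", failed.key(), failed.value());
            // Carry context so the failure can be triaged, replayed, or turned into a ticket.
            error.headers().add("source-topic", failed.topic().getBytes(StandardCharsets.UTF_8));
            error.headers().add("source-offset",
                    Long.toString(failed.offset()).getBytes(StandardCharsets.UTF_8));
            error.headers().add("error-message",
                    String.valueOf(cause.getMessage()).getBytes(StandardCharsets.UTF_8));
            producer.send(error);
        }
    }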

0:36:46.4 Joseph Morais: Yeah, it's really powerful because especially if you're automating and like if somebody was, if a group or an organization was manually dealing with all those errors, now you automate some percentage of it. Now you've opened up more bandwidth to kind of remediate the other issues so you can slowly over time shift it almost everything to automatic and then you have this manpower there that can handle the one offs. So yeah, that's a really good one, I didn't think about that.

0:37:10.7 Abhishek Walia: As an extension of that, I would just want to add, maybe start with the fully managed services first when you're just exploring things.

0:37:19.5 Joseph Morais: Sure.

0:37:20.3 Abhishek Walia: That basically cuts down on your deployment times, your figuring-out times, your OPEX to get the systems out, installing them, figuring out the networking and making sure everything is tuned, optimized and working for those sets. Most of the fully managed systems have already kind of done that for the customers. Even if you just want to start, start there. If it's a use case fit enough for cloud, stick with a fully managed service, that's probably going to save a lot of cycles for you.

0:37:50.6 Joseph Morais: That's true. I mean it really goes back to what you said like start with a POC. What's the fastest way to do that with a fully managed service. Because as you mentioned, you know Flink itself has its own processing, its own compute nodes and kind of stringing all that together, open source, is really non trivial especially if the whole idea is to make sure that this is going to work for us in the first place.

0:38:08.3 Abhishek Walia: Exactly.

0:38:09.2 Joseph Morais: So continuing with you Abhishek, what are the top three tools a data streaming engineer should have in their stream processing toolbox? Or are there any tools they should avoid like batch workloads? 

0:38:19.5 Abhishek Walia: I'm a little biased here so I'm going to start with tools first. Kafka, Flink, Kubernetes to a certain extent, and then observability. I've been a really big proponent of observability, any observability tools, you'll want that. Kafka for its scale, right? You will have, you will hit the ground running with as much data as you want. You won't have to care about it. Flink for massive compute so that you can do all the compute on the data that you're getting, so the volume, the veracity, velocity, all that stuff won't matter. Kubernetes so that you can scale all that compute out if you need to, easy, quick way, right? Observability, we know we need to see all that stuff, where the data is flowing, how that's flowing, all those error messages, all those things to observe, making sure that everything has been done properly, observability. And initially, just as a brownie point for Confluent, I think the VS Code extension will probably be something which I'll tell any new developer to say, "Hey, okay, you're getting into Confluent. Okay, use this VS Code extension. This will help you streamline your workflow much faster than how you would do it in like five item windows."

0:39:33.7 Joseph Morais: Yeah, so much excitement around that piece. That extension is awesome. So Anna, how about you, any tools you could call out? 

0:39:41.8 Anna McDonald: Yeah, I see, I love this because that's the reason why Abhishek spends a lot of time with digital natives. My answer is going to be like you should find something that runs in the Zip, you know what I mean? Yeah, I Spend a lot of time with people that are, you know, I mean, they're still using the clips, you know what I mean? Like it said, right? They, yeah, no, but, but I mean, I think, you know, to Abhishek's point, if everybody could do that and most people can at this day and age, I'm kind of being a little, I think, like, data streaming engineer is an interesting term. What's a data streaming engineer? Is that like, what is that? What's our definition of that? You can tell I was raised by a lawyer. I can't answer a question without asking a question. Are we thinking of like a data streaming engineer as somebody who's writing day one use cases like that are running the business or are we thinking of them as more of somebody who's writing, you know, analytical programs on the back end or things that are like 24 hour days? Or is it all of the above? 

0:40:45.1 Joseph Morais: They're all the above. Anyone that's working without pauses, based on your definition, I think is a data streaming engineer.

0:40:51.2 Anna McDonald: Yeah. Then, you know, I would say make sure that you have access to the list of best talks on the Apache Kafka website. Don't ever undervalue, like that's one of the first things I tell people to do is go watch the list of the best talks. There are great talks from customers, from people using Kafka in the real world. There are great talks from Apache Kafka engineers about internals. If you want, anything that's your slice. So I always think you should have knowledge as a tool. And there are different ways, different people learn in different ways. If you don't want to watch a video because like sometimes that's annoying and I, but sometimes I, you know how people speed them up, sometimes I slow them down to see how drunk people sound. Tim Berglund and I, we sound trashed.

0:41:38.3 Joseph Morais: At 0.5. Yeah. Okay.

0:41:40.0 Anna McDonald: Yes. It's so embarrassing. We're like, it's great. But I think, you know, there, if that's your way of learning, that's a great resource. If not, there are so many good Kafka books out there about Kafka essentials. Then there's Kafka Streams, you know, Bill Bejeck has his book, I think Mitch Seymour has another one. If reading is, you know, that's the way you absorb knowledge. And then, you know, to Abhishek's point, if doing is the way you absorb knowledge, grab, you know, whatever IDE you happen to be in. VS Code, you got an edge because the new one is really, really good. But that's easy, like I just remember bashing my head against the wall going, I said, I remember looking at the KStreams example repo and going, "Who wrote this? Like somebody whose PhD is in." And then I was like, "Oh no, it is Matthias." So yes.

0:42:31.8 Joseph Morais: That why, I thought, I wrote it...

0:42:34.1 Anna McDonald: Yeah. He was like, "I thought it was easy." I was like, "Yeah, I'm sure you did. I'm sure you did. You wrote a dissertation on it." So, that's the thing, now the examples that are out there, I think, are much easier. Like, you know, it's frustrating, even something as simple as printing an example that doesn't have a schema attached to it. That's Avro that needs you to stand up Schema Registry.

0:42:52.5 Joseph Morais: Yeah.

0:42:53.0 Anna McDonald: What the hey? So that's why like I always put all of my like demos in my GitHub repo. They just use JSON that's hard coded in a JUnit test. So you can change it to be your JSON.

0:43:03.5 Joseph Morais: That's right. You don't even need to figure out what a schema is.

0:43:06.2 Anna McDonald: Correct. Like, so I would say that too is, you know, find good examples, play around with them. Like you know, to Abhishek's point, like get, get, so that's what I would say. Curiosity is the ultimate tool.

0:43:16.0 Joseph Morais: Oh, I love that. I like this idea of going back to like these talks that are, you know, they might be a couple years old, but they're still relevant.

0:43:22.5 Anna McDonald: So relevant.

0:43:23.3 Joseph Morais: That's the reason why they're popular and they really help you kind of build up that foundation of the right way to do things. And I can't help but laugh and think that there's this, the segment of people out there that learn by listening to people that are drunk, like to speeches.

0:43:36.2 Anna McDonald: Drunk history.

0:43:38.4 Joseph Morais: Reasonably watch things on point. Well, yes, exactly. Like that's how I learn. I learned through drunk histories.

0:43:44.2 Anna McDonald: That's right.

0:43:44.9 Joseph Morais: That's great. So Abhishek, in your experience, how do customers evaluate and select new tools for their data stack and how do they balance factors like scalability, cost and integration with their existing systems? 

0:43:56.5 Abhishek Walia: So I'm going to speak again from a digital native perspective like what Anna mentioned earlier, right? And most of my customers are pretty much cutting edge. For them, the biggest thing is can your, any of your tools handle the volume, velocity and veracity? If they can handle the 3Vs, we can put it into the evaluation bucket. If they can't, I don't think it's going to work out because right now the volume first of all is huge. The velocity is even worse because of the entire internet age that everything spawns events at this point of time. Anything and everything you do spawns events. The accuracy has to be really good because then that's only when sidetracked. But again, ad revenues for example, right? They come from when you do stuff and if you are missing those, the accuracy goes down, you lose revenue. That's one of the big things. And so all those things do have a prominent effect on at least my digital native side of customers. Other big points are actually if they can integrate really well with the legacy systems or not. Some of my customers still, a lot of my customers actually still have legacy integrations that they need to build up and say, "Okay, I'm not able to integrate with this customer."

0:45:09.3 Abhishek Walia: So that's where some things like the Kafka Connect framework come in, the integration ETL workflow. There are numerous of them available which can talk to Kafka, talk to the other side and put the data into Kafka as well, those come in handy. But if they don't have that, how do they integrate with newer systems? That's one of the big things as well. And another one nowadays, and I started seeing this pattern recently, is the reliability: is my data going to be available if there is an issue, and if not, what is my disaster recovery for that? I need my system to be running all the time even if one side of the system is not working. So those are the few things that I've seen the customers ask for.

0:45:58.2 Joseph Morais: Yeah, that's great. So, you touched on something I hadn't thought of, especially with digital natives. You know, a lot of non digital native customers are thinking about what can I integrate with existing systems, whereas digital natives may be thinking about, where am I going to send my data 2, 3, 4 years from now? Like future proofing the data destination. So that's an interesting factor. Anna, how about you? What about your customers from the non digital native side of things? How do they evaluate and select tools, similar way?

0:46:25.3 Anna McDonald: No, probably not. So some of it, there's overlap, right? I mean like, yeah, obviously like it has to work, but I think really there's a lot of like time to value. What's the time to value and then what's future proofing? And also, like I was just gonna say, welcome to my world, because I spend, you know, a ton of time on DR, BCP/DR. How do we do high availability? How do we stretch clusters? How do we do all of these things? Because it isn't that, boy, it's bad if it fails. It's that it can't fail. It absolutely cannot fail. Because if it does, people can't get money out of the ATM, people can't cash checks. You know, if flights don't take off, like there's all of these big things that happen, you know, if these things don't, these critical use cases. And some of those are digital native too. Like, you know, my mom wants to get her groceries, man. You know, like these are, there are things in that day and age too, but the regulatory component in the industries that I work with, a lot of them, is much, much, much higher.

0:47:28.7 Anna McDonald: Yeah. And so there's a lot of that type of intricacy, can we provide that level of security? Is it flexible enough? And also, again, right? Like can it talk to the mainframe? Because they're everywhere, you know, which it can, yay. So I would say like that is a big component. And the other component that I would say is how much expertise and how many known good, successful patterns for my vertical exist. Whether it's insurance, whether it's banking, whether it's health care, there are very few. There are people and they know who they are that are the leading people of these industries. And everybody else for the most part goes, does that work? What are they doing? Right? Literally. Right, exactly. And then they're like, hey, so that's really something because it's low risk. You're trying to balance risk versus reward and it's very risky to go first. So understanding and having these best practices in these industries, which we absolutely have, I mean, more so than I think any other technology when, absolutely when it comes to data motion, we have these beautiful patterns to handle pretty much everything in these kind of stalwart, I like to call them the money making verticals.

0:48:51.8 Anna McDonald: Some people call them by a different name. That's what I call them because I love the banks, yes. But that really helps. And I think it's really important because customers, they want assurance that this is gonna work, that we know all the edge cases, we know how to guide them, we know how, you know, and I'm very proud of the fact, very proud of the hours and toil that we save customers. That is always, you know, makes me feel really, really good, is like hey, right. I told them, you know, they would never have done that because I was there, ha ha. And now everything just works. And it's like, it's this whole thing, I have zero patience for making the same mistakes twice. I always say this to everybody in this household, everybody makes mistakes. You make a mistake, we don't got a problem, you make it twice, then we're going to have a conversation. So I view kind of part of the heart of our job is to live that mantra and no customer makes the same mistake twice, because we know. And I think that is hugely valuable. And I think companies do look for that, they look for people who are able to advise on that level, which I think is very valuable.

0:50:02.2 Joseph Morais: That's fantastic. Yeah. Who can help me pioneer, who can help me only make a mistake one time? That's a pretty powerful piece. And the fact that we closed on banks is a very good segue into our next segment. So we've heard your winning strategies and tactics, but now we're going to watch a quick clip from a real-world user of stream processing, and I'd love to get both of your reactions.

0:50:34.7 Ad Read: I lead a team that manages the operations, infrastructure and engineering at Citizens, and my job is to make data available to my business partners and empower them to deliver incredible customer experiences. The expectations of the customers are through the roof right now, and the need to make data available in real time or near real time has never been greater. Whether it is for fraud prevention, whether it's for risk management, whether it's for hyper-personalization, whether it's our digital channels and self-service channels, making data available in real time is very, very important. And really the two pillars of doing that are APIs and streaming. We have been, over the past three years, on a journey to modernize our technology infrastructure and applications. We have a lot of legacy platforms on the mainframe, for example, and it's really, really important to unleash the data from these. We're doing that using change data capture with Confluent. So I think we are one of the early adopters of Confluent Platform, and it's been a really key and important tool for us in unleashing data from the mainframe. When we came to the point where we had to stand up an enterprise-grade streaming data platform, we looked at multiple options and Confluent was a clear winner, and it fit right into our overall ecosystem.

0:51:56.0 Ad Read: The data footprint at any organization has increased exponentially. So how do you get data from point A to point B, and who are the consumers that need to receive that data, at what time? How do you ensure the data is secure? How do you ensure it's accurate? This is a very, very complex problem to solve. I think Confluent and what they do with streaming data make it really easy for us to exchange data between applications. This in turn enables the business to come up with actionable insights in real time or near real time. And the demand for this has never been greater. What is the future of streaming data at Citizens? I think it's going to explode. So where would we be without a data streaming platform? I think we would be out of business. Customers expect real-time experiences, and if we don't deliver them, they're going to go somewhere else. For us to be in business, stream data processing, real-time analytics on data in motion, and actionable insights are going to be really very, very foundational for our existence.

0:53:06.6 Joseph Morais: So we just heard from Citizens Bank about how they're using stream processing and data streaming for multiple use cases: hyper-personalization, risk management. We heard mainframe, we heard CDC. Do you think this is a common scenario, everything we heard from Citizens, across all the banks that we work with?

0:53:24.8 Anna McDonald: So I think, yeah, the thing you can tell is, I always talk about my Mox Credit Union. It's like, open Tuesday for two hours, and good luck finding a parking spot. And now you see credit unions participating in FedNow, you see credit unions going to real time, and credit unions have a very loyal customer base, very, very loyal. They're kind of like the leading and lagging indicator. And so if you look at credit unions, they're coming to the party now. Yeah. I mean, people expect these things. When was the last time you cashed a check? I'm just interested, I mean, honestly asking.

0:54:03.6 Joseph Morais: Yeah, I do it through like a mobile app so I take a picture of it.

0:54:06.7 Anna McDonald: Right. But that doesn't count. I mean when was the last time you put it in one of those pneumatic tubes? 

0:54:10.2 Joseph Morais: Oh my God, a decade ago.

0:54:13.1 Anna McDonald: Right. I actually miss those pneumatic tubes because they were super fun. But not at the expense of taking a picture on my phone and having it tell me to move to a darker background.

0:54:24.6 Joseph Morais: Yeah. I like that.

0:54:25.5 Anna McDonald: Right. So that's the thing: the expectations of consumers keep jumping ahead and ahead and ahead. And it's like what I talked about, it's getting that time back. But please, yet again, use it for something peaceful that fills you as a person, not to shove in more work.

0:54:40.0 Joseph Morais: Yeah, there's gotta be a more fun way to use those tubes. Like, I feel like people should deliver food through those tubes.

0:54:44.9 Anna McDonald: Oh my God, they're amazing. I love pneumatic tubes.

0:54:47.1 Joseph Morais: There'd be a sandwich in there, like, that would be the greatest.

0:54:49.5 Anna McDonald: I'm totally having one on my island.

0:54:51.5 Joseph Morais: Abhishek, was there anything else in that video that kind of popped out to you that you wanted to comment on? 

0:54:57.0 Abhishek Walia: At this point, honestly, I just agree with you guys; whatever you said is right on point, because the solutions have changed, because the technology is moving fast, and the expectations from customers are moving even faster. So earlier, back when I was in India, it was like, hey, we go to the bank, we get a passbook signed, which details all the transactions. They'll print it out in your passbook and say, "Hey, okay, these are the current transactions and this is the current balance." People don't care anymore. They don't want to go to the bank for all that stuff anymore. They just want, "Okay, what's my current balance and what can I do with it? Can I transfer it in real time without having to sign checks and hand over a hard copy to somebody else?" That's what people need, and that's what is pushing everybody, not just the banks, to get to more real time, rather than saying, "Hey, okay, it's going to take like five days to get your cash anyway."

0:55:56.6 Anna McDonald: Yeah, I like that take. I mean, to generalize it: if you can take tasks that people need to do and make them easier, better, faster, you're onto something. And banking is a great example of that. So we've discussed what stream processing is, strategies, what a data streaming engineer is, and we've heard from one of our customers on how data streaming is transforming their organization. Now it's time for the real hard-hitting stuff: our data streaming meme of the week.

0:56:41.9 Ad Read: This is on the nose, bang on point.

0:56:50.3 Joseph Morais: So I really like this meme. It really speaks to how difficult it is to set up stream processing, in this case Flink, alongside data streaming, and all the work it takes when you start thinking about the various angles: observability, scale, et cetera, et cetera. What are your thoughts on this particular meme, and does it resonate with you?

0:57:07.4 Anna McDonald: Is that Mr. Beast? 

0:57:09.0 Joseph Morais: That's Mr. Beast, I think, or KSI. One of them.

0:57:12.8 Anna McDonald: Yeah, well, I've got kids. That's my first thought. I know that man.

0:57:19.3 Joseph Morais: You know that man. Yeah. No, so by all means, whatever your reactions are. Yeah, I mean, it's a great meme. So go ahead, Abhishek, if you want to; what are your thoughts?

0:57:29.2 Abhishek Walia: At this point of time, this is something everybody runs into once they start deploying open source Kafka. This is where it starts getting harder: hey, I need more topics, I need more partitions, I need more disaster recovery, I need better reliability. I need things to just work. Why can't we just make them work? And this is where it gets to, right? You start small, everything just works. As it gets bigger, it gets harder to manage the tunables, and the laundry list of things to worry about actually gets longer. I love this post. First of all, the meme is awesome. That kind of sells the whole point of why you should look at a fully managed service before you start standing up your own system by yourself.

0:58:20.0 Joseph Morais: Right. Until it falls over on top of you and now you're crushed under your open source.

0:58:25.8 Abhishek Walia: Exactly.

0:58:25.8 Joseph Morais: Any thoughts, Anna? 

0:58:27.4 Anna McDonald: Yeah, I mean, I would say one of the first questions I always ask people when they say, "We're great at managing Kafka," is: what version of Kafka are you running? And then one of two things happens. Either they're doing a good job and they're not embarrassed, or they're like, "I don't... " and I'm like, "What was that? Sorry, louder." Right? Because the thing is, and that's another question, you just tease that out. You're like, "Hey, do you not upgrade because every time you upgrade, the clients fail?" They're like, "I don't... " And you're like, well, you know, maybe, right? So you kind of tease it out. And that's when I think you're at a precipice: when you have a really hard time even doing the simple thing. Multi-tenancy is super hard in AK [Apache Kafka], in terms of your rate limiting, your controls, your ability to box out. I love request quotas, though they're a bit esoteric to understand. It's not that AK doesn't have these things; it's just that there's so much DIY in running a truly multi-tenant stack yourself. Like, that's our job. And I think, you know, Dr. Anna Povzner, who's amazing and incredible, you can't get better than her. So you should use us because we have her; that would be my sales pitch. She's a genius, what do you want?
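For anyone who wants to see what the quotas Anna mentions look like in practice, here is a minimal sketch (not taken from the episode) using Apache Kafka's Admin API; the client id and the limits below are made up for illustration.

// Minimal sketch: applying per-client bandwidth and request quotas with the
// Kafka Admin API, one of the multi-tenancy knobs discussed above.
// The client id "team-a-app" and the numeric limits are hypothetical.
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.quota.ClientQuotaAlteration;
import org.apache.kafka.common.quota.ClientQuotaEntity;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class TenantQuotas {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            // Scope the quota to a single client.id (could also be a user principal).
            ClientQuotaEntity entity = new ClientQuotaEntity(
                Map.of(ClientQuotaEntity.CLIENT_ID, "team-a-app"));

            ClientQuotaAlteration alteration = new ClientQuotaAlteration(entity, List.of(
                // Cap produce throughput at roughly 1 MB/s for this client.
                new ClientQuotaAlteration.Op("producer_byte_rate", 1_048_576.0),
                // Cap consume throughput at roughly 2 MB/s.
                new ClientQuotaAlteration.Op("consumer_byte_rate", 2_097_152.0),
                // The request quota: share of broker request-handler time.
                new ClientQuotaAlteration.Op("request_percentage", 50.0)));

            admin.alterClientQuotas(List.of(alteration)).all().get();
        }
    }
}

The same settings can be scoped to a user principal instead of a client.id, which is typically how a multi-tenant platform boxes tenants out from one another.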

0:59:53.6 Joseph Morais: I love it. Yeah. And I think a lot of people start with these systems not thinking about them as multi-tenant, but especially in an organization like a bank, eventually you're going to want to have other teams on board, and you don't necessarily want to replicate all of that. Do you really need to rebuild everything, or run multiple Flink clusters on top of multiple Kafka clusters? That would be a big challenge. So don't do that; look at something fully managed. Now let's get to our next segment. So before I let you both go, we're going to do a lightning round: byte-sized questions, that's B-Y-T-E, byte-sized answers. Like hot takes, but schema-backed and serialized. Are you ready?

1:00:33.7 Anna McDonald: I'm ready.

1:00:34.4 Joseph Morais: All right. We're going to go back and forth.

1:00:35.7 Anna McDonald: Go Bills.

1:00:36.4 Joseph Morais: What is something you hate about IT? 

1:00:38.4 Anna McDonald: The clown.

1:00:40.1 Joseph Morais: The what? The clown?

1:00:41.6 Anna McDonald: Yeah. In IT, I don't like that clown.

1:00:46.2 Joseph Morais: Pennywise, just catching strays today. Abhishek, what's the last piece of media you streamed? 

1:00:52.5 Abhishek Walia: Squid Game 2.

1:00:54.4 Joseph Morais: Nice. I haven't; I still have to catch up on one. Anna, what's a hobby that you enjoy that helps you think differently about working with data across a large enterprise?

1:01:02.4 Anna McDonald: Lock picking.

1:01:04.0 Joseph Morais: Lock picking. Do you watch the Lock Picking Lawyer? 

1:01:06.2 Anna McDonald: Yeah, I've seen him before. His stuff is pretty good. I'm absolutely a long-time lock picker over here.

1:01:11.1 Joseph Morais: Nice. Okay, now how does that help you work with data across a large enterprise? 

1:01:14.9 Anna McDonald: Well, you have to unlock things one at a time. You have to be sensitive, and everybody has to work together or you're screwed.

1:01:20.7 Joseph Morais: I like it. And you don't always have the key, right? 

1:01:22.9 Anna McDonald: No, you never have the key in lock picking.

1:01:25.7 Joseph Morais: I love it. Abhishek, can you name a book or resource that has influenced your approach to building event driven architecture or implementing data streaming? 

1:01:34.7 Abhishek Walia: I would go with Designing Event-Driven Systems. That is a great book, I love that. It influenced me a lot when I was starting with Confluent. It was amazing.

1:01:50.9 Joseph Morais: That's great. Still a very popular asset for us, so I couldn't agree more. Anna, what's your advice for a first-time chief data officer, or someone with an equivalent impressive title like that?

1:02:00.2 Anna McDonald: I would say embrace evolution, don't just tolerate it. And what I mean by that is: embrace changes in your data schemas, embrace changes in your data sources, embrace changes in the regulation of your data. Make sure your design embraces evolution rather than just tolerating it; otherwise your life is going to be messy, and we see that all over. And I think that's rare today; people tolerate evolution, they don't embrace it yet.
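To make "embrace evolution" concrete, here is a minimal sketch, assuming Avro as the serialization format; the record and field names are hypothetical, not from the episode. Adding the new field with a default is what lets new readers keep consuming old events instead of breaking when the schema changes.

// Minimal sketch: evolving an Avro schema so new consumers can still read
// records written with the old version. The "Payment" record is made up.
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.SchemaCompatibility;

public class EvolvePaymentSchema {
    public static void main(String[] args) {
        // v1: what producers have been writing so far.
        Schema v1 = SchemaBuilder.record("Payment").namespace("example")
            .fields()
            .requiredString("id")
            .requiredDouble("amount")
            .endRecord();

        // v2: evolved schema. The new field carries a default, so a reader
        // using v2 can still decode records that were written with v1.
        Schema v2 = SchemaBuilder.record("Payment").namespace("example")
            .fields()
            .requiredString("id")
            .requiredDouble("amount")
            .name("currency").type().stringType().stringDefault("USD")
            .endRecord();

        // Backward compatibility: new reader (v2) against old writer (v1).
        SchemaCompatibility.SchemaPairCompatibility check =
            SchemaCompatibility.checkReaderWriterCompatibility(v2, v1);
        System.out.println("v2 can read v1 data: " + check.getType()); // COMPATIBLE
    }
}

A schema registry configured with backward compatibility checks enforces exactly this rule automatically, which is the practical difference between tolerating evolution and designing for it.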

1:02:22.7 Joseph Morais: I like that. Very good. Embrace change. Any final thoughts or anything to plug for either of you? 

1:02:28.2 Anna McDonald: Go Bills. Let's do this. That's my final thought. Let's go Buffalo.

1:02:34.6 Joseph Morais: Let's go Buffalo. How about you, Abhishek? 

1:02:38.9 Abhishek Walia: I think this was a great experience for me. So I'm learning so many new things now. Go buff.

1:02:44.0 Joseph Morais: Okay.

1:02:44.7 Anna McDonald: Yeah, that's right.

1:02:45.7 Joseph Morais: Yeah. That's great. Yeah. Jump on the train. Excellent. Well, thank you both so much for joining me today, Anna and Abhishek. And for the audience, please do stick around after this; I'll give you my top three takeaways in two minutes. Man, that was a great conversation with Anna and Abhishek. Here are my top three takeaways. First, I love what Anna said: there is no pause in data streaming. That's a whole different way of thinking about event-driven architectures. Now you have to start building things, you have to start writing code, that does not account for pauses, because those events are going to continuously come in and they need to be continuously actioned. So how do I know it's an event-driven architecture? Well, there are no pauses in the requirements. That's a great one. Second, stream processing, or the idea of stream processing, making life tasks more efficient. So this idea that now I don't need to go and deposit a check in person, or maybe one day I won't necessarily need to go to the dentist to see if I have a cavity, I'll just need to take a picture of my teeth. Incredible. And when stream processing makes life a little bit easier, make sure that you take that time back for yourself.

1:03:55.8 Joseph Morais: Don't just throw it back into more work. I mean, work is important and it's all necessary, but take a little time for yourself. And the last point that I really want to call out is that Flink is for everyone. It's super accessible, whether you're a DBA who loves SQL, or a backend engineer who wants to write everything in Java, or a data scientist who wants to query things through Python. Flink has modules and APIs for everyone that make it accessible to anyone who wants to work with data in real time. Really impressive. That's it for this episode of Life Is But A Stream. Thanks again to Anna and Abhishek for joining us, and thanks to you for tuning in. As always, we're brought to you by Confluent. The Confluent data streaming platform is the data advantage every organization needs to innovate today and win tomorrow. Your unified solution to stream, connect, process and govern your data starts at Confluent.io. If you'd like to connect, find me on LinkedIn. Tell a friend or coworker about us, and subscribe to the show so that you never miss an episode. We'll see you next time.