AI agents are already here, reshaping how data moves. In this episode, Airy’s Co-founder and CEO, Steffen Hoellinger, shares how his team uses real-time data streaming with Confluent to power agentic AI and deliver faster, smarter decisions at scale.
Agentic AI is transforming how modern systems interact with data, and Airy is at the forefront with its real-time approach.
In this episode, Steffen Hoellinger, Co-founder and CEO of Airy, discusses the transformative impact of agentic AI for enterprises and why real-time data streaming is a crucial component. From Apache Flink® and Apache Kafka® to the importance of focusing on core business challenges, Steffen breaks down how Airy builds intelligent systems that react to data as it happens.
You’ll learn:
If you’re building AI-powered apps—or planning to—this one’s for you.
About the Guest:
Steffen Hoellinger is the Co-founder and CEO of Airy, where he leads the development of open-source data infrastructure that connects real-time event streaming technologies like Apache Kafka® and Apache Flink® with large language models. Airy helps enterprises build AI agents and data copilots for both technical and business users across streaming and batch data. Steffen is also an active early-stage investor focusing on data infrastructure, AI, and deep technology.
Guest Highlight:
“Real-time is getting more and more important. Once you understand all the fancy ways people in the market are trying to solve a real-time data problem, you come to the realization that [data streaming] is the best way of doing things. It became natural to adopt it early on in our journey.”
Episode Timestamps:
* (05:10) - Overview of Airy’s AI Solutions
* (37:20) - The Runbook: Tools & Tactics
* (44:00) - Data Streaming Street Cred: Improve Data Streaming Adoption
* (47:50) - Quick Bytes
* (50:25) - Joseph’s Top 3 Takeaways
Our Sponsor:
Your data shouldn’t be a problem to manage. It should be your superpower. The Confluent data streaming platform transforms organizations with trustworthy, real-time data that seamlessly spans your entire environment and powers innovation across every use case. Create smarter, deploy faster, and maximize efficiency with a true data streaming platform from the pioneers in data streaming. Learn more at confluent.io.
0:00:06.8 Joseph Morias: Welcome to Life is But a Stream, the web show for tech leaders who need real-time insights. I'm Joseph Morias, technical champion and data streaming evangelist here at Confluent. My goal? Helping leaders like you harness data streaming to drive instant analytics, enhance customer experiences, and lead innovation. Today, I'm talking to Steffen Hoellinger, co-founder and CEO at Airy. In this episode, we're talking about why data streaming was the perfect solution for Airy's challenges. You'll hear how they build their products around data streaming and get real examples of stream processing in action. We'll cover things like agentic AI and data streaming working together to enable business users, how data governance and shifting left can empower AI agents, and much more. But first, a quick word from our sponsor.
0:00:51.3 Announcer: Your data shouldn't be a problem to manage. It should be your superpower. The Confluent data streaming platform transforms organizations with trustworthy, real-time data that seamlessly spans your entire environment and powers innovation across every use case. Create smarter, deploy faster, and maximize efficiency with the true data streaming platform from the pioneers in data streaming.
0:01:19.9 Joseph Morias: Welcome back. Joining me now is Steffen Hoellinger, co-founder and CEO of Airy. How are you today, Steffen?
0:01:25.0 Steffen Hoellinger: I'm great. Good to be on the show.
0:01:27.4 Joseph Morias: Fantastic. Well, let's jump right into it. What do you and your team do at Airy?
0:01:32.3 Steffen Hoellinger: At Airy, we build software for building AI agents on data streaming. We work with the largest enterprises in the world, enabling them to build AI agents on data streaming and to incorporate the great advancements that continuously processing data can give them.
0:01:53.3 Joseph Morias: Tell me more about your company's high-level product or company strategy and obviously how data streaming is involved and do that for me in like a minute or two.
0:02:03.5 Steffen Hoellinger: Yeah. We try to work with the freshest data and also guarantee data consistency at all times. So, with Airy, you basically get an AI agent suite consisting of several elements that we can put together to work with real-time data and put that towards AI. The idea here is really to get a sense of the data, to understand how that data is used in an enterprise context, map that data out properly, annotate that data, and use LLMs to generate comments. We call that schema intelligence, and then we put that towards different AI use cases, for example, empowering AI agents that continuously monitor data streams of all the real-time events that you have in place as an organization, but also providing tooling for your developers, data engineers, and data professionals to get productivity gains working with your data, generating SQL statements, and running them programmatically as stream processing jobs. And then we also have business users in mind, who might use conversational interfaces like you would in the new AI tools from OpenAI, Anthropic, and others, such as ChatGPT, just with your own data. So it has to be grounded in your own data. It has to make sure that there are no hallucinations and that you can trust the data that you're seeing, and you interact with that data on a daily basis to make your life more productive and also to kind of fulfill the longstanding promise of data mesh.
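(A rough sketch of the "schema intelligence" step Steffen describes: asking an LLM to suggest comments for undocumented columns. The `call_llm` stub and the table layout are placeholders for illustration, not Airy's implementation.)

```python
# Illustrative sketch only: "schema intelligence" as described above, i.e.
# asking an LLM to suggest comments for undocumented columns. call_llm() is
# a stand-in for whatever model provider you use; the schema is made up.

def call_llm(prompt: str) -> str:
    """Placeholder for a chat-completion call to a model provider."""
    return f"(suggested comment for: {prompt[:60]}...)"

def annotate_schema(table: str, columns: list[str]) -> dict[str, str]:
    comments = {}
    for col in columns:
        prompt = (
            f"Table '{table}' has an undocumented column named '{col}'. "
            "Suggest a one-sentence, business-friendly description of what "
            "this column most likely contains."
        )
        # Suggestions should be reviewed by a human before being written
        # back to the data catalog.
        comments[col] = call_llm(prompt)
    return comments

# Cryptic legacy column names are exactly where this helps.
print(annotate_schema("pmt_txn_v2", ["acct_id", "txn_amt", "mcc_cd", "auth_ts"]))
```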
0:03:37.8 Joseph Morias: Fantastic. Well, a quick follow-up for me. You know, for those of us that are not fully immersed in generative AI, can you just very quickly talk about how agentic AI differs from what I think most people picture when they think of Gen AI right now, like ChatGPT? How do agents differ?
0:03:54.4 Steffen Hoellinger: It always depends a bit on how you define agents. I think there is also a lot of confusion in the market right now about what agents actually are. People describe different things when they use the term agents. What we define as an agent is an entity that continuously monitors all the data streams you have as an organization and can react to that. And this goes beyond the previous capabilities of, let's say, older machine learning models, like in the natural language processing space, for example, that were able to recognize intents, because nowadays, with the new generation of AI models and large language models at hand, you have capabilities that go far beyond that in terms of generating content, understanding content, working with multimodal experiences, and putting your data to use in ways that couldn't have been imagined a couple of years ago.
0:04:51.3 Joseph Morias: Yeah. I think that's a really good way to summarize it. I don't think I've heard someone mention it, you know, having it consistently running, always looking at whatever data set you're exposing to it but then also giving it roles and responsibilities and the ability to act. It's fantastic. All right. So we've set the stage. So let's dig deeper into the heart of your data streaming journey in our first segment. What were the large business or product challenges you were addressing as you were building Airy?
0:05:25.3 Steffen Hoellinger: We were always focused on the area of conversational AI. We partnered early on with the likes of Google and Meta on that subject. In the early days that was primarily driven by chatbots in customer support contexts for larger brands. So we started out on that dimension, and we pretty much reached a volume where it got hard to process all the events that we got from those sources in real time, because we had to give a response in real time back to the customers, ideally with aggregated context, so we could actually give a meaningful response and not only an FAQ response that was true for most of the customers at the same time. So for us, it was always a matter of adopting data streaming early to provide that ultimate scale and ultimate real-time experience with the best possible context. You had to pre-compute in order to fulfill that requirement at low latency.
0:06:22.4 Joseph Morias: I had a follow-up here but I think you've already fulfilled it. So my follow-up was around what were the data challenges associated with those business and product challenges? But I think I heard it. Multiple data sources, making them available in real time, and also at scale. Did I capture that correctly?
0:06:38.9 Steffen Hoellinger: Yes, precisely. So it's basically the unification of real-time and batch at inference, so you can provide the best possible context to the model. We've been doing that for years, so we have a lot of experience in that. We've been doing that with the previous natural language processing and natural language understanding models, and now we've also been doing that with LLMs, the new foundation models. I think the focus shifted a bit more to the internal part of the organization because nowadays most enterprises still don't trust LLMs talking to their customers directly.
0:07:14.0 Joseph Morias: True.
0:07:14.8 Steffen Hoellinger: So nowadays we see the use cases more around making employees more productive, working with data, preparing decisions, and then enabling them to make the best possible decision based on real-time and batch data.
0:07:26.3 Joseph Morias: Instead of the Gen AI directly interfacing with the customer, let's empower our employees and make them more productive, because we still trust the humans. And this is pretty obvious, but I'm gonna ask it anyway because I'm curious about your specific take. So Airy was built on top of data streaming. Why? When there are other similar technologies that may have filled those requirements, how did you end up honing in on data streaming specifically?
0:07:49.8 Steffen Hoellinger: Yeah. We actually built our own processing framework and we pretty early reached the scaling limits of our own framework. So we had to adopt something that was standard back in the days. This was 2018, 2019. And we reviewed a bunch of technologies, and Kafka was like the most natural choice for the problem. So why reinvent the wheel when there is a perfectly fine ecosystem around that has infinite scaling potential?
0:08:20.8 Joseph Morias: How do you use data streaming, as a close partner of Confluent, as part of your product? Are there any specific Confluent technologies that you're taking advantage of?
0:08:30.2 Steffen Hoellinger: So, we're actually a heavy Flink user alongside Kafka. We obviously use Kafka Streams for a lot of the processing capabilities within our own conversational platform. But for all the, let's say, more ephemeral things that run through the system, like users asking questions in natural language, we actually convert those natural language statements into Flink SQL statements and then run them programmatically via the API. And this is actually something that is really comfortable in Confluent Cloud. So we're designing and building our services mostly natively on Confluent Cloud.
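(A minimal sketch of the flow Steffen describes: a natural-language question becomes a Flink SQL statement that is submitted programmatically as a long-running job. The LLM step is stubbed out, and the endpoint path and payload shape are assumptions loosely modeled on Confluent Cloud's Flink statement REST API, so treat them as illustrative rather than authoritative.)

```python
# Sketch only: natural language -> Flink SQL -> statement submitted via a
# REST call. The endpoint path, payload fields, and auth style below are
# assumptions for illustration; check the current Confluent Cloud Flink API
# docs before relying on them.
import os
import requests

def natural_language_to_flink_sql(question: str) -> str:
    """Placeholder for the LLM translation step."""
    return "SELECT account_id, COUNT(*) AS payments FROM payments GROUP BY account_id"

def submit_statement(sql: str) -> requests.Response:
    base = os.environ["FLINK_REST_ENDPOINT"]   # e.g. https://flink.<region>.<cloud>.confluent.cloud
    org = os.environ["CONFLUENT_ORG_ID"]
    env = os.environ["CONFLUENT_ENV_ID"]
    url = f"{base}/sql/v1/organizations/{org}/environments/{env}/statements"
    body = {
        "name": "nl-generated-statement",
        "spec": {
            "statement": sql,
            "compute_pool_id": os.environ["FLINK_COMPUTE_POOL_ID"],
        },
    }
    return requests.post(
        url,
        json=body,
        auth=(os.environ["FLINK_API_KEY"], os.environ["FLINK_API_SECRET"]),
    )

sql = natural_language_to_flink_sql("How many payments per account are we seeing?")
print(submit_statement(sql).status_code)
```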
0:09:13.5 Joseph Morias: What inspired you to use data streaming? Was it specifically the AI capabilities that you were looking to build or was it changes or challenges in the market?
0:09:22.3 Steffen Hoellinger: I think it was really both. The AI capabilities changed dramatically. We were obviously kind of spectators and contributors to this Cambrian moment of AI that we've all been witnessing over the last one or two years. But at the same time we were also seeing some changes in the market, in how people actually interact with software. Real-time obviously is getting more and more important. And I think once you understand all the very fancy ways people in the market are trying to solve a real-time data problem that they actually have, you come to the realization that this is the best way of doing things. So for us, it became natural to adopt it early on in our journey.
0:10:11.5 Joseph Morias: Yeah. You know, this is something I talk about a lot, in that many enterprises have had data challenges. They have a hard time, especially the companies you're working with, larger enterprises that have been around 20, 30, 40, 50 years. They have data sources that they might not have even realized they have, sitting in very small business units. How do you stitch all that together? How do you have one place to make your data available? And generative AI has just kind of been another reason to get to that. Before it was like, hey, your competitors are moving faster than you or their analytics are better. Now it's, they have Gen AI capabilities that you don't because you don't have data availability. So, it's very cool to see that you honed in on that. I'm curious, though, you mentioned enterprise more than once as it pertains to Airy. Is there a place for your technology and your services at a startup, or do you believe it's more the companies that have those larger, less streamlined data problems that get the most benefit from Airy?
0:11:10.6 Steffen Hoellinger: I think it's both because when you're looking at larger enterprises with a lot more history, they come from a data integration problem. So they obviously have old systems, sometimes mainframes that they still need to run in order to provide the services that they currently provide and get an entire perspective on the data landscape and on all the data sources that they have. So the first thing when we work with these kinds of businesses is normally to kind of map out where all the data is and then use AI to increase the understanding of where the data is, to annotate that data, so to create comments using the AI so that the AI, when you then want to provide an AI feature, has an easier time to find out where the data is and how the data is concretely used within a business process. For digital native companies, you normally have a different kind of value proposition, even though the end result is quite similar because you obviously want to port all these great new AI features with low latency at inference. You normally don't have that big of a data integration problem because you just have less history and more modern systems in place.
0:12:16.9 Steffen Hoellinger: But basically the tools that we use are very similar, very comparable. And in that regard, we also obviously see more and more interest from digital natives in the financial sector, insurance companies, et cetera, also moving in that direction, because obviously they want to provide the best possible service to their customers. They want to make their employees more productive. They want to automate more and more, utilizing AI. And they need, obviously, data to do that. And with the rise of LLMs, I think the shift from more traditional, batch-driven ML towards inference-based real-time data is kind of becoming obvious. So, whoever can fulfill that is looking at a bright future.
0:13:01.3 Joseph Morias: I like that. So, what have you built or maybe more importantly, are currently building, like what's new with data streaming?
0:13:08.4 Steffen Hoellinger: So we're currently building out our AI agent suite, specifically leveraging the capabilities that Flink has to continuously monitor event streams. We use something called complex event processing, which is an extension of Flink, to define, for example, event patterns that our AI agents can look out for. One example would be fraud detection. You're looking at a series of events: you have a new account, you have, for example, a couple of small transactions, and then you have a very large transaction. That would be a fraud detection pattern. And then you filter an event stream of transactions for that fraud pattern, and basically then use a more traditional ML model to score that event pattern for fraud. And this is something that we, for example, at the moment work on quite extensively for different use cases across several industries, from, let's say, fraud detection, but also supply chain disruptions, which is a very important topic these days, towards other use cases in different areas, mostly around insurance and financial industry use cases, but also gaming and, broadly speaking, anomaly detection. This is also becoming more and more relevant for retailers, where you might have a series of events indicating that you have a problem and that a certain shipment might be delayed, so you can actually proactively engage and maybe reroute that shipment to make sure that your supply chain stays in order.
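(The fraud pattern just described, a new account followed by a run of small transactions and then a very large one, maps naturally onto Flink SQL's MATCH_RECOGNIZE clause. Below is a hedged sketch submitted through PyFlink; it assumes a `transactions` table is already registered, and the column names and thresholds are invented for illustration.)

```python
# Illustrative only: the fraud pattern above expressed with MATCH_RECOGNIZE
# in Flink SQL, submitted via PyFlink. Assumes a `transactions` table has
# already been registered; columns and thresholds are invented.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

fraud_candidates = t_env.execute_sql("""
    SELECT *
    FROM transactions
      MATCH_RECOGNIZE (
        PARTITION BY account_id
        ORDER BY txn_time
        MEASURES
          FIRST(SMALL.txn_time) AS first_small_txn,
          LARGE.amount          AS large_amount
        ONE ROW PER MATCH
        AFTER MATCH SKIP PAST LAST ROW
        PATTERN (SMALL{3,} LARGE) WITHIN INTERVAL '10' MINUTE
        DEFINE
          SMALL AS SMALL.amount < 10,
          LARGE AS LARGE.amount > 5000
      )
""")
# Matches would then be scored by a conventional ML fraud model downstream.
```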
0:14:39.4 Steffen Hoellinger: So these are use cases that we are currently focusing on, and we try, in that regard, to make stream processing more accessible, because it has always been possible to write code in Java and put these pipelines in place that try to be smart filters. But what I think we can do now with AI is enable new user groups: data professionals who don't want to write code, or who don't want to spend so much time writing the code or the SQL statements that they have to write, so we can make them more productive. But it also opens up, I think, a very interesting new opportunity towards business users, where you might have a risk team at a bank that you directly put in touch with creating these new patterns, but also creating these new risk profile models. So you can actually say, why don't you try out creating or testing new patterns that you want to detect within your data, and we can put that towards historic data, reading from the data lakes that they might have, to see how often a certain event pattern was observed in the past. But we can also use the AI to generate different patterns that the AI thinks could be relevant, and we can test those, both against the historic data and against the real-time data, and start recognizing patterns that might indicate that you have an issue in parts of your organization. So we believe this is really powerful, basically making the technology more accessible, going beyond the core Kafka audience that is comfortable always coding in Java and loves to code. We don't want to take that away, but we actually want to augment it. We want to make it more enjoyable and we want to open it up to more users that might have a streaming problem today but don't realize that they have a streaming problem, and they go all around, sometimes a long journey, until they have built a system, or vibe coded something, as one has to say these days, that will break down rather easily.
0:16:30.7 Joseph Morias: So empowering business users, I think, is something that everyone for the last 30, 40 years has been trying to do as IT and technology have become so prevalent. And I think the biggest challenge is it's not just presenting a tool, it's having the guardrails around it so that someone doesn't break something. So it's like, "Hey, business user, I'm not gonna give you access to our production database, but I'm gonna have some type of abstraction layer that I know is safe for you to interact with." And I think Gen AI, or AI in general, really can provide those guardrails, because it can be that intermediary where you can ask it for things or you can inform it or tell it to do something, but you already have this predefined set of rules or actions that it's limited to. So you can have your business users be as creative as they want without actually breaking anything. But I'm curious: a lot of the use cases you mentioned really come down to anomaly detection. Whether it's fraud or supply chain, they're all types of anomalies. In these specific verticals, are you giving explicit prompts to the AI to say, hey, you're working for a financial services institution, you should be looking for fraud detection, or is it more broad than that? Do you basically point it at a stream and say, look for any type of anomalies, and whatever you see, report back?
0:17:48.7 Steffen Hoellinger: I think it has to be a bit more focused still. So we're working with blueprints, also with, let's say, a rich catalog of pre-built use cases to inspire people. Normally they don't just start out and say, what can we build today? They have some very concrete business use case in mind. And when we can provide them with something that is already pre-built and pre-configured that they can freely customize, I think this is something very interesting that we try to enable. So basically we have a catalog of pre-existing use cases that we observed in the industry and that we see frequently being relevant, and that can be deployed rather easily. Apart from that, when you have a special use case, we have an AI agent builder that actually leads you through a sequence of steps where we utilize the LLM continuously while you create that custom AI agent. You would normally start with a purpose, describing the purpose for your specific use case, and then we guide you step by step in finding the right data sources that we think might be relevant for that use case. We map that with business process information that we have indexed on how the data is used within an organizational setting, so as to also inform the agent, making it context aware and state aware of, let's say, where something is in a specific process. And in case something goes wrong, the agent has some kind of visibility on what the next step is, what the workflow is, and what the rules are within that setting.
0:19:25.6 Steffen Hoellinger: And then obviously what comes into play is that these AI agents also need to be governed. Sometimes you might be a user that wants access to a certain kind of data set that you don't have access to, so there needs to be some kind of governance in place, making sure that you can only access what you're supposed to access. And we kind of bundle all of this together and then also make it accessible, let's say, to work with your human teams. So we can actually deploy those agents in places where work actually happens, let's say Microsoft Teams or Slack, wherever your organization lives and breathes. This is the place where you want to make available an AI agent that is informed, that has access to your data, that is perfectly governed, that is connected to your data streaming platform. You want to deploy that agent in that place so it can really assist and help your teams and make sure that everything runs smoothly.
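(A loose illustration of the agent-builder flow Steffen walks through: purpose, candidate data sources, business-process context, governance scope, and a delivery channel such as Slack or Teams. The field names and the example agent are invented for illustration and are not Airy's actual schema.)

```python
# Loose illustration of the agent definition described above. Field names
# and the example agent are invented, not Airy's format.
from dataclasses import dataclass

@dataclass
class AgentDefinition:
    purpose: str
    data_streams: list[str]            # Kafka topics the agent may read
    reference_tables: list[str]        # batch/lakehouse sources for context
    business_process: str              # which workflow the agent supports
    allowed_actions: list[str]         # guardrails: what it may trigger
    pii_access: bool = False           # governance: no PII unless granted
    delivery_channel: str = "slack"    # where humans see its output

shipment_watcher = AgentDefinition(
    purpose="Flag shipments at risk of missing their delivery window",
    data_streams=["logistics.shipment_events", "logistics.carrier_status"],
    reference_tables=["dim_suppliers", "dim_routes"],
    business_process="inbound-supply-chain",
    allowed_actions=["notify_planner", "suggest_reroute"],
)
print(shipment_watcher)
```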
0:20:19.2 Joseph Morias: It's interesting you mentioned human teams, because there's going to be a future where we have both and they're gonna be blended. That's a weird realization for me as I actually heard that comment. So, I think this fits in really well with our previous question. What outcomes have you seen, or are you aiming for, with stream processing and integration?
0:20:37.5 Steffen Hoellinger: I think there are a lot of very interesting business use cases at the moment that companies are looking to address with the help of AI, to improve numbers on their end that are highly relevant for them. So you can obviously see, when you go in with an AI agent, you might have only a certain accuracy out of the box. And then we see that as an optimization process of bringing that accuracy up. So let's say when you initially deploy an agent, you might only have an accuracy of 30%, 40% without further customization and without further orchestration for a specific use case. And then by utilizing something like schema intelligence, we can bring that number up. We can give the agent more and more context, give the AI more and more real-time use cases and examples, and also feedback from human users that have been interacting with that data, to see how well the AI agent was actually able to help their human counterparts and was able, for example, to detect certain anomalies within a to-be-defined pattern. So this is something that we really see as an optimization function, where you can actually bring the accuracy to 90% or more over time when you give it enough training data, when you basically put enough effort in to bring that number up.
0:22:01.5 Steffen Hoellinger: And in terms of business impact, I think it's equally relevant, obviously, to make a business case for it. To basically say the AI agent, together with the human team, should be able to have some business impact for your organization. So one of the use cases we have been looking into was a sourcing use case where a large retailer actually said, at the moment, it takes us two weeks and costs around $10,000 to $15,000 to generate an RFP for a new product, let's say a vitamin C supplement. If we can do the same thing by connecting an AI agent to the supplier database, having a human define some KPIs and say, who are the best suppliers for this vitamin C supplement, rank them according to these two or three KPIs, and get an answer in 10 minutes, then I'm 100x or 1000x more productive compared to the current process, and I save, let's say, 100x or more in costs. So this is, I think, the impact we're looking for on a business scale. And if you put that towards a very large organization, the impact you can have with AI and with AI agents, putting that technology to use in that regard, can be tremendous.
0:23:12.0 Joseph Morias: So it's interesting. It sounds like it was less of an accuracy problem and more of a data accessibility problem. The more access we give the agent to the various data sources, suddenly that accuracy goes up, and there's training as well that goes into that. And I had a follow-up here, which was, is there a specific KPI tied to the outcome? And it sounds like at a high level, as it pertains to the agent, accuracy is the KPI. But then as you start to drill down into the specific use case, in the RFP example, the KPI really becomes your time to completion and things like that.
0:23:46.1 Steffen Hoellinger: Exactly. So when we look at, let's say, the accuracy of an agent, we try to benchmark that based on a certain set of benchmarks that you can define. But first and foremost, we look for successful completion of a query, because sometimes, let's say, when the data is just not perfectly annotated and you ask a question that needs to be translated from natural language to SQL, the query might just fail. The query might just not complete. And this is what I also mean by accuracy: bringing the number of successfully completed queries up, and then also making sure that you have, let's say, a high quality of Flink SQL statement that you can run continuously. It's not only about running it as a one-off, saying, I have a question, here is the answer. It's really about how you can enable people to generate these long-running jobs that continuously do something. And this is a bit more like an employee that never gets tired. This is really, I think, where the potential of this is: to task something to continuously monitor your data stream and trigger an action. You define the action that obviously needs to be triggered, but there are a lot of use cases you can use this for in order to augment humans that currently do this job, both on the development side of things, but also on the business side.
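(Steffen's first-order accuracy measure is simply whether generated statements run to completion. A trivial way to track that is sketched below; the result records are made up for the example.)

```python
# Minimal sketch: accuracy as the share of LLM-generated Flink SQL
# statements that actually complete. The result records are invented.
results = [
    {"statement_id": "s1", "status": "COMPLETED"},
    {"statement_id": "s2", "status": "FAILED"},     # e.g. unresolvable column
    {"statement_id": "s3", "status": "COMPLETED"},
]

def completion_rate(runs: list[dict]) -> float:
    if not runs:
        return 0.0
    completed = sum(1 for r in runs if r["status"] == "COMPLETED")
    return completed / len(runs)

print(f"completion rate: {completion_rate(results):.0%}")  # 67%
```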
0:25:05.7 Joseph Morias: Makes a lot of sense. So, I know Airy is doing a lot of integration for customers and making data available. I'm curious, and this could apply to building out your product or to working with customers: are there any cloud service provider or ISV services that you commonly see integrated with your customers or with your product?
0:25:25.8 Steffen Hoellinger: We really focus on reducing the number of dependencies. So at the moment, when we work with the data streaming infrastructure, we primarily focus on Flink and Kafka. That obviously needs to be there. Then we have some conversational connectors, the ones I mentioned before, for example, like Slack and others, to make sure that we are integrated with the services where the work happens. Then, obviously, data catalogs are usually an important topic, where you have to integrate with whatever people have today in terms of the metadata about the data that their organization uses. So this is a very important topic for us. And then there are also, let's say, vector databases that we use at the moment to index all that data, that metadata, and make it accessible to the AI in different ways. So we have different indexes, both on the knowledge graph side of things and on the full vector search side. So we can actually enable the AI to find the right data at low latency at inference, in real time.
0:26:26.0 Joseph Morias: Fantastic. So, how does Airy approach data governance?
0:26:30.8 Steffen Hoellinger: Data governance is hugely important from my perspective, because AI agents, and this is maybe something that people rarely talk about when they talk about all the new capabilities of agents that you can give a coding task or any other kind of planning task, will actually require that you govern them, and from our perspective you also need to keep the solution space small to really keep them focused on their objective. So we rather look at it from the perspective that you might not have one or two agents that do very large tasks, but you will actually have an army of very small agents that are very focused on very specific tasks, with only limited access to, let's say, a very small subset of the data that you actually want to give to that agent. So it really becomes a crucial aspect to make sure that the data the relevant agent has access to is of high quality. You need to make sure that the data is consistent in that regard. You need to make sure that the data is PII free, for example. So you can actually set rules in place to automatically detect, oh, there is a social security number in that event stream, the agent should not have access to that, because it might expose it to another service downstream.
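(A sketch of the governance rule just mentioned: detecting and masking US social security numbers in an event before an agent, or any downstream service, is allowed to see it. A real deployment would use a proper PII classifier and field-level rules; the simple regex here is only for illustration.)

```python
# Sketch only: mask SSN-like values in an event before handing it to an
# agent. A production setup would use a real PII classifier; the regex and
# event shape here are deliberately simple.
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact_pii(event: dict) -> dict:
    clean = {}
    for key, value in event.items():
        if isinstance(value, str):
            clean[key] = SSN_PATTERN.sub("[REDACTED-SSN]", value)
        else:
            clean[key] = value
    return clean

event = {"account_id": "a-42", "note": "customer SSN 123-45-6789 on file"}
print(redact_pii(event))  # the agent only ever sees the redacted copy
```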
0:27:48.2 Steffen Hoellinger: So we really think that data governance is a very crucial aspect, maybe the most important one at the moment, to make sure that the agents fulfill their premise but are also governed in that regard. You also need to be able to provide full lineage, which is especially important, for example, in the financial industry, where they need to be able to see where a specific data event or data point is coming from, especially when you think about hallucinations further downstream. So another agent that might get called by a previous agent in a pipeline has to really be able to rely on, okay, I got this information from this table or from this stream, and this was the original event that kind of triggered this all off. So you need to provide basically full lineage alongside proper governance at the level of the agent.
0:28:43.6 Joseph Morias: Yeah. See, I had a follow-up that was around how important it is to track and enforce the flow of quality data as it enters the system, and you literally answered that. I mean, it's paramount, because you're talking about PII. You're talking about financial services, where the table stakes are dollars and cents. And let's be honest, it's how our cultures are kind of built around worth and value. But as you mentioned, because of stream processing, there are quite a few permutations of how the data changes, and having that full lineage is pretty important. So this kind of makes me think of something we talk about quite a bit here at Confluent, and that's this idea of shifting left: the idea of taking a lot of the processing that would traditionally be done in a lakehouse or a data warehouse and moving that closer to the source, into your operational state. And I think you kind of hit the nail on the head of why that's so important, because it's the creators or the producers of the data that know that data the best. So if you are enforcing quality data as it ingresses the system, that's your best opportunity to ensure that the data is correct, it's timely, and things like that. Because once it gets a few layers down, it's gone through some transformations, and when it hits the data analytics team, they're like, "What is this data?" And they may not even know, because it's gone through five transformations before they've even received it.
0:30:02.9 Steffen Hoellinger: Yeah. We couldn't agree more. From my perspective, having a smart pipe is often much more efficient and provides a much better result than basically scanning the entire lake for fragments of where that data might actually come from. So in that regard, yeah, shifting left to make it right.
0:30:25.9 Joseph Morias: Oh, I like that. So let's talk about data retention. How long does Airy need to hold on to operational data, and is that number driven by compliance?
0:30:33.9 Steffen Hoellinger: It depends on the industry, obviously. You have regulated industries that not only need to retain data for a very long time, but also need to provide full lineage in order to comply with regulatory requirements. Like, for example, in the financial industry, you have at the moment new trends like ad hoc reporting, which the regulatory bodies will require from some of these institutions. And so this is one of the reasons why this is becoming more and more relevant at this stage. And for some of the topics that we have, we use infinite retention by default, because when you think about context, that context can become relevant even though it might actually be outdated. It might be one or two years old, and it can suddenly become relevant again. So for certain topics we actually try to retain the data infinitely.
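(At the Kafka topic level, "infinite retention by default" can be expressed by setting retention.ms to -1, which tells the broker never to expire records by time. The sketch below uses the confluent-kafka Python AdminClient; the broker address and topic name are placeholders, and whether infinite retention is appropriate depends on your storage and compliance requirements.)

```python
# Create a topic that never expires records by time (retention.ms = -1).
# Broker address and topic name are placeholders.
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})

topic = NewTopic(
    "customer-context-events",
    num_partitions=6,
    replication_factor=3,
    config={"retention.ms": "-1"},  # keep records indefinitely
)

for name, future in admin.create_topics([topic]).items():
    future.result()  # raises if topic creation failed
    print(f"created {name} with infinite retention")
```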
0:31:29.1 Joseph Morias: So this makes me think of, you mentioned it earlier, the DSP. For the audience, you may hear us use DSP, or data streaming platform, pretty often around here, and what does that really mean? Well, if you're exploring data streaming, the first thing you're gonna learn about is probably Kafka, the open source project. And the DSP is about uplifting open source beyond what you get out of the box, and it includes things like advanced security, governance, and lineage that lets you track the data flow over time and those permutations, or even things that Steffen just mentioned, like having infinite retention. So you don't have to worry about how many disks you have left in your SAN and whether you're gonna be able to continue to grow your data streaming in your operational state. With the DSP, you don't really have to worry about any of that. We've figured out, whether you're a startup or an enterprise, what you need to be successful, just because we work with so many customers. And we've built all that in, including things like built-in Flink. So I just wanted to cover that for anyone that's curious about what DSP means; I don't want them to think maybe we're aware of an initialism that they're not. So, what's the future of data streaming and AI agents at Airy?
0:32:38.1 Steffen Hoellinger: So from my perspective, it's really going towards the question of how you build a multi-agent system on a data streaming platform, because that is kind of the holy grail from my perspective. I think this image of, let's say, having microservices that consume from data streams is not really new. I mean, this has been around for years. It's battle-tested to some extent. And I think the big change that we're seeing now, also with the latest trends in the market, is that there is a real need for, let's say, building these agents, basically providing a way to standardize the tools and the context within those agents. This is something that Anthropic, for example, pushed with MCP last year. It's now being adopted across the whole industry, basically for the capabilities of the tooling and, let's say, the access to the data for the agent. So this is something that we see as highly relevant at this point as an integration topic. And then we also see, let's say, the announcement that Google made last week with A2A being highly relevant in that aspect as well, governing inter-agent communication.
0:33:54.5 Steffen Hoellinger: And this is really very, very interesting from our perspective, because when you as an agent know that there is another agent in your organization, you can, let's say, call that agent and have these two or more agents work together on a specific subject. So this is highly relevant from our perspective for how you build out these multi-agent systems, especially in a complex organization where you might have, in the end, thousands of agents running in parallel together with your workforce, trying to provide a service to the customers in order to basically keep things running. And in that aspect, I think there is one aspect of data streaming which is maybe often overlooked: the biggest value from our perspective is that it can actually work with a certain amount of information asymmetry, and it can scale quite well because of that. You can think of it this way: at the moment you have different consumers in a microservice architecture, an event-driven architecture, consuming from the same event stream; in the same way, you could actually have agents listening to the same event stream and working independently from each other with that data.
0:34:56.5 Steffen Hoellinger: And that becomes a very powerful paradigm, because in this case, you don't have to know all the microservices across the organization and call them directly. You basically already have a very, very powerful and scalable architecture in place with a data streaming platform, and you can build agents on top of it that work independently from each other and fulfill their tasks independently. You might have to, let's say, orchestrate that down the road and resolve conflicts, but to a certain extent I think this is a really powerful pattern that has been battle-tested for years and works quite well for microservices. Why shouldn't it work for agents? So this is what we're trying to bring to the market in that regard and provide solutions for, where we currently see a very new trend emerging with these multi-agent systems, thinking of real-life use cases like supply chain disruptions. Normally, you wouldn't solve those with one agent, but you might require, let's say, at least a handful to solve this together, to keep them governed and focused on a specific area, and also to improve the accuracy for the one specific task that they're really good at. You wouldn't solve a complex problem in an organization with one employee either. It's always teamwork. So in that regard, how can you actually model that into a world where you have AI agents contributing to the execution of processes and, overall, the creation of value for an organization?
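(A minimal sketch of the pattern Steffen describes: independent agents consuming the same event stream, each in its own Kafka consumer group, so they can work in parallel without knowing about each other. The broker address, topic name, and handler functions are placeholders.)

```python
# Sketch only: two (or more) agents reading the same topic independently by
# using separate consumer groups. Broker, topic, and handlers are placeholders.
from confluent_kafka import Consumer

def run_agent(group_id: str, handle_event) -> None:
    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": group_id,            # separate group = independent offsets
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe(["logistics.shipment_events"])
    try:
        while True:
            msg = consumer.poll(1.0)
            if msg is None or msg.error():
                continue
            handle_event(msg.value())    # each agent applies its own logic
    finally:
        consumer.close()

# e.g. run_agent("delay-detector", detect_delay) and
#      run_agent("reroute-planner", plan_reroute) in separate processes.
```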
0:36:22.0 Joseph Morias: It's a change. Instead of thinking, what microservice do I need to utilize to get what I need, it's, which agent do I need to send this data to? And I find it very amusing, this idea of agent-to-agent and a future where we have a complex task that fails and we essentially have one agent blaming the other. "No, I did it right. This agent messed up. You need to figure that out." But honestly, this idea of mapping agents back to data streaming, because data streaming is proven in microservices, makes a lot of sense. The agents are essentially consumers and producers of data. So why would it work any differently? It's just that how we code it and how we interact with it is different. But I think with that mindset, and adopting the DSP, you really future-proof your architecture to continue to scale and to grow for many, many years ahead. Our next segment is The Runbook, where we break down strategies to overcome common challenges and set your data in motion. What led Airy to choose the Confluent Data Streaming Platform over various open source and other vendor products?
0:37:33.8 Steffen Hoellinger: Yeah. I think it's just the simplicity of the solution and that everything works very well together. So we have been part of the early access programs for AI inference, but also for TableFlow. And I think all in all, it makes just a lot of sense that you as an organization kind of focus on the problems that your organization exists for and leave kind of the hard management of a data streaming platform basically to somebody like Confluent who's fully focused on that. So, yeah. We kind of try to work with the ecosystem in that regard and leverage also the preexisting capabilities like the whole Connect ecosystem and obviously also Flink and Kafka quite heavily being kind of tightly integrated instead of reinventing the wheel.
0:38:23.1 Joseph Morias: Fantastic. Great answer. So, what is the top tool you or your team relies on for data streaming today?
0:38:31.0 Steffen Hoellinger: So, obviously it's Flink and Kafka, but primarily in terms of tooling, we have a very thin layer where we try to work with the relevant models that are constantly being released. It's actually hard to keep track of all the benchmarks. There's a new model coming out more or less every week. So we try to test these for the specific capabilities they have for our use case. And there is obviously a very interesting aspect to, let's say, the new reasoning models, which can provide a very interesting new layer for this agentic experience, where you have use cases where you might want to employ a reasoning model to take in all the data that you have, really provide a multi-step reasoning function, and provide back a very high quality response to the user, who might actually need to decide something very meaningful in that very moment. And you have other use cases where we see some of the small models working extremely well, and in those cases, the speed of inference is much more important. So you might have a use case where somebody needs an answer quite quickly, or has like a thousand files that they need to analyze in order to make a judgment call.
0:39:56.1 Steffen Hoellinger: In that case, I think you can actually pick a different model. And this is also what we try to orchestrate here. So it's not only about putting the same solution out there over and over and over again, but being smart about it. Basically also taking that decision of which model to actually use away from the user. And in order to do that, we need to be informed about the strengths and weaknesses of all the models out there and how we can deploy them. We are normally agnostic to the models we use, but we need to guide the users, and we also need to automate basically picking the right model for the right use case.
0:40:36.0 Joseph Morias: Interesting. So I hadn't thought about that. So I guess the future would be kind of determining what the best-in-breed model or models are for inference, for reasoning, and then deploying those at any given customer. So agent A may be running this model while agent B may be running a completely different model. Is that correct?
0:40:55.1 Steffen Hoellinger: That's correct. And also you have different, let's say, requirements or restrictions in place. Let's say agent B has a budget goal, so agent B needs to optimize the output at the lowest possible cost. That might lead the agent to pick a cheaper model that runs with less reasoning capability but is better suited for the specific use case. That might be the case when you have a lot of data that you need to ingest and analyze, or let's say segment. And this is also a part where we try to contribute back to the ecosystem by suggesting something we call AI functions for the core Flink codebase, where we can have really comfortable implementations in place. You say AI summarize, for example, and you can just provide a text to Flink, and Flink then calls the model provider that would be best suited for this use case and provides back a summary in, let's say, the least expensive mode, especially when you're looking at a data streaming pipeline where you have, let's say, millions of documents that you want to do this for in parallel. And this is where picking the right model becomes a crucial aspect of agentic behavior.
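(The routing decision behind an "AI summarize"-style function roughly amounts to picking the cheapest model that satisfies the task's requirements. The sketch below illustrates that idea only; the model names, prices, and capability flags are invented.)

```python
# Rough illustration of cost-aware model routing. Model names, prices, and
# capability flags are invented for the example.
MODELS = [
    {"name": "small-fast",  "cost_per_1k_tokens": 0.0002, "reasoning": False},
    {"name": "mid-general", "cost_per_1k_tokens": 0.002,  "reasoning": False},
    {"name": "big-reason",  "cost_per_1k_tokens": 0.02,   "reasoning": True},
]

def pick_model(needs_reasoning: bool, budget_per_1k: float) -> str:
    candidates = [
        m for m in MODELS
        if (m["reasoning"] or not needs_reasoning)
        and m["cost_per_1k_tokens"] <= budget_per_1k
    ]
    if not candidates:
        raise ValueError("no model fits the constraints")
    return min(candidates, key=lambda m: m["cost_per_1k_tokens"])["name"]

# Summarizing millions of documents in a pipeline: a cheap model is fine.
print(pick_model(needs_reasoning=False, budget_per_1k=0.001))  # small-fast
# A one-off, high-stakes decision for a user: allow the reasoning model.
print(pick_model(needs_reasoning=True, budget_per_1k=0.05))    # big-reason
```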
0:42:10.5 Joseph Morias: So, let's go to the other side of my original question. Are there any tools or approaches that Airy actively avoids?
0:42:16.8 Steffen Hoellinger: I wouldn't say specific tools, more like patterns. What we often see out there is people still running dumb pipelines, where you have a source, you get data in, and then you basically dump that data off somewhere. You often don't even know what you're processing, and you basically just dump it in the form the data was ingested. And we feel that this is often not a pattern you should follow as an organization, because, as we talked about with shift left, the real use case for a data streaming platform is to make sense of that data as early as possible, not just to dump it in a specific place. And this is something that we often see people trying to do. So for us, doing that with stream processing as a tool at the agentic level, saying we enable somebody with stream processing as a tool to make sense of that data in the stream, or even from a data lake, and then unify that response to guarantee data consistency, is very different from, let's say, somebody just dumping data somewhere and then writing a SQL statement to query that data from that place. I think this is a pattern we often see in the market. And obviously that works, but it doesn't have all the strong guarantees. It doesn't have the lineage. It doesn't have a lot of the requirements and capabilities we talked about before. And this is really crucial, especially if you think about it at scale. So, replacing dumb pipelines with something smarter, I think this is really what makes us do this every day.
0:43:49.2 Joseph Morias: I know I'm definitely biased, but I agree with you. No dumb pipelines. So we talked about the tech, the tools, and the tactics, but none of that moves the needle without the right people behind it. Let's dive into how Airy got fully committed to data streaming. So, how did your engineers adopt data streaming?
0:44:14.7 Steffen Hoellinger: I mean, for us internally, it was a natural way of adopting it. So it wasn't really that difficult to convince them to start using it because we had some very experienced people that had built scalable systems in Kafka very early on. So even like 2013 or something.
0:44:36.4 Joseph Morias: Oh, wow! Very early.
0:44:37.6 Steffen Hoellinger: Very early. Earliest days. And in that regard, I think it really didn't take that much convincing to do, especially because the pain was so heavy while we were running the system still in 2017, 2018, that we had to migrate from. I think everybody was aligned that it was not scalable to do that anymore because it took us about two and a half, three hours to reboot that old system. So we had to have several instances running side by side just to guarantee that there was no downtime of the system. So it was very painful to run that old system. And in the moment when we basically had the chance to migrate to Kafka and have something more scalable in place, that was kind of a revelation for us. And nobody wanted to go back.
0:45:28.5 Joseph Morias: Well, that's fantastic. So for the audience, again, obviously the team at Airy had already had some great experience with data streaming. In 2013, that's about as early as it gets. If you're...
0:45:39.4 Steffen Hoellinger: It was a previous company. It was like...
0:45:41.0 Joseph Morias: Okay. At a previous company. No, I realize it's not Airy, but you had engineers with that experience. So for other organizations that are thinking about adopting data streaming and may not have professionals with that experience, fret not: Confluent has this amazing documentation repository. In fact, if you search Kafka and many of the underpinnings of Kafka, you'll probably end up at our docs page. But we also have professional services and training materials that can really help any organization adopt data streaming, whether you're just dipping your toes into the water or you're really looking for some advanced patterns like CEP, which Steffen mentioned. I'm curious, how do you get customer buy-in for data streaming, Steffen? I imagine that a lot of your services obviously run on top of data streaming. What if a customer is completely batch? How do you get them to buy in and go through the pain of building out data streaming? Or do they not need to do that to utilize your services?
0:46:32.9 Steffen Hoellinger: We try as much as possible to work with whatever they have in place right now and come on top as an intelligent layer, utilizing AI and data streaming. Then, once you put the data in motion, once the data basically moves once, and, in our case, specifically with this feature of schema intelligence, you compare that to, let's say, the not-so-perfectly annotated metadata that you have as an organization: empty comment columns all over the place, table comments that we often see not utilized at all, even in large organizations. So you have cryptic column names, and that's basically all you have. And maybe a human with human intuition can figure out what it actually means and what data could be in that column or in that field in a schema. Once you experience how well it can work with something like schema intelligence, it makes so much sense that you don't want to go back.
0:47:41.3 Joseph Morias: Now, let's shift gears and dive into the real hard-hitting content, our data streaming meme of the week.
[background conversation]
0:47:57.2 Joseph Morias: All right, Steffen, what did you think about that? That camel did not seem very happy.
0:48:03.0 Steffen Hoellinger: Actually, I wasn't even sure if it was a camel or if it was a llama or something, maybe?
0:48:08.9 Joseph Morias: I think it was a camel.
0:48:09.5 Steffen Hoellinger: You're sure?
0:48:10.3 Joseph Morias: That is not my area of expertise.
0:48:12.5 Steffen Hoellinger: But yeah. I think everybody who's been there personally can kind of feel the pain of that animal.
0:48:18.9 Joseph Morias: Yeah, I remember. It wasn't a data streaming outage, but we had a NAS outage many years ago and I slept in the data center waiting for the disks to rebuild, and I remember making noises like that. So before we let you go, we're gonna do a lightning round, byte-sized questions, byte-sized answers. That's B-Y-T-E. Like hot takes, but schema-backed and serialized. Are you ready?
0:48:49.5 Steffen Hoellinger: Yeah.
0:48:50.6 Joseph Morias: All right. What's something you hate about IT?
0:48:53.9 Steffen Hoellinger: Bad documentation.
0:48:54.6 Joseph Morias: Good answer. What's the last piece of media you streamed?
0:48:59.8 Steffen Hoellinger: An event log, obviously.
0:49:03.1 Joseph Morias: What hobby do you enjoy that helps you think differently about working with data across a large enterprise?
0:49:09.0 Steffen Hoellinger: I personally cook a lot for my family and really enjoy it and I think, yeah, it's actually a quite interesting analogy in terms of, you know, you have the recipe, like the schemas, and everything needs to come together and also, too many chefs can spoil the broth.
0:49:25.8 Joseph Morias: I really like that. Can you name a book or resource that has influenced your approach to building event-driven architecture or implementing data streaming?
0:49:34.7 Steffen Hoellinger: Yeah. Actually, when I was young, I was a big fan of Umberto Eco, and there is one book called "Baudolino", a novel about a boy in the Middle Ages who continuously invents new stories, and there is always the matter of judging which of these stories are true, so you have to avoid hallucinations in that regard. Yeah, I always felt it was a really interesting subject with a really beautiful analogy to data streaming.
0:50:14.9 Joseph Morias: Yeah. That seems very apt. So what is your advice for a first-time chief data officer or someone with an equivalent impressive title?
0:50:21.8 Steffen Hoellinger: Yeah. I think it's, there is no value really in all the data that you collect, maybe in a data lake or somewhere, unless you really put a business use case towards it and, yeah, now in the days of AI, you need to put value to that data using AI.
0:50:39.1 Joseph Morias: Now, any final thoughts or anything to plug?
0:50:41.7 Steffen Hoellinger: Yeah. I think AI agents will not go away anytime soon. I think, as an organization, you should rather start working with the tools that are out there. You should adopt data streaming to the extent you haven't done so already, and the integration of AI and data streaming is going to be groundbreaking, providing the grounds for a really interesting future.
0:51:06.7 Joseph Morias: Well, listen, thank you so much for joining me today, Steffen. It's been an absolutely fascinating conversation. And for the audience, please stick around, because after this, I'm giving you my three top takeaways in two minutes. Wow! What a great conversation with Steffen. I mean, I learned a lot about generative AI today, and specifically agentic AI. But here are my three top takeaways. First is enabling business users. Again, if you've been in this industry long enough, and I've been in it for 25-plus years, you've heard this before: hey, our technology is gonna make it easier for business users to create websites or to blog or whatever it is. And it has. But this ability to take business users, people who are just not technologists as their given trade and who work in different parts of organizations, and give them the power to create something as complex as a streaming pipeline with stream processing, but with the governance to limit their ability to potentially break anything by having these powerful agents, that is just unbelievable to me. So a business user can say, hey, AI, can you look for anomaly detection along the lines of fraud?
0:52:25.1 Joseph Morias: And it just can do it, because, one, it can understand natural language, it has access to the data sources, and it can build the streams and run inference. Just mind-blowing. And the next thing that really caught me off guard was human versus AI teams, and the idea that, hey, this particular work is gonna be handled by the AI teams and this particular work is gonna be handled by human teams. It's something I don't think I'm quite ready for, but it is absolutely going to be the future. And maybe shifting our mindset towards that is prudent at this time. And coming fresh off of Google Cloud Next 2025, I have to call out agent-to-agent communication. It was all the buzz at the event, and of course Steffen called it out, and what they're trying to build at Airy is absolutely gonna depend on that, because you have different models for different agents. Some are really good at inference, some are really good at reasoning, and there are models that are gonna come out in the future that are good at things we're not even accounting for yet.
0:53:22.6 Joseph Morias: So, this idea of having one agent handle the first piece of a data flow and hand it off to another agent, which has a more optimized model for that next step, is just absolutely fantastic. I really think Airy has an incredible future, and I can't wait to keep track of their growth and the success of both them and their customers. That's it for this episode of Life is But A Stream. Thanks again to Steffen for joining us, and thank you for tuning in. As always, we're brought to you by Confluent. The Confluent data streaming platform is the data advantage every organization needs to innovate today and win tomorrow. Your unified platform to stream, connect, process, and govern your data starts at Confluent.io. If you'd like to connect, find me on LinkedIn, tell a friend or coworker about us, and subscribe to the show so you never miss an episode. We'll see you next time.