Life Is But A Stream

Ep 19 - How CARIAD Powers Software-Defined Vehicles with Real-Time Data Streaming

Episode Summary

How CARIAD built a multi-region Kafka platform unifying data from 45M vehicles across 90 markets.

Episode Notes

45 million vehicles, 90 markets, 12+ iconic brands, each with its own data silos, standards, and infrastructures. 

In this episode, Chetan Alatagi, Solution Architect reveals how they transitioned from fragmented legacy ETL silos to a Unified Data Ecosystem—a global data streaming highway that turns vehicle telemetry into real-time value.

Chetan joins Joseph to go under the hood of a multi-region, multi-cloud architecture powered by Apache Kafka® and Confluent Cloud that ingests petabytes of vehicle telemetry, enforces data governance from edge to cloud, and powers mission-critical use cases like real-time emergency services, friction data alerts, and automated compliance across the entire Volkswagen Group.

You'll Learn:

Whether you’re battling Apache Kafka operational overhead or building a global data mesh, this episode is a blueprint for large-scale platform modernization and real-time fleet analytics.

About the Guest: 
Chetan Alatagi is a Solution Architect in Solution Management at CARIAD, the automotive software company of the Volkswagen Group. He works within CARIAD's Connecting Cloud division on the Data and AI Platform, where his team is focused on unleashing vehicle data through a Unified Data Ecosystem that powers connected, intelligent experiences for brands across the VW Group portfolio. He is a certified SAFe 5 Architect.

Guest Highlight:
"We chose a neutral event highway and migrated brand by brand, easing integration with simple configuration and easy onboarding. We upheld security standards, delivered on time, and enabled a switch with almost no data loss." – Chetan Alatagi

Episode Timestamps: 
[00:47] Guest Introduction + CARIAD Overview 
[06:10] Segment 1: Data Streaming Goodness
[19:52] Segment 2: Beyond the Stream 
[43:36] Segment 3: Quick Bytes
[51:00] Segment 4: Joseph’s Top Takeaways

Dive Deeper into Data Streaming:

Links & Resources:

Our Sponsor:  
Your data shouldn’t be a problem to manage. It should be your superpower. The Confluent data streaming platform transforms organizations with trustworthy, real-time data that seamlessly spans your entire environment and powers innovation across every use case. Create smarter, deploy faster, and maximize efficiency with a true data streaming platform from the pioneers in data streaming. Learn more at confluent.io.

Episode Transcription

0:00:00.2 Chetan Alatagi: When you go to a development team and you talk to developers and they say, "What do you think about Confluent Cloud?" so they would be excited because, as I said, they can just start working from day one. So convincing the teams because you are creating value right from the beginning, that's much easier.

0:00:24.5 Joseph Morais: Welcome to Life Is But A Stream. Today, we're gonna be talking to CARIAD, a automotive software company that is dealing with the challenge of integrating multiple brands at massive scale. I'm really excited about this one today, folks. I'm your host, Joseph Morais. Let's get into it.

0:00:47.5 Joseph Morais: Let's jump right into it. Tell us all about CARIAD, what CARIAD does, and what you and your team do for CARIAD.

0:00:55.6 Chetan Alatagi: Yeah, thanks, Joseph, for having me on the show.

0:00:59.8 Joseph Morais: Absolutely.

0:00:59.9 Chetan Alatagi: CARIAD is the automotive software company of the Volkswagen Group. And what we do is, we are developing advanced systems with synergetic tech in the areas of driver assistance, automated driving, infotainment, cloud, and connect across the Volkswagen Group. Our connect and cloud solution, which provides connectivity solutions, it powers more than 45 million vehicles in around 90 different markets and constantly growing. Thus, it becomes one of the largest automotive clouds in the world. I work as a tech lead, solution architect in the connect and cloud, in the data and AI platform, where we are moving ahead with a motto of unleashing vehicle data with our so-called solution, Unified Data Ecosystem. What we do, we collect data, we manage data with excellence, with accuracy, security, and compliance. You have to test rigorously all the different functions within the vehicles. So for that, our data management solutions, we provide ways how you can or how our teams can ingest terabytes to petabytes of data using our hubs which we have. Once the data is ingested into the cloud, we have to polish it. We have to convert it into the proprietary formats before it could be worked on.

0:02:47.7 Chetan Alatagi: And then they can work on the traces. They can do the error management. And then with this, the testing's kind of complete for the sensors, ECUs, and then, of course, some kind of an approval to go through this testing phase. And now comes the post-production data, right? The real customer vehicles which are on the roads. There, we do have a lot of data products which we develop our own. We also provide our teams support in building data products for product optimization, legal, and compliance use cases. So, I don't know, there are a lot of use cases which we are currently handling, like intrusion detection systems, predictive maintenance. I will probably come to some of those use cases.

0:03:41.2 Joseph Morais: Yeah, I was gonna say, as we kind of walk through a bit more and dive into that architecture and how all that works together, I'm sure you'll get through that. But to summarize and kind of regurgitate for the audience, CARIAD builds incredible integrated software for VW Group brands. So if you folks are curious, search VW Group. You'll be surprised how many different brands you know fall under that if you don't know already. And I imagine that's a pretty incredible undertaking, because prior to this larger VW Group, a lot of these brands were separate entities. And I imagine they were all building their own software for their platforms. And it sounds like CARIAD's ultimate goal was to kind of reduce some of that duplicate effort and to build a unified system across all of the brands and then start to utilize that aggregate data, both pre-production and post-production. Did I summarize that well?

0:04:37.2 Chetan Alatagi: Yes, in parts, yeah. So we started with a specific vehicle platform to begin with, right?

0:04:46.0 Joseph Morais: Okay. Got you.

0:04:46.8 Chetan Alatagi: That was one of the modern vehicles which are currently running around the worlds which we have built. But what we have seen in the meanwhile, that we do have different data silos in the group which are pretty much probably doing the same thing. So pull the data from the vehicle, upload, and then analyze it, and then build great data products. But just imagine this one from a data product point of view, right? If I need to build a cool data product, I need to grab data from this team, this platform. I don't know. It would be really hectic to go through different teams, different standards, different people, teams. It's quite scattered. So there we started looking into similar data products where we can combine. And now we have already integrated quite some data platforms itself into our unified data ecosystem. And this change is currently ongoing. So we constantly analyze if there are any similar data products or platforms. We kind of try to integrate them into our platform. It's economies of scale at the end.

0:06:11.8 Joseph Morais: Absolutely. All right. Well, I'm excited to go through that because data products, I heard some integration in there. I bet stream processing is part of the story, but let's get into that in our next segment. So we set the stage. So let's dive deeper into the heart of your data streaming journey in our first segment, Data Streaming Goodness. So you've already alluded to this and I think we'll start to dive deeper. But my first question or my follow-up question is, what have you built or are currently building with data streaming? And it sounds like just to kind of put words in the air, is that you're building a unified kind of data substrate, and I imagine you're using some of the data streaming integration technologies to kind of tie all of these brands together. Is that a fair assessment?

0:07:08.1 Chetan Alatagi: What we have built using data streaming is multi-region architecture with high availability, low latency, and of course, making sure the compliance according to the local data regulations. Just to go it from end-to-end, we have built an intelligent data collector with different agents within the vehicle which you can actually configure. Okay, if I want to collect a specific set of data from sensors, bus signals, CAN data from the vehicle. We have built an order management solution in the cloud where I can go click on different signals. Hey, I need speed, mileage, or different set of signals. I configure that from our catalog, very easy to configure, and then push into the vehicle. So vehicle starts collection of this data and then uploads. And here comes the magic of streaming. Once the data needs to be uploaded, there are several mechanisms how we can upload the data. So streaming fashion, we can upload the data into file uploads, but let's focus on the streaming part. This is where now the focus is. What we do once the data is streaming. There's quite a lot of hops with MQTT brokers, Kafka, and then the whole analytics and AI and dashboard which comes later.

0:08:52.8 Chetan Alatagi: So once we have the data into our Kafka, we have streaming pipelines which validate, which enrich, which transform the data and normalize. So normalize decoding the signals to interpret the values. Once we do that, it's important that we make the data discoverable for our consumers. We need to make sure our data sharing capabilities reach across our consumers, our partners, our internal teams.

0:09:25.0 Joseph Morais: That is very cool. I mean, the idea of having a front end like that that almost treats it like a store. I add these things to my cart and suddenly I have the data I want. Now I'm curious, is this post-production data? This, I assume, is ultimately there to build a better vehicle, right? That's why you want post-production data, so you can figure out what's working, what's not working, and then make that next generation or a next revision of that vehicle even better. Is that a fair assumption?

0:09:51.8 Chetan Alatagi: Correct. It's not only for that, right? So product optimization is something which comes into this category. And then there are certain legal use cases which we need to handle, as providing some emergency services. And then there are certain compliance use cases which are very important to be handled in EU, for example, right? In Europe, of course, the GDPR everywhere. EU Data Act is something which is probably... Some of the colleagues who are listening in the EU region are familiar with this. So it's just basically saying it does not only go for the product optimization to build best vehicles in the future, but also making sure we follow certain standards, regulations at the same time. So it's much...

0:10:51.6 Joseph Morais: Right. So some of it's about optimizing the future, but some of it's about using that data to help you work that customer right now, like you said, in an emergency service where you perhaps detect a collision and proactively reach out to emergency services. I think that's incredible value add. And that's the thing about tech that I know tech is scary to a lot of people, but things like that are life-saving, right? Technologies that detect your eyes and whether you may have passed out or drifted asleep and then suddenly it parks the car for you. That's the future that I'm extra excited for.

0:11:30.0 Chetan Alatagi: Exactly. Yeah.

0:11:31.0 Joseph Morais: Yeah. Now, in the middle of that, you mentioned that all this data is coming through your data streams, but then that data is then shared with other teams, with other partners. Is that being shared through your data streams as well, or do you have another system for the sharing piece?

0:11:42.5 Chetan Alatagi: We have multiple mechanisms to share. So yes, we do share directly from our Kafka streams, from our topics where we have robust mechanisms how we make sure we assign certain groups, we make different roles which are able to access for a specific product.

0:12:08.9 Joseph Morais: So it sounds like that sharing is very subjective.

0:12:11.3 Chetan Alatagi: Yes.

0:12:11.6 Joseph Morais: So I'm curious, what were the underlying data or technology challenges that led CARIAD to adopt data streaming? Did you guys know out the gate when you were first kind of building these systems that data streaming was the way to go, or did you start down a different path and eventually land on data streaming?

0:12:28.4 Chetan Alatagi: Well, data streaming was and is not the new topic within the group. There are several teams within our groups that have actually excelled at it. So it's not the first time that we are looking into data streaming. Having said that, as I already talked about having different silos of data platforms, everyone used it in different ways. Some of the platforms were also kind of choking only with the request-response stuff. So it definitely made sense that, okay, we introduce the data streaming part. Again, every architecture which was built, which sounds like legacy now, was built with a very specific purpose back then, with the requirements at hand back then. So there's nothing right or wrong to say. It's just at this time, probably it is not used to the full context. And there were some data platforms which were using standard ETL jobs. Scaling was one of the biggest issues, produced a lot of wastage of resources to some extent. There were redundant components with similar logic at multiple levels, which was not required, creating just more hops. So what we tried to do was we bring a leaner architecture with stream processing, making it faster, obviously, and then making it easy to consume by our clients.

0:14:18.2 Joseph Morais: I'm curious because you mentioned this more than once, and again for the audience that may not be familiar with this technology, as data ingresses through Kafka or a data streaming platform, you can invoke stream processing against it. Now, you could of course build a microservice that consumes a message, does something to it, and reposts it, but stream processing is generally done inside of the cluster or using some specific frameworks like Kafka Streams. I'm curious, what stream processing technologies are you using today at CARIAD?

0:14:48.4 Chetan Alatagi: Very good question. I think we did go through a lot of decisions, architecture decision records, and we analyzed a lot of possible solutions. How do we do it? Yes, of course, you can just do this with microservices. Then you have some cloud-native solutions which you can go for it. So it's fine to do it. But we chose for Flink. We chose for Flink because it's the modern tech stack. It's one of the fast stream processors, stateful, stateless, for batch, for events. It was fitting to our needs. And that's where we said, okay, let's go ahead with Flink. Again, how we set this up was something which we had to take a middle route. So you can use Confluent Cloud Flink cluster. You can also use your own Flink cluster.

0:15:57.2 Joseph Morais: That's right.

0:15:57.5 Chetan Alatagi: So we went through a middle route because we had to also solve the challenge in China, for example, where we had to use Confluent Platform, and there it comes in handy. But we are currently using Confluent Cloud Flink cluster as well as our own cluster, which absolutely is one of the best decisions which we made.

0:16:21.5 Joseph Morais: That's great. I know there's a lot of fans here of Flink that are gonna be delighted to hear that. I like Flink a lot. I'm a little jaded or a little... I'm a little biased. In fairness, I love squirrels, and squirrels are the mascot of Flink, right? Because the squirrel always gets its nut. But you kinda listed the reasons why I'm such a big fan of it. It's extremely fast. It's very flexible. You wanna use it in batch, use it in batch. You wanna use it in real-time, use it in real-time. You wanna do both, do both. So I think you made a really good decision. Now, walk me and the audience through how you thought about addressing data integration between the different brands to unify software platforms. And I'm particularly curious, are you using any Kafka connectors to handle those integrations?

0:17:05.9 Chetan Alatagi: We definitely use some of the Kafka connectors, right? Like the MQTT to Kafka is one for the ingestion part. There are several Event Hub in our surrounding ecosystems where we use the Event Hub connector with Kafka. And yeah, from the consumption side, I think we use REST APIs currently. However, we tried to limit the use of connectors, but we still ended up using some of the connectors. Some of the same connectors are also being used for the blob, for the Azure Blob or ADLS Gen2. Of course, there are more than, I don't know, 100 connectors, so it's quite a rich ecosystem. We are taking step-by-step in this, also looking at a lot of database connectors because there are exactly the CDC type of connectors which we have to use for a lot of notifications. So yes, we are using some of them, but we are now definitely looking at a lot of other data connectors as we are exploring more and more into the Confluent Cloud, which is evolving very fast.

0:18:31.2 Joseph Morais: That's great. I'm glad you already dipped your toes in because I've been around the industry a long time, 25, 26 years, I'm dating myself, and I realize just how many different data systems there are, right? And it's very challenging to get them to all speak the same language. And for the audience, the reason connectors exist, there's this whole framework called Kafka Connect, is because we acknowledge that not all systems are gonna talk the Kafka protocol. Now, ideally, you use the Kafka protocol when you can, especially if you look at some cloud service provider native services, they now can natively talk Kafka. And when you can do that, you should do that. But whenever you run into a scenario like they have at CARIAD where you just can't do that, like you're interacting with a database or you're ingesting MQTT, you can use these connectors to kinda fill that gap. And they act as producers and consumers, and they talk to these upstream or downstream systems and kinda transparently integrate other these data destinations and sources into your data streams. Next, we're gonna dive into how your partnership with Confluent solved your data challenges, but first, a quick word from our sponsor.

0:19:45.0 Speaker 3: Your data shouldn't be a problem to manage. It should be your superpower. The Confluent data streaming platform transforms organizations with trustworthy real-time data that seamlessly spans your entire environment and powers innovation across every use case. Create smarter, deploy faster, and maximize efficiency with the true data streaming platform from the pioneers in data streaming.

0:20:21.9 Joseph Morais: Now we'll go beyond the stream on why Confluent was the right fit. Now, Chetan, with our partnership, how did your teams tackle addressing data integration between the different brands? So I know ultimately you talked about some of the tech that was utilized, but what did day one look like? What was that first kinda initial attempt to integrate the first brand that you onboarded, for example?

0:20:49.1 Chetan Alatagi: So earlier integrations, a bit of backstory, right? Earlier integrations were a bit of complex and resource-intensive, especially in terms of data consumption, environment setup. A lot of times it happened that the brands, our consumers, wanted a rapid access into our ecosystem, right? And then a seamless way to integrate. But this was kind of hindered by some technical prerequisites, some bottlenecks, a process-heavy onboarding journey, which slows down the whole integration life cycle, not only for the ones who consume, but also for we to look at now the next step in the data platform, right? So you do not want to get stuck there. So we had to choose this neutral event highway with Kafka, which we made. And then we were migrating also from a previous platform, so where we did set up certain standards, right? We had schemas, we had API contracts, we needed to adhere to this. So making sure the backward compatibility along with the extensibility for the future, so what can be brought. So how can you extend this?

0:22:26.3 Chetan Alatagi: So we went brand by brand after this migration, making sure that nothing changes for you, but it just eases the integration for you. You just need to configure a couple of clicks. Easy onboarding. At the same time, we never forgot about the security standards. So all the RBAC, ABAC policies were intact and even probably strengthened with the Kafka ecosystem. And we were able to deliver in time. So the trust with the brands was possible because we had very clear timelines, we had an extensive documentation which we put, and it was very easy to use. Right from the different stages of development, we were allowing them to kind of integrate, and then it made so easy for them to switch from the previous platform to the new platform with probably almost no data loss. That was absolutely tremendous.

0:23:39.3 Joseph Morais: That's excellent. I mean, honestly, that's the right recipe to get an engineer to do something. Because the hardest thing to do, many times, is to undo something that's already working, right? Even if it doesn't work great, the alternative is that we have nothing, right? So you had the right recipe. You're like, "Here is," I like the way you said, "a neutral event highway." I want to use that. We made it as simple as a few clicks. The onboarding's easy, and the security might even be better than what we had before. That is the right way to incentivize someone to make change. Now, with the experience of the engineers already at CARIAD and their experience with data streaming, it sounds like your architecture team knew that data streaming was the right answer to build this new platform at CARIAD. But there's a lot to wrangle with running Kafka yourself. So I'm curious, can you tell the audience what made CARIAD choose to work with us here at Confluent for their data streaming platform?

0:24:34.1 Chetan Alatagi: Yeah. So we chose Confluent because some of our engineers and also we know the management or operation of Kafka at scale is brutal without enterprise-grade support. And we needed production-ready solution. We needed a solution which is compliant. We needed a solution which is multi-cloud, multi-region capable, operations relief at scale, right? So, I guess, we don't want to be in the business of Kafka operations.

0:25:25.8 Joseph Morais: That's right. I know I remember reading like when CARIAD was formed, everything was kind of like, "Let's go at 100 miles an hour" from the very start. So I think it's very pragmatic to consider something like a managed service, especially from someone that you trust, like some of the original founders or originators of the technology. It's the right move. I mean, you said it yourself, you're not in the business of running Kafka, right? Like, that's undifferentiated heavy lifting. And what you are in the business of is building these incredible software technologies for all these brands. Only you can do that, but we can happily run Kafka for you.

0:26:06.8 Chetan Alatagi: I mean, as I said, there were experienced engineers who could do this. But you have to be also true that, okay, what is needed now in the market?

0:26:18.6 Joseph Morais: That's right.

0:26:19.3 Chetan Alatagi: So again, if you want to manage yourself, you need to set this up right. And for that, you need a lead time of a couple of months at least until everything has been set up. And that was something which we avoided in the beginning. So we just went on, "Hey, here's the tool, everything is ready. Just start with your performing world outlook, create topics and then start working." So that was really development right from the first day of having the Confluent Cloud.

0:26:55.4 Joseph Morais: That's great. And with you working with these other brands, if data streaming wasn't necessarily new to CARIAD, but the other brand engineers it was, one of the best ways to get someone onboard something is to onboard them onto something that won't break, right? Because if you try and... And I've had this happen and I've been embarrassed by this, you try to onboard a new team into a new technology and it's brittle and the thing goes down and now suddenly no one ever wants to use it again. It's very easy to lose trust, very hard to build it. And something you mentioned I thought was very interesting is you mentioned running Kafka at scale is brutal, and you're right. But your requirements specifically, multi-region, highly secure, multi-cloud, that's when it gets really hard, right? If you try to run Kafka within just a couple of EC2 instances in one AZ, that's okay, not too hard. Now let's make that multi-AZ. All right, that's a little bit harder. Now let's make that multi-region. Okay, that's a lot harder. Okay, now let's make that multi-cloud. Even harder.

0:27:51.3 Joseph Morais: And the fact that you were able to use all of these different kind of differentiators of Confluent Cloud really makes me happy. I've been here before any of that. Most of that existed. So it's great to see that these features are really doing the job for you and your team at CARIAD. So I'm curious now, okay, you have your data streaming platform, you have your highway. After integrating data with the different brands and unifying software platforms, what were some of the results today? I realize you probably have even more aspirations in the future, but what are some of those benefits that integrated data is helping the brands and the customers as well today?

0:28:27.0 Chetan Alatagi: I mentioned earlier the data discoverability part, right? Yes, a lot of variable platforms collected data which were kind of segregated. First of all, we were able to integrate that and then focus on the areas where it actually helps our consumers. Make our data products visible. Show them some example data sets on a catalog. This was quite easy. And just imagine if we were stuck up with setting up Kafka, this would have delayed some time. So I think there it was possible on the discoverability, governance part, fast access to quality data, right? That's very important. Operations, which I already explained, is also quite a topic of concern. The quality of data within our group is everything. So I guess you agree that as the quality of data degrades, it could ultimately end up in an AI hallucinating disasters.

0:29:58.9 Joseph Morais: Sure.

0:30:00.5 Chetan Alatagi: So a non-quality data which propagates throughout the system, it's like, I mean, if you want to fix this, then you have to have bandages or plasters across multiple levels, which is very risky. And it also... I don't know how to do some audit trails in that, some lineage. It will be very difficult. And yeah, so with the age of AI and AI models, I guess, the quality of data was very important. So this was...

0:30:34.8 Joseph Morais: Yeah. I'm glad you talked about quality data because that's something we talk about here a lot. We talk about this idea of something called shifting left, which is where you do your stream processing as close to the source as you can. Because the people that are producing, or the groups or the technologies that are producing the data, know the data the best. So a lot of times people will have that data go through many hops and then process it at the end. And then you have these poor data engineers in some service that go, "I have no idea what this data is. What am I supposed to do with this?" Right? So if you can get that stream processing, building that quality data early as it ingresses your system, you don't run the risk of that whisper-down-the-lane type of scenario where you keep whispering the same message, but at the end of the line it doesn't sound like anything. It doesn't resemble what you started with. And you mentioned it earlier, using Schema Registry. So again, for the audience, Schema Registry is how we build data contracts to ensure that the data that is being emitted by producers can be consumed by consumers and has mechanisms for evolution as you change the data sets. But also, ensuring that quality midstream by using stream processing like Flink, a really smart way to ensure that you get that quality data because that quality data is absolutely necessary today. But as you mentioned, with AI, it's going to be even more critical.

0:31:55.0 Chetan Alatagi: Correct. Absolutely.

0:31:56.7 Joseph Morais: So I'm curious now, I know your team had experience with stream processing, but what about your leadership? Was there anyone at CARIAD that said data streaming wasn't the right... We don't believe this is the right approach or we don't want to make this investment? And if so, how did you get your team and those leaders on board?

0:32:17.5 Chetan Alatagi: I think two aspects or two perspectives. So when you go to a development team and you talk to developers and they say, "What do you think about Confluent Cloud?" So they would be excited because, as I said, they can just start working from day one. So convincing the teams because you are creating value right from the beginning, that's much easy. And now comes the hard part, right? So the leadership. How do you convince them? I definitely remember when we started this whole journey and my solution manager comes to me and says, hey, Chetan, we need an architecture now, a redefined architecture which helps us to handle all the use cases which we are currently handling in the previous platform, it was handling streaming, it was handling batch uploads, so we needed something more powerful. So there I pulled up all my architects from different expert groups. We sat down, we went through section by section within the whole data platform chain, identified, okay, where is the first optimization? Where could be the replacement? So we made, okay, for the event store, what is the solution? For the stream processing, what could be the solution? Everything was fact-based.

0:33:53.6 Chetan Alatagi: So identify the problems and then provide two, three solutions and then come up with a decision. So everything was really well documented and showed, "Here is what we propose." And we gave the leadership two options. We could do our own Kafka installations or we could also use Confluent Platform, but that could involve some initial heads-up, right? That needs some time to properly set this up. When you go for Confluent Cloud, you have to be... So you can basically start with the development. So this was two aspects which we gave, but then also comes the cost aspect. So there we went into very detailed estimations of what are our current loads, how do our loads trend in the next months, years, and what would be the best decision for us? And before it's too late, what should be our options? So from an architecture point of view, it was presenting with multiple options and our leadership was very confident, okay, go ahead. We need time to market for this. And they were convinced to go ahead and there was the decision made. So I would say, everything went through a proper process.

0:35:27.1 Chetan Alatagi: And of course, we had the support from Confluent colleagues whenever we had issues on certain topics, how to handle something. So they jumped right in, provided multiple options how we can do different things, and then we were really into it. So this was a long story, but I think it was necessary to be told.

0:35:54.3 Joseph Morais: No, no, absolutely. And honestly, it's the first time I've heard anyone answer this question and say, "Hey, we just told our engineers we're using Confluent Cloud and they weren't gonna have to run it," and they were happy. And that actually tickles me. My background's in operations, right? I can still hear PagerDuty going off in my nightmares. And that to me, as an engineer, that's what gets me excited: that I could use something and not have to worry about it breaking, or turning it over, or having to update it and things like that. But I really like the approach you took with the architects and the leaders, where you said, "You know what? We're gonna make this fact-based. We're gonna go through every rubric that we need and we're gonna look at all the alternatives, including running it ourselves." And here are the realities of that. You presented that and they said, "Yes, you know what? This is what we value. This is the decision we have to make." So I'm very happy, because I think that was the right plan. To the engineers, "How does it affect my day-to-day?" For the leaders, "How is this gonna work from a time-to-market, from a cost standpoint?"

0:36:48.8 Joseph Morais: And I'm glad Confluent was the right decision. So in a very similar question, can you share advice and lessons learned for leaders like yourself for starting to tackle data streaming? So if you had to go back in time and tell a younger you how to get started with data streaming, what is some of the advice you would give yourself?

0:37:09.0 Chetan Alatagi: I'll put it from an automobile industry perspective, right? So you don't need to stream everything, right? There are use cases where you need to stream. There are use cases which were, let's say, earlier nice to have and now have become probably critical, and they do have certain SLAs which you need to provide, and that's possible with data streaming. So you need to be very fast there. There are use cases where you need to provide the data from the vehicle to the last mile within seconds, because it actually creates value. I can give you one example which our company recently posted. Because it's winter, it's icy roads everywhere in Europe, and it was a harsh winter. Yeah. So our subsidiary actually analyzes the friction data, so basically the friction data from the tire: how much of the tire can actually grip the road surface. Yeah. Why is it important? Based on this analysis, you can determine how slippery the road is. So you can basically create alerts within the driver's cockpit so that the driver can already reduce the speed, averting some dangerous situations.

0:39:03.9 Chetan Alatagi: So I read in a report that in just a couple of months, from December to January, around 2.5 million alerts were raised from, I don't know, millions of data points which we collected from our group. And now you can imagine how this would be helpful in autonomous driving, right? So the vehicle has to make decisions. If you already know the road safety, you need to slow down. So it's already useful. So again, the point is, there are use cases which are absolutely necessary to stream. There are use cases which need, for example, some video uploads, where you can't always use a stream. But you can use it in combination. So metadata and events, you stream; larger file uploads, you can upload via HTTP, HTTPS file uploads and then analyze later. So there's no one standard; keep all the options open and you should have all of that.
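The hybrid pattern Chetan describes here, streaming small metadata and events while pushing large files through an out-of-band HTTPS upload, can be sketched as a simple routing rule. This is a hypothetical illustration only; the function name, the 1 MiB threshold, and the sample payloads are assumptions, not CARIAD's actual implementation.

```python
# Hypothetical sketch: route vehicle payloads either to an event stream
# (small metadata/events) or to an HTTPS file upload (large blobs such
# as video), per the hybrid pattern described in the episode.

STREAM_SIZE_LIMIT = 1024 * 1024  # assumed 1 MiB cutoff for streaming


def route_payload(name: str, payload: bytes) -> str:
    """Decide the transport for a payload: 'stream' or 'upload'."""
    if len(payload) <= STREAM_SIZE_LIMIT:
        return "stream"   # produce to the event highway
    return "upload"       # PUT to object storage via HTTPS, analyze later


telemetry = b'{"speed_kmh": 87, "friction": 0.42}'   # small event
dashcam_clip = b"\x00" * (50 * 1024 * 1024)          # 50 MiB of video bytes

print(route_payload("telemetry", telemetry))    # -> stream
print(route_payload("dashcam", dashcam_clip))   # -> upload
```

In a real pipeline, the "stream" branch would produce the event to Kafka and the "upload" branch would return a URL or object key that is itself streamed as metadata, so downstream consumers can correlate the two.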

0:40:22.8 Chetan Alatagi: So there's nothing like just stream. Of course, and then comes a lot of compliance things, right? So UNECE cyber security regulations, GDPR, EU Data Act we talked about, those needs to be met. Observability is very important. Security is important. You need to have the throughput of vehicle right from the beginning until the end. The latency should be measured. So I would say these are some of the key decision pillars which I would advise anyone to...

0:41:06.8 Joseph Morais: I really like this idea that you have, like don't boil the ocean, right? Don't onboard everything to data streaming upfront. Because I agree with you; again, having spent a large portion of my career in data streaming, it feels like everything should be streamed, but that's not the reality, right? If you're starting fresh, why don't you focus on those super high-value, absolutely crucial, you-have-to-stream cases, like this case with the safety aspect and the friction or the grip of the tire. I think that's a really smart way. Focus on what you absolutely need to stream, and then other things that may be enhanced by streaming you could do later. And then of course, there are gonna be some architectures that just don't require streaming. I think as our world is evolving, more and more with AI, I do think more things will look event-driven, but that doesn't mean everything needs to be. And...

0:41:54.9 Chetan Alatagi: I want...

0:41:55.0 Joseph Morais: Go ahead.

0:41:55.9 Chetan Alatagi: I want to stream, yeah, Joseph, but it's... So as you said, in this fast-evolving world, I think we need to come up with faster insights and, linking back, faster feedback. Probably we will be doing only streaming in the near future. Don't know. But if you ask me today, keep your options open.

0:42:19.7 Joseph Morais: I like it. And I think this is a perfect segue to the last question of the segment. What's the vision for data streaming in the future at CARIAD? Like, what are you really excited about over the next year or two?

0:42:33.1 Chetan Alatagi: I would say the open table formats, Apache Iceberg and Delta Lake, which integrate with Tableflow.

0:42:44.7 Joseph Morais: Yeah, Tableflow. Yeah, Tableflow is cool.

0:42:47.7 Chetan Alatagi: We were waiting for its general availability. It's there now, so we can use it. I think this is also a way we can shift left. And then we are looking closely into the newly released Queues for Kafka for various task-based workloads. I think this would be very handy. What we again have to really look into with the current streaming pipelines is combining them with some AI models for real-time anomaly detection, maybe dropping unnecessary data right from the beginning. Until now we did that, let's say, very late in the post-processing logic, and that could also be moved. Yeah, so definitely there's a lot of AI aspect which comes in handy with this.
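Shifting anomaly detection left, flagging suspect readings at ingest rather than in late post-processing, might look roughly like the rolling z-score filter below. This is a minimal hypothetical sketch, not CARIAD's pipeline; the class name, window size, warm-up count, and threshold are all assumptions.

```python
from collections import deque
import math

# Hypothetical shift-left filter: flag anomalous sensor readings at
# ingest using a rolling mean/std (z-score), instead of waiting for
# late post-processing.

class RollingAnomalyFilter:
    def __init__(self, window: int = 50, z_threshold: float = 3.0):
        self.values = deque(maxlen=window)  # recent history only
        self.z_threshold = z_threshold

    def is_anomaly(self, value: float) -> bool:
        anomalous = False
        if len(self.values) >= 10:  # need some history before judging
            mean = sum(self.values) / len(self.values)
            var = sum((v - mean) ** 2 for v in self.values) / len(self.values)
            std = math.sqrt(var)
            if std > 0 and abs(value - mean) / std > self.z_threshold:
                anomalous = True
        self.values.append(value)
        return anomalous


f = RollingAnomalyFilter()
readings = [20.0 + 0.1 * i for i in range(30)] + [95.0]  # sudden spike
flags = [f.is_anomaly(r) for r in readings]
print(flags[-1])  # the spike at the end is flagged
```

In a streaming deployment, a check like this could sit in a stream processor right after the source topic, so obviously bad or unneeded records are filtered or routed aside before they ever reach the expensive downstream stages.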

0:43:50.9 Joseph Morais: Of course.

0:43:51.7 Chetan Alatagi: Yeah. So yeah, so I think these are the topics which I could see immediately looking at, and we are actually looking at it now.

0:44:02.3 Joseph Morais: That's great. More data, right? Just now data in different places. Queues for Kafka, that's very cool you guys are looking at that. That's something we just released, both into open source and at Confluent. So I'm really interested to see what kind of use cases you build with Queues. All right, so before we let you go, we're gonna do a lightning round. Byte-sized questions, byte-sized answers. That's right, that's B-Y-T-E. Think of them like hot takes but schema-backed and serialized. Are you ready?

0:44:39.4 Chetan Alatagi: Absolutely.

0:44:40.5 Joseph Morais: First thing that's off the top of your head, what's something you hate about IT?

0:44:43.7 Chetan Alatagi: It's invisible when the work is done well.

0:44:48.4 Joseph Morais: Oh, yes.

0:44:50.4 Chetan Alatagi: So, for example, if everything works smoothly, no outages, no slow times, then people start assuming maybe IT is not doing much.

0:45:07.3 Joseph Morais: The dull edge of success.

0:45:10.6 Chetan Alatagi: As soon as something doesn't work or something breaks, suddenly everything is very urgent and everyone is looking at it, and then somehow IT is not doing its best. So it's a strange dynamic, but it's a classic one.

0:45:34.2 Joseph Morais: No, it is. And again, coming from operations, it's one I can understand completely. This whole asymmetrical relationship we have with success, it is very frustrating. What is your hot take on the future of AI?

0:45:47.0 Chetan Alatagi: So AI is the future. I think that's pretty much what everyone says. But I wouldn't just say it's the future. It's not just the future in its own right; it's an interface to intelligence. Back in university, for me, cloud computing was the future. And look where we are now. And tomorrow it might be quantum computing that's the future. So I would just say it's one step towards more intelligence. And as a data guy, data is the real asset.

0:46:29.2 Joseph Morais: I agree with you.

0:46:30.2 Chetan Alatagi: Data is the real asset. And to empower AI, we need to build proper data architecture. So those who master data architecture don't just keep up for now, they also define what's possible for AI.

0:46:51.0 Joseph Morais: Yeah, I like it. See, I'm gonna put this together succinctly for you because I like it so much. If you can master data, you can master any future technology. I like that. What's a non-tech activity or hobby that's impacted how you think about data?

0:47:06.7 Chetan Alatagi: Building a house is a good metaphor for data and data architecture, right?

0:47:13.4 Joseph Morais: I like that.

0:47:14.4 Chetan Alatagi: You need to have a proper design before you build. There needs to be proper structural stability. So before building a house, you need to make sure that even the smaller decisions are taken seriously. The choice of materials, the appliances that need to fit, they are absolutely important. Otherwise... So when I look into the data world now, naming conventions, some guardrails, having proper schemas, data models, contracts, they're all very important right from the beginning. In the same way, in the data world, you need to build the pipelines properly. So it's kind of the same. And you need to have a very good sequencing. It's not about having the very nice, shiny house; you need to also have a very strong foundation and the right materials inside. So you don't go installing the drywall before the electrical wiring. In the same way, you just don't build nice, sleek dashboards and AI before you clean the data, before you provide the right quality of data. So I guess that's the metaphor I would bring to...

0:48:56.4 Joseph Morais: Yeah. I like that analogy a lot, especially you gotta run your electricity before you drywall. That will burn you if you do it the other way around. Chetan, where are you getting outside inspiration? When you think about data streaming or just technology in general, is it from a book or perhaps a thought leader that you follow?

0:49:17.1 Chetan Alatagi: Not books, definitely thought leaders. Maybe because I'm a real-time guy, they provide a real-time perspective and also bring some bold arguments into the conversation. So listening to podcasts, some blogs from thought leaders. I'm really amazed that, just take the example of all the summits currently ongoing, when you hear people talking, even the political leaders are currently talking about the AI stack. They're talking about different layers of the AI stack, and they're actually talking about how to empower people with AI. So it's very fascinating. And again, thought leaders provide their ideas, their perspective. I just consume all of that. I don't copy them, but work them into my perspective. So definitely thought leaders.

0:50:23.9 Joseph Morais: It is fascinating. Not since the onset of the internet have I ever seen so many traditionally non-technical people talking about a technology, and I don't think it's gonna stop anytime soon. It is quite fascinating. So, Chetan, any final thoughts or anything to plug?

0:50:42.5 Chetan Alatagi: Yes. I think, picking up on your last sentence, there's too much data. There's so much to consume. I would just take some time for myself for self-reflection. Just as I said, assess what is absolutely needed, because nowadays idea-to-execution can be very fast, because you can prototype anything so quickly, but we need to really reflect on ourselves. What is important? What is important for me to automate my work, optimize my work, and how do I help my team, my product to basically get the best out of it? So that would be my take and... Yeah.

0:51:36.1 Joseph Morais: Honestly, I think that's a very insightful takeaway to provide to the audience. So I appreciate that. And I want to thank you so much for joining me today, Chetan. And for the audience, please stick around because after this, I'm giving you my top three takeaways in two minutes.

[music]

0:52:02.1 Joseph Morais: What a great conversation with Chetan. Here are my top three takeaways. The first one, right from his mouth: the magic of streaming, right? Once CARIAD was able to build some of this new software for data collection on all these vehicles, they were able to achieve some incredible outcomes. The first one is product optimization. By having all this data, they're able to build even better cars. Look at the VW Group and the types of cars in there. I'm really excited for what that type of optimization will lead to. But it also leads to these magical things like emergency services or even compliance. Imagine you're having one of the worst days of your life because you had some type of accident and you're completely disheveled and you don't know what to do, but suddenly an emergency services vehicle shows right up because your car summoned it. That really does feel like magic. And that's one of the fantastic possibilities with data streaming.

0:52:59.3 Joseph Morais: Chetan mentioned that earlier integrations were expensive and complicated. They were process-heavy. But what they really needed at CARIAD was a neutral event highway. And the way they were able to get people to onboard this newfangled tech was they made it super easy. A few clicks, easy onboarding, and all of the enterprise-ready features that we would expect: things like role-based access control, and all the security and the authentication. So ultimately, the Confluent data streaming platform became that highway. And it's very apt, considering CARIAD builds software for vehicles, that it allows them to take all these disparate brands with all these different data systems across different clouds and tie them all together. How impressive.

0:53:45.4 Joseph Morais: And then as Chetan mentioned, what it was like to get various folks onboard with data streaming. For engineers, they basically said, "Hey, we're going with Confluent Cloud. You're gonna be able to work with data streaming technology and never have to worry about it breaking, scaling it, doing all the maintenance," and suddenly the engineers were in. But for the leaders, they went through and did a full fact-based and cost-based analysis. They looked at every possible solution, all the various forms of managed Kafka, all the different offerings, even running it themselves. And their leadership said, "Absolutely. Data streaming platform from Confluent, Confluent Cloud is what we need to be able to be multi-cloud, multi-region, have all the security requirements, be able to be in data centers as necessary." There was no one else that could do all that for CARIAD. Overall, just a fantastic episode. So many things I learned today.

0:54:38.5 Joseph Morais: That's it for this episode of Life Is But A Stream. Thanks again to Chetan for joining us, and thanks to you for tuning in. As always, we're brought to you by Confluent. The Confluent data streaming platform is the data advantage every organization needs to innovate today and win tomorrow. Your unified platform to stream, connect, process, and govern your data starts at confluent.io. If you want to learn about some awesome Confluent use cases, please check out the Ultimate Data Streaming Guide by my good buddy here at Confluent, Kai Waehner. We'll have a link to this awesome book in the show notes. If you'd like to connect, find me on LinkedIn. Tell a friend or coworker about us, and be sure to subscribe to the show so you never miss an episode. We'll see you next time.