Henry Schein One is modernizing its data infrastructure with real-time data streaming to stay ahead of the curve. In this episode, Chris Kapp shares insights on data governance for HIPAA-sensitive information, leadership buy-in, and the future role of GenAI in real-time data strategies.
Despite its value, legacy data can feel like a roadblock in a fast-paced digital world—Henry Schein One is clearing the path forward with real-time data streaming.
In this episode, Chris Kapp, Software Architect at Henry Schein One (HS1), shares how his team modernizes data management to stay competitive and unlock real-time insights.
You’ll learn about:
*Data governance for HIPAA-sensitive information
*Revitalizing legacy data with event-driven architecture
*Earning executive alignment and engineering collaboration
*The future role of GenAI in real-time data strategies
Get ready to future-proof your data strategy with this must-listen episode for technology leaders facing scalability, governance, or integration challenges.
About the Guest:
Chris Kapp is a Software Architect at Henry Schein One specializing in domain-driven design and event-driven patterns. He has 34 years of experience in the software industry, including roles at Target and Henry Schein One. He is passionate about teaching patterns for scalable data architectures. He’s currently focused on the One Platform initiative to allow Henry Schein applications to work together as a single suite of products.
Guest Highlight:
“It's important to collect the data, try to eliminate our biases and go towards what is delivering quickly. The key is to start small, agile, iterative, and build something small with the people that are excited and willing to learn new things. If it doesn't work, then be agile, adjust, and find the thing that does work.”
Episode Timestamps:
*(01:18) - Chris’ Data Streaming Journey
*(03:35) - Data Streaming Goodness: AI-Driven Reporting & Data Streaming
*(21:08) - The Playbook: Data Revitalization & Event-Driven Architecture
*(31:55) - Data Streaming Street Cred: Executive Alignment & Engineering Collaboration
*(32:03) - Quick Bytes
*(40:14) - Joseph’s Top 3 Takeaways
Links & Resources:
Our Sponsor:
Your data shouldn’t be a problem to manage. It should be your superpower. The Confluent data streaming platform transforms organizations with trustworthy, real-time data that seamlessly spans your entire environment and powers innovation across every use case. Create smarter, deploy faster, and maximize efficiency with a true data streaming platform from the pioneers in data streaming. Learn more at confluent.io.
0:00:06.4 Joseph Morais: Welcome to Life Is But A Stream, the podcast for tech leaders who need real-time insights. I'm Joseph Morais, technical champion and data streaming evangelist at Confluent. My goal: helping leaders like you harness data streaming to drive instant analytics, enhance customer experiences and lead innovation. Today I'm talking to Chris Kapp, software architect at Henry Schein One. In this episode, Chris tells me how his team is transforming the way Henry Schein One handles data to power better customer experiences and unlock real-time insights. We'll explore the tools and tactics they rely on, the challenges of leadership buy-in and even a glimpse at the future of generative AI in their systems. But first, a quick word from our sponsor.
0:00:06.5 Ad: Your data shouldn't be a problem to manage. It should be your superpower. The Confluent data streaming platform transforms organizations with trustworthy real time data that seamlessly spans your entire environment and powers innovation across every use case. Create smarter, deploy faster and maximize efficiency with the true data streaming platform from the pioneers in data streaming.
0:01:18.4 Joseph Morais: Welcome back. Joining me now is Chris Kapp, software architect at Henry Schein One. How are you today, Chris?
0:01:24.2 Chris Kapp: I'm doing great, thanks.
0:01:26.0 Joseph Morais: Excellent. Let's jump right into it. What do you and your team do at HS1?
0:01:31.0 Chris Kapp: We manage dental software all around the globe. So we have practices in Europe, United States, Canada, Australia, New Zealand, Brazil. Our dental presence in Europe is really strong and we're a major market leader in practice management software for dentists.
0:01:51.8 Joseph Morais: So when you say practice management, that's everything, right? It's like tracking their patients, billing, you know, getting authorizations. I imagine all of that's included.
0:02:02.3 Chris Kapp: Yeah, the goal is to be a one stop shop. If you're a dentist and you need software, we're there to meet all of your needs from the beginning to the end, all the way through the insurance and the billing, all the treatment planning, scheduling, everything.
0:02:18.1 Joseph Morais: Across all those geos I imagine that's a lot of data.
0:02:19.6 Chris Kapp: It's a lot of data and a lot of complexity to deal with.
0:02:23.3 Joseph Morais: And I imagine that is also, you know, highly sensitive data like in the US that would be HIPAA protected.
0:02:29.2 Chris Kapp: The vast majority of that is HIPAA data, and protecting that HIPAA data is one of our highest priorities.
0:02:38.9 Joseph Morais: Tell me more. What is the data streaming strategy that you have at Henry Schein One? In a minute or less.
0:02:45.5 Chris Kapp: We have been historically very good at purchasing companies around the globe to fit the needs that we have, where anytime there's a gap in what we need in the tooling, we're able to purchase companies and fill that gap. What we haven't been good at in the past is making all that software work together. So we have a new strategy. We're calling it the One Platform Strategy. And basically, we need to take all of these pieces that were amazing on their own, but very siloed and make them work together in one suite of products. That's what customers demand, and they rightfully should have that experience. They're used to having that experience where everything just works together and you don't have to worry about, oh, my data's in this place or in that place. It just works together.
0:03:37.0 Joseph Morais: Right. Our customers shouldn't have to suffer because we put data in all different places through years and years of tech debt. So, you know, having that kind of consolidated abstraction where all your data can be, regardless of where the source is, I think that's a really important goal to hit, ultimately, for many enterprises, frankly. All right, so let's get into our next segment and dive deeper into the heart of your data streaming journey in our first segment, which we call Data Streaming Goodness. So what has HS1 built, or what are you currently building, with data streaming?
0:04:12.0 Chris Kapp: Like I said, we have the different practice management software that's individual. And what we realized is we need to do a lot more with that. So we need better reporting, we need AI on top of that, we need to stream that data out to other partners that can then enhance the experience. And so we need that data to be available in real time, very high quality. And what we're finding, or what I've found at several companies I've been at, is the leaders of the company start saying, what we really need is AI and we need this feature. And what they don't realize is that building the AI isn't the hard part. The hard part is getting the data moving where it needs to be, with the quality, in the right place. And that's what I've been focusing my career on for the last 15 years or so.
0:05:05.3 Joseph Morais: That's awesome. Yeah. Gen AI, like any emerging technology, is thirsty for data, and it's only useful if you can plumb that up and make it effective. And I think it's important: you said something really interesting. You said quality data. I think that's an important distinction, because there are plenty of mechanisms and architectures that you can get plenty of data through, but you can get a lot of garbage data, right? So having systems that allow you to prevent that from happening in the first place, or allow you to transform data in real time, is very helpful. And, you know, I think we'll talk about governance and stream processing as we move along. But I'm curious. I know what you're building, right? Give me a picture. And I think people can formulate it on their own, but I'd rather hear it in your own words. What are things like now, without this system in place? What are things like today in absence of this unified layer? And did your previous architecture, your current architecture, scale?
0:06:03.2 Chris Kapp: I've got to be honest, historically it hasn't. What happened with one of our main applications is we got to a certain point where we just ran out of scale. That's all there was to it. And that was where it ended. And what we did is Ctrl+C, Ctrl+V on our hardware and say, okay, new customers start going to this new version of it. And oh, that ran out again, when we got up to 13 of these. And that brings in the other part of scale. Everyone thinks of scale as vertically or horizontally scaling the hardware. What about the teams that are managing those 13 different... And that was not working. So we really needed to lean heavily into something that has partitioning built into it. I mean, that's effectively what we were doing: we were partitioning at the infrastructure level, and we needed to lean on tools that partition under the covers and automatically give us the scale we need. So we're working with Kafka. We brought in MemoryDB for Redis, which also has partitioning. That was a great coupling. And other tools: we rely heavily on Aurora Serverless v2 for Postgres from AWS, which gives us a lot of the automatic, instant scale that we need.
0:07:30.2 Chris Kapp: So as long as we're using read replication correctly, then we're getting the scale that way. And honestly, with the new services that we're spinning up, we have so much headroom on scale. I mean, we could just keep going and going and going, because it just scales horizontally, and it scales cost-effectively. So we watch these different services, like our Aurora Serverless v2 Postgres, on the weekends. Our dentists go and play golf on Friday.
0:08:02.8 Joseph Morais: And so all that utilization drops.
0:08:05.1 Chris Kapp: Yep. Yeah. And they work brutally hard on the other days of the week.
0:08:11.0 Joseph Morais: Sure. They deserve those golf outings. Absolutely.
0:08:14.2 Chris Kapp: Yeah, absolutely. But that's what our systems look like: they scale up Monday through Thursday, they taper off Friday, they do a bunch of analytics, and then it's quiet on the weekend. You can watch our database servers go up and then just drop off, and then just nothing, sipping resources on the weekend. So we're able to take advantage of that and get the cost scale as well.
0:08:39.8 Joseph Morais: That's awesome. I think you honed in on something really important, and I think a lot of people relate to this. They're used to sharding with systems, right? And that becomes very difficult, because now you have to maintain 13 versions of configurations, 13 sets of systems, 13 different sets of storage, memory, compute. It just becomes completely untenable. And you mentioned the human scaling. So doing the sharding at the abstraction layer, so you can decouple the hardware from the things you're running on it, really ends up being a much better way of doing things, both pragmatically and in terms of not hammering on our pagers throughout the day or throughout the weekends. Because I'm an ops guy and I think about that. I have nightmares of PagerDuty going off, and I empathize with everyone that is still supporting systems.
0:09:27.2 Chris Kapp: That's not the way to run the systems anymore. They should just scale on their own.
0:09:30.9 Joseph Morais: Absolutely.
0:09:31.4 Chris Kapp: And self-heal.
0:09:33.3 Joseph Morais: Couldn't agree more. I mean, that elastic scaling, when you look and everything has just scaled down and there's no noise, and you know that it can scale back up. So tell me, what type of data sources and destinations have you integrated with data streaming, or intend to integrate with data streaming? I know you talked about the Aurora Serverless v2 database, but are there any other CSP or ISV services that you see as part of this architecture?
0:09:57.1 Chris Kapp: So we've got our data warehouse, which is owned by Jarvis, a company that we own, and they're moving to Redshift very quickly. So we're pumping that data into Redshift. We've got our C-levels going, we need this data, or we need to look at that, how's this working? And so we're pumping data into OpenSearch and into Domo so our customer service people can get that, and out into, as we mentioned before, AI. We've built AI tooling off of that. It's expected of us. We had, at the time, the largest medical AI rollout of FDA-approved features in history. So there are lots of different places that we can push that data and get a return on it, but we've got to step up and do it.
0:10:56.6 Joseph Morais: Now, I know we talked about the data in the closets, and we're talking about this from the lens of, you know, dentistry and all the tangentially related or directly related services. But it's not limited to just dentistry, right? I mean, across any health services. If I'm a chain of hospitals or medical providers, I have data in each of those locations. Maybe I have a centralized service. Oftentimes I don't. I know for a fact many retail companies or logistics companies will have some type of compute in these remote locations. So this is really a common scenario where my data isn't just across data centers, it's across brick and mortar locations, globally distributed. And the fact that you're able to leverage data streaming across all of that is very, very exciting. Especially given my background: I worked for a retail company for four years, and we had computers under the registers. So it was very fun when one of those broke. So I know, I think you're pretty passionate about the subject of the next question. What about data governance? How important is tracking and enforcing the flow of quality data as it first enters your system to HS1?
0:12:08.2 Chris Kapp: That's everything for us, because what happens is we get these projects and we get this data, and what we were running into is, because obviously we care very deeply about protecting the HIPAA data, securing it is in our top three attributes that we plan for in our architecture. We cannot lose that data, we cannot let it get exfiltrated. So what we were running into is, every time somebody wanted to build a feature, we'd run it by security, and they were like, yeah, and how do you lock that down and make sure it's not abused, or they're not getting more data, and things like that. So what we did is we heavily invested in a naming strategy on our topics. So we have our private internal topics for each team that they can use, and then they have the public ones that become the layer that they can consume from. The other thing is that there's a newish tagging feature that's available, and security would come back and say, well, how do you lock that down to make sure they don't get access to this or that, or how do we know that that's really this data?
0:13:18.5 Chris Kapp: Well, what if we just tag each of those topics? Oh, we really need column-level tagging. Okay, let's do that. Okay, how do we make sure that the developer didn't just say, let's put that tag on there and that's good enough, now we can share it? Well, we've got that data in a report now. How about you look at the report and you validate it and you approve it, so that not just anybody can add that tagging on there. And then all the governance on who's allowed to change that tagging and things like that, it's all built in there. We just needed to hook up to the APIs and add onto that. And so security loves us, right? Because everything is immutable. We know who did what, who can access what, and who has been accessing what.
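For readers curious what a private/public topic naming strategy like the one Chris describes might look like in practice, here is a minimal sketch using the confluent-kafka Python client. The visibility prefixes, team and domain names, cluster address, and partition counts are all illustrative assumptions, not HS1's actual scheme.

```python
from confluent_kafka.admin import AdminClient, NewTopic

# Hypothetical convention: <visibility>.<team>.<domain>.<event>
# "private" topics are internal to the owning team; "public" topics are the
# sanitized contract that other teams are allowed to consume from.
admin = AdminClient({"bootstrap.servers": "localhost:9092"})  # placeholder cluster

topics = [
    NewTopic("private.billing.claims.raw", num_partitions=6, replication_factor=3),
    NewTopic("public.billing.claims.v1", num_partitions=6, replication_factor=3),
]

# create_topics returns one future per topic; result() raises if creation failed.
for name, future in admin.create_topics(topics).items():
    try:
        future.result()
        print(f"created {name}")
    except Exception as exc:
        print(f"failed to create {name}: {exc}")
```

A naming convention like this also gives security a simple hook: ACLs and tagging policies can be applied by prefix, so "public.*" is the only namespace external consumers ever see.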
0:14:07.4 Joseph Morais: Yeah, it's the great part of having data as a log, but then also having things like audit logging, so you know who accessed that data. But there's something you touched on which is a pattern that I think is very popular and very powerful, especially in the data-rich world that we live in. And that is being able to make a "public topic," right? A topic where each flow of data has already been sanitized. So maybe the part that usually is the problem is that it identifies the person, or it has financial data, things like that. But if you take that handful of fields out, the data is still very useful for analytics and other purposes, but now it's no longer a hot potato. Right? So that stream processing and data streaming combination, I imagine, is what powers those public topics.
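As a rough illustration of the sanitizing step Joseph describes, here is a small consume-transform-produce loop in Python that reads a private topic, drops sensitive fields, and republishes to a public topic. The topic names, field list, and JSON payload shape are assumptions for the sketch; in a real deployment the sensitive-field list would be driven by governance metadata rather than hard-coded.

```python
import json
from confluent_kafka import Consumer, Producer

# Hypothetical list of fields that make the record sensitive.
SENSITIVE_FIELDS = {"patient_name", "ssn", "date_of_birth", "card_number"}

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",   # placeholder cluster
    "group.id": "claims-sanitizer",
    "auto.offset.reset": "earliest",
})
producer = Producer({"bootstrap.servers": "localhost:9092"})

consumer.subscribe(["private.billing.claims.raw"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    record = json.loads(msg.value())
    # Keep the analytical payload, strip the fields that make it a "hot potato".
    sanitized = {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}
    producer.produce("public.billing.claims.v1", key=msg.key(),
                     value=json.dumps(sanitized))
    producer.poll(0)  # serve delivery callbacks
```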
0:14:54.4 Chris Kapp: Yeah, yeah, there's a lot of processing back there. We use a lot of different patterns behind the scene to get that data to where it's consumable. We follow the Ben Stopford patterns, if you're familiar with that. I mean, the book is published on your site and we...
0:15:10.2 Joseph Morais: Yeah, I've got, I got to work with Ben while he was here. Yeah.
0:15:13.3 Chris Kapp: Really? I'm jealous, because I love that stuff. Next time you see him, tell him to publish a more detailed version.
0:15:19.2 Joseph Morais: It's so fun actually, without giving away anything, I think we're actually working on one, so.
0:15:23.4 Chris Kapp: Yeah, I love it. Yeah, there's great stuff in there, but there'll be like one sentence hidden in there that has a gem in it. I read that book twice, and a friend of mine pointed out, oh, you know, there's this pattern that's here, and he showed me the line, and I read it, and it was the latest-versioned pattern, and I had to read that paragraph four times to even realize that hidden in there was this gem of a pattern that I use all the time now. And for anyone who doesn't know, the idea is you don't just have the one topic that everyone consumes from. If you need, let's say, a couple weeks' worth of data, you track everything and you make that publicly available. But if you need more historic data, maybe having compacted data is good enough. And so you pair that with a compacted topic, and now people can self-serve with what they need. Do I need everything, but just the last couple weeks? Or do I just need to know the latest state, so I don't have to process everything? And they pair so well together.
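One way to picture that pairing is two topics created side by side: a time-bounded history topic and a compacted "latest state" companion. The sketch below uses the confluent-kafka AdminClient; topic names, partition counts, and the 14-day retention window are assumptions, and it presumes a cluster with at least three brokers.

```python
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})  # placeholder cluster

pair = [
    # Full recent history: for consumers who need "the last couple of weeks".
    NewTopic(
        "public.scheduling.appointments.v1",
        num_partitions=6, replication_factor=3,
        config={"cleanup.policy": "delete",
                "retention.ms": str(14 * 24 * 60 * 60 * 1000)},
    ),
    # Compacted companion: Kafka keeps only the latest record per key,
    # i.e. just the current state.
    NewTopic(
        "public.scheduling.appointments.latest.v1",
        num_partitions=6, replication_factor=3,
        config={"cleanup.policy": "compact"},
    ),
]

for name, future in admin.create_topics(pair).items():
    future.result()  # raises if creation failed
    print(f"created {name}")
```

Consumers then self-serve: replay the history topic when they need every event in a window, or read the compacted topic end to end when the latest value per key is enough.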
0:16:31.0 Joseph Morais: So let's talk about retention. How long do you need to hold on to operational data and is it compliance driven?
0:16:37.1 Chris Kapp: With HIPAA we have a minimum six-year retention on that data, so we can't just be losing the data or getting rid of it. But what I'm finding is, I quite often say, what would happen if we just held onto all the data? And people freak out. They're like, that would be so expensive. And then we started doing that in places, and we found out not only is it not really all that expensive to do, but look at all the stuff we can build on top of it. Like you mentioned audit logging previously. So we're doing event-carried state, which is like event sourcing. So Kafka becomes our database, and in many places we retain everything in the topic, everything that happened. And so we built a tool, a very security-sensitive tool, and somebody said, man, I really think we need to know who did what, when, here on this thing. And we're like, oh, it turns out we have that information. All we need to do is present it in a place that's readable. And then we build this tool off of it, and they're just floored.
0:17:54.6 Chris Kapp: Because before, what we did is you would build whatever feature, and then you would spend three months going, okay, let's push that into an audit database table. And oh, that got too big and heavy, so let's track the last two months in there, and then we'll put it off to a long-term retention table, and we'll build all the mechanics around that, and we'll do it again on the next feature. And instead, we were just like, we're using the CQRS pattern here. So we're separating our reads and writes: we write to Kafka, and then whoever's interested in consuming that data, all they have to do is sign up for it. And one of those consumers can be the audit log. So we'll just publish that report, and all of a sudden you've got an audit log in like 20 minutes. You've built this audit log feature on it. So people kind of freak out over, we're going to build in this complex pattern of event-carried state. They're like, man, that seems really heavy, and you're going to hold onto that data forever? Yeah. But then the really expensive part of software development, the maintenance, the building of the next feature, it's all gone.
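To make the CQRS read-side idea concrete, here is a minimal sketch of an audit-log projector: a consumer that replays a fully retained topic from the earliest offset and projects each event into an audit entry. The topic name, field names ("actor", "event_type"), and the in-memory list standing in for an audit store are all hypothetical; a real projector would run continuously and persist its output.

```python
import json
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",   # placeholder cluster
    "group.id": "audit-log-builder",
    "auto.offset.reset": "earliest",         # start from the oldest retained event
    "enable.auto.commit": False,
})
consumer.subscribe(["public.billing.claims.v1"])

audit_log = []
while True:
    msg = consumer.poll(1.0)
    if msg is None:
        break  # caught up, good enough for this sketch; a real projector keeps polling
    if msg.error():
        continue
    event = json.loads(msg.value())
    audit_log.append({
        "who": event.get("actor"),
        "what": event.get("event_type"),
        "when": msg.timestamp()[1],   # (timestamp_type, timestamp_ms)
        "offset": msg.offset(),
    })

print(f"rebuilt {len(audit_log)} audit entries")
```

Because the topic already holds every event, the "new feature" is just another consumer group; nothing upstream has to change.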
0:19:08.0 Joseph Morais: Yeah, it's interesting. I never thought about this as a concept of return on investment of retention. Right. Like you said, it's costly to build some bespoke query when you don't have a system of record that is exhaustive. And in this scenario you do, and you can find creative use cases to say, hey, I know you're going to need this eventually, I'm just going to keep it, because this is going to be much cheaper in the long run. You said there was a Gen AI rollout at HS1. Did that involve data streaming at all?
0:19:40.2 Chris Kapp: That was on detecting cavities.
0:19:44.3 Joseph Morais: Okay, wow.
0:19:45.4 Chris Kapp: So now the dentist is going along and saying, I think there's a cavity there. And on the X-ray we've already highlighted: here's one you should look at, here's one you should look at, this is definitely one. And the data coming back from that is very impressive. The dentists don't like me to say this, but we're more accurate than the dentist. [laughter] "I've been doing this for 50 years." Yeah, well, the computer's statistically beating you. But now we're getting into the more difficult areas, because we're security paranoid, and we need to make sure that as we build these AI features, they aren't leaking the data. And it's really scary to us, because we see these other companies building these AI features and there's no firewall there. We're saying, well, we've got this security fence built in there so that the AI won't leak the data out. And if you break the AI and it starts hallucinating, I mean, we see that all the time. So we err on the paranoid side. But we do see the value of it, and so we want to deliver it. But we can't deliver it if we can't 100% guarantee the HIPAA data is protected.
0:21:08.5 Joseph Morais: Right. It comes back to getting this project done to make sure you can route quality, but protected, data wherever it needs to go. And Gen AI just becomes another data challenge, which I really think is what it comes down to. It's fancy, right? And it's cool. And it doesn't matter what level or what industry you're in, it's piquing your interest. But at the heart of it, it's just another data challenge. Our next segment is called The Playbook, where, Chris, you dish out your winning strategies for getting old, tired, unmoving data in motion. What are the top three tools you rely on for data streaming? And as a bonus, what do you avoid, like batch workloads?
0:21:50.6 Chris Kapp: To put the shock paddles on and wake the patient up and get the data flowing, we bring in change data capture and hook that right up to the database. And okay, now we've got instant events. But, I mean, the data's there, but it's not a great experience consuming change data capture data, because there's no abstraction layer there. You're just getting the raw data from the database, and then someone's still going back to the DBA, going, what does this column mean? Right. So that's just to get the ball rolling. I'm not a fan of the outbox pattern as a long-term strategy. It's a short-term strategy for me. So I like to use events natively: we get change data capture in there, we start using that data, we build abstractions over the top of that, and then every time there's a new feature, you just start building the new topics natively into the software, and we're using the command-event pattern there. And so we just start building these new commands and events, commands and events, all the way through it. Here's what I would avoid.
0:23:08.3 Chris Kapp: Do not use the wrong tool for the job, right? The biggest one I'm seeing is there are a lot of assumptions around what certain other tools can do in the space that just aren't there. And I'm going to pick on RabbitMQ really hard right now, if that's okay. People assume that RabbitMQ does a lot of things that it just doesn't do. And so when I see that turtles architecture I mentioned earlier, where there's no foundation on how to sync up the data, that's the number one prime suspect I see in that space. At large scale, getting the data delivery assurances you need, that is not the tool for that. If you're doing fire and forget, that's fine. And I also see it used a lot in places where resources are created and removed all the time, and it just isn't built to handle that resource creation, creating and removing a lot of topics quickly. We've had a lot better success using MemoryDB for Redis on AWS, which is built to add and remove those resources. To me, that's a much better pairing with Kafka, where Kafka has its more permanent, structured topics.
0:24:37.3 Chris Kapp: And then you pair that with something like MemoryDB for Redis, which has those dynamic streams and data that can be created and have a retention period around it. It works so well together.
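As a rough sketch of the "dynamic streams" side of that pairing: MemoryDB speaks the Redis API, so redis-py works against it, and Redis Streams are cheap to create and let age out, in contrast to long-lived, governed Kafka topics. The endpoint, stream key, field names, and the one-hour expiry below are assumptions for illustration only.

```python
import redis

# MemoryDB is Redis-API compatible; the host here is a placeholder endpoint.
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# A short-lived, per-session stream: created on first write, trimmed automatically,
# and allowed to expire, which Kafka topics aren't really meant for.
stream = "session:abc123:events"   # hypothetical key name
r.xadd(stream, {"type": "page_view", "page": "/schedule"},
       maxlen=10_000, approximate=True)
r.expire(stream, 3600)             # let the whole stream age out after an hour

# Read the stream from the beginning.
for _, messages in r.xread({stream: "0-0"}, count=100, block=1000):
    for msg_id, fields in messages:
        print(msg_id, fields)
```

The durable, audited events stay in Kafka; the ephemeral, high-churn streams live here.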
0:24:51.3 Joseph Morais: Yeah, I've seen that quite a bit: using Redis as a caching layer so you can handle that high volume of queries but still maintain Confluent or Kafka as your system of record.
0:25:04.0 Chris Kapp: Yeah, and not just normal Redis, but MemoryDB for Redis.
0:25:08.8 Joseph Morais: Okay, MemoryDB for Redis. Correct.
0:25:11.4 Chris Kapp: Yeah, it has the partitioning built into it and it has a log backing it so that you can get persistence guarantees and that adds a new dimension to the tooling and it gets us a lot of power.
0:25:25.6 Joseph Morais: Nice pairing. I haven't explored that, but I absolutely am going to. Before I took on this role, I was our AWS partner at Seaside. I used to play around with hard tech and AWS all the time. Not as much now, but now you've given me a really good pattern to try out so I'm looking forward to that. Thank you, Chris. Can you share a specific tactic that has significantly improved the adoption and kind of rallied the excitement around building data streaming out at HS1?
0:25:54.7 Chris Kapp: For me, the important thing is finding those people that will adopt early and being very data-driven. Fortunately, because we have the data in Kafka, we can be data-driven. And the other thing I find is that everyone I talk to says they're data-driven, and not everybody I talk to is data-driven. A lot of people will use data as a weapon to justify what they're doing. And it's important just to collect the data, try to eliminate our biases, and go towards what is delivering quickly. So the key is to start small, agile, iterative: build something very small with the people that are excited about it and willing to learn new things. And if it doesn't work, then be agile, adjust, and find the thing that does work. And then be data-driven following that, and start radiating that information out to people to show, oh, this is something that's working, let's go this new way, it's working better than this other thing we're doing. These problems that you're having? Look over here, it is working. And some people can be really stubborn about not moving on to new ways of solving things.
0:27:15.0 Joseph Morais: Been through that. I've absolutely been through that. You know, showing that you can alleviate pain after you build something, that's a tactic that works in almost any area of our lives. But I really like the idea of going and finding those people that you know will help you early adopt. And then you inspired me. I started to think of a Field of Dreams type scenario, where you're like, build it and they will stream. And that's totally where my mind went with that. So I think that's a really good tactic.
0:27:38.1 Chris Kapp: You're going to have to market that one. That one's good.
0:27:40.9 Joseph Morais: Build it and they will stream. Okay, I will write that down somewhere. I'm going to have to communicate that. So how does HS1 evaluate and select new tools for your data stack? And how are you balancing factors like scalability, cost and integration with existing systems?
0:27:56.4 Chris Kapp: We use what we call the Golden Path solution. So we pick out the set of tools that we feel will help us moving into the future, because we can't be polyglot, we can't support everything. It's not cost-effective to support every possible type of tool, and it's too hard to hire for that. So we picked the Golden Path. But we don't want to be restrictive. So if someone has an exception, they can come to architecture, and we have a lightweight way of approving that and saying, oh, I can see why you need that, so we'll let you do that. But it's contained; not everybody should use that. We have kind of an unusual problem, in that I mentioned how, over the last 30 years, we have been buying up dental companies all around the globe. Well...
0:28:47.0 Joseph Morais: Each one of those creates a new data set for you guys too.
0:28:50.5 Chris Kapp: A new data set and a new technology footprint. When our vendors ask, what contracts do you have in this area? All of them is the answer. We have all of the contracts with every company, so we have all the technologies. And trying to funnel into the ones that are effective is really difficult. What we found is we had kind of a clumping: in Europe and New Zealand and Australia we had a lot of .NET, and in the US we had more Java. And so we kind of stuck with that and said, okay, just stick in your path and kind of narrow down on these technology paths that have some commonality to them. And then we rely a lot on domain-driven design. So what we did is we took a step back and we looked at the events as they flowed across the entire system and how they interact with each other. And when you do that, you get to a point where you can visually see, well, there's a clump right here and there's a clump right there, and they have low interaction with the other teams.
0:30:06.3 Chris Kapp: And so it's easy to pull off a piece and say you are a team. Those few places where you're connected to the other teams, lock that down with contracts and make that solid, absolutely solid. And so what that does is it allows that team to be able to attack the problems without dependence on other teams. They don't have to worry about what the other teams are doing. I want to solve this problem. I have the Kafka data and I have my use case and I own that whole area and I will solve that problem all the way from the beginning to the end. And I'm responsible for the shift left of data to make sure that by the time it leaves my area, it is completely scrubbed, locked down, has all the data SLAs in place so that it's ready to consume, so that somebody downstream doesn't have to try to scrub it a second or third time.
0:31:02.3 Joseph Morais: That's, man, that's a really impressive playbook, I've got to say, Chris. I mean, especially for any enterprise that's dealing with sharded data everywhere, and also any company that's dealing with M&A. I mean, every time it's brand new data and, as you said, a whole new tech stack. But if at least you can find the common patterns that you know are going to work, and you can bring that adoption across, and you have your data figured out, you're pretty close to covering the big pain points. So let's go into our next segment. We've covered the tech, tools and tactics, but let's face it, none of that matters if your leadership isn't on board. So next, let's dive into the art of getting leadership buy-in as it pertains to data streaming. How did you convince leadership at HS1 to get on board with your solution? Was it smooth sailing or was it a bit of a roller coaster?
0:31:55.0 Chris Kapp: Oh, dude, you're gonna make me air my grievances here.
0:32:00.7 Joseph Morais: I told you at the start this was going to be therapeutic. [laughter]
0:32:04.5 Chris Kapp: Yeah. I should be lying down on the couch and telling you my problems. What we found is the C-levels are seeing the pain: we need to fix this thing. At the engineering level, we didn't come up with this idea of One Platform; that was coming down from the board and the executives. And so we're able to see that vision, and we're able to see, down in the trenches, this is the problem we're facing and we need to solve it this way. How do we tie those together and say, this is how you solve your problem as a C-level? And once you can tie that together, all of a sudden you're getting money, you're getting roadblocks cleared for you and busted out of the way, and resources brought in as you need them. To me, that's the key: finding out how to align those two problems.
0:32:56.0 Joseph Morais: Yeah, right. Take those business challenges, align the technology, get the adoption. I like it. So we cracked open your playbook, unpacked the tools and tactics, and we've explored some real world wins with data streaming. Now let's shift gears and dive into the real hard hitting content. Our data streaming meme of the week.
0:33:27.4 Chris Kapp: Yeah, so that's the third one. There's a basis in truth here, right? Like, you come in and you're finding poison pills in your data, or this schema didn't promote up to production quite like you thought it would, and you have to solve that. And people are like, I thought we signed up for this because it was supposed to ease our pain. And you're like, it is, it is, but we've got to fix that. And then you become an apologist for that. So yeah, in the long run it's worth it, but there are some pain points you've got to work through. That is relatable.
0:34:15.3 Joseph Morais: So, Chris, before we let you go, we're gonna do a lightning round: bite-sized questions, bite-sized answers, like hot takes, but schema-backed and serialized. Are you ready?
0:34:24.1 Chris Kapp: Uh-huh.
0:34:24.8 Joseph Morais: All right. What is something you hate about IT?
0:34:28.8 Chris Kapp: It's all about talking, convincing people to move into new ways of doing things, right. And so getting over that initial hump of I have my way of doing it. I have my way of doing it. How do we work together and collaborate? Oh, that's the pain point. Right. Once you're collaborating, all of a sudden everything's awesome.
0:34:47.3 Joseph Morais: Yeah. I've dealt with that at every level: people that were mainframe operators for 30 years, and the company wanted to convert them into DevOps engineers and have them learn infrastructure as code and Jenkins, and it was hard. I hated that.
0:35:03.7 Chris Kapp: Don't move my cheese. [laughter]
0:35:05.1 Joseph Morais: What's a hobby you enjoy that helps you think differently about working with data across a large enterprise?
0:35:13.5 Chris Kapp: So I really enjoy rock climbing. I'm not claiming I'm good at it, but I really enjoy it, and to me it's about the problem solving. So as you're rock climbing, your body is screaming at you that you're about to fall, you're in extreme pain, your fingers are slipping, and at that same moment you're trying to think through the problem. If I counter my weight this way, if I dip my head this way, I can get grip on that one. And to me that's very related. Right? Our day to day is chaos. We've got conflicting requirements, we've got people trying to shove way too much WIP into our backlog, and we're trying to solve the problem in a way that's scalable. And so we're trying to ignore these external signals that everything is falling apart, and we go into our inner Zen and we think through the problem and we take the time it needs to build it right. And then you don't just win this battle, you win the next three.
0:36:21.0 Joseph Morais: I love that. I got to tell you though, I feel like I'm going to fall and my body is screaming at me. It's quite the endorsement for rock climbing. So you already dropped one and you may just want to plug it again. But can you name a book or resource that has influenced your approach to building event driven architecture or implementing data streams?
0:36:39.8 Chris Kapp: So I would say Kleppmann's Designing Data-Intensive Applications is a must-read for anyone in this environment. There are so many insights in there about understanding eventual consistency and embracing it; it's not the enemy. And thinking through what he calls time travel: the data arriving at different times, to different places, not the way you wanted it to. And as we get into these global systems... It used to be okay to have your software run in just one dental office, and that's all you had to worry about. But these days the data is moving all over the globe and you're bound by the speed of light. And to get your systems to scale correctly, you have to acknowledge that and embrace the tools that take that on, understand the trade-offs, and give you the power that you need.
0:37:46.2 Joseph Morais: Man, that's two really good suggestions. You've given people a lot of ammunition to build some great event driven architecture.
0:37:53.4 Chris Kapp: That one is not for the faint of heart. That one is... that is the full battleground book.
0:38:01.1 Joseph Morais: Start with Ben's book.
0:38:02.6 Chris Kapp: Yeah, start with Ben's. [laughter]
0:38:05.7 Joseph Morais: So what is your advice for a first time chief data officer or somebody with a very impressive equivalent title?
0:38:15.1 Chris Kapp: I would say, and this is not going to be very original, but this was the theme of the Confluent conference that I just attended down in Austin: over and over and over again, they were talking about shift left on your data. Shift left on your data. And that ties into the idea of the data mesh. It's built on that same principle I mentioned, domain-driven design; that's the transactional-data version of those same concepts. Give the teams the boundaries to own the data, and make sure that they own the API as they share that data, and that the data is meeting the SLAs. If you get that, everything else is easy after that. That's the hard part. And getting people convinced that they need to fix that is not easy.
0:39:09.2 Joseph Morais: Yeah, that's really good advice. And just for anyone who is new to data streaming: shifting left, the idea is to move your stream processing as close as you can to the source. A lot of times we do things that look more like ELT, but shift left is more about ETL and getting that stream processing as close to your source as possible, to ensure the data is high quality as it flows through your data streaming system. And it has a bunch of advantages. One, the quality data is already upstream, which reduces duplicate levels of processing, as Chris mentioned. And it allows these things that we call data products, high-quality pieces of data, to be used operationally, not just analytically. So thank you for bringing that up, Chris.
0:39:53.8 Chris Kapp: So the other idea there, like the counterpoint to that, I see all the time people saying, oh, look at this tool. It's able to grab data out of like Redshift and connect it to this thing here and so...
0:40:06.9 Joseph Morais: Reverse ETL.
0:40:08.4 Chris Kapp: Yeah, yeah, let's just have everyone go to that place. Okay, well, now you have a single team that has to be responsible for all the data in the entire system. That doesn't work; that does not scale in any sense of the word. You have to shift left and have the teams that produce the data own the data and the quality of the data. Because otherwise you've got a team that's trying to drill into this specific data spot, and they don't have the time to figure out what this piece of data means. If you have a team whose whole job is, I'm building the accounting ledger and I'm going to do event-carried state, event sourcing, on that accounting ledger, then they know this is what that means in the United States, this is what it means in Europe, this is what it means in New Zealand. They know that narrow piece of the data inside and out, and they know how to make it quality. If you get somebody on one of these centralized teams that's trying to own all the data, they can't know all that. It's too much. They can't know it at that level of detail.
0:41:20.5 Chris Kapp: And they end up just being the middleman, going, oh, I got my piece of data. Did you get it? No. Did you get it? Oh. It just doesn't scale.
0:41:31.9 Joseph Morais: Yeah. There's no chain of custody. Nobody knows what anything is. And this is a problem that a lot of people face. I'm glad you called that out. Chris, I've learned so much from you today. I mean, I can't thank you enough. Are there any final thoughts or anything you'd like to plug before we let you go?
0:41:47.5 Chris Kapp: I'll just skip right to the plugs and say, check out Dentrix and Ascend, our dental software packages. And there's a lot of amazing stuff coming in the future; it's going to be exciting to see what happens.
0:42:03.5 Joseph Morais: Yeah, I mean, you guys are building it the right way, so I think that is definitely a great plug. Well, thank you so much for joining me today, Chris. And for the audience, stick around, because after this I'm going to give you my top three takeaways in two minutes. Man, that was a great conversation with Chris. Here are my top three takeaways. When I think about one piece of info that I want everyone to take away from today's episode, it's really this concept of abstracting your data, or your architecture, so that you can partition at the abstracted layer. Chris was talking about how large enterprises will often shard or partition things by just creating another cluster or standing up another set of microservices and all those backends, and how hard that is to scale. But having something like Confluent, and having this abstracted layer where you can do the partitioning and decouple the data from the hardware, seems really powerful for any enterprise right now, and it helps future-proof you. What really surprised me, because I never thought about it before, was this concept of return on investment for retention.
0:43:17.6 Joseph Morais: So Chris talked about having topics with basically infinite retention: they turn off retention limits, so the data stays there forever. But they found interesting ways, particularly with the audit logging he called out, of weighing the cost of storing that data against the cost of having to build something brand new and bespoke to get the data that you need. So sometimes it's much better to just store everything and make it easy to get the data when you need it, as opposed to maintaining some really complicated state that makes it very hard to retrieve what you need. The next takeaway is this idea of public topics, where you can take really sensitive data and, through stream processing, take out all the scary things, the PII, the critical sensitive health-related information, keep just the data that is interesting for analytics, and then make that consumable. And we see this a lot here at Confluent: this network effect where, once you make data available, more and more people subscribe to it, and then ultimately you get more new and interesting use cases.
0:44:24.7 Joseph Morais: And then Chris also called out the latest-versioned pattern, where you take a compacted topic (now we're getting into some of the nitty gritty of how data streaming works), you put your data there, and you also have a full topic that doesn't use compaction, so that you can either grab all the data if you need it or just the latest state. A really powerful combination. That's it for this episode of Life Is But A Stream. Thanks again to Chris for joining us, and thank you for tuning in. As always, we're brought to you by Confluent. The Confluent data streaming platform is the data advantage every organization needs to innovate today and win tomorrow. Your unified solution to stream, connect, process and govern your data starts at Confluent.io. If you'd like to connect, find me on LinkedIn. Tell a friend or coworker about us, and subscribe to the show so you never miss an episode. We'll see you next time.