Life Is But A Stream

Ep 9 - Bottlenecks to Breakthroughs: How Covetrus Solved Latency with Streaming

Episode Summary

When latency blocked progress, Covetrus turned to real-time data streaming. Senior Director Joe Pardi shares how his team tackled scalability, security, and data quality with Confluent Cloud.

Episode Notes

For enterprises managing sprawling systems and frequent M&A activity, data latency isn’t just inconvenient—it’s a blocker to business value. In this episode, Joe Pardi, Senior Director of Global Data Engineering at Covetrus, explains how his team replaced fragile data pipelines with a robust real-time data streaming architecture that enables instant decisions across the entire enterprise.

The Covetrus team started with open source Apache Kafka® and then shifted to Confluent Cloud for scalability and data governance. Joe shares how the adoption has drastically reduced operational overhead, strengthened data governance, and supercharged innovation. He also emphasizes the strategic importance of data integration in post-merger scenarios and how real-time data streaming helps uncover and address data quality issues early on.

Whether you’re centralizing data across silos or enabling downstream systems with real-time feeds, this conversation offers a blueprint for building data streaming platforms that scale.

About the Guest:
Joe Pardi is the Senior Director of Global Data Engineering at Covetrus, a leading provider in the veterinary services industry. Based in Portland, Maine, he has held this role since January 2021, where he leads data engineering initiatives to support Covetrus' global operations.

Guest Highlight
“The whole key with that acquisition strategy is to unlock the value. It's easy to make an acquisition go from 1+1=2, but you're really trying to make 1+1=3… If you're going to do an M&A, you have to get really good at integrating data.”

Episode Timestamps
*(05:10) - Covetrus’ Data Streaming Strategy
*(06:40) - Data Challenges: Latency and Quality
*(26:05) - The Runbook: Tools & Tactics
*(31:15) - Data Streaming Street Cred: Improve Data Streaming Adoption
*(38:25) - Quick Bytes
*(42:15) - Joseph’s Top 3 Takeaways

Our Sponsor:  
Your data shouldn’t be a problem to manage. It should be your superpower. The Confluent data streaming platform transforms organizations with trustworthy, real-time data that seamlessly spans your entire environment and powers innovation across every use case. Create smarter, deploy faster, and maximize efficiency with a true data streaming platform from the pioneers in data streaming. Learn more at confluent.io.

Episode Transcription

0:00:06.4 Joseph Morais: Welcome to Life is But a Stream, the web show for tech leaders who need real-time insights. I'm Joseph Morais, technical champion and Data Streaming evangelist here at Confluent. My goal? Helping leaders like you harness Data Streaming to drive instant analytics, enhance customer experiences, and lead innovation. Today, I'm talking to Joe Pardi, Senior Director of Global Data Engineering at Covetrus. In this episode, we're talking about why Data Streaming was the perfect solution for Covetrus' latency challenges. You'll hear how his team replaced fragile pipelines with a real-time streaming architecture and get real examples of stream processing in action. We'll cover things like integrating data through mergers and acquisitions, how streaming surfaces data quality issues early, and much more. But first, a quick word from our sponsor.

0:00:50.5 Announcer: Your data shouldn't be a problem to manage. It should be your superpower. The Confluent Data Streaming platform transforms organizations with trustworthy, real-time data that seamlessly spans your entire environment and powers innovation across every use case. Create smarter, deploy faster, and maximize efficiency with the true Data Streaming platform from the pioneers in Data Streaming. 

0:01:19.3 Joseph Morais: Joining me now is Joe Pardi, Senior Director of Data Engineering at Covetrus. How are you today, Joe?

0:01:24.4 Joe Pardi: I'm doing great. How about you?

0:01:26.9 Joseph Morais: I'm doing well. Thank you so much for asking. I appreciate that. You know, I ask everyone how they're doing. I don't always get it back, but I never take it personally either. So let's jump right into it. What do you and your team do at Covetrus?

0:01:40.0 Joe Pardi: Well, I'm in charge of data engineering. And so I sit with the other data engineering teams, but we mainly have a line-of-business focus around some of the animal health data that we have with our customer base. And as you probably know, with Covetrus, we're centered around the veterinary or animal health industry. And a lot of that data sits with veterinarians and practices, veterinary practices and such. So yeah, we largely are caretakers for all of that data and help unlock its value, whether that's internally or for customers.

0:02:15.4 Joseph Morais: Well, let's expand on that. So tell me more about what Covetrus does and specifically who your customers are and who your customers are not.

0:02:22.0 Joe Pardi: Yeah, so Covetrus is a prominent animal health technology provider. And so we have essentially like three flagship products or services. And so we offer prescription management capabilities. Think of that as like an online storefront where you can kind of a la Amazon style, get your prescriptions in clinic and follow up with that e-commerce kind of way and get that shipped and delivered to your house for your pets and such. We have a practice management software. So think of that as like an ERP for allowing a practice, small practices, which we call individual practices or a larger corporate, and they need to run their business, manage their business. And that's kind of the operating system, if you will, that allows them to do that. And then we do distribution. And so distribution is kind of like the Amazon Prime of supplies that animal health companies may need to run their business, things like gloves and gauze, scissors, things of that nature.

0:03:21.2 Joseph Morais: That's great. So it sounds like for the people that take care of our pets, those providers, Covetrus is making it easier for pretty much all of the challenges they have, whether it's managing their customers or supply chain, Covetrus has solutions for all of that.

0:03:37.9 Joe Pardi: Yeah. And at the center of it is practices. So we call ourselves a practice improvement company or platform, and that's the center of our universe.

0:03:47.8 Joseph Morais: That's amazing. So now at a very high level, what is your Data Streaming strategy? In just a minute or two, we'll expand on it as we get through the episode today.

0:03:57.2 Joe Pardi: I would say our Data Streaming strategy was largely focused around really two aspects of it, the technology strategy part of that, but it's also linked to our business strategy. We're a company, a pretty large company, we're approaching 5 billion on the revenue side. And part of that is mergers and acquisitions. And anytime you're in the M&A space where you're merging or acquiring with companies, you're bringing in disparate systems, systems that were not engineered to work together.

0:04:24.8 Joe Pardi: And so the whole key with that acquisition strategy is to unlock the value. It's easy to make a company on acquisition kind of go from one plus one to equal two, but you're really trying to make one plus one equal three. In order to do that, you've got to unlock the potential of that product or company. You want to bring their customers to your products, bring your products to their customers, and integrating data is a fundamental tenet to be able to do that. If you're going to do M&A, you've just got to get really good at integrating data. And then the second part of the strategy is we offer individual products, as I mentioned, but we also offer it up as a suite. And so we have a product called Vet Suite whereby, if you think of it like the Microsoft Office suite, you get your Word and Excel and PowerPoint, et cetera; it's the same with our products. If you kind of bundle those together, we give some discounts around tiered pricing and things of that nature. And in order for those products to work seamlessly together as a suite, we have to have good integrated data that's able to take the friction out of using those products together.

0:05:26.3 Joseph Morais: That's fantastic. I really love what you said, and I'm going to bring it up later because I do my key takeaways, but that one plus one equaling three, that's really important, right? Because it's not just about bridging the data together. It's about, hey, we just acquired this company that we should synergize. We should be doing things like, let's take their strengths, let's take our strengths, and let's do something net new where the whole is greater than the sum of the parts. So I really like that. And it's interesting, Joe, because I work with a lot of startups, right? And they're never worried about that challenge. They're like, hey, we're all on one cloud. We know what we're doing. And I always think to them, I'm like, you got to think bigger. You're going to be successful, and eventually you're going to buy a company, and then you're going to have these challenges. So you got to really think ahead of what the future state might look like. So I'm glad you already figured that out.

0:06:16.1 Joe Pardi: Yeah, the deal close is just the start of the journey. That's where the fun begins.

0:06:30.9 Joseph Morais: Yeah. All right, so we set the stage. So let's dig deeper into your Data Streaming journey in our first segment. So before implementing Data Streaming, what were your biggest data challenges at Covetrus?

0:06:41.1 Joe Pardi: Well, clearly latency is always a challenge. There's a physics aspect of just moving bits and bytes across a network and getting it to arrive from point A to point B. And so latency, it's almost as if the original premise is having that data integrate within a day is generally the accepted kind of paradigm. But when I look at that, I don't want latency to be, I don't want it to get in my way. And so what I'd like to do is solve that problem, solve it once. And so Kafka in particular is just a really good way to deliver data integrated across your systems, even your cloud providers, whereby I don't worry about that anymore. The data and the technology is so good at what it does, that problem's almost been solved. I just wanted to solve that wholesale, get it behind me and focus on other important things. And the second is data quality. And data quality, I won't say that solves because data quality is always going to be at your forefront. But the sooner that data can come together, the quicker it kind of shines its warts. It's like you find those warts quicker. That's kind of a good thing because you can knock those problems out and get them behind you.

0:07:50.1 Joe Pardi: Those, I would say, would be the two main problems.
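
For readers who want to see what "solving it once" looks like in practice, here is a minimal sketch of publishing an event the moment it happens instead of waiting for a nightly batch, using the confluent-kafka Python client. The broker address, topic name, and payload are illustrative assumptions, not Covetrus's actual setup.

```python
# Minimal sketch: publish a record as soon as it is created, rather than
# holding it for a daily batch load. Assumes the confluent-kafka package.
# Broker address, topic name, and payload are hypothetical.
import json
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "broker:9092"})  # hypothetical address

def on_delivery(err, msg):
    # Runs once the broker confirms (or rejects) the write.
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()}[{msg.partition()}] at offset {msg.offset()}")

order = {"order_id": "12345", "sku": "gauze-4x4", "qty": 10}
producer.produce(
    "orders",  # hypothetical topic
    key=order["order_id"].encode(),
    value=json.dumps(order).encode(),
    callback=on_delivery,
)
producer.flush()  # block until the delivery report arrives
```

Once events flow like this, the latency question largely goes away: downstream consumers can see the record within moments of production rather than the next day.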

0:07:54.1 Joseph Morais: That's great, right? Yeah, when your data is streaming, it fails faster. So you can figure out where the defects are. 

0:08:00.2 Joe Pardi: That's for sure. 

0:08:01.4 Joseph Morais: Yeah. So I'm curious, how did these challenges impact your customer experiences or day-to-day operations?

0:08:08.7 Joe Pardi: So in my world, I have a large set of stakeholders that are internal customers to me, folks inside my business, and then of course, external customers. And for internal customers, driving those latencies down, it's just when you have analytics to provide, you have lots of data analysts, folks that use tools like Tableau and Power BI, and they want to see data quicker so that they can make it actionable quicker. And then for external customers, as I said, getting systems to remove the friction in the customer experience itself, by having data together quicker, in sync quicker, you don't worry about the things that do create friction. So a good example of that is, let's say you had a product catalog that's mastered in a particular system that you may have, maybe it's locked up in your ERP, but that product catalog is offered in an e-commerce store or in-clinic, products of that nature. And you want those products all in sync. You want your catalog all in sync so that someone doesn't order something that's no longer offered in your catalog, in your e-commerce site. So having data in sync as fast as possible removes all that friction that can otherwise arise.

0:09:24.1 Joseph Morais: Yeah, it's like who doesn't want things faster and easier? Of course, those were the challenges, but it's great that data streaming was able to address those, get you that higher-quality, more accurate data in a timely manner. So what ultimately led Covetrus to Data Streaming? Was there a specific tipping point where you said, no, we absolutely need to implement this, let's go through the pain now because our future growth is being stifled?

0:09:47.1 Joe Pardi: I don't think it was any one thing. I think some of it was a previous CTO who had asked me to join the company, wanted to get a data platform up, a kind of modern data platform. And it gave us an opportunity to kind of think from the ground up. And just having had previous experience, I've done quite a number of years in the software side and more recently in the data side, I came to the epiphany myself personally to say you can really use this technology on two facets. You can use it to integrate your data between your microservices, which is more prevalent in your software and microservices space. But you also can hang some of your pipelines off those same Kafka topics to listen to data as it gets democratized on your network. And that's also a really good feed or a tap into the data so that you can bring it into your warehouse such that you can solve two problems at once. Part one of that for us was to do it on the data side just because I was a bit of an advocate to bring in Kafka into the company. We had already had open source Kafka, so it's not like we started completely from scratch, but we just weren't doing it in a wholesale scalable kind of way. It was really about taking some of the green shoots we had and just carrying that into and maturing that into something that we could scale across the enterprise.
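
The "two facets" Joe describes work because Kafka consumer groups track offsets independently: a microservice and a warehouse pipeline can read the same topic without interfering with each other. Here's a sketch of that idea, with hypothetical group and topic names:

```python
# Sketch: two independent readers of one topic. Each group.id gets its own
# offset tracking, so the operational consumer and the analytics pipeline
# never interfere with each other. Names and addresses are hypothetical.
from confluent_kafka import Consumer

def make_consumer(group_id: str) -> Consumer:
    return Consumer({
        "bootstrap.servers": "broker:9092",  # hypothetical
        "group.id": group_id,                # offsets tracked per group
        "auto.offset.reset": "earliest",
    })

microservice = make_consumer("order-service")    # operational consumer
pipeline = make_consumer("warehouse-ingest")     # analytical consumer

for consumer in (microservice, pipeline):
    consumer.subscribe(["orders"])               # the same topic feeds both

msg = pipeline.poll(5.0)
if msg is not None and msg.error() is None:
    print(f"warehouse-ingest read offset {msg.offset()}: {msg.value()}")
```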

0:11:07.3 Joseph Morais: That's great. So you started with the most classic use case, right? And that's like decoupling microservices, which I think is probably where everyone kind of discovers, not everyone, but most people kind of discover it that way. And then you realize, oh, they have these things like connectors. They have stream processing. So we can use this for intelligent pipelines. And then you can start to build the usability. And the other thing you touched on is really democratizing data, right? Like I've run into situations with customers where like they have data in a mainframe and reading from a mainframe is very expensive and very costly. So what they'll do is they'll set up a connector and they'll expose that data into Kafka topics. And now anyone can read from those topics because the thing is built to handle as much throughput as you want. So you can kind of save yourself some money and offload that system. And that doesn't just have to be a mainframe, right? That could be a legacy database. It could be really anything. But having that information substrate becomes so important. And I think as you start building new things, it gets even more important.

0:12:02.7 Joseph Morais: You know, you talked kind of to the requirements and to some of the improvements, but what specifically have you built or are currently building with Data Streaming?

0:12:11.2 Joe Pardi: Yeah, we started on our data platform, on the Snowflake platform. And there's a connector, a Kafka connector that Snowflake provides that can allow you to connect in and absorb that data or ingest it into Snowflake. But there was a bit of a constraint with that particular connector, and it is open source, which is nice. But we took that, we forked it into our own version of a connector because we wanted to drive latencies even to the tightest latency we possibly could. And so that connector is really fabulous. It's resilient, works very well, but the data dies in a staging table kind of within Snowflake. So we wanted to go one step further and enhance that particular piece of software to, well, okay, the data finds itself into a staging table, but then we want to trigger an actual flow that will process that data as soon as it arrives. And that's whether it arrives as a single record or whether it arrives as, say, a micro batch of, say, 1,000 records. Either way, it's going to get processed. And the nice thing about that framework that we enhanced is we have total control over the latency. I can bring that data in one second. I can bring that data in 10 hours or at the end of the day. I control that, not the technology.
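
The detail worth underlining here is "I control that, not the technology." Below is a hedged sketch of the general pattern, not Covetrus's forked connector: buffer consumed records and flush them downstream when either a batch-size or a time threshold is reached, so the latency target is just a parameter.

```python
# Sketch of controllable micro-batching: flush after MAX_BATCH records or
# MAX_WAIT_SECONDS, whichever comes first. One second or ten hours is just
# a config change. The staging-table writer is a stand-in.
import time
from confluent_kafka import Consumer

MAX_BATCH = 1000
MAX_WAIT_SECONDS = 1.0

def write_to_staging(batch):
    # Stand-in for a real sink, e.g. a MERGE into a warehouse staging table.
    print(f"flushing {len(batch)} records")

consumer = Consumer({
    "bootstrap.servers": "broker:9092",  # hypothetical
    "group.id": "staging-loader",
    "auto.offset.reset": "earliest",
    "enable.auto.commit": False,         # commit only after a successful flush
})
consumer.subscribe(["orders"])           # hypothetical topic

batch = []
deadline = time.monotonic() + MAX_WAIT_SECONDS
while True:
    msg = consumer.poll(0.1)
    if msg is not None and msg.error() is None:
        batch.append(msg.value())
    if len(batch) >= MAX_BATCH or time.monotonic() >= deadline:
        if batch:
            write_to_staging(batch)      # one record or a thousand: same path
            consumer.commit(asynchronous=False)
            batch = []
        deadline = time.monotonic() + MAX_WAIT_SECONDS
```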

0:13:26.4 Joseph Morais: That's fantastic. And kind of what you're alluding to is something that, or at least tangentially related to something we talk about at Confluent a lot called shifting left, where a lot of times you're building these great data products in your analytical estate, which is great. But those high-quality pieces of data that you build in your analytical estate can sometimes help your operational estate or can be utilized in your operational estate. So shifting some of that processing closer to the source can really benefit. And also, I've seen this quite a few times where someone will build a data product in their analytics estate, reverse ETL it back to Kafka, and then make it available to other things. And it's like, well, why not just do that in the beginning and then send it wherever you need it, right? So you can avoid that redundant loop. But I'm curious, what you built, which is very impressive, by the way, was there or is there a specific KPI that was tied to this? I mean, it sounds like it was latency and time-bound, but were there any other key performance indicators that were like, you know what, we nailed this, or this is what we're going to need to nail this?

0:14:32.8 Joe Pardi: Yeah, I wouldn't say it's a formal KPI, but we never want to drop any data on the floor. I have both operational use cases and analytical use cases. The beauty of an operational use case is that you can't miss orders. If orders are placed, it's not like those can just fall off the earth. I mean, we have to account for every single order, not just in terms of shipping it and fulfilling it, but also in terms of the financial things that wrap around that, like invoicing and payment and things of that nature. So it's almost an implied KPI that we never drop any data, and that doesn't come for free. There's a process for, yeah, maybe it arrives, but for whatever reason we lost it. But we can always... We keep our topics, at least our transient ones, for seven days. We can always go back and replay some of those data streams and maybe process something that we may have missed.
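
The replay capability Joe mentions comes from Kafka's retained, offset-addressed log. One way to express "reprocess everything since Tuesday" with the confluent-kafka Python client is to map a timestamp to per-partition offsets; the topic name and replay window below are hypothetical.

```python
# Sketch: replay a topic from a point in time inside the retention window.
# offsets_for_times() resolves a timestamp to the first offset at or after
# it in each partition. Topic and broker address are hypothetical.
import time
from confluent_kafka import Consumer, TopicPartition

consumer = Consumer({
    "bootstrap.servers": "broker:9092",  # hypothetical
    "group.id": "orders-replay",
    "auto.offset.reset": "earliest",
})

topic = "orders"
three_days_ago_ms = int((time.time() - 3 * 24 * 3600) * 1000)

# Find every partition, then ask the broker which offset corresponds to the
# chosen timestamp in each one (the timestamp rides in the offset field).
metadata = consumer.list_topics(topic, timeout=10)
wanted = [TopicPartition(topic, p, three_days_ago_ms)
          for p in metadata.topics[topic].partitions]
start_offsets = consumer.offsets_for_times(wanted, timeout=10)

consumer.assign(start_offsets)  # start reading at the resolved offsets
while True:
    msg = consumer.poll(1.0)
    if msg is None:
        break  # caught up; good enough for a sketch
    if msg.error() is None:
        print(f"replaying partition {msg.partition()} offset {msg.offset()}")
```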

0:15:23.5 Joseph Morais: Yeah, no, I like that. I really like never drop any data. I think that's why data streaming was adopted so early on by financial services institutions, for that very reason, right? If a deposit happens, you can't miss it. If a withdrawal happens, you can't miss it. So having something that is durable, that you can replay if you did have some type of interruption, and that can even be idempotent, those are very powerful features of data streaming. You know, kind of moving up the DSP stack, we talked about the Data Streaming Platform first and we talked about data streaming, but the next piece is really stream processing and integration. So I'm curious, what outcomes have you seen either with processing right in the middle of your data streams or using connectors? You already alluded to Snowflake. So I'm curious if there were other scenarios that you've integrated data streaming with.

0:16:09.7 Joe Pardi: The two main ones are integrating data between the microservices, and so the teams, the software teams themselves, those that manage the microservices, they've built up a data acumen that you otherwise would not build up. And so they participate in that ecosystem, that democratization process itself. The other is that I just don't worry about latency. So from an outcome perspective, I don't focus on ingestion almost anymore. It's a very, very minor part of my team's day. We have a no-code approach, basically, to data ingestion, and so almost all of our focus is around curating the data and then integrating it back to those that care about it. It may sound odd, but almost every piece of data that we ingest is coming off a Kafka topic. By and large, it's very rare for us to absorb or ingest data that doesn't come from a Kafka topic. So we're primarily stream-first.

0:17:05.3 Joseph Morais: Awesome. Now again, I know you called out Snowflake, but I'm curious, are there any other cloud service provider or ISV services that are integrated with your data streaming today?

0:17:15.6 Joe Pardi: Yeah, the microservice teams by and large have embraced MongoDB, and they have the Atlas platform as a data platform, and they stream between the microservices and the databases and such that they have in Mongo itself. And so whether we're delivering something from Snowflake back into Mongo or from Mongo, say, into Snowflake, that's all coming through the Kafka platform, the Confluent Cloud platform in particular.

0:17:44.0 Joseph Morais: Fantastic. So I know you've already touched on this, but I want to dive deeper into it. How do you approach data governance?

0:17:51.8 Joe Pardi: A pretty big topic, but essentially we've benefited from the fact that we have a set of very experienced architects and data professionals that, I guess I'd just say, know the fundamentals of handling and processing data, whereby it's just kind of ingrained into our DNA. You combine that with your cybersecurity policies and such. In the early days, some on my team helped craft some of the policies, some of the important aspects like data classification policies and such. And so those were co-authored with the cybersecurity team. I personally came from a highly regulated space between the payment and banking segments. And those are much, much more aggressive with compliance, legal and auditing aspects, and security in general. So much of that is just par for the course for me and several of my coworkers. And even though we used to be a public company, we've since gone private. So we still practice Sarbanes-Oxley rules and such, even though we're not public anymore. And we have a lot of data in animal health.

0:19:03.7 Joe Pardi: And so even though I mentioned the fact that we democratize quite a bit of that data, it's still based on a least-privilege policy. And we may get into this a little bit later, but the aspect of having enterprise-grade Kafka means I can put ACL security rules on top of a topic so that only those folks that need access, or require access as part of their job, see the data in the topics. That's just based on the principle of least privilege.
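
To make the least-privilege point concrete: a topic-level ACL can grant a single principal read-only access to a single topic. In Confluent Cloud this is often done through the console or CLI, but the Kafka ACL admin API (available in recent versions of the confluent-kafka client) expresses the same policy in code. A sketch, with a hypothetical principal and topic:

```python
# Sketch: grant one service account read-only access to one topic via
# Kafka's ACL admin API. Principal, topic, and broker are hypothetical.
from confluent_kafka.admin import (
    AclBinding, AclOperation, AclPermissionType, AdminClient,
    ResourcePatternType, ResourceType,
)

admin = AdminClient({"bootstrap.servers": "broker:9092"})  # hypothetical

read_only = AclBinding(
    ResourceType.TOPIC,
    "patient-records",            # the only topic this principal may read
    ResourcePatternType.LITERAL,
    "User:analytics-service",     # hypothetical service account
    "*",                          # any host
    AclOperation.READ,
    AclPermissionType.ALLOW,
)

for binding, future in admin.create_acls([read_only]).items():
    future.result()  # raises if the ACL could not be created
    print(f"created ACL: {binding}")
```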

0:19:30.6 Joseph Morais: That's great. And yeah, I am going to ask you more about specifically Confluent. So save some of those juicy bits for then. But as a follow-up to that question, how important, and by the way, I think your approach is the right one, like data governance should just be table stakes, right? How important is tracking and enforcing the flow of quality data as it enters your system?

0:19:51.7 Joe Pardi: It's very important. So, lineage: no data lives in one place, it almost always goes beyond its original source system and it flows, as you know, from one system to another, to a warehouse, to a mart, to someone's Tableau environment. And then, heck, sometimes people even print it out, right? Things like that. And with that data, if you don't know where the source is and the lineage of that data as it goes all the way from point A to point Z, you're going to get asked sooner or later why a particular number shows up on a particular dashboard and what makes that number. And so that's where your data analyst community is front and center: they've got to know everything about that field, not only how it flows into the data warehouse, but where did it come from and what are the rules wrapped around it, so that you can answer those types of questions.

0:20:43.3 Joseph Morais: That's great. And then there's even the other piece, the data portal, where you have metadata about your data, right? So just the fact that you can track and say, what was the source of this data? What team manages this? What is this flow about? All of that is really nice value add. So let's talk about data retention. How long do you need to hold on to operational data at Covetrus, and is it driven by compliance?

0:21:03.4 Joe Pardi: It's largely, at its core it is. Legally, we're obligated to retain operational data for seven years, obviously because you have to respond to lawsuits and such. We also comply with CCPA. So as part of that, a consumer has the right to know, so to speak, where they want to know what types of data you're housing on their behalf, and then also the right to be forgotten. So if they want to, say, opt out of marketing programs, things of that nature, that all is tracked. We have to track all of that. The rest of it, the retention, is largely based on a business need. And we do have those. If you think about the life cycle or the life of a dog, for example, some dogs live to 15 and 20 years old. Some animals, because we do more than just domesticated pets, some live to 20, 30, 40 years old. If you go beyond some of the operational aspects of that and you go into clinical decision science, for example, it's just a whole other area that we're interested in.

0:22:07.5 Joe Pardi: That's where you want to look at the life cycle of a pet to understand, well, what's happened on that breed or species and what's that general population of animals in that breed or species? What do they look like? The demographic aspects of that, that may dictate from a business perspective why you'd want to keep something beyond the seven-year mark.

0:22:27.0 Joseph Morais: Yeah, and you mentioned also having that retention to be able to replay things if there's something going wrong. Really important. Obviously, you don't need seven years of that, but using the data in the way that you're using it is very interesting, especially tracking particular animals or perhaps breeds of the animals and how they respond. Just kind of doing that discovery in the data is really impressive and where I really think, obviously, the direction of data engineering is going, but it's just interesting to see it in this particular vertical. I bet you've never expected this question, but what is the future of Data Streaming and AI at Covetrus?

0:22:58.8 Joe Pardi: We've had an element of AI in place for a while. We've largely focused almost from the ground up, if you think about it that way, where you have to have a good bedrock of clean data in order to unlock some use cases, and so we've always been in the normalization game. The animal health space is a little different from the human health domain in the sense that that's more standard. The human health is more standardized, largely because of insurance regulations and such, and animal health is a little different. The data's not quite as mature, and so we spent quite a long time on normalizing data between breeds and species and medical notes, vaccinations, things of that nature, and now we're starting to really unlock some of the Gen AI types of use cases that are more or less being introduced into our flagship practice management system called Pulse. So within there, a vet or a vet tech spends quite a bit of time documenting things. So one example is, let's say you book an appointment to bring your pet into the clinic, and that vet does not want to walk into that appointment cold.

0:24:04.6 Joe Pardi: So they're going to read your medical notes that are tied to your pet, but with Gen AI, we can summarize that for the vet or vet tech such that they can just read the summary part of that, and that saves them time. The same is on the flip side, where they have to document what transpired in that appointment, and the Gen AI capabilities of our Pulse product can help do that real quickly. We have a capability called ambient listening. So with permission of the pet parent, the AI engine can kind of sit there and listen in and provide some summary of what's occurring in the conversations and such. And if it hears, for example, that a prescription might be warranted, that may trigger a workflow so that the vet or vet tech can initiate a prescription on behalf of the pet.
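
In streaming terms, the workflow trigger Joe describes is a classic consume-decide-produce loop: watch summary events, and when one carries the right signal, emit a new event for the downstream workflow. The sketch below illustrates that pattern only; the topic names, fields, and flag are hypothetical, not the actual Pulse implementation.

```python
# Sketch: consume visit-summary events; when a summary suggests a
# prescription, emit a workflow event downstream. All names are hypothetical.
import json
from confluent_kafka import Consumer, Producer

conf = {"bootstrap.servers": "broker:9092"}  # hypothetical
consumer = Consumer({**conf,
                     "group.id": "rx-workflow-trigger",
                     "auto.offset.reset": "earliest"})
producer = Producer(conf)

consumer.subscribe(["visit-summaries"])      # hypothetical topic
while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error() is not None:
        continue
    summary = json.loads(msg.value())
    # Assume an upstream AI summarizer set this flag.
    if summary.get("prescription_suggested"):
        producer.produce(
            "prescription-workflow",         # hypothetical downstream topic
            key=summary["patient_id"].encode(),
            value=json.dumps({
                "patient_id": summary["patient_id"],
                "visit_id": summary["visit_id"],
                "action": "review_prescription",
            }).encode(),
        )
        producer.poll(0)  # serve delivery callbacks
```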

0:24:50.7 Joseph Morais: So I actually experienced that now in my own medical care. With opt-in, doctors have asked me, can my laptop record this conversation? I have AI that's going to summarize today's visit. And of course, I'm like, yeah, let's do it. Of course. It's good because you think about, if you've been with a doctor for many years, you could have literally pages and pages of notes, but having something that can instantly go over those notes and maybe beyond just proofing it or summarizing it, it could actually maybe detect a pattern. Like something like, hey, they mentioned this symptom three years ago, and it's recurring, and I noticed every two years it's coming up or something like that. The type of insights that all this unbounded data can give through the uses of AI is kind of amazing. And kind of just calling back to something I mentioned earlier, but Data Streaming really helps you future-proof your architecture, right? Because there may be some system or some brand new API that everyone's going to be using in two years. But by having your Data Streaming already, integrating with whatever new system comes out in the future is orders of magnitude easier, at least in my experience.

0:26:08.4 Joseph Morais: Next up is the runbook, where we break down strategies to overcome common challenges and set your data in motion. So I know we've kind of been leading up to this question, Joe: what led Covetrus to choosing the Confluent Data Streaming Platform over various open source and vendor products? And for the audience, we talk about the Data Streaming Platform quite a bit here at Confluent. It's about data streams, stream processing, our connectors, which allow you to integrate with other systems, and then our governance package. Those four things make up the Data Streaming Platform. So I'm curious, why go in and partner with Confluent?

0:26:42.0 Joe Pardi: I would say it's largely centered around the aspect of an enterprise-grade product. As I said, we started with open source Apache Kafka, and you know, for the most part, within a very specific use case, it worked pretty well. I would say if you fast forward a year and a half later, we weren't keeping up with the patches, we weren't upgrading the system. It's a pretty complex piece of software to run because it's very, very powerful and there's tons of settings that you want to really focus in on to performance-tune it and control the behavior of it and such. And when you then add on the aspect of governance, and the aspect of once you start scaling it across teams and across systems and such, you'll quickly realize that if you don't do good schema management, version control, security practices, all that enterprise kind of stuff that wraps on and lays on top of your foundation of data, you're not going to scale very well. And when we started to look at it that way, combined with the fact that we have only a certain number of DevOps resources, our DevOps resources largely are very much centered around the customer-facing aspects.

0:27:52.8 Joe Pardi: Those systems that bring in revenue, those are where they spend most of their time. And so we said, well, let's have a managed service for us. We looked around, we looked at Amazon's product, we looked at some others, and by looking at Confluent Cloud in particular, we just said okay, it's hosted by those that invented the technology. So they obviously know what they're doing and, you know, they'll have all those enterprise-grade type capabilities built upon there. Plus the other thing, and it kind of goes without saying, is they're going to innovate, and you're going to innovate on top of the platform. Whereas if we just host that ourselves, we're just going to use what's there. And so innovation is a very key aspect because, as you know, whether it's AI or streaming and such, this technology is changing very quickly and you want to stay on top of that so you can compete and offer new things to your customers.

0:28:41.1 Joseph Morais: Yeah, that's a great response. And I think a lot of people kind of fool themselves with an initial proof of concept. They're like, oh, we could do this with open source. But then, like you said, that might work for a startup. But once you're an enterprise and you have hundreds of engineers, different systems that authenticate you, or hundreds of systems to log into, you need all of those things, like OAuth. You mentioned access control lists. You also mentioned, as part of our governance package, we have a thing called Lineage that allows you to visualize all the flows. All of that makes building at enterprise scale much more tenable. And frankly, why run something that's just undifferentiated heavy lifting? I mean, I've always had that opinion, and I used to be one of your DevOps engineers. I used to be at a startup, and I was the guy that ran Kafka, and this was before Confluent Cloud existed. So I realize and I understand that pain. So what is your top tool that you rely on for data streaming today? Outside of just Kafka, of course.

0:29:39.5 Joe Pardi: I was going to say Kafka, but you stole my thunder. That's funny. So yeah, I mean, we follow the premise of what folks would call smart endpoints and dumb pipes, and at each end of that dumb pipe is the ability to either produce a piece of data or to consume a piece of data. And so Python's front and center there. We're obviously multi-language, we do C# and we have folks using Java as well. But those programming languages are key. You know, Avro helps us stream in a way that is responsible, in the sense that we don't break things, things are backwards compatible. We have versioning in place. And then Spark, if we do some high-volume processing, which we do. Spark helps us process millions of records off our topics in a very limited amount of time. And Spark really marries up well to Kafka because the offset tracking that's in Kafka also parlays into a different type of tracking on where you left off processing in the Spark framework as well. So those go hand in hand very well.
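
The Avro point deserves a concrete illustration: with a schema registry in the loop, producers serialize against a registered schema, and compatibility rules on the subject keep changes backward compatible so consumers don't break. A minimal sketch with the confluent-kafka Python client; the registry URL, schema, and topic are illustrative assumptions.

```python
# Sketch: schema-backed production with Avro and Schema Registry. The
# serializer registers/looks up the schema; subject compatibility rules
# enforce backward-compatible evolution. All endpoints are hypothetical.
from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import MessageField, SerializationContext

schema_str = """
{
  "type": "record",
  "name": "Order",
  "fields": [
    {"name": "order_id", "type": "string"},
    {"name": "sku", "type": "string"},
    {"name": "qty", "type": "int", "default": 1}
  ]
}
"""

registry = SchemaRegistryClient({"url": "https://schema-registry:8081"})  # hypothetical
serializer = AvroSerializer(registry, schema_str)

producer = Producer({"bootstrap.servers": "broker:9092"})  # hypothetical
order = {"order_id": "12345", "sku": "gauze-4x4", "qty": 10}
producer.produce(
    "orders",
    key=order["order_id"].encode(),
    value=serializer(order, SerializationContext("orders", MessageField.VALUE)),
)
producer.flush()
```

Adding a new optional field with a default (like qty above) is exactly the kind of change backward compatibility allows without breaking existing consumers.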

0:30:45.8 Joseph Morais: Yeah, I couldn't agree more. We actually announced earlier this year a very strong partnership with Databricks, and we have a new feature that just went GA recently called Tableflow, which allows you to get your data into an object store already in Parquet format and then automatically manages the metadata in the catalogs, making it even easier to integrate your data streams and Spark. So that is a great list of tools. So we talked about the tech, the tools and the tactics, but none of that moves the needle without the right people behind it. Let's dive into how you got Covetrus to fully commit to data streaming. So how did you convince them? Now again, I know you had some background here that kind of brought you in to do this, so you already have the credibility, but I want to dive a little bit deeper into it. How did you convince leadership to get on board with your solution? Was it smooth sailing or a bit of a roller coaster?

0:31:45.1 Joe Pardi: I wouldn't put it in the rollercoaster category. I wouldn't quite put it in smooth sailing because there's always a little rocky, a little smooth. Well, there's always a responsibility to advocate and make sure that if you're going to scale a particular technology that you do it responsibly. And you don't want to obviously surprise folks. There's a change curve to it, right? At my previous company, I ran a team that was largely integrating data based on traditional enterprise service bus. And so as I came into Covetrus, I was like, well, I know I don't want to do that because that was quite painful, to be honest. And it was not about the technology. It was about the culture wrapped around an enterprise service bus. It was more about the centralization of integration. And as I said, I'm a firm believer in the whole smart endpoints, dumb pipes paradigm. I think it has the aspect of federating the responsibility to those teams that sit at the end of the pipe. And as I said, it gets them very well versed in just the aspect of dealing with data and the rules that sit on their part of the pipe or at the end of the pipe.

0:32:50.5 Joe Pardi: And so the other part was I just took a little bit of a pulse with my own team. And we were using a standard ETL tool to largely do the traditional ETL paradigm, processing data, waiting for a day, that kind of thing. And they weren't so interested in the GUI-based tools that ETL vendors provide. They wanted to drop down into a programming language where they had more control and, quite frankly, could exercise their skill sets even better. So we decided upon Python. Python is just dominating the data engineering space, and it's an easy language to use and learn and such. And so it was largely based on the things I didn't want to do. And knowing, like I said, reading the room, so to speak, or the industry, about latencies and lower latencies and driving those down, it really was just a matter of, okay, well, let's do this, but let's do it in a way so it is smooth sailing. I wanted to eat my own dog food first. So we introduced it within my own team. We built up the skill sets and the acumen. And then I started to approach the software architects to say, well, the stuff we're doing, you can also do this between your microservices.

0:34:00.8 Joseph Morais: Yeah, so that's interesting. So a couple of things to touch on there. One, we hear a lot here at Confluent about the Kafka network effect. Like you get your data into Confluent or Kafka, and suddenly now new parties are interested in it, right? They're like, hey, I can subscribe to that. And I think that's why you always kind of see a fan out for like every event produced. It gets consumed X number of times. It's usually a way higher consumption rate than production rate. But I had a follow-up about getting your engineers and your teams excited and up to speed, but you kind of already answered that. So that's interesting. They were kind of sick of the UI-based things. They're like, hey, can we get our hands in the code? And Data Streaming and Python brought that for you. So that's pretty awesome. And I think what's great, another added value of partnering with Confluent, is all the things, the extra bits, whether it's support, documentation, professional services, training materials. If you had a team that maybe didn't have that background or was thirsty to get started quickly, all of those extra pieces of referential material really help people get off the ground much faster.

0:35:04.9 Joe Pardi: Yeah, I would add to that that we don't see Confluent as a vendor, we see them as a partner. And so when I have needs, when I have new needs, it's just a quick reach out to your staff and just to say, okay, well, here's something new I'm trying to solve. And some resources would be made available, etcetera, where we can discuss some of those things.

0:35:25.9 Joseph Morais: I'm obviously a bit biased, but it really does warm my heart to hear that. Because my background's always been in operations or support, and just knowing that you were happy with what we're putting out there makes me happy. So can you share a specific tactic that has significantly improved the adoption of Data Streaming at Covetrus?

0:35:43.1 Joe Pardi: Yeah, so some of it was that advocacy. So really, it starts with like one person, and it's like you said, the network effect: it spread to, at least, my team. We just built up some strong skill sets there. And for me personally, I'm a bit of a bit-head myself and wanted to learn right down to the bowels of what this thing was doing. And so as we approached other developers that were curious about using it, we wanted to be able to answer their questions without any doubt. And so we started to advocate for three different options to democratize that data. So I'll work backwards. In an advanced use case, which is actually the one we prefer, we have what we call state streams, where data is published by the microservice. It's not published directly from the database per se, but it's published by the software into the Kafka topic. And they can decide best how they want to publish that. That's up to them. Another, more intermediate level is where, and this is less intrusive, so if the microservices team does not want to do that themselves, we'll put a CDC-type tool, change data capture, against their database.

0:36:47.9 Joe Pardi: It's not intrusive. They don't quite know that it's there, more or less. And then we'll start to ingest that data, because the CDC tool we use largely puts all data on Kafka topics as a connector, essentially. And then the beginner version of that is we introduced them to something called the inbox pattern. And the inbox pattern is, maybe they're not quite as savvy with Kafka. It's just like, well, that's not my thing, or I'm not ready. I say, that's okay. Just take your data. Instead of publishing it to a topic, just publish it to a single table within your database, and we'll listen to that table, and we'll pull the data from that inbox table. They know, obviously, relational database tables and things of that nature, so just by publishing into that, we'll be listening to it, and we pull it off that way.
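
Of the three on-ramps Joe lists, the inbox pattern is the easiest to picture in code: the producing team writes rows to a table it already understands, and a small poller lifts unpublished rows onto a topic. Here's a sketch using sqlite3 as a stand-in for the service's database; the table, columns, and topic name are hypothetical.

```python
# Sketch of the inbox pattern: the service writes events to a local table;
# a poller publishes unsent rows to Kafka and marks them as published.
# sqlite3 stands in for the real database; all names are hypothetical.
import sqlite3
from confluent_kafka import Producer

db = sqlite3.connect("service.db")
db.execute("""CREATE TABLE IF NOT EXISTS event_inbox (
    id INTEGER PRIMARY KEY,
    payload TEXT NOT NULL,
    published INTEGER NOT NULL DEFAULT 0
)""")

producer = Producer({"bootstrap.servers": "broker:9092"})  # hypothetical

def drain_inbox():
    rows = db.execute(
        "SELECT id, payload FROM event_inbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        producer.produce("service-events",        # hypothetical topic
                         key=str(row_id).encode(),
                         value=payload.encode())
    producer.flush()  # wait for broker acks before marking rows as sent
    if rows:
        db.executemany("UPDATE event_inbox SET published = 1 WHERE id = ?",
                       [(row_id,) for row_id, _ in rows])
        db.commit()

drain_inbox()  # in practice, run on a short schedule or loop
```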

0:37:33.7 Joseph Morais: You know, that's a great set of tactics. It starts with a person, it starts with one person, right? And then also being able to integrate existing things without breaking anything, right? Or causing any additional pain. That's a great way to be like, hey, we need your data, we're going to make it easy for you to onboard it to our new Data Streaming platform. Are you okay with that? Yes. And then suddenly that data is available, and now people are using it, and then the snowball effect just kicks off.

0:38:00.6 Joe Pardi: Those are just lessons learned over the years, my friend, because I have advocated technologies where you just kind of lay it on their lap, and they're like, whoa, don't want it, don't want it. And then I get stuck with the bill.

0:38:13.4 Joseph Morais: That's right. I've seen that happen more than once. All right, now let's shift gears and dive into real hard hitting content. The data streaming meme of the week. I thought you would appreciate this one.

0:38:42.1 Joe Pardi: I do appreciate that. I will say, to get real specific, when it comes to the brokers, we've had almost 100% uptime. And so while I have empathy for the person that's in the meme, in some respects, I don't identify with it because those days are behind me.

0:39:04.6 Joseph Morais: That's great. Me too. I used to be on PagerDuty constantly. I'm glad that that's behind me. I empathize with everyone else who is still doing that. But truthfully, you know, the service is pretty rock solid. So not having to worry about that is very powerful. I thought this particular meme was good because I think when this episode premieres it's going to be very close to Memorial Day. So it's kind of a similar setting, similar challenges for other people in the future. All right, before we let you go, we're going to do a lightning round: byte-sized questions with byte-sized answers. And that is byte, like hot takes, but schema-backed and serialized. Are you ready, Joe?

0:39:44.9 Joe Pardi: Absolutely.

0:39:46.1 Joseph Morais: What's something you hate about IT?

0:39:49.3 Joe Pardi: Tickets. You know, tickets just drive me. They send a shiver up my spine. And then more importantly, I would say standing still. I don't like to stand still.

0:40:01.0 Joseph Morais: Oh, I can empathize and relate to both of those. What's the last piece of media you streamed?

0:40:07.2 Joe Pardi: I would say on the tech side I'm a YouTube junkie. And so I've been binge watching Agentic AI videos and just kind of seeing what folks are doing with that. Other than that, I just, my wife and I just finished binge watching Severance. And so that was just like, that was incredible.

0:40:23.3 Joseph Morais: Nice. I got to catch up on season two. I haven't done it yet. What's a hobby you enjoy that helps you think differently about working with data across a large enterprise?

0:40:33.0 Joe Pardi: So this may sound strange, but for me, time away from the tech helps me strategize around tech. So I'm a bit of a person that likes to focus on plants, gardens and such, shrubbery, that type of stuff. And so the time away from tech allows me to think and get some think time so that when I go hit my tech challenges and problems, I'm fresh.

0:40:56.9 Joseph Morais: I like it. Can you name a book or resource that has influenced your approach to building event-driven architecture or implementing Data Streaming?

0:41:05.0 Joe Pardi: So for the official book, yeah, there's a really good book by Ben Stopford, I believe his name is, and it's all about designing event-driven systems. Designing Event-Driven Systems, I think, is what it's called. And that book really lays the foundation, particularly with a Kafka focus, to bring the paradigm forward. If you don't read a book like that and examine what you're doing before you get in, you're going to fall into some of the pitfalls that do exist.

0:41:35.1 Joseph Morais: Right.

0:41:35.8 Joe Pardi: You know, personally, I'm a bit of a disciple of Martin Fowler. His work is excellent. Chris Richardson. These are all SOA guys that have the benefit of doing this a second time, carrying all the things that they've done in the SOA space forward into the microservice space.

0:41:53.6 Joseph Morais: I was fortunate enough to work with Ben while he was here at Confluent. And that is not the first time that book has come up. So learn from the things Ben has in his book. Do not make those mistakes yourself. What's your advice for a first-time chief data officer or someone else with an equivalent impressive title like that?

0:42:10.9 Joe Pardi: I think it's, at the beginning when you come in, just focus on the fundamentals of the business. Take inventory of the landscape. In a meaningfully sized company, there's always going to be some issues lurking in the shadows. And focus on some of those fundamentals. And then as you get a little bit more proactive, work with your business leaders, essentially, to partner on how do I take that data that we do have? Because data is an asset. But how do I unlock its business value? Because there's many different ways to do that. And you really got to be in tune with your business counterparts to figure that out.

0:42:46.1 Joseph Morais: That's excellent. Excellent advice. Any final thoughts or anything to plug, Joe?

0:42:51.0 Joe Pardi: No, I just want to plug my own company. You know, Covetrus is a really great company to work for. Great, great culture. Love working in the animal health business. I think the empathetic aspect of animal owners and pet parents and such. And I'm just proud of my company and what we do.

0:43:08.8 Joseph Morais: Yeah, there's nothing better than enjoying your job and also working at a place where you can appreciate the mission statement. So I'm glad that we both share that. So thank you so much for joining me today, Joe. And stick around for the audience, because after this, I'm going to give you my top three takeaways in two minutes. What a fantastic conversation with Joe. And what a great name. I'm a little biased. We share the same first name. Let's go through them. The first one that stood out to me is one plus one should equal three. So Joe was talking about M&A, mergers and acquisitions, and that when you bridge the data between two companies, it's not just about one and one equaling two, because you're just doubling your data. But what you really want to do is find overlaps and synergies so that the systems that are working really well for the acquired company and the systems that are working really well for your company should work well together. And that synergy should not just equal the same amount, right? The whole should be more than the sum of the parts.

0:44:16.3 Joseph Morais: And I think that's a really good piece to keep in mind when you're thinking about integrating data in mergers and acquisitions as it pertains to Data Streaming. And the other thing we talked about with Joe that really stood out to me is how to get started with the adoption of Data Streaming. Joe's response was it starts with one person, right? It's all about having someone that advocates for it, whether you just read a brand new book, like the one from Ben Stopford about building event-driven systems, and you're really hyped and you want to share that with everyone, or you have past experiences. It really starts with that passion from one person. And Joe kind of gave us the roadmap. Start with just decoupling microservices. The classic use case for Data Streaming is all about decoupling: decoupling events, decoupling microservices, even decoupling data sources and producers and consumers. And then you can kind of up-level that by building pipelines. That's the next step. And then you can really work your way up the DSP with additional integrations and governance. And speaking of governance, that's my third takeaway. Data governance is part of your DNA. Joe mentioned that his architects, they just live and breathe governance.

0:45:21.8 Joseph Morais: It's part of everything they do. For them, it's table stakes. So when they start building out new architecture, they're focused on the governance piece, ensuring that data is high quality before it ever enters the system. And I think that's a tremendous approach to making sure that data governance gets adopted by your company. That's it for this episode of Life is But a Stream. Thanks again to Joe for joining us, and thanks to you for tuning in. As always, we're brought to you by Confluent. The Confluent Data Streaming Platform is the data advantage every organization needs to innovate today and win tomorrow. Your unified solution to stream, connect, process, and govern starts today at confluent.io. If you'd like to connect, find me on LinkedIn, tell a friend or co-worker about us, and subscribe to the show so you never miss an episode. We'll see you next time.