Life Is But A Stream

Ep 16 - Infinite Scale, No Outages: How MarketAxess Modernized Trading Infrastructure With Confluent

Episode Summary

MarketAxess, a leading electronic trading platform for fixed-income markets, modernized its trade infrastructure by moving from legacy, socket-based systems to a real-time data streaming platform built on Apache Kafka® and Confluent Cloud. Andy Steinmann, Software Engineering Manager at MarketAxess, shares how this shift unlocked hybrid architecture, infinite scale, and zero Kafka cluster outages while keeping mission-critical financial data flowing.

Episode Notes

Legacy systems don’t have to hold you back. In high-volume markets where milliseconds matter, MarketAxess—an electronic trading platform for institutional investors—needed a way to scale trade data reliably across on-prem systems and the cloud. That journey—from self-managed Apache Kafka® to MSK to Confluent Cloud—reshaped how their engineering teams build, scale, and govern real-time infrastructure.

In this episode, Andy Steinmann, Software Engineering Manager at MarketAxess, breaks down how his team evolved from a monolithic, socket-driven architecture to a streaming-first platform. He shares why Kafka became the foundation of their 2.0 strategy, how Confluent Cloud eliminated operational drag during peak trading windows, and what it takes to modernize without disrupting a mission-critical financial ecosystem. 

You’ll learn:

If you’re ready to explore hybrid connectivity, governance, performance tuning, and the operational realities of scaling market-data workloads, this is the episode for you.

About the Guest:
Andy Steinmann, Software Engineering Manager at MarketAxess, is a seasoned software engineer with 20+ years of experience in web, mobile, and backend technologies delivering proven solutions to thousands of users. Andy has been the technical lead/architect on many teams and participated in implementing and adapting agile methodologies to improve productivity.

Guest Highlights:
“One of the big benefits of the partnership with Confluent has not only been the zero cluster outages—but the bigger side of it was really the access to the experts… Somebody to walk that journey with you as opposed to just starting with a Google search or a ChatGPT question.”

Episode Timestamps: 
3:20 – Data Streaming Goodness
20:20 – Beyond the Stream 
36:15 – Quick Bytes
43:15 – Joseph’s Top Takeaways

Dive Deeper into Data Streaming:

Links & Resources:

Our Sponsor:  
Your data shouldn’t be a problem to manage. It should be your superpower. The Confluent data streaming platform transforms organizations with trustworthy, real-time data that seamlessly spans your entire environment and powers innovation across every use case. Create smarter, deploy faster, and maximize efficiency with a true data streaming platform from the pioneers in data streaming. Learn more at confluent.io.

Episode Transcription

0:00:00.1 Andy: The "managed" in Managed Streaming for Kafka means it's managed by you, and that's fine. It works great if you're a Kafka expert. What we found quickly was we're not Kafka admin experts, we're engineers. One of the big benefits of the partnership with Confluent has not only been the zero cluster outages, knock on wood. The bigger side of it was really the access to the experts.

0:00:25.0 Joseph: That's Andy Steinmann, Software Engineering Manager at MarketAxess. This is Life Is But A Stream, the web show for tech leaders who need real-time insights. In this episode we're talking to Andy about what it takes to move data from on-prem legacy systems to the cloud through a streaming architecture and a data streaming platform that scales. We'll dig into how his team uses data streaming to keep massive feed volumes flowing in real time, modernizing how financial data moves across the business. I'm Joseph Morais, your host. Let's get started. Well, thanks for coming on the show, Andy. I can't tell you how much I appreciate it. Let's jump right into it. Tell me about MarketAxess and what you guys do in general, and specifically what your team does at MarketAxess.

0:01:12.8 Andy: Sure. So first of all, appreciate you having me on. MarketAxess is an electronic trading platform for fixed income securities, and that's municipal bonds, corporate bonds, that type of thing. The teams that I'm responsible for, and there's three of them, specifically deal with the broker side of that. The broker side is the liquidity providers. In that situation they're responding to, the industry term is RFQs, or requests for quote. But effectively they're providing responses on inquiries to, you know, buy or sell on the platform. At the end of the day, and this is an oversimplification, but the platform is basically fancy eBay for bonds. That's the way I describe it to people that don't have a finance degree, which I don't, and so that helps me understand it as well.

0:02:01.0 Joseph: So bonds, just a very low-stakes business, right? You can tolerate, I'm sure, inaccuracies, things out of order. None of that matters to you?

0:02:08.4 Andy: None of it matters at all? No. Obviously the quality of data and the order of data is very important for applications that we build.

0:02:17.1 Joseph: Yeah, we'll dig into that further. I love talking to folks from any type of financial services, just because I think event-driven architecture was an inevitability and almost a hard requirement, and I think people in financial services kind of got it first. So I love learning about these use cases that have some level of maturation. So tell me, who are MarketAxess's customers?

0:02:39.9 Andy: So those customers then are the dealers, and there's clients as well. Again, MarketAxess is a large organization, but on one side you have the clients; they're generally the ones that are sending out the interest to buy or sell. You have the broker dealers on the other side that are responding to that. These are big investment banks, JP Morgan, BlackRock, Raymond James, the big investment banks, and obviously we have clients all over the world. So there's Euro bonds that get traded on the platform, all that kind of stuff as well.

0:03:11.8 Joseph: Basically anywhere there's a bond, there's MarketAxess.

0:03:15.2 Andy: If not, they're looking at a way to get there. Yes.

0:03:17.4 Joseph: Nice.

0:03:18.0 Andy: Okay.

0:03:18.4 Joseph: I think that shows they're still innovating. We've set the stage; let's dig deeper into the heart of your data streaming journey in our first segment. So Andy, this show is all about use cases: what did you build? What amazing thing was unlocked by data streaming? So with that, my first question is, what have you built with data streaming, or what are you currently building?

0:03:44.9 Andy: Sure. So we have a suite of applications that all run in AWS, providing different levels of connectivity or productivity for increasing volume into MarketAxess. One of the teams that I run focuses a lot on interaction between vendors. Right. So let's say there is a partner and they don't speak the same language, to put it that way. In layman's terms, we would provide a piece of software that would act as an intermediary, all of that running backed by Kafka. And so there's translation, there's correlation of messages across streams of data. The way that data gets into our system, we'll also kind of start with that as well: the FIX protocol, which dates back to the early '90s, is a socket-based protocol. There have been several iterations on it over time, but that's still the way the majority of the data is transferred between partners, between trading systems, between clients and dealers, between banks, all of that. So what we built was basically an interface that allowed us to bridge between the socket and a Kafka topic. So we have a Kafka topic for data that comes in, a Kafka topic for data that goes out.

0:05:02.4 Andy: What that did is allow us to then build these applications that can scale horizontally with load, but also maintain that legacy compatibility with the partners or with MarketAxess.
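For readers who want to picture the socket-to-Kafka interface Andy describes, here is a minimal sketch. The topic names (fix.inbound, fix.outbound), the endpoint, and the newline-delimited framing are illustrative assumptions; real FIX messages are SOH-delimited tag=value pairs and would need a proper codec and session handling.

```java
// A minimal sketch of a FIX-socket-to-Kafka bridge, under the assumptions above.
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

import java.io.*;
import java.net.Socket;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class FixKafkaBridge {
    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption
        props.put("group.id", "fix-bridge");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        Socket socket = new Socket("fix-gateway.example.com", 9876); // hypothetical counterparty
        BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
        PrintWriter out = new PrintWriter(socket.getOutputStream(), true);

        // Socket -> Kafka: every message arriving on the legacy socket lands on a topic,
        // where cloud microservices can consume it.
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        Thread inbound = new Thread(() -> {
            try {
                String line;
                while ((line = in.readLine()) != null) {
                    producer.send(new ProducerRecord<>("fix.inbound", line));
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        inbound.start();

        // Kafka -> Socket: anything a cloud service writes to the outbound topic
        // is replayed to the legacy counterparty over the same socket.
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("fix.outbound"));
            while (true) {
                for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofMillis(200))) {
                    out.println(rec.value());
                }
            }
        }
    }
}
```

The point of the pattern is that the socket stays exactly where the legacy partners expect it, while everything behind the two topics can scale horizontally.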

0:05:14.4 Joseph: Yeah, see, you really honed in on something I think is interesting, because I think people, especially at an organization that has been around for a while, and I like to think of insurance companies as a great example, they started on some type of batch, socket-based system or some legacy system. And when they start to evaluate modernization, they wonder, do I have to throw everything out? And you clearly have not done that.

0:05:39.9 Andy: Right.

0:05:40.1 Joseph: You've kind of worked around some of the limitations I imagine your partners still have, because they're not moving as quickly as MarketAxess, for example. So you have this translation layer so you can still talk to your partners that are using legacy systems, but it didn't prevent you from building out something new. Is that a fair take?

0:05:58.1 Andy: That's actually really accurate. The only piece I would add to that is it also allows us to talk to our own legacy system as well. So MarketAxess has a legacy trading platform that is obviously still very functional, but is written in technology that is not running in AWS. It's not stream-based, that type of thing. So by doing this, by providing that interface, it gave us the ability to build applications in the cloud while still reading and producing data from the on-prem application. The goal of that is that it lets us iteratively move that application to the cloud. Right. It is a migration strategy as well. So we could start to peel off functionality. It's the standard strangler pattern; it's not new in software development. But yes, that pattern allows us to interface again with our own legacy systems as well.

0:06:51.3 Joseph: Yeah, that's very helpful.

0:06:53.1 Andy: Right.

0:06:53.4 Joseph: Because I think someone might think, wow, they've got all their stuff figured out, they're just brokering this legacy technology because of their partners. But the reality is you still have some legacy systems yourself. And I've talked to a number of different customers, and it always kind of fascinates me how the one thing you can't strangle, for example, is a mainframe. But having a mainframe doesn't limit you, because you can use something like a CDC connector and get that data out of there and still build on this modern substrate, and still allow the mainframe to exist for whatever reason it's there. And there are many reasons. So I'm curious, what were the underlying data or technology stack challenges, like trade data, high-volume market ticks, and that legacy monolith you talked about, that led you to adopt data streaming?

0:07:41.6 Andy: Sure. So the move to data streaming came really from the need to have the data in two places, both on-prem and in the cloud. So we knew we had to get the data pulled out of our legacy application. The solution was to slightly change the code at the very edge, so that instead of reading and writing from a socket, it reads and writes from Kafka. We found that to be a very non-invasive change, at least the way our application had been architected. What that allowed us to do was to have consumers that we deployed into AWS as microservices, Kafka Streams applications, what have you, that could also read and write that data. That gave us reduced load and capacity on the legacy server, so that got us some wins there, because it was no longer dealing with all these socket connections. It made network connectivity simpler, because we've got one Kubernetes cluster that's dealing with the network as opposed to a bunch of different on-prem servers. But the big key was that now the data is not solely in that application. And so for us that was the first step. Once we had that data pulled out and into Kafka, then we were able to start building what we call 2.0, because there are some guys on the team that have my amount of gray in the beard and we remember Web 2.0, and so that's why we've called it 2.0, for the cloud-based next-gen architecture.

0:09:07.1 Andy: Whatever phrase you want to use there.

0:09:09.1 Joseph: Absolutely. So I used to work at AWS, and I have some friends on the Direct Connect team, and I'm sure your use case would be music to their ears, because it kind of hits all the things, right? You guys have been around long enough to have legacy monoliths and your own data centers, but realized the promise of the cloud and being able to instantly scale up and utilize all of these new systems. But I'm a network engineer at heart, and I realize that some companies just can't fully leave on-prem for a lot of reasons. And it's really fantastic to talk to someone who is part of a company that actually was able to realize the dream, the hybrid dream of: we're going to have some things here, we're going to have some things there. We recognize we have a lot of technical debt, but we're not going to allow that to stop us. You didn't use these words, but it kind of hit my mind like: we're going to democratize the data, and we're going to make it available across both sides of the hybrid fence, and we're going to be able to do great things with that.

0:10:05.8 Joseph: Now do you think this would have been possible without adopting data streaming in the short term?

0:10:10.6 Andy: No. I can imagine an architecture where you have some kind of shared database or something else, but for us, the simplicity and the durability of Kafka, and we can get into architecture choices at some other point, but I really don't think so. The other thing that's interesting about this as well is we were already in a data streaming environment. Messages being transmitted across a socket is data streaming. So what we were building already was meeting a lot of those requirements and paradigms.

0:10:45.2 Joseph: Right. Once again, fitting the 2.0 idea.

0:10:48.5 Andy: Right.

0:10:48.8 Joseph: Or that comparison. This was, you know, 2.0 of data streaming, except without sockets. You have this persistence layer, you have a newfangled protocol which is clearly more modern and, I think, probably a better fit overall considering the way you're architecting these today. Now, you and I have chatted before, and we talked about this idea of increased feed volume. Right. Obviously the more successful you get, that means more traffic. I think there are more people generating trades of bonds, and I imagine you probably even have some customers that are using AI to drive those decisions, which is driving it up even more. Tell me how increased feed volume ties into your data streaming journey.

0:11:31.7 Andy: So a couple things there. Yes, you're absolutely correct. We have several clients that are working with algo traders and that type of thing. At the same point, only about 50% of fixed income securities trades happen on an electronic platform. So there is a huge market share growth opportunity there, I guess is how I would put that.

0:11:53.2 Joseph: So the other 50% are still happening.

0:11:55.2 Andy: Through agents, or over the phone. They pick up the phone, or send a fax. And yeah, that's one of the reasons for growth at companies like this: there's a huge untapped market there. Yes, we want to steal market share from our competitors, but we're not all fighting over the same 10% of market share. There are multiple ways that you grow, I guess is how I want to say that. So not only do we have increased volume due to new methods of trading, but we have increased volume due to just new streams of data coming in as we're onboarding more and more of that total market.

0:12:34.6 Joseph: Yeah, I imagine easily within our lifetimes we'll see that number, let's call it the sneakernet way of doing trades, probably hitting single digits. I'm actually surprised it's still at 50%. That's actually kind of mind-blowing to me. Now, as you were talking through that, it kind of popped into my head: when you started thinking about 2.0 and you realized that Kafka or data streaming was the right fit, was there a tipping point where you realized that you just had to get off this legacy system? Was there a certain event, or maybe chain of events, that made you realize we'd better start building out 2.0 sooner rather than later?

0:13:10.1 Andy: Fortunately, there was not a specific tipping point that was like, hey, we were down for a week, right? Fortunately, that was not the case. However, we were seeing increased load in our legacy monolith that you have to give more CPU and more memory to, and that does have a finite limit eventually. So it was more the concept of: we see the pattern we're on here; we will be out of CPUs and memory if we don't start building this. And again, like I mentioned, by moving those workloads off of the monolith, not only did we have the data to be able to start building the new applications off of, but we also reduced load on the monolith application itself. So it bought us two things, right? It got us the data to move forward with, and because we were doing less processing, it reduced the CPU and memory load and that type of thing. So yeah, it was a two-birds-with-one-stone kind of thing, right?

0:14:11.0 Joseph: Yeah, right. You took the steam off of those cores that were melting down and you gave them a little bit of headroom. Right.

0:14:17.1 Andy: And it.

0:14:17.4 Joseph: But you also had that benefit of using the unlimited scaling of the cloud, and then, of course, modernizing. It's really a great playbook for any legacy stack, whatever vertical you're in. This is a good blueprint: don't throw everything away today, keep your business up and running, but also buy yourself time and ultimately get to the modern stack that I think everyone desires. And of course, that's going to look a lot different depending on when you start; your modern stack might look a lot different than if you're starting five years from now. But I'm curious.

0:14:49.1 Andy: We call it rebuilding the car while you're driving down the road. That's really the analogy that we use for that.

0:14:56.9 Joseph: Yeah, I totally understand that one. I remember having a switch stack, and it was redundant, and we were down to one switch, and we had just moved all the secondary connections while we were flashing another switch to throw back in there, doing all of this while everything was up and running, and just hoping that no one sees. So thank you for summoning those memories for me. I really do appreciate it. So, something that we alluded to earlier in terms of high-quality data: tell me, what is your approach at MarketAxess to data governance?

0:15:29.5 Andy: So data governance is a large topic; we could spend a whole hour talking about just that. A couple things around that: we make heavy use of ACLs for access control. We also have a concept where our cluster, the Kafka cluster that we use, is isolated from the rest of MarketAxess, and we can get into some contractual reasons why all that is, or whatever. But long story short, it's not just wide open where anybody, even internally, can connect to it and grab data. The other piece of data governance: we have custom Kafka Streams applications that we use for validation of data. And that can be as simple as just making sure that it is the correct format, that it's not a fragment of a message or something like that, up to correlation with some business logic, where, you know, if we sent a reject, did we get a correct acknowledgment back, or something along those lines. So it's not just data governance from a what's-in-Kafka standpoint; it's from a little bit larger viewpoint.

0:16:37.7 Andy: That being said, though, having a dedicated cluster certainly reduces the amount of things that we have to worry ourselves with, as opposed to if it was open to the whole company or we were running some kind of data platform, something like that.
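As a hedged illustration of the ACL side of this, here is what granting a single service read access to a topic can look like with Kafka's AdminClient. The principal and topic names are invented for the example; they are not MarketAxess's actual setup.

```java
// A sketch of locking a topic down with a Kafka ACL via the AdminClient.
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.common.acl.*;
import org.apache.kafka.common.resource.*;

import java.util.List;
import java.util.Properties;

public class TopicAcls {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption
        try (AdminClient admin = AdminClient.create(props)) {
            // Allow exactly one service principal to read this topic. With an
            // authorizer enabled, principals without a matching ALLOW are denied,
            // which is the guardrail Joseph describes next.
            AclBinding readOnly = new AclBinding(
                new ResourcePattern(ResourceType.TOPIC, "fix.inbound", PatternType.LITERAL),
                new AccessControlEntry("User:quote-validator", "*",
                    AclOperation.READ, AclPermissionType.ALLOW));
            admin.createAcls(List.of(readOnly)).all().get();
        }
    }
}
```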

0:16:50.6 Joseph: Yeah, and sometimes it's about guardrails, right? It's about not giving people access to systems they don't need access to.

0:16:55.8 Andy: Right.

0:16:56.0 Joseph: I mean, when you think of the principle of least privilege and things like that, you're just saving yourself a lot of headache, frankly. Now, what about the other side of governance, the governance of the data as it ingresses your system? Are you utilizing things like Schema Registry or lineage to, you know, build what we here at Confluent call data contracts?

0:17:15.7 Andy: We don't use Schema Registry currently.

0:17:17.9 Joseph: Okay.

0:17:18.3 Andy: And there's pros and cons to that, and we can get into that when we get further down. I've got some complaints about Avro that we can talk about with Schema Registry if we really have time for it, but currently we don't use Schema Registry. That being said, the FIX protocol, those messages that we talked about, which are at the edge and in a lot of cases are either the first input or the first output of basically all of our applications, do have a defined schema. So it's not Confluent Schema Registry, but it is an XML document that defines: these fields are required, these are optional, these elements repeat, that type of thing. There certainly is a concept of validating against that, and it's something that we build into our applications. We'll use that schema, and then you have to decide what to do if a message doesn't match it, just like every other application has to. Sometimes it's do the best with what you have. Sometimes it's throw that message away and alert on it, set an alarm; or don't care, if we can't parse it, move on to the next one. There are different levels, and that kind of depends on the business use case.
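A validator in the spirit Andy describes can be sketched as a small Kafka Streams topology: check each message, route failures to a dead-letter topic for alerting. The isValid() check, topic names, and the single FIX tag it looks for are placeholders, not the actual MarketAxess schema logic.

```java
// An illustrative Kafka Streams validator: valid messages flow on, rejects
// go to a dead-letter topic where alarms or repair jobs can pick them up.
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.*;
import org.apache.kafka.streams.kstream.*;

import java.util.Map;
import java.util.Properties;

public class FixValidator {
    static boolean isValid(String msg) {
        // Placeholder: a real check would validate required/optional/repeating
        // fields against the XML-defined schema Andy mentions.
        return msg != null && msg.contains("35="); // FIX MsgType tag must exist
    }

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> raw = builder.stream("fix.inbound");

        Map<String, KStream<String, String>> branches = raw
            .split(Named.as("fix-"))
            .branch((k, v) -> isValid(v), Branched.as("valid"))
            .defaultBranch(Branched.as("rejected"));

        branches.get("fix-valid").to("fix.validated");
        // Per use case, downstream policy decides: repair, drop, or alarm.
        branches.get("fix-rejected").to("fix.dead-letter");

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "fix-validator");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        new KafkaStreams(builder.build(), props).start();
    }
}
```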

0:18:24.5 Joseph: Right. But you do have some mechanism for ensuring that the messages and events being produced are consistent with what's being consumed. I think that's absolutely key. And like you said, whether you're using our Schema Registry or some type of centralized XML, there are other ways of doing it.

0:18:38.2 Andy: Right.

0:18:38.5 Joseph: The error handling is always very subjective to whatever your requirements are. Now, before we move into our next segment, I just want to make sure, because the use case, to me, is the star. You're a star to me, Andy, but the real star is the use case, and I just want to make sure I have it summarized correctly. So you had a monolith system, you were all on-prem. At some point there was this need to go to this 2.0 version, and the decision was to build it on top of data streaming and the Kafka protocol. And Confluent helped you kind of bridge that gap between on-prem and the cloud, and to build services and abstractions on top of this new layer to still talk with your partners and your own legacy systems, these socket-based systems, and allowed you to simultaneously keep the lights running but also modernize and expand into the cloud. Did I get that right?

0:19:30.1 Andy: That's correct. Yeah.

0:19:31.5 Joseph: Perfect. Next we're going to dive into how your partnership with Confluent solved your data challenges. But first, a quick word from our sponsor.

0:19:44.1 S3: Your data shouldn't be a problem to manage. It should be your superpower. The Confluent data streaming platform transforms organizations with trustworthy, real-time data that seamlessly spans your entire environment and powers innovation across every use case. Create smarter, deploy faster, and maximize efficiency with a true data streaming platform from the pioneers in data streaming.

0:20:20.4 Joseph: Now we'll go beyond the stream on why Confluent is the right fit. All right, so we now know that data streaming was the answer to some of the challenges that MarketAxess had. There's a reason that Kafka is the industry standard for addressing increased feed volume, but there's also a lot to wrangle with Kafka. What made MarketAxess choose to work with Confluent for a data streaming platform?

0:20:43.1 Andy: That's a great question, because it's certainly not where we started. In 2019 we started running our own Kafka brokers inside of our Kubernetes cluster with the rest of our workloads. That functioned: everything connected, data flowed, topics were created, that type of thing. What we found quickly was we're not Kafka admin experts, we're engineers. We build software, we don't manage it. So the next, and I think natural, step was to move to MSK, Managed Streaming for Apache Kafka, from AWS. All of our other applications were running in AWS. And again we ran into the same problem: the "managed" in Managed Streaming for Kafka means it's managed by you. And that's fine. It works great if you're a Kafka expert. As far as how many brokers should you run? I don't know, run five and see what happens. That's not great for a production environment. Right. That's great for a dev environment if we're just playing around with some stuff. But what we found also was that you'd get random node reboots at 3 p.m. in the middle of the day, which, again, our applications were built to handle, disconnect and reconnect to another node.

0:21:52.3 Andy: All of that worked. But it's not great when you suddenly get a bunch of disconnects right at the time that some of the highest trade volume happens in the industry, right at 3 p.m. Eastern.

0:22:01.6 Joseph: Always at the worst time.

0:22:03.0 Andy: Always at the worst time. And again, I have nothing bad to say about MSK, but we ran into the same problem as running it ourselves: we aren't Kafka experts. That's what then led us to Confluent, being able to take advantage of the expertise, the automatic cluster sizing. How do we pick how much cluster we need? There's a little slider that will tell you how many nodes you want, all of that kind of stuff. So that was really the journey for how we ended up at Confluent.

0:22:32.3 Joseph: I mean, that is the most classic journey, I think. A lot of people start with open source, and I think open source is great, but like you said, it's undifferentiated heavy lifting. Right. Because at the end of the day, what we want is for data to go in and data to go out, and we may not want to care about all the underpinnings of that. We just want that service to work. And what I think impressed me, and the reason I've been here so long, is that Confluent is really trying to make data streaming as simple as S3. Right. And I'll pick on AWS for a second, because we talked about MSK, and it's a fine product if you're willing to accept some of the, let's say, limitations of it. But, you know, like S3: you don't really think about S3. You think of a bucket, you think about uploading, you think of downloading, you don't really think about anything else. Right. But the reality is there are things scaling under there. Of course there have to be. There are disks going in, there are disks dying that need to be replaced, but they've abstracted all of that away.

0:23:22.9 Joseph: And that's what we've tried to do with Confluent Cloud. And it really makes me happy, because I started in this industry, well, I've been around for a while, but in, I'm sorry, 2017, I was running open source Kafka as well, and I hated it. And I think MSK was just starting. I don't even know if it was announced at that point; it might have just been early access. And Confluent Cloud was very early as well. And it really delights me that you kind of figured out, hey, we don't want to do this. We want to focus on the differentiated pieces, on building what's meaningful to our customers. And whether it's Kafka or running a document database or running Spark, you want to hand that off to the experts, because you realize that there's a value prop in not having to worry about that and making it as simple as possible. And I'm glad you realized that. So that makes me very happy. Now, with our partnership, how did your teams tackle addressing increased feed volume and scaling? I know you had this journey from open source to MSK to Confluent. When you got on Confluent, was there a turning point?

0:24:33.8 Andy: I would say one of the big benefits of the partnership with Confluent has not only been the zero cluster outages, knock on wood, since we went over. Yeah, exactly. That's the goal. But the bigger side of it was really the access to the experts for us. Whether that's partnering on, hey, we have a performance issue, how would you recommend we set up consumer or producer flags? As you know, there are a thousand different knobs you can turn for different performance scenarios. Getting insight beyond just, well, here's what the default values are, or here's what some Stack Overflow post says you should do for your consumer settings, which may be completely accurate and may not be, or it may have been accurate in 2015. So the advantage is getting access to the feedback and support when there are issues; that has been a huge help as well. And then, even just a few weeks ago, we had an issue here in our staging environment where we were having some performance issues, and it took a while to get it tracked down. Again, it's somebody to walk that journey with you, as opposed to just starting with a Google search or a ChatGPT question.

0:25:51.9 Andy: To me, that's been the real advantage of the partnership there.

0:25:58.3 Joseph: I love to hear that, because honestly, I do think that is a huge value add of working with a company that focuses on basically one, or a small subset, of technologies. Right. Whether it's Google, Microsoft, or Amazon, and we're partners with all of them, and I very much value those partnerships, they all have a minimum of 100-plus products. Right. And you can only support all of those to a certain level of granularity, because ultimately all three of those providers build great clouds. And that's not easy to do: building these huge data centers that are globally distributed, connected, redundant, high speed, et cetera. So when you peel back a layer and say, hey, we built something else, that's cool, but it's not your core competency. Your core competency is to build data centers. And when you find those vendors that love a technology and that's all they focus on, you can have these types of outcomes, whether it's getting advice on tracking down what was happening in your staging environment, or maybe being part of our customer advisory board, things like that.

0:26:55.2 Joseph: These opportunities are there to help kind of drive the future of data streaming, and we appreciate customers like yourself and those opportunities to delight you. So, with a project as big as 2.0, there's more than just tech to consider. How did you approach getting the business on board? Was there any pushback on initially switching over to OSS Kafka in the first place? And then was there business pushback on going from OSS to MSK and saying, hey, we're going to pay more money for something that we're running ourselves? And then to Confluent? Was there any pushback, or did the business have your back from the beginning?

0:27:31.8 Andy: So we have an advantage: one of the teams that I'm responsible for runs a lot like a product team. It has a product, a legacy monolith product that came out of an acquisition 10-plus years ago. And so we're allowed to make some of those decisions a bit more in a vacuum, because they're not affecting MarketAxess as a whole when we make them. The advantage of that is that we're responsible for it. The downside is that we're responsible for it, if that makes sense. The move to MSK was more of a, hey, we're responsible for the cloud budget that we use; we have to make sure that we have offsetting revenue, those types of things. The move to Confluent Cloud was very similar. It was a very easy sell, though. It's like, hey, the Kafka cluster keeps getting security patches at 3 p.m. on a Tuesday, and no matter how many tickets we put in with AWS, it still seemed like you're getting them at 3 p.m. on a Tuesday. So the advantage is we didn't have as much pushback, and now we have a more functioning, mature product.

0:28:42.8 Andy: And then it became much more of a, hey everybody, this works really well. Because when we started it was just us, and now there are quite a few teams at MarketAxess that are using Confluent, that type of thing. So we didn't have the pushback you would normally have if we had to go through an entire architecture council or something like that.

0:29:04.5 Joseph: You know, my producer Taylor, who always sits invisibly behind the scenes and makes all of us look good, she would confirm that that's one of the heartier belly laughs that I've had here on the show: the upside is that we're responsible for it, and the downside is that we're responsible for it. I absolutely love that. So could you share some advice, lessons learned, for leaders like yourself who are starting to tackle data streaming? Imagine yourself back in that monolith world. What advice would you give to yourself before you start a 2.0?

0:29:36.3 Andy: I would say start small. Realize that a cloud migration, moving to data streaming, basically moving away from your monolith, can't be, hey, everybody comes in on a Saturday and we turn a bunch of knobs and suddenly Monday morning it's the new platform. There's just no way; there are too many moving pieces and parts in most applications that I've worked on, at MarketAxess and in the past. The concept of starting small, of getting your data out there first, is the key; you have to have the data. Then you start iteratively building applications. It's easy to sit and say, hey, we're going to do agile and iterative development. It's a lot harder to actually do it. But moving toward that is the key. And it's an overdone analogy, but how do you eat an elephant? One bite at a time. That's the concept. Instead of trying to say we're going to go dark and work on this for three years and hope it all works when we turn it on, walk through it iteratively. One of the things that we've done, and there are varying ways to segregate data, if you will, is figure out a way to bring clients onto the new system one at a time.

0:30:49.8 Andy: You find your client that is actually great to work with, that doesn't complain if you have an "oh, we've got to roll this back real quick" moment, that type of thing. The other thing I'd say is that you have to have a mature CI/CD system. You have to be able to release to production in the middle of the day. It's fine if you don't want to, and we generally don't; trading hours end at 4:30 Eastern, which gives us a great window, the whole rest of the evening, if there are problems. Same with weekends. But you need to be able to deploy to production in the middle of the day if you have to.

0:31:26.4 Joseph: Right.

0:31:26.8 Andy: Once you have those, then you can start your migration in earnest.

0:31:31.8 Joseph: No, it makes perfect sense because you can have a release that went out last night that you don't realize is broken until the next day.

0:31:38.3 Andy: Exactly.

0:31:38.8 Joseph: We've all run into that, because it's creating a memory leak or something, and even though you don't want to, you have to do that hotfix. So I love that. That's really good. And getting those small wins is so important. Right? Because now you can build buzz within your own organization. You can get the business on board. Suddenly everyone has something to look forward to.

0:31:57.1 Andy: There's one other thing that I wanted to jump in real quick and cover. Using Kafka has given us the ability to do something that we call a dark deployment. We would deploy a new service to production that's reading data, doing whatever processing, and then just writing to a placeholder topic, screaming into the void, if you will, whatever concept you want to use. But there is nothing like running your application in production to determine: is it going to work? What does the actual data look like? No matter how close your staging environment is. It also helps you deal with load: is it lagging, all of those types of things. By using Kafka and some configuration, you can actually run it side by side with the existing application in production and compare output. Maybe you're replacing an older application with a newer one; you can take those two topics, and you may be talking hundreds of millions of data points, pull all that data in and programmatically compare results to confirm: yes, the new thing is still doing what it's supposed to.

0:33:05.2 Andy: We did fix that one bug that we can identify here and not over here. And that's huge, because now you don't have to turn it on and hope it works. You deploy it to production, verify that it works for a week, and then enable it. It gives you a much more confident release cycle.
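The programmatic comparison Andy describes can be sketched as a small Kafka Streams job that joins the legacy app's output topic against the dark deployment's placeholder topic by correlation key and emits a record wherever the two disagree. Topic names and the one-minute join window are illustrative assumptions.

```java
// A sketch of a dark-deployment diff: v1 output vs. v2 "screaming into the void".
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.*;
import org.apache.kafka.streams.kstream.*;

import java.time.Duration;
import java.util.Properties;

public class DarkLaunchDiff {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> v1 = builder.stream("quotes.v1.out");
        KStream<String, String> v2 = builder.stream("quotes.v2.dark"); // placeholder topic

        // Pair up records produced for the same key within a minute of each other;
        // emit a diff record only when the two versions disagree.
        v1.join(v2,
                (oldOut, newOut) -> oldOut.equals(newOut) ? "" : oldOut + " != " + newOut,
                JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofMinutes(1)))
          .filter((key, diff) -> !diff.isEmpty())
          .to("quotes.v2.mismatches");

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "dark-launch-diff");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        new KafkaStreams(builder.build(), props).start();
    }
}
```

An empty mismatch topic after a week of production traffic is the "verify that it works, then enable it" signal Andy mentions.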

0:33:22.9 Joseph: Yeah, I like that, for both sides of it. One is correlating between v1 and v2 and saying, hey, we've got a week's worth of data points; we know this thing is rock solid, it's doing everything we expect it to. But the other side, right: I think everyone endeavors to get staging to match production as closely as possible, but inevitably there'll be some setting that is on in production that's not on in staging that burns you.

0:33:47.7 Andy: Right.

0:33:48.0 Joseph: And you avoid that by doing these dark releases. I really like that.

0:33:51.3 Andy: Volume is another area that's hard to replicate in staging.

0:33:54.3 Joseph: In staging, very hard.

0:33:56.3 Andy: It is difficult, but yeah, right.

0:33:57.8 Joseph: It's hard to replicate with, like, actual real data. And then of course it's expensive. Right. Because production is of a certain size, and if you can get away with not having to size your staging the same way, you can definitely save yourself some money, but not at the risk of having a bad release. But you've kind of found a way to bridge that gap as well, so kudos for doing that. Now, what's the vision for data streaming at MarketAxess? And it's okay if it does or does not include artificial intelligence.

0:34:25.6 Andy: Well, our CEO would probably fire me if I didn't say AI on a call somewhere. Everybody has to. So obviously, data streaming: we already have clients that are using AI. MarketAxess is already investing in AI, and I won't get into too much of that; that's not really been any of the projects that I've worked on. That being said, to me, I really think that the future of data streaming is moving away from the FIX socket connection. If you think about it, we have this functional, for sure, but antiquated method of transmitting data. It is a bottleneck; there are certain limitations to how much data you can squeeze over a socket in a certain amount of time. And so I really think the larger vision, not only internally for the suite of products that MarketAxess has, but also out to the vendors, the brokers that we connect to, I really see growth there, to where it's not socket connections for FIX messages, it's Kafka connections. And whether that's data replication between clusters, or whether that's bespoke consumers that are limited by ACLs, there are a lot of ways to solve that, but I really think from an industry perspective that's where it's headed.

0:35:52.7 Andy: And obviously an increase in volume significantly due to AI, due to market share increases, all of that kind of stuff.

0:36:00.1 Joseph: Yeah, I was going to say: peeling off that abstraction layer, getting to the full embrace of the Kafka protocol, and then, to make your CEO happy, some of those being AI-powered microservices, I'm sure, using that consumer-producer model. So before we let you go, we're gonna do a lightning round: bite-sized questions, byte-sized answers, i.e. b-y-t-e, like hot takes, but schema-backed and serialized. Are you ready?

0:36:34.1 Andy: Sure.

0:36:35.0 Joseph: What's something you hate about IT?

0:36:38.0 Andy: I hate technology choices made because they're trendy. For example, using Kafka where you don't need to use Kafka. Sure, I love it, but, you know, this is what we're talking about today. So, I'm a Midwest boy. I grew up here, I still live here. For anybody who's ever used a shovel to dig a hole, my analogy is: there are lots of different kinds of shovels to dig different kinds of holes. They will all dig a hole, but you will appreciate one versus the other. That's my analogy when it comes to tech choices just for the sake of it. So that's one of the things.

0:37:11.0 Joseph: You just need a small shovel. I get that. Right?

0:37:13.5 Andy: Yeah. Or if you're digging tile, putting tile down, that's a completely different shovel. You want a tile spade.

0:37:18.3 Joseph: So that's very, very salient. What's a data myth you'd like to bust?

0:37:26.5 Andy: The concept that real-time data is always better. And I say better; maybe "necessary" is really the key word there. It's actually a conversation that the product manager and I have often. We have a screen and it's got ticks on it. He's like, I need this to show ticks. My answer is always, how real-time do you need it to be? Is it 10 seconds? Is it three a minute? Where's the SLA, if you will? So to me, that's the myth: that you always need bleeding-edge data. For humans consuming data, anything more than about three ticks a second would be fast. I mean, if you think about numbers moving up and down on a screen, if it moves more often than that, you don't really need it. And so that's allowed us to build some systems to basically limit data, or rate-limit data, whatever you want to call it, to reduce volume, reduce load, all of that kind of stuff.
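One common shape for the rate limiting Andy describes is conflation: keep only the latest tick per instrument and publish a snapshot a few times per second, which is plenty for a human-facing screen. This is a simplified sketch, not MarketAxess's implementation; topic names, the ~3 Hz flush rate, and the lossy flush-then-clear step are all illustrative assumptions.

```java
// A conflation sketch: raw ticks in, latest-value snapshots out at ~3 Hz.
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class TickConflator {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption
        props.put("group.id", "tick-conflator");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        ConcurrentHashMap<String, String> latest = new ConcurrentHashMap<>();
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        // Flush the latest tick per instrument roughly three times a second,
        // about the rate a human can actually follow numbers moving on a screen.
        Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
            latest.forEach((instrument, tick) ->
                producer.send(new ProducerRecord<>("ticks.conflated", instrument, tick)));
            latest.clear();
        }, 0, 333, TimeUnit.MILLISECONDS);

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("ticks.raw"));
            while (true) {
                // Overwrite per key: only the newest tick survives until the next flush.
                for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofMillis(100)))
                    latest.put(rec.key(), rec.value());
            }
        }
    }
}
```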

0:38:30.5 Joseph: Yeah, sure. I mean, of course there's nothing better than real-time data, unless you need data from the future, and that's just not possible. But is it always necessary? And I agree, there are definitely use cases where it's not. But again, like you said, that's for humans, right? A new world where everything needs to happen within a millisecond, I think, is coming, is going to be thrust upon us, but we're not there yet.

0:38:53.5 Andy: But also, whenever you think of "better," better is relative as well. But no, I think it certainly is a brave new world when it comes to where we're headed with streaming data.

0:39:03.7 Joseph: I think that was a really good take, honestly. So what's a non-tech hobby or activity that's impacted how you think about data?

0:39:09.8 Andy: Yeah, so I don't really know if this counts as a hobby or activity. This actually came to me when I was at Current, the Confluent conference, a couple, three years ago down in Austin. We'd sat through all these lectures talking about Kafka and streaming data and all this kind of stuff, and I'm standing out there in the lobby at the end of the first day, and I look over and there's a bank of elevators over here, and I look to the right and there are three escalators coming down. And it hit me: the simplest way to explain batch processing versus stream processing is an elevator versus an escalator. It really changed the way I think about data, just from getting my head around that really simple concept. And again, that's not to say that there aren't times when the elevator is better. If you've got a heavy load, you'd rather take the elevator up, right? Because you don't want to put that on your back and ride up the escalator. The concept there, again, is that the kind of data you have, the workloads that you have, helps you decide which paradigm is better.

0:40:19.6 Joseph: Yeah, like if I've got to move as quickly as possible, the escalator is probably going to be better for you nine out of ten times. It's right there; you're not waiting. Unless you just catch that elevator as it's opening up or about to close, there's definitely a delay. I think that's a really, really good analogy. I'm definitely stealing that.

0:40:34.8 Andy: So perfectly fine.

0:40:36.3 Joseph: Where do you get outside inspiration when you're thinking about event-driven architecture? Is it from a book, maybe a thought leader in the industry?

0:40:44.1 Andy: So again, I don't really have something specifically for event-driven architecture, but when it comes to outside inspiration: I don't know if you're familiar with Stephen Covey's circles of control, influence, and concern. This really deals with any area of personal life, or leadership at work, or running software teams, whatever. The idea is that you have three circles, and they get larger. You have a circle of control; that's the stuff that I am physically in control over: my attitude, my effort, where I go sit, those types of things. You have a larger circle, which is an area of influence; those are things I can't physically control, but I can influence, like your kids' behavior. You can't control it, but you can influence it. And then the circle of concern is: it affects me, but I have zero influence or control over it. And it really then becomes this kind of whole mindset. The advantage I see is that when you're in a large organization, invariably there are going to be tech choices, decisions, whatever, that you don't have any control over, you don't have any influence over, but they still affect you.

0:41:53.9 Andy: And learning to be able to effectively say, I don't have any control over that, I may not like it, but I'm not going to get hung up on it, if you will, or get tied up in it. He's got a whole set of books, but that concept specifically is the one that, for me at least, I've really focused on more recently. So yeah, I don't know if that was really non-tech-related, but that's kind of where I ended up on that.

0:42:20.0 Joseph: I love it, because it absolutely ties back to technology. Anyone who's been on any tech team knows that people are going to make decisions that may be bad ones. But, you know, at AWS we had a leadership principle called disagree and commit, and that's what we would do: we would disagree on it, and we would commit, and we would still do it. But I think it's also about putting more of your energy into things you can control and influence, and not spending a lot of time on the things that you can't.

0:42:45.0 Andy: Exactly.

0:42:45.9 Joseph: That is exactly it, a sage way of looking at life. Any final thoughts or anything to plug?

0:42:52.6 Andy: I really don't think so. I think that's it.

0:42:59.0 Joseph: That's great. Hey, that means you got it all out there. That excites me more than you will know. So Andy, thank you so much for joining me today. And for the audience: stick around, because after this I'm giving you my top three takeaways in two minutes. That was just a fantastic conversation with Andy, and here are my top three takeaways. The first one is just kind of talking through the history that MarketAxess had with data streaming.

0:43:29.3 Joseph: Right.

0:43:29.4 Joseph: They started, like many other customers, running open source Kafka, and they realized that it was a distributed system that could be kind of a pain in the butt to run at scale. Right. If you're just getting started with it, playing around with it in Docker, it's no problem. But once you get to bond-trading scale, and you have that level of requirements in terms of no outages and things like that, it might not necessarily be tenable. Especially when, the way Andy described it, they're not Kafka engineers. Knowing Kafka is a serious commitment, to really understand the full underpinnings, how many different things you can tweak, so running it yourself is difficult. And then they went down this classic path and picked a different managed Kafka provider, and that didn't quite meet their requirements, because again, for that particular service provider it wasn't their core competency. Right. Data streaming isn't the only thing they did. So they eventually ended up on Confluent, and they've had zero cluster outages ever since moving to Confluent. I can't think of a better endorsement. And again, from an operations guy, music to my ears.

0:44:35.6 Joseph: Knowing that those folks at MarketAxess do not have to hear PagerDuty go off because they're having a problem with Confluent. And this ties into their choice in that journey. I loved what Andy said: the upside is that they're responsible for it, but the downside is that they're also responsible for it. Right. They had the agency to make the choice to start with open source, to go to one provider, and then ultimately come to Confluent. And all they had to do was justify that, because they own that responsibility. And again, tying back to those zero cluster outages, I'm sure the business is very, very happy with that decision. And then my last takeaway is: working with legacy systems does not prevent you from actually modernizing. Right. So at MarketAxess they built this abstraction layer that still talked through sockets to their partners and to their own monolith, but that did not prevent them from building new, differentiated outcomes using the cloud. And how did they do that? Well, they ate the elephant one bite at a time. Right? Getting those little wins, getting a piece off of the legacy stack, maybe getting one function out of your monolith on top of that democratized data, allows you to keep the lights on.

0:45:41.9 Joseph: But modernize at the same time, and that's really a journey we hope all of our customers can take. That's it for this episode of Life Is But A Stream. Thanks again to Andy for joining us, and thanks to you for tuning in. As always, we're brought to you by Confluent. The Confluent data streaming platform is the data advantage every organization needs to innovate today and win tomorrow. Your unified platform to stream, connect, process, and govern your data starts at confluent.io. If you'd like to connect, find me on LinkedIn. And don't forget to leave a like or a comment on the YouTube video if you're watching. Tell a friend or coworker about us, and subscribe to the show so you never miss an episode. We'll see you next time.