Life Is But A Stream

Ep 20 - One Solution, Infinite Value: How SecurityScorecard Put Confluent at the Center of Everything

Episode Summary

SecurityScorecard's engineering team shares how going schema-first with Apache Kafka® and Confluent Cloud cut processing time from a weekend to near real-time.

Episode Notes

What happens when a security intelligence company decides that data contracts aren't optional, they're the foundation? For SecurityScorecard, that decision changed everything: how teams share data, how pipelines are built, and how quickly a new engineer can ship production-grade work on day one.

In this episode, Brandon Brown, Senior Engineering Manager, and Manan Monga, Senior Software Engineer at SecurityScorecard, join host Joseph Morais to walk through their multi-year journey from batch snapshots and MSK to a fully schema-governed, event-driven platform on Confluent Cloud. They cover how Protobuf and Schema Registry became the organization's shared language for data, how Apache Flink® pipelines now power real-time CVE enrichment across 12 Kafka topics, and how Kafka became the undisputed single source of truth for data across teams.

You'll Learn:

About the Guests: 
Brandon Brown, Senior Engineering Manager, Scoring, SecurityScorecard – Brandon Brown is a Senior Engineering Manager at SecurityScorecard, where he leads the team that provides the scores at the core of the ratings platform. These scores guide cybersecurity leaders worldwide in reducing their own risk and addressing risk in their vendors. His team also drives secure data sharing and advances company-wide adoption of streaming technologies.

With 10+ years of experience in software development, Brandon has worked across the full SDLC, specializing in data pipelines over the last 8+ years. His language of choice is Scala, but he has a soft spot for SQL. He's contributed to top open-source projects such as Debezium and was an early contributor to the ZIO ecosystem. He's passionate about Kafka and hybrid data pipeline strategies. Outside of work, he's helping raise his young son, catching live music, or talking about movies.

Manan Monga is a Senior Software Engineer at SecurityScorecard, building real-time pipelines that stream threat intelligence into the SCDR platform. With experience in SIEM, cloud security, and distributed systems, along with an MS in Computer Science and Cybersecurity from Boston University, he's passionate about performant data at scale. Java is his go-to, but Kafka has his heart. When he's not optimizing pipelines, he's reading sci-fi, cooking, or out cruising in his convertible.

Guest Highlight:
"It was like a weekend to process one day, and that was if the Scala service didn't die while it was running… friends don't let friends ignore schemas. Schemas all the way down." — Brandon Brown, SecurityScorecard

Chapters:
[00:50] Guest Introduction + SecurityScorecard Overview 
[07:00] Segment 1: Data Streaming Goodness
[23:20] Segment 2: Beyond the Stream 
[42:40] Segment 3: Quick Bytes
[45:20] Segment 4: Joseph’s Top Takeaways

Dive Deeper into Data Streaming:

Links & Resources:

Our Sponsor:  
Your data shouldn’t be a problem to manage. It should be your superpower. The Confluent data streaming platform transforms organizations with trustworthy, real-time data that seamlessly spans your entire environment and powers innovation across every use case. Create smarter, deploy faster, and maximize efficiency with a true data streaming platform from the pioneers in data streaming. Learn more at confluent.io.

Episode Transcription

0:00:00.0 Brandon Brown: When I came into Scorecard, I was like, I'm not managing Kafka because I've semi-done that at every other job. It was a nightmare. So not doing that. We're gonna use schemas. Protobuf was the only format, still is the only format that lets you use kind of a dumb definition language that anyone can understand. You could speak data, you could generate clients, and whether we had a schema registry or not, we could do it.

0:00:25.4 Joseph Morais: Welcome to Life Is But A Stream. Today we're talking to Security Scorecard, an incredible company that's helping minimize risk for anyone out there who's using the internet. We're gonna be talking to their engineers about how they're modernizing their data architecture using data streaming. I'm Joseph Morais. Let's get into it. Let's jump right into it, Brandon. Tell me about what you and your team do at Security Scorecard.

0:00:55.5 Brandon Brown: So my team, I'm an engineering manager for our scoring team. We take 100-plus thousand companies, we basically scan the public internet across the globe, and teams provide us with data, and from all of that we calculate a security score from an A to an F. So my team is responsible for processing all that data every day and actually calculating that A-through-F score. We're also responsible for other kind of data-platformy type things as well.

0:01:25.4 Joseph Morais: Manan, what do you and your team do?

0:01:28.2 Manan Monga: So, I'm a senior software engineer here at Security Scorecard on the platform V2 team. We're rearchitecting a lot of our platform, and what I do is build real-time data infrastructure, define Protobuf schemas that act as data contracts across the organization, and then build Flink pipelines that consume data from Kafka, where teams across platform V1 and platform V2 put data, and then aggregate it, enrich it, and publish it into our ClickHouse databases.

0:01:59.7 Joseph Morais: I think for the audience, let's talk about what Security Scorecard, what those scores mean and then kind of, I think you guys have grown past just this qualification of the security on websites. I think you're doing more than that. So if you guys can describe that as well, it'd be awesome.

0:02:15.1 Brandon Brown: Yeah. We're looking for outdated web browsers, outdated library versions, if you have open SSH ports, things like that. We're collecting all of that and weighting them together to then kind of calculate that score. But a lot of what Manan's team is doing is taking some of this kind of classical data collection and actually presenting it in different views that are outside of just the kind of standard scorecard box. I'll let him talk about some of the stuff that he's actually been doing with that.

0:02:44.3 Manan Monga: Yeah. So, we call all of them observations on our platform, but there are different kinds of security findings. They can be breaches, they can be CVEs, and we categorize them into certain action flows that people might incorporate into their security architectures, and we also give them recommendations on how to fix them. All of that data is presented in a very clean way so that it's more actionable for a person.

0:03:07.7 Brandon Brown: Yes. And like one of the things that we've done on our kind of PV1 side, if you will, is we've architected around very specific guided patterns and as a result, we kind of limit some of the data because there's a lot of data that you can actually present. And so a lot of what Manan's side is doing is actually saying, well, there's all this data divorced of the score, we should surface that as well. And so they're actually taking in even more data than our team is because it's not that we filter out data, but we kind of shrink it down into the manageable size that you need to calculate out and serve that. But then there is some fidelity that you lose in that.

0:03:50.2 Joseph Morais: So this, Brandon, brings me into my next question, which, can you describe a typical Security Scorecard customer?

0:03:57.4 Brandon Brown: There's a few different kind of customer personas that we deal with and target. So one of them is the vendor risk manager. They, for example, want to know what is our security posture, what's the security posture of our vendors so that they can present reports to their board and say, "Hey, here's how we're doing and here's how we're doing compared to say, our competitors or other people in our industry." So, let's just say that you have company Manan, and your security posture is a C. Well, you can go and say, "Hey, here's why I'm a C." But then you could also say, "Well, hey, look, everyone else in my industry is also a C, so that's not necessarily a bad thing. But hey, here's how we could get up to a B if we fix these things." And a lot of it is, and this is part of where Manan's team comes in, is every company, every team is unique. Their idea of what is important is not one size fits all. So we're trying to kind of pivot into making the data more accessible and usable for each of those use cases.

0:04:59.8 Brandon Brown: So instead of kind of traditional saying, "We're just going to target these specific use cases," we're going to guide people, but then we're also going to give them the ability to say, actually these are the things, these are the findings that are important to me. So when I'm reporting on my security posture, I can say, "Look, yeah, these things, they don't look good." But attack surface wise, there's all these other mitigating things that make it so it's not as bad as it looks. But these things, oh, these are really bad.

0:05:29.0 Joseph Morais: Right. You want to make the rubric customizable for the customer, which makes a lot of sense. I mean, there may be certain types of postures that aren't as big a deal in my industry but might be highlighted in another industry. Manan, I'm curious, what does a successful Security Scorecard customer look like?

0:05:47.6 Manan Monga: So, a successful Security Scorecard customer is probably someone who's able to leverage all the observations that they have for their vendors as the vendor risk manager, use the recommendations that we're giving them on that, and they can actually contact the vendors through it, and they're able to mitigate them.

0:06:05.0 Brandon Brown: The other thing you can think about this too is blast radius. When a zero-day happens or when a critical infrastructure provider goes down, how badly are you impacted? I'm sure you remember a year ago, there was the Microsoft update that crippled everything. You couldn't log in to your Wi-Fi, you can't get into your hotel room. Those types of things, when they happen, instantly teams want to know how much am I impacted, but more importantly, how much are my vendors impacted? How much is this gonna screw up my business? So being able to provide answers to those questions through the data is really key.

0:06:44.0 Joseph Morais: You know, minimizing that blast radius is all about time to action, which makes me think about real-time data, which perfectly leads us into our next segment. All right. So we set the stage. So let's dig deeper into the heart of your data streaming journey in our first segment, which we call Data Streaming Goodness. So to both of you, what has Security Scorecard built or are you currently building with data streaming?

0:07:16.1 Brandon Brown: Manan's team is actually fully realizing a picture that I laid forth about four and a half years ago when I first joined Scorecard. So when I first joined, we were very batch-heavy and we're like, we kind of would like to adopt streaming. How do we do that? And so one of the things that we've kind of approached with streaming is everything has a contract, none of this unschemaed data. Protobuf is kind of the main entry point for that. So we built out a lot of infrastructure around being able to use a single repo to define data contracts. Those data contracts work both for back-end pipelines and for restful APIs and services.

0:07:55.8 Brandon Brown: So there's no ambiguity over schemas because it's all in one place. It lets us kind of govern it. And then we can generate clients for each language we support. And we kind of initially used Kafka more for data sharing. So we used Connect to replace snapshot jobs by saying, "Hey, we could keep multiple sinks of data in sync from a single source of truth." Then we moved towards, okay, well, instead of us just writing to Postgres and ClickHouse and S3, we produce to Kafka and then we use sink connectors, and Kafka becomes the single source of truth.

0:08:31.9 Brandon Brown: And so we kind of evolutionary have moved towards that, but we still have a lot of batch pipelines. And so that's where Manan's team comes in is with the Flink part. So I'll let him talk a little bit about that.

0:08:42.9 Manan Monga: So my team is building out a suite of real-time data pipelines using Flink. We basically get our security observations and all of our vulnerability intelligence into our database as quickly as possible. For example, CVEs. CVEs consist of a lot of different data that comes from different data sources. Right now, in this pipeline, we join about 12 Kafka topics: the products, CVSS score, CWE mapping, CPE matches, EPSS history, remediations that we've suggested. All of this data moves at a different pace in Kafka. So we join them, produce a single enriched vulnerability record per CVE that's always the latest, and it's real-time.

0:09:24.7 Joseph Morais: Okay.

0:09:25.7 Brandon Brown: And another reason that happens is the source database. So he's doing that all with Debezium. So the database is a MySQL database, has a Debezium connector on it. And so there's actually a product we have, CVE Details, that serves that data, but it runs queries joining that data together, which is great for that API, but when other teams want to use it, then they need to know all of the joins. So the pipelines that Manan is building is actually creating the true, for lack of a better word, big dumb table view of data, which I'm a big fan of. Big dumb tables, BDTs all the way.

0:10:02.0 Joseph Morais: That is definitely the first time I've heard that for sure. I may or may not ever use that initialism again, but we'll see. It is 2026. All right. So to unpack all of that, this is really exciting because very rarely do folks start with data contracts or schema registry. A lot of times you have to drag them kicking and screaming. Screaming: I didn't mean streaming, but it works. Kicking and screaming in order to enable data contracts in the first place or schemas. So kudos to you for starting that way. Mike Agnech, our GM, would love... He probably knows all about this story. It's probably one of his favorite stories. He is Mr. Governance here at Confluent.

0:10:44.6 Joseph Morais: So one of your first use cases was really just to kind of take things that don't talk Kafka and use connectors in order to have all your data land in kind of a single data substrate, single fabric. And then that, in addition to all the wonderful things you could do with Flink by merging multiple sources of truth to ultimately create a data product that is usable for your customers, you also now don't have to directly interact with those other systems, the databases. If a consumer or producer wants that data, they can just come and get it off of a topic. Is that right?

0:11:17.3 Brandon Brown: Yeah. And so we actually, we created a vendor detection module a couple of years ago where the idea was we could take website scans and we could say, "Okay, here's the outbound HTTP request and here's the detected libraries on that page." And so we can make those third-party associations automatically. And the problem that we had was that data was all in a big giant topic that was JSON with 10 different message types, and we only cared about three of those message types. So we schematized that and we wrote a Scala service that actually fanned it out and produced it into individual topics. And so then we were able to consume just those three topics. We can make a processor per topic. It was testable, it was easy to run through.

0:12:06.7 Brandon Brown: The original kind of S3-based read from the big giant single topic would take weeks to process a day of messages because it had to filter out 99% of the data. We found that by splitting it and then schematizing it and then being able to shrink it down because it's Protobuf, we were able to actually process the data faster. We could react to it and we could actually basically make a near real-time data product with it.
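The fan-out Brandon describes can be sketched in a few lines. This is an illustrative sketch only: the real service was a Scala consumer/producer emitting schematized Protobuf to real Kafka topics, while here the message-type names are invented and plain dicts stand in for topics.

```python
# Route only the message types we care about from one mixed-type "topic"
# into per-type "topics" (hypothetical type names; dicts stand in for Kafka).
WANTED_TYPES = {"http_request", "detected_library", "page_scan"}

def fan_out(mixed_messages):
    """Drop unwanted types and group the rest into one list per type."""
    per_type_topics = {t: [] for t in WANTED_TYPES}
    for msg in mixed_messages:
        if msg.get("type") in WANTED_TYPES:  # skip the ~99% we don't need
            per_type_topics[msg["type"]].append(msg)
    return per_type_topics

mixed = [
    {"type": "http_request", "url": "https://cdn.example.com/lib.js"},
    {"type": "dns_lookup", "host": "example.com"},  # an ignored type
    {"type": "detected_library", "name": "jquery", "version": "1.12.4"},
]
topics = fan_out(mixed)
print(len(topics["http_request"]), len(topics["detected_library"]))  # 1 1
```

With each wanted type in its own topic, a consumer reads only what it needs, which is the gain Brandon describes over filtering the giant topic downstream.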

0:12:34.7 Joseph Morais: So you said you went from weeks of processing to almost real-time?

0:12:38.4 Brandon Brown: So it was like a weekend to process one day. And that was if the Scala service didn't die while it was running. So we were like, this is not gonna be something that works in production. So we quickly pivoted, made that schema fan out. And I had always worked in places where we used just JSON in topics, and I saw where breaking the contract of the JSON just screwed things up royally. So when I came into Security Scorecard, I was like, I'm not managing Kafka, because I've semi-done that at every other job and it was a nightmare. So not doing that.

0:13:18.5 Brandon Brown: We're gonna use schemas. And Protobuf was the only format, and still is the only format, that lets you use a dumb definition language that anyone can understand. You could speak data, you could generate clients, and whether we had a Schema Registry or not, we could do it. So originally we just serialized the bytes to the topic and we didn't have a Schema Registry. And we're like, we were on MSK at the time, and if we're stuck with using MSK, I don't have to worry about spinning up a Schema Registry; we can still use schema formats. And so that was where Protobuf came into place. And then being able to use Schema Registry obviously made life a lot easier and let us track things as we evolved it too.
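To make Brandon's "dumb definition language" point concrete, here is a hypothetical Protobuf contract (package, message, and field names are all invented for illustration, not SecurityScorecard's actual schemas). A definition like this reads plainly, generates clients in any supported language, and serializes to compact bytes with or without a Schema Registry in the picture:

```protobuf
syntax = "proto3";

// Hypothetical contract living in a single shared schema repo.
package contracts.scoring.v1;

// One scored company per message; usable from any generated client
// (Scala, Java, ...) whether or not a Schema Registry is in play.
message CompanyScore {
  string domain = 1;           // company identifier
  string grade = 2;            // "A" through "F"
  double weighted_score = 3;   // raw weighted value behind the grade
  int64 scored_at_ms = 4;      // epoch millis when the score was calculated
}
```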

0:14:02.0 Joseph Morais: You started on this, but I just want to come back to it because you said that you started very batch-heavy, right? So I know that obviously data streaming wasn't the start.

0:14:07.6 Brandon Brown: Yeah.

0:14:10.2 Joseph Morais: So what was the underlying data or technology challenge that led you to adopt data streaming? Was there a breaking point or something that said we absolutely have to now implement these connectors and we gotta start going down the Kafka route?

0:14:21.6 Brandon Brown: I would say it was this mindset of anytime you wanted to get a copy of data, you had to roll your own or go steal someone's time to do it. And when I came in, I was like, there's an easier way. You don't need to do this. And snapshots of databases are really great for point-in-time analysis if you want to see, because this is actually something we've had happen. Someone accidentally deletes items in their portfolio, someone else at the company deletes it. If you have a snapshot, the only thing you're able to know is these companies were in that portfolio yesterday and they're not today. If that happens multiple times a day, you have no idea of really knowing what do you really need to add back.

0:15:11.1 Brandon Brown: Whereas when you have a change stream, it's much easier to know, okay, here's when it changed. You could probably then tie back to who did the change. So it just unlocked a lot more visibility. And so that was honestly the way I sold it was, yeah, it's a little bit complicated when you want to save a copy of a piece of data. Now you can make a connector and it's documented. Here's a little piece of Ansible or Terraform you write. Oh, now I have that data. You don't have to go and say, "Hey, can you make a new snapshot job for me?" You just plug right in.
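Brandon's portfolio example is concrete enough to sketch. The minimal illustration below uses an invented event shape (Debezium's actual change-event envelope is richer) to show why a change stream answers "what was deleted, and when?" while a snapshot only shows the before/after difference:

```python
def deleted_companies(change_events):
    """Every delete in the stream, with when it happened."""
    return [(e["company"], e["ts"]) for e in change_events if e["op"] == "delete"]

# A day's worth of portfolio changes (hypothetical data). A nightly snapshot
# would only reveal "acme.com and globex.com are gone"; the stream keeps the
# order and timing, and could also carry who made each change.
events = [
    {"op": "insert", "company": "acme.com",   "ts": "2024-05-01T09:00"},
    {"op": "delete", "company": "acme.com",   "ts": "2024-05-01T11:30"},
    {"op": "delete", "company": "globex.com", "ts": "2024-05-01T14:05"},
]
print(deleted_companies(events))
# [('acme.com', '2024-05-01T11:30'), ('globex.com', '2024-05-01T14:05')]
```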

0:15:48.6 Joseph Morais: Yeah. You know, it's interesting that you called that out because sometimes you need the table, you need the materialization, sometimes you need the stream. But I think the really attractive part, especially if the use case matches, is the stream. You can always go stream to the view or the table, you can't always go in the other direction. Right. So I think that's why making the case for the stream is so important, especially in a world that is so transactional.

0:16:06.0 Brandon Brown: Yeah.

0:16:11.5 Joseph Morais: I feel like almost everything we do has some type of start or end time, everything's a transaction. So, why not treat our data in the way that the world has been formulated? So I'm curious, and again, this is to both of you, how did you think about addressing modernization? So I know the ultimate goal was to have a single source of truth to generate data products. And I know some of the pieces where you started batch, you kind of went through this journey of how can I make things more available. But what led you? I think there was some experience with Kafka. Was there any other consideration of maybe other technologies or was there any pushback that said maybe this isn't the way we should go that ultimately got you to say, after all these conversations, we're implementing data streaming?

0:17:01.1 Brandon Brown: It was really about selling it in bite-sized chunks. So I'm a big believer that if you tell someone the right thing to do, they might know it's right, but they won't believe you. But if you just do it and you show them the value, then suddenly they get bought in. So if you do that enough, then when you say it for the hundredth time, they're like, "Oh, yeah, I see these things were successful because streaming was used." So in my case, it was showing how we had a service that would make these connections real-time and would store them in a Postgres database.

0:17:40.7 Brandon Brown: To actually make it useful, we have a concept in Scorecard of a portfolio. So it's a collection of companies that you monitor. In V1 of that, someone would say, "Hey, here's the portfolio. I want to see their vendor detection results for." We would have to go call an API, get back the list of domains in that portfolio. Then we would have to go through on our end, generate the DVD for each one of those. So one of the things I said was, "Well, hey, if we had a copy of portfolios, guess what? I can now test my queries on my own. I don't need to worry about another live API being up. And I can kind of gain this speed of development because I own a copy of all the data I need."

0:18:26.0 Brandon Brown: And so I was able to show that a few times and people are like, "Oh, yeah, they were able to actually fix bugs on something quicker because they had the data they needed." And so when Manan's team came on, they kind of wholesale bought in on the like, "This is the way to do it."

0:18:42.2 Joseph Morais: Yeah.

0:18:42.9 Brandon Brown: And so there was no kind of pushback, really. The only pushback was that Spark is heavily used at our company. And so there was the question of, "Okay, well, if we want to join data, do we want to use a data warehousing solution? Do we want to stay with Spark? Do we want to do Spark Streaming?" And I was kind of like, "Well, let's go with Flink, because Flink is Java and you can kind of widen the talent pool there. It's more invested in stream-first. It's the same set of APIs." And so then I basically said that to Manan on his first day: "Hey, I know this is what I want and how I think we should do it, and I've got your back. I'll back up whatever you say. You go do it." And so then he kind of ran and got to build that up. And like I said, he was good.

0:19:28.5 Manan Monga: Yeah. To add to that, just like Brandon said, I ended up building out some of our initial Flink pipelines, and some of the considerations we had were things like the CVE pipeline. A lot of the data arrives differently: the CVE entries come from one source, CVE scores come from another source, EPSS scores update daily. If you're doing batches, you're constantly rejoining all these datasets. With the Flink pipeline, we use a RocksDB state backend. And what I'm able to do is, for any new piece of data that comes in that matches a key, it's an incremental merge: update the field, and it's the most complete record that we have in one place.
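The incremental merge Manan describes can be illustrated without Flink. In the sketch below, a plain dict stands in for Flink's keyed (RocksDB) state, and each arriving fragment (field names invented) merges into the one record per CVE, so the stored record is always the latest, most complete view:

```python
state = {}  # stands in for Flink keyed state: cve_id -> merged record

def on_event(cve_id, fragment):
    """Merge an arriving fragment into the record for this CVE key."""
    record = state.setdefault(cve_id, {})
    record.update(fragment)  # newer values for a field overwrite older ones
    return dict(record)

# Fragments arrive at different paces from different source topics.
on_event("CVE-2024-0001", {"cvss": 9.8})
on_event("CVE-2024-0001", {"epss": 0.17})
latest = on_event("CVE-2024-0001", {"epss": 0.92, "remediation": "upgrade to 2.4.1"})
print(latest)  # {'cvss': 9.8, 'epss': 0.92, 'remediation': 'upgrade to 2.4.1'}
```

The design choice this mirrors: because the merge is per key and incremental, no batch rejoin of all twelve topics is ever needed; each event only touches its own CVE's record.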

0:20:03.6 Joseph Morais: All right. So there's two big things there I want to talk about. The first is those pain points. So it sounds like you were at a point where repeatedly the friction that the organization had came around the timeliness or the accessibility to data. So once you kind of said, "Well, look at all these things we could do if we had the record and we could easily access it," then suddenly data streaming became a no-brainer. But then there's this other thing, this... Where do we process?

0:20:29.2 Joseph Morais: Whether it's through a consumer-producer app, Spark, Flink, KStreams, there's a lot of options for where you can process data. So for the audience, there's something we talk about here at Confluent quite a bit called shift left. So if you think of your operational data, the data that lives in Kafka over here, that is feeding an analytical estate over here, our idea is that some of the processing that you're doing at the end should be shifted left, back to your operational estate. And you guys kind of went through that same question, like, where do we do the processing? And I think the easy guideline is: if I can process upstream in my operational estate, in my data streaming platform, and I can reuse that data operationally, if things that are inherently transacting on that operational estate can use that data product, then it's pragmatic to do the processing there.

0:21:18.0 Joseph Morais: And not to say that it's kind of a Flink or a Spark type of debate. I think it makes sense to me, at least in my mind, you do some processing up here, you do other processing downstream, and everyone kind of has their own place.

0:21:29.6 Brandon Brown: Yeah.

0:21:29.9 Joseph Morais: Did I summarize that well?

0:21:32.0 Brandon Brown: I think you did. So one of the things that I experienced a lot was our team is heavily reliant on other teams' data. And so one of the things that we had a couple years ago was there was that concept I said of a portfolio. There was another concept of a company following another company.

0:21:49.6 Joseph Morais: Yes.

0:21:50.1 Brandon Brown: And so we had data snapshotted in S3. We could look at what the queries were to recreate that from snapshots. And that's great, that works, but we're still isolated, right? The upstream platform team changes the query logic. Guess what? I don't have that anymore. So now all of my results are invalidated; they don't agree. And so I think with shift left, the big thing for me is: if you have teams that are going to effectively duplicate the logic for the data to be useful as a data product, shift it left and reduce that mental overhead, that mental burden. And as long as you have the contract, the downstream consumer really shouldn't care about the query that made the data. They just care about the shape of it. So make that easy.

0:22:36.9 Joseph Morais: So next, we're going to dive into how your partnership with Confluent solved your data challenges, or at least made them easier. But first, a quick word from our sponsor.

0:22:50.6 Speaker 4: Your data shouldn't be a problem to manage. It should be your superpower. The Confluent data streaming platform transforms organizations with trustworthy real-time data that seamlessly spans your entire environment and powers innovation across every use case. Create smarter, deploy faster, and maximize efficiency with a true data streaming platform from the pioneers in data streaming.

0:23:21.5 Joseph Morais: Now we'll go beyond the stream on why Confluent is the right fit. Again, to both of you gentlemen, with our partnership, how did your teams tackle modernizing the architecture and the data to create the source of truth? So what I'm asking really is tell me what it looked like when you first implemented data streaming.

0:23:39.5 Brandon Brown: So when I came in four and a half years ago, we had an MSK cluster.

0:23:45.0 Joseph Morais: Okay, so just a little background. I was the AWS partner solutions engineer here at Confluent, so I know MSK very well. Go on.

0:23:53.7 Brandon Brown: So my first task was real simple: upgrade 2.5 to 2.7. And to do that was a Rube Goldberg machine of, well, I need to go get an SRE to actually do the update in the AWS console because I don't have permissions, then I need to put that into Terraform. Okay, so now, three days later, I've sufficiently upgraded to a new patch version of Kafka. Already, I was like, that was painful. So then the next thing was we were developing a couple different streaming-based products, and we tried to simulate some load tests, and what we found was we could not get the throughput that we wanted in MSK without having to really over-provision.

0:24:46.4 Brandon Brown: And even then, we weren't getting the throughput we wanted. Another employee that was here at the time in our threat intelligence group, they had their own on-premise Confluent Platform. And they were like, "Hey, let me hook you guys up with Confluent. You could talk to them." And you guys had just started doing Cloud and PayGo. So what we did was we spun up a PayGo account, we shifted the workload over to Confluent Cloud, and then we basically let it run for a month.

0:25:13.2 Brandon Brown: And I was able to look at the cost and then say, "Okay, here's the cost. Let's extrapolate because I know we're gonna go dedicated clusters for private networking. This is what roughly it's gonna cost us on Confluent for the year." And this is what it would cost us on MSK. And MSK, while cheaper, there was the human piece that wasn't accounted for.

0:25:35.3 Joseph Morais: Right.

0:25:35.6 Brandon Brown: And I was like, "Look, I didn't have to worry about configuring the cluster." I was like, "Hey, I need this cluster," and then I just unleashed the Kraken of data at it and it worked. I didn't have to figure out... The other big thing for me with MSK was I wanted to have, because it was ZooKeeper at the time, I wanted to have multiple brokers in the same availability zone.

0:25:59.2 Joseph Morais: Right.

0:25:59.7 Brandon Brown: And they were like, "Well, no. If you want more brokers, they have to be spread across availability zones." So I was like, "Wait a minute. So I need to make more availability zones that I don't need?"

0:26:11.2 Joseph Morais: Right.

0:26:11.6 Brandon Brown: And it was so funny. I remember talking to the AWS rep and they're like, "Well, why would you do that?" I'm like, "Because it's dev and QA. I don't care about resiliency."

0:26:21.4 Joseph Morais: I don't wanna pay for multi-AZ for that.

0:26:23.3 Brandon Brown: Yeah. I'm like, "Let them topple over. It's cool." Actually, it's more ideal if they topple over, because then I know how it works. So that was really how we adopted Confluent. And then the other piece was you guys had just started doing a Terraform provider, and we're very heavily Terraform-focused. We actually beta tested the Terraform provider and used it, and we're still using it today. And it actually let us... We had a threat intel account and we had a data platform account. We used your importer to import all of the resources from the threat intel org, moved the Terraform state around, and then literally I just did an apply, spun things up in the org that I wanted, and we killed off the double org. So for actually three years, we were double-paying Confluent, which was great for you guys.

0:27:06.9 Joseph Morais: Yeah. Thanks.

0:27:07.1 Brandon Brown: But it was not great for me. What we were able to do was really consolidate things and then also use the billing API to track and say, "Okay, here's who is using what resources." So we can actually generate billing reports where I can say Manan's team, this much of our Confluent spend is his team, and this much is my team and this other team. So now when we're estimating out developing new data products, we can actually say, "This is how much it'll cost infrastructure-wise between AWS and Confluent." We can actually budget more accurately because we get that visibility.
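The cost attribution Brandon describes boils down to rolling billing line items up by owning team. A hypothetical sketch (the line-item shape and team names here are invented; a real billing API returns much richer records):

```python
def spend_by_team(line_items):
    """Roll usage costs up by the team that owns each resource."""
    totals = {}
    for item in line_items:
        totals[item["team"]] = totals.get(item["team"], 0.0) + item["cost_usd"]
    return totals

items = [
    {"team": "scoring",     "resource": "dedicated-cluster", "cost_usd": 120.0},
    {"team": "platform-v2", "resource": "flink-pipelines",   "cost_usd": 300.0},
    {"team": "scoring",     "resource": "connect",           "cost_usd": 80.0},
]
print(spend_by_team(items))  # {'scoring': 200.0, 'platform-v2': 300.0}
```

With per-team totals like these, estimating the infrastructure cost of a new data product becomes a lookup plus an extrapolation rather than guesswork.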

0:27:43.9 Joseph Morais: That was an epic journey, to be honest with you. So again, starting with MSK, and I don't like to bully our AWS friends, but you identified something very important. There's a lot of pain points with MSK. I used to do a presentation that would show... I would reference a blog post from AWS about MSK, and it showed all the different open source tools you need to actually change partition size and rebalance. Or it had this really impressive calculator, a spreadsheet that you needed to size your MSK. And the thing is, it's a basic-ish product, and that's why they can sell it for a certain cost.

0:28:24.7 Joseph Morais: But I think about this all the time: you're gonna pay for it. If it has fewer features, if it's less automated, you're gonna pay for that in human hours and in calls. And I'm glad that you were able to see the value of paying a little bit more upfront versus what that ultimately costs from a total cost of ownership standpoint. And even then, we can actually compete just on price with things like enterprise clusters. And you guys were in well before any of that.

0:28:50.0 Brandon Brown: So did you ever watch the X-Files?

0:28:52.7 Joseph Morais: Yes. Love the X-Files.

0:28:53.9 Brandon Brown: Okay. So you know how at the end of every season of the X-Files, they're gonna close down the X-Files? My time at Scorecard with Confluent has been like the X-Files. Every year renewal comes around and it's, "Hey, can we ditch Confluent?" And it's funny because for multiple years, I've done the shootout with MSK, and last year we did the shootout with MSK and Redpanda. What we found each time was that the human cost savings of being able to focus on, to Manan's point, just the data, not worrying about how the cluster actually works, not having to worry about the deployment of Schema Registry and Connect and all that, just having those things work, pays dividends. And so it's been really easy actually to make the case for why we should keep it each time.

0:29:40.3 Joseph Morais: That makes me extremely happy. Again, I come from an ops background, so not getting woken up, not having to do things that I can give to a vendor who may do a better job than me, I'm all for it. But for any detractors out there who may be watching this episode and go, "These guys really drink the Confluent Kool-Aid," you just heard them. They bake us off every year. They've been trying to get rid of us for four years, and they still haven't. So there's some testament right there.

0:30:03.8 Brandon Brown: So here was the fun one with Redpanda. When we looked at it, we were like, "Oh, they're gonna save us so much money." Then you got to support, and suddenly the dollars started ticking up. But the big argument I've been able to make in each of these cases is that we have a well-defined process for service accounts, for creating connectors, for creating topics.

0:30:30.0 Brandon Brown: And so when Manan started eight months ago, I was able to point him to a repo and say, "Hey, you want to edit this Debezium connector? This is where you do it." And he did it that afternoon. There was no training; it just worked the first time. And so when you make the argument of what happens if we have to switch to another vendor: Redpanda didn't have a Terraform provider, or rather it did, but it didn't manage all of the things. With MSK, again, we'd have to write all this glue code that we didn't already have. And so you start making those kinds of justifications.

0:31:12.9 Manan Monga: Yeah. And nine months ago when I joined, I was coming from a company that had self-managed Kafka.

0:31:17.6 Joseph Morais: Right.

0:31:18.2 Manan Monga: And on my second or third day, Brandon showed me what we had here, and then I was able to go to Confluent Cloud, look at all the Protobufs, look at all the Avro schemas. I think I deployed my first Flink pipeline in a week and a half of being here without having to spend a lot of time talking to other teams on what data they're sending because it was already in the Schema Registry. I knew what to do with it.
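
What Manan describes, browsing existing schemas instead of interviewing teams, works because every subject in Schema Registry is one REST call away. A minimal Python sketch: the registry URL, subject name, and Protobuf body below are made-up examples, though `/subjects/{subject}/versions/latest` is the registry's documented route for the newest schema version.

```python
import json

def latest_schema_url(base: str, subject: str) -> str:
    """URL for the newest registered schema version of a subject
    (Confluent Schema Registry REST API)."""
    return f"{base}/subjects/{subject}/versions/latest"

# A sample response body, shaped like what the registry returns.
# The subject name and Protobuf schema are invented for illustration.
sample = json.dumps({
    "subject": "cve-enriched-value",
    "version": 3,
    "schemaType": "PROTOBUF",
    "schema": 'syntax = "proto3"; message Cve { string id = 1; }',
})

doc = json.loads(sample)
print(latest_schema_url("https://psrc-example.confluent.cloud", "cve-enriched-value"))
print(doc["schemaType"], "v" + str(doc["version"]))
```

A new engineer can answer "what does this topic's data look like?" from that one response, which is exactly the day-one experience Manan is describing.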

0:31:40.6 Joseph Morais: Yeah. I think the GUI is underappreciated sometimes, especially for somebody who's coming in new, to be able to look at a data portal or look at Schema Registry, see what already exists, and talk that language of data. I love the way you said that. Manan, I have a question for you. The one in front of me is, "What made SecurityScorecard choose to work with Confluent?" I think we already kind of sussed that out. But Manan, for you, being new to the organization, how has your experience been with utilizing Confluent tech, some of our features, or maybe some of our support? Have you had a chance to work with any of our SEs? I'm just curious how you're feeling, coming new into the organization, around Confluent, and hopefully we've made your journey a little bit easier.

0:32:22.9 Manan Monga: Yeah, for sure. I've actually used support to a good extent too because I was having issues with seeing consumer lag when I first started out with Flink pipelines, and they were able to explain the reason why I'm not seeing it on the UI and what I need to do in New Relic to be able to work around it. It's been a great experience. And as I said, I used to work at a place for four years where we had self-managed Kafka and we were doing huge data volumes. So rebalancing partitions, making sure brokers are not going down, all that stuff was a huge pain. And not having to deal with it, just being able to experiment, especially building the new platform, having to experiment with different retention levels, compaction strategies... We can just do it on Terraform and it's so much easier. It's great DevEx.

0:33:07.6 Joseph Morais: That's great. I mean, really, what do you want? You want a bootstrap server, you want data to go in, you want data to go out. Who cares about what's happening underneath? And it sounds like we're providing that for you. That makes me very happy. Now to both of you, tell me, what will the final impact be of your data modernization efforts? I know that this is still somewhat in progress. So what do things look like in a year or two?

0:33:26.1 Brandon Brown: Yeah. So I like to joke that my nickname is Mr. Data Governance, which is I just tell people what they can and can't do with data. I think the future for us is true data governance and discovery. So for example, we are investing in DataHub as our catalog tool. And we're investing in that as our catalog tool because we have a combination of databases, batch jobs, streaming jobs. It's a good tool to bring all those in. But then Flink for us is about being able to do modern kind of data processing that is self-contained. For example, we can write a pipeline. It has integration tests that spin up all the necessary containerized infrastructure to test the pipeline end-to-end.

0:34:11.6 Brandon Brown: And so if there's an issue in the data, you can reason about it in the code. It's easy to reproduce. Really, the data modernization is about easy to reproduce data and debugging. And it's not necessarily like us being truly real-time all the time. It's about us being able to choose what the right tool for the right job is and data being a first-class citizen in every tool that we choose.

0:34:38.1 Joseph Morais: I think that's a great approach. Now, Brandon, I think this is another question for you just because it has more of a historical leaning. How did you... You had another team that had Confluent Platform and you got to see that, and you're like, "Hey, this is much better than what we're running today with MSK." Was there any friction getting leadership or your team to migrate off of MSK and onto Confluent Cloud, or was that kind of a slam dunk? And if there was any friction, was there any advice you could give to people that are in a similar situation?

0:35:09.1 Brandon Brown: So probably the biggest friction we had was, "Why are we going to pay this other vendor to do a thing that AWS offers?"

0:35:17.5 Joseph Morais: Yeah. That's usually the biggest challenge.

0:35:20.4 Brandon Brown: I was in the lucky situation that our team of four people were the only ones really using Kafka. So the friction of changing was really not hard. It was more figuring out how to connect all the pieces together and what we wanted that flow to be like. That was the harder part. And then justifying to the organization why we should pay another vendor for something, and on top of that, getting our internal resources to spin everything up. We're a security company, so we have very tight networking restrictions. So a lot of our friction was in spinning up our clusters and working within the networking so things could talk. Not Confluent's problem. So that was really all we hit there, I would say.

0:36:08.9 Joseph Morais: Yeah, I appreciate that. It usually is. It's like, "Hey, we're already paying for this. Is this other thing that much better?" And I think with the proper conversation and the impacts and a little bit of time modeling, more often than not, it does end up being the value.

0:36:24.4 Brandon Brown: I would say one of the other really big wins, and Manan kind of spoke to this, is the support. So, AWS is very good at support. You got a problem with EC2 or S3...

0:36:35.1 Joseph Morais: They're amazing.

0:36:35.3 Brandon Brown: They get someone like right away. It's great. We had problems with MSK, and it took us a long time to get the support person to help. And I know it's better now, but things like that really kind of make the difference. And so for example, all the issues with monitoring consumer lag, the fact that Manan was able to raise that and engage in a conversation super quickly and get the guidance, that meant that his product development wasn't delayed. He lost a couple hours. That's okay. Everyone loses a couple hours. It wasn't days. So I think that was the other really big win was the support side.

0:37:13.2 Joseph Morais: That's excellent. Like you said, AWS does a great job of supporting their core services, but when you have 200-plus services, it's impossible for everyone to be an expert across everything. And their support is usually organized by portfolio. Analytics is a very big portfolio; it's very hard to know all of those products. When you're a company like ours with far fewer technologies, it's much easier to get that high-quality advice when you need it.

0:37:36.3 Joseph Morais: And hopefully you never do, but if you really hit your head on something, any of these companies you mentioned, AWS, Redpanda, they'll all offer you support, and they'll all have different costs for that support. But at the end of the day, when I need support, how fast am I going to get it and how good is it going to be is a gigantic game changer. And those are the kinds of things you don't want to sacrifice when you have an outage. So this is for both of you. Is there any advice or lessons you could share for leaders like yourselves who are just starting to tackle data streaming now?

0:38:05.9 Manan Monga: I would say compaction strategies and retention strategies are something you really need to make sure you get right. I've been in positions where we said, "This is the data we want, this is when we want to process it," and then we lost it because we had set a time-based retention instead of key-based retention. So make sure you plan out what you want to do with that data, how you want to do it, and when you want to do it. Again, data is a first-class citizen. You have to make sure you plan that out when you get into streaming. The second thing is, checkpointing and state management with RocksDB and Flink has been a game changer for us. I've been doing checkpoints in S3. It's easy to go back to where we want to go; we don't have to maintain offsets. That's also been really helpful.
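
Manan's retention warning can be made concrete: time-based retention can silently drop the current value for a key, while key-based (compacted) retention keeps it. A small Python sketch with invented CVE-style messages; the message shape is an assumption for illustration.

```python
def time_retention(messages, now, retention_ms):
    """Time-based retention: drop anything older than the window,
    regardless of whether a newer value for that key exists."""
    return [m for m in messages if now - m["ts"] <= retention_ms]

def compacted(messages):
    """Log compaction: keep the latest value per key, however old."""
    latest = {}
    for m in messages:  # messages in offset order
        latest[m["key"]] = m
    return list(latest.values())

msgs = [
    {"key": "cve-2021-44228", "value": "critical", "ts": 1_000},
    {"key": "cve-2023-1234", "value": "low", "ts": 9_000},
]

# With a small time window, the old-but-still-current record is gone.
print(time_retention(msgs, now=10_000, retention_ms=5_000))
# Compaction keeps the latest state for every key.
print(compacted(msgs))
```

The first call loses the severity of the older CVE even though nothing ever superseded it; that is exactly the "we lost data we still wanted" failure mode Manan describes.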

0:38:47.5 Joseph Morais: I'll be sure to pass that over to the team. They'll appreciate it.

0:38:51.4 Brandon Brown: I would say for me, friends don't let friends ignore schemas. Schemas all the way down. I would also say to Manan's point, message keys are super important. We actually have a team here who is not doing message keys and it's very painful because you get very imbalanced partitions. So that key strategy is important to think about even if you pick the wrong key at first. Having a key is really key.
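
Brandon's point about imbalanced partitions comes down to how keyed messages are assigned. A simplified Python sketch: Kafka's default partitioner actually uses a murmur2 hash, so the hash below is just a stand-in, and the domain-style keys are made up.

```python
from collections import Counter

def partition_for(key: str, num_partitions: int) -> int:
    """Simplified stand-in for Kafka's partitioner: a stable hash of
    the key modulo the partition count. (Kafka really uses murmur2.)"""
    h = 0
    for ch in key.encode():
        h = (h * 31 + ch) % 2**32
    return h % num_partitions

# With well-distributed keys, load spreads across the partitions.
keys = [f"domain-{i}.example.com" for i in range(1000)]
counts = Counter(partition_for(k, 6) for k in keys)
print(sorted(counts.values()))

# The same key always lands on the same partition, which is what
# preserves per-key ordering.
assert partition_for("scorecard.io", 6) == partition_for("scorecard.io", 6)
```

With no key at all, or a low-cardinality key, records cluster onto a few partitions, which is the imbalance Brandon's team saw; picking even an imperfect key restores the spread.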

0:39:21.7 Joseph Morais: It sure is.

0:39:22.9 Brandon Brown: It sure is. Just came out and I was like, "Damn it."

0:39:27.4 Joseph Morais: I like it. English is a pain in the butt. Key can mean so many things. You're doing great.

0:39:31.4 Brandon Brown: The other thing I would say is that a lot of times when people think about streaming, they think real-time, instantaneous. Something that I really champion, in addition to data as a first-class citizen, is: what are the time frames you want to deal in? For example, if you say that your pipeline needs to finish processing data in 15 minutes, your fastest customer might see that data at, we'll say, 16 minutes. So if there's a bug, you probably have about half that time, seven and a half minutes, in which you can fix it before your fastest customer is gonna notice a problem. I don't know about you, but I can't fix a bug in seven and a half minutes.

0:40:11.6 Joseph Morais: No way.

0:40:12.0 Brandon Brown: Even with AI, you can't. So really think about what your timeliness is, because if you design your pipelines around timeliness guarantees, you are baking in your support and triage, and you're making your life a lot easier down the line. And that's actually something a lot of teams ignore. They're just like, "Hey, just shove it through the pipe super fast and worry about it later." But if you state those timeliness guarantees, you can figure out what monitoring tools, New Relic, Datadog, what have you, to put in place to catch problems. You can stress test against them. It gives you a whole set of acceptance criteria that really makes your pipelines more resilient in the long run.
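
Brandon's timeliness arithmetic can be written down directly. A toy Python sketch using his 15-minute example and the assumption that roughly half the window passes before a bug is even noticed:

```python
def fix_window_minutes(pipeline_sla_min: float, detection_fraction: float = 0.5) -> float:
    """If data must land within `pipeline_sla_min` minutes, and about
    `detection_fraction` of that window passes before a bug is noticed,
    the remaining time to ship a fix before the fastest customer sees
    bad data is what's left of the window."""
    return pipeline_sla_min * (1 - detection_fraction)

# Brandon's example: a 15-minute pipeline leaves ~7.5 minutes to fix a bug.
print(fix_window_minutes(15))  # 7.5

# An explicit SLA also yields an alerting threshold: page when
# end-to-end latency exceeds, say, 80% of the budget.
alert_threshold = 15 * 0.8
print(alert_threshold)  # 12.0
```

The useful part is the second number: deriving the alert threshold from the SLA is what turns a timeliness guarantee into the monitoring and acceptance criteria Brandon describes.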

0:40:56.4 Joseph Morais: Well, you heard it from Brandon. Make sure you understand your time horizons before you set your SLAs. Otherwise, that will bite you in the butt.

0:41:03.8 Brandon Brown: Exactly.

0:41:05.1 Joseph Morais: So, again, for both of you, we're right at the last question of this segment. What is your vision for data streaming at SecurityScorecard? This is pie-in-the-sky type stuff. What do you think the future possibilities could hold for your organization?

0:41:20.0 Brandon Brown: I open-sourced a project to support Protobuf with the Flink Table API, because that's only available in Confluent's Flink, not in open source. So I founded a project and now we depend on it, which is exciting from a longevity standpoint; someone will always be like, "Who is that B-Brown-Sound?" But I really think that all of our pipelines in the future will be SQL statements, and they will be fully testable on a developer's machine. People will not have to hunt for me or Manan to understand data. The data will be all self-encompassed in the pipeline, so discoverability will be a first-class citizen as well.

0:42:02.2 Joseph Morais: Right. Liberate the data, empower the users. I love it.

0:42:05.1 Brandon Brown: Exactly.

0:42:05.8 Joseph Morais: How about you, Manan?

0:42:06.8 Manan Monga: Yeah. Brandon said his nickname in our company is Mr. Data Governance. Mine is Flink Guy. So I think we're gonna move towards a lot more real-time pipelines. Make sure all of our data, as we scrape it, as we enrich it, gets processed and normalized and put into our databases in real time using Flink.

0:42:27.8 Joseph Morais: I love it. Use schemas, stream process when you can. What a way to wrap up this segment. Before we let you go, we're gonna do a lightning round. Byte-sized questions, byte-sized answers. That's right, that's B-Y-T-E. Think of them as hot takes but schema-backed and serialized. Are you both ready?

0:42:52.0 Brandon Brown: Yeah.

0:42:53.2 Joseph Morais: What is something you hate about IT?

0:42:55.0 Brandon Brown: You always have to turn it on and off again.

0:42:58.7 Joseph Morais: How about you, Manan?

0:43:00.9 Manan Monga: Permissions.

0:43:02.5 Joseph Morais: Yes. They're always getting in the way of things. That's a great answer to come from SecurityScorecard. I appreciate that. What is your hot take on the future of AI?

0:43:12.0 Manan Monga: I think my hot take is we're gonna start moving towards a lot more... Developers have always had this stigma around them that we're not very good at talking to people. Now we're gonna be good at it because we're talking to AI all the time. You get a lot of practice.

0:43:24.9 Joseph Morais: All the introversion goes away from talking to Claude all day. I like that.

0:43:29.9 Brandon Brown: I would say that writing requirements will no longer be a pain point for developers because you have to write requirements to actually get good usage of an LLM.

0:43:39.7 Joseph Morais: I like it. Yeah, that's a good one. I like when the new tech enforces old processes that people should have been doing anyway. What is a non-tech activity or hobby that's impacted the way either of you think about data?

0:43:53.2 Brandon Brown: There's a poster behind me, because for my 40th birthday I did Phish at Madison Square Garden. So being a Deadhead has totally impacted me as a data engineer.

0:44:03.0 Manan Monga: For me, it's probably... I love cars. They come out with a new one every year, the themes changing, so it's like streaming. You gotta keep updating your brain with how much horsepower each one has, what engine they have, how fast they can get to 60.

0:44:15.6 Joseph Morais: Yeah. How fast your consumers and producers are. You know, we like to think of Kafka as a data highway, so you're right there. Where are either of you getting outside inspiration from? Is it from a book, a thought leader, maybe a podcast?

0:44:26.5 Brandon Brown: A podcast, "What Went Wrong," is a huge one for me, because it talks about movies. But also Michael Drogalis's blog posts and Gunnar Morling's, both of them.

0:44:35.3 Joseph Morais: Oh, I'll share this episode with both of them. I know MD very well. Now for you both, any final thoughts or anything to plug?

0:44:42.6 Manan Monga: The platform we're building at SecurityScorecard, it's called Titan.

0:44:45.9 Joseph Morais: Check out SecurityScorecard. Love that.

0:44:48.6 Brandon Brown: I'm gonna use my same one that I always use, which is use Confluent.

0:44:53.4 Joseph Morais: Oh, I did not pay him to say that, by the way. I did not Venmo him $30 two seconds ago. I swear I didn't do that. But I appreciate it, and I'm gonna take that as earnestly as I possibly can. Brandon and Manan, thank you so much for joining me today. And for you watching or listening at home, stick around because after this, I'm giving you my top three takeaways in two minutes. That was an incredible conversation with Brandon and Manan. Now let's start talking about those takeaways. So the first one I have to call out from Brandon and Manan is how they ultimately use Kafka as a single source of truth.

0:45:32.5 Joseph Morais: They started with source connectors and then eventually started using sink connectors, and then realized, "Hey, this is where we're gonna go for everything. Everything's gonna go through Kafka." And it unlocked something that was really impressive to me. They said that previously, before data streaming, it took them a weekend's worth of processing to generate one of these vendor reports, but they got it down to near real-time. A weekend is, what, a minimum of 48 to 72 hours? To go from 48 to 72 hours for the same thing down to "I have it all the time" is just an incredible outcome.

0:46:02.1 Joseph Morais: I mean, if there's no other reason to implement data streaming, that would be one. Now, as Brandon was talking about how to get buy-in, whether it's from the team or from leaders, into something new like adopting data streaming or, in this case, Confluent Cloud, he said sell it in bite-sized chunks. Give them just a little bit. Don't make it complicated. Give them quick wins. But you can't just tell them about those bite-sized chunks. You have to show them, because if you tell them, they may not believe it. Setting realistic, very small use cases or improvements and then showing that value, that is definitely a way to win people to your side in any type of technological debate or architectural change.

0:46:40.1 Joseph Morais: And in my final takeaway, I love this, and they said it more than once: Protobuf and SQL statements allow you to speak data. Protobuf schemas and SQL statements are just simple enough that anyone can take a look at them and really understand what your data is, what is the shape of it.

0:46:55.0 Joseph Morais: And it allows you to treat data as a first-class citizen for discussions. I think that's really important because, at the end of the day, a data streaming platform starts with data. So being able to transact, understand, and enforce data is really important. And I can't not call out that really punchy line, "Friends do not let friends not use schemas." Thanks again to Brandon and Manan for joining us, and thanks to you for tuning in. As always, we're brought to you by Confluent.

0:47:23.0 Joseph Morais: The Confluent data streaming platform is the data advantage every organization needs to innovate today and win tomorrow. Your unified platform to stream, connect, process, and govern your data starts at confluent.io. If you'd like to connect, find me on LinkedIn. Tell a friend or coworker about us and subscribe to the show so you never miss an episode. And just before we go, if you want to learn about some incredible data streaming use cases, I cannot recommend the second edition of the Ultimate Data Streaming Guide enough. My buddy and coworker, Kai Waehner, went through and categorized some of the best and most interesting use cases in the data streaming world. Check it out. We'll also put a link to it in the show notes. As always, happy streaming, and we'll see you next time.