Listen: Next in Tech | Episode 59: Data Security Evolution

While we’ve talked about the uses of data, there are complex issues around securing it, especially with more remote work. Security analyst Justin Lam joins host Eric Hanselman to look into changing approaches that can address the fluidity of data in modern applications. The scale of data volume growth is straining traditional approaches. Organizations are wrestling with declining storage costs that tempt them to retain everything, but a better path might be to learn how to let data go.


Transcript provided by Kensho.

Eric Hanselman

Welcome to Next In Tech, an S&P Global Market Intelligence podcast where the world of emerging tech lives. I'm your host, Eric Hanselman, Principal Research Analyst for the 451 Research arm of S&P Global Market Intelligence. And today, we'll be discussing the evolution of data security with analyst Justin Lam. Justin, welcome to the podcast.

Justin Lam

Hey, thanks for having me on.

 Eric Hanselman

Great to have you on, and especially on a topic that I think we've touched on in a number of different ways and different areas, but have never actually had a specific conversation about: data security. We've talked about some of the privacy and governance pieces with Paige Bartley. We've talked about sort of infrastructure security and data uses, but haven't really gotten down into the specifics of data security. So first off, for the audience, can you maybe scope what that is? And what really is that broader focus on data security?

 Justin Lam

Yes, sure. And I think the idea with data security is that it's really about security of the underlying data. It's understanding, beyond the perimeter controls that I have, what the actual protections are that I apply to the data itself in terms of preserving its confidentiality, integrity and availability. Think of it as a nuanced discipline that you're applying to that data.

 Eric Hanselman

Well, so when we think about that discipline, where is -- you talked about the loss of perimeter. And I guess, actually, for some of our audience members, there's the idea historically that we've dealt with data inside of what is, sort of metaphorically, a walled environment.

 You had the physical office. From a security perspective, you had the network perimeter that sort of defined the inside and the outside. But that's something that, as we've gone towards remote work -- although I'll make the case that we've probably been dealing with issues of what gets referred to as de-perimeterization for a really long time -- what does that mean for data security? And where is it headed?

 Justin Lam

Yes, it's a great question. And I think the idea here is that data, first of all, because of that de-perimeterization, is just becoming far more dynamic. And not just because of the work from home, but simply from the amount of it that different parts of the organization are actually leveraging.

 It conventionally has been the case in many, many industries or many verticals and sectors that there was a centralized IT department that was going to run databases and do basically a certain set of tasks for, say, processing transactions or processing inventory.

 But now what you're seeing, of course, is in enterprises that so many more different consumers of data are out there, like there is data being enriched along each step of a process in a conventional organization. So that shift of just everyone using more data, everyone generating more data, everyone consuming more data as well as this move to the cloud in order to be able to share and collaborate with that data, in order to be able to have environments that scale to that data, you're definitely seeing that trend as well in addition to the work from home.

 Eric Hanselman

Well, and that's a piece that I think is an important point, which is that we used to live in environments from a technology perspective where data was all really concentrated. You had the great, big, huge, massive centralized databases, and that really was where everything lived.

 Now data lives in a lot of different places. And as you're saying, it's generated in a lot of different places and gets used in a lot of different places. And we've had to really shift the focus of how we're managing that data because of where we're using it and, of course, how we secure it.

 So there's an awful lot in terms of thinking about how that actually gets leveraged and used. And in a lot of cases, that's trying to shift some of that security focus out to the point of use -- doing things like ensuring that the folks who are building the applications and working with the data know enough to be able to secure it effectively.

 Justin Lam

Yes. Yes, absolutely, absolutely. And you're seeing the change and the progression in the way those tools are applied because of these shifts. A lot of the original systems that are out there, the centralized databases, were never really designed for securing the underlying data content, primarily for the reason that you just described: they were behind other perimeter controls, or they were segmented off somehow, some way.

 And so the first generation of data security solutions out there were almost sort of retroactive controls. They were applied after the fact, after the database had been defined -- or after customers, after enterprises, had understood what the risks were.

 So for example, in database design, if I were to organize a database, say, grouped by everybody's tax ID or Social Security number -- that would have been, 20 or 30 years ago, a unique value for everyone to work off of. But of course, nowadays, that's personally identifiable information, right? And of course, that's sort of a no-no.

 But now, for me to reorganize my database, that's really, really hard to do. So controls back in the day were really trying to fix a problem that the databases were never designed for. And so now, looking forward, you have to look at how an application is going to be developed and what the new sets of considerations are to take into account.
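The pattern Justin is pointing at -- not letting a sensitive identifier like a Social Security number double as a database key -- is commonly handled today with tokenization or surrogate keys. A minimal sketch of the idea in Python (the function names and the HMAC-based scheme here are illustrative assumptions, not anything described in the episode):

```python
import hashlib
import hmac
import secrets

def tokenize(identifier: str, key: bytes) -> str:
    """Derive a stable, non-reversible surrogate key from a sensitive
    identifier, so the raw value never has to serve as a database key."""
    return hmac.new(key, identifier.encode(), hashlib.sha256).hexdigest()

key = secrets.token_bytes(32)  # held in a secrets manager, not the database
token = tokenize("123-45-6789", key)

# Deterministic: same input and key always yield the same token,
# so joins and lookups still work...
assert token == tokenize("123-45-6789", key)
# ...but the raw identifier never appears in the token itself.
assert "123-45-6789" not in token
```

Migrating an existing database onto tokens like these is still the hard reorganization Justin describes; the point is that a new schema can avoid keying on PII from the start.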

 Eric Hanselman

Alright. And the fact that we're now needing to move that expectation of where those various concerns are going to be out to development -- we've got the idea of shifting left for security. And data is, of course, one of the primary examples of what shift left really means.

 Justin Lam

Yes, absolutely, absolutely. And so again, in the old world, you would have security controls that were applied separately by security teams and application owners, and those 2 parties never really collaborated a whole lot, and they never really had a tightly integrated solution.

 Now, with this idea of shift left, given the scale of the data growth and the scale of the application growth, it's really best to enable the devs to factor data security as much as they can into the applications themselves. So for every step in the life cycle -- where I'm ingesting that data, where I'm processing that data, where I'm analyzing that data -- security should be built in, in order to provide a comprehensive set of protections throughout that data life cycle.
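As a rough illustration of what "security built into every life-cycle step" can mean in code, here is a toy pipeline. The stage split (`ingest`, `process`, `analyze`) and the email-redaction rule are hypothetical stand-ins for whatever controls a real application would need:

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def ingest(record: dict) -> dict:
    """Ingest step: reject malformed records outright, rather than
    letting bad data flow downstream."""
    if "user" not in record or "note" not in record:
        raise ValueError("malformed record")
    return record

def process(record: dict) -> dict:
    """Process step: redact email addresses before the record reaches
    analytics, so downstream consumers never see the raw PII."""
    return {**record, "note": EMAIL.sub("[redacted]", record["note"])}

def analyze(records: list[dict]) -> int:
    """Analyze step: operates only on already-sanitized records."""
    return len(records)

clean = [process(ingest({"user": "u1", "note": "mail me at a@b.com"}))]
assert clean[0]["note"] == "mail me at [redacted]"
assert analyze(clean) == 1
```

Each stage enforces its own protection, so there is no single fence to pitch the data over at the end.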

 Eric Hanselman

Well, you've hit on what is the one key piece in this, which is the same problem we come across in so many different aspects of technology: scale. Where there wasn't a huge amount of data, or at least not the vast quantities of data that we have today, you could kind of pitch it over the fence to the IT team and say, "All right, here, wrap all this stuff with whatever security is necessary and off you go."

 But now, I mean, who knows what's important, who knows what's not, who knows what you should keep, who knows what you can trash, who knows what's sensitive, who knows what's not sensitive, and sort of pitching that over the fence to the security team just doesn't scale. And we've got to figure out how we actually manage that push by pushing some of that back out to those -- the actual users of the data themselves.

 Justin Lam

Absolutely, correct. The scale challenge for data security can be answered somewhat with different technologies that are out there -- cloud native, those sorts of things. But really, there is a human element to growing the scale, when we're asking our developers to factor data security into their solutions: understanding what data I'm collecting, understanding what data is being processed and how that's being processed.

 Data security champions have to understand those workflows, and they have to understand the developer experience, in order to simplify adoption and grow the scale of data security. If you simply say to the developers out there, "use this, it's this way or the highway," it becomes too onerous.

 As you can sort of guess from the history of security, people get around it, right? People will just find a way to go around it. And you see that in other symptoms -- in the form of shadow IT, in the form of people just not checking in with a centralized rule set that's far too onerous.

 So I think you really have to consider that developer experience, like [indiscernible] who embrace things. And if you're going to propose something, make it simple, make it easy, make it transparent to the way developers work as much as possible. That's one of the great ways to increase the scale and the adoption of data security in the platforms.

 Eric Hanselman

Really build it into the environment. Well, as we're saying, people are trying to get jobs done, and they're going to take the most efficient path to get that done. And whether or not you call it shadow IT -- I'm kind of down on that general term. "Self-initiated technology" is what I like to refer to it as, because, hey, people are trying to get their job done.

 And in the absence of being able to do this any other way that, like nature, it will find a way. People will find a way to figure that stuff out. And that's the thing I think we need to think about in terms of how we put both the capabilities in place as well as, as you're saying, make them easy to work with, make it easy to actually implement and integrate into what that workflow really looks like, how the application works. Build it in so that you can help to simplify that actual end user usability piece.

 Justin Lam

Right. And with all this data, I think it's also essential to realize that it's a process, an ongoing process. And like so many other initiatives out there, these fail sometimes because the scope isn't well defined. We say we have a scale problem. Yes, that's very, very true. But I think there's a lot of truth in managing the scope for individual phases.

 So the other thing that I've seen, and we've seen in our research, is just this idea of being able to break out protections in phased, measured ways, right? So that I [ can ] achieve progress, and I can achieve momentum, and that also can generate a certain amount of feedback to understand, okay, what went wrong in this phase, rather than trying to boil a particular ocean with a large project or a large initiative.

 By being able to sort of make it more manageable, you sort of get a design win within an organization. And then from there on out, it becomes a much more repetitive, repeatable motion.

 Eric Hanselman

I think that's good advice for any tech project, right? You don't want to try and land everything in one humongous effort; far better to do it in manageable steps, especially when those manageable steps are going to be touching something that gets used so integrally in the digitization process: the data.

 Justin Lam

For sure, for sure. And I mean, swinging the pendulum back a little bit: while it's important to consider the devs, I think there's also a lot along the lines of security, risk management, governance and all those other adjacent sorts of disciplines.

 One of the other things that we've seen in our research has just been this idea of moving away from sort of point-in-time compliance. If compliance is still the driver -- and I'm not saying on the program here that compliance equals security, but...

 Eric Hanselman

I don't know. To your point, I don't know how many times we've been through that.

 Justin Lam

Right.

 Eric Hanselman

As with other data, correlation is not causation; compliance is not security. You've got to do the work upfront. And hopefully, compliance is the product of good security.

 Justin Lam

Yes. Yes, correct. Though what I will say, Eric, is that compliance is changing, in that it is less about sort of point-in-time controls and much more about controls over time and what the operation is like.

 And certainly, with the fluid nature of data, it's not like, "Oh, I've applied this once, and now I just sort of set it and forget it." Hopefully, because the data security is now innately part of the code base, presumably it survives refactoring, it survives being able to exist in the tech stack, not necessarily as a piece of tech debt, but really as an asset that can be continually modified and improved given the different flows of data or given the different data types or given the data in different environments or what have you.

 But by being able to sort of operationalize the development of data security, it's much easier for the risk management person now to understand, okay, what do the controls look like when they're applied over time, right? Because now there's an operational history as opposed to just a standard standalone report on compliance or some other kind of document that's sort of frozen and quickly becomes obsolete.
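One concrete reading of the "operational history" Justin describes: record each control application as an append-only event that a risk manager can query later, instead of producing a frozen compliance report. A toy sketch (the event fields, function names and in-memory list are all assumptions):

```python
import json
import time

audit_log: list[str] = []   # stand-in for an append-only event store

def apply_control(dataset: str, control: str, ts: float) -> None:
    """Record every control application as an event, so compliance can
    be demonstrated as a history rather than a one-off document."""
    audit_log.append(json.dumps(
        {"dataset": dataset, "control": control, "applied_at": ts}))

def history(dataset: str) -> list[dict]:
    """Everything ever applied to a dataset, in order of application."""
    return [e for e in map(json.loads, audit_log) if e["dataset"] == dataset]

apply_control("orders", "column-level-encryption", time.time())
apply_control("orders", "row-access-policy", time.time())
assert [e["control"] for e in history("orders")] == [
    "column-level-encryption", "row-access-policy"]
```

Because the log accumulates over time, it survives refactoring of the application the way a point-in-time report cannot.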

 Eric Hanselman

That's a really interesting way to position it, because, yes, I think we've historically thought of data as a thing at a point in time, but the fact is that data has a life cycle. And I guess, is that something where we need to start moving towards data controls and protections that travel with the data, or that are more data-centric?

 Justin Lam

That's always a tricky thing to manage, simply because you get to the point where you're managing almost 2 repositories, or multiple repositories -- one is a repository about the repository, if you will. And so I wouldn't necessarily think of it in those terms; I would think of it in terms of taking the labels or attributes and making them part of the code itself, right?

 As much as possible, automate and have the program functionality -- whatever application you're defining -- essentially be the [indiscernible] definer of whatever data classification, whatever attribute classification, you want to apply to those particular classes of data. Because if I think about, say, the ISO 27k information security standards, that is a fantastic way to logically think of different levels: secret, top secret, confidential, restricted. That's fantastic.

 But as soon as I start maintaining a separate sort of index, it just becomes another thing that I have to update, and that can get really, really problematic -- especially when you start having multiple jurisdictions, multiple data sovereignty laws, multiple privacy laws, and now I'm going to have multiple indexes, each one trying to cater to each of them. Whereas if I could just simply look at which particular program code is backing the data, I can let that be my single source of truth. That's something that I would rather advocate for.
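Making classification labels part of the code itself, as Justin advocates, might look like the following Python sketch. The label names and the `CustomerRecord` fields are hypothetical, and `field(metadata=...)` is just one way to attach attributes at the point of definition:

```python
from dataclasses import dataclass, field, fields

# Hypothetical labels, loosely echoing the ISO 27k-style levels discussed.
PUBLIC, CONFIDENTIAL, RESTRICTED = "public", "confidential", "restricted"

@dataclass
class CustomerRecord:
    # The classification lives on the field definition itself, so the
    # code is the single source of truth -- no separate index to maintain.
    name: str = field(metadata={"classification": CONFIDENTIAL})
    tax_id: str = field(metadata={"classification": RESTRICTED})
    country: str = field(metadata={"classification": PUBLIC})

def fields_at_or_above(cls, level: str) -> list[str]:
    """Which fields need at least this level of protection?"""
    order = [PUBLIC, CONFIDENTIAL, RESTRICTED]
    return [f.name for f in fields(cls)
            if order.index(f.metadata["classification"]) >= order.index(level)]

assert fields_at_or_above(CustomerRecord, CONFIDENTIAL) == ["name", "tax_id"]
```

When a new jurisdiction's rules arrive, the mapping from these labels to obligations changes, but the labels themselves stay in one place: the schema definition.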

 Eric Hanselman

Well, that's a really important observation because to your point, we are in a very dynamic world right now in terms of data security requirements on a global scale. There is such difference between what's needed in various places. One more thing to ensure that we don't overcomplicate how we think about it.

 Justin Lam

Yes, for sure, for sure.

 Eric Hanselman

So in terms of areas that you're looking at for where this is headed, what are some future areas of research that you're looking at?

 Justin Lam

Yes. I mean, I think with this idea of developer experience, we're still in the infancy of that. As a vendor-driven industry, we're still making the transition from solutions out there that sat in line with the data, or that were trying to [ intercede within ] applications without necessarily integrating with them.

 And now we're going to the point where we're making the transition as an industry to really truly embrace the developer, to really truly embrace how they're adopting data security and to be able to measure that and to be able to understand what that is and its effectiveness. That's one trend that I think will continue.

 I think the other trend that will continue is simply the modernization of data security. A lot of data security out there is somewhat arcane, and it is still somewhat complicated to use. In my recent [indiscernible] report, I covered OpenSSL, the encryption crypto library, and it's enormously complex.

 There's all kinds of different places that complexity and vulnerabilities can be opened up that are just as bad as not using any encryption at all. And so again, simplifying the developer experience, but also simplifying the underlying solutions that the developers adopt, those are both things that I'll be looking at in future research.

 Eric Hanselman

And where will controls start to get implemented? Is this something where we just start thinking about shifting away from a lot of the preventative focus that we've typically had, towards controls that are actually going to look at data with an expectation that it's going to be used and leveraged, so that we've got the ability to respond to that use, to remediate exposure, things of that nature?

 Justin Lam

Yes, absolutely. I mean, this is a debate that we have: are we at peak prevention? And that's very interesting. Another interesting topic is, if I were to have a breach, or to have a loss of that data, how do I best respond to that?

 It's difficult without having an attribute associated with a document to expire it or to revoke an encryption key if it's been encrypted or whatnot. But I do think that organizations as they account for this data, they should be looking at the resilience of what happens if I were to lose this and what can I do to either take its place or to remediate or to mitigate its loss? I mean, I think there's a lot -- we're just kind of getting at the forefront here, understanding from a data inventory perspective that resilience in terms of remediation and response.
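The expire-by-revoking-a-key idea Justin mentions is often called crypto-shredding: encrypt each record under its own key, and "delete" the record by destroying the key. A toy sketch of the pattern -- the XOR "cipher" here is for illustration only; a real system would use an authenticated cipher (e.g. AES-GCM) and a proper key management service:

```python
import secrets

def xor(data: bytes, key: bytes) -> bytes:
    """Toy XOR 'cipher' -- NOT real cryptography, just a placeholder."""
    return bytes(b ^ k for b, k in zip(data, key))

key_store: dict[str, bytes] = {}   # per-record keys, held separately

def store(record_id: str, plaintext: bytes) -> bytes:
    """Encrypt a record under its own freshly generated key."""
    key = secrets.token_bytes(len(plaintext))
    key_store[record_id] = key
    return xor(plaintext, key)

def crypto_erase(record_id: str) -> None:
    """'Expire' the record by destroying its key; the ciphertext may
    linger in backups, but it is no longer recoverable."""
    del key_store[record_id]

ct = store("doc-1", b"sensitive")
assert xor(ct, key_store["doc-1"]) == b"sensitive"
crypto_erase("doc-1")
assert "doc-1" not in key_store
```

This is one way a document-level attribute can make data revocable even after it has spread across copies and backups.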

 Eric Hanselman

That gets back to your early point about the fluidity of data and the fact that we really need to be thinking about the overall life cycle of data. It's got to be that bigger picture piece. Getting away from that, the database and the basement kind of idea.

 Justin Lam

Yes, absolutely. We spend so much time on this, and there are so many organizations that don't delete anything. It is so easy just to say, well, storage is getting cheaper, it's becoming more commoditized or what have you -- let me go add more and more terabytes and petabytes of data out there. How hard can it be to create yet another Google Sheet? The answer is, it's just too easy, right? Am I right?

 Eric Hanselman

We have infinite capacity.

 Justin Lam

For better or worse, Gmail, when it opened up and gave us a gigabyte of data for a personal mail store -- 15 years later, that has instilled a lot of, I won't say bad habits, but just this expectation that we'll never delete anything, that it will always be available at our fingertips. And again, companies have to understand: am I resilient enough to be able to say, could you retire some of those things and actually take a liability off the books?

 Eric Hanselman

Well, and it's getting back to the issue of how you deal with scale. And the fact that now that we've got all this data, much of it likely being in cloud, you've got to figure out how to turn instances off. You've got to figure out when you get rid of data, and all of those things that help us to manage scale.

 Justin Lam

Right.

 Eric Hanselman

Fascinating stuff, Justin. Well, we do not have infinite scale over the podcast today, unfortunately. But hopefully, this is the first of a number of discussions about this because there's so much more that we could talk about in terms of really dealing with data and a lot of the data security aspects that are taking place, and confidentiality, and as I said, we touched on some of this with some conversations with Paige Bartley. But there's a lot more. So hopefully, we'll get you back to talk about this in the not-too-distant future.

 Justin Lam

Awesome. Looking forward to it.

 Eric Hanselman

Well, thank you very much. And that is it for this episode of Next In Tech. Thanks to our audience for staying with us. And I hope you'll join us for our next episode, where we're going to be talking about data again, but in a slightly different format. We're going to be talking about data consumption and the means to be able to integrate more data, so more data at scale. I hope you'll join us then because there is always something next in tech.
