Data Alone Is Not Enough: The Evolution of Data Architectures

a16z

Automatic Transcript

Hi, and welcome to the a16z Podcast. I'm Das. Data, data, data. It's long been a buzzword in the industry, whether big data, streaming data, data analytics, data science, even AI and machine learning. But data alone is not enough. It takes an entire system of tools and technologies to extract value from data. A multibillion-dollar industry has emerged around these data tools and technologies, and with so much excitement and innovation in the space, it raises the question: how exactly do all these tools fit together? This podcast, featuring Ali Ghodsi, the CEO and founder of Databricks, explores the evolution of data architectures, including some quick history, where they're going, and a surprising use case for streaming data, as well as Ali's take on how he'd architect the picks and shovels that handle data end to end today. Joining Ali in this hallway-style jam is a16z General Partner Martin Casado, who with other a16z enterprise partners just published a series of blueprints for the modern data stack. You can find that, as well as other pieces on building AI businesses, the empty promise of data moats, and more at a16z.com/ml-economics. In this conversation, we start with Ali answering the question: how did we arrive at the set of data tools we have today?
Starting in the eighties, business leaders were kind of flying blind, not knowing how the business was doing, waiting for finance to close the books. And the data warehousing paradigm came about. They said, look, we have all this data in these operational data systems. Why don't we just ETL all that data: take it out of all of these systems, transform it, and load it into some place, let's call it a data warehouse. Then we can get business intelligence on that data. And it was just a major transformation, because now you have dashboards. You could know how your product is selling by region, by SKU, by geography. And that itself has created at least a twenty-billion-dollar market that has been around for quite a few decades.
Now, about ten years ago, this technology started seeing some challenges. One, more and more data types like video and audio started coming out, and there's no way you can store any of that in data warehouses. Second, they were on-prem big boxes that you have to buy, and they coupled storage and compute, so it became really expensive to scale them up and down. And the third thing was that people wanted to do more and more machine learning and AI on these data sets. They saw that we can ask future-looking questions: which of my customers are going to churn, which of my products are going to sell, which campaigns should I be offering to whom? So then the data lake came about ten years ago, and the idea was: here's really cheap storage, dump all your data here, and you can get all those insights. And it turns out that when you just dump all your data in a central location, it's hard to make sense of the data that's sitting there. As a result, what people are doing now is taking subsets of that data and moving them into classic data warehouses in the cloud. So we ended up with an architectural mess that's inferior to what we had in the eighties. We have data in two places, the data lake and the data warehouse, and the staleness and recency are not great. In the last two, three years, there have been some really interesting technological breakthroughs that are now enabling a new kind of design pattern. We refer to it as the lakehouse, and the idea is: what if you could do BI directly on your data lake? And what if you could do your reporting directly on your data lake, and your data science and your machine learning straight up on the data lake?
I would love to tease apart a few things that have led us here. You know, there is very clearly a large existing data warehouse market behind this, and it's typified by people using SQL on structured data.
The ML/AI use case is a little bit different than the analytics use case, right? In the analytics use case, it's normally human beings that are looking at dashboards and making decisions, whereas in the ML/AI use case, you're creating these models, and those models are actually put into production and they're part of the product. They're doing pricing, they're doing fraud detection, they're doing underwriting, etc. The analytics market is an existing buying behavior and an existing customer; ML/AI is an emerging market. And so the core question is: are we actually seeing the emergence of multiple markets, or is this one market?
Well, there are big similarities and there are big differences. Let's start with the similarities. Roughly the same data is needed for both. There's no doubt that when it comes to AI and machine learning, a lot of the secret sauce for getting those results and predictions comes from augmenting your data with additional metadata that you have. In some sense, we have the same data and you're asking questions; the only difference is that one is backward-looking and one is future-looking. But other than that, a lot of it is the same, and you want to do the same kinds of things with the data. You want to sort of prepare it; you want to have it so that you can make sense of it. If you have structural problems with your data, that actually also causes problems for machine learning. The difference today is that it's the line of business that's typically doing AI and data science, or hardcore R&D, whereas data warehousing and BI oftentimes sit in IT. Users of the data warehouse and BI tools are data analysts and business analysts. In machine learning, we have data scientists, we have machine learning engineers, we have machine learning scientists. So the personas are different, and it sits in a different place in the organization. And those people have different backgrounds, and they have different requirements on the products they're using today.
