Enabling end-to-end machine learning pipelines in real-world applications


Welcome to the O'Reilly Data Show. I'm your host, Ben Lorica. Before we jump into today's episode, I want to remind our listeners that we have two event series that they can attend to learn more about the topics covered in this podcast. The first one is the Strata Data Conference, which you can find at strataconf.com. The second one is the Artificial Intelligence Conference, which you can find at theaiconf.com. In this episode of the Data Show, I sat down with Nick Pentreath, principal engineer at IBM and someone I've known for many years. In fact, I sat down with him many years ago, at maybe the second-ever Spark Summit in San Francisco, and we did an on-camera interview. Nick was an early and avid user of Apache Spark and became a Spark committer and PMC member. Most recently his focus has been on machine learning, particularly deep learning, and he is part of a group within IBM focused on building open source tools that enable end-to-end machine learning pipelines. So our conversation will take a tour through some of the open source projects that Nick is involved in, in terms of spreading the word about these projects and evangelizing on their behalf. A quick reminder: Nick will be speaking at the Strata Data Conference in New York on deploying end-to-end deep learning pipelines with ONNX. I also want to give a shout-out to a brand new tutorial that we will be having at the conference in San Jose in September. It will be taught by Neil Conway and Yoav Zimmerman of Determined AI, and they will be giving a three-hour tutorial entitled "Modern Deep Learning: Tools and Techniques." I've seen the outline and earlier versions, and it should be a great tutorial. So I hope you enjoy this episode.
Nick Pentreath, principal engineer at the Center for Open Source Data and AI Technologies at IBM, welcome to the Data Show.
Hi Ben, thanks very much for having me.
So I don't know if you remember, but we first met in person, I believe, at the first Spark Summit in San Francisco, years ago, when I was tasked to interview people on site, and I grabbed you. The video might still be around. Actually, I'm going to try to dig it up and link to it in the episode notes of this podcast. But basically, at that point you had a startup called Graphflow, and Graphflow happened to be one of the early users of Spark. So describe how you discovered Spark, and Graphflow.
Thanks very much, Ben. I do distinctly remember that video interview, and I think it was back in 2014, if I'm not mistaken. At that point, as you say, I was running a small startup, Graphflow, based out of Cape Town, where I live, and focused on international customers in the online retail space: fashion, e-commerce, video, mobile apps. Graphflow was a recommendation engine as a service. So we built out an API where our customers could post usage data from their online store, for example, and we would crunch the numbers in the background, do the analytics and the machine learning model building, and serve back those results through analytics APIs as well as a recommendation API. So you could get similar products, for example, or Amazon-style recommendations ("people who viewed this product also viewed this"), user-targeted recommendations, recommendations in emails, that kind of thing. So in the early days of Graphflow I was building the back end, at a time when I had recently come across Spark. That would have been around the end of 2012 or, in fact, 2013.
At that point Spark was not even an Apache project; it was still a research project at UC Berkeley. And I still remember the first time playing around with the early versions and working with the Scala API. It was just really exciting, because it took what used to be pretty difficult, working with custom MapReduce jobs, and made all of that workflow super simple and really elegant. So I was excited about the technology, and I wanted to build our back-end systems on it. So that's exactly what I did. And I'm proud to say that, at that point, Graphflow was one of the earlier commercial use cases of Spark out in the wild, actually building a production system. At the time of that Spark Summit we were, I think, running around version 0.8 or 0.9, but the back end was first built on version 0.5. And the company and my experience with Spark grew from those early days through to it becoming a top-level Apache project, and all the way to my role at IBM and, for the last several years, as a Spark committer.
Actually, as you were describing that, I realized that I also jumped onto Spark around version 0.5, something like that. And there were already some examples in the documentation, I don't know if you remember this, of some simple machine learning things: clustering, PageRank. But for me it was, well, it was easier to use compared to MapReduce, but it was also supposed to be fast, because it was in memory. That was what I liked.
Yes, definitely a lot quicker to do things. And a lot of folks remember, back in the day, having to hand-write Java MapReduce code for even the most simple ad hoc queries. Compared to that, working with Spark, in memory, using the Scala API just made life much simpler. But as you point out, there were maybe a couple of examples, but no actual machine learning library. So all of that had to be custom built by me for the back end of the startup at that point.
So it was a great experience that provided really hands-on exposure to Spark. And as you mentioned, Ben, I was very fortunate to be involved in those early days and to be one of the committers and PMC members of the Apache project when it was donated to the Apache Software Foundation.
So then at some point you made your way into IBM. How did that come about?
So I've been with IBM's CODAIT team for about three years. When I joined, it was known as the Spark Technology Center. As the name says, it was a team formed within IBM to focus on the Apache Spark project. Almost everything we did was in the open source community, working with the community and the contributors out there, and effectively trying to advance the Spark project and community, drive it into production, and ultimately into IBM's products and customers.
I was there at a Spark Summit when the Spark Technology Center was announced. I believe I even interviewed the VP in charge at the time. But anyway, it was a big deal.
Yeah, it was a big commitment made by IBM, both in terms of people, resources, and developers, as well as the budget, and I was very fortunate to be part of this team. And of course, at that point, building out a team focused on Spark required people with expertise, committers, and so on, so I was very fortunate to be able to join the team. In the early days, most of the team was focused on the Apache Spark ecosystem, obviously with a heavy focus on the data science and machine learning side, and that was mostly what I worked on. Since then, the team has expanded its focus. Spark is still a key part of what the team does, but as the market has rapidly moved into this phase of data science, machine learning, AI, and deep learning, our team, being in the open source data space, has followed those movements.
So the team now is focused across the Python data science stack, a little bit of R, and the deep learning frameworks. More recently, we have been serving as a kind of conduit between the IBM product teams, IBM Research, and the open source community, trying to take some of the new, interesting, and cutting-edge research projects in the data science and AI space, and the trusted AI space, and bring them, in collaboration with product teams, into the open source community. In keeping with that updated mission, we rebranded and renamed the Spark Technology Center as the Center for Open Source Data and AI Technologies, or CODAIT. The main office is still based in San Francisco, with a few team members around the world, including me, and we're a group of around fourteen developers at the moment, focused on all of these different foundational open source projects.
And as you mentioned, the center, in collaboration with the rest of IBM, is starting to push out some interesting open source technologies. You recently gave a talk at Strata London in early May entitled "Building a Secure and Transparent Machine Learning Pipeline Using Open Source Technologies," where you mentioned several of these projects. And by the way, for our listeners out there, I'm going to try to bring Nick back to London in October for our AI conference in London. We are still discussing how that will come about, so stay tuned. But anyway, Nick, on to these projects. The first one is something called AI Fairness 360. So what is this project?
Thanks for mentioning the talk. It was a pleasure to be in London for Strata once again, so I'm glad that I was able to be there.
So the projects that you mentioned are part of the CODAIT set of projects, which encompass the end-to-end enterprise AI life cycle. That goes from the data preparation phase all the way through to training machine learning and deep learning models in a scalable and framework-agnostic manner, through to being able to serve those models, as well as discover sets of models and deploy them. And then, thinking about the recent trend in trusted and secure AI, we have a couple of projects in that space.
Or as I put it, Nick, I've been trying to collect a bunch of these topics under the umbrella of managing risk in ML.
Yeah, I think that's a great way of putting it, a great umbrella. And that's exactly it. There are a few of these risks that have recently been highlighted: for example, bias in machine learning models, the need for explainability, and the need for adversarial robustness. AI Fairness 360 and the Adversarial Robustness Toolbox are two projects that help address these challenges. AI Fairness 360, or AIF360 for short, is an open source project out of IBM Research, which our team works on together with the IBM Research team, that seeks to help drive forward bias research and bias mitigation in machine learning models. It's a Python toolkit and framework, available on GitHub. It is actually agnostic to the underlying machine learning framework, so you could use it with, say, scikit-learn, or you could use it with deep learning frameworks. At its core it is a collection of bias metrics that you can run on a particular model to investigate whether that model is biased with respect to certain subpopulations or sensitive variables, as well as some bias explanation techniques and, importantly, ways to actually mitigate that bias.
So is the vision of AIF360 more of a tool for the training phase, kind of offline, or is there a real-time monitoring component with it?
At the moment it serves as a kind of framework and toolkit. The idea here is that bias in machine learning models ultimately comes from the data. The fact that predictions may have lower accuracy for a given subset of the population is really coming from the fact that the data itself is skewed, and the way that shows up is in metrics, differential metrics. The ways to mitigate those can come from a couple of different places: it can be done at the data level or it can be done at the model level. So there are ways to mitigate that bias from the perspective of rebalancing or reweighting the data, for example, or actually doing it at runtime based on the model predictions. The vision of the tool is to address all of these scenarios. What it doesn't necessarily do right now is offer a monitoring framework, for example, but it could be included in a monitoring framework at some point.
Yeah. So one of the challenges, I think, in this topic is that there's no clear-cut way of doing all of this one hundred percent of the time. You know what I mean? It's case by case. You still need to interrogate the data, and you need some kind of domain knowledge. And so I think these tools are going to be very useful for people at the front lines of deploying some of these things. But I don't know how you feel; I think, at the end of the day, people still need to get in there and really understand the problem and the data. It's ethics, after all, you know what I mean? There's still human judgment involved.
Yeah, absolutely. And these metrics and mitigation techniques are exactly that: techniques.
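To make the idea of differential metrics and data-level mitigation concrete, here is a minimal sketch in plain Python. It is not AIF360's code or API; the function names, the toy data, and the choice of one common reweighting scheme (weights chosen so that group membership and label become statistically independent) are assumptions made for illustration only.

```python
from collections import Counter


def positive_rate(labels, groups, group, weights=None):
    """(Weighted) share of favorable labels (1) among rows belonging to `group`."""
    if weights is None:
        weights = [1.0] * len(labels)
    rows = [(y, w) for y, g, w in zip(labels, groups, weights) if g == group]
    return sum(y * w for y, w in rows) / sum(w for _, w in rows)


def statistical_parity_difference(labels, groups, unprivileged, privileged, weights=None):
    """Difference in favorable-outcome rates; 0.0 means parity."""
    return (positive_rate(labels, groups, unprivileged, weights)
            - positive_rate(labels, groups, privileged, weights))


def disparate_impact(labels, groups, unprivileged, privileged, weights=None):
    """Ratio of favorable-outcome rates; 1.0 means parity."""
    return (positive_rate(labels, groups, unprivileged, weights)
            / positive_rate(labels, groups, privileged, weights))


def reweighing_weights(labels, groups):
    """Per-row weight P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


# Hypothetical toy data: group "a" receives the favorable label far more often.
groups = ["a"] * 10 + ["b"] * 10
labels = [1] * 8 + [0] * 2 + [1] * 4 + [0] * 6

spd_before = statistical_parity_difference(labels, groups, "b", "a")
di_before = disparate_impact(labels, groups, "b", "a")

weights = reweighing_weights(labels, groups)
spd_after = statistical_parity_difference(labels, groups, "b", "a", weights)

print("parity difference before:", spd_before)
print("disparate impact before:", di_before)
print("parity difference after reweighting:", spd_after)
```

The weights would then be passed to a learner that accepts per-sample weights. As the conversation stresses, whether this particular metric and this particular mitigation are appropriate remains a judgment call for the data scientist and domain expert.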
These techniques cannot simply be applied blindly; they are guides for data scientists and domain experts. But they do provide a very useful framework for evaluating the existence of bias and its severity, and for applying mitigation techniques. Of course, as with any toolkit, the tools need to be applied with domain knowledge and expertise. But what's nice about AIF360, for example, is that on the GitHub repo there should be a link to a demo that shows these tools and metrics with very nice data visualizations. So it's very easy to see what the impact of the mitigation techniques is, whether the bias has been reduced, and so on. It's a very nice online demo, which is worth checking out.
And then the other challenge, of course, with this area is that there's actually a temporal dimension as well. Sometimes the negative effects may be down the road, as a paper from UC Berkeley pointed out. So now let's describe the companion open source project, which is the Adversarial Robustness Toolbox.
Yeah, thanks, Ben. So the Adversarial Robustness Toolbox, or ART, is very similar in concept to AIF360, again coming out of IBM Research, with a focus for this project on detecting adversarial attacks and being able to mitigate adversarial attacks. For any listeners who may not be familiar with adversarial examples in machine learning, the idea is that a machine learning model is trained on a set of data, and the classic example of an adversarial scenario is in image classification. For example, a deep learning model may be trained to classify animals, or things like that. And it turns out that by taking the input data, an image of a panda bear, for example, and applying a very small perturbation, some noise, to that image, you can actually completely fool the model into thinking that it is something completely different, for example a gibbon or an ambulance. And to the human eye this is imperceptible.
So if a human looked at the image, we would never be in doubt about exactly what it is: it's a panda. The model, in effect, is very sensitive to these small changes. That kind of corruption of the input can happen naturally in some scenarios, you know, by mistake. But it can also happen through bad actors: if an adversary actually knows what they're doing and is intending to create harm, they could inject these kinds of attacks into a model. So in much the same way as in computer security or cybersecurity, where hackers try to do nefarious things willfully, the same concept applies potentially to machine learning models. So this is not just a theoretical issue. Another classic example is that, through very small perturbations or noise applied to stop signs, you can fool the model into thinking that a stop sign is actually a speed limit sign. One can imagine that, in a self-driving car, for example, this can have significant negative impacts from a human life perspective.
Nick, from a user point of view: so I train a computer vision model, and I have training data. How does ART fit into my workflow?
So ART has a few different components. It has a set of adversarial attacks, so it does have the attack tools.
Which in this context would be: I'm training a computer vision model with my labeled training data, so ART would go in and kind of perturb my training data?
The attacks are typically applied to the trained model. Once you've trained the model, you could apply an attack, similar to doing penetration testing, and in that way you can actually measure what the impact is. So this is more intended for education purposes and to try out the latest research.
But then at least, if you apply this attack from ART on your validation set, you can say, "I'm not going to deploy this; this model is not robust enough to deploy." So the first step is, in effect, applying the attacks. And the second is metrics?
There are robustness metrics, one of which is a state-of-the-art metric called CLEVER, which comes out of IBM Research. There are a few others that also come from the latest research papers. These are a sort of metric or index that represents how robust a model is to various adversarial attacks. So the first step is being able to try attacks, to see how they work and get an understanding. The second is to apply metrics to evaluate robustness. And then the third is to actually be able to mitigate the attacks. There is a set of state-of-the-art mitigation techniques for various attacks coming out of the research community, and many of them are implemented in ART, and the framework allows you to actually apply a mitigation technique. Again, there's a very nice visual demo online, and the link should be on the GitHub page for the Adversarial Robustness Toolbox. That allows you to play with input images and different types of adversarial attack and defense, with different strengths and parameters, and really get a hands-on feel for how this actually works in practice.
So let's say I get serious and decide to use both AI Fairness 360 and the Adversarial Robustness Toolbox. They're both open source; they're both on GitHub. But both of these areas are changing a lot. There are a lot of new research papers and research results coming out. So are the projects on GitHub actively maintained, and will the latest attacks be reflected?
Absolutely. They are actively maintained, and if you look at the commit history and activity, you'll see that the IBM Research team members, as well as CODAIT team members, are actively adding new algorithms and metrics. And of course part of the goal is to bring in community involvement. So we are very much open to people contributing to the projects, and if any practitioners or researchers out there would like to contribute a technique, approach, algorithm, or metric, that would be most welcome.
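The attack-then-measure workflow described above (train a model, apply an attack to the trained model, and compare accuracy before and after) can be sketched in a few lines of plain Python. This is not ART's API. The conversation does not name a specific attack, so the fast gradient sign method (FGSM), a well-known attack from the research literature, is used here as an assumption, and the tiny logistic model with hand-picked weights is purely hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability of class 1 under a logistic regression model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm_perturb(weights, bias, x, y, epsilon):
    """Fast gradient sign method for the logistic (cross-entropy) loss.

    The gradient of the loss with respect to the input is (p - y) * w, so each
    feature moves by at most `epsilon` in the direction that increases the loss.
    """
    p = predict_proba(weights, bias, x)
    grad = [(p - y) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


def accuracy(weights, bias, xs, ys):
    hits = sum(int(predict_proba(weights, bias, x) >= 0.5) == y
               for x, y in zip(xs, ys))
    return hits / len(xs)


# Hypothetical "trained" model and validation set.
model_weights, model_bias = [2.0, -1.0], 0.0
xs = [[0.3, 0.1], [0.4, -0.2], [-0.2, 0.3], [-0.5, 0.1]]
ys = [1, 1, 0, 0]
epsilon = 0.3

clean_acc = accuracy(model_weights, model_bias, xs, ys)
adv_xs = [fgsm_perturb(model_weights, model_bias, x, y, epsilon)
          for x, y in zip(xs, ys)]
adv_acc = accuracy(model_weights, model_bias, adv_xs, ys)

print("accuracy on clean inputs:", clean_acc)
print("accuracy on perturbed inputs:", adv_acc)
```

The gap between the two accuracies is a crude robustness indicator for one attack at one strength; metrics such as CLEVER, mentioned above, aim to characterize robustness more systematically.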
So Nick, at this point, for these newer projects, are there people who are actually using them in production?
As far as I know, off the top of my head, I think there are a few use cases out there. Some of these metrics, in particular the fairness metrics, are actually part of an IBM service called Watson OpenScale, which allows you to monitor machine learning models, both within IBM Cloud as well as on external services such as AWS SageMaker and others. So the fairness metrics and explainability metrics are actually available to monitor your production use cases in that product. That's one of the key production use cases that I'm aware of.
So then the next set of projects that you discussed are more in the area of starting to productionize, or using, machine learning models that maybe other people have developed. I'm just going to rattle off all of them, and feel free to describe them in whatever order you feel is best. The first one is called Fabric for Deep Learning, which is a deep learning platform offering TensorFlow, Caffe, and PyTorch as a service on Kubernetes. The next one is called the Model Asset Exchange, which is basically a place where developers can find and use deep learning models. And then the last one is called Seldon Core, which is machine learning deployment for Kubernetes. So, Nick, any additional insight on these projects?
So for Fabric for Deep Learning and Seldon Core, which you mentioned, the one common thread is model training and deployment on Kubernetes. I think that's something you're seeing emerge as a trend in the AI ops and model ops space. Kubernetes has become very dominant as a platform for software in general, and we're seeing that happen in machine learning too. Fabric for Deep Learning, or FfDL, is an IBM Research project that came about in conjunction with the product team, and it is in fact the core of IBM's cloud deep learning service.
So as you mentioned, it's a sort of meta-framework that essentially runs on top of Kubernetes and allows you to train with any deep learning framework that you want to use, on a Kubernetes cluster, at scale, using GPUs. It effectively abstracts away the infrastructure and DevOps decisions, and allows the data scientist, researcher, or ML engineer to just focus on the model training code, package it up with a simple YAML configuration file, and submit it to the FfDL cluster. It'll take care of training, scaling, GPU allocation, and all of that stuff that needs to be done, without the data scientist having to worry about how that works. On the deployment side of that spectrum, once you've trained the model, you need a way to deploy it. Seldon Core is another open source project, which came out of a company called Seldon, a London startup that is pretty well known in the machine learning space and the machine learning deployment space. They open sourced the core of their system, which is a model deployment framework for Kubernetes, again looking to be framework-agnostic and to allow you to deploy whatever machine learning or deep learning model you have, whether it's TensorFlow, Caffe, PyTorch, or scikit-learn, and to do that in a standardized way.
So for the people who don't follow these areas very closely, Nick, how do these two projects relate to Kubeflow?
Good question. There are a couple of different projects out there that are all starting to emerge doing some of the same things. FfDL, Fabric for Deep Learning, is very similar to Kubeflow, effectively trying to do much the same thing with the same goal. Kubeflow is an initiative that started out really as a way of training TensorFlow models on Kubernetes, and again doing that in a standardized way.
It handles the resource allocation, the GPU scheduling, the kind of nuts and bolts that you would typically have to worry about without a framework, but it does that in an extensible way, and it also has support for PyTorch and other frameworks, although much of that is still in active development. So whereas FfDL currently supports multiple frameworks, Kubeflow is predominantly focused on TensorFlow, with support for other frameworks still in development. So at a high level they're actually quite similar. Seldon is really for the deployment phase. Kubeflow and FfDL are focused on training, and once you've got a model, that's kind of where they stop, but Seldon adds that deployment layer on top. Seldon has an integration for running on FfDL, interfacing with the FfDL-trained model, and one for running on Kubeflow, so it is agnostic to the underlying meta-framework, if you will, that you are using to train your models.
So what about the last project, the Model Asset Exchange?
So that's the last project we'll talk about today. It's not the only project we focus on, but it rounds out the machine learning pipeline phases. The Model Asset Exchange is, as you mentioned, a collection of free and open source deep learning models, state-of-the-art models across various domains. You can think of it as a model zoo, or, as I prefer to say, more of a model library, in the sense that we're trying to catalog and vet the model code, the weights, and the intellectual property licensing terms that are assigned to each of those components.
That's awesome stuff. For example, Nick, last year was a big year for language models. I don't know how up to date the Model Asset Exchange is, but do you foresee that at some point down the road one can just go to the Model Asset Exchange and grab one of these language models?
Language models are definitely a really important part of the selection of models we have at the moment. We have a named entity recognition model, we have some word embedding models, and we have some classification models based on BERT, which is one of the cutting-edge NLP models out there.
Just to clarify, what's the difference between me going to GitHub and grabbing BERT there, as opposed to going to the Model Asset Exchange?
To take a step back, we typically have two types of models on the exchange, and most of them at the moment are what we call deployable models. So we're taking state-of-the-art model implementations from around the different model zoos, research organizations, GitHub repos, and so on, and what we do is try to package them up in a more standardized and usable way. Most of this stuff, in fact all of it, is freely available already, but the manner in which it's available can vary quite significantly. In some cases you might go to a GitHub repo from some organization that has trained an NLP model you want to use, and it's very difficult to actually use that in a production service. To take it from that state to being able to deploy it in an application can be a significant amount of work. That involves checking that the code does what it says on the tin and is actually achieving what you need, checking the licensing terms, and figuring out how to package the model up into an API that you can actually use in your application.
The version of the library, the dependencies, and so on.
That's exactly it. And in many cases these models may have been around for a while, and they depend on previous versions of the framework, and you have all of these potential gotchas and issues that you come across when you try to take that model into the last mile of production deployment.
So what MAX tries to do is remove those obstacles and roadblocks by standardizing the framework and mechanisms. Each model on MAX is freely available, backed by a GitHub repository that has the code for inference and for the model weights. We package that up as a Docker container, and that Docker container runs a standardized REST API that exposes, as much as possible, a standardized inference endpoint. You send in your data, for example text for an NLP model, and you get back your predicted results. That service handles the pre-processing as part of the pipeline, the core model inference path, and the post-processing. By doing this in a standardized manner, we hope to make it significantly easier to go from zero to actually implementing and using a model, in a more developer-friendly way.
So let's say that I go to the Model Asset Exchange and I determine that this is the right model for me to deploy. But then I realize, oh, it doesn't quite fit my domain; I may need to retrain it and tune it a little bit for my data. How do I go about doing that?
Yeah, I think it's a really good question, because in many cases, not all, but many, that is probably going to be the scenario. A lot of the time these models are trained on fairly generic, industry-standard data sets, and often for a particular use case you do need that fine-tuning or retraining. Up until now, we've been mostly focused on just trying to get a sufficiently broad and deep set of models out there that are usable. The next phase is focusing on this use case, being able to actually train on custom data and making that as easy as possible. To do that, we're taking much the same approach, which is to try to standardize as much as possible. And if things are very much the wild west for deploying models, that is even more the case for training.
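The standardized inference path described earlier in this answer (one request format in, then pre-processing, core model inference, and post-processing, then one response format out) can be illustrated with a small sketch. This is not MAX's actual code or API: the class names, the JSON field names, and the toy keyword-based "model" are all hypothetical, and the REST and Docker layers are left out so the example stays self-contained.

```python
import json


class ModelWrapper:
    """Hypothetical base class: every wrapped model exposes the same predict() contract."""

    def preprocess(self, payload):
        raise NotImplementedError

    def infer(self, features):
        raise NotImplementedError

    def postprocess(self, raw):
        raise NotImplementedError

    def predict(self, request_body):
        """Take a JSON string, run the three pipeline stages, return a JSON string."""
        try:
            payload = json.loads(request_body)
            features = self.preprocess(payload)
            raw = self.infer(features)
            result = {"status": "ok", "predictions": self.postprocess(raw)}
        except (ValueError, KeyError) as err:
            # Malformed JSON or a missing field yields a uniform error response.
            result = {"status": "error", "message": str(err)}
        return json.dumps(result)


class ToySentimentModel(ModelWrapper):
    """Stand-in for a real NLP model: counts positive and negative keywords."""

    POSITIVE = {"great", "good", "love"}
    NEGATIVE = {"bad", "awful", "hate"}

    def preprocess(self, payload):
        return [text.lower().split() for text in payload["text"]]

    def infer(self, features):
        return [sum((t in self.POSITIVE) - (t in self.NEGATIVE) for t in tokens)
                for tokens in features]

    def postprocess(self, raw):
        def label(score):
            return "positive" if score > 0 else "negative" if score < 0 else "neutral"
        return [{"label": label(score), "score": score} for score in raw]


service = ToySentimentModel()
ok_response = json.loads(service.predict(json.dumps({"text": ["I love this", "awful service"]})))
bad_response = json.loads(service.predict("{}"))

print(ok_response)
print(bad_response)
```

The design point is that a caller only needs to know the request and response shapes; swapping in a different model means implementing the three stage methods, while the calling code stays the same.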
On the training side, any given published model on GitHub has its own custom way of running scripts, getting data in, training, all this kind of stuff. So what we're trying to do is abstract a little bit of that in a simple way, so that we have a single entry point for each model, with training on FfDL, the Fabric for Deep Learning, or with Watson's deep learning service, which is a managed service, and make it as simple as possible to drop in your own data and get back a model. So we have a few trainable models currently, but it's something we're actively working on. I'm not exactly sure when we will have that or what exact form it's going to take. But we definitely recognize that that's where, I think, the key value in using these models is, and we want to get that into the hands of developers and data scientists.
One last question on the Model Asset Exchange: is there a pool, or the equivalent of an editorial board? In other words, who decides what models get onto the Model Asset Exchange, and how often do the models get revisited?
Very good question. At the moment it is our team, the CODAIT team. Obviously we have quite a few team members, but there are limits on our resources. We are driven largely by what is happening in the research community, which is moving fast. It's moving so fast.
Fast, but not fast, right? Fast in the sense that there are a hundred papers a day on arXiv on machine learning, but there are probably only a few models that really move the needle every year.
Right, precisely. And I think you hit the nail on the head with the idea of a review board. In a sense, you'd always like to be driven by a similar approach to the inclusion of something: if you want to add an algorithm to scikit-learn, for example, there is a minimum bar in terms of how widely used it is and what its citation history is.
There's a sort of resistance, in a sense, to adding whatever is the latest and newest, and a drive to ask, as you mentioned, what has moved the needle and what is the state of the art. As you mentioned, with language models being very active, something like BERT, even though it's fairly recent, is I think critical to have in there, and I don't think anyone would argue about that. But we just don't have the capacity, any more than any team does, to put in every single model. In terms of deciding what goes in there, at the moment it's up to us. But absolutely, this is an open source project. The core of the website happens to sit on the IBM Developer website, where you go to the model pages, but everything behind that, all the code, all the containers, all the weights, everything is open source. So we would love to have contributions from the community, and suggestions for new models that could be added, and use cases. And over time we definitely want to think about ways to make this more open, even to the point of having some kind of review board deciding on what gets in there. That's an interesting idea. At the moment it's not something on the roadmap, but as I said, it will be interesting to see how this project evolves. I'm very excited about the Model Asset Exchange. The entire team that's working on these models is excited, because we're putting this power and this technology in the hands of as many developers as possible, whether they are expert data scientists or application developers who don't need to know anything, really, about the underlying technique, as long as they can call an API and get back a result that helps them. I think that's what's exciting for us. We're excited about taking it forward and about getting other people involved. Anyone who is interested, we'd love to hear from you.
And it also fits this whole trend that I'm seeing, which is basically models increasingly becoming commoditized in some ways, and so companies have to go back to basics and think about data: how to acquire data and get good training data. But on the other end of it is this whole area that we've been talking about, which is model governance and model ops. One of the things that really got my attention is just how MLflow, in ten months, now has two hundred companies using it. Right? And that's for model development, and there are a lot more tools that need to be built as far as model operations and model governance, right? Just like we have tools that recognize that data are valuable assets, so data governance and data catalog tools, we're probably going to also need specialized tools for model governance and model ops.
Yeah, I absolutely agree with that key trend. I don't think anyone would disagree or have a different view on that. As I said, IBM is interested in these spaces, but from a personal point of view, MLflow is a really interesting project. I recall being at the Spark Summit where it was introduced and announced. I think another similar project is probably Kubeflow Pipelines. There are a few different projects out there that are looking to provide this idea of a kind of meta-framework, a level of abstraction higher, across that machine learning data pipeline. And by necessity that has to be framework-agnostic. Everyone wants to use all the different frameworks; in any small to medium-sized team, even, you're going to find multiple frameworks, multiple versions, multiple toolkits, and multiple languages that data scientists are coding in. That's why I find it interesting how these types of projects are all on the same trajectory, this idea of being able to have one layer of abstraction across all of those different tools.
Whether it's Apache Spark or pandas on the data side, through to the various machine learning toolkits on the training side and model selection side, through to, ultimately, deployment, I think we'll see much the same thing on the deployment side, be it containers, serverless, or any other deployment framework. As you mentioned, model governance, model ops, monitoring: I think that phase is coming. There are a few isolated pieces; we've mentioned the kinds of fairness and robustness metrics that are possible. Explainability is part of it. I think on that side we're at the level of a set of metrics and toolkits, but not the actual framework that will help with monitoring a model in a principled way: alerts, retraining, dashboards, and so on. I think there are a number of commercial offerings out there, but on the open source side, and maybe I'm missing something, I haven't seen anything that is comprehensive for model governance. I think you're absolutely right: in the same way that we have data governance, you need both data governance and model governance, because the set of features that are inputs is critical to the model's performance. If your features change, if the data schema changes, that's going to lead to failures down the line if you don't have good governance of the input data: the lineage of the data, where it is coming from, what was applied at training time for that particular model. Or even basic things, Nick, like "give me a list of all the models we have, and who has read and write privileges", things like that. But I think at a high level what we're talking about
here is just a growing recognition that ML is sufficiently different from traditional software development that you need specialized tools, and so, to the extent ML is becoming important, these tools are just naturally going to emerge. By the way, I want to put a plug in for a RISELab project that you guys at CODAIT should look at, which is Ray. I don't know to what extent you've been following Ray, but it's taking off: as of today, seven thousand stars on GitHub. It's a distributed processing framework written in C++, but the API is in Python. So, among other things, reinforcement learning is very natural to do in Ray, and they do have RLlib, which appeals to RL researchers but also just to users. And, more importantly for data scientists, I don't know, Nick, if you've heard of Modin, which is pandas on Ray. So with one line of code, suddenly your pandas on your laptop is faster, and it automatically scales out to a cluster. Yeah, I've definitely come across Ray and noted little bits. IBM, and certainly the CODAIT team, has a relationship with what was AMPLab at Berkeley, which is now RISELab, so we've looked at Ray quite a lot. And if I recall, the data store, a key part of that, is built into Apache Arrow as part of that project, which is really great to see, that kind of standardization happening. I have not come across pandas on Ray in detail, but I've looked at a few of the examples and it looks pretty interesting. So I think that's a really interesting space, this emergence of a few different projects looking at faster, more distributed, more scalable data frames, including pandas on Ray, Dask, and the NVIDIA RAPIDS project, which is effectively pandas on GPU. I think it's a pretty interesting set of projects that will be worked on in this space, building on the tools data scientists are already using.
In effect, you're trying to keep the tools' APIs the same. Everyone knows how to use pandas, but you give them the scalability that they may be needing as the data volumes they're working with grow significantly. This has been great. I wanted you on the podcast; that's something I should have done a while back. But I also wanted people to know about all these cool projects coming out of IBM open source, which a lot of listeners may not be aware of. So thank you, Nick. Thank you. I mean, it's a huge honor to be on the show; I really appreciate it. As you said, I think many folks may not know about some of these open source projects in this space that are coming out of IBM Research and the CODAIT team. And in some sense IBM is not always associated, in the minds of developers, data scientists, and so on, with open source. But we have a long and rich and proud history in open source, from Java through to Linux, through to Kubernetes, Node.js, Spark, and now the deep learning and machine learning space. And obviously, especially today, AI is key for our business. So it's really great to have the opportunity to talk about some of the things that CODAIT is doing and pushing out into the community. I appreciate the opportunity. All right, Nick. Thank you. Thank you. You can follow Nick Pentreath on Twitter at MLnick. Thanks for joining us. If you like the show, please subscribe and rate us on iTunes, Stitcher, TuneIn.com, SoundCloud, or Spotify, and never miss an episode.
