JSON, Railyard, Kubernetes discussed on Software Engineering Daily


You know, my algorithm is: I want my false positive rate to be this, and after that, maximize recall. And figuring out also that in some cases we compose models: try to run this one, but fall back to that one. All that stuff is in the API. I think that's really been a good forcing function, and having this automatic retraining on top of it has made us codify our evaluation metrics, because you have to do that if you want to automate all of this. And now we have a little bit of an evaluation API that we've built as well, which Rob can probably say more about too.

When I think of an API, I think of something that is an imperative interface. I'm executing an API right now; I'm making an API request: computer, go do something. The Kubernetes way of doing things is more declarative: this should be the state of the world at all times. Tell me your perspective on declarative versus imperative. I realize it's a sliding scale and it's kind of semantic, but you could have done this declaratively. You could have said Railyard is a declarative way of describing when and how you want to run your models. Instead, you said: here is an API you make requests to. Was that deliberate? Tell me about declarative versus imperative in your mind.

It was pretty deliberate, and it was very much driven by the product teams, in that they were very interested in calling this API. They wanted this API to exist because they wanted to build in the idea of: we want to retrain these models and evaluate them, and then decide whether or not to put them into production. They wanted to own that logic, because that is their logic to own as a product team. They're the ones making the decisions about: is this model better than the last one, should this model go into production, that sort of thing.
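The evaluation policy described above, fixing a false positive rate budget and then maximizing recall, can be sketched in a few lines. This is a hypothetical illustration of that kind of codified metric, not Railyard's actual evaluation API; the function names are made up for this sketch.

```python
# Hypothetical sketch: choose the score threshold that keeps the false
# positive rate within budget, then measure the recall you get there.

def pick_threshold(scores, labels, max_fpr):
    """Return a threshold whose false positive rate is <= max_fpr.

    Lower thresholds flag more examples, so we allow as many of the
    highest-scoring negatives through as the FPR budget permits.
    """
    negatives = sorted((s for s, y in zip(scores, labels) if y == 0),
                       reverse=True)
    budget = int(max_fpr * len(negatives))  # allowed false positives
    if budget >= len(negatives):
        return min(scores)
    # Threshold at the first negative we must NOT flag; strict ">" below
    # means exactly `budget` negatives (at most) score above it.
    return negatives[budget]

def recall_at(scores, labels, threshold):
    """Fraction of true positives scoring above the threshold."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    return sum(1 for s in positives if s > threshold) / len(positives)
```

Codifying the metric as code like this is what makes automatic retraining safe: every candidate model is judged by the same rule, with no human in the loop.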
So from their perspective, what they really wanted was this dynamic API. The declarative piece is an interesting question. On Kubernetes, we use the Kubernetes Jobs interface, and there are pieces of that that are declarative, right? We have this pile of YAML that describes how we want to run the job. To run a job on Kubernetes, you have this YAML description of: I want to run this job, with these resources, with this Dockerfile. That is very declarative, and we build up one of these declarative specifications when we run the job. Within that job, within that Docker container, we're executing a bunch of imperative Python code. So these pieces all work together. We make this dynamic API call with a JSON specification for how we want to train the job. We build this declarative spec to tell Kubernetes to run this job with this Dockerfile. And then we pass that JSON specification through to our Python code, which imperatively decides how to train the model and what data to fetch based on the specification. So it's a nice blend of the two: we're using declarative pieces in building our Kubernetes job and specifying the Docker container, and we also have this dynamic piece, this API, that our product teams are using to train models on their terms.

Taking a step back, how did you come upon the insight of giving this unified experience through an API? I just want to understand your thought process a little better, because this is not necessarily an A-leads-to-B-leads-to-C kind of insight. This is an insight that there's all this disconnected stuff going on in the machine learning developer's workflow.
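The layering described above, an imperative API request carrying a JSON training spec, wrapped in a declarative Kubernetes Job manifest, can be sketched as follows. The field names in the training spec are guesses for illustration, not Railyard's real schema; only the Job manifest fields follow the actual Kubernetes `batch/v1` shape.

```python
import json

# Illustrative training spec a product team might POST to the service.
training_spec = {
    "model_type": "sklearn",
    "data": {"source": "charges", "holdout_fraction": 0.2},
    "train": {"estimator": "LogisticRegression"},
}

def build_k8s_job(spec, image="ml-train:latest"):
    """Wrap the JSON training spec in a declarative Kubernetes Job manifest.

    The manifest is the declarative layer ("this job should exist, with
    these resources, from this image"); the spec passed via --spec is read
    by imperative Python inside the container, which decides how to fetch
    data and train.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": "train-" + spec["model_type"]},
        "spec": {
            "template": {
                "spec": {
                    "containers": [{
                        "name": "trainer",
                        "image": image,
                        "args": ["--spec", json.dumps(spec)],
                        "resources": {
                            "requests": {"cpu": "4", "memory": "8Gi"}
                        },
                    }],
                    "restartPolicy": "Never",
                }
            }
        },
    }
```

Each layer stays in its natural idiom: the caller is imperative, the scheduler is declarative, and the training code is imperative again.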
Let's unify it into an API. By the way, I was looking at a request for Railyard, and it's a lot of stuff you have to put in, a lot of different parameters, because a machine learning job is very complicated. But at the same time, it's very nice to have all this stuff in one place. What led to that insight?

We struggled in the beginning. Struggle is the right word. We just had a lot of decisions to make about all the constituents; there are so many different people that you're trying to cater to with this API.

Yeah, there are a lot of constituents.

But I think one thing that we did right, and one thing that we had to make a decision on early, was how much freedom to give users in terms of the code that was being executed. As I mentioned in the blog post, when we wrote Railyard we were essentially just using scikit-learn as our machine learning platform. I don't think we had XGBoost as a framework at that point; we were just using scikit-learn. One thing we could have done, and one thing we talked about and seriously considered doing, was just building a DSL for scikit-learn, where people would specify not just how they want to fetch data, how they want to hold out that data, and how they want to filter that data, but literally how they want to construct the rest of their ML pipeline: each component, how to transform it, how to encode it, what model or estimator they wanted to use. We considered that. We thought maybe we could build a DSL where folks would literally write no Python. The only thing they would have to write was this declarative JSON specification of how they wanted to build this machine learning job end to end. So all they would have to do was write some JSON, and that would contain everything they needed to run their job. I'm really, really happy.
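To make the rejected option concrete, an all-JSON DSL like the one considered above might have looked something like this, with every pipeline step named declaratively and a small interpreter resolving each step to a component. This is purely a hypothetical sketch; this DSL was never built, and all names here are invented.

```python
# Hypothetical all-JSON pipeline DSL: no user Python at all, just a
# declarative description of every transform and estimator.
pipeline_dsl = {
    "data": {"source": "charges", "filter": "amount > 0", "holdout": 0.2},
    "pipeline": [
        {"step": "encode", "kind": "one_hot", "columns": ["country"]},
        {"step": "scale", "kind": "standard"},
        {"step": "estimator", "kind": "logistic_regression",
         "params": {"C": 1.0}},
    ],
}

# The service would own a registry mapping DSL names to real components.
REGISTRY = {
    "one_hot": "OneHotEncoder",
    "standard": "StandardScaler",
    "logistic_regression": "LogisticRegression",
}

def interpret(dsl):
    """Resolve each declarative step to a concrete component name."""
    return [REGISTRY[step["kind"]] for step in dsl["pipeline"]]
```

The cost is visible even in the sketch: every transformer, encoder, and estimator a user might ever need has to be anticipated in the registry, which is exactly the inflexibility the team decided against.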
We decided not to do that and sort of picked a middle ground. There are some things that every machine learning job needs to do. They all need to fetch data, and they all need to run that data through some sort of model. And at the end, they want to be able to serialize their model, and they want to be able to serialize some sort of evaluation data. Basically, every machine learning job, no matter what framework it uses or what problem it's trying to solve, has a handful of things it has to do. And so the Railyard API really tries to address those core properties that are fundamental to every ML job, and then leaves the rest up to the user. So I sort of jokingly refer to Railyard as an arbitrary Python execution service with opinionated interfaces at the boundaries. That's kind of what it is. When you write a Railyard workflow, we have these functions: we have a method like train, and we're going to call the train method with your data, and you have to implement it. At the very least, we say you need to tell us where and how to fetch data, and then you need to write some Python code that trains your model and passes the model back out. But the product person is writing this Python, and they can do whatever they want in that train method. The only contract is: we're going to pass the data in, and you need to pass a trained model back out. Within the train method, you can write whatever Python you want. And we've also allowed users to override how they fetch data, so they can customize that.
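The contract described above, data in, trained model out, with an overridable data-fetching hook, can be sketched as a small base class. The class and method names here are assumptions for illustration, not Railyard's actual interface.

```python
# Minimal sketch of the workflow contract: the service calls fetch_data,
# passes the result to the user's train method, and expects a model back.

class TrainingWorkflow:
    def fetch_data(self, data_config):
        """Describe where and how to fetch data; users may override."""
        raise NotImplementedError("tell the service how to fetch data")

    def train(self, data):
        """Users must implement this. It may contain arbitrary Python;
        the only contract is that it returns a trained model."""
        raise NotImplementedError

    def run(self, data_config):
        data = self.fetch_data(data_config)
        model = self.train(data)  # the contract: data in, model out
        if model is None:
            raise ValueError("train() must return a trained model")
        return model

class MeanModel(TrainingWorkflow):
    """Toy workflow: 'trains' by computing the mean of the data."""

    def fetch_data(self, data_config):
        return data_config["values"]

    def train(self, data):
        return {"mean": sum(data) / len(data)}
```

The opinionated part is only the method boundary; everything inside `train` is the user's arbitrary Python.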
And I think the flexibility, in terms of letting our customers write the machine learning workflows that they need to write in Python, with us really just defining the interfaces, is part of the reason we've gotten pretty decent adoption with Railyard. We haven't fundamentally constrained what they want to do. We haven't constrained them to use just one framework, or to write things only in the way that we want them to. They have a lot of freedom to build these training workflows; we just define the interfaces for them.
