7 Burst results for "Lynn Digression"

"lynn digression" Discussed on Linear Digressions

Linear Digressions

06:21 min | 2 years ago

"lynn digression" Discussed on Linear Digressions

"Regret having tried it, right? Yeah. That's actually something that I've learned working at larger companies is the mentality of bad. Things are not always bad. You know, like if you bring the site down, obviously, that's really bad for whoever is using the site, and it's obviously really bad for your company's, reputation, etc. But putting blame on the individual who did that is not necessarily the right course of action. The right course of action is to say, okay. What processes lead to this failure, not who made this failure? What processes lead to this failure? And how can we change these processes or increased testing or make pushes safer or whatever it is to avoid this problem in the future and by removing blame from the process, it makes us safer environment for people to take risks that can reap large rewards. Yeah. That blameless postmortem is can be a really really good way of learning from the things that go wrong. 'cause that's especially that's where you want to be the most cognizant of learning, right? Is that you just went through all the all the pain you've paid the price of having to learn something the hard way. Right. But until you have a chance to sit down and say like, okay. What did we learn the hard? Way than you haven't really digested it or arguably really actually learned from it. Yeah. I would also make the case that failing once is not a failure. I mean, it's a failure to the consumer. But it's not an organizational failure, really, maybe it's a lack of foresight or something. But failing a second time the exact same way that is a failure. You know, if that ever happens. It means that you didn't not you. But the organization the company did not learn the lesson the first time, so yeah, it's it's interesting doing those blameless post-mortems and almost feeling a point of. It's it feels really strange to say, but almost feeling like it's a point of pride. Like, yeah. I took down the site or I caused this issue. And because of the processes are changing and the company is way better for it. And our processes are way better for it. Yeah. I think. That's right. I think one way that data scientists encounter this this a little bit different. From software engineers that. I'll just mention enclosing is that because data scientists do so much work with data. The quintessential example is you're using data to build a model, and then success is whether that's a model that does a good job at predicting the thing that you care about then it can be really really hard to predict whether the data is going to be good for the thing that you want to predict with it. And this is a place where sometimes there's a little bit of disconnect between folks who are used to working with pure software. Engineers versus data. Scientists is that with pure software engineers, I think, you know, in many cases, it's an issue of anything's possible. Just some things are. Difficult. But if you decided that you wanted to hit a certain benchmark at the cost of everything else, you probably could like most things are possible. If you put enough work into them, whereas with data science if you don't have the right data to begin with. There are certain things that you're just never going to be able to predict. Well, there's a certain limit to how much information you can get out of any data set out there. And so that can be something. 
That's a little bit challenging for data scientists sometimes to communicate to their stakeholders, and that sometimes can leave data scientists second guessing themselves or spending way too much time on a hopeless problem. So for folks who recognize that in that possibility for themselves like keep that in mind that you shouldn't give up too easily just because the data is not super high quality for what you're trying to do with it. But at the end of the day, you are somewhat at the mercy of the materials that you have to work with the I guess. That's a really good point to close on. I guess this. This almost feels like an example where we can look at the outlier or look at the the edge cases and learn from them like probably no one or very few people who listen to this podcast are going to be working at a place like Google lax. But you can but we can look at organizations like that. And see the the ways that they approach problems, and maybe water it down or diluted a little bit. And and find ways that we can apply to our own day to day. I think that's pretty cool. Yeah. So on that note, we'll attach a few pretty good articles that I found about Google X to linear digression dot com. It's kind of funny because I talk about it as the top secret Google thing that there's articles about it all over the web anyway. So we'll touch a few articles that I thought were kind of interesting. And if you're interested in. Versions of this from the past even perhaps more high profile and cutting edge in some ways than Google X or anywhere else. There's a really good book called the idea factory, which was all about bell labs, which was doing all of this and more in kind of the middle part of the twentieth century, another really great research lab. So if you're really into reading about this kind of stuff you've probably like book. Russians is a creative Commons endeavor. Which means you can share or use it any way you like just tell them. We said hi to find out more about this or any other episode of linear digressions Goto the new year digressions dot com. And if you like this podcast go and leave us review on I tunes. So other people get so listen to this content. You can always get in touch with either of us. Are emails are Ben at linear digression, sti-, calm and Katie had linear digger dot com. In case, you have comments or suggestions for future shows, you can tweet us and Lynn digression. Thank you for joining us. And we'll see you next.

Google, The Idea Factory, Bell Labs, Ben, Katie
"lynn digression" Discussed on Linear Digressions

Linear Digressions

03:10 min | 2 years ago

"lynn digression" Discussed on Linear Digressions

"Expectations. You know, say that this feels like a good development for people like me who might be interested in getting into machine learning. But frankly, it seems like the the hurdle to get a machine learning sorry to the hurdle to get data side job as data science is kind of huge unless you already have a good amount of the Right Math. And and all of that. It's interesting to consider that now as the field has become more mature. And you you have these new role title job titles in expectations coming out that someone who's just an engineer might actually have a lot to offer either as a data engineer or as machine learning engineer. I think that's a great point. Actually, there are a lot of there are a lot of, you know, online comment threads and stuff. Like that that I was reading studying for this episode. It seems like there's a lot of. People who've been fairly successful, actually, as you know, becoming software engineers, or you know, maybe they were software engineers before they thought data science sounded kind of interesting, and they just found ways to get a little bit closer to that type of work at their job. Whether it meant working with data, scientists if their company had them or being the person who raised their hand when there was some kind of interested in doing a little bit more sophisticated data stuff. And so that seems like a, yeah, it's an interesting path that has been pretty successful for some people where it's a move from software engineer a little bit more towards data science and the the field in general seems to be finding that really valuable. So if you're one of those people in you're listening to this right now hats off to you. And I think you you might be onto something. Awesome. Well, I will let you know if I start my move into data, you should w sofa and. Pretty cool. I think you could fake your way through. Gaudy interview after all of these all of these episodes. I mean, but again like we've already talked about in this episode. I can't get the language, right? That's true. And you kind of need to you need to do it. You know, I think. Yeah. You need to spend some time to get your hands dirty. But yeah, I I I look forward to seeing what incredible things you have to contribute to the field of machine learning engineering. Please don't set the bar that. Aggressions is a creative Commons endeavor, which means you can share or use it any way you like just tell them. We said hi to find out more about this or any other episode of linear digressions, go to the new year, digressions dot com. And if you like this podcast go and leave us review on I tunes. So other people get to listen to this content. You can always get in touch with either of us. Are emails are Ben at linear digression, sti-, calm and Katie at linear degradations that come in case, you have comments or suggestions for future shows. You can tweet us at Lynn digressions. Thank you for joining us. And we'll see you next.

engineer, software engineer, Katie, Ben
"lynn digression" Discussed on Linear Digressions

Linear Digressions

09:22 min | 2 years ago

"lynn digression" Discussed on Linear Digressions

"Advocate. You know, having lots and lots of there's limited time and attention in the world to pay attention to these things. And so if there's a whole bunch of not very inspiring papers out there that are taking up people's resources to read then that's not great overall because it detracts attention from the other ones yet kind of I guess, it kind of messes with your signal to noise ratio. You've got a lot more to read. And maybe a lot more a lot less of it is nice high quality signal that you can learn important things from. Yeah. So I don't personally have a huge problem necessarily with sometimes breeding papers that I don't think are great. I try to curate them a little bit. And I don't necessarily always talk about them on the podcast because the whole point of the podcast is to talk about stuff, this interesting and impressive. But I do think that if you're a person who spends lots and lots of time reading deep learning papers or whatever else that that's something that you might be encountering. And it seemed worth talking about. So earlier you were talking about kind of those three I guess types and the third one was hey, we did a thing. And here's how you can do it too. And maybe in particular with deep learning where the data set is really important to getting the results that you get I guess that's kind of the case for everything. But what happens in cases, where whatever group or person or organization who put out this this algorithm in this paper. What if they can't share the data that goes along with it? I'm thinking medical research. I'm thinking proprietary research is that a concern? I imagine it must be and are there ways around that or ways that that's handled? Yeah, that's super tough. I'm remembering in episode. We did not too long ago. It was the one. About software two point. Oh, and one of the one of the ideas, there is the the data is part of the code in exactly the same way that the python or the R or the the bashes part of the code. And so, yeah, the idea that you would release a result. But you couldn't release the underlying data means that it becomes impossible to replicate the result, which means that if you think of the result as something that ideally is verifiable, and you can't do that. If you don't have the data set, but you make a great point that. Yeah. The flip side of few don't wanna be using stuff. Like, silly academic data sets that nobody really cares about us. Really interesting data sets that people do care about. Then the flip side of that is that. Yeah. Then there's ideally the expectations the data set would be released but in practice that's often not happening either. And it's leading to some. Some real problems where there's papers that come out that maybe described the algorithm in a bit of detail, but they don't release source code. They don't release the data sets. And so it's really hard to know if the results are true or not you kind of have to take it at face value that what the researchers say the truth. That's interesting said that it is correct action say the that makes it loaded in a way, I don't mean to. But that is correct. That almost feels silly. If you're sitting let's imagine Google is the example here, and they they want to they write a blog post or they release a paper about an algorithm that they obviously cannot release the data that they are operating on because it's proprietary or it's or it's private in that situation. If you're sitting inside of Google, you can say, hey this. 
This is the best thing that's happened to this industry, this algorithm that we've created or something like that. But if you're. Sitting outside of the walls of Google. It might even be a low quality like a low quality result or something that you can't necessarily even learn from in kind of the craziest example, I don't think that would ever happen because I think that a company like Google would only release things that they could, you know, provide value to the reader, but I guess the point I'm making is that it very much depends on where you're sitting whether what you're reading is high quality or kind of a to say it cruelly oh waste of done. Well, it's funny because one of the threads that are one of the sources that I was reading that made me think I wanted to talk about this a little bit. It's actually a thread on Reddit where people were complaining about Google, not releasing data sets are trained models that were associated with some results that they were claiming so so yeah, I mean, it's it's it's a real thing that really comes up. One thing. Also that at least I've learned by doing this podcast with you is that did anonymous is really really hard. And so you can't I mean, there are just some situations where there's no path forward. You know? That's that's safe in protecting the user data or protecting the proprietary information that you have. Yeah. So it's kind of a double bind. I don't know in some ways. I think it's, you know, more egregious to not release all the pieces of paper than it is to release a paper. That's not that impressive. But yeah, you raise your really good point that sometimes there is no way to release that stuff safely. I don't know. There's maybe still some middle ground where accredited researchers could get a hold of the data for reproduction in like academic environments or something like that. But yeah, you're right. It's not super clean. I guess that is just occurring to me that there are probably a lot of cases where organizations decide not to go through the effort of releasing results that otherwise they would because of these various incentives that kind of stand in their way, I could be I mean, it's hard to know. 'cause yeah, you know, it's like start to actually note, but by definition, you don't know about them. But yeah, that's conceivable to me that you're standing there. And you're looking at the options between going through all the hassle of dealing with this stuff and not releasing the result. Yeah. I can imagine not wanting to release the result, which is also not a great outcome for science. So I guess as a professional in this field knowing that there's low quality papers out there knowing that there are papers that even though they may they may be high quality ideas or equality, awkward items. They end up being affectively low-quality once they get to the pub. How do you as a professional interact with these kinds of things? Well, I would say that on average I read or start to read probably three to four blog posts are papers for everyone that I even consider talking about on the podcast, maybe the ratio is even higher. So I do a lot of just quickly scanning things and spend a lot of time trying to find the good ones and then spending more time on those. And then from that process you start to learn which sources have particularly consistently high quality papers you start to follow the citation trees that the good sources tend to link to other sources that kind of thing. 
So it is it is a place where you start to get to know your way around. But there's no easy solution that I've come up with you do have to you have to kiss a few frogs before you, find princes. But hopefully, the outcome of all of that work that we do is that most of the stuff that makes it on the podcast is pretty good. So hopefully, if you're listening to this you consider when you're digressions one of the better sources of content that that you use. I hope it is useful to you in that respect if. No, just stop at the if not. I'm sure it is. And you can always leave us a five star review at I tunes. Perfect. Yeah. So when that I think that's a great place to end it catch you again next week. Digressions is a creative Commons endeavor, which means you can share or use it any way you like just tell them. We said hi to find out more about this or any other episode of linear digressions Goto the new year digressions dot com. And if you like this podcast go and leave us review on I tunes. So other people get to listen to this content. You can always get in touch with either of us. Are emails are Ben at linear digression, sti-, calm and Katie at linear digger dot com. In case, you have comments or suggestions for future shows. You can tweet us at Lynn digression. Thank you for joining us. And we'll see you next.

Google, Reddit, Ben, Katie
"lynn digression" Discussed on Linear Digressions

Linear Digressions

07:58 min | 2 years ago

"lynn digression" Discussed on Linear Digressions

"The new state of the art going forward pretty soon. Okay. So I I wanted to ask a little bit more about how these pre-training auger them's actually work like how how do you do? You just go about training normally. And then there's some way that you could just rip rip out the first two layers than than graph them onto another auto them. Or do you need to do anything more in a more specialized way or something like that? I don't know that rip out as exactly the verbiage. I would use. So there's or copy. So there's three that came up in this blog post, again might be more. But I'll stick with these three for now. So the first one is called Elmo. So let me say a few words about that. So this is an algorithm that the the thing that's kind of interesting here is in word toback, for example, you represent each word with a vector, and that vector sort of captures all the information that you have about that word the thing that's hard about. That is the different words can actually have different meanings and so- representing it with a single vector kind of projects it down into this less informative space. That's interesting. I I wouldn't have thought about that. But yeah, I guess so you need to like infer you need to understand the context if you're gonna lost Leslie capture or less law C-league capture the. Information about the word, and what it means ESO, a good example, we did king and Queen earlier Queen is a good example. It's a word that depending on the context, they kind of might have all different all kinds of different connotations. You could mean Queen in the sense that we talked about it like the female ruler of a country, but it could also refer to chess piece it could be referring to a drag Queen. These are all very different ideas that the band. Yes. And so if you don't have all of those different all of those different representations available to you. Then, you know, taking a word to vac representation that has been trained to think that it's think of the Queen as the Queen of England. And then you give it a eighties rock band data set. It's going to be very confused, right? And. So this is this is where my sense of humor came from as a as a kid was just intentionally or unintentionally misinterpreting everything based on these multiple meanings of words. So we in a sense. I mean, this actually may be a good way to create puns. Well. Yep. Puns or just like my not very good sense of humor as a child. Well, so the name of this algorithm. This feels kind of appropriate is Elmo. Yeah. And so the stands for embedding from language models. So we're starting to get into the idea of this is a language model. Not just a word embedding. And you know, one of the innovations here is that it actually has multiple vectors that each word can have for representing it. I think technically an infinite number of vectors, and then there's some kind of complicated calculus that can that you can do on those factors. I'm not exactly an expert here, but part of the strength of this as it can take a whole sentence into account instead of just in the case, if we're to it'll just take into account kind of a sliding window that might be five words or something. So it's starting to represent not just the local structure around that word when it shows up, but it shows up it gives it the entire the entire sentences worth of context. And so when. When using Elmo what they'll do is they'll pre-training on these much bigger more general purpose corpus's of of text. 
And then they will use those as the initialization for these more challenging and topic-specific natural language processing tasks. There are six different ones that were discussed in this paper, and for each one of them they compared the standard benchmark, the state-of-the-art type thing, against an algorithm that was pre-trained with ELMo, almost just that first layer, and found that in general it did better, and in some cases it did much better. And so that suggests that this pre-training, transferring it from that first data set to the more specialized task, is paying off.

And then the other two algorithms, ULMFiT and the OpenAI Transformer, from what I can tell, are kind of similar, and these focus a little bit more on what the protocol of that transfer learning is, how that algorithm actually works. So the whole idea is that you have this first neural net that you train on one data set, and then you have to graft it onto a second problem, like you were asking about a few minutes ago. These papers are actually talking about the algorithm you would use to do that, and how there are a couple of different passes you might take: first pre-training the algorithm, then actually hooking up those lower-level language models to the neural net that's going to solve your problem for you, and then going through some training that helps refine those last layers of the neural net, so that they actually take advantage of the things that were learned in the first round. So it's kind of like an algorithm for neural net finishing school, where your neural net goes to community college for the first two years, where it gets to learn the basics for a lot cheaper, and then, I don't know, it transfers to Princeton for its last year, so it can take a couple of seminar courses and get a fancy degree. It's not a perfect analogy, but it gets at the idea.

It's a pretty funny analogy; I like that. So, for all of the research links, and much better and less fraught explanations in some cases than I gave, again, I highly recommend this blog post that explains it all, on thegradient.pub, and we'll have a link to that on lineardigressions.com. And again, the takeaway here is that natural language processing pre-training methods are starting to get pretty sophisticated, and in particular people are thinking about ways that we can pre-train neural nets with whole language models instead of just simpler word embeddings.

Linear Digressions is a Creative Commons endeavor, which means you can share or use it any way you like, just tell them we said hi. To find out more about this or any other episode of Linear Digressions, go to lineardigressions.com. And if you like this podcast, go and leave us a review on iTunes so other people get to listen to this content. You can always get in touch with either of us; our emails are ben@lineardigressions.com and katie@lineardigressions.com, in case you have comments or suggestions for future shows. You can tweet us at @lindigressions. Thank you for joining us, and we'll see you next week.
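The grafting-and-fine-tuning recipe described in this segment (pre-train an encoder on a big general corpus, attach a fresh task head, refine the top layers first, then optionally unfreeze the rest) might look roughly like the following PyTorch sketch. This is a hedged illustration, not the actual ULMFiT or OpenAI Transformer procedure, and the PretrainedEncoder idea, the dimensions, and the learning rates are hypothetical placeholders.

```python
# Rough sketch of the pre-train / graft / fine-tune protocol. Assumes some
# pretrained encoder module already exists; all names here are placeholders.
import torch
import torch.nn as nn

class TransferClassifier(nn.Module):
    def __init__(self, pretrained_encoder, hidden_dim=256, num_classes=2):
        super().__init__()
        self.encoder = pretrained_encoder                # trained on a large, generic corpus
        self.head = nn.Linear(hidden_dim, num_classes)   # new, task-specific last layer

    def forward(self, token_ids):
        features = self.encoder(token_ids)               # (batch, hidden_dim) sentence features
        return self.head(features)

def fine_tune(model, data_loader, epochs=3, lr=1e-3):
    loss_fn = nn.CrossEntropyLoss()

    # Stage 1: freeze the pretrained layers and train only the new head.
    for p in model.encoder.parameters():
        p.requires_grad = False
    optimizer = torch.optim.Adam(model.head.parameters(), lr=lr)
    for _ in range(epochs):
        for token_ids, labels in data_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(token_ids), labels)
            loss.backward()
            optimizer.step()

    # Stage 2: unfreeze the encoder and keep training at a lower learning rate,
    # so the pretrained layers adapt to the specialized task without being erased.
    for p in model.encoder.parameters():
        p.requires_grad = True
    optimizer = torch.optim.Adam(model.parameters(), lr=lr / 10)
    for token_ids, labels in data_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(token_ids), labels)
        loss.backward()
        optimizer.step()
```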

ELMo, Queen of England, Princeton, Katie, two years
"lynn digression" Discussed on Linear Digressions

Linear Digressions

02:22 min | 2 years ago

"lynn digression" Discussed on Linear Digressions

"Yeah. Sure. Yeah. I think it's really easy to do. I'm sure that there's been times that I've I mean, I think if you're if you're very experienced you kind of learn this stuff. Early on hopefully, and then you get used to it. And then you as one of my physics professors would have said you develop a healthy contempt for it. Then it becomes you know, it it's something that you you don't think about literally every time you run a linear model, but okay, because they're so easy to run because of python are and stuff anymore. You know, it it wouldn't be that hard to have a a pretty long career as a data. Scientist were you never really really had to think about this stuff. But it's really rewarding. Once you stop and think about it helps you appreciate a lot more. How deep some of this theory is and makes it I guess so you you don't take it for granted anymore. That's pretty cool. Yeah. I I definitely find that even the simpler things in computer science or programming knowing how they work under the hood, really gives you a better understanding of of what's going on and helps you think more critically about the way that you use these things, and maybe even be more creative in the way that you use them. Yeah. So if you're a data scientist or machine learning person who's just hearing about this for the first time because he just never. You quite digested that chapter in your stats book like go back and read it again, it's pretty it's pretty cool entire full stuff. There's a lot of deep connections between these ideas. Digressions is a creative Commons, which means you can share or use it any way you like just tell them. We said find out more about this or any other episode of linear digressions Goto, the new your take Russians dot com. And if you like this podcast, Gordon, leave us review on items. So other people listen to content. You can always get in touch with either of us. Are emails are Ben at linear, digression SICOM and Katie had linear Deger. In case, you have comments or suggestions for future shows. You can tweet us at Lynn digression. Thank you for joining us. And we'll see you next.

data scientist, Ben, Katie
"lynn digression" Discussed on Linear Digressions

Linear Digressions

05:31 min | 2 years ago

"lynn digression" Discussed on Linear Digressions

"Yeah. Okay. So here's the thing. He's with pizza podcast. I think now I just have an opinion. And then I'm gonna say it, and then I'll shut up so people say that real Chicago real people who live in Chicago, don't eat the deep dish pizza. The only the tourists eat the deep dish crew. But here's the thing is there's a really good pizza place. That's good for deep dish, that's two blocks away. And there's nothing on earth that I want more when I'm in the mood for deep dish pizza, then to go get a some lose. And. So I like deep dish pizza, but it's it's weird because they're almost like different foods. You know what? I mean. That's true. It's like there's a Namespace collision. I used to work with the talian post doc back in my back in my physics days. He we once took out for deep dish pizza, and like this isn't pizza. This is salty cake. What is? That's pretty good salty cake self cake. But it's is not a bad point. They're just like different. Teas salty cakes. Do. I'm not. So here's on the New York style stuff. I haven't spent that much time in New York. Sure. I would like it a lot better. If I did. But I it tends to be really greasy. And I'm not that into the the floppy stuff is kind of hard to eat, which I'm sure it's part of the charm if you're into that. But I mean, the flip side of that is that most Chicago style debate shifty have to eat with a knife and fork, which it is it's always actually my next. Terrible. Terrible. That was my next question because I saw a tweet, sir. Patrick Stewart had his first slice of real New York pizza. This is a couple years back. And that guy is that guy's pretty old like he's been around. And he is not had a slice of pizza in the true style. Where you're picking it up with your hand. And it just seems crazy to me to eat pizza with a knife and fork specifically if it's thin crust he'd had. Pizza before. But. Fork? Yes, he did like a proper British man. I mean, but if there's anyone on earth who could pull that off. Oh, of course, it would be Patrick Stewart. Yeah. So i'm. Whatever I think it's a little weird. But I'm not mad. He can do whatever he wants. Okay. I agree. Okay. Well, that's been your data science pizza roundup. So I got. Yeah. I like Chicago style too. Just in case anyone's wondering, I was we did. So what you're saying actually about breaking open the binary insane. I like how zones but. Let's don't like. Cousins are stupid. I'm sorry since you live in the bay area. And since we've already wasted, you know, three or four minutes of our listeners time, what's another one. Do you? Have you ever had Paci Paci? Yeah. What is is that a pizza place? Yeah. It's a I think they got a bunch of locations throughout the bay area. But they were always my favorite when I was going to school bell P A T X Y, and they are obviously paying me lots of money to say this. But their stuff is it's really pretty good. That's called Paci is I thought it was Patsy this like that's how I say it in my head. Oh, no. Because they actually have had that they have a thing somewhere. And it's I forget, you go to their website or something, and they're like you pronounce it Pau cheese. And so that's what I know. But it's pretty good. That's good marketing. Yeah. So if you're in the bay. Yeah. Okay. That sounds good. I'm I think I'm gonna go try that. I like blue line blueline pizza blue lines to. And then of course, if you're in Chicago, like I said, my favorite is Lou malnati's, jeered donnas. Also, pretty solid not as big of a fan of gino's and p quads is fine. 
But I don't get why people are so obsessed with it like, it's good. But it's not like as good as people act like it is sometimes, and then I think you'd is racist covered. All right. So we'll be covering New York bagels on. Working beginning. I do have. I do have a story about that. If you remind me, I'll tell you. But anyway, I guess we should sign off for now. All right catch you later. Digressions is a creative Commons endeavor, which means you can share or use it any way you like just tell them we said to find out more about this or any other episode of linear digressions Goto the new year, take rations dot com. And if you like this podcast go and leave us review on I tunes. So other people get to listen to content. You can always get in touch with either of us. Are emails are Ben at linear digression SICOM and Katie had linear decorations com. In case, you have comments or suggestions for future shows, you can tweet us and Lynn digression. Thank you for joining us. And we'll see you next.

Chicago, New York, Patrick Stewart, Patxi's, Patsy, Gino's, Lou Malnati's, Ben, Katie, four minutes
"lynn digression" Discussed on Linear Digressions

Linear Digressions

13:24 min | 2 years ago

"lynn digression" Discussed on Linear Digressions

"Convex optimization solvers. This is the just just tough this out there this distorting Q as in Q U E U E that word list. Yeah. Yes. Okay. So let's fall thread a little bit. So is there some way that for each page? We say what is the value of recalling it at any point since when it was last crawled what this is effectively saying is that probably the general shape that this function has is that if we've just recall if we've just crawled a page the value of re crawling it is going to be relatively low because the chances that has been updated or going to be relatively low. So we say that this is just like not a high priority. But there some function that says based on my guesses about when on average updates. I should expect updates to this page, then the value of. Recalling it is going to start increasing because the probability that it's been updated. As is going to start increasing until it reaches some comparatively high number, and then it might flatten out or something like this. So we have this function that describes how valuable it is to recall page as a function of when it was last crawled, and what you can do is that function will be particular to each page was just going to be a function of that page. Plus how long it's been since last time he crawled it, and you can sort your list of pages by that function. And then that if you can figure out sort of what that function is or how to formulate than that's a fairly simple way of doing that, q sorting. You know, I love that idea because that gives you a lot more express city, I guess in in those functions because for example, if you take linear digressions dot com, our webpage updates pretty consistently. Once a week usually on Sundays. And so in theory, if if whatever is generating the functions could figure that out then you could actually have a function that is very very low up until whenever that Sunday would be and then it kinda spikes up because it's a lot more likely that crawling on Sunday evening is going to yield the change whereas crawling on Saturday or Friday or Thursday or Wednesday are very unlikely to yield a change. So like, even if you have a relatively naive approach in creating these functions initially, you're the solution has the experts city in it that allows you to update that earlier component that creates the functions to be smarter, especially with websites that do periodic updates or, you know, probably there are other examples, I'm not thinking of and just to be clear like it. I'm not sure that the sleep. And they came up with for this one is quite that level of sophistication. So I'll arkie through the actual way that you would formulate this problem. But yeah, there's a lot of you know, basically, you could Spartan this up. But let's suppose that you don't have quite enough information to optimize to that level. Or maybe it's a website that isn't as predictable as as we are. So this brings us to how do you solve this optimization problems? So the first thing is we have the same objective function. Like, we still want to optimize the the average freshness. Waited over all of the pages on the internet, and it still subject to the constraint of there's kind of an upper limit on each host for how often it can be recalled or what the the minimum time between recalls is if you want to sort of put that the other way, and then there's a total amount of resources across all of the pages that Google has available. And so the first thing that you do is the first time that. We set up. 
And so the first thing that you do: the first time we set up these equations, it's very often in terms of the optimal time between crawls, so that's kind of the difference between when you last crawled a page and when you should crawl it again. But now we're going to rewrite those equations in terms of the re-crawl rate, and that helps out a little bit, because the rate is something that's independent of when the last time you re-crawled it was. This has a few problems, in the sense that the rate could be something that you would have to change through time. So if all of a sudden there's a huge burst of edits to a single page that's out of the ordinary, this would be maybe not optimal in a case like that. But other than that, it's still a pretty good approximation.

That happens to politicians when they announce that they're running for office; all of a sudden their Wikipedia page goes crazy.

Yeah, so there are some cases like that where it might not be perfect, but for most of your average cases it's pretty good. And I don't know whether this is literally how their re-crawl logic works; it might be a little bit of an academic exercise. But anyway. So then we have a constrained optimization problem, same as we had before, but there's another way that we can solve it here, and it's with a method called Lagrange multipliers. This is a way that you can solve certain types of constrained optimization problems that are subject to special rules about the objective function, the constraints, and the relationships between them. I'm not going to get into the details of that too much here, although there was a pretty good resource that I found when I was reading up on this this weekend, and we'll put it on lineardigressions.com; it's some notes from a college class or something. But the general point is that there are certain types of conditions you can have on an optimization problem, and when those conditions are met, which they happen to be in this case, then you can use this technique called Lagrangian optimization, or Lagrange multipliers. The way that works is: we have one objective function and we have two constraints, and what you do is you write a new function called a Lagrangian, and that is your objective function, plus what's called a Lagrange multiplier, just a number, multiplied by constraint one, plus a second Lagrange multiplier multiplied by constraint two; you add those to the objective function. So you're making one equation out of your objective function and your constraints, and it's all in there together in this term called the Lagrangian. And then you take the derivative of the Lagrangian with respect to the term in that equation that you want to optimize, set the derivative equal to zero, and then you can solve for the Lagrange multipliers. Now, that was a lot of math, and I don't expect that it was super easy to follow, but the point is that now you have an expression for the variable you're optimizing for, which is the re-crawl rate of each page. So now I have an equation for each page: what is the re-crawl rate of that page, expressed in terms of Lagrange multipliers? One of those Lagrange multipliers is basically a number, and it's the limit imposed by the host itself, and then there's a second limit that's the limitation of Google.
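In symbols, the construction described above looks roughly like this (a generic sketch of the Lagrange multiplier setup, not the exact formulation from the blog post): F is the weighted average freshness being optimized, r_i is the re-crawl rate of page i, and g_1 and g_2 stand for the two constraints, the per-host rate limits and Google's total crawl budget.

```latex
% Lagrangian: the objective plus a multiplier times each constraint
\mathcal{L}(r, \lambda_1, \lambda_2)
    = F(r_1, \dots, r_n) + \lambda_1 \, g_1(r) + \lambda_2 \, g_2(r)

% Stationarity: differentiate with respect to the quantity being optimized, set to zero
\frac{\partial \mathcal{L}}{\partial r_i} = 0 \quad \text{for each page } i
```

Solving those conditions gives each re-crawl rate r_i expressed in terms of the two multipliers, one tied to the host's limit and one tied to Google's resources, which is the per-page expression described in the transcript.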
And so we're saying that the re-crawl rate of this page has to be slower than the constraint imposed by the page host and the constraint imposed by Google. And then you can solve for those two numbers, because Google knows their number, and then you can go page by page: what's the maximum rate for this page, and just say the re-crawl rate has to be slower than that, and then that gives you the optimal re-crawl rate.

Yep.

All right. And I'm glossing over a lot of mathematical details; it's not quite that simple, and even that wasn't really that simple, because we're doing things like introducing all these new terms into the equation and taking derivatives of weird things. But my point is, it's a page or so of algebra; in the scheme of things, this isn't that bad. And what it does is, it tells you, number one, the optimal re-crawl rate for each page, and that's really nice, because then it's fairly straightforward to say how fast we're supposed to be re-crawling this page versus how fast we actually crawled it recently, and use that to make the decision about whether to re-crawl it again or not. And then the second thing that's kind of cool, which happens to be true of this Lagrange multiplier technique, is that you have these two terms contributing to the overall rate at which you can re-crawl: one of them is coming from Google and one of them is coming from the host. Based on which of those two numbers is bigger, you can actually figure out which constraint is holding you back at any given time. So if, in solving this system of equations, you find that the constraint that's really holding you back is coming from Google, then that's their hint that if they were to invest in more computing resources, they could actually increase the average freshness of the pages. Versus if they find it the other way, that's effectively saying that the pages themselves are limiting the re-crawl rate, and it doesn't matter how much more Google spends on computers, they're not going to be able to do any better than what they're doing right now. So that's one of the things that's kind of cool about Lagrange multipliers: they tell you, basically, which of the constraints ends up having the biggest impact on the overall answer that you get.

That's really cool. And then that allows you to make business decisions, like: we should spend this many more millions of dollars to increase the page freshness by this many percent, or something like that. And that's all quite predictable, and you can tell which kinds of pages will be affected in that way versus which won't be able to have their numbers moved. And then you can decide, okay, when you do all of those cost-benefit analyses, how many servers should we throw at this problem? It becomes an actually solvable problem.
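Glossing over the algebra the same way the episode does, the practical per-page decision can be thought of roughly like the hypothetical helper below (illustrative Python, not Google's actual logic): whichever of the two caps is smaller is the binding constraint, and only a binding crawl-budget cap can be relaxed by buying more servers.

```python
def planned_recrawl_rate(host_limit, budget_share):
    """Per-page re-crawl rate (crawls per day) under two caps: the host's own
    limit and this page's share of the global crawl budget. Illustrative only."""
    rate = min(host_limit, budget_share)
    binding = "host limit" if host_limit < budget_share else "crawl budget"
    return rate, binding

print(planned_recrawl_rate(host_limit=24.0, budget_share=2.0))  # (2.0, 'crawl budget'): more servers would help
print(planned_recrawl_rate(host_limit=0.5, budget_share=2.0))   # (0.5, 'host limit'): more servers would not help
```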
Yeah. And it also parallelizes really nicely this way, too, because for each page now, whether you should re-crawl it or not is a very simple function of the re-crawl rate imposed by the host, and you can parallelize. If you have many, many hosts that you want to be scanning over, then you can just have that number live on one machine for each host. You don't have to do some kind of complicated pooling operation where you have to send all those numbers into some central server that then sends them back out; you just have that number assigned to each host, where it lives locally. There is one global constraint that has to be updated across all of the machines, which is the overall resource allocation, but if there's just one number that you have to send out from the master node to all of the worker nodes, that's not that bad. So the point is that this is a solution that also parallelizes quite nicely relative to the one before, because there's just a lot less information that has to be aggregated up and then broadcast back out.

Nice, that's really cool. I learned something about web crawling.

Good. Yeah, so if this is really interesting to you, I highly recommend this blog post, and then I'll also include some of the lecture notes that I found particularly helpful here about how Lagrange multipliers work, because they are a little bit complicated. They do not lend themselves well to the podcast medium, unfortunately, but they're really, really interesting if constrained optimization is your jam.

I don't know, I mean, I learned all of the algebra that I know via podcast.

I'm sure that's not true. But if you want to learn more, maybe twenty minutes or so after I update lineardigressions.com at some point on Sunday, you'll be able to find the lecture notes.

Linear Digressions is a Creative Commons endeavor, which means you can share or use it any way you like, just tell them we said hi. To find out more about this or any other episode of Linear Digressions, go to lineardigressions.com. And if you like this podcast, go and leave us a review on iTunes so other people get to listen to this content. You can always get in touch with either of us; our emails are ben@lineardigressions.com and katie@lineardigressions.com, in case you have comments or suggestions for future shows. You can tweet us at @lindigressions. Thank you for joining us, and we'll see you next week.

Google, Lagrangian, Lagrange multipliers, Ben, Katie, twenty minutes