Software Engineer, Scientist, Google discussed on Linear Digressions
Hey, everybody. So this week, instead of having yours truly and Ben here to explain some kind of data science this or that, I am here instead with a special guest. This is Joel Grus, and he is a world-renowned author, data scientist, software engineer, and point-of-view holder. And we're going to talk about all of those things in this episode. Joel, thanks for coming on.

Thanks a lot for having me. It's good to be here.

You are listening to Linear Digressions.

So Joel, I gave you a rather expansive introduction, because you are a man of many data sciences. So let me ask you to introduce yourself. Tell us a little bit about your background and how you've ended up in the spot where you are right now.

Sure. So originally, I'm a math person. I studied math as an undergraduate, and then I got some really bad career advice, because I mostly got it from math professors. And so after I graduated I went to math graduate school. After a couple of years of that I realized I did not want to be a mathematician, and I did not want to get a math PhD. So I dropped out and sort of went down a very, very winding career path that is probably not worth going into every detail of. Currently my job title is research engineer. I work at the Allen Institute for Artificial Intelligence in Seattle, which is a research nonprofit. We do basically AI research in areas like NLP, commonsense reasoning, vision, and extracting information from, for example, scientific papers. I'm on a team called AllenNLP, and in addition to having NLP researchers on the team, we make a library called AllenNLP, a deep learning library to help NLP researchers do research. And so most of my job is to work on that library. I've been at AI2 a little over three years. Before that, I was at Google as a software engineer for a couple of years. And before that, I did data science at a bunch of startups. I wrote a book called Data Science from Scratch that came out
in 2015, and the second edition just came out last month. So if you're in the market for an introductory data science book, Data Science from Scratch is the one I recommend, because I wrote it. I have various other sort of minor claims to fame. I'm the "I don't like Jupyter notebooks" guy: last year I gave a talk at JupyterCon called "I Don't Like Notebooks," and it went somewhat viral. I wrote a blog post once about solving Fizz Buzz in an interview situation using TensorFlow, and that also went somewhat viral.

Yeah, that's a favorite in our department. We like to send it around.

Yeah. Sometimes I make live coding videos. I used to do this stunt where I would live-code a neural network library from scratch. I'm sure there will be more stupid stunts like that in the future; I just don't know what they are yet.

So, in prepping for this, I was kind of struggling with whether to describe you as a software engineer who knows a lot about data science, or whether you think of yourself as a data scientist or researcher who has strong opinions about software engineering. To talk about your book a little bit, Data Science from Scratch: one thing that it really emphasizes, in a way that I think is super valuable, is a way of thinking about data science that has a strong emphasis on software engineering, and on the right way to be thinking about these problems, not just from the methods standpoint, but from a coding standpoint. So how do you think about that? Do you think of yourself as a person who is thinking about how to solve data science problems, where the software engineering is a means to an end? Or is there something, maybe calling back to your software engineering past, that's a little more deeply held than that?

So I was a data scientist before I was a software engineer.
I was working at a startup called VoloMetrix, and we were doing analytics on enterprise collaboration data: going into a big company and looking at who's emailing whom, who's meeting with whom, how often, on what topics, things like that. And I was a data scientist. I was not at all a software engineer, but because it was an analytics company, the data science sort of was the product. And so I wrote a lot of production code, and a disturbingly large fraction of the product was stupid things that I built in D3 kind of for fun and showed to the CEO, and he said, "Oh, let's ship that." So what happened was this product had a lot of code in it that was written by me, a data scientist and not a software engineer, and it was not good code, and supporting it was a real pain in the ass. So that was one thing. The second thing is I discovered that I actually really liked writing that production code, and I wanted to be good at it. And I wasn't getting good at it by being a data scientist writing production code. So I made a somewhat deliberate choice to say, I really want to get into software engineering and build out that part of my skill set, so that I can be more valuable and do more of that kind of work. So I took a hard pivot and said, I'm going to do less data science and more software engineering. I went to Google, and my job at Google had nothing to do with data science in the slightest. It was building backend systems in C++ and building benchmarking tools to help ad salespeople sell more ads, and things like that. And so that accomplished my goals of learning more about how to build good software and what software engineering best practices look like. But at the same time, after a while I thought, I want to be closer to the data. And so I tried to bring myself back, which is how I ended up at AI2 in this sort of hybrid research engineering role, where I work with researchers.
I'm expected to understand deep learning and write deep learning code, but at the end of the day, I'm really kind of a software engineer who understands research.

So that's really cool. So tell me a little bit more. One thing I'm wondering about is contrasting the team that you were on at VoloMetrix a few years ago, because that sounds a lot like what a lot of data scientists at various companies are working with right now, where they're kind of a team of one, or they're out there trying to figure out what good code looks like, versus the team that you're building out and the work that you're doing now at AI2. What's the hypothetical future state, a little bit cooler and more advanced, that you're building around yourself right now?

So I was very early at VoloMetrix. I was the second employee. So the first year I was there it was the CEO, the lead developer, and me. And the lead developer was very opinionated, and he was really a geek about software engineering. He would read all sorts of books about software engineering. And when we would disagree about the way to build things, he would really try to bulldoze me: "Oh, well, Kent Beck says this," you know, "and Uncle Bob says this." And I really struggled keeping up my end of those conversations. I don't recommend doing this, but I sort of recommend doing this: eventually, I memorized the names of all the people he would cite as gospel. And sometimes in arguments, I would say, "Oh, well, Kent Beck says that no, we should do things my way." "Really? Kent Beck said that?" No. But that was kind of fun, to mess with him. So anyway, it was such a little team that the standard division between data science and software engineering just sort of didn't exist. We were three people in a room, and as time went on,
we basically tried to bring in a CTO who could come in and say, we need to have more discipline around writing unit tests, around code reviews, around various standards, around how we do our source control. And they ended up bringing in someone that I did not get along with. I wouldn't say that's why I left, but it certainly didn't help. And so, the difference: I feel very fortunate in that my team at AI2, the team of people building the AllenNLP library, really shares a deep commitment to strong software engineering discipline. So if you were to look at AllenNLP, and I'm tooting my own horn a little bit here, it's really one of the highest-quality deep learning codebases you will find. It uses Python type annotations everywhere, and we use mypy to check them as a pre-commit check. We have a pylint pre-commit check. We have extreme test coverage. We have automated documentation building, and we have a pre-commit check to make sure that the documentation actually builds. And we're very hardcore about this, even when people come in with external pull requests: okay, you can contribute, but you've got to write more tests, and you've got to type-annotate your code, and you've got to do all the stuff. So I feel very fortunate to be on a team that is of a single mind on this code quality issue right now, and it's very hard for me to imagine ever again working on a team where that is not the case.

Cool, cool. And one thing I like about the talks that you give, like you mentioned a couple of them at the top, but the ones I've read recently are the one about "I Don't Like Notebooks," a very provocative title, and one more recently about reproducibility, and how the idea of reproducible science has a lot of very strong analogs with software engineering best practices, principles, what have you. And as for the slides themselves, not all talks are like this.
But you can actually sort of follow the narrative of the talk through the slides; you don't have to have been there in person. So some folks who are listening to this might have actually seen those talks. But for those who haven't, we'll post links on lineardigressions.com, and you can actually go in and get most of it out. But the reason I mention this is that it seems to be a thing that, as you're going out and giving talks publicly about what you think people should be paying attention to, it's not just a local thing that you've set up on your team, where they're using these software best practices; you're trying to bring that message to the broader data science community. And for folks who either don't know that this is a way to think about building things, or know it but don't really know how to start unpacking it, that seems to be a pretty sweet spot for some of this stuff that you're talking about.

That's right. So with the reproducibility, there are sort of two mirrored aspects to it. One is that if you care about reproducibility, and a lot of researchers do, or at least say they do, then adopting software engineering best practices, anything from unit testing to Dockerizing things to making a real clean separation between your library code and your experimental code, will help you accomplish that goal of reproducibility. And then there's the flip side, which is an angle I've been taking a little bit more recently: if you're an engineer who thinks that researchers need to adopt software engineering best practices more, then you can offer up this carrot of reproducibility. If you do all these things, you'll get reproducibility.
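
To make the practices named in the conversation a little more concrete (type annotations, unit tests, and a clean separation between library code and experiment code), here is a minimal sketch in Python. It is not taken from AllenNLP or from any of the talks; the function names `split_data` and `run_experiment`, the data, and the seed values are all hypothetical, chosen only for illustration.

```python
"""Hypothetical sketch: typed 'library' code kept apart from 'experiment' code."""
import random
from typing import List, Tuple


# --- library code: typed, deterministic given a seed, easy to unit test ---
def split_data(
    items: List[int], test_fraction: float, seed: int
) -> Tuple[List[int], List[int]]:
    """Shuffle a copy of `items` with a private RNG and return (train, test)."""
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    rng = random.Random(seed)  # private RNG, so global random state is untouched
    shuffled = list(items)     # copy, so the caller's list is not mutated
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# --- experiment code: a thin wrapper that only calls into the library ---
def run_experiment(seed: int) -> Tuple[List[int], List[int]]:
    """Hypothetical experiment: split the integers 0..9 with a 30% test set."""
    return split_data(list(range(10)), test_fraction=0.3, seed=seed)


# --- unit test: the same seed must reproduce the same split ---
def test_split_is_reproducible() -> None:
    train_a, test_a = run_experiment(seed=42)
    train_b, test_b = run_experiment(seed=42)
    assert (train_a, test_a) == (train_b, test_b)
    assert len(train_a) == 7 and len(test_a) == 3
    assert sorted(train_a + test_a) == list(range(10))


if __name__ == "__main__":
    test_split_is_reproducible()
    print("reproducibility test passed")
```

Because the randomness lives behind an explicit `seed` parameter in the library function, the experiment can be rerun and checked by a test, and the annotations give a type checker such as mypy something to verify.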