Building a curriculum for educating data scientists: Interview with Prof. Xiao-Li Meng

Linear Digressions


Hi everyone, this is Katie. This week we have a special interview with a guest: Professor Xiao-Li Meng from the Harvard Statistics Department. Xiao-Li, thank you for joining us. He's been on here once before, so if you haven't heard that episode, you should go check it out. It's all about the Harvard Data Science Review, a new journal that he started and that I'm participating in along with a number of other wonderful folks. But in this episode we wanted to talk about data science education in particular, because from your spot on the faculty at Harvard, I know that you see a lot of needs and are thinking very carefully about this. I know it's probably also of a lot of interest to folks in our audience who are trying to think about what education for data scientists should look like, and whether they are personally on the right track for having the right education to succeed in data science. So I think there's a lot to talk about there. Thank you so much for joining us.

You are listening to Linear Digressions.

To start out, just a bit of background for folks who maybe didn't hear your previous episode (although you should; we'll put a link to it on lineardigressions.com for folks who want to dig into that a little bit more). You're a professor of statistics, and you're very involved in what's called the Harvard Data Science Initiative, which sounds like maybe not a full-fledged department but a kind of cross-functional team of professors across many different disciplines around Harvard who are interested in data science and recognize it as this thing that is growing up and needs to be addressed in some way educationally. So do you want to talk a little bit about that group that you're part of, the HDSI?

Sure. The Harvard Data Science Initiative is really a university-wide initiative, put together by our provost, Alan Garber.
I thank him, actually, for writing an editorial on education for the first issue of the Harvard Data Science Review. I think it will be a very good place for me to start talking about data science education, because I am in full agreement with him in terms of how we think about education. He basically said there are three groups of students we should think about. One group of students is probably more like you and me: we want to be data scientists, or statisticians, or whatever (now everybody's a data scientist). These are the ones who are full-fledged, trying to be experts, so they get PhDs and do all those things. Their prime interest is in data science itself.

The second group is a lot of students who themselves understand the power of data science and want to use it to advance their own field, whether they are a physicist, a biologist, or in the humanities. Their interest is not necessarily in developing data science methodologies or theories; they want to utilize whatever we develop to really advance their field. That's the second group. For a university, that is probably a pretty large group.

The third group, which Alan identified, is essentially everyone else. They want to have some basic knowledge of data science. Being a citizen in this digital age, you have to know something about what all of this data science fuss is about: a basic understanding of what you read in the newspapers, so that you are not fooled by all kinds of claims made in the name of AI, for example. You have to understand what the real thing is. So basically there are these three groups, and the kinds of education programs you provide to them are very, very different.

Yes, right.

In fact, what's interesting is that I am currently involved in creating a new data science undergraduate course at Harvard. I'm on a team.
We have a team of statisticians and computer scientists, and possibly more people will be involved, and we have been debating among ourselves: do we want to create this as the first introductory course to data science, or do we want to create this as a general education course in data science? There is actually a really important difference. When you create a general education course, which is really serving the third group, you have to design it in such a way that you keep in mind that it may be the only course they will ever take.

Yeah.

It's related, but it's quite different from saying, okay, now we know those people will go on, like the first or second group. So we struggled quite a bit, because we wanted to do both, since we think everybody should do that, but these things are not easy.

Well, let me ask you another question that I suspect is a simple question, but one that has complexity, at least for me, underneath the surface. For you, what falls within the scope of data science? When someone walks out having taken that course, what abilities do you need them to have for you to consider yourself successful as an educator?

Absolutely. People always say "that's a million-dollar question." I want to say that's a billion-dollar question, because what we have been discussing, this team of faculty and postdocs, is what we call the learning objectives. That's what we do in backward design: let's settle on the learning objectives, and then think about what needs to be covered. But I want to answer your bigger question, which is what things should be included in data science. That turns out to be an incredibly hard question, because by now the term data science has evolved into what we call an umbrella term. It's a very broad umbrella term.

Okay, yeah.

It's very broad. It's very much like science.
People understand science. When somebody is a scientist, you know roughly what they do: scientists are physicists, chemists, biologists. So it's very much like that. It's very hard to think about. If you want to design a course called "Introduction to Science"...

Yeah, what do you put in that?

So I will say that, for me, one of the first things I want to put in, whoever the students are, whether it's for general education or for this kind of introductory course for those who will become data scientists, is to talk about data quality.

Yeah.

The first thing I want to talk about is to understand, forgetting all the methodologies developed later, that how you collect the data and how you process the data will have a serious impact on what you do later.

Absolutely.

You can actually teach that and talk about that without getting into either computer science or statistics. People understand, right? There's this whole concept of garbage in, garbage out, which most people understand. The only thing is that it gets complicated, and the media obviously helped to create this misperception: "Oh, with tons of data, it no longer matters." But in fact the worst part is when you have tons of data that comes with all kinds of biases. So we can talk about that, and you can go pretty far in getting people into the habit of thinking about data science starting from the data. Data science is about data first, so that would definitely be the starting point for me.

Once people get excited about that, then it's: okay, now that you understand these concerns, how do you go about collecting the data? How do you go about preprocessing it? And that's where the computer science definitely comes in, because you can't just talk about it without doing things, and you need to know how to process the data.
It's very important. And then how to analyze the data: that's where statistics comes in. Along the way, by talking about the data and all those issues, you already bring the whole ethics side in, right? So the philosophical and moral thinking, all those things, come in very naturally. That's where I would start, and I would definitely do some basic computer science.

But if this course is aimed at the general education level, the analogy I always use is wine connoisseurs. They can appreciate wine, and many of them don't really have much idea how to make wine, but they can develop the sophistication to appreciate it. That's what I would do at that level, because people should understand that data science itself has a deep end, with all these methodologies. You may not be the one to actually do it, but you should develop enough appreciation that when somebody tells you, "Oh, I did a regression for causal inference," you know when that sounds wrong. You need to be able to pick up on those things. So that's the level for general education.

For the ones we really want to introduce to the next level, we want part of the learning objectives to be that they will be able to actually do something instead of just appreciating it. That will be different: developing the kind of projects in which you actually do some analysis, interpret the results, and show why the results are right or wrong.
