18 Burst results for "Hadley Wickham"

"hadley wickham" Discussed on Not So Standard Deviations

Not So Standard Deviations

06:54 min | 5 d ago

"It's sort of interesting because, yeah, I guess blogs and Twitter and all that were part of this older model that, like, decoupled those things. Right. Yeah. It was kind of an extension. The early websites were just like, let's take the old model and kind of make it electronic, right? Yeah, and it didn't fit. You know, you had banner ads and you had it just like a newspaper, right? And so it just didn't fit, and you could do it so much faster. It's not once a day, it's all the time. Yeah. So that's one thing I mentioned about podcasts. The other thing, and I feel like I didn't make this point very well because it was at the very end, but the last point, eight, was: if you don't like it, don't do it. For sure. I mean, I think, at least in academia, I feel like there is a lot of pressure, seemingly, to use every tool, you know: to go give talks, to start a blog, to have a social media presence, to have your podcast, to do all the things. Right. I tell people that. I mean, people don't really ask me now, but I would say five years ago people were asking all the time, like, how do I get into Twitter? And literally my response now is, like, develop an addiction to it. It's healthy that you're not on there, and for the people who are engaged all the time it has become like a workaholic kind of quick hit. I'm like, develop a hole inside and fill it with this, and then you'll get it. But basically I'm like, if you don't use it, don't start. It's not pulling you, so don't try to develop that. You know what I mean? Yeah. But I totally, I always try to sell people. And it gets so frustrating, 'cause... Sorry, can I just insert one thing here real quick? I think with both Facebook and Twitter there is an element of, like, you have to use it. Because it's like, I
think if you used any other product in the world and it harmed you even a fraction of the extent that some of these websites do, you'd leave. Like, why don't you use this knife? Because it cut me, or whatever. If you don't like a product, you don't use it. That's like a fundamental rule, right? But for some reason with social media it's like, even though I don't like it, I still have to use it for some reason. Yeah, it's like the PR. Yeah. It's just like, oh, you would never have a company that didn't have a Twitter presence. I guess you could if you're like an accounting firm or something, but I would argue most companies have a Twitter presence, right? I mean, most companies you've never heard of before, I mean, it's just... Consumer-facing products, I guess I should say. Yeah, that will, at a certain scale, at a certain audience. Anyway, sorry, I had to stick that in there. No, no, I agree, and I mean, I feel like in the social media age everyone became their own PR person without realizing it. You know what I mean? And everyone else's PR person. Like on Instagram, you're promoting products for free, you know, or you become a model. It's like people weirdly backed into careers and roles that they never actually wanted. They never intentionally chose to do it. They just saw other people doing it. So we've all become little micro-marketers, you know. And I mean, more than anything, that's what got me popular on Twitter, 'cause I'd be live-tweeting a conference and the conference speaker would follow me. That's how Hadley Wickham followed me: I was live-tweeting some sort of thing he was doing and he, you know, followed me. And so it's like I was being his PR person. And I'm just more aware of that now, you know, versus in the past I was just mimicking the behavior I saw.
And I genuinely want to support people. You know, there is a degree of, like, I do want to support people. I want to support products I like, I want to support people I like. So I do want to opt into being people's PR, but I want to know I'm doing it. Right. That's the... So there's that too, though: as academics you have to be your own PR person, 'cause you're running your little startup. But you don't have to do all the things, right? And I tell people that so much now. Actually, it's partly almost defensive, where I'm like, listen, I'm intentionally following one career path and cultivating my life a certain way, and that's not the only way, and I think actually most scientists wouldn't want to do it this way, and that's fine. Like, develop deep expertise. So, the company... like, I don't... there's so many ways to be successful that don't involve as much PR, you know. And so I try to say that, because I think what really bums me out is that people see the public presence and they see the PR and they're like, that person thinks they're better than me, and I'm just like, that's not the transaction going on right now. They're not trying to... I mean, I know a lot of people do try to act like they're, like, a deep learning expert so they can sell their course or stuff like that. But there's lots of people who are not trying to do that at all. Yeah. They're perceived as trying to do that, and that just bums me out, because it makes people adversarial when it's like, we don't need to be enemies. But it's social media; we do need to be enemies. Exactly. I've been watching the NBA and it is like... the Warriors aren't in the bubble. Uh-huh. Have you been watching this? Not really. No, and I fully admit that I've gotten back into basketball in the last few years. This isn't, like... I'm still learning the people, and it's kind of like I'm investing in a hobby. But
it's hard right now 'cause, like, everyone in there are people I don't like. Like, I don't want to see LeBron succeed. And it's such a downer. Challenge that, and just be like, can I just appreciate the sportsmanship? So I don't know, I'm thinking about, like, the sport of it all. Like, you know, it's like people want enemies, like, job..

"hadley wickham" Discussed on Linear Digressions

Linear Digressions

07:10 min | 3 months ago

"You don't necessarily know how to interpret that data, and so I guess it's kind of on the graphics maker to use the grammar to make sensible plots that are easy to comprehend. Yeah, I mean, one of the most interesting parts of this paper that I actually found was where Hadley Wickham starts to talk about how he thinks about defaults. Because as we're talking about this you can tell there's potentially many layers of complexity that you have to think about in order to compose a graphic that has all these things figured out. And of course, if you're someone who's writing a programming language, you might need to have all of that complexity get captured in the package or in the language somewhere, but for somebody who's using it, you don't necessarily want to have all of that complexity front and center. Instead you want to have a lot of it hidden away under sensible defaults. And so there are some places where Hadley Wickham is musing about what choices you want to make about what the sensible defaults are, and if someone violates a default, you know, what do you do? Do you want to warn them? Do you want to just give them the plot and move on? Do you want to make sure it's a conscious choice? Because if they're violating a default without realizing they're violating a default, that might not be good. Yeah, one of the asides I think I read somewhere was on the topic of how it's hard to make good plots in Cartesian, in polar coordinates rather, and he was just like, yeah, maybe we should just, every time someone tries to plot something in polar coordinates, give them a warning: this is probably a bad idea. That was a very opinionated statement, but I thought it was really funny. But yeah, there's all kinds of stuff that can go wrong at this point. Like, I was reflecting on this as I was reading and thinking about this stacked histogram example, this bug that used to confound me all the time when
I was starting out in physics, and then eventually I got really good at just working around it, accounting for it. But here's what the bug was: the plotting software that I would use, that would be making these stacked histograms, right, it would by default set the scale of the y axis according to the first layer that you plotted. Oh. So yeah, so what I would do, sometimes, I guess this didn't come up in stacked histograms, it would come up in just, like, histograms overlaid on each other. So, you know, I didn't have them adding up on top of each other, but instead they might be translucent, and so I could see all the different distributions at once. But I would inevitably just plot the first one that comes to mind, and then plot the second, the third. And, you know, let's say that the peak of the first histogram is at one hundred, but the distribution for the second histogram is a little bit different: it peaks at one hundred thirty. It's going to get cut off every time. We have these gross, gross plots where I have to figure out which one is the highest one, and then you say that's the top, and then you set the scale according to that one, you draw the second one, but then you had to, like, redraw them so that they were always appearing in kind of the same order and you didn't have people being like, you know, your colors are all different. And it was just such a hassle. These are the things that are so tough to get right, for exactly these reasons, and then they give you really gross plots or a bad user experience. Anyway, I digress. The last piece of a layered grammar component is a facet, and I think this one's one of the easier ones to understand. So a facet is the notion that we can have not just one plot that's, you know, a single image in a frame, and that's the end of it,
but instead many plots that we make are composed of different images that are tiled next to each other. So we might have a histogram of all of our, let's say, a histogram of all of our episode downloads where the scale of the x axis is the last three months. And then there might be another plot that we make at the same time with that same data, but it's the last twelve months, so there might be multiple different views of the same data. Or maybe we would have the same download histogram, but the one on the left is from the United States and the one on the right is from European countries. So there's other ways that you can slice and dice your data and put them into multiple different images that sit side by side, or you lay them out in a grid or whatever, so that the entire visual effect as a whole is that you have multiple different types of data you're comparing with each other in each of these different facets. So that's the last piece: you can arrange all of the different plots that you're making side by side. It's interesting to pull this apart. As a native English speaker I don't think about grammar very consciously very often, but in any other languages that I'm learning I think about it all the time. As a consumer of graphics, I don't really think about these different elements very often, but they're obviously very important, and having a conscious understanding of them probably makes for better graphs. Yeah, I think the vast majority of us have never written fundamental data visualization software. And so, yeah, we just take for granted that these things just work the way we understand. But yeah, I was finding myself really, really, really thinking hard about what a stacked histogram was this weekend, and that's not even that complicated of a plot, really. It's something that, you know, some days
I make thirty of them a day and I don't think anything about it. But yeah, once you start to decompose it and you say, like, what are the rules for making this visualization, and also that same set of rules, when applied to other types of data that have other types of statistical transformations or other types of geometries, is still mostly going to work and mostly give you something that looks aesthetically intuitive. Like, I find it a really interesting thing. It's a thing I take for granted, but it stretches your mind as soon as you start to try to unpack it. So for those of you who are interested, we'll of course have a link to the source content, this Hadley Wickham piece, on lineardigressions.com. It's got a decent number of visual examples, illustrations of some of these concepts. But hopefully this guided journey through a single type of data visualization gives you a greater appreciation for Matplotlib or.
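
The overlaid-histogram scaling problem described in this clip can be sketched in a few lines. This is only an illustration in plain Python, not the behavior of any particular plotting package: the function names, the bin width, and the sample values are all hypothetical. It contrasts scaling the y axis to the first layer with scaling it to the tallest bin across every layer.

```python
from collections import Counter

def bin_counts(values, bin_width):
    """Count how many values fall into each fixed-width bin (keyed by bin start)."""
    return Counter((v // bin_width) * bin_width for v in values)

def y_limit_first_layer(layers):
    """The default described in the clip: scale the y axis to the first layer only."""
    return max(layers[0].values())

def y_limit_all_layers(layers):
    """Alternative default: scale the y axis to the tallest bin in any layer."""
    return max(max(layer.values()) for layer in layers)

# Two made-up distributions, binned with width 10 (hypothetical data).
first = bin_counts([12, 15, 18, 21, 22], bin_width=10)
second = bin_counts([31, 32, 33, 34, 35, 41], bin_width=10)
layers = [first, second]

print("limit from first layer:", y_limit_first_layer(layers))
print("limit from all layers:", y_limit_all_layers(layers))
```

With the first rule, any later layer whose peak is taller than the first layer's peak gets cut off; the second rule avoids that regardless of plotting order.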

"hadley wickham" Discussed on Linear Digressions

Linear Digressions

13:53 min | 3 months ago

"Hey Katie. Hi Ben. How are you doing? What are we talking about today? We're talking about the grammar of graphics. The grammar of graphics, yeah. This is a visual episode in audio form, so let's see how this goes. This can be okay. You're listening to Linear Digressions. Okay, so I know what the term grammar means as it applies to language. It's kind of the rules about how you would construct sentences, and I'm sure that there are many people who could define it better than me, but that's kind of how I think about it. Yeah, that when we are using language to communicate, there's an order in which we place subjects and verbs and objects. There's a recursion to language, in the sense that you can have phrases that have substructure. There's also orders in which things tend to appear. Like, I would always say the big black car; I would never say the black big car. Yes, grammar is this thing that's a little bit hard to define, but once you start to think of it, it's pretty common to think of it in terms of the rules of language. I actually was reading something really interesting about this. It's a tweet by Matthew Anderson: things native English speakers know, but don't know why we know. And the quote is: adjectives in English absolutely have to be in the following order: opinion, size, age, shape, color, origin, material, purpose, noun. So you can have a lovely little old rectangular green French silver whittling knife. But if you mess with that word order in the slightest, you'll sound like a maniac. It's an odd thing that every English speaker uses that list but almost none of us could write it out. Yeah, I think I've heard something similar too, so I think that was what I was drawing on a little bit there. Green great dragons. No, great green dragons. Yeah, exactly. So we're not talking about language in this; we're talking about graphics. What, how, what does that mean? Yes.
So that's what we're going to spend the next fifteen minutes talking about a little bit, but the rough idea here is that, just like there's an expectation that you have about the word order or the construction of phrases when you're listening to someone speak or when you're reading a sentence, there's a similar idea, perhaps, for drawing visualizations of data or consuming visualizations of data: things that you expect to see whether or not you even really think about it, or, when you're composing a visualization, things that you're planning for or taking into account that, again, maybe you aren't thinking about. But this comes up in a really deep way if you are, say, dealing with data visualization software at a pretty fundamental level. So for those of you who are into the R universe, and particularly the tidyverse, Hadley Wickham's corner of the R universe, you're probably familiar with a package called ggplot2, which is a visualization library in R that famously makes very beautiful graphics; especially its defaults make for really nice graphics. The "gg" in ggplot2 refers to grammar of graphics. And actually, yeah, most of the research that I did for this episode was reading a twenty-five-page paper that Hadley Wickham wrote about how he thinks about, and how the field in general thinks about, the grammar of graphics in data visualization. So that's what we're going to talk about. Very cool. I don't even know where to start in thinking about this. This is going to be neat. Yeah, this was a pretty challenging topic for me to try to understand, because it gets into theory pretty quickly, of like, what is a facet, and what is a scale, and what is a,
what's the difference between a mapping to an aesthetic and a coordinate system? I think there's certainly a lot to unpack if you're just really excited about this idea. But rather than getting into some of these kind of esoteric concepts, especially concepts that are esoteric without having examples to look at, I wanted to illustrate the main pieces of the grammar of graphics, as Hadley Wickham, for example, talks about it, using an example of a visualization that probably a lot of people are really familiar with, and how that illustrates a few of the big important concepts that, again, we all kind of take for granted, probably, in our day-to-day visualizations. Okay, so what's the example graphic then? All right, let's talk about a stacked histogram. Stacked histogram, yeah. Can you describe it for me? Yes, so let me give you an example of a stacked histogram I used to make all the time when I was a physicist. So when I was a physicist we used to make lots and lots of plots where what you were trying to do was look at distributions of particles that you were getting in your detector. And in general there were lots of different kinds of particles that were classified as what we would call background, so these were types of particles that were, you know, interesting but not what we were really searching for. And then in certain situations you'd be looking for signal particles as well, so this might be like a Higgs boson if you're doing a Higgs search. And so when you were creating visualizations of your data, what you're looking for is, okay, do we have a distribution of data that's more consistent with there
only being background present, or does it look more consistent with background plus signal? For the second case it's like, oh, maybe we discovered some new physics or something. So we would think a lot about how to visualize background, and when you're doing that analysis you tend to have different kinds of particles that are coming in from different places in your detector, and so if you just look at one of those systems at a time you're going to get an incomplete picture of all of the particles. Instead, what you want is to layer them all on top of each other so that you have, yes, so that you have a picture of the overall distribution of the particles that you see, but you also have them stratified by the different types of physics processes that they correspond to. And so you're kind of stacking each of those strata on top of each other, and you have a visualization that shows, you know, each of them separately but also all of them adding together. That's roughly what a stacked histogram is. Got it. I think I've seen these before. I'm sure I've seen them in many places, but I'm thinking about when you do a software release and you look at all of the different computers that are running the software and what version they're on, and you can see how people have upgraded. Each version of the software will be represented by a different color, and over time you'll see them kind of grow and peak, and then as newer software is released the previous version will kind of trail off. And I guess the representation that you're talking about is showing all of that in a single graph with time, let's say, being the x axis. And in my example it's always at one hundred percent high, because every user is on some version, but you can see the, I guess, the distribution at any given point of those versions. Yeah, or if you decided to represent it not as a percentage of the whole, if your y
axis was allowed to float and instead it was the total number of users using that system, then you could imagine the overall rate could actually go up and down as users join or leave your system, or are using your software or whatever. So I have an image in my head now. Okay, great. And so hopefully most of the folks who are listening to this do too. But if you don't, or if you're really struggling to think about what a stacked histogram might look like, it might be worth taking, like, five seconds to Google this on your phone to get a mental snapshot, because I don't imagine that the rest of this will make tons of sense if you have no idea what we're talking about. So okay, stacked histogram: how do we think about this in terms of the grammar of graphics? So let me layer in a few of the fundamental ideas of the grammar of graphics, and these take place in a very explicit order. The first layer, the most foundational layer when you need to make a data visualization, is: what is the data set that you're going to be visualizing, and how does that map from the variables in the data set to a set of aesthetics? So what's the data set? Let's talk about that first. Let's use my example of... let's use your example, actually. I think that's probably a little bit more familiar to our listeners than a particle physics data set. So instead we have some notion of a data set that has all of the users of our software through time and the type of, what did you say it was, the version of the software that they're using? Yeah. And actually, can I make this a little bit meta and tweak this, and we'll say this could be Linear Digressions episode downloads? Like, we can go into our hosting provider and we can see how many people download on a given day, and so of course the day after we release an episode we see a lot of downloads, and then maybe two months go by and now that episode is a small sliver.
Yeah, yeah, no, I have seen that, right, where you see a spike on the day after an episode is released and then it tapers off, but of course there's a new spike that pops up the week after, because that's the next episode. So there's kind of layers of these decaying distributions that sit on top of each other. So if you want the total number of downloads for any given day, you have to add up the contributions from all of the different episodes: the one that dropped the most recently, but then there's probably also some episode from last week that's still getting some downloads. There might be an episode from a few months ago where there's a new listener who just subscribed and they're downloading some stuff from the backlog, so there might be little contributions from those. So yeah, you layer them all on top of each other, and then that all adds up to the total number of downloads that you have for the day. And the total number, you just look at the top of the graph. Exactly. But you can, you know, break it apart into each of the component layers. Right, okay. So let's think about the underlying data set. If you're SoundCloud, keeping all of these download records, in its core form what that probably looks like is a table that has a few columns. It's some identifier number or the title of the episode, so it's like, which episode is this? It's the day, and probably the time, that it got downloaded. And I think that's all that you need. So then when you're creating the histogram, you're working with that as your default data set. And so the second thing that you need at this level: you need a default data set and you need a mapping from a set of variables to aesthetics. So what are the aesthetics that we have here? Well, we have some notion that the left-to-right dimension of the histogram is through time, so we have some kind of notion that where a data point gets represented, left or right, gives us some kind of information about when it happened, in this case.
So that's an aesthetic choice. And one of the other notions that is worth introducing at this stage, but that we're going to really develop in the next step, in terms of aesthetics, is that there might be, let's say, different colors for each of the episodes that we want to represent. So maybe the episode that I released today shows up in my histogram as red for those downloads, but then the one that we released a week ago might be in yellow, and the one from the week before that might be in blue. So when you actually look at the visualization, if you have a stacked histogram but all of the stacks are the same color, then you're not going to be able to visually distinguish them. So there's also this notion that there's some way of mapping each of the different data points onto some kind of aesthetic representation. In this case we're kind of giving the example of color, but in the case of a scatter plot it could also be, like, are these stars, are they little circles, triangles, or whatever. Okay, so we have our default data set, and we have a mapping from the variables to something that's like, aesthetically, how are they being depicted in the graph. The second component of a grammar of graphics is introducing the notion of layers. So again, explaining this by example more so than by abstract definition, you can think of a plot not as being everything that is just dumped onto a set of axes or whatever, but instead that each of the different things that you want to represent is kind of like a layer of visualization that gets put on top..
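
As a rough illustration of the first two ideas in this clip (a default data set plus a mapping from variables to aesthetics, then one layer per episode stacked on top of the others), here is a minimal sketch in plain Python. It is not ggplot2's API: the table contents, the `mapping` dictionary, and the `stacked_counts` helper are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical download log: one row per download, as (episode, day).
downloads = [
    ("ep1", 1), ("ep1", 1), ("ep1", 2),
    ("ep2", 2), ("ep2", 2), ("ep2", 3),
]

# Column positions in each row of the table above.
columns = {"episode": 0, "day": 1}

# Aesthetic mapping: which variable drives which visual property.
# "x" is the left-to-right position, "fill" is the color of each stack segment.
mapping = {"x": "day", "fill": "episode"}

def stacked_counts(rows, mapping, columns):
    """Return {x value: {fill value: count}}, one stack segment per fill value."""
    x_col = columns[mapping["x"]]
    fill_col = columns[mapping["fill"]]
    stacks = defaultdict(lambda: defaultdict(int))
    for row in rows:
        stacks[row[x_col]][row[fill_col]] += 1
    return {x: dict(segments) for x, segments in stacks.items()}

stacks = stacked_counts(downloads, mapping, columns)

# The top of each stack is the total number of downloads for that day.
totals = {day: sum(segments.values()) for day, segments in stacks.items()}

for day in sorted(stacks):
    print(day, stacks[day], "total:", totals[day])
```

Changing `mapping` (for example, swapping which variable drives `fill`) changes how the same table is depicted without touching the data, which is the separation the speakers describe.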

Hadley Wickham Katie Hi physicist NJIT Google Matthew Anderson ESA higgs
"hadley wickham" Discussed on Linear Digressions

Linear Digressions

10:04 min | 3 months ago

"hadley wickham" Discussed on Linear Digressions

"Hey Katie Hi ben high you doing? What are we talking about today? We're talking about the grammar of graphics. The Grammar of graphics yeah. This is a visual episode in audio form. So let's see how this goes. This can be okay. You're listening to linear digressions. Okay so I know what? The term grammar means as it applies to language It's kind of the the rules about how you would construct sentences and I'm sure that there are many people who find better than me but that's kind of how I think about it. Yeah that when we are using language to communicate. There's an order in which we place subjects in verbs and objects. There's a recurring to language in the sense that you can have phrases. That have substructure. There's also Orders in which things tend to appear like I would say I would always say the big black car I would never say the black big car. Yes grammar is yes this this thing. That's a little bit hard to define but once you start to think of it is pretty common to think of it. In terms of the rules of language I actually was reading. Something really interesting about this It's so I just found it a tweet by Matthew Anderson things native English speakers. No but don't know why we know and the quote is adjectives in English. Absolutely have to be in the following order opinion size age shape color origin material purpose noun. So you can have a lovely little old rectangular green French silver WHITTLING KNIFE. But if you mess with that word order in the slightest. You'll sound like a maniac. It's an odd thing that every English speaker uses that list but almost none of us could write it out yeah. I think I've heard something similar to so I think that was what I would like drawing on a little bit in that Great Green Great Dragons. No Great Green Dragons. Yeah exactly so. We're not talking about language in this talk of graphics. What how what does that mean yes? 
So that's what we're going to spend the next fifteen minutes talking about a little bit but the rough idea here. Is that so just like? There's an expectation that you have about the word order or the construction of phrases when you're listening to someone speaker when you're reading a sentence. There's a similar idea. Perhaps for visualizing drawing visualizations of data or consuming visualizations of data. Things that you expect to see whether or not you even really think about it. Or when you're composing a visualization things that you're planning for or taking into account that again. Maybe you aren't thinking about but this comes up in a really deep way if you are say. Dealing with data visualization software at a at a pretty fundamental level. So for those of you who are into our universe and particularly The tidy verse Hadley Wickham 's corner of the our universe. You're probably familiar with a package called G. G Plot to which is a visualization library. In our that's can famously makes very beautiful graphics especially with its its defaults make for really nice graphics. the gee-gee NJIT PLOT TO REVERSE TO GRAMMAR OF GRAPHICS and own. And actually. Yeah the most of the research that I did for. This episode was reading a twenty five page paper. That had they wickham wrote about how he thinks about. And how the field a general thinks about the grammar of graphics. Data visualization says where. We're going to talk about very cool. I don't even know where to start in thinking about this. This is this is GonNa be neat. Yeah this this was a pretty challenging Topic for me to try to understand because it gets into theory pretty quickly of like what is a facet and what is the scale and what is A. 
What's the difference between a mapping to an aesthetic and a coordinate system? I think there's certainly a lot to unpack if you're just really excited about this idea. But rather than getting into some of these kind of esoteric concepts, especially concepts that are esoteric without having examples to look at, I wanted to illustrate the main pieces of the grammar of graphics as Hadley Wickham, for example, talks about it, using an example of a visualization that probably a lot of people are really familiar with, and how that illustrates a few of the big important concepts that, again, we all kind of take for granted, probably, in our day-to-day visualizations. Okay, so what's the example graphic then? All right, let's talk about a stacked histogram. Stacked histogram, yeah. Can you describe it for me? Yes. So let me give you an example of a stacked histogram I used to make all the time when I was a physicist. When I was a physicist we used to make lots and lots of plots where what you were trying to do was look at distributions of particles that you were getting in your detector. In general there were lots of different kinds of particles that were classified as what we would call background, so these were types of particles that were, you know, interesting but not what we were really searching for. And then in certain situations you'd be looking for signal particles as well, so this might be like a Higgs boson if you're doing a Higgs search. And so when you were creating visualizations of your data, what you're looking for is: okay, do we have a distribution of data that's more consistent with there only being background present, or does it look more consistent with background plus signal? For the second case it's like, oh, maybe we discovered some new physics or something.
So we would think a lot about how to visualize background, and when you're doing that analysis you tend to have different kinds of particles that are coming in from different places in your detector. If you just look at one of those systems at a time you're going to get an incomplete picture of all of the particles. Instead, what you want is to layer them all on top of each other, so that you have a picture of the overall distribution of the particles that you see, but you also have them stratified by the different types of physics processes that they correspond to. So you're kind of stacking each of those strata on top of each other, and you have a visualization that shows each of them separately but also all of them adding together. That's roughly what a stacked histogram is. Got it. I think I've seen these before. I'm sure I've seen them in many places, but I'm thinking about when you do a software release and you look at all of the different computers that are running the software and what version they're on, and you can see how people have upgraded. Each version of the software will be represented by a different color, and over time you'll see them kind of grow and peak, and then as new software is later released, the previous version will kind of trail off. I guess the representation that you're talking about is showing all of that in a single graph, with time, let's say, being the x-axis. In my example it's always at one hundred percent height, because every user is on some version, but you can see the distribution of those versions at any given point. Yeah, or if you decided to represent it not as a percentage of the whole, if your y-axis was allowed to float and instead it was the total number of users using that system, then you could imagine the overall height could actually go up and down as users join or leave your system, or are using your software or whatever.
So I have an image in my head now. Okay, great, and hopefully most of the folks who are listening to this do too. But if you don't, or if you're really struggling to think about what a stacked histogram might look like, it might be worth taking like five seconds to Google this on your phone to get a mental snapshot, because I don't imagine that the rest of this will make tons of sense if you have no idea what we're talking about. So, okay, stacked histogram: how do we think about this in terms of the grammar of graphics? Let me layer in a few of the fundamental ideas of the grammar of graphics, so that they're taking place in a very explicit order. The first layer, the most foundational layer when you need to make a data visualization, is: what is the data set that you're going to be visualizing, and how does that map from the variables in the data set to a set of aesthetics? So what's the data set? Let's talk about that first. Let's use your example, actually. I think that's probably a little bit more familiar to our listeners than a particle physics data set. So we have some notion of a data set that has all of the users of our software through time and the type of, what did you say it was, the version of the software that they're using. Yeah. And actually, can I make this a little bit meta and tweak this? We'll say this could be Linear Digressions episode downloads. We can go into our hosting provider and we can see how many people download on a given day, and so of course the day after we release an episode we see a lot of downloads, and then maybe two months go by and now that episode is a small sliver.
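
As a rough sketch of the stacked-histogram idea described above (this is not from the episode), the following pure-Python snippet takes made-up download counts, maps day to the x position and episode to a stacked layer, and computes where each layer starts and ends. The episode names, the numbers, and the helper `stack_layers` are all hypothetical; the point is only that each layer keeps its own height while the running baseline gives the overall total.

```python
from collections import defaultdict

# Hypothetical data: (day, episode, downloads). Illustrative numbers only.
records = [
    (1, "ep1", 120), (2, "ep1", 60), (3, "ep1", 20),
    (2, "ep2", 150), (3, "ep2", 70),
    (3, "ep3", 200),
]


def stack_layers(rows):
    """Return {day: [(group, bottom, top), ...]} for a stacked layout.

    Mapping assumed here:
      x position -> day
      layer      -> episode (one stratum per group)
      height     -> downloads, stacked on top of the layers beneath
    """
    per_day = defaultdict(dict)
    for day, group, count in rows:
        per_day[day][group] = per_day[day].get(group, 0) + count

    layout = {}
    for day in sorted(per_day):
        bottom = 0
        layers = []
        for group in sorted(per_day[day]):
            top = bottom + per_day[day][group]
            layers.append((group, bottom, top))
            bottom = top  # the next layer starts where this one ends
        layout[day] = layers
    return layout


layout = stack_layers(records)
# The top of the last layer on each day is the overall total for that day.
totals = {day: layers[-1][2] for day, layers in layout.items()}

for day, layers in layout.items():
    print(day, layers, "total:", totals[day])
```

Dividing each `bottom` and `top` by the day's total would give the "always at one hundred percent height" version mentioned in the conversation; leaving them as counts gives the version where the overall height floats.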

Hadley Wickham Katie Hi physicist NJIT Google Matthew Anderson ESA higgs
"hadley wickham" Discussed on Not So Standard Deviations

Not So Standard Deviations

15:25 min | 6 months ago

"hadley wickham" Discussed on Not So Standard Deviations

"I feel like as that intersection gets smaller, it seems like it's no longer sustainable, and as software kind of grows in adoption and influence, those Venn diagrams tend to intersect less. So my question is: do you think the open source model is strained in a situation where, I mean, in the extreme, the users and developers are kind of totally separate? It is strained. It is strained, because there's this whole thing, if you read the literature about open source, this idea of scratching your own itch, and that some of the very best software created, period, open source or proprietary, is when the developer and the user are the same person, or the developer has a way to relate really, really closely to all the users. And so for any software that becomes a challenge. Now you sort of need additional mechanisms to make sure that the conversations are being had with all the right sorts of users, and that all that input is coming back, and then that all that input is being triaged against all the other things that people could be working on. So it starts to look more like traditional professional software development as those diagrams intersect less. Right, yeah, 'cause in traditional software development, even like product development, you have more formal mechanisms for getting user input, like user research. Yeah, it's a discipline, you know. Right, developers don't always naturally just go out and try to hear and listen and incorporate all that user feedback. It sort of has to be built into the process. Well, that's interesting too, because I feel like that's something that Hadley Wickham was just naturally good at without necessarily even being able to articulate it. Yeah, that's right. That's the really rare, happy confluence of someone who understands the domain really well, understands the user really well, and understands the software design really well.
Hoping to find those folks is not super repeatable. Yeah. That would be a great solution, if we just say only let those people design software, but there aren't many of them. I will say this: if you look at R packages, where very much the users and the developers are kind of the same, that tends to work very, very well. R packages tend to work pretty well. And also there's the fact that the long tail of approaches, methods, and domains is actually not served very well by one big centralized organization saying, here are the methods that you get to use. So I think in that case development within academia and directly by practitioners, even without formal funding, can work quite well and has worked well. Do you see some sort of, like, graduation process? Like, I suppose a process where every once in a while one of these user-developed packages gets popular and then it starts to need that support. And so how do you see that? Does RStudio kind of consciously decide to start to take in certain packages? I think it's more just by domain. So, you know, we've had what's in the tidyverse, data manipulation and visualization sorts of things, and now what's in tidymodels, so modeling and all the other things around that. So it's been more like large horizontal domains that are of concern to a large percentage of our users; those are the sorts of packages we can adopt. But it wouldn't be like, oh, we see this great package to do something to do with time series. I'm just thinking of that, like tidyquant. There are some, yeah, really good R packages for that. I mean, time series starts to get into, like, lots of different people doing time series.
So maybe that's not the best example, but it's kind of an in-between example, right, because it is pretty specialized. So we've tended to just work on things that are likely to be useful for like seventy percent or eighty percent of our users. And then do you strategically hire? Because I feel like you kind of see this from the outside, where it's like, oh, they hired Max Kuhn, and I'm like, you know, this is the tidymodels guy, and so I was excited, and it just seemed organic. At that point I felt like I understood what RStudio was trying to do, and I think it was really important to have the formalization of that, for people who are less trusting than me, I guess. But yeah, how do you decide? Like, who's in the room to decide, okay, we want to go after, you know, let's think about hiring Max Kuhn? I think there's a few of us that talk about that sort of thing, and it's more like: do we think it's important that we provide a way of doing modeling that's consistent with tidyverse principles? That's one way to think of it. Another way would be: is having great modeling interfaces going to be really important to the future of R, its competitiveness and effectiveness and learnability and all those things? And the answer's yes, it's really important for R. And then it's the sort of thing where we'd consider hiring someone to work on it. I'll say this: in that case it's driven first off of the category. We think we should do something with modeling first, and then we say, okay, who's out there that might be interested in working on this? I want to ask about this seventy percent number, because now that you've said it, you know, it's going to be hard-coded into your corporate charter. But this idea of trying to do something that will be useful to some large percentage of the user base, do you think, I mean,
I just feel like, as I've been using R, you know, the composition of the community has grown more heterogeneous. Do you foresee a time where that'd be really hard to do? Yeah, yeah, that's fair. Yeah, because there are people who use R that don't do modeling at all, or who principally do data visualization. So yeah, that is. But I guess as the absolute size of the community grows, there'll still be a lot of people out there, even if it's not, like, yeah. Yeah, it's mostly about, like, impact, and also just about making sure that R is the best place to do data analysis. So what are the pieces that need to be in place? Well, yeah, my question is going to be kind of similar. What you're talking about is suggesting that you do have a goal of expanding the number of people using R. Yeah. And so where's that coming from? I think you touched on it a little, but I'm interested, because not everyone approaches software like that. Yeah, well, because I believe in a couple of ideas that I think come together with R. One is I really believe in domain-specific languages, and languages where the way you express yourself in the language is very closely related to the way that you think. Yes. And I think R is very, very good at that. I think because of that it's an environment that can be learned easily by lots of different people, including people who don't have an engineering background. There's a lot of data analysis that's done today using Excel and point-and-click tools and things like that, and we actually think that most people would probably find it easier to use R if they learned it. Yeah.
So we like the idea of a language that's very, very expressive and very easy to learn, even, again, for people who are not comfortable with programming, as a means of evangelizing, you know, serious data science: data science where you write code to solve problems, data science where you can kind of solve any problem that you put your mind to solving. So that's one of the reasons we like R so much. And the other thing is, I think we like the idea of building this end-to-end system where the development tools like RStudio, the tools for writing and communicating, and the tools for analysis are all consistent and work really, really well together. So we think there's sort of a whole-is-greater-than-the-sum-of-the-parts effect, that by focusing on one language and environment and tool set we get to something that's really special. I mean, that's been my experience, as someone whose first language was R. I haven't branched out as much as many other data scientists, because there's this feeling of empowerment in that environment that I just don't have in other places. And so, digging further, why do you feel like it's important? Because there's a lot of statisticians, as I'm sure you know, that don't necessarily feel like more people doing data science is a better cause. And so I guess, yeah, what's motivating that? I feel like deep down there's probably a sense of, like, the world would be a better place if more... Yeah, yeah. I think that problems of knowledge, you know: why do things work the way they work? How are things likely to change? What causes something? What group is benefiting or being harmed? These kinds of questions, or what medicine is likely to work. All these things are incredibly consequential, and when we get them wrong, then a lot of harm can happen. So, you know, where I came from
originally, it was public policy. So it was like, the federal government's going to make decisions, you know, and regulations that are going to impact hundreds of millions of people, and so they'd better be as right as possible. And that cascades into other areas, as I say, medicine and business. So I feel like if we want to have a world that's just, we need to have a world that makes decisions based on as sound a basis of knowledge as possible, and using data to inform those decisions is really imperative. That's the specific. Then furthermore, there's a lot of people using data, you know. One of the things we believe as a company is that if you're really serious about doing data analysis, you need to be able to ask and answer every question, and shape and mold your analysis techniques and your data in every way that you can, and that means you write code. And that's something we'd like to get more people doing, because if you're not writing code, then the tools that you use basically just bound what's possible for you, bound the questions you can ask and answer. And so that's another big motivator for us, because lots and lots of people are, quote unquote, using data, but they're not necessarily doing it as well as they could. So that's one of the reasons we emphasize writing code and using R and things like that. One of the things I feel like RStudio does really well is to target the pain points in R that kind of prevent people from either using it for the first time or using it more than they would. I wonder what your view is in terms of what the limitations of R are, either for doing more data analysis or for getting new people into the community. I think of it as, there's different types of complexity. Maybe you've heard this characterization before, but there's what I would call accidental complexity, which is things that are complicated
not because they're inherently complicated but just because that's the way things have turned out, and then there's intrinsic or inherent complexity that you can't actually make easier. So, you know, accidental complexity is like fumbling around with my LaTeX environment to try to get it to do what I want. That is not an inherently complicated problem. It just happens to be that chaining together all the tools required to do it well requires more effort and thought and trial and error than it should. One of the things we try to do is to remove accidental complexity, remove things that are complicated just because we haven't built a good workflow for them or haven't built good ways for us to understand and manipulate what's going on, and leave all of the inherent or intrinsic complexity, because we can't do anything about that. So that's the idea. You know, again, a point-and-click tool is potentially, maybe, eliminating accidental complexity, but it's also at the same time masking intrinsic complexity. So I think what we try to target with our tooling is: let's get rid of accidental complexity, let's make a lot of things just work. But then when it comes down to wrestling with difficult data analysis problems, we're not going to try to eliminate any of that complexity. The analyst needs to struggle with those things. Another example of this, and you can kind of have it both ways sometimes, is R Markdown. We make it so it's very easy just to hit the Knit button and get a result, and that eliminates a huge amount of accidental complexity related to messing around with knitr, messing around with Pandoc and self-contained HTML and all this stuff. So that's great, but then, you know, we try to make the system itself flexible and extensible,
so that as you then get into more, like, I really want to do this and I want to do that, and I actually do want to mess around with the LaTeX, you can also express those things. I think it's really important to give people good solid handlebars that they can grab at first, that, again, don't obscure complexity that they need to be ultimately grappling with, and then let them over time find all the smaller knobs and dials required to get exactly...

developer US congress wrestling Bob product development Hadley Wickham Pan Dock Arsenio Kuhn Max Madison analyst
"hadley wickham" Discussed on Not So Standard Deviations

Not So Standard Deviations

04:57 min | 8 months ago

"hadley wickham" Discussed on Not So Standard Deviations

"Okay, so: summarise or summarize. Well, that one I honestly don't care. I use the, quote, American spelling. I mean, I'm sure people know, but the reason "summarise" lives is because Hadley Wickham, he's from New Zealand, and when he wrote this he did everything with S, because that's how you spell these words in New Zealand. And I think there was some defiance at some point, you know, it's like, no, you conform to me, and then he finally relented. Yup. I think in R, I mean, all those words, like color and colour and anything with an S or Z, they're all mapped to each other, right? So I don't think it really makes a difference. I can't think of a situation where it gets messed up. But I imagine that this exists solely so that people who, you know, speak some variant of British English can smile a little. Actually, it has been stated in the past that the official English of R is actually British, UK English. Really? That makes me like that more, like, have a principle and stick to it, right? Well, it was stated very clearly by Brian Ripley on one of the R mailing lists, like, a long time ago. That might have changed, for all I know. Is he in Auckland? No, he's at Oxford. Oh. Yeah, so anyway, we're actually, you know, the secondary English variant. But I mean, it's like, this is what happened to the world. French is the official diplomatic language. Right, right. Yeah, what a joke. America took over with Hollywood and whatever else. You know, when I was in Australia for a year, I bought a phone there. The phone was obviously localized to Australian English, so all the autocorrect stuff would autocorrect me to, like, turn the Z's into S's. Yeah. And so I felt like I could be a little bit smug there for, like, a year. There was someone, and I do not remember who, I feel like this is years ago, who used to tweet with British spelling, and I just assumed he was British.
I think everyone assumed he was British, and then it was like, no, no, it's just, like, autocorrect, I'm too lazy to change it. But it's funny how strong that signal was. Yeah, like if I was reading code from someone and it had summarise with an S, I'd one hundred percent assume that they were not American. Yeah, instantly. Would you change the code? This is a good question. There's someone who gave a really good presentation that I looked at the slides for, I did not attend, but it was talking about, like, why does it matter to format code? If you're copy-pasting from Stack Overflow, does it matter to make it look consistent? And they were sort of making the point that yes, it does. It's like a broken window thing, where if you assume some place isn't being taken care of, there's like a broken window, then you treat it worse. And so cleaning it up is worth it just for the mindset that it puts the user in, and the mindset the other people who contribute are in. And so I agree with that. So I do think having consistency within documents is very important, so I'd use the S if I was contributing to a project that was written in, like, S English. I'm sure there's a real word for this. Maybe it is just British English. Well, it's not. I think if you look at, like, the computer definition, there's like five different Englishes, right? So there's Great Britain, right, and then Ireland might have its own thing, and then there's like a million. So in terms of the system locale there's all these many different Englishes, but in terms of the subset of Englishes that use S instead of Z, and "ou", and so on. Yeah. All right. I hope that was sufficiently interesting to warrant an episode. I mean, are we ever? That's the real question.
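
To make the "they're all mapped to each other" remark concrete, here is a minimal sketch of two ways spelling variants can be made equivalent. It is written in Python rather than R and is not taken from any package's source: the function names, the `_SPELLINGS` table, and `normalize_args` are all hypothetical, chosen only for illustration.

```python
def summarize(values):
    """Return the count and mean of a non-empty list of numbers."""
    return {"n": len(values), "mean": sum(values) / len(values)}


# Alias: both spellings are names for the very same function object,
# so neither one can drift out of sync with the other.
summarise = summarize

# Hypothetical lookup table from British to American argument names.
_SPELLINGS = {"colour": "color", "centre": "center"}


def normalize_args(**kwargs):
    """Rewrite British-spelled keyword names to one canonical spelling."""
    return {_SPELLINGS.get(name, name): value for name, value in kwargs.items()}


print(summarise is summarize)
print(normalize_args(colour="red", size=2))
```

The alias handles function names, while the lookup table handles argument names; either way, downstream code only ever has to deal with one spelling.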

Brian Ripley New Zealand New Zealand S. official Hadley Wickham Auckland mets Australia principal UK Zim America Hollywoodland Ireland Britain
"hadley wickham" Discussed on Not So Standard Deviations

Not So Standard Deviations

12:22 min | 9 months ago

"hadley wickham" Discussed on Not So Standard Deviations

"Welcome to not so standard deviations this episode ninety one and I'm Roger Paying from the Johns Hopkins Data Science Lab and I'm here with Hillary Parker stitch fix in this episode. We're talking about the book the creative curve by Alan Gannett and how it might apply to data science and data analysis. Oh we hope you enjoy our discussion. Sell the creative curve. Yes right so we're discussing the creative curve by Alan Dinette. Get Yeah Yeah and I think I mentioned this was a true Amazon machine. Learning success success in that sort of clicking around books about creativity and design and this one came up and it was compelling enough kind of right up thet I decided to buy it and I'm very glad I did. Yes so that's how I came across the book because you told me so. That's a different kind of algorithm oppose. Yeah well but it's the same. That's the behavior. It's modeling if you if you read this and it's it's that's a model that's truly based on you know expert collective collective action although sometimes actually. It's kind of funny because One time I I was talking to Hadley Wickham about presentations. And he suggested lasted a couple books on presentations and so I went to buy them and then the suggested book was our data science okay. Because presumably he'd like suggested this book so often so sometimes they're odd behaviors. Come out because those things they have very little to do with each other so we still have to validate your algorithm. So okay you mean by reviewing this exactly right. Yeah well since you are counted this book I do. You want to maybe talk a little bit about why you do or do not like it and I'll give maybe just overall impressions and I'll get mine too. Yeah sure so I mean. I'll give like a brief my understanding or kind of my summary of the book. which is that? It's essentially. It's by someone who I believe. He's like in marketing or something like that. He gives his spiel about how. He's like a data nerd. 
Which I kind of hate as a phrase, but that's sort of how he phrases it. But it sounds like he was someone who was very methodical about the things he was doing. So, like, he talks about doing game shows and studying the other participants and figuring out, like, not just trying to be this person, but figuring out that the personality matters or whatever. So he's just someone who approaches the world in a very behavioral, scientific type of way, and so he points this sort of apparatus at creativity, in the realm of art and marketing and all of these different things. And so, you know, the subtitle is How to Develop the Right Idea at the Right Time, and I view this as a very practical analysis of what we actually mean by creativity. Such as, creativity is in some ways defined by the audience, so you can't act like it's in a vacuum. So it's kind of how to have good ideas and how to get those ideas accepted by an audience, and then analyzing the behavior of people who are deemed good creatives in order to find themes, so that he essentially lays out, here are the steps to take if you want to cultivate your creative craft within a certain field. And I just found it to be so pragmatic, with lots of good paradigms. When I was reviewing the book for this podcast, there were so many things where I was like, oh yeah, I actually think about that all the time now. Like, that's a catchphrase in my head when I'm doing work. And so, yeah, it was really different from some of the design thinking stuff, because the design thinking stuff was a really good way of articulating the work in a new way that really made sense, and this is more of a practical guide for how to do the work. A little bit less designy, although, yeah, a little bit less designy in that it's not about, like, I think it can be about art.
You know, so, like, something that isn't necessarily practical. I mean, I think that's one of the big things: design is like applied art, or something like practical art. So that was sort of my takeaway and why I wanted us to read it. I think there's a huge amount of practical advice for data science, data science product development, analysis, writing, et cetera. So that's my spiel. Sure. So just very quickly, the author says he's the CEO of TrackMaven, which is a marketing data and intelligence service. I think he's in marketing analytics, I guess. So I enjoyed reading the book. I especially appreciated part one of the book, which is basically myth busting, talking about the creative genius and how we kind of think of it in our minds, and how it's actually this other thing. And in particular, there are two parts that are critical. One is the thing that the person does, the output, right, and then there's the world in which it sits. And I think essentially that's what this book is about: if you place an idea in the world at the wrong time or the wrong place or whatever, that has a very different result than putting it out at the right time, at the right place. Even the same idea, right. So I appreciated that kind of decomposition: the way we think of genius, the way we think of creative people, depends critically on the population-level context that we are currently experiencing. Yeah, an example of that would be if you took, I don't know, some modern form of music like rap and you put it in front of seventeenth-century people. Even though I consider many songs to be creative genius level, it'd be completely incomprehensible back then. Exactly.
Yeah, I mean, just look at Kendrick Lamar, who won the Pulitzer Prize, right? That wouldn't have happened before, and it didn't happen before. So it's kind of like, in retrospect it seems obvious, but it's very insightful. Yeah, not to just interrupt your spiel a bunch, but I think that's a really good way of putting it. A lot of this book was like, oh yeah, that makes sense. It wasn't like, oh, that's a totally mind-blowing new idea, but he just broke it down and, you know, presented studies and laid it out in such a practical way, and articulated it very well. I feel like I remember it all, and it's actionable. Yeah, well, I think in some sense that's the application of his theory, which is that if you want to produce something that people will enjoy, it has to be recognizable in some way, right? And so I think part of the experience of being like, oh yeah, this all makes sense, is maybe why, I don't really know how popular it is, I assume the book's popular, that would be one of the reasons, because when people read it, it makes sense to them, it kind of jibes with them, right. But if it were extremely foreign and didn't make sense even after you read it, the success of it would be different, right? Absolutely. So now, I think I may disagree with you a little bit on the practical implications of the book, but only in part, right. So in part two of the book, he has these four laws of the creative curve. Some of that was useful, but, you know, the first three I thought were quite interesting. The last one, the iterations law, law four, I didn't get at all. I didn't really understand it. You'll have to explain that to me. That's funny. Yeah. Maybe first, I do want to define the creative curve, because essentially that idea I found to be a really helpful framing. And the idea, when he talks about the creative curve,
what he really means is that trends sort of follow this general, essentially, like, a bell curve, a Gaussian distribution. The picture that he draws in the book is literally a Gaussian curve. Exactly, yeah. He even says percentages sometimes that are very clearly one standard deviation. But the idea is, you can think of the peak of the curve as when an idea is at, he calls it, the point of cliché. So I think about this in clothing trends for sure, where you have some sort of trend come in on the runways and only the fashionistas, the fashion-forward people, sort of comprehend it. I think the average person, you show them runway looks, doesn't get those. And then that idea sort of gets, like, as you see fashionistas wearing it, it feels more approachable to people, and then you hit the point of cliché, where most people recognize it and are cool with it, and then you start to taper off. Going down the bell curve is like, it gets kind of passé, and then you become overexposed to it, and then you're like, okay, I'm done with this. And so, yeah, the sweet spot that he talks about is sort of at, like, one standard deviation above, where you're one of the people who has the idea. Yeah, it's starting to feel approachable, it still feels fresh. So just to be clear, if you're looking at a bell curve, what you're talking about is one standard deviation to the left of the mean? Yeah. And also, the x-axis is kind of like time or familiarity with an idea, and the y-axis is preference, right? So at the peak of the curve, people really prefer the idea, and they're pretty familiar with it. But then as familiarity increases even further, they actually like it less. Yeah, and he talks about studies with songs, where, like, you hear a song frequently.
It's sort of like, okay, interesting, and then you start to love it. And then eventually you are like, okay, I hate this song now, I've heard it a million times. And they did studies with undergrads where they played the same song to them a bunch, or something like they made up Chinese characters and showed them to people, and you had to see it a certain number of times before you had a more positive emotional reaction to it. So, I've thought about this before, you can also think of the creative curve as like a population density. Like, if you take any, I dunno, cliché idea, he also talks about it in terms of the distribution of the population that likes, well, now, that doesn't actually work, does it? But you can think about it as, if there's an idea, where do you fall on the curve? And if you're one standard deviation to the right of the mean, you're kind of a laggard, right? It's like jumping on the bandwagon later, when everyone else is sort of over it. Or you're at the beginning. I guess it's like, at what point in a trend, if you take one trend over time and the x-axis says, like, when did you adopt it, you can think of yourself on the curve. It'll be a distribution across the population of people. Exactly. The curve is either one person's journey over time, or you can think of the curve as a cross section of when people kind of catch on to the idea, or something like that. Exactly. Yeah. And that one's obvious, because I think everyone would know, like, do you jump on tech the moment it comes out, or do you wait until it's adopted, right.
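
[Editorial sketch, not part of the episode.] The hosts describe the creative curve as a bell curve with familiarity on the x-axis and preference on the y-axis, a peak at the "point of cliché", and a sweet spot about one standard deviation to the left of the peak. A minimal Python sketch of that picture follows. The unnormalized Gaussian form, the function name `preference`, and the values of `peak` and `spread` are assumptions made for illustration; the book's actual curve may be defined differently.

```python
import math


def preference(familiarity, peak=0.0, spread=1.0):
    """Bell-shaped preference as a function of familiarity.

    Assumption: an unnormalized Gaussian, so the value at the peak is 1.0.
    """
    z = (familiarity - peak) / spread
    return math.exp(-0.5 * z * z)


# Hypothetical scale: the peak (point of cliche) sits at 0, one "spread" is 1.
peak, spread = 0.0, 1.0
sweet_spot = peak - spread  # one standard deviation to the left of the peak
laggard = peak + spread     # one standard deviation to the right of the peak

# Preference at the sweet spot, relative to preference at the peak.
relative = preference(sweet_spot, peak, spread) / preference(peak, peak, spread)
print(f"preference at sweet spot relative to peak: {relative:.3f}")
```

Under these assumptions the sweet spot and the laggard position sit at the same height on the curve; what separates them is direction, since preference is still rising on the left and already falling on the right.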

Johns Hopkins Data Science Lab Allen Gannett Amazon Roger Peng Hilary Parker Hadley Wickham Gaussian distribution CEO Kendrick Lamar product development
"hadley wickham" Discussed on Not So Standard Deviations

Not So Standard Deviations

02:58 min | 1 year ago

"I think it might be harder to start with the tidyverse. Yeah, but we could debate whether that's an appropriate philosophy. I think, well, like, someone had to be analyzing data. Why would you write software to analyze data if you didn't want consumers to use it? Or are you talking about R overall? Yeah, like the first paradigm, where it's like, you have a need and you become the programmer. Well, I think the language was designed at a time when you analyzed data by, like, writing your own matrix algebra, you know, so there's been a lot of change, obviously. Yeah, yeah. I guess maybe what you're saying is just the balance of consumers, like, maybe the situation they envisioned was that a package has a few other people who use it, not that there are going to be thousands of consumers of one package. I think that's fair to say, that nobody envisioned that. I think so, and I think these things kind of oscillate back and forth a little bit. There was a time when analyzing data and being a programmer were kind of the same thing, because you couldn't analyze data unless you knew how to program a computer, right? But then those things have diverged, I think, and so now it's like, well, you could have two separate things, analyzing data and programming, right? And so I think that's a natural evolution of computing and technology, but it has caused a little bit of a change in the way people think about R. Yeah, well, and actually, I keep having it pop into my head, like, oh, make that one point, which is, I do think the tidyverse, for better or worse, there's a chief architect essentially, and that person, Hadley Wickham, is pretty obsessed with, like, usability, and UX essentially, or UI, I'm not sure which one. Both, 'cause
it's just like, yeah, having someone who's obsessed with UX is so much more important for that latter scenario of tons and tons of consumers of a small number of packages. But I think, yeah, most of R, and the kind of older packages, or even newer packages from an academic standpoint, it was kind of like, well, if you're going to use this, you're gonna learn it. Like, we'll put effort into making it usable, but it's not like we're going to go crazy with it. Yeah. There's another element, I think, that has an analogue in the commercial world, which is this idea of ecosystem lock-in, and so, you know, the classic example is Microsoft.

programmer Microsoft Hadley Wickham chief architect
"hadley wickham" Discussed on Not So Standard Deviations

Not So Standard Deviations

03:29 min | 1 year ago

"Yeah, I'm cool when you, like, subset to nerds. At work I am not. I got, like, one compliment once from someone in merch, and that's like the pinnacle, like, the people who work in merch are so cool, 'cause they're literally professional clothes acquirers. And so one time one of them complimented my dress, and it's just like, oh my God, this is the goal, all the work I've been doing trying to get cooler. So yeah, anyway. All right, so shall we talk about the world of R? Yeah. I've been dying to get to this episode recording because there's so much emotion inside me. I don't know about you. I have a lot to say. Yeah, I thought I did, but now I've been thrown into, like, I don't know anything territory. Well, okay, so I think it's a little bit difficult to know where to start, so I think the easiest way is to make this whole thing about me, because it's my podcast. I'm going to pretend like I am the source of everything. Okay. But the way I got sucked into this whole debate, I guess, is that Hadley Wickham posted a link to a talk that I gave about a year ago at the useR! conference in Australia, where I was talking about the tidyverse and base R and how they're related and how they came to be, right? Which I thought was a pretty good talk. Yeah, definitely. This was a keynote for useR! last year, and I'm out there this year. So he posted a link to this, probably in response to some other stuff that was going on, but I choose to ignore that for the moment because it's all about me. So anyway, okay, great, link posted, and then after that got posted, it was like a month of debate happening in my mentions, basically. Like, it's still happening.
It's happening. So one of the key things that came up was that there are a lot of packages that I didn't mention in that talk, in particular the data.table package, but many others. I can't talk about everything, and there was some consternation over things that I left out. Fairly, anytime you give a talk about anything, you know, there might be some consternation from the people that you don't talk about, I guess, right? So it's in some sense a generic problem, but you have to make choices when you make these presentations, right? Right, and this is also like a wound, I think, to the data.table folks. This is perceived as one of many instances of this happening. Yes. So, and I think the tidyverse is a collection of packages, one of which is dplyr, and a lot.

Hadley Wickham Australia
"hadley wickham" Discussed on Data Skeptic

Data Skeptic

04:52 min | 1 year ago

"Scientists, anyone who codes comes to learn and share knowledge. And at Stack Overflow, I work on kind of two categories. One category is directly related to how we generate revenue: clients who want to hire people, clients who want to advertise to our users. The other category is related more directly to what you probably think of when you hear Stack Overflow. So I work on issues around, how can we make the process work better when someone is asking a question? How can we make Stack Overflow more welcoming and more inclusive? So these are kind of the two categories that I work on. And it turns out that in all of those categories, I'm dealing with text. I'm not a data scientist who only works with text, but I work with text at least every week. One example of when I would use text: we do our developer survey every year, and on the developer survey, some of the questions are free response questions. Some of them are quite serious and some of them are fun. So as an example of a quite serious one, we ask what you think the best, and then we also ask what you think the worst, thing about Stack Overflow is. You know, we're getting at what people's pain points are with Stack Overflow, and I can statistically analyze those responses and then look at, for example, an odds ratio. What words are people more likely to say they are frustrated with versus what they are happy with? An example of a more fun question that we asked, maybe two years ago, I think, is we asked, what fictional representation of a coder, and we asked four versions of the question, do you think is the most annoying, the best, the most inspiring, and one other version. So we got all these answers from people, and it was, oh gosh, it was a delightful conglomeration of fictional coders.
And we could look at what men were more likely to mention versus what women were more likely to mention. So this is another example of when I've used text in my real job. We also use text, for example, on our more directly revenue-generating side. So we have jobs on Stack Overflow, job listings, and so we have models to match job listings, based on their content, with users, and so there are predictive models there. So these are just a couple of examples of how I use text in my day-to-day real life. So your book is Text Mining with R: A Tidy Approach. Who should pick this up? What are they going to learn in it? We wrote this book, Text Mining with R, for R users who are interested in getting started with text. I would say this is for people who have some knowledge of tidyverse approaches to data analysis. If someone is just getting started in R, I would recommend, a first book to get started would be Hadley Wickham's R for Data Science book. But if you've looked at R for Data Science and you understand a bit about how dplyr works, or you already have experience with dplyr and ggplot2, then that is our book. Our book is for people who have enough experience that they're somewhat comfortable with dplyr and ggplot2, and then you have text that you need to get started with. If you have analyzed text in other idioms in R, then this book is also for that person, for that user who says, I may have tried another approach, maybe have had some frustrations with either performance issues or usability issues with some other approaches. This book is for you as well. The first half of the book lays out concepts, lays out this approach, introduces some of the common tasks that we do in text mining, and in the second half of the book each chapter is a complete case study. So, beginning to end, let's start with a real set of data. Let's start from doing exploratory data analysis.
What is in this data set? And then through to the end of each case study, implementing either some machine learning model or some other type of analysis to gain insight from that data set. Julia working people.
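
[Editorial sketch, not part of the episode.] The guest mentions comparing free-response answers with an odds ratio, to see which words are more likely in "frustrated" answers than in "happy" ones. The Python sketch below shows one common way to compute a smoothed log odds ratio per word. The guest's own work uses R and the tidyverse; the function name `log_odds_ratio`, the add-one smoothing, and the sample responses are all hypothetical and not taken from the survey.

```python
import math
from collections import Counter


def log_odds_ratio(texts_a, texts_b, smoothing=1.0):
    """Per-word log odds ratio of group A versus group B.

    Positive scores mean a word is relatively more common in group A.
    Assumption: whitespace tokenization and additive smoothing, chosen
    only to keep the sketch short.
    """
    counts_a = Counter(w for t in texts_a for w in t.lower().split())
    counts_b = Counter(w for t in texts_b for w in t.lower().split())
    vocab = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values()) + smoothing * len(vocab)
    total_b = sum(counts_b.values()) + smoothing * len(vocab)
    scores = {}
    for word in vocab:
        p_a = (counts_a[word] + smoothing) / total_a
        p_b = (counts_b[word] + smoothing) / total_b
        scores[word] = math.log((p_a / (1 - p_a)) / (p_b / (1 - p_b)))
    return scores


# Hypothetical free-response answers, invented for illustration.
frustrated = ["duplicate questions closed", "rude comments"]
happy = ["fast answers", "helpful answers community"]

scores = log_odds_ratio(frustrated, happy)
for word, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word:>10}  {score:+.2f}")
```

Words at the top of the printed list lean toward the frustrated group and words at the bottom toward the happy group. A real analysis would need proper tokenization, stop-word handling, and far more responses than this toy input.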

developer scientist Hadley Wickham Julia two years
"hadley wickham" Discussed on Not So Standard Deviations

Not So Standard Deviations

04:17 min | 1 year ago

"I mean, anything that that group comes out with, it's usually really thoughtful API design, and so I use it. So yeah, I think that they tend to do a pretty good job. It's something that I appreciate more and more over time. They tend to do a pretty good job of picking out the right level of abstraction, which is a super hard thing to do in software development, I think, and so it's always just a series of compromises, right? And so picking that right level of abstraction is really key. I think they generally do a very good job of that. Yeah. Another thing he mentioned that's interesting is that the one downside with having all these software developers working full-time on software development is that there's less, like, dogfooding. And he was actually mentioning that the president, Tareef Kawaf, actually, and I don't know if I'm saying that right, Kawaf, K-A-W-A-F, I was talking with Hadley Wickham, and he was saying that Tareef actually does the most kind of real data science of everyone, 'cause he writes up all the reports and, you know, does all of the analytics on how RStudio is doing. Right, like he's an actual user. Exactly. Yeah, I thought that was kind of funny. But so, yeah, anyway, Max was talking about how he used to be at a pharmaceutical company. I can't remember which one. Yeah, Merck, or something like that? Yeah. And so he was building models half the time. So he kind of developed, like, you've talked about this, where you end up developing a package because you're using it all the time, and so you keep it working pretty well. But yeah, I think they're doing that a little less now. But regardless, I think almost everyone who works there on package development had some extended period of time where they were, like, doing data science or statistics, or whatever you wanna call it. I think I probably just had,
I had a related conversation with someone about this. Like, I was talking about how, if you were gonna ask someone how to use Microsoft Word, would you rather ask someone who's used it for a long time, like, for writing papers, whatever, or would you rather ask the software developer at Microsoft who built the application, right? And it's like, you know, it depends a little bit, but mostly I would probably want to ask someone who actually used the software, right, because they would know all the little kind of problems and things that you do, like, you know, shortcuts, whatever, that you have to use to get your job done. Right. Whereas you'd be surprised, the software developers usually know this too. Well, they do, but first of all, there's like this one person for Word. Second of all, I think there's a general, it's like, do you want to ask someone who built the system or someone who uses the system, basically, and they're different people. And I think what R does a good job of, usually, is to build the software from the perspective of the person who uses it, you know. And because I think everyone kind of knows what software that's not built like that looks like. And I think usually the reason it happens is because, if the person builds software for themselves, you know, then it's like, they get it, but everyone else kind of doesn't, you know. Yeah. And I think I've had this problem when people ask me about R, you know, because it's like I've been in R too long, you know? And so, like, I know all the little quirky ins and outs, and they just want to read in a data frame, you know? And I have the reverse problem, which is I've been using R for too long. It's like the beginner's mind. That's actually a Zen thing. Like, the place, the center where I live, it's like the Beginner's Mind Temple.
But the idea is that kind of fresh, curious, looking at something as it is rather than as the castle you've created in your head. That's like this very special moment. And so it's like, can you cultivate having that? It's hard. It's, like, super hard. Yeah, I imagine it would get harder.

Hadley Wickham Tareef Kawaf software developer Microsoft president
"hadley wickham" Discussed on Not So Standard Deviations

Not So Standard Deviations

03:36 min | 1 year ago

"And we'll be launching a new special segment that we hope you will be interested in participating in with us. We'll talk about it towards the end of the episode. All right, let's get going then. So I have one little bit of correspondence from one of our awesome listeners, just, hello, so CJ Mendez, I guess, sent an email saying that, I don't know if this is really follow-up, it's related to design. Gary Hustwit has a new documentary about Dieter Rams, who was the famous designer at Braun, and he has his famous ten principles of good design, which I will link to in the show notes. And anyway, this guy Gary Hustwit has, like, three documentaries on design, I think. Really? Yeah, I don't know if you've seen any of them. There's a whole documentary on Helvetica. I've heard of that one. Yeah. And then he's got one on cities, which I haven't seen, and then another one is called Objectified, which is more like a traditional thing on design, I think. Nice. Yeah, that sounds great. I'd love to watch it on Netflix. I would assume so. I don't know if it's on Netflix these days. Filled that one in. Netflix are, like, smiling right now. They forgot about our last episode. Yeah. So anyway, since we're talking about design, I guess he sent that in. So thanks for the little note. Did I talk about this last time, where I was noticing that Hadley Wickham seems to be kind of a design junkie? You mentioned that a few episodes ago. Yeah. I don't know why that's making me think about it. I mean, just the idea of being interested in design. I think people who have that natural talent, or, you know, whatever we call it when you don't understand why someone's good at something, I would wonder if there's a correlation between that and caring about design in general, for making tools or making software, whatever.
So you think, if you think that someone has a mystical quality to them? Yeah. Or like, if you're just like, oh, I don't understand how. I think with him specifically, and with some other, like, open source developers, it's like, wow, how do they think of it? And then it's like, okay, what character traits do these people have in common? And usually they're into design and bothered by things that aren't right, right? Yeah. So we can maybe follow up on that later when we talk about this chapter, actually. Okay. Well, I think it's, well, I don't know. Yeah, I agree, it's relevant. I had one follow-up, actually, a genuine follow-up from last time. I don't know if you recall that we had this discussion, debate, you might say, about the so-called data science diagram. Yeah. Data to tidying to exploratory analysis, et cetera. Yeah, it's like you go from importing the data to tidying, aka getting it into a format that you can use, and then there's this cycle of visualizing, modeling and transforming, and then you exit the cycle to communicate. Right. So last time I made a comment that was along the lines of, like, how I did like the diagram, basically.

Gary Hustwit Dieter Rams CJ Mendez Hadley Wickham Netflix
"hadley wickham" Discussed on Not So Standard Deviations

Not So Standard Deviations

04:32 min | 1 year ago

"There was this kind of idea of psychological safety and how important that is for people to work together well. Which to me is kind of like the design sprint slash blameless postmortem. That's like the whole thing. I mean, with the blameless postmortem, it's literally kind of shoved down your throat: you will not blame anyone. Right. Right. Right. So like, yeah, it's this idea of psychological safety, and I think the design sprint does it too, where by validating each other, listening to everyone and framing it around the stakeholders, you're just kind of creating this environment where everyone feels like they can be creative and be validated for it. Everyone is set up for success in every way possible to think creatively about a design problem. So yeah. And then the second day was really cool. It was this guy named Grady Booch. Have you heard of him? No, I don't think so. Yes, his name's Grady Booch. He's like a pretty senior guy in the field of, you know, software engineering, and specifically kind of algorithms and systems, and he's at IBM Research. He's chief scientist for software engineering at IBM Research. So it's pretty high up. Yes. Sounds like it. Yeah. And he was super interesting, where he literally had the words "oppositional thinking" in his slides, which was cool, and he was talking about this concept. And then he talked a lot about how everything is a system, really emphasizing the fact that AI, machine learning, whatever, is just one tool that you have in your toolbox for creating a functional system, which again shouldn't be that much of a surprise. But it still is. The way he framed it was just very concise and clear, and a good message for a group of software engineers where I think that AI is probably promoted as the panacea. Yeah.
Like the solution for everything. And then what I was excited about in terms of my talk was that it made me think more about how data analysis is itself a system. It's almost like a mini production system every time you do a data analysis, because you have to bring in the data, you have to munge it or clean it up, then you go through this kind of playful design process, and then you have to create the deliverable. So that's not that different from a production system where you have to ingest data, do the machine learning on it, and then put it into an app or something. I see what you're saying, I think, but I guess the problem I would have with that kind of characterization is that it seems somewhat linear, you know. And I think a real data analytic process kind of has these episodes where you're reaching out to stakeholders, getting clarification, cycling back and looking for more data, and I think that's one of the reasons why it's hard to automate, right? Right. No, I totally agree with you. I think what I'm trying to say is that if you want to have a reproducible analysis, eventually, at the end, you do have to create a quote unquote production system. Yes. Yeah. Like where the product is this kind of deliverable. Yeah. Okay. Yes. I agree with you. Yeah. I totally agree. I mean, in the perfect world, I use the Hadley Wickham tidyverse illustration he has, where you import the data, you tidy it, and then you get into this kind of creative space of modeling, visualizing, transforming the data, and then you sort of exit from there into the deliverable. Yeah. I guess I feel like when I'm doing analysis, I usually exit frequently from the creativity into creating the deliverable, because it's like, oh yeah, I know this insight will be important..

"hadley wickham" Discussed on Not So Standard Deviations

Not So Standard Deviations

04:03 min | 1 year ago

"Do you have any other follow-up? No, I think that covered it. I had just one little piece of news, which is that I did this interview for this Five Books website. Do you know about this? No. So it's a website where they interview famous people, and me, basically, about five books that they would recommend for a given topic, like a topic of their expertise. So for example, they interviewed Hadley Wickham for five computer science books. The basic idea is that you name five books, and then they have a discussion about it, and they kind of write up the discussion. So they asked me for five data science books, and my interview just went up on Monday. So. Did you include any design thinking book? I did. I included the book we're talking about. Oh, excellent. Yeah, lacrosse perversive positively influenced you. Then changing the Roger Peng brand. We're facing brand. How much do I owe you as a consultant? I mean, yeah, like fifty percent of the royalties from this blog post, obviously. Yeah, I think it's funny because I feel like I don't know what they were expecting from me. But part of me feels like they might have been expecting more like, I don't know, like I was thinking like using our data sites or like more kind of, and the books that I chose were slightly higher level, I would say. Yeah, look, there was no software or coding or anything. I'm sure. Yeah, it's like I remember when I first joined Stitch Fix, we had a MultiThreaded blog post about, like, what are your data science books, and all of them were, it was as if people were choosing the most advanced book they've read, right? Right. Yeah. Yeah, I think mine were like general ones or just stuff that was really not related. I think they might have had the same feeling. Cool Hillary's. That's kind of weird. I did get ten, yellow wins.
Introduction to data are, was it called Introduction to Statistical Learning. Great. Thank you for the authors, right? Yeah. Yeah, I forget who the other people are, though. I think one of them's like Tibshirani. Tibshirani and Hastie and Gareth James. Okay. Anyway, we actually went through that book a bit at Etsy with a group of analysts, and I thought it was really effective. Yeah. Like they do some interesting stuff where, I think, you don't learn about confidence intervals right away, and they're kind of just like, take this number, multiply it by two, and we'll talk about it later. To do the, like, 1.96, right? Yeah, and so I thought that was a good approach, like it's so pedantic to get into the whole sampling distribution thing. Really? Yeah. Yeah, inference is usually where everything kind of breaks down. Yeah, exactly. I remember I saw a really good talk. Gosh, I think it was, man, I can't remember what conference it was. It was at Data Day Texas, I think. And Albert Kim, who's at Smith College, he was talking about teaching, it was like tips and tricks from someone who really had focused on pedagogy for intro statistics. And he basically was like, you have to teach the sampling distribution with simulation, and you have to start with physical, like taking the means and doing it over and over again..
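
The simulation idea mentioned at the end of this excerpt (take a sample, compute its mean, and do it over and over again) can be sketched in a few lines of Python. This is only an illustration under assumptions: the skewed population, the sample size of 40, and the helper name sample_means are all made up here and do not come from the talk or the book. It also shows the "multiply by two" shortcut standing in for 1.96 when building a rough 95% interval.

```python
import random
import statistics


def sample_means(population, sample_size, n_samples, seed=0):
    """Draw n_samples random samples (with replacement) and return each mean."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]


# Hypothetical population: 10,000 right-skewed values with mean near 50.
population_rng = random.Random(42)
population = [population_rng.expovariate(1 / 50) for _ in range(10_000)]

# Simulated sampling distribution of the mean for samples of size 40.
means = sample_means(population, sample_size=40, n_samples=2_000)

# The spread of the simulated means approximates the standard error ...
simulated_se = statistics.stdev(means)
# ... which the textbook formula gives as sigma / sqrt(n).
formula_se = statistics.pstdev(population) / 40 ** 0.5

# Rough 95% interval from a single sample: estimate plus or minus 2 standard
# errors, where 2 is the rounded version of 1.96.
one_sample = random.Random(1).choices(population, k=40)
estimate = statistics.mean(one_sample)
se = statistics.stdev(one_sample) / len(one_sample) ** 0.5
interval = (estimate - 2 * se, estimate + 2 * se)

print(f"simulated SE: {simulated_se:.2f}, formula SE: {formula_se:.2f}")
print(f"rough 95% interval: ({interval[0]:.1f}, {interval[1]:.1f})")
```

Comparing simulated_se with formula_se is the point of the exercise: the formula stops being a magic number once the repeated sampling has been seen directly.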

"hadley wickham" Discussed on Chat With Traders

Chat With Traders

04:52 min | 2 years ago

"Yeah, cool, I'll make a note of that and we might get to it. Just right now, I'd like to ask you, you spoke there about spreadsheets, and I presume you're talking mostly about Excel. Yeah, Google Sheets. I mean, these are the two big ones that I know of. So it's probably fair to say that there's a large majority of people who are very familiar with Excel, or Google Sheets, and spreadsheets in general. When is it beneficial for someone like that to actually take the next step and learn an actual programming language? I think it's beneficial when they're doing the same stuff all the time. So doing pivot tables is one example. If someone finds themself in Excel on Monday, Tuesday, Wednesday, with different data sets, doing a similar thing, it makes sense for them to learn to write some code in order to automate that. When it wouldn't be beneficial for them is if it's a one-off job, doing a bunch of data entry for example. But in terms of automating time-consuming rote tasks, it definitely makes sense. Then it also makes sense when they want to do more robust modeling and understand why their models are saying what they're saying. I mean, you can model stuff in Excel. Excel's relatively powerful with basic modeling. But if you want to build models that you can dig into and find out why they're doing what they're doing, I think both Python and R are exceptional at this as well. Another aspect, I think, is if you want to share your workflow with other people. As I said, doing a lot of pointing and clicking, collaborators and colleagues can't really get insight into what you're doing, whereas if you're working on a team of people writing code together, you can see exactly each step, especially when it's well commented, right? Which I'm a huge fan of. So these are several reasons.
I would also say one of the problems with spreadsheets is that, you know, your data source and your logic and your functions and formatting are all intertwined. I mean, you see nightmare spreadsheets where people highlight a row in order to mean something. And I think separation of data from this type of logic, and from formatting, is incredibly important to robust data analysis and data science. I like that last point you gave there. I hadn't really considered that, actually. Now you mentioned Python, and you've mentioned R. Obviously there are many programming languages which exist and which are open source, which anyone can access easily enough. How do they decide which one they're going to focus their time and energy on to learn, like as a trader? Yes. So I think this is a deeply personal question, and I need to be as sensitive as possible because there are, you know, very strong opinions on each side of the argument. But I think if your main interest is in doing statistics and exploratory data analysis and data visualization and these types of things, R is a really good place to do this, particularly in the way there's a set of packages referred to as the tidyverse, by Hadley Wickham, who's from New Zealand as well. And he and his colleagues have developed this set of packages which allow for very, I think, thoughtful exploration of data and data visualization. One of the great things is they've written these packages so that you write code kind of the way you think about the data, and they refer to the functions in these packages as verbs. So you'll filter something, then select something, then arrange something, and it's very easy to read in that respect. So for these types of things I think R is incredibly strong.
I do think Python, if you want to do, you know, machine learning on large data sets in production and serious automation of tasks, which I think perhaps is more what traders are interested in, Python is probably a win there, particularly in the machine learning and deep learning space, although R is making strides there also. I would also add that, personally, I think the barrier to entry for Python is slightly higher, but I like to refer to Python as the Swiss army knife of programming languages, because rarely will it be the best tool for the job, but you can do anything with it..
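
The "verbs" idea from this answer (filter something, then select something, then arrange something) can be illustrated with a small sketch. The tidyverse itself is a set of R packages, so what follows is only a plain-Python analogue, not the tidyverse API: the helper names filter_rows, select and arrange and the trades data are hypothetical, chosen so that the code reads in the same order as the description.

```python
def filter_rows(rows, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


def select(rows, *columns):
    """Keep only the named columns in every row."""
    return [{column: row[column] for column in columns} for row in rows]


def arrange(rows, key, descending=False):
    """Sort the rows by the value stored under key."""
    return sorted(rows, key=lambda row: row[key], reverse=descending)


# Hypothetical data: one dictionary per trade.
trades = [
    {"symbol": "AAA", "side": "buy", "pnl": 120.0},
    {"symbol": "BBB", "side": "sell", "pnl": -45.5},
    {"symbol": "CCC", "side": "buy", "pnl": 310.25},
    {"symbol": "DDD", "side": "buy", "pnl": -12.0},
]

# Each step is one verb, applied in the order you would say it out loud.
winners = filter_rows(trades, lambda row: row["pnl"] > 0)
winners = select(winners, "symbol", "pnl")
winners = arrange(winners, "pnl", descending=True)

print(winners)
```

Unlike a spreadsheet where data, logic and formatting sit in the same cells, the data here stays untouched and every transformation is a named, readable step.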

"hadley wickham" Discussed on DataFramed

DataFramed

07:54 min | 2 years ago

"We'll jump right back into our interview with Renee after a short segment. Now it's time for a segment called Programming Topic of the Week. I'm here with Emily Robinson, a data scientist on the growth team at DataCamp. Emily has just launched a DataCamp course called Categorical Data in the Tidyverse, and I wanted to have her on to discuss it because it's so exciting. I, for one, have spent a bit too much of my professional and personal life wrangling categorical variables. Emily, hi. Hi, Hugo. Emily, can you do me a favor and define categorical variables for our listeners? Sure. Categorical variables are variables that fall into a pre-specified number of groups. For example, survey responses to a question about your country of origin would be a categorical variable. Technically, categorical variables aren't ordered. If they are ordered, they're called ordinal variables. For example, a question on income that asks if your income is between zero and ten thousand dollars, ten thousand and fifty thousand dollars, or more than fifty thousand dollars would be an ordinal variable. There's a preset number of answers and they have an order. While ordinal variables may have numbers in them, they're not numerical variables. The values of numerical variables are just numbers. You can do mathematical operations like taking the mean or max, which you wouldn't be able to do with this ordinal income question. Often people refer to ordinal variables as categorical, so we'll do that here. What do listeners need to know for when they encounter these categorical variables? Well, it used to be that dealing with them in R could be very frustrating. R has a special way of representing categorical variables called factors. One problem I often had was making effective visualizations. If you're plotting an ordinal variable, you want your graph in the correct order.
On the other hand, you may want to make a graph of average income by occupation, in which case it looks best if you order the occupation axis by average income. Or maybe you have sixty categories that you want to reduce to the twelve most common, putting the rest in "other" to fit into one graph. I swear I could never remember how to do any of these. And even when I found a solution on Google, it wasn't always easy to implement. But you said "used to be". Does that mean there's a better way now? Yes. In twenty sixteen, Hadley Wickham came out with a new package called forcats, which is all about working with factor variables. It is part of the tidyverse, so it works well with other data analysis packages like dplyr and tidyr, and solves a lot of the headache of working with factors. And all the functions start with fct_, making them a lot easier to remember. That's awesome. What would you also recommend to learn more about working with factors? It depends on what you want to do with them. To get started, there's a chapter on factors in R for Data Science, an introductory data science book by Hadley Wickham and Garrett Grolemund. If you're interested in doing statistical tests, DataCamp just launched a new course on inference for categorical data by Andrew Bray, an assistant professor of statistics at Reed College. If you want to learn more about the history of factors in R, and specifically why, when you read in data, R defaults to making all strings into factors, check out the post "stringsAsFactors: An unauthorized biography" by Roger Peng, a professor at Johns Hopkins. Amelia McNamara, assistant professor at the University of St. Thomas, and Nicholas Horton, professor at Amherst College, wrote a paper called "Wrangling categorical data in R". They compare multiple methods for doing the same task and, spoiler alert, the tidyverse comes out ahead in terms of the compactness and robustness of the code. That's part of the reason I chose to use the tidyverse in my course.
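
The two chores described in this segment, ordering categories by an average and keeping only the most common categories while lumping the rest into "other", can be sketched as follows. forcats is an R package (its fct_reorder and fct_lump functions cover this kind of task), so the code below is only a plain-Python analogue under assumptions: the helper names lump_top_n and order_by_mean and the occupation data are hypothetical and are not part of forcats.

```python
from collections import Counter, defaultdict
from statistics import mean


def lump_top_n(values, n, other="Other"):
    """Keep the n most common categories and relabel everything else."""
    keep = {category for category, _ in Counter(values).most_common(n)}
    return [value if value in keep else other for value in values]


def order_by_mean(categories, numbers):
    """Return the distinct categories sorted by the mean of their numbers."""
    groups = defaultdict(list)
    for category, number in zip(categories, numbers):
        groups[category].append(number)
    return sorted(groups, key=lambda category: mean(groups[category]))


# Hypothetical survey data: one occupation and one income (in thousands) per person.
occupations = ["nurse", "chef", "nurse", "pilot", "chef", "nurse", "poet"]
incomes = [60, 40, 70, 120, 50, 65, 20]

# Reduce to the two most common occupations, everything else becomes "Other".
lumped = lump_top_n(occupations, n=2)

# Axis order for a plot of average income by occupation, lowest to highest.
axis_order = order_by_mean(occupations, incomes)

print(lumped)
print(axis_order)
```

Ties between equally common categories are resolved by first appearance here, which is a choice of this sketch rather than a statement about how any particular library behaves.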
I cannot wait to check out your course. Time to get straight back into our chat with Renee. So something we've been talking around, Renee, is Twitter, which can be an incredible resource for aspiring data scientists, and maybe you can tell me a bit more about that. Yeah. So in addition to all the books and courses, I really use Twitter a lot to get the lingo of data science, and there are some great communities on Twitter, and you can usually find them by searching for certain hashtags. I'll give you a few of them. For Python people, there's #PyData, #PyLadies, there's #py4ds, p-y, the number four, d-s. For people learning R, there's #rstats, #RLadies, and the hashtag #r4ds. So these are all hashtags you can search. A lot of those have Slack channels too. So there's like a Data Science Learning Club Slack channel that some followers of mine started a while back, based on my podcast learning activities. There's a Slack called Data for Democracy for people who want to get into political data. There's a hashtag for data ethics. So I'm sure there are similar groups like these in other social media like Facebook and LinkedIn, but I'm mostly on Twitter. So I have a whole blog post about using Twitter to learn data science. And if you start searching for hashtags related to what you're learning, you'll usually start finding the leaders or the hubs in these communities, and you can learn a whole lot just by following them. And then if you ask a question and use that hashtag, you'll usually get an answer. So it's pretty cool. Nice, and we'll link to your article about how to use Twitter in the show notes as well. So for learners, how will they know when they're ready to actually be data scientists, to start interviewing? Yes. So I think people are ready to start applying for jobs before they feel fully ready to make that jump, so don't wait too long to start looking, and like we talked about, doing those interviews is really instructional as well.
But I'd say that you're ready when you're confident enough with those basics. So you know how to do exploratory data analysis and do some statistical summaries, you know basic feature engineering, how to get a data set into a shape that you can use for machine learning. You know how to do some of that data preprocessing and cleanup. You can build a good report and a data visualization and communicate the results. Maybe you've used a few basic, commonly used machine learning algorithms like logistic regression, random forests. So you're confident enough with these basics that you know that you're not going to be totally struggling on the job. But once you feel that you have that solid understanding of how machine learning works and you can apply it, you probably want to also add in a few specific techniques that will make you stand out. Either something you feel like you're good at, maybe you're really awesome at building pretty visualizations that are easy to read, maybe you're really good at that back-end data engineering stuff, so something that you can say is your specialty when you're applying for the jobs. But you don't need to check off the entire list of every algorithm and every tool and technique out there. I've interviewed for jobs that included skills that I already had throughout my career and was confident with, plus some skills that I was still picking up. So if I knew that I could understand what people wanted, and I was confident that I could pick up these new tools and techniques along the way, then I realized, like, I got a job before I thought I was ready and, you know, at least I hope, I have been told that I've done really well there. So a lot of stuff you can pick up as you go if you have the basics down. So don't feel like you have to be an expert in every area. Like, nobody is. So start applying. You'll get a sense for what it is that you still
