Spatial Thinking in Data Science
Hi there, Angela, and welcome to DataFramed.
Hey, thanks for having me.
It's a real pleasure to have you on the show, and I'm excited for many reasons, but one of the biggest is that we're here to talk about spatial thinking and geospatial data, and the impact it can have on industry in general and on how data science and analytics can impact the world. But before we get into that, I just want to find out a bit about you. Maybe you can tell me what you do and what you're known for in the data community.
Sure. My name is Angela. I'm, I think, the second Angela to appear on this podcast.
That's right. We had Angela Bassa from iRobot previously. I just told you also that my mum's name is Angela, so a big shout-out to my mom. [inaudible] big friend of mine. But great name.
Anyway, the point there is I'm the second most famous Angela in the data science community, maybe. Today I work as the R Spatial Advocate at the Center for Spatial Data Science at the University of Chicago, which is an odd job title and sort of came about because my professor had a few projects that he wanted me to work on after I finished up my thesis for him. So I had a background in economics and public policy, and I ended up working with my professor, Luc Anselin, at the university to complete that. After writing my thesis, I was sort of hunting around for something to do next, and Luc offered me a position sort of building out spatial data science education resources and materials for people. So that's what I do. It's sort of hard to explain to people, when you're an R spatial advocate, what exactly you do with your time, but I just tell them:
I really like maps and love spatial data.
So the other thing is, it sounds like it could be quite niche, but in fact, as we'll get to, the ecosystem of spatial data science and how it can help us in thinking about a lot of questions is incredibly important. And on top of that, of course, using R to do so is incredibly impactful as well.
Right. So our center develops open source software that allows you to do point-and-click analysis of spatial data. It's called GeoDa, and it's reached a lot of people. I think it's over ten years old at this point, and that has been a way that we've sort of put forth spatial analysis into the world. But I think something that we're moving more to is trying to get people to use code to work with spatial data, learning programming and getting comfortable with that. That's something that we're trying to expand to, so that's part of what my job is.
On the same note, in terms of community building around coding, I also run R-Ladies Chicago. I started the chapter in Chicago, which was a byproduct of working on my thesis. I basically had a summer where I was working on that, and someone told me about R-Ladies. I was like, oh, that would be very helpful for me. So I hunted around for a group in Chicago, and I found that there wasn't one, and I was like, oh, someone should start that. And one of the staff in my research center, Marina, was like, oh, you should do that, Angela. So I ended up starting the R-Ladies Chicago chapter that summer, just based on the encouragement of some other women in my center. It's something that I've also been super involved with over the past year and a half, and now we have a group of women just doing a lot of community building and organizing in Chicago, and supporting women as they get started with their data science careers or build out their already existing data science careers.
That's a really nice description of the R-Ladies mission and principles in general. Could you just tell listeners how to get involved and how to reach out to learn a bit more?
I actually found out about R-Ladies through a tweet. If you don't already know, a big portion of the data science community is on Twitter.
A huge, huge amount.
I saw a photo of the R-Ladies at the international conference, and it was just this whole room full of women doing data science, and I figured that I wanted to do that. And they had basically a tweet on the global Twitter that said, if you want to start a chapter, reach out to global@rladies.org. And I was like, okay, I'll do that. So I sent them an email like, hey, look, I want to start a chapter, how do I get started? And they got back to me with all the information. There's an international Slack channel that all organizers are in, so that's something that you can get involved with pretty quickly. And if you have one other person in your area who's interested in it, just get started. Two people is a meetup.
So check out rladies.org, and also, I suppose, check out if the city you're in actually already has a chapter and get involved there.
Yeah, I've found that people come to meetups and they're from all walks of life, and I've learned a lot from just meeting people at our meetups and have found mentors through the community, so highly recommended.
Awesome. So you work as the R Spatial Advocate at the Center for Spatial Data Science at Chicago now, but if I recall correctly, your background is in economics and public policy. That's where you originally started.
It happens that in both of those fields there are problems that involve spatial data. One of the methods classes that I took was a GIS class, or a geographic information systems class, where I was learning how to map things and sort of perform spatial analysis via a graphical user interface. That was how I got started.
It was a methods class for one of my policy classes that I ended up taking. During that, I sort of found that a lot of public policy and a lot of economics deals with questions that have to do with spatial data. For example, if you want to locate services in your city as a policymaker, where are you going to put them? Or in economics, if you want to understand the housing market in an area: where are houses more expensive? Where are houses under their market value? That's a spatial question. And so it was through those questions that I started getting more and more into spatial data, just because I needed to understand how to work with it to answer the questions I was interested in.
So it sounds like in this type of work you were doing data science before you were formally doing data science.
Right. I didn't realize I was doing data science. For my thesis, I had a lot of data, and basically I just needed to find a way to get insights out of it and clean it. Before you do that interesting modeling, you have to be able to sort of be like, okay, which one of these houses is actually marked as tax foreclosed or something, and make sure that you have that properly identified. That was something that I started, you know, diving into. My adviser
Anselin was super helpful in that he told me about this thing called the tidyverse, and I was like, oh, I've used R before, but never, like, whatever this is. And I found that it provided a series of R tools that were very useful for just simple data cleaning tasks without having to do sort of complicated syntax. I'd worked with R before and written a paper on, I think it was, mental health survey data, but I mostly just used the core functionalities of R. I really hadn't done what you would call data science on it. So after I was encouraged to do that, I sort of went ahead and started teaching myself this stuff in the process of answering the question: how does demolishing buildings in a city affect the house prices of properties around it? Because I wanted to answer this question, I ended up teaching myself data science along the way. So I would actually say that my thesis was maybe the first piece of my data science portfolio. I still remember applying for some sort of job or internship that asked for a coding sample, and I was like, oh, the stuff that I wrote in my thesis, is it, like... I had to create a spatial lag or whatever. So I just sent them the functions I wrote, as well as some of the exploratory data analysis, and that was the data science code snippet that I sent them, but it was actually just my thesis. You know, it was something that I didn't realize I was putting together until I had it, and then I was like, oh, I guess I sort of started doing data science here.
That's really cool. There were so many things I just want to kind of touch upon briefly. The first is, you mentioned the tidyverse. You may be aware that we're fans of the tidyverse here. We have a lot of courses around the tidyverse and all the tools that have been developed in that ecosystem. Just a bit about me: I come from a Python background, and writing
Pythonic code is one of my favorite things to do in the world, and for that reason I was initially, when I entered the community, a tidy skeptic in a lot of ways. And I've got to tell you, I'm a huge convert now, particularly with respect to the way you write tidy code, in terms of the way it relates to how you state in plain language what your code is doing and what you're trying to do. Writing your code in terms of the data transformations, I think, is really going to open up data analytics and data science to such a broad community that didn't have access to writing code, to chaining methods, all of this stuff beforehand. So as opposed to nesting functions, the pipe, all of this stuff is going to open it up to so many more people and make it so much more accessible.
Yeah, I agree with that statement. I think especially coming from a field like economics, where people use things like Stata, where the syntax is sort of like "regress such and such," it's English, essentially, and you sort of state what you're going to do, and it doesn't look like computer science, maybe. So I think being able to sort of see in your code that, oh, I'm summarizing now, and I'm arranging the data, and now I'm grouping by this, was really helpful for me in sort of adopting that, and in coming back to my code, you know, a month later, like, what am I doing? And then being like, okay, this is what I was doing. So that was really useful for me. I think it's a great way for beginners to get started and a great place for beginners to plug in, but I've also had people tell me that if you're doing very niche or highly computational things, you can move beyond the tidyverse and do more R. But if you don't start seeing the power of programming in the first place,
You're not going to get to that point where you can do all this crazy fancy stuff. So I think it's a good way to get people hooked on using code to process their data.
I agree, and I don't necessarily think about this only with respect to the tidyverse in particular, but in general with accessible APIs: how often they can, well, not fool us, but how often they can make us think we're able to do all this stuff, and then at what point it breaks down and we realize, oh, actually we need to go back and learn all of this stuff. For example, machine learning with fit and predict, knowing that beginners can go in and do a machine learning model straightaway, right? But at what point does that break down, and at what point does that actually affect the credibility, reproducibility, and correctness of results?
Actually, I've heard it said that right now the bottleneck isn't your computation time. It's often the thinking time of the person writing the code.
Absolutely. So I want to get into spatial thinking as soon as possible, but you touched upon one other thing. You talked about how some people can work with geospatial data using point-and-click interfaces. You mentioned people using graphical user interfaces. My question is: can you do data science in a GUI?
That's a good question.