Machine Programming: What Lies Ahead?
This podcast is brought to you by Knowledge@Wharton.

Speaking today is Justin Gottschlich, who leads the machine programming research team at Intel Labs. The newly formed research group focuses on the promise of machine programming, which is a fusion of machine learning, formal methods, programming languages, compilers, and computer systems. Justin, welcome to Knowledge@Wharton. Thank you so much for being with us today.

Wonderful, thank you for having me.

So I just read that description of what machine programming is. But I think that, given all the buzz around AI, a lot of people are familiar with machine learning, but most of them, like me, don't have a clue what machine programming means. Perhaps you could explain the difference between the two?

Certainly, yeah. At the very highest level, machine learning can be considered a subset of artificial intelligence, and there are many different types of machine learning techniques. One of the most prominent right now is deep neural networks, and that's a lot of what people are using to make the tremendous progress that we've seen over the last decade. Machine programming is really the idea that we're trying to automate the development and maintenance of software. So the fundamental difference between the two is that with machine programming, you can kind of think of all of the field of machine learning being a subset of the field of machine programming. But in addition to using machine learning techniques, which are these approximate types of solutions, we'll also use other things like formal program synthesis techniques that will give us a mathematically proven correct piece of software. Between those two points, you can kind of think of a spectrum: you have the approximate solutions here and the precise solutions here, and then there's a fusion of a number of different ways that you can combine these, and every one of these things
essentially is a part of the bigger landscape of machine programming.

So if I understood you right, machine programming is when you create software that can create more software, right? How would that happen? I wonder if you could give us a couple of examples to help our audience understand.

Yeah, so the idea of creating software that creates its own software is really the core idea inside of machine programming. One example is that we recently built a system using genetic algorithms. It allows you to take certain input-output examples and then, by running through a number of iterations (we call them evolutions in the genetic algorithm space), it will automatically synthesize a program that matches the input-output. You do this in the training phase, and then it will take new input-output examples that it has never seen before and generate new programs for them. So that's one example that you might have in the space of machine programming.

When you think about the impact that machine programming might have on different industries, which industries do you think are likely to be affected most by this, and over what period of time?

Yeah, this is a fantastic question, and one that could require a very long response, so I'll try to keep it slightly abbreviated. At the highest level, one could imagine that any of the industries that are predominantly based in software are going to benefit from this tremendously. There was a survey done earlier in 2019 that showed, I think, that we have something like half a million computer science positions that are open. These are programming positions in industry that we need to fill, but we're only producing roughly ten percent of the programmers needed to fill those roles. So what we're having in the software industry is essentially a bottleneck of supply.
If we can start to automate some of the simple tasks, such as reading a file, parsing data, and helping us automate software development and testing, this will, I think, tremendously accelerate the rate at which software is being developed. So I would say that's probably the first very obvious place. The other area that I think is going to be impacted tremendously by this is autonomous systems. In most of the spaces in autonomous systems, a core ingredient of the systems is software. For example, if one were to think about autonomous vehicles, a large part of what's holding us back from getting to Level 4 or Level 5 autonomy, which is the point where the car can essentially handle all of the nuanced behaviors of driving in downtown Philly or something of that nature, is really the implementation and the algorithms of the machine learning systems. If we can automatically construct those, these autonomous systems will also accelerate in their advancement.

I'd like to come back in a little bit to the question about autonomous vehicles and the impact on the auto industry. But given the fact that a lot of our audience is in the financial services industry, I wonder if we could go a little deeper into that. I know that AI has made quite a significant impact, for example, in areas like fraud detection. Do you think machine programming can also have a major impact there, and if so, what might that look like?

Absolutely. This is a fantastic question. As I was mentioning earlier, with machine learning systems, their foundation is essentially learning through statistical analysis. In some sense, this makes them probabilistic: they are getting close to the right answer, but in many cases we're not guaranteed that we'll have the right answer. When we think about machine programming, we go back to that spectrum from probabilistic to very precise solutions.
It's my belief that, for example, in the financial sector there are certain cases where probabilistic solutions aren't sufficient. One might speculate that as you're doing some sort of financial transaction, having the probability that, oh, I've sort of been rounding off the cents, I'm close enough, probably isn't sufficient if those transactions are happening billions of times a day. In that case, we need a more precise solution, and this is one of the areas where I think you could use machine programming.

Interesting. Coming back to the other industry that you mentioned, the auto industry: what kind of impact do you think machine programming will have on the whole drive towards autonomy? I think you started talking about that in the context of Philadelphia. Could you go a little deeper into that?

For sure. As I was mentioning earlier, we recently built a system that is using a genetic algorithm to automatically construct programs. What I didn't mention is that one of the pieces that's part of genetic algorithms is this thing called a fitness function. You can think of the fitness function as a way to grade the accuracy of the programs, or the results, that the genetic algorithm is giving. So the genetic algorithm produces a result and the fitness function says you get a B or you get an A. Historically, though, fitness functions have been written by humans, and not just any humans, really expert machine learning humans. Oftentimes what we find is that the complexity of the problem you're trying to solve is directly related to the complexity of the fitness function. So one could naturally ask: why write the fitness function at all? Just solve the problem yourself. We took a look at this, and what we did is figure out a way, using machine learning, to automatically create the fitness function without a human involved.
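To make the genetic algorithm and fitness function ideas concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the Intel Labs system: the four-instruction toy language, the fixed program length, and the names `OPS`, `run_program`, `fitness`, and `evolve` are all hypothetical, and the fitness function here is written by hand rather than learned.

```python
import random

# Hypothetical toy instruction set: each op maps one integer to another.
OPS = {
    "inc": lambda x: x + 1,
    "dec": lambda x: x - 1,
    "double": lambda x: x * 2,
    "square": lambda x: x * x,
}
PROGRAM_LENGTH = 3  # assumption: every candidate program has exactly 3 instructions


def run_program(program, x):
    """Apply each instruction in order to the input value."""
    for op in program:
        x = OPS[op](x)
    return x


def fitness(program, examples):
    """Hand-written grade: 0 is a perfect match, more negative is worse."""
    return -sum(abs(run_program(program, x) - y) for x, y in examples)


def evolve(examples, generations=50, pop_size=40, mutation_rate=0.2, seed=0):
    """Search for a program that matches the input-output examples."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    names = list(OPS)
    population = [[rng.choice(names) for _ in range(PROGRAM_LENGTH)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Grade every candidate and put the best first.
        population.sort(key=lambda p: fitness(p, examples), reverse=True)
        if fitness(population[0], examples) == 0:
            break  # a perfect program was found
        parents = population[: pop_size // 2]
        children = [population[0]]  # elitism: the best program always survives
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, PROGRAM_LENGTH)
            child = a[:cut] + b[cut:]  # one-point crossover
            child = [rng.choice(names) if rng.random() < mutation_rate else op
                     for op in child]  # random mutation
            children.append(child)
        population = children
    return max(population, key=lambda p: fitness(p, examples))


if __name__ == "__main__":
    # Input-output examples for the hidden target f(x) = 2x + 2.
    examples = [(x, 2 * x + 2) for x in range(5)]
    best = evolve(examples)
    print(best, fitness(best, examples))
```

A genetic algorithm gives no guarantee of finding a perfect program, which is the "approximate" end of the spectrum described earlier. Note also that the hand-written `fitness` already encodes what a correct answer looks like, which is the circularity the answer above points out.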
So now, back to your question. If you think about this type of thing in the autonomous vehicle space, one of the things that's holding us back is the advancement of ML systems, and historically the advancements that we've had with ML systems have come through humans. But if we use machine programming, one could imagine, like we have with the genetic algorithm solution, that the machine can actually start to invent its own machine learning systems, which will then accelerate the progress of these autonomous systems.

So what are the implications of that? One of the things that I've heard about that's holding back autonomous systems, or autonomous vehicles, is the fact that it might be too late for the system to make a certain decision, because you don't want to actually hit something. We probably need software that can predict what's about to happen before it actually happens. Is that one of the issues?

Absolutely right, this is a tremendous insight. Actually, we had a NeurIPS paper (NeurIPS is one of the leading research conferences in machine learning) in 2018 that tried to start to address this problem. What you're describing here is the space of anomaly detection. In the autonomous vehicle space, when we think about these various behaviors, we think of this as an anomaly, and in particular a time series anomaly. For example, you're trying to prevent this vehicle from colliding with this other vehicle, or make sure it doesn't hit a pedestrian, and as you pointed out, it's too late to detect the event if it has already happened. So what we did in order to address this is we recreated the mathematical foundation for anomaly detection, specifically for time series. With this, our hope is that the community will adopt the new mathematical foundation
we've created, and that they can apply it to time series anomaly detectors, which will start to address those types of problems.

And does machine programming help with that at all?

Absolutely. In the context of autonomous vehicles, you would think of using this mathematical foundation to better predict these things. But when you think about machine programming, or programming in general, many of the problems we're seeing today with software are correctness bugs, security bugs, privacy violations. All of these things are, in some sense, time series in nature: a program is really just a sequence of instructions, one after another. So if you take that mathematical foundation, you can also apply it in the space of machine programming, which is exactly what we're doing.

One thing I'm curious about is that machine programming, like a lot of other areas of AI, has been around since the 1950s, right? What's behind the sudden interest in machine programming now? Why is it picking up in such a big way, and why is Intel so interested in investing in it in a big way?

This is a fantastic question, and I'll try to break it down into those two pieces. I'll first address why we're seeing the resurgence of this, because it has been around since the 1950s, and then why Intel is interested. If we look at why it's taking off today, I would say principally there are two reasons. The first is that I believe we're at an inflection point, and the second is that I believe my colleagues and I at Intel Labs and at MIT have made an important observation about how to think about the future of machine programming. As far as the inflection point goes, my view is that there are three things that have created it. The first is that we have tremendous advances in algorithms, in machine learning and in formal methods.
Things that didn't exist, say, twelve months ago are potentially fundamental to the advancement of machine programming. The second is that we have tremendous advances in compute. As the recent Turing Award winners Dave Patterson and John Hennessy are pointing out, we're living in what they call the golden age of computing, with what they call domain-specific architectures. For a long time it was really just the CPU, but now, based on the advances we're having in machine learning and other areas, we have these accelerators that are specific to these domains. That's creating a tremendous opportunity for acceleration of these machine learning and formal methods that wasn't possible before. And then the third piece is the abundance of big and dense data. For example, there's a repository called GitHub, and GitHub is basically a place where people store their software. What we've seen by looking at GitHub is that back in 2008, I think it had roughly 33,000 repositories. In 2019, when I looked at it earlier this summer, I think it was somewhere over 200 million, which is a tremendous growth. It's actually nearly four orders of magnitude of growth in a decade. And as you probably know, data really drives a lot of these machine learning systems, so this has created essentially a vehicle in which we can start to explore the space. So that's the inflection point. The other thing that Intel and, I think, MIT observed is that, fundamentally, the way we've historically done programming is, we think, flawed: there's essentially a blurring of the programmer's intention with the algorithms and with the system-level details. What we really want to do as we move forward is we want
the programmer to just specify his or her intention. Say you want to create a program that will tell you where the nearest Starbucks is: you just say, computer, create a program that will always notify me when I'm near a Starbucks, and then the computer handles all the details of the algorithms to implement it, and understands how to translate that to work on the hardware that's on your cell phone or in a data center, that type of thing. So those are the two pieces that we think really are creating this opportunity for tremendous growth in machine programming. Now, to your second point about why this is interesting to Intel: Intel is obviously very interested in advances in hardware, and having been at Intel for about a decade now, one thing that I've seen, which is really exciting, is that Intel really used to be just a CPU company, and we're not today. The heterogeneous hardware landscape that we have at Intel is enormous. We have neural network processors, we have neuromorphic processors, we have GPUs, we have a variety of accelerators, FPGAs, and we have a ton of CPUs. The problem, though, is programming these things. We can have tremendous hardware, but how can we possibly expect that the average developer can program it? This is really why machine programming is essential to Intel. Intel understands that with this new heterogeneous hardware landscape, which really is required to advance all of the technology that we're seeing, we need a way that's simple enough for the average programmer to harness this massive amount of heterogeneous compute.

Sounds amazing. Since you were speaking about the work that you did with them,
I understand that you wrote a paper called "The Three Pillars of Machine Programming" some time ago about all these concepts, and I was wondering if you could share some of the main insights from that and how they relate to some of the things we're talking about.

Certainly, yeah. Back in, I think it was 2017, a few of us from Intel Labs teamed up with several people at MIT, and we came up with this vision: what if we did this thing called machine programming? What would the landscape look like? The main reason for this is that we were seeing in research venues that people were starting to explore machine programming, but they were a bit disorganized; there wasn't structure around the way we were thinking. So the paper that we wrote, "The Three Pillars of Machine Programming," is essentially a road map for how we want to explore the research space. There are three pillars: intention, invention, and adaptation. The intention pillar is really what we would think of the programmer as doing in the future. I don't really call these people programmers; I call them software creators, because at the end of the day our blue-sky vision is that these folks won't write a single line of code. They will express their intention through natural language, gestures, visual diagrams, whatever is best for them. And for those hardcore programmers out there, they can still write all the code that they want. So that's the intention pillar. The invention pillar takes the programmer's, or the software creator's, intention and translates it into actual software: the algorithms, data structures, that type of thing. Once that's established, the work is handed off to the adaptation pillar. The adaptation pillar takes that code and figures out: okay, what does the software and hardware ecosystem look like for this particular program?
How do we need to augment it to make it run efficiently, securely, correctly, and then, in the machine learning context, accurately?

Now, in addition to Intel, I'm sure there are other companies that are also working on machine programming. I wonder if there are any companies with whom you collaborate whose work you could talk about, just to explore how this field is evolving?

Absolutely. We have many collaborators in industry as well as academia, and some of our industrial partners that are looking into this are Microsoft and Facebook. At Microsoft, they have a wonderful gentleman over there, Sumit Gulwani, who is seen in many people's eyes as one of the founders of formal program synthesis. He has developed a system inside of Excel that will automatically figure out what the user's intent is. They call this Flash Fill. This is a good concrete example, real-world evidence that this is not just a research toy; you can actually build this into real products. So Microsoft is deeply interested in this. Another company is Facebook, which is doing tremendous work in the space. They recently published a paper about a system called Aroma, and Aroma essentially works along the same lines as the three pillars. It's principally focused on trying to help with intention: a programmer has an intention of trying to write some code and doesn't quite know exactly how to write it. The Aroma system will then take a little bit of that code, do an analysis over a very large database, and send back to the user: is this what you meant? It's a human-in-the-loop, machine learning, approximate solution. That's good early evidence that, while we think of the space of machine programming as being a very long journey, there are things that we can be doing today in industry that could be extremely valuable.

That sounds pretty remarkable.
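The Flash Fill idea mentioned above, inferring a transformation from a handful of examples, can be illustrated with a toy programming-by-example search in Python. This is only a sketch under stated assumptions and is not how Flash Fill itself works: the six string primitives and the names `PRIMITIVES`, `apply_pipeline`, and `synthesize` are hypothetical, and the search is a plain brute-force enumeration of short pipelines.

```python
from itertools import product

# Hypothetical mini-language of string transformations.
PRIMITIVES = {
    "strip": str.strip,
    "lower": str.lower,
    "upper": str.upper,
    "title": str.title,
    "first_word": lambda s: s.split()[0] if s.split() else "",
    "last_word": lambda s: s.split()[-1] if s.split() else "",
}


def apply_pipeline(names, text):
    """Run the named primitives over the text, left to right."""
    for name in names:
        text = PRIMITIVES[name](text)
    return text


def synthesize(examples, max_depth=3):
    """Return the shortest pipeline consistent with every example, or None."""
    for depth in range(1, max_depth + 1):
        for names in product(PRIMITIVES, repeat=depth):
            if all(apply_pipeline(names, src) == dst for src, dst in examples):
                return list(names)
    return None


if __name__ == "__main__":
    # The user types the desired output for two rows; the rest is inferred.
    examples = [("ada lovelace", "LOVELACE"), ("alan turing", "TURING")]
    pipeline = synthesize(examples)
    print(pipeline)
    print(apply_pipeline(pipeline, "grace hopper"))
```

Unlike a statistical model, this search only returns a pipeline that reproduces every example exactly, which places it toward the precise end of the spectrum discussed earlier. The cost is that it returns nothing when no pipeline in the mini-language fits.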
Now, you spoke about several companies. Which countries do you think are making progress that you find impressive in the area of machine programming, or AI in general? I've heard that China is advancing in leaps and bounds. Could you talk about what's happening in other parts of the world, and what the things are that you're paying attention to?

Absolutely. As you pointed out, China is doing tremendous things. One of the things they have is very strong governmental infrastructural support for AI. It's my belief that the US also has this, but maybe not to the level that China does. It's something that I think, as a country, we probably need to be a little bit more aggressive and progressive about. There's also a lot of involvement and advances happening in Europe, and that's tied in both with their academic schools, where they have very strong ML leaders in academia, and with the vision that they have through their governmental infrastructure.

Which European countries do you think are doing the most interesting work?

That's a great question. Off the top of my head, Germany is actually doing some really tremendous stuff. As one might imagine, part of that has to do with the fact that they've been deeply involved in automobiles; the natural evolution is autonomous vehicles, and the byproduct of that is deep engagement in AI and machine learning.

Now, which innovations in machine programming do you think are most promising, and where do you think the next breakthroughs will occur in the immediate future?

Yeah, this is a really fascinating question. As I was mentioning before, there is a lot of low-hanging fruit where we can make advances, and we can build things like Aroma or Flash Fill that are very useful. But there are some core challenges that, at least with the folks that I'm interacting with at places like Stanford and MIT and Google, and deeply within Intel
Labs, we don't quite have the answer to. The first is the structural representation of intention. What I mean by this is that oftentimes, when we're writing code, the programmer's intention is diffused across the code. What we really want to be able to do is understand how to properly represent the user's intention, and we don't quite have this. There are a lot of advances that we've made historically with things like compilers and static analysis tools that create different sorts of graph or tree structures. But when we've tried to apply these in the space of machine programming, they don't quite fit. We can sort of push the square peg into the round hole, but it's not the right match. So at Intel we're thinking about this thing that we're roughly calling the abstract semantic graph. The idea is that this structure, whatever it is, which we don't quite understand yet, will be some sort of graphical representation of the semantics, essentially the intention of what the user wants. Once we figure out how to build this thing, my belief is that the field of machine programming will see a tremendous spike of growth. A lot of people are working on this. I'm working with collaborators both in industry and in academia, folks at Penn, Berkeley, and MIT; we're all thinking deeply about this. Hopefully we'll be able to figure out this abstract semantic graph soon, and until we do, I think we'll just work with the maybe not perfect solutions and try to edge our way forward.

If you figure it out, what might some implications be?

Some of the implications are that the programs we'll be able to generate are likely to be orders of magnitude more complex than the ones we can create today. For example, in the space of formal program synthesis or approximate solutions for machine programming, we may be restricted to,
let's say, programs that are up to maybe one hundred instructions or less. If we figure out how to build this abstract semantic graph, it's my belief that we will move from hundreds to thousands, potentially millions, of lines of code. So the implications of this are enormous.

When any new technology comes along, especially AI or machine learning or, as you described, machine programming, very often technologists have to justify these investments to the CFO or the CEO, not just in terms of "this is very cool technology" but in terms of how it fits in with the ROI of where the business wants to go, or how it fits in with the business strategy. What are some of the metrics that you think about in terms of measuring the ROI of machine programming?

Right, this is a great question. Of course, as the leader of the machine programming research group at Intel, it's my job not only to work on the research but also to justify its business value, as you pointed out. One of the things that you might know is that Intel is very interested in performance. But we're not just interested in hardware performance; we're also interested in software performance. One could imagine that if you have a programmer who's writing code that's slow, they might blame the Intel processor for being slow, even though the problem is not the processor; it's actually the software. One of the promises of machine programming, and we're seeing early evidence of this, is that the code we can generate through these automated methods will be superhuman in its performance, correctness, security, and so forth. One concrete example of that comes from my colleagues Andrew Adams, Jonathan Ragan-Kelley, Kayvon Fatahalian, and others; these are folks from MIT, Stanford, and Facebook, or actually I think Andrew Adams just moved to Adobe Research. They have built a system called
Halide. It's a programming language that separates out the programmer's intention from the actual scheduling of that intention. In their recent paper this year, which was published, I think, in June, they've shown for the first time that the world's foremost experts in this programming language can't compete with the machine: the machine is now producing code that is regularly more efficient. I'm just going to guess here, but I think it's by at least fifty percent, and it might be upwards of a hundred percent faster. This is the first time in the decade that they've been working on Halide that they've been able to achieve this. So this gives us promise that if we can do it in Halide, maybe we can generalize this and start to improve the efficiency of code everywhere. This is really important to Intel, because obviously we want everyone's software to run as efficiently as possible, and we don't want people to mistakenly believe that our hardware is slow when actually the problem is somewhere else.

Right, I hear you, and it actually reminds me of a broader concern that I have often heard about AI, which is that as a lot of automation begins to take place as it gets implemented, the impact on jobs could be considerable. For example, with autonomous trucks, there has been a fear that lots of truck drivers could lose their jobs if autonomous vehicles start hauling goods across the highways. Do you think there is a risk that, if machine programming takes off, the same thing could happen to software programmer jobs, and that this is something the industry should be concerned about?

Right, this is an excellent question, and one that I'm asked quite often. My honest opinion is that actually the inverse will happen: through machine programming
we will create many jobs, perhaps millions or tens of millions of jobs, and the reasoning is actually very simple. Right now we have a global population in the billions, yet the programmer pool today is a very small percentage of that. I don't know the exact percentage, but I think it's roughly around one percent of the global population. With machine programming, what we are trying to achieve is to enable the entire global population to create software. For example, my mother is an incredible entrepreneur. She's created several businesses and has just done fantastically. But she's not a programmer, so the entire world of software is closed off to her. In fact, this is one of the reasons I became so compelled by this: I see someone like her, who's wildly creative and has some amazing ideas, but because software is closed off to her, those ideas never get realized. Hopefully, with machine programming, with this intentionality that we were discussing earlier, this will create tens or hundreds of millions of jobs. It will also keep the programmers that we have today employed, because there is work to do on building these very complex systems. As we expand intention, we're going to require those people, what we call at Intel the ninjas, to ensure that all the subsystems that are part of those three pillars are advancing appropriately.

Since we were talking about these adverse consequences, it reminded me of a conversation almost fifteen years ago with Andy Grove, the former CEO of Intel, whom we interviewed in 2004. He once said that for every metric there should be another, paired metric that addresses the adverse consequences of the first. So as you think about some of the metrics you would use to measure the success or the ROI of machine programming, what could be some of the adverse consequences of machine programming, and what metrics would you use to keep an eye on them, to make sure things stay under control?
I'm really glad you asked this question. And before I answer, I just want to say I'm a huge fan of Andy Grove. It's wonderful to be at a company with such a strong legacy of leadership. We see the impact that Andy has had even today; the company is really trying to follow a lot of his principles.

I would agree with you wholeheartedly. In fact, in a book that Knowledge@Wharton wrote in 2004, he was identified as the top leader among twenty-five leaders in the past five years.

Oh, that's fantastic. I never got the chance to work with him personally, but I know people who have, and from everything I hear, he was not only a tremendous leader but also a tremendous technologist, which is a very rare combination. But going back to your question about the adverse consequences: we actually talked about this a little bit in our three pillars paper, and it's part of the reason why we wrote the paper. What we were seeing in some cases of advances is research like this. One of my colleagues that I'm very fond of, Alvin Cheung, is a professor at Berkeley, and he's doing work called verified lifting. Verified lifting essentially uses formal program synthesis techniques to lift code from one programming language and then drop it down onto another programming language. This is wildly useful for legacy systems that can't be maintained because we don't have a programmer supply: we can lift the code out and put it in a new language where we have lots of programmers. However, one of the things we noticed, and I've discussed this with Alvin, so he will probably not be shocked when I mention it, is that there's a potential byproduct of that lifting that can reduce intentionality. So, for example,
his work, we would say, principally falls in the invention and the adaptation pillars. Based on how that code is transformed, the intention of that code could be reduced. For example, things like variable names and function names, things that are really important to programmers, may not map properly to the new structure. So as we're making forward progress in machine programming, what we've asked the community to do is think in the context of the three pillars and try to understand: are you inadvertently hurting another pillar? And if you are, clarify that, so we understand that this is another thing that we now need to advance.

Now, since you mentioned some of the academic collaborations, I know that you spoke at Penn at a PRECISE event last week, and I was wondering if we could end by talking a little bit about what kind of work you're planning to do here at Penn.

Oh yeah, I'm delighted that you asked this question, and I'm really happy to be here. I had the honor of giving this talk at the PRECISE industry day, which was more than sold out; I think people were sitting on the floor. It was very well attended, and rightfully so: a lot of the thought leaders in the space of computer science, formal methods, and machine learning are part of the PRECISE center. Recently I accepted an invitation to help chair the technical industry group for PRECISE and also act as their executive director for artificial intelligence. My role with PRECISE and with Penn is really, I think, twofold. The first is that PRECISE has a very strong technical consortium of industry collaborators, and what I would like to do is ensure that all of the industrial partners are working in a complementary way, that we understand what the core challenges are, and that we're not working in a way that's overlapping or duplicating effort. So that's one part. The other part that's really important to me is that right now
we have a lack of machine programming engineers and researchers. In fact, there are very few of us, which makes sense, because the field, even though it's been around since the fifties, has struggled to get to the point where it is today. So what we're working on with Penn and other academic institutes is to start to incorporate curriculum changes and get our undergrads and grad students more familiar with it, and then also to generate the new leading minds through the PhD programs who are going to drive the research that's happening both in academia and in the industrial labs.

This all sounds wonderful. Justin, thank you so much for taking the time to explain all these things to us. It's wonderful to meet you, and we're very happy to have you here at Knowledge@Wharton.

Yes, it's been my pleasure. Thank you so much for having me.

For more insight from Knowledge@Wharton, please visit knowledge.wharton.upenn.edu.