LM101-066: How to Solve Constraint Satisfaction Problems using MCMC Methods (Rerun)
Welcome to the sixty six episode of learning machines went to one. This is a rerun of twenty one which was titled how to solve large complex constraint satisfaction problems using Monte Carlo markup chain methods and don't worry. We will have exciting brand. New episodes of learning machines went to one starting this fall in two thousand seventeen. Hello everyone. Welcome to the twenty first podcast in the podcast series. Learning machines went to one in this series of podcasts. My goal is to discuss important concepts of artificial intelligence and learning in hopefully, and entertaining and educational manner. Many of the recent episodes of learning machines when at one how focused upon the problem of learning this episode does not discuss the learning problem at all in this episode. We will focus on the problem of infants rather than learning thus in some respects this episode is most similar to episodes three seven and eight which can be found at the learning machines went to one website, WWW dot learning. The Sheen's went to one dot com in this episode. We consider situations where we have a large collection of aerials each variable can take on a certain number of possible values some of these value. Variables we can observe lost some of these variables have values, which are not observable. First implicity assume for now each area bowl car sponsor to an assertion, which is either true or false. Thus each variable takes on the value of blunt indicating his car spotting assertion is true. Or each variable takes on the value of zero indicating its car spotting assertion is boss in based system. We might have a large collection of if then logical roles as discussed episode three of this podcast series. For example, consider a medical diagnosis problem with the first variable car spots to whether or not the patient has high blood pressure. We will call this variable. The high blood pressure variable also assume that we have a second variable which car spots to the outcome of a computer tomography or C T exam of the patient's heart intended to determine if there's a build up of calcium in the nation's art. This is a relatively expensive and invasive exam we've called the second variable the outcome variable which indicates whether or not there is a buildup of calcium in the arteries. And finally, let's have a third variable which indicates whether or not the person has experienced a nonlethal heart attack in the past week. We will call this the heart attack variable, of course in real life. We would want to have more than three variables for the purpose of representing the factors that influenced the presence or absence of a heart attack. Additional variables might include the presence or absence of a pain and went part of your body such as arm shoulder neck t job belly area or back. In addition to pain could be severe mild. Furthermore, other symptoms of a heart attack include the presence absence of anxiety, coughing, tting, dizziness, nausea, heart palpitations, shortness of breath and sweating, it's not difficult to imagine. In that realistic, medical diagnosis problem might have as many as forty possible. Binary variables the total number of possible medical situations that could be represented using forty binary variables is to race to the fortieth power. This is approximately equal to the number of stars in the zurve -able universe. Our ultimate goal is to discuss practical algorithms for making inferences involving not just forty variables. The hundreds or thousands of variables for illustrative purposes. However, we will focus on just three binary variables. High blood pressure c t exam outcome and heart attack outcome for these three binary variables this car. Sponsored to the situation where we have to to the third or just eight possible patterns of outcome now suppose that we are in the medical -fession, and we begin by postulating. If then logical rules, which specify relationships among these three variables for example. We might have been logical roles such as if the patient has high blood pressure, then the patient will not have a heart attack. If the patient has a heart attack, then the patient will have high blood pressure. If the patient has high blood pressure and a core CT exam outcome, then the patient will have a heart attack as disgusted episodes three seven eight of this podcast series. There are at least three challenging problems associated with representing knowledge using if then logical rules, I although in individual robot self might make sense, it is possible that the rule in conjunction with other rules is illogical. This is called the problem of determining if a set up if dead logical rules is mutually consistent second for particular situation. It is possible that. Len will not have sufficient information about the situation with your snack to the given collection of it then logical roles to come to a unique decision regarding a patient's health and third in real life. It is difficult to imagine that when postulate illegitimate role about anything in real life. There are always exceptions as disgusted previous podcasts. Episodes three seven and eight. So to address the problem of exceptions went can naturally extend the concept of an ideological role to the concept of a probabilistic if then logical role specifically the specific conditions in if part of the proper probabilistic rule hold then with certain probability that then part of the probably stick. If then rule hosts within this framework. The problem of inference is now redefined within a probabilistic setting. Given a patient has low blood pressure. Scher and C T exam outcome indicating a healthy heart. The goal of the problem is to give prince machine is simply to calculate the probability that the patient will have a heart attack. It is also important to note that within this problematic framework the constraints and variables are ordered. In other words, we can use the collection of probabilistic if then rules to infer whether seven we'll have a heart attack, given they have low blood pressure, but we can also use those same roles to infer whether someone would have low blood pressure, given they have a heart attack and their C T exam outcome indicates a healthy heart. Every variable can be either observable or observable every variable can be an impotent warn outfit of the inference machine. The concept of stating that a particular if then wrote does not hold absolutely. But just host with some probability that's provides a simple solution to the problem of exceptions since if we have. The statements such as if your blood pressure is low then it is very improbable. You will have a heart attack provides a natural way of representing knowledge when uncertainty about whether or not a role is appropriate is presence on the other hand, the problems of consistency and uniqueness still remain. Fortunately, the probabilistic formulation provides a natural solution to this problem as well. As through the use of the concept of a markup random field. It episode eleven we introduced the concept of the Mark of chain. A Mark of random field is a more generalized concept, which includes markup chains as special cases briefly a marker pen build may be loosely defined as a collection of probabilistic if then rolls the market Vanna 'field framework provides us with a methodology for constructing a collection of probabilistic logical. Rules, which not only handle exceptions. But also guaranteed to be neutrally consistent and you'll unique solutions in the form of specific probabilistic statements. The only real assumption that is required for this. To work is the positivity assumption. The positivity assumption essentially states that every possible situation has at least some tiny tiny tiny tiny chance of occurring. Or in other words, the positivity assumption can be paraphrased stating that nothing is possible. Given this assumption arbitrary collections of probabilistic if in logical roles can be combined manipulated in a straightforward manner. So for example, we would have to make this up Shen in our simple three variable car, you logical medico diagnosis problem that situations where someone has excellent blood pressure. And no indication of a problem of clogged arteries based upon their CT exam could have a heart attack. The might assume that such situations are very highly improbable. But nevertheless, conceivable intuitively, the positivity assumption seems consistent with world knowledge. Representation. Humans, for example, Cain, walls and shock. The current world champion Racquetball has won the world championships in pro Racquetball nine times. And is only lost one match in the past five years five years ago to ask reactive on physicist. Whether they got it would be possible for Cain to lose only one match in five year period while playing continuously on the pro tour. They would probably say was possible, but not very probable, in fact, if you ask canes parents the day after he was born whether they thought that he might Wedneday become a dominant force on the Racquetball pro tour for over a decade. I'm sure they would respond with. Anything is possible. Human cognition, appears to embrace all possibilities. Never surrendering to the concept of impossibility. Or in other words, perhaps the positivity assumption is the secret ingredient for true artificial intelligence, so given that we have the Mark of random field. How can we make inferences with expect to the Mark of filled answer? This question can be traced back in part to an early paper published in nineteen fifty three in the journal of chemical physics written by nNcholas metropolis. Ariana Rosa Marshall Rosenbluth Augusta Taylor and Edward teller. The title of the paper was at question estate calculations by fast computing, machines. This paper was it not the first one of the earliest examples of an important class of computing strategies used in statistical. Machine learning which called Monte-Carlo Mark of chain methods. This algorithm was alternately for two as a member of the class of metropolis algorithms, despite the fact that all members of the research group, probably had equal participation in its development. In nineteen Eighty-four, perhaps one of the most influential papers which resurrected popularized and extended the surly research thread. Was published by the mathematicians Stewart game and Donald Gaiman who were brothers, by the way, they're PayPal paper entitled to cast, relax. Asian gives distributions and the basin restoration of images with published in nineteen Eighty-four in the journal. I trip we transactions on pattern analysis and machine intelligence in the game. And given paper they discussed a special Monte Carlo. Mark of chain. Algorithm. Called the Gibbs sampler given the positivity assumption. We can always represent the evidence for particular pattern of binary variable values in terms of some function or role that maps that pattern of binary values into a number such a function or rule is called the energy function of the Mark of random fill now for the good news and the bad news. The good news is that in principle. We can write down mathematical formula. Which let's calculate the probability of any arbitrary pattern of values of the variables whose values are unknown in terms of energy function. Moreover, energy functions usually very easy to compute. Okay. Here's the bad news. The bad news is that for problems involving more than twelve or fifteen variables is often computational impossible using the fastest computers to calculate desired probabilities using the energy function. This is where the concept of Monte-Carlo markup, chain computing becomes important in Monte Carlo. Mark of chain or MC MC? Algorithm is basically an hour that which randomly transforms went pad and values into another pattern of values. The calculations are confrontationally simple and easy to perform regardless of the number of variables. For example. This method can be and has been applied to problems involving infants over thousands of variables car spying searching through a collection of binary patterns, which is many times greater than the number of atoms in the universe. The first important example of a Monte-Carlo markup chain algorithm is called metropolis algorithm. The sensu idea of the metropolis algorithm is as follows remember the energy of a pattern of variable. Values is a number which is typically easy to calculate smaller. Values of energy Carson to patterns of values, which are better solutions to the constraint. Satisfaction problem. Okay. Now, we are going to explain how the basic ideas of how metropolis Monte Carlo markup chain. Our the marks will use the terminology unclamps variable to refer to a variable whose values are allowed to change the unclamps variables car sponde- to the variables whose values we want to figure out. We also use the terminology clamped variable to revert to variables whose values are not allowed to change the clamp variables car sponsor. The variables whose values we know. Oh, okay. Let's now introduce the metropolis Monte Carlo Mark of chain algorithm. In step wine. We pick a pattern of values for the unclamps variables at random instead to we randomly select another pattern of values for the variables which could be based upon the current values of unclamps variables as well. As the current clamps variables in step three. We evaluate the energy function at both patterns of variable values, the original counter variable values, we picked step one and the second pattern of variable values we picked and step to if we find that the energy function evaluated that pattern of variable values. We guessed that in step two is lower than that energy function value at the pattern of variable values we had in step one. Then we go ahead and pick and use. The pattern of variable values that we randomly selected in step to on the other hand suppose that the pattern of variable values we picked step to how they greater energy than the Padma variable a values we had in step one in that case what we're gonna do is. We're going to take the difference of those two energies and use that energy difference to wait a coin. So that the coin is more likely to come up heads when that energy difference is large in then we're gonna flip that coin. And if it comes up heads, we're going to go ahead and take the new pattern of variable values selected in step to even though it has a lower very higher energy level. So in other words, if the coin flip comes up. Heads. We're going to keep the new pattern of variable values, even though that pattern of values is an inferior solution relative to that pattern of values. We had and step one and instead of four simply keep track separately of patterns of variable values, which have very low energy on a separate list. So these are good potential solutions and step five we go back to step two, and repeat this process over and over and over again, and this is the metropolis Monte Carlo markup chain, our though now as I said before sometimes the Trump was Monte Carlo marketing algorithm changes a pattern of variable values, which has a particular energy into Annetta pattern of variable values, which has greater energy value. This means the metropolis Monte Carlo. Mark of chain is actually sometimes selecting a pattern of values, which is worse. Lou. Shen when it makes a particular step, although it may seem a little non intuitive that when would be would want to do this. This is one of the great secrets to Monte-Carlo markup. Jane algorithms, the algorithm does this because sometimes in order to get to a really great solution. You might have to first visit some solutions, which are not so good. A second important special case of the metropolis Hastings. Our them is the gift sampler MC MC our them. Which is a more general version of the well-known vaults machine. MC MCI them the central idea of the gift sampler Monte-Carlo market chain. Algorithm is as follows a step one. We pick one variable from the random field at random, and then we use are probably stick if enrolls to calculate the probability that the variable. Value is equal to one. We then flip awaited coin which comes up heads with probability p and this probability p is said before calculated from step two. And if the coin comes up heads than we set the variable equal one, otherwise the set the variable equal zero also this probability p is also based upon energy difference. We keep track of the powder values, which are very low energy are separate list. And then go back to step two and repeat this above procedure of rain over and over again, thus in a Monte Carlo Markov chain algorithm. The pattern of variable values is progressively perturbed randomly, but the chance of selecting a particular pattern of values depends upon energy function. It can be shown that for any arbitrary. Mark arena. Field is always possible to construct a Monte-Carlo marketing algorithm. Moreover, it can be shown that a Monte Carlo Marcucci algorithm as remarkable property if the purchase of values is repeated multiple times eventually the percentage of times that a particular pattern of our values is visit. Did converge to the probability that values as specified by the Mark Fields energy function. This means that low energy states car spying to good solutions to the constraint. Satisfaction. Problem will be visited more frequently provided the algorithm is allowed to continue for a long enough time period. Thus it is only necessary for the computer three court situations with a pattern of variable values have low energy during this random process. The situations with have the lowest energy obtained as a result of the Monte-Carlo. Mark of genes. Mark decarlo markup chains out rhythms journey through the state space are then considered as final solutions to the constraint satisfaction problem. But that's not all there is even more. Good news. The rate of convergence of a Monte-Carlo Martine. Our them can't be shown to be. Geometric which in English means super fast. There are versions of Monte-Carlo marketing algorithms which involved approach called simulated annealing, which involves decreasing especial parameter called the temperature parameter our zero as the temperature parameters decreased the probability of randomly moving from a low energy state to a high energy state, gradually decreases that is the chance that you will take a step to an inferior solution is decreased. Now, these simulated annealing methods are very useful that we will not focus on such methods here because the rate of convergence of a Monte-Carlo markup chain simulated annealing. Our them is logarithmic, which in English means super slow. Moreover, the mathematics is more complicated. Simulated annealing methods will be disgusting greater detail in a future episode now if you go to the website, WWW dot learning machines went one dot com and click on episode twenty one then scroll to the bottom of episode twenty one you will find a pointer to the week appeared entry, which describes how to implement a ticker type of Monte Carlo Marcucci, our them 'cause metropolis Hastings. Mc MC our them. The metropolis Hastings MC MCI were them includes the popular give sampler metropolis and balsa machine. Monte-carlo Mark of out rhythms. As special cases. This. Approach of Monte-Carlo markup chain inference is the exciting approach which has numerous applications, especially with the availability of high speed computing resources. I will now go ahead and list. Just a few example, applications of Monte Carlo Martina algorithms, which are not in the area of medical diagnosis. Our first example is related to an area of interest mentioned by a new member alerting Sheen's went went community named Steven Steven is interested in the problem of unsupervised classification of spatial data. Michael Brandon fields are ideally suited for this type about Gatien each variable that field corresponds to a different spatial location. In addition, we allow each level in the field to take on many possible values car spawning to the set of possible objects at that spatial location south, for example, suppose that one studying how rivers erode landmasses when my. Have a large collection of variables car spawning. Two different spatial locations in geographic region corresponding to a river surrounded by landmasses. We call these locations sites SITES at each site. We assume that either one landmasses present without evidence of erosion to landmasses present with evidence of erosion or three water is present in practice the market van and field. My have thousands of such site variables where each site variable can take on one three possible values. Now, we incorporate simple probabilistic yet then rules such as the probability that a particular target site is a land mass without evidence of erosion will become larger when there are more sites surrounding the target site, which are landmasses without evidence of erosion the probability that. A particular target site is a land mass with evidence of erosion will become larger when there are more sites surrounding the target site, which are landmasses wit evidence of erosion, the probability that a particular target site is a water mass will become larger when there are more sites surrounding the target site, which are tight water mass. These three simple probabilistic if than logical roles can then be used to support spatial classification in a particular spatial classification problem. It is assumed that some of the sites and spatial classification are known and therefore clamped, for example, it might be known that this particular sites are definitely landmasses and particular sites are definitely river, a water, and particular sites are definitely landmasses in the process of erosion, however, there may be many other sites which are difficult to classify when can use a Monte Carlo markup chain type of algorithm to infer. For most probable values of unclamps site variables given the clamp variables this is cheap. As previously mentioned by using a Monte Carlo market chain our them to update the state variables of the Mark of field and keeping track of which padded. State variables have the smallest energy notice that in the medical diagnosis example, each variable in the market field, car sponsor awfully to one problematic. If then role, thus if you had one hundred variables that you have about one hundred probabilistic if then rules in this application concerned with NC provides classification of spatial data. There might be millions of variables. But still when my have only a few probabilistic, if then rules, the reason why this is possible in this application is that it is assumed that the probabilistic if then rules, which are defined are shared by the millions of variables in the markup field. This assumption sometimes called the spatial homogeneity assumption Ma. Monte-carlo Mark of chain. Algorithms have also been used to provide optimal layouts for integrated circuits when way to set up such a problem would be to assign each component in the integrated circuit design a variable. Value indicating one how it is connected to other components to Dario tation component and three the location of the component the energy function is chosen to be the total volume of the layout of the components. Another example application is to TISCO, natural language processing in this case. Each clamps variable corresponds to a particular word of phrase associated with each clamps variable is an unclaimed variable indicating this tactic or semantic interpretation of that word or phrase additional and clamped variables can also be included which car spun high Orson tactic structures such as now razors or higher order semantic structures. That is in addition to probabilistic if then rolls, which specify the probability that particular word or word. It should be labeled in a particular manner. One can have additional probabilistic if then rules specifying, which patterns of labels applause -able, and which are implausible a third example is the scheduling problem when approach for dealing with scheduling problems suppose, the goal of the scheduling problem is to determine whether particular employee should work the day shift the night ship or the late night shift for each employee and each day of the week. A particular variable is assigned which takes on the three possible values of day shift night shift and late night shift. Then we have probabilistic constraints such as one a certain number of employees must work at each shift to some employees may be on vacation three seven employees, reversers shifts for particular days and four one wants to avoid a signing a late night shift followed by day shift and similar situations. So in conclusion, the Monte-Carlo markup chain methodology is. Generic methodology, which is Steph applicable to a broad range of complicated, constraint satisfaction problems. It has the vantage that it affords considerable flexibility in dealing with a broad range of problems in a straightforward manner. However, it is important to emphasize that for specific types of small-scale problems with appropriate constraints, computer, scientists have discovered sophisticated strategies for forming a search versus Lucien we will not discuss the solutions for these specialized problems today, however, the Gulf today's discussion was introduced the powerful Monte Carlo markup chain methodology as possible to for consideration in the solution of very complicated, and very high dimensional probabilistic constraint satisfaction problems. In addition. This entire podcast did not discuss the problem of learning probabilistic constraints. Rather the focus was on using probabilistic constraints in a future podcast. We will discuss how when can build a probabilistic constraint satisfaction machine that learns from experience without a teacher. And that is we will describe unsupervised learning procedures for probabilistic constraint satisfaction machines in a future episode of learning machines went a one finally as you may have noticed I discussed an application today, which was relevant to one of the members of learning machines went went community. If you are a member of learning machines went went community lease update your user profile. You can update your user profile when you receive the Email newsletter by simply clicking on let us know what you want to hear link or if you're not a member of the learning machines went to win community when you join the community by visiting our website at www learning machines went to win dot com. You will have the opportunity to update your user profile at that time because from time to time I will review the profiles of members of learning. Machines wen went community. And do my best to talk about topics of interest to the members of this group. And finally, visit our website WWW dot learning. Sheen's went dot com where you can find the transcript of today's podcast and after website, you'll find a list of references to help you further. Explore these ideas. Thank you for participating in today's podcast with me. Dr Richard golden you were encouraged to visit the website WWW dot learning. The Sheen's when one dot com where you'll have the opportunity to comment on today's podcast read, the comments of other listeners obtain free supplemental references to related material obtain a free copy of the notes for today's podcast Willis notes for previous podcasts at those as well. In addition, you'll have access to free machine learning software, and you can listen to a learning machines one and one on IT tunes, Stitcher. Tune in radio and at such as beyond Todd podcast Republic. Double twists and pod kicker. If you liked the show and would like to support the development of learning machines went one, it is very important that you submit a review of this show, two items and Stitcher by visiting the website, WWW dot learning machines went to one dot com and Clinton the Batman submit every two I tune or submit a review fister your view will be greatly valued and greatly appreciated. Why's it so important that you submit a review because my active in this podcast series? It's the sheer towards the information about the field of machine learning and artificial intelligence with the general public. If this podcast cannot be found, and I tuned their Stitcher. Then my objective to share my podcast. We'll be more difficult to achieve. No, I do not know for sure. It is extremely probable that machine. Learning our examined the review the minute by Stitcher, and then use the ratings of those reviews to decrease for increase, the likelihood that a show will be found when went searches for the show using third gentian. So please support this show and submit your views to tune Institure by visiting the website, WWW dot learning machines, Leno one dot com. Thank you so much for your support. And finally, don't worry. This sound recording and its contents. He's copyright two thousand fourteen by far consulting Inc. All rights reserved. The music is provided by Jobe dot com.