# Makridakis, Box-Jenkins, and ARIMA discussed on Data Skeptic

### Automatic TRANSCRIPT

And at the same time, we have several thousands of stores. So the combination of unique SKUs and stores is usually in the region of millions, or sometimes even billions, for instance for Walmart or Target. When you have such a massive number of time series, and nowadays you usually have to produce forecasts quite frequently (for fresh produce it might even be twice a day), you can imagine that the computational burden is quite relevant. Producing a forecast for a single time series might take half a second using standard statistical methods, but if you multiply this half second by billions of SKU-store time series, it becomes problematic. And usually companies use web services to produce these forecasts, so all of this comes at a cost that is effectively measurable.

Yeah, I can see that if I have to do so many of these time series forecasts, half a second, or however long it takes, times a billion is a lot. But why can't I just lean into the cloud and do it all massively in parallel?

You can, but this still comes at a cost. You could employ Amazon Web Services and a big cluster, but at the end of the month Amazon will send you a bill for that usage, and of course the more you have consumed, the bigger the bill. The point of our research is how we can effectively minimize this cost of computation for producing forecasts, however without harming the forecasts themselves: keeping the forecast accuracy the same, if not better, while at the same time making the computational burden smaller.

In some ways, that sounds like a free lunch. What's the secret behind it?

Well, the trick behind it is that there are so many models nowadays. In our attempt as academics to capture every possible pattern that exists out there, every possible data generation process, we have devised over the years all of these many different models, all these many variations of exponential smoothing or ARIMA models.
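To make the scale of that computational cost concrete, here is a back-of-the-envelope sketch. All the numbers in it (per-series fitting time, SKU and store counts, core count) are illustrative assumptions, not figures from the interview:

```python
# Back-of-the-envelope cost of large-scale retail forecasting.
# Every figure below is an illustrative assumption, not a measured value.

SECONDS_PER_SERIES = 0.5        # assumed fit-and-forecast time per series
N_SERIES = 50_000 * 2_000       # e.g. 50k SKUs x 2k stores = 100M series
RUNS_PER_DAY = 2                # fresh produce may be forecast twice a day

total_cpu_seconds = SECONDS_PER_SERIES * N_SERIES * RUNS_PER_DAY
total_cpu_hours = total_cpu_seconds / 3600

# Parallelism shrinks wall-clock time, but the billed core-hours stay the same:
cores = 10_000
wall_clock_hours = total_cpu_hours / cores

print(f"CPU-hours per day: {total_cpu_hours:,.0f}")
print(f"Wall-clock hours on {cores:,} cores: {wall_clock_hours:.1f}")
```

The point the calculation makes is the one from the conversation: a cluster changes how long you wait, not how much compute you pay for.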
And the fact is that even if we were in a position to select the best model and the best parameters today, nobody can guarantee that this particular model and set of parameters would still be the best in the future. So even if this is the best for the data we have, maybe another model will be the best at producing good forecasts tomorrow. My point is that we don't really need this large number of models, either in the exponential smoothing family or in the ARIMA family. By using fewer models that can still capture the basic time series patterns, that is, trend and seasonality, your search for the optimal model effectively shrinks, and you have a good enough model to produce good enough forecasts for whatever comes ahead, which you don't know whether it will be the same as before or something that changes. This line of research goes back to the 1980s, to the work of Makridakis and the forecasting competitions of that era, where he showed that simple exponential smoothing models can effectively outperform the state of the art of the time, which was ARIMA and the Box-Jenkins methodology. However, since then the exponential smoothing family has grown a lot in size. Back then we had maybe a handful of models; nowadays we have 30 different exponential smoothing models. And that's exactly my point: do we need all of these models? Are we really able to differentiate between every combination of trend and seasonality, let's say multiplicative trend with additive seasonality, or additive trend with multiplicative seasonality, and so on and so forth? Or should we instead have a set of simple models that collectively can cover all these different forecast profiles, and thus decrease the computational cost, the search for an optimal model if you wish? So the whole research lies along the lines of simplicity: simple models can produce good results, even better results than more complicated models.
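As an illustration of the kind of simple model being argued for, here is a minimal sketch (my own, not code from the research discussed) of simple exponential smoothing, the no-trend, no-seasonality member of the family: fitting is a single O(n) pass with one parameter, alpha, and the forecast for every horizon is just the final smoothed level:

```python
def ses_forecast(y, alpha=0.3):
    """Simple exponential smoothing: one-step-ahead (flat) forecast.

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}
    The forecast for any future horizon is the final level.
    """
    level = y[0]                       # initialize the level at the first observation
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return level

demand = [12, 15, 11, 14, 16, 13, 15]  # hypothetical weekly demand for one SKU
print(ses_forecast(demand, alpha=0.5))  # -> 14.390625
```

With alpha close to 1 the model tracks the latest observation (a naive forecast); with alpha close to 0 it averages over long history. That single dial is the whole search space, which is why models like this are so cheap at scale.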
But it also lies in the sphere of suboptimality, which is, effectively, that even if your model or your parameters are suboptimal, this will not necessarily harm the forecast accuracy, because this categorization into optimal and suboptimal is based on the current data, and the future data might not be the same as the current data.

The idea of a simpler model is very intuitively appealing to me; I think Occam's razor is a good heuristic. But I'm curious how you measure simplicity, or maybe you measure its inverse, complexity. How do you know how simple or complex a particular model is?

In this particular case, we explored two different families of models, and we take a different approach for each when it comes to exponential smoothing and ARIMA. I will start with ARIMA, which is more straightforward. A more complex ARIMA model is a model with a higher order of parameters, in other words its autoregressive and moving average components: you may have several orders of autoregression and, at the same time, several orders of moving average. Our findings say that when it comes to yearly and quarterly data, a maximum order of one or two is enough to achieve the best performance, either in terms of point forecast accuracy or uncertainty, while at the same time incurring the least computational cost. So for the ARIMA family of models, complexity is essentially measured by the order of the model. In exponential smoothing, though, we have three components if you wish: level, trend, and seasonality. While we don't decrease the number of components, what we do is try to reduce the number of models through which our algorithm goes. So instead of going through all 30 models of exponential smoothing, we define a reduced set which contains 8 models, and these models can cover all the forecast profiles when it comes to trend and seasonality.

Could you elaborate on what those 8 models are and why they were selected as the 8 of choice?
So we have 8 exponential smoothing models in our reduced set. These are: a model that contains only the level, with no trend or seasonality; a model that contains a trend; a model with seasonality; and a model with both trend and seasonality. That gives us four models, and then we have another four models for the multiplicative type of error. So we have four models for the additive type of error and four models for the multiplicative type of error. Overall, the models that we exclude are models with a multiplicative trend, which have been shown to produce explosive forecasts, and models where the type of the error does not match the type of the seasonality.

Well, if you're able to get pretty good, if not better, forecasts without using any multiplicative trends, it almost implies that the researchers working in that area aren't adding a lot of value to the big picture here. Is there some tradeoff I'm not seeing? Are you maybe making your approach not applicable to some niche area? How can you get by without this popular technique?

Well, I think that including multiplicative trends makes sense from a theoretical perspective and makes the whole exponential smoothing framework more complete, if you wish. At the same time, though, the exclusion of the multiplicative trend is something that Rob Hyndman and I discussed, I think it was 2015 or 2016, and after our discussions Rob revised the very popular forecast package for the R software, so that the ets function, which produces forecasts with exponential smoothing, excludes multiplicative trends by default. We experimented with some M competition data, the Makridakis competition data, and we saw that excluding multiplicative trends actually brings better results. So as I said, I think the same argument goes back to the 70s already.
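The reduced set described above can be written down explicitly. In the usual ETS naming convention, each model is a triple (Error, Trend, Seasonality) with each component additive "A", multiplicative "M", or absent "N". This sketch is my own enumeration of the rules as stated in the interview (no multiplicative trend, and seasonality either absent or matching the error type), not code from the paper:

```python
from itertools import product

ERRORS = ["A", "M"]   # additive or multiplicative error
TRENDS = ["N", "A"]   # multiplicative trends excluded: explosive forecasts

def reduced_ets_set():
    """Enumerate the reduced 8-model ETS set described in the interview:
    no multiplicative trend, and any seasonality must match the error type."""
    models = []
    for error, trend in product(ERRORS, TRENDS):
        models.append((error, trend, "N"))    # non-seasonal variant
        models.append((error, trend, error))  # seasonal variant matching the error
    return models

for m in reduced_ets_set():
    print("ETS(%s,%s,%s)" % m)
```

Running this prints the 8 candidates, from ETS(A,N,N), plain simple exponential smoothing, through ETS(M,A,M), damping none of the 30-model taxonomy's components but pruning its combinations, which is exactly the search-space reduction being claimed.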