
When & Where to Stop Predictive Modeling?

When a data scientist develops a predictive model, it is often unclear when to stop iterating and which model alternative to select. Here is a framework I think helps: AIRS (Accuracy, Implementability, Reliability and Stability), which supports a scientific decision.
Let us describe how it works.
Accuracy:
When developing a predictive model, one must decide what accuracy value justifies stopping the development iteration. Although this target is agreed with the business or the customer beforehand, based on a predefined measure, the data scientist still plays a major role during the development/iteration process. The predefined measure can be qualitative and/or quantitative, and it can be evaluated during model development (in-sample) or once the model is in production (out-of-sample). For example, when developing a sales forecasting model, Mean Absolute Percentage Error (MAPE) is a common measure of accuracy. It is measured during model development as well as while the model is in implementation and in production.
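As an illustration, MAPE can be computed as follows. This is a minimal sketch; the function name and the sales figures are hypothetical:

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent.

    Assumes no actual value is zero (MAPE is undefined there).
    """
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)

# Hypothetical monthly sales (actual) vs. the model's forecast
actual = [120, 135, 150, 160]
forecast = [110, 140, 145, 170]
print(round(mape(actual, forecast), 2))  # → 5.41
```

The same function can be applied in-sample during development and out-of-sample once the model is in production, so the two accuracy readings are directly comparable.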
Implementability:
As we all know, the more data we have, the more accurate a model we can develop to solve a business challenge. The data dimensions can be: duration of data availability, intervals at which data are collected, and number of variables. The better the analytics infrastructure, the more flexibility the data scientist has to choose the right or an advanced modeling technique, which in turn helps to develop a more accurate model. The analytics infrastructure includes software (standard vs. open source), hardware, etc. The last dimension is the manpower skill/capability required to keep the model running in implementation and in production. Let these three sub-level parameters be defined as DIM (data, infrastructure and manpower). Attaining the highest accuracy requires keeping DIM at its highest maturity, which incurs a very high cost, so a trade-off point is needed.
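The trade-off can be made explicit with a toy calculation. All numbers below are illustrative assumptions: each DIM maturity level is assigned an expected accuracy and an annual cost, and the level that maximizes business value net of cost is selected:

```python
# Hypothetical accuracy and cost at increasing DIM maturity levels
dim_levels = [
    {"maturity": "basic",    "accuracy": 0.80, "annual_cost": 10},
    {"maturity": "standard", "accuracy": 0.88, "annual_cost": 25},
    {"maturity": "advanced", "accuracy": 0.92, "annual_cost": 60},
    {"maturity": "maximal",  "accuracy": 0.93, "annual_cost": 140},
]

def best_tradeoff(levels, value_per_point):
    """Pick the level with the highest (accuracy value - cost).

    value_per_point: assumed business value of one accuracy point.
    """
    return max(levels,
               key=lambda l: l["accuracy"] * 100 * value_per_point - l["annual_cost"])

print(best_tradeoff(dim_levels, value_per_point=2)["maturity"])  # → standard
```

Note the diminishing returns: moving from "advanced" to "maximal" buys one accuracy point at more than double the cost, which is why the trade-off lands below the highest maturity.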
Reliability:
Just as everything in the world has its own shelf-life, a predictive model is no exception: its accuracy decays over time. You may have stopped iterating because you were satisfied with the accuracy during the development process, but the model may produce absurd results once it is implemented or in production. Hence, you have to test your model across various time dimensions: business volume growth, data growth, and real-time granularity (day, week, month, etc.). Check whether it provides consistent accuracy across all these dimensions.
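One simple way to detect decay is to compute accuracy over successive time windows and watch the trend. The sketch below uses a MAPE helper and entirely made-up monthly figures; a rising per-window error signals that the model is ageing:

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent (no zero actuals)."""
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)

def rolling_mape(actual, forecast, window):
    """MAPE per consecutive window; a rising sequence suggests decay."""
    return [mape(actual[i:i + window], forecast[i:i + window])
            for i in range(0, len(actual) - window + 1, window)]

# Hypothetical 12 months of actuals vs. forecasts from an ageing model
actual   = [100, 102, 98, 105, 110, 108, 115, 120, 118, 125, 130, 128]
forecast = [ 99, 103, 97, 104, 105, 113, 105, 130, 128, 110, 115, 145]
for i, m in enumerate(rolling_mape(actual, forecast, 4)):
    print(f"quarter {i + 1}: MAPE = {m:.1f}%")
```

Here the quarterly MAPE climbs quarter over quarter, which is exactly the reliability failure this section warns about: acceptable in-sample accuracy that degrades in production.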
Stability:
Just as stock exchanges observe sudden spikes and falls, all businesses observe similar patterns. The frequency of these spikes and falls may differ, but they exist. For example, a retailer's sales during a festival or special event will spike or fall depending on the nature of the event; for an energy consumer, a sudden temperature drop due to rainfall in a hot summer drastically reduces energy consumption; and so on. Such situations arise in every business. Because their occurrence is uncertain, they reduce the accuracy of your predictive model, either because the situation was not considered in the model or because there is too little time to adapt. Hence, evaluate how much and how fast your predictive model adapts to such situations.
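"How much and how fast" can be quantified by injecting a level shift into a series and counting how many periods a model needs before its error falls back below a threshold. The sketch below uses simple exponential smoothing as a stand-in model; the series, smoothing parameter and threshold are all illustrative assumptions:

```python
def ses_one_step(series, alpha):
    """One-step-ahead forecasts from simple exponential smoothing."""
    level = series[0]
    forecasts = []
    for y in series:
        forecasts.append(level)          # forecast uses only past data
        level = alpha * y + (1 - alpha) * level
    return forecasts

def recovery_time(series, shift_at, alpha, threshold):
    """Periods after a level shift until absolute % error < threshold."""
    forecasts = ses_one_step(series, alpha)
    for t in range(shift_at, len(series)):
        if abs(series[t] - forecasts[t]) / series[t] < threshold:
            return t - shift_at
    return None

# Flat demand of 100 that jumps to 150 at period 10 (hypothetical spike)
series = [100] * 10 + [150] * 10
print(recovery_time(series, shift_at=10, alpha=0.5, threshold=0.05))  # → 3
```

A larger alpha adapts faster (alpha=0.9 recovers in a single period on this series) but makes the model jumpier in normal times, which is the stability trade-off to score.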
While developing, you may have built a few alternative models. Now score each of them on a 1-to-10 scale against all the AIRS criteria; sometimes you may have to assign a qualitative value. Choose the model, or the competing models, with the best score.
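The final selection can be as simple as tabulating AIRS scores per candidate. The candidate names, scores and weights below are illustrative assumptions you would set together with the business:

```python
# Hypothetical AIRS scores (1-10) for three candidate models
candidates = {
    "linear_regression": {"accuracy": 6, "implementability": 9, "reliability": 8, "stability": 7},
    "gradient_boosting": {"accuracy": 9, "implementability": 6, "reliability": 7, "stability": 6},
    "arima":             {"accuracy": 7, "implementability": 8, "reliability": 7, "stability": 7},
}
# Equal weights by default; adjust to reflect business priorities
weights = {"accuracy": 1.0, "implementability": 1.0, "reliability": 1.0, "stability": 1.0}

def airs_score(scores, weights):
    """Weighted sum of the four AIRS criteria."""
    return sum(scores[c] * weights[c] for c in weights)

best = max(candidates, key=lambda name: airs_score(candidates[name], weights))
for name, scores in candidates.items():
    print(f"{name}: {airs_score(scores, weights)}")
print("selected:", best)  # → selected: linear_regression
```

With equal weights the most accurate model does not win here: its weaker implementability and stability scores are exactly the trade-off the AIRS framework is meant to surface.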
