Posts

Different types of sampling methods

1. Simple Random Sampling: A simple random sample (SRS) of size n is produced by a scheme which ensures that each subgroup of the population of size n has an equal probability of being chosen as the sample.
2. Stratified Random Sampling: Divide the population into "strata". There can be any number of these. Then choose a simple random sample from each stratum and combine those into the overall sample. That is a stratified random sample. (Example: Mosque A has 600 women and 400 men as members. One way to get a stratified random sample of size 30 is to take an SRS of 18 women from the 600 women and another SRS of 12 men from the 400 men.)
3. Multi-Stage Sampling: Sometimes the population is too large and scattered for it to be practical to make a list of the entire population from which to draw an SRS. For instance, when a polling organization samples US voters, they do not do an SRS. Since voter lists are compiled by counties, they might first do a sample of th
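The stratified example above (30 members drawn from 600 women and 400 men) can be sketched in a few lines of Python. This is only an illustration: the function names `proportional_sizes` and `stratified_sample`, the member labels, and the fixed seed are assumptions made for the sketch, not part of the original post.

```python
import random

def proportional_sizes(strata, total):
    """Allocate the total sample size to each stratum in proportion to its size."""
    population = sum(len(members) for members in strata.values())
    return {name: round(total * len(members) / population)
            for name, members in strata.items()}

def stratified_sample(strata, sizes, seed=None):
    """Draw a simple random sample (without replacement) from each stratum and combine them."""
    rng = random.Random(seed)  # seed is only here to make the sketch reproducible
    sample = []
    for name, members in strata.items():
        sample.extend(rng.sample(members, sizes[name]))
    return sample

# Hypothetical membership lists standing in for the 600 women and 400 men.
strata = {
    "women": [f"W{i}" for i in range(600)],
    "men": [f"M{i}" for i in range(400)],
}
sizes = proportional_sizes(strata, total=30)
sample = stratified_sample(strata, sizes, seed=0)
```

With proportional allocation the sizes come out as 18 women and 12 men, matching the example. Note that simple rounding may not always sum exactly to the requested total for other stratum sizes.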

Multinomial logistic regression

Multinomial logistic regression is the regression analysis to conduct when the dependent variable is nominal with more than two levels. Thus it is an extension of logistic regression, which analyzes dichotomous (binary) dependent variables. Multinomial regression is used to describe data and to explain the relationship between one dependent nominal variable and one or more continuous-level (interval or ratio scale) independent variables. Multinomial logistic regression is known by a variety of other names, including polytomous LR, multiclass LR, softmax regression, multinomial logit, maximum entropy (MaxEnt) classifier, and conditional maximum entropy model. Multinomial logistic regression is a particular solution to the classification problem that assumes that a linear combination of the observed features and some problem-specific parameters can be used to determine the probability of each particular outcome of the dependent variable.
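As a rough illustration of the last point, the sketch below turns one linear score per outcome into probabilities with the softmax function, which is the usual formulation behind the "softmax regression" name. The weights, biases, and feature values are made-up numbers chosen for the example, and `predict_proba` is a hypothetical helper, not a library function.

```python
import math

def softmax(scores):
    """Convert a list of real-valued scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(features, weights, biases):
    """One linear combination of the features per outcome, then softmax."""
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)

# Illustrative parameters for 3 outcomes and 2 features (assumed, not fitted).
weights = [[1.0, -0.5], [0.2, 0.3], [-1.0, 0.8]]
biases = [0.0, 0.1, -0.1]
probs = predict_proba([2.0, 1.0], weights, biases)
predicted_class = probs.index(max(probs))
```

In practice the weights and biases would be estimated from data, typically by maximum likelihood; here they are fixed only to show how the probabilities are computed.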

When & Where to Stop Predictive Modeling?

When a data scientist develops a predictive model, it is often unclear where to stop, when to stop, and which model alternative to select. Here is what I think should be a framework to follow. The AIRS (Accuracy, Implementability, Reliability and Stability) framework will help in making a scientific decision. Let us describe how it will work. Accuracy: When one develops a predictive model, one always decides what the accuracy value should be to stop the model development iteration. Though this target was decided and accepted based on a predefined measure that was agreed with the business or with the customer beforehand, the data scientist has a major role to play during the model development/iteration process. The predefined measure can be qualitative and/or quantitative. This measure can be evaluated during model development (in-sample) or after the model is in production (out-of-sample). For example, if you are developing a fore

2016 Predictions For Data & Analytics In Relation To The Skills Gap

In the coming year, Big Data technology will continue to be the big thing for business. Experts estimate the amount of data some companies hold could be worth $8 trillion or more. Access to such a massive amount of data can be both a curse and a blessing for companies: on one hand it helps reduce the time taken in decision making, and on the other hand it increases the need to hire the right kind of resources to draw meaningful insights from this data. To leverage big data, companies will have to overcome an enormous skills gap in the talent market. In fact, one of the LinkedIn reports predicted 'Data Scientist' to be the job of the year for 2015, while we at Absolutdata feel 'Data Scientist' to be the job of the century. In order to gear up for 2016, companies have already started hunting for data scientists, and recruiters are scanning the market to fill the gap between demand and supply at a faster pace. In order to do so, they have to be cautious about

Tool- Gephi

Gephi is an open source, interactive visualization and exploration platform for all kinds of networks and complex systems, dynamic and hierarchical graphs.
• Exploratory Data Analysis: intuition-oriented analysis by network manipulations in real time.
• Link Analysis: revealing the underlying structures of associations between objects, in particular in scale-free networks.
• Social Network Analysis: easy creation of social data connectors to map community organizations and small-world networks.
• Biological Network Analysis: representing patterns of biological data.
• Poster Creation: scientific work promotion with high-quality printable maps.

NORMDIST function

Returns the normal distribution for the specified mean and standard deviation. This function has a very wide range of applications in statistics, including hypothesis testing.

Important: This function has been replaced with one or more new functions that may provide improved accuracy and whose names better reflect their usage. Although this function is still available for backward compatibility, you should consider using the new functions from now on, because this function may not be available in future versions of Excel. For more information about the new function, see NORM.DIST function.

Syntax: NORMDIST(x,mean,standard_dev,cumulative)

The NORMDIST function syntax has the following arguments:
• X: Required. The value for which you want the distribution.
• Mean: Required. The arithmetic mean of the distribution.
• Standard_dev: Required. The standard deviation of the distribution.
• Cumulative: Required. A logical value that determines the form of the function. If cumulative is TRUE, NORMDIST retu
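For readers who want to check the function outside Excel, here is a minimal Python sketch with the same four arguments. It assumes the documented Excel behavior that the excerpt cuts off before stating: a TRUE value for cumulative gives the cumulative distribution function, and FALSE gives the probability density. The name `normdist` is just a stand-in for this sketch, and the error handling is an approximation of Excel's `#NUM!` result.

```python
import math

def normdist(x, mean, standard_dev, cumulative):
    """Normal distribution value for x, mirroring NORMDIST's argument order."""
    if standard_dev <= 0:
        # Excel reports an error for a non-positive standard deviation.
        raise ValueError("standard_dev must be positive")
    z = (x - mean) / standard_dev
    if cumulative:
        # Cumulative distribution function, via the error function.
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    # Probability density function.
    return math.exp(-0.5 * z * z) / (standard_dev * math.sqrt(2.0 * math.pi))

cdf_value = normdist(42, 40, 1.5, True)   # roughly 0.9088
pdf_value = normdist(42, 40, 1.5, False)  # roughly 0.1093
```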

Mastering Story Telling

To master storytelling, you should be able to do the following:
• Explain why storytelling is beneficial
• Describe the five essential elements to consider when you craft a story
• Create an audience profile by developing personas
• Cull insights from your work that will draw the attention of your audience
• Determine the context(s) in which you will present your story and the special considerations needed for each situation
• Structure your story so that it has a beginning, a middle and an end
• Plan your storyline using tools such as scripts, storyboards, experience maps and wireframes
• Supplement your story with visuals
• Practice your storytelling as a way to get feedback about the story’s strengths and weaknesses
• Locate templates for a wide variety of story planning tools at the Data Visualization Communit