Posts

The Future of Healthcare: How Data Science and AI are Revolutionizing Medicine

The healthcare industry is constantly evolving, and advancements in technology are playing a crucial role in transforming the way medicine is practiced. One of the most promising areas of technological innovation is the application of data science and artificial intelligence (AI) in healthcare. Data science and AI have the potential to revolutionize medicine by helping healthcare professionals make better, faster, and more accurate diagnoses, improve treatment outcomes, and ultimately save lives. One of the most significant ways that data science and AI are being used in healthcare is through medical imaging. Medical imaging is a critical component of diagnosing and treating many diseases and conditions, including cancer, heart disease, and neurological disorders. AI algorithms can analyze medical images and identify abnormalities with greater accuracy and speed than human experts. For example, a study published in the journal Nature in 2018 found that an AI system was able to detect b...

k means clustering

In today's world, data plays an integral role in driving business decisions. One of the most common and effective ways of analyzing data is through machine learning algorithms. K-Means clustering is an unsupervised machine learning algorithm that has gained popularity for its efficiency in data clustering. What is K-Means Clustering? K-Means clustering is an algorithm that groups data points into k clusters based on the data fed to the machine. The algorithm works by randomly choosing a few data points as the initial cluster centers, known as centroids, assigning every other data point to its nearest centroid, and then recomputing each centroid as the mean of the points assigned to it until the assignments stop changing. Applying K-Means Clustering to Customer Segmentation For instance, retailers can use K-Means clustering to decide which customers get promotional offers. They can create three clusters of customers, namely the loyal, somewhat loyal, and lowest-price shoppers, based on their shopping patterns. Then, they can create strategies to convert somewhat loyal customers into loyal ...
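The clustering loop described in this excerpt can be sketched in a few lines of plain Python. This is a minimal illustration, not the post's own code: the function names, the `customers` data (visits per month, average basket value) and the fixed random seed are all assumptions made for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Basic k-means: returns (centroids, labels) for a list of numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k random data points as initial centroids
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep old centroid
        if new_centroids == centroids:  # assignments are stable, stop
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical customers: (visits per month, average basket value)
customers = [
    (12, 80), (11, 85), (13, 78),   # frequent, high spend
    (6, 50), (5, 55), (7, 48),      # occasional
    (1, 15), (2, 12), (1, 18),      # rare, low spend
]
centroids, labels = kmeans(customers, k=3)
for customer, label in zip(customers, labels):
    print(customer, "-> cluster", label)
```

Because the starting centroids are random, k-means can settle on different groupings from run to run; in practice it is usually restarted several times and the tightest clustering is kept.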

Text Analytics Platforms Part 1

Text analytics is still largely an immature science, and embraces several different approaches. Natural language processing (NLP) includes dozens of techniques for accomplishing tasks such as language translation, document categorization and tagging, extraction of meaningful terms and so on. Text mining, on the other hand, is primarily concerned with the extraction of meaningful metrics from unstructured text data so they can be fed into data mining algorithms for pattern discovery. Some suppliers have applied text analytics to very specific business problems, usually centering on customer data and sentiment analysis. This is an evolving field and the next few years should see significant progress. Other suppliers provide NLP-based technologies so that documents can be categorized and meaning extracted from them. Text mining platforms are a more recent phenomenon and provide a mechanism to discover patterns which might be used in operational activities. Text is used to generate extra...

Techniques for Data Dimensionality Reduction

These are some common data dimensionality reduction techniques: Missing Values, Low Variance Filter, High Correlation Filter, Random Forests / Ensemble Trees, Principal Component Analysis (PCA), Backward Feature Elimination, and Forward Feature Construction. Please look out for details of each method.
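As a rough illustration of two of the techniques listed, the sketch below applies a low variance filter followed by a high correlation filter to a small table of columns. The column names, data and thresholds are hypothetical and chosen only for the example.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def reduce_columns(columns, min_variance=0.01, max_correlation=0.95):
    """Return the names of the columns that survive both filters."""
    # Low variance filter: near-constant columns carry little information.
    kept = {name: v for name, v in columns.items() if pvariance(v) > min_variance}
    # High correlation filter: drop a column if it nearly duplicates one already kept.
    selected = []
    for name in kept:
        if all(abs(pearson(kept[name], kept[other])) < max_correlation
               for other in selected):
            selected.append(name)
    return selected


# Hypothetical data set
data = {
    "age": [23, 35, 41, 52, 60],
    "age_months": [276, 420, 492, 624, 720],  # age * 12, redundant with "age"
    "country_code": [1, 1, 1, 1, 1],          # constant, zero variance
    "spend": [120.0, 80.0, 200.0, 150.0, 90.0],
}
selected = reduce_columns(data)
print(selected)  # → ['age', 'spend']
```

Variance depends on the scale of each column, so a single threshold such as `min_variance` is only meaningful after the columns have been normalized to comparable ranges.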

Tidyverse package in R

Tidyverse package in R - It will open many data wrangling and visualization opportunities.

Web Scraping code in R


Hottest Skill in 2017: Analytics

In a comprehensive study carried out by Analytics India Magazine and AnalytixLabs, a key finding is that the average salaries of analytics professionals in India have increased by 22% since last year. The Analytics India Salary Study 2017 suggests that for the year 2017, the average salary has been INR 11.7 Lacs across all experience levels and skill sets, in comparison to INR 9.5 Lacs in 2016. The study gives an insight into various aspects of salary structure for professionals across factors such as experience level, cities, tools, company types and more. The study suggests an increased demand for senior professionals, thus pushing the average salaries higher than last year. The percentage of analytics professionals with salaries in the higher bracket of INR 50+ Lacs has significantly increased to 3.7% from just 1.1% a year ago. While the percentage of analytics professionals commanding salaries of less than INR 10 Lacs has gone lower; the percent...

Different types of sampling methods

1. Simple Random Sampling: A simple random sample (SRS) of size n is produced by a scheme which ensures that each subgroup of the population of size n has an equal probability of being chosen as the sample.
2. Stratified Random Sampling: Divide the population into "strata". There can be any number of these. Then choose a simple random sample from each stratum. Combine those into the overall sample. That is a stratified random sample. (Example: Mosque A has 600 women and 400 men as members. One way to get a stratified random sample of size 30 is to take a SRS of 18 women from the 600 women and another SRS of 12 men from the 400 men.)
3. Multi-Stage Sampling: Sometimes the population is too large and scattered for it to be practical to make a list of the entire population from which to draw a SRS. For instance, when a polling organization samples US voters, they do not do a SRS. Since voter lists are compiled by counties, they might first do a sam...
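The stratified example above (18 women and 12 men drawn from 600 and 400 members) can be reproduced with a short sketch. The function name, the member labels and the fixed seed are assumptions for illustration; allocation here is proportional to stratum size, which is one common choice among several.

```python
import random


def stratified_sample(strata, total_size, seed=0):
    """Draw a simple random sample from each stratum, sized proportionally."""
    rng = random.Random(seed)
    population = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional allocation; rounding may need adjusting for other sizes.
        n = round(total_size * len(members) / population)
        sample[name] = rng.sample(members, n)  # SRS without replacement
    return sample


# Hypothetical membership list: 600 women and 400 men
strata = {
    "women": [f"W{i}" for i in range(600)],
    "men": [f"M{i}" for i in range(400)],
}
sample = stratified_sample(strata, total_size=30)
print({name: len(members) for name, members in sample.items()})
# → {'women': 18, 'men': 12}
```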

Multinomial logistic regression

Multinomial logistic regression is the regression analysis to conduct when the dependent variable is nominal with more than two levels. Thus it is an extension of logistic regression, which analyzes dichotomous (binary) dependents. Multinomial regression is used to describe data and to explain the relationship between one dependent nominal variable and one or more continuous-level (interval or ratio scale) independent variables. Multinomial logistic regression is known by a variety of other names, including polytomous LR, multiclass LR, softmax regression, multinomial logit, maximum entropy (MaxEnt) classifier, and conditional maximum entropy model. Multinomial logistic regression is a particular solution to the classification problem that assumes that a linear combination of the observed features and some problem-specific parameters can be used to determine the probability of each particular outcome of the ...
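To make the "linear combination of features determines the probability of each outcome" idea concrete, here is a minimal sketch of the prediction step only, using the softmax function. The weights, biases and feature values are made up for illustration; a real model would learn them from data.

```python
import math


def softmax(scores):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    top = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(features, weights, biases):
    """One linear score per class, then softmax over the scores."""
    scores = [
        sum(w * x for w, x in zip(class_weights, features)) + b
        for class_weights, b in zip(weights, biases)
    ]
    return softmax(scores)


# Hypothetical three-class model with two features
weights = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.1]]
biases = [0.0, 0.0, 0.0]
probs = predict_proba([1.0, 2.0], weights, biases)
print([round(p, 3) for p in probs])
print("predicted class:", probs.index(max(probs)))  # → predicted class: 1
```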

When & Where to Stop Predictive Modeling?

When a data scientist develops a predictive model, it is often unclear where to stop, when to stop, and which model alternative to select. Here is what I think should be a framework to follow. The AIRS (accuracy, implementability, reliability and stability) framework will help in making a scientific decision. Let us describe how it works. Accuracy: When developing a predictive model, one always decides what accuracy value should stop the model development iteration. Though this target is decided and accepted based on a predefined measure agreed with the business or the customer beforehand, the data scientist has a major role to play during the model development/iteration process. The predefined measure can be qualitative and/or quantitative. This measure can be evaluated during model development (in-sample) or after the model is in production (out-of-sample). For example, if you are developi...

2016 Predictions For Data & Analytics In Relation To The Skills Gap

In the coming year Big Data technology will continue to be the big thing for business. Experts estimate the amount of data some companies hold could be worth $8 trillion or more. Access to such a massive amount of data can be both a curse and a blessing for companies: on the one hand it helps reduce the time taken in decision making, and on the other hand it increases the need to hire the right kind of resources to draw meaningful insights from this data. To leverage big data, companies will have to overcome an enormous skills gap in the talent market. In fact, one of the LinkedIn reports predicted 'Data Scientist' to be the job of the year for 2015, while we at Absolutdata feel 'Data Scientist' is the job of the century. To gear up for 2016, companies have already started hunting for data scientists, and recruiters are scanning the market to fill the gap between demand and supply at a faster pace. In order to do so, they have to be cautious about ...

Tool- Gephi

Gephi is an interactive visualization and exploration  platform  for all kinds of networks and complex systems, dynamic and hierarchical graphs. Exploratory Data Analysis : intuition-oriented analysis by networks manipulations in real time. Link Analysis : revealing the underlying structures of associations between objects, in particular in scale-free networks. Social Network Analysis : easy creation of social data connectors to map community organizations and small-world networks. Biological Network analysis : representing patterns of biological data. Poster creation : scientific work promotion with hi-quality printable maps. Open source application

NORMDIST function

Returns the normal distribution for the specified mean and standard deviation. This function has a very wide range of applications in statistics, including hypothesis testing. Important This function has been replaced with one or more new functions that may provide improved accuracy and whose names better reflect their usage. Although this function is still available for backward compatibility, you should consider using the new functions from now on, because this function may not be available in future versions of Excel. For more information about the new function, see NORM.DIST function. Syntax NORMDIST(x,mean,standard_dev,cumulative) The NORMDIST function syntax has the following arguments: X Required. The value for which you want the distribution. Mean Required. The arithmetic mean of the distribution. Standard_dev Required. The standard deviation of the distribution. Cumulative Required. A logical value that determines the form of the function. If cumulative is TRUE, NORMDIST retu...
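For readers working outside Excel, the same calculation can be sketched in Python with only the standard library. The function below mirrors the four arguments listed in the syntax; it assumes the usual meaning of the flag (cumulative distribution when TRUE, density when FALSE) and is an illustration, not Excel's implementation.

```python
import math


def normdist(x, mean, standard_dev, cumulative):
    """Normal distribution at x: CDF if cumulative is True, otherwise the density."""
    if standard_dev <= 0:
        raise ValueError("standard_dev must be positive")
    z = (x - mean) / standard_dev
    if cumulative:
        # Cumulative distribution function via the error function.
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    # Probability density function.
    return math.exp(-0.5 * z * z) / (standard_dev * math.sqrt(2.0 * math.pi))


print(round(normdist(42, 40, 1.5, True), 4))   # → 0.9088
print(round(normdist(42, 40, 1.5, False), 4))  # → 0.1093
```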

Mastering Story Telling

To master storytelling, you should know how to do the following:
• Explain why storytelling is beneficial
• Describe the five essential elements to consider when you craft a story
• Create an audience profile by developing personas
• Cull insights from your work that will draw the attention of your audience
• Determine the context(s) in which you will present your story and the special considerations needed for each situation
• Structure your story so that it has a beginning, a middle and an end
• Plan your storyli...

Story Telling

 These are the  pillars  of story telling 1. Audience   Learning about your primary audience members is essential to crafting a meaningful story. Use the persona template to establish the characteristics of your key audience groups. As you continue developing your story, revisit your audience profile to ensure that your messages are on point     2. SOCIAL STYLEs®   SOCIAL STYLE® is a measurement of observable behavior that explains how people perceive and are affected by others' behavior. The SOCIAL STYLEs® model organizes behavior into four types (Analytical, Amiable, Expressive and Driving). Leveraging this knowledge can help you tailor your presentations and stories when you know the people with whom you will be working.     Insights   Though it might be tempting to delve into an explanation of all of the insights you’ve developed, remember that your story should speak to the business need your audience wan...

Creating SAS datasets

You will make up your own names for your SAS datasets and variables. These names must conform to these rules: no longer than 8 characters, start with a letter, and contain only letters, numbers, or underscores (_). SAS is not case-sensitive. You can use capital or lowercase letters in your SAS variables. However, when you specify filenames (as you do with the include and file SAS commands), you must type them exactly as they exist in UNIX. The DATA step: The data step is used to describe and modify your data. Within the data step you tell SAS how to read the data and generate or delete variables and observations. The data step transforms your raw data into a SAS dataset. There are four statements that are commonly used in the DATA step: the DATA statement names the dataset; the INPUT statement lists the names of the variables; the CARDS statement indicates that data lines immediately follow; the INFILE statement indicates that data is in a file and gives the name of the file. Generally, the data s...
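The naming rules stated in this excerpt are easy to check mechanically. The Python sketch below encodes exactly those three rules (at most 8 characters, starts with a letter, only letters, digits or underscores); the function name is hypothetical, and the limits accepted by a given SAS release may differ from the ones quoted here.

```python
import re

# One leading letter followed by up to seven letters, digits or underscores.
_SAS_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,7}")


def is_valid_sas_name(name):
    """Check a dataset or variable name against the rules quoted in the excerpt."""
    return _SAS_NAME.fullmatch(name) is not None


for candidate in ["sales_16", "Income", "2016data", "customer_id", "my-data"]:
    print(candidate, is_valid_sas_name(candidate))
# sales_16 and Income pass; the other three fail
# (starts with a digit, longer than 8 characters, contains a hyphen).
```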

What is Web Analytics?

Today, I was asked in an interview: What is web analytics? Tell me all you know about web analytics. This may not be a perfect answer, but it gives a direction for answering the question. Web analytics is the measurement, collection, analysis and reporting of web data for purposes of understanding and optimizing web usage. Web analytics is not just a tool for measuring web traffic but can be used as a tool for business and market research, and to assess and improve the effectiveness of a web site. Web analytics applications can also help companies measure the results of traditional print or broadcast advertising campaigns. It helps one to estimate how traffic to a website changes after the launch of a new advertising campaign. Web analytics provides information about the number of visitors to a website and the number of page views. It helps gauge traffic and popularity trends, which is useful for market research. There are two categories of web analytics: off-site and on...