Text Analytics Platforms Part 1
Text analytics is still a largely immature science and embraces several different approaches. Natural language processing (NLP) includes dozens of techniques for tasks such as language translation, document categorization and tagging, and the extraction of meaningful terms. Text mining, on the other hand, is primarily concerned with extracting meaningful metrics from unstructured text data so that they can be fed into data mining algorithms for pattern discovery.
Some suppliers have applied text analytics to very specific business problems, usually centering on customer data and sentiment analysis. This is an evolving field and the next few years should see significant progress. Other suppliers provide NLP-based technologies so that documents can be categorized and meaning extracted from them. Text mining platforms are a more recent phenomenon and provide a mechanism for discovering patterns that might be used in operational activities. Text is used to generate extra features, which can be added to structured data for more accurate pattern discovery. There is, of course, overlap, and most suppliers provide a mixture of capabilities. Finally, we should not forget information retrieval, more often branded as enterprise search technology, where the aim is simply to provide a means of discovering and accessing data that are relevant to a particular query. This is a separate topic to a large extent, although again there is overlap.
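To make the idea of text-derived features concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the customer records, the field names and the two tiny word lists are all hypothetical, and a real platform would use far richer techniques than simple term counting. It turns each free-text comment into a few numeric metrics and adds them to the structured fields of the same record.

```python
import re
from collections import Counter

# Hypothetical customer records: structured fields plus a free-text comment.
records = [
    {"customer_id": 1, "monthly_spend": 42.0,
     "comment": "Great service, very happy with the support team."},
    {"customer_id": 2, "monthly_spend": 17.5,
     "comment": "Terrible delays. Support was slow and unhelpful."},
]

# Tiny illustrative word lists (an assumption, not a real sentiment lexicon).
POSITIVE = {"great", "happy", "helpful"}
NEGATIVE = {"terrible", "slow", "unhelpful"}


def text_features(text):
    """Turn free text into a few simple numeric metrics."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return {
        "token_count": len(tokens),
        "positive_terms": sum(counts[w] for w in POSITIVE),
        "negative_terms": sum(counts[w] for w in NEGATIVE),
    }


def enrich(record):
    """Return the structured fields plus the text-derived features."""
    structured = {k: v for k, v in record.items() if k != "comment"}
    return {**structured, **text_features(record["comment"])}


# Each enriched row is purely numeric and could be passed to a
# data mining algorithm alongside the original structured fields.
enriched = [enrich(r) for r in records]
```

The enriched rows contain only structured values, so the text has in effect become additional columns that a pattern discovery algorithm can use.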
The terms ‘text mining’, ‘machine learning’ and ‘natural language processing’ have different meanings depending on who you speak with. For our purposes, ‘text mining’ is the application of algorithms to text data for the purpose of finding exploitable patterns. ‘Machine learning’ is when a software system learns from data or experience so that it can perform a task more effectively the next time around. NLP is a set of capabilities which allow meaning to be extracted from text data.