The ‘Problem’ of Unstructured and Semi-structured Data
Traditionally, business intelligence was like the way of the ninjas: exclusively the domain of data scientists. They stored, queried, and analyzed data to understand the what, where, when, how, and why behind an event. That data was structured and generated in much smaller volumes than today. The biggest difference between then and now, however, is the variety of data being generated, and the fact that most of it now comes from consumers rather than data scientists. Today we also deal with unstructured and semi-structured data. Unstructured data resides on the billions of social media pages across the Web, and its growth is fueled by easy Internet access from the multitude of connected devices that are integral to our lifestyle. Then there is the gray area of semi-structured data in HTML, text files, and PDF documents, which has some structure in the form of tags and markers but is for the most part unstructured text. Some business analysts may see little or no value in this unstructured and semi-structured data, but for those pushing the frontier of data analysis it is a gold mine. The opportunity comes in the form of predictive analytics.
Predictive Analytics for Desired Outcomes
- Descriptive model: This method analyzes past performance by mining historical and current data to decide a course of action. Descriptive models identify many different relationships between customers or products, and decide what approach needs to be taken going forward. Almost all management reporting such as sales, marketing, operations, and finance, uses this type of post-mortem analysis. It seeks to answer the questions – what is happening? how many? how often? where? when? what exactly is the problem? and what actions need to be taken?
- Predictive model: This method analyzes past performance to assess how likely a customer is to exhibit a specific behavior. The focus is on predicting a single customer behavior, such as credit risk. It addresses the questions – what could happen? what if these trends continue? and what will happen next if…?
- Prescriptive model: Also known as a decision model, this method describes the relationship between all the elements of a decision, including its variables, in order to predict the results of that decision. It asks the questions – how can we achieve the best outcome? how can we address variability? what other product would they be interested in?
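To make the three model types concrete, here is a minimal Python sketch built around the credit-risk example above. Everything in it is an assumption made for illustration: the HISTORY records, the income bands, the profit and loss figures, and the function names are hypothetical, and the "model" is just a historical frequency rather than a real statistical technique.

```python
from collections import defaultdict

# Hypothetical historical records: (income_band, defaulted_on_loan)
HISTORY = [
    ("low", True), ("low", True), ("low", False), ("low", False),
    ("high", False), ("high", False), ("high", False), ("high", True),
]


def describe(history):
    """Descriptive: summarize what happened (default rate per income band)."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for band, defaulted in history:
        totals[band] += 1
        defaults[band] += int(defaulted)
    return {band: defaults[band] / totals[band] for band in totals}


def predict(history, band):
    """Predictive: estimate how likely a new customer in `band` is to default.

    Assumption: the past default rate of the band stands in for the probability.
    """
    return describe(history)[band]


def prescribe(history, band, profit_if_repaid=100.0, loss_if_default=200.0):
    """Prescriptive: choose the action with the better expected outcome."""
    p_default = predict(history, band)
    expected = (1 - p_default) * profit_if_repaid - p_default * loss_if_default
    if expected > 0:
        return ("approve", expected)
    return ("decline", 0.0)


if __name__ == "__main__":
    print(describe(HISTORY))
    print(predict(HISTORY, "high"))
    print(prescribe(HISTORY, "high"))
    print(prescribe(HISTORY, "low"))
```

The same data feeds all three functions. What changes is the question being asked: what happened, what is likely to happen, and what should be done about it.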
Rinse-and-Repeat Approach
We can’t ignore the possible pitfalls of predictive analytics. In most, if not all, cases, 100% accuracy is not possible, for the following reasons:
- Historical data cannot decisively predict the future
- There may be unknown variables that are not accounted for when training the predictive analytics model
- The models can be manipulated to show biased and unrealistic predictions
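The first two pitfalls can be illustrated with a small Python sketch. It is a toy under stated assumptions, not a real evaluation method: the PAST and LATER records are invented, the segment names are hypothetical, and the "model" is a simple majority rule per segment.

```python
def fit_majority_rule(records):
    """Learn, per segment, the most common outcome seen in the training data."""
    counts = {}
    for segment, outcome in records:
        seen = counts.setdefault(segment, {True: 0, False: 0})
        seen[outcome] += 1
    return {seg: c[True] > c[False] for seg, c in counts.items()}


def accuracy(rule, records):
    """Share of records whose outcome matches the rule's prediction."""
    hits = sum(1 for segment, outcome in records if rule[segment] == outcome)
    return hits / len(records)


# Hypothetical data: the "low" segment behaves differently in the later period.
PAST = [("low", True), ("low", True), ("low", False),
        ("high", False), ("high", False), ("high", True)]
LATER = [("low", False), ("low", False), ("low", False),
         ("high", False), ("high", False), ("high", False)]

rule = fit_majority_rule(PAST)
past_accuracy = accuracy(rule, PAST)    # imperfect even on the data it learned from
later_accuracy = accuracy(rule, LATER)  # lower still once behavior shifts

if __name__ == "__main__":
    print(rule, past_accuracy, later_accuracy)
```

The rule is imperfect even on its own training data, because the segment alone does not explain every outcome (an unaccounted-for variable). It does worse on the later period, because past behavior did not carry over.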
If you enjoyed reading this, be sure to check out the other posts in this series:
Part 2 – 5 Businesses on the Frontier of Predictive Analytics
Part 3 – 4 More Businesses on the Frontier of Predictive Analytics
Part 4 – 3 Insanely Great Dashboards from Recorded Future
Part 5 – Stripping Down the Gorgeous Sift Science Dashboard
Part 6 – 9 Ways We Use Predictive Analytics Without Even Knowing It
Asar
September 6, 2013, 12:44 am
Hi Twain,
Nice post indeed! With tons of data (most of it unstructured) pouring in by the millisecond, there is a growing need to glean it and mash it into digestible bits of useful information. With the deluge of Big Data (I’m still confused how big it is, or could get) today, end-users are no longer interested in looking at tables; they’d rather look at charts and graphs and decide things on the fly. This is where Data Visualization helps them extract information. So, in a way, it is the visualization which leads them to delve deeper into the data and extract meaningful information.
Waiting for the next part.. 😀
Regards,
Asar
twain
September 13, 2013, 4:18 pm
Couldn’t have said it better, Asar. Finding meaning from all the data is what it’s all about. I can’t wait to get to the visualization part of this series, which is coming up. For now, I’ve posted the second part in this series which you should check out at https://ow.ly/oOxD9