As you might expect, I choose topics to write about in response to the conversations and questions that I hear week after week, mostly with clients, sometimes with colleagues and occasionally during industry events. There has been a lot of chatter about the cloud recently for obvious reasons, but this last week has been marked by two interesting conversations with very large global organisations in fast-moving markets.
Essentially, it was the same question asked twice, but one that has been coming up in various ways all year. What stuck out in one of the meetings was the way the question was positioned by the CEO. It was clearly designed to be a challenge to the CIO: ‘I want to know the real activity in our key markets and across our product lines as and when I need to focus in. However, I can’t trust either the P&L reporting, as this is a smoothed and massaged set of figures, or the conventional reporting, as this doesn’t give me the flexibility to slice and dice what is actually behind the figures’.
So there you have it. A perfect example of the shift from traditional back office business intelligence (BI) reporting in a structured manner to the unstructured use of various forms of data around front office market activities, a point I made in an earlier blog post. You can work out the obvious responses to how you might approach this problem through data collection, analysis engines, etc. But all of this assumes that the requirement is structured and repeatable (and the result is expensive and time-consuming to build and maintain as well), and for ‘structured’, read ‘stability in business activities’.
The real question in all of this is how to deal with rapidly changing, dynamic situations which are inherently low on repeatability, maybe even one-offs, where the consequences are significant and the need for information with which to make a decision is vital. The second and equally unspoken part of this is that vast quantities of information are being created and stored internally and externally, so why can’t this be accessed and used? Well, we all know the answer to that. It’s because the data is inconsistent in format and in semantic or contextual meaning, so we need to extract, transform, and load it (ETL) into a database before it can be used. With that, we are right back where we started, with the same cost and time challenges posed by the construction of a structured data warehouse.
But do we need all three stages of ETL for transitory data that we will use for a very short period around something unique and then discard? There are now several technology vendors offering tools for very effective extraction of data from a variety of chosen sources, and for holding it for direct use by some equally effective new analysis engines. Just think for a moment about this whole process. Normal reporting across an enterprise is about exceptions, so normalisation and standardisation are obviously key and, given the amount of data, the total ‘human time’ needed to work with the results has to be low.
Now go back to the original question at the beginning of this piece. The people involved are already committed to wanting and needing to spend time on the topic, and they are not asking for a normalised comparison within their current reporting. So if you can extract the data you need from the sources that exist in a focussed enough manner, then the final analysis of this critical data can be a lot more human-intensive. Given the kind of situation that this is likely to be, i.e. a sharply focussed event or issue on the edge of the business that needs to be addressed in a very specific way, it becomes a much more manageable matter. The question is all about being able to access and import the key extracts of data from a wide selection of sources and formats to improve the information on which the decision will be based.
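To make the idea of focussed extraction concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the two sample sources, the field names, and the helper functions extract_csv, extract_json and units_by_market. It is not how any vendor's product works. It pulls only the few fields needed for one question out of two differently shaped sources, maps them to a common shape in memory, and answers the question directly, with no warehouse schema and no load step.

```python
import csv
import io
import json

# Hypothetical sample sources; in practice these might be files, feeds or web pages.
CSV_SOURCE = "region,product,units\nEMEA,widget,120\nAPAC,widget,80\nEMEA,gadget,40\n"
JSON_SOURCE = (
    '[{"market": "EMEA", "item": "widget", "qty": 30},'
    ' {"market": "LATAM", "item": "widget", "qty": 15}]'
)


def extract_csv(text):
    """Extract only the fields of interest from a CSV source."""
    return [
        {"market": row["region"], "product": row["product"], "units": int(row["units"])}
        for row in csv.DictReader(io.StringIO(text))
    ]


def extract_json(text):
    """Extract the same fields from a JSON source that names them differently."""
    return [
        {"market": rec["market"], "product": rec["item"], "units": int(rec["qty"])}
        for rec in json.loads(text)
    ]


def units_by_market(records, product):
    """One-off question: how many units of a product were seen in each market?"""
    totals = {}
    for rec in records:
        if rec["product"] == product:
            totals[rec["market"]] = totals.get(rec["market"], 0) + rec["units"]
    return totals


# Hold the extracts in memory for direct use, then discard them when done.
records = extract_csv(CSV_SOURCE) + extract_json(JSON_SOURCE)
widget_totals = units_by_market(records, "widget")
print(widget_totals)
```

The only transformation here is the minimal field mapping needed to compare the two extracts; the rest of the interpretation is left to the people who have already committed their time to the question.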
So who should do this and how? Well, in both the defence and financial industries there are well-established players able to deliver really impressive results, but it’s always been expensive. Now that is changing as these companies move into mainstream verticals. We are getting some great results with players such as Kapow Technologies – have a look at their Web extraction capabilities. Or take a look at the sophisticated work that Palantir have done with some government agencies in this excellent video.
I leave you with a last thought: if the goal of an organisation is to win business through better knowledge of markets, then why would we be looking internally for the information we need to understand those markets and activities? And if we are not looking internally, then why should our standard internal methods be the right approach for the very different external requirements?




CTO Blog

Andy
Quite an interesting debate on the relevancy of productive BI solutions that offer rapid integration for business decision makers. The data warehouse model of merging internal and external data, and also using external data cleansing tools etc., complicates the rendition of decision-making tools and reports for the business. Merging and creating all the data once, prior to rendition, may no longer be a relevant approach now that Web 2.0 and mashup tools for data have come of age. The traditional BI vendors are touting more, including some cloud models, and I think IT management will start looking for rapid decision-making tools for the business. It could as well be an IT innovation as BI takes center stage in 2010.
It is the response of the mainstream BI vendors Oracle and SAP that is interesting in respect of this. Both have announced, and in the case of Oracle actually moved to deliver, real-time BI driven from and by the middleware layer. This is clearly linked to the structured data of their applications. The question then comes up as to whether or how this could be mashed up with (controlled) external sources as well.
There’s a highly readable whitepaper from SAP / BusinessObjects on this particular subject of unstructured data/text analysis.
http://download.sap.com/solutions/sapbusinessobjects/large/information-management/data-integration/textanalysis/brochures/download.epd?context=D803473CB7C3D1FE48229AC6C8498A02355C27884729E74639A20131A3EF9AE22B9AC2F47E05AD6BF986894C0D16C4CCED826B2187E56722
Thanks Hans – that’s really helpful. Some further news on this front is that SAP held an event for analysts, media and key customers called SAP Influencer 2009, at which they said they would focus on BusinessObjects and further integration through NetWeaver. If you go to the SAP Press room you will find some good stuff on this.
Andy, I also recommend looking at http://www.qlikview.com/ who have some pretty impressive technology. I have worked alongside them in the past and was very impressed.
IBM also announced earlier this month (http://www-03.ibm.com/press/us/en/pressrelease/28932.wss) the London Analytics Solutions Centre, I assume making better use of their acquisitions of Cognos & SPSS, but also showing the importance of this area, as you highlight.
I think we’re at a tipping point with BI. Yes, it makes sense that BI should be *the next big thing* in the new year, driven by the need to make sense of the massive volume of data we’ve accumulated, but I doubt that BI in its current form is up to the task.
As one of the CEOs Andy spoke to mentioned, “I want to know … when I need to focus in.” The CEO’s problem is not more data, but the right data. As Andy rightfully points out in that earlier blog post, we’ve been focused on harvesting the value from our internal, manufactured data, ignoring the latent potential in our unstructured data (let alone the unstructured data we can find outside the enterprise). The challenge is not to find more data, but the right data.
It’s amazing how little data you need to make an effective decision–if you have the right data. Andrew McAfee wrote a nice blog post a few years ago (1. is the closest I can find to it), pointing out that the mass of data we pile into a conventional business case just clouds the issues, creating long cause-and-effect chains that make it hard to come to an effective decision. His solution was the one page business case: capability delivered, (rough) business requirements, solution footprint, and (rough) costing. It might be one page, but there is enough information, the *right* information, to make an effective decision. I’ve used his approach ever since.
Current BI seems to be approaching the horse from the wrong direction, much like Andrew’s business case problem. We focus on sifting through all the information we have, trying to glean any trends and correlations which might be useful. This works at small to moderate scales, but once we reach the *huge* end of the scale it starts to groan under its own weight. It’s the law of diminishing returns–adding more information to the mix will only have a moderate benefit compared to the effort required to integrate and process it.
A more productive method might be to use a hypothesis-driven approach (2.). Rather than look for anything that might be interesting, why not go spelunking for specific features which we know will be interesting? The features we’re looking for in the information are (almost always) to support a decision. Why not map out that decision, similar to how we map out the requirements for a feedback loop in a control system, and identify the types of features that we need to support the decision we want to make (3.)? We can segment our data sets based on the features’ gross characteristics (inside vs. outside, predictive vs. historical …) and then search in the appropriate segments for the features we need. We’ve broken one large problem–find correlations in one massive data set–into a series of much more manageable tasks.
Finding the *right* features is our real challenge.
1. http://www.capgemini.com/ctoblog/2009/08/have_we_really_understood_what.php
2. http://www.cs.chalmers.se/Cs/Education/Courses/mdi/2006/lectures/Monator1.pdf
3. http://peter.evans-greenwood.com/2009/10/12/working-from-the-outside-in/
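The segment-then-search idea in the comment above can be sketched roughly as follows. This is only an illustration under stated assumptions: the DataSet type, the CATALOGUE of data sets, the DECISION_NEEDS list and the find_sources function are all hypothetical names invented for this example, and the two segmenting characteristics (inside vs. outside, historical vs. predictive) are simply the ones the comment mentions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    name: str
    origin: str          # "inside" or "outside" the enterprise
    horizon: str         # "historical" or "predictive"
    features: frozenset  # features this data set is known to contain


# Hypothetical catalogue of available data sets, tagged by gross characteristics.
CATALOGUE = [
    DataSet("sales_ledger", "inside", "historical", frozenset({"units_sold", "revenue"})),
    DataSet("order_pipeline", "inside", "predictive", frozenset({"forecast_orders"})),
    DataSet("competitor_prices", "outside", "historical", frozenset({"competitor_price"})),
    DataSet("market_forecasts", "outside", "predictive", frozenset({"forecast_demand"})),
]

# The decision is mapped out first: each needed feature and the segment it should live in.
DECISION_NEEDS = [
    ("competitor_price", "outside", "historical"),
    ("forecast_demand", "outside", "predictive"),
    ("units_sold", "inside", "historical"),
]


def find_sources(needs, catalogue):
    """For each needed feature, search only the matching segment of the catalogue."""
    result = {}
    for feature, origin, horizon in needs:
        segment = [d for d in catalogue if d.origin == origin and d.horizon == horizon]
        result[feature] = [d.name for d in segment if feature in d.features]
    return result


sources = find_sources(DECISION_NEEDS, CATALOGUE)
for feature, names in sources.items():
    print(feature, "->", names)
```

Each lookup touches one small segment rather than the whole catalogue, which is the sense in which one massive search becomes a series of smaller, more manageable ones.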
Thanks for the helpful URL, Nigel.
In answer to Peter’s point – as ever he is right on the mark – there is a very old principle that we seem to be forgetting in all of this. I post below from Wikipedia, but the point is ‘because we can, is not a reason to do something’; in other words, having the power to compute and analyse at the right price is not necessarily a reason to try to ‘boil the ocean’.
Occam’s razor (or Ockham’s razor), entia non sunt multiplicanda praeter necessitatem, is the principle that “entities must not be multiplied beyond necessity” and the conclusion thereof, that the simplest explanation or strategy tends to be the best one. The principle is attributed to 14th-century English logician, theologian and Franciscan friar, William of Ockham. Occam’s razor may be alternatively phrased as pluralitas non est ponenda sine necessitate (“plurality should not be posited without necessity”).
Occam’s razor states that the explanation of any phenomenon should make as few assumptions as possible, eliminating those that make no difference in the observable predictions of the explanatory hypothesis or theory. The principle is often expressed in Latin as the lex parsimoniae (translating to the law of parsimony, law of economy or law of succinctness). When competing hypotheses are equal in other respects, the principle recommends selection of the hypothesis that introduces the fewest assumptions and postulates the fewest entities while still sufficiently answering the question. It is in this sense that Occam’s razor is usually understood. To quote Isaac Newton, “We are to admit no more causes of natural things than such as are both true and sufficient to explain their appearances. Therefore, to the same natural effects we must, so far as possible, assign the same causes.”
James Kobielus @ Forrester has posted his top trends for BI in 2010, which is interesting to view in the light of this post.
http://blogs.forrester.com/business_process/2009/12/advanced-analytics-predictions-for-2010.html
Thanks Peter, definitely something that would be good to see. Unfortunately the URL isn’t working just now, can you check?
Thanks for mentioning Kapow Technologies. Great minds think alike! I just posted a blog post last week about how critical it is to get the right data in real-time, versus settling for inferior data just because it already exists or is easy to get to. We are proud of the products we’ve built as we’ve been able to help many customers with the exact problems you discuss in your blog.
And here is the URL for Stefan’s blog, which also offers a useful white paper at the end: http://kapowtech.com/blog/ Thanks for the updated link, Stefan.