Wouldn’t it be a dream to assemble Big Data, IoT and Analytics in the cloud using different software vendors? The outside world of the Internet of Things, internal “line of business” data and social media tweets, all analysed with your tool of choice. The world of complexities grows bigger by the day: yesterday it was Cloud and Big Data, today it is IoT and predictive algorithms (data science). So many interfaces to take care of …
SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of the International Organization for Standardization (ISO) one year later. So what will it take to develop a “query” standard for analytical visualisations?
IBM Watson has made a start with its cloud API for accessing its services. There are many examples of the analytical services a cloud provider could offer, to name a few: Relationship Extraction, Visual Recognition and Tradeoff Analytics. Here is an example of Visual Recognition on a picture I uploaded myself (the demo pictures are too easy):
It’s funny that the dolphin is classified as a bird with a probability of 59%, but Watson would still have picked the right answer, Dolphin, with 76%.
Services like these are useful, but they are all proprietary: the provider could change the API definition at any point in time, drop services or simply start charging on a per-call basis. There are benefits to choosing an all-in-one provider, but I have something different in mind.
To me these services are all just bricks, and we need some mortar to bind them together into a unified structure. The more complete answer is a standard API framework for interrogating analytical services, led by a large company such as MuleSoft or Apigee, which already provide API platforms for you to build on. They have already implemented these platforms with many customers and are in a position to push further simplifications, such as standards.
Instead of being locked in with one solution provider, would it not be more competitive if you could swap out an underlying solution that no longer meets your requirements and switch to a market leader or a cheaper cloud provider? All the existing work, the services above the API layer, would NOT need to be redeveloped, because all departments access the same API functionality and benefit from an open platform.
No silos, no single point of access to data and no worrying about the infrastructure: you can focus on enhancing your services.
That still leaves the issue of the underlying web services that answer the API calls. This brings us back to standards, and I don’t mean something like PMML (Predictive Model Markup Language), the standard file format for exchanging predictive models produced by data mining and machine learning algorithms. I mean API call standards published in WSDL (Web Services Description Language), so that you can request a chart of last month’s sales in California for product X and receive the result in HTML5 regardless of the underlying software: basically, an abstraction layer.
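To make the abstraction-layer idea concrete, here is a minimal Python sketch. It models the shared contract as an abstract base class rather than a WSDL document, and every name in it (`ChartRequest`, `AnalyticsProvider`, `VendorAAdapter`, `VendorBAdapter`, `sales_dashboard`) is hypothetical. The vendor adapters only return placeholder HTML5 instead of calling any real vendor service.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class ChartRequest:
    """Vendor-neutral description of a chart (hypothetical contract)."""
    measure: str
    region: str
    product: str
    period: str


class AnalyticsProvider(ABC):
    """The shared contract every vendor adapter must implement (hypothetical)."""

    @abstractmethod
    def render_chart(self, request: ChartRequest) -> str:
        """Return the chart as an HTML5 fragment."""


def _caption(request: ChartRequest) -> str:
    # Escape user-supplied text before embedding it in HTML.
    return (f"<figcaption>{escape(request.measure)} for {escape(request.product)} "
            f"in {escape(request.region)}, {escape(request.period)}</figcaption>")


class VendorAAdapter(AnalyticsProvider):
    # Placeholder: a real adapter would call vendor A's own web service here.
    def render_chart(self, request: ChartRequest) -> str:
        return f'<figure data-provider="vendor-a"><canvas></canvas>{_caption(request)}</figure>'


class VendorBAdapter(AnalyticsProvider):
    # Placeholder: a real adapter would call vendor B's own web service here.
    def render_chart(self, request: ChartRequest) -> str:
        return f'<figure data-provider="vendor-b"><svg></svg>{_caption(request)}</figure>'


def sales_dashboard(provider: AnalyticsProvider) -> str:
    """Consumer code: written once against the contract, never against a vendor."""
    request = ChartRequest(measure="Sales", region="California",
                           product="Product X", period="last month")
    return provider.render_chart(request)


# Swapping vendors changes only which adapter is passed in.
print(sales_dashboard(VendorAAdapter()))
print(sales_dashboard(VendorBAdapter()))
```

The point of the design is that `sales_dashboard` never imports or names a vendor. In a real deployment the contract would live in a published service description, and the adapters would sit on the provider side of the API.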
SAP Business Objects still holds 21% of the analytics market, while newer players like Tableau are growing faster (the last figure I saw was nearly $400 million in revenue) but don’t yet hold enough market share. If we take just the top five players, they account for 70% of the analytics market.
If, as in this example, Microsoft and SAP worked together to establish web services that are interchangeable between vendors, this would greatly simplify the architecture for customers. Not only could you swap vendors out, you could combine them as well, without the consumer knowing or having to worry where the service provider actually sits. We could also start building context-aware analytics right into business processes, which companies like Workday, SAP and Salesforce are already doing, but with the API approach you could swap them out!
More on the Business Objects web services software development kit (SDK) here.
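Combining vendors behind one API, without the consumer knowing which one answers, could look roughly like the following Python sketch. It assumes that each provider can be wrapped as a simple function from a query to a result. The router, the service names and the vendor functions are all hypothetical stand-ins, not any vendor's actual SDK.

```python
from typing import Callable, Dict, List

# A provider is modeled as a plain function: (query) -> result.
# In practice each one would wrap a vendor's web service (hypothetical here).
Provider = Callable[[str], str]


class AnalyticsRouter:
    """Routes each analytics service to one or more interchangeable providers.

    Consumers only see service names; which vendor answers is an internal detail.
    """

    def __init__(self) -> None:
        self._routes: Dict[str, List[Provider]] = {}

    def register(self, service: str, provider: Provider) -> None:
        """Add a provider for a service; earlier registrations are tried first."""
        self._routes.setdefault(service, []).append(provider)

    def call(self, service: str, query: str) -> str:
        """Try each registered provider in order, falling back when one is unavailable."""
        providers = self._routes.get(service)
        if not providers:
            raise LookupError(f"no provider registered for {service!r}")
        errors = []
        for provider in providers:
            try:
                return provider(query)
            except ConnectionError as exc:
                errors.append(exc)
        raise ConnectionError(f"all providers failed for {service!r}: {errors}")


# Hypothetical stand-ins for two vendors' services.
def vendor_a_reporting(query: str) -> str:
    return f"vendor-a report: {query}"


def vendor_b_forecast(query: str) -> str:
    return f"vendor-b forecast: {query}"


def unavailable(query: str) -> str:
    raise ConnectionError("service down")


router = AnalyticsRouter()
router.register("reporting", unavailable)         # e.g. the primary vendor is down
router.register("reporting", vendor_a_reporting)  # fallback vendor
router.register("forecast", vendor_b_forecast)    # a different vendor per service

print(router.call("reporting", "sales CA last month"))
print(router.call("forecast", "sales CA next quarter"))
```

Here reporting is served by one vendor and forecasting by another, and the failover happens inside the router. The calling code stays the same whichever vendor ends up answering.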
A Big Data Web API may actually be much easier for Apache to establish. Since many Big Data software providers publish their own flavour of Hadoop, there is a real chance that we will get a common Web API not just for the data, but also for analytics and for managing the cluster nodes. Cloudera has shared its Cloudera Manager Web API on GitHub, which lets you control many administration aspects of Hadoop components such as YARN, ZooKeeper, HBase, Hive, Oozie, Hue, Flume, Impala … even Spark.
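Apache Hadoop itself already exposes some cluster management over plain HTTP. For example, the YARN ResourceManager has a REST API, and its cluster metrics endpoint is `/ws/v1/cluster/metrics` on the ResourceManager web port (8088 by default). The following is a minimal Python sketch that uses only the standard library. The host name is hypothetical, and the sample payload is an illustrative, trimmed response containing only three of the fields a real cluster returns, not actual cluster output.

```python
import json
from urllib.request import urlopen

DEFAULT_RM_PORT = 8088  # default YARN ResourceManager web UI / REST port


def cluster_metrics_url(rm_host: str, port: int = DEFAULT_RM_PORT) -> str:
    """Build the URL of the ResourceManager cluster-metrics endpoint."""
    return f"http://{rm_host}:{port}/ws/v1/cluster/metrics"


def summarize_metrics(payload: str) -> dict:
    """Pick a few fields out of a /ws/v1/cluster/metrics JSON response."""
    metrics = json.loads(payload)["clusterMetrics"]
    return {
        "apps_running": metrics["appsRunning"],
        "active_nodes": metrics["activeNodes"],
        "available_mb": metrics["availableMB"],
    }


def fetch_metrics(rm_host: str, port: int = DEFAULT_RM_PORT, timeout: float = 5.0) -> dict:
    """Query a live ResourceManager (requires network access to the cluster)."""
    with urlopen(cluster_metrics_url(rm_host, port), timeout=timeout) as response:
        return summarize_metrics(response.read().decode("utf-8"))


# Offline example with an illustrative, trimmed payload (not real cluster output).
sample = '{"clusterMetrics": {"appsRunning": 2, "activeNodes": 3, "availableMB": 8192}}'
print(cluster_metrics_url("rm.example.com"))
print(summarize_metrics(sample))
```

Because this is a plain HTTP and JSON interface, any distribution that keeps the same endpoint can be monitored by the same client code. That is the kind of portability a common Big Data Web API would extend to analytics.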
Internet of Things
With IoT still in its early stages, I believe a web API can provide the interface for communication, even on small wearable devices. From an analytics point of view, it makes more sense to collect all the data in a Hadoop cluster first and analyse it from there. Here is a blog post I have written on this topic: running real-time Internet of Things.
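On the device side, a web API can be as simple as posting each reading as JSON to an ingest endpoint that lands the data in the cluster, where the analysis happens later. This is a minimal Python sketch under that assumption. The endpoint URL, the device and metric names and the payload fields are all hypothetical. The sketch only builds the request and does not send it.

```python
import json
from urllib.request import Request


def build_reading_request(endpoint: str, device_id: str, metric: str,
                          value: float, timestamp: str) -> Request:
    """Package one sensor reading as a JSON POST to a (hypothetical) ingest endpoint.

    The endpoint would sit in front of the Hadoop cluster and store the data there;
    the device does no analysis itself.
    """
    body = json.dumps(
        {"device_id": device_id, "metric": metric, "value": value, "timestamp": timestamp},
        sort_keys=True,  # stable field order makes payloads easy to compare
    ).encode("utf-8")
    return Request(endpoint, data=body, method="POST",
                   headers={"Content-Type": "application/json"})


req = build_reading_request("https://ingest.example.com/v1/readings",
                            "wristband-42", "heart_rate", 72.0,
                            "2015-06-01T12:00:00Z")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# Sending it would be: urllib.request.urlopen(req, timeout=5)
```

Keeping the device-side contract this small (one HTTP call, one JSON document) is what makes it practical on constrained wearables, and it leaves the choice of storage and analytics vendor behind the endpoint.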
Google’s approach, “Weave & Brillo” (Weave for connectivity and Brillo as the OS), sounds like a good start in my opinion, but what the market will go for remains to be seen.
In conclusion, the advantages of a Web API based platform are:
- No disruption to services built on top of analytics or big data
- Interchangeable analytics providers
- No cloud or on-premise distinction
- Choice of most cost effective analytics or big data provider
I may start a .ORG website to standardise the analytics web API world! It may have already started here!