CTO Blog

Opinions expressed on this blog reflect the writer’s views and not the position of the Capgemini Group

When is a database not a database?

Okay, it’s a silly question. But after years of relational databases (RDBMS), with database skills and operations among the most central planks of the IT department, we suddenly have a rash of announcements. The only way to summarize them is that the databases being announced are, firstly, very different from a traditional RDBMS and, secondly, show decided differences in how data should be handled and stored. In the not-so-distant days of the RDBMS, the competition was all speeds and feeds, with Oracle and IBM each trying to go one better. Useful comparisons could be made even just a year ago.

We are Working in New Database Territory & SAP HANA

This blog got kicked off by a couple of things, starting with the announcement from SAP that SAP HANA is a total winner for them and that they are keen to press on and become a fully competitive player in the database market. I read in a handy report from The Register: “SAP also wants to displace "legacy databases" underneath its vast ERP suite, and has rejiggered its budgets and come up with another $337m dedicated to what it calls the SAP HANA Adoption Program”. The same report also included the figures from an SAP benchmark test, ‘at Volkswagen's Shanghai facility’. “A NetWeaver Business Warehouse query that took 20 minutes on a disk-based setup (using an unnamed database) took 45 seconds running atop HANA using memory and flash”. Wonderful! Incredible!

Fortunately, The Register is not one to be taken in by these kinds of claims, and followed with the crucial caveat, “The exact specs of the configurations were not divulged, and of course they matter a lot”. That is the big punch line. We are working in new territory using new techniques, and there are no established benchmarks except one: unstructured data does not work on structured relational databases, and if you try, the results are awful and easily beaten by one of the ‘new’ types of database!

Unstructured Data Reveals New Intelligence & HP Vertica

In exactly the same style, HP sent me the stunning test results of their Vertica database against a competitor. The difference was so extreme that, just as with the SAP test, you had to ask whether this was a like-for-like comparison! Add in several Hadoop-based and other approaches and you get another set of amazing results... But exactly what are we testing for? Let’s go back to Vertica and try to understand what it really is. And believe me, it’s tough to grasp, as it turns several notions on their heads. HP currently describes it on their website as, “The HP Vertica Analytics System provides revolutionary real-time analytics - purpose built for tomorrow’s demands today. Simple to use, it delivers the fastest time-to-value immediately to business users, DBAs, and programmers”. Simple enough, but isn’t that the same as what SAP would claim for HANA and how it’s used?

Oddly enough, Wikipedia seems to have a better write-up of what Vertica is and does than the HP site, and it provides an equally good explanation of SAP HANA too. More importantly, from these it’s possible to arrive at some understanding of the differences in the approach and use of both. The sea change in technology that the web, clouds, and services bring, and which is referred to as big data, is still moving in terms of what solutions will be built and how. But at its heart lies data, and that data is not transaction-oriented and structured. No, it’s radically different in being both unstructured and accessed in apparently random ways for ‘insights’ that provide new intelligence.

Back to Basics in Understanding Data & Amazon Web Services

We learnt the hard way, at the beginning of the disruptive change brought by PC networks, that data, data models, and data management were crucial. Now is the time to start studying data and its use all over again, and that means considering what requirements you are delivering against and how to use and store data. I don’t think we can take much for granted from the traditional world of databases going forward. Instead, I think we are all going to have to go back to basics, create a new 101 understanding of what we need, and select products accordingly.

Sorry, did I say products? Maybe I should have said services, as there was a third announcement that made me sit up sharply and wonder about this. Rather casually, Amazon Web Services let slip the enormous number of objects their storage cloud was handling. And BusinessCloud9 included a nice graph projecting that by Q2 2012 more than one trillion objects would be stored by Amazon. So perhaps that’s the way forward and we won’t have to worry about choosing and sizing a database and storage at all!


About the author

Andy Mulholland
Capgemini Global Chief Technology Officer until his retirement in 2012, Andy was a member of the Capgemini Group management board and advised on all aspects of technology-driven market changes, together with being a member of the Policy Board for the British Computer Society. Andy is the author of many white papers, and the co-author of three books that have charted the current changes in technology and its use by business, starting in 2006 with ‘Mashup Corporations’, detailing how enterprises could make use of Web 2.0 to develop new go-to-market propositions. This was followed in May 2008 by ‘Mesh Collaboration’, focussing on the impact of Web 2.0 on the enterprise front office and its working techniques, then in 2010 by ‘Enterprise Cloud Computing: A Strategy Guide for Business and Technology leaders’, co-authored with well-known academic Peter Fingar and one of the leading authorities on business process, John Pyke. The book describes the wider business implications of Cloud Computing with the promise of on-demand business innovation. It looks at how businesses trade differently on the web using mash-ups, but also at the challenges in managing more frequent change through social tools, and what happens when cloud comes into play in fully fledged operations. Andy was voted one of the top 25 most influential CTOs in the world in 2009 by InfoWorld and is grateful to readers of Computing Weekly who voted the Capgemini CTOblog the best Blog for Business Managers and CIOs each year for the last three years.
