Okay, it’s a silly question. But after years of relational databases (RDBMS), and frankly with database skills and operations being one of the core planks of the IT department, we suddenly have a rash of announcements. What they have in common is that the databases being announced are, firstly, very different from a traditional RDBMS and, secondly, show decided differences in how data should be handled and stored. Not so long ago, the competition was all speeds and feeds, with Oracle and IBM each trying to go one better, and useful comparisons could still be made even just a year ago.
We are Working in New Database Territory & SAP HANA
This blog got kicked off by a couple of things, starting with the announcement from SAP that SAP HANA is a total winner for them and that they are keen to press on and become a fully competitive player in the database market. I read in a handy report from The Register: “SAP also wants to displace ‘legacy databases’ underneath its vast ERP suite, and has rejiggered its budgets and come up with another $337m dedicated to what it calls the SAP HANA Adoption Program.” The same report also included the figures from an SAP benchmark test ‘at Volkswagen’s Shanghai facility’: “A NetWeaver Business Warehouse query that took 20 minutes on a disk-based setup (using an unnamed database) took 45 seconds running atop HANA using memory and flash.” That is roughly 27 times faster. Wonderful! Incredible!
Fortunately, The Register is not one to be taken in by these kinds of statements, and followed with the crucial caveat: “The exact specs of the configurations were not divulged, and of course they matter a lot.” That is the big punch line. We are working in new territory using new techniques, and there are no established benchmarks except one: unstructured data does not work on structured relational databases, and if you try, the results are awful and easily exceeded by one of the ‘new’ types of database!
Unstructured Data Reveals New Intelligence & HP Vertica
In exactly the same style, HP sent me the stunning test results of their Vertica database against a competitor. The difference was so extreme that, just as with the SAP test, you had to ask whether this was a like-for-like comparison! And then we can add several Hadoop-based or other approaches and get another set of amazing results… But exactly what are we testing for? Let’s go back to Vertica and try to understand what it really is. And believe me, it’s tough to grasp, as it turns several notions on their heads. HP currently describes it on their website as follows: “The HP Vertica Analytics System provides revolutionary real-time analytics – purpose built for tomorrow’s demands today. Simple to use, it delivers the fastest time-to-value immediately to business users, DBAs, and programmers.” Simple enough, but isn’t that the same as what SAP would claim for HANA and how it’s used?
Oddly enough, Wikipedia seems to have a better write-up of what Vertica is and does than the HP site, and it provides an equally good explanation of SAP HANA too. More importantly, from this it’s possible to arrive at some understanding of the differences in the approach and use of both. The sea change in technology that the web, clouds, and services bring, and which is referred to as big data, is still moving in terms of what solutions will be built and how. But at its heart lies data, and that data is not transaction-oriented and structured. No, it’s radically different in being both unstructured and accessed in apparently random ways for ‘insights’ that provide new intelligence.
Back to Basics to Understanding Data & Amazon Web Services
We learnt the hard way, at the beginning of the disruptive change brought by PC networks, that data, data models, and data management were crucial. Now is the time to start studying data and its use all over again, and that means considering what requirements you are delivering and how to use and store data. I don’t think we can take much for granted about databases from the traditional world going forward. Instead, I think we are all going to have to go back to basics, create a new 101 understanding of what we need, and select products accordingly.
Sorry, did I say products? Maybe I should have said services, as there was a third announcement that made me sit up sharply and wonder about this. Rather casually, Amazon Web Services let slip the enormous number of objects their storage cloud was handling. And BusinessCloud9 included a nice graph that projected that by Q2 2012 more than one trillion objects would be stored by Amazon. So perhaps that’s the way forward, and we won’t have to worry about choosing and sizing a database and storage at all!