Social media has revolutionised the way we interact, and that has also changed the way we do business. It has enabled people to listen to, create, share, comment on and exchange information in virtual communities and networks at speed, and has opened up access to anything and anyone from anywhere. This revolution lets businesses differentiate themselves by offering customers an end-to-end, holistic experience that influences buying decisions. Effective use of social media has enabled businesses to improve Net Promoter Score, brand impressions and the sustainability of customer relationships, resulting in higher customer satisfaction.
(1) Understanding Your Customers, Gathering Data & Analyzing It
For a business to reach out to its customers, it must first understand them. The key questions can be summarised as 4 Ps and 1 C:
- Person – Who are my customers?
- Place – Where are those customers located?
- Product – What are their likes and dislikes?
- Preferences – Why do they have specific preferences?
- Channel – How to reach them?
Businesses gather enormous amounts of data to answer these questions. Once that data is available, large volumes of it must be analysed to understand and enrich the customer experience.
(2) Testing the Data
Testing these volumes of data requires analytics applications that gather, analyse, interpret and present content from multiple channels, including web and mobile.
Testing analytics applications requires exploring the Social Media, Mobility, Analytics and Cloud (SMAC) world. It involves gathering customer data from social and mobile channels, leveraging that data and storing it on cloud platforms. The three key characteristics of data are Volume, Variety and Velocity. The main challenge in testing analytics applications is validating the 6 Vs.
The characteristics of data to be tested, and the testing to be done for each, include:
- Data Volumes (Test for Semantics, Distributed processing, and scalability)
- Data Variety (Test for visualization, schemas, and data federation)
- Data Velocity (Test real-time, on the fly integration and on-demand storage)
In addition to keeping the volume, variety and velocity of the data in mind, testing should also cover:
- Validity of data (apply rules and remove invalid data)
- Variability in data (inconsistency)
- Veracity of data (quality/accuracy)
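The last three checks can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the record layout, the business rules in `is_valid`, and the `TRUSTED_AGES` reference data are hypothetical, and a real analytics platform would run such checks with its own tooling and at far larger scale.

```python
from collections import defaultdict

# Hypothetical sample of customer records gathered from several channels.
RECORDS = [
    {"customer_id": "c1", "channel": "web", "country": "IN", "age": 34},
    {"customer_id": "c1", "channel": "mobile", "country": "India", "age": 34},
    {"customer_id": "c2", "channel": "web", "country": "US", "age": -5},
    {"customer_id": "", "channel": "mobile", "country": "UK", "age": 41},
]

# Hypothetical reference data treated as the trusted source for veracity checks.
TRUSTED_AGES = {"c1": 34, "c2": 29}


def is_valid(record):
    """Validity: apply simple (assumed) business rules to a single record."""
    return bool(record["customer_id"]) and 0 < record["age"] < 120


def find_variability(records, field):
    """Variability: customers whose value for `field` differs across records."""
    seen = defaultdict(set)
    for record in records:
        seen[record["customer_id"]].add(record[field])
    return {cid: values for cid, values in seen.items() if len(values) > 1}


def veracity_ratio(records, trusted):
    """Veracity: share of checkable records whose age matches the trusted reference."""
    checked = [r for r in records if r["customer_id"] in trusted]
    if not checked:
        return 0.0
    matches = sum(1 for r in checked if r["age"] == trusted[r["customer_id"]])
    return matches / len(checked)


# Remove invalid data first, then look for inconsistency and inaccuracy.
valid_records = [r for r in RECORDS if is_valid(r)]
inconsistent = find_variability(valid_records, "country")
accuracy = veracity_ratio(valid_records, TRUSTED_AGES)

print(len(valid_records), "valid records")
print("inconsistent countries:", inconsistent)
print("accuracy against reference:", accuracy)
```

In this sketch the invalid records are filtered out before the other two checks run, so that rule violations are not also reported as inconsistencies.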
Data is usually gathered using data mining and text mining techniques. Once the data is gathered, machine learning techniques are employed to present it. The data is then tested against the 6 Vs above.
Each of these steps (gathering, presentation and testing) offers scope for automation and for analytics (descriptive, predictive and prescriptive). Complexity increases further as more channels are used across web and mobile.
(3) Testing the BI/BA Applications
The need for quicker analytics, together with the use of social media and mobility applications, has led to the adoption of agile Business Intelligence technologies such as Big Data and Hadoop, where the need to adopt change faster is driving the use of agile methods.
Analytics solutions should be tested using common techniques such as Security Testing, Performance Testing and Usability Testing, and custom techniques such as failover testing.
- Security Testing – to focus on authorization and authentication of users, and availability of data.
- Performance Testing – to focus on accuracy of data and performance under high load.
- Usability Testing – to check whether the application provides the right information, e.g. a single view of the customer from multiple data sources.
- Failover – to ensure data remains available during critical failures, if the application reaches a pre-defined threshold.
(4) How is Testing Done Differently for Big Data / Hadoop Applications?
How has testing changed with the evolution of Big Data (as compared to testing an Enterprise Data Warehouse, or EDW)?
One angle is the increasing use of Big Data over cloud, resulting in increased convergence of “Analytics” and “Cloud”.
I would like to offer a viewpoint based on four criteria: “Data”, “Platform”, “Infrastructure” and “Validation Tools”.
- Software & Data – Big Data applications work with unstructured or semi-structured data (dynamic schema), whereas EDW applications work with a static schema. Hence, while EDW applications can make do with testing based on “sampling”, Big Data applications require testing the “population”. Testing the data for volume, variety and velocity here means testing for semantics, visualization and real-time availability of data, respectively.
- Platform – Since Big Data applications are hosted on cloud (Platform as a Service), they need to be tested for their “distributed processing” ability and to ensure “integration on the fly” without the formal data schemas available in the EDW world.
- Infrastructure – Big Data applications are not limited to linear growth of data, as the data can be stored across multiple clusters through the Hadoop Distributed File System (HDFS), a reliable shared storage system whose contents can be analysed using MapReduce. The number of requirements to be tested increases exponentially, so test suites need to be built for reuse and optimization or they become a maintenance disaster. In the case of EDW, storage is based on file systems and linear growth of data.
- Validation Tools – In the Big Data world there are not yet many established tools. It is common to write MapReduce programs, which can be coded in languages such as Perl, Java, Ruby and Python, or to use wrappers built on MapReduce such as HiveQL. In the EDW world, validation tools are based on SQL, Excel macros and the UI.
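The difference between “sampling” and “population” testing can be sketched in Python. This is an illustration only, not a real Hadoop job: the in-memory `source_rows` and `target_rows` lists, the `id` key and the helper names are hypothetical, and on a real cluster the same comparison would typically be expressed as a distributed job or query.

```python
import hashlib
import random


def row_fingerprint(row):
    """Stable fingerprint of a row, independent of key order."""
    canonical = "|".join(f"{key}={row[key]}" for key in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def population_mismatches(source, target, key="id"):
    """Compare every source row with its target row; return keys that differ or are missing."""
    target_index = {row[key]: row_fingerprint(row) for row in target}
    return sorted(
        row[key]
        for row in source
        if target_index.get(row[key]) != row_fingerprint(row)
    )


def sampled_mismatches(source, target, sample_size, seed=0, key="id"):
    """Compare only a random sample of source rows (a spot check)."""
    rng = random.Random(seed)
    sample = rng.sample(source, sample_size)
    return population_mismatches(sample, target, key)


# Hypothetical data: 1,000 source rows, with one row corrupted during load.
source_rows = [{"id": i, "amount": i * 10} for i in range(1000)]
target_rows = [dict(row) for row in source_rows]
target_rows[417]["amount"] = -1

full_result = population_mismatches(source_rows, target_rows)
sample_result = sampled_mismatches(source_rows, target_rows, sample_size=50)

print("population check found:", full_result)
print("sample check found:", sample_result)
```

The population check always reports the corrupted row, while a 50-row sample has only about a 5% chance of including it, which is why a sample-based test may pass even though the load is defective.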
So the opportunity for independent testing in the Big Data world involves validating:
- Whether requirements are mapped to the right data sources and whether any data sources have been missed
- Whether structured and unstructured data are stored in the right places, without duplication, and whether there are any data synchronization needs
- Whether test data is created with the correct schema and whether it can be replicated more easily using tools
- Whether the system behaves as expected when a cluster is added or removed
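The first two checks in this list lend themselves to simple automation. The Python sketch below assumes a hypothetical requirement-to-source mapping (`REQUIREMENT_SOURCES`), a hypothetical inventory of sources (`AVAILABLE_SOURCES`) and records identified by an `id` field; none of these names come from a specific tool.

```python
from collections import Counter

# Hypothetical mapping of requirements to the data sources they rely on.
REQUIREMENT_SOURCES = {
    "REQ-1 single customer view": ["crm", "web_clickstream"],
    "REQ-2 sentiment trend": ["social_feed"],
    "REQ-3 churn prediction": [],
}

# Hypothetical inventory of data sources available on the platform.
AVAILABLE_SOURCES = {"crm", "web_clickstream", "social_feed", "mobile_events"}


def unmapped_requirements(mapping):
    """Requirements that are not mapped to any data source."""
    return sorted(req for req, sources in mapping.items() if not sources)


def unused_sources(mapping, available):
    """Available data sources that no requirement refers to (possibly missed)."""
    used = {source for sources in mapping.values() for source in sources}
    return sorted(available - used)


def duplicate_keys(records, key="id"):
    """Keys that appear more than once in a store, indicating duplication."""
    counts = Counter(record[key] for record in records)
    return sorted(k for k, n in counts.items() if n > 1)


stored = [{"id": 1}, {"id": 2}, {"id": 1}]
print("unmapped requirements:", unmapped_requirements(REQUIREMENT_SOURCES))
print("unused sources:", unused_sources(REQUIREMENT_SOURCES, AVAILABLE_SOURCES))
print("duplicate ids:", duplicate_keys(stored))
```

Checks of cluster behaviour when capacity is added or removed depend on the actual deployment and are not covered by this sketch.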
What has changed for Analytics Testing in the SMAC World?
To test Big Data, one needs to understand the system and pre-populate it with data. Testers need to create installations in order to carry out real-time tests.
There are not many testing tools for Big Data. Testers are learning the basic components of Big Data so that they can use the same Big Data design and development tools as developers to test it. Tools such as QuerySurge can help test up to 100% of the data quickly.
Experience in testing EDW applications has helped shorten the learning curve for testers moving to Big Data, as they apply their knowledge of extract, transform and load (ETL) from the data warehouse (DWH) to Hadoop HDFS.
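As an illustration of how ETL testing knowledge carries over, here is a minimal Python sketch of a load validation: it reconciles row counts and re-applies the expected transformation to each extracted row. The `transform` rule, the field names and the sample rows are hypothetical, and plain lists stand in for the source system and the HDFS-backed target.

```python
def transform(row):
    """Hypothetical transformation rule: normalise country code, convert cents to units."""
    return {
        "id": row["id"],
        "country": row["country"].strip().upper(),
        "amount": row["amount_cents"] / 100,
    }


def validate_load(extracted, loaded):
    """Return a list of issues found when comparing extracted rows with loaded rows."""
    issues = []
    # Reconciliation: every extracted row should have been loaded exactly once.
    if len(extracted) != len(loaded):
        issues.append(
            f"row count mismatch: extracted={len(extracted)} loaded={len(loaded)}"
        )
    loaded_by_id = {row["id"]: row for row in loaded}
    # Transformation check: the loaded row should equal the expected transformed row.
    for row in extracted:
        expected = transform(row)
        actual = loaded_by_id.get(row["id"])
        if actual != expected:
            issues.append(f"row {row['id']}: expected {expected}, found {actual}")
    return issues


extracted_rows = [
    {"id": 1, "country": " in ", "amount_cents": 1250},
    {"id": 2, "country": "us", "amount_cents": 300},
]
# The second loaded row keeps a lower-case country code, i.e. the rule was not applied.
loaded_rows = [
    transform(extracted_rows[0]),
    {"id": 2, "country": "us", "amount": 3.0},
]

for issue in validate_load(extracted_rows, loaded_rows):
    print(issue)
```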