Now that my summer holidays are over, I want to publish my list of the vendors and technologies active in the Big Data space. I have organized them according to the five Big Data steps. You’ll find a full description of each step in my No SQL, No Hadoop post.

To summarize all these vendors and technologies, I propose an infographic with lots of logos, followed by a quick explanation.

  • In the Data Acquisition stream, we have technology providers such as (in alphabetical order) Ab Initio, HP, IBM (DataStage, Streams, DataMirror), Informatica (PowerCenter, PowerExchange, CEP), Kalido, Microsoft, Numenta, Oracle, SAP, SAS, Splunk, Syncsort, Talend and Tibco, and data providers such as ComScore, Datasift, Experian, Factual, GfK, Gnip, IMS, Inrix, Kaggle, Knoema, LexisNexis, Microsoft (with their Windows Azure Marketplace data market), Nielsen, Reuters, Salesforce Radian6, Symphony IRI, social network websites such as Facebook, Google+, LinkedIn, Tumblr, Twitter or Viadeo and, of course, all the Open Data providers (governments, regions, etc.).
  • In the Marshalling domain, we have Very Large Data Warehousing and BI appliances, with players such as Actian, EMC² (Greenplum), HP (Vertica), IBM (Netezza), Kognitio, Microsoft (SQL Server 2012 and PDW), Oracle (Exadata), ParAccel, SAP (HANA and Sybase IQ), SAS and Teradata. In the NoSQL domain, the main technologies and vendors are Amazon (as a cloud provider or with their own NoSQL solution), Cassandra, Cloudera (CDH, Hadoop distribution), CouchDB, EMC², Google, Hadoop (of course), Hortonworks (Hadoop distribution), HP, IBM, KX, MapR (Hadoop distribution), MarkLogic, Microsoft (Hadoop on Windows and Azure), MongoDB, Neo4j, Oracle, Palantir, SnapLogic, Sparsity, Splunk, Teradata (Aster Data) and ZL Technologies. In the content management space, we mainly have Adobe, Alfresco, EMC² (Documentum), IBM (FileNet), HP (Autonomy), Microsoft, OpenText and Oracle.
  • In the Analytics phase, we have the predictive technologies (such as data mining) and their vendors, namely Adobe, EMC², GoodData, Hadoop MapReduce, HP, IBM (SPSS), Karmasphere, KXEN, Microsoft, Mzinga, Oracle, R, Salesforce, SAS, SAP (R on HANA) and Teradata (Aprimo). Data Virtualization (and data federation) is currently led by Composite, Denodo, HP (IDOL), IBM, Informatica, Microsoft, Oracle (Exalytics), SAP and Teiid (JBoss community).
  • In both the Analytics and Action phases, BI tool vendors include Actuate, Dassault Systèmes (Exalead), Domo, Esri, GoodData, Google, HP (Autonomy), IBM (Cognos suite), Information Builders, LogiXML, Microsoft (SQL Server 2012), MicroStrategy, NeutrinoBI, Oracle (OBI Foundation), Panopticon, Panorama, Pentaho, QlikView, Roambi, SAP (BI4 suite), SAS, SpagoBI, Tableau and Tibco.
  • In the Action phase, we have all the Data Acquisition providers plus the ERP, CRM and BPM players, including Adobe, Eloqua, EMC², IBM, iGrafx, Microsoft, OpenText, Oracle, Pega, Progress Software, SAP, Salesforce, Software AG, Teradata (Aprimo) and Tibco.
  • In the Data Governance area, Master Data Management (MDM), metadata and data quality tools are offered by Adaptive, HP, IBM, Informatica, Kalido, Microsoft, Oracle, Orchestra Networks, SAP, SAS, Talend and Tibco.


Note that Complex Event Processing (CEP) tools are part of the Acquisition (streaming data acquisition), Marshalling (e.g. in-memory storage, as data is used or compared immediately) and Analytics (e.g. monitoring functions to detect abnormal activity) streams.

Note that BI tools are part of both the Analytics (computing Key Performance Indicators) and Action (e.g. creating alerts in push mode, by mail for instance) streams.

Of course, this is not an exhaustive list. I will (try to) update it regularly, taking your comments into account.