Extract, Transform and Load – now available for MashUps
As more and more enterprises adopt MashUps, the question of what content is being used, and where it is coming from, is being asked more and more often. I guess it's part of the move we are all making from structured data, created by our own computers, to an increasing use of unstructured data, which could be described as created and used by people, with the resulting ‘inconsistencies’.
Users are increasingly recognising the value of a MashUp to combine this mass of unstructured content found on the Web into a focussed view that suits their requirements. Maybe we should coin the term ‘structured presentation’? A well built MashUp is truly a satisfying experience, producing the same feelings in the user as I think the spreadsheet must have produced when they first experienced it. Freedom to do what I require the way I need it to be done.
However, there are a couple of concerns:
The first is that MashUps can only handle content that is already available in a web presentation format, so if what you want to MashUp is not in this format then it's not available for you to use. That’s a little more serious than it sounds. It’s basically saying that the huge amounts of data and information held in legacy enterprise applications are simply unusable, and as this covers a lot of very valuable internal data, this is definitely not a good thing.
The second point is the provenance of the data being used; there is a very, very wide range of choices out there on the web, but who knows if that data is reliable? Back to a comment that I have quoted before: ‘too much of what we find and use today is given its provenance by the search engine’.
Extract, Transform and Load (ETL) is pretty old technology, used through the shift from Data Warehousing (DW) into Master Data Management (MDM), but right now I am seeing some really interesting products that bring ETL bang up to date for use in the Web 2.0 era. Essentially, these products allow a wide range of data types to be processed into Web formats ready for use by a MashUp. That is extremely useful for building and populating a particular MashUp, but consider a different approach: one aligned to the freedom that Web 2.0 gives people to do what they want, the way they want to do it.
Why not liberate a lot of enterprise data by using ETL to create an enterprise ‘pool’ of data in web presentation format, then design MashUps for different user groups that draw on this pool? This way you achieve massive flexibility to provide each user or group of users with what they want, while keeping it safely based on data that has a known provenance. If you need to, good quality external data can also be selected and added to the pool. That’s a safe way to get a combination of enterprise control plus the benefits of MashUps in place without the risks or limitations that are otherwise present.
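To make the ‘pool’ idea a little more concrete, here is a minimal sketch in Python, using only the standard library. It is not how any of the vendor products work; the CSV export, the field names, the source label `crm_export` and the helper functions are all hypothetical, chosen purely for illustration. It extracts rows from a legacy-style export, transforms them and tags each record with its provenance, loads them into a shared pool, and then serves one user group a JSON view drawn from that pool.

```python
import csv
import io
import json

# Hypothetical legacy export; a real one would come from an enterprise system.
LEGACY_CSV = """customer_id,name,region
1001, Acme Ltd ,EMEA
1002,Globex,APAC
"""


def extract(csv_text):
    """Extract: read rows from a legacy CSV export."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows, source_name):
    """Transform: tidy the values and tag each record with its provenance."""
    return [
        {
            "id": int(row["customer_id"]),
            "name": row["name"].strip(),
            "region": row["region"].strip(),
            "provenance": source_name,  # assumed label for the originating system
        }
        for row in rows
    ]


def load(records, pool):
    """Load: add the records to the shared pool, keyed by id."""
    for record in records:
        pool[record["id"]] = record
    return pool


def mashup_view(pool, region):
    """Build a per-group view from the pool, serialised as JSON for a MashUp."""
    selected = [r for r in pool.values() if r["region"] == region]
    return json.dumps(selected, sort_keys=True)


pool = load(transform(extract(LEGACY_CSV), "crm_export"), {})
emea_feed = mashup_view(pool, "EMEA")
print(emea_feed)
```

Because every record carries its provenance with it, any MashUp built on the pool can show, or be audited for, where its data came from.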
It also allows you to face the auditors and answer truthfully that you really do have control of how data enters the enterprise and is used. A not insignificant point when you realise that any data brought into the enterprise for use in a MashUp is deemed the responsibility of your enterprise. Makes provenance management pretty key! There are several vendors for this, but here are two interesting ones: http://www.topshareware.com/Apatar-Data-Mashup-Integration-download-51419.htm and http://www.kapowtech.com/index.html.
Comments
# on April 21, 2008 11:37 AM, Mark Kerr said:
IBM's recently announced MashupHub, released as part of the Info 2.0 initiative, also addresses this exact area.
http://www-306.ibm.com/software/data/info20/
# on April 21, 2008 2:33 PM, Mark Kerr said:
Just to add a little more depth to my previous comment.
I agree that for Web 2.0 and mashups to deliver the full value they promise for the Enterprise, there are a couple of significant steps forward required.
Firstly, there is a big 'gulf' to be bridged between the world of Enterprise data, typically locked up in rather complex operational databases and business data warehouses, and the typical mashup creator looking for a 'feed'. Bridging this gulf requires adaptors to traditional enterprise sources like SQL databases and ERP systems, as well as some nifty capabilities to filter, merge and reformat the data to make it consumable by the mashup creator. Secondly, CIOs will, rightly, be concerned about traditional issues like security and auditability, and will be looking for software that opens up enterprise data to mashups to provide 'enterprise class' capabilities in these areas.
It is therefore interesting to see 'blue chip' software vendors like IBM with traditional strengths in Information Management for the enterprise start to address the world of mashups.
# on April 21, 2008 5:40 PM, Guillaume Leleu said:
Hopefully the next generation of Semantic Web content will help to solve that issue by allowing "generic" parsers, frameworks and search engines to aggregate and consolidate (unstructured) content, enabling push-like scenarios rather than pull ones (with transformation) via Mashups. Most of the structured content (issued by business critical applications like the Oracle & SAP suites, etc) is already available in a "tagged" way (meaning relevant for a dedicated user query which is context-specific) via a service layer.
The next question could be... what will be tomorrow's meta-protocol to allow intranet, extranet and internet content search through the existing search engines in place? After the cross-applications era, will it be the era of cross and meta-search engines?
Regards
Guillaume
# on April 21, 2008 5:44 PM, andy mulholland said:
two interesting blogs; and both in different ways draw attention to the changing world of what we want to use as 'data'. i tend to think of it as less a report and more an event, and that's what occupies my thoughts on how this will develop.
# on April 22, 2008 9:09 PM, Mike Pittaro said:
Andy, good post.
You're exactly right about where things are going, and we've been working on it at SnapLogic since 2006.
Provenance is important; you can check out my post on this from last November at http://blog.snaplogic.org/?p=113
Metadata is critical to the success of this style of ETL in the new 'pull' model of application development.
# on April 23, 2008 5:15 AM, Anonymous said:
hi mike
thanks for the link to the blog at snaplogic. it's an excellent post on the topic and also links back to an earlier post of mine on this topic of splitting master data and mashup type data - see http://www.capgemini.com/ctoblog/2008/03/master_data_has_it_become_a_ba.php
# on April 23, 2008 9:21 AM, Nigel Green said:
Andy,
You've just described a product called VI SixD ( www.viagents.com ). SixD is a SaaS platform that integrates any Content with any Event and presents it through 'Value Network' filters to create personalised data 'mash-ups' of trusted data sources. VI Agents probably wouldn't describe it that way, but that's what it does (I guess I should know, I designed it!). In my mind, it represents a new breed of IaaS platforms (I=Integration) or PaaS (P=Platform) a la Force.com (actually SixD led the way – it was first available as a service in 2004 but few understood what it was!). Interestingly, they focused on content and event integration first and web formatting last (after my time) but are now apparently enabling front-end mash-ups.
Nigel
# on April 23, 2008 9:56 AM, andy mulholland said:
thanks Nigel, a useful addition, and i also had a colleague point out that the use of ETL is key in developing new decision support BI based solutions as well. indeed the whole topic of mashups and new wave BI - meaning decision support for events - is linked up.
# on May 8, 2008 12:34 AM, Jesper Rønn-Jensen said:
Hey Andy
I recently did an experiment with internal data on 14,000 Capgemini employees. Via an ETL framework for Ruby called ActiveWarehouse-ETL, I built my own employee database for showing relations between people. (although, I wouldn't call it a mashup, as I had only one data source...)
The ETL paradigm is really fun to play with as you can morph and twist your data to fit your needs.
PS. Impressive to see your activity here at the blog. Still very interesting articles :)
# on May 8, 2008 2:16 PM, andy mulholland said:
Hi Jesper and good to hear from you again.
there is a definite wow and proof factor in your experiment but you also raise an interesting point.
I also think we are coming a long way from the original MashUp concept on the 'open internet' as a personal tool towards it being an enterprise technique to create standardised formats for holding and displaying data. Therefore it may not be a MashUp by the first definition but in an enterprise sense it qualifies and has big value.
great example thanks!