It’s been a number of months since my last blog on graph-based Christmas puzzles – I hope you enjoyed it, as I certainly had fun writing it. The reason for my quietness of late isn’t a lack of things to say, but rather that I have been busy leading and developing a new graph initiative at Capgemini. I’ve been leading a team of graph experts, data scientists and UX designers to create a new graph-based investigative tool, which we’ve code-named Haystack.
What is Haystack?
There are many use cases for Haystack, but as one example, let’s suppose that you are a lead investigator in an insurance fraud team, and you are investigating a ring of individuals suspected of making fraudulent property insurance claims. By using Haystack, your investigation becomes much smarter, simpler and clearer, with you being able to identify links and relationships you previously didn’t know existed.
With Haystack we have already implemented a number of rich features and tools, including:
1. A geo mode to view elements of the graph on a map, giving a new perspective on the data. In future we plan to add support for annotation layers, such as GeoJSON shapes, and the ability to overlay graph data on top of an image such as a blueprint or a floor plan.
2. Graph and machine learning algorithms from Neo4j’s Graph Data Science library. Currently Haystack includes the following classes of algorithms:
- Community detection algorithms which are used to detect distinct communities within a larger graph. In fraud analytics, such algorithms are useful for detecting anomalies in customer behaviour, to determine whether suspected criminal behaviour is isolated to one or two individuals or whether a group of people are acting in a collusive manner as a fraud ring.
- Centrality algorithms help to identify important nodes. In fraud and criminal analytics, it is critical to identify ring-leaders of organised crime as they exert significant influence over their network.
- Pathfinding and search algorithms which can calculate the shortest, fastest, or otherwise optimal path between nodes. In fraud work, applications include filtering transactions that have extremely short paths between people. Pathfinding is also critical when following the money, ie knowing the ultimate beneficiaries and who was involved in the chain of transactions.
- Similarity algorithms which can detect how “close” two nodes in the graph are based on the neighbourhoods around them.
- Link prediction algorithms help uncover unobserved relationships, ie they predict how likely it is for there to be an edge between a given pair of nodes.
In addition to graph algorithms that describe the topology of a network, we are including a number of other AI capabilities, such as the ability to use NLP techniques to extract information (eg values, dates, entities) from unstructured documents such as invoices, emails, legal contracts, letters and reports.
3. Advanced filtering capabilities, such as the ability to remove or hide nodes or edges based on a specific property. For example, during a complex investigation you may want to eliminate (ie hide) all addresses where the “Country” property of a node equals “England”.
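To make the algorithm classes and the property filter above a little more concrete, here is a minimal sketch in plain Python. It is not Haystack's code and it does not use Neo4j's Graph Data Science library: the toy graph, the names and every function below are hypothetical, and each function is a deliberately simple stand-in (connected components for community detection, degree for centrality, breadth-first search for pathfinding, Jaccard overlap for similarity, a shared-neighbour count for link prediction).

```python
from collections import deque

# Hypothetical toy data: node properties plus undirected edges.
nodes = {
    "alice": {}, "bob": {}, "carol": {}, "dave": {},
    "addr1": {"Country": "England"},
    "addr2": {"Country": "Scotland"},
}
edges = [("alice", "addr1"), ("bob", "addr1"), ("bob", "carol"),
         ("carol", "addr2"), ("dave", "addr2")]


def hide_nodes(nodes, prop, value):
    """Filtering: drop every node whose property `prop` equals `value`."""
    return {n: p for n, p in nodes.items() if p.get(prop) != value}


def build_adjacency(nodes, edges):
    """Return {node: set of neighbours}, skipping edges to hidden nodes."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        if a in adj and b in adj:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def communities(adj):
    """Community detection stand-in: connected components."""
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        group, queue = set(), deque([start])
        seen.add(start)
        while queue:
            n = queue.popleft()
            group.add(n)
            for m in adj[n] - seen:
                seen.add(m)
                queue.append(m)
        groups.append(group)
    return groups


def degree_centrality(adj):
    """Centrality stand-in: number of direct neighbours per node."""
    return {n: len(nbrs) for n, nbrs in adj.items()}


def shortest_path(adj, source, target):
    """Pathfinding: breadth-first search for a fewest-hops path."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        n = queue.popleft()
        if n == target:
            path = []
            while n is not None:
                path.append(n)
                n = previous[n]
            return path[::-1]
        for m in adj[n]:
            if m not in previous:
                previous[m] = n
                queue.append(m)
    return None  # no route between the two nodes


def jaccard(adj, a, b):
    """Similarity: overlap of two nodes' neighbourhoods, from 0 to 1."""
    union = adj[a] | adj[b]
    return len(adj[a] & adj[b]) / len(union) if union else 0.0


def common_neighbours(adj, a, b):
    """Link prediction stand-in: more shared neighbours, likelier link."""
    return len(adj[a] & adj[b])


full = build_adjacency(nodes, edges)
print(shortest_path(full, "alice", "dave"))
print(degree_centrality(full)["bob"], jaccard(full, "alice", "bob"))

# Hide every address in England, then look for communities again.
visible = hide_nodes(nodes, "Country", "England")
print(communities(build_adjacency(visible, edges)))
```

In this toy data, hiding the English address removes the only link between "alice" and everyone else, so she drops out into a community of her own. It is a small illustration of how a filter can change the picture an investigator sees.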
In addition to these graph capabilities, with Haystack we are developing a solution that can overcome organisational boundaries by allowing users to request external data from inside the application itself. We want to seamlessly combine local, remote and federated data sources, making the data available either through the virtualisation layer or ingesting it using on-premise or cloud ETL processes.
What are the use cases for Haystack?
There are many use cases to which Haystack could be applied, not just financial fraud detection. Our Haystack platform (and graphs in general) could be used for any monitoring or investigative use-case. In fact, it can be used for any use-case where relationships can be defined between the entities of interest. For example:
- Cold chain logistics for pharmaceuticals. Where pharmaceutical products must be kept cold in transit (such as some of the Covid vaccines) and no deviation in temperature or route is allowed, with Haystack we can visualise and monitor the entities involved in the distribution network, ie the pharma companies, shippers, customs, airports, warehouses and clinics. By integrating real-time monitoring, it would be possible to take remedial steps, such as alerting the body that has custody of the goods that a drop in temperature has been detected, thereby avoiding waste and mitigating risks.
- Client and customer management. With Haystack we can create 360 degree views of clients and customers, easily identifying, for example, high lifetime value customers, detecting and preventing churn, and improving upsell and cross-sell opportunities.
- Analysis of suppliers and products. By developing a holistic view of suppliers, the parts and components they provide, as well as what you have in stock and where, Haystack can help improve ordering and procurement processes. In particular, the analytics engine can help create (and visualise) a long-term view of supply and demand in the market, thereby helping to make more intelligent buying and selling decisions.
Would you like to know more?
With Haystack, we are developing a single solution that empowers investigators, analysts and caseworkers to investigate and evaluate local and federated data and take decisive action in almost any use-case, whether that is detecting fraud, monitoring and managing complex supply chains, or handling legal and regulatory issues such as GDPR compliance.
If you have any questions, or if you are interested in knowing more about Haystack, please contact me by emailing firstname.lastname@example.org.
Calum Chalmers is a senior data scientist in the Insights & Data practice, with over 20 years’ analytical and machine learning experience. He first fell in love with graph theory when studying mathematics at Glasgow University and at the University of Warwick.