Spark Summit East

  • February 16, 2016 to February 18, 2016
  • New York Hilton Midtown, 1335 Avenue of the Americas, New York, New York 10019, USA

Data Science and Engineering at Scale


Spark Summit, the largest big data event dedicated to Apache Spark, is back in NYC in 2016. Join us from Tuesday, February 16 through Thursday, February 18, 2016 at the New York Hilton Midtown. Hear from leading production users of Spark, Spark SQL, Spark Streaming and related projects; find out where project development is heading; learn how to use the Spark stack in a variety of applications; and hear from businesses that are using Spark to meet their needs.

Relationship Extraction from Unstructured Text Based on Stanford NLP with Spark

Date/time: Wednesday, February 17, 3:00 – 3:30 PM
Location: Conference Room Gramercy
Yana Ponomarova, Capgemini Insights & Data Global Practice
Nicolas Claudon, Big Data Architect, Capgemini Insights & Data Global Practice

Description: About 80% of the information created and used by an enterprise is unstructured data embedded in content, and it is growing at twice the rate of structured data. Mastering and using the knowledge scattered across an organization's many unstructured documents can therefore deliver considerable value.

For our client, a global Oil & Gas company, the valuable information was scattered across large volumes of engineering reports. Those reports were written by engineers in a free, unconstrained format, often by non-native English speakers, and focus on the technical characteristics of Oil & Gas operations.

The primary challenge for the client was to extract supply chain relationships (supplier, receiver, object of delivery and transport) from those reports in order to evaluate the interdependency between its sites around the globe and better manage operational risks.