Our client is a leading insurance company in Japan that is building a big data hub on the latest Hadoop framework. Data from various sources will be ingested into the data hub, then cleaned, transformed, and used for analysis. To integrate different source systems, such as Mainframe and Oracle DB, we will use the Informatica PCDQPWX and MDM tools.
- Requirement analysis
- Data ingestion using Sqoop and automated Python scripts
- Data transformation using Spark and HiveQL
- Troubleshooting and optimization of complex queries
- Helping Informatica developers with Hadoop issues
- Basic Hadoop administration and job monitoring
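The ingestion responsibility above (Sqoop driven by automated Python scripts) can be sketched as follows. This is a minimal illustration only; the JDBC URL, table name, and HDFS directory are hypothetical placeholders, not the client's actual environment.

```python
import shlex

def build_sqoop_import(jdbc_url, table, target_dir, num_mappers=4):
    """Assemble a `sqoop import` command for one RDBMS table.

    Returns the command as an argument list, ready to hand to
    subprocess.run(). All connection details are illustrative.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,             # e.g. an Oracle JDBC URL
        "--table", table,                  # source table to pull
        "--target-dir", target_dir,        # HDFS landing directory
        "--num-mappers", str(num_mappers), # parallel map tasks
        "--hive-import",                   # load straight into a Hive table
    ]

# Hypothetical source system, for illustration:
cmd = build_sqoop_import(
    "jdbc:oracle:thin:@//dbhost:1521/ORCL",
    "POLICY",
    "/data/raw/policy",
)
print(shlex.join(cmd))
```

Wrapping the command builder in a function like this lets a scheduler (for example, an Oozie shell action or a cron-driven script) loop over many source tables with one code path.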
- In-depth understanding of distributed environments
- Working experience with the Hadoop framework, including HDFS, Hive, HBase, MapReduce, Pig, Oozie, and Tez
- Experience with RDBMS and SQL
- Experience working with large data sets and with distributed computing (MapReduce, Hadoop, Hive, Pig, Apache Spark, etc.)
- Experience with data ingestion using Sqoop; understanding of CDC technologies and Apache NiFi
- Experience in Unix/Linux shell scripting, Python, and Java
- Data transformation using Spark, including streaming
- Experience in ETL
- Good communication skills
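As a small illustration of the HiveQL transformation skills listed above, the sketch below uses Python to render a parameterized cleansing query of the kind such a pipeline might run. The table and column names (customer_id, policy_type, premium, load_date) are hypothetical placeholders, not the client's schema.

```python
def render_cleanse_query(src_table, dst_table, partition_col):
    """Render a HiveQL transformation that trims string keys, normalizes
    a category column, casts an amount, and writes into a partitioned
    target table. All identifiers here are illustrative only.
    """
    return (
        f"INSERT OVERWRITE TABLE {dst_table} PARTITION ({partition_col})\n"
        f"SELECT TRIM(customer_id),\n"
        f"       UPPER(policy_type),\n"
        f"       CAST(premium AS DECIMAL(12,2)),\n"
        f"       {partition_col}\n"
        f"FROM {src_table}\n"
        f"WHERE customer_id IS NOT NULL"
    )

print(render_cleanse_query("raw.policies", "curated.policies", "load_date"))
```

Generating the statement from a template keeps one transformation pattern reusable across source tables, and the same string can be submitted through `spark.sql()` or the Hive CLI.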