Data Scientist – 6 to 9 years – Pune
Proven track record in data analytics in either a commercial or a research setting.
Strong balance of statistical knowledge, machine learning and coding skills.
Good communication and data storytelling skills.
Willingness to learn, ability to think skeptically about problems and results, and curiosity to explore new techniques and domains.
Not afraid of working with high volume and/or untidy data.
Expert knowledge of R and/or Python is essential, and the ability to learn new techniques is key.
Know-how in the Hadoop ecosystem is key.
Ability to work independently in a quickly evolving environment.
Analytics / mathematics (understanding the underlying math): Modelling (mathematics)
Statistics (hypothesis testing, data distributions, etc.)
Clustering (k-means, hierarchical, etc.)
Classification (logistic regression, SVM, decision tree, random forest, neural network)
Regression (linear regression, decision tree, random forest, neural network)
Classical optimisation (gradient descent, Newton-Raphson, etc.)
Graph theory (network analytics)
Heuristic optimisation (genetic algorithm, swarm theory)
Deep learning (LSTM, convolutional NN, recurrent NN)
Agent-based modelling
Visualisation: QlikView, Tableau, …
Matplotlib, seaborn, etc
Languages: Python (pandas, scikit-learn)
Spark (SQL, ML, GraphX, Streaming)
Tools/IDE: Notebooks (Jupyter Notebook, Zeppelin, Databricks)
Data management: Extracting data (web scraping)
Data cleaning (imputation, missing-value detection)
Data exploration (correlation, outlier detection, trends, etc.)
Text mining (TF-IDF, n-grams, lemmatization, stemming, NLP)
Data loading (JDBC connection, connection to database, FTP connection)
Table profiler automation
Table comparer automation
Engineering: Code packaging
Environment configuration (versioning, packages installation)