Do data scientists prefer R or Python?


The rapid emergence of data science has been fueled by both R and Python. However, competition between these two open-source languages has existed for quite some time, furthering the perception of “R camps” and “Python camps” with communities being split between the two.

My observations differ somewhat from what we generally see on public forums. They are based on my interactions with data scientists I have known or worked with. There is no clear winner between R and Python. The winner is the business requirement being addressed, and in most cases that requirement should guide the choice between the two languages.

R and Python each have their own characteristics; both are good at some things and weaker at others. We expect to be able to create deep, statistics-driven models effectively with R, given its sophisticated statistical capabilities, while we expect application implementation to be easier and more effective with Python.

Let’s look at how R and Python compare across the characteristics data scientists look for:

  1. Statistical support – R scores above Python, with a more extensive range of statistical packages available.
  2. Ease of implementation – Python is considered easy to learn and implement, whereas R is considered to have a steep learning curve. Python code is reportedly far more readable than R code. Python’s native object support is a huge differentiator in its favor.
  3. Speed – Python is generally considered faster than R for general-purpose work and lighter on memory. Both are high-level, interpreted languages, so actual performance depends heavily on the workload and the libraries used.
  4. Data analytics – R effectively supports large data sets, with a wide range of packages available that make implementation easier. While Python continues to improve, with new packages being added regularly, R appears to be more mature here.
  5. Deep learning – Python scores over R with seamless integration available for TensorFlow, Keras and others. R continues to see its capability expanding, with new packages being added. However, it still has some distance to cover.
  6. Visualization – One of the main reasons for R’s popularity is its visualization capabilities. R offers advanced graphics through its packages, whereas visualization in Python can be more complex and less tidy.
  7. Community support – Python continues to see a bigger and stronger community, with fewer of its users moving over to R.

Given the above comparisons, R scores over Python in terms of statistical support, data analytics for large data sets, and visualization, which makes it better suited to statistics-heavy use cases. Python, on the other hand, scores above R in ease of implementation, speed, deep learning support, and community support, which makes it better for application implementations. There are also use cases where both R and Python are required. One such case is experimenting with a statistical model in R before implementing it in Python.
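The hand-off described above, experimenting in R and then implementing in Python, could look something like the following on the Python side. This is only a minimal sketch under stated assumptions: it supposes a logistic regression was prototyped in R and its fitted coefficients were exported as JSON. The feature names, the coefficient values, and the helper functions `load_model` and `predict_probability` are all hypothetical and are not taken from any real project.

```python
import json
import math

# Hypothetical coefficients, as if exported from a logistic regression
# prototyped in R and saved as JSON. The numbers are made up for illustration.
EXPORTED_MODEL = (
    '{"intercept": -1.5, '
    '"coefficients": {"tenure_years": 0.4, "support_calls": -0.25}}'
)


def load_model(raw_json):
    """Parse the exported model into an intercept and a coefficient dict."""
    model = json.loads(raw_json)
    return model["intercept"], model["coefficients"]


def predict_probability(intercept, coefficients, features):
    """Apply the logistic function to the linear combination of the features."""
    linear = intercept + sum(
        coefficients[name] * features[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-linear))


intercept, coefficients = load_model(EXPORTED_MODEL)
probability = predict_probability(
    intercept, coefficients, {"tenure_years": 5, "support_calls": 2}
)
print(f"Predicted probability: {probability:.3f}")
```

The point of the sketch is the division of labor: the statistical work stays in R, while the Python application only needs the exported numbers and a few lines of scoring logic.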

Interestingly, in the last two years we have seen the R and Python communities join forces to deliver a more comprehensive platform for data science. RStudio’s work to integrate Python is one such initiative from the R side, alongside a few other initiatives on both ends. The combined force will definitely benefit the data science community. At the same time, R and Python each continue to make a significant difference to data science through their own independent advancements.

The data science community needs to continue supporting both of these open-source languages and help them grow to support more use cases. Because we keep finding new use cases for data science, we will undoubtedly need all the capabilities these languages can offer, individually and collectively, now and in the future.

For more info, you can reach out to the author, Sumit Kumar, Senior Director at Capgemini North America.
