
Do data scientists prefer R or Python?

Capgemini
September 15, 2020

My observations are a bit different from what we generally see across many public forums. They are based on my interactions with data scientists I have known or worked with. There is no clear winner between R and Python. The winner is the business requirement being addressed, and in most cases that requirement should guide the choice between the two languages.

R and Python each have their specific characteristics; both are good at some things and not as good at others. We expect to be able to create deep statistics-driven models effectively with R, given its sophisticated statistical capabilities, while we expect application implementation to be easier and more effective with Python.

Let’s look at how R and Python compare across the characteristics data scientists look for:

  1. Statistical support – R scores above Python, with a more extensive set of statistical packages.
  2. Ease of implementation – Python is considered easy to learn and implement, whereas R is considered to have a steep learning curve. Python code is also widely regarded as more readable than R code, and Python’s native support for objects is a major differentiator in its favor.
  3. Speed – Both are high-level, interpreted languages, but Python is generally seen as the faster of the two for general-purpose work and as less demanding on memory.
  4. Data analytics – R effectively supports large data sets, with a huge number of packages available that make implementation easier. While Python continues to improve, with new packages being added regularly, R appears to be much more mature.
  5. Deep learning – Python scores over R with seamless integration available for TensorFlow, Keras and others. R continues to see its capability expanding, with new packages being added. However, it still has some distance to cover.
  6. Visualization – One of the main reasons for R’s popularity is its visualization capabilities. R’s packages deliver advanced graphical capabilities, whereas visualization in Python can be more complex and less tidy.
  7. Community support – Python continues to see a bigger and stronger community, with fewer people moving over to the R community.

Given the above comparisons, R scores over Python in terms of statistical support, data analytics for large data sets, and visualization, which makes it better suited to statistics-heavy use cases. Python, on the other hand, scores above R in ease of implementation, speed, deep learning support, and community support, which makes it better for application implementations. There are also use cases where both R and Python may be required. One such case is where we want to run experiments with a statistical model in R before working on the implementation in Python.
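As a rough illustration of that last use case, here is a minimal sketch of the Python side of such a hand-off. It assumes a logistic regression was explored and fitted in R and its coefficients were exported as JSON; the feature names, the coefficient values, and the helper functions are all hypothetical and chosen only for illustration.

```python
import json
import math

# Hypothetical coefficients, as if exported from a logistic regression
# fitted in R. The feature names and values are made up for illustration.
EXPORTED_MODEL = """
{
  "intercept": -1.5,
  "coefficients": {"tenure_months": 0.04, "support_tickets": 0.3}
}
"""


def load_model(raw):
    """Parse the exported JSON into an intercept and a name -> weight mapping."""
    data = json.loads(raw)
    return data["intercept"], data["coefficients"]


def predict_probability(features, intercept, weights):
    """Apply the logistic function to the weighted sum of the features."""
    linear = intercept + sum(weights[name] * features[name] for name in weights)
    return 1.0 / (1.0 + math.exp(-linear))


intercept, weights = load_model(EXPORTED_MODEL)
score = predict_probability(
    {"tenure_months": 12, "support_tickets": 2}, intercept, weights
)
print(f"predicted probability: {score:.3f}")
```

The point of the split is that the statistical exploration stays in R, while the application only needs the fitted numbers and a few lines of Python to score new records.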

Interestingly, in the last two years we have seen the R and Python communities join forces to deliver a more comprehensive platform for data science. RStudio is one such initiative by the R community to integrate Python, alongside a few other initiatives on both sides. The combined force will definitely serve the data science community better. However, R and Python each continue to make a significant difference to data science through the advances they are making on their own.

The data science community needs to continue supporting both of these open-source languages and help them grow further to support more use cases. Given that we keep finding new use cases for data science, we will undoubtedly need all the capabilities these languages can offer, individually and collectively, now and in the future.

For more info, you can reach out to the author, Sumit Kumar, Senior Director at Capgemini North America.