Computer Vision: where it all started and where it is going

19 Aug 2021

Computer Vision is a technology we use every day, but how was it developed? Join us on a journey through its history.

Even if you are not a day-to-day follower of Artificial Intelligence news, I bet you hear about Computer Vision quite often. Apple Face ID, credit card identification, Instagram facemasks – they are all examples of Computer Vision.

Figure 1. Src:

We are the Computer Vision Guild in Insights and Data (I&D) at Capgemini UK, and we invite you to join us on a journey through a series of blogs covering the origin of Computer Vision, the typical problems it solves and what this technology holds in store for the future.

The foundations

You may be surprised how old Computer Vision is: the very first related research began in the 1950s. Inspired by neurophysiological research on how human vision works, computer scientists started trying to reproduce this process digitally. In short, human vision works in sequence: the eye identifies edges first, these edges are combined into a more complex shape, and this shape is then interpreted as a specific object.
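To make the "edges first" step a little more concrete, here is a minimal sketch in plain Python. It is only an illustration under our own assumptions, not a model of the eye or any historical algorithm: the toy image and the helper name horizontal_gradient are made up for this example. It marks an edge wherever the brightness of two neighbouring pixels differs.

```python
def horizontal_gradient(image):
    """Absolute brightness difference between horizontally adjacent pixels.

    Large values mark places where brightness changes sharply,
    i.e. candidate (vertical) edges.
    """
    return [
        [abs(row[x + 1] - row[x]) for x in range(len(row) - 1)]
        for row in image
    ]


# Hypothetical 4x6 grayscale image: dark left half (0), bright right half (9).
image = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]

edges = horizontal_gradient(image)
for row in edges:
    print(row)  # the only non-zero value sits on the dark/bright boundary
```

Everything that is flat comes out as zero, and only the boundary between the dark and bright halves survives. Later stages would then have to combine such edge responses into shapes and objects.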

The problem is that this whole process happens in a fraction of a second, and it is something humans do naturally from birth. One cannot simply attach a camera to a computer and expect it to act like a human. So many research groups started working in this field, one of which was Russell Kirsch’s group, which developed the first ever digital image scanner (Figure 2).

Figure 2. First ever computer scanned image

It was a true breakthrough for the field, and later the same group developed various algorithms for counting objects, edge detection and more.

The next major milestone in Computer Vision history was the invention of the neocognitron, a multi-layered artificial neural network, by Kunihiko Fukushima. The foundation he laid with this invention became what we know today as the Convolutional Neural Network (CNN), a main building block of most modern Computer Vision Deep Learning models. We will cover this topic in more detail in one of our later blogs, but in short, the idea of a CNN is to pass the image through a neural network that is specifically designed to identify core features in the image (e.g., shapes) and, based on these features, either classify what is in the image or locate the object you are searching for.
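As a rough sketch of that idea, the plain-Python snippet below runs one convolution, one ReLU and one max-pooling step over a toy image. Please treat it as an illustration only: the function names, the 6x6 image and the hand-picked vertical-edge kernel are our own assumptions, and in a real CNN the kernel values are learned from data rather than written by hand.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[y + i][x + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def relu(feature_map):
    """Keep positive responses, set negative ones to zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Keep the strongest response in each non-overlapping size x size block."""
    return [
        [
            max(
                feature_map[y + i][x + j]
                for i in range(size)
                for j in range(size)
            )
            for x in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for y in range(0, len(feature_map) - size + 1, size)
    ]


# Hypothetical 6x6 image: dark left half (0), bright right half (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

# Hand-picked kernel that responds to dark-to-bright vertical edges.
vertical_edge_kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = relu(conv2d(image, vertical_edge_kernel))
pooled = max_pool(feature_map)
print(feature_map)
print(pooled)
```

The feature map is non-zero only around the boundary between the two halves, and pooling shrinks it while keeping the strongest responses. A full CNN stacks many such layers with many kernels and ends with layers that turn the resulting features into a class label or an object location.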

Why did it all explode recently?

In the past 10 years we have seen a tremendous rise in the field of Computer Vision. There are three core reasons for that:

  • Cheap computational power;
  • New advanced algorithms;
  • Availability of a large amount of data.

With the rise of cloud computing, many providers emerging on the market and advances in hardware, Deep Learning has become a relatively easy and inexpensive activity.

So now, if you are a Deep Learning developer or researcher, you don’t need to spend thousands of pounds to set up a powerful GPU cluster or wait days for your model to train, which gives you the ability to iterate more on your models.

The second driver is the development of new cutting-edge algorithms in Computer Vision. Once Computer Vision became more applicable to real-world problems, more and more computer engineers and researchers, as well as big corporations, started to invest their time and money in inventing new approaches to tackle Computer Vision problems. This never-ending race encourages people from all over the world to do their best to take steps, both tiny and huge, that improve state-of-the-art models even further. For example, the Papers with Code website alone lists more than 10,000 papers in major Computer Vision fields.

The last, and probably the most important, driver is data. A lot of data. And by a lot, I mean millions and millions of images circulating on the Internet, available for anyone to pick up and work with. One of the largest and best-known databases of labelled images is ImageNet. Fei-Fei Li, who came up with the idea of gathering and labelling millions of images to boost the Computer Vision field, started working on it back in 2007. She was then joined by Christiane Fellbaum, one of the creators of WordNet. They introduced the ImageNet database to the public in 2009, and a year later the ImageNet Large Scale Visual Recognition Challenge was held for the first time. This is the challenge that gave the world models such as AlexNet, Inception and ResNet, which are still used today in various applications around the world.

Where are we going now?

Computer Vision has been hugely successful in recent years and its prominence will only continue to grow. This technology is being integrated into our day-to-day lives without us even noticing: just remember that funny dog filter you shared with friends. But it goes far beyond recreation and fun. A lot of businesses use this technology to enhance their production by constantly monitoring the assembly line, and people’s safety and health are being drastically improved by replacing human inspections of dangerous industrial assets with drone inspections. Self-driving cars and augmented reality also benefit from this technology, promising us an even better, more high-tech future. We at Capgemini use Computer Vision to tackle our clients’ business challenges. This ranges from enhancing the monitoring of industrial assets to automated inspections of infrastructure and building sites.

Closing remarks

We hope you enjoyed this introduction to the field of Computer Vision and its history, and now have a feel for the scale of this magical technology. In the upcoming blogs we will dive deeper into the common problems Computer Vision is solving, discuss some core technological ideas and techniques, and cover the latest achievements and developments in this field. If you want to learn more, or perhaps start a discussion, please reach me via the email (