Computer Vision Resource Collection

A curated collection of resources by Reshma Abraham of our Education team

Introduction

Computer vision is a sub-field of artificial intelligence that focuses on enabling computers to understand and interpret visual information from the real world in the form of images and videos and make decisions or predictions based on that understanding. In other words, computer vision is all about teaching computers to 'see' and understand the world, much like how humans do. The goal of computer vision is to replicate human vision using machine learning and other techniques. To do this, it uses methods from mathematics, physics, computer science, and more.

Let's say you're looking at a photo of a park. As a human, you can easily identify things like trees, people, the sky, maybe a bench or a walking path. You can also understand more complex things - you can tell if it's day or night, you can guess the weather, you can identify if the people in the park are adults or children, you can tell if they're playing a game or just sitting, and so on. But for a computer, this photo is just a bunch of numbers - each pixel in the photo has a number that represents its colour, and that's all a computer sees initially. Computer vision is the field that tries to teach computers to understand these numbers the way we understand the park scene.
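To make the "bunch of numbers" idea concrete, here is a minimal sketch in plain Python. The tiny 2x3 "park" image and the helper names (to_grayscale, mean_brightness) are made up for illustration; real libraries such as OpenCV store images as arrays of numbers in a broadly similar way.

```python
# A hypothetical 2x3 RGB image: each pixel is a (red, green, blue) triple
# with values from 0 to 255. The comments are our human interpretation;
# the computer only has the numbers.
image = [
    [(135, 206, 235), (135, 206, 235), (255, 255, 255)],  # sky, sky, cloud
    [(34, 139, 34), (34, 139, 34), (139, 69, 19)],        # grass, grass, bench
]


def to_grayscale(img):
    """Average the three colour channels of each pixel (simple, unweighted)."""
    return [[sum(pixel) // 3 for pixel in row] for row in img]


def mean_brightness(img):
    """Average grayscale value over the whole image."""
    values = [v for row in to_grayscale(img) for v in row]
    return sum(values) / len(values)


height, width = len(image), len(image[0])
gray = to_grayscale(image)

print(f"{height}x{width} image")
print(gray)
print(mean_brightness(image))
```

A rough question such as "is it day or night?" could start from a number like the mean brightness, although practical systems learn far richer features than this.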

Objectives

The field of computer vision has many applications, including autonomous vehicles, image editing, health diagnostics, facial recognition, and many others. It's an active area of research and is continually advancing to tackle more complex and diverse visual understanding tasks.

Demystify Core Concepts: The primary objective of this computer vision collection is to simplify and elucidate core concepts and principles, helping practitioners to gain a fundamental understanding of computer vision technology, its theoretical foundations, and various sub-fields.
Develop Practical Skills: The resources aim to equip AI practitioners with the necessary practical skills to implement and leverage computer vision technology. This includes hands-on coding exercises, detailed walk-throughs of popular computer vision algorithms, and training on specific tools and libraries like OpenCV, TensorFlow, and PyTorch.
Encourage Innovation: The collection of resources seeks to inspire AI practitioners to innovate and push the boundaries of what's possible with computer vision. This includes resources on cutting-edge research, case studies of innovative uses of computer vision, and challenges to encourage experimentation.

Who is this for?

This collection of resources on computer vision is designed to cater to a broad spectrum of individuals with diverse needs and interests. Here's a look at who might find this information especially beneficial:

Beginners in AI and Machine Learning: If you're new to the field of Artificial Intelligence (AI) and Machine Learning (ML) and are keen to understand how machines can be trained to 'see' and interpret images and videos, these resources will provide an excellent foundation.
Software Developers and Engineers: Professionals who are interested in expanding their skill set to include computer vision or are tasked with developing applications that involve image or video processing will find the information valuable.
AI Enthusiasts and Hobbyists: If you're intrigued by the capabilities of AI and want to understand more about how machines interpret visual data, these resources will satisfy your curiosity and may inspire you to delve deeper.
Companies and Industry Professionals: For organisations and professionals working in industries where computer vision has applications - such as autonomous vehicles, healthcare, security, and retail - these resources can help in understanding the potential benefits and challenges of implementing computer vision technologies.

In essence, anyone with an interest in understanding and applying computer vision, whether for personal knowledge, academic pursuits, or professional growth, will find this collection invaluable.

Resources

(please note that each resource contains a link external to the Diverse AI website that will open in a new window)

General Certified Courses

Introduction to Computer Vision and Image Processing - Course by Coursera
Deep Learning for Computer Vision with Python and TensorFlow - Video tutorial by FreeCodeCamp.org
Python for Computer Vision with OpenCV and Deep Learning - Course by Udemy
Deep Learning and Computer Vision A-Z™: OpenCV, SSD & GANs - Course by Udemy

Image Classification

This is the task of categorising an image into one of many different categories. For example, identifying whether a photo is of a dog, cat, or car.
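As a rough sketch of the final step of a classifier: many models produce one raw score per category, turn the scores into probabilities with a softmax, and report the most likely category. The scores and the classify helper below are invented for illustration; in practice the scores would come from a trained model.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores, labels):
    """Return the label with the highest probability, plus that probability."""
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs[best]


# Hypothetical raw scores for one photo, one score per category.
labels = ["dog", "cat", "car"]
scores = [2.0, 0.5, -1.0]

label, confidence = classify(scores, labels)
print(label, round(confidence, 3))
```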

Papers and articles and tutorials

Image Classification using Deep Neural Networks: A beginner-friendly Approach using TensorFlow - Medium Publication
Image Classification Explained - v7Labs Tutorial
Top 4 Pre-Trained Models for Image Classification with Python Code - Analytics Vidhya
A Complete Guide to Image Classification in 2023 - Viso.ai Tutorial
Everything about Mask R-CNN: A Beginner’s Guide - Viso.ai Tutorial
TensorFlow Image Classification: Building Classifiers for Fashion MNIST and CIFAR-10 - Edureka Tutorial
Papers with Code: Image Classification

Courses

Basic Image Classification with TensorFlow - Course by Coursera
Deep Learning: Image Classification with Tensorflow in 2023 - Course by Udemy

Activities and exercises

Object Detection

This task involves identifying the presence, location, and type of one or more objects in an image.
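One way to picture a detector's output is as a list of records, each holding a type (label), a location (bounding box) and a confidence score. The sketch below uses a hypothetical Detection record and also shows intersection over union (IoU), a widely used measure of how well a predicted box overlaps a reference box. IoU is not covered in the text above and is included here only as an illustration.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical record for one detected object."""
    label: str    # type of object, e.g. "dog"
    box: tuple    # location as (x_min, y_min, x_max, y_max) in pixels
    score: float  # model confidence between 0 and 1


def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the overlapping region (zero if the boxes are disjoint).
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


predicted = Detection(label="dog", box=(0, 0, 10, 10), score=0.9)
ground_truth_box = (5, 5, 15, 15)  # made-up reference box

print(predicted.label, iou(predicted.box, ground_truth_box))
```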

Papers and articles and tutorials

The Ultimate Guide to Object Detection. - v7 labs Tutorial
Object detection with deep learning and OpenCV - Pyimagesearch Tutorial
YOLO Object Detection Explained - Datacamp Tutorial
Turning any CNN image classifier into an object detector with Keras, TensorFlow and OpenCV - Pyimagesearch tutorial
Object Detection Tutorial in TensorFlow: Real-Time Object Detection - Edureka Tutorial
The Complete Guide to Object Tracking - v7labs Tutorial
Object Detection and Tracking using MediaPipe - Google Developer's blog
Papers with Code: Object Detection
Paper list from 2014 to now (2019)

Courses

Train YOLO for Object Detection with Custom Data - Course by Udemy
YOLOv8: Object Detection, Tracking & Web App in Python 2023 - Course by Udemy
Object Detection Using Facebook's Detectron2 - Course by Coursera

Object Detection 101 Course - Including 4xProjects - Video Tutorials by CV Zone

Activities and exercises

Image Segmentation

Segmentation is the task of partitioning an image into multiple segments or "superpixels". Semantic segmentation assigns a class to each pixel in the image, while instance segmentation identifies each distinct object of each class.
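The difference between the two outputs can be sketched on a tiny made-up mask. The semantic mask stores one class id per pixel; the instance mask additionally gives each separate object its own id. Here the instances are found by grouping connected pixels, which is only a toy stand-in: real instance segmentation models can also separate objects that touch each other. The function name label_instances is hypothetical.

```python
from collections import deque

# Hypothetical semantic mask: 0 = background, 1 = "person".
semantic = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 0, 1],
]


def label_instances(mask, target_class):
    """Give each 4-connected region of target_class its own id (1, 2, ...)."""
    h, w = len(mask), len(mask[0])
    instances = [[0] * w for _ in range(h)]
    next_id = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] != target_class or instances[r][c]:
                continue
            # Found an unlabelled pixel: start a new instance and flood-fill it.
            next_id += 1
            instances[r][c] = next_id
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] == target_class
                            and not instances[ny][nx]):
                        instances[ny][nx] = next_id
                        queue.append((ny, nx))
    return instances, next_id


instance_mask, count = label_instances(semantic, target_class=1)
print(count)
for row in instance_mask:
    print(row)
```

Both regions share the same class in the semantic mask, but they receive different ids in the instance mask.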

Papers and articles and tutorials

A Step-by-Step Introduction to Image Segmentation Techniques - Analytics Vidhya Tutorial
Image Segmentation: The Basics and 5 Key Techniques - Datagen Tutorial
An Introduction to Image Segmentation: Deep Learning vs Traditional [+Examples] - v7labs Tutorial
Demystifying UNet and Learning Image Segmentation - Analytics Vidhya Tutorial
Guide to Image Segmentation in Computer Vision: Best Practices - Encord Tutorial
Image Segmentation with Deep Learning (Guide) - Viso.ai Tutorial

Papers with Code: Image Segmentation

Activities and exercises

Facial Recognition

A task involving the identification or verification of a person's identity based on their face.

Papers and articles and tutorials

OpenCV Face Recognition - Pyimagesearch Tutorial
Face Recognition with Python and OpenCV - Great Learning Tutorial
Real-Time Face Recognition: An End-To-End Project - Towards Data Science Publication
Face Detection with Python using OpenCV - DataCamp Tutorial
Face Recognition with Eigenfaces – Computer Vision Tutorial - Zenva Tutorial
Face Recognition with Siamese Networks, Keras, and TensorFlow - Pyimagesearch Tutorial
Face Recognition Using Principal Component Analysis - MachineLearningMastery Tutorial
What is facial recognition? - AWS Documentation
Using Deep Learning to Design Real-time Face Detection and Recognition Systems - Turing blog

Courses

Computer Vision: Face Recognition Quick Starter in Python - Course by Udemy
Deep Learning: Face Recognition - Linkedin Learning

Activities and exercises

Action Recognition

This task involves identifying a human action (like running, jumping, etc.) in an image or video.

Papers and articles and tutorials

Deep Learning Architectures for Action Recognition - A Towards Data Science Publication
A gentle introduction to human activity recognition - inData Labs Tutorial
Human Activity Recognition with OpenCV and Deep Learning - Pyimagesearch Tutorial
Human Activity Recognition (HAR): Fundamentals, Models, Datasets - v7 labs Tutorial
Pose landmark detection guide for Python - Google Developer’s Blog
Human Action Recognition using Detectron2 and LSTM - LearnOpenCV Tutorial

Papers with Code: Action recognition in videos

Activities and exercises

Optical Character Recognition (OCR)

This task involves converting images of typed, handwritten, or printed text into machine-encoded text. In other words, it is about understanding the text in images.

Papers and articles and tutorials

Optical Character Recognition: What is It and How Does it Work [Guide] - V7 labs Tutorial
Optical Character Recognition (OCR) – The 2023 Guide - Viso.ai Tutorial
What is Optical Character Recognition? - Appen blog
Build your own Optical Character Recognition (OCR) System using Google’s Tesseract and OpenCV - Analytics Vidhya Tutorial
Pyimagesearch OCR Guides and Tutorials - Pyimagesearch Tutorial
Deep Learning-Based OCR for Text in the Wild - Nanonets Tutorial

Activities and exercises

Image Generation: Generative Adversarial Networks

This task focuses on creating new images that were not part of the original dataset but still closely resemble some given real data. Applications include image synthesis, super-resolution, and style transfer.

Papers and articles and tutorials

Intro to Generative Adversarial Networks (GANs) - Pyimagesearch Tutorial
Image Translation with Pix2Pix - Pyimagesearch Tutorial
A Beginner's Guide to Generative AI - Pathmind Blog
Super-Resolution Generative Adversarial Networks (SRGAN) - Pyimagesearch Tutorial
Training a DCGAN in PyTorch - PyTorch Tutorial

Activities and exercises
