Computer Vision: Why Does It Matter?
Using software to parse the world’s visual content is as big a revolution in computing as the change in mobile phones over the past 10 years. This technology will give developers and businesses a major edge in building products and services that reduce our workload and drive substantial growth in the AI sector.
Computer Vision is the process of using machines to understand and analyze both photos and videos. While these types of algorithms have been around in various forms since the 1960s, recent advances in Machine Learning, as well as leaps forward in data storage, computing capabilities, and cheap high-quality input devices for collecting data, have driven major improvements in how well our software can explore this kind of content.
As its name suggests, computer vision combines a computer with the ability to understand objects, scenes, and views, much as humans understand what they see through the combination of eyes and brain, the components responsible for the sense of sight.
Actions such as recognizing an animal, describing a view, or differentiating among visible objects are effortless for humans. But you’d be surprised to know that it took decades of research to give a computer the ability to detect an object with reasonable accuracy.
Computer vision has been around for more than 50 years, but recently we have seen a major resurgence of interest in how machines ‘see’ and how computer vision can be used to build products for consumers and businesses. A few examples of such applications are Amazon Go, Google Lens, autonomous vehicles, and face recognition.
Business use cases for computer vision
Computer vision is one of the areas in Machine Learning and Artificial Intelligence where core concepts are already being integrated into major products that we use every day. Google leverages the image data behind its maps to identify street names, businesses, and office buildings. Facebook also uses computer vision to identify people in photos and does a number of things with the information it gathers.
But it’s not just technology companies that rely on Machine Learning and Business Intelligence for image-based applications. Ford, an American car manufacturer that has been around since the early 1900s, is investing heavily in autonomous vehicles (AVs). Much of the technology relies on analyzing the multiple video feeds coming into the car and using computer vision to pick a course of action on the basis of that analysis.
Another major area where computer vision can help is the medical field. Much of diagnosis is image processing, such as reading X-rays, MRI scans, and other diagnostics that come in the form of images. Google has been working with medical research teams to explore how deep learning can help medical workflows, and has made significant progress in terms of accuracy. In the words of their research page:
“Collaborating closely with doctors and international healthcare systems, we developed a state-of-the-art computer vision system for reading retinal fundus images for diabetic retinopathy and determined our algorithm’s performance is on par with U.S. board-certified ophthalmologists. We’ve recently published some of our research in the Journal of the American Medical Association and summarized the highlights in a blog post.”
Tasks in Computer Vision
There has been progress in the field, especially in recent years, with commodity systems for optical character recognition and face detection in cameras and smartphones.
According to the book Computer Vision: A Modern Approach, computer vision is at an extraordinary point in its development, and only recently has it been possible to build useful computer systems using ideas from computer vision.
The computer vision textbook titled “Computer Vision: Algorithms and Applications” provides a list of some high-level problems where we have seen success with computer vision:
- Optical character recognition (OCR)
- Machine inspection
- 3D model building
- Medical imaging
- Automotive safety
- Match move
- Motion capture
- Fingerprint recognition and biometrics
Computer vision is a broad area of study with many specialized tasks and techniques, as well as specializations for target application domains.
It may be helpful to zoom in on some of the simpler computer vision tasks that you are likely to encounter or be interested in solving, given the vast number of publicly available digital photographs and videos.
Many popular computer vision applications involve trying to recognize things in photographs. Here are some examples:
- Object Classification: What broad category of object is in this photograph?
- Object Identification: Which type of a given object is in this photograph?
- Object Verification: Is the object in the photograph?
- Object Detection: Where are the objects in the photograph?
- Object Landmark Detection: What are the key points of the object in the photograph?
- Object Segmentation: What pixels belong to the object in the image?
- Object Recognition: What are the objects in this photograph and where are they?
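To make the difference between some of these questions concrete, here is a minimal sketch in plain Python. It assumes a tiny hand-labeled mask standing in for a model's output; the names `MASK`, `object_pixels`, `is_present`, and `bounding_box` are made up for illustration and do not come from any library. It shows how verification, detection, and segmentation give differently shaped answers about the same image.

```python
# Toy 5x6 "image" labeled by hand: 1 marks pixels of the object, 0 is background.
# In a real system a trained model would predict this mask; here it is hard-coded.
MASK = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]


def object_pixels(mask):
    """Segmentation-style answer: (row, col) of every pixel belonging to the object."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


def is_present(mask):
    """Verification-style answer: is the object in the image at all?"""
    return any(v for row in mask for v in row)


def bounding_box(mask):
    """Detection-style answer: smallest box (top, left, bottom, right) around the object."""
    pixels = object_pixels(mask)
    if not pixels:
        return None  # nothing to localize
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return (min(rows), min(cols), max(rows), max(cols))


print(is_present(MASK))          # True
print(bounding_box(MASK))        # (1, 1, 3, 4)
print(len(object_pixels(MASK)))  # 8
```

Verification returns a single yes/no, detection returns a box, and segmentation returns the full set of pixels, which is why the three tasks are usually treated separately even though they concern the same object.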
Other examples are related to information retrieval, for example, finding images similar to a given image or images that contain a given object.
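One common way to approach "find images like this one" is to describe each image with a vector of numbers and rank stored images by how closely their vectors match the query. The sketch below assumes such vectors already exist; the three-number vectors, the file names, and the helper `most_similar` are all hypothetical, and cosine similarity is just one reasonable choice of comparison, not the only one.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical feature vectors; a real system would compute these from the
# image content, typically with many more dimensions.
INDEX = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "forest.jpg": [0.1, 0.9, 0.2],
    "seaside.jpg": [0.8, 0.2, 0.1],
}


def most_similar(query, index, top_k=2):
    """Return the names of the top_k indexed images most similar to the query vector."""
    ranked = sorted(
        index,
        key=lambda name: cosine_similarity(query, index[name]),
        reverse=True,
    )
    return ranked[:top_k]


print(most_similar([1.0, 0.0, 0.0], INDEX))  # ['beach.jpg', 'seaside.jpg']
```

The quality of such a search depends almost entirely on how the vectors are produced, which is where the recognition tasks listed above come in.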
Smartphones: QR codes, photography-based applications (Android Lens Blur, iPhone Portrait Mode), panorama construction (Google Photo Spheres), face and object detection, expression detection (smile), Snapchat filters (face tracking), Google Lens, Night Sight (Pixel).
Web: Image search, Google Photos (face recognition, object recognition, scene recognition, geolocalization from vision), Facebook (image captioning), Google Maps aerial imaging (image stitching), YouTube (content categorization).
VR/AR: Outside-in tracking (HTC VIVE), inside-out tracking (simultaneous localization and mapping, HoloLens), object occlusion (dense depth estimation).
Medical imaging: CAT / MRI reconstruction, assisted diagnosis, automatic pathology, connectomics, AI-guided surgery.
Media: Visual effects (VFX) for film and TV (reconstruction), AI-based virtual sports play (reconstruction), semantics-based auto edits (reconstruction, recognition).
Insurance: Claims automation, damage analysis, property inspection.
For an exhaustive list of Computer Vision applications in the industry, see this page maintained by David Lowe, Senior Research Scientist at Google.