Machine learning and computer vision methods have recently received a lot of attention, particularly in data analytics. The success of deep neural networks that help cars drive autonomously and let smartphones recognize speech and translate text attests to the value of machine learning for tackling complex real-world problems. A further prominent example is Google’s AlphaGo, which defeated the world champion Lee Sedol at Go. This is particularly remarkable because Go had long been considered one of the most complex games due to its very large number of game states.
As the amount of data collected by cameras and scientific instruments continues to rise, automated analysis methods will become ever more important. Recurring workflows, such as the segmentation of structures, will likely be either fully automated or supported by computers.
In this post, we’d like to give you a short introduction to and overview of these methods and an idea of suitable application fields.
Machine Learning (ML)
Machine learning is a discipline at the intersection of computer science and mathematics. It aims to develop computational models that capture patterns and relations in data well enough to generalize from training data to previously unseen test data, and thus enable a computer to learn from examples. Mathematically, it draws on optimization and computational statistics; on the computer science side, it relies on efficient data structures (e.g. trees) and efficient implementation techniques to scale to large amounts of data. Spam filtering and speech recognition were among the first applications of ML.
ML models the relationship between inputs and outputs approximately: once a model has been trained on available observations of inputs and outputs, it can estimate (predict) outputs from inputs alone. In state-of-the-art models, the training step determines anywhere from several hundred to millions of model parameters, far too many to set by hand. Machine learning approaches therefore employ optimization algorithms that fit the model parameters to an empirical data distribution, the so-called ‘training data’. This training process allows us to model interactions far beyond what has been possible with traditional modeling (see Figure 1 for a comparison), but it requires sample data for training. ML enables a new paradigm of programming, so to speak: an ML program’s behavior is determined not only by code written in a programming language such as C++, but also by “optimizing” it with data and the expected output.
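To make the idea of fitting parameters by optimization concrete, here is a minimal sketch in Python. It is not taken from any particular library: the toy data, the two-parameter linear model, the learning rate and the number of steps are all assumptions chosen for illustration. Real models have far more parameters, but the principle of repeatedly adjusting them to reduce the error on the training data is the same.

```python
# Toy training data (made up for illustration): outputs are roughly 2 * x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]


def predict(w, b, x):
    """A model with just two parameters: slope w and offset b."""
    return w * x + b


def mean_squared_error(w, b):
    """Average squared gap between predictions and observed outputs."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(steps=2000, learning_rate=0.02):
    """Fit w and b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (predict(w, b, x) - y) for x, y in zip(xs, ys)) / n
        # Move both parameters a small step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = train()
print(f"learned w={w:.2f}, b={b:.2f}, error={mean_squared_error(w, b):.4f}")
```

Nobody sets w and b by hand here; the optimization loop derives them from the examples, and the trained model can then predict outputs for inputs it has not seen.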
ML approaches fall into two primary categories: unsupervised and supervised. In unsupervised learning, the training step can exploit the structure of the training data but has to do without annotations (also called ‘labels’) for the observed data. Examples of unsupervised learning methods include clustering, component analysis, manifold learning and density estimation.
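As an illustration of unsupervised learning, the following sketch implements a basic k-means clustering loop in plain Python. The sample points and the choice of initial centers are assumptions made for this example; practical implementations add better initialization and a convergence check.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Give each point the index of its nearest cluster center."""
    return [
        min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
        for p in points
    ]


def update(points, labels, centers):
    """Move each center to the mean of the points assigned to it."""
    new_centers = []
    for k, old_center in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            new_centers.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
        else:
            new_centers.append(old_center)  # keep a center that attracted no points
    return new_centers


def k_means(points, initial_centers, iterations=10):
    """Alternate assignment and update steps for a fixed number of rounds."""
    centers = list(initial_centers)
    for _ in range(iterations):
        centers = update(points, assign(points, centers), centers)
    return centers, assign(points, centers)


# Unlabeled 2D observations (made up): two visibly separate groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centers, labels = k_means(points, initial_centers=[points[0], points[3]])
print(labels)
```

No labels are given to the algorithm; the grouping emerges purely from the structure of the data.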
In supervised learning, the optimization algorithm requires not only the data but also a target value that the model should predict for each observation. Consequently, the data needs to be annotated with labels that specify the task at hand. Typical supervised learning tasks include classification, regression and ranking.
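A small sketch of supervised classification follows, using a k-nearest-neighbor rule written in plain Python. The measurements and the labels "A" and "B" are hypothetical; the point is only that every training example carries an annotation, and the prediction for a new sample is derived from those annotations.

```python
from collections import Counter


def squared_distance(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def classify(sample, training_data, k=3):
    """Predict a label by majority vote among the k nearest labeled examples."""
    neighbours = sorted(
        training_data, key=lambda item: squared_distance(sample, item[0])
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Labeled training data (made up): each observation comes with its annotation.
training_data = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.8), "A"),
    ((0.8, 1.1), "A"),
    ((5.0, 5.0), "B"),
    ((5.2, 4.9), "B"),
    ((4.8, 5.1), "B"),
]

print(classify((1.1, 0.9), training_data))
print(classify((4.9, 5.0), training_data))
```

Compared with clustering, the only new ingredient is the label attached to each observation, which is exactly what makes annotation effort the main cost of supervised methods.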
Until around 2010, the predominant approach in machine learning applications was to extract application-specific features from the data and apply the machine learning model on top of them. Since then, deep learning methods have become more and more successful. These methods perform end-to-end learning, meaning that even the previously hand-coded, domain-specific features are derived automatically from data. Because learning the feature representation enlarges the parameter space considerably, even more data is required to make deep learning methods work well. One important advantage of deep learning is that the algorithm remains largely unchanged across many applications; only the data and annotations change.
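The classic pipeline of hand-crafted features followed by a learned model can be sketched as below. The two features (mean brightness and a crude edge measure), the tiny images and the nearest-centroid classifier are all assumptions picked for illustration. In an end-to-end deep learning system, the function extract_features would not be written by hand; its equivalent would be learned from data together with the classifier.

```python
def extract_features(image):
    """Hand-crafted features for a grayscale image given as a list of rows."""
    pixels = [value for row in image for value in row]
    mean_brightness = sum(pixels) / len(pixels)
    # Mean absolute difference of horizontally adjacent pixels: a crude edge measure.
    differences = [abs(row[i + 1] - row[i]) for row in image for i in range(len(row) - 1)]
    edge_strength = sum(differences) / len(differences)
    return (mean_brightness, edge_strength)


def fit_centroids(images, labels):
    """Train a nearest-centroid classifier on top of the extracted features."""
    grouped = {}
    for image, label in zip(images, labels):
        grouped.setdefault(label, []).append(extract_features(image))
    return {
        label: tuple(sum(axis) / len(features) for axis in zip(*features))
        for label, features in grouped.items()
    }


def predict(image, centroids):
    """Return the label whose feature centroid is closest to the image's features."""
    features = extract_features(image)
    return min(
        centroids,
        key=lambda label: sum((f - c) ** 2 for f, c in zip(features, centroids[label])),
    )


# Tiny made-up training images: two uniform ones and two with strong stripes.
flat_1 = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]
flat_2 = [[0.4, 0.4, 0.4], [0.4, 0.4, 0.4]]
striped_1 = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
striped_2 = [[0.1, 0.9, 0.1], [0.9, 0.1, 0.9]]

centroids = fit_centroids(
    [flat_1, flat_2, striped_1, striped_2],
    ["flat", "flat", "striped", "striped"],
)
prediction = predict([[0.2, 0.8, 0.2], [0.8, 0.2, 0.8]], centroids)
print(prediction)
```

Only the centroids are learned here, so very little data is needed; the price is that a person has to know in advance which features matter for the application.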
Computer Vision (CV)
Computer vision is concerned with the efficient automated perception and understanding of visual data. At its core, computer vision aims to understand scenes automatically. The difficulty of this task is frequently underestimated because human beings solve it effortlessly. Notably, the automated understanding of a scene was given as a summer project to a group of students at MIT in 1966.
Originally, CV started as a research direction that strove to explain human vision by means of computational tools. Over the years it evolved more and more into a technical and engineering discipline. Since then, the holy grail in computer vision has been to assist or even replace human vision with a fully automatic understanding of scenes by a computer system. Technically, this is an ill-posed inverse problem with noisy observations, which makes it notoriously difficult in general. It is widely agreed that machine learning can play an important role in finding a successful solution, but it is not likely to solve the entire problem on its own.
Sample tasks in computer vision are the detection and tracking of objects, the segmentation of structures and the registration of images:
- Object detection means automatically finding a known object or object class within an image and determining its exact location and size. When the same objects appear throughout a video stream and their individual detections are associated with each other, this is typically termed (object) tracking. Successful CV applications that leverage object detection and tracking methods range from face detection in point-and-shoot cameras to pedestrian detection in automotive safety applications and the tracking of cells in light sheet microscopy.
- Segmentation involves grouping the pixels of an image such that the pixels in each group share a common semantic meaning. Applications of segmentation include, for example, the exact determination of the outline of a cell in microscopic images, the extraction of neurons from images of brain tissue and the interactive marking of object boundaries for image editing.
- Registration is the alignment of two images of the same scene such that corresponding pixels show the same content. Stitching a number of single shots into a panoramic scene on a smartphone is one application; the alignment of array tomography slices is another.
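To give a flavor of what such a task looks like in code, here is a deliberately simple sketch of registration restricted to whole-pixel translations. It tries every shift within a small window and keeps the one with the smallest sum of squared differences. The function names, the tiny test image and the search range are assumptions for illustration; practical registration methods also handle rotation, scaling, sub-pixel shifts and noise.

```python
def shift_image(image, dy, dx, fill=0.0):
    """Move image content down by dy rows and right by dx columns."""
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            source_y, source_x = y - dy, x - dx
            if 0 <= source_y < height and 0 <= source_x < width:
                shifted[y][x] = image[source_y][source_x]
    return shifted


def sum_squared_difference(a, b):
    """Dissimilarity of two equally sized images; 0 means identical."""
    return sum(
        (pixel_a - pixel_b) ** 2
        for row_a, row_b in zip(a, b)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )


def register_translation(fixed, moving, max_shift=2):
    """Find the integer (dy, dx) that best aligns `moving` with `fixed`."""
    candidates = [
        (dy, dx)
        for dy in range(-max_shift, max_shift + 1)
        for dx in range(-max_shift, max_shift + 1)
    ]
    return min(
        candidates,
        key=lambda s: sum_squared_difference(fixed, shift_image(moving, s[0], s[1])),
    )


# A small made-up image with one distinctive structure.
fixed = [
    [0, 0, 0, 0, 0],
    [0, 1, 2, 0, 0],
    [0, 3, 4, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
# The same scene, displaced one row down and one column to the left.
moving = shift_image(fixed, 1, -1)

best_shift = register_translation(fixed, moving)
print(best_shift)
```

The exhaustive search is only feasible because the set of candidate transformations is tiny; the general problem has far more degrees of freedom, which is one reason registration is hard in practice.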
Due to the complexity of these tasks, state-of-the-art methods frequently rely on machine learning, and computer vision has become one of the major domains in which machine learning methods have been applied successfully. While machine learning is an important tool for a number of computer vision tasks, computer vision is more than an applied ML discipline. It also involves other complex tasks such as 3D scene modeling, multi-view camera geometry and structure-from-motion, stereo correspondence, point cloud processing and motion estimation.
Both fields – machine learning and computer vision – are rapidly evolving and it will be exciting to see their application in a vast array of fields, from robotics and web-scale applications to autonomous cars and microscopy.