Saturday, September 8, 2012


Book Review: Programming Computer Vision with Python

by Jan Erik Solem
published by O'Reilly, 2012


Computer vision is a fascinating subject and in recent years it has gone from being an academic pursuit to a practical everyday technology. Anyone who does Google searches or uses a smartphone likely makes use of computer vision algorithms on a regular basis, perhaps without even knowing it.  Computer vision plays key roles in a broad range of fields from law enforcement, manufacturing, biology, and medicine to social media and gaming.

I was excited to read Jan Erik Solem's new book, Programming Computer Vision with Python, because it combines a practical survey of many of these mature technologies with the clarity and ease of use of my favorite programming language, Python.

I have a background in computer vision and I wanted to learn more about topics like multi-view geometry methods, so, for my purposes, Solem's book was a dream come true. The first five chapters lead you through a series of important mathematical and software tools which make multi-view 3D reconstructions a natural and practical application. I was doing it myself by the end of chapter five. The example code is clear and is available from the author's website and via GitHub.

I had a great time reading the book and going through the programming exercises. I can recommend the book strongly. I just wish I could figure out to whom to recommend it!

The practical, step-by-step approach that Solem uses allowed me to dig into the math behind the algorithms while being able to play with working code. This is a great way to learn. So I can imagine that for a college or graduate student, in the right sort of course, the book would be invaluable. It requires a little linear algebra and geometry, and some familiarity with vector spaces. I can imagine another audience for this book: the smart, ambitious programmer who wants to use computer vision as part of his or her cool-new-product. The book also covers classifying and searching images using various approaches, along with image segmentation techniques and an introduction to the OpenCV library for speed in a realtime object tracking application. It even discusses building web applications which make use of these techniques.

The book is definitely not a stand-alone textbook; it leaves out the sort of wider perspective that a course or textbook would provide. This isn't a criticism per se, but I think the book would have benefited from short asides, of the sort used in some books, which highlight the context and the motivations behind the algorithms.

For example, one of the fundamental problems in computer vision is the correspondence problem: how do you know if one point on an object in an image corresponds to the same point in another image taken from a different position in space or time? Years of work and hundreds or possibly thousands of doctoral theses have been devoted to different ways of attempting to solve this problem. It is one of the fundamental (so-called) ill-posed inverse problems of vision. (It's called “ill-posed” because there is not sufficient information in the images to generate a unique solution.) In chapter 2, Solem introduces a series of local feature detectors, culminating with the Scale-Invariant Feature Transform (SIFT). The presentation is so matter-of-fact that you wouldn't know that SIFT and similar algorithms are a major advance in solving this fundamental problem.
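To make the correspondence problem a little more concrete, here is a minimal sketch of one common heuristic for matching local descriptors between two images: nearest-neighbour matching with a distance ratio test. This is my own illustration, not code from the book. The function name `match_descriptors`, the `ratio` threshold of 0.8, and the tiny two-dimensional "descriptors" are all assumptions made for the example (real SIFT descriptors are 128-dimensional vectors).

```python
import math

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Return (i, j) pairs where desc_a[i] is matched to desc_b[j].

    Hypothetical helper for illustration only: a match is kept when the
    nearest descriptor in desc_b is clearly closer than the second nearest.
    """
    matches = []
    for i, a in enumerate(desc_a):
        # Euclidean distance from a to every descriptor in the other image,
        # sorted so the closest candidates come first.
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(desc_b))
        if len(dists) < 2:
            continue  # the ratio test needs at least two candidates
        (best, j), (second, _) = dists[0], dists[1]
        # Reject ambiguous points whose best and second-best are similar.
        if best < ratio * second:
            matches.append((i, j))
    return matches

# Toy descriptors (assumed values): two distinctive points, one ambiguous.
desc_a = [(0.0, 1.0), (1.0, 0.0), (0.5, 0.5)]
desc_b = [(1.0, 0.1), (0.0, 0.9), (0.9, 0.9)]
matches = match_descriptors(desc_a, desc_b)
print(matches)
```

The third point in `desc_a` is roughly equidistant from every candidate, so it is dropped rather than matched, which is the ill-posedness of the problem showing up in miniature: without a distinctive descriptor there is no unique answer.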

So far, I haven't mentioned the other strength of the book, which is Python itself and its surrounding scientific programming ecology. For years, Python has been one of the clearest and most re-usable of interactive programming languages. With the evolution of tools and libraries such as scipy, IPython, sage, and the scikits, we have entered a golden age for doing numerical work in Python.

It was a pleasure to be able to read through the book with an IPython notebook open, so that I could interact with the code as I read. The notebook format left me with beautiful, publication-quality graphs, images, and typeset mathematics based upon the book’s examples, and it recorded my work on the exercises and my own experiments. I wouldn't be surprised if, in the future, similar books include IPython notebooks as part of their teaching materials.

What is left out of Solem's discussion is some of the other major packages for computer vision and machine learning. There’s scikit-learn, scikit-image, Luis Coelho's Mahotas, and the UC Davis Cell Profiler library, as well as many other libraries from research groups which develop primarily in other languages but which have Python bindings. But computer vision is a big subject, and any finite-sized book needs some focus. The bibliographic references are more than enough if one wants to learn more. Certainly, whether you are a student, an ambitious app developer, or a "recreational" computer scientist like myself, Solem's book will be a useful and fun addition to your bookshelf.

Chris Lee-Messer
September 7, 2012


Technical information



The book is published by O'Reilly and has O'Reilly's distinctive look-and-feel, with a bullhead catfish as the animal on the front cover. I read the ebook version as a PDF without difficulty on a computer screen. The style is easy to read and focused, but not overly formal. The quality of the editing was good and the code examples worked. Setting up the software and obtaining the data sets used in the examples are covered in the appendices, and both are easy for an experienced Python user with an Internet connection. I'm not sure how easy it would be for someone completely new to Python, numpy, and scipy. Those with access to a well supported Linux distribution have it easy, as all the packages are installed with a single click. Packaged distributions like Enthought Python (Windows, Mac, and Linux) and Pythonxy (Windows) get you most of the way.

1. Basic Image Handling and Processing
   - practicalities of using Python to manipulate images

2. Local Image Descriptors
   - Harris corner detector, Scale-Invariant Feature Transform, matching geotagged images

3. Image to Image Mappings
  - Homographies, Warping Images, automated stitching of images to create panoramas

4. Camera Models and Augmented Reality
  - Pin-hole camera model, camera calibration, pose estimation, augmented reality

5. Multiple View Geometry
  - Epipolar geometry, computing with cameras and 3D structure, multiple view reconstruction, stereo images

6. Clustering Images
  - K-Means Clustering, Hierarchical Clustering, Spectral Clustering

7. Searching Images
  - Content-Based Image Retrieval, Visual Words, Indexing Images, Searching the Database for Images, Ranking Results Using Geometry, Building Demos and Web Applications

8. Classifying Image Content
  - K-Nearest Neighbors, Bayes Classifier, Support Vector Machines, Optical Character Recognition

9. Image Segmentation
  - Graph Cuts, Segmentation Using Clustering, Variational Methods

10. OpenCV
   - The OpenCV Python Interface, OpenCV Basics, Processing Video, Tracking

 
