Many people have drawn Venn diagrams to locate machine learning and related ideas in the intellectual landscape. Drew Conway’s diagram may have been the first. It has at least been frequently referenced.
By this classification, Hector Cuesta’s new book Practical Data Analysis is located toward the “hacking skills” corner of the diagram. No single book can cover everything, and this one emphasizes practical software knowledge more than mathematical theory or details of a particular problem domain.
The biggest strength of the book may be that it brings together in one place information on tools that are used together but whose documentation is scattered. The book is a great source for sample code. The source code is available on GitHub, though it’s more understandable in the context of the book.
Much of the book uses Python and related modules and tools including:
- NumPy
- mlpy
- PIL
- twython
- Pandas
- NLTK
- IPython
- Wakari
It also uses D3.js (with JSON, CSS, HTML, …), MongoDB (with MapReduce, Mongo Shell, PyMongo, …), and miscellaneous other tools and APIs.
There’s a lot of material here in 360 pages, making it a useful reference.
“The biggest strength of the book may be that it brings together in one place information on tools that are used together but whose documentation is scattered.”
Have you had a chance to read “Agile Data Science” or “Building Machine Learning Systems with Python” (both available from O’Reilly)?
If so, how does “Practical Data Analysis” compare to them?
Would you recommend it to someone who has already read the first two?