10 Data Science Books You Should Read This Year
Data Science and Artificial Intelligence aren’t going anywhere, so we’ve compiled a list of 10 books that are worth a read regardless of your background or skill level.
High-Level Data Science and Machine Learning
These first few books are great if you have no experience with Data Science or Machine Learning. Perhaps you’re a business leader or manager who wants to apply DS concepts, get a high-level view of the data science process, or learn about the business applications of DS and ML.
The Art of Data Science provides a fantastic overview of the data analysis workflow. The authors articulate how data analysis is fundamentally an iterative process (an “epicycle”) where information is learned and then incorporated at every step. This book describes, simply and in general terms, the process of analyzing data in a way that is accessible for anyone who is curious about data science or data analysis. The authors have extensive experience both managing data analysts and conducting their own analyses, and have carefully observed what produces coherent results and what fails to produce useful insights into data.
Predictive Analytics is another comprehensive yet accessible resource for anyone who wants to learn how predictive analytics works. This book goes through many real-life applications, spanning domains as varied as finance, crime and terrorism, and politics. You will walk away with a solid understanding of why predictive analytics has become such a big deal in recent years, and why it will keep growing as our collective reliance on data increases.
Math and Statistics
These math and statistics books are geared to give you a less intimidating introduction to many of the key concepts required in data science and machine learning. They’re also rather entertaining, not like the math books you’re used to reading in school!
If you slept through Stats 101, this book is a lifesaver. It strips away the technical details and focuses on the underlying intuition that drives statistical analysis. You will walk away with an understanding of key concepts such as inference, correlation, and regression analysis, as well as insight into how biased or careless people can manipulate or misrepresent data.
This practical guide explains how to apply various statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what's important and what's not. Many data science resources incorporate statistical methods but lack a deeper statistical perspective. If you’re familiar with the R programming language, and have some exposure to statistics, this quick reference bridges the gap in an accessible, readable format.
You’ll walk away with an understanding of how and why to conduct exploratory data analysis, along with sampling methods, the foundations of experimental design, and the basics of supervised and unsupervised learning.
Data Visualization and Storytelling
These books will show you that there is a proper way to do data visualization, and give you the skills needed to design charts and dashboards to get the right insights out.
Storytelling with Data teaches you the fundamentals of data visualization and how to communicate effectively with data. You'll discover the power of storytelling and the way to make data a pivotal point in your story. The lessons in this illuminative text are grounded in theory, but made accessible through numerous real-world examples—ready for immediate application to your next graph or presentation.
Storytelling is not an inherent skill, especially when it comes to data visualization, and the tools at our disposal don't make it any easier. This book demonstrates how to go beyond conventional tools to reach the root of your data, and how to use your data to create an engaging, informative, compelling story.
Effective visualization is the best way to communicate information from the increasingly large and complex datasets in the natural and social sciences. But with the increasing power of visualization software today, scientists, engineers, and business analysts often have to navigate a bewildering array of visualization choices and options.
This practical book takes you through many commonly encountered visualization problems, and it provides guidelines on how to turn large datasets into clear and compelling figures. What visualization type is best for the story you want to tell? How do you make informative figures that are visually pleasing? Fundamentals of Data Visualization teaches you the elements most critical to successful data visualization.
7. The Visual Display of Quantitative Information
The classic book on statistical graphics, charts, and tables by Edward Tufte, and a true must-read! It covers the theory and practice of designing data graphics, with 250 illustrations of the best (and a few of the worst) statistical graphics and detailed analysis of how to display data for precise, effective, quick analysis. Topics include the design of high-resolution displays and small multiples; editing and improving graphics; the data-ink ratio; time-series, relational graphics, data maps, and multivariate designs; the detection of graphical deception (design variation vs. data variation) and its sources; and aesthetics in data graphical displays.
Machine Learning
If you are ready to get your feet wet, these books will give you an in-depth exposition of machine learning concepts with practical application and hands-on examples.
8. Introduction to Machine Learning with Python: A Guide for Data Scientists
Machine learning has become an integral part of many commercial applications and research projects, but this field is not exclusive to large companies with extensive research teams. If you use Python, even as a beginner, this book will teach you practical ways to build your own machine learning solutions. With all the data available today, machine learning applications are limited only by your imagination.
You’ll learn the steps necessary to create a successful machine-learning application with Python and the scikit-learn library. Authors Andreas Müller and Sarah Guido focus on the practical aspects of using machine learning algorithms, rather than the math behind them. Familiarity with the NumPy and matplotlib libraries will help you get even more from this book.
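To give a feel for what those steps look like in practice, here is a minimal sketch of a scikit-learn workflow: load a dataset, hold out a test set, fit a model, and evaluate it. This is an illustration rather than an excerpt from the book; the function name `train_and_score`, the choice of the built-in iris dataset, and the k-nearest-neighbors settings are all assumptions made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def train_and_score(test_size=0.25, random_state=0):
    """Fit a small classifier on iris and return its held-out accuracy.

    Hypothetical helper for illustration; dataset and model are assumptions.
    """
    # 1. Load features (X) and labels (y) from a built-in toy dataset.
    X, y = load_iris(return_X_y=True)

    # 2. Hold out part of the data so the model is scored on unseen samples.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )

    # 3. Fit a simple k-nearest-neighbors classifier on the training split.
    model = KNeighborsClassifier(n_neighbors=3)
    model.fit(X_train, y_train)

    # 4. Report mean accuracy on the held-out test split.
    return model.score(X_test, y_test)


if __name__ == "__main__":
    print(f"Test accuracy: {train_and_score():.2f}")
```

Swapping in a different estimator usually only means changing the line that constructs `model`, since scikit-learn estimators share the same `fit` and `score` interface.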
This book is an excellent resource that can get you up to speed with the basics of the most widely used machine learning algorithms, including techniques on how to process data, advanced methods for model evaluation and parameter tuning, and principles on creating your modeling workflow. It is beginner-friendly with no assumption that the reader has a heavy programming background. Not to mention, the accompanying GitHub repository is undeniably useful for learning.
This is a compact “how to do data science” manual, perfect as a go-to handbook for managers or software developers looking to integrate ML pipelines into their projects. Honestly, it is unbeatable as a short handbook.
10. The Elements of Statistical Learning: Data Mining, Inference, and Prediction
This is one of the more academic-looking books on this list, but it is undeniably valuable. It is sufficiently technical to serve as a lasting reference that you should definitely keep on your shelf. This book describes the important ideas in a variety of fields such as medicine, biology, finance, and marketing in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, graphical models, random forests, ensemble methods, least angle regression, classification trees, and boosting (the first comprehensive treatment of this topic in any book).
Honorable mentions:
Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are by Seth Stephens-Davidowitz
Big Data: A Revolution That Will Transform How We Live, Work, and Think by Viktor Mayer-Schönberger and Kenneth Cukier
How Charts Lie: Getting Smarter About Visual Information by Alberto Cairo
Calling Bullshit: The Art of Skepticism in a Data-Driven World by Carl T. Bergstrom and Jevin D. West
Stories That Stick: How Storytelling Can Captivate Customers, Influence Audiences, and Transform Your Business by Kindra Hall
The Book of Why: The New Science of Cause and Effect by Judea Pearl and Dana Mackenzie
Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
Deep Learning Illustrated: A Visual, Interactive Guide to Artificial Intelligence
Foundations of Deep Reinforcement Learning: Theory and Practice in Python
Are there other books you think should be on this list? Leave a comment below with your favorites!