Data Analysis with R: A comprehensive guide to manipulating, analyzing, and visualizing data in R
Anthony Fischetti
- Second Edition
- UK: Packt, 2018
- 553
1. RefresheR: Navigating the basics; Getting help in R; Vectors; Functions; Matrices; Loading data into R; Working with packages; Exercises; Summary
2. The Shape of Data: Univariate data; Frequency distributions; Central tendency; Spread; Populations, samples, and estimation; Probability distributions; Visualization methods; Exercises; Summary
3. Describing Relationships: Multivariate data; Relationships between a categorical and continuous variable; Relationships between two categorical variables; The relationship between two continuous variables; Visualization methods; Exercises; Summary
4. Probability: Basic probability; A tale of two interpretations; Sampling from distributions; The normal distribution; Exercises; Summary
5. Using Data To Reason About The World: Estimating means; The sampling distribution; Interval estimation; Smaller samples; Exercises; Summary
6. Testing Hypotheses: The null hypothesis significance testing framework; Testing the mean of one sample; Testing two means; Testing more than two means; Testing independence of proportions; What if my assumptions are unfounded?; Exercises; Summary
7. Bayesian Methods: The big idea behind Bayesian analysis; Choosing a prior; Who cares about coin flips; Enter MCMC – stage left; Using JAGS and runjags; Fitting distributions the Bayesian way; The Bayesian independent samples t-test; Exercises; Summary
8. The Bootstrap: What's... uhhh... the deal with the bootstrap?; Performing the bootstrap in R (more elegantly); Confidence intervals; A one-sample test of means; Bootstrapping statistics other than the mean; Busting bootstrap myths; Exercises; Summary
9. Predicting Continuous Variables: Linear models; Simple linear regression; Simple linear regression with a binary predictor; Multiple regression; Regression with a non-binary predictor; Kitchen sink regression; The bias-variance trade-off; Linear regression diagnostics; Advanced topics; Exercises; Summary
10. Predicting Categorical Variables: k-Nearest neighbors; Logistic regression; Decision trees; Random forests; Choosing a classifier; Exercises; Summary
11. Predicting Changes with Time: What is a time series?; What is forecasting?; Creating and plotting time series; Components of time series; Time series decomposition; White noise; Autocorrelation; Smoothing; ETS and the state space model; Interventions for improvement; What we didn't cover; Citations for the climate change data; Exercises; Summary
12. Sources of Data: Relational databases; Using JSON; XML; Other data formats; Online repositories; Exercises; Summary
13. Dealing with Missing Data: Analysis with missing data; Visualizing missing data; Types of missing data; Unsophisticated methods for dealing with missing data; So how does mice come up with the imputed values?; Exercises; Summary
14. Dealing with Messy Data: Checking unsanitized data; Regular expressions; Other tools for messy data; Exercises; Summary
15. Dealing with Large Data: Wait to optimize; Using a bigger and faster machine; Be smart about your code; Using optimized packages; Using another R implementation; Using parallelization; Using Rcpp; Being smarter about your code; Exercises; Summary
16. Working with Popular R Packages: The data.table package; Using dplyr and tidyr to manipulate data; Functional programming as a main tidyverse principle; Reshaping data with tidyr; Exercises; Summary
17. Reproducibility and Best Practices
Frequently the tool of choice for academics, R has spread deep into the private sector and can be found in the production pipelines at some of the most advanced and successful enterprises. The power and domain-specificity of R allow the user to express complex analytics easily, quickly, and succinctly. Starting with the basics of R and statistical reasoning, this book dives into advanced predictive analytics, showing how to apply those techniques to real-world data through real-world examples. Packed with engaging problems and exercises, this book begins with a review of R and its syntax with packages like Rcpp, ggplot2, and dplyr. From there, get to grips with the fundamentals of applied statistics and build on this knowledge to perform sophisticated and powerful analytics. Solve the difficulties relating to performing data analysis in practice and find solutions to working with messy data, large data, communicating results, and facilitating reproducibility. This book is engineered to be an invaluable resource through many stages of anyone’s career as a data analyst.