Data Analysis with R: A comprehensive guide to manipulating, analyzing, and visualizing data in R / Anthony Fischetti
Material type: Text
Publication details: UK: Packt, 2018
Edition: Second Edition
Description: 553
ISBN: 9781788393720
Call number: 001.422 FIS
Item type | Current library | Collection | Call number | Status | Date due | Barcode |
---|---|---|---|---|---|---|
Books | IIITDM Kurnool COMPUTER SCIENCE ENGINEERING | Non-fiction | 001.422 FIS | Available | | 0005839 |
Books | IIITDM Kurnool COMPUTER SCIENCE ENGINEERING | Non-fiction | 001.422 FIS | Available | | 0005840 |
Books | IIITDM Kurnool COMPUTER SCIENCE ENGINEERING | Non-fiction | 001.422 FIS | Available | | 0005841 |
Books | IIITDM Kurnool COMPUTER SCIENCE ENGINEERING | Non-fiction | 001.422 FIS | Available | | 0005842 |
Reference | IIITDM Kurnool Reference | Non-fiction | 001.422 FIS | Not For Loan | | 0005843 |
1. RefresheR
Navigating the basics
Getting help in R
Vectors
Functions
Matrices
Loading data into R
Working with packages
Exercises
Summary
2. The Shape of Data
Univariate data
Frequency distributions
Central tendency
Spread
Populations, samples, and estimation
Probability distributions
Visualization methods
Exercises
Summary
3. Describing Relationships
Multivariate data
Relationships between a categorical and continuous variable
Relationships between two categorical variables
The relationship between two continuous variables
Visualization methods
Exercises
Summary
4. Probability
Basic probability
A tale of two interpretations
Sampling from distributions
The normal distribution
Exercises
Summary
5. Using Data To Reason About The World
Estimating means
The sampling distribution
Interval estimation
Smaller samples
Exercises
Summary
6. Testing Hypotheses
The null hypothesis significance testing framework
Testing the mean of one sample
Testing two means
Testing more than two means
Testing independence of proportions
What if my assumptions are unfounded?
Exercises
Summary
7. Bayesian Methods
The big idea behind Bayesian analysis
Choosing a prior
Who cares about coin flips
Enter MCMC – stage left
Using JAGS and runjags
Fitting distributions the Bayesian way
The Bayesian independent samples t-test
Exercises
Summary
8. The Bootstrap
What's... uhhh... the deal with the bootstrap?
Performing the bootstrap in R (more elegantly)
Confidence intervals
A one-sample test of means
Bootstrapping statistics other than the mean
Busting bootstrap myths
Exercises
Summary
9. Predicting Continuous Variables
Linear models
Simple linear regression
Simple linear regression with a binary predictor
Multiple regression
Regression with a non-binary predictor
Kitchen sink regression
The bias-variance trade-off
Linear regression diagnostics
Advanced topics
Exercises
Summary
10. Predicting Categorical Variables
k-Nearest neighbors
Logistic regression
Decision trees
Random forests
Choosing a classifier
Exercises
Summary
11. Predicting Changes with Time
What is a time series?
What is forecasting?
Creating and plotting time series
Components of time series
Time series decomposition
White noise
Autocorrelation
Smoothing
ETS and the state space model
Interventions for improvement
What we didn't cover
Citations for the climate change data
Exercises
Summary
12. Sources of Data
Relational databases
Using JSON
XML
Other data formats
Online repositories
Exercises
Summary
13. Dealing with Missing Data
Analysis with missing data
Visualizing missing data
Types of missing data
Unsophisticated methods for dealing with missing data
So how does mice come up with the imputed values?
Exercises
Summary
14. Dealing with Messy Data
Checking unsanitized data
Regular expressions
Other tools for messy data
Exercises
Summary
15. Dealing with Large Data
Wait to optimize
Using a bigger and faster machine
Be smart about your code
Using optimized packages
Using another R implementation
Using parallelization
Using Rcpp
Being smarter about your code
Exercises
Summary
16. Working with Popular R Packages
The data.table package
Using dplyr and tidyr to manipulate data
Functional programming as a main tidyverse principle
Reshaping data with tidyr
Exercises
Summary
17. Reproducibility and Best Practices
Frequently the tool of choice for academics, R has spread deep into the private sector and can be found in the production pipelines at some of the most advanced and successful enterprises. The power and domain-specificity of R allow the user to express complex analytics easily, quickly, and succinctly. Starting with the basics of R and statistical reasoning, this book dives into advanced predictive analytics, showing how to apply those techniques to real-world data through real-world examples. Packed with engaging problems and exercises, this book begins with a review of R and its syntax with packages like Rcpp, ggplot2, and dplyr. From there, get to grips with the fundamentals of applied statistics and build on this knowledge to perform sophisticated and powerful analytics. Solve the difficulties of performing data analysis in practice and find solutions for working with messy data, handling large data, communicating results, and facilitating reproducibility. This book is engineered to be an invaluable resource through many stages of anyone’s career as a data analyst.