You will select one of the datasets in the data curation repository and review the specific business need described in the dataset documentation. You will then plan the analysis steps and perform a complete exploratory data analysis on the dataset using R, Python, or Rapid Miner Studio. Use descriptive, meaningful comments in the code or process file as documentation.
In your paper, describe the process used to plan the analysis. For each of the following exploratory data analysis processes, explain the process and rationale used for the selected dataset (with justification for the methods chosen), the observations and results from the analysis (with a visualization for each), and the insights gained:
Measures of Variability and Central Tendency
Frequency, Variance, and Standard Deviation
Outlier Detection and Distribution Modality
Univariate Analysis Methods
Cluster Analysis and Data Grouping
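As a rough illustration of the processes listed above, the following Python sketch computes central tendency, variability, a frequency table, IQR-based outliers, and a very small one-dimensional k-means grouping using only the standard library. The sample values, the function names, the 1.5 multiplier for the IQR rule, and the starting cluster centers are all assumptions made for illustration; they do not come from any dataset in the repository, and a real analysis would substitute a numeric column from the selected dataset and justify its own method choices.

```python
import statistics
from collections import Counter

# Hypothetical sample values; replace with a numeric column from the chosen dataset.
values = [2, 3, 3, 4, 4, 4, 5, 5, 6, 30]


def summarize(data):
    """Return central tendency and variability measures for a list of numbers."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "modes": statistics.multimode(data),     # more than one mode hints at multimodality
        "variance": statistics.variance(data),   # sample variance (n - 1 denominator)
        "std_dev": statistics.stdev(data),       # sample standard deviation
        "range": max(data) - min(data),
    }


def iqr_outliers(data, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule; k is an assumption)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]


def kmeans_1d(data, centers, iterations=10):
    """Minimal 1-D k-means: assign each value to its nearest center, then update centers."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for x in data:
            nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            groups[nearest].append(x)
        # Keep the previous center if a group ends up empty.
        centers = [statistics.mean(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


summary = summarize(values)
frequencies = Counter(values)                     # frequency of each distinct value
outliers = iqr_outliers(values)
centers, groups = kmeans_1d(values, centers=[2, 30])  # starting centers are assumptions

print(summary)
print(frequencies.most_common(3))
print("Outliers:", outliers)
print("Cluster centers:", centers)
```

The same measures are available in higher-level tools, so this sketch is meant only to make the calculations explicit; the choice of library and of clustering method is left to the analyst.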
Be sure that appropriate annotated visualizations (tables, charts, graphs) are referenced in the narrative and provided in the Appendix.
Describe the statistical findings and insights obtained from the analysis and the subsequent actions that should be taken with the dataset.
Length: 12-15 pages, not including the title page, the reference pages, and the screenshots of visualizations provided in the Appendix
Attachments – .ipynb file (Jupyter Notebook) or .r file (R) or .rmp file (Rapid Miner Studio)
References: Include a minimum of 5 scholarly references.
Review the dataset information and the data dictionary provided.
Click the link for CSV Data File (includes Replicate Weights) to download the zip file. Select the file hhpub20.csv for analysis.
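If Python is the chosen tool, one possible way to inspect the file without manually unpacking the archive is sketched below. The helper name `read_csv_from_zip`, the archive file name in the commented example, and the UTF-8 encoding are assumptions; only the member name hhpub20.csv comes from the instructions above.

```python
import csv
import io
import zipfile


def read_csv_from_zip(zip_path, member_name, max_rows=5):
    """Return the header row and the first few data rows of a CSV inside a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member_name) as raw:
            # Assumes the file is UTF-8 encoded; adjust if the download differs.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.reader(text)
            header = next(reader)
            rows = [row for _, row in zip(range(max_rows), reader)]
    return header, rows


# Example call (the archive name is hypothetical; use the name of the downloaded file):
# header, rows = read_csv_from_zip("downloaded_data.zip", "hhpub20.csv")
# print(len(header), "columns; first columns:", header[:5])
```

Checking the header against the data dictionary before any analysis helps confirm that the expected variables are present.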
Review the dataset information in the PDF December 2019 Food Security Technical Documentation.
Click the CSV link under CPS Supplement.