Research Assignment with R script
4/24/20 Assignment 1 DE VA.docx P a g e | 1
Research Assignment 1
Both the Outline for Research Assignment 1 and Research Assignment 1 itself use this document.
Use the Documenting Research Guide to understand how to use the information in this document for
either of these submissions.
Ask questions if needed!
Topic: The Centers for Disease Control and Prevention (CDC) uses the social vulnerability
index (SVI) to evaluate the impact of disasters on communities, weighting the damage
with social factors in the states of Delaware and Virginia.
Problem: The data consolidated by the CDC is used to determine the most vulnerable areas
should a disaster occur. In a perfect world, the indicators of vulnerability would
represent the people correctly. Currently, this far-from-perfect method is the best that
has been developed. There may be indicators that are not adequately predictive of
social vulnerability.
Question 1: What relationships exist in the states of Delaware and Virginia between the
socioeconomic indicators, household composition indicators, disability indicators,
and social vulnerability when using the data consolidated by the CDC (2018a)?
Question 2: Which of the socioeconomic indicators, household composition indicators, and
disability indicators in the states of Delaware and Virginia have the most influence
in predicting social vulnerability when using the data consolidated by the CDC
(2018a)?
Data:
The data and data dictionaries are online.
o Centers for Disease Control and Prevention. (2018a). Social Vulnerability Index [data
set]. https://svi.cdc.gov/Documents/Data/2018_SVI_Data/CSV/SVI2018_US.csv
o Centers for Disease Control and Prevention. (2018b). Social Vulnerability Index [code
book]. https://svi.cdc.gov/Documents/Data/2018_SVI_Data/SVI2018Documentation.pdf
o Note: The raw data must be in its original form when it enters the R script
file. Use the data dictionary to understand the data.
Create a subset of the data to represent the sample of secondary data in this analysis.
o The SVI index's variable name is RPL_THEMES, in column 99
o Socioeconomic
    Persons below poverty estimate
    Civilian unemployed estimate
    Per capita income estimate
    Persons with no high school diploma
o Household composition and disability features
    Ages 65 and older
    Ages 17 and under
    Persons with a disability, over the age of 5
    Single-parent households
o The state field
Note: Do not use more than one indicator for each measure defined in this section.
Variable names prefixed with E_ are the actual estimates, while M_ marks the
margin of error for each estimate.
Other prefixes are follow-on calculations or qualitative information. Do not include variables
that are not identified in the research questions, as listed in the data section.
Do not include the margin of error estimates at this time.
Considering the research questions, after subsetting, there will be 10 variables used in this
analysis.
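The loading and subsetting steps above could be sketched in R as follows. This is only an illustration under stated assumptions: `subset_svi` is a hypothetical helper, and the column names and the upper-case state values are recalled from the 2018 SVI code book, so each one should be confirmed against the data dictionary before use.

```r
# Assumed column names; verify every one against the data dictionary (CDC, 2018b).
svi_columns <- c(
  "STATE",                                       # state field
  "RPL_THEMES",                                  # overall SVI ranking
  "E_POV", "E_UNEMP", "E_PCI", "E_NOHSDP",       # socioeconomic estimates
  "E_AGE65", "E_AGE17", "E_DISABL", "E_SNGPNT"   # household composition and disability estimates
)

# Hypothetical helper: read the raw file unchanged, then keep only the
# assigned states and the ten variables named above.
subset_svi <- function(path,
                       states = c("DELAWARE", "VIRGINIA"),
                       columns = svi_columns) {
  raw <- read.csv(path, stringsAsFactors = FALSE)

  # Fail early if an assumed column name is not in the file.
  missing_cols <- setdiff(columns, names(raw))
  if (length(missing_cols) > 0) {
    stop("Columns not found in the data: ", paste(missing_cols, collapse = ", "))
  }

  # Compare state names in upper case (assumption about how the file stores them).
  sample_data <- raw[toupper(raw$STATE) %in% states, columns]
  rownames(sample_data) <- NULL
  sample_data
}
```

A call such as `svi <- subset_svi("SVI2018_US.csv")` followed by `dim(svi)` and `table(svi$STATE)` gives a quick check that ten variables and only the two assigned states remain.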
Data Cleaning:
Do not remove missing values during cleaning. If missing values need to be removed for an
analysis method, do it during the preparation for analysis. A code represents missing values.
Use the data dictionary to understand the data sample and how missing values are
represented.
When changing an object or part of an object, validate that the change occurred as expected.
The steps that are taken in cleaning are not discussed in the research paper.
There is a code that represents missing values; ensure this is found in the data dictionary!
These values will have to be recoded as NA. Not figuring it out? Please email me.
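One possible shape for the recoding step, with validation built in, is sketched below. The value -999 is an assumption here and must be confirmed in the data dictionary; `recode_missing` and the toy data frame are hypothetical and exist only for illustration.

```r
# Assumption: -999 is the missing-value code. Confirm it in the data dictionary.
missing_code <- -999

# Hypothetical helper: recode the missing-value code to NA in numeric columns
# and validate that the change occurred as expected. No rows are removed.
recode_missing <- function(data, code = missing_code) {
  numeric_cols <- names(data)[vapply(data, is.numeric, logical(1))]
  n_code_before <- sum(data[numeric_cols] == code, na.rm = TRUE)
  n_na_before <- sum(is.na(data[numeric_cols]))

  for (col in numeric_cols) {
    data[[col]][which(data[[col]] == code)] <- NA
  }

  # Validate: every coded value became NA, and no coded value remains.
  n_na_after <- sum(is.na(data[numeric_cols]))
  stopifnot(
    n_na_after - n_na_before == n_code_before,
    sum(data[numeric_cols] == code, na.rm = TRUE) == 0
  )
  message(n_code_before, " value(s) recoded to NA")
  data
}

# Toy data standing in for the real subset.
toy <- data.frame(
  STATE = c("DELAWARE", "VIRGINIA", "VIRGINIA"),
  E_POV = c(120, -999, 85),
  RPL_THEMES = c(0.42, 0.77, -999)
)
cleaned <- recode_missing(toy)
colSums(is.na(cleaned))
```

The row count is unchanged after recoding, which keeps the removal of incomplete rows for the preparation step of the analysis.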
Analyze:
Conduct two types of analysis: visual analysis to identify relationships and a random forest
model to identify influential indicators in predicting social vulnerability.
The sub-stages of Analyze (profile, prepare, and apply) are necessary at least two times. This
method is for programming, not documenting research.
During the visual analysis, only present meaningful visuals to understand what
relationships exist between the indicators for the social vulnerability index.
Ensure you establish that the model is valid and reliable before discussing the influential
indicators.
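The profile, prepare, and apply sub-stages for the random forest could look something like the sketch below. It assumes the randomForest package is installed and uses simulated toy data with only three indicators. The 70/30 split, `ntree = 500`, and the choice of RMSE and R-squared on held-out rows are illustrative choices, not requirements of the assignment.

```r
set.seed(42)

# Toy stand-in for the cleaned sample; the real analysis would use the ten-variable subset.
n <- 300
toy <- data.frame(
  E_POV = runif(n, 0, 2000),
  E_UNEMP = runif(n, 0, 500),
  E_AGE65 = runif(n, 0, 1500)
)
toy$RPL_THEMES <- pmin(1, pmax(0,
  0.0004 * toy$E_POV + 0.0003 * toy$E_UNEMP + rnorm(n, sd = 0.05)))
toy$E_POV[sample(n, 10)] <- NA  # stands in for values recoded to NA during cleaning

# Prepare: remove incomplete rows here, not during cleaning.
model_data <- toy[complete.cases(toy), ]

# Profile: correlation of each indicator with the SVI.
print(round(cor(model_data)[, "RPL_THEMES"], 2))

# Hold out 30% of the rows so performance is measured on unseen data.
train_idx <- sample(nrow(model_data), size = floor(0.7 * nrow(model_data)))
train <- model_data[train_idx, ]
test <- model_data[-train_idx, ]

# Apply: fit the model, then check it on the held-out rows before reading importance.
if (requireNamespace("randomForest", quietly = TRUE)) {
  fit <- randomForest::randomForest(RPL_THEMES ~ ., data = train,
                                    ntree = 500, importance = TRUE)
  pred <- predict(fit, newdata = test)
  rmse <- sqrt(mean((test$RPL_THEMES - pred)^2))
  r_squared <- 1 - sum((test$RPL_THEMES - pred)^2) /
    sum((test$RPL_THEMES - mean(test$RPL_THEMES))^2)
  print(c(rmse = rmse, r_squared = r_squared))

  # Permutation importance (%IncMSE); larger values suggest more influential indicators.
  print(randomForest::importance(fit, type = 1))
} else {
  message("Install the randomForest package to run the model step.")
}
```

Reporting held-out error before the importance table keeps the discussion of influential indicators tied to a model whose performance has been checked.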
Documenting research:
Results, Impact of the Results:
Ensure that assertions and assessments in the results and discussion sections are derived
from the analysis in R.
Do not speculate. Use evidence. When documenting the results, consider the generalizability.
Future Recommendations:
Include recommendations for future analysis, based on the research in R.
An example might look something like this:
o An opportunity for further research, based on gaps found in the random forest modeling,
is to look at the ability to tune the parameters further, to improve the performance in
predicting the
o Additionally, an opportunity for future research is exploration modeling to determine
what other variables, when eliminated, have little or no impact on the ability to predict
the SVI based on the supporting characteristics in the data.
Bonus challenge:
Create a random forest model for each state that is assigned. Ensure that this analysis is within the
scope of the research.
Tip: An additional research question that meets the five criteria from the first lecture will bring
this analysis within the scope. Make sure the question is structured to encompass the additional
research. The challenge does not replace the original research requirements for this assignment.
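For the bonus challenge, one way to fit a separate model per state is to split the sample by the state field and apply the same modeling step to each piece. The sketch below assumes the randomForest package is installed and uses simulated toy data. The column names and state values are illustrative, and the validation step is omitted here for brevity.

```r
set.seed(7)

# Toy data with two states; the real analysis would use the cleaned, prepared sample.
toy <- data.frame(
  STATE = rep(c("DELAWARE", "VIRGINIA"), each = 100),
  E_POV = runif(200, 0, 2000),
  E_PCI = runif(200, 15000, 60000)
)
toy$RPL_THEMES <- pmin(1, pmax(0,
  0.3 + 0.0004 * toy$E_POV - 0.000005 * toy$E_PCI + rnorm(200, sd = 0.05)))

# One data frame per state; the state column is dropped because it is
# constant within each piece and carries no information for the model.
by_state <- lapply(split(toy, toy$STATE), function(d) {
  d[, setdiff(names(d), "STATE")]
})

if (requireNamespace("randomForest", quietly = TRUE)) {
  state_models <- lapply(by_state, function(d) {
    randomForest::randomForest(RPL_THEMES ~ ., data = d,
                               ntree = 300, importance = TRUE)
  })

  # Side-by-side permutation importance (%IncMSE), one column per state.
  print(sapply(state_models, function(m) {
    randomForest::importance(m, type = 1)[, 1]
  }))
} else {
  message("Install the randomForest package to run the model step.")
}
```

Each per-state model would still need its own validity check before its importance values are compared across states.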
Required files to submit:
1) Research paper in APA 7 format; MS Word document file type
2) R Script; final version
Important Information:
You will receive an email confirming the submission. Should you receive that email, your
submission is received.
o A warning may appear because of SafeAssign, which does not recognize R file
types. The warning does not impact the submission.
The research paper will be written in a professional writing style, following APA 7 student
paper format. Use the student paper template.
o The document shall be 3-5 pages and at least 1000 words. The page count does
not include the cover page, tables, figures, or the reference page.
o Ensure that every reference in the reference list is also cited in the text.
o Do not forget to cite and reference the source of the data.
It is ill-advised to modify the problem statement and research questions provided.
If the research problem or research questions are modified, the requirements of the analysis
will not change.
There are several different versions of this assignment. If the submitted work is in line with a
different version than assigned, the submitted work is a demonstration of academic
dishonesty. Do not share the work with peers. Do not accept work that you did not do.
Take a look at the rubric to get the best grade possible.