The American Community Survey (ACS) data for New York describe each family’s economic, housing, and demographic characteristics, such as yearly income, family type (married, female head, male head, etc.), number of people, number of children, whether the family is living on food stamps (FoodStamp: 1 = Yes, 0 = No), and number of people who are currently employed. Use this data set to answer the following questions:
Questions:
1. Choose an appropriate model to determine whether family income (FamilyIncome in the data set) differs among the different types of families (FamilyType in the data set) in the state of New York.
2. Choose an appropriate model to determine whether family income (FamilyIncome) differs between families with kids and families without kids. (Note: The data set currently has no column indicating whether a family has kids. You may want to use the with() function in R to create a new column that holds this binary answer.)
3. Choose an appropriate model, and use FamilyIncome, NumChildren, NumPeople, and NumWorkers to predict the probability that a family is living on food stamps (FoodStamp).
(Note: You can ignore warning messages such as “glm.fit: fitted probabilities numerically 0 or 1 occurred” if you see one.)
Please save screenshots of your data analysis results, together with your interpretations of them, in a Word document named “MyAssignment1”. Thank you!
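Because FoodStamp is a binary (0/1) outcome and the note on question 3 mentions glm.fit, one reasonable choice for question 3 is logistic regression fitted with glm() and family = binomial. The sketch below is only an illustration under stated assumptions: acsToy is a hypothetical data frame filled with simulated values (not real ACS data), and the coefficients used to simulate FoodStamp are made up. With the real data you would fit the same formula with data = acsData.

```r
# Simulated toy data standing in for acs_ny.csv (NOT real ACS values).
set.seed(1)
n <- 200
acsToy <- data.frame(
  FamilyIncome = round(rlnorm(n, meanlog = log(50000), sdlog = 0.8)),
  NumChildren  = rpois(n, lambda = 1),
  NumWorkers   = rbinom(n, size = 2, prob = 0.6)
)
# Household size: one adult, plus the children, plus possibly a second adult.
acsToy$NumPeople <- 1 + acsToy$NumChildren + rbinom(n, size = 1, prob = 0.6)

# Simulate FoodStamp (1 = Yes, 0 = No) from an assumed logistic relationship.
linPred <- 1 - 0.00004 * acsToy$FamilyIncome + 0.3 * acsToy$NumChildren -
  0.5 * acsToy$NumWorkers
acsToy$FoodStamp <- rbinom(n, size = 1, prob = plogis(linPred))

# Logistic regression: models the log-odds that FoodStamp == 1.
foodModel <- glm(FoodStamp ~ FamilyIncome + NumChildren + NumPeople + NumWorkers,
                 family = binomial, data = acsToy)
summary(foodModel)

# type = "response" converts the fitted log-odds into probabilities.
acsToy$PredProb <- predict(foodModel, type = "response")
head(acsToy)
```

The summary() output gives one coefficient per predictor on the log-odds scale, and the PredProb column holds each family's predicted probability of living on food stamps.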
How to Use the with() function
The with() function evaluates an expression using the columns of a data frame as if they were ordinary variables, so you can write NumChildren instead of acsData$NumChildren. Combined with the assignment operator, it lets us add a new column to a data frame based on a condition on existing columns. For instance, the acs_ny.csv file has a column called “NumChildren”, which shows how many children a family has. In question 2, however, we want to compare family incomes between families with kids and families without kids. We don’t need to consider how many kids each family has, only whether or not a family has kids. We therefore create a new column called “HasKidsOrNot” from the values in “NumChildren”: if the value in “NumChildren” is zero, “HasKidsOrNot” should indicate that the family has no kids; otherwise, it should indicate that the family has kids. This classifies the data set into families with kids and families without kids, and we use the with() function to achieve this goal.
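A minimal sketch of what with() actually does, using a small made-up data frame (acsToy is a hypothetical stand-in for acsData): the expression inside with() can refer to columns by name, but with() itself only returns a value. It is the assignment to a $ column outside with() that adds the new column.

```r
# Made-up data standing in for acs_ny.csv.
acsToy <- data.frame(FamilyIncome = c(50000, 72000, 31000, 98000),
                     NumChildren  = c(0, 2, 1, 0))

# Inside with(), columns of acsToy can be referred to by name.
viaWith <- with(acsToy, NumChildren > 0)

# The same logical vector written with explicit $ indexing.
viaDollar <- acsToy$NumChildren > 0
identical(viaWith, viaDollar)   # TRUE

# Assigning the result to a new column is what adds it to the data frame.
acsToy$HasKidsOrNot <- viaWith

# An assignment evaluated inside with() happens in a temporary environment,
# so it does NOT create a new column in acsToy.
with(acsToy, Temp <- NumChildren * 2)
"Temp" %in% names(acsToy)       # FALSE
```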
1) Import the data set into R.
acsData <- read.table("acs_ny.csv", sep = ",", header = TRUE)
2) Use the head() function to take a look at the current data frame.
head(acsData)
Notice that the last column is Language.
3) Use the with() function to create the new column.
acsData$HasKidsOrNot<-with(acsData,NumChildren>0)
4) Use the head() function again to take a look at the updated data frame.
head(acsData)
Notice that the last column is now HasKidsOrNot. We attached a new column called HasKidsOrNot to the acsData data frame by using the with() function. Inside with(), the condition NumChildren > 0 is evaluated on the whole NumChildren column: it gives TRUE for every family with at least one kid and FALSE for every family with no kids. Assigning this result to acsData$HasKidsOrNot stores these TRUE/FALSE values as the new column.
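Note that NumChildren > 0 produces logical TRUE/FALSE values, not the strings “Yes” and “No”. If literal Yes/No labels are wanted, one option (a sketch, again on a hypothetical acsToy data frame with made-up values) is to wrap the condition in ifelse() and store the result as a factor:

```r
# Made-up data standing in for acs_ny.csv.
acsToy <- data.frame(NumChildren = c(0, 2, 1, 0, 3))

# ifelse() maps the logical test to explicit "Yes"/"No" labels.
acsToy$HasKidsOrNot <- with(acsToy, ifelse(NumChildren > 0, "Yes", "No"))

# Storing the labels as a factor makes the two groups explicit for aov()/lm().
acsToy$HasKidsOrNot <- factor(acsToy$HasKidsOrNot, levels = c("No", "Yes"))

table(acsToy$HasKidsOrNot)   # No: 2, Yes: 3
```

Either version (logical or Yes/No factor) defines the same two groups, so the choice only affects how the groups are labeled in the output.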
Now you can use ANOVA to test whether FamilyIncome differs between the two groups defined by HasKidsOrNot.
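A minimal sketch of a one-way ANOVA with aov(), assuming simulated values in a hypothetical acsToy data frame rather than the real acs_ny.csv; the FamilyType labels are also assumed. The same pattern covers question 2 (grouping by HasKidsOrNot) and question 1 (grouping by FamilyType).

```r
# Simulated toy data (NOT real ACS values); FamilyType labels are assumed.
set.seed(42)
n <- 60
acsToy <- data.frame(
  NumChildren = rpois(n, lambda = 1),
  FamilyType  = factor(sample(c("Married", "Female Head", "Male Head"),
                              n, replace = TRUE))
)
acsToy$FamilyIncome <- round(rnorm(n, mean = 60000, sd = 15000))
acsToy$HasKidsOrNot <- with(acsToy, NumChildren > 0)

# Question 2: one-way ANOVA with the two-level grouping (TRUE/FALSE).
kidsAov <- aov(FamilyIncome ~ HasKidsOrNot, data = acsToy)
summary(kidsAov)

# Question 1: the same model with FamilyType as the grouping factor.
typeAov <- aov(FamilyIncome ~ FamilyType, data = acsToy)
summary(typeAov)
```

In each summary table, the Pr(>F) value on the grouping row is the p-value for the null hypothesis that mean FamilyIncome is the same in every group.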