The American Community Survey data for New York describe each family’s economic, housing, and demographic characteristics, such as yearly income, family type (married, female head, male head, etc.), number of people, number of children, whether the family is living on food stamps (**FoodStamp**: 1=Yes, 0=No), and the number of people who are currently employed. Use this data set to answer the following questions:

**Questions:**

1. Choose an appropriate model to determine whether family income (**FamilyIncome** in the data set) differs among family types (**FamilyType** in the data set) in the state of New York.

2. Choose an appropriate model to determine whether family income (FamilyIncome) differs between families with kids and families without kids. (Note: There is currently no column recording whether a family has kids. You may want to use the **with()** function in R to create a new column that holds this binary answer.)

See the section **How to Use the with() function** below.

3. Choose an appropriate model, and use FamilyIncome, **NumChildren**, **NumPeople**, and **NumWorkers** to predict the probability that a family is living on FoodStamp.

(Note: Please ignore a warning message such as *glm.fit: fitted probabilities numerically 0 or 1 occurred* if you see one.)

Please save screenshots of the data analysis results, together with your interpretations of the results, in a Word document named “MyAssignment1”. Thank you!

**How to Use the with() function**

The with() function evaluates an expression using the columns of a data frame as variables, which makes it convenient for building a new column from existing ones. For instance, in the acs_ny.csv file, there is a column called “**NumChildren**”. It shows how many children a family has. However, in question 2, we want to compare family incomes between families with kids and families without kids. We don’t need to consider how many kids each family has, only whether or not a family has kids. We therefore create a new column called “**HasKidsOrNot**” from the values in “NumChildren”. If the value in “NumChildren” is zero, “HasKidsOrNot” should be FALSE (that is, “No”); otherwise, it should be TRUE (“Yes”). This classifies the data set into families with kids and families without kids. We use the with() function to achieve this goal.

1) Import the data set into R.

**acsData <- read.table("acs_ny.csv", sep = ",", header = TRUE)**

2) Use the head() function to take a look at the current data frame.

**head(acsData)**

Notice that the last column is Language.

3) Use the with() function

**acsData$HasKidsOrNot <- with(acsData, NumChildren > 0)**

4) Use the head() function to take another look at the data frame.

**head(acsData)**

Notice that the last column is now HasKidsOrNot. We attached a new column called HasKidsOrNot to the acsData data frame by assigning it the result of the **with()** function. The with() function evaluated the expression NumChildren > 0 using the values in the NumChildren column. If the value is greater than 0, which means the family has at least one kid, HasKidsOrNot is TRUE (“Yes”); otherwise, HasKidsOrNot is FALSE (“No”).
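To see what with() returns without loading the full file, here is a minimal sketch on a small made-up data frame. The data frame `toy`, its values, and the extra column `HasKidsLabel` are assumptions for illustration only and are not part of acs_ny.csv. The sketch also shows one possible way to store literal “Yes”/“No” labels with ifelse(), if you prefer those to TRUE/FALSE.

```r
# Toy data frame standing in for acsData (values are made up for illustration)
toy <- data.frame(
  FamilyIncome = c(52000, 31000, 78000, 45000),
  NumChildren  = c(0, 2, 1, 0)
)

# with() evaluates the expression using the columns of `toy` as variables,
# so it gives the same logical vector as writing toy$NumChildren directly
viaWith   <- with(toy, NumChildren > 0)
viaDollar <- toy$NumChildren > 0
identical(viaWith, viaDollar)

# Store the logical result as a new column, as in step 3
toy$HasKidsOrNot <- viaWith

# Optional (hypothetical column name): literal "Yes"/"No" labels instead
toy$HasKidsLabel <- with(toy, ifelse(NumChildren > 0, "Yes", "No"))

toy
```

Either form of the new column splits the families into the same two groups, so the choice between TRUE/FALSE and “Yes”/“No” only affects how the groups are labeled in the output.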

Now, you can use the ANOVA test to perform the analysis for this question.
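As a rough sketch of the shape such an analysis could take, the example below runs a one-way ANOVA with aov() on a small made-up data frame. The data frame `toy` and its values are invented for illustration; for the assignment you would use acsData and interpret its own output.

```r
# Made-up data for illustration only; not taken from acs_ny.csv
toy <- data.frame(
  FamilyIncome = c(52000, 31000, 78000, 45000, 66000, 39000, 88000, 57000),
  NumChildren  = c(0, 2, 1, 0, 3, 0, 1, 0)
)
toy$HasKidsOrNot <- with(toy, NumChildren > 0)

# One-way ANOVA: does mean FamilyIncome differ between the two groups?
# factor() makes the grouping role of the logical column explicit
fit <- aov(FamilyIncome ~ factor(HasKidsOrNot), data = toy)

# The summary table reports the F statistic and its p-value
summary(fit)
```

With only two groups, this one-way ANOVA is equivalent to a two-sample t-test that assumes equal variances.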
