Homework 1: Data
================
****** Submit a soft copy to Blackboard and bring a hard copy to class. ******
****** You may work in a group of 2. ******
****** Points: 35 ******
1. (5 points) Chapter 2.6 Exercises: Question 3
2. (10 points) Chapter 2.6 Exercises: Question 19 (a,c,e)
3. (10 points) State the type of each attribute below (nominal, ordinal,
interval, or ratio) both before and after the given transformation is
applied.
(a) Hair color of a person is mapped to the following values: black = 0,
brown = 1, red = 2, blonde = 3, grey = 4, white = 5.
(b) Grade of a student (from 0 to 100) is mapped to the following scale: A =
4.0, A- = 3.5, B = 3.0, B- = 2.5, C = 2.0, C- = 1.5, D = 1.0, D- = 0.5, E =
0.0.
(c) Height of a person is changed from meters to feet.
4. (10 points) Null values in data records may refer to missing or
inapplicable values.
Consider the following table of employees for a hypothetical organization:
Name    Sales commission    Occupation
======================================
John    5000                Sales
Mary    1000                Sales
Bob     null                Non-sales
Lisa    null                Non-sales
The null values in the table refer to inapplicable values, since sales
commissions are calculated for sales employees only. Suppose we are interested
in calculating the similarity between employees based on their sales
commission.
(a) Explain the limitation of computing similarity if we replace the null
values in sales commission with 0.
(b) Explain the limitation of computing similarity if we replace the null
values in sales commission with the average sales commission (i.e., 3000).
(c) Propose a method that can handle null values in the sales commission so
that employees who have the same occupation are closer to each other than to
employees who have different occupations. Removing rows is not acceptable
here.
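
For experimenting with Question 4, the short sketch below may be a useful
starting point. It is not part of the assignment and is not a solution: it
only encodes the table and prints pairwise distances after the nulls are
replaced by a chosen value. The assignment does not name a similarity
measure, so the absolute difference in commission is used here as an assumed
stand-in, and the helper names fill_nulls and pairwise_distances are
hypothetical.

```python
# Table from Question 4; None stands for a null (inapplicable) commission.
employees = [
    ("John", 5000, "Sales"),
    ("Mary", 1000, "Sales"),
    ("Bob", None, "Non-sales"),
    ("Lisa", None, "Non-sales"),
]


def fill_nulls(rows, fill_value):
    """Return {name: commission}, substituting fill_value for None."""
    return {name: fill_value if c is None else c for name, c, _ in rows}


def pairwise_distances(commission):
    """Assumed measure: absolute commission difference for each pair."""
    names = list(commission)
    return {
        (a, b): abs(commission[a] - commission[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


if __name__ == "__main__":
    # The two fill values discussed in parts (a) and (b).
    for fill_value in (0, 3000):
        print(f"nulls replaced by {fill_value}:")
        distances = pairwise_distances(fill_nulls(employees, fill_value))
        for (a, b), d in distances.items():
            print(f"  {a}-{b}: {d}")
```

Comparing the printed distances for same-occupation pairs against those for
different-occupation pairs is one way to check an argument for parts (a) and
(b), or to test a method proposed for part (c).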