1) You are approached by the marketing director of a local company, who believes that he has devised a foolproof way to measure customer satisfaction. He explains his scheme as
follows: “It’s so simple that I can’t believe that no one has thought of it before. I just keep
track of the number of customer complaints for each product. I read in a data mining book
that counts are ratio attributes, and so, my measure of product satisfaction must be a ratio
attribute. But when I rated the products based on my new customer satisfaction measure
and showed them to my boss, he told me that I had overlooked the obvious, and that my
measure was worthless. I think that he was just mad because our bestselling product had
the worst satisfaction since it had the most complaints. Could you help me set him
straight?”
(a) Who is right, the marketing director or his boss? If you answered his boss, what would
you do to fix the measure of satisfaction?
(b) What can you say about the attribute type of the original product satisfaction attribute?
2) For the following vectors, x and y, calculate the indicated similarity or distance measures.
(a) x = (1, 1, 1, 1), y = (2, 2, 2, 2): cosine, correlation, Euclidean
(c) x = (0, −1, 0, 1), y = (1, 0, −1, 0): cosine, correlation, Euclidean
(e) x = (2, −1, 0, 2, 0, −3), y = (−1, 1, −1, 0, 0, −1): cosine, correlation
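As a way to check hand calculations for the measures named above, here is a minimal Python sketch using the standard textbook definitions. The function names (cosine, correlation, euclidean) are chosen for illustration only, and returning None for an undefined measure is an assumption of this sketch, not a convention taken from the exercise.

```python
import math


def cosine(x, y):
    """Cosine similarity: dot(x, y) / (||x|| * ||y||)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return None  # undefined when either vector has zero length
    return dot / (norm_x * norm_y)


def correlation(x, y):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    centered_x = [a - mean_x for a in x]
    centered_y = [b - mean_y for b in y]
    # Returns None when either vector is constant (zero variance).
    return cosine(centered_x, centered_y)


def euclidean(x, y):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
```

Each function takes two equal-length sequences of numbers, for example cosine((1, 2, 3), (3, 2, 1)). Note that correlation is undefined whenever one of the vectors is constant, which is worth keeping in mind for the vectors listed above.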
3) State the type of each attribute below (nominal, ordinal, interval, or ratio) before
and after the described transformation has been performed.
(a) Hair color of a person is mapped to the following values: black = 0, brown = 1, red = 2,
blonde = 3, grey = 4, white = 5.
(b) Grade of a student (from 0 to 100) is mapped to the following scale: A = 4.0, A- = 3.5, B =
3.0, B- = 2.5, C = 2.0, C- = 1.5, D = 1.0, D- = 0.5, E = 0.0.
(c) Height of a person is changed from meters to feet.
4) Null values in data records may refer to missing or inapplicable values.
Consider the following table of employees for a hypothetical organization:
Name    Sales commission    Occupation
======================================
John    5000                Sales
Mary    1000                Sales
Bob     null                Non-sales
Lisa    null                Non-sales
The null values in the table refer to inapplicable values, since sales commissions are
calculated for sales employees only. Suppose we are interested
in calculating the similarity between employees based on their sales commission.
(a) Explain the limitation of computing similarity after replacing the
null values in sales commission with 0.
(b) Explain the limitation of computing similarity after replacing the
null values in sales commission with the average value of sales commission (i.e., 3000).
(c) Propose a method that can handle null values in the sales commission so that
employees that have the same occupation are closer to each other than to employees that
have different occupations. Removing rows is not acceptable here.
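As a starting point for part (c), the following Python sketch shows one possible way to encode a distance that treats null as "inapplicable" rather than as a number. It is an illustration under stated assumptions, not the intended answer: the function name commission_distance, the penalty value 2.0, and the choice to scale by the range of observed commissions are all hypothetical design choices.

```python
# Employee data from the table above; None stands for a null commission.
employees = {
    "John": 5000,
    "Mary": 1000,
    "Bob": None,
    "Lisa": None,
}

# Range of the observed (non-null) commissions, used to scale distances to [0, 1].
observed = [v for v in employees.values() if v is not None]
value_range = max(observed) - min(observed)


def commission_distance(a, b, value_range, mismatch_penalty=2.0):
    """Distance between two commission values that may be None.

    Assumed rules for this sketch:
      - both null: distance 0 (both are non-sales employees)
      - exactly one null: fixed penalty larger than any scaled distance
      - neither null: absolute difference scaled by the observed range
    """
    if a is None and b is None:
        return 0.0
    if a is None or b is None:
        return mismatch_penalty
    if value_range == 0:
        return 0.0  # all observed commissions are identical
    return abs(a - b) / value_range
```

With these rules, commission_distance(employees["John"], employees["Mary"], value_range) can never exceed 1, while any pair with exactly one null is at distance 2, so employees with the same occupation are always closer to each other than to employees with a different occupation.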