a) Discretize attribute A4 into two intervals using equal-width partitioning and label them as “More” and “Less” accordingly.
b) Use the simple matching coefficient method to prepare the dissimilarity matrix below.
c) Based on the document dissimilarity matrix in part (b), cluster the five documents using complete-linkage agglomerative clustering and draw the corresponding dendrogram.
d) For the following new document posting, which cluster or clusters of your solution in part (c) should it be assigned to? Justify your answer.
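The discretization, simple-matching, and complete-linkage steps asked for above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a model answer: the A4 values and the five rows are hypothetical placeholders rather than the data from this question, the function names are made up for illustration, and a value that falls exactly on the interval boundary is assumed to go to the upper ("More") interval.

```python
def equal_width_two_bins(values, labels=("Less", "More")):
    """Split numeric values into two equal-width intervals and label them."""
    lo, hi = min(values), max(values)
    midpoint = lo + (hi - lo) / 2  # boundary between the two intervals
    # Assumed convention: a value on the boundary goes to the upper interval.
    return [labels[1] if v >= midpoint else labels[0] for v in values]


def smc_dissimilarity(x, y):
    """1 - simple matching coefficient: the fraction of mismatching attributes."""
    mismatches = sum(1 for a, b in zip(x, y) if a != b)
    return mismatches / len(x)


def dissimilarity_matrix(rows):
    """Full symmetric matrix of pairwise SMC dissimilarities."""
    n = len(rows)
    return [[smc_dissimilarity(rows[i], rows[j]) for j in range(n)] for i in range(n)]


def complete_linkage(dist):
    """Return the merges as (cluster_a, cluster_b, height), in merge order."""
    clusters = [(i,) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest pairwise distance.
                d = max(dist[a][b] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = tuple(sorted(clusters[i] + clusters[j]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical data: five documents, three binary attributes plus a numeric A4.
a4 = [2, 9, 4, 10, 1]
a4_labels = equal_width_two_bins(a4)
binary_part = [(1, 0, 1), (1, 0, 1), (0, 1, 0), (0, 1, 0), (1, 1, 1)]
rows = [b + (label,) for b, label in zip(binary_part, a4_labels)]

for cluster_a, cluster_b, height in complete_linkage(dissimilarity_matrix(rows)):
    print(cluster_a, cluster_b, height)
```

Read top to bottom, the printed merges give the order and heights at which branches join in the dendrogram. Ties between equally close pairs are broken here by scan order, which is one arbitrary choice among several.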
Suppose you are asked to provide data mining consulting services to an Internet DVD shop. After interviewing the shop’s manager and the database administrator, the following movie database was collected.
If you are asked to cluster the movies, identify or propose an appropriate dissimilarity measure for them and prepare the following dissimilarity matrix for the five movies in the database above.
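When records mix nominal and numeric attributes, one commonly used option is a Gower-style measure that averages a per-attribute dissimilarity. The sketch below is only an illustration under assumptions: the attributes (genre, running time, price) and the three records are hypothetical and not taken from the shop's database, and `gower_dissimilarity` is a made-up helper name.

```python
def gower_dissimilarity(a, b, kinds, ranges):
    """Average per-attribute dissimilarity over mixed attribute types."""
    total = 0.0
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if kind == "nominal":
            # Nominal attributes: 0 if the values match, 1 otherwise.
            total += 0.0 if x == y else 1.0
        else:
            # Numeric attributes: absolute difference scaled by the attribute's range.
            total += abs(x - y) / rng if rng else 0.0
    return total / len(kinds)


# Hypothetical records: (genre, running time in minutes, price).
movies = [
    ("Action", 120, 20.0),
    ("Comedy", 90, 10.0),
    ("Action", 150, 20.0),
]
kinds = ("nominal", "numeric", "numeric")
# Range of each numeric attribute over the data; unused for nominal attributes.
ranges = (
    None,
    max(m[1] for m in movies) - min(m[1] for m in movies),
    max(m[2] for m in movies) - min(m[2] for m in movies),
)

matrix = [[gower_dissimilarity(p, q, kinds, ranges) for q in movies] for p in movies]
for row in matrix:
    print([round(v, 3) for v in row])
```

Scaling each numeric difference by its range keeps every attribute's contribution in [0, 1], so no single attribute dominates the average. Attribute weights could be added if some attributes matter more for the shop.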
Single-linkage agglomerative clustering suffers from low scalability (high time complexity). Other than the traditional sampling approach, propose a way to speed up its computation.
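One possible direction, offered as a sketch rather than the expected answer, relies on the known equivalence between single linkage and the minimum spanning tree (MST): the MST edge weights, taken in ascending order, are the single-linkage merge heights. Prim's algorithm builds the MST of a dense distance matrix in O(n^2) time, compared with O(n^3) for a naive agglomerative loop. The function names and the four points on a line are hypothetical.

```python
def mst_prim(dist):
    """Prim's algorithm on a dense distance matrix, O(n^2) time."""
    n = len(dist)
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known edge from the tree to each vertex
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is closest to it.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges


def single_linkage_from_mst(dist):
    """Single-linkage merges as (point_a, point_b, height) in ascending height."""
    root = list(range(len(dist)))

    def find(x):
        # Union-find lookup with path halving.
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    merges = []
    for weight, a, b in sorted(mst_prim(dist)):
        # Each MST edge joins the two clusters that currently contain a and b.
        root[find(a)] = find(b)
        merges.append((a, b, weight))
    return merges


# Hypothetical data: four points on a line.
points = [0, 1, 3, 7]
dist = [[abs(p - q) for q in points] for p in points]
for a, b, height in single_linkage_from_mst(dist):
    print(a, b, height)
```

Each printed triple names one MST edge and its height; the clusters being merged at that step are the ones containing the two endpoints.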