I need answers to questions 1 through 6; I don't need the 7th one.
Homework Assignment #2 – Calculating Size of Distributed File Systems
Exercise 3-1 Imagine that you want to analyze one terabyte (1 TB) of
data that is residing in a single machine with eight input/output channels,
where each channel has a reading speed of 150 megabytes per second
(MB/s).
1. Calculate the time it takes for the reader to read the entire file.
2. To speed up the reading operation, consider adding more machines
and creating a distributed cluster. What is the minimum number of
machines you should install in the cluster so the entire read time is less
than 10 seconds?
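The arithmetic for both parts of Exercise 3-1 can be sketched as follows. This is a minimal sketch, not an official solution. It assumes decimal units (1 TB = 1,000,000 MB), that every channel on every machine reads in parallel at full speed, and that the data is split evenly across machines. The function names are hypothetical.

```python
MB_PER_TB = 1_000_000  # assumption: decimal units; use 1_048_576 for binary units

def read_time_s(file_mb, machines, channels, mb_per_s):
    """Seconds to read file_mb when all channels on all machines read in parallel."""
    return file_mb / (machines * channels * mb_per_s)

def min_machines(file_mb, channels, mb_per_s, deadline_s):
    """Smallest machine count whose read time is strictly below deadline_s.

    Expects integer inputs so that floor division stays exact.
    """
    mb_per_machine_within_deadline = channels * mb_per_s * deadline_s
    return file_mb // mb_per_machine_within_deadline + 1

file_mb = 1 * MB_PER_TB
single = read_time_s(file_mb, machines=1, channels=8, mb_per_s=150)
needed = min_machines(file_mb, channels=8, mb_per_s=150, deadline_s=10)

print(f"single machine: {single:.1f} s")  # about 833.3 s
print(f"machines needed: {needed}")       # 84
```

Under the binary convention (1 TB = 1,048,576 MB) the same functions give about 873.8 s and 88 machines, so it is worth stating which convention the answer uses.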
Exercise 3-2 A cluster with 50 machines is storing blocks of data that
belong to customer complaints. The size of the file is 5 TB, and each
machine has four channels with a reading speed of 100 MB/s for each
channel. Is the number of machines (50) sufficient to read the data in
under 20 seconds? If not, how many more similar machines need to be
added to the cluster?
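Exercise 3-2 follows the same pattern: compute the aggregate throughput, compare the read time with the deadline, and solve for the machine count. The sketch below assumes decimal units (1 TB = 1,000,000 MB), perfectly parallel reads, and that "under 20 seconds" means strictly less than 20 s. The helper names are hypothetical.

```python
MB_PER_TB = 1_000_000  # assumption: decimal units

def cluster_read_time_s(file_mb, machines, channels, mb_per_s):
    """Read time in seconds with every channel of every machine reading in parallel."""
    return file_mb / (machines * channels * mb_per_s)

def machines_for_deadline(file_mb, channels, mb_per_s, deadline_s):
    """Smallest machine count that reads file_mb in strictly less than deadline_s."""
    return file_mb // (channels * mb_per_s * deadline_s) + 1

file_mb = 5 * MB_PER_TB
current_machines = 50

current_time = cluster_read_time_s(file_mb, current_machines, channels=4, mb_per_s=100)
required = machines_for_deadline(file_mb, channels=4, mb_per_s=100, deadline_s=20)
extra = max(0, required - current_machines)

print(f"time with 50 machines: {current_time:.0f} s")  # 250 s, so 50 is not enough
print(f"machines required: {required}")                # 626
print(f"machines to add: {extra}")                     # 576
```

If a read time of exactly 20 s were acceptable, 625 machines (575 additional) would be enough; the strict reading adds one more.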
Exercise 3-3 You want to store a 500 MB file in a cluster with 12 nodes, which
are located in four different racks (three nodes per rack) as shown in the figure
below.
1. If a data block can store 128 MB, how many data blocks are needed to split
this file?
2. Use a replication factor of 3 and the write principles discussed earlier to
allocate the data blocks into this cluster.
3. Repeat steps 1 and 2 but with a block size of 256 MB.
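The block count and one possible allocation for Exercise 3-3 can be sketched as below. The figure and the "write principles discussed earlier" are not reproduced here, so the sketch assumes the commonly documented HDFS default for a replication factor of 3: the first replica on one node, and the second and third replicas on two different nodes of one other rack. The rack and node names (`R1-N1` and so on) and the round-robin choice of racks are hypothetical.

```python
import math

# Hypothetical labels for the 4 racks x 3 nodes cluster described in the exercise.
RACKS = {f"R{r}": [f"R{r}-N{n}" for n in range(1, 4)] for r in range(1, 5)}

def num_blocks(file_mb, block_mb):
    """Number of blocks needed; the last block may be only partly filled."""
    return math.ceil(file_mb / block_mb)

def place_blocks(n_blocks, racks):
    """Assign 3 replicas per block: one node on a first rack, two nodes on the next rack."""
    rack_names = list(racks)
    placement = {}
    for b in range(n_blocks):
        first_rack = rack_names[b % len(rack_names)]
        remote_rack = rack_names[(b + 1) % len(rack_names)]
        placement[f"B{b + 1}"] = [
            racks[first_rack][0],   # replica 1
            racks[remote_rack][1],  # replica 2, on a different rack
            racks[remote_rack][2],  # replica 3, same rack as replica 2, different node
        ]
    return placement

for block_mb in (128, 256):
    n = num_blocks(500, block_mb)
    print(f"block size {block_mb} MB -> {n} blocks, {n * 3} replicas")
    for block, nodes in place_blocks(n, RACKS).items():
        print(f"  {block}: {nodes}")
```

With 128 MB blocks the file needs 4 blocks (three full blocks plus one holding 116 MB); with 256 MB blocks it needs 2 (one full block plus one holding 244 MB). Any allocation that keeps the three replicas on distinct nodes across two racks satisfies the same assumed policy.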
Exercise 3-4 Use the same cluster in the figure shown below for a file size of 50
GB. Each Data Node can store up to 8 GB of data. You need to allocate the data
blocks, each 256 MB in size, in the cluster using a replication factor of 3.
1. Is the number of Data Nodes (12) sufficient to store this data file?
2. If not, how many more Data Nodes are needed? If needed, add them to the
cluster in a separate rack and allocate the blocks in the modified cluster.
3. If 12 is sufficient, allocate the data blocks in the cluster.
4. Repeat steps 1 through 3 but with a block size of 128 MB.
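The capacity question in Exercise 3-4 reduces to counting block replicas and comparing them with the number of blocks each Data Node can hold. The sketch below assumes binary units (1 GB = 1024 MB), which fit the power-of-two block sizes, and treats each replica as occupying one full block slot. It covers only the capacity check, not the rack-by-rack allocation; the function name is hypothetical.

```python
import math

MB_PER_GB = 1024  # assumption: binary units

def capacity_check(file_gb, block_mb, replication, nodes, node_capacity_gb):
    """Compare the replicas to store with the block slots available in the cluster."""
    blocks = math.ceil(file_gb * MB_PER_GB / block_mb)
    replicas = blocks * replication
    slots_per_node = (node_capacity_gb * MB_PER_GB) // block_mb
    nodes_needed = math.ceil(replicas / slots_per_node)
    return {
        "blocks": blocks,
        "replicas": replicas,
        "slots_per_node": slots_per_node,
        "nodes_needed": nodes_needed,
        "extra_nodes": max(0, nodes_needed - nodes),
    }

for block_mb in (256, 128):
    result = capacity_check(file_gb=50, block_mb=block_mb, replication=3,
                            nodes=12, node_capacity_gb=8)
    print(block_mb, result)
```

Under these assumptions both block sizes lead to the same conclusion: 50 GB at a replication factor of 3 is 150 GB of stored data, the 12 nodes hold 96 GB, and 19 nodes are needed, so 7 more must be added. The block size changes the number of blocks (200 versus 400) but not the total volume.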
Exercise 3-5 Consider the block allocations shown in the figure below. Using a
replication factor of 3, are all blocks allocated in the correct Rack and Data Node?
If not, reallocate the blocks correctly. Explain your decision.
Exercise 3-6 Consider the block allocations shown in the figure below. Using a
replication factor of 3, are all blocks allocated in the correct Rack and Data Node?
If not, reallocate the blocks correctly. Explain your decision.
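For Exercises 3-5 and 3-6, the check applied to each block can be written down as a small validator. The figures are not reproduced here, so the allocation below is made-up sample data, not the one from the assignment. The rule it encodes is an assumption based on the commonly documented HDFS default for a replication factor of 3: three replicas, on three distinct Data Nodes, spanning exactly two racks.

```python
def check_block(replicas, replication=3):
    """Return a list of problems for one block; an empty list means the placement passes.

    replicas is a list of (rack, node) tuples, one per stored copy.
    """
    problems = []
    if len(replicas) != replication:
        problems.append(f"expected {replication} replicas, found {len(replicas)}")
    if len(set(replicas)) != len(replicas):
        problems.append("two replicas share the same Data Node")
    racks = {rack for rack, _ in replicas}
    if len(racks) != 2:
        problems.append(f"replicas span {len(racks)} rack(s), expected exactly 2")
    return problems

# Hypothetical sample allocation, NOT taken from the assignment's figures.
allocation = {
    "A": [("R1", "N1"), ("R2", "N4"), ("R2", "N5")],  # passes
    "B": [("R1", "N1"), ("R1", "N2"), ("R1", "N3")],  # all replicas in one rack
    "C": [("R1", "N1"), ("R2", "N4"), ("R2", "N4")],  # same node used twice
}

for block, replicas in allocation.items():
    issues = check_block(replicas)
    print(block, "OK" if not issues else issues)
```

A block that fails can be fixed by moving the offending replica to a free node that restores the two-rack, three-node pattern, and the explanation should name which rule was violated.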
Exercise 3-7 Use the HDFS commands provided in Appendix A-Part 2 (HDFS) to
perform the following tasks. Submit a document in which each command is
associated with a screenshot of its result.
1. Create a directory in HDFS.
2. Copy any file from the local machine to the newly created directory.
3. List the directory’s contents.
4. View the contents of the file.
5. Rename any file in HDFS.
6. Create another directory in HDFS and move any file from one directory to
another.
7. Delete any file.
8. Delete any directory.
9. Move any file from HDFS to the local machine.
10. Display the size of files.
11. Change the group of any file or directory.
12. Change permissions of any file or directory.