MapReduce and YARN Quiz Answers – Queslers

All Modules MapReduce and YARN Quiz Answers

Apache Hadoop is one of the most popular tools for big data processing. It has been successfully deployed in production by many companies for several years. Though Hadoop is considered a reliable, scalable, and cost-effective solution, it is constantly being improved by a large community of developers. As a result, the 2.0 version offers several revolutionary features, including Yet Another Resource Negotiator (YARN), HDFS Federation, and high availability, which make the Hadoop cluster much more efficient, powerful, and reliable.

The most serious limitations of classical MapReduce relate to scalability, resource utilization, and support for workloads other than MapReduce. In the MapReduce framework, job execution is controlled by two types of processes: a single master process called the JobTracker and a number of subordinate processes called TaskTrackers.

Apache Hadoop 2.0 includes YARN, which separates resource management from processing. The YARN-based architecture is not constrained to MapReduce. Under YARN, MapReduce is reduced to the role of one distributed application among others (though still a very popular and useful one) and is called MRv2. MRv2 is a re-implementation of the classic MapReduce engine (now called MRv1) that runs on top of YARN.

The course reviews MapReduce v1 (MRv1) and provides insight into the design and implementation of YARN: a ResourceManager instead of a cluster manager, an ApplicationMaster instead of a dedicated and short-lived JobTracker, a NodeManager instead of a TaskTracker, and a distributed application instead of a MapReduce job.
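
To make the division of roles described above more concrete, here is a toy Python model. It is only a sketch under stated assumptions: the class names mirror the YARN components, but every method (`allocate`, `release`, `run`, `submit`) is hypothetical and is not the real YARN API. A "container" is modelled as one free slot on a node, and tasks are plain Python callables.

```python
from dataclasses import dataclass


@dataclass
class NodeManager:
    """Toy per-node agent: tracks how many containers the node can still host."""
    name: str
    free: int


@dataclass
class ResourceManager:
    """Toy cluster-wide arbiter: hands out containers and knows nothing about jobs."""
    nodes: list

    def allocate(self, count):
        # Grant up to `count` containers; may return fewer if the cluster is busy.
        granted = []
        for node in self.nodes:
            while node.free > 0 and len(granted) < count:
                node.free -= 1
                granted.append(node)
        return granted

    def release(self, containers):
        for node in containers:
            node.free += 1


class ApplicationMaster:
    """Toy per-application coordinator: exists only while its application runs."""

    def __init__(self, app_id, rm):
        self.app_id = app_id
        self.rm = rm

    def run(self, tasks):
        results, pending = [], list(tasks)
        while pending:
            containers = self.rm.allocate(len(pending))
            if not containers:
                raise RuntimeError("no free containers in the cluster")
            batch = pending[:len(containers)]
            pending = pending[len(containers):]
            results.extend(task() for task in batch)  # "run" each task
            self.rm.release(containers)               # hand containers back
        return results


def submit(rm, app_id, tasks):
    """Start one ApplicationMaster for this application, then tear it down."""
    am_container = rm.allocate(1)  # the ApplicationMaster occupies a container too
    if not am_container:
        raise RuntimeError("no container available for the ApplicationMaster")
    try:
        return ApplicationMaster(app_id, rm).run(tasks)
    finally:
        rm.release(am_container)
```

In this model the ResourceManager never sees a map or reduce task, only container requests, which is why the same cluster could host applications other than MapReduce.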

Enroll on Cognitive Class

Module 1: Introduction to MapReduce and YARN

Question: Which phase of MapReduce is optional?

  • Shuffle
  • Reduce
  • Combiner
  • Map
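
As background for the question above, the following pure-Python sketch simulates a word count with and without a combiner. It does not use Hadoop; the function names (`map_phase`, `combine`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and each input string stands in for one input split handled by one mapper.

```python
from collections import defaultdict


def map_phase(line):
    """Emit (word, 1) for every word in one input split."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Local pre-aggregation of a single mapper's output."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def shuffle(mapper_outputs):
    """Group values by key across all mappers, with keys in sorted order."""
    grouped = defaultdict(list)
    for pairs in mapper_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(grouped):
    return {key: sum(values) for key, values in grouped.items()}


def word_count(splits, use_combiner):
    outputs = [map_phase(split) for split in splits]
    if use_combiner:
        outputs = [combine(pairs) for pairs in outputs]
    pairs_shuffled = sum(len(pairs) for pairs in outputs)
    return reduce_phase(shuffle(outputs)), pairs_shuffled
```

Calling `word_count(["a b a", "b a a"], use_combiner=False)` and again with `use_combiner=True` gives the same counts; only the number of pairs that cross the shuffle changes (6 versus 4 in this small input).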

Question: Which node is responsible for assigning (key, value) pairs to different reducers?

  • Shuffle node
  • Reducer node
  • Combiner node
  • Mapper node
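
For reference, routing a (key, value) pair to a reducer comes down to a partition function applied to each map output key. Hadoop's default HashPartitioner masks the key's hash code to a non-negative value and takes it modulo the number of reduce tasks. The Python sketch below imitates that formula; using `zlib.crc32` as the hash is an assumption made here so the result is stable across runs, and it will not match the partition numbers Java's `hashCode()` would give.

```python
import zlib


def partition(key, num_reducers):
    """Return the index of the reduce task that receives this key."""
    # crc32 is a stand-in for Java's key.hashCode(); this is an assumption.
    hash_code = zlib.crc32(key.encode("utf-8"))
    return (hash_code & 0x7FFFFFFF) % num_reducers
```

Because the function depends only on the key and the reducer count, every pair with the same key lands on the same reducer regardless of which mapper emitted it.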

Question: Where are the output files of the Reducer task stored?

  • A data warehouse
  • Hadoop FS
  • Within the Reducer node
  • Linux FS

Module 2: Limitations of Hadoop v1 & MapReduce v1

Question: What is an issue or limitation of the original MapReduce v1 paradigm?

  • It’s not scalable
  • It only has one TaskTracker
  • It only supports Parquet file types
  • It only has one JobTracker

Question: How is YARN an improvement over the MapReduce v1 paradigm?

  • It’s completely open source
  • It splits the JobTracker into two processes: ResourceManager and ApplicationManager
  • It reduces multi-tenancy to improve performance
  • It splits the TaskTracker into two processes: ResourceManager and ApplicationManager

Question: Existing applications can run on YARN without recompilation. True or False?

  • True
  • False

Module 3: The Architecture of YARN

Question: The main change from Hadoop v1 to Hadoop v2 was the consolidation of both resource management and job processing. True or False?

  • True
  • False

Question: The NodeManager is a more generic and efficient version of the TaskTracker. True or False?

  • True
  • False

Question: A new ApplicationMaster is launched for each job and ends when the job completes. True or False?

  • True
  • False

Final Exam

Question: Which of the following is the correct sequence of MapReduce flow?

  • Reduce -> Combine -> Map
  • Combine -> Reduce -> Map
  • Map -> Reduce -> Combine
  • Map -> Combine -> Reduce

Question: Which of the following can be used to control the number of part files in a MapReduce program’s output directory?

  • Shuffle parameters
  • Number of Reducers
  • Counter
  • Number of Mappers
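
As a rough illustration of where part files come from: with the newer MapReduce API each reduce task writes its own file, named `part-r-00000`, `part-r-00001`, and so on. The sketch below models that in memory. The function name `write_output` is hypothetical, `zlib.crc32` stands in for the key's hash, and returning a dict of lists instead of writing files is a simplification.

```python
import zlib


def write_output(pairs, num_reducers):
    """Return a mapping of part-file name to the pairs that reducer would write."""
    files = {f"part-r-{i:05d}": [] for i in range(num_reducers)}
    for key, value in pairs:
        index = (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_reducers
        files[f"part-r-{index:05d}"].append((key, value))
    return files
```

In this sketch the number of entries in the result always equals `num_reducers`, however many mappers produced the input pairs.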

Question: Which of the following operations will work improperly when using a Combiner?

  • Average
  • Maximum
  • Count
  • Minimum
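
To see why the choice of operation matters when a combiner is involved, the sketch below compares a naive "mean of per-split means" with a version that carries (sum, count) pairs through the combine step. It is plain Python arithmetic, not Hadoop code, and the function names are invented for this example.

```python
def mean(values):
    return sum(values) / len(values)


def naive_average(splits):
    """Combiner emits one mean per split; the reducer averages those means."""
    return mean([mean(split) for split in splits])


def sum_count_average(splits):
    """Combiner emits (sum, count) per split; the reducer merges them."""
    partials = [(sum(split), len(split)) for split in splits]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


def combined_max(splits):
    """Maximum can be applied per split and then again across splits."""
    return max(max(split) for split in splits)
```

With splits `[[1, 2, 3], [10]]` the true mean is 4.0; `naive_average` returns 6.0 because the two splits have different sizes, while `sum_count_average` returns 4.0. `combined_max` returns 10, the same as taking the maximum over all values at once.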

Question: Which of the following is true about MapReduce?

  • Compression of input files is optional.
  • Output from the Map phase is replicated.
  • The programmer must write the Map code, the Shuffle code, and the Reduce code.
  • MapReduce programs must be written in Java.

Question: Input data to MapReduce is record-oriented and blocks of data contain the same number of full records. True or False?

  • False.
  • True.
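
To illustrate how records relate to blocks, the sketch below cuts a small text file into fixed-size byte ranges and reads newline-terminated records from each range. The approach is modelled on how a line-oriented record reader can handle boundaries: a range that does not start at byte 0 skips its first line, and every range keeps reading past its end to finish the last record it started. The function names and the 8-byte block size are assumptions chosen to keep the example small.

```python
def split_ranges(length, block_size):
    """Cut a file of `length` bytes into (start, end) ranges of block_size bytes."""
    return [(start, min(start + block_size, length))
            for start in range(0, length, block_size)]


def read_split(data, start, end):
    """Return the complete records that belong to the range [start, end)."""
    pos = start
    if start != 0:
        # The previous range finishes this line, so skip to the next one.
        newline = data.find(b"\n", pos)
        pos = len(data) if newline == -1 else newline + 1
    records = []
    # A record is owned by the range in which it starts (at or before `end`).
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        next_pos = len(data) if newline == -1 else newline + 1
        records.append(data[pos:next_pos].rstrip(b"\n"))
        pos = next_pos
    return records
```

For `data = b"alpha\nbravo\ncharlie\ndelta\n"` and 8-byte ranges, the four ranges yield 2, 1, 1, and 0 records respectively, and every record is read exactly once even though "bravo" and "charlie" straddle a boundary.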

Question: Which statement is true about the Reduce phase of MapReduce?

  • Output results are sent to the client program.
  • Data arrives from the Shuffle phase already sorted by key.
  • The Reducer phase sums up the values associated with each key.
  • Each Reduce task processes all the data for one key only.
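
As a small illustration of what a single reduce task sees, the sketch below sorts its incoming pairs by key, groups them, and calls a user-supplied function once per key. The name `reduce_task` and the in-memory `sorted` call are assumptions; in a real cluster the sorting happens during the shuffle rather than inside the reduce function.

```python
from itertools import groupby
from operator import itemgetter


def reduce_task(pairs, reducer):
    """Apply `reducer(key, values)` once per key, visiting keys in sorted order."""
    ordered = sorted(pairs, key=itemgetter(0))  # stand-in for the shuffle's sort
    return [
        (key, reducer(key, [value for _, value in group]))
        for key, group in groupby(ordered, key=itemgetter(0))
    ]
```

For example, `reduce_task([("b", 2), ("a", 1), ("b", 3), ("a", 4)], lambda key, values: max(values))` returns `[("a", 4), ("b", 3)]`: one task handles several keys, and the reduce function here computes a maximum rather than a sum.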

Question: Which statement is true about the Reduce phase of MapReduce?

  • Containers are used instead of slots in MRv1, and can be used with either Map or Reduce tasks in MRv2.
  • There is one JobTracker in the cluster.
  • MapReduce jobs written in Java for MRv1 never require recompilation.
  • Each job has an ApplicationManager that obtains Container IDs from the NodeManager.

Question: With YARN, long-running jobs acquire and retain fixed-size containers before execution starts. True or False?

  • False.
  • True.

Question: Which of the following statements is true?

  • The NameNode in Hadoop 2 is fully fault-tolerant, whereas in Hadoop 1 it was a single point of failure.
  • The NodeManager in Hadoop 2 replaces the TaskTracker in Hadoop 1.
  • YARN requires a minimum of two nodes, one master and one slave, to run.
  • Both MapReduce and YARN can scale to any cluster size.

Question: The command provides the CLASSPATH needed for compiling Java programs written for MapReduce or YARN. True or False?

  • False.
  • True.

Question: Which statement is true about MapReduce’s use of replication in HDFS?

  • Only one copy of each replicated block is processed by MapReduce in normal operation.
  • Speculative execution is normally performed on all copies of each “split.”
  • Each DataNode uses RAID to store its data.
  • Multiple copies of each record are kept on each node.

Question: On which file system (FS) is the output of a Mapper task stored?

  • Linux FS, and it is replicated 3 times.
  • HDFS, and it is replicated 3 times.
  • Linux FS, but it is not replicated.
  • HDFS, but it is not replicated.

Question: Which of the following statements is true?

  • You can set the number of Reducers.
  • The Shuffle phase is optional.
  • You can set the number of Mappers and the number of Reducers.
  • The number of Combiners is the same as the number of Reducers.
  • You can set the number of Mappers.

Question: What will a Hadoop job do if you try to run it with an output directory that is already present?

  • It will create new files, but with a different suffix.
  • It will create another directory to store the output.
  • It will erase all files in that directory before running.
  • It will not run.
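
For background, Hadoop's file-based output formats check the output path when a job is submitted and fail with a FileAlreadyExistsException if it is already there. The sketch below imitates that check on the local file system. The function name `submit_job` is hypothetical, and Python's built-in `FileExistsError` stands in for the Hadoop exception.

```python
import os


def submit_job(output_dir):
    """Refuse to start if output_dir exists; otherwise create it for the job."""
    if os.path.exists(output_dir):
        # Fail before any task runs, so existing results are never touched.
        raise FileExistsError(f"Output directory {output_dir} already exists")
    os.makedirs(output_dir)
    return output_dir
```

The first call with a fresh path succeeds and creates the directory; a second call with the same path raises the error without changing any files.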

Question: What are the main components of the ResourceManager in YARN? Select two.

  • Scheduler
  • JobTracker
  • DataManager
  • HDFS
  • ApplicationManager