
# Spark MLlib Quiz Answers – Cognitive Class

## Spark MLlib Quiz Answers for All Modules

Spark provides a machine learning library known as MLlib. It offers machine learning algorithms for classification, regression, clustering, and collaborative filtering, along with tools for featurization, pipelines, and persistence, and utilities for linear algebra, statistics, and data handling. This course starts you off on your journey and walks you through some of the machine learning libraries and how to use them.


### Module 1 – Spark MLlib Data Types

Question: Sparse Data generally contains many non-zero values, and few zero values.

• True
• False
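
To make the idea of sparse data concrete, here is a minimal sketch in plain Python, assuming a vector in which most entries are zero. It converts a dense list into a (size, indices, values) triple, which is the kind of information a sparse vector keeps. The helper name `to_sparse` is hypothetical and is not part of MLlib.

```python
def to_sparse(dense):
    """Return (size, indices, values), keeping only the non-zero entries."""
    indices = [i for i, v in enumerate(dense) if v != 0]
    values = [dense[i] for i in indices]
    return len(dense), indices, values


# Mostly zeros, so storing only the non-zero positions saves space.
dense = [0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 7.5, 0.0]
size, indices, values = to_sparse(dense)
print(size, indices, values)
```

The sparse form stores two positions and two values instead of eight entries; the saving grows with the share of zeros.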

Question: Local matrices are generally stored in distributed systems and rarely on single machines.

• True
• False

Question: Which of the following are distributed matrices?

• RowMatrix
• ColumnMatrix
• CoordinateMatrix
• SphericalMatrix
• RowMatrix and CoordinateMatrix
• All of the Above

### Module 2 – Review of Algorithms

Question: Logistic Regression is an algorithm used for predicting numerical values.

• True
• False
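
As a reminder of what logistic regression computes, the sketch below passes a weighted sum of the features through the sigmoid function to get a probability, then thresholds it to pick one of two classes. The weights, bias, and the helper name `predict_class` are made-up values for illustration, not output from a trained MLlib model.

```python
import math


def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_class(features, weights, bias, threshold=0.5):
    """Return (class label, probability of class 1) for one example."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    p = sigmoid(z)
    return (1 if p >= threshold else 0), p


# Hypothetical parameters; a real model would learn these from data.
label, prob = predict_class([2.0, 1.0], weights=[1.5, -0.5], bias=-1.0)
print(label, round(prob, 4))
```

The output is a class label derived from a probability, not a free-ranging numerical prediction.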

Question: The SVM algorithm maximizes the margins between the generated hyperplane and two clusters of data.

• True
• False

Question: Which of the following is true about Gaussian Mixture Clustering?

• The closer a data point is to a particular centroid, the more likely that data point is to be clustered with that centroid.
• The Gaussian of a centroid determines the probability that a data point is clustered with that centroid.
• The probability of a data point being clustered with a centroid is a function of distance from the point to the centroid.
• Gaussian Mixture Clustering uses multiple centroids to cluster data points.
• All of the Above

### Module 3 – Spark MLlib Decision Trees and Random Forests

Question: Which of the following is a stopping parameter in a Decision Tree?

• The number of nodes in the tree reaches a specific value.
• The depth of the tree reaches a specific value.
• The breadth of the tree reaches a specific value.
• All of the Above

Question: When using a regression type of Decision Tree or Random Forest, the value for impurity can be measured as either ‘entropy’ or ‘variance’.

• True
• False
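
For context, MLlib's tree documentation lists Gini and entropy as impurity measures for classification and variance for regression. The sketch below computes all three in plain Python on tiny made-up inputs; the function names are illustrative and are not MLlib calls.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def variance(values):
    """Population variance of a list of numeric targets."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


class_labels = [0, 0, 1, 1]        # categorical targets (classification)
targets = [1.0, 2.0, 3.0, 4.0]     # continuous targets (regression)
print(entropy(class_labels), gini(class_labels), variance(targets))
```

Entropy and Gini are defined over class frequencies, so they only make sense for categorical labels, whereas variance is defined over numeric targets.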

Question: In a Random Forest, featureSubsetStrategy is considered a stopping parameter, but not a tunable parameter.

• True
• False

### Module 4 – Spark MLlib Clustering

Question: In Spark MLlib, the initialization mode for the K-Means training method is called

• k-means--
• k-means++
• k-means||
• k-means

Question: In K-Means, the “runs” parameter determines the number of data points allowed in each cluster.

• True
• False

Question: In Gaussian Mixture Clustering, the sum of all values outputted from the “weights” function must equal 1.

• True
• False

### Final Exam

Question: In Gaussian Mixture Clustering, the predictSoft function provides membership values from the top three Gaussians only.

• True
• False
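
A soft assignment in a Gaussian mixture gives every component a membership value. The following is a minimal one-dimensional sketch, assuming made-up weights, means, and standard deviations; `soft_memberships` is a hypothetical helper that mimics the idea behind a soft prediction and is not MLlib's implementation.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    coeff = 1.0 / (std * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-0.5 * ((x - mean) / std) ** 2)


def soft_memberships(x, weights, means, stds):
    """Return one membership probability per Gaussian for the point x."""
    weighted = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(weighted)
    return [v / total for v in weighted]


# Hypothetical mixture with four components; the weights sum to 1.
weights = [0.5, 0.3, 0.1, 0.1]
means = [0.0, 5.0, 10.0, 15.0]
stds = [1.0, 1.0, 2.0, 2.0]

memberships = soft_memberships(1.0, weights, means, stds)
print(memberships)
```

The result has one entry per component, and the entries sum to 1 because each weighted density is divided by the total.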

Question: In Decision Trees, what is true about the size of a dataset?

• Large datasets create “bins” on splits, which can be specified with the maxBins parameter.
• Large datasets sort feature values, then use the ordered values as split calculations.
• Small datasets create split candidates based on quantile calculations on a sample of the data.
• Small datasets split on random values for the feature.

Question: A Logistic Regression algorithm is ineffective as a binary response predictor.

• True
• False

Question: What is the Row Pointer for a Matrix with the following Row Indices: [5, 1 | 6 | 2, 8, 10]

• [1, 6]
• [0, 2, 3, 6]
• [0, 2, 3, 5]
• [2, 3]
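
One way to work this out, assuming the vertical bars separate the entries that belong to consecutive groups, is to take running totals of the group sizes, starting from 0. The sketch below does that in plain Python; `pointers_from_groups` is a hypothetical helper name. In a SciPy `csc_matrix`, the analogous array is called `indptr`.

```python
from itertools import accumulate


def pointers_from_groups(groups):
    """Return pointer offsets: 0 followed by cumulative group sizes."""
    return [0] + list(accumulate(len(g) for g in groups))


# The question's indices, split at the vertical bars.
groups = [[5, 1], [6], [2, 8, 10]]
row_pointers = pointers_from_groups(groups)
print(row_pointers)  # [0, 2, 3, 6]
```

Each pointer marks where a group starts in the flat index array, and the final entry equals the total number of stored values.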

Question: For multiclass classification, try to use (M-1) Decision Tree split candidates whenever possible.

• True
• False

Question: In a Decision Tree, choosing a very large maxDepth value can:

• Increase accuracy
• Increase the risk of overfitting to the training set
• Increase the cost of training
• All of the Above
• Increase the risk of overfitting and increase the cost of training

Question: In Gaussian Mixture Clustering, a large value returned from the weights function represents a large precedence of that Gaussian.

• True
• False

Question: Increasing the value of epsilon when creating the K-Means Clustering model can:

• Decrease training cost and decrease the number of iterations that the model undergoes
• Decrease training cost and increase the number of iterations that the model undergoes
• Increase training cost and decrease the number of iterations that the model undergoes
• Increase training cost and increase the number of iterations that the model undergoes
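
In MLlib, epsilon is the distance threshold within which the cluster centers are considered to have converged. The sketch below is a simplified one-dimensional Lloyd's iteration in plain Python, not MLlib's implementation, with made-up points and starting centers; it reports how many iterations run before the largest center shift drops below epsilon.

```python
def kmeans_1d(points, centers, epsilon, max_iterations=100):
    """Run 1-D k-means until the largest center shift is below epsilon."""
    centers = list(centers)
    for iteration in range(1, max_iterations + 1):
        # Assign each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Recompute centers; an empty cluster keeps its old center.
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        shift = max(abs(n - o) for n, o in zip(new_centers, centers))
        centers = new_centers
        if shift < epsilon:
            return centers, iteration
    return centers, max_iterations


points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
start = [0.0, 4.0]  # hypothetical initial centers

_, iters_tight = kmeans_1d(points, start, epsilon=1e-4)
_, iters_loose = kmeans_1d(points, start, epsilon=10.0)
print(iters_tight, iters_loose)
```

Because both runs follow the same sequence of center updates, the run with the larger epsilon can never stop later than the run with the smaller one.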

Question: In order to train a machine learning model in Spark MLlib, the dataset must be in the form of a(n)

• Python List
• Textfile
• CSV file
• RDD

Question: What is true about Dense and Sparse Vectors?

• A Dense Vector can be created using a csc_matrix, and a Sparse Vector can be created using a Python List.
• A Dense Vector can be created using a SciPy csc_matrix, and a Sparse Vector can be created using a SciPy NumPy Array.
• A Dense Vector can be created using a Python List, and a Sparse Vector can be created using a SciPy csc_matrix.
• A Dense Vector can be created using a SciPy NumPy Array, and a Sparse Vector can be created using a Python List.

Question: In a Decision Tree, increasing the maxBins parameter allows for more splitting candidates.

• True
• False
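
To illustrate how the number of bins relates to the number of split candidates, here is a simplified sketch that picks thresholds at evenly spaced quantiles of a sample. It assumes a single continuous feature, and `split_candidates` is a hypothetical helper; MLlib's actual binning logic differs in its details.

```python
def split_candidates(sample, max_bins):
    """Return sorted candidate thresholds at max_bins - 1 quantile boundaries."""
    ordered = sorted(sample)
    n = len(ordered)
    thresholds = {ordered[(i * n) // max_bins] for i in range(1, max_bins)}
    return sorted(thresholds)


sample = list(range(1, 101))  # made-up feature values 1..100
few = split_candidates(sample, max_bins=4)
many = split_candidates(sample, max_bins=10)
print(len(few), few)
print(len(many), many)
```

With 4 bins there are 3 candidate thresholds and with 10 bins there are 9, so a larger bin count lets the tree consider finer-grained splits at a higher computation cost.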

Question: In classification models, the value for the numClasses parameter does not depend on the data, and can change to increase model accuracy.

• True
• False

Question: What is true about Labeled Points?

• A – A labeled point is used with supervised machine learning, and can be made using a dense local vector.
• B – A labeled point is used with unsupervised machine learning, and can be made using a dense local vector.
• C – A labeled point is used with supervised machine learning, and can be made using a sparse local vector.
• D – A labeled point is used with unsupervised machine learning, and can be made using a sparse local vector.
• All of the Above
• A and C only

Question: In the Gaussian Mixture Clustering model, the convergenceTol value is a stopping parameter that can be tuned, similar to epsilon in k-means clustering.

• True
• False

Question: In Gaussian Mixture Clustering, the “Gaussians” function outputs the coordinates of the largest Gaussian, as well as the standard deviation for each Gaussian in the mixture.

• True
• False

Question: What is true about the maxDepth parameter for Random Forests?

• A large maxDepth value is preferred since tree averaging yields a decrease in overall bias.
• A large maxDepth value is preferred since tree averaging yields a decrease in overall variance.
• A large maxDepth value is preferred since tree averaging yields an increase in overall bias.
• A large maxDepth value is preferred since tree averaging yields an increase in overall variance.

### Conclusion:

We hope these answers to the Spark MLlib quiz were useful. If Queslers helped you find the correct answers, bookmark our site for more course quiz answers.

If the options in your quiz differ from those listed here, let us know in the comments below.

##### Course Review:

We suggest enrolling in this course to gain new skills from professionals, completely free; in our experience, it is worth it.

This course is available on Cognitive Class for free. If you get stuck on a quiz or graded assessment, visit Queslers for quiz answers and coding solutions.
