# Data Science with Scala Quiz Answers

## Get Data Science with Scala Quiz Answers

Put your Scala knowledge to good use by tackling Big Data analytics problems. Learn to leverage the integration of Apache Spark™ and Scala. Learn how to use Spark’s machine learning pipelines to fit models and search for optimal hyperparameters using Scala in a Spark cluster.

Module 1: Basic Statistics and Data Types

Question: You import MLlib’s vectors from?

• org.apache.spark.mllib.TF
• org.apache.spark.mllib.numpy
• org.apache.spark.mllib.linalg
• org.apache.spark.mllib.pandas

Question: Select the types of distributed matrices:

• RowMatrix
• IndexedRowMatrix
• CoordinateMatrix

Question: How would you calculate the mean of the following?

`val observations: RDD[Vector] = sc.parallelize(Array(`

`Vectors.dense(1.0, 2.0),`

`Vectors.dense(4.0, 5.0),`

`Vectors.dense(7.0, 8.0)))`

`val summary: MultivariateStatisticalSummary = Statistics.colStats(observations)`

• summary.normL1
• summary.numNonzeros
• summary.mean
• summary.normL2
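
As a rough illustration of what a column-wise mean over these observations amounts to, here is a minimal plain-Scala sketch with no Spark dependency. The object and method names (`ColumnMean`, `colMeans`) are hypothetical and this is not how Spark implements `colStats`; it only shows the arithmetic on the same three rows.

```scala
// Plain Scala sketch (assumption: no Spark on the classpath).
object ColumnMean {
  // Each inner Array is one observation (row); all rows must share a length.
  def colMeans(rows: Seq[Array[Double]]): Array[Double] = {
    require(rows.nonEmpty, "need at least one row")
    val n = rows.head.length
    require(rows.forall(_.length == n), "rows must have equal length")
    val sums = new Array[Double](n)
    // Accumulate a running sum per column.
    for (row <- rows; j <- 0 until n) sums(j) += row(j)
    sums.map(_ / rows.size)
  }
}
```

For the rows `(1.0, 2.0)`, `(4.0, 5.0)` and `(7.0, 8.0)`, `ColumnMean.colMeans` gives one mean per column, which is the kind of per-column vector a statistical summary reports.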

Question: What task do the following lines of code perform?

`import org.apache.spark.mllib.random.RandomRDDs._`

`val million = poissonRDD(sc, mean=1.0, size=1000000L, numPartitions=10)`

• calculate the variance
• calculate the mean
• generate random samples

Question: MLlib uses the compressed sparse column format for sparse matrices; as such, it only keeps the non-zero entries?

• True
• False
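
To make the compressed sparse column (CSC) layout concrete, the sketch below converts a small dense matrix into the three arrays that CSC uses: column pointers, row indices, and non-zero values. It is plain Scala under stated assumptions; `Csc`, `CscMatrix` and `fromDense` are hypothetical names, not Spark classes.

```scala
import scala.collection.mutable.ArrayBuffer

// Plain Scala sketch of the CSC layout (assumption: illustration only, not Spark code).
object Csc {
  final case class CscMatrix(
      numRows: Int,
      numCols: Int,
      colPtrs: Array[Int],    // numCols + 1 entries; column j spans colPtrs(j) until colPtrs(j + 1)
      rowIndices: Array[Int], // row index of each stored value
      values: Array[Double])  // non-zero entries only, column by column

  def fromDense(rows: Array[Array[Double]]): CscMatrix = {
    val numRows = rows.length
    val numCols = if (numRows == 0) 0 else rows(0).length
    val colPtrs = new Array[Int](numCols + 1)
    val rowIdx = ArrayBuffer.empty[Int]
    val vals = ArrayBuffer.empty[Double]
    for (j <- 0 until numCols) {
      // Walk down column j and keep only the non-zero entries.
      for (i <- 0 until numRows if rows(i)(j) != 0.0) {
        rowIdx += i
        vals += rows(i)(j)
      }
      colPtrs(j + 1) = vals.length
    }
    CscMatrix(numRows, numCols, colPtrs, rowIdx.toArray, vals.toArray)
  }
}
```

Note that the column-pointer array has one more entry than the number of columns, and an empty column repeats the previous pointer value.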

Module 2: Preparing Data

Question: For a DataFrame object, the method describe calculates the?

• count
• mean
• standard deviation
• max
• min
• all of the above

Question: Which line of code drops the rows that contain null values? Select the best answer.

• val dfnan = df.withColumn("nanUniform", halfTonNaN(df("uniform")))
• dfnan.na.replace("uniform", Map(Double.NaN -> 0.0))
• dfnan.na.drop(minNonNulls = 3)
• dfnan.na.fill(0.0)

Question: What task do the following lines of code perform?

`val lr = new LogisticRegression()`

`lr.setMaxIter(10).setRegParam(0.01)`

`val model1 = lr.fit(training)`

• perform one hot encoding
• Train a linear regression model
• Train a Logistic regression model
• Perform PCA on the data

Question: The StandardScalerModel transforms the data such that?

• each feature has a max value of 1
• each feature is orthogonal
• each feature has unit standard deviation and zero mean
• each feature has a min value of -1
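
Standardization of a single feature can be sketched in a few lines of plain Scala. This is an illustration, not Spark's code: `Standardize` is a hypothetical name, and the use of the corrected sample standard deviation (dividing by n - 1) is an assumption of this sketch.

```scala
// Plain Scala sketch: rescale one feature column to zero mean and unit standard deviation.
object Standardize {
  def standardize(xs: Array[Double]): Array[Double] = {
    require(xs.length > 1, "need at least two values")
    val mean = xs.sum / xs.length
    // Assumption: corrected sample variance (n - 1 in the denominator).
    val variance = xs.map(x => (x - mean) * (x - mean)).sum / (xs.length - 1)
    val std = math.sqrt(variance)
    // A constant column has no spread; map it to zeros instead of dividing by zero.
    if (std == 0.0) xs.map(_ => 0.0) else xs.map(x => (x - mean) / std)
  }
}
```

For example, the column `1.0, 4.0, 7.0` has mean 4 and sample standard deviation 3, so it standardizes to `-1.0, 0.0, 1.0`.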

Module 3: Feature Engineering

Question: Spark ML works with?

• tensors
• vectors
• dataframes
• lists

Question: The function `IndexToString()` performs one-hot encoding?

• True
• False
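
`IndexToString` maps label indices back to their original strings, which is a different operation from one-hot encoding. The plain-Scala sketch below contrasts the two ideas; `LabelCodec` and its methods are hypothetical names, and breaking frequency ties alphabetically is a choice made for this example only.

```scala
// Plain Scala sketch contrasting index-to-string lookup with one-hot encoding.
object LabelCodec {
  // Build a vocabulary ordered by descending frequency (ties broken alphabetically here).
  def fitLabels(labels: Seq[String]): Array[String] =
    labels.groupBy(identity).toSeq
      .sortBy { case (label, occurrences) => (-occurrences.size, label) }
      .map(_._1)
      .toArray

  // Index-to-string: recover the original label from its numeric index.
  def indexToString(vocab: Array[String], index: Int): String = vocab(index)

  // One-hot: a vector with a single 1.0 at the position of the index.
  def oneHot(index: Int, size: Int): Array[Double] =
    Array.tabulate(size)(i => if (i == index) 1.0 else 0.0)
}
```

With the labels `a, b, a, c, a, b`, the vocabulary is `a, b, c`; index 1 converts back to `"b"`, while its one-hot form is `0.0, 1.0, 0.0`.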

Question: Principal Component Analysis is primarily used for?

• to convert categorical variables to integers
• to predict discrete values
• dimensionality reduction

Question: One important step prior to using PCA is?

• making sure every feature is not correlated
• taking the log of your data
• subtracting the mean

Module 4: Fitting a Model

Question: You can use decision trees for?

• regression
• classification
• classification and regression
• data normalization

Question: The following line of code: `val Array(trainingData, testData) = data.randomSplit(Array(0.7, 0.3))`

• split the data into training and testing data
• train the model
• use 70% of the data for testing
• use 30% of the data for training
• make a prediction
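
The idea behind a weighted random split can be sketched in plain Scala: each element is assigned independently to one side with a fixed probability, so the resulting sizes are only approximately 70/30. `RandomSplit.split` is a hypothetical helper written for this illustration, not a Spark method.

```scala
import scala.util.Random

// Plain Scala sketch of a randomized train/test split (assumption: in-memory Seq, no Spark).
object RandomSplit {
  def split[A](data: Seq[A], trainFraction: Double, seed: Long): (Seq[A], Seq[A]) = {
    require(trainFraction >= 0.0 && trainFraction <= 1.0, "fraction must be in [0, 1]")
    val rng = new Random(seed)
    // Draw one uniform number per element; below the threshold means "training".
    val tagged = data.map(x => (x, rng.nextDouble() < trainFraction))
    val train = tagged.collect { case (x, true) => x }
    val test = tagged.collect { case (x, false) => x }
    (train, test)
  }
}
```

Calling `RandomSplit.split(1 to 1000, 0.7, 42L)` returns two sequences that together contain every element exactly once; fixing the seed makes the split reproducible.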

Question: In the Random Forest Classifier, .setNumTrees()?

• sets the max depth of trees
• sets the minimum number of classes before a split
• sets the number of trees

Question: Elastic net regularization uses?

• L0-norm
• L1-norm
• L2-norm
• a convex combination of the L1 norm and L2 norm
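
The elastic net penalty can be written out directly. The sketch below assumes the common parameterization in which a mixing weight `alpha` in [0, 1] blends the L1 norm with half the squared L2 norm, scaled by `regParam`; the object and parameter names are chosen for this example.

```scala
// Plain Scala sketch of an elastic net penalty term.
// Assumed form: regParam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2).
object ElasticNet {
  def penalty(w: Array[Double], regParam: Double, alpha: Double): Double = {
    require(alpha >= 0.0 && alpha <= 1.0, "alpha must be in [0, 1]")
    val l1 = w.map(x => math.abs(x)).sum // L1 norm
    val l2Squared = w.map(x => x * x).sum // squared L2 norm
    regParam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2Squared)
  }
}
```

Setting `alpha = 1.0` leaves only the L1 (lasso) term and `alpha = 0.0` leaves only the L2 (ridge) term; values in between mix the two.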

Module 5: Pipeline and Grid Search

Question: What task does the following code perform: `withColumn("paperscore", data("A2") * 4 + data("A") * 3)`?

• add 4 columns to A2
• add 3 columns to A1
• add 4 to each element in column A2
• assign a higher weight to A2 and A journals

Question: In an estimator?

• there is no need to call the method fit
• the fit function is called
• only the transform function is called

Question: Which is not a valid type of Evaluator in MLlib?

• RegressionEvaluator
• MultiClassClassificationEvaluator
• MultiLabelClassificationEvaluator
• BinaryClassificationEvaluator
• All are valid

Question: In the following lines of code, the last transform in the pipeline is a:

`val rf = new RandomForestClassifier().setFeaturesCol("assembled").setLabelCol("status").setSeed(42)`

`import org.apache.spark.ml.Pipeline`

`val pipeline = new Pipeline().setStages(Array(value_band_indexer,category_indexer,label_indexer,assembler,rf))`

• principal component analysis
• Vector Assembler
• String Indexer
• Random Forest Classifier

Final Exam

Question: What is not true about labeled points?

• They are used in unsupervised machine learning algorithms
• They associate sparse vectors with a corresponding label/response
• They associate dense vectors with a corresponding label/response
• All are true
• None are true

Question: Which is true about column pointers in sparse matrices?

• By themselves, they do not represent the specific physical location of a value in the matrix
• They never repeat values
• They have the same number of values as the number of columns
• All are true
• None are true

Question: What is the name of the most basic type of distributed matrix?

• SparseMatrix
• RowMatrix
• IndexedRowMatrix
• SimpleMatrix
• CoordinateMatrix

Question: A perfect correlation is represented by what value?

• 1
• 100
• 0
• -1
• 3

Question: A MinMaxScaler is a transformer which:

• Makes zero values remain untransformed
• Rescales each feature to a specific range
• Takes no parameters
• All are true
• None are true
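
Min-max rescaling of one feature can be sketched as follows. This is plain Scala for illustration; `MinMax.rescale` is a hypothetical helper, and mapping a constant column to the midpoint of the target range is an assumption of this sketch.

```scala
// Plain Scala sketch: linearly rescale a feature column into the range [lo, hi].
object MinMax {
  def rescale(xs: Array[Double], lo: Double = 0.0, hi: Double = 1.0): Array[Double] = {
    require(xs.nonEmpty, "need at least one value")
    val eMin = xs.min
    val eMax = xs.max
    // Assumption: a constant column is mapped to the midpoint of the target range.
    if (eMax == eMin) xs.map(_ => 0.5 * (lo + hi))
    else xs.map(x => (x - eMin) / (eMax - eMin) * (hi - lo) + lo)
  }
}
```

Because every value is shifted and scaled, a zero in the input generally does not stay zero: `MinMax.rescale(Array(0.0, 5.0, 10.0), 1.0, 2.0)` starts at `1.0`.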

Question: Which is not a supported Random Data Generation distribution?

• Uniform
• Poisson
• Delta
• Exponential
• Normal

Question: Sampling without replacement means:

• The expected size of the sample is unknown
• The expected size of the sample is the same as the RDD's size
• The expected number of times each element is chosen is randomized
• The expected number of times each element is chosen
• The expected size of the sample is a fraction of the RDD's size

Question: What are the supported types of hypothesis testing?

• Pearson’s Chi-Squared Test for independence
• Kolmogorov-Smirnov test for equality of distribution
• Pearson’s Chi-Squared Test for goodness of fit
• All are supported
• None are supported

Question: For Kernel Density Estimation, which kernel is supported by Spark?

• KernelDensity
• KDEMultivariate
• KDEUnivariate
• Gaussian
• All are supported

Question: Which DataFrames statistics method computes the pairwise frequency table of the given columns?

• crosstab()
• cov()
• freqItems()
• corr()
• pairwiseFreq()
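
A pairwise frequency table simply counts how often each combination of two column values occurs. A minimal plain-Scala sketch of that counting, using a hypothetical `CrossTab.crosstab` helper over in-memory pairs rather than a DataFrame:

```scala
// Plain Scala sketch: count occurrences of each (first column, second column) pair.
object CrossTab {
  def crosstab[A, B](pairs: Seq[(A, B)]): Map[(A, B), Int] =
    pairs.groupBy(identity).map { case (pair, occurrences) => pair -> occurrences.size }
}
```

For the pairs `("a", 1)`, `("a", 1)` and `("b", 2)`, the table holds a count of 2 for `("a", 1)` and 1 for `("b", 2)`.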

Question: Which is not true about the fill method for DataFrame NA functions?

• It is used for replacing nil values
• It is used for replacing null values
• It is used for replacing NaN values
• All are true
• None are true

Question: Which transformer listed below is used for Natural Language Processing?

• Normalizer
• ElementwiseProduct
• StandardScaler
• OneHotEncoder
• None are used for Natural Language processing

Question: Which is true about the Mahalanobis Distance?

• It is measured along each Principal Component axis
• It is a scale-variant distance
• It is a multi-dimensional generalization of measuring how many standard deviations a point is away from the median
• It has units of distance
• It does not take into account the correlations of the dataset
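
For intuition, the Mahalanobis distance in its usual definition is the square root of (x - mean)ᵀ Σ⁻¹ (x - mean), where Σ is the covariance matrix. The sketch below works this out for the two-dimensional case with the 2x2 inverse written by hand; `Mahalanobis.distance2d` is a hypothetical helper, restricted to 2D to stay self-contained.

```scala
// Plain Scala sketch: Mahalanobis distance for 2-dimensional points.
object Mahalanobis {
  // cov is a 2x2 covariance matrix given as rows: Array(Array(a, b), Array(c, d)).
  def distance2d(x: Array[Double], mean: Array[Double], cov: Array[Array[Double]]): Double = {
    val (a, b, c, d) = (cov(0)(0), cov(0)(1), cov(1)(0), cov(1)(1))
    val det = a * d - b * c
    require(det != 0.0, "covariance matrix must be invertible")
    val dx = x(0) - mean(0)
    val dy = x(1) - mean(1)
    // (x - mean)^T * inverse(cov) * (x - mean), using inverse = [[d, -b], [-c, a]] / det.
    val squared = (dx * (d * dx - b * dy) + dy * (a * dy - c * dx)) / det
    math.sqrt(squared)
  }
}
```

With covariance `[[4, 0], [0, 1]]` (standard deviations 2 and 1 along the axes), the point `(2, 0)` is at distance 1 from the origin and `(0, 2)` is at distance 2, so the result counts standard deviations rather than raw units.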

Question: Which is true about OneHotEncoder?

• It must be told which column is its input
• It creates a Sparse Vector
• It must be told which column to create for its output
• All are true
• None are true

Question: Principal Component Analysis is:

• Never used for feature engineering
• Used for supervised machine learning
• A dimension reduction technique
• All are true
• None are true

Question: MLlib’s implementation of decision trees:

• Partitions data by rows, allowing distributed training
• Does not support regressions
• Supports only continuous features
• Supports only multiclass classification
• None are true

Question: Which is not a tunable of SparkML decision trees?

• minInfoGain
• maxBins
• maxMemoryInMB
• minDepth
• minInstancesPerNode

Question: Which is true about Random Forests?

• They combine many decision trees in order to reduce the risk of overfitting
• They support non-categorical features
• They only support binary classification
• They do not support regression
• None are true

Question: When comparing Random Forests versus Gradient-Boosted Trees, what must you consider?

• Parallelization abilities
• Depth of Trees
• How the number of trees affects the outcome
• All of these
• None of these

Question: Which is not a valid type of Evaluator in MLlib?

• MultiClassClassificationEvaluator
• MultiLabelClassificationEvaluator
• BinaryClassificationEvaluator
• RegressionEvaluator
• All are valid

### Conclusion:

We hope you now know the correct answers to Data Science with Scala. If Queslers helped you find the correct answers, bookmark our site for more course quiz answers.

If the options are not the same, let us know in the comments below.

##### Course Review:

We suggest you enroll in this course and gain new skills from professionals, completely free; we assure you it will be worth it.

This course is available on Cognitive Class for free. If you are stuck on a quiz or graded assessment, visit Queslers for quiz answers and coding solutions.
