# Data Science for Non-Programmers Educative Quiz Answers

## Get Data Science for Non-Programmers Educative Quiz Answers

Ready to move past Excel for complex business analysis? Then you’ll find this course very helpful.

This hands-on introductory Data Science course is aimed at professionals and students who don’t have any experience with programming. It will help you advance your career by preparing you to conduct meaningful data analysis in Python on any dataset, large or small.

You’ll begin with the fundamentals of Python, with a focus on working with CSV files, and cover concepts like data preprocessing and Exploratory Data Analysis (EDA). In the second half, you’ll focus on predictive and inferential analysis using statistical and machine learning techniques, and learn how these techniques can help solve business problems.

#### Exercise 1: Average of a List

```python
def average(input_list):
    # Add up every item in the list
    sum_list = 0
    for i in input_list:
        sum_list = sum_list + i

    # Divide the total by the number of items
    avg = sum_list / len(input_list)
    return avg
```
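
As a quick sanity check, the result can be compared with `statistics.mean` from the standard library. This is only a sketch: the sample list is made up for illustration and is not part of the course.

```python
from statistics import mean

def average(input_list):
    # Same logic as the exercise solution: sum the items, then divide by the count.
    sum_list = 0
    for i in input_list:
        sum_list = sum_list + i
    return sum_list / len(input_list)

sample = [2, 4, 6, 8]  # hypothetical input, not from the course
result = average(sample)
print(result)                  # 5.0
print(result == mean(sample))  # True
```

Note that an empty list makes `len(input_list)` zero, so the division raises `ZeroDivisionError`.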

#### Exercise 2: Factorial of a Number

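```python
def factorial(n):
    # 0! and 1! are both 1
    if n == 0 or n == 1:
        return 1
    # Negative input is invalid; signal it with -1
    if n < 1:
        return -1

    # Multiply n * (n-1) * ... * 2
    product = 1
    while n > 1:
        product = product * n
        n = n - 1

    return product
```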
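
One way to check a solution like this is to compare it with `math.factorial` for a few small inputs. The sketch below repeats the function so that it runs on its own; the test range is an arbitrary choice.

```python
import math

def factorial(n):
    # Mirrors the exercise solution: -1 signals an invalid (negative) input.
    if n == 0 or n == 1:
        return 1
    if n < 1:
        return -1
    product = 1
    while n > 1:
        product = product * n
        n = n - 1
    return product

# Compare against the standard library for a few non-negative integers.
for k in range(0, 8):
    assert factorial(k) == math.factorial(k)

print(factorial(5))   # 120
print(factorial(-3))  # -1
```

`math.factorial` raises `ValueError` for negative numbers, whereas this exercise returns `-1` instead.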

#### Quiz 1:

Q1. A Dataframe is a 2-Dimensional object to store tabular data.

• True
• False

Q2. Suppose we have a `Gender` column in our dataframe (`df`) which has the values `Male` and `Female`. Which of these will give us a filtered dataframe of males? Select all answers you think are correct.

• Option 1
``df = df['Male']``
• Option 2
``df = df[df['Male']]``
• Option 3
```python
condition = df['Gender'] == 'Male'
df = df[condition]
```
• Option 4
```python
df = df[df['Gender'] == 'Male']
```
• Option 5
```python
condition = df['Gender'] != 'Female'
df = df[condition]
```
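
To see why only the boolean-mask options filter rows, here is a minimal sketch on a made-up dataframe. The `Name` column and all values are hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid", "Dee"],
    "Gender": ["Female", "Male", "Male", "Female"],
})

# A boolean mask keeps the rows where the condition is True.
mask = df["Gender"] == "Male"
males = df[mask]
print(males["Name"].tolist())  # ['Bob', 'Cid']

# df['Male'] looks for a *column* called 'Male', which does not exist here.
try:
    df["Male"]
except KeyError:
    print("KeyError: no column named 'Male'")
```

The `!= 'Female'` variant gives the same rows only because the column is assumed to contain exactly the two values `Male` and `Female`, with no missing entries.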

Q3. Which of these can be used to set the value of the first cell in the `Age` column to 23 if `Age` is the first column in the dataset? Select all answers you think are correct.

• Option 1
```python
df[0,'Age'] = 23
```
• Option 2
```python
df.loc[0,'Age'] = 23
```
• Option 3
```python
df.iloc[0,'Age'] = 23
```
• Option 4
``df.iloc[0,0] = 23``
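
The difference between label-based `loc` and position-based `iloc` can be tried out on a small made-up dataframe. This is a sketch that assumes the default integer index, so row label 0 is also row position 0.

```python
import pandas as pd

df = pd.DataFrame({"Age": [30, 41, 52], "City": ["A", "B", "C"]})  # hypothetical data

df.loc[0, "Age"] = 23   # label-based: row label 0, column label 'Age'
df.iloc[1, 0] = 24      # position-based: second row, first column

ages = df["Age"].tolist()
print(ages)  # [23, 24, 52]

# iloc accepts only integer positions, so a column name is rejected.
# The exact exception type may vary between pandas versions.
try:
    df.iloc[0, "Age"] = 99
except (ValueError, IndexError, TypeError) as err:
    print("iloc rejected the column label:", type(err).__name__)
```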

Q4. Which of the following are aggregation functions, i.e., functions that take in a series and return a single value? Select all answers you think are correct.

• `min`
• `mean`
• `sum`
• `groupby`
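
A short sketch of the distinction, using made-up values: `min`, `mean` and `sum` each reduce a series to one number, while `groupby` on its own only splits the data and returns a grouping object that still needs an aggregation.

```python
import pandas as pd

s = pd.Series([3, 1, 4, 1, 5])  # hypothetical values

print(s.min())   # 1
print(s.mean())  # 2.8
print(s.sum())   # 14

df = pd.DataFrame({"key": ["a", "b", "a"], "val": [1, 2, 3]})
grouped = df.groupby("key")
print(type(grouped).__name__)          # DataFrameGroupBy
print(grouped["val"].sum().to_dict())  # {'a': 4, 'b': 2}
```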

Q5. The `apply` function is used to apply custom functions to the data.

• True
• False
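
As an illustration, the sketch below applies a custom function to every value of a column. The `age_group` rule and the data are invented for this example.

```python
import pandas as pd

df = pd.DataFrame({"Age": [15, 34, 70]})  # hypothetical data

def age_group(age):
    # Custom rule, invented for this example.
    if age < 18:
        return "minor"
    if age < 65:
        return "adult"
    return "senior"

# apply calls age_group once per value in the column.
df["Group"] = df["Age"].apply(age_group)
print(df["Group"].tolist())  # ['minor', 'adult', 'senior']
```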

Q6. We can NOT group data for more than one variable.

• True
• False
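
Grouping by more than one variable works by passing a list of column names to `groupby`. A minimal sketch with made-up columns:

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Female", "Male"],
    "City": ["X", "Y", "X", "X", "X"],
    "Spend": [10, 20, 30, 40, 50],
})

# One group per (Gender, City) combination that occurs in the data.
totals = df.groupby(["Gender", "City"])["Spend"].sum()
print(totals.to_dict())
# {('Female', 'X'): 70, ('Male', 'X'): 60, ('Male', 'Y'): 20}
```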

Q7. Both `groupby` and `pivot_table` are used for summarizing data.

• True
• False
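
The two can produce the same summary in different shapes. In the sketch below (made-up data), `groupby` returns a long series indexed by both variables, while `pivot_table` spreads the second variable across the columns.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Female", "Male"],
    "City": ["X", "Y", "X", "X", "X"],
    "Spend": [10, 20, 30, 40, 50],
})

by_group = df.groupby(["Gender", "City"])["Spend"].mean()
table = df.pivot_table(index="Gender", columns="City", values="Spend", aggfunc="mean")

print(by_group.loc[("Male", "X")])  # 30.0
print(table.loc["Male", "X"])       # 30.0
print(table.shape)                  # (2, 2)
```

Combinations that never occur in the data, such as `Female` in city `Y` here, show up as `NaN` in the pivot table.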

Q8.

``df.plot(kind = 'box',subplots = True, sharex=False, sharey = False)``

In the above use of the `plot` function, `subplots=True` tells the function to arrange all boxplots in rows and columns inside a group of plots.

• True
• False

#### Exercise 3: Cleaning NYC Property Sales

##### Change values:

```python
def change_values(df):
    # Replace each numeric borough code with its name
    condition = df['BOROUGH'] == 1
    df.loc[condition, 'BOROUGH'] = 'Manhattan'

    condition = df['BOROUGH'] == 2
    df.loc[condition, 'BOROUGH'] = 'Bronx'

    condition = df['BOROUGH'] == 3
    df.loc[condition, 'BOROUGH'] = 'Brooklyn'

    condition = df['BOROUGH'] == 4
    df.loc[condition, 'BOROUGH'] = 'Queens'

    condition = df['BOROUGH'] == 5
    df.loc[condition, 'BOROUGH'] = 'Staten Island'

    return df
```
##### Missing values:

```python
def remove_missing(df):
    # Keep only the rows where SALE PRICE is present
    present = df['SALE PRICE'].notnull()
    df = df[present]
    return df
```
##### Duplicate values:

```python
def remove_duplicates(df):
    # A row is a duplicate only if it matches another row in every column
    df = df.drop_duplicates(subset=df.columns)
    return df
```
##### Outliers:

```python
def remove_outliers(df):
    # Retrieve only the columns to check for outliers
    new_df = df[['RESIDENTIAL UNITS', 'COMMERCIAL UNITS', 'TOTAL UNITS',
                 'LAND SQUARE FEET', 'GROSS SQUARE FEET', 'YEAR BUILT']]

    # Find max and min with an IQR-style rule.
    # Note: Q1 and Q3 here are the 10th and 90th percentiles,
    # not the usual 25th and 75th quartiles.
    Q1 = new_df.quantile(0.10)
    Q3 = new_df.quantile(0.90)
    IQR = Q3 - Q1
    minimum = Q1 - 1.5 * IQR
    maximum = Q3 + 1.5 * IQR

    # Condition on which to filter: every checked column must be in range
    condition = (new_df <= maximum) & (new_df >= minimum)
    condition = condition.all(axis=1)

    # Keep only the rows that have no outliers
    df = df[condition]

    return df
```
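
To see the fence rule in action, here is a sketch on a single made-up column. The helper `remove_outliers_by_fence` is hypothetical: it follows the same steps as the solution above but takes the columns and quantile levels as parameters, which is an assumption of this example and not part of the course.

```python
import pandas as pd

def remove_outliers_by_fence(df, columns, lower_q=0.10, upper_q=0.90):
    # Hypothetical generalisation of the solution above: the columns and
    # quantile levels are parameters instead of being hard-coded.
    subset = df[columns]
    low = subset.quantile(lower_q)
    high = subset.quantile(upper_q)
    spread = high - low
    minimum = low - 1.5 * spread
    maximum = high + 1.5 * spread
    # Keep a row only if every checked column lies inside the fences.
    keep = ((subset >= minimum) & (subset <= maximum)).all(axis=1)
    return df[keep]

toy = pd.DataFrame({"TOTAL UNITS": [1, 2, 2, 3, 3, 3, 4, 4, 5, 500]})  # made-up values
cleaned = remove_outliers_by_fence(toy, ["TOTAL UNITS"])
print(len(toy), len(cleaned))        # 10 9
print(cleaned["TOTAL UNITS"].max())  # 5
```

Passing `lower_q=0.25, upper_q=0.75` would give the conventional interquartile-range fences, which are narrower and therefore remove more rows in general.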

#### Quiz 2: Analyzing Individual Quantities

Q1. What is the mean of `LIMIT_BAL`?

• 176488
• 160000
• 167488
• 170000

Q2. How many times do `LIMIT_BAL` values appear in the interval (100000.0, 200000.0] ?

• 7882
• 5061
• 2054

Q3. What is the 75th percentile of `LIMIT_BAL`?

• 50000
• 140000
• 240000

Q4. What is the skew value of `LIMIT_BAL`?

• 0.50
• 1.99
• 2.53
• 0.99
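
The questions in this quiz map onto a handful of pandas calls. The sketch below shows one way to compute each quantity on a toy series; the values are made up, so the numbers do not match the course dataset, and the bin edges are an assumption.

```python
import pandas as pd

limit_bal = pd.Series([50000, 120000, 150000, 200000, 300000], name="LIMIT_BAL")  # toy values

print(limit_bal.mean())          # 164000.0
print(limit_bal.quantile(0.75))  # 200000.0
print(limit_bal.skew() > 0)      # True: the long tail is on the right

# Count values per interval; bins are right-closed by default, e.g. (100000, 200000].
bins = [0, 100000, 200000, 300000]
counts = pd.cut(limit_bal, bins=bins).value_counts().sort_index()
print(counts.tolist())           # [1, 3, 1]
```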

#### Quiz 3: Exploring Categorical Quantities

Q1. How many married persons have defaulted in our dataset?

• 10455
• 5209
• 3206
• 3342

Q2. How many single persons have NOT defaulted in our dataset?

• 12628
• 5206
• 3342
• 10455

Q3. What is the probability of a married person defaulting next month?

• 0.24
• 0.23
• 0.20
• 0.21

Q4. A single person is more likely to default the next month than a married person in our dataset.

• True
• False
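
Counts and conditional probabilities like these can be read off a cross-tabulation. The following is a sketch on toy data: the column names `MARRIAGE` and `DEFAULT` and the text labels are assumptions for illustration, and the real dataset may encode them differently.

```python
import pandas as pd

# Toy data; the column names and labels are assumptions for this sketch.
df = pd.DataFrame({
    "MARRIAGE": ["married", "single", "married", "single", "married", "single"],
    "DEFAULT":  [1, 0, 0, 0, 1, 1],
})

# Raw counts for every (marital status, default) combination.
counts = pd.crosstab(df["MARRIAGE"], df["DEFAULT"])
print(counts.loc["married", 1])  # 2

# normalize='index' turns each row into proportions: P(default | marital status).
probs = pd.crosstab(df["MARRIAGE"], df["DEFAULT"], normalize="index")
print(round(probs.loc["married", 1], 2))  # 0.67
print(round(probs.loc["single", 1], 2))   # 0.33
```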

#### Quiz 4: Exploring Numerical Quantities

Q1. How many people lie in the interval (0, 100000] of `LIMIT_BAL` who have defaulted?

• 3684
• 8817
• 3454

Q2. What is the probability of people defaulting who get `LIMIT_BAL` in the interval (100000, 200000] ?

• 0.24
• 0.13
• 0.19
• 0.34

Q3. As the `LIMIT_BAL` given to a person increases, the probability of the person defaulting decreases.

• True
• False
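
For a numerical variable, one common approach is to bin it first and then compute the default rate per bin. This is a sketch with made-up values; the column names and bin edges are assumptions.

```python
import pandas as pd

# Toy data; column names and values are assumptions for this sketch.
df = pd.DataFrame({
    "LIMIT_BAL": [20000, 80000, 90000, 150000, 180000, 250000],
    "DEFAULT":   [1, 1, 0, 0, 1, 0],
})

# Right-closed bins: (0, 100000], (100000, 200000], (200000, 300000]
df["LIMIT_BIN"] = pd.cut(df["LIMIT_BAL"], bins=[0, 100000, 200000, 300000])

# The mean of a 0/1 column is the share of ones, i.e. the default rate per bin.
rate = df.groupby("LIMIT_BIN", observed=False)["DEFAULT"].mean()
print([round(x, 2) for x in rate.tolist()])  # [0.67, 0.5, 0.0]
```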

#### Exercise 4: Exploring E-Commerce

```python
def exercise_1(df):
    # Top 5 customers by number of purchase rows
    temp = df.groupby('CustomerID').size()
    temp = temp.sort_values(ascending=False)
    temp = temp.iloc[:5]
    return temp

def exercise_2(df):
    # Top 5 customers by total amount spent
    # (select the column before summing so only it is aggregated)
    temp = df.groupby('CustomerID')['AmountSpent'].sum()
    temp = temp.sort_values(ascending=False)
    temp = temp.iloc[:5]
    return temp

def exercise_3(df):
    # Top 5 countries by number of purchase rows
    temp = df.groupby('Country').size()
    temp = temp.sort_values(ascending=False)
    temp = temp.iloc[:5]
    return temp

def exercise_4(df):
    # Number of purchase rows per month in 2011
    condition = df['PurchaseYear'] == 2011
    temp = df[condition]
    temp = temp.groupby('PurchaseMonth').size()
    return temp

def exercise_5(df):
    # Top 10 products by total quantity sold
    temp = df.groupby('Description')['Quantity'].sum()
    temp = temp.sort_values(ascending=False)
    temp = temp.iloc[:10]
    return temp
```

#### Exercise 5: Churn Prediction

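```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def churn_predict_acc(X, Y, test_inputs, test_outputs):
    # Fit a logistic regression model on the training data
    lr = LogisticRegression()
    lr.fit(X, Y)

    # Predict on the test inputs and score against the true labels
    preds = lr.predict(test_inputs)
    acc = accuracy_score(y_true=test_outputs, y_pred=preds)
    return acc
```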

#### Quiz 5: Machine Learning in Python

Q1. Artificial Intelligence is a subdomain of Machine Learning.

• True
• False

Q2. Decision Trees capture nonlinear relationships between variables.

• True
• False

Q3. Linear Regression models can NOT capture nonlinear relationships.

• True
• False

Q4. Out of the following algorithms:

1. Decision Trees
2. Support Vector Machines

Which performs better?

• Decision Trees
• Support Vector Machines
• Depends on the problem and the dataset

Q5. Random Forest is a boosting algorithm.

• True
• False

Q6. In bagging, individual models train on data that is sampled _____.

• without replacement
• with replacement
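
The sampling step behind bagging (bootstrap sampling) can be sketched with the standard library alone. The ten "rows" and the three samples below are arbitrary choices for illustration.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable
data = list(range(10))  # stand-in for 10 training rows

# Each model in a bagged ensemble would get its own bootstrap sample:
# same size as the original data, drawn with replacement.
bootstrap_samples = [random.choices(data, k=len(data)) for _ in range(3)]

for sample in bootstrap_samples:
    print(sorted(sample), "unique rows:", len(set(sample)))
```

Because rows are drawn with replacement, a sample can contain the same row more than once and leave other rows out.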

Q7. Which of the following algorithms can be used for unsupervised learning? Check all answers that you think are correct.

• SVMs
• KMeans
• Mean Shift
• Random Forests

Q8. PCA is used for

• clustering
• dimensionality reduction
• none of these

Q9.

``km = KMeans(n_clusters = 2); km.fit(data); result = km.predict(data)``

In the above code, what is being stored in `result`?

• The cluster centers
• The cluster numbers to which each observation in `data` belongs
• None of these
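
A runnable version of the snippet, assuming scikit-learn and NumPy are installed. The data points are made up, and `n_init` and `random_state` are set explicitly only to keep the run repeatable.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated blobs of made-up points.
data = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                 [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])

km = KMeans(n_clusters=2, n_init=10, random_state=0)
km.fit(data)
result = km.predict(data)

print(result.shape)               # (6,)   one cluster label per observation
print(km.cluster_centers_.shape)  # (2, 2) the centers are stored on the model
```

Which blob receives label 0 and which receives label 1 is arbitrary; only the grouping itself is meaningful.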

Q10. Clustering can NOT be used to segment customer groups.

• True
• False
##### Conclusion:

I hope these Data Science for Non-Programmers Educative quiz answers help you learn something new. If they did, don’t forget to bookmark our site for more coding solutions.

This course is intended for audiences of all experience levels who are interested in learning about Data Science in a business context; there are no prerequisites.

Keep Learning!
