
Ready to move past Excel for complex business analysis? Then you’ll find this course very helpful.

This hands-on introductory Data Science course is aimed at professionals and students who don’t have any experience with programming. It will help you advance your career by preparing you to conduct meaningful data analysis in Python on any dataset — large or small.

You’ll begin with the fundamentals of Python, with a focus on working with CSV files, and cover concepts like data preprocessing and Exploratory Data Analysis (EDA). In the second half, you’ll focus on predictive and inferential analysis using statistical and machine learning techniques, and learn how these techniques can help solve business problems.

Answer:

```
def average(input_list):
    # Add up every element, then divide by the number of elements
    sum_list = 0
    for i in input_list:
        sum_list = sum_list + i
    avg = sum_list / len(input_list)
    return avg
```

Answer:

```
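def factorial(n):
    # 0! and 1! are both 1
    if n == 0 or n == 1:
        return 1
    # Negative input has no factorial; -1 signals an invalid argument
    if n < 1:
        return -1
    # Multiply n * (n-1) * ... * 2
    product = 1
    while n > 1:
        product = product * n
        n = n - 1
    return product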
```

Q1. A **Dataframe** is a 2-Dimensional object to store tabular data.

- **True**
- False

Q2. Suppose we have a `Gender` column in our dataframe (`df`) which has the values `Male` and `Female`. Which of these will give us a filtered dataframe of males? Select all answers you think are correct.

- Option 1

`df = df['Male']`

- Option 2

`df = df[df['Male']]`

- **Option 3**

```
condition = df['Gender'] == 'Male'
df = df[condition]
```

- **Option 4**

```
df = df[ df['Gender'] == 'Male']
```

- **Option 5**

```
condition = df['Gender'] != 'Female'
df = df[condition]
```
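
The correct options all build a boolean mask from the `Gender` column and index the dataframe with it. Options 1 and 2 fail because they look up a column named `Male`, which does not exist. A minimal sketch of the idea, using a small made-up dataframe (the `Name` column and its values are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical toy data; the quiz's real dataframe is not shown.
df = pd.DataFrame({
    "Name": ["Ali", "Sara", "Omar", "Hina"],
    "Gender": ["Male", "Female", "Male", "Female"],
})

# Comparing a column to a value yields a boolean Series (the mask).
mask = df["Gender"] == "Male"

# Indexing with the mask keeps only the rows where it is True.
males = df[mask]

# Excluding the other category selects the same rows here, but only
# because the column holds exactly two distinct values and no missing data.
males_alt = df[df["Gender"] != "Female"]
```

If the column could contain missing values or a third category, the `!= 'Female'` form would keep those rows too, so the `== 'Male'` form is the safer habit.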

Q3. Which of these can be used to set the value of the first cell in the `Age` column to 23 if `Age` is the first column in the dataset? Select all answers you think are correct.

- Option 1

```
df[0,'Age'] = 23
```

- **Option 2**

```
df.loc[0,'Age'] = 23
```

- Option 3

```
df.iloc[0,'Age'] = 23
```

- **Option 4**

`df.iloc[0,0] = 23`
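
The two correct options differ in how they address the cell: `loc` uses labels, `iloc` uses integer positions. A short sketch, assuming a made-up dataframe with the default integer index (the `City` column and all values are illustrative):

```python
import pandas as pd

# Hypothetical data; Age is the first column and the index is the default 0, 1, 2.
df = pd.DataFrame({"Age": [30, 41, 52], "City": ["Lahore", "Boston", "Lima"]})

# loc is label-based: row label 0, column label "Age".
by_label = df.copy()
by_label.loc[0, "Age"] = 23

# iloc is position-based: first row, first column.
by_position = df.copy()
by_position.iloc[0, 0] = 23
```

Both calls touch the same cell here only because the row label `0` happens to be the first row and `Age` happens to be the first column.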

Q4. Which of the following are **aggregation** functions, i.e., functions that take in a series and return a single value? Select all answers you think are correct.

- **min**
- **mean**
- **sum**
- `groupby`
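
To make the distinction concrete, here is a small sketch with made-up numbers: `min`, `mean`, and `sum` each reduce a series to one value, while `groupby` only splits the data into groups and needs an aggregation applied afterwards. The `Team` and `Score` names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical series of numbers.
s = pd.Series([4, 8, 15, 16, 23, 42])

smallest = s.min()   # one value for the whole series
average = s.mean()   # one value for the whole series
total = s.sum()      # one value for the whole series

# groupby by itself does not aggregate; it prepares groups.
scores = pd.DataFrame({"Team": ["A", "A", "B"], "Score": [1, 2, 5]})
per_team = scores.groupby("Team")["Score"].sum()  # aggregation applied per group
```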

Q5. The `apply` function is used to apply custom functions to the data.

- **True**
- False
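
As an illustration of `apply` with a custom function, the sketch below labels each value of a column. The `age_group` function, its thresholds, and the data are hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"Age": [15, 34, 70]})

def age_group(age):
    # Custom rule; the cut-offs are arbitrary examples.
    if age < 18:
        return "minor"
    if age < 65:
        return "adult"
    return "senior"

# apply calls age_group once per value in the Age column.
df["AgeGroup"] = df["Age"].apply(age_group)
```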

Q6. We can NOT group data for more than one variable.

- True
- **False**

Q7. Both `groupby` and `pivot_table` are used for summarizing data.

- **True**
- False
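
The sketch below shows both ideas from Q6 and Q7 on a made-up sales table: grouping by two variables at once, and producing the same summary with `pivot_table`. Column names and values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sales records.
sales = pd.DataFrame({
    "Region": ["East", "East", "West", "West", "East"],
    "Product": ["A", "B", "A", "B", "A"],
    "Amount": [10, 20, 30, 40, 50],
})

# Group by two variables: one total per (Region, Product) pair.
by_two = sales.groupby(["Region", "Product"])["Amount"].sum()

# The same totals as a table: regions as rows, products as columns.
wide = sales.pivot_table(index="Region", columns="Product",
                         values="Amount", aggfunc="sum")
```

`groupby` returns the totals in long form with a two-level index, while `pivot_table` lays them out as a grid, which is often easier to read.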

Q8.

`df.plot(kind = 'box',subplots = True, sharex=False, sharey = False)`

In the above use of the `plot` function, `subplots=True` tells the function to arrange all boxplots in rows and columns inside a group of plots.

- **True**
- False

Answer:

```
def change_values(df):
    # Replace each numeric borough code with its name, one code at a time
    condition = df['BOROUGH'] == 1
    df.loc[condition, 'BOROUGH'] = 'Manhattan'
    condition = df['BOROUGH'] == 2
    df.loc[condition, 'BOROUGH'] = 'Bronx'
    condition = df['BOROUGH'] == 3
    df.loc[condition, 'BOROUGH'] = 'Brooklyn'
    condition = df['BOROUGH'] == 4
    df.loc[condition, 'BOROUGH'] = 'Queens'
    condition = df['BOROUGH'] == 5
    df.loc[condition, 'BOROUGH'] = 'Staten Island'
    return df
```
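
The same replacement can also be written with a lookup dictionary and `Series.map`. This is an alternative sketch, not the course's expected answer; the function name `change_values_with_map` and the constant `BOROUGH_NAMES` are hypothetical.

```python
import pandas as pd

# Code-to-name lookup taken from the answer above.
BOROUGH_NAMES = {
    1: "Manhattan",
    2: "Bronx",
    3: "Brooklyn",
    4: "Queens",
    5: "Staten Island",
}

def change_values_with_map(df):
    # Work on a copy so the caller's dataframe is left unchanged.
    df = df.copy()
    # map looks up every code in the dictionary; codes that are not
    # in the dictionary become NaN.
    df["BOROUGH"] = df["BOROUGH"].map(BOROUGH_NAMES)
    return df

example = pd.DataFrame({"BOROUGH": [1, 3, 5]})
renamed = change_values_with_map(example)
```

Unlike the step-by-step version, `map` turns any unexpected code into a missing value, which may or may not be what you want.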

Answer:

```
def remove_missing(df):
    # Keep only the rows where SALE PRICE is present
    present = df['SALE PRICE'].notnull()
    df = df[present]
    return df
```

Answer:

```
def remove_duplicates(df):
    # Drop rows that are identical across all columns
    df = df.drop_duplicates(subset=df.columns)
    return df
```

Answer:

```
def remove_outliers(df):
    # Retrieve only the columns to check for outliers
    new_df = df[['RESIDENTIAL UNITS', 'COMMERCIAL UNITS', 'TOTAL UNITS',
                 'LAND SQUARE FEET', 'GROSS SQUARE FEET', 'YEAR BUILT']]
    # Find max and min with the IQR rule; note that this answer uses the
    # 10th and 90th percentiles rather than the usual 25th and 75th
    Q1 = new_df.quantile(0.10)
    Q3 = new_df.quantile(0.90)
    IQR = Q3 - Q1
    minimum = Q1 - 1.5 * IQR
    maximum = Q3 + 1.5 * IQR
    # Condition on which to filter: every checked column within bounds
    condition = (new_df <= maximum) & (new_df >= minimum)
    condition = condition.all(axis=1)
    # Keep only the rows that have no outliers
    df = df[condition]
    return df
```
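
The answer above computes its bounds from the 10th and 90th percentiles. For comparison, here is a sketch of the same rule on a single series with the quantiles exposed as parameters, defaulting to the more common 25th and 75th percentiles. The helper name `iqr_bounds` and the sample values are assumptions for illustration.

```python
import pandas as pd

def iqr_bounds(series, lower_q=0.25, upper_q=0.75, k=1.5):
    # Spread between the two chosen quantiles.
    q_low = series.quantile(lower_q)
    q_high = series.quantile(upper_q)
    spread = q_high - q_low
    # Anything beyond k spreads from the quantiles counts as an outlier.
    return q_low - k * spread, q_high + k * spread

# Hypothetical values with one obvious outlier.
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
low, high = iqr_bounds(values)
kept = values[(values >= low) & (values <= high)]
```

Wider quantiles, as in the course answer, give looser bounds and so remove fewer rows.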

Q1. What is the mean of `LIMIT_BAL`?

- 176488
- 160000
- **167488**
- 170000

Q2. How many times do `LIMIT_BAL` values appear in the interval (100000.0, 200000.0]?

- **7882**
- 5061
- 2054

Q3. What is the 75th percentile of `LIMIT_BAL`?

- 50000
- 140000
- **240000**

Q4. What is the skew value of `LIMIT_BAL`?

- 0.50
- 1.99
- 2.53
- **0.99**
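
The four questions above map onto four pandas calls. The sketch below uses a tiny made-up `LIMIT_BAL` column, so its numbers will not match the quiz answers; the bin edges are also assumptions.

```python
import pandas as pd

# Hypothetical stand-in for the course dataset.
df = pd.DataFrame({"LIMIT_BAL": [50000, 100000, 150000, 200000, 500000]})

mean_limit = df["LIMIT_BAL"].mean()     # Q1: mean
p75 = df["LIMIT_BAL"].quantile(0.75)    # Q3: 75th percentile
skewness = df["LIMIT_BAL"].skew()       # Q4: skew

# Q2: count values per interval. pd.cut builds intervals that are
# open on the left and closed on the right by default, e.g. (100000, 200000].
bins = [0, 100000, 200000, 1000000]
intervals = pd.cut(df["LIMIT_BAL"], bins=bins)
counts = intervals.value_counts().sort_index()
```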

Q1. How many married persons have defaulted in our dataset?

- 10455
- 5209
- **3206**
- 3342

Q2. How many single persons have NOT defaulted in our dataset?

- **12628**
- 5206
- 3342
- 10455

Q3. What is the probability of a married person defaulting next month?

- 0.24
- **0.23**
- 0.20
- 0.21

Q4. A single person is more likely to default the next month than a married person in our dataset.

- True
- **False**
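
Counts and probabilities like these can be read from a cross-tabulation and a grouped mean. The sketch below uses made-up data, so the numbers differ from the quiz; the column names `MARRIAGE` and `DEFAULT` and the text labels are assumptions.

```python
import pandas as pd

# Hypothetical data: DEFAULT is 1 if the person defaulted, else 0.
df = pd.DataFrame({
    "MARRIAGE": ["married"] * 4 + ["single"] * 5,
    "DEFAULT": [1, 0, 0, 0, 1, 1, 0, 0, 0],
})

# Counts of each (marital status, default) combination.
counts = pd.crosstab(df["MARRIAGE"], df["DEFAULT"])

# The mean of a 0/1 column is the share of ones, i.e. the
# estimated probability of default within each group.
rates = df.groupby("MARRIAGE")["DEFAULT"].mean()
```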

Q1. How many people who have defaulted lie in the interval (0, 100000] of `LIMIT_BAL`?

- **3684**
- 8817
- 3454

Q2. What is the probability of people defaulting who get `LIMIT_BAL` in the interval (100000, 200000]?

- 0.24
- 0.13
- **0.19**
- 0.34

Q3. As the `LIMIT_BAL` given to a person increases, the probability of the person defaulting decreases.

- **True**
- False

Answer:

```
def exercise_1(df):
    # Top 5 customers by number of rows (orders)
    temp = df.groupby('CustomerID').size()
    temp = temp.sort_values(ascending=False)
    temp = temp.iloc[:5]
    return temp

def exercise_2(df):
    # Top 5 customers by total amount spent
    temp = df.groupby('CustomerID')['AmountSpent'].sum()
    temp = temp.sort_values(ascending=False)
    temp = temp.iloc[:5]
    return temp

def exercise_3(df):
    # Top 5 countries by number of rows
    temp = df.groupby('Country').size()
    temp = temp.sort_values(ascending=False)
    temp = temp.iloc[:5]
    return temp

def exercise_4(df):
    # Number of rows per month in 2011
    condition = df['PurchaseYear'] == 2011
    temp = df[condition]
    temp = temp.groupby('PurchaseMonth').size()
    return temp

def exercise_5(df):
    # Top 10 items by total quantity
    temp = df.groupby('Description')['Quantity'].sum()
    temp = temp.sort_values(ascending=False)
    temp = temp.iloc[:10]
    return temp
```

Answer:

```
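from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def churn_predict_acc(X, Y, test_inputs, test_outputs):
    # Fit a logistic regression classifier on the training data
    lr = LogisticRegression()
    lr.fit(X, Y)
    # Predict on the test inputs and score against the true labels
    preds = lr.predict(test_inputs)
    acc = accuracy_score(y_true=test_outputs, y_pred=preds)
    return acc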
```

Q1. Artificial Intelligence is a subdomain of Machine Learning.

- True
- **False**

Q2. Decision Trees capture non-linear relationships between variables.

- **True**
- False

Q3. Linear Regression models can NOT capture non-linear relationships.

- **True**
- False

Q4. Out of the following algorithms:

- Decision Trees
- Support Vector Machines

Which performs better?

- Decision Trees
- Support Vector Machines
- **Depends on the problem and the dataset**

Q5. Random Forest is a boosting algorithm.

- True
- **False**

Q6. In bagging, individual models train on data that is sampled _____.

- without replacement
- **with replacement**
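
The difference between the two sampling schemes in Q6 can be sketched with Python's standard library. The row numbers and the seed below are arbitrary choices for illustration.

```python
import random

rng = random.Random(0)   # fixed seed so the run is repeatable
rows = list(range(10))   # stand-in for the row numbers of a training set

# With replacement: each draw is independent, so a row may be picked
# more than once and some rows may not be picked at all. This is the
# kind of bootstrap sample each model in bagging trains on.
bootstrap = rng.choices(rows, k=len(rows))

# Without replacement: every row is picked exactly once, so a full-size
# sample is just the original rows in shuffled order.
shuffled = rng.sample(rows, k=len(rows))
```

Because each bootstrap sample leaves out and repeats different rows, the individual models see slightly different data, which is what makes averaging them useful.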

Q7. Which of the following algorithms can be used for unsupervised learning? Check all answers that you think are correct.

- SVMs
- **KMeans**
- **Mean Shift**
- Random Forests
- AdaBoost

Q8. PCA is used for

- clustering
- **dimensionality reduction**
- none of these

Q9.

`km = KMeans(n_clusters = 2)`

`km.fit(data)`

`result = km.predict(data)`

In the above code, what is being stored in `result`?

- The cluster centers
- **The cluster numbers to which each observation in `data` belongs**
- None of these
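
A runnable version of the Q9 snippet, assuming scikit-learn is installed and using six made-up points that form two obvious groups. The `n_init` and `random_state` arguments are added here only to make the run repeatable.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 2-D observations: three near (1, 1), three near (8, 8).
data = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
    [8.0, 8.0], [8.2, 7.9], [7.8, 8.1],
])

km = KMeans(n_clusters=2, n_init=10, random_state=0)
km.fit(data)

# One cluster number (0 or 1) per row of data.
result = km.predict(data)

# The cluster centers live in a separate attribute.
centers = km.cluster_centers_
```

Which group gets number 0 and which gets number 1 is arbitrary; only the grouping itself is meaningful.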

Q10. Clustering can NOT be used to segment customer groups.

- True
- **False**

I hope these Data Science for Non-Programmers Educative quiz answers help you learn something new. If they helped you, don’t forget to bookmark our site for more coding solutions.

This course is intended for audiences of all experience levels who are interested in learning about Data Science in a business context; there are no prerequisites.

Keep Learning!
