Ready to move past Excel for complex business analysis? Then you’ll find this course very helpful.
This hands-on introductory Data Science course is aimed at professionals and students who don’t have any experience with programming. It will help you advance your career by preparing you to conduct meaningful data analysis in Python on any dataset — large or small.
You’ll begin with the fundamentals of Python, with a focus on working with CSV files, and cover concepts like data preprocessing and Exploratory Data Analysis (EDA). In the second half, you’ll focus on predictive and inferential analysis using statistical and machine learning techniques, and learn how these techniques can help solve business problems.
Answer:
def average(input_list):
    sum_list = 0
    for i in input_list:
        sum_list = sum_list + i
    avg = sum_list/len(input_list)
    return avg
Answer:
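def factorial(n):
    # 0! and 1! are both 1
    if n==0 or n==1:
        return 1
    # Negative input has no factorial; return -1 as an error value
    if n < 1:
        return -1
    # Multiply n * (n-1) * ... * 2
    product = 1
    while(n > 1):
        product = product * n
        n = n-1
    return product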
Q1. A Dataframe is a 2-Dimensional object to store tabular data.
Q2. Suppose we have a Gender column in our dataframe (df) which has the values Male and Female. Which of these will give us a filtered dataframe of males? Select all answers you think are correct.
df = df['Male']
df = df[df['Male']]
condition = df['Gender'] == 'Male'
df = df[condition]
df = df[ df['Gender'] == 'Male']
condition = df['Gender'] != 'Female'
df = df[condition]
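The filtering options in Q2 can be tried on a small dataframe. This is a minimal sketch on made-up data: only the Gender column comes from the question, while the Name column and its values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data; only the Gender column mirrors the question.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cara", "Dan"],
    "Gender": ["Female", "Male", "Female", "Male"],
})

# Build a boolean Series first, then index the dataframe with it.
condition = df["Gender"] == "Male"
males = df[condition]

# The same filter written inline.
males_inline = df[df["Gender"] == "Male"]

# Excluding the other value gives the same rows here only because this
# toy column holds exactly two values and no missing entries.
males_negated = df[df["Gender"] != "Female"]

print(males)
```

Note that 'Male' is a value inside the Gender column, not a column name, so indexing with df['Male'] looks for a column that does not exist in this data.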
Q3. Which of these can be used to set the value of the first cell in the Age column to 23 if Age is the first column in the dataset? Select all answers you think are correct.
df[0,'Age'] = 23
df.loc[0,'Age'] = 23
df.iloc[0,'Age'] = 23
df.iloc[0,0] = 23
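The difference between label-based and position-based assignment in Q3 can be sketched as follows. The dataframe contents and the make_df helper are hypothetical; the sketch assumes a default integer index, so the first row has the label 0.

```python
import pandas as pd

def make_df():
    # Hypothetical data with Age as the first column and a default 0..n-1 index.
    return pd.DataFrame({"Age": [31, 45, 27], "City": ["Boston", "Austin", "Denver"]})

# loc is label-based: row label 0, column label 'Age'.
by_label = make_df()
by_label.loc[0, "Age"] = 23

# iloc is position-based: first row, first column.
by_position = make_df()
by_position.iloc[0, 0] = 23

print(by_label)
```

Because iloc works with integer positions, it expects a column position rather than the label 'Age'.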
Q4. Which of the following are aggregation functions, i.e., functions that take in a series and return a single value? Select all answers you think are correct.
min
mean
sum
groupby
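To see what Q4 is getting at, the sketch below calls each function on a small, made-up series. The Team and Age values are assumptions used only for illustration.

```python
import pandas as pd

ages = pd.Series([23, 35, 41, 29])

# Each of these reduces the whole series to a single value.
smallest = ages.min()
average = ages.mean()
total = ages.sum()

# groupby on its own only splits the data into groups; it returns a
# GroupBy object that still needs an aggregation to produce values.
df = pd.DataFrame({"Team": ["A", "A", "B", "B"], "Age": [23, 35, 41, 29]})
grouped = df.groupby("Team")
mean_by_team = grouped["Age"].mean()

print(smallest, average, total)
print(mean_by_team)
```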
Q5. The apply function is used to apply custom functions to the data.
Q6. We can NOT group data by more than one variable.
Q7. Both groupby and pivot_table are used for summarizing data.
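The three tools named in Q5 to Q7 can be tried side by side. This is a minimal sketch on invented sales data; the column names, the label_size helper, and the threshold of 30 are all assumptions.

```python
import pandas as pd

# Hypothetical sales records.
sales = pd.DataFrame({
    "Region": ["East", "East", "West", "West", "East"],
    "Product": ["Pen", "Ink", "Pen", "Ink", "Pen"],
    "Amount": [10, 20, 30, 40, 50],
})

# apply runs a custom function on every value of a column.
def label_size(amount):
    return "large" if amount >= 30 else "small"

sales["Size"] = sales["Amount"].apply(label_size)

# groupby accepts a list of columns, so data can be grouped by several variables.
by_region_product = sales.groupby(["Region", "Product"])["Amount"].sum()

# pivot_table produces the same totals laid out as a Region x Product grid.
pivot = sales.pivot_table(index="Region", columns="Product", values="Amount", aggfunc="sum")

print(by_region_product)
print(pivot)
```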
Q8.
df.plot(kind = 'box',subplots = True, sharex=False, sharey = False)
In the above use of the plot function, subplots=True tells the function to arrange all boxplots in rows and columns inside a group of plots.
Answer:
def change_values(df):
    condition = df['BOROUGH'] == 1
    df.loc[condition,'BOROUGH'] = 'Manhattan'
    condition = df['BOROUGH'] == 2
    df.loc[condition,'BOROUGH'] = 'Bronx'
    condition = df['BOROUGH'] == 3
    df.loc[condition,'BOROUGH'] = 'Brooklyn'
    condition = df['BOROUGH'] == 4
    df.loc[condition,'BOROUGH'] = 'Queens'
    condition = df['BOROUGH'] == 5
    df.loc[condition,'BOROUGH'] = 'Staten Island'
    return df
Answer:
def remove_missing(df):
    present = df['SALE PRICE'].notnull()
    df = df[present]
    return df
Answer:
def remove_duplicates(df):
    df = df.drop_duplicates(subset=df.columns)
    return df
Answer:
def remove_outliers(df):
    # Select only the columns to check for outliers
    new_df = df[['RESIDENTIAL UNITS', 'COMMERCIAL UNITS','TOTAL UNITS', 'LAND SQUARE FEET','GROSS SQUARE FEET','YEAR BUILT']]
    # Find the bounds with an IQR-style rule; note that the 10th and 90th
    # percentiles are used here rather than the usual 25th and 75th
    Q1 = new_df.quantile(0.10)
    Q3 = new_df.quantile(0.90)
    IQR = Q3-Q1
    minimum = Q1 - 1.5*IQR
    maximum = Q3 + 1.5*IQR
    # True where a value lies within the bounds
    condition = (new_df <= maximum) & (new_df >= minimum)
    # Keep a row only if every checked column is within bounds
    condition = condition.all(axis=1)
    # Drop rows that contain outliers
    df = df[condition]
    return df
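The answer above takes its bounds from the 10th and 90th percentiles. For comparison, here is a minimal sketch of the conventional IQR rule, which uses the 25th and 75th percentiles, applied to a single made-up series. The values are invented and are not from the course dataset.

```python
import pandas as pd

# Hypothetical measurements with one obvious outlier.
values = pd.Series([10, 12, 11, 13, 12, 11, 300])

# Conventional IQR rule: quartiles at 25% and 75%.
q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Keep only the values that fall inside the bounds.
kept = values[(values >= lower) & (values <= upper)]

print(lower, upper)
print(kept.tolist())
```

Wider percentiles, as in the answer, give a larger range and therefore flag fewer rows as outliers.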
Q1. What is the mean of LIMIT_BAL?
Q2. How many times do LIMIT_BAL values appear in the interval (100000.0, 200000.0]?
Q3. What is the 75th percentile of LIMIT_BAL?
Q4. What is the skew value of LIMIT_BAL?
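The four statistics asked about here each map to a pandas call. The sketch below uses a handful of invented LIMIT_BAL values and assumed bin edges, so its numbers are not the answers for the course dataset.

```python
import pandas as pd

# Hypothetical credit limits; the real dataset is much larger.
limit_bal = pd.Series([50000, 120000, 150000, 200000, 80000, 360000], name="LIMIT_BAL")

# Mean of the column.
mean_value = limit_bal.mean()

# Count how many values fall into each right-closed interval.
bins = [0, 100000, 200000, 300000, 400000]  # assumed bin edges
counts = pd.cut(limit_bal, bins=bins).value_counts(sort=False)

# 75th percentile and skewness.
p75 = limit_bal.quantile(0.75)
skew_value = limit_bal.skew()

print(mean_value, p75, skew_value)
print(counts)
```

df['LIMIT_BAL'].describe() reports the mean and the 75th percentile in one call, which can be a quicker route when only those are needed.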
Q1. How many married persons have defaulted in our dataset?
Q2. How many single persons have NOT defaulted in our dataset?
Q3. What is the probability of a married person defaulting next month?
Q4. A single person is more likely to default the next month than a married person in our dataset.
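Questions like these can be answered with a cross-tabulation and a group mean. This is a sketch under assumptions: the column names MARRIAGE and default, the text labels, and the rows themselves are invented, and the course dataset may encode these fields differently.

```python
import pandas as pd

# Hypothetical records: default is 1 if the person defaulted, else 0.
df = pd.DataFrame({
    "MARRIAGE": ["married"] * 4 + ["single"] * 5,
    "default": [1, 0, 0, 1, 0, 0, 1, 0, 0],
})

# Counts of defaulters and non-defaulters per marital status.
counts = pd.crosstab(df["MARRIAGE"], df["default"])

# The mean of a 0/1 column is the share of 1s, i.e. the default probability.
prob_default = df.groupby("MARRIAGE")["default"].mean()

print(counts)
print(prob_default)
```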
Q1. How many people lie in the interval (0, 100000] of LIMIT_BAL who have defaulted?
Q2. What is the probability of people defaulting who get LIMIT_BAL in the interval (100000, 200000]?
Q3. As the LIMIT_BAL given to a person increases, the probability of the person defaulting decreases.
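One way to approach these questions is to bin LIMIT_BAL and then aggregate the default flag per bin. The sketch below assumes a 0/1 column named default, assumed bin edges, and invented rows, so the trend it prints says nothing about the real data.

```python
import pandas as pd

# Hypothetical records; default is 1 if the person defaulted, else 0.
df = pd.DataFrame({
    "LIMIT_BAL": [20000, 90000, 100000, 120000, 150000, 200000, 250000, 300000],
    "default": [1, 1, 0, 1, 0, 0, 0, 0],
})

# Right-closed intervals: (0, 100000], (100000, 200000], (200000, 300000].
bins = [0, 100000, 200000, 300000]
df["LIMIT_BIN"] = pd.cut(df["LIMIT_BAL"], bins=bins)

# Number of defaulters and default probability in each interval.
defaults_per_bin = df.groupby("LIMIT_BIN", observed=False)["default"].sum()
prob_per_bin = df.groupby("LIMIT_BIN", observed=False)["default"].mean()

print(defaults_per_bin)
print(prob_per_bin)
```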
Answer:
def exercise_1(df):
    # Five customers with the most rows
    temp = df.groupby('CustomerID').size()
    temp = temp.sort_values(ascending=False)
    temp = temp.iloc[:5]
    return temp

def exercise_2(df):
    # Select the column before summing so other columns are not aggregated
    temp = df.groupby('CustomerID')['AmountSpent'].sum()
    temp = temp.sort_values(ascending=False)
    temp = temp.iloc[:5]
    return temp

def exercise_3(df):
    # Five countries with the most rows
    temp = df.groupby('Country').size()
    temp = temp.sort_values(ascending=False)
    temp = temp.iloc[:5]
    return temp

def exercise_4(df):
    # Number of rows per month in 2011
    condition = df['PurchaseYear'] == 2011
    temp = df[condition]
    temp = temp.groupby('PurchaseMonth').size()
    return temp

def exercise_5(df):
    # Ten items with the largest total quantity
    temp = df.groupby('Description')['Quantity'].sum()
    temp = temp.sort_values(ascending=False)
    temp = temp.iloc[:10]
    return temp
Answer:
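from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def churn_predict_acc(X,Y,test_inputs,test_outputs):
    # Fit a logistic regression model on the training data
    lr = LogisticRegression()
    lr.fit(X,Y)
    # Predict on the test inputs and score against the true labels
    preds = lr.predict(test_inputs)
    acc = accuracy_score(y_true = test_outputs,y_pred = preds)
    return acc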
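The same fit, predict, and score steps can be run end to end on synthetic data. Everything in this sketch (the random features, the labelling rule, and the 75/25 split) is an assumption for illustration and has nothing to do with the course's churn dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic two-feature data with a simple linear labelling rule.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hold out a quarter of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit on the training rows, then score predictions on the held-out rows.
model = LogisticRegression()
model.fit(X_train, y_train)
acc = accuracy_score(y_true=y_test, y_pred=model.predict(X_test))

print(acc)
```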
Q1. Artificial Intelligence is a sub domain of Machine Learning.
Q2. Decision Trees capture non linear relationships between variables.
Q3. Linear Regression models can NOT capture non linear relationships.
Q4. Out of the following algorithms:
Which performs better?
Q5. Random Forest is a boosting algorithm.
Q6. In bagging, individual models train on data that is sampled _____.
Q7. Which of the following algorithms can be used for unsupervised learning? Check all answers that you think are correct.
Q8. PCA is used for
Q9.
km = KMeans(n_clusters = 2)
km.fit(data)
result = km.predict(data)
In the above code, what is being stored in result?
data belongs to
Q10. Clustering can NOT be used to segment customer groups.
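The KMeans snippet from Q9 can be run on a tiny array to inspect what predict returns. The six points below are invented, and n_init and random_state are set only to make the sketch repeatable; neither appears in the quiz code.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: two well-separated groups of three points each.
data = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
    [8.0, 8.0], [8.2, 7.9], [7.8, 8.1],
])

km = KMeans(n_clusters=2, n_init=10, random_state=0)
km.fit(data)

# predict returns one integer cluster label per row of data.
result = km.predict(data)

print(result)
```

Which group receives label 0 and which receives label 1 is arbitrary; only the grouping itself is meaningful.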
This course is intended for audiences of all experience levels who are interested in learning about Data Science in a business context; there are no prerequisites.