Predictive Data Analysis with Python Educative Quiz Answers


In this course, you will learn how to perform predictive data analysis using Python. The ideal audience is those who want to start their careers as data analysts. The main goal of this course is to show you how to use statistics to draw useful insights from data which can help in predicting future behavior or patterns.

Beyond that, you’ll learn the tools of the trade that data scientists use every day, including NumPy, Pandas, Matplotlib, and Seaborn. You’ll learn not only how to extract meaningful insights from data, but also how to create visualizations that you can use in reports.

Each lesson uses datasets from real-world scenarios to get you accustomed to handling different types of data. At the end of the course, you will work on two real-world projects that demonstrate how data analysis techniques are used in the financial and advertising sectors to generate revenue.


Challenge 1: Find Min and Max from a 2-D NumPy Array

Answer:

def getMinMax(arr):
    # Collect the minimum and maximum of each row, in row order
    res = []
    for i in range(arr.shape[0]):
        res.append(arr[i].min())
        res.append(arr[i].max())
    return res
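As a hedged alternative to the loop above, the same per-row minima and maxima can be computed with NumPy's axis-wise reductions. The function name get_min_max_vectorized and the sample array are illustrative assumptions, not part of the course solution.

```python
import numpy as np

def get_min_max_vectorized(arr):
    # Pair each row's minimum with its maximum, then flatten row by row
    pairs = np.column_stack((arr.min(axis=1), arr.max(axis=1)))
    return pairs.ravel().tolist()

# Hypothetical sample input
arr = np.array([[3, 1, 2], [9, 7, 8]])
result = get_min_max_vectorized(arr)
print(result)
```

The output keeps the same ordering as the loop version: minimum then maximum for the first row, followed by the same pair for each later row.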

Quiz 1:

Q1. Which command slices all the elements of an array of size 6? Multiple options can be correct.

  • [:]
  • slice()
  • [0:6]
  • All of the above

Q2. NumPy array size can be modified like a list.

  • True
  • False
  • Depends on certain condition

Q3.

import numpy as np
arr = np.array([[1,2,3],[4,5,6],[7,8,9]])

The command to slice the array to get [[5,6] [8,9]] would be?

  • arr[1:2:,:]
  • arr[1:3,0:2]
  • arr[1:3,1:3]
  • None of the above
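One way to check slicing questions like this is to run the candidate expressions directly. The sketch below evaluates two of the listed slices on the same array; the variable names are illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Rows 1..2 and columns 1..2 (stop indices are exclusive)
rows_1_2_cols_1_2 = arr[1:3, 1:3]
# Rows 1..2 and columns 0..1, for comparison
rows_1_2_cols_0_1 = arr[1:3, 0:2]

print(rows_1_2_cols_1_2)
print(rows_1_2_cols_0_1)
```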

Q4.

import numpy as np

arr = np.arange(0,10,1)
new_arr = arr[1:7]
new_arr[:] = 100

What are the values in arr?

  • Error will generate
  • [0 1 2 3 4 5 6 7 8 9]
  • [0 100 100 100 100 100 100 7 8 9]
  • None of the above
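The behavior in this question comes from basic slices of a NumPy array being views rather than copies. A minimal sketch that runs the snippet and confirms the memory sharing with np.shares_memory:

```python
import numpy as np

arr = np.arange(0, 10, 1)
new_arr = arr[1:7]      # basic slicing returns a view on arr
new_arr[:] = 100        # writes through to the original buffer

shares = np.shares_memory(arr, new_arr)
print(arr)
print(shares)
```

To leave arr unchanged, take an explicit copy instead, for example arr[1:7].copy().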

Challenge 2: Summing and Swapping

Answer:

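import numpy as np

def Sum_Swap(df):
    # Per-row minimum plus maximum, stored as a new column
    minm_c = np.min(df, axis=1)
    maxm_c = np.max(df, axis=1)
    df['row_sum'] = minm_c + maxm_c
    # Per-column minimum plus maximum, stored as a new row
    minm_r = np.min(df, axis=0)
    maxm_r = np.max(df, axis=0)
    df.loc['col_sum'] = minm_r + maxm_r
    # Swap the new column and the new row
    a, b = df['row_sum'].copy(), df.loc['col_sum'].copy()
    df['row_sum'], df.loc['col_sum'] = b, a
    return df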

Quiz 2:

Q1.

import pandas as pd

srs1 = pd.Series([1, 2, 3], index = ['A', 'B', 'C'])
srs2 = pd.Series([4, 5, 6], index = ['C', 'D', 'B'])
print(srs1 + srs2)

What is the output of the above code snippet?

  • A 1
    • B 8.0
    • C 7.0
    • D 5
  • A 7.0
    • B 7.0
    • C 7.0
    • D 7.0
  • A NaN
    • B 8.0
    • C 7.0
    • D NaN
  • None of the above
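Arithmetic between two Series aligns on index labels, and labels present in only one operand produce NaN. The sketch below runs the snippet from the question and inspects the result.

```python
import pandas as pd

srs1 = pd.Series([1, 2, 3], index=['A', 'B', 'C'])
srs2 = pd.Series([4, 5, 6], index=['C', 'D', 'B'])

# Values are matched by label, not by position
total = srs1 + srs2
print(total)
```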

Q2.

import numpy as np
import pandas as pd

arr2d = np.arange(16).reshape(4,4)
df = pd.DataFrame(arr2d, index=['R2', 'R3', 'R4'], columns=['C1','C2','C3','C4'])
df = df.reindex(['R1', 'R2', 'R3', 'R4']).ffill(axis = 0)
print(df)

What are the values of the new Row “R1”?

  • R1 0.0 1.0 2.0 3.0
  • R1 NaN NaN NaN NaN
  • Generate Error
  • None of the above

Q3. The loc method of pandas is used for what purpose?

  • It makes the rows and columns immutable.
  • It is used to locate elements.
  • It is not a function of pandas.
  • It is used to fill NaN values

Q4. Which part of a DataFrame does axis = 1 affect when used with the dropna() function?

  • Row
  • Column
  • Indexes
  • None of the above
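To see the effect of the axis argument, the following sketch applies dropna() along both axes to a small, made-up DataFrame (the column names a and b are assumptions for illustration).

```python
import numpy as np
import pandas as pd

# Hypothetical data with a single missing value in column "a"
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

dropped_rows = df.dropna(axis=0)  # removes rows that contain NaN
dropped_cols = df.dropna(axis=1)  # removes columns that contain NaN

print(dropped_rows)
print(dropped_cols)
```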

Challenge 3: Clean the Data

Answer:

def clean_data(df):
    # Drop rows with missing values; copy so later assignments are safe
    df = df.dropna().copy()
    out_list = ['median_house_value', 'median_income', 'housing_median_age']
    # Compute quartiles only for the columns being capped
    quantiles_df = df[out_list].quantile([0.25, 0.75])
    for out in out_list:
        Q1 = quantiles_df[out][0.25]
        Q3 = quantiles_df[out][0.75]
        iqr = Q3 - Q1
        lower_bound = Q1 - (iqr * 1.5)
        upper_bound = Q3 + (iqr * 1.5)
        # Cap values outside the IQR fences; clip avoids chained assignment
        df[out] = df[out].clip(lower_bound, upper_bound)
    return df

Quiz 3:

Q1. Which parameter is used in the merge() function to assign the type of the join?

  • where
  • move
  • how
  • here

Q2. The operation where the innermost column index is added as the innermost row index is known as?

  • Pivoting
  • Stacking
  • Unstacking
  • Multilevel indexing

Q3. How can outliers be identified in a data set?

  • Plotting the data
  • Using the IQR method
  • Both A & B
  • Outliers don’t exist

Q4.

import numpy as np
import pandas as pd

df = pd.DataFrame({'L1':['A','A','B','B','C','C'],'L2':['C1','C2','C1','C2','C1','C2'],
    'val_1':np.arange(1,7,1),'val_2':np.arange(7,13,1)})

print(df.groupby(['L1','L2']))

What is the output of the above code snippet?

  • Grouped values from dataframe L1 and L2
  • Nothing will be printed
  • Syntax error
  • A Grouped object
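groupby() is lazy: it returns a grouping object and computes nothing until an aggregation is applied. A small sketch using the DataFrame from the question; the sum() aggregation is added here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'L1': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'L2': ['C1', 'C2', 'C1', 'C2', 'C1', 'C2'],
                   'val_1': np.arange(1, 7, 1),
                   'val_2': np.arange(7, 13, 1)})

grouped = df.groupby(['L1', 'L2'])
kind = type(grouped).__name__   # name of the object that print() would describe
sums = grouped.sum()            # aggregation produces an actual DataFrame

print(kind)
print(sums)
```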

Quiz 4:

Q1. The plot used to display distribution of data is called?

  • Heatmaps
  • Scatter plot
  • Histogram
  • Violin plot

Q2. A box plot helps identify what? (Multiple answers can be correct)

  • Trends in data
  • Outliers in data
  • Distribution of data
  • None of the above

Q3. Heatmaps help identify what?

  • Color difference
  • Points with concentration of data
  • Assign numbers to colors
  • None of the above

Q4. A KDE plot informs us of what?

  • The probabilities of the outcomes
  • The height and width of a graph
  • Where the majority of the data lies
  • None of the above

Quiz 5:

Q1. Which of these requests does a scraper make to a webpage to access it?

  • PUT
  • LISTEN
  • GET
  • BIND

Q2. Which package of selenium is used to instantiate a browser instance?

  • driver
  • url
  • chrome
  • webdriver

Q3. The “--headless” option is used for what?

  • To render the browser invisible
  • To maximize the browser
  • To remove the search bar from browser
  • None of the above

Q4. Which command is used to put delays between page requests?

  • delay()
  • wait()
  • sleep()
  • break()

Predictive Data Analysis Exam

Q1. What does shape[1] of a NumPy array return?

  • The row count of the array
  • The column count of the array
  • The dimensions of the array
  • None of the above

Q2. What would be the output of the following code snippet?

import pandas as pd

srs = pd.Series(['d', 'b', 'c', 'a'])

print(srs.rank())

  • 4.0 2.0 3.0 1.0
  • 2.0 4.0 1.0 3.0
  • 4.0 3.0 2.0 1.0
  • 1.0 2.0 3.0 4.0
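rank() assigns each element its position in sorted order (starting at 1) while keeping the original index order. Strings are ranked lexicographically, as this sketch of the snippet shows.

```python
import pandas as pd

srs = pd.Series(['d', 'b', 'c', 'a'])

# 'a' is smallest (rank 1), 'd' is largest (rank 4)
ranks = srs.rank()
print(ranks)
```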

Q3. The header=None parameter of read_csv function is used for what purpose?

  • To check for the header of csv file while reading
  • To ignore the header of csv file while reading
  • To add the header to the csv file if it is not present
  • None of the above

Q4. Which function of selenium is used to fetch data from the web pages?

  • find_element_by_xpath
  • find_elements_by_xpath
  • find_element_by_id
  • All of the above

Q5. A high standard deviation value of a dataset indicates what?

  • Data points of the dataset are close to their mean value
  • Data points of the dataset are far from their mean value
  • Data points of the dataset are equal to their mean value
  • None of the above

Q6. If the mean value of a dataset is equal to the median value, then that dataset is?

  • Positively Skewed
  • Negatively Skewed
  • Normally Distributed
  • Uniformly Distributed

Q7. Which of the following factors contributes to an unstable dataset?

  • Skewness
  • Standard Deviation
  • Variance
  • All of the above

Q8. If N lottery tickets are sold to N different people, then each person will have the same probability of winning the prize.

This kind of data distribution is?

  • Normal Distribution
  • Binomial Distribution
  • Uniform Distribution
  • None of the above

Q9. Which of the following commands will fill the NaN values in the given DataFrame with the mean of non-null values?

import numpy as np
import pandas as pd

null_val = np.nan

df = pd.DataFrame([[1, 2, 3],[null_val, 2, null_val],[1, 2, null_val],[null_val, null_val, null_val]])

  • df.fillna(df.mean())
  • df[df.fillna()[df.mean()]]
  • df[df.fillna(df.mean())]
  • All of the above
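Passing a Series to fillna() fills each column with the value stored under that column's label, so df.mean() supplies a per-column mean of the non-null values. A minimal sketch on the DataFrame from the question:

```python
import numpy as np
import pandas as pd

null_val = np.nan
df = pd.DataFrame([[1, 2, 3],
                   [null_val, 2, null_val],
                   [1, 2, null_val],
                   [null_val, null_val, null_val]])

# df.mean() skips NaN by default, giving one mean per column
filled = df.fillna(df.mean())
print(filled)
```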

Q10. How can we obtain the 70th and 20th percentile of every column in the following DataFrame?

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(900,3))

  • df.quantile([0.20, 0.70])
  • df.quantile([0.70, 0.20, 0.10])
  • df.quantile([0.70, 0.20])
  • None of the above

Q11. How many data points are obtained if we use the quantile method to find outliers in a dataset?

  • 3
  • 4
  • 5
  • 2

Q12. How can you check the count of null values in each column of a DataFrame named df?

  • df = df.isnull(); df = df.sum(); print(df)
  • df = df.sum(); df = df.isnull(); print(df)
  • Both A & B
  • None of the above
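The order of the two calls matters: isnull() produces a boolean mask, and summing that mask counts the True values per column. A sketch on made-up data (the column names x and y are assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical data: two nulls in "x", one in "y"
df = pd.DataFrame({"x": [1.0, np.nan, np.nan], "y": [np.nan, 2.0, 3.0]})

# Boolean mask first, then column-wise sum of True values
null_counts = df.isnull().sum()
print(null_counts)
```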

Q13. What would be the output of the following code snippet?

import pandas as pd

df = pd.DataFrame({
'Col1':['L', 'M', 'M', 'N', 'L', 'O', 'P'],
'Col2': [10, 9, 8, 7, 9, 8, 8]})

print(df.drop_duplicates(['Col2']))

  • Col1 Col2
    • 0 L 10
    • 1 M 9
    • 2 M 8
    • 3 O 7
    • 4 P 8
  • Col1 Col2
    • 0 L 10
    • 1 M 9
    • 2 M 8
    • 3 N 7
  • Col1 Col2
    • 0 L 10
    • 1 M 9
    • 3 N 7
    • 5 O 8
    • 6 P 8
  • None of the above
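drop_duplicates() with a column subset keeps, by default, the first row for each distinct value in those columns and preserves the original index labels. Running the snippet as a sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'Col1': ['L', 'M', 'M', 'N', 'L', 'O', 'P'],
    'Col2': [10, 9, 8, 7, 9, 8, 8]})

# Only Col2 is compared; the first occurrence of each value is kept
deduped = df.drop_duplicates(['Col2'])
print(deduped)
```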

Q14. Which of the following statements sorts the given DataFrame into ascending order?

import pandas as pd

df = pd.DataFrame([2, 10, 3, 4, 9, 10, 2, 7, 3], columns=['A'])

  • df.sort_values(0)
  • df.sort_values(1)
  • df.sort_values(-1)
  • None of the above
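For reference, DataFrame.sort_values() expects a column label (or a list of labels) as its first argument. The sketch below sorts the DataFrame from the question by its only column, 'A'.

```python
import pandas as pd

df = pd.DataFrame([2, 10, 3, 4, 9, 10, 2, 7, 3], columns=['A'])

# Sort by the column label; ascending order is the default
sorted_df = df.sort_values('A')
print(sorted_df)
```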

Q15. What would be the output of the following code snippet?

import pandas as pd

df = pd.DataFrame({
'Col1':['L', 'M', 'M', 'N', 'L', 'O', 'P'],
'Col2': [10, 9, 8, 7, 9, 8, 8],
'Col3': ['a', 'b', 'c', 'd', 'e', 'f', 'g']})

df = df.replace(['a','c', 'e', 'g'], ['A', 'C', 'E', 'G'])

df = df.rename(columns = {'Col1':'A', 'Col2':'B', 'Col3':'C'})

print(df)

  • Col1 Col2 Col3
    • 0 L 10 A
    • 1 M 9 b
    • 2 M 8 C
    • 3 N 7 d
    • 4 L 9 E
    • 5 O 8 f
    • 6 P 8 G
  • A B C
    • 0 L 10 a
    • 1 M 9 b
    • 2 M 8 c
    • 3 N 7 d
    • 4 L 9 e
    • 5 O 8 f
    • 6 P 8 g
  • A B C
    • 0 L 10 A
    • 1 M 9 b
    • 2 M 8 C
    • 3 N 7 d
    • 4 L 9 E
    • 5 O 8 f
    • 6 P 8 G
  • None of the above

Q16. What would be the output of the following code snippet?

import numpy as np
import pandas as pd

df = pd.DataFrame({
'I1':['A','A','B','B'],
'I2':['A','B','A','B'],
'A':np.arange(7,11,1),
'B':np.arange(1,5,1)})

df = df.groupby(['I1','I2']).count()
print(df)

  • I1 I2 A B
    • A A 1 5
    • A B 2 6
    • B A 3 7
    • B B 4 8
  • I1 I2 A B
    • A A 10 6
    • A B 9 5
    • B A 8 4
    • B B 7 2
  • I1 I2 A B
    • A A 1 1
    • A B 1 1
    • B A 1 1
    • B B 1 1
  • None of the above
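count() after groupby() reports how many non-null entries each group has per column, not the values themselves. A sketch of the snippet, with the numpy import it needs:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'I1': ['A', 'A', 'B', 'B'],
    'I2': ['A', 'B', 'A', 'B'],
    'A': np.arange(7, 11, 1),
    'B': np.arange(1, 5, 1)})

# Every (I1, I2) pair occurs exactly once in this data
counts = df.groupby(['I1', 'I2']).count()
print(counts)
```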

Q17. What would be the output of the following code snippet?

import pandas as pd

df = pd.DataFrame({
'Col1':['a', 'b', 'a', 'd', 'a'],
'Col2': [1, 2, 3, 4, 5],
'Col3': [6, 7, 8, 9, 10]})

df = df.groupby('Col1').agg({
'Col2': lambda a: a.sum(),
'Col3': lambda b: b.count()})

print(df)

  • Col1 Col2 Col3
    • d 4 1
    • b 2 1
    • a 9 3
  • Col1 Col2 Col3
    • a 9 3
    • b 2 1
    • d 4 1
  • Col1 Col2 Col3
    • b 2 3
    • a 9 1
    • d 4 1
  • None of the above
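agg() with a dictionary applies a different function to each column, and groupby() sorts the group keys by default. The following sketch runs the snippet and inspects the result.

```python
import pandas as pd

df = pd.DataFrame({
    'Col1': ['a', 'b', 'a', 'd', 'a'],
    'Col2': [1, 2, 3, 4, 5],
    'Col3': [6, 7, 8, 9, 10]})

# Sum Col2 and count Col3 within each Col1 group
agg_df = df.groupby('Col1').agg({
    'Col2': lambda a: a.sum(),
    'Col3': lambda b: b.count()})
print(agg_df)
```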

Q18. Which plot would be suitable if we need to divide data into a specific number of sets?

  • KDE Plot
  • Histogram
  • Scatter Plot
  • Heatmap

Q19. How would you deal with the points that lie outside the range of maximum variance in a box plot?

  • Increase the maximum variance range
  • Consider a different plot
  • Ignore the points
  • Both A & C
  • Both A & B

Q20. Which plot is suitable for predicting the behavior of variables in a dataset?

  • Line Plot
  • Box Plot
  • Scatter Plot
  • Regression Plot

Q21. Which technique is used to smooth the dataset by removing the noise?

  • Correlation
  • Monte Carlo Method
  • Moving Average Method
  • None of the above
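As an illustration of smoothing, a moving average can be computed in pandas with rolling(). This is a minimal sketch on made-up values; the window size of 3 is an arbitrary assumption.

```python
import pandas as pd

# Hypothetical series
values = pd.Series([1, 2, 3, 4, 5])

# Each output is the mean of the current value and the two before it;
# the first two positions have no full window and are NaN
smoothed = values.rolling(window=3).mean()
print(smoothed)
```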

Q22. A high negative correlation value indicates what?

  • A strong directly proportional relationship between the two entities.
  • A strong inversely proportional relationship between the two entities.
  • A poor relationship between the two entities.
  • A negative relationship between the two entities.

Q23. The Rank function is used to sort indexes of a Series or a DataFrame.

  • True
  • False

Q24. According to CLT, the mean of a random sample will closely resemble the mean of the whole population as the sample size decreases.

  • True
  • False

Q25. The median value of a dataset exists in the second quartile.

  • True
  • False

Q26. The right merge returns a DataFrame that has all the rows of the DataFrame placed on the left side of the merge() function.

  • True
  • False
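The how argument of merge() decides which side's keys are kept. The sketch below compares a right and a left merge on two small, made-up DataFrames (all column names here are assumptions).

```python
import pandas as pd

# Hypothetical frames sharing a "key" column
left = pd.DataFrame({"key": [1, 2], "left_val": ["a", "b"]})
right = pd.DataFrame({"key": [2, 3], "right_val": ["x", "y"]})

right_merged = left.merge(right, on="key", how="right")  # keeps every row of `right`
left_merged = left.merge(right, on="key", how="left")    # keeps every row of `left`

print(right_merged)
print(left_merged)
```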

Q27. In a KDE plot, the y-axis value can never be greater than 1.

  • True
  • False

Q28. Match the plots in the left column with their related identifiers on the right side:

Plots:

  • Box Plot
  • KDE Plot
  • Scatter Plot
  • Regression Plot

Identifiers:

  • Prediction
  • Outliers
  • Quartiles
  • Probability

Q29. In this coding question, you are asked to implement the function covid(df) that takes a DataFrame as a parameter and returns a new processed DataFrame. The dataset contains information about the novel coronavirus: the counts of confirmed cases, deaths, and recoveries, recorded by date and country. The dataset is accessible from here. Download and open it to analyze its contents.

The dataset has the following five columns:

  • Country/Region – The name of the country
  • Date – The date on which the cases are recorded
  • Confirmed – The number of positive cases
  • Deaths – The number of deaths
  • Recovered – The number of recovered patients

The data is already loaded into the DataFrame provided in the function and your task is to return another DataFrame that has the following modifications:

  1. Calculate a new column Still_Infected and add it in the original DataFrame.
  2. Get the maximum Still_Infected case value for each country and then arrange the countries in the increasing order of the Still_Infected cases.

Keep in mind that the returned DataFrame should only contain the countries with their maximum Still_Infected case value.
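One possible approach to this task is sketched below. It assumes that Still_Infected means Confirmed minus Deaths minus Recovered, and that returning a two-column DataFrame is acceptable; neither detail is confirmed by the question text. The sample rows are made up for illustration.

```python
import pandas as pd

def covid(df):
    # Assumption: still infected = confirmed - deaths - recovered
    df["Still_Infected"] = df["Confirmed"] - df["Deaths"] - df["Recovered"]
    # Peak Still_Infected value per country, smallest first
    peak = df.groupby("Country/Region")["Still_Infected"].max()
    return peak.sort_values().reset_index()

# Hypothetical sample data, not taken from the real dataset
sample = pd.DataFrame({
    "Country/Region": ["X", "X", "Y", "Y"],
    "Date": ["2020-03-01", "2020-03-02", "2020-03-01", "2020-03-02"],
    "Confirmed": [10, 20, 5, 6],
    "Deaths": [1, 2, 0, 1],
    "Recovered": [2, 10, 1, 4]})

result = covid(sample)
print(result)
```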

Q30. In this coding question, you are asked to implement the function games(df) that takes a DataFrame as a parameter and returns a new processed DataFrame. The dataset contains information about different gaming platforms: which games they released, in what year, and how many sales were made. The dataset is accessible from here. Download and open it to analyze its contents.

The dataset has the following four columns:

  • name – The name of the game
  • platform – The platform which released the game
  • year_of_release – The year in which the game was released
  • Total_sales – The total sales the game made that year in millions.

The data is already loaded into the DataFrame provided in the function. Your task is to return another DataFrame that contains the platforms with the highest total sales, in descending order. Only platforms whose cumulative sales are greater than 100.0 should be included.

Keep in mind that the returned DataFrame should only contain the platforms with their respective cumulative sales values.
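A hedged sketch of one way to approach this task follows. It assumes that "cumulated sales" means the sum of Total_sales per platform and that a two-column DataFrame is an acceptable return shape; the sample rows are invented for illustration.

```python
import pandas as pd

def games(df):
    # Total sales per platform
    totals = df.groupby("platform")["Total_sales"].sum()
    # Keep only platforms above the 100.0 threshold, largest first
    totals = totals[totals > 100.0].sort_values(ascending=False)
    return totals.reset_index()

# Hypothetical sample data, not taken from the real dataset
sample = pd.DataFrame({
    "name": ["g1", "g2", "g3", "g4", "g5"],
    "platform": ["PS2", "PS2", "Wii", "GB", "GB"],
    "year_of_release": [2001, 2002, 2006, 1990, 1991],
    "Total_sales": [80.0, 70.0, 120.0, 30.0, 40.0]})

result = games(sample)
print(result)
```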

Conclusion:

I hope these Predictive Data Analysis with Python Educative quiz answers help you learn something new. If they helped you, don’t forget to bookmark our site for more coding solutions.

This course is intended for learners of all experience levels who are interested in data science in a business context; there are no prerequisites.

Keep Learning!
