01. What is used to update coefficients in logistic regression?
a) number of features
b) gradient descent
c) slope
d) kernel
02. Which is an accurate statement regarding logistic regression?
a) Logistic regression is a non-linear classifier.
b) Logistic regression can be used for unsupervised learning.
c) Logistic regression can be used for binary classification.
d) The logistic function f(x) = 1/(1 + exp(-(wx + b))) takes values in the range [0, inf].
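As an illustration for questions 01 and 02, the following is a minimal sketch of a one-feature logistic regression whose coefficients are updated by batch gradient descent. The function names, the learning rate, the epoch count, and the toy data are assumptions made for this example only.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit w and b for a single feature with batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean log-loss with respect to w and b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        # Coefficient update step.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical toy data: one feature, two classes.
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(xs, ys)
# Threshold the predicted probability at 0.5 for binary classification.
preds = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in xs]
```

The decision boundary here is the single point where wx + b = 0, and every value returned by sigmoid lies strictly between 0 and 1.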
03. Which three hyperparameters are used when building a simple decision tree model?
a) kernel
b) learning rate
c) maximum depth
d) split criterion
e) number of nearest neighbors
f) minimum number of samples in a leaf node
04. A client requests a general artificial intelligence (AI) tool that they can plug into their data warehouse. What is the best response to this request?
a) Currently, no general AI tool works universally.
b) Apply neural networks to your data.
c) IBM Watson is the tool you are looking for.
d) AI can create value without any human intervention.
05. To reduce the overall time to complete a data ingestion job, which two actions should be taken?
a) Assemble the data pipeline into a series of immutable transformations, which can be combined after the processing.
b) Partition the data within each pipeline to take advantage of parallel processing (multiple server cores, processors, etc.).
c) Look for outliers in the data, missing values, and skewness of the data.
d) Build a dedicated pipeline for each dataset to ensure that all of them can be processed independently and concurrently.
e) Apply a chi-squared statistical test to rank the impact of each feature on the concept label and discard the less impactful features before model training.
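As an illustration for question 05, the following is a minimal sketch of partitioning data so that chunks can be processed concurrently. The partition, transform, and ingest functions are hypothetical, and a thread pool is used only to keep the example self-contained; a CPU-bound job would more likely use a process pool or a distributed engine.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(rows, n_parts):
    """Split rows into n_parts interleaved chunks."""
    return [rows[i::n_parts] for i in range(n_parts)]

def transform(chunk):
    """Hypothetical per-chunk transformation that does not mutate its input."""
    return [value * 2 for value in chunk]

def ingest(rows, n_parts=4):
    """Transform each partition concurrently, then combine the results."""
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        results = list(pool.map(transform, partition(rows, n_parts)))
    return [value for chunk in results for value in chunk]

processed = ingest(list(range(10)))
```

Because the chunks are interleaved, the combined output is not in the original row order; sort or re-key the results if order matters.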
06. What should be the first step to begin the task of collecting initial data?
a) Copy data from several sources to a central repository to review the data
b) Determine if a poll is required to collect data
c) Verify the technical skills that are required to collect data
d) Understand the business requirement to find out what would be the relevant data needed
07. Which two statements are true in the context of evaluating machine learning models?
a) Accuracy of 95% is always a good result.
b) Random guessing can be used as a baseline.
c) The F2-score puts equal weight on precision and recall.
d) The F1-score is the harmonic mean of precision and recall.
e) Evaluation metrics on training data are more important than on test data.
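As an illustration for question 07, the following is a minimal sketch of the general F-beta score. The function name f_beta and the sample precision and recall values are assumptions made for this example.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score: beta > 1 weights recall more heavily than precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical values: low precision, perfect recall.
precision, recall = 0.5, 1.0

f1 = f_beta(precision, recall, beta=1.0)
f2 = f_beta(precision, recall, beta=2.0)

# Harmonic mean of the two values, for comparison with f1.
harmonic_mean = 2 / (1 / precision + 1 / recall)
```

With beta = 1 the score equals the harmonic mean of precision and recall. With beta = 2 recall counts more, so in this example f2 is larger than f1.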
08. How is the "aperture problem" in machine vision best defined?
a) identifying a whole object or scene based on seeing only a small part of that object or scene
b) generating "snakes" of active contours based on boundary curves
c) pattern matching based on an undertrained model
d) over-fitting a model based on close-up images
09. What are two common ways to handle missing values when cleaning data?
a) delete records
b) replace with '1'
c) replace with mean
d) replace with '100'
e) replace with standard deviation
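As an illustration for question 09, the following is a minimal sketch of two ways to handle missing values: dropping incomplete records and filling gaps with the mean of the observed values. The record layout and the field name height_cm are hypothetical.

```python
from statistics import mean

# Hypothetical records; None marks a missing measurement.
records = [
    {"id": 1, "height_cm": 42.0},
    {"id": 2, "height_cm": None},
    {"id": 3, "height_cm": 48.0},
]

def drop_missing(rows, key):
    """Delete records in which the given field is missing."""
    return [r for r in rows if r[key] is not None]

def impute_mean(rows, key):
    """Replace missing values with the mean of the observed values."""
    fill = mean(r[key] for r in rows if r[key] is not None)
    return [{**r, key: fill if r[key] is None else r[key]} for r in rows]

dropped = drop_missing(records, "height_cm")
imputed = impute_mean(records, "height_cm")
```

Dropping records shrinks the dataset, while mean imputation keeps every record but reduces the variance of the filled column.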
10. A client, a tomato grower, provides a dataset of measurements of tomato plants and environmental data.
A data scientist thinks the features probably have a significant amount of redundancy. The data scientist decides to apply dimensionality reduction to the data features.
Which three techniques are examples of dimensionality reduction?
a) k-means clustering
b) batch normalization
c) combinatorial optimization
d) autoencoder neural network
e) principal component analysis (PCA)
f) t-distributed stochastic neighbor embedding (t-SNE)
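As an illustration for question 10, the following is a minimal sketch of principal component analysis that reduces two redundant features to one. It finds the leading component by power iteration on the covariance matrix, which is a simplification chosen for this example; libraries typically use an SVD instead. The function name and the measurements are hypothetical.

```python
def pca_first_component(rows, iters=200):
    """Return the leading principal direction and the 1-D projected scores."""
    n, d = len(rows), len(rows[0])
    # Center each feature on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Project every centered row onto the leading direction.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Hypothetical measurements: the second feature is roughly twice the first.
rows = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 8.0], [5.0, 9.8]]
direction, scores = pca_first_component(rows)
```

Each row is now described by a single score instead of two correlated features.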