Concept Check Solutions

  • Lecture 2 - Linear Regression (January 29, 2020)
    • Question A: Graph 2 is Line 2, Graph 3 is Line 1, Graph 4 is Line 2
    • Question B: Graph 2 is large residual, Graph 3 is extreme x value, Graph 4 is pattern in the residuals
  • Lecture 3 - Probabilistic Regression (February 3, 2020)
    • Question A: Laplace is $|\epsilon|$, Gaussian is $\epsilon^2$, Student T is $\log(1+\epsilon^2)$ (a short sketch of these three losses follows this lecture's answers)
    • Question B: Laplace is Line 3, Gaussian is Line 1, Student T is Line 2
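As a quick illustration of Question A, here is a minimal sketch of the three per-residual losses. It is not from the course materials: it assumes numpy, drops additive constants and scale factors, and uses the Student-t with one degree of freedom.

```python
import numpy as np

# Negative log-likelihood shapes from Question A, up to constants and scale:
# Laplace -> |eps|, Gaussian -> eps^2, Student-t (1 d.o.f.) -> log(1 + eps^2).
def laplace_loss(eps):
    return np.abs(eps)

def gaussian_loss(eps):
    return eps ** 2

def student_t_loss(eps):
    return np.log(1.0 + eps ** 2)

# The squared (Gaussian) loss grows fastest for large residuals, so it is the
# least robust to outliers; the Student-t loss grows slowest.
eps = np.linspace(-5.0, 5.0, 11)
for name, loss in [("Laplace", laplace_loss),
                   ("Gaussian", gaussian_loss),
                   ("Student-t", student_t_loss)]:
    print(name, np.round(loss(eps), 2))
```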
  • Lecture 4 - Linear Classification (February 5, 2020)
    • Question A: Minimum TPR is 0
    • Question B: Maximum TPR is 6/7
  • Lecture 5 - Probabilistic Classification (February 10, 2020)
    • Question A: Line 2 (if generative), Impossible to gauge without more information (if assumptions unknown)
    • Question B: Yes, the unlabeled data helps us here (if generative), although might not always help (if discriminative, for example)
  • Lecture 6 - Model Selection - Frequentist (February 12, 2020)
    • Question A: Yes (for A), No probably not (for B)
    • Question B: Yes probably (for A), Unclear (for B)
    • Question C: No (for A), Yes (for B)
  • Lecture 7 - Model Selection - Bayesian (February 19, 2020)
    • Question A: Yes
    • Question B: $a_0 = 0$ and $a_2 = 1 - a_1$
    • Question C: 0 to 0.5 uniformly
    • Question D: 0.5 with probability 1
  • Lecture 8 - Neural Networks 1 (February 24, 2020)
    • Question A: Yes (even if one filter)
    • Question B: Yes (if assuming multiple filters), No (if assuming only one filter)
    • Question C: Yes (even if one filter)
  • Lecture 9 - Neural Networks 2 (February 26, 2020)
    • Question A: No
    • Question B-i: # Parameters << # Data - No, it won't fit the noise
    • Question B-ii: # Parameters == # Data - Yes, expect roughly 1 perfect model
    • Question B-iii: # Parameters >> # Data - Yes, expect many perfect models
    • Question C: SGD implicitly performs regularization
  • Lecture 10 - Support Vector Machine 1 (March 2, 2020)
    • NA - Just a discussion this class
  • Lecture 11 - Support Vector Machine 2 (March 2, 2020)
    • Question A: Overfit
    • Question B: Less of a problem, likely will fit decently well with a smoother boundary
    • Question C: Many support points (suggests a wiggly boundary), small margin, cross validation
  • Lecture 13 - Clustering (March 23, 2020)
    • Question A: First, 3 and 4 merge, and 6 and 7 merge. Then those two clusters merge into 3-4-6-7. Finally, 0 joins to form 0-3-4-6-7 (see the merge-order sketch after this lecture's answers).
    • Question B: No
    • Question C: Less
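For Question A, the merge order comes from repeatedly joining the two closest clusters. The sketch below is illustrative only: the 1-D points and the single-linkage choice are made-up assumptions, not the data from lecture; it just shows how a merge order like the one above can be read off scipy's linkage output.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Hypothetical 1-D points (indices 0..7), chosen only for illustration.
x = np.array([0.0, 5.0, 9.0, 1.0, 1.2, 7.0, 2.0, 2.1]).reshape(-1, 1)

# Each row of Z records one merge: (cluster_i, cluster_j, distance, new_size).
# Indices >= len(x) refer to clusters created by earlier merges.
Z = linkage(x, method="single")
for i, (a, b, d, n) in enumerate(Z):
    print(f"merge {i}: clusters {int(a)} and {int(b)} at distance {d:.2f} (size {int(n)})")
```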
  • Lecture 14 - Mixture Models (March 25, 2020)
    • Question A: No
    • Question B: Image 3
    • Question C: Image 1
  • Lecture 15 - Principal Component Analysis (March 30, 2020)
    • Question A: $x_1$, $x_2$, $x_3$ (in order from most variance explained to least)
    • Question B: No
    • Question C: No
  • Lecture 16 - Topic Models (April 1, 2020)
    • Question A1: [1/3, 1/3, 1/3], [1/3, 1/3, 1/3], [1/3, 1/3, 1/3]
    • Reasoning: To recap, a draw from a K-dimensional Dirichlet distribution is a probability vector of length K. If three draws from a Dirichlet all came out as [1/3, 1/3, 1/3], most of the distribution's mass must sit at the middle of the simplex (if you were to visualize this in 3D). That corresponds to the alpha with three equal, large entries: think of the Beta, which is the Dirichlet with K = 2; when both of its parameters are large and equal, the mass concentrates in the middle. So our answer is [1/3, 1/3, 1/3], [1/3, 1/3, 1/3], [1/3, 1/3, 1/3], as it corresponds to $\alpha_1 = [10, 10, 10]$.
    • Question A2: [0, 1, 0], [1, 0, 0], [0, 0, 1]
    • Reasoning: Since all three draws put essentially all of their probability mass on one dimension, the distribution must be concentrated at the three corners of the simplex. That corresponds to the alpha with three equal, small entries: for the Beta, which is the Dirichlet with K = 2, when both parameters are small and equal, the mass sits at the edges. So our answer is [0, 1, 0], [1, 0, 0], [0, 0, 1], as it corresponds to $\alpha_2 = [0.01, 0.01, 0.01]$.
    • Question B1: [13, 11, 10]
    • Reasoning: Since we saw three data points in the 0th dimension and one data point in the 1st dimension, we simply add 3 and 1 to the prior at the corresponding indices. The prior here is [10, 10, 10], so the posterior is [10+3, 10+1, 10+0]. This is analogous to the Beta coin-flip story from lecture a long time ago: as we saw coin flips, we updated the prior parameters by adding the number of heads to one and the number of tails to the other.
    • Question B2: [3.01, 1.01, 0.01]
    • Reasoning: Same logic as the previous question. The prior here is [0.01, 0.01, 0.01], so the posterior is [0.01+3, 0.01+1, 0.01+0]. A short numerical sketch of both updates appears after this lecture's answers.
    • Question C: No
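A minimal numerical sketch of the Dirichlet reasoning above (assuming numpy; the random seed is arbitrary and not from the course materials): large, equal alphas give draws near the centre of the simplex, tiny, equal alphas give draws near the corners, and the conjugate posterior update for B1/B2 just adds the observed counts to the prior.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed

# Question A: equal, large alphas vs. equal, tiny alphas.
alpha_1 = np.array([10.0, 10.0, 10.0])
alpha_2 = np.array([0.01, 0.01, 0.01])
print(np.round(rng.dirichlet(alpha_1, size=3), 2))  # rows near [1/3, 1/3, 1/3]
print(np.round(rng.dirichlet(alpha_2, size=3), 2))  # rows near a corner

# Question B: the Dirichlet is conjugate to the categorical, so the posterior
# parameters are the prior alphas plus the observed counts per dimension.
counts = np.array([3, 1, 0])   # 3 observations in dim 0, 1 in dim 1
print(alpha_1 + counts)        # [13. 11. 10.]
print(alpha_2 + counts)        # [3.01 1.01 0.01]
```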
  • Lecture 17 - Graphical Models (April 6, 2020)
    • Question A:
      • Model 1 has 54 (see alternative answer below)
      • Model 2 has 5
      • Model 3 has 45
      • Explanation:

        For Model 1, we calculate 54 in the following way. All of the variables are discrete and can take on 3 values each, and remember that we are not asking for A in the parameter count (although you should still mentally have A in mind). This means B, C, and E each have 9 parameters, since each has one 3-valued parent (a 3 by 3 table, i.e. 3^2), and D has 27 since it has two parents (3^3). In total we have 54. However, we could technically argue that a distribution over 3 values really only needs 2 free parameters, since you can extract the third one just by doing 1 minus the sum of the other two. This is also a valid way to think of the problem: each row of a table then needs only 2 free parameters, so B, C, and E each have 3 x 2 = 6 parameters, D has 9 x 2 = 18, and the total is 36. If this ever comes up on a test or homework, we will make clear what the assumptions are.

        For Model 2, B, C, and E each have just 1 parameter, since each has a single parent to multiply by one weight (the relationship is linear). D has 2 parameters, since it has two parents. This leads to a sum of 5.

        For Model 3, B, C, and E each have 9 parameters, since each needs a 3 by 3 matrix (the variables are 3-dimensional now). D has 18 parameters, since it requires a 3 by 6 matrix (one 3 by 3 block per parent). This leads to a sum of 45. The short counting sketch below restates these totals.
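As a restatement of the arithmetic above (a sketch only, mirroring the explanation rather than any course-provided code):

```python
# Parameter totals for Question A; A itself is not counted.
K = 3  # each discrete variable takes K = 3 values

# Model 1: full conditional probability tables.
model_1 = 3 * K**2 + K**3        # B, C, E: 3^2 each; D: 3^3      -> 54

# Model 2: scalar linear relationships, one weight per parent.
model_2 = 3 * 1 + 2              # B, C, E: 1 each; D: 2          -> 5

# Model 3: 3-dimensional linear relationships, one 3x3 block per parent.
model_3 = 3 * (3 * 3) + 3 * 6    # B, C, E: 3x3 each; D: 3x6      -> 45

print(model_1, model_2, model_3)  # 54 5 45
```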

    • Question B: Only includes linear functions
  • Lecture 18 - Inference in Bayes Nets (April 8, 2020)
    • Question A: Cut link from Z to T
    • Question B: $\sum_z p(y=1 \mid do(t=1), z)\, p(z)$ (a small numerical sketch of this sum follows this lecture's answers)
    • Question C: No
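For Question B, here is a small numerical sketch of the adjustment. The probability tables are hypothetical (not from lecture), and it assumes the usual structure where Z influences both T and Y; after cutting the Z-to-T link, $p(y=1 \mid do(t=1), z)$ coincides with $p(y=1 \mid t=1, z)$.

```python
import numpy as np

# Hypothetical distributions for a binary confounder Z.
p_z = np.array([0.7, 0.3])              # p(z)
p_t1_given_z = np.array([0.2, 0.8])     # p(t=1 | z)
p_y1_given_t1_z = np.array([0.3, 0.6])  # p(y=1 | t=1, z)

# Interventional quantity from Question B: average over p(z), not p(z | t=1).
p_y1_do_t1 = np.sum(p_y1_given_t1_z * p_z)

# Observational quantity: conditioning on t=1 re-weights z, so it differs.
p_z_given_t1 = p_t1_given_z * p_z / np.sum(p_t1_given_z * p_z)
p_y1_given_t1 = np.sum(p_y1_given_t1_z * p_z_given_t1)

print(round(p_y1_do_t1, 3), round(p_y1_given_t1, 3))  # 0.39 vs. 0.489
```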
  • Lecture 19 - Hidden Markov Models (April 13, 2020)
    • Question A: [0, 1, 0]
    • Question B: [0, 1/2, 1/2]
  • Lecture 20 - Markov Decision Processes (April 15, 2020)
    • Question A: No
    • Question B: No
    • Question C: Yes
    • Question D: No
  • Lecture 21 - Reinforcement Learning (April 20, 2020)
    • Question A: Around the top
    • Question B: Straight to the right
    • Question C: Around the top
    • Question D: Straight to the right