Concept Check Solutions

  • Lecture 2 - Linear Regression (January 29, 2020)
    • Question A: Graph 2 is Line 2, Graph 3 is Line 1, Graph 4 is Line 2
    • Question B: Graph 2 is large residual, Graph 3 is extreme x value, Graph 4 is pattern in the residuals
  • Lecture 3 - Probabilistic Regression (February 3, 2020)
    • Question A: Laplace is $|\epsilon|$, Gaussian is $\epsilon^2$, Student T is $\log(1+\epsilon^2)$ (a short sketch of these three losses follows this lecture's answers)
    • Question B: Laplace is Line 3, Gaussian is Line 1, Student T is Line 2
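As a quick illustration of Question A, here is a minimal sketch of the three per-residual losses. It is not from the course materials: it assumes numpy, drops additive constants and scale factors, and uses the Student-t with one degree of freedom.

```python
import numpy as np

# Negative log-likelihood shapes from Question A, up to constants and scale:
# Laplace -> |eps|, Gaussian -> eps^2, Student-t (1 d.o.f.) -> log(1 + eps^2).
def laplace_loss(eps):
    return np.abs(eps)

def gaussian_loss(eps):
    return eps ** 2

def student_t_loss(eps):
    return np.log(1.0 + eps ** 2)

# The squared (Gaussian) loss grows fastest for large residuals, so it is the
# least robust to outliers; the Student-t loss grows slowest.
eps = np.linspace(-5.0, 5.0, 11)
for name, loss in [("Laplace", laplace_loss),
                   ("Gaussian", gaussian_loss),
                   ("Student-t", student_t_loss)]:
    print(name, np.round(loss(eps), 2))
```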
  • Lecture 4 - Linear Classification (February 5, 2020)
    • Question A: Minimum TPR is 0
    • Question B: Maximum TPR is 6/7
  • Lecture 5 - Probabilistic Classification (February 10, 2020)
    • Question A: Line 2 (if generative), Impossible to gauge without more information (if assumptions unknown)
    • Question B: Yes, the unlabeled data helps us here (if generative), although might not always help (if discriminative, for example)
  • Lecture 6 - Model Selection - Frequentist (February 12, 2020)
    • Question A: Yes (for A), No probably not (for B)
    • Question B: Yes probably (for A), Unclear (for B)
    • Question C: No (for A), Yes (for B)
  • Lecture 7 - Model Selection - Bayesian (February 19, 2020)
    • Question A: Yes
    • Question B: $a_0 = 0$ and $a_2 = 1 - a_1$
    • Question C: 0 to 0.5 uniformly
    • Question D: 0.5 with probability 1
  • Lecture 8 - Neural Networks 1 (February 24, 2020)
    • Question A: Yes (even if one filter)
    • Question B: Yes (if assuming multiple filters), No (if assuming only one filter)
    • Question C: Yes (even if one filter)
  • Lecture 9 - Neural Networks 2 (February 26, 2020)
    • Question A: No
    • Question B-i: # Parameters << # Data - No, it won't fit the noise
    • Question B-ii: # Parameters == # Data - Yes, expect roughly 1 perfect model
    • Question B-iii: # Parameters >> # Data - Yes, expect many perfect models
    • Question C: SGD implicitly performs regularization
  • Lecture 10 - Support Vector Machine 1 (March 2, 2020)
    • NA - Just a discussion this class
  • Lecture 11 - Support Vector Machine 2 (March 2, 2020)
    • Question A: Overfit
    • Question B: Less of a problem, likely will fit decently well with a smoother boundary
    • Question C: Many support points (suggests a wiggly boundary), small margin, cross validation
  • Lecture 13 - Clustering (March 23, 2020)
    • Question A: First, 3 and 4 merge, and 6 and 7 merge. Then those two clusters merge into 3-4-6-7. Finally, 0 joins to form 0-3-4-6-7 (see the merge-order sketch after this lecture's answers).
    • Question B: No
    • Question C: Less
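For Question A, the merge order comes from repeatedly joining the two closest clusters. The sketch below is illustrative only: the 1-D points and the single-linkage choice are made-up assumptions, not the data from lecture; it just shows how a merge order like the one above can be read off scipy's linkage output.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Hypothetical 1-D points (indices 0..7), chosen only for illustration.
x = np.array([0.0, 5.0, 9.0, 1.0, 1.2, 7.0, 2.0, 2.1]).reshape(-1, 1)

# Each row of Z records one merge: (cluster_i, cluster_j, distance, new_size).
# Indices >= len(x) refer to clusters created by earlier merges.
Z = linkage(x, method="single")
for i, (a, b, d, n) in enumerate(Z):
    print(f"merge {i}: clusters {int(a)} and {int(b)} at distance {d:.2f} (size {int(n)})")
```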
  • Lecture 14 - Mixture Models (March 25, 2020)
    • Question A: No
    • Question B: Image 3
    • Question C: Image 1
  • Lecture 15 - Principal Component Analysis (March 30, 2020)
    • Question A: $x_1$, $x_2$, $x_3$ (in order from most variance explained to least)
    • Question B: No
    • Question C: No
  • Lecture 16 - Topic Models (April 1, 2020)
    • Question A1: [1/3, 1/3, 1/3], [1/3, 1/3, 1/3], [1/3, 1/3, 1/3]
    • Reasoning: To recap, a draw from a K-dimensional Dirichlet distribution is a probability vector of length K. If three draws from a Dirichlet all came out as [1/3, 1/3, 1/3], most of the distribution's mass must sit at the middle of the simplex (if you were to visualize this in 3D). That corresponds to the alpha with three equal, large entries: think of the Beta, which is the Dirichlet with K = 2; when both of its parameters are large and equal, the mass concentrates in the middle. So our answer is [1/3, 1/3, 1/3], [1/3, 1/3, 1/3], [1/3, 1/3, 1/3], as it corresponds to $\alpha_1 = [10, 10, 10]$.
    • Question A2: [0, 1, 0], [1, 0, 0], [0, 0, 1]
    • Reasoning: Since all three draws put essentially all of their probability mass on one dimension, the distribution must be concentrated at the three corners of the simplex. That corresponds to the alpha with three equal, small entries: for the Beta, which is the Dirichlet with K = 2, when both parameters are small and equal, the mass sits at the edges. So our answer is [0, 1, 0], [1, 0, 0], [0, 0, 1], as it corresponds to $\alpha_2 = [0.01, 0.01, 0.01]$.
    • Question B1: [13, 11, 10]
    • Reasoning: Since we saw three data points in the 0th dimension and one data point in the 1st dimension, we simply add 3 and 1 to the prior at the corresponding indices. The prior here is [10, 10, 10], so the posterior is [10+3, 10+1, 10+0]. This is analogous to the Beta coin-flip story from lecture a long time ago: as we saw coin flips, we updated the prior parameters by adding the number of heads to one and the number of tails to the other.
    • Question B2: [3.01, 1.01, 0.01]
    • Reasoning: Same logic as the previous question. The prior here is [0.01, 0.01, 0.01], so the posterior is [0.01+3, 0.01+1, 0.01+0]. A short numerical sketch of both updates appears after this lecture's answers.
    • Question C: No
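A minimal numerical sketch of the Dirichlet reasoning above (assuming numpy; the random seed is arbitrary and not from the course materials): large, equal alphas give draws near the centre of the simplex, tiny, equal alphas give draws near the corners, and the conjugate posterior update for B1/B2 just adds the observed counts to the prior.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed

# Question A: equal, large alphas vs. equal, tiny alphas.
alpha_1 = np.array([10.0, 10.0, 10.0])
alpha_2 = np.array([0.01, 0.01, 0.01])
print(np.round(rng.dirichlet(alpha_1, size=3), 2))  # rows near [1/3, 1/3, 1/3]
print(np.round(rng.dirichlet(alpha_2, size=3), 2))  # rows near a corner

# Question B: the Dirichlet is conjugate to the categorical, so the posterior
# parameters are the prior alphas plus the observed counts per dimension.
counts = np.array([3, 1, 0])   # 3 observations in dim 0, 1 in dim 1
print(alpha_1 + counts)        # [13. 11. 10.]
print(alpha_2 + counts)        # [3.01 1.01 0.01]
```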
  • Lecture 17 - Graphical Models (April 6, 2020)
    • Question A:
      • Model 1 has 54 (see alternative answer below)
      • Model 2 has 5
      • Model 3 has 45
      • Explanation:

        For Model 1, we calculate 54 in the following way. All of the variables are discrete and can take on 3 values each, and remember that we are not asking for A in the parameter count (although you should still mentally have A in mind). This means B, C, and E each have 9 parameters, since each has one 3-valued parent (a 3 by 3 table, i.e. 3^2), and D has 27 since it has two parents (3^3). In total we have 54. However, we could technically argue that a distribution over 3 values really only needs 2 free parameters, since you can extract the third one just by doing 1 minus the sum of the other two. This is also a valid way to think of the problem: each row of a table then needs only 2 free parameters, so B, C, and E each have 3 x 2 = 6 parameters, D has 9 x 2 = 18, and the total is 36. If this ever comes up on a test or homework, we will make clear what the assumptions are.

        For Model 2, B, C, and E each have just 1 parameter, since each has a single parent to multiply by one weight (the relationship is linear). D has 2 parameters, since it has two parents. This leads to a sum of 5.

        For Model 3, B, C, and E each have 9 parameters, since each needs a 3 by 3 matrix (the variables are 3-dimensional now). D has 18 parameters, since it requires a 3 by 6 matrix (one 3 by 3 block per parent). This leads to a sum of 45. The short counting sketch below restates these totals.
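As a restatement of the arithmetic above (a sketch only, mirroring the explanation rather than any course-provided code):

```python
# Parameter totals for Question A; A itself is not counted.
K = 3  # each discrete variable takes K = 3 values

# Model 1: full conditional probability tables.
model_1 = 3 * K**2 + K**3        # B, C, E: 3^2 each; D: 3^3      -> 54

# Model 2: scalar linear relationships, one weight per parent.
model_2 = 3 * 1 + 2              # B, C, E: 1 each; D: 2          -> 5

# Model 3: 3-dimensional linear relationships, one 3x3 block per parent.
model_3 = 3 * (3 * 3) + 3 * 6    # B, C, E: 3x3 each; D: 3x6      -> 45

print(model_1, model_2, model_3)  # 54 5 45
```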

    • Question B: Only includes linear functions
  • Lecture 18 - Inference in Bayes Nets (April 8, 2020)
    • Question A: Cut link from Z to T
    • Question B: $\sum_z p(y=1 \mid do(t=1), z)\, p(z)$ (a small numerical sketch of this sum follows this lecture's answers)
    • Question C: No
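For Question B, here is a small numerical sketch of the adjustment. The probability tables are hypothetical (not from lecture), and it assumes the usual structure where Z influences both T and Y; after cutting the Z-to-T link, $p(y=1 \mid do(t=1), z)$ coincides with $p(y=1 \mid t=1, z)$.

```python
import numpy as np

# Hypothetical distributions for a binary confounder Z.
p_z = np.array([0.7, 0.3])              # p(z)
p_t1_given_z = np.array([0.2, 0.8])     # p(t=1 | z)
p_y1_given_t1_z = np.array([0.3, 0.6])  # p(y=1 | t=1, z)

# Interventional quantity from Question B: average over p(z), not p(z | t=1).
p_y1_do_t1 = np.sum(p_y1_given_t1_z * p_z)

# Observational quantity: conditioning on t=1 re-weights z, so it differs.
p_z_given_t1 = p_t1_given_z * p_z / np.sum(p_t1_given_z * p_z)
p_y1_given_t1 = np.sum(p_y1_given_t1_z * p_z_given_t1)

print(round(p_y1_do_t1, 3), round(p_y1_given_t1, 3))  # 0.39 vs. 0.489
```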
  • Lecture 19 - Hidden Markov Models (April 13, 2020)
    • Question A: [0, 1, 0]
    • Question B: [0, 1/2, 1/2]
  • Lecture 20 - Markov Decision Processes (April 15, 2020)
    • Question A: No
    • Question B: No
    • Question C: Yes
    • Question D: No
  • Lecture 21 - Reinforcement Learning (April 20, 2020)
    • Question A: Around the top
    • Question B: Straight to the right
    • Question C: Around the top
    • Question D: Straight to the right