Reasoning: First, to recap, a draw from a K-dimensional Dirichlet distribution is a probability vector of length K. If we drew from a Dirichlet three times and each time saw [1/3, 1/3, 1/3], that would mean that most of the distribution's mass falls right in the middle of the simplex if you were to visualize it in 3D. This corresponds to the three equal, high numbers in our alpha (think of the Beta, which is the Dirichlet with K=2: when both parameters of the Beta distribution are high and equal, the mass is in the middle). So, our answer is [1/3, 1/3, 1/3], [1/3, 1/3, 1/3], [1/3, 1/3, 1/3], as it corresponds to the $\alpha_1$ of [10, 10, 10].
Question A2: [0, 1, 0], [1, 0, 0], [0, 0, 1]
Reasoning: Since we've drawn three vectors with all the probability mass on one dimension, the distribution must be heavily concentrated in the three corners. This corresponds to the three equal, low numbers in our alpha (think of the Beta, which is the Dirichlet with K=2: when both parameters of the Beta distribution are low and equal, the mass sits at the edges). So, our answer is [0, 1, 0], [1, 0, 0], [0, 0, 1], as it corresponds to the $\alpha_2$ of [0.01, 0.01, 0.01].
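Both answers can be sanity-checked empirically (a quick NumPy sketch; the random seed and number of draws are arbitrary choices, not part of the question):

```python
import numpy as np

rng = np.random.default_rng(0)

# High, equal alpha: draws land near the center of the simplex, [1/3, 1/3, 1/3].
high = rng.dirichlet([10, 10, 10], size=3)

# Low, equal alpha: draws land near the corners (almost all mass on one entry).
low = rng.dirichlet([0.01, 0.01, 0.01], size=3)

print(high.round(2))  # rows close to [0.33, 0.33, 0.33]
print(low.round(2))   # rows close to a one-hot vector
```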
Question B1: [13, 11, 10]
Reasoning: Since we've seen 3 data points in the 0th dimension and 1 data point in the 1st dimension, we simply add those counts to our prior at the corresponding indices. The prior here was [10, 10, 10], so we have [10+3, 10+1, 10+0]. This is analogous to the Beta story with the coin flips from lecture a long time ago, where, as we saw coin flips, we updated our prior parameters by adding the number of heads to one and the number of tails to the other.
Question B2: [3.01, 1.01, 0.01]
Reasoning: Same logic as the previous question! The prior here was [0.01, 0.01, 0.01], so we have [0.01+3, 0.01+1, 0.01+0].
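Both B1 and B2 are the same one-line conjugate update (a sketch; the counts [3, 1, 0] come from the observed data described above):

```python
# Dirichlet-multinomial conjugacy: posterior alpha = prior alpha + observed counts.
counts = [3, 1, 0]  # 3 observations in dimension 0, 1 in dimension 1, 0 in dimension 2


def posterior(prior, counts):
    return [a + n for a, n in zip(prior, counts)]


print(posterior([10, 10, 10], counts))        # B1: [13, 11, 10]
print(posterior([0.01, 0.01, 0.01], counts))  # B2: [3.01, 1.01, 0.01] (up to float rounding)
```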
Question C: No
Lecture 17 - Graphical Models (April 6, 2020)
Question A:
Model 1 has 54 (see alternative answer below)
Model 2 has 5
Model 3 has 45
Explanation:
For Model 1, we calculate 54 as follows. All of the variables are discrete and can take on 3 values each. Remember that A is not included in the parameter count (although you should still keep A in mind). B, C, and E each have 9 parameters, since each has one parent (a 3-by-3 table, 3^2 entries), and D has 27 since it has two parents (3^3 entries). In total we have 54. However, we could technically argue that a distribution over 3 values needs only 2 free parameters, since the third probability is just 1 minus the sum of the other two. This is also a valid way to think of the problem. Note that each parent configuration still needs its own distribution, so B, C, and E would each have 3 × 2 = 6 parameters and D would have 9 × 2 = 18, for a total of 36. If this ever comes up on a test or homework, we will make clear what the assumptions are.
For Model 2, B, C, and E each have just 1 parameter, since each has a single parent to multiply by a coefficient (a linear relationship). D has 2 parameters, since it has two parents. This leads to a sum of 5.
For Model 3, B, C, and E each have 9 parameters, since each needs a 3-by-3 matrix (the variables are now 3-dimensional). D has 18 parameters (it requires a 3-by-6 matrix, since its two 3-dimensional parents stack into a 6-dimensional input). This leads to a sum of 45.
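The three tallies can be reproduced with a few lines (a sketch; the graph structure is taken from the answers above: B, C, and E each have one parent, D has two, and A is excluded from the count):

```python
k = 3  # values per discrete variable (Model 1) / dimensions per variable (Model 3)
num_parents = {"B": 1, "C": 1, "D": 2, "E": 1}  # A is not counted

# Model 1: full conditional table with k^(1 + num_parents) entries per node.
model1 = sum(k ** (1 + p) for p in num_parents.values())

# Model 2: scalar linear model, one coefficient per parent edge.
model2 = sum(num_parents.values())

# Model 3: vector linear model, a k-by-(k * num_parents) matrix per node.
model3 = sum(k * (k * p) for p in num_parents.values())

print(model1, model2, model3)  # 54 5 45
```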
Question B: Only includes linear functions
Lecture 18 - Inference in Bayes Nets (April 8, 2020)