{
"cells": [
{
"cell_type": "markdown",
"id": "27c5d444",
"metadata": {},
"source": [
"# Latent Variable Models and Variational Bayes\n",
"\n",
"\n",
"- **[1]** (##) For a Gaussian mixture model given by the generative equation\n",
"\n",
"$$\n",
"p(x,z) = \\prod_{k=1}^K (\\underbrace{\\pi_k \\cdot \\mathcal{N}\\left( x | \\mu_k, \\Sigma_k\\right) }_{p(x,z_{k}=1)})^{z_{k}} \n",
"$$\n",
"\n",
"prove that the marginal distribution for an observation $x$ evaluates to \n",
"\n",
"$$\n",
"p(x) = \\sum_{j=1}^K \\pi_j \\cdot \\mathcal{N}\\left( x | \\mu_j, \\Sigma_j \\right) \n",
"$$\n",
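"\n",
"As a quick numerical sanity check of this identity (not a proof), the following Julia sketch uses a hypothetical 1-D mixture with made-up parameters `w`, `mu` and `s2`, and sums the joint $p(x,z)$ over all one-hot settings of $z$. The helper names `gauss`, `joint_xz` and `onehot` are illustrative only.\n",
"\n",
"```julia\n",
"# Hypothetical 1-D Gaussian mixture with K = 3 components (made-up numbers).\n",
"gauss(x, m, v) = exp(-(x - m)^2 / (2v)) / sqrt(2pi * v)   # N(x | m, v)\n",
"\n",
"w  = [0.2, 0.5, 0.3]    # mixing weights pi_k (sum to 1)\n",
"mu = [-1.0, 0.0, 2.0]   # component means mu_k\n",
"s2 = [0.5, 1.0, 0.8]    # component variances (1-D stand-in for Sigma_k)\n",
"K  = length(w)\n",
"\n",
"# p(x, z) = prod_k (pi_k N(x | mu_k, s2_k))^(z_k) for a one-hot vector z\n",
"joint_xz(x, z) = prod((w[k] * gauss(x, mu[k], s2[k]))^z[k] for k in 1:K)\n",
"onehot(j) = [k == j ? 1 : 0 for k in 1:K]\n",
"\n",
"x = 0.7\n",
"marginal = sum(joint_xz(x, onehot(j)) for j in 1:K)         # sum over all z\n",
"mixture  = sum(w[j] * gauss(x, mu[j], s2[j]) for j in 1:K)  # right-hand side\n",
"@show isapprox(marginal, mixture)\n",
"```\n",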
"\n",
"\n",
"- **[2]** (#) Given the free energy functional $F[q] = \\sum_z q(z) \\log \\frac{q(z)}{p(x,z)}$, prove the [EE, DE and AC decompositions](https://nbviewer.jupyter.org/github/bertdv/BMLIP/blob/master/lessons/notebooks/Latent-Variable-Models-and-VB.ipynb#fe-decompositions). \n",
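"\n",
"A numerical check can make the three decompositions concrete before you prove them. The Julia sketch below assumes a made-up discrete model in which $z$ takes three values and $x$ is fixed at its observed value, so $p(x|z)$ reduces to a vector of likelihood values `lik`. It evaluates $F[q]$ directly and through each decomposition, reading AC as complexity $\\mathrm{KL}[q(z)\\|p(z)]$ minus accuracy $\\sum_z q(z) \\log p(x|z)$. The variable names are illustrative only.\n",
"\n",
"```julia\n",
"# Made-up discrete model: z in {1,2,3}, x fixed to its observed value.\n",
"prior = [0.5, 0.3, 0.2]    # p(z)\n",
"lik   = [0.1, 0.4, 0.7]    # p(x|z) evaluated at the observed x\n",
"q     = [0.2, 0.5, 0.3]    # an arbitrary variational distribution q(z)\n",
"\n",
"pxz       = prior .* lik       # joint p(x,z)\n",
"evidence  = sum(pxz)           # p(x)\n",
"posterior = pxz ./ evidence    # p(z|x)\n",
"kl(a, b)  = sum(a .* log.(a ./ b))   # KL divergence between discrete distributions\n",
"\n",
"F  = sum(q .* log.(q ./ pxz))                        # definition of F[q]\n",
"EE = -sum(q .* log.(pxz)) - sum(q .* log.(1 ./ q))   # energy minus entropy\n",
"DE = kl(q, posterior) - log(evidence)                # divergence minus log-evidence\n",
"AC = kl(q, prior) - sum(q .* log.(lik))              # complexity minus accuracy\n",
"@show F EE DE AC   # all four agree up to rounding\n",
"```\n",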
" \n",
"\n",
"- **[3]** (#) The free energy functional $\\mathrm{F}[q] = -\\sum_z q(z) \\log p(x,z) - \\sum_z q(z) \\log \\frac{1}{q(z)}$ decomposes into \"Energy minus Entropy\". So free energy minimization apparently favors maximizing the entropy of the posterior $q(z)$. This entropy maximization may seem puzzling at first, because inference should intuitively lead to *more* informed posteriors, i.e., posterior distributions whose entropy is smaller than the entropy of the prior. Explain why entropy maximization is still a reasonable objective. \n",
" \n",
"\n",
"- **[4]** (#) Explain the following update rule for the mean of the Gaussian cluster-conditional data distribution (from the example about mean-field updating of a Gaussian mixture model):\n",
"\n",
"$$\n",
"m_k = \\frac{1}{\\beta_k} \\left( \\beta_0 m_0 + N_k \\bar{x}_k \\right) \\tag{B-10.61} \n",
"$$\n",
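"\n",
"To see what (B-10.61) does numerically, the Julia sketch below evaluates it for a single cluster in 1-D. It assumes, as in Bishop's treatment, that $\\beta_k = \\beta_0 + N_k$, $N_k = \\sum_n r_{nk}$ and $\\bar{x}_k = \\frac{1}{N_k}\\sum_n r_{nk} x_n$; the data `xs`, the responsibilities `r` and the prior settings are made up.\n",
"\n",
"```julia\n",
"# Hypothetical 1-D data and responsibilities r_nk for one cluster k.\n",
"xs = [1.0, 2.0, 3.0, 4.0]\n",
"r  = [0.9, 0.8, 0.2, 0.1]\n",
"\n",
"beta0 = 1.0   # prior precision scaling (pseudo-count) for the mean\n",
"m0    = 0.0   # prior mean\n",
"\n",
"Nk    = sum(r)                  # effective number of points in cluster k\n",
"xbar  = sum(r .* xs) / Nk       # responsibility-weighted sample mean\n",
"betak = beta0 + Nk              # assumed: beta_k = beta_0 + N_k (Bishop 10.60)\n",
"mk    = (beta0 * m0 + Nk * xbar) / betak   # update rule (B-10.61)\n",
"@show mk   # lies between m0 and xbar\n",
"```\n",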
" \n",
"- **[5]** (##) Consider a model $p(x,z|\\theta)$, where $D=\\{x_1,x_2,\\ldots,x_N\\}$ is observed, $z$ are unobserved variables and $\\theta$ are parameters. The EM algorithm estimates the parameters by iterating over the following two equations ($i$ is the iteration index):\n",
"\n",
"$$\\begin{align*}\n",
"q^{(i)}(z) &= p(z|D,\\theta^{(i-1)}) \\\\\n",
"\\theta^{(i)} &= \\arg\\max_\\theta \\sum_z q^{(i)}(z) \\cdot \\log p(D,z|\\theta)\n",
"\\end{align*}$$\n",
"\n",
"Prove that this algorithm minimizes the free energy functional \n",
"$$\\begin{align*}\n",
"F[q,\\theta] = \\sum_z q(z) \\log \\frac{q(z)}{p(D,z|\\theta)} \n",
"\\end{align*}$$\n",
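"\n",
"The claim can be illustrated (not proven) numerically. The Julia sketch below assumes a 1-D two-component Gaussian mixture as the model $p(x,z|\\theta)$ with made-up data, alternates the two EM equations above, and records $F[q,\\theta]$ after every E-step and every M-step. The function name `em_free_energies` is hypothetical. The recorded sequence should never increase.\n",
"\n",
"```julia\n",
"# Hypothetical data and a 1-D, 2-component Gaussian mixture; theta = (w, mu, s2).\n",
"gauss(x, m, v) = exp(-(x - m)^2 / (2v)) / sqrt(2pi * v)   # N(x | m, v)\n",
"\n",
"function em_free_energies(data, w, mu, s2; iters = 20)\n",
"    K = length(w)\n",
"    # p(x_n, z_nk = 1 | theta) as an N x K matrix\n",
"    joint(w, mu, s2) = [w[k] * gauss(x, mu[k], s2[k]) for x in data, k in 1:K]\n",
"    # F[q, theta] = sum_{n,k} q_nk log(q_nk / p(x_n, z_nk = 1 | theta))\n",
"    freeenergy(q, w, mu, s2) = sum(q .* log.(q ./ joint(w, mu, s2)))\n",
"    Fs = Float64[]\n",
"    for i in 1:iters\n",
"        # E-step: q(z) = p(z | D, theta^(i-1)), the normalized joint per data point\n",
"        P = joint(w, mu, s2)\n",
"        q = P ./ sum(P, dims = 2)\n",
"        push!(Fs, freeenergy(q, w, mu, s2))\n",
"        # M-step: closed-form argmax of sum_z q(z) log p(D, z | theta)\n",
"        Nk = vec(sum(q, dims = 1))\n",
"        w  = Nk ./ length(data)\n",
"        mu = vec(sum(q .* data, dims = 1)) ./ Nk\n",
"        s2 = vec(sum(q .* (data .- mu') .^ 2, dims = 1)) ./ Nk\n",
"        push!(Fs, freeenergy(q, w, mu, s2))\n",
"    end\n",
"    return Fs\n",
"end\n",
"\n",
"data = [-2.1, -1.7, -2.4, -1.9, 1.8, 2.2, 2.5, 1.6, 2.0]\n",
"Fs = em_free_energies(data, [0.5, 0.5], [-0.5, 0.5], [1.0, 1.0])\n",
"@show all(diff(Fs) .<= 1e-9)   # F never increases (up to rounding)\n",
"```\n",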
" \n",
"\n",
"- **[6]** (###) Consult the internet on what *overfitting* and *underfitting* are, and then explain how free energy (FE) minimization finds a balance between these two (unwanted) extremes.\n",
" \n",
"\n",
"- **[7]** (##) Consider a model $p(x,z|\\theta) = p(x|z,\\theta) p(z|\\theta)$ where $x$ and $z$ relate to observed and unobserved variables, respectively. Also available is an observed data set $D=\\left\\{x_1,x_2,\\ldots,x_N\\right\\}$. One iteration of the EM algorithm for estimating the parameters $\\theta$ is described by ($m$ is the iteration counter)\n",
"$$\n",
"\\hat{\\theta}^{(m+1)} := \\arg \\max_\\theta \\left(\\sum_z p(z|x=D,\\hat{\\theta}^{(m)}) \\log p(x=D,z|\\theta) \\right) \\,.\n",
"$$\n",
"\n",
" (a) To execute EM, we need an expression for the 'responsibility' $p(z|x=D,\\hat{\\theta}^{(m)})$. Use Bayes' rule to show how this responsibility can be computed. \n",
"\n",
" (b) Why do we need multiple iterations in the EM algorithm? \n",
"\n",
" (c) Why can't we just use simple maximum log-likelihood to estimate parameters, as described by \n",
"$$\n",
"\\hat{\\theta} := \\arg \\max_\\theta \\log p(x=D,z|\\theta) \\,?\n",
"$$ \n",
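"\n",
"For the special case of a Gaussian mixture with discrete $z$, the Bayes' rule computation in (a) can be sketched in Julia as below. The parameter values standing in for $\\hat{\\theta}^{(m)}$ and the data `D` are made up, and `responsibilities` is a hypothetical helper name.\n",
"\n",
"```julia\n",
"# Made-up 1-D Gaussian mixture parameters standing in for theta_hat^(m).\n",
"gauss(x, m, v) = exp(-(x - m)^2 / (2v)) / sqrt(2pi * v)   # N(x | m, v)\n",
"w, mu, s2 = [0.3, 0.7], [0.0, 3.0], [1.0, 1.0]\n",
"D = [0.2, 1.4, 2.9]\n",
"\n",
"# Bayes' rule: p(z_k = 1 | x_n, theta) =\n",
"#   p(x_n | z_k = 1, theta) p(z_k = 1 | theta) / sum_j p(x_n | z_j = 1, theta) p(z_j = 1 | theta)\n",
"function responsibilities(D, w, mu, s2)\n",
"    num = [w[k] * gauss(x, mu[k], s2[k]) for x in D, k in eachindex(w)]   # likelihood times prior\n",
"    return num ./ sum(num, dims = 2)   # normalize over k for each data point\n",
"end\n",
"\n",
"R = responsibilities(D, w, mu, s2)   # N x K matrix; each row sums to 1\n",
"@show R\n",
"```\n",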
"\n",
"- **[8]** For a particular model with hidden variables, the log-likelihood works out to the following expression:\n",
"$$\n",
" L(\\theta) = \\sum_n \\log \\left(\\sum_k \\pi_k\\,\\mathcal{N}(x_n|\\mu_k,\\Sigma_k)\\right)\n",
"$$\n",
"Would you prefer gradient descent or the EM algorithm to find maximum likelihood estimates of the parameters? Explain your answer. (No need to work out the equations.)\n",
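"\n",
"The expression for $L(\\theta)$ can be evaluated directly. The Julia sketch below does so for a hypothetical 1-D mixture with made-up parameters, computing the inner sum with the log-sum-exp trick to avoid underflow. Note that the logarithm acts on a sum over $k$, so $L(\\theta)$ does not split into separate terms per component. The names `gauss_logpdf`, `logsumexp` and `loglik` are illustrative helpers, not part of any library used in this course.\n",
"\n",
"```julia\n",
"# Hypothetical 1-D mixture; log N(x | m, v) computed directly in log space.\n",
"gauss_logpdf(x, m, v) = -(x - m)^2 / (2v) - 0.5 * log(2pi * v)\n",
"\n",
"# Numerically stable log(sum(exp.(a)))\n",
"function logsumexp(a)\n",
"    amax = maximum(a)\n",
"    return amax + log(sum(exp.(a .- amax)))\n",
"end\n",
"\n",
"# L(theta) = sum_n log sum_k pi_k N(x_n | mu_k, s2_k)\n",
"function loglik(X, w, mu, s2)\n",
"    return sum(logsumexp([log(w[k]) + gauss_logpdf(x, mu[k], s2[k]) for k in eachindex(w)]) for x in X)\n",
"end\n",
"\n",
"X = [-1.2, 0.3, 2.8, 3.1]\n",
"@show loglik(X, [0.4, 0.6], [0.0, 3.0], [1.0, 1.0])\n",
"```\n",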
" "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9e4b3855",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Julia 1.5.2",
"language": "julia",
"name": "julia-1.5"
},
"language_info": {
"file_extension": ".jl",
"mimetype": "application/julia",
"name": "julia",
"version": "1.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}