Deep Learning Course v2.0
Universitat de Barcelona, 2020
Jordi Vitrià
Supervised learning is usually concerned with the following two inference problems:

- Classification: Given $(\mathbf{x}_i, y_i) \in \mathcal{X}\times\mathcal{Y} = \mathbb{R}^p \times \{1, \dots, C\}$ for $i=1, \dots, N$, we want to estimate for any new $\mathbf{x}$, $$\arg \max_y P(Y=y|X=\mathbf{x}).$$
- Regression: Given $(\mathbf{x}_i, y_i) \in \mathcal{X}\times\mathcal{Y} = \mathbb{R}^p \times \mathbb{R}$ for $i=1, \dots, N$, we want to estimate for any new $\mathbf{x}$, $$\mathbb{E}\left[ Y|X=\mathbf{x} \right].$$
!!! Note Certainty is perfect knowledge that has total security from error (...). (Wikipedia)
!!! Note Uncertainty has been called an unintelligible expression without a straightforward description. It describes a situation involving insecurity and/or unknown information. It applies to predictions of future events, to physical measurements that are already made, or to the unknown. (Wikipedia)
Regarding uncertainty, we can be in several situations:
- Complete certainty. This is the case of (theoretical) macroscopic mechanics, where we can access the complete information about the input and there is no uncertainty about the output because we have a complete model.
- Uncertainty with risk. This is the case of a game (e.g., dice) with an uncertain outcome, but where we know the probability distribution over outcomes exactly. The most common cause of this situation is that we cannot access complete information about the input.
- Fully reducible uncertainty. The case of a game with complete information about the input and a certain output, but where some parameters of the model cannot be robustly estimated. The most common cause of this situation is a lack of sufficient data, so this kind of uncertainty can be reduced to the first case by gathering enough data.
- Partially reducible uncertainty. This is the case of games where we do not have full knowledge of the probability distribution over outcomes. There are different causes for this situation: the model changes over time, the model is too complex to be robustly estimated from a feasible dataset, etc.
When working with predictive systems, measuring uncertainty is important for dealing with the risk associated to decisions. If we can measure risk, we can define policies related to the use of predictions.
In the framework of predictive models, uncertainty has been categorized by taking into account its main source: epistemic or aleatoric.
Aleatoric uncertainty captures our uncertainty with respect to information which our data cannot explain. It can be explained away with the ability to observe all explanatory variables with increasing precision.
We can further divide aleatoric uncertainty into two sub-categories:
- Data-dependent or heteroscedastic uncertainty is aleatoric uncertainty that depends on the input data and is predicted as a model output.
- Task-dependent or homoscedastic uncertainty is aleatoric uncertainty that does not depend on the input data. It is not a model output; rather, it is a quantity that stays constant for all input data and varies between different tasks.
(Source: https://alexgkendall.com/computer_vision/bayesian_deep_learning_for_safe_ai/)
Epistemic uncertainty captures our ignorance about which model generated our collected data. Sometimes, this uncertainty can be explained away given enough data, and is often referred to as model uncertainty. Epistemic uncertainty is really important to model for small datasets where the training data is sparse.
(Source: https://alexgkendall.com/computer_vision/bayesian_deep_learning_for_safe_ai/)
A neural network can be used to implement a probabilistic model $p(y|\mathbf{x}, \mathbf{w})$, where $\mathbf{w}$ denotes the network weights.

For classification, $y$ is a class label and $p(y|\mathbf{x}, \mathbf{w})$ is a categorical distribution.

For regression, $y$ is a continuous variable and $p(y|\mathbf{x}, \mathbf{w})$ is typically a Gaussian distribution whose mean is computed by the network.

Given a training dataset $\mathcal{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$, we can learn the weights $\mathbf{w}$ by maximum likelihood estimation (MLE).
The usual optimization objective during training is the negative log likelihood. For a categorical distribution this is the cross entropy error function, for a Gaussian distribution this is proportional to the sum of squares error function.
Let's consider a noisy regression problem:
import numpy as np

def f(x, sigma):
    # additive Gaussian noise with standard deviation sigma
    epsilon = np.random.randn(*x.shape) * sigma
    return 10 * np.sin(2 * np.pi * x) + epsilon
The noise gives rise to aleatoric uncertainty.
A network can now be trained with a Gaussian negative log likelihood function (`neg_log_likelihood`) as the loss function, assuming a fixed standard deviation (`noise`). This is equivalent to considering the following loss function: $$\mathcal{L}(\mathbf{w}) = \frac{1}{2\sigma^2} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2 + \text{const.},$$ where the model predicts a mean $\hat{y}_i$ for each input and the noise standard deviation $\sigma$ is a fixed constant.
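As an illustration, the fixed-noise Gaussian negative log likelihood can be sketched in NumPy (a minimal sketch; the `neg_log_likelihood` implementation in the actual course notebook may differ):

```python
import numpy as np

def neg_log_likelihood(y_true, y_pred, sigma=1.0):
    """Gaussian negative log likelihood with a fixed noise level sigma.

    Up to an additive constant, this is a scaled sum-of-squares error,
    which is why minimizing it is equivalent to least-squares fitting.
    """
    n = y_true.size
    squared_error = np.sum((y_true - y_pred) ** 2)
    return (0.5 * squared_error / sigma**2
            + n * np.log(sigma)
            + 0.5 * n * np.log(2 * np.pi))
```

Because $\sigma$ is fixed, the only term that depends on the model is the squared error, so the optimal weights are the same as for a plain sum-of-squares loss.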
What about an unknown and/or data-dependent noise model? In this case we can add these parameters as part of the parameters the neural network estimates from data.
This is equivalent to considering the following loss function: $$\mathcal{L}(\mathbf{w}) = \frac{1}{N} \sum_{i=1}^{N} \frac{\left( y_i - \hat{y}_i \right)^2}{2 \hat{\sigma}(\mathbf{x}_i)^2} + \frac{1}{2} \log \hat{\sigma}(\mathbf{x}_i)^2,$$ where the model predicts both a mean $\hat{y}_i$ and a variance $\hat{\sigma}(\mathbf{x}_i)^2$ for each input. As you can see from this equation, if the model predicts something very wrong, it will be encouraged to attenuate the residual term by increasing the predicted uncertainty $\hat{\sigma}(\mathbf{x}_i)^2$; the $\log \hat{\sigma}(\mathbf{x}_i)^2$ term, in turn, penalizes predicting high uncertainty everywhere.
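A heteroscedastic version of the loss, where the network also outputs a per-input log-variance, might be sketched as follows (a hypothetical helper for illustration, not the course notebook's code):

```python
import numpy as np

def heteroscedastic_loss(y_true, y_pred_mean, y_pred_logvar):
    """Aleatoric loss: the model predicts a mean and a log-variance per input.

    Large residuals can be attenuated by predicting a larger variance,
    while the log-variance term penalizes being uncertain everywhere.
    Predicting log(sigma^2) rather than sigma^2 keeps the variance
    positive and is numerically stable.
    """
    precision = np.exp(-y_pred_logvar)  # 1 / sigma^2
    return np.mean(0.5 * precision * (y_true - y_pred_mean) ** 2
                   + 0.5 * y_pred_logvar)
```

For a badly mispredicted point, the loss is lower when the model also predicts a larger variance, which is exactly the attenuation effect described above.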
We can use stochastic neural networks, where each layer computes the parameters of some distribution and its forward pass consists of taking a sample from that parametric distribution. However, the difficulty is that we can't backpropagate through samples. The problem of backpropagating through stochastic nodes can be circumvented if we can re-express the sample as a deterministic, differentiable function of the distribution parameters and a parameter-free noise variable; for a Gaussian, $\mathbf{z} = \mu + \sigma \epsilon$ with $\epsilon \sim \mathcal{N}(0, 1)$.
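For a Gaussian node, this re-expression looks as follows (a sketch, using NumPy in place of an autodiff framework):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_gaussian(mu, sigma):
    """Draw a sample from N(mu, sigma^2) via the re-parameterization trick.

    The randomness lives entirely in the parameter-free epsilon ~ N(0, 1);
    the returned sample is a deterministic, differentiable function of
    mu and sigma, so gradients can flow through both parameters.
    """
    epsilon = rng.standard_normal(np.shape(mu))
    return mu + sigma * epsilon
```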
Multiplying the likelihood by a prior distribution $p(\mathbf{w})$ over the weights and maximizing the result yields the maximum a posteriori (MAP) estimate. Both MLE and MAP give point estimates of the parameters. If we instead had a full posterior distribution over the parameters, we could make predictions that take weight uncertainty into account. This is covered by the posterior predictive distribution $$p(y|\mathbf{x}, \mathcal{D}) = \int p(y|\mathbf{x}, \mathbf{w}) \, p(\mathbf{w}|\mathcal{D}) \, d\mathbf{w}.$$ Unfortunately, an analytical solution for the posterior $p(\mathbf{w}|\mathcal{D})$ of a neural network is intractable, so we approximate it with a variational distribution $q(\mathbf{w}|\theta)$ whose parameters $\theta$ are optimized to make $q$ as close as possible to the true posterior.
Minimizing the Kullback-Leibler divergence between $q(\mathbf{w}|\theta)$ and the true posterior is equivalent to minimizing $$\mathcal{F}(\mathcal{D}, \theta) = \mathrm{KL}\left( q(\mathbf{w}|\theta) \,\|\, p(\mathbf{w}) \right) - \mathbb{E}_{q(\mathbf{w}|\theta)}\left[ \log p(\mathcal{D}|\mathbf{w}) \right].$$ This is known as the variational free energy. The first term is the Kullback-Leibler divergence between the variational distribution $q(\mathbf{w}|\theta)$ and the prior $p(\mathbf{w})$, called the complexity cost; the second term is the expected negative log likelihood, called the likelihood cost. Expanding the KL divergence, $$\mathcal{F}(\mathcal{D}, \theta) = \mathbb{E}_{q(\mathbf{w}|\theta)}\left[ \log q(\mathbf{w}|\theta) - \log p(\mathbf{w}) - \log p(\mathcal{D}|\mathbf{w}) \right].$$ We see that all three terms in this equation are expectations with respect to the variational distribution $q(\mathbf{w}|\theta)$, so the cost function can be approximated by averaging over Monte Carlo samples $\mathbf{w}^{(i)}$ drawn from $q(\mathbf{w}|\theta)$.
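The Monte Carlo approximation of the free energy can be sketched generically (hypothetical function names: `log_q`, `log_prior`, and `log_lik` stand in for the three log densities):

```python
import numpy as np

def approx_free_energy(log_q, log_prior, log_lik, weight_samples):
    """Monte Carlo estimate of the variational free energy.

    For each sampled weight vector w^(i) ~ q(w|theta), evaluates
    log q(w|theta) - log p(w) - log p(D|w), then averages over samples.
    """
    terms = [log_q(w) - log_prior(w) - log_lik(w) for w in weight_samples]
    return np.mean(terms)
```

In practice, a single sample per training step is often sufficient, since the estimate is averaged over many mini-batches during optimization.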
In the following example, we'll use a Gaussian distribution for the variational posterior, parameterized by a mean $\mu$ and a standard deviation $\sigma$ for each weight.

A training iteration consists of a forward pass and a backward pass. During a forward pass, a single sample is drawn from the variational posterior distribution and used to evaluate the approximate cost function. Since a forward pass involves a stochastic sampling step, we have to apply the so-called re-parameterization trick for backpropagation to work. The trick is to sample from a parameter-free distribution, $\epsilon \sim \mathcal{N}(0, 1)$, and then transform the sampled $\epsilon$ deterministically: $\mathbf{w} = \mu + \sigma \epsilon$.

For numeric stability, we will parameterize the network with $\rho$ instead of $\sigma$ directly, where $\sigma = \log(1 + e^{\rho})$ (the softplus function), so that $\sigma$ is always positive while $\rho$ remains unconstrained.
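The softplus parameterization might look like this (a minimal sketch):

```python
import numpy as np

def sigma_from_rho(rho):
    """Map the unconstrained parameter rho to a positive sigma.

    sigma = log(1 + exp(rho)) (softplus) is always positive, so rho
    can be optimized freely without ever producing an invalid
    (zero or negative) standard deviation.
    """
    return np.log1p(np.exp(rho))
```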
Uncertainty in predictions that arise from the uncertainty in weights is called epistemic uncertainty. This kind of uncertainty can be reduced if we get more data. Consequently, epistemic uncertainty is higher in regions of no or little training data and lower in regions of more training data. Epistemic uncertainty is covered by the variational posterior distribution. Uncertainty coming from the inherent noise in training data is an example of aleatoric uncertainty. It cannot be reduced if we get more data. Aleatoric uncertainty is covered by the probability distribution used to define the likelihood function.
Variational inference of neural network parameters is now demonstrated on a simple regression problem, with a Gaussian likelihood and training data (X, y) drawn from a noisy sinusoidal function.
- Ethics of Data: privacy, transparency, trust, ...
- Ethics of Algorithms: accountability, auditing, ...
- Ethics of Practices: consent, ...
Exercise: Linear Regression with Synthetic Data Colab exercise, which explores linear regression with a toy dataset. [Colab Notebook](https://colab.research.google.com/github/google/eng-edu/blob/master/ml/cc/exercises/linear_regression_with_synthetic_data.ipynb?utm_source=mlcc&utm_campaign=colab-external&utm_medium=referral&utm_content=linear_regression_synthetic_tf2-colab&hl=en).
Question: Which of the following model's predictions have been affected by selection bias?
- Engineers built a model to predict the likelihood of a person developing diabetes based on their daily food intake. The model was trained on 10,000 "food diaries" collected from a randomly chosen group of people worldwide representing a variety of different age groups, ethnic backgrounds, and genders. However, when the model was deployed, it had very poor accuracy. Engineers subsequently discovered that food diary participants were reluctant to admit the true volume of unhealthy foods they ate, and were more likely to document consumption of nutritious food than less healthy snacks.
There is no selection bias in this model; participants who provided training data were a representative sampling of users and were chosen randomly. Instead, this model was affected by reporting bias. Ingestion of unhealthy foods was reported at a much lower frequency than true real-world occurrence.
A possible implementation of exponentiation:
def pow(x, y):
    a = x
    for i in range(2, y + 1):
        a = a * x
    return a
- Would this algorithm be correct?
- What complexity $O(\cdot)$ would it have (assuming multiplications are $O(1)$)?
Recursion

- there must be one or more base cases: conditions on the data that are solved directly, without needing a new call to the program
- each recursive call must get closer to one of the base cases: either because a variable is decremented, because a list gets shorter, ...

In a simplified way, we could say that recursive algorithms follow this pattern:
def recursiu(x):
    if x == cas_final:
        return VALOR
    else:
        return CRIDA_RECURSIVA(x_reduida) + calculs
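As a concrete instance of this pattern, the factorial function: the base case `n == 0` is solved directly, and each recursive call decrements `n`, approaching the base case.

```python
def factorial(n):
    # base case: solved directly, without a further recursive call
    if n == 0:
        return 1
    # recursive call: n - 1 moves us closer to the base case
    return n * factorial(n - 1)

print(factorial(5))  # 120
```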
!!! Error: Beware of recursion. Never make more than 100 recursive calls in a program!
[Previous Page](deep2.html)
[Next Page](deep2.html)