# Objective function, loss function and cost function

Last updated on: a year ago

These three terms are hard to pin down because researchers disagree: some treat them as interchangeable, while others draw distinctions between them. Here I first summarize what I have found. When I become more familiar with the terms, I will add more details.

# No difference

Ian Goodfellow:

The function we want to minimize or maximize is called the objective function, or criterion. When we are minimizing it, we may also call it the cost function, loss function, or error function. In this book, we use these terms interchangeably, though some machine learning publications assign special meaning to some of these terms.

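In this view, the objective function, cost function, and loss function are the same thing. The only nuance is that "cost", "loss", and "error" are used when the function is being minimized.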

# Difference

Andrew Ng:

Finally, the loss function was defined with respect to a single training example. It measures how well you’re doing on a single training example. I’m now going to define something called the cost function, which measures how well you’re doing on the entire training set. So the cost function J, which is applied to your parameters W and B, is going to be the average, one over m, of the sum of the loss function applied to each of the training examples in turn.

In this view, the loss function measures the error on a single training example, while the cost function measures it over the whole training set: it is the average (the sum divided by $m$) of the loss over all $m$ training examples.
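The distinction can be sketched in a few lines of Python. The quoted passage does not say which loss is used, so the binary cross-entropy below is only an illustrative choice, and the names `log_loss`, `cost`, `predictions`, and `labels` are hypothetical.

```python
import math

def log_loss(y_hat, y):
    """Loss on a single example (binary cross-entropy, an illustrative choice)."""
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def cost(y_hats, ys):
    """Cost over the whole training set: the average of the per-example losses."""
    m = len(ys)
    return sum(log_loss(y_hat, y) for y_hat, y in zip(y_hats, ys)) / m

# Hypothetical predicted probabilities and true labels for m = 3 examples.
predictions = [0.9, 0.2, 0.6]
labels = [1, 0, 1]

per_example = [log_loss(p, t) for p, t in zip(predictions, labels)]  # one loss per example
J = cost(predictions, labels)                                        # one number for the set
```

Here `per_example` holds one value for each training example, while `J` is a single number summarizing the whole set.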

A loss function is usually defined on a single data point: it takes the prediction and the label and measures the penalty for the mismatch between them. For example:

• square loss $l(f(x_i|\theta),y_i) = \left (f(x_i|\theta)-y_i \right )^2$, used in linear regression
• hinge loss $l(f(x_i|\theta), y_i) = \max(0, 1-f(x_i|\theta)y_i)$, used in SVM
• 0/1 loss $l(f(x_i|\theta), y_i) = \mathbb{1}\left[f(x_i|\theta) \neq y_i\right]$ (1 if the prediction differs from the label, 0 otherwise), used in theoretical analysis and in the definition of accuracy
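As a minimal sketch, the three losses above can be written as plain Python functions of one prediction and one label. The function names are hypothetical, and the hinge loss assumes labels in $\{-1, +1\}$ and a real-valued score as the prediction.

```python
def square_loss(prediction, label):
    """Squared difference between a real-valued prediction and its label."""
    return (prediction - label) ** 2

def hinge_loss(score, label):
    """Hinge loss; assumes label is -1 or +1 and score is a real number."""
    return max(0.0, 1.0 - score * label)

def zero_one_loss(prediction, label):
    """1 if the predicted class differs from the label, 0 otherwise."""
    return 1 if prediction != label else 0

# Each call scores exactly one (prediction, label) pair.
sq = square_loss(2.5, 3.0)        # small regression error
h_ok = hinge_loss(2.0, 1)         # correct side, outside the margin: no penalty
h_bad = hinge_loss(0.5, -1)       # wrong side of the boundary: penalized
z = zero_one_loss(1, -1)          # misclassified
```

Each function looks at a single example only; none of them knows anything about the rest of the training set.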

A cost function is usually more general. It might be a sum of loss functions over your training set plus some model complexity penalty (regularization). For example:

• Mean Squared Error $MSE(\theta) = \frac{1}{N} \sum_{i=1}^N \left (f(x_i|\theta)-y_i \right )^2$
• SVM cost function $SVM(\theta) = \|\theta\|^2 + C \sum_{i=1}^N \xi_i$ (there are additional constraints connecting the slack variables $\xi_i$ with $\theta$ and with the training set)
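Both costs can be sketched for a linear model $f(x|\theta) = \theta \cdot x$. This is an illustration under stated assumptions, not a reference implementation: the bias term is omitted, the data are made up, and each slack $\xi_i$ is set to its smallest feasible value $\max(0, 1 - y_i f(x_i|\theta))$, which is what the usual soft-margin constraints $y_i f(x_i|\theta) \ge 1 - \xi_i$, $\xi_i \ge 0$ give for a fixed $\theta$.

```python
def predict(theta, x):
    """Linear model without a bias term (a simplifying assumption)."""
    return sum(t * xi for t, xi in zip(theta, x))

def mse_cost(theta, xs, ys):
    """Mean of the square loss over the training set."""
    n = len(ys)
    return sum((predict(theta, x) - y) ** 2 for x, y in zip(xs, ys)) / n

def svm_cost(theta, xs, ys, C=1.0):
    """Complexity penalty ||theta||^2 plus C times the total slack."""
    penalty = sum(t * t for t in theta)
    # Smallest slack satisfying the margin constraints for this theta.
    slack = [max(0.0, 1.0 - y * predict(theta, x)) for x, y in zip(xs, ys)]
    return penalty + C * sum(slack)

# Hypothetical training set with three points.
xs = [[1.0, 2.0], [2.0, 0.0], [0.0, 1.0]]
ys_regression = [3.0, 2.0, 1.5]   # real-valued targets for MSE
ys_classes = [1, 1, -1]           # labels in {-1, +1} for the SVM cost
theta = [1.0, 1.0]

mse = mse_cost(theta, xs, ys_regression)
svm = svm_cost(theta, xs, ys_classes, C=1.0)
```

Unlike a loss, both functions take the parameters and the entire training set, and the SVM cost adds a term that depends on the model alone.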

Objective function is the most general term: it covers any function that you optimize during training. For example, in the maximum likelihood approach, the probability of generating the training set is a well-defined objective function, but it is neither a loss function nor a cost function (although you could define an equivalent cost function, such as the negative log-likelihood). For example:

• The likelihood in maximum likelihood estimation (MLE) is an objective function (which you maximize)
• The divergence between classes can be an objective function, but it is hardly a cost function unless you define something artificial, like 1-Divergence, and name it a cost
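The remark about an equivalent cost function can be illustrated with a toy Bernoulli model. In this sketch the data, the grid search, and all names are hypothetical; the point is only that maximizing the likelihood and minimizing the negative log-likelihood select the same parameter.

```python
import math

def likelihood(p, data):
    """Objective to maximize: probability of the observed 0/1 outcomes."""
    result = 1.0
    for x in data:
        result *= p if x == 1 else (1.0 - p)
    return result

def negative_log_likelihood(p, data):
    """Equivalent cost to minimize: a sum of per-example penalties."""
    return -sum(math.log(p if x == 1 else 1.0 - p) for x in data)

data = [1, 1, 0, 1]                       # hypothetical coin flips
grid = [i / 100 for i in range(1, 100)]   # candidate values of p, excluding 0 and 1

p_max_likelihood = max(grid, key=lambda p: likelihood(p, data))
p_min_cost = min(grid, key=lambda p: negative_log_likelihood(p, data))
```

Both searches return the fraction of ones in `data`. The negative log-likelihood also decomposes into a sum of per-example terms, which is why it fits the loss and cost vocabulary while the raw likelihood does not.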

In this view, the three terms are strictly different, each more general than the one before.

To be continued…