Because the reasons are in decreasing order, the cumulative function is a concave function.
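A minimal sketch of this claim, assuming the ordered quantities act as the successive increments of the cumulative function. The function names and the sample values are hypothetical and chosen only for illustration:

```python
from itertools import accumulate

def cumulative(increments):
    """Return running totals of the increments, starting from 0."""
    return [0] + list(accumulate(increments))

def is_concave(values):
    """Discrete concavity check: successive differences never increase."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return all(later <= earlier for earlier, later in zip(diffs, diffs[1:]))

# Hypothetical increments, sorted in decreasing order.
increments = [5, 3, 2, 1]
totals = cumulative(increments)
print(totals, is_concave(totals))
```

Since the differences of the running totals are the increments themselves, sorting the increments in decreasing order is exactly what makes the discrete concavity check pass.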
2.
The goal is to choose a policy $\pi$ that maximizes some cumulative function of the random rewards, typically the expected discounted sum over a potentially infinite horizon: