
MSc in Financial Mathematics, FM50

Stochastic Control for Optimal Trading

1    Part 1: Literature review

A stochastic optimal control problem concerns decision making under uncertainty: given an objective function, the decision maker must choose a strategy (the stochastic control) that maximizes or minimizes that objective in a random environment. A powerful tool for studying such problems is the dynamic programming principle and the associated Hamilton-Jacobi-Bellman (HJB) equation; optimal controls are obtained by solving the HJB equation. For an overview of this approach, students are referred to [Pha09, FS06, Tou12]. Stochastic control theory has applications in a wide range of areas, from engineering to financial mathematics and economics.
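As a reminder of the general shape of this approach, the block below sketches the HJB equation for a generic controlled diffusion. The symbols $b$, $\sigma$, $f$, $g$, $U$, $\mathcal{U}$ and $v$ are illustrative choices made here, not notation taken from the references, and the statement is formal (no regularity conditions are listed).

```latex
\begin{aligned}
dX_s^{u} &= b(X_s^{u}, u_s)\,ds + \sigma(X_s^{u}, u_s)\,dW_s, \\
v(t,x) &= \sup_{u \in \mathcal{U}} \mathbb{E}\Big[\int_t^T f(X_s^{u}, u_s)\,ds + g(X_T^{u}) \,\Big|\, X_t^{u} = x\Big], \\
0 &= \partial_t v + \sup_{u \in U}\Big\{ b(x,u)\cdot\nabla_x v
     + \tfrac12\,\mathrm{tr}\big(\sigma\sigma^{\top}(x,u)\,D_x^2 v\big) + f(x,u)\Big\},
     \qquad v(T,x) = g(x).
\end{aligned}
```

The maximiser inside the braces, when it exists, gives the optimal control in feedback form as a function of $(t, x)$.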

For Part 1, students are expected to present an overview of possible applications of stochastic control problems arising in financial mathematics. You should describe some examples of problems to which these techniques can be applied, and you should try to write down some HJB equations for the problems you consider and their solutions, if known. You should be proactive in researching the literature, which involves published journal papers and books. Working papers should be used mostly for orientation, given that their content has not been peer reviewed. It is particularly important that you synthesize the information gathered from these sources and present it as a flowing story that is consistent both in terms of notation and in terms of mathematical and financial content.

2    Part 2: An optimal trading problem

In this part, we consider an extension of the standard optimal trading problem. We refer the student to Chapter 6 in [CJP15] for an introduction to the subject.

Consider a complete filtered probability space $(\Omega, \mathcal{F}, \mathbb{F} = (\mathcal{F}_t)_{t \in \mathfrak{T}}, \mathbb{P})$, with $\mathcal{F}_t$ the natural filtration generated by the 2-dimensional Brownian motion $W = (W^{\alpha}, W^{S})$, with $W^{\alpha}$ and $W^{S}$ independent, and $\mathfrak{T} = [0, T]$, where $T > 0$ is the trading horizon. We let the mid-price process of the traded asset follow

$$dS_t = \kappa^{s} \alpha_t\,dt + \sigma^{s}\,dW_t^{S}, \tag{1}$$

where $\kappa^{s}, \sigma^{s} \in \mathbb{R}^{+}$ and $\alpha_t$ is the informed trader's signal, which we assume follows

$$d\alpha_t = -\kappa^{\alpha} \alpha_t\,dt + \sigma^{\alpha}\,dW_t^{\alpha}, \tag{2}$$

for $\kappa^{\alpha}, \sigma^{\alpha} \in \mathbb{R}^{+}$. The process $(\nu_t)_{t \in \mathfrak{T}}$ is understood as the speed at which the trader trades in the market. In particular, $\nu_t > 0$ means that the trader is purchasing the security and $\nu_t < 0$ means that the trader is selling it.
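The dynamics (1) and (2) can be discretised with an Euler-Maruyama scheme. The following is a minimal sketch, not a prescribed implementation: the function name is hypothetical, the default values are the ones listed before Task 2 of this brief, and `alpha0 = 0` is an assumption because the brief does not fix the initial value of the signal.

```python
import numpy as np

def simulate_signal_and_price(n_paths, n_steps=1000, T=1.0, S0=100.0, alpha0=0.0,
                              kappa_s=1.0, sigma_s=2.0, kappa_a=10.0, sigma_a=5.0,
                              seed=0):
    """Euler-Maruyama paths of (alpha, S) for equations (1)-(2).

    The two Brownian motions are drawn independently, as in the model.
    Returns two arrays of shape (n_paths, n_steps + 1).
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    alpha = np.empty((n_paths, n_steps + 1))
    S = np.empty((n_paths, n_steps + 1))
    alpha[:, 0] = alpha0   # assumption: the brief does not specify alpha_0
    S[:, 0] = S0
    for i in range(n_steps):
        dW_a = rng.normal(0.0, np.sqrt(dt), n_paths)
        dW_S = rng.normal(0.0, np.sqrt(dt), n_paths)
        # The price drift uses the signal observed at the start of the step.
        S[:, i + 1] = S[:, i] + kappa_s * alpha[:, i] * dt + sigma_s * dW_S
        alpha[:, i + 1] = alpha[:, i] - kappa_a * alpha[:, i] * dt + sigma_a * dW_a
    return alpha, S

if __name__ == "__main__":
    alpha, S = simulate_signal_and_price(n_paths=5)
    print(alpha.shape, S.shape)
```

Since the signal is an Ornstein-Uhlenbeck process, its transition law is Gaussian and could be sampled exactly instead; the Euler step is used here only to keep the sketch short.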

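The set of admissible strategies is

$$\mathcal{A} := \Big\{ \nu = (\nu_t)_{t \in \mathfrak{T}} \ \Big|\ \nu \text{ is } \mathbb{F}\text{-progressively measurable, and } \mathbb{E}\Big[\int_0^T (\nu_s)^2\,ds\Big] < \infty \Big\} \tag{3}$$

and we let $\nu \in \mathcal{A}$ be a given trading strategy. Given $\nu$, the inventory process of the informed trader, denoted by $(Q_t^{\nu})_{t \in \mathfrak{T}}$, satisfies

$$dQ_t^{\nu} = \nu_t\,dt, \qquad Q_0^{\nu} = 0. \tag{4}$$

The cash process of the trader, denoted by $(X_t^{\nu})_{t \in \mathfrak{T}}$, satisfies

$$dX_t^{\nu} = -\nu_t (S_t + \kappa \nu_t)\,dt, \qquad X_0^{\nu} = 0, \tag{5}$$

where $\kappa$ is the temporary price impact parameter that captures the quality of the liquidity that the broker offers to their clients.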
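To see how (4) and (5) behave on a time grid, here is a small sketch that accumulates inventory and cash for a given trading-speed path. The function name and the constant-speed illustration are hypothetical and chosen only to make the effect of the impact term $\kappa \nu_t$ visible.

```python
import numpy as np

def inventory_and_cash(nu, S, dt, kappa):
    """Left-point discretisation of (4)-(5).

    nu and S have shape (n_steps,) and hold the trading speed and the
    mid-price at the start of each step. Returns Q and X on the full grid
    (shape (n_steps + 1,)), both starting from zero.
    """
    nu = np.asarray(nu, dtype=float)
    S = np.asarray(S, dtype=float)
    Q = np.concatenate(([0.0], np.cumsum(nu * dt)))
    # Execution price per share is S + kappa * nu: buying pushes the price up.
    X = np.concatenate(([0.0], np.cumsum(-nu * (S + kappa * nu) * dt)))
    return Q, X

if __name__ == "__main__":
    # Illustration: buy at constant speed 1 over [0, 1] at a flat mid-price of 100.
    n = 1000
    Q, X = inventory_and_cash(np.ones(n), np.full(n, 100.0), 1.0 / n, kappa=1e-3)
    print(Q[-1], X[-1])
```

In this illustration the trader ends with roughly one share and has paid slightly more than 100 for it; the excess is the temporary impact cost $\kappa \int_0^T \nu_t^2\,dt$.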

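The performance criterion of the trader under a strategy $\nu \in \mathcal{A}$ is

$$H^{\nu}(t,\alpha,q,S,x) = \mathbb{E}_{t,x,S,q}\Big[ X_T^{\nu} + Q_T^{\nu} S_T - a\,(Q_T^{\nu})^2 - \phi \int_t^T (Q_s^{\nu})^2\,ds \Big], \tag{6}$$

with the value function given by

$$H(t,\alpha,q,S,x) = \sup_{\nu \in \mathcal{A}} H^{\nu}(t,\alpha,q,S,x), \tag{7}$$

and the notation $\mathbb{E}_{t,x,S,q}$ means

$$\mathbb{E}_{t,x,S,q}[\,\cdot\,] = \mathbb{E}[\,\cdot \mid X_t^{\nu} = x,\ S_t = S,\ Q_t^{\nu} = q\,]. \tag{8}$$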

Task 1. Find the explicit solution to the control problem described above, i.e., find the value function $H$ and the optimal control $\nu^*$ in closed form. What happens to the optimal trading strategy if the Brownian motions $W^{\alpha}$ and $W^{S}$ have a correlation $\rho \neq 0$?

To accomplish this task, you can follow the standard approach, which consists in (formally) proving the dynamic programming principle satisfied by the value function H and deriving the associated HJB equation (see e.g. [CJP15, Pha09]). Then, compute the optimal control in feedback form and substitute it back in the HJB equation and derive the PDE satisfied by the value function.
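For orientation, the block below is a formal sketch of what this procedure gives for the model (1)-(5) and the criterion (6). It is written directly from the dynamics in this brief, with independent Brownian motions, rather than quoted from [CJP15], and it should be checked against your own derivation.

```latex
\begin{aligned}
0 &= \partial_t H - \kappa^{\alpha}\alpha\,\partial_{\alpha} H
      + \tfrac12 (\sigma^{\alpha})^2\,\partial_{\alpha\alpha} H
      + \kappa^{s}\alpha\,\partial_{S} H
      + \tfrac12 (\sigma^{s})^2\,\partial_{SS} H - \phi\,q^2 \\
  &\quad + \sup_{\nu}\big\{ \nu\,\partial_q H - \nu\,(S + \kappa\nu)\,\partial_x H \big\},
  \qquad H(T,\alpha,q,S,x) = x + qS - a\,q^2, \\[4pt]
\nu^{*} &= \frac{\partial_q H - S\,\partial_x H}{2\kappa\,\partial_x H}
  \qquad \text{(first-order condition, assuming } \partial_x H > 0\text{)}.
\end{aligned}
```

The expression inside the supremum is a concave quadratic in $\nu$ when $\kappa\,\partial_x H > 0$, which is why the first-order condition identifies the maximiser.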

Using the PDE satisfied by the value function, one can propose the ansatz $H(t,\alpha,q,S,x) = x + S q + h(t,\alpha,q)$ and derive the PDE satisfied by $h$. Then, propose a linear-quadratic ansatz (in $q$) for the function $h$ and deduce a system of ODEs, which you should then solve. Use the solution for $H$ to find a closed-form solution to the optimal trading strategy.
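As a check on your own computation, the block below sketches where the two ansätze lead. It makes one further assumption that is not stated in the brief: that the coefficient of $q$ is linear in $\alpha$, written $\ell(t)\,\alpha$, which is suggested by the linear signal dynamics (2). The functions $h_0$, $\ell$ and $h_2$ are names introduced here for illustration.

```latex
\begin{aligned}
0 &= \partial_t h - \kappa^{\alpha}\alpha\,\partial_{\alpha} h
      + \tfrac12 (\sigma^{\alpha})^2\,\partial_{\alpha\alpha} h
      + \kappa^{s}\alpha\,q - \phi\,q^2 + \frac{(\partial_q h)^2}{4\kappa},
      \qquad h(T,\alpha,q) = -a\,q^2, \\[4pt]
h(t,\alpha,q) &= h_0(t,\alpha) + \ell(t)\,\alpha\,q + h_2(t)\,q^2,
      \qquad \nu^{*}_t = \frac{\ell(t)\,\alpha_t + 2\,h_2(t)\,Q_t}{2\kappa}, \\[4pt]
h_2'(t) &= \phi - \frac{h_2(t)^2}{\kappa}, \qquad h_2(T) = -a, \\
\ell'(t) &= \Big(\kappa^{\alpha} - \frac{h_2(t)}{\kappa}\Big)\,\ell(t) - \kappa^{s}, \qquad \ell(T) = 0, \\
0 &= \partial_t h_0 - \kappa^{\alpha}\alpha\,\partial_{\alpha} h_0
      + \tfrac12 (\sigma^{\alpha})^2\,\partial_{\alpha\alpha} h_0
      + \frac{\ell(t)^2\,\alpha^2}{4\kappa}, \qquad h_0(T,\alpha) = 0, \\[4pt]
\phi = 0:\quad h_2(t) &= -\frac{a\,\kappa}{\kappa + a\,(T-t)}.
\end{aligned}
```

The equations for $h_2$ and $\ell$ come from matching the $q^2$ and $\alpha q$ terms; $h_0$ does not enter the feedback control, so it is needed only for the value function itself.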

Consider the following model parameters: $S_0 = 100$, $\kappa^{s} = 1$, $\sigma^{s} = 2$, $T = 1$, $\kappa^{\alpha} = 10$, $\sigma^{\alpha} = 5$, $\kappa = 1 \times 10^{-3}$, $a = 1$, $b = 0$, and $\phi = 0$. Consider the discrete version of the model using time steps of $\Delta = T/1000$ and implement the optimal trading strategy ($\nu^*$).
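One possible way to implement the discrete-time strategy is sketched below. It relies on assumptions that you should verify yourself: the feedback form $\nu^*_t = (\ell(t)\alpha_t + 2 h_2(t) Q_t)/(2\kappa)$ obtained from a linear-quadratic ansatz, closed-form coefficients `h2` and `ell` that are valid only for $\phi = 0$, and an initial signal `alpha0 = 0`, which the brief does not specify. All function names are hypothetical.

```python
import numpy as np

# Parameters quoted in the text (phi = 0 is built into the closed forms below).
S0, kappa_s, sigma_s, T = 100.0, 1.0, 2.0, 1.0
kappa_a, sigma_a = 10.0, 5.0
kappa, a = 1e-3, 1.0
N = 1000
dt = T / N

def h2(t):
    """Assumed q^2 coefficient for phi = 0: h2' = -h2^2 / kappa, h2(T) = -a."""
    return -a * kappa / (kappa + a * (T - t))

def ell(t):
    """Assumed alpha*q coefficient for phi = 0:
    ell' = (kappa_a - h2 / kappa) * ell - kappa_s, ell(T) = 0."""
    tau = T - t
    d = (1.0 - np.exp(-kappa_a * tau)) / kappa_a  # int_0^tau exp(-kappa_a v) dv
    return kappa_s * (kappa * d + a * (tau - d) / kappa_a) / (kappa + a * tau)

def simulate_optimal(n_paths, seed=0, alpha0=0.0):
    """Simulate terminal cash, inventory and price under the feedback rule."""
    rng = np.random.default_rng(seed)
    alpha = np.full(n_paths, alpha0)  # assumption: alpha_0 is not given
    S = np.full(n_paths, S0)
    Q = np.zeros(n_paths)
    X = np.zeros(n_paths)
    for i in range(N):
        t = i * dt
        nu = (ell(t) * alpha + 2.0 * h2(t) * Q) / (2.0 * kappa)
        X += -nu * (S + kappa * nu) * dt          # equation (5)
        Q += nu * dt                              # equation (4)
        dW_a, dW_S = rng.normal(0.0, np.sqrt(dt), size=(2, n_paths))
        S += kappa_s * alpha * dt + sigma_s * dW_S    # equation (1), old alpha
        alpha += -kappa_a * alpha * dt + sigma_a * dW_a  # equation (2)
    return X, Q, S

if __name__ == "__main__":
    X, Q, S = simulate_optimal(10_000)
    print("mean of X_T + Q_T S_T:", np.mean(X + Q * S))
```

For $\phi > 0$ the coefficient equations no longer reduce to these expressions, and because the Riccati equation for $h_2$ is stiff when $\kappa$ is small, a plain explicit Euler solve on the trading grid may be inaccurate near $T$.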

Task 2. Based on 10,000 simulations using the above parameters, fill in the following table and produce histograms of the four random variables below:

 

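|                                  | Expected value | Standard deviation |
|----------------------------------|----------------|--------------------|
| $X_T^{\nu^*}$                    |                |                    |
| $Q_T^{\nu^*}$                    |                |                    |
| $X_T^{\nu^*} + Q_T^{\nu^*} S_T$  |                |                    |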
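Once terminal values have been simulated, the table entries and histogram data can be computed along the following lines. This is a sketch with hypothetical helper names; the arrays in the demonstration are placeholder Gaussian draws standing in for simulation output and are not results of the model.

```python
import numpy as np

def summarise(samples):
    """Map each name to (sample mean, sample standard deviation with ddof=1)."""
    return {name: (float(np.mean(v)), float(np.std(v, ddof=1)))
            for name, v in samples.items()}

def histogram(values, bins=50):
    """Return (counts, bin_edges), ready to hand to any plotting library."""
    return np.histogram(values, bins=bins)

if __name__ == "__main__":
    # Placeholder data only: replace with the simulated terminal values.
    rng = np.random.default_rng(0)
    demo = {"X_T": rng.normal(size=10_000), "Q_T": rng.normal(size=10_000)}
    for name, (mean, std) in summarise(demo).items():
        print(f"{name:>4s}  mean={mean: .4f}  std={std: .4f}")
    counts, edges = histogram(demo["X_T"])
    print(counts.sum(), len(edges))
```

Using `ddof=1` gives the unbiased sample variance; with 10,000 paths the difference from `ddof=0` is negligible, but it is worth stating which convention the table uses.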

 

 

Task 3. Study the optimal strategy and its sensitivity to the model parameters. What can be said about the limiting behaviour of the optimal strategy $\nu^*$ as $a \to \infty$?

To accomplish this task, you can plot trajectories of $(\alpha_t)_{0 \le t \le T}$, $(\nu_t)_{0 \le t \le T}$, $(Q_t^{I})_{0 \le t \le T}$, and $(X_t^{I})_{0 \le t \le T}$ for a few outcomes of chance and comment on what the optimal strategy does and why one observes the plotted behaviour. To analyse the sensitivity of the optimal strategy with respect to model parameters, one can consider different values of $a$ and $\phi$ and comment on their influence on the optimal strategy.
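A quick numerical look at one coefficient can guide the sensitivity discussion. The sketch below assumes $\phi = 0$ and a feedback control of the form $\nu^*_t = (\ell(t)\alpha_t + 2 h_2(t) Q_t)/(2\kappa)$ with $h_2(t) = -a\kappa/(\kappa + a(T-t))$; both are assumptions to be confirmed by your own derivation. Under them, $-h_2(t)/\kappa$ is the rate at which the control pushes inventory back towards zero. The function name is hypothetical.

```python
import numpy as np

def inventory_reversion_rate(t, a, kappa=1e-3, T=1.0):
    """-h2(t) / kappa for phi = 0 (assumed closed form, see lead-in)."""
    return a / (kappa + a * (T - t))

if __name__ == "__main__":
    t = 0.5
    for a in [0.0, 1e-3, 1e-1, 1.0, 1e3, 1e6]:
        rate = inventory_reversion_rate(t, a)
        print(f"a = {a:g}: reversion rate at t = {t} is {rate:.6f}")
```

Under these assumptions the rate increases with $a$ and approaches $1/(T-t)$ for very large $a$, which suggests comparing the simulated terminal inventory across several values of $a$.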

Task 4. Consider now a benchmark strategy that trades following $\alpha_t$, i.e., $\nu_t^{B} = \alpha_t$. Why would this be a good benchmark? Produce histograms for $P_T^{\nu} := X_T^{\nu} + Q_T^{\nu} S_T - a\,(Q_T^{\nu})^2 - \phi \int_0^T (Q_s^{\nu})^2\,ds$ under strategies $\nu^{B}$ and $\nu^{I}$, and compare the means of $P_T^{\nu}$ under both strategies. Explain your results.
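The benchmark side of this comparison can be sketched as follows. The code simulates the model under $\nu_t^{B} = \alpha_t$ and returns the penalised terminal performance, with the running penalty integrated from time 0. The function name is hypothetical, and `alpha0 = 0` is an assumption because the brief does not fix the initial signal.

```python
import numpy as np

def benchmark_performance(n_paths=10_000, n_steps=1000, T=1.0, S0=100.0,
                          alpha0=0.0, kappa_s=1.0, sigma_s=2.0, kappa_a=10.0,
                          sigma_a=5.0, kappa=1e-3, a=1.0, phi=0.0, seed=0):
    """Simulate P_T under the benchmark nu_t = alpha_t (Euler scheme)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    alpha = np.full(n_paths, alpha0)  # assumption: alpha_0 is not given
    S = np.full(n_paths, S0)
    Q = np.zeros(n_paths)
    X = np.zeros(n_paths)
    running = np.zeros(n_paths)       # left-point sum for int_0^T Q_s^2 ds
    for _ in range(n_steps):
        nu = alpha.copy()             # benchmark: trade at the signal level
        running += Q ** 2 * dt
        X += -nu * (S + kappa * nu) * dt
        Q += nu * dt
        dW_a = rng.normal(0.0, np.sqrt(dt), n_paths)
        dW_S = rng.normal(0.0, np.sqrt(dt), n_paths)
        S += kappa_s * alpha * dt + sigma_s * dW_S
        alpha += -kappa_a * alpha * dt + sigma_a * dW_a
    return X + Q * S - a * Q ** 2 - phi * running

if __name__ == "__main__":
    P_B = benchmark_performance()
    print("benchmark mean:", P_B.mean(), "std:", P_B.std(ddof=1))
```

For a fair comparison, the same seed (and hence the same Brownian increments) should drive both strategies, so that differences in $P_T$ come from the trading rule rather than from sampling noise.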

3    Part 3:  Original contribution

In this part, you should develop your own ideas either on an extension of the model proposed in Part 2, or on an independent control problem. Some possible ideas could be:

• More sophisticated price process models, e.g., jump-diffusion models, transient price impact, permanent price impact, impacting the signal.

• Use neural networks to find the optimal feedback control following Section 2.1 in  [GPW21]. Once you have managed to replicate the results obtained in Part 2 of the thesis, then use neural networks to solve more general versions of the current problem, e.g., other utility functions, different impact functions, more sophisticated models for the asset price, etc.

• Develop a reinforcement learning framework to solve a discrete-time version of the problem in Part 2 and implement it. In particular, discuss advantages of doing so; see [HXY21, CJSB22].

• Study the case where there are two (or more) signals.  Study the tradeoff between following a short-term α-signal versus a longer term signal.

• Consider ambiguity aversion within the framework.

• Consider a model with two correlated assets, and an α-signal that enters both assets. Set up an optimal trading problem of your design within this framework.


References

[CJP15] Álvaro Cartea, Sebastian Jaimungal, and José Penalva. Algorithmic and high-frequency trading. Cambridge University Press, 2015.

[CJSB22] Álvaro Cartea, Sebastian Jaimungal, and Leandro Sánchez-Betancourt. Deep reinforcement learning for algorithmic trading. In Machine Learning in Financial Markets: A guide to contemporary practices (to appear). Edited by C.-A. Lehalle and A. Capponi. Cambridge University Press, 2022.

[FS06] Wendell H. Fleming and Halil Mete Soner. Controlled Markov processes and viscosity solutions, volume 25. Springer Science & Business Media, 2006.

[GPW21] Maximilien Germain, Huyên Pham, and Xavier Warin. Neural networks-based algorithms for stochastic control and PDEs in finance. arXiv preprint arXiv:2101.08068, 2021.

[HXY21] Ben Hambly, Renyuan Xu, and Huining Yang. Recent advances in reinforcement learning in finance. arXiv preprint arXiv:2112.04553, 2021.

[Pha09] Huyên Pham. Continuous-time stochastic control and optimization with financial applications, volume 61. Springer Science & Business Media, 2009.

[Tou12] Nizar Touzi. Optimal stochastic control, stochastic target problems, and backward SDE, volume 29. Springer Science & Business Media, 2012.