
Theory: From Stability to Fragility

Mathematical and conceptual reference for the Minsky Market Simulation. Covers the Financial Instability Hypothesis (FIH), agent balance sheets, price formation, leverage dynamics, and the reinforcement learning formulation.


Contents

  1. Minsky’s Financial Instability Hypothesis
  2. Agent-Based Modelling in Financial Markets
  3. Fundamental Value Process
  4. Price Formation Mechanism
  5. Agent Balance Sheets
  6. Leverage, Debt, and Margin Calls
  7. Agent Decision Models
  8. Reinforcement Learning Formulation
  9. Market Instability Metrics
  10. The Minsky Mechanism End-to-End
  11. Limitations and Modelling Assumptions
  12. Further Reading

1. Minsky’s Financial Instability Hypothesis

1.1 Background

Hyman Minsky (1919–1996) was an American economist whose work, largely ignored during his lifetime, became influential after the 2008 global financial crisis. His central claim, the Financial Instability Hypothesis (FIH), inverts the standard equilibrium view of financial markets: rather than markets being self-correcting, stability itself sows the seeds of instability.

The core insight is that during periods of calm, agents systematically underestimate risk and increase leverage. This collective shift in risk-taking transforms a stable system into a fragile one. When a shock eventually arrives, even a small one, the over-leveraged system is unable to absorb it, and cascading forced selling produces a crash far larger than the original shock.

1.2 The Three Financing Regimes

Minsky classified borrowers into three regimes based on their ability to service debt from income:

Hedge finance: the borrower can cover both interest payments and principal repayment from expected cash flows. In the simulation, proxied by low leverage: $L < 1.5$.

Speculative finance: the borrower can cover interest payments but cannot repay principal without rolling over the debt. Proxied by moderate leverage: $1.5 \leq L < 3.0$.

Ponzi finance: the borrower cannot even cover interest payments from cash flows. Solvency depends entirely on rising asset prices. Proxied by high leverage: $L \geq 3.0$.
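The leverage thresholds above translate directly into a classifier. The following is a minimal sketch; the function name `financing_regime` is hypothetical and not taken from the simulation's code.

```python
def financing_regime(leverage: float) -> str:
    """Map a leverage ratio to the Minsky regime proxy used in the text."""
    if leverage < 1.5:
        return "hedge"          # L < 1.5
    if leverage < 3.0:
        return "speculative"    # 1.5 <= L < 3.0
    return "ponzi"              # L >= 3.0


for lev in (1.0, 1.5, 2.9, 3.0):
    print(lev, financing_regime(lev))
```

Counting how many agents fall into each bucket at every step gives a simple way to track the hedge-to-Ponzi migration described in the next subsection.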

1.3 Stability Breeds Instability

The critical dynamic is the endogenous transition between regimes. In a calm market, observed volatility is low, recent returns have been positive, defaults are rare, and lenders extend credit at low rates. Rational agents respond by increasing leverage to capture higher returns. Each decision is individually defensible; collectively, the shift from hedge to speculative to Ponzi finance makes the entire system fragile.

The feedback loop:

Low volatility
    → agents increase leverage
    → leveraged buying pushes prices up
    → rising prices validate risk-taking
    → measured volatility stays low (prices trend smoothly upward)
    → agents increase leverage further
    → ...

1.4 The Minsky Moment

A Minsky moment is the point at which the system tips from fragility into crisis. A small negative shock triggers margin calls for Ponzi-financed agents. Forced selling drives prices down. Falling prices trigger margin calls for speculative agents. Further forced selling drives prices down further. The cascade continues until leverage across the system is dramatically reduced, often overshooting fundamental value downward.

The key property: the crash is disproportionate to the trigger. As an illustration, a 2% price fall from a shock can produce a 30% crash through the forced deleveraging spiral.


2. Agent-Based Modelling in Financial Markets

2.1 Why Agent-Based Models

Standard financial models assume homogeneous, fully rational agents, equilibrium pricing, normally distributed returns, and no feedback between agent actions and prices. Real markets exhibit fat-tailed return distributions, volatility clustering, asset price bubbles and crashes, and heterogeneous beliefs.

Agent-based models (ABMs) replace the representative rational agent with a population of heterogeneous, adaptive agents. Each agent follows a simple local rule. Macro-level phenomena such as bubbles, crashes, and volatility clustering emerge from micro-level interactions rather than being imposed by assumption.

2.2 Emergence

In complex systems, emergence refers to properties of the whole that cannot be predicted from the properties of any individual part. In the simulation, no single agent is programmed to cause a crash. The crash emerges from the interaction of individual leverage decisions, price impact, and the margin call mechanism. This is what makes the model a genuine test of Minsky’s hypothesis: instability is not hard-coded, it must arise endogenously.

2.3 Heterogeneous Agents

The simulation uses four agent archetypes, each embodying a different belief about how prices work:

| Agent | Belief | Market role |
| --- | --- | --- |
| Fundamental | Price reverts to intrinsic value | Stabilising |
| Momentum | Trends persist | Destabilising |
| Noise | No systematic view | Provides liquidity and randomness |
| RL | Learns from reward signal | Adaptive, potentially destabilising |

3. Fundamental Value Process

The fundamental value $F_t$ represents the true intrinsic worth of the asset, evolving independently of market prices. It follows a discrete-time geometric Brownian motion:

$$F_{t+1} = F_t \left(1 + \mu_F + \varepsilon_t\right), \quad \varepsilon_t \sim \mathcal{N}(0, \sigma_F^2)$$

where $\mu_F$ is the drift (set to 0 in baseline experiments) and $\sigma_F$ is the fundamental volatility (kept small, e.g. 0.003 per step).

The key design choice is keeping $\sigma_F$ small so that large price movements cannot be explained by fundamental shocks alone: any crash must be endogenously generated.

In continuous time, the equivalent process is:

$$dF = \mu_F F \, dt + \sigma_F F \, dW_t$$

where $W_t$ is a standard Wiener process.
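The discrete-time update for the fundamental value can be simulated in a few lines. This is a sketch using only the Python standard library; the function name, the starting value of 100, and the seed are illustrative assumptions, while $\mu_F = 0$ and $\sigma_F = 0.003$ are the baseline values quoted above.

```python
import random


def simulate_fundamental(f0: float, mu_f: float, sigma_f: float,
                         n_steps: int, seed: int = 0) -> list[float]:
    """Iterate F_{t+1} = F_t * (1 + mu_F + eps_t), eps_t ~ N(0, sigma_F^2)."""
    rng = random.Random(seed)          # seeded for reproducibility
    path = [f0]
    for _ in range(n_steps):
        eps = rng.gauss(0.0, sigma_f)  # per-step fundamental shock
        path.append(path[-1] * (1.0 + mu_f + eps))
    return path


path = simulate_fundamental(f0=100.0, mu_f=0.0, sigma_f=0.003, n_steps=500)
print(min(path), max(path))
```

With a per-step volatility this small, the fundamental path stays in a narrow band, which is what allows any large move in the market price to be attributed to the agents rather than to fundamentals.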


4. Price Formation Mechanism

4.1 Net Demand

At each time step, agents submit buy and sell orders. Aggregating across all $N_{\text{active}}$ active agents, with $D_t^+$ the total buy volume and $D_t^-$ the total sell volume:

$$D_t = D_t^+ - D_t^-, \qquad \tilde{D}_t = \frac{D_t}{N_{\text{active}}}$$

Normalising by active agent count prevents price explosions when many agents submit large simultaneous orders.

4.2 Price Impact Function

The price updates via a log-linear impact function:

$$P_{t+1} = P_t \cdot \exp\!\left(\frac{\alpha \tilde{D}_t}{\lambda} + \eta_t\right), \quad \eta_t \sim \mathcal{N}(0, \sigma_\eta^2)$$

where $\alpha$ is the price impact coefficient, $\lambda$ is the liquidity parameter, and $\eta_t$ is i.i.d. Gaussian market noise.

The exponential form ensures $P_t > 0$ always. The dependence on net order flow follows the spirit of the Kyle (1985) framework for price impact in the presence of informed traders, in which price changes are linear in order flow; here the linear impact acts on the log price.
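The price update can be written as a single function. The sketch below assumes net demand is supplied as total buy and sell volume; the function name, argument names, and the example parameter values are illustrative, not taken from the simulation.

```python
import math
import random


def next_price(price: float, buy_volume: float, sell_volume: float,
               n_active: int, alpha: float, liquidity: float,
               sigma_eta: float, rng: random.Random) -> float:
    """Log-linear price impact: P_{t+1} = P_t * exp(alpha * D~_t / lambda + eta_t)."""
    net_demand = buy_volume - sell_volume        # D_t
    normalised = net_demand / max(n_active, 1)   # D~_t, guarded against zero agents
    noise = rng.gauss(0.0, sigma_eta)            # eta_t
    return price * math.exp(alpha * normalised / liquidity + noise)


rng = random.Random(42)
p_next = next_price(100.0, buy_volume=12.0, sell_volume=8.0, n_active=10,
                    alpha=0.05, liquidity=1.0, sigma_eta=0.01, rng=rng)
print(p_next)
```

Because the update multiplies by an exponential, even a very large sell imbalance shrinks the price toward zero without ever making it negative.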

4.3 Forced-Sell Feedback

When margin calls fire in step $t$, the resulting forced sell volume $\Phi_t$ is buffered and injected as additional sell demand in step $t+1$:

$$D_{t+1} \leftarrow D_{t+1} - \Phi_t$$

This one-step delay models realistic broker execution lag, and allows the price decline from forced selling to propagate and trigger further margin calls, which is the mechanism behind the Minsky cascade.
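One way to realise the one-step delay is a small buffer that is written during step $t$ and drained when demand is aggregated in step $t+1$. The class below is a hypothetical sketch of that bookkeeping, not the simulation's actual implementation.

```python
class ForcedSellBuffer:
    """Holds forced-sell volume from one step and releases it in the next."""

    def __init__(self) -> None:
        self._pending = 0.0

    def record(self, volume: float) -> None:
        """Called when a margin call fires: accumulate Phi_t."""
        self._pending += volume

    def apply(self, net_demand: float) -> float:
        """Called once per step before pricing: D_{t+1} <- D_{t+1} - Phi_t."""
        adjusted = net_demand - self._pending
        self._pending = 0.0   # the buffered volume is consumed exactly once
        return adjusted


buffer = ForcedSellBuffer()
buffer.record(30.0)               # step t: margin calls force 30 units of selling
step_t1 = buffer.apply(10.0)      # step t+1: voluntary net demand of +10
step_t2 = buffer.apply(5.0)       # step t+2: nothing left in the buffer
print(step_t1, step_t2)
```

In a step loop, `apply` would be called before the margin-call check, so that volume recorded in the current step only affects the following step's price.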


5. Agent Balance Sheets

Each agent $i$ holds cash $c_{i,t}$, shares $q_{i,t}$, and debt $d_{i,t}$.

Asset exposure:

$$E_{i,t} = P_t \cdot q_{i,t}$$

Wealth (equity):

$$W_{i,t} = c_{i,t} + P_t q_{i,t} - d_{i,t}$$

Leverage:

$$L_{i,t} = \frac{|E_{i,t}|}{\max(W_{i,t},\, \varepsilon)}$$

where $\varepsilon > 0$ prevents division by zero. $L = 1$ means fully invested with no borrowing; $L = 2$ means the agent has borrowed an amount equal to their equity.
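The three balance-sheet quantities can be collected in a small data class. This is a sketch; the class name `BalanceSheet` and the value chosen for `EPS` are assumptions made for illustration.

```python
from dataclasses import dataclass

EPS = 1e-9  # assumed value for the epsilon floor in the leverage formula


@dataclass
class BalanceSheet:
    cash: float
    shares: float
    debt: float

    def exposure(self, price: float) -> float:
        """E = P * q"""
        return price * self.shares

    def wealth(self, price: float) -> float:
        """W = c + P * q - d"""
        return self.cash + price * self.shares - self.debt

    def leverage(self, price: float) -> float:
        """L = |E| / max(W, eps)"""
        return abs(self.exposure(price)) / max(self.wealth(price), EPS)


# An agent holding 20 shares at price 100, financed half by borrowing.
sheet = BalanceSheet(cash=0.0, shares=20.0, debt=1000.0)
print(sheet.exposure(100.0), sheet.wealth(100.0), sheet.leverage(100.0))
```

Here exposure is 2000 and equity is 1000, so leverage is 2: the agent has borrowed an amount equal to its equity, matching the interpretation given above.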

5.1 Trade Mechanics

Buying $\Delta q > 0$ shares at price $P_t$: if the cost $P_t \Delta q \leq c_{i,t}$, deduct it from cash. Otherwise use all cash and borrow the shortfall ($d_{i,t}$ increases).

Selling $\Delta q > 0$ shares: proceeds first repay debt, and any remainder is added to cash.

$$\text{repay} = \min(d_{i,t},\, \text{proceeds})$$

Note: selling shares does not increase wealth directly, because proceeds go to debt repayment first. This is key: when prices fall, leveraged agents cannot easily restore their balance sheet by selling, because proceeds are absorbed by debt.
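The buy and sell rules can be sketched as two pure functions over (cash, shares, debt). The function names are hypothetical, and capping the sale at the shares held is an assumption based on the no-short-selling restriction listed in Section 11.

```python
def buy(cash: float, shares: float, debt: float,
        price: float, dq: float) -> tuple[float, float, float]:
    """Buy dq shares: pay from cash first, borrow any shortfall."""
    cost = price * dq
    if cost <= cash:
        cash -= cost
    else:
        debt += cost - cash   # borrow what cash cannot cover
        cash = 0.0
    return cash, shares + dq, debt


def sell(cash: float, shares: float, debt: float,
         price: float, dq: float) -> tuple[float, float, float]:
    """Sell dq shares: proceeds repay debt first, the remainder goes to cash."""
    dq = min(dq, shares)              # assumption: no short-selling
    proceeds = price * dq
    repay = min(debt, proceeds)
    return cash + (proceeds - repay), shares - dq, debt - repay


state = buy(cash=500.0, shares=0.0, debt=0.0, price=100.0, dq=10.0)
print(state)                          # leveraged purchase
state = sell(*state, price=100.0, dq=4.0)
print(state)                          # proceeds absorbed by debt
```

In both calls wealth, computed as cash plus share value minus debt, stays at 500: trading at the current price reshuffles the balance sheet without changing equity.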


6. Leverage, Debt, and Margin Calls

6.1 Interest Accrual

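At the start of each time step, debt compounds at the per-step borrowing rate $r_b$:

$$d_{i,t+1} = d_{i,t} \cdot (1 + r_b)$$

For $r_b = 0.0003$, the compounded cost is approximately 16% over 500 steps ($1.0003^{500} \approx 1.162$), which is the annualised equivalent only if a year is taken to be 500 steps; under a 252-step year it would be closer to 8%. Either way, the rate penalises prolonged heavy leverage.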
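The annualised figure depends on how many steps make up a year, which is not stated here. The short sketch below compounds the per-step rate over two candidate horizons: 500 steps (the length of the timeline in Section 10) and 252 steps (a common trading-day convention, used here purely as an assumption).

```python
def compounded_rate(per_step_rate: float, steps: int) -> float:
    """Total interest accrued by compounding per_step_rate over `steps` steps."""
    return (1.0 + per_step_rate) ** steps - 1.0


r_b = 0.0003
for steps_per_year in (252, 500):
    print(steps_per_year, round(compounded_rate(r_b, steps_per_year), 4))
```

The 500-step horizon gives roughly 16%, and the 252-step horizon roughly 8%.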

6.2 Forced Liquidation

When leverage exceeds $L_{\max}$, the agent must sell enough shares to bring leverage back to $L_{\max}$. The target share count is:

$$q^* = \frac{L_{\max} \cdot W_{i,t}}{P_t}$$

Crucially, selling to reduce leverage does not directly increase wealth: it just reduces the size of both sides of the balance sheet. Wealth only recovers if prices subsequently rise.

Default occurs when wealth is non-positive after full liquidation:

$$W_{i,t} = c_{i,t} - d_{i,t} \leq 0 \implies \text{default}$$
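The margin-call rule can be sketched as a function that returns how many shares must be sold and whether the agent defaults. The function name and return convention are hypothetical; the logic relies on the fact that selling at $P_t$ leaves wealth unchanged, so the target $q^*$ can be computed from current wealth.

```python
def margin_call(cash: float, shares: float, debt: float,
                price: float, l_max: float) -> tuple[float, bool]:
    """Return (shares_to_sell, defaulted) for one agent at the current price."""
    wealth = cash + price * shares - debt
    if wealth <= 0.0:
        # Even full liquidation leaves cash - debt <= 0: the agent defaults.
        return shares, True
    leverage = abs(price * shares) / wealth
    if leverage <= l_max:
        return 0.0, False                 # within the cap, no forced sale
    target_shares = l_max * wealth / price   # q* = L_max * W / P
    return shares - target_shares, False


# Leverage 4 against a cap of 3: sell 10 of 40 shares.
print(margin_call(cash=0.0, shares=40.0, debt=3000.0, price=100.0, l_max=3.0))
# After a fall to 70 the same position has negative equity: default.
print(margin_call(cash=0.0, shares=40.0, debt=3000.0, price=70.0, l_max=3.0))
```

In the first case the sale raises 1000, debt falls to 2000, and the remaining 30 shares are worth 3000 against equity of 1000, which is exactly the cap.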

6.3 Why Forced Selling Creates a Cascade

The feedback loop operates through three channels:

  1. Price channel: forced selling pushes $P_{t+1}$ down
  2. Wealth channel: falling prices reduce $W_{i,t}$, increasing leverage for all leveraged agents
  3. Contagion channel: rising leverage triggers margin calls for previously safe agents

Formally, for $L_i > 1$ (leveraged), a price drop $\Delta P < 0$ causes leverage to increase:

$$\Delta L_i \approx \frac{q_i \Delta P}{W_i} \left(1 - L_i\right) > 0 \quad \text{when } L_i > 1,\; \Delta P < 0$$

This is the mathematical core of the Minsky instability mechanism.
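The approximation follows from differentiating $L_i = P q_i / W_i$ with $W_i = c_i + P q_i - d_i$, which gives $\partial L_i / \partial P = (q_i / W_i)(1 - L_i)$. The sketch below checks the first-order estimate against the exact change for an illustrative position (20 shares at price 100, debt 1000); all numbers and function names are assumptions chosen for the example.

```python
def leverage(cash: float, shares: float, debt: float, price: float) -> float:
    """Exact leverage L = P * q / W for a long position with positive equity."""
    wealth = cash + price * shares - debt
    return price * shares / wealth


def leverage_change_approx(cash: float, shares: float, debt: float,
                           price: float, dp: float) -> float:
    """First-order estimate: dL ~ (q * dP / W) * (1 - L)."""
    wealth = cash + price * shares - debt
    lev = price * shares / wealth
    return shares * dp / wealth * (1.0 - lev)


cash, shares, debt, price, dp = 0.0, 20.0, 1000.0, 100.0, -2.0
exact = leverage(cash, shares, debt, price + dp) - leverage(cash, shares, debt, price)
approx = leverage_change_approx(cash, shares, debt, price, dp)
print(exact, approx)
```

Both values are positive: a 2% price fall raises this agent's leverage from 2 to about 2.04, and the rise is steeper the higher the starting leverage.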


7. Agent Decision Models

7.1 Fundamental Trader

The mispricing signal is:

$$s_{i,t} = \frac{F_t - P_t}{F_t}$$

Desired position: $q_{i,t}^* = \kappa_F \cdot s_{i,t} \cdot q_{\max}$

Fundamental traders are stabilising: they provide a force pulling prices toward $F_t$. Without them, prices have no anchor.

7.2 Momentum Trader

The signal is the rolling return over the last $k$ steps:

$$m_t = \frac{P_t - P_{t-k}}{P_{t-k}}$$

Desired position: $q_{i,t}^* = \kappa_M \cdot m_t \cdot q_{\max}$

Momentum traders are destabilising: they amplify price moves. A price rise increases $m_t$, generating more buying, which pushes prices higher, creating a positive feedback loop.

The tension between fundamental and momentum traders is what makes the simulation interesting. With only fundamental traders, prices are pulled back toward $F_t$. With only momentum traders, prices diverge. The mixture produces realistic boom-bust dynamics.

7.3 Noise Trader

Noise traders draw an action uniformly from {buy, sell, hold} with random order size. They prevent trivial equilibrium and provide realistic background market activity.
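The three rule-based agents in this section reduce to short functions. The sketch below uses hypothetical names and a uniform order-size distribution for the noise trader, which is an assumption since only "random order size" is specified.

```python
import random


def fundamental_target(price: float, fundamental: float,
                       kappa_f: float, q_max: float) -> float:
    """q* = kappa_F * (F - P) / F * q_max"""
    signal = (fundamental - price) / fundamental
    return kappa_f * signal * q_max


def momentum_target(prices: list[float], k: int,
                    kappa_m: float, q_max: float) -> float:
    """q* = kappa_M * (P_t - P_{t-k}) / P_{t-k} * q_max; zero until k steps exist."""
    if len(prices) <= k:
        return 0.0
    past = prices[-1 - k]
    return kappa_m * (prices[-1] - past) / past * q_max


def noise_action(rng: random.Random, max_size: float) -> tuple[str, float]:
    """Uniform draw from {buy, sell, hold}; order size uniform on [0, max_size] (assumed)."""
    action = rng.choice(["buy", "sell", "hold"])
    size = 0.0 if action == "hold" else rng.uniform(0.0, max_size)
    return action, size


print(fundamental_target(price=90.0, fundamental=100.0, kappa_f=1.0, q_max=50.0))
print(momentum_target([100.0, 102.0, 105.0], k=2, kappa_m=1.0, q_max=50.0))
print(noise_action(random.Random(0), max_size=5.0))
```

The raw targets can be negative (for example when the price is above fundamental value); under the no-short-selling assumption in Section 11 they would presumably be floored at zero before trading.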


8. Reinforcement Learning Formulation

8.1 The Markov Decision Process

The RL agent’s interaction with the market is formalised as an MDP $(\mathcal{S}, \mathcal{A}, \mathcal{T}, \mathcal{R}, \gamma)$.

8.2 State Space

The observation vector $\mathbf{o}_t \in \mathbb{R}^{11}$:

$$\mathbf{o}_t = \left[\frac{P_t}{F_t},\; r_t,\; \sigma_t^{\text{roll}},\; m_t,\; c_{i,t},\; q_{i,t},\; d_{i,t},\; L_{i,t},\; \bar{L}_t,\; n_t^{\text{MC}},\; \text{DD}_t \right]$$

| Feature | Description |
| --- | --- |
| $P_t / F_t$ | Price-to-fundamental ratio |
| $r_t$ | Last-step log return |
| $\sigma_t^{\text{roll}}$ | Rolling volatility (20-step) |
| $m_t$ | Rolling momentum (10-step) |
| $c_{i,t}, q_{i,t}, d_{i,t}$ | Agent’s own cash, shares, debt |
| $L_{i,t}$ | Agent’s own leverage |
| $\bar{L}_t$ | Market-wide average leverage |
| $n_t^{\text{MC}}$ | Recent margin calls |
| $\text{DD}_t$ | Current drawdown from peak |

8.3 Action Space

The agent chooses a target exposure level (multiple of current wealth):

| Action | Target leverage |
| --- | --- |
| 0 | 0× (full cash) |
| 1 | 0.5× |
| 2 | 1× (unleveraged) |
| 3 | 1.5× |
| 4 | |
| 5 | |

8.4 Reward Functions

Three reward functions are compared:

Profit-only: $R_t^{\text{profit}} = \Delta W_t / W_t$

Risk-adjusted: $R_t^{\text{risk}} = \Delta W_t - \lambda \cdot \text{DD}_t - \mu \cdot L_{i,t}$

System-aware: $R_t^{\text{sys}} = \Delta W_t - \lambda \cdot \text{DD}_t - \mu \cdot L_{i,t} - \gamma \cdot \bar{L}_t$

Here $\lambda$, $\mu$, and $\gamma$ are penalty weights; they should not be confused with the liquidity parameter $\lambda$ of Section 4.2 or the discount factor $\gamma$ of Section 8.1.

The system-aware reward tests whether an agent can be incentivised to internalise systemic risk, that is, the negative externality their leverage imposes on others.
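The three reward variants differ only in which penalty terms they subtract. The sketch below mirrors the formulas term by term; the function names, the argument name `gamma_sys` (used to avoid clashing with the discount factor), and the example weights are all assumptions.

```python
def reward_profit(d_wealth: float, wealth: float) -> float:
    """R = dW / W"""
    return d_wealth / wealth


def reward_risk(d_wealth: float, drawdown: float, leverage: float,
                lam: float, mu: float) -> float:
    """R = dW - lambda * DD - mu * L"""
    return d_wealth - lam * drawdown - mu * leverage


def reward_system(d_wealth: float, drawdown: float, leverage: float,
                  mean_leverage: float, lam: float, mu: float,
                  gamma_sys: float) -> float:
    """R = dW - lambda * DD - mu * L - gamma * L_bar"""
    return reward_risk(d_wealth, drawdown, leverage, lam, mu) - gamma_sys * mean_leverage


# Same step evaluated under each reward (illustrative numbers only).
print(reward_profit(10.0, 1000.0))
print(reward_risk(10.0, drawdown=0.2, leverage=2.0, lam=5.0, mu=1.0))
print(reward_system(10.0, drawdown=0.2, leverage=2.0, mean_leverage=1.5,
                    lam=5.0, mu=1.0, gamma_sys=2.0))
```

Note that the profit-only reward is a relative return while the other two use the absolute wealth change, so the penalty weights need to be chosen on the scale of $\Delta W_t$.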

8.5 Deep Q-Network (DQN)

DQN (Mnih et al., 2015) approximates the Q-function with a neural network $Q_\theta(s, a)$. The loss is:

$$\mathcal{L}(\theta) = \mathbb{E}_{(s,a,r,s') \sim \mathcal{D}} \left[\left(y - Q_\theta(s, a)\right)^2\right], \qquad y = r + \gamma \max_{a'} Q_{\theta^-}(s', a')$$
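The target and squared error in the loss can be illustrated without a deep-learning library by treating the online and target networks as plain callables that map a state to a list of Q-values. This is a sketch of the arithmetic only, with hypothetical names and toy Q-functions; it does not include gradient updates.

```python
from typing import Callable, Sequence

QFunction = Callable[[int], Sequence[float]]   # state -> Q-value per action


def td_target(reward: float, next_q_values: Sequence[float], gamma: float) -> float:
    """y = r + gamma * max_a' Q_target(s', a')"""
    return reward + gamma * max(next_q_values)


def dqn_loss(batch: list[tuple[int, int, float, int]],
             q_online: QFunction, q_target: QFunction, gamma: float) -> float:
    """Mean squared TD error over a batch of (s, a, r, s') transitions."""
    errors = []
    for state, action, reward, next_state in batch:
        y = td_target(reward, q_target(next_state), gamma)
        errors.append((y - q_online(state)[action]) ** 2)
    return sum(errors) / len(errors)


# Toy Q-functions that ignore the state (illustration only).
q_online = lambda s: [1.0, 2.0]
q_target = lambda s: [0.5, 3.0]
batch = [(0, 1, 1.0, 1)]
print(dqn_loss(batch, q_online, q_target, gamma=0.5))
```

In the toy example the target is $1 + 0.5 \times 3 = 2.5$ and the online estimate is 2, so the squared error is 0.25.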

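Two stabilisation techniques are essential: experience replay, which samples transitions $(s, a, r, s')$ from the buffer $\mathcal{D}$ rather than learning from consecutive steps, and a target network, whose parameters $\theta^-$ are a periodically updated copy of $\theta$ used to compute $y$.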


9. Market Instability Metrics

Rolling volatility: Standard deviation of log returns over a window $W$.

Mispricing: $M_t = (P_t - F_t) / F_t$. Positive: overvaluation. Negative: post-crash undershoot.

Maximum drawdown: $\text{MDD}_t = \min_{s \leq t} (P_s - \max_{u \leq s} P_u) / \max_{u \leq s} P_u$

Sharpe ratio: $\text{SR}_i = (\bar{r}_i / \sigma_{r_i}) \cdot \sqrt{N_{\text{steps per year}}}$

Note: Sharpe is unreliable for bimodal return distributions, which are precisely the conditions Minsky’s theory predicts.
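The price-based metrics above can be computed with the standard library. In this sketch the function names are hypothetical, the sample standard deviation (`statistics.stdev`) is an assumed choice, and the number of steps per year is left as a parameter because it is not fixed here.

```python
import math
import statistics


def log_returns(prices: list[float]) -> list[float]:
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def rolling_volatility(prices: list[float], window: int) -> float:
    """Standard deviation of the last `window` log returns."""
    recent = log_returns(prices)[-window:]
    return statistics.stdev(recent) if len(recent) > 1 else 0.0


def mispricing(price: float, fundamental: float) -> float:
    """M = (P - F) / F"""
    return (price - fundamental) / fundamental


def max_drawdown(prices: list[float]) -> float:
    """Most negative relative fall from a running peak (0.0 if prices never fall)."""
    peak, worst = prices[0], 0.0
    for p in prices:
        peak = max(peak, p)
        worst = min(worst, (p - peak) / peak)
    return worst


def sharpe_ratio(returns: list[float], steps_per_year: int) -> float:
    """(mean / stdev) * sqrt(steps per year); 0.0 when volatility is zero."""
    sd = statistics.stdev(returns)
    return statistics.mean(returns) / sd * math.sqrt(steps_per_year) if sd > 0 else 0.0


prices = [100.0, 120.0, 90.0, 110.0]
print(max_drawdown(prices), mispricing(110.0, 100.0))
print(rolling_volatility(prices, window=20))
```

For the example path the running peak is 120 and the trough is 90, so the maximum drawdown is -25%.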

Gini coefficient:

$$G = \frac{2 \sum_{i=1}^{N} i W_i}{N \sum_{i=1}^{N} W_i} - \frac{N+1}{N}$$

where agents are indexed in ascending order of wealth, $W_1 \leq \dots \leq W_N$.

High-crash regimes can show low Gini because everyone is equally ruined, so the model distinguishes prosperous equality from catastrophic equality.
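The Gini formula requires wealth to be sorted in ascending order before the rank-weighted sum is taken. A minimal sketch follows; the function name and the convention of returning 0 when total wealth is non-positive are assumptions.

```python
def gini(wealths: list[float]) -> float:
    """G = 2 * sum(i * W_i) / (N * sum(W_i)) - (N + 1) / N, with W sorted ascending."""
    ordered = sorted(wealths)
    n = len(ordered)
    total = sum(ordered)
    if n == 0 or total <= 0:
        return 0.0   # assumed convention when the ratio is undefined
    weighted = sum(rank * w for rank, w in enumerate(ordered, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


print(gini([100.0, 100.0, 100.0, 100.0]))   # equally prosperous
print(gini([1.0, 1.0, 1.0, 1.0]))           # equally ruined
print(gini([0.0, 0.0, 0.0, 10.0]))          # one agent holds everything
```

The first two populations both score zero even though their wealth levels differ a hundredfold, which is why Gini needs to be read together with the level of wealth.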


10. The Minsky Mechanism End-to-End

Phase 1: Stable equilibrium (steps 0–100): $P_t \approx F_t$, all agents hedge-financed, low volatility.

Phase 2: Leverage build-up (steps 100–300): Momentum traders and RL agent increase positions. Leveraged buying pushes $P_t > F_t$. Rising prices confirm momentum signals. Speculative-finance agents multiply.

Phase 3: Fragility (steps 300–350): Price significantly above fundamental. High proportion of agents in speculative/Ponzi regimes. System is fragile: margin calls would cascade from any small fall.

Phase 4: Shock and cascade (step 350): Small negative shock triggers first margin calls. Forced selling propagates. Price falls disproportionate to the initial shock.

Phase 5: Post-crash deleveraging (steps 350–500): Surviving agents hold mostly cash. Price undershoots $F_t$. Fundamental traders gradually push price back. The cycle can begin again.


11. Limitations and Modelling Assumptions

| Assumption | What is omitted |
| --- | --- |
| Single risky asset | No cross-asset contagion or portfolio diversification |
| No order book | No bid-ask spread, no market depth |
| No banking sector | No credit contraction, no central bank |
| No collateral chains | Repo markets and rehypothecation absent |
| Hard leverage cap | Real margin requirements are dynamic |
| No short-selling | Agents cannot profit from declining prices |
| i.i.d. shocks | Real macro shocks are regime-dependent with fat tails |

These limitations restrict generalisability but do not invalidate the mechanism: the model provides evidence of a possible channel, not proof that real RL trading systems cause crashes.


12. Further Reading

Minsky’s Financial Instability Hypothesis

Agent-Based Models in Finance

Price Impact and Market Microstructure

Reinforcement Learning
