Gradual proactive regulation of body state by reinforcement learning of homeostasis 2026 Fujiwara and Naoki

Andy

Highlights

  • A continuous HRL learns cue-triggered anticipatory compensation (tolerance).
  • Context-gated inhibition captures extinction and rapid reacquisition.
  • Asymmetric reinforcement weighting enables learning near set points.
  • A multivariable HRL reveals priority-driven trade-offs and non-recovery.
  • The model reproduces ethanol-induced hypothermia, tolerance, and rapid reacquisition.

Abstract

Living systems maintain physiological variables such as temperature, blood pressure, and glucose within narrow ranges—a process known as homeostasis. Homeostasis involves not only reactive feedback but also anticipatory adjustments shaped by experience. Prior homeostatic reinforcement learning (HRL) models have provided a computational account of anticipatory regulation under homeostatic challenges. However, existing formulations lack mechanisms for gradual, trial-by-trial adjustment and for extinction learning.

To address this issue, we developed a continuous HRL framework that enables trial-wise tuning of anticipatory regulation. The model incorporates two biologically informed components: asymmetric reinforcement, which weights negative outcomes more than positive outcomes, and a dual-unit, context-gated inhibitory mechanism. We applied the framework to thermoregulatory conditioning with ethanol-induced hypothermia and successfully reproduced cue-triggered compensation, gradual tolerance, and rapid reacquisition after extinction. We then extended the framework to multiple physiological variables influenced by shared neural or hormonal control signals, where compensating one variable can necessarily incur costs in others (e.g., heating at the expense of a fuel-like resource). Under uneven regulatory priorities, deviations propagated through shared control, yielding cascading, system-wide failure to stabilize near the ideal state, a failure mode discussed in autonomic dysregulation (e.g., dysautonomia, myalgic encephalomyelitis/chronic fatigue syndrome). Overall, our framework provides a computational basis for advancing a systems-level understanding of multi-organ homeostatic dysregulation in vivo.
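
To make the idea of asymmetric reinforcement concrete, here is a minimal sketch. It is not the authors' implementation: it assumes the drive-reduction reward common in earlier HRL work (reward = drive before minus drive after) and a squared-distance drive, and the names `drive`, `asymmetric_reward`, `w_neg`, `w_pos` and the 37.0 set point are all hypothetical choices for illustration.

```python
def drive(state, set_point=37.0):
    """Squared distance of one body variable from its set point (assumed form)."""
    return (state - set_point) ** 2


def asymmetric_reward(state_before, state_after,
                      w_neg=2.0, w_pos=1.0, set_point=37.0):
    """Drive-reduction reward, with worsening weighted more than improvement.

    raw > 0 means the state moved toward the set point; raw < 0 means it
    moved away. The weights w_neg > w_pos are illustrative values only.
    """
    raw = drive(state_before, set_point) - drive(state_after, set_point)
    weight = w_neg if raw < 0 else w_pos
    return weight * raw


# Moving 1 degree away from the set point costs twice what moving back earns.
loss = asymmetric_reward(37.0, 36.0)   # -> -2.0
gain = asymmetric_reward(36.0, 37.0)   # -> 1.0
```

With these example weights, a deviation and its exact reversal no longer cancel out, which is one simple way a learner could be made more sensitive to overshoot than to recovery.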

Open access
 

4.3. Pathological Relevance

In the multidimensional model, assigning disproportionately low priority to a homeostatic variable downweights its deviations during learning, leading to misattribution of disturbances and non-convergence of anticipatory regulation under shared and low-dimensional control. Myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) offers a cautious point of contact with this class of dynamics. Clinically, ME/CFS is associated with orthostatic intolerance and other autonomic abnormalities (Freeman and Komaroff, 1997, Garner and Baraniuk, 2019), as well as reduced heart-rate variability linked to fatigue severity (Escorihuela et al., 2020, Nelson et al., 2019).
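
The effect of a low priority under shared control can be illustrated with a toy calculation. This is only a sketch under strong assumptions, not the paper's model: it uses a one-step greedy choice instead of learning, a priority-weighted squared-deviation drive, and a single scalar control that moves two variables in opposite directions. The names `weighted_drive`, `best_shared_control`, `coupling` and all numbers are hypothetical.

```python
def weighted_drive(deviations, priorities):
    """Priority-weighted sum of squared deviations from the set points."""
    return sum(p * d ** 2 for p, d in zip(priorities, deviations))


def best_shared_control(deviations, coupling, priorities, candidates):
    """Pick the scalar control u that minimizes weighted drive after one step.

    Each variable i moves by coupling[i] * u, so one control signal is
    shared by all variables (the low-dimensional control assumption).
    """
    def after(u):
        return [d + c * u for d, c in zip(deviations, coupling)]

    return min(candidates, key=lambda u: weighted_drive(after(u), priorities))


# Variable 0: temperature, 2 units below set point.
# Variable 1: fuel-like resource, at its set point.
# Heating (u > 0) raises temperature and depletes fuel by the same amount.
deviations = [-2.0, 0.0]
coupling = [1.0, -1.0]
candidates = [0.0, 0.5, 1.0, 1.5, 2.0]

balanced = best_shared_control(deviations, coupling, [1.0, 1.0], candidates)  # -> 1.0
skewed = best_shared_control(deviations, coupling, [1.0, 0.1], candidates)    # -> 2.0
```

With equal priorities the best control splits the burden (both variables end 1 unit off). With fuel downweighted to 0.1, the chosen control fully corrects temperature and leaves fuel 2 units off, because that deviation barely registers in the weighted drive.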

More broadly, recent accounts emphasize impaired allostatic regulation and aberrant interoceptive inference rather than simple sensory deficits (Barrett and Simmons, 2015, Stephan et al., 2016). Importantly, the present model does not imply a literal reduction in the physiological importance of specific variables. Rather, we interpret such conditions computationally, as imbalances in effective priority during learning arising from altered interoceptive or autonomic control.

In this framing, clinical manifestations can be related to specific parameter regimes of the model, linking observed autonomic dysregulation to underlying learning dynamics rather than to fixed sensory deficits. In principle, perturbation-based learning paradigms combined with physiological time-series data may help assess whether non-convergent adaptation characterizes particular patient populations.
 