Recent advances in large language models (LLMs) have enabled new possibilities in simulating complex physiological systems through reasoning, generation, and agentic coordination. In this work, we present Organ-Agents, a novel multi-agent framework that simulates the dynamics of human physiology using LLM-driven agents. Each agent, referred to as a Simulator, is assigned to model a specific physiological system such as the cardiovascular, renal, immune, or respiratory system. The training of the Simulators consists of two stages: supervised fine-tuning on system-specific time-series data, followed by reinforcement-guided inter-agent coordination that incorporates dynamic reference selection and error correction with assistive agents. To support training, we curated a cohort of 7,134 sepsis patients and 7,895 matched controls, constructing high-resolution, multi-domain trajectories covering 9 physiological systems and 125 clinical variables. Organ-Agents achieved high simulation accuracy on 4,509 held-out patients, with average per-system mean squared error (MSE) below 0.16 across all systems and robust performance across severity strata based on sequential organ failure assessment (SOFA) scores. Generalization capability was confirmed via external validation on 22,689 intensive care unit (ICU) patients from two tertiary hospitals, showing moderate performance degradation under distribution shifts while maintaining overall simulation stability. In terms of clinical plausibility, Organ-Agents reliably reproduces multi-system critical event chains (e.g., hypotension, hyperlactatemia, hypoxemia) with preserved event order, coherent phase progression, and minimal deviations in both trigger timing and physiological values. Subjective evaluation by 15 critical care physicians further confirmed the realism and physiological coherence of simulated trajectories, with mean Likert ratings of 3.9 and 3.7, respectively.
The Simulator also supports counterfactual simulation under alternative fluid resuscitation strategies for sepsis, producing physiological trajectories and APACHE II scores that closely align with matched real-world patient groups. To further assess the preservation of clinically meaningful patterns, we evaluated Organ-Agents in downstream early warning tasks using seven representative classifiers. Most models showed only marginal AUROC degradation when transferring from real to generated and counterfactual trajectories, with performance drops generally within 0.04, indicating that the simulations preserved decision-relevant information for clinical risk simulation. Together, these results position Organ-Agents as a clinically credible, interpretable, and generalizable digital twin for physiological modeling, enabling precision diagnosis, treatment simulation, and hypothesis testing across critical care settings.
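
As an illustration of the per-system MSE metric reported above, the following minimal sketch computes the MSE of each variable over time and then averages those values within each physiological system. The aggregation order, the function name `per_system_mse`, the variable names, and the toy values are all assumptions made for illustration; the abstract does not specify how the metric is aggregated or normalized.

```python
from statistics import fmean


def per_system_mse(true, pred, systems):
    """Average the per-variable MSE within each physiological system.

    true, pred: dict mapping variable name -> list of values over time.
    systems:    dict mapping system name -> list of variable names.
    This aggregation is an assumed convention, not the paper's stated one.
    """
    result = {}
    for system, variables in systems.items():
        variable_mses = []
        for name in variables:
            observed, simulated = true[name], pred[name]
            if len(observed) != len(simulated):
                raise ValueError(f"length mismatch for variable {name!r}")
            squared_errors = [(o - s) ** 2 for o, s in zip(observed, simulated)]
            variable_mses.append(fmean(squared_errors))
        result[system] = fmean(variable_mses)
    return result


# Hypothetical toy trajectories on a normalized scale (not data from the paper).
true = {"heart_rate": [0.0, 1.0], "map": [0.5, 0.5], "creatinine": [1.0, 2.0]}
pred = {"heart_rate": [0.0, 0.0], "map": [0.5, 1.5], "creatinine": [1.0, 2.0]}
systems = {"cardiovascular": ["heart_rate", "map"], "renal": ["creatinine"]}

scores = per_system_mse(true, pred, systems)
print(scores)
```

Under this convention, each system contributes one score regardless of how many variables it contains, so a system with many variables does not dominate the comparison against a threshold such as 0.16.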