Hospital Resource Optimization

Hospitals today struggle to operate optimally, especially under the pressure created by a large number of patients relative to a small medical staff. For the past three years, research conducted at the Technion has sought a solution to this problem. Its main goal is an algorithm that can give real-time advice to the medical staff of a hospital department. Our space model is a map of the hospital's catheterization lab department; it contains different types of rooms with different properties. The department also contains actors, who may be medical staff, patients, or visitors. They are the active part of the model: they represent the building's occupants, and their movements are defined by narratives. The overall goal is to optimize global space utilization, time, and patient satisfaction.

Introduction

Our research focuses on improving hospital department efficiency by enhancing patient satisfaction, reducing procedure times, and optimizing space utilization. Each patient follows a unique narrative—a sequence of events requiring resources like medical staff and rooms. The challenge lies in coordinating these narratives to create an optimal operational policy.

Traditionally, buildings are passive structures, unaware of occupant activities, leading to inefficiencies in resource allocation and conflicts. Recent advancements in sensing technologies, such as temperature, occupancy, and noise sensors, integrated with Building Management Systems (BMS), have improved energy efficiency and occupant comfort. Wearable devices and ambient sensors (e.g., cameras, thermal sensors) are also used in healthcare for patient monitoring and activity detection. However, these solutions are reactive and localized, lacking a holistic understanding of human behavior across the entire building.

To address these limitations, we propose a simulation-powered Building Management System that uses Visible Light Communication (VLC) to track human presence and assets, simulate what-if scenarios, and optimize Key Performance Indicators (KPIs). Unlike traditional methods, our approach:

  • Models spaces, people, and activities interdependently for predictive analytics, going beyond mere physical replication (e.g., Digital Twins).

  • Focuses on dynamic resource allocation within existing buildings, rather than evaluating architectural design changes.

  • Uses abstract representations of spaces and operations for real-time simulation, enabling complex scenario testing for workflow decision-making.

  • Integrates spatial, operational, and social modeling techniques, offering a more comprehensive approach than traditional Operations Research methods like queuing models or Petri Nets.

This system was previously tested in a hypothetical scenario involving a catheterization lab. In this study, we extend the framework by developing a novel building activities management system that accounts for detailed decision-making by each actor. This enables the prediction and analysis of multidimensional resource allocation strategies (people, spaces, equipment) and their impact on spatial, social, and operational KPIs.

A simulation study at the Catheterization Lab at St. Bernardine Medical Center demonstrates the efficacy of this approach. While the VLC system for real-time sensing is not yet deployed, the study highlights the potential of this system to transform hospitals into adaptive, efficient environments that proactively address operational challenges.

Building Activities Management

The proposed system consists of three main components: a digital model of the building ecosystem, a simulation engine, and an analysis and evaluation method. The digital model captures spaces, actors, and activities, informed by data from field studies and occupancy sensors when available. The simulation engine generates alternative future scenarios of occupancy and activities, while the analysis and evaluation method quantifies the impact of these scenarios on spatial, social, and operational Key Performance Indicators (KPIs), defined in collaboration with stakeholders. Below, we detail the key components of the simulation model.

The Space Model represents the built environment where activities occur. The environment is abstracted into a graph, where nodes represent inhabitable spaces such as rooms, corridors, and open areas, and links indicate how spaces are connected. Nodes store static information, such as a space’s function (e.g., operating room, waiting area), as well as dynamic information, such as the identity of occupants and their ongoing activities. Each space is managed by a room manager, which coordinates the behavior of actors within it. For example, the room manager of a catheterization lab ensures all required participants are present for a surgery and coordinates the procedure’s execution.
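
As a concrete illustration, the space graph can be held as plain Python data. The sketch below is a minimal version of such a model; the identifiers (CCL1, HOLDING, COR) follow the case study later on this page, while the field names and capacities are assumptions, not the study's actual encoding.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

@dataclass
class Space:
    """One node of the space graph. Field names are illustrative."""
    space_id: str
    function: str                                      # static: e.g. "cath_lab"
    capacity: int                                      # static: e.g. number of beds
    occupants: Set[str] = field(default_factory=set)   # dynamic: current actor ids
    activity: Optional[str] = None                     # dynamic: ongoing activity

# The department as nodes plus an adjacency map; links show which spaces connect.
spaces = {
    "CCL1":    Space("CCL1", "cath_lab", 1),
    "HOLDING": Space("HOLDING", "holding_area", 3),
    "COR":     Space("COR", "corridor", 4),
}
links = {"CCL1": ["COR"], "HOLDING": ["COR"], "COR": ["CCL1", "HOLDING"]}
```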

The Actor Model represents the building occupants, including their roles (e.g., nurses, doctors, patients) and group memberships (e.g., a doctor associated with specific nurses and patients). It also tracks dynamic information, such as an actor’s location and status (e.g., whether a patient is pre- or post-procedure).

The Activity Model captures the interactions between actors and spaces, such as movement or domain-specific behaviors like a catheterization procedure. Each activity has a duration, a list of participants, and one or more spaces where it can occur.
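
The Actor and Activity models map naturally onto small record types. A minimal sketch, with assumed field names; the example activity follows the staffing described in the case study (a cardiologist, two nurses, and a technician for a diagnostic procedure):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Actor:
    actor_id: str
    role: str                        # "nurse", "doctor", "patient", ...
    group: Optional[str] = None      # e.g. the team a doctor works with
    location: str = "COR"            # dynamic: current space id
    status: str = "idle"             # dynamic: e.g. "pre-procedure"

@dataclass
class Activity:
    name: str                        # e.g. "catheterization"
    duration: int                    # expected duration, abstract time units
    participants: List[str]          # roles required for the activity
    allowed_spaces: List[str]        # space functions where it may occur

procedure = Activity("catheterization", 8,
                     ["doctor", "nurse", "nurse", "technician"], ["cath_lab"])
```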

The Narrative Model represents goal-oriented behaviors of actors engaged in structured sequences of activities, either individually or collaboratively. For example, a catheterization procedure narrative includes activities such as preparing the patient and room, moving the patient to the lab, executing the procedure, and discharging the patient. Each narrative determines the next destination for actors and delegates control to the relevant room manager for local behavior coordination. Once an activity is completed, control returns to the narrative, which identifies the next step.
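
A narrative can then be expressed as an ordered plan whose steps are handed off to room managers, mirroring the delegation just described. The sketch below is a simplified control loop; the RoomManager here is a stand-in (a real one would gather the required participants before executing):

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Step:
    activity: str
    space_function: str              # where the step may run

CATH_NARRATIVE: List[Step] = [
    Step("prepare_patient", "holding_area"),
    Step("move_to_lab", "corridor"),
    Step("procedure", "cath_lab"),
    Step("discharge", "holding_area"),
]

class RoomManager:
    """Stand-in local coordinator for the actors inside one space."""
    def __init__(self, name: str):
        self.name = name
    def execute(self, activity: str) -> None:
        print(f"{self.name}: running {activity}")

def run_narrative(steps: List[Step], managers: Dict[str, RoomManager]) -> None:
    for step in steps:
        managers[step.space_function].execute(step.activity)  # delegate locally
        # control returns to the narrative once the activity completes

managers = {f: RoomManager(f) for f in ("holding_area", "corridor", "cath_lab")}
run_narrative(CATH_NARRATIVE, managers)
```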

The Narrative Management System coordinates multiple narratives and resolves conflicts when narratives compete for the same resources (e.g., spaces, people, equipment). For instance, if two narratives require a holding area with only one available bed, the room manager reports the conflict to the narrative manager. The narrative manager then simulates alternative resolution strategies and recommends the one that best balances spatial, social, and operational KPIs.

In this study, the system components are modeled using Python, enabling behavior simulation over time and generating analytical data tables. The goal is to develop a system capable of providing interactive decision support to building occupants, optimizing resource allocation and operational efficiency in real-time.

Case Study

This case study evaluates the proposed building activities management system in the Cardiac Catheterization Lab (CCL) at St. Bernardine Medical Center (SBMC), a 342-bed facility with one of Southern California’s largest heart programs. The CCL handles outpatients, inpatients, and emergency cases like STEMI, facing challenges in staffing, operations, and space utilization. Decisions on resource allocation—spaces, staff, and activities—must balance efficiency, utilization, and satisfaction, often complicated by actors in different spaces acting simultaneously and unaware of one another.

The system models workflows, such as catheterization procedures, to predict conflicts and optimize resource allocation. By simulating scenarios, it recommends strategies to improve efficiency and satisfaction, addressing the CCL’s dynamic and complex environment.

Data and Modeling 

The CCL includes a 3-bed holding area for pre- and post-procedure care. It interacts with the Cardiac Ambulatory Care Unit (CACU), a 16-bed unit for outpatient preparation and recovery, located on a different hospital level. Other units include the inpatient ward (IP), ICU, Emergency Department (ED), and Acute Care Unit (ACU), a 12-bed unit for outpatient surgery prep and recovery. A 20-person waiting room for families is located outside the CCL.

The CCL is staffed by 16 cardiologists, 20 nurses, 8 X-ray technicians, and support personnel. Patient transfers are conducted by nurses or technicians, following protocols based on patient type and monitoring needs. Block scheduling assigns rooms to cardiologists or groups. Diagnostic procedures involve a cardiologist, two nurses, and one technician, while interventional procedures add a nurse and anesthetist. Procedure duration varies by type and patient condition.

The workflow includes pre-procedure preparation in the holding room, patient transfer to the CCL, procedure execution, and recovery. Delays can occur due to cardiologist availability, room/staff readiness, or Turnaround Time (TAT)—the interval between procedures for room cleaning and preparation.

For simulation, the workflow is abstracted into a graph representation, focusing on two catheterization labs. Nodes represent activities with expected durations, and arcs show traversal times between nodes. Abstract time units simplify the model, with significant transfer times to/from the CACU. Patients typically spend 2 hours in pre-procedure prep, up to 2 hours in the CCL, and 2-6 hours in recovery, depending on the procedure and complications.
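
In code, the abstracted workflow reduces to a small weighted graph. The sketch below uses assumed arc weights, taking one abstract unit as roughly 15 minutes so the node durations stay consistent with the times quoted above; the CACU transfers are deliberately the most expensive arcs:

```python
# node: (expected duration, {successor: traversal time}), in abstract time units
workflow = {
    "CACU_prep": (8, {"CCL1": 3, "CCL2": 3}),      # transfers to/from CACU are long
    "CCL1":      (8, {"HOLDING": 1}),              # up to ~2 h in the lab
    "CCL2":      (8, {"HOLDING": 1}),
    "HOLDING":   (16, {"CACU_recovery": 3}),       # recovery spans 2-6 h
}
```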

For the scenario analyzed here, we simulate the activities of five patients (three post-procedure, two pre-procedure), two doctors, five nurses, and two X-ray technicians across two Cath Labs. Both labs complete their procedures simultaneously, requiring two post-procedure patients to be moved to the holding room for recovery. However, the holding room is already full, with two pre-procedure patients and one post-procedure patient (Figure 4 - T1).

The Building Management System detects this situation, presenting the staff with four possible actions:

  A. Move post-procedure patient P1 to the corridor (COR), awaiting a free space in the holding room.

  B. Move post-procedure patient P2 to the corridor (COR), awaiting a free space in the holding room.

  C. Move both post-procedure patients (P1 and P2) to the corridor, awaiting a free space in the holding room.

  D. Keep both post-procedure patients in their respective Cath Labs, awaiting a free space in the holding room.

Since patients must be accompanied by a nurse at all times, moving them to the corridor requires a nurse to stay with them, either in the corridor or the Cath Lab. Each option has trade-offs in terms of resource allocation, patient flow, and operational efficiency, as illustrated in Table 1.

To support decision-making, the system simulates and analyzes the consequences of each option. For example, the sequence of steps for Option A is depicted in Figure 4 (T2-6), showing how moving P1 to the corridor impacts resource use and patient flow. This simulation helps staff choose the action that optimizes outcomes based on predefined Key Performance Indicators (KPIs).

Reward Function and Heuristic

The simulation results are analyzed based on three key metrics: actors’ satisfaction, space utilization, and operational efficiency, which collectively inform a reward function to evaluate and compare alternative scenarios.

  • Actors’ Satisfaction: Focused on patients, satisfaction is determined by their activity status: highest during procedures, lower while waiting, and lowest when placed in the corridor.

  • Space Utilization: Measured as the percentage of time a space is used for its intended purpose. A score of 100 is given if usage aligns with design; otherwise, the score is lower. The average score is calculated across all spaces.

  • Operational Efficiency: Tracked by comparing actual activity durations to benchmark durations. A score of 100% is given if actual time matches the benchmark, less than 100% if it exceeds the benchmark, and no bonus if completed earlier.

These metrics are combined into a reward function, which assigns a weighted score to each scenario based on its performance across the three criteria. The reward function enables a comparative evaluation of the four options (A-D) for handling post-procedure patients when the holding room is full.
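
A minimal sketch of such a reward function, with assumed weights (the study sets them with stakeholders) and made-up KPI scores on a 0-100 scale for two of the options:

```python
def reward(satisfaction: float, utilization: float, efficiency: float,
           w_sat: float = 0.4, w_util: float = 0.3, w_eff: float = 0.3) -> float:
    """Weighted sum of the three KPIs; the weights are illustrative."""
    return w_sat * satisfaction + w_util * utilization + w_eff * efficiency

# Illustrative comparison: one patient in the corridor vs. both waiting in labs.
score_a = reward(satisfaction=70, utilization=90, efficiency=85)
score_d = reward(satisfaction=85, utilization=60, efficiency=50)
```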

The analysis reveals that options A and B—moving one patient to the corridor while keeping the other in the Cath Lab—are preferable. These options allow Turnaround Time (TAT) to proceed in one lab, minimizing delays for subsequent procedures while inconveniencing only one patient. In contrast, option C (moving both patients to the corridor) enables TAT in both labs but overcrowds the corridor and dissatisfies both patients. Option D (keeping both patients in the Cath Labs) delays TAT in both labs, reducing overall efficiency.

By leveraging the reward function, the system identifies options A and B as the most balanced, optimizing patient satisfaction, space utilization, and operational efficiency.

Narrative and Agent

The Narrative serves as our agent in this system, representing a main goal to achieve through a series of events. Narratives are divided into two types: Active Narratives and Static Narratives. Active Narratives act as our model agents, representing a comprehensive treatment plan for a patient, such as a catheterization procedure. To complete the procedure, multiple events must be executed, including preparing the room, preparing the patient, and moving the patient to the catheterization lab. These Active Narratives must learn to collaborate to maximize a global utility function, which balances space utilization, time efficiency, and patient satisfaction. Each Active Narrative acts to optimize this global utility function while coordinating with others to efficiently use shared resources, such as medical staff and spaces.

On the other hand, Static Narratives represent consistent, repeatable processes that run in parallel with Active Narratives. They control non-medical actors, such as patients or visitors, and introduce non-deterministic elements into the environment. For example, patients or visitors may interact with resources unpredictably, forcing Active Narratives to adapt to dynamic situations.

Narratives have several attributes, including an ID, the actor they are linked to, a creation time, a real-time map indicating available spaces, and a real-time dictionary tracking available actors. Active Narratives also include a list of events to be performed. While these events do not have a strict order, some require specific patient statuses achieved through prior events. In contrast, Static Narratives must execute their events in a predefined sequence.
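
Translating that attribute list into a record type gives something like the following; field names and types are assumptions:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Narrative:
    narrative_id: int
    actor_id: str                                    # the actor it is linked to
    created_at: int                                  # creation timestamp
    space_map: Dict[str, bool] = field(default_factory=dict)   # space -> free?
    actor_pool: Dict[str, bool] = field(default_factory=dict)  # actor -> free?
    events: List[str] = field(default_factory=list)            # Active Narratives
    ordered: bool = False      # Static Narratives run their events in sequence
```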

All Narratives are managed by a Narrative Manager, which decides at each timestamp which events from which Narrative will maximize the global utility function. The Narrative Manager then instructs the relevant Narrative to execute the chosen event. The Narrative, in turn, selects the most suitable room and actors to perform the event. Additionally, the Narrative Manager can prioritize events not directly tied to a specific Narrative but necessary to support Active Narratives, such as cleaning a room to make it available for future events.

Each Narrative has its own utility function, calculated based on space utilization, patient satisfaction, and the time taken to complete the Narrative. The Narrative Manager also maintains a global utility function, which aggregates these metrics across all Narratives. The system’s goal is to find a balance between the global utility function and the local utility functions of individual Narratives, optimizing the overall performance score. This tradeoff ensures that both individual and collective objectives are met, enhancing the efficiency and effectiveness of the hospital’s operations.

Centralized vs. Decentralized Approach

Centralized Approach

In the centralized approach, the Narrative Manager acts as the sole intelligent entity responsible for allocating resources to Narratives. Using Classical Reinforcement Learning (RL), specifically Deep Q-Learning, the Narrative Manager learns to optimize resource allocation over time. It evaluates the state of the system, including available resources, ongoing events, and the needs of each Narrative, and decides how to assign resources to maximize the global utility function. This utility function balances space utilization, patient satisfaction, and operational efficiency. The Narrative Manager has full control over resource allocation and can also lock resources for tasks not directly tied to any Narrative, such as cleaning rooms or preparing equipment. By learning through trial and error, the Narrative Manager develops a policy that ensures efficient and fair resource distribution, ultimately improving the overall performance of the hospital department.

Decentralized Approach

In the decentralized approach, each Narrative acts as an intelligent agent operating within a multi-agent reinforcement learning framework. At the start, each Narrative is allocated a fixed budget of "money" to participate in a two-step auction for resources. Narratives must strategically "bet" on resources during the auction, aiming to secure the resources they need to complete their events. Each Narrative learns to optimize its bidding policy to maximize its individual reward function, which reflects the satisfaction of the patient associated with that Narrative. At the same time, the system also considers the global reward function, which accounts for overall space utilization and operational efficiency. Through training, Narratives learn to balance their individual goals with the collective good, making intelligent bids that not only benefit their own progress but also contribute to the global optimization of the system. This decentralized approach introduces competition and collaboration among Narratives, fostering a dynamic and adaptive resource allocation process.

Centralized Approach Model

Model

In the centralized approach, the Narrative Manager serves as the central decision-making entity, utilizing Deep Q-Learning to allocate resources efficiently. Deep Q-Learning is a reinforcement learning algorithm that combines Q-Learning with deep neural networks to handle high-dimensional state and action spaces. The Narrative Manager operates in a Markov Decision Process (MDP) framework, where the state represents the current configuration of the system, including the availability of resources (e.g., rooms, medical staff), the status of ongoing events, and the progress of each Narrative. The actions involve assigning specific resources to Narratives or locking resources for non-Narrative tasks. The reward function is designed to balance multiple objectives: patient satisfaction (based on waiting times and procedure progression), space utilization (percentage of time spaces are used for their intended purpose), and operational efficiency (how closely activity durations align with benchmark times). The reward function is computed as a weighted sum of these metrics, with weights adjusted to reflect the relative importance of each objective.

To guide the learning process, heuristics are incorporated to reduce the exploration space and accelerate convergence. For example, priority rules are applied to ensure critical tasks (e.g., emergency procedures) are allocated resources first. Additionally, the Narrative Manager uses a replay buffer to store past experiences (state, action, reward, next state) and samples from this buffer to train the neural network. The network architecture typically consists of several fully connected layers with ReLU activation functions, followed by an output layer that estimates the Q-values for each possible action. The Q-value represents the expected cumulative reward of taking a specific action in a given state and following the optimal policy thereafter. The Narrative Manager updates its policy using the Bellman equation, minimizing the temporal difference error between predicted and target Q-values.
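
A minimal PyTorch sketch of the pieces just described: a fully connected Q-network with ReLU activations, a replay buffer, and the Bellman target used for the temporal-difference update. Layer sizes and the state/action encoding are assumptions:

```python
from collections import deque

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps an encoded system state to one Q-value per allocation action."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

replay_buffer = deque(maxlen=10_000)   # stores (s, a, r, s', done) tuples

def td_target(q_net: QNetwork, reward: torch.Tensor, next_state: torch.Tensor,
              done: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    """Bellman target: r + gamma * max_a' Q(s', a'), zeroed at episode end."""
    with torch.no_grad():
        next_q = q_net(next_state).max(dim=-1).values
    return reward + gamma * next_q * (1.0 - done)
```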

Training 

The training process involves simulating numerous episodes of hospital operations, each representing a full day of activities in the Cardiac Catheterization Lab (CCL). During each episode, the Narrative Manager observes the state, selects actions based on its current policy (using an epsilon-greedy strategy to balance exploration and exploitation), and receives rewards based on the outcomes. The epsilon-greedy strategy starts with a high exploration rate (epsilon) to encourage the exploration of diverse actions and gradually reduces it over time to favor exploitation of the learned policy. The neural network is trained iteratively using mini-batches sampled from the replay buffer, with the Adam optimizer used to update the network weights. Training continues until the policy converges, as indicated by stable Q-values and consistent performance across episodes.
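
Continuing the sketch above, epsilon-greedy action selection and one training step on a replay mini-batch might look like this; the epsilon schedule and other hyperparameters are illustrative, not the study's values:

```python
import random

import torch
import torch.nn as nn
import torch.optim as optim

EPS_START, EPS_END, EPS_DECAY = 1.0, 0.05, 0.995   # assumed schedule

def select_action(q_net: "QNetwork", state: torch.Tensor,
                  epsilon: float, action_dim: int) -> int:
    """Explore with probability epsilon, otherwise exploit the learned policy."""
    if random.random() < epsilon:
        return random.randrange(action_dim)
    with torch.no_grad():
        return int(q_net(state).argmax())

def train_step(q_net: "QNetwork", optimizer: optim.Adam, batch,
               gamma: float = 0.99) -> None:
    states, actions, rewards, next_states, dones = batch   # mini-batch tensors
    q_pred = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    target = td_target(q_net, rewards, next_states, dones, gamma)
    loss = nn.functional.mse_loss(q_pred, target)          # TD error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```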

Results

After training, the centralized approach demonstrates improved performance in resource allocation compared to a naive baseline, but it falls short of the optimal results achieved by the A* algorithm. The Narrative Manager successfully learns to prioritize critical tasks and allocate resources in a way that balances patient satisfaction, space utilization, and operational efficiency. For example, it reduces patient waiting times and improves the utilization of key spaces like the Cath Labs and holding rooms. However, the results are suboptimal in scenarios with high resource contention or complex interdependencies between Narratives. The Deep Q-Learning model struggles to fully capture the long-term consequences of certain actions, leading to occasional inefficiencies, such as delayed procedures or underutilized resources. Additionally, the heuristic rules, while helpful, sometimes constrain the model’s ability to discover more innovative allocation strategies. Despite these limitations, the centralized approach provides a solid foundation for further refinement, such as incorporating more advanced reward shaping techniques or hybridizing with other optimization methods to bridge the gap toward A*-level performance.

Decentralized Approach Model

Model

In the decentralized approach, each Narrative operates as an independent intelligent agent within a multi-agent reinforcement learning (MARL) framework. Unlike the centralized model, where a single Narrative Manager makes all decisions, here each Narrative has its own neural network to learn and optimize its bidding strategy in a two-step auction system. At every timestamp, Narratives participate in auctions to compete for resources such as rooms, medical staff, and equipment. Each Narrative is allocated a fixed budget of "money" at the start, which it uses to place bids on resources. The goal of each Narrative is to maximize its individual reward function, which reflects the satisfaction of the patient associated with it, while also contributing to the global reward function that considers overall space utilization and operational efficiency.

The auction process works as follows: in the first step, Narratives submit bids for resources based on their current needs and the expected utility of acquiring those resources. In the second step, the Narrative Manager evaluates the bids and allocates resources to the highest bidders, ensuring that no resource is over-allocated. The Narrative Manager can also intervene to lock resources for non-Narrative tasks, such as cleaning or maintenance, to maintain system-wide efficiency. Each Narrative’s neural network is trained to predict the optimal bid for a given resource, considering factors like the urgency of the task, the availability of alternative resources, and the current state of the system. The networks are typically composed of fully connected layers with ReLU activations, followed by an output layer that estimates the Q-values for different bidding strategies.
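
A minimal sketch of the second (allocation) step, awarding each resource to the highest bidder without over-allocating; tie-breaking and budget accounting are simplified assumptions:

```python
from typing import Dict, Iterable, Tuple

def allocate(bids: Dict[str, Dict[str, float]],
             resources: Iterable[str]) -> Dict[str, Tuple[str, float]]:
    """bids: narrative_id -> {resource: bid amount}.
    Returns resource -> (winning narrative, winning bid)."""
    allocation = {}
    for resource in resources:
        offers = [(offer[resource], nid)
                  for nid, offer in bids.items() if resource in offer]
        if offers:
            price, winner = max(offers)        # highest bid wins the resource
            allocation[resource] = (winner, price)
    return allocation

bids = {"N1": {"cath_lab_1": 30.0, "nurse_A": 10.0},
        "N2": {"cath_lab_1": 25.0}}
print(allocate(bids, ["cath_lab_1", "nurse_A"]))
# {'cath_lab_1': ('N1', 30.0), 'nurse_A': ('N1', 10.0)}
```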

Training 

Training in the decentralized approach is more complex due to the multi-agent nature of the system. Each Narrative’s neural network is trained simultaneously, and the learning process must account for the interactions and competition between agents. The training involves simulating numerous episodes of hospital operations, with each episode representing a full day of activities in the Cardiac Catheterization Lab (CCL). During each episode, Narratives observe the state of the system, place bids based on their current policies, and receive rewards based on the outcomes of their actions. The reward function for each Narrative includes components for patient satisfaction (e.g., reduced waiting times), resource utilization (e.g., efficient use of rooms and staff), and task completion time (e.g., adherence to benchmark durations).

To encourage cooperation and prevent destructive competition, the training process incorporates shared global rewards that incentivize Narratives to consider the overall system performance. Each Narrative updates its neural network using a policy gradient method, such as Proximal Policy Optimization (PPO), which is well-suited for multi-agent environments. The networks are trained iteratively, with each agent learning to improve its bidding strategy based on the rewards received and the actions of other agents. Over time, the Narratives develop sophisticated policies that balance individual and collective goals, leading to more efficient resource allocation.
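
One simple way to realize the shared-reward idea is to blend each Narrative's individual reward with the global score before the policy update; the mixing weight below is an assumption:

```python
def shaped_reward(individual: float, global_score: float,
                  alpha: float = 0.5) -> float:
    """Blend local and global objectives so agents are rewarded for cooperation."""
    return alpha * individual + (1.0 - alpha) * global_score
```

A higher alpha favors self-interested bidding, while a lower alpha pushes Narratives toward the collective optimum; tuning this tradeoff is part of preventing the destructive competition mentioned above.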

Results

The decentralized approach, referred to as Scenario D, achieves a general performance score of 66.9, which is better than the centralized model’s score (referred to as Scenario C) of 62.4 but lower than the optimal scores of 70.8 achieved by the A* algorithm (Scenarios A and B). The decentralized model demonstrates significant improvements in resource allocation and patient satisfaction compared to the centralized approach, as the Narratives learn to make intelligent bids that reflect both their individual needs and the global objectives. For example, the model reduces contention for critical resources like Cath Labs and holding rooms, leading to smoother workflows and shorter patient waiting times.

However, the decentralized approach still falls short of the A* algorithm’s performance, particularly in highly complex scenarios with intense resource competition. The multi-agent nature of the system introduces challenges such as coordination overhead and suboptimal equilibria, where Narratives may settle into bidding strategies that are locally optimal but globally inefficient. Despite these limitations, the decentralized model significantly narrows the gap between the centralized approach and the A* algorithm, demonstrating its potential as a scalable and adaptive solution for resource allocation in dynamic environments like hospitals.

Conclusion, Limitations and Future Work

Conclusion

In this study, we explored both centralized and decentralized approaches to optimize resource allocation in a hospital’s Cardiac Catheterization Lab (CCL). The centralized approach, leveraging Deep Q-Learning, demonstrated improved performance over naive baselines but fell short of the optimal results achieved by the A* algorithm. The decentralized approach, utilizing a multi-agent reinforcement learning framework with an auction-based system, showed significant promise by achieving a higher general performance score (66.9) compared to the centralized model (62.4) and narrowing the gap toward the A* algorithm’s optimal score (70.8). These results highlight the potential of decentralized, multi-agent systems to handle complex, dynamic environments like hospitals, where coordination and adaptability are critical. Both approaches provide valuable insights into balancing individual and global objectives, paving the way for more efficient and patient-centric healthcare operations.

Limitations

Despite their strengths, both approaches have notable limitations. The centralized model struggles with scalability and long-term planning, as the single decision-making entity cannot fully capture the intricate interdependencies between Narratives and resources. The decentralized model, while more adaptive, faces challenges related to coordination and competition among agents, leading to suboptimal equilibria in highly contested scenarios. Additionally, both models rely heavily on accurate simulations for training, which may not fully capture the unpredictability of real-world hospital environments. The reliance on predefined reward functions and heuristics also limits the models’ ability to discover novel strategies that could further enhance performance. Finally, the computational complexity of training multiple agents in the decentralized approach can be resource-intensive, posing practical challenges for real-time implementation.

Future Work

Future research should focus on addressing the limitations of both approaches while exploring hybrid models that combine their strengths. For the centralized approach, integrating advanced techniques like hierarchical reinforcement learning or meta-learning could improve long-term planning and scalability. For the decentralized approach, developing mechanisms to enhance cooperation among agents, such as communication protocols or shared memory systems, could mitigate coordination challenges and reduce suboptimal equilibria. Additionally, incorporating real-world data and feedback loops into the training process could make the models more robust and adaptable to dynamic environments. Exploring alternative reward structures, such as curriculum learning or multi-objective optimization, could also help balance individual and global objectives more effectively. Finally, deploying these models in real-world hospital settings for pilot testing would provide valuable insights into their practical applicability and potential for further refinement.
