Recent advances in healthcare, particularly in oncology, have produced innovative technologies and therapies that promise better patient outcomes. Alongside these improvements, however, care delivery has become more complex and specialized, which has raised costs per patient. This growing financial burden challenges health systems worldwide and makes it imperative to explore value-based healthcare (VBHC) approaches. VBHC aims to prioritize patient outcomes while minimizing resource use and costs across the entire treatment pathway. A central obstacle to realizing VBHC is estimating costs accurately at the patient level, especially in complex care settings such as colorectal cancer (CRC). To address this, a method called “cost mining” has emerged. It combines process mining (PM) with cost analysis to uncover high-cost pathways and specific cost drivers from real-world, patient-level data.
Initial Data Collection
To begin cost mining, gather comprehensive activity and cost information for each patient’s entire treatment history, from initial screening and diagnosis through treatment and follow-up. Every activity must carry a date or timestamp so that the patient journey can be tracked in detail. Because costs are estimated at the activity level, the method can also include patients who are still undergoing treatment. For group comparisons or total cost estimates, however, treatment start dates are essential: they make it possible to filter out incomplete cases, which would otherwise bias total pathway cost estimates downward.
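One way to apply the start-date rule is to keep only patients whose treatment start falls inside the period the data cover. The sketch below is illustrative only: it assumes a single, hypothetical observation-window start date (`WINDOW_START`) and made-up patient IDs, and the inclusion rule in a given study may differ.

```python
from datetime import date
from typing import Optional

# Hypothetical linked records: anonymized patient ID -> treatment start date (None if missing).
treatment_start = {
    "P1": date(2016, 3, 1),   # starts inside the observation window
    "P2": date(2012, 7, 15),  # starts before the data begin, so early costs are unobserved
    "P3": None,               # no recorded treatment start date
}

# Assumed first date covered by all linked data sources (illustrative value only).
WINDOW_START = date(2014, 1, 1)


def has_full_history(start: Optional[date], window_start: date = WINDOW_START) -> bool:
    """Keep a case only if its treatment start is known and falls inside the
    observation window, so none of its earlier costs are missing from the data."""
    return start is not None and start >= window_start


eligible = sorted(pid for pid, start in treatment_start.items() if has_full_history(start))
print(eligible)  # ['P1']
```

P2 is excluded because its pre-2014 activities, and their costs, would be absent from the log, which is exactly the downward bias the start date guards against.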
Costs can be estimated in several ways, including activity-based microcosting or reimbursement data such as Diagnosis-Related Groups (DRGs). In Australia, where the method has been tested, reimbursements are granular enough to capture the details and interdependencies across integrated pathways. For instance, the cost of the chemotherapy stage comprises several activity-based reimbursements, so the cost statistics can reflect differences between patients: a patient receiving chemotherapy at a later CRC stage may need more consultations, treatments, or regimens than a patient receiving chemotherapy at an earlier stage. Meeting these data requirements is the critical first stage of integrating cost mining into healthcare pathways.
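Because each reimbursement is attached to an individual activity, a stage-level cost per patient is simply the sum of that patient's activity costs within the stage. The records and amounts below are hypothetical and are not actual Australian reimbursement rates; the tuple layout is an assumption made for the sketch.

```python
from collections import defaultdict

# Hypothetical activity-level reimbursements: (patient ID, pathway stage, activity, cost in AUD).
# The amounts are illustrative, not actual reimbursement rates.
records = [
    ("P1", "chemotherapy", "consultation", 120.0),
    ("P1", "chemotherapy", "regimen cycle", 950.0),
    ("P1", "chemotherapy", "regimen cycle", 950.0),
    ("P2", "chemotherapy", "consultation", 120.0),
    ("P2", "chemotherapy", "regimen cycle", 950.0),
    ("P2", "surgery", "admission", 18000.0),
]


def stage_cost_per_patient(records, stage):
    """Sum the activity-level reimbursements that fall within one pathway stage, per patient."""
    totals = defaultdict(float)
    for patient, record_stage, _activity, cost in records:
        if record_stage == stage:
            totals[patient] += cost
    return dict(totals)


print(stage_cost_per_patient(records, "chemotherapy"))  # {'P1': 2020.0, 'P2': 1070.0}
```

P1 received one more regimen cycle, so its chemotherapy-stage cost is higher; a single flat amount per stage would hide that patient-level difference.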
Data Preparation
After data collection, the next step is to integrate the data into a longitudinal database that maps entire patient pathways and the cost of each activity. Every data source identified in the first stage must share a unique identifier, such as an anonymized patient ID, so that records can be linked accurately. Integration also excludes incomplete cases, which makes the resulting database more robust. In the CRC case study, integrating the different sources produced a dataset of 4,246 patient records covering approximately 4 million activities.
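A minimal sketch of the linkage step, assuming every source is keyed by the same anonymized patient ID and that a case counts as complete only if it appears in all sources. That completeness rule is an assumption for illustration; a patient with no GP records might be legitimately complete, so the rule should follow the study design. All IDs and values are hypothetical.

```python
# Hypothetical extracts from three sources, each keyed by an anonymized patient ID.
registry = {"A01": {"stage": "B"}, "A02": {"stage": "C"}, "A03": {"stage": "A"}}
hospital = {"A01": [("admission", 18000.0)], "A02": [("admission", 22000.0)]}
primary_care = {
    "A01": [("GP visit", 40.0)],
    "A02": [("GP visit", 40.0)],
    "A04": [("GP visit", 40.0)],
}


def linkable_ids(*sources):
    """Patient IDs present in every source (an inner join on the identifier)."""
    common = set(sources[0])
    for source in sources[1:]:
        common &= set(source)
    return sorted(common)


ids = linkable_ids(registry, hospital, primary_care)

# One longitudinal record per linked patient: registry attributes plus all activities.
linked = {pid: {**registry[pid], "activities": hospital[pid] + primary_care[pid]} for pid in ids}

# Patients lost during linkage; these are the cases to examine for linkage bias.
dropped = sorted((set(registry) | set(hospital) | set(primary_care)) - set(ids))

print(ids)      # ['A01', 'A02']
print(dropped)  # ['A03', 'A04']
```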
Before the analysis, assess whether data linkage introduced bias through data loss by comparing patient characteristics in the original data sources with those in the final integrated dataset. This check helps verify the integrity of the linked data and confirms that the subsequent analysis rests on a representative sample. Careful data preparation lays the groundwork for an accurate and meaningful cost mining analysis and for an in-depth exploration of patient pathways and cost structures.
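A simple first check compares each characteristic's distribution before and after linkage. The sketch assumes a categorical characteristic (here a hypothetical rurality variable) and reports percentage-point shifts in each category's share. What size of shift counts as meaningful is a judgment call; a formal test such as chi-square could be added on top.

```python
from collections import Counter

# Hypothetical values of one characteristic (rurality) before and after linkage.
original = ["urban"] * 60 + ["rural"] * 40
linked = ["urban"] * 45 + ["rural"] * 25


def proportions(values):
    """Share of records in each category."""
    counts = Counter(values)
    return {category: n / len(values) for category, n in counts.items()}


def shift_in_points(original, linked):
    """Percentage-point change in each category's share after linkage."""
    before, after = proportions(original), proportions(linked)
    categories = sorted(set(before) | set(after))
    return {c: round(100 * (after.get(c, 0.0) - before.get(c, 0.0)), 1) for c in categories}


print(shift_in_points(original, linked))  # {'rural': -4.3, 'urban': 4.3}
```

In this toy example, rural patients were lost disproportionately during linkage, which would be worth investigating before comparing costs by rurality.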
Constructing the Event Log
With the data prepared, the next phase is constructing an event log or activity log, which is the input to the cost mining analysis. An activity log typically holds one row per activity with its start and end times, so any additional attributes can only be attached to the activity as a whole. An event log is more flexible: it records two or more rows per activity, treating the start and the end as separate events. This structure can model cases in which different resources execute different parts of a single activity. For instance, a patient might start a medication-based treatment at a specialist care facility but complete it weeks later in a hospital because of acute complications.
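The difference between the two structures can be sketched in a few lines: each activity is split into a start event and a complete event, and each event carries its own resource. The `Event` fields and the `start`/`complete` labels are illustrative assumptions; they echo the lifecycle convention of the XES event-log standard but are not a specific library's schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Event:
    """One row of an event log: a single lifecycle moment of an activity."""
    case_id: str
    activity: str
    lifecycle: str   # "start" or "complete"
    timestamp: date
    resource: str    # facility that executed this part of the activity


def to_event_log(rows):
    """Split (case, activity, start, end, start_resource, end_resource) rows into
    a start event and a complete event per activity, ordered by case and time."""
    events = []
    for case_id, activity, start, end, start_res, end_res in rows:
        events.append(Event(case_id, activity, "start", start, start_res))
        events.append(Event(case_id, activity, "complete", end, end_res))
    return sorted(events, key=lambda e: (e.case_id, e.timestamp))


rows = [
    ("P1", "medication-based treatment", date(2017, 2, 1), date(2017, 3, 20),
     "specialist care facility", "hospital"),
]
for event in to_event_log(rows):
    print(event.lifecycle, event.timestamp, event.resource)
# start 2017-02-01 specialist care facility
# complete 2017-03-20 hospital
```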
For cost mining, an event log is preferable to an activity log because healthcare activities vary widely in duration: some span weeks or months, while others take minutes. The biggest challenge for process mining in healthcare, however, is the inconsistency of the required data. Linking and combining data sources to cover integrated pathways can be daunting, particularly in settings such as CRC, where treatments are lengthy and dispersed. Possible workarounds include heuristics that estimate unknown process end times, or assuming that the start date of one activity marks the end date of the previous one.
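The second workaround (treating the start of one activity as the end of the previous one) can be sketched as follows. It is shown only to illustrate the idea and assumes activities in a trace are strictly sequential, which overlapping treatments would violate. The handling of the final activity is a hypothetical choice.

```python
from datetime import date


def impute_end_times(trace):
    """trace: (activity, start_date) pairs for one patient with unknown end dates.
    Heuristic: each activity ends when the next one starts. The last activity has
    no successor, so its end is left equal to its start (an explicit assumption)."""
    ordered = sorted(trace, key=lambda item: item[1])
    completed = []
    for i, (activity, start) in enumerate(ordered):
        end = ordered[i + 1][1] if i + 1 < len(ordered) else start
        completed.append((activity, start, end))
    return completed


trace = [
    ("surgery", date(2018, 5, 10)),
    ("diagnosis", date(2018, 4, 2)),
    ("chemotherapy", date(2018, 6, 20)),
]
for activity, start, end in impute_end_times(trace):
    print(activity, start, end)
# diagnosis 2018-04-02 2018-05-10
# surgery 2018-05-10 2018-06-20
# chemotherapy 2018-06-20 2018-06-20
```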
In our CRC case, entire integrated care pathways were constructed from primary care through to survivorship outcomes without making assumptions or imputations. This comprehensive approach ensures that the event log accurately reflects all the activities and associated costs within the patient pathway, providing a solid foundation for the subsequent cost mining analysis.
Cost Mining Analysis
The cost mining analysis begins by running process mining on the entire event log constructed in the previous stage, typically with an inductive miner algorithm. This algorithm suits healthcare processes because it produces inspectable process maps with a high degree of simplification. With the provided code, the resulting process map displays cost statistics (mean, minimum, maximum, and total costs) for each activity as ‘decorations,’ that is, labels on the map. The visual output therefore summarizes the costs per activity and the total cost of care per trace, where each trace is one patient’s trajectory.
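The decorations themselves are simple aggregations over the event log. The sketch below does not implement the inductive miner. As a simpler stand-in for the map's structure it counts directly-follows relations between activities, and it computes the per-activity cost statistics and per-trace totals that would be attached as labels. The log layout (case ID mapped to an ordered list of activity and cost pairs) and all values are hypothetical simplifications.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical event log: case ID -> ordered list of (activity, cost in AUD).
log = {
    "P1": [("GP visit", 40.0), ("colonoscopy", 800.0), ("admission", 15000.0)],
    "P2": [("GP visit", 40.0), ("admission", 21000.0), ("chemotherapy", 3000.0)],
    "P3": [("GP visit", 45.0), ("colonoscopy", 750.0)],
}


def activity_decorations(log):
    """Cost statistics per activity node: mean, min, max and total over all occurrences."""
    costs = defaultdict(list)
    for trace in log.values():
        for activity, cost in trace:
            costs[activity].append(cost)
    return {a: {"mean": mean(c), "min": min(c), "max": max(c), "total": sum(c)}
            for a, c in costs.items()}


def trace_totals(log):
    """Total cost of care per trace (one patient trajectory)."""
    return {case: sum(cost for _, cost in trace) for case, trace in log.items()}


def directly_follows(log):
    """How often one activity directly follows another: the edges of a simple process map."""
    edges = defaultdict(int)
    for trace in log.values():
        for (a, _), (b, _) in zip(trace, trace[1:]):
            edges[(a, b)] += 1
    return dict(edges)


print(activity_decorations(log)["colonoscopy"])
# {'mean': 775.0, 'min': 750.0, 'max': 800.0, 'total': 1550.0}
print(trace_totals(log))  # {'P1': 15840.0, 'P2': 24040.0, 'P3': 795.0}
```

In a full pipeline, a process-discovery library (PM4Py, for example, implements the inductive miner) would produce the map, and statistics like these would be joined onto its nodes by activity name.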
To avoid underestimating total pathway costs, it may be useful to restrict the sample to completed cases. This can be done by keeping only cases with an observed life event, such as survivorship, death, or no treatment within two years. The cost mining algorithm then aggregates the cost data across the traces derived from process mining, aligning all traces to compute summary cost statistics for each activity. Each resulting process map gives actionable insight into the cost structure of patient pathways, showing where and how costs accrue during treatment.
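The completeness filter can be expressed as a predicate on each trace. This sketch assumes hypothetical activity labels `survivorship` and `death`, a fixed data extraction date, and approximates "no treatment within two years" as no recorded activity in the 730 days before that date. Each of these is an assumption to adapt to the actual data.

```python
from datetime import date, timedelta

END_OF_DATA = date(2020, 12, 31)       # assumed data extraction date
QUIET_PERIOD = timedelta(days=730)     # "no treatment within two years", approximated
TERMINAL = {"survivorship", "death"}   # hypothetical labels for observed life events


def is_complete(trace, end_of_data=END_OF_DATA):
    """trace: list of (activity, date). Complete if it records a terminal life event,
    or if no activity occurred in the two years before the end of the data."""
    if any(activity in TERMINAL for activity, _ in trace):
        return True
    last_activity = max(d for _, d in trace)
    return end_of_data - last_activity >= QUIET_PERIOD


cases = {
    "P1": [("surgery", date(2017, 1, 5)), ("death", date(2019, 8, 1))],
    "P2": [("surgery", date(2020, 3, 1)), ("chemotherapy", date(2020, 6, 1))],
    "P3": [("surgery", date(2016, 2, 1)), ("follow-up", date(2018, 1, 10))],
}
completed = sorted(case for case, trace in cases.items() if is_complete(trace))
print(completed)  # ['P1', 'P3']
```

P2 is still under treatment at the end of the data, so including it would understate total pathway costs.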
By decorating the process map with cost details, the cost mining technique provides a visual and statistical representation of the complete treatment pathway, highlighting specific activities and stages with significant financial implications. This detailed analysis enables healthcare providers and policymakers to pinpoint high-cost areas and identify potential opportunities for optimizing care pathways to enhance cost-efficiency and patient outcomes.
Drilling Down to Explore Variation
The process model generated through cost mining offers a robust framework for further exploration, notably by drilling down into the data to understand variation in pathways and cost drivers. This involves investigating critical decision points, pinpointing costly processes, and making case-mix comparisons across patient groups defined by factors such as sex, age group, tumor location, tumor stage, CRC type, rurality, and Indigenous status.
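A case-mix comparison amounts to grouping per-patient totals by an attribute. The sketch uses hypothetical patients, costs, and attribute names. The group means are descriptive only; a claim that one group costs significantly more than another would need a separate statistical test or adjusted model.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-patient total costs (AUD) with case-mix attributes.
patients = [
    {"id": "P1", "cancer_type": "colon", "rurality": "urban", "total_cost": 30000.0},
    {"id": "P2", "cancer_type": "colon", "rurality": "rural", "total_cost": 42000.0},
    {"id": "P3", "cancer_type": "rectal", "rurality": "urban", "total_cost": 18000.0},
    {"id": "P4", "cancer_type": "rectal", "rurality": "rural", "total_cost": 22000.0},
]


def compare_groups(patients, attribute):
    """Number of patients and mean total cost for each value of one attribute."""
    groups = defaultdict(list)
    for patient in patients:
        groups[patient[attribute]].append(patient["total_cost"])
    return {g: {"n": len(c), "mean_cost": mean(c)} for g, c in sorted(groups.items())}


print(compare_groups(patients, "cancer_type"))
# {'colon': {'n': 2, 'mean_cost': 36000.0}, 'rectal': {'n': 2, 'mean_cost': 20000.0}}
```

The same function can be called with `"rurality"` or any other attribute, so each case-mix comparison reuses one grouping step.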
In the illustrative CRC case, this examination showed that colon cancer was associated with significantly higher costs than rectal cancer across the entire care continuum. Admissions and chemotherapy were the most expensive elements of treatment. Admitted episodes, covering 1,965 patients, accounted for 93.34% of total costs ($56.6 million AUD of the overall $60.63 million AUD). Chemotherapy drug treatments for 218 patients accounted for 6.62% of the total, whereas general practitioner visits, diagnostic testing, and prescriptions constituted less than 0.01% of overall costs.
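Cost shares of this kind are each category's total divided by the overall total. The category totals below are hypothetical round numbers, not the case-study figures.

```python
def cost_shares(totals):
    """Percentage of the overall cost attributable to each category, to two decimals."""
    overall = sum(totals.values())
    return {category: round(100 * value / overall, 2) for category, value in totals.items()}


# Hypothetical category totals in AUD (not the case-study figures).
totals = {
    "admitted episodes": 900_000.0,
    "chemotherapy drugs": 95_000.0,
    "GP, tests, prescriptions": 5_000.0,
}
print(cost_shares(totals))
# {'admitted episodes': 90.0, 'chemotherapy drugs': 9.5, 'GP, tests, prescriptions': 0.5}
```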
Drilling down also revealed how clinical factors such as cancer stage relate to healthcare costs. Average costs of care varied substantially across cancer stages, from $10,379 AUD to $41,643 AUD per patient. Specific chemotherapy regimens, such as mFOLFOX6, were particularly costly, and their costs varied considerably by cancer stage. Stage C patients, for instance, incurred much higher costs for the mFOLFOX6 regimen than patients at other stages, which calls for further qualitative and quantitative research to understand these variations.
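Drilling into one regimen by stage means filtering the log to that activity, summing its cost per patient, and averaging within each stage. The activity label `regimen cycle`, the stage letters, and the costs are hypothetical. Patients who never received the regimen do not contribute to the averages.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical activity-level rows: (patient ID, cancer stage, activity, cost in AUD).
rows = [
    ("P1", "B", "regimen cycle", 1000.0),
    ("P1", "B", "regimen cycle", 1000.0),
    ("P2", "C", "regimen cycle", 1000.0),
    ("P2", "C", "regimen cycle", 1000.0),
    ("P2", "C", "regimen cycle", 1000.0),
    ("P2", "C", "regimen cycle", 1000.0),
    ("P3", "C", "regimen cycle", 1200.0),
    ("P3", "C", "admission", 9000.0),
]


def activity_cost_by_stage(rows, activity):
    """Mean per-patient cost of one activity, broken down by cancer stage."""
    per_patient = defaultdict(float)  # (stage, patient) -> summed cost of the activity
    for patient, stage, act, cost in rows:
        if act == activity:
            per_patient[(stage, patient)] += cost
    by_stage = defaultdict(list)
    for (stage, _patient), cost in per_patient.items():
        by_stage[stage].append(cost)
    return {stage: mean(costs) for stage, costs in sorted(by_stage.items())}


print(activity_cost_by_stage(rows, "regimen cycle"))  # {'B': 2000.0, 'C': 2600.0}
```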
This exploratory approach also lets stakeholders account for the temporal nature of care, recognizing that the timing of treatment, such as chemotherapy during late-stage cancer, affects its cost. Understanding the cost implications of specific treatment regimens at different cancer stages could therefore open promising avenues for future research and potential protocol adjustments.