Data
We conducted a retrospective study of all narrative notes in an SPMI outpatient clinic of a tertiary hospital in Montreal. The study covered all of the clinic’s chronically ill patients, without exclusion, from 2016 to 2020. The most frequent diagnoses in the clinic are schizophrenia, schizoaffective disorder, and bipolar disorder. The patients visited the clinic every 2–4 weeks to see their psychiatrist and/or receive an LAI from the psychiatric nurse. The mental health care services of each hospital and its outpatient clinics in Montreal are assigned to a geographical catchment area. Patients who present at a facility outside the catchment area containing their residence are referred to the facilities in their district. There are virtually no private or alternative mental health services available to this population. These conditions ensure the integrity of our data: we have complete and accurate healthcare utilization and clinical records for the patient cohort in this study. No consequential changes in the legal tenets, organizational functioning, bed capacity, or clinic staffing occurred during the study period.
We used demographic and care utilization data for each patient as well as unstructured free-text data contained in the clinic progress notes. The patient-level data include the dates of all outpatient clinic visits, the dates of administration of LAIs, the presence or absence of CTOs granted by the provincial Superior Court (including the dates when they were awarded), the dates and durations of all hospitalizations, and the dates of all outpatient and ER visits. During the five-year study period, 15,415 narrative notes were recorded for a total of 367 patients. Of these, 77 patients were under a CTO. Table 1 presents the characteristics of the CTO and non-CTO patients with SPMI. There were 396 hospitalizations during the study period.
A single clinical note is taken by the psychiatrist (or psychiatric nurse or psychiatry resident) during each outpatient clinic visit. These notes were de-identified for the purposes of this research. The outpatient narrative notes are organized according to the commonly used “SOAP” system, where each note is divided into Subjective, Objective, Assessment and Plan fields [24]. The Subjective field records the patient’s own words explaining their current condition. The Objective field contains the clinician’s evaluation of one or more aspects of the patient’s mental status, including appearance, behaviour, speech, affect, thought content, and insight/judgment. The Assessment field describes the patient’s overall status. The Plan field describes medications (including dosage) and therapeutic and psychosocial interventions.
In this study, we aim to extract the “health status” from the Assessment and Objective fields of the clinician notes. The Assessment field contains a concise evaluation of the patient’s health condition. Although the care provider is expected to assess the patient as either “stable” or “unstable”, this is the case for only 56% of the notes. In the remaining cases, the evaluation is expressed by a phrase or a short sentence, which is not always grammatical or clearly understandable; for example, “usual complaintive paranoid”. In our data set, 10.95% of the Assessment fields are empty.
The Objective field is more complex and contains significantly more words than the Assessment field. For instance: “adequately groomed. Pleasant, calm, not hypomanic, no psychosis expressed.” The clinician notes several aspects of the patient’s mental health condition at the time of the outpatient visit in the Objective field, including appearance, behaviour, speech, thought content, and insight/judgment. The average number of words in the Objective field is 13.52 (±9.73); the lexical diversity, measured by the Type-Token Ratio (TTR), is 0.01. The low TTR means that the Objective fields mainly consist of a small set of medical terms used to evaluate the patient. To give an intuitive sense of this low vocabulary diversity, the top 20 words account for 32% of the whole corpus. In our data, 9.56% of the Objective fields are empty.
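The two diversity measures mentioned above can be illustrated with a minimal sketch. The tokenizer, the function names, and the second example note are assumptions made for illustration; the source does not specify how tokens were split or counted.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and keep alphabetic runs (an assumed, simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def type_token_ratio(tokens):
    """Number of distinct tokens divided by the total number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def top_k_coverage(tokens, k=20):
    """Share of all tokens accounted for by the k most frequent words."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(count for _, count in counts.most_common(k)) / len(tokens)


notes = [
    # Example quoted in the text.
    "adequately groomed. Pleasant, calm, not hypomanic, no psychosis expressed.",
    # Hypothetical second note, invented for illustration only.
    "calm, pleasant, no psychosis expressed",
]
corpus_tokens = [tok for note in notes for tok in tokenize(note)]
ttr = type_token_ratio(corpus_tokens)
coverage = top_k_coverage(corpus_tokens, k=3)
```

Because the TTR is computed over the pooled corpus, it shrinks as the corpus grows while the vocabulary stays small, which is consistent with a very low value for a set of notes built from a limited pool of clinical terms.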
We observed stylistic differences in note taking between the two psychiatrists in the clinic. In the Objective field, for example, psychiatrist B has a richer vocabulary (6292 unique words) than psychiatrist A (2984 unique words). Psychiatrist A, however, uses longer sentences, with 8.06 words on average, whereas the average sentence length of psychiatrist B is 4.97 words.
Routine text cleaning and pre-processing procedures were applied, including removal of stop words and special characters, lemmatization, and spelling correction. For negations, we tag all words that follow either common negation words or medical shorthand negation words. The data processing and analysis flowchart is provided in Fig. 1, which also highlights the pertinent fields of the narrative notes and the text mining methods applied to them.
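The negation-tagging step can be sketched as follows. The cue lists, the `NOT_` prefix, and the rule that a negation's scope ends at the next comma, semicolon, or period are all assumptions; the source states only that words following a negation cue are tagged.

```python
import re

# Assumed cue lists; the source does not enumerate them.
COMMON_NEGATIONS = {"no", "not", "without", "never"}
SHORTHAND_NEGATIONS = {"denies", "neg", "nil"}
NEGATION_CUES = COMMON_NEGATIONS | SHORTHAND_NEGATIONS


def tag_negations(text, prefix="NOT_"):
    """Prefix every word that follows a negation cue within the same clause."""
    tagged = []
    # Assumed scope rule: a negation ends at the next comma, semicolon, or period.
    for clause in re.split(r"[.,;]", text.lower()):
        negated = False
        for word in clause.split():
            if word in NEGATION_CUES:
                negated = True
                continue  # the cue itself is dropped; its effect lives in the prefix
            tagged.append(prefix + word if negated else word)
    return tagged


example = tag_negations("Pleasant, calm, not hypomanic, no psychosis expressed.")
```

Tagging keeps “psychosis” and “NOT_psychosis” as distinct tokens, so a later keyword extractor does not mistake a denied symptom for an observed one.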
Mining the Narrative Notes
Appendix B in the online supplement provides five examples of the content of the Objective field. The corresponding values of the confounding variables are also shown for clarity. To automate the processing of the 15,415 narrative notes, we used a Named Entity Recognition (NER) model to identify and categorize keywords within the Objective field. These keywords were classified into one of nine categories: appearance, behaviour, danger, impulse control, insight, language, mood, thought content, and thought process. The NER model is a deep convolutional neural network that learns identification rules from annotated training data, enabling it to recognize and categorize entities accurately.
Our approach involved enhancing an open-source pre-trained model known as “ScispaCy” [25], which is designed for identifying medical terminology. By fine-tuning this model, we extended its capabilities to extract keywords specifically related to psychiatric symptoms observable in patients with SPMI. This fine-tuning process is both data-efficient and computationally inexpensive, allowing for practical and effective keyword extraction in psychiatric contexts. With fewer than one thousand manual annotations, we achieved a performance of 92.4% precision and 92.5% recall.
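To make the reported metrics concrete, the sketch below computes entity-level precision and recall under an exact-match criterion. The criterion is an assumption, since the source does not state how predictions were matched to annotations. The entity labels attached to the example note are also hypothetical.

```python
def precision_recall(gold, predicted):
    """Entity-level precision and recall under exact (start, end, label) matching."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


note = "adequately groomed. Pleasant, calm, not hypomanic, no psychosis expressed."

# Hypothetical gold annotations as (start, end, label) character spans.
gold_entities = {
    (0, 18, "appearance"),        # "adequately groomed"
    (20, 28, "mood"),             # "Pleasant"
    (30, 34, "behaviour"),        # "calm"
    (54, 63, "thought content"),  # "psychosis"
}
# Hypothetical model output: one wrong label, one missed entity.
predicted_entities = {
    (0, 18, "appearance"),
    (20, 28, "mood"),
    (30, 34, "mood"),
}
precision, recall = precision_recall(gold_entities, predicted_entities)
```

Here two of the three predictions are correct and two of the four gold entities are recovered, so precision is 2/3 and recall is 1/2.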
The deep convolutional neural network architecture of the NER model enables it to capture complex patterns and relationships within the text, making it highly effective for identifying relevant entities. By leveraging the power of deep learning, the model can handle large volumes of narrative notes, providing accurate and reliable keyword identification that enhances the analysis and understanding of psychiatric symptoms in clinical practice. This automated approach significantly improves the efficiency of processing and analyzing narrative notes.
Estimation of the hospitalization risk
We used a mixed-effect logistic regression model to estimate the likelihood that the patient will be readmitted before the next clinic visit. In our fixed time-period model, the “time-period” was not universal among all the subjects. It was patient-specific, defined according to the patient’s normal frequency of scheduled clinic visits. For example, for patients whose appointments were scheduled three weeks apart, the duration for the fixed time-period was designated as three weeks; and for those who were scheduled to visit every other week, it was designated as two weeks. We used the following representation:
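A minimal sketch of the patient-specific time periods is given below. It assumes that periods are anchored at the patient's first clinic visit and have a constant length equal to the scheduled interval. The anchoring rule and the function name `period_index` are illustrative assumptions, not details given in the source.

```python
from datetime import date


def period_index(event_date, first_visit, interval_weeks):
    """Zero-based index of the patient-specific period that contains event_date."""
    if event_date < first_visit:
        raise ValueError("event precedes the patient's first visit")
    return (event_date - first_visit).days // (7 * interval_weeks)


first_visit = date(2016, 1, 4)
event = date(2016, 2, 1)  # 28 days after the first visit

# The same calendar date falls in different periods for different schedules.
period_three_week_patient = period_index(event, first_visit, interval_weeks=3)
period_two_week_patient = period_index(event, first_visit, interval_weeks=2)
```

The outcome for period t is then whether a hospital admission falls in period t + 1 on that patient's own grid.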
$$\begin{array}{l}
\mathrm{Logit}\left(P\left(Y_{i,t+1}=1\right)\right)=\alpha_{i}+\beta_{1}\,\mathit{Before\ CTO}_{it}+\beta_{2}\,\mathit{During\ CTO}_{it}\\
\quad+\beta_{3}\,\mathit{Injection}_{it}+\beta_{4}\,\mathit{Clinic\ Visit}_{it}+\beta_{5}\,\mathit{Prescription\ Change}_{it}\\
\quad+\beta_{6}\,\mathit{Prescription\ Change}_{it}\times\mathit{Before\ CTO}_{it}\\
\quad+\beta_{7}\,\mathit{Prescription\ Change}_{it}\times\mathit{During\ CTO}_{it}\\
\quad+\beta_{8}\,\mathit{Treatment\ Dropout}_{it}+\beta_{9}\,\mathit{Hospital\ Discharge}_{it}\\
\quad+\sum\limits_{k=10}^{K}\beta_{k}\,\mathit{Confounding\ Variable}_{itk}+\epsilon_{it}
\end{array}$$
The dependent variable Yi,t+1 denotes the hospitalization of patient i in the time period immediately following the clinic visit in time period t. It is a binary indicator, where the value 1 indicates hospital admission and the value 0 indicates no hospital admission during period t + 1. The independent variables are the medical interventions (i.e., CTO, LAIs, and prescription changes) as well as the patient’s adherence to scheduled clinic visits. We used the following two indicator variables pertaining to CTOs. Before CTO was assigned the value 1 when the patient was not under a CTO during time t but would later be placed under an active CTO; it took the value 0 otherwise. During CTO was assigned the value 1 when the patient was under an active CTO during time t; it took the value 0 otherwise. Under this variable design, if both Before CTO and During CTO took the value 0, then the patient was a non-CTO patient who was never under any CTO intervention. By construction, the two variables cannot take the value 1 simultaneously.
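The coding of the two CTO indicators can be sketched as below. The sketch assumes a single CTO per patient that stays active from its award date onward. The source does not describe how expired or repeated orders are coded, and the function name is hypothetical.

```python
from datetime import date
from typing import Optional, Tuple


def cto_indicators(period_start: date, cto_start: Optional[date]) -> Tuple[int, int]:
    """Return (before_cto, during_cto) for the period beginning at period_start.

    cto_start is None for patients who were never placed under a CTO.
    Assumption: once awarded, the CTO is treated as active for all later periods.
    """
    if cto_start is None:
        return 0, 0  # non-CTO patient: both indicators are 0
    if period_start < cto_start:
        return 1, 0  # not yet under a CTO, but will be later
    return 0, 1      # under an active CTO


never_cto = cto_indicators(date(2017, 1, 1), None)
before_cto = cto_indicators(date(2017, 1, 1), date(2018, 6, 1))
during_cto = cto_indicators(date(2019, 1, 1), date(2018, 6, 1))
```

With this coding, the coefficient on Before CTO compares future CTO patients with non-CTO patients, and the coefficient on During CTO compares patients under an active order with the same reference group.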
To represent LAIs, the binary variable Injection was assigned the value 1 if the patient was receiving injections in the time-period of interest, and 0 otherwise. The binary variable Clinic Visit was assigned the value 1 if the patient followed the appointment schedule and visited the outpatient clinic during period t, and 0 otherwise. The binary variable Prescription Change was assigned the value 1 if the prescription was changed in the current time-period, and 0 otherwise; this information was collected from the Plan field. To illustrate, Fig. 2 shows the timeline of medical events for one patient. Vertical dotted lines indicate ‘unstable’ notes; green vertical blocks indicate hospitalizations; horizontal blocks indicate the patient’s prescription during a given time-period; the name of each medication is shown on the y axis; and the height of each horizontal block represents the dosage of the medication. This patient was on a combination of psychiatric medications, had several ‘unstable’ moments and several prescription changes, and had one hospitalization. As Fig. 2 shows, the prescription was changed in the middle of 2017 at the first ‘unstable’ assessment, but it was not changed at the second ‘unstable’ assessment. Nevertheless, there was no hospitalization until 2020, and the patient remained largely stable during that year. At the end of 2020, however, the patient became ‘unstable’: the prescription was changed, and the patient was hospitalized not long afterwards.
In our model, Treatment Dropout is a binary indicator variable, where the value 1 indicates that the patient missed more than three consecutive appointments and the value 0 indicates otherwise. For SPMI patients, we considered treatment dropout a strong indicator of elevated hospitalization risk. Missed appointments were identified from the clinic appointment schedule, with some leniency. For example, if a patient was on a two-week schedule, a missed appointment was recorded once the patient had not shown up for three weeks; if a patient was on a four-week schedule, we used a six-week period to declare a missed appointment. Lastly, Hospital Discharge was a binary variable, with the value 1 indicating that the patient was discharged from the hospital in the current period and 0 otherwise. We did not consider the time spent in hospital, as this study focuses only on the risk of re-hospitalization during outpatient care.
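The leniency rule and the dropout flag can be sketched as follows. The two examples in the text (three weeks for a two-week schedule, six weeks for a four-week schedule) both correspond to a grace window of 1.5 times the scheduled interval. Generalizing that factor to other schedules, and applying the same half-interval grace to each later appointment, are assumptions of this sketch.

```python
from datetime import date, timedelta

GRACE_FACTOR = 1.5  # assumed: generalizes the 2 -> 3 week and 4 -> 6 week examples


def missed_appointments(last_visit, as_of, interval_weeks):
    """Count consecutive scheduled appointments missed between last_visit and as_of.

    The k-th appointment after last_visit is due at k * interval and is counted
    as missed once the grace window after its due date has also elapsed.
    """
    interval_days = 7 * interval_weeks
    grace_days = (GRACE_FACTOR - 1) * interval_days
    elapsed_days = (as_of - last_visit).days
    missed = 0
    while elapsed_days >= (missed + 1) * interval_days + grace_days:
        missed += 1
    return missed


def treatment_dropout(last_visit, as_of, interval_weeks):
    """1 if the patient has missed more than three consecutive appointments, else 0."""
    return int(missed_appointments(last_visit, as_of, interval_weeks) > 3)


last_visit = date(2018, 1, 1)
# Two-week schedule: the first miss is declared at three weeks (21 days).
missed_at_20_days = missed_appointments(last_visit, last_visit + timedelta(days=20), 2)
missed_at_21_days = missed_appointments(last_visit, last_visit + timedelta(days=21), 2)
dropout_at_63_days = treatment_dropout(last_visit, last_visit + timedelta(days=63), 2)
```

Under these assumptions, a patient on a two-week schedule is flagged as a dropout 63 days after the last attended visit, when the fourth consecutive appointment is counted as missed.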
The “health status” of a patient during period t can be correlated with both the medical interventions in t (i.e., CTO, LAI, prescription change) and the health outcome in t + 1 (i.e., re-hospitalization). Therefore, the patient’s health status is represented using a set of confounding variables (i.e., the second-to-last term in the model above) extracted from the outpatient narrative notes through text mining. \({\beta }_{10}\) and \({\beta }_{11}\) are the coefficients for the binary variables Stable and Unstable, which indicate the patient’s condition. Stable takes the value 1 when the word “stable” appears in the Assessment field, and Unstable takes the value 1 when the word “unstable” appears. Thus, both variables being zero indicates either that the field contains other words or that it is empty. The variables extracted from the Objective field include the patient’s appearance, behaviour, danger, impulse control, insight, speech, affect, thought content, and insight/judgment. Please see Appendix A in the Online Supplement for the formal definitions of these confounders.
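A minimal sketch of the Stable and Unstable indicators is given below. It uses whole-word, case-insensitive matching so that “unstable” does not also trigger Stable. The matching details and the function name are assumptions; the formal definitions are in Appendix A of the source.

```python
import re

# Whole-word patterns: "\bstable\b" does not match inside "unstable".
STABLE_RE = re.compile(r"\bstable\b", re.IGNORECASE)
UNSTABLE_RE = re.compile(r"\bunstable\b", re.IGNORECASE)


def assessment_flags(assessment):
    """Return (stable, unstable) indicators for one Assessment field.

    Empty or missing fields, and fields with other wording, yield (0, 0).
    """
    text = assessment or ""
    stable = int(bool(STABLE_RE.search(text)))
    unstable = int(bool(UNSTABLE_RE.search(text)))
    return stable, unstable


flags_stable = assessment_flags("Stable")
flags_unstable = assessment_flags("unstable, paranoid")
flags_other = assessment_flags("usual complaintive paranoid")
flags_empty = assessment_flags(None)
```

A plain substring test would set both indicators for every “unstable” note, which is why the sketch matches on word boundaries. It does not handle negated phrasing such as “not stable”, which would need the negation tags from pre-processing.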
While the confounding variables may not be independent of each other, this does not affect the analysis of the effects of interventions or treatment dropout. Although the confounding variables are not the primary focus of this study, their relationships are worth noting. The table in Appendix C (see online supplement) illustrates these associations. Since patients are assessed as stable during the majority of clinic visits, the positive aspects of the confounding variables often co-occur in the Objective field. Interestingly, there is also a moderate association between the positive and the adverse variables, reflecting the complexity of patients’ mental state. Among the negative aspects of the confounding variables, adverse mood and adverse thought content emerge as the most strongly associated factors. We have deliberately refrained from further interpreting the relationships between confounding variables, as the primary aim of incorporating these text-mining-based variables is to evaluate whether they improve the estimation of the effects of medical interventions and treatment dropout on hospitalization risk for SPMI patients.