Demographic characteristics of study subjects
We recruited a total of 98 participants (49 SCZ patients and 49 HCs). The demographic characteristics are presented in Table 1. The two groups were matched for sex, and there were no significant differences in age or BMI. None of the SCZ patients had taken anti-psychotic drugs for more than 30 days; among the 48 patients who underwent gut virome and bacteriome sequencing, 21 were drug-naïve and 16 had taken anti-psychotic drugs for less than 5 days (see Supplementary Fig. 2).
Diversity differences of the gut microbiota in SCZ and HC
For alpha diversity, we calculated the Shannon, Simpson, Chao1 and abundance-based coverage estimator (ACE) indices of the gut bacteria at different taxonomic levels (Fig. 1A–C and Supplementary Table 1). No significant difference in gut bacterial alpha diversity was observed between SCZ patients and HCs. For beta diversity, PERMANOVA showed a significant separation between SCZ patients and HCs (R2 = 0.015, p = 0.049 at the family level; R2 = 0.017, p = 0.017 at the genus level; R2 = 0.017, p = 0.011 at the species level). The results are shown in Fig. 1D–F.
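As an illustration of three of the alpha diversity indices named above, the following minimal sketch computes them from a vector of per-taxon counts for one sample. It assumes the natural-log Shannon index, the Gini-Simpson form (1 minus the sum of squared proportions) and the bias-corrected Chao1 estimator; the source does not state which variants were used, and the counts are hypothetical.

```python
import math


def shannon(counts):
    """Shannon index with natural logarithm (assumed variant)."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in props)


def simpson(counts):
    """Gini-Simpson index, 1 - sum(p_i^2) (assumed variant)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness estimate (assumed variant)."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


# Hypothetical read counts for one sample, one entry per taxon.
sample_counts = [120, 45, 30, 2, 1, 1, 0]
indices = {
    "shannon": shannon(sample_counts),
    "simpson": simpson(sample_counts),
    "chao1": chao1(sample_counts),
}
```

The per-sample values would then be compared between groups with whichever two-sample test the study design calls for.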
Gut microbiota data pre-processing
After data pre-processing, 50, 130, and 647 bacterial entities remained at the family, genus and species levels, respectively. For viruses, 34, 199, and 139 entities remained at the family, genus and species levels, respectively.
Differential gut bacteria
MaAsLin2 and ANCOM-BC were used to identify bacterial signatures associated with SCZ. Combining the bacteria that differed significantly between the SCZ and HC groups according to MaAsLin2 and ANCOM-BC, we identified 7, 14, and 45 differential bacteria at the family, genus and species levels, respectively (Fig. 2). The identified bacteria, the methods that detected them, and the corresponding statistical values are described in Supplementary Table 2.
The diagnostic value of identified differential bacteria
At the family level, the XGBoost model achieved an area under the receiver operating characteristic curve (AUC) of 0.709 (95% CI [0.706–0.712]) in distinguishing SCZ cases from HCs on the test set. Turicibacteraceae and Moraxellaceae emerged as the most important contributors to predictive performance. Modeling at the genus level yielded an AUC of 0.656 (95% CI [0.653–0.660]), and modeling at the species level yielded an AUC of 0.815 (95% CI [0.811–0.818]). The model details and the differential bacterial features ranked by importance are shown in Supplementary Fig. 3.
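For readers less familiar with the AUC, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, with ties counted as one half. The labels and scores are hypothetical and are not outputs of the XGBoost models reported here.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (case, control) pairs ranked correctly."""
    case_scores = [s for y, s in zip(labels, scores) if y == 1]
    control_scores = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5  # ties contribute half a win
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical test-set labels (1 = SCZ, 0 = HC) and predicted probabilities.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from controls.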
Transkingdom partial correlation pattern differed in SCZ and HCs
We calculated the overall gut viral-bacterial transkingdom correlations at the family level (34 viruses and 50 bacteria). There were 153 significant correlations between gut viruses and bacteria in the SCZ group and 132 in the HC group. The number of significant correlations did not differ between the two groups (χ2 = 1.689, p = 0.194). Among the significant correlations, 88 were positive in the SCZ group and 56 were positive in the HC group. The ratio of positive to negative correlations differed significantly between SCZ patients and HCs (χ2 = 6.457, p = 0.011). The increased number of correlations in SCZ patients was primarily driven by Mimiviridae, Forsetiviridae, Duneviridae, and Assiduviridae, and the decreased number of correlations in SCZ patients was primarily driven by Rountreeviridae and Intestiviridae (see Fig. 3).
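The two group comparisons above can be sketched as Pearson chi-square tests on 2 × 2 tables. This sketch assumes no continuity correction and assumes that the first test is taken over all 34 × 50 virus-bacterium pairs; neither detail is stated in the text, but under these assumptions the reported statistics are recovered. The function name is illustrative.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Assumed denominator: every virus-bacterium pair at the family level.
n_pairs = 34 * 50

# Significant vs non-significant correlations, SCZ vs HC.
chi2_count, p_count = chi_square_2x2(153, n_pairs - 153, 132, n_pairs - 132)
# chi2_count is about 1.689 and p_count about 0.194

# Positive vs negative correlations among the significant ones, SCZ vs HC.
chi2_sign, p_sign = chi_square_2x2(88, 153 - 88, 56, 132 - 56)
# chi2_sign is about 6.457 and p_sign about 0.011
```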
Identified gut viruses and peripheral metabolites co-occurring with differential bacteria
We used MCIA to select the gut viruses and metabolites most strongly associated with the differential bacteria at the family level. At the family level, the RV coefficients suggested weak to medium relevance between the gut virome, gut bacteriome and metabolome, with values ranging from 0.13 to 0.26 (depicted in Fig. 4A). The first three principal components (PCs) together explained 55.57% of the variance (PC1: 27.39%; PC2: 21.14%; PC3: 7.04%). The MCIA results at the genus and species levels are depicted in Supplementary Figs. 4 and 5. Because the first three PCs captured over 50% of the variance at the family level, they can effectively represent and summarize the key information across the three omics datasets. This suggests that the omics datasets we analyzed share strong commonalities, with the main variations or trends concentrated in these dimensions. Therefore, we extracted the variables corresponding to the first three PCs to investigate their interactions across the different omics layers.
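The RV coefficient mentioned above measures how similar two data tables are when both are measured on the same samples. A minimal sketch of its usual definition, RV(X, Y) = tr(S_X S_Y) / sqrt(tr(S_X S_X) tr(S_Y S_Y)) with S = XXᵀ computed on column-centered data, is given below. The helper names and the toy matrices are hypothetical, and any additional scaling that an MCIA implementation applies to each table is not reproduced.

```python
import math


def center_columns(rows):
    """Subtract each column mean (rows are samples, columns are features)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def sample_gram(rows):
    """n x n matrix of inner products between samples, i.e. X X^T."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]


def trace_of_product(a, b):
    """tr(A B) for two square matrices of equal size."""
    n = len(a)
    return sum(a[i][j] * b[j][i] for i in range(n) for j in range(n))


def rv_coefficient(x_rows, y_rows):
    """RV coefficient between two tables sharing the same samples (rows)."""
    sx = sample_gram(center_columns(x_rows))
    sy = sample_gram(center_columns(y_rows))
    denom = math.sqrt(trace_of_product(sx, sx) * trace_of_product(sy, sy))
    return trace_of_product(sx, sy) / denom


# Hypothetical data: 4 samples, 2 viral features and 3 metabolite features.
virome = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0]]
metabolome = [[2.0, 0.0, 1.0], [1.0, 1.0, 0.0], [5.0, 2.0, 2.0], [3.0, 4.0, 1.0]]
rv_toy = rv_coefficient(virome, metabolome)
```

The coefficient lies between 0 and 1, equals 1 when a table is compared with itself, and is unchanged when either table is multiplied by a constant.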
Figure 4B shows the MCIA projection plot of the first two PCs for the gut bacteriome, gut virome, and metabolome datasets. SCZ patients and HCs separated along the horizontal and vertical axes (PC1 and PC2), which explain the largest share of the variance in the MCIA. For metabolites, the 50 metabolites with the greatest weights on each of the positive and negative sides of PC1 to PC3 (about the top 10%) were extracted. After removing duplicates, 214 of the 1086 metabolites were selected at the family level. The full list of selected metabolites is provided in Supplementary Table 3. For the gut virome, the 3 viruses with the greatest weights on each of the positive and negative sides of PC1 to PC3 (about the top 20%) were extracted. After removing duplicates, 13 of the 34 gut viruses were selected at the family level. We calculated transkingdom partial correlations (taking BMI, age and sex as confounders) between the 7 differential bacteria and their 13 co-occurring gut viruses. There were 7 significant correlations in the SCZ group and 27 in the HC group. The number of significant correlations differed significantly between the SCZ and HC groups (χ2 = 13.057, p = 0.003), which was primarily driven by the viruses Rountreeviridae, Pachyviridae, Schitoviridae, and Suoliviridae (depicted in Fig. 4C).
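The feature selection step described above can be sketched as follows. The sketch assumes that, for each PC, the k features with the largest weights and the k features with the smallest (most negative) weights are taken, and that the union across PCs removes duplicates. The function name, the feature names and the loadings are hypothetical.

```python
def select_top_features(loadings, k):
    """Union of the k most positive and k most negative features on each PC.

    loadings maps a feature name to its list of weights, one per PC.
    """
    n_pcs = len(next(iter(loadings.values())))
    selected = set()
    for pc in range(n_pcs):
        ranked = sorted(loadings, key=lambda name: loadings[name][pc])
        selected.update(ranked[:k])   # k smallest (most negative) weights
        selected.update(ranked[-k:])  # k largest (most positive) weights
    return selected


# Hypothetical weights of four features on two PCs.
toy_loadings = {
    "feature_A": [0.9, 0.1],
    "feature_B": [-0.8, 0.2],
    "feature_C": [0.1, -0.7],
    "feature_D": [0.0, 0.05],
}
toy_selected = select_top_features(toy_loadings, k=1)
```

With k = 3 and three PCs, at most 18 viruses can be picked before duplicates are removed, which is consistent with 13 viruses remaining; likewise k = 50 gives at most 300 metabolites, of which 214 remained.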
We also performed MCIA and calculated transkingdom partial correlations between the differential bacteria and their co-occurring gut viruses at the genus and species levels. The first three PCs together explained 50% of the variance at the genus level (PC1: 22.71%; PC2: 16.71%; PC3: 10.58%) and 38.65% at the species level (PC1: 18.31%; PC2: 12.75%; PC3: 7.59%). At the genus level, there were 33 significant correlations in the SCZ group and 51 in the HC group, and the difference was statistically significant (χ² = 3.85, p = 0.04965). However, no such difference was observed at the species level, which may be related to the lower explanatory power of the first three PCs (depicted in Supplementary Figs. 4 and 5).
Enrichment analysis of selected metabolites
Figure 4D shows the biological functions of the metabolites associated with the family-level differential bacteria, as characterized by enrichment analysis. After FDR correction, the purines and purine derivatives, fatty acids and conjugates, bile acids and derivatives, eicosanoids, and benzoic acids and derivatives pathways were significantly enriched. The details, including the names of the hit metabolites, enrichment ratios and p-values, are described in Supplementary Table 4. We also conducted enrichment analyses at the genus and species levels. Regardless of whether the analysis was performed at the family, genus, or species level, the metabolites selected by MCIA as related to the differential bacteria were enriched in the same five metabolic pathways. These results are provided in the supplementary results (Supplementary Figs. 4D and 5D; Supplementary Tables 4 to 6).
Effect of gut microbiota and metabolites on SCZ
Bidirectional mediation analyses tested the effects of gut viruses and bacteria on SCZ (Model 1 and Model 2) and whether these effects were mediated or suppressed by metabolites (Model 3 and Model 4). The total effect of gut viruses on SCZ (Model 1) was not significant; this effect was not significantly mediated by gut bacteria but was suppressed by metabolites (Model 3: βindirect = 0.249, LLCI = 0.116, ULCI = 1.156, p = 0.017). The total effect of gut bacteria on SCZ (Model 2) was significant; this effect was not significantly mediated by gut viruses but was mediated by metabolites (Model 4: βindirect = 0.159, LLCI = 0.198, ULCI = 3.195, p = 0.026). See Fig. 5. The model fit indices indicated that Models 1 to 4 fit well; they are described in Supplementary Table 7.
In Model 5, after removing observed variables that did not load significantly on the latent variables, the latent viral variable (X) was established with 7 gut viruses at the family level (Schitoviridae, Pachyviridae, Pithoviridae, Assiduviridae, Suoliviridae, Rountreeviridae, and Demerecviridae), the latent bacterial variable (M1) was established with 6 differential gut bacteria at the family level (Coprobacillaceae, Enterococcaceae, Erysipelotrichaceae, Turicibacteraceae, Peptostreptococcaceae, and Rikenellaceae), and the latent metabolic variable (M2) comprised 10 metabolites (Glycocholic acid, Prostaglandin H2, Hippuric acid, Syringic acid, Hypoxanthine, Caffeine, Paraxanthine, Theobromine, 1-Methylxanthine and 1,7-Dimethyluric acid). The coefficients of each observed variable on its latent variable and the coefficients of each regression path are described in Supplementary Table 8. The indirect effect in Model 2 (βindirect = 0.031, LLCI = −1.799, ULCI = 2.56, p = 0.732) showed that the effect of gut bacteria on SCZ was not mediated by gut viruses, whereas Model 5 (a1 = −0.448, LLCI = −0.203, ULCI = −0.003, p = 0.044) showed that the effect of gut viruses on bacteria was significant. Therefore, in the SMM analysis, we restricted latent X to viruses rather than bacteria.
The results of Model 5 showed that the total effect of viruses on SCZ was not significant. However, after controlling for all indirect effects (Model 5: βindirect_all = 0.299, LLCI = 0.062, ULCI = 1.345, p = 0.032), the direct effect of the selected viruses on SCZ was significant (Model 5: βdir = −0.545, LLCI = −2.224, ULCI = −0.339, p = 0.008), suggesting that the relationship between gut viruses and SCZ was significantly suppressed by gut bacteria and metabolites. The suppression was primarily driven by the indirect path gut viruses → metabolites → SCZ (Model 5: βSMM2 = 0.468, LLCI = 0.341, ULCI = 1.86, p = 0.005) rather than by the path through gut bacteria (Model 5: βSMM1 = 0.05, LLCI = −0.193, ULCI = 0.426, p = 0.46). The indirect path involving both mediators (SMM3) indicated that a reduction in the selected viruses was linked to SCZ through increased levels of the two mediators (Model 5: a1 = −0.448; d12 = 0.58; b2 = 0.843; βSMM3 = −0.219, LLCI = −0.98, ULCI = −0.049, p = 0.03). Interestingly, Model 5 (b1 = −0.111, LLCI = −4.275, ULCI = 2.002, p = 0.478) suggested that, after controlling for the effect of viruses, the effect of gut bacteria on SCZ was not significant (see Fig. 5E). The fit indices suggested that the SMM fit well (GOF = 0.832, CFI = 0.858 and RMSEA = 0.08).
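The path coefficients reported for Model 5 can be related to one another with a short calculation. The sketch assumes the standard product-of-coefficients decomposition for a serial two-mediator model, in which the serial indirect effect is a1 × d12 × b2 and the total indirect effect is the sum of the three specific indirect effects. The variable names are illustrative, and the implied total effect is a back-of-envelope value under this assumption, not a reported estimate.

```python
# Path coefficients reported for Model 5.
a1 = -0.448   # viruses -> bacteria
d12 = 0.58    # bacteria -> metabolites
b2 = 0.843    # metabolites -> SCZ

# Serial indirect effect: viruses -> bacteria -> metabolites -> SCZ.
serial_indirect = a1 * d12 * b2  # about -0.219, as reported

# Specific indirect effects through a single mediator (reported values).
via_bacteria = 0.05
via_metabolites = 0.468

# Total indirect effect as the sum of the three specific paths.
total_indirect = via_bacteria + via_metabolites + serial_indirect  # about 0.299

# Direct effect (reported) and the total effect it implies under this assumption.
direct = -0.545
implied_total = direct + total_indirect

# Suppression: direct and indirect effects have opposite signs, so they
# partly cancel in the total effect.
is_suppression = direct * total_indirect < 0
```

Because the direct and total indirect effects point in opposite directions, the implied total effect is smaller in magnitude than the direct effect, which fits the pattern of a non-significant total effect alongside a significant direct effect.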