Authors: IJHSB Editorial Board
Publish Date: 30.10.2025
Abstract
Single-cell RNA sequencing (scRNA-seq) has enabled high-resolution atlases of cellular states across tissues, developmental stages, and species. Yet most single-cell datasets remain cross-sectional snapshots, raising a central question: can we infer how cells move through state space and how they will respond to perturbations such as gene knockouts or drug treatments? Transformer-based foundation models for single-cell biology address this challenge by pretraining on millions of cells to learn context-aware representations of genes and cell states. In this article, the term “Transcriptformer” is used as a conceptual label for this class of models. We describe how Transcriptformer embeddings can be combined with RNA velocity and fate-mapping frameworks to obtain trajectory-aware predictions of cell behavior.
Introduction
Single-cell RNA sequencing quantifies gene expression in thousands to millions of individual cells, making it possible to construct high-resolution atlases of cellular states across tissues and species [1,2]. Using analysis frameworks such as SCANPY and Seurat, these data can be visualized, clustered, and integrated into coherent maps of cell states [1,2]. Large community efforts, including the Human Cell Atlas, extend this idea by assembling reference atlases that cover many organs, developmental stages, and individuals [3].
Despite the richness of these descriptive maps, many biological questions are predictive. Beyond asking which states exist, we would like to know how cells move between states and how they will react to specific interventions. Early deep generative approaches, such as single-cell variational inference (scVI) and related probabilistic frameworks, showed that latent-variable models can capture structure in noisy, high-dimensional single-cell data and provide a basis for integration and downstream analysis [4,5]. More recently, transformer-based foundation models such as Geneformer and scGPT have been pretrained on tens of millions of single-cell profiles to learn context-aware representations of genes and cell states, which can then be transferred to diverse downstream tasks [6,7].
In this article, the term “Transcriptformer” is used as an umbrella label for this emerging class of transformer-style foundation models in single-cell biology rather than for any specific software release. When combined with directionality priors derived from RNA velocity and CellRank, Transcriptformers can be made trajectory-aware, yielding embeddings that encode not only where cells are in state space but also where they are likely to go over short and long timescales [8,9]. RNA velocity provides short-term directional information from spliced and unspliced transcript counts, and CellRank converts local velocity fields into probabilistic maps of long-term fates [8,9].
This article provides a concise, didactic overview aimed at advanced students and early-stage practitioners. Figure 1 summarizes the conceptual workflow: assembling large single-cell corpora, pretraining a Transcriptformer, adding directionality via RNA velocity and CellRank, and closing the loop from model-based hypotheses to experimental validation. The goal is not to introduce a new algorithm, but to clarify how existing tools can be combined into a coherent, hypothesis-driven pipeline.

Figure 1. The Transcriptformer workflow: from single-cell data to trajectory-aware predictions. (A) Single-cell data corpus. Large public datasets (for example, the Human Cell Atlas) bring together cells from many tissues, species, and labs. After basic cleaning and standardization, these data form the training pool. (B) Pretraining the Transcriptformer. The model treats a cell like a “document” and each gene like a “word.” By reading millions of cells, it learns embeddings (compact numeric summaries) for genes and cells. These embeddings can be reused for many tasks. (C) Trajectory inference. Cells are placed on a 2-D map (UMAP). Curved arrows show RNA velocity, the short-term direction in which a cell is likely to move in gene-expression space. Donut charts illustrate CellRank results: the probabilities that a cell will end in Fate A, B, or C. (D) Prediction–validation loop. A testable hypothesis is formulated (e.g., a gene knockout or drug will lower the probability of Fate A), evaluated statistically, and then tested experimentally to confirm or revise the model.
Notes: UMAP = Uniform Manifold Approximation and Projection. Colors separate concepts only; the diagram is not to scale.
Developments
Background and Concept
Transformer-based foundation models for single-cell data build on an analogy with language modeling. In natural language, a model reads large text corpora and learns to predict words given their context. In single-cell biology, a Transcriptformer instead reads large corpora of single-cell gene expression profiles and learns to predict gene tokens or expression patterns conditioned on a cellular context. The resulting embeddings summarize complex gene–gene and cell–cell relationships in a low-dimensional space [6,7].
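Self-attention is the mechanism that makes these embeddings context-aware: each gene token's output is a weighted mixture of the other tokens in the same cell, so the same gene receives a different representation in a different cellular context. The sketch below is a minimal, standard-library illustration of single-head scaled dot-product self-attention. It assumes identity query, key, and value projections (real models learn these matrices and stack many heads and layers), and the gene names and two-dimensional embeddings are hypothetical toy values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Single-head scaled dot-product self-attention with identity projections.

    embeddings: list of equal-length vectors, one per gene token in a cell.
    Returns (contextual_embeddings, attention_weights).
    """
    d = len(embeddings[0])
    outputs, weights = [], []
    for query in embeddings:
        # Similarity of this token to every token (including itself), scaled by sqrt(d).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in embeddings]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of the value vectors (here, the embeddings).
        outputs.append([sum(w_j * v[i] for w_j, v in zip(w, embeddings)) for i in range(d)])
    return outputs, weights


# Hypothetical toy tokens for one cell: two genes with similar embeddings, one dissimilar.
genes = ["GENE_A", "GENE_B", "GENE_C"]
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
context, attention = self_attention(emb)

# The same GENE_A embedding gets a different contextual output when GENE_C is absent.
context_without_c, _ = self_attention(emb[:2])
```

Each row of attention weights sums to one, and GENE_A attends more strongly to the similar GENE_B than to GENE_C; removing GENE_C from the cell changes GENE_A's contextual embedding.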
Geneformer is a prominent example of this strategy. It is pretrained on roughly 30 million single-cell transcriptomes and learns context-aware embeddings that encode network relationships between genes and cells, which can then be transferred to new tasks such as network inference and perturbation prediction in smaller datasets [6]. Similarly, scGPT is a generative pretrained transformer trained on over 30 million single-cell profiles, providing a general-purpose foundation model that can be adapted to tasks including cell-type annotation, multi-batch integration, multi-omics integration, perturbation-response prediction, and gene network inference [7].
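Geneformer presents each cell to the model as a rank-value encoding: genes are ordered by their expression in that cell after normalizing for sequencing depth and for each gene's typical expression across the pretraining corpus, so that broadly expressed housekeeping genes are deprioritized relative to genes that are unusually high in that cell. The following is a simplified sketch of that idea under stated assumptions: normalization by total counts, then division by each gene's nonzero median across a toy corpus, with a hypothetical 2,048-token truncation. The released Geneformer tokenizer may differ in details such as gene filtering and token limits.

```python
from statistics import median


def gene_medians(corpus):
    """Nonzero median of each gene's depth-normalized expression across a corpus.

    corpus: list of dicts mapping gene name -> raw count, one dict per cell.
    """
    values = {}
    for cell in corpus:
        total = sum(cell.values())
        for gene, count in cell.items():
            if count > 0:
                values.setdefault(gene, []).append(count / total)
    return {gene: median(v) for gene, v in values.items()}


def rank_value_encode(cell, medians, max_len=2048):
    """Order expressed genes by depth-normalized expression divided by the corpus median.

    max_len is a hypothetical truncation length for the token sequence.
    """
    total = sum(cell.values())
    scores = {
        gene: (count / total) / medians[gene]
        for gene, count in cell.items()
        if count > 0 and gene in medians
    }
    # Highest relative expression first; ties broken by gene name for determinism.
    ranked = sorted(scores, key=lambda g: (-scores[g], g))
    return ranked[:max_len]


# Hypothetical two-cell corpus with genes A, B, C.
corpus = [{"A": 10, "B": 10, "C": 0}, {"A": 30, "B": 5, "C": 5}]
medians = gene_medians(corpus)
tokens = [rank_value_encode(cell, medians) for cell in corpus]
```

In the first toy cell, genes A and B have equal counts, yet B is ranked first because it is usually expressed at a lower level across the corpus; that relative enrichment is what the encoding is designed to surface.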
Latent-variable models such as scVI and their implementation in the scvi-tools ecosystem provide complementary probabilistic frameworks for representing uncertainty and integrating heterogeneous datasets [4,5]. Conceptually, Transcriptformers extend these ideas by leveraging deep attention-based architectures and large-scale pretraining. In both cases, pretraining yields general-purpose embeddings, and fine-tuning aligns those embeddings with specific downstream objectives such as separating closely related subtypes, highlighting lineage relationships, or predicting perturbation responses [4–7].
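At the core of scVI-style models is a count likelihood: a neural decoder maps each cell's latent variable (and library size) to the parameters of a negative binomial or zero-inflated negative binomial (ZINB) distribution for every gene. The sketch below evaluates only that likelihood, with the mean mu, inverse dispersion theta (variance mu + mu^2 / theta), and zero-inflation probability pi supplied directly as numbers instead of produced by a trained decoder. It illustrates the distribution, not the scvi-tools implementation.

```python
import math


def nb_log_prob(x, mu, theta):
    """Log-probability of count x under a negative binomial with mean mu and inverse dispersion theta."""
    return (
        math.lgamma(x + theta)
        - math.lgamma(theta)
        - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_log_prob(x, mu, theta, pi):
    """Zero-inflated NB: with probability pi an extra zero is produced."""
    if x == 0:
        return math.log(pi + (1.0 - pi) * math.exp(nb_log_prob(0, mu, theta)))
    return math.log(1.0 - pi) + nb_log_prob(x, mu, theta)


# Hypothetical parameters for one gene in one cell (normally output by the decoder).
mu, theta, pi = 2.0, 5.0, 0.1
probs = [math.exp(nb_log_prob(x, mu, theta)) for x in range(200)]
```

Training such a model amounts to maximizing this log-likelihood (plus a regularization term on the latent variables) over all observed counts.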
Adding Directionality: RNA Velocity and CellRank
Pretrained embeddings are inherently undirected: they characterize similarities among cells but do not indicate how cells evolve over time. RNA velocity addresses this gap by comparing spliced and unspliced transcript abundance to estimate the near-future change in gene expression for each cell, effectively attaching a short vector to each point in the state space [8]. The scVelo framework generalizes classical velocity approaches by fitting a dynamical model of transcription, splicing, and degradation, improving the reconstruction of transient states and enhancing robustness to technical noise [8].
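To make the idea of a velocity vector concrete, the sketch below implements the classical steady-state model that scVelo generalizes, not scVelo's dynamical model. At steady state, unspliced abundance u is proportional to spliced abundance s with slope gamma (the degradation rate relative to a splicing rate fixed at 1), so gamma is fitted by least squares through the origin on cells at the extremes of spliced expression, and each cell's velocity is u - gamma * s. Selecting extremes by spliced abundance alone and the 5% quantile are simplifying assumptions; real implementations differ in how they choose these cells.

```python
def fit_gamma(unspliced, spliced, quantile=0.05):
    """Fit u = gamma * s by least squares through the origin on extreme cells.

    Cells in the lowest and highest `quantile` of spliced abundance are assumed
    to be near steady state (a simplification of how real tools pick them).
    """
    n = len(spliced)
    k = max(1, int(round(quantile * n)))
    order = sorted(range(n), key=lambda i: spliced[i])
    extremes = order[:k] + order[-k:]
    numerator = sum(unspliced[i] * spliced[i] for i in extremes)
    denominator = sum(spliced[i] ** 2 for i in extremes)
    return numerator / denominator


def steady_state_velocity(unspliced, spliced, quantile=0.05):
    """Velocity ds/dt = u - gamma * s (splicing rate fixed to 1)."""
    gamma = fit_gamma(unspliced, spliced, quantile)
    return gamma, [u - gamma * s for u, s in zip(unspliced, spliced)]


# Hypothetical gene: 20 cells on the steady-state line u = 0.5 * s,
# except one cell being induced and one being repressed.
spliced = [float(i) for i in range(20)]
unspliced = [0.5 * s for s in spliced]
unspliced[10] += 2.0  # induction: more unspliced RNA than steady state predicts
unspliced[12] -= 2.0  # repression: less unspliced RNA than steady state predicts
gamma, velocity = steady_state_velocity(unspliced, spliced)
```

The induced cell receives a positive velocity (its spliced abundance is expected to rise) and the repressed cell a negative one, while cells on the steady-state line have zero velocity.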
CellRank builds on these local velocity fields to construct a Markov chain over cells, identifying initial, intermediate, and terminal states and assigning each cell probabilities of reaching specific terminal fates [9]. This yields a probabilistic map of developmental or disease-related trajectories, complete with fate probabilities and gene-expression trends along paths. When Transcriptformer embeddings are combined with RNA velocity and CellRank, the resulting representation is both globally informed, through pretraining on large corpora, and locally directional, through velocity-informed fate mapping [6–9]. In practice, this combination provides a flexible substrate for asking counterfactual questions such as how a perturbation might redistribute cells among future fates rather than merely shifting their static labels.
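The fate probabilities CellRank reports are absorption probabilities of a Markov chain: for each cell, the probability that a random walk following the transition matrix ends in each terminal state. The sketch below computes them by simple fixed-point iteration on a hand-written row-stochastic matrix, treating single cells as terminal states. In CellRank itself the transition matrix is derived from velocity and the cell-cell neighborhood graph, terminal states are groups of cells, and the linear system is solved with dedicated linear solvers; none of that is reproduced here.

```python
def absorption_probabilities(P, terminal, tol=1e-12, max_iter=100_000):
    """Probability that a random walk on P from each cell is absorbed in each terminal state.

    P: row-stochastic transition matrix (list of lists); rows of terminal cells are
       ignored because terminal cells are treated as absorbing.
    terminal: indices of terminal cells, one per fate.
    Returns one row per cell with one probability per terminal state.
    """
    n, m = len(P), len(terminal)
    position = {t: j for j, t in enumerate(terminal)}
    # Terminal cells start (and stay) fully committed to their own fate.
    probs = [[1.0 if position.get(i) == j else 0.0 for j in range(m)] for i in range(n)]
    for _ in range(max_iter):
        change = 0.0
        for i in range(n):
            if i in position:
                continue
            # Fixed-point update: a cell's fate probabilities are the P-weighted average of its successors'.
            new = [sum(P[i][k] * probs[k][j] for k in range(n)) for j in range(m)]
            change = max(change, max(abs(a - b) for a, b in zip(new, probs[i])))
            probs[i] = new
        if change < tol:
            break
    return probs


# Hypothetical linear lineage: cell 0 is fate A, cell 4 is fate B,
# and transient cells 1-3 step left or right with equal probability.
P = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]
fate_probs = absorption_probabilities(P, terminal=[0, 4])
```

Cell 1, adjacent to fate A, ends there with probability 0.75, and the midpoint cell is evenly split, matching the classical gambler's-ruin result for this symmetric walk.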
Methods and Practical Pipeline
A practical workflow begins with a clearly articulated biological question, for example, which genes bias progenitor cells toward one fate rather than another, or how a particular drug affects the probability of a cell entering an inflammatory state. The next step is data curation: assembling relevant scRNA-seq datasets and carefully recording metadata, including tissue of origin, species, experimental protocol, and batch information. Whenever possible, datasets should be linked to broader references, such as Human Cell Atlas resources, to leverage existing annotation and coverage of diverse cell types [1–3].
Initial baseline analysis typically uses established toolkits such as SCANPY and Seurat. This stage includes quality control, normalization, dimensionality reduction, and clustering, producing an interpretable state map for checking basic assumptions and identifying major cell populations [1,2]. Because real-world data often come from multiple laboratories and protocols, integration and batch correction are essential. Probabilistic models such as scVI, implemented within the scvi-tools ecosystem, provide a flexible framework for harmonizing heterogeneous datasets and defining a shared latent space [4,5]. Transfer-learning approaches such as scArches build on this idea to map new datasets onto existing reference atlases, enabling cross-study comparison while preserving disease- or condition-specific signals [11].
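As a concrete reference point for the normalization step, the sketch below reproduces the common library-size normalization followed by a log(1 + x) transform, which in SCANPY is typically performed with sc.pp.normalize_total(adata, target_sum=1e4) and sc.pp.log1p(adata). This plain-Python version operates on a small dense list of counts and is meant only to show the arithmetic; the target of 10,000 counts per cell is a convention, not a requirement.

```python
import math


def normalize_log1p(counts, target_sum=1e4):
    """Scale each cell to target_sum total counts, then apply log(1 + x).

    counts: list of per-cell lists of raw counts (cells x genes).
    Cells with zero total counts are left as all zeros.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([math.log1p(c * scale) for c in cell])
    return normalized


# Two hypothetical cells with the same composition but 10-fold different sequencing depth.
counts = [[1, 3, 0], [10, 30, 0]]
log_norm = normalize_log1p(counts)
```

After normalization the two cells are identical, which is the intended effect: differences in sequencing depth alone should not separate cells in the state map.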
Once a harmonized dataset is available, a Transcriptformer embedding is computed either by applying a publicly released pretrained model, such as Geneformer or scGPT, or by fine-tuning such a model on the specific data of interest [6,7]. RNA velocity is then estimated, for instance using scVelo, and CellRank is applied to derive fate probabilities for each cell [8,9]. At this point, the analyst can formulate explicit, falsifiable hypotheses in the latent space, such as predicting that perturbing a regulatory gene will reduce the probability of reaching a given terminal fate. These hypotheses are evaluated using appropriate statistical models and multiple-testing control, and, where feasible, are taken back to the wet lab for experimental validation. The overall pipeline thus connects large-scale pretraining, trajectory inference, and hypothesis-driven experimentation in a single loop.
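For the multiple-testing step, one widely used option is the Benjamini–Hochberg procedure, which controls the false discovery rate across many simultaneous tests (for example, one test per candidate regulator). The article does not prescribe a specific correction, so the sketch below is one possible choice; the p-values are hypothetical and would in practice come from whatever statistical model compares fate probabilities between conditions.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return booleans marking hypotheses rejected at false discovery rate alpha (step-up BH)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis whose p-value rank is at most k_max.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= k_max
    return rejected


# Hypothetical p-values, one per candidate regulator tested for a fate-probability shift.
pvals = [0.039, 0.001, 0.205, 0.008, 0.041, 0.060, 0.042, 0.074]
decisions = benjamini_hochberg(pvals)
```

Only the two smallest p-values survive here, even though several others fall below 0.05 on their own; that gap is exactly the protection against false discoveries that uncorrected thresholds lack.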
Applications
One prominent application of Transcriptformers is cell-type annotation and the discovery of rare or subtle cell populations. By learning from extensive reference corpora, models such as Geneformer and scGPT can improve sensitivity to small subpopulations and stabilize predictions across laboratories, species, and technologies [6,7]. When coupled with integrative frameworks such as scVI, scvi-tools, and Seurat, Transcriptformers help reduce batch effects and enable consistent cell-type labels across studies [2,4,5].
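One simple way to turn embeddings into annotations is k-nearest-neighbor label transfer: each unlabeled query cell takes the majority label of its closest annotated reference cells in embedding space. This is a generic baseline rather than the fine-tuning procedure used by Geneformer or scGPT, and the coordinates and labels below are hypothetical toy values standing in for real embeddings. The function also returns the fraction of neighbors that agree, a crude confidence score that can flag ambiguous cells for manual review.

```python
import math
from collections import Counter


def knn_label_transfer(ref_emb, ref_labels, query_emb, k=3):
    """Label each query cell by majority vote among its k nearest reference cells.

    Distances are Euclidean in embedding space. Returns (label, agreement) pairs,
    where agreement is the fraction of the neighbors that carry the winning label.
    """
    predictions = []
    for q in query_emb:
        nearest = sorted(range(len(ref_emb)), key=lambda i: math.dist(q, ref_emb[i]))[:k]
        label, votes = Counter(ref_labels[i] for i in nearest).most_common(1)[0]
        predictions.append((label, votes / len(nearest)))
    return predictions


# Hypothetical 2-D embeddings for an annotated reference and two unlabeled query cells.
reference = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
labels = ["T cell", "T cell", "T cell", "B cell", "B cell", "B cell"]
query = [[0.05, 0.05], [4.9, 5.0]]
predicted = knn_label_transfer(reference, labels, query)
```

Because the label is decided entirely by geometry, the quality of the result depends on how well the embedding separates cell types, which is where large-scale pretraining is expected to help.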
Atlas mapping and reference extension provide another important use case. Large, carefully curated references such as Human Cell Atlas resources serve as anchors, and scArches offers a principled transfer-learning framework for mapping new datasets onto these references while preserving biologically meaningful variation [3,11]. Transcriptformer embeddings can be used within this context to enhance the biological coherence of mapped states and to support cross-modal imputation when additional modalities, such as chromatin accessibility or spatial information, are available.
Perturbation-response prediction is a particularly compelling application. Generative models such as scGen demonstrate that learned latent spaces can be used to forecast how cells respond to genetic or pharmacologic perturbations [10]. Transcriptformers extend this idea by leveraging large-scale pretraining and trajectory-aware embeddings to improve generalization, especially in out-of-distribution settings where direct experimental data are sparse [6,7,10]. In principle, this enables in silico screening of interventions and prioritization of candidates for laboratory testing.
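The central idea behind scGen can be expressed as vector arithmetic in latent space: estimate the average shift between control and perturbed cells in the training data, then add that shift to the latent representation of control cells for which no perturbed measurements exist. The sketch below shows only this arithmetic on hypothetical latent vectors; in scGen the vectors come from a variational autoencoder's encoder, and the shifted latents are decoded back into gene expression.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def latent_shift_prediction(ctrl_train, stim_train, ctrl_query):
    """Predict perturbed latent states by adding the mean control-to-perturbed shift."""
    delta = [s - c for s, c in zip(mean_vector(stim_train), mean_vector(ctrl_train))]
    return [[z + d for z, d in zip(cell, delta)] for cell in ctrl_query]


# Hypothetical 2-D latent vectors: the perturbation shifts training cells along the second axis.
ctrl_train = [[0.0, 0.0], [1.0, 0.0]]
stim_train = [[0.0, 2.0], [1.0, 2.0]]
ctrl_query = [[3.0, 1.0]]  # control cells of a type never observed under perturbation
predicted = latent_shift_prediction(ctrl_train, stim_train, ctrl_query)
```

The approach assumes the perturbation acts as a roughly constant shift in latent space across cell types; when that assumption fails, out-of-distribution predictions degrade, which motivates the richer representations discussed above.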
Finally, Transcriptformers can aid regulator inference and hypothesis generation. Patterns in attention weights, latent coordinates, and learned gene embeddings may highlight candidate transcription factors, signaling pathways, or ligand–receptor pairs associated with specific trajectories or disease states [6,7]. Combined with CellRank’s identification of early driver genes along trajectories, these models provide a rich source of mechanistic hypotheses that can be explored with targeted experiments [8,9].
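CellRank ranks putative driver genes for a fate by correlating gene expression with that fate's probabilities across cells. A minimal sketch of the same idea, using plain Pearson correlation on hypothetical genes and fate probabilities, is shown below; it omits the significance testing and confidence intervals a real analysis would report.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def rank_drivers(expression, fate_prob):
    """Rank genes by correlation between their expression and one fate's probabilities.

    expression: dict mapping gene -> per-cell expression values.
    fate_prob: per-cell probability of reaching the fate of interest.
    """
    scores = {gene: pearson(values, fate_prob) for gene, values in expression.items()}
    return sorted(scores.items(), key=lambda item: -item[1])


# Hypothetical genes and fate probabilities for five cells.
fate_a = [0.1, 0.3, 0.5, 0.7, 0.9]
expression = {
    "GENE_X": [1.0, 2.0, 3.0, 4.0, 5.0],  # rises with commitment to fate A
    "GENE_Y": [5.0, 4.0, 3.0, 2.0, 1.0],  # falls with commitment to fate A
    "GENE_Z": [2.0, 2.0, 2.0, 2.0, 2.0],  # uninformative
}
ranking = rank_drivers(expression, fate_a)
```

A high correlation marks a gene as a candidate for follow-up, not as a confirmed regulator; the causal caveats discussed below apply directly to this kind of ranking.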
Limitations and Good Practices
Despite their promise, Transcriptformers and related tools have important limitations. Batch effects and dataset bias remain central challenges. Even after integration, over-represented tissues, platforms, or disease states can bias the learned representations, leading to overconfident predictions in under-represented regimes [1,2,4,5,11]. Careful dataset curation, sensitivity analyses, and explicit reporting of training data composition are essential.
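Reporting training data composition can start with a simple tabulation of cells per metadata category. The sketch below computes per-category fractions for one metadata field and flags categories below a minimum share; the field name, toy metadata, and 5% threshold are hypothetical choices to be adapted to each study.

```python
from collections import Counter


def composition_report(cell_metadata, key, min_fraction=0.05):
    """Fraction of training cells per category of one metadata field.

    Returns (fractions, under_represented), where under_represented lists the
    categories whose share falls below min_fraction (a hypothetical threshold).
    """
    counts = Counter(meta[key] for meta in cell_metadata)
    total = sum(counts.values())
    fractions = {category: n / total for category, n in counts.most_common()}
    under_represented = sorted(c for c, f in fractions.items() if f < min_fraction)
    return fractions, under_represented


# Hypothetical per-cell metadata for a training corpus.
metadata = [{"tissue": "blood"}] * 80 + [{"tissue": "lung"}] * 17 + [{"tissue": "kidney"}] * 3
fractions, flagged = composition_report(metadata, "tissue")
```

Flagged categories are natural targets for sensitivity analyses, for example by checking whether predictions for those cells change when the model is retrained with a rebalanced corpus.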
Annotation inconsistencies across studies also pose risks. Different atlases may apply distinct naming conventions or criteria for defining cell types, and naive label transfer can propagate these inconsistencies. Harmonizing labels, adopting community standards where possible, and providing clear mapping tables between ontologies can mitigate this issue [3].
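An explicit mapping table is the simplest way to make label harmonization auditable. In the sketch below, mappings are keyed by study and original label, because the same label string can mean different things in different studies, and any label without a mapping is reported rather than silently passed through. The study names and labels are hypothetical; the harmonized term is written in the style of a Cell Ontology term name, used here only as an example of a community vocabulary.

```python
def harmonize_labels(cell_labels, mapping):
    """Map study-specific labels to a shared vocabulary and report unmapped labels.

    cell_labels: list of (study, original_label) pairs, one per cell.
    mapping: dict from (study, original_label) to a harmonized label.
    Unmapped cells get None instead of silently keeping their original label.
    """
    harmonized, unmapped = [], set()
    for study, label in cell_labels:
        key = (study, label)
        if key in mapping:
            harmonized.append(mapping[key])
        else:
            harmonized.append(None)
            unmapped.add(key)
    return harmonized, sorted(unmapped)


# Hypothetical mapping table; two studies use different names for the same cell type.
mapping = {
    ("study_1", "CD4 T"): "CD4-positive, alpha-beta T cell",
    ("study_2", "T helper"): "CD4-positive, alpha-beta T cell",
}
cells = [("study_1", "CD4 T"), ("study_2", "T helper"), ("study_2", "cluster 7")]
harmonized, unmapped = harmonize_labels(cells, mapping)
```

Publishing the mapping dictionary alongside the results gives readers the "clear mapping table" recommended above in a form that can be reused directly.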
A further concern is conceptual: correlation does not imply causation. Transcriptformers, velocity-based fate mapping, and perturbation-prediction models such as scGen all learn from observational or quasi-experimental data [6–10]. They can uncover associations and generate counterfactual predictions in the learned latent space, but they do not by themselves establish mechanistic causality. Model outputs should therefore be interpreted as hypothesis-generating tools whose predictions require orthogonal validation using controlled experiments, genetic perturbations, or longitudinal measurements.
Finally, computational cost and reproducibility need attention. Pretraining and fine-tuning foundation models may demand substantial computational resources, and different implementations can yield subtly different results. Whenever possible, analyses should rely on publicly released model weights and well-documented pipelines, such as those available in scvi-tools and established single-cell workflows [1,2,5]. Recording software versions, random seeds, and detailed preprocessing steps helps ensure that results can be reproduced and extended by others.
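Recording the computational environment can be automated with the Python standard library. The sketch below seeds Python's built-in random generator and collects the Python version and the installed versions of a list of packages, storing None for packages that are not installed. Libraries such as NumPy and PyTorch keep their own random states (for example, numpy.random.seed and torch.manual_seed), so in a real pipeline those would need to be seeded as well; the package list shown is only an example.

```python
import json
import platform
import random
from importlib import metadata


def run_manifest(seed, packages):
    """Seed Python's built-in RNG and record versions needed to reproduce an analysis.

    Packages that are not installed are recorded as None.
    """
    random.seed(seed)
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return {"seed": seed, "python": platform.python_version(), "packages": versions}


# Example package list; adjust to the tools actually used in the analysis.
manifest = run_manifest(0, ["scanpy", "scvelo", "cellrank", "scvi-tools"])
print(json.dumps(manifest, indent=2))
```

Saving this dictionary next to the results (for example, as a JSON file) lets later readers see exactly which versions produced each figure.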
Conclusion
Transformer-based foundation models for single-cell data, collectively described here as Transcriptformers, represent a natural extension of pretrain-then-transfer paradigms from language and vision to cell biology. By learning from large, heterogeneous single-cell corpora, these models provide flexible embeddings that can be adapted to annotation, integration, perturbation modeling, and regulator discovery. When combined with trajectory-inference tools such as RNA velocity and CellRank, they become trajectory-aware, supporting quantitative hypotheses about how cells move through state space and how interventions may redirect those paths [6–9].
The most responsible and impactful use of Transcriptformers treats them as partners in hypothesis generation rather than as oracles. Their predictions should be embedded in rigorous statistical analysis and systematically tested against experimental data. As datasets, software ecosystems, and community standards continue to mature, Transcriptformers are likely to become central components of an iterative loop connecting atlases, models, and experiments—and to provide an accessible entry point for students who wish to explore the interface between artificial intelligence and cellular biology.
References
[1] Wolf, F. A., Angerer, P., & Theis, F. J. (2018). SCANPY: large-scale single-cell gene expression data analysis. Genome Biology, 19, 15.
[2] Stuart, T., Butler, A., Hoffman, P., Hafemeister, C., Papalexi, E., Mauck, W. M. III, et al. (2019). Comprehensive integration of single-cell data. Cell, 177(7), 1888–1902.e21.
[3] Regev, A., Teichmann, S. A., Lander, E. S., Amit, I., Benoist, C., Birney, E., et al. (2017). The Human Cell Atlas. eLife, 6, e27041.
[4] Lopez, R., Regier, J., Cole, M. B., Jordan, M. I., & Yosef, N. (2018). Deep generative modeling for single-cell transcriptomics (scVI). Nature Methods, 15(12), 1053–1058.
[5] Gayoso, A., Lopez, R., Xing, G., Boyeau, P., Valiollah Pour Amiri, V. V., Hong, J., et al. (2022). A Python library for probabilistic analysis of single-cell omics data (scvi-tools). Nature Biotechnology, 40(2), 163–166.
[6] Theodoris, C. V., Xiao, L., Chopra, A., Chaffin, M. D., Al Sayed, Z. R., Hill, M. C., et al. (2023). Transfer learning enables predictions in network biology (Geneformer). Nature, 618(7965), 616–624.
[7] Cui, H., Wang, C., Maan, H., Pang, K., Luo, F., Duan, N., & Wang, B. (2024). scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nature Methods, 21(8), 1470–1480.
[8] Bergen, V., Lange, M., Peidli, S., Wolf, F. A., & Theis, F. J. (2020). Generalizing RNA velocity to transient cell states through dynamical modeling (scVelo). Nature Biotechnology, 38(12), 1408–1414.
[9] Lange, M., Bergen, V., Klein, M., Setty, M., Reuter, B., Bakhti, M., et al. (2022). CellRank for directed single-cell fate mapping. Nature Methods, 19(2), 159–170.
[10] Lotfollahi, M., Wolf, F. A., & Theis, F. J. (2019). scGen predicts single-cell perturbation responses. Nature Methods, 16(8), 715–721.
[11] Lotfollahi, M., Naghipourfar, M., Luecken, M. D., Khajavi, M., Büttner, M., Wagenstetter, M., et al. (2022). Mapping single-cell data to reference atlases by transfer learning (scArches). Nature Biotechnology, 40(1), 121–130.
