Diffusion-TCE

Architecture

The Diffusion-TCE pipeline orchestrates three key computational stages: template selection and target mapping, partial diffusion with RFdiffusion, and sequence refinement with ProteinMPNN. The architecture preserves proven T‑cell receptor (TCR) engagement mechanisms while reshaping the cancer‑targeting interface.

graph TB
    %% Input Templates
    A1[8UCD<br/>CD3e-Binding] --> B1[Template Selection]
    A2[6AL5<br/>CD19-Binding] --> B1
    A3[6BA5<br/>CD3-Binding] --> B1

    %% Target Mutations
    A4[FGFR2 S252W] --> B2[Target Mapping]
    A5[FGFR2 P253R] --> B2
    A6[ERBB2 S310F] --> B2

    %% Stage 1: Template Selection and Target Mapping
    subgraph STAGE1 [ ]
        B1 --> C1[TCE-Mutant Interface Positioning]
        B2 --> C1
        C1 --> C2[TCR-Binding Preservation]
        C2 --> C3[Cancer-targeting Domain<br/>Identification]
    end

    %% Stage 2: Partial Diffusion
    subgraph
        C3 --> D1[RFdiffusion<br/>Partial Diffusion<br/>Structure Generation]
    end

    %% Stage 3: Sequence Refinement
    subgraph
        D1 --> E1[ProteinMPNN<br/>Sequence Generation]
    end

    %% Outputs
    E1 --> F1[PDB Files<br/>FASTA Files]

    %% Styling
    style A1 fill:#e1f5fe
    style A2 fill:#e1f5fe
    style A3 fill:#e1f5fe
    style A4 fill:#fff3e0
    style A5 fill:#fff3e0
    style A6 fill:#fff3e0
    style STAGE1 fill:#f3e5f5
    style F1 fill:#ffebee

Program Modules

1. Template Selection & Target Mapping

Starting from three distinct T-cell engager (TCE) scaffolds taken from the PDB, the pipeline positions each TCE interface in direct contact with the point-mutated residue of a receptor tyrosine kinase carrying an extracellular mutation. This proximity-based approach lets the redesigned interface exploit mutation-specific topological changes while maintaining proven T-cell engagement mechanisms.

2. Precision-Guided Interface Design

The workflow begins by explicitly positioning the two structural partners (the fixed mutant receptor chain and the scaffold to be remodeled) so that the prospective interface residues sit roughly 5–15 Å apart. At this separation the side chains do not yet clash, but they are close enough that a single round of generative refinement can "grow" atomic contacts across the gap. The movable scaffold is positioned in PyMOL, using the distance wizard and live distance feedback to nudge a pre-chosen anchor residue on the scaffold to ~5 Å from the mutation site while keeping the remaining paratope residues fanned outward. That hand-tuned pose defines the starting backbone envelope that the diffusion model elaborates.
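The placement criterion can be checked numerically once a pose has been saved from PyMOL. The following is a minimal sketch, not the pipeline's own code: it parses fixed-column PDB records with the standard library only, and the function names, the toy coordinates, and the residue numbers are all hypothetical.

```python
import math

def toy_atom_line(serial, name, resname, chain, resseq, x, y, z):
    """Hypothetical helper: format one fixed-column PDB ATOM record for the demo."""
    return (f"ATOM  {serial:5d} {name:<4s} {resname:3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

def parse_atoms(pdb_text):
    """Return (chain, resseq, atom_name, (x, y, z)) tuples from ATOM/HETATM records."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atoms.append((
                line[21],             # chain ID (PDB column 22)
                int(line[22:26]),     # residue number (columns 23-26)
                line[12:16].strip(),  # atom name (columns 13-16)
                (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            ))
    return atoms

def min_residue_distance(atoms, res_a, res_b):
    """Smallest atom-atom distance in Å between two residues given as (chain, resseq)."""
    coords_a = [xyz for ch, num, _, xyz in atoms if (ch, num) == res_a]
    coords_b = [xyz for ch, num, _, xyz in atoms if (ch, num) == res_b]
    if not coords_a or not coords_b:
        raise ValueError("one of the requested residues is missing from the structure")
    return min(math.dist(p, q) for p in coords_a for q in coords_b)

def in_placement_window(distance, low=5.0, high=15.0):
    """True when the separation falls inside the 5-15 Å window described above."""
    return low <= distance <= high

# Toy pose: mutation site on chain A, scaffold anchor residue on chain B.
pose = "\n".join([
    toy_atom_line(1, "CA", "TRP", "A", 252, 0.0, 0.0, 0.0),
    toy_atom_line(2, "CA", "TYR", "B", 101, 3.0, 4.0, 0.0),
])
separation = min_residue_distance(parse_atoms(pose), ("A", 252), ("B", 101))
print(f"anchor-to-mutation distance: {separation:.1f} Å, "
      f"in window: {in_placement_window(separation)}")
```

A check like this is only a sanity gate on the hand-tuned pose; it does not replace the visual inspection in PyMOL that decides how the remaining paratope residues are oriented.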

3. RFdiffusion Structure Generation

With the partners pre-aligned, the pipeline launches RFdiffusion in partial-diffusion mode. On the command line, a contig map freezes the receptor and marks the binder segment as the only region allowed to move. The pipeline also sets diffuser.partial_T (e.g., 5–35) to cap the number of noising steps, so that the chosen binder residues are only partially randomized before being denoised, while the receptor and the distant parts of the binder backbone remain rigid scaffolds. During diffusion the model treats the mutable residues as a cloud of points whose pairwise separations are incrementally perturbed and then steered back toward physically plausible values by the learned denoiser, searching for low-energy paths that tuck loops and secondary-structure elements into the receptor surface. This single-pass generative move typically yields dozens to hundreds of candidate binding topologies, each obeying polymer geometry and exploiting slight backbone flex to bury hydrophobics or form hydrogen-bond networks across the new interface. The pipeline runs this step on Levitate Bio's hosted RFdiffusion endpoint.
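To make the contig map and diffuser.partial_T concrete, here is a minimal sketch that assembles the override arguments for a partial-diffusion run. It assumes the Hydra-style options of the open-source RFdiffusion inference script (inference.input_pdb, inference.output_prefix, contigmap.contigs, diffuser.partial_T, inference.num_designs); the hosted Levitate Bio endpoint may expose these differently. The function name, file names, chain ID, and residue ranges are hypothetical, and placing the receptor segment first assumes the receptor is the first chain in the input file.

```python
def partial_diffusion_args(input_pdb, output_prefix, receptor_chain,
                           receptor_start, receptor_end, binder_length,
                           partial_t=20, num_designs=10):
    """Build override arguments for a partial-diffusion run (illustrative only)."""
    if partial_t < 1:
        raise ValueError("partial_t must be a positive number of noising steps")
    if receptor_end < receptor_start or binder_length < 1:
        raise ValueError("receptor range and binder length must be non-empty")

    # Fixed segment: chain letter plus residue range, e.g. "A150-360".
    receptor_segment = f"{receptor_chain}{receptor_start}-{receptor_end}"
    # Diffused segment: a bare length range with equal bounds, so the binder
    # keeps the length it has in the input structure.
    binder_segment = f"{binder_length}-{binder_length}"
    # "/0 " marks a chain break between the two partners.
    contig = f"{receptor_segment}/0 {binder_segment}"

    return [
        f"inference.input_pdb={input_pdb}",
        f"inference.output_prefix={output_prefix}",
        f"contigmap.contigs=[{contig}]",
        f"diffuser.partial_T={partial_t}",
        f"inference.num_designs={num_designs}",
    ]

# Hypothetical example: receptor residues A150-360 held fixed, 120-residue binder.
for arg in partial_diffusion_args("posed_complex.pdb", "out/tce_design",
                                  "A", 150, 360, 120, partial_t=20, num_designs=5):
    print(arg)
```

Lower partial_T values keep designs closer to the hand-placed pose, while higher values allow larger backbone rearrangements, which is why the value is swept over a range rather than fixed.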

4. ProteinMPNN Sequence Design

Once stable backbones are obtained, the pipeline transitions to ProteinMPNN for sequence design. ProteinMPNN encodes the fixed backbone as a nearest-neighbor graph whose edges carry inter-residue backbone distances, then runs an autoregressive decoding pass that fills in amino-acid identities with probabilities conditioned on that geometry. Because the receptor is frozen, only the binder chain (or even just the interface positions) is left designable, so the network focuses its capacity on side chains that must conform to the newly sculpted pocket. The model proposes dozens of high-log-probability sequences in seconds, each compatible with the backbone geometry and biased toward forming the polar, apolar, or aromatic contacts implied by the topology. Importantly, any residue identities essential for mutant specificity (e.g., sterically complementing the mutant side chain or hydrogen-bonding to it) can be held fixed so that ProteinMPNN respects those constraints while optimizing the remaining sites for overall packing and stability.
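The chain and position constraints described above can be written down as two small dictionaries. This sketch assumes the JSONL conventions used by the helper scripts in the open-source ProteinMPNN repository (a chain dictionary of designed and fixed chains, and a per-chain list of 1-indexed fixed positions); the hosted endpoint may take these settings in another form. The function name, structure name, chain IDs, and positions are hypothetical.

```python
import json

def design_masks(name, binder_chain, receptor_chain, binder_length, locked_positions):
    """Return (chain_ids, fixed_positions) dictionaries for one backbone.

    locked_positions are 1-indexed binder residues whose identity is kept.
    Assumption: fixing a position preserves whatever residue the input
    structure already carries there, so the mutant-specific residue must
    be present in the backbone file before sequence design starts.
    """
    locked = sorted(set(locked_positions))
    if any(p < 1 or p > binder_length for p in locked):
        raise ValueError("locked positions must lie within the binder chain")

    # [designed chains, fixed chains]: only the binder is redesigned.
    chain_ids = {name: [[binder_chain], [receptor_chain]]}
    # The receptor needs no per-position list because the whole chain is fixed.
    fixed_positions = {name: {binder_chain: locked, receptor_chain: []}}
    return chain_ids, fixed_positions

chain_ids, fixed_positions = design_masks(
    "tce_design_0", binder_chain="B", receptor_chain="A",
    binder_length=120, locked_positions=[57, 33, 33],
)
# One JSON object per line, as a JSONL file would hold them.
print(json.dumps(chain_ids))
print(json.dumps(fixed_positions))
```

Keeping the locked set small leaves the network free to repack the rest of the interface around the residues that carry the mutant specificity.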

5. Output Generation & Validation

The protocol therefore separates geometry discovery (RFdiffusion) from sequence realization (ProteinMPNN). Hand-placing the chains starts the search in the right basin of conformational space; partial diffusion explores only the local moves needed to close the interface without introducing global artefacts; and ProteinMPNN converts the backbone blueprint into chemically coherent, energetically favored sequences. Downstream, designs are typically ranked with AlphaFold2-Multimer or Rosetta interface ΔG filters. The mutant-specific TCE libraries themselves came from the backbone-plus-sequence steps described above: each candidate emerged from a single pass of RFdiffusion constrained by PyMOL-guided proximity, followed by a rapid ProteinMPNN redesign step that encoded the mutant hallmark directly into the binder's paratope.

The pipeline generated more than 40 libraries of mutant-specific TCE structures with complete PDB files and optimized sequences, providing diverse starting points for experimental validation and further computational refinement.
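Before running heavier ranking, the FASTA outputs can be given a cheap first-pass ordering. The sketch below is not the AlphaFold2-Multimer or Rosetta ranking mentioned above; it simply sorts sequences by a score field in the FASTA header. It assumes headers made of comma-separated key=value pairs that include a score for which lower is better, as in ProteinMPNN's FASTA output, and the function names and toy records are hypothetical.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def header_fields(header):
    """Split a 'key=value, key=value' header into a dictionary of strings."""
    fields = {}
    for part in header.split(","):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields

def rank_by_score(text, key="score"):
    """Return (score, sequence) pairs sorted ascending, skipping records without the key."""
    scored = []
    for header, sequence in parse_fasta(text):
        fields = header_fields(header)
        if key in fields:
            scored.append((float(fields[key]), sequence))
    return sorted(scored)

# Toy records with made-up sequences and scores.
toy_fasta = """>T=0.1, sample=1, score=0.91, seq_recovery=0.40
ACDEFGHIK
>T=0.1, sample=2, score=0.72, seq_recovery=0.45
ACDEFGHIR
"""
for score, sequence in rank_by_score(toy_fasta):
    print(f"{score:.2f}  {sequence}")
```

A sequence-level score like this says nothing about whether the complex actually forms, so it is best used only to trim a library before structure-based filters.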

Computational Results: 40+ Mutant-Specific TCE Libraries

Over a focused one‑month sprint on Levitate Bio’s cloud API, which exposes turnkey endpoints for RFdiffusion and ProteinMPNN, I iteratively pushed batched jobs, tuned partial‑diffusion parameters, and harvested sequence redesigns. This intensive run produced more than forty mutant‑specific T‑cell‑engager (TCE) libraries, each containing multiple two‑chain candidates ready for downstream analysis.

The pipeline focuses on three clinically relevant point mutations: FGFR2b S252W (PDB 3OJ2), FGFR2b P253R (PDB 3OJM), and ERBB2 S310F (PDB 8VQE). Each mutation is paired with one of three scaffold templates, namely 8UCD (CD3ε‑binding), 6AL5 (CD19‑binding), and 6BA5 (CD3‑binding), yielding a diversified matrix of bispecific architectures.
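The mutation-by-scaffold matrix can be enumerated directly from the identifiers listed above. This is a sketch that assumes a full cross of every mutation with every scaffold; the text does not state which pairings were actually run, and the dictionary and function names are hypothetical.

```python
from itertools import product

# Target mutations and the PDB entries named in the text.
MUTATIONS = {
    "FGFR2b S252W": "3OJ2",
    "FGFR2b P253R": "3OJM",
    "ERBB2 S310F": "8VQE",
}

# Scaffold templates and their binding roles as named in the text.
SCAFFOLDS = {
    "8UCD": "CD3ε-binding",
    "6AL5": "CD19-binding",
    "6BA5": "CD3-binding",
}

def design_matrix():
    """Return one record per (mutation, scaffold) pairing, assuming a full cross."""
    return [
        {
            "mutation": mutation,
            "target_pdb": MUTATIONS[mutation],
            "scaffold_pdb": scaffold,
            "scaffold_role": SCAFFOLDS[scaffold],
        }
        for mutation, scaffold in product(MUTATIONS, SCAFFOLDS)
    ]

matrix = design_matrix()
print(len(matrix))  # 3 mutations x 3 scaffolds = 9 pairings
for entry in matrix:
    print(f'{entry["mutation"]:<14} {entry["target_pdb"]}  x  '
          f'{entry["scaffold_pdb"]} ({entry["scaffold_role"]})')
```

With several partial-diffusion settings per pairing, a full cross of this size would be consistent with a library count in the dozens, though the exact breakdown is not given here.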

Every design preserves the canonical TCR‑engagement geometry while embedding a de novo interface molded around the mutation-induced topology on the receptor. The result is a catalogue of structurally sound, expression‑ready constructs that moves patient‑specific TCEs from concept to concrete in‑silico candidates.

Future Work

Experimental validation is the immediate priority, and I am actively seeking collaborators to synthesize top candidates and quantify binding affinity, mutant‑versus‑wild‑type selectivity, and T‑cell activation. Wet‑lab feedback will close the design loop and guide the next computational round. Looking ahead, I plan to integrate Boltz‑2 complex‑stability scoring, ensemble molecular‑dynamics filtering, and interface ΔG ranks to sharpen triage before bench time. Ultimately, scaling the pipeline genome‑wide will require automated template selection and robust success metrics, while GMP‑grade synthesis and regulatory alignment will be essential for clinical translation.

Open Source & Community

Diffusion-TCE is being released under an MIT license, with the computational pipeline, the majority of generated structures, and a detailed methodology made freely available. The goal is to provide a robust starting point for researchers targeting other mutant-driven cancers.
