## The Funnel Problem
High-throughput computational antibody design can generate millions of candidate sequences. But downstream steps—structure prediction, complex modeling, affinity scoring—are expensive. A single ColabFold prediction might take minutes on a GPU. Multiply that by a million candidates and you have a problem.
The solution is intelligent quality gating: filtering out low-quality candidates early, before they consume expensive resources.
## Sequence-Level Filters
The cheapest filters operate on sequence alone:
### Biophysical Properties
- **Charge distribution** - Extreme net charge can cause aggregation or off-target binding
- **Hydrophobicity** - Excessive hydrophobic patches correlate with expression problems
- **Instability index** - Estimates in vitro stability from dipeptide composition; values above 40 suggest an unstable protein
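Two of these properties can be sketched in a few lines. The sketch assumes plain one-letter sequences of the 20 standard residues. It computes GRAVY (mean Kyte-Doolittle hydropathy) as a sequence-level stand-in for hydrophobicity, and a Henderson-Hasselbalch estimate of net charge. The pKa table, the pH of 7.4, and both thresholds (`max_abs_charge`, `max_gravy`) are illustrative assumptions, not values from this article. The instability index needs a 400-entry dipeptide weight table and is left out here. Biopython's `ProteinAnalysis` class provides `instability_index()` and `gravy()` if you prefer a library.

```python
# Kyte-Doolittle hydropathy scale (per residue).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Approximate pKa values (an EMBOSS-style set; an assumption, since other
# tables give slightly different charges).
PKA_POS = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEG = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_NTERM, PKA_CTERM = 8.6, 3.6


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over the sequence."""
    return sum(KD[aa] for aa in seq) / len(seq)


def net_charge(seq: str, ph: float = 7.4) -> float:
    """Henderson-Hasselbalch estimate of net charge, including both termini."""
    pos = [PKA_NTERM] + [PKA_POS[aa] for aa in seq if aa in PKA_POS]
    neg = [PKA_CTERM] + [PKA_NEG[aa] for aa in seq if aa in PKA_NEG]
    charge = sum(1.0 / (1.0 + 10 ** (ph - pka)) for pka in pos)
    charge -= sum(1.0 / (1.0 + 10 ** (pka - ph)) for pka in neg)
    return charge


def biophysical_gate(seq: str, max_abs_charge: float = 8.0, max_gravy: float = 0.0) -> list[str]:
    """Return failure reasons (an empty list means the sequence passes).
    Both default thresholds are hypothetical placeholders to tune per project."""
    reasons = []
    q = net_charge(seq)
    if abs(q) > max_abs_charge:
        reasons.append(f"net charge {q:+.1f} outside ±{max_abs_charge}")
    g = gravy(seq)
    if g > max_gravy:
        reasons.append(f"GRAVY {g:.2f} above {max_gravy}")
    return reasons
```

Returning reasons rather than a bare boolean keeps the rejection traceable, which matters later for tuning.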
### Liability Detection
- **Deamidation sites** - NG and NS motifs prone to chemical degradation
- **Oxidation sites** - Exposed methionines vulnerable to oxidation
- **Glycosylation sites** - N-X-S/T motifs (X any residue except proline) may cause heterogeneity
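These motif checks map directly onto regular expressions. The sketch below reports the 0-based start position of each motif. Zero-width lookaheads ensure that overlapping hits are all found. Sequence alone cannot show whether a methionine is solvent-exposed, so the oxidation check lists every Met and leaves exposure to the structure stage. The pattern names and the return format are illustrative choices.

```python
import re

# Zero-width lookaheads so overlapping motifs are all reported.
LIABILITY_PATTERNS = {
    "deamidation": re.compile(r"(?=N[GS])"),        # NG, NS
    "glycosylation": re.compile(r"(?=N[^P][ST])"),  # N-X-S/T with X != P
    "oxidation": re.compile(r"(?=M)"),              # every methionine
}


def find_liabilities(seq: str) -> dict[str, list[int]]:
    """Map each liability type to the 0-based positions where its motif starts."""
    return {
        name: [m.start() for m in pattern.finditer(seq)]
        for name, pattern in LIABILITY_PATTERNS.items()
    }
```

For example, `find_liabilities("QNGSNATM")` flags one deamidation site (NG at 1), two glycosylation sites (NGS at 1, NAT at 4) and one methionine (at 7).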
### Sequence Complexity
- **Low-complexity regions** - Repetitive sequences often misfold
- **Unusual amino acid composition** - Deviations from natural distributions signal problems
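A common proxy for low complexity is Shannon entropy over a sliding window. This follows the spirit of entropy-based tools such as SEG, but it is not SEG itself. In the sketch below, the window size of 12 and the 2.2-bit cutoff are hypothetical. The maximum entropy of a 12-residue window is log2(12) ≈ 3.58 bits. A crude composition check, the fraction taken by the most common residue, is included as an assumed stand-in for "unusual composition".

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy (bits) of the residue distribution in s."""
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def low_complexity_windows(seq: str, window: int = 12, min_entropy: float = 2.2) -> list[int]:
    """Start indices of sliding windows whose entropy falls below min_entropy.
    Sequences shorter than the window produce no windows."""
    return [
        i for i in range(len(seq) - window + 1)
        if shannon_entropy(seq[i:i + window]) < min_entropy
    ]


def max_residue_fraction(seq: str) -> float:
    """Fraction of the sequence taken by its most common residue."""
    return max(Counter(seq).values()) / len(seq)
```

A glycine-serine linker such as `"GGGGS" * 3` is flagged in every window, while a window of 12 distinct residues sits near the 3.58-bit maximum and passes.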
These filters are nearly free computationally but can reject 30-50% of generated candidates.
## Structure-Level Filters
After fast structure prediction (e.g., ESMFold), additional filters apply:
### Confidence Metrics
- **pLDDT scores** - Per-residue prediction confidence; low values indicate disorder or unreliable regions
- **pTM scores** - Predicted TM-score, a measure of confidence in the overall fold
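Here is a sketch of the confidence checks. It assumes the predictor writes per-residue pLDDT into the B-factor column on a 0-100 scale, as AlphaFold and ColabFold PDB files do. Rescale first if your tool writes values between 0 and 1. The pTM value is taken from the predictor's own output. The cutoffs are hypothetical defaults:

- 70 mirrors the conventional boundary between "low" and "confident" pLDDT.
- A TM-score above 0.5 generally indicates the same fold.

```python
def plddt_from_pdb(pdb_text: str) -> list[float]:
    """Per-residue pLDDT read from the B-factor column of CA atoms."""
    values = []
    for line in pdb_text.splitlines():
        # PDB fixed columns (1-based): atom name 13-16, B-factor 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    return values


def confidence_gate(plddt: list[float], ptm: float, min_mean_plddt: float = 70.0,
                    max_low_fraction: float = 0.1, min_ptm: float = 0.5) -> list[str]:
    """Return failure reasons; all thresholds are hypothetical defaults."""
    reasons = []
    mean = sum(plddt) / len(plddt)
    if mean < min_mean_plddt:
        reasons.append(f"mean pLDDT {mean:.1f} < {min_mean_plddt}")
    low = sum(v < 50 for v in plddt) / len(plddt)  # fraction of very-low-confidence residues
    if low > max_low_fraction:
        reasons.append(f"{low:.0%} of residues have pLDDT < 50")
    if ptm < min_ptm:
        reasons.append(f"pTM {ptm:.2f} < {min_ptm}")
    return reasons
```

Checking the low-confidence fraction separately from the mean catches models that are confident overall but have a disordered loop.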
### Structural Properties
- **Compactness** - Radius of gyration consistent with the expected size for the sequence length
- **Exposed hydrophobics** - Solvent-accessible hydrophobic surface area
- **Disulfide geometry** - Correct cysteine pairing and bond geometry
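Compactness and disulfide geometry need only atom coordinates, as this sketch shows. It compares the observed CA radius of gyration with the empirical scaling Rg ≈ 2.2·N^0.38 Å, a relation often quoted for compact globular proteins; treat it as an assumption. It also checks that paired cysteine SG atoms sit near the ~2.05 Å S-S bond length. The 1.9-2.2 Å tolerance window is hypothetical. Exposed hydrophobic area requires a solvent-accessibility calculation and is not shown.

```python
import math

Coord = tuple[float, float, float]


def radius_of_gyration(coords: list[Coord]) -> float:
    """Root-mean-square distance of the points from their centroid (unweighted)."""
    n = len(coords)
    cx = sum(c[0] for c in coords) / n
    cy = sum(c[1] for c in coords) / n
    cz = sum(c[2] for c in coords) / n
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
                         for x, y, z in coords) / n)


def expected_rg(n_residues: int) -> float:
    """Empirical Rg in Å for a compact globular protein (assumed 2.2 * N^0.38)."""
    return 2.2 * n_residues ** 0.38


def compactness_ratio(ca_coords: list[Coord]) -> float:
    """Observed / expected Rg; values well above 1 suggest an extended model."""
    return radius_of_gyration(ca_coords) / expected_rg(len(ca_coords))


def disulfide_ok(sg1: Coord, sg2: Coord, lo: float = 1.9, hi: float = 2.2) -> bool:
    """True if two cysteine SG atoms sit at a plausible S-S bond length."""
    return lo <= math.dist(sg1, sg2) <= hi
```

A fully extended chain of 50 CA atoms spaced 3.8 Å apart gives a ratio above 5, far from the value near 1 expected of a folded domain.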
Structure filters add modest computational cost but catch candidates with folding problems invisible to sequence analysis.
## Complex-Level Filters
For antibody-antigen complexes:
### Interface Quality
- **Contact count** - Minimum number of contacts (or buried surface area) between antibody and antigen
- **Interface confidence** - High pLDDT at the binding interface
- **Shape complementarity** - Geometric fit between surfaces
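Contact count and interface confidence can be computed directly from a predicted complex. The sketch assumes a hypothetical input layout: each chain is a dict mapping residue ID to a list of heavy-atom coordinates. Two residues are in contact if any atom pair lies within a cutoff; 5 Å is a common heuristic, not a value from this article. The pairwise loop is brute force. At campaign scale, a spatial index such as a KD-tree would replace it. Shape complementarity needs a surface calculation and is not shown.

```python
import math
from itertools import product

Coord = tuple[float, float, float]
Residues = dict[int, list[Coord]]  # hypothetical layout: residue id -> heavy-atom coords


def interface_contacts(ab: Residues, ag: Residues, cutoff: float = 5.0) -> set[tuple[int, int]]:
    """Residue pairs (antibody_id, antigen_id) with any atom pair within cutoff Å."""
    contacts = set()
    for (i, ab_atoms), (j, ag_atoms) in product(ab.items(), ag.items()):
        if any(math.dist(a, b) <= cutoff for a in ab_atoms for b in ag_atoms):
            contacts.add((i, j))
    return contacts


def interface_plddt(contacts: set[tuple[int, int]], ab_plddt: dict[int, float],
                    ag_plddt: dict[int, float]) -> float:
    """Mean pLDDT over the residues on both sides of the interface."""
    values = [ab_plddt[i] for i in {i for i, _ in contacts}]
    values += [ag_plddt[j] for j in {j for _, j in contacts}]
    return sum(values) / len(values) if values else 0.0


def interface_gate(contacts: set[tuple[int, int]], mean_iface_plddt: float,
                   min_contacts: int = 10, min_plddt: float = 70.0) -> list[str]:
    """Return failure reasons; both thresholds are hypothetical."""
    reasons = []
    if len(contacts) < min_contacts:
        reasons.append(f"{len(contacts)} contacts < {min_contacts}")
    if mean_iface_plddt < min_plddt:
        reasons.append(f"interface pLDDT {mean_iface_plddt:.1f} < {min_plddt}")
    return reasons
```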
### Binding Mode
- **Epitope adherence** - Does the antibody contact intended epitope residues?
- **CDR engagement** - Are CDR loops making productive contacts?
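Both binding-mode questions reduce to set arithmetic over the residue contact pairs of the complex. This sketch takes contacts as `(antibody_residue, antigen_residue)` pairs. It assumes a single antibody chain numbered with the IMGT scheme, where the CDRs are 27-38, 56-65 and 105-117. With separate heavy and light chains, the keys would need a chain identifier. The function names are illustrative.

```python
# IMGT CDR boundaries (inclusive); an assumption, so match your numbering scheme.
IMGT_CDRS = {"CDR1": range(27, 39), "CDR2": range(56, 66), "CDR3": range(105, 118)}


def epitope_adherence(contacts: set[tuple[int, int]], epitope: set[int]) -> float:
    """Fraction of intended epitope residues touched by the antibody."""
    touched = {ag for _, ag in contacts}
    return len(touched & epitope) / len(epitope)


def cdr_engagement(contacts: set[tuple[int, int]], cdrs=IMGT_CDRS) -> float:
    """Fraction of contacting antibody residues that lie inside a CDR."""
    ab_residues = {ab for ab, _ in contacts}
    if not ab_residues:
        return 0.0
    in_cdr = {r for r in ab_residues if any(r in rng for rng in cdrs.values())}
    return len(in_cdr) / len(ab_residues)
```

A low CDR fraction often means the model docked through framework residues, which is rarely the intended binding mode.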
Complex prediction is expensive, so it runs only on candidates that passed earlier gates.
## The Numbers
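A typical campaign might generate 100,000 candidates. The figures here imply roughly 1 GPU-hour per structure prediction and 5 GPU-hours per complex model. Without gating:

- Structure prediction: ~100,000 GPU-hours
- Complex modeling: ~500,000 GPU-hours
- Total: ~600,000 GPU-hours

With intelligent gating:

- Sequence-level filters reject 40%: 60,000 proceed
- Structure filters reject 50%: 30,000 proceed
- Complex modeling on the top 5,000 only
- Total: ~85,000 GPU-hours at the same per-candidate rates (~60,000 for structure prediction plus ~25,000 for complex modeling), roughly a 7x reduction

If the structure gate uses a fast predictor such as ESMFold, that stage costs far less per candidate. The total then approaches the ~25,000 GPU-hours of complex modeling, close to a 24x reduction.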
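The funnel arithmetic is easy to get wrong by hand, so a small calculator helps. This sketch pushes a candidate count through ordered stages. Each stage has a per-candidate cost, a pass fraction, and an optional cap such as "top 5,000". The per-candidate rates are the ones implied by the figures above, not measurements. `Stage` and `run_funnel` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stage:
    name: str
    gpu_hours_each: float           # cost to process one candidate at this stage
    keep_fraction: float = 1.0      # fraction of processed candidates that pass
    keep_max: Optional[int] = None  # optional cap on survivors, e.g. "top 5,000"


def run_funnel(n: int, stages: list[Stage]) -> float:
    """Total GPU-hours for pushing n candidates through the stages in order."""
    total = 0.0
    for s in stages:
        cost = n * s.gpu_hours_each
        total += cost
        survivors = round(n * s.keep_fraction)
        if s.keep_max is not None:
            survivors = min(survivors, s.keep_max)
        print(f"{s.name:<32} in={n:>7,}  cost={cost:>9,.0f} GPU-h  out={survivors:>7,}")
        n = survivors
    return total


# Per-candidate rates implied by the figures above (assumptions, not measurements).
STRUCTURE_H, COMPLEX_H = 1.0, 5.0

ungated = run_funnel(100_000, [
    Stage("structure prediction", STRUCTURE_H),
    Stage("complex modeling", COMPLEX_H),
])
gated = run_funnel(100_000, [
    Stage("sequence filters", 0.0, keep_fraction=0.6),
    Stage("structure prediction + filters", STRUCTURE_H, keep_fraction=0.5),
    Stage("rank, keep top 5,000", 0.0, keep_max=5_000),
    Stage("complex modeling", COMPLEX_H),
])
print(f"ungated {ungated:,.0f} GPU-h, gated {gated:,.0f} GPU-h, {ungated / gated:.1f}x less")
```

The last line prints `ungated 600,000 GPU-h, gated 85,000 GPU-h, 7.1x less`. Lowering `STRUCTURE_H` to model a fast predictor shows how quickly complex modeling becomes the dominant cost.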
## Implementation Considerations
Good gating systems need:
- **Tunability** - Different projects have different tolerances
- **Transparency** - Track why candidates were rejected
- **Adaptivity** - Learn from experimental results
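Tunability and transparency fall out naturally if thresholds live in data rather than code, and every failure is logged by name. The sketch below covers those two properties. Adaptivity would also need experimental labels to recalibrate thresholds, so it is out of scope here. The check names, metric keys and thresholds are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    name: str                     # label recorded when a candidate fails
    metric: str                   # key into the candidate's metrics dict
    threshold: float
    higher_is_better: bool = True


def failed_checks(metrics: dict, checks: list[Check]) -> list[str]:
    """Names of all checks the candidate fails; a missing metric counts as a failure."""
    failures = []
    for c in checks:
        value = metrics.get(c.metric)
        if value is None:
            failures.append(f"{c.name} (missing {c.metric})")
        elif (value < c.threshold) if c.higher_is_better else (value > c.threshold):
            failures.append(c.name)
    return failures


def run_gate(candidates: dict[str, dict], checks: list[Check]):
    """Split candidates into passed IDs, a per-candidate rejection log, and failure counts."""
    passed, log = [], {}
    for cid, metrics in candidates.items():
        failures = failed_checks(metrics, checks)
        if failures:
            log[cid] = failures
        else:
            passed.append(cid)
    summary = Counter(f for fs in log.values() for f in fs)
    return passed, log, summary
```

Evaluating every check, instead of stopping at the first failure, costs a little more. In return, the summary shows which gates do the most rejecting, which is the information needed to retune them.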
The goal isn't to reject everything, but to reject the right things—candidates that would fail expensive downstream steps or, worse, fail in wet-lab validation.
## Conclusion
Quality gating transforms computational antibody discovery from theoretically interesting to practically useful. The models can generate endless candidates; gating ensures computational resources focus on the ones worth pursuing.