Fabricagen | The Protein Evolution Foundry

## The Funnel Problem

High-throughput computational antibody design can generate millions of candidate sequences. But downstream steps—structure prediction, complex modeling, affinity scoring—are expensive. A single ColabFold prediction might take minutes on a GPU. Multiply that by a million candidates and you have a problem.

The solution is intelligent quality gating: filtering out low-quality candidates early, before they consume expensive resources.

## Sequence-Level Filters

The cheapest filters operate on sequence alone:

**Biophysical Properties**
- **Charge distribution** - Extreme net charge can cause aggregation or off-target binding
- **Hydrophobicity** - Excessive hydrophobic patches correlate with expression problems
- **Instability index** - Estimates in vitro stability from dipeptide composition; values above 40 suggest an unstable protein
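
The charge and hydrophobicity checks can be sketched in a few lines of Python. This is a minimal sketch, not a calibrated filter. The net charge is a crude residue count rather than a pKa-based calculation, GRAVY uses the Kyte-Doolittle scale, and `passes_biophysical` and its thresholds are hypothetical. The instability index is omitted because it needs a 400-entry dipeptide weight table. Biopython's `ProteinAnalysis.instability_index()` implements it.

```python
# Kyte-Doolittle hydropathy values per residue (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def approx_net_charge(seq: str) -> int:
    """Crude charge near neutral pH: +1 per K/R, -1 per D/E.

    Ignores histidine, the termini, and pKa shifts, so it is only a rough screen.
    """
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return positive - negative


def passes_biophysical(seq: str, max_abs_charge: int = 10, max_gravy: float = 0.0) -> bool:
    """Hypothetical gate; both thresholds are placeholders to tune per project."""
    return abs(approx_net_charge(seq)) <= max_abs_charge and gravy(seq) <= max_gravy
```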

**Liability Detection**
- **Deamidation sites** - NG and NS motifs are prone to chemical degradation
- **Oxidation sites** - Exposed methionines are vulnerable to oxidation
- **Glycosylation sites** - N-X-S/T motifs (X any residue except proline) may cause glycan heterogeneity
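
Liability motifs are short sequence patterns, so regular expressions cover them. This is a minimal sketch that assumes plain one-letter sequences. `find_liabilities` is a hypothetical helper. The oxidation pattern flags every methionine because exposure cannot be judged from sequence alone.

```python
import re

# One pattern per liability class. The zero-width lookahead in the glycosylation
# pattern lets overlapping sequons (e.g. "NNST" contains NNS and NST) both be reported.
LIABILITY_PATTERNS = {
    "deamidation": re.compile(r"N[GS]"),
    "oxidation": re.compile(r"M"),  # every Met; exposure needs a structure
    "n_glycosylation": re.compile(r"(?=N[^P][ST])"),
}


def find_liabilities(seq: str) -> dict[str, list[int]]:
    """Return 0-based start positions of each liability motif in seq."""
    return {name: [m.start() for m in pattern.finditer(seq)]
            for name, pattern in LIABILITY_PATTERNS.items()}
```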

**Sequence Complexity**
- **Low-complexity regions** - Repetitive sequences often misfold
- **Unusual amino acid composition** - Deviations from natural distributions signal problems
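
One common way to flag low-complexity regions is a sliding-window Shannon entropy. Repetitive stretches use few distinct residues and therefore score low. This sketch assumes that approach. The window size and entropy cutoff are hypothetical defaults, and composition-based checks are not shown.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits) of the residue frequencies in a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(seq: str, window: int = 12, min_entropy: float = 2.2) -> list[int]:
    """Start positions of windows whose entropy falls below min_entropy.

    window and min_entropy are hypothetical defaults, not calibrated values.
    """
    return [i for i in range(len(seq) - window + 1)
            if shannon_entropy(seq[i:i + window]) < min_entropy]
```

A window of 12 residues that are all distinct scores log2(12), about 3.58 bits, while a homopolymer run scores 0.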

These filters are nearly free computationally but can reject 30-50% of generated candidates.

## Structure-Level Filters

After fast structure prediction (e.g., ESMFold), additional filters apply:

**Confidence Metrics**
- **pLDDT scores** - Per-residue prediction confidence; low values indicate low confidence and often flag disordered regions
- **pTM scores** - Predicted TM-score, a measure of confidence in the overall topology
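
Structure predictors in the AlphaFold family commonly write per-residue pLDDT into the B-factor column of their PDB output, so a confidence gate can read it straight from the file. This minimal sketch assumes that convention and a 0-100 pLDDT scale. Some tools emit 0-1, so rescale if needed. It also assumes pTM is supplied separately. The thresholds in the hypothetical `passes_confidence` are placeholders.

```python
from statistics import mean


def ca_plddt_from_pdb(pdb_text: str) -> list[float]:
    """Read per-residue pLDDT from the B-factor field of CA atoms."""
    values = []
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns: atom name in 13-16, B-factor in 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    return values


def passes_confidence(plddt: list[float], ptm: float,
                      min_mean_plddt: float = 70.0,
                      low_cutoff: float = 50.0,
                      max_low_fraction: float = 0.1,
                      min_ptm: float = 0.5) -> bool:
    """Hypothetical gate on mean pLDDT, fraction of very low residues, and pTM."""
    if not plddt:
        return False
    low_fraction = sum(v < low_cutoff for v in plddt) / len(plddt)
    return (mean(plddt) >= min_mean_plddt
            and low_fraction <= max_low_fraction
            and ptm >= min_ptm)
```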

**Structural Properties**
- **Compactness** - Radius of gyration consistent with the expected size
- **Exposed hydrophobics** - Solvent-accessible hydrophobic surface area
- **Disulfide geometry** - Correct cysteine pairing and bond geometry
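
Compactness and disulfide geometry reduce to distance calculations on predicted coordinates. This sketch assumes coordinates in Å. The expected radius of gyration uses the empirical scaling Rg ≈ 2.2 * N**0.38 Å often quoted for globular proteins. The tolerance factor and the S-S distance window around the ~2.05 Å disulfide bond length are assumptions to calibrate. Hydrophobic SASA needs a surface calculation and is not shown.

```python
import math

Coord = tuple[float, float, float]


def radius_of_gyration(coords: list[Coord]) -> float:
    """Unweighted radius of gyration, e.g. over CA atoms."""
    n = len(coords)
    cx = sum(c[0] for c in coords) / n
    cy = sum(c[1] for c in coords) / n
    cz = sum(c[2] for c in coords) / n
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
                         for x, y, z in coords) / n)


def is_compact(coords: list[Coord], tolerance: float = 1.3) -> bool:
    """Compare Rg with an assumed globular scaling law, Rg ~ 2.2 * N**0.38 Å."""
    expected = 2.2 * len(coords) ** 0.38
    return radius_of_gyration(coords) <= tolerance * expected


def disulfide_ok(sg_a: Coord, sg_b: Coord, lo: float = 1.9, hi: float = 2.2) -> bool:
    """Check the SG-SG distance of a paired cysteine against an assumed window."""
    return lo <= math.dist(sg_a, sg_b) <= hi
```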

Structure filters add modest computational cost but catch candidates with folding problems invisible to sequence analysis.

## Complex-Level Filters

For antibody-antigen complexes:

**Interface Quality**
- **Contact count** - Minimum buried interface between antibody and antigen
- **Interface confidence** - High pLDDT at the binding interface
- **Shape complementarity** - Geometric fit between surfaces
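
Contact count and interface confidence can be computed from the complex's atom coordinates. This sketch makes several assumptions. Atoms arrive as (chain, residue number, xyz) tuples with hydrogens already removed, the contact cutoff is 5 Å, and the thresholds are hypothetical. Shape complementarity requires molecular-surface calculations and is left to dedicated tools.

```python
import math

ResId = tuple[str, int]                              # (chain ID, residue number)
Atom = tuple[str, int, tuple[float, float, float]]   # (chain ID, residue number, xyz)
Contact = tuple[ResId, ResId]                        # (antibody residue, antigen residue)


def interface_contacts(antibody: list[Atom], antigen: list[Atom],
                       cutoff: float = 5.0) -> set[Contact]:
    """Residue pairs with at least one atom pair closer than cutoff (Å).

    Brute force O(n*m); a production pipeline would use a spatial index instead.
    """
    contacts = set()
    for ab_chain, ab_res, ab_xyz in antibody:
        for ag_chain, ag_res, ag_xyz in antigen:
            if math.dist(ab_xyz, ag_xyz) < cutoff:
                contacts.add(((ab_chain, ab_res), (ag_chain, ag_res)))
    return contacts


def interface_plddt(contacts: set[Contact], plddt: dict[ResId, float]) -> float:
    """Mean pLDDT over every residue that takes part in at least one contact."""
    residues = {res for pair in contacts for res in pair}
    return sum(plddt[res] for res in residues) / len(residues)


def passes_interface(contacts: set[Contact], plddt: dict[ResId, float],
                     min_contacts: int = 10, min_interface_plddt: float = 70.0) -> bool:
    """Hypothetical gate: enough residue contacts and a confidently predicted interface."""
    if not contacts or len(contacts) < min_contacts:
        return False
    return interface_plddt(contacts, plddt) >= min_interface_plddt
```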

**Binding Mode**
- **Epitope adherence** - Does the antibody contact the intended epitope residues?
- **CDR engagement** - Are CDR loops making productive contacts?
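
Once the contacting residue pairs are known, both binding-mode questions become set operations. This sketch assumes contacts are (antibody residue, antigen residue) pairs keyed by (chain, residue number). It uses Kabat heavy-chain CDR ranges as an example definition, ignoring insertion codes. The function names and thresholds are hypothetical, and the real ranges depend on the numbering scheme a pipeline uses.

```python
ResId = tuple[str, int]          # (chain ID, residue number)
Contact = tuple[ResId, ResId]    # (antibody residue, antigen residue)

# Example CDR definition: Kabat heavy-chain ranges, ignoring insertion codes.
EXAMPLE_HEAVY_CDRS = {"H1": ("H", 31, 35), "H2": ("H", 50, 65), "H3": ("H", 95, 102)}


def epitope_coverage(contacts: set[Contact], epitope: set[ResId]) -> float:
    """Fraction of intended epitope residues contacted by the antibody."""
    if not epitope:
        return 0.0
    touched = {ag for _, ag in contacts}
    return len(touched & epitope) / len(epitope)


def cdr_fraction(contacts: set[Contact], cdr_ranges: dict[str, tuple[str, int, int]]) -> float:
    """Fraction of contacting antibody residues that fall inside a CDR range."""
    ab_residues = {ab for ab, _ in contacts}
    if not ab_residues:
        return 0.0
    in_cdr = [
        (chain, num) for chain, num in ab_residues
        if any(chain == c and lo <= num <= hi for c, lo, hi in cdr_ranges.values())
    ]
    return len(in_cdr) / len(ab_residues)


def passes_binding_mode(contacts: set[Contact], epitope: set[ResId],
                        cdr_ranges=EXAMPLE_HEAVY_CDRS,
                        min_epitope: float = 0.5, min_cdr: float = 0.7) -> bool:
    """Hypothetical gate combining epitope adherence and CDR engagement."""
    return (epitope_coverage(contacts, epitope) >= min_epitope
            and cdr_fraction(contacts, cdr_ranges) >= min_cdr)
```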

Complex prediction is expensive, so it runs only on candidates that passed earlier gates.

## The Numbers

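A typical campaign might generate 100,000 candidates. As a rough illustration, assume ~1 GPU-hour per structure prediction and ~5 GPU-hours per complex model, which are the rates implied by the ungated figures below.

Without gating:
- Structure prediction: ~100,000 GPU-hours
- Complex modeling: ~500,000 GPU-hours
- Total: ~600,000 GPU-hours

With intelligent gating:
- Sequence-level filters reject 40%: 60,000 proceed to structure prediction (~60,000 GPU-hours)
- Structure filters reject 50%: 30,000 proceed
- Complex modeling runs on the top 5,000 only (~25,000 GPU-hours)
- Total: ~85,000 GPU-hours, roughly a 7x reduction

Structure prediction is now the largest remaining term, so using a faster predictor such as ESMFold for the structure gate would shrink the total further.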
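
Funnel arithmetic is easy to get wrong, so it helps to compute it explicitly. This sketch assumes per-candidate costs of ~1 GPU-hour for structure prediction and ~5 GPU-hours for complex modeling. Those are the rates implied by ungated totals of ~100,000 and ~500,000 GPU-hours for 100,000 candidates. The sketch also treats sequence filters as free. `CostModel`, `ungated_cost`, and `gated_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    structure_gpu_h: float = 1.0  # assumed GPU-hours per structure prediction
    complex_gpu_h: float = 5.0    # assumed GPU-hours per complex model


def ungated_cost(n: int, cost: CostModel) -> float:
    """Every candidate gets both structure prediction and complex modeling."""
    return n * (cost.structure_gpu_h + cost.complex_gpu_h)


def gated_cost(n: int, seq_reject: float, struct_reject: float,
               top_k: int, cost: CostModel) -> float:
    """Sequence filters are treated as free; structure prediction runs on their
    survivors, and complex modeling on at most top_k structure-gate survivors."""
    after_seq = round(n * (1 - seq_reject))
    after_struct = round(after_seq * (1 - struct_reject))
    n_complex = min(top_k, after_struct)
    return after_seq * cost.structure_gpu_h + n_complex * cost.complex_gpu_h


cost = CostModel()
before = ungated_cost(100_000, cost)
after = gated_cost(100_000, seq_reject=0.4, struct_reject=0.5, top_k=5_000, cost=cost)
print(f"{before:,.0f} -> {after:,.0f} GPU-hours ({before / after:.1f}x)")
# prints: 600,000 -> 85,000 GPU-hours (7.1x)
```

Under these assumptions the structure-prediction term (60,000 GPU-hours) dominates the gated total. Lowering `structure_gpu_h` to reflect a cheaper predictor for that gate is therefore the main lever for further savings.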

## Implementation Considerations

Good gating systems need:

- **Tunability** - Different projects have different tolerances, so thresholds must be adjustable
- **Transparency** - Track why each candidate was rejected
- **Adaptivity** - Learn from experimental results
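
Tunability and transparency fall out naturally if each gate is data (a metric, a threshold, a direction) and the runner records why each candidate stopped. This is a minimal sketch with hypothetical `Gate`, `GateReport`, and `run_gates` names. Gates run cheapest-first and each candidate stops at its first failure, so expensive scores are only computed for survivors. Adaptivity, such as refitting thresholds against wet-lab outcomes, is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Gate:
    name: str
    score: Callable[[str], float]  # computes one metric for a sequence; may be expensive
    threshold: float               # tunable per project
    higher_is_better: bool = True

    def passes(self, value: float) -> bool:
        return value >= self.threshold if self.higher_is_better else value <= self.threshold


@dataclass
class GateReport:
    passed: list[str] = field(default_factory=list)
    rejected: dict[str, str] = field(default_factory=dict)  # candidate ID -> reason


def run_gates(candidates: dict[str, str], gates: list[Gate]) -> GateReport:
    """Apply gates in order; a candidate stops at its first failed gate."""
    report = GateReport()
    for cid, seq in candidates.items():
        for gate in gates:
            value = gate.score(seq)
            if not gate.passes(value):
                report.rejected[cid] = f"{gate.name}: {value:g} vs threshold {gate.threshold:g}"
                break
        else:
            report.passed.append(cid)
    return report


gates = [
    Gate("length", len, threshold=5),
    Gate("methionines", lambda s: s.count("M"), threshold=0, higher_is_better=False),
]
report = run_gates({"a": "ACDEFG", "b": "ACD", "c": "MMMMMM"}, gates)
# report.passed == ["a"]
# report.rejected == {"b": "length: 3 vs threshold 5", "c": "methionines: 6 vs threshold 0"}
```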

The goal isn't to reject everything, but to reject the right things—candidates that would fail expensive downstream steps or, worse, fail in wet-lab validation.

## Conclusion

Quality gating transforms computational antibody discovery from theoretically interesting to practically useful. The models can generate endless candidates; gating ensures computational resources focus on the ones worth pursuing.
