AbXtract is a cloud-based bioinformatics platform that lets you rapidly analyze any antibody-based sequencing dataset. Both Sanger and NGS files can be analyzed, and physical clones identified by Sanger sequencing can be correlated with their NGS cluster equivalents. Sequence files are uploaded to, and analyzed in, the AWS cloud within OpenEye’s Orion, which makes sharing data, results and analyses straightforward and secure. Once analysis is complete, gene-synthesis-ready files of potential leads are generated, so synthesis of your leads can be outsourced quickly.


At Specifica we’ve been using next-generation sequencing (NGS) to quality-control naïve libraries and analyze selection outputs for years, developing our own in-house software to do so. As the value of NGS in antibody discovery became clearer, and our platform became more straightforward to use, we realized this was software we should share, to make the use of NGS in antibody discovery more widely available.

Software is not our primary business, so we partnered with OpenEye, an international software company also headquartered in Santa Fe, New Mexico, to co-market what we named AbXtract.

OpenEye is an industry leader in computational molecular design for drug discovery, offering Orion, the only cloud-native fully integrated software-as-a-service molecular modeling platform. By integrating AbXtract into the Orion platform, we were able to bring Orion’s unlimited computation and storage, as well as the powerful data sharing, visualization and analysis tools to antibody discovery, all within an open development platform.

The percentage of binding clones (≤100nM measured by SPR) versus non-binding clones (>100nM) at different percentage clone abundances

Why use NGS and AbXtract rather than random colony picking?

Random colony picking inherently undersamples a selection output

  • Even the best equipped labs can’t pick more than 30,000 colonies
  • Diversity is always skewed
  • Shared sequence identity within clusters increases redundancy

If 90% of clones are represented by just a few sequences, accessing the full diversity of an output by clone picking is next to impossible

  • Even more so if those sequences belong to the same cluster

NGS samples a complete output (or immune response)

AbXtract identifies unique clones and clusters binding the target of interest, as shown in the figure

  • At ≥0.001% abundance, >90% of antibodies usually bind the target
  • At <0.001% abundance, antibodies may or may not bind the target
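
The abundance threshold above can be illustrated with a minimal sketch. It assumes clone abundance is simply the percentage of total reads carrying a given sequence; the function names and the toy CDR3-like strings are hypothetical and are not taken from AbXtract.

```python
from collections import Counter

def clone_abundance(reads):
    """Return {sequence: percent of total reads} for a list of clone sequences."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {seq: 100.0 * n / total for seq, n in counts.items()}

def split_by_threshold(abundance, threshold_pct=0.001):
    """Partition clones into those at or above the threshold and those below it."""
    above = {seq for seq, pct in abundance.items() if pct >= threshold_pct}
    below = set(abundance) - above
    return above, below

# Hypothetical toy data: 200,000 reads, one clone seen only once (0.0005%).
reads = ["CARDYW"] * 150_000 + ["CARGSW"] * 49_999 + ["CTRHYW"]
abundance = clone_abundance(reads)
above, below = split_by_threshold(abundance)
```

In this toy example the singleton clone falls below the 0.001% threshold, so its binding would be treated as uncertain.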

The highest affinity antibodies are not always the most abundant

NGS and AbXtract can generate up to fifty times more leads than picking random clones


The AbXtract Advantage

Uncover more leads: Increase the number of cluster leads five- to fifty-fold compared to random colony screening

Increase sequence diversity and cluster representation: Explore the entire sequence diversity within selected populations, even rare clones typically missed by low-throughput methods that favor more abundant antibodies

Prioritize promising leads: Tie in known data, and even low-throughput assay data, to prioritize leads with the most favorable developability and biophysical profiles

Minimize costs: Achieve high throughput for a fraction of the cost of conventional assay runs

Equip the entire discovery team: Automated workflows for novice users, while allowing expert users to fully configure their settings

Two antibodies from the same cluster, one with sequence liabilities and self association, the other without sequence liabilities or self association
Antibody clusters identified by AbXtract. For each cluster (in columns), the different antibodies and their read count are indicated in the rows, with the cluster sequence logo indicated above each of the boxed clusters

Why cluster antibody sequences?

Antibody sequences are clustered by machine learning using physicochemical amino acid properties

Antibodies in the same cluster are expected to have the same paratope and hence bind the same epitope

Exploring additional antibodies in the same cluster can yield antibodies with the same biological activity, but spanning a range of affinities, or lacking sequence liabilities

Antibodies in different clusters are more likely to bind different epitopes, but can also bind overlapping epitopes
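
The idea of grouping sequences by physicochemical similarity can be sketched as follows. The page does not describe AbXtract's machine learning method, so this example swaps in a much simpler technique: each residue is encoded by its Kyte-Doolittle hydropathy and its charge, and sequences are grouped by greedy threshold clustering. The distance function, the `max_dist` threshold and the example CDR3 sequences are all assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy values per amino acid.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}  # all other residues treated as neutral

def distance(a, b):
    """Mean per-position physicochemical distance between two sequences.

    Sequences of different length are never clustered together (an assumption
    of this sketch). Hydropathy spans 9 units and charge spans 2 units.
    """
    if len(a) != len(b):
        return float("inf")
    total = 0.0
    for x, y in zip(a, b):
        total += abs(KD[x] - KD[y]) / 9.0
        total += abs(CHARGE.get(x, 0) - CHARGE.get(y, 0)) / 2.0
    return total / len(a)

def greedy_cluster(seqs, max_dist=0.15):
    """Assign each sequence to the first cluster whose representative is close enough."""
    clusters = []  # each cluster is a list; element 0 is its representative
    for s in seqs:
        for cluster in clusters:
            if distance(s, cluster[0]) <= max_dist:
                cluster.append(s)
                break
        else:
            clusters.append([s])
    return clusters

# Hypothetical CDR3 sequences.
seqs = ["CARDYYGMDVW", "CARDYFGMDVW", "CAKDYYGLDVW", "CTRGGSWFDPW"]
clusters = greedy_cluster(seqs)
```

Here the first three sequences differ only by conservative substitutions and end up in one cluster, while the fourth forms its own.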

A real-life AbXtract example

Picking 96 clones from an output selected against SARS-CoV-2 S1 identified 31 different clones, corresponding to 19 clusters

  • 23 clones were tested as IgG, and all recognized RBD
  • The affinities of these picked antibodies ranged from 34 pM to 3.3 nM, with two antibodies better than 100 pM as measured by surface plasmon resonance
Isoaffinity plot of 13 picked clones, or 143 clones identified by AbXtract, synthesized and expressed. Affinities are indicated by the diagonal lines

AbXtract analysis of PacBio next-generation sequencing of the selections against RBD identified an additional 328 clusters

  • 173 of 214 non-redundant clones (81%), representing 70 additional clusters, expressed as IgG recognized RBD
  • The affinities of these NGS-identified clones ranged from 21 pM to 645 nM as measured by surface plasmon resonance, with 31 antibodies having affinities better than 100 pM. However, many clones showed flat dissociation curves by SPR, meaning their off-rates were too slow for the SPR capture kinetics to resolve
Affinities for the same antibodies measured by KinExA or Carterra LSA are indicated in pM. The blue box represents affinity measurements for the two methods within two-fold of one another

The affinities of 40 clones were measured again by kinetic exclusion (KinExA), with 42.5% showing higher affinities by kinetic exclusion, 52.5% approximately the same (within two-fold) and 5% showing higher affinities by surface plasmon resonance. The highest affinity measured by KinExA was 13 pM: remarkable for an antibody selected directly from a naïve semi-synthetic antibody library without optimization.
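
For 40 clones, these percentages correspond to 17, 21 and 2 antibodies respectively. The two-fold comparison can be sketched as below, assuming each antibody has one KD value per method in pM and that a lower KD means a higher affinity. The function name and the example values are hypothetical, not measured data.

```python
from collections import Counter

def compare_affinities(kd_kinexa_pm, kd_spr_pm, fold=2.0):
    """Classify one paired KD measurement (lower KD = higher affinity)."""
    ratio = kd_spr_pm / kd_kinexa_pm
    if ratio > fold:
        return "higher by KinExA"
    if ratio < 1.0 / fold:
        return "higher by SPR"
    return "similar"

# Hypothetical (KinExA KD, SPR KD) pairs in pM.
pairs = [(13.0, 60.0), (100.0, 150.0), (500.0, 200.0)]
tally = Counter(compare_affinities(kinexa, spr) for kinexa, spr in pairs)
```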

Using AbXtract: Get Leads From Your Data In Three Easy Steps

Upload sequence files into AbXtract

Up to tens of millions of sequences

  • Sanger, PacBio and MiSeq. HiSeq and NovaSeq coming Q3 2022
  • Low-throughput: FASTA, FASTQ, Excel, TSV, CSV

FASTQ quality filtering
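
As an illustration of what FASTQ quality filtering involves, here is a minimal sketch that keeps reads whose mean Phred score meets a cutoff. It assumes four-line FASTQ records with Phred+33 encoding; the cutoff of 20 and the function names are assumptions, and AbXtract's own filtering criteria are not described on this page.

```python
import io

def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return  # end of file
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual

def mean_phred(qual, offset=33):
    """Mean Phred quality score of a quality string (Phred+33 by default)."""
    return sum(ord(c) - offset for c in qual) / len(qual)

def quality_filter(records, min_mean_q=20):
    """Keep (name, sequence) for reads whose mean quality meets the cutoff."""
    return [(name, seq) for name, seq, qual in records
            if qual and mean_phred(qual) >= min_mean_q]

# Two toy records: 'I' encodes Q40 and '!' encodes Q0.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!!!\n")
kept = quality_filter(read_fastq(example))
```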

Simple demultiplexing of highly multiplexed experiments
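
Demultiplexing can be sketched as assigning each read to a sample by its barcode. The example below assumes exact-match barcodes at the 5' end of each read; the sample names and barcode sequences are hypothetical, and real experiments often need mismatch tolerance or dual barcodes, which this sketch leaves out.

```python
def demultiplex(reads, barcodes):
    """Group reads by exact-match 5' barcode and trim the barcode off."""
    bins = {name: [] for name in barcodes}
    bins["unassigned"] = []
    for read in reads:
        for name, barcode in barcodes.items():
            if read.startswith(barcode):
                bins[name].append(read[len(barcode):])
                break
        else:
            # No barcode matched this read.
            bins["unassigned"].append(read)
    return bins

# Hypothetical barcodes for two selection rounds.
barcodes = {"round1": "ACGT", "round2": "TGCA"}
reads = ["ACGTGGGCCC", "TGCAAAATTT", "GGGGCCCCAA"]
bins = demultiplex(reads, barcodes)
```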

Annotate to identify and extract features – CDRs and framework regions of interest (IMGT, Kabat, Chothia, or custom annotations)

Produce annotated records ideally suited for antibody discovery

The percentage of binding clones (≤100nM measured by SPR) versus non-binding clones (>100nM) at different enrichment levels.

Cluster antibody sequences

Hundreds to tens of thousands of sequence clusters identified by machine learning

Extract features: CDRs / frameworks (IMGT, Kabat, Chothia, or custom annotations)

Identify and avoid contaminants

Eliminate bad sequences (e.g., seen before or with known biophysical liabilities)

Classify functionality: open reading frames, stops, frameshifts etc.
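
A functionality check of this kind can be sketched by translating each nucleotide sequence with the standard genetic code. The sketch assumes the input has already been trimmed to start in frame, so a length that is not a multiple of three is reported as a frameshift; that rule and the category labels are simplifying assumptions, not AbXtract's classification scheme.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(nt_seq):
    """Translate an in-frame nucleotide sequence; unknown codons become 'X'."""
    nt_seq = nt_seq.upper()
    return "".join(CODON_TABLE.get(nt_seq[i:i + 3], "X")
                   for i in range(0, len(nt_seq) - len(nt_seq) % 3, 3))

def classify(nt_seq):
    """Label a trimmed, in-frame amplicon as frameshifted, stopped, or open."""
    if len(nt_seq) % 3 != 0:
        return "frameshift"
    if "*" in translate(nt_seq):
        return "stop codon"
    return "open reading frame"
```

For example, `classify("GAGGTGCAG")` reports an open reading frame, while `classify("GAGTAGCAG")` reports a stop codon.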

Population, clone and cluster frequencies

Identify different germlines (or scaffolds in fixed framework libraries, e.g., Specifica Generation 3)

Leverage known data for sequence populations from different experimental conditions

  • Positive or negative selections
  • Round to round enrichment or depletion
  • Overlap between different target populations (e.g., species or target cross-reactivity)
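
Overlap between two target populations can be sketched with plain set operations. The example assumes each population is a collection of clone sequences (for instance CDR3s) and reports shared and population-specific clones plus a Jaccard index; the population names and sequences are hypothetical.

```python
def population_overlap(pop_a, pop_b):
    """Summarize shared and unique clones between two populations."""
    a, b = set(pop_a), set(pop_b)
    shared = a & b
    union = a | b
    return {
        "shared": shared,
        "only_a": a - b,
        "only_b": b - a,
        "jaccard": len(shared) / len(union) if union else 0.0,
    }

# Hypothetical selections against a human and a cynomolgus target.
human = ["CARDYW", "CARGSW", "CTRHYW"]
cyno = ["CARGSW", "CTRHYW", "CAKLLW"]
overlap = population_overlap(human, cyno)
```

Clones in the shared set would be candidates for species cross-reactivity in this toy setup.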

Correlate NGS defined clusters with Sanger-identified clones

Integrate with low-throughput screening-assay data for additional sequences with desired features

Choose optimal antibody leads

Two modes

  • Simple – indicate number of desired leads and AbXtract will provide a file of gene synthesis ready sequences optimized by AbXtract for diversity, abundance, enrichment, reduced redundancy and sequence liabilities
  • Expert – delve into the details, specify each of the parameters yourself and assess the effects of different factors
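
To give a feel for what a "simple" lead picker has to balance, here is a minimal sketch that takes the most abundant liability-free clone from each cluster and returns the requested number of leads. The record fields (`seq`, `cluster`, `reads`, `has_liability`) and the ranking rule are assumptions for illustration; AbXtract's actual optimization is not described on this page.

```python
def pick_leads(clones, n_leads):
    """Pick one lead per cluster: the most abundant clone without liabilities."""
    best = {}
    for clone in clones:
        if clone["has_liability"]:
            continue  # skip clones with flagged sequence liabilities
        current = best.get(clone["cluster"])
        if current is None or clone["reads"] > current["reads"]:
            best[clone["cluster"]] = clone
    # Rank cluster representatives by read count, most abundant first.
    ranked = sorted(best.values(), key=lambda c: c["reads"], reverse=True)
    return [c["seq"] for c in ranked[:n_leads]]

# Hypothetical clone records.
clones = [
    {"seq": "CARDYW", "cluster": 1, "reads": 500, "has_liability": True},
    {"seq": "CARDFW", "cluster": 1, "reads": 120, "has_liability": False},
    {"seq": "CTRHYW", "cluster": 2, "reads": 300, "has_liability": False},
    {"seq": "CAKLLW", "cluster": 3, "reads": 5, "has_liability": False},
]
leads = pick_leads(clones, n_leads=2)
```

Note that cluster 1 is represented by its second most abundant clone, because the most abundant one carries a liability.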

Identify best performers using NGS metrics derived from experimentally validated studies

Maximize the selection of leads with distinct sequence-based properties

  • Perform unsupervised physicochemical clustering to identify clusters sharing similar binding modes
  • Easily select unexplored antibodies with likely distinct paratope space
  • Identify additional clones that share the same features as those characterized, such as antibodies from the same cluster lacking specific sequence liabilities, or antibodies from the same cluster spanning a broad affinity range

Quantify developability liabilities from common and customized references
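
Sequence liabilities are often expressed as short motifs, so a basic scan can be sketched with regular expressions. The three motifs below (N-linked glycosylation N-X-S/T with X not proline, NG deamidation and DG isomerization) are commonly cited examples chosen for illustration; they are not AbXtract's reference list, and the function name is hypothetical.

```python
import re

# Illustrative motif set; a real reference list would be larger and configurable.
LIABILITY_MOTIFS = {
    "N-glycosylation": re.compile(r"N[^P][ST]"),
    "deamidation": re.compile(r"NG"),
    "isomerization": re.compile(r"DG"),
}

def scan_liabilities(protein_seq):
    """Return {liability name: [0-based start positions]} for motifs that occur."""
    hits = {}
    for name, pattern in LIABILITY_MOTIFS.items():
        positions = [m.start() for m in pattern.finditer(protein_seq)]
        if positions:
            hits[name] = positions
    return hits

# Hypothetical CDR-like sequence containing all three motifs.
hits = scan_liabilities("CARDGYNGSW")
```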

Perform region-of-interest-based enrichment calculations across selection rounds / populations
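
An enrichment calculation across two selection rounds can be sketched as the log2 ratio of a region's frequency after versus before selection. The example assumes read counts keyed by a region of interest such as CDR3, and adds a pseudocount so that regions missing from one round do not cause division by zero; the pseudocount and function name are assumptions.

```python
import math

def log2_enrichment(counts_before, counts_after, pseudocount=1):
    """Log2 fold change in frequency for each region seen in either round."""
    regions = set(counts_before) | set(counts_after)
    # Totals include one pseudocount per region so frequencies sum to 1.
    total_before = sum(counts_before.values()) + pseudocount * len(regions)
    total_after = sum(counts_after.values()) + pseudocount * len(regions)
    result = {}
    for region in regions:
        f_before = (counts_before.get(region, 0) + pseudocount) / total_before
        f_after = (counts_after.get(region, 0) + pseudocount) / total_after
        result[region] = math.log2(f_after / f_before)
    return result

# Hypothetical CDR3 read counts in two consecutive rounds.
round1 = {"CARDYW": 10, "CARGSW": 90}
round2 = {"CARDYW": 80, "CARGSW": 20}
enrichment = log2_enrichment(round1, round2)
```

Positive values indicate enrichment between rounds and negative values indicate depletion.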

Identify population overlap at deeper sequencing depth

  • Identify the relative abundance of clones that bind distinct targets, cross-reactive species or variants
  • Identify NGS clones that map to already characterized populations