simultaneously reason over sequence, structure, and function
had to transform three-dimensional structure and function into discrete alphabets, and construct a way to write every three-dimensional structure as a sequence of letters
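a minimal sketch of what "writing structure as a sequence of letters" can look like, assuming a nearest-neighbour lookup against a codebook (vector quantization). The codebook values, the 2-D feature vectors, and the names `CODEBOOK`, `quantize`, and `tokenize` are all hypothetical and chosen for illustration; the notes above do not say how the actual tokenizer is built.

```python
import math

# Hypothetical toy codebook: each entry plays the role of one "letter"
# in a discrete structure alphabet. Real codebooks would be learned and
# far larger; these values are made up for illustration.
CODEBOOK = [
    (0.0, 0.0),
    (1.0, 0.0),
    (0.0, 1.0),
    (1.0, 1.0),
]


def quantize(vec, codebook=CODEBOOK):
    """Return the index of the codebook entry closest to vec (Euclidean)."""
    return min(range(len(codebook)), key=lambda i: math.dist(vec, codebook[i]))


def tokenize(features, codebook=CODEBOOK):
    """Map a list of continuous per-position features to discrete token ids."""
    return [quantize(v, codebook) for v in features]


# Made-up continuous features, one vector per position in the chain.
features = [(0.1, -0.2), (0.9, 0.1), (0.8, 0.95), (0.2, 0.7)]
tokens = tokenize(features)
print(tokens)
```

once every position is an integer id, structure can be fed to a transformer the same way amino-acid letters are.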
the ESM3 architecture is highly scalable due to its transformer backbone and all-to-all reasoning over discrete token sequences
at its largest scale, ESM3 was trained with 1.07e24 FLOPs on 2.78 billion proteins and 771 billion unique tokens, and has 98 billion parameters
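a rough back-of-envelope check on how these numbers relate, assuming the common rule of thumb that training compute is about 6 x parameters x tokens processed. That rule is an assumption brought in here, not something stated in the notes, and the helper name `implied_tokens_processed` is hypothetical.

```python
# Figures quoted in the notes above.
TRAIN_FLOPS = 1.07e24
PARAMS = 98e9
UNIQUE_TOKENS = 771e9

# ASSUMPTION (not from the source): compute ~= 6 * params * tokens_processed,
# a widely used approximation for dense transformer training.
FLOPS_PER_PARAM_PER_TOKEN = 6


def implied_tokens_processed(flops, params, k=FLOPS_PER_PARAM_PER_TOKEN):
    """Estimate how many tokens were processed, given total FLOPs and model size."""
    return flops / (k * params)


tokens_processed = implied_tokens_processed(TRAIN_FLOPS, PARAMS)
passes_over_unique = tokens_processed / UNIQUE_TOKENS

print(f"implied tokens processed: {tokens_processed:.2e}")
print(f"implied passes over unique tokens: {passes_over_unique:.1f}")
```

under that assumption the compute budget implies more tokens processed than there are unique tokens, i.e. more than one pass over the data. Treat this as an order-of-magnitude estimate only.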