This project contains an implementation of the Flash Attention 2 algorithm in CUDA that uses only CUDA cores, along with comparisons to naive attention implementations, Flash Attention ...