This project contains a comprehensive implementation of the Flash Attention 2 algorithm in CUDA, utilizing CUDA Cores ONLY!, along with comparisons to naive attention implementations, Flash Attention ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results