awesome-gpu
github.com/jokeren/awesome-gpu
Awesome resources for GPUs
GitHub Stars: 627
Curated Resources: 85
Categories: 6
Last Refreshed: 4 hours ago
Categories: Architecture, Algorithms, Applications, Tools, Runtime, Code Generation
Use this list with your AI agent
Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:
"Show me parallelism resources from awesome-gpu"
Installation instructions →
What's inside
Architecture
- Accelerate GPU Concurrent Kernel Execution by Mitigating Memory Pipeline Stalls (Parallelism)
- Adaptive and Transparent Cache Bypassing for GPUs (Cache)
- APRES: Improving Cache Efficiency by Exploiting Load Characteristics on GPUs (Cache)
- Controlled Kernel Launch for Dynamic Parallelism in GPUs (Parallelism)
- COOPERATIVE GROUPS (Parallelism)
- Dynamic GPGPU Power Management Using Adaptive Model Predictive Control (Resources Management)
Algorithms
- A Coordinated Tiling and Batching Framework for Efficient GEMM on GPU (BLAS)
- AN5D: Automated Stencil Framework for High-Degree Temporal Blocking on GPUs (Stencils)
- CUTLASS: CUDA TEMPLATE LIBRARY FOR DENSE LINEAR ALGEBRA AT ALL LEVELS AND SCALES (BLAS)
- Demystifying Tensor Cores to Optimize Half-Precision Matrix Multiply (BLAS)
- DEVELOPING CUDA KERNELS TO PUSH TENSOR CORES TO THE ABSOLUTE LIMIT ON NVIDIA A100 (BLAS)
- On Optimizing Complex Stencils on GPUs (Stencils)
Tools
- Allinea MAP (Profilers)
- Analyzing CUDA Workloads Using a Detailed GPU Simulator (Simulators)
- CUDAAdvisor: LLVM-based runtime profiling for modern GPUs (Profilers)
- Demystifying GPU Microarchitecture through Microbenchmarking (Benchmarking)
- Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking (Benchmarking)
- Effective sampling-driven performance tools for GPU-accelerated supercomputers (Profilers)
Runtime
Code Generation
- C-for-metal: high performance SIMD programming on Intel GPUs (Programming Models)
- Cooperative Profile Guided Optimizations (Profile Guided Optimization)
- Coordinating GPU Threads for OpenMP 4.0 in LLVM (Compilers)
- Decoding CUDA binary (Binaries)
- Domain-Specific Multi-Level IR Rewriting for GPU: The Open Earth Compiler for GPU-accelerated Climate Simulation (Compilers)
- Flexible software profiling of GPU architectures (Binaries)
Applications
- E.T.: re-thinking self-attention for transformer models on GPUs (Deep Learning)
- GNNAdvisor: An Adaptive and Efficient Runtime System for GNN Acceleration on GPUs (Deep Learning)
- Sparse GPU Kernels for Deep Learning (Deep Learning)
- SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks (Deep Learning)
- Towards Pervasive and User Satisfactory CNN across GPU Microarchitectures (Deep Learning)
- Understanding and bridging the gaps in current GNN performance optimizations (Deep Learning)
Showing a sample of 85 resources. View the full list on GitHub →