Context Awesome

awesome-gpu

github.com/jokeren/awesome-gpu

Awesome resources for GPUs

GitHub Stars: 636

Curated Resources: 85

Categories: 6

Last Refreshed: 30 min ago

Architecture, Algorithms, Applications, Tools, Runtime, Code Generation

Use this list with your AI agent

Add the Context Awesome MCP server to Claude, Cursor, or any MCP client, then ask:

"Show me parallelism resources from awesome-gpu"


What's inside

Architecture

Accelerate GPU Concurrent Kernel Execution by Mitigating Memory Pipeline Stalls (Parallelism)
Adaptive and Transparent Cache Bypassing for GPUs (Cache)
APRES: Improving Cache Efficiency by Exploiting Load Characteristics on GPUs (Cache)
Controlled Kernel Launch for Dynamic Parallelism in GPUs (Parallelism)
COOPERATIVE GROUPS (Parallelism)
Dynamic GPGPU Power Management Using Adaptive Model Predictive Control (Resources Management)

Algorithms

Tools

Allinea MAP (Profilers)
Analyzing CUDA Workloads Using a Detailed GPU Simulator (Simulators)
CUDAAdvisor: LLVM-based runtime profiling for modern GPUs (Profilers)
Demystifying GPU Microarchitecture through Microbenchmarking (Benchmarking)
Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking (Benchmarking)
Effective sampling-driven performance tools for GPU-accelerated supercomputers (Profilers)

Runtime

Code Generation

C-for-metal: high performance SIMD programming on Intel GPUs (Programming Models)
Cooperative Profile Guided Optimizations (Profile Guided Optimization)
Coordinating GPU Threads for OpenMP 4.0 in LLVM (Compilers)
Decoding CUDA binary (Binaries)
Domain-Specific Multi-Level IR Rewriting for GPU: The Open Earth Compiler for GPU-accelerated Climate Simulation (Compilers)
Flexible software profiling of GPU architectures (Binaries)

Applications

E.T.: re-thinking self-attention for transformer models on GPUs (Deep Learning)
GNNAdvisor: An Adaptive and Efficient Runtime System for GNN Acceleration on GPUs (Deep Learning)
Sparse GPU Kernels for Deep Learning (Deep Learning)
SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks (Deep Learning)
Towards Pervasive and User Satisfactory CNN across GPU Microarchitectures (Deep Learning)
Understanding and bridging the gaps in current GNN performance optimizations (Deep Learning)

Showing a sample of the 85 resources. The full list is on GitHub.