Efficient inference
Low-overhead runtimes, careful memory lifecycle, and reproducible latency and throughput measurements.
We build open-source software alongside research in language models, diffusion models, and systems ML, with an emphasis on measurable performance, honest benchmarks, and sustainable engineering.
What we work on
The Foundry brings together systems-minded contributors who care about what happens on real hardware, not only on slides.
Tools and experiments grounded in current LLM and diffusion research, shared as open code and docs.
Contributions are welcome where evidence and scope are clear; see each repository for its contribution guidelines.
Repositories
Active and upcoming initiatives under the Inference Foundry GitHub organization.
Terminal-native, in-process local LLM engine (no HTTP in the main UX). It uses llama.cpp via cgo, with a focus on low overhead and clean teardown.
Open research journal and experimental log.
Quantization theory, methods, and reproducible experiments across bit-widths and runtimes.
Open fine-tuned prompt catalog with versioning, licensing, and analysis for reuse.
Algorithms to detect AI-generated images using JEPA-based representations.
How we work
These expectations apply across our public spaces and reviews.
Contributors and collaborators should reach us on Discord first. For direct contact, message the founders through GitHub or LinkedIn.