LINUX.ORG.RU

Neural networks in C from the creator of Redis

Salvatore Sanfilippo has also gotten into neural networks.

https://github.com/antirez/iris.c:

Iris is an inference pipeline that generates images from text prompts using open-weights diffusion transformer models. It is implemented entirely in C, with zero external dependencies beyond the C standard library. MPS and BLAS acceleration are optional but recommended. On macOS a BLAS API ships with the system, so nothing extra is required.

The name comes from the Greek goddess Iris, messenger of the gods and personification of the rainbow.

Supported model families:

  • FLUX.2 Klein (by Black Forest Labs):
    • 4B distilled (4 steps, auto guidance set to 1, very fast).
    • 4B base (50 steps for max quality, or fewer. Classifier-Free Diffusion Guidance, much slower but more generation variety).
    • 9B distilled (4 steps, larger model, higher quality. Non-commercial license).
    • 9B base (50 steps, CFG, highest quality. Non-commercial license).
  • Z-Image-Turbo (by Tongyi-MAI):
    • 6B (8 NFE / 9 scheduler steps, no CFG, fast).
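The base FLUX.2 variants above use Classifier-Free Diffusion Guidance (CFG), which is why they need a second, unconditional forward pass per step and are much slower than the distilled models. A minimal sketch of the standard CFG blend (this is the textbook formula, not code from iris.c; the function name is hypothetical):

```c
#include <stddef.h>

/* Classifier-free guidance: blend the unconditional and text-conditioned
 * model outputs. With scale == 1 the result collapses to the conditional
 * prediction alone, which is why the distilled models (guidance set to 1)
 * can skip the unconditional pass entirely. */
static void cfg_combine(float *out, const float *uncond, const float *cond,
                        float scale, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = uncond[i] + scale * (cond[i] - uncond[i]);
}
```

At scale > 1 the prediction is pushed away from the unconditional output, trading variety for prompt adherence.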

https://github.com/antirez/qwen-asr:

This is a C implementation of the inference pipeline for Qwen3-ASR speech-to-text models (both 0.6B and 1.7B). It has zero external dependencies beyond the C standard library and a BLAS implementation (Accelerate on macOS, OpenBLAS on Linux). Tokens stream to stdout as they are generated. The implementation runs at a multiple of real time relative to the audio length, even on very modest hardware such as a low-end Intel or AMD processor.

Important: this implementation explicitly avoids implementing support for MPS. Transcription systems are very important pieces of infrastructure, and are often run on remote Linux servers. Adding the MPS target would focus efforts too much on Apple hardware, so for now I’m skipping it. The code already runs very well on Apple hardware anyway (NEON optimized). Please don’t send pull requests for this feature; fork the code instead if you want to add MPS support. I’ll add it much later, once the other optimizations are mature.

Supported modes and models

Both normal (offline) and streaming (online) modes are supported. Normal mode defaults to full offline decode (-S 0), so the whole audio is encoded at once. Streaming mode processes audio in 2-second chunks with prefix rollback (it keeps the last few decoded tokens as context for the decoder/LLM when transcribing the next chunk).
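The prefix-rollback idea described above can be sketched in a few lines: after decoding a chunk, only the last few tokens are carried over as decoder context for the next 2-second chunk. This is an illustrative sketch under that description, not the project's actual code; `CTX_KEEP` and the function name are hypothetical:

```c
#include <string.h>

#define CTX_KEEP 8  /* hypothetical: how many tokens survive the rollback */

/* After a chunk is decoded, keep only the last CTX_KEEP tokens in place as
 * the prompt prefix for the next chunk, so the decoder retains local
 * context without re-attending to the entire transcript so far.
 * Returns the new prefix length. */
static int rollback_prefix(int *tokens, int ntokens) {
    int keep = ntokens < CTX_KEEP ? ntokens : CTX_KEEP;
    memmove(tokens, tokens + ntokens - keep, (size_t)keep * sizeof *tokens);
    return keep;
}
```

The trade-off is the one the next paragraph mentions: re-decoding with a short rolling prefix keeps emitted tokens stable, but costs throughput compared to one offline pass.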

Important practical note: in this implementation, interactive --stream prioritizes incremental token stability over throughput and can be much slower than normal mode when you process an already-recorded file end-to-end.

Audio can be piped from stdin (--stdin), making it easy to transcode and transcribe any format via ffmpeg. Language is usually auto-detected from audio, and can be forced with --language. A system prompt can bias the model toward specific terms or spellings.

Both the 0.6B and 1.7B parameter models are supported. While the 1.7B model is generally more powerful, the 0.6B model seems to be the sweet spot for CPU inference. The speed difference is not huge, however, so you may want to try both and decide which to use depending on your use case.


https://github.com/antirez/voxtral.c:

This is a C implementation of the inference pipeline for Mistral AI’s Voxtral Realtime 4B model. It has zero external dependencies beyond the C standard library. MPS inference is decently fast, while BLAS acceleration is usable but slow (it continuously converts the bf16 weights to fp32).
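The bf16-to-fp32 conversion mentioned above is cheap per element but must be repeated on every pass over the weights. The conversion itself is a standard fact about the formats, not code from voxtral.c: bfloat16 is simply the upper 16 bits of an IEEE-754 binary32 value, so widening is a shift.

```c
#include <stdint.h>
#include <string.h>

/* bfloat16 keeps the sign, the full 8-bit exponent, and the top 7 mantissa
 * bits of a binary32 float, so widening to fp32 is a left shift into the
 * high half of the 32-bit word. */
static float bf16_to_fp32(uint16_t b) {
    uint32_t bits = (uint32_t)b << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Because generic BLAS kernels expect fp32 inputs, this widening has to happen before every matrix multiply, which is why the BLAS path is slower than MPS operating on bf16 natively.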

Audio processing uses a chunked encoder with overlapping windows, bounding memory usage regardless of input length. Audio can also be piped from stdin (--stdin), or captured live from the microphone (--from-mic, macOS), making it easy to transcode and transcribe any format via ffmpeg. A streaming C API (vox_stream_t) lets you feed audio incrementally and receive token strings as they become available.
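The bounded-memory property comes from the window arithmetic: each window covers a fixed number of samples and advances by the window size minus the overlap, so only one window is resident at a time. A small sketch of that arithmetic (sizes and names are hypothetical, not taken from the project):

```c
/* Count how many overlapping windows of `win` samples, overlapping by
 * `overlap` samples, are needed to cover `total` samples. Each window
 * after the first contributes `win - overlap` new samples, so memory
 * stays O(win) regardless of total input length. */
static int count_windows(long total, long win, long overlap) {
    long hop = win - overlap;         /* new samples per extra window */
    if (total <= win)
        return total > 0;             /* 0 or 1 window */
    return (int)(1 + (total - win + hop - 1) / hop);
}
```

For example, 10 samples with 4-sample windows overlapping by 1 need three windows: [0,4), [3,7), [6,10).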

More testing needed: please note that this project was mostly tested against only a few samples, and likely requires some more work to reach production quality. However, the hard part, understanding the model and reproducing its inference pipeline, is done, so the rest can likely be completed easily. Testing it against very long transcriptions, which stress the KV cache circular buffer, would be a useful task.

Motivations (and some rant)

Thank you to Mistral for releasing such a great model in an Open Weights fashion. However, the author of this project believes that limiting the inference to a partnership with vLLM, without providing a self-contained reference implementation in Python, limits the model’s actual reach and the potential good effects it could have. For this reason, this project was created: it provides both a pure C inference engine and a simple, self-contained Python reference implementation (python_simple_implementation.py) that anyone can read and understand without digging through the vLLM codebase.


Neural networks in C

That’s so they can rewrite themselves into Rust.

slackwarrior ★★★★★
()

Are the ordinary ones written in anything else, then? Of those that run on the CPU, that is.

firkax ★★★★★
()

Not a fan of AI, but good stuff is worth bookmarking.

ckotctvo
()

Wow, a neural network!

thesis ★★★★★
()