CaaLM

We build models that understand code execution.

Not models that generate code. Not models that explain code. Models that can look at a program and tell you what it does — without running it.

The core question we're exploring: can a language model learn what execution means as an abstract concept, independent of any specific language's syntax or semantics? If you show a model enough programs alongside their outputs, does it learn something generalizable — or does it just memorize patterns for the languages it's seen?

CaaLM-v1 is our first answer to that question. It predicts code output across Python, JavaScript, Lua, and COBOL, and it also generalizes to programming languages it has never encountered. Give it code written in a completely made-up language with invented keywords and syntax, and it figures out what the program would print. It scores 96.2% on a 52-test benchmark, including 19 tests on unseen languages.
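As a rough illustration of how a number like that falls out, here is a minimal exact-match scorer over (predicted output, reference output) pairs. This is a sketch under our own assumptions: it assumes the benchmark scores exact string matches of predicted stdout, and `exact_match_accuracy` is an illustrative helper, not CaaLM's actual evaluation harness.

```python
def exact_match_accuracy(predictions, references):
    """Fraction of cases where the predicted output matches the reference exactly
    (ignoring leading/trailing whitespace)."""
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)

# Under the exact-match assumption, 50 correct out of 52 tests
# rounds to the reported 96.2%.
score = round(exact_match_accuracy(["a"] * 50 + ["x", "y"], ["a"] * 52) * 100, 1)
```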

That's the foundation. Here's where we want to take it.

What we're working toward

Broader language coverage. CaaLM-v1 handles assignment, arithmetic, conditionals, and loops. The next step is functions, recursion, and richer data types. Each added construct is a harder test of whether the model has learned execution semantics or just surface patterns.
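One natural way to build (program, output) pairs for constructs like these is to execute the program and capture what it prints. The sketch below is our illustration of that idea, not CaaLM's documented data pipeline; the example program exercises exactly the v1 construct set: assignment, arithmetic, a conditional, and a loop.

```python
import io
from contextlib import redirect_stdout

def run_and_capture(source: str) -> str:
    """Execute a Python snippet and return everything it prints."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        exec(source, {})  # only for trusted training programs, never untrusted code
    return buf.getvalue()

# Assignment, arithmetic, a conditional, and a loop in one small program.
program = """
total = 0
for i in range(1, 5):
    if i % 2 == 0:
        total = total + i
print(total)
"""
# run_and_capture(program) returns "6\n"
```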

More alien languages. The fake languages in v1 are weird but not fundamentally different from conventional imperative code. We want to push further — stack-based languages, concatenative languages, languages with non-linear control flow — and see where generalization breaks down.
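To make "stack-based" concrete, here is a toy concatenative language of our own invention (not one of CaaLM's training or test languages): every token either pushes a number or pops operands off a shared stack, so there are no variables at all and evaluation order is the program order.

```python
def eval_stack(program: str) -> str:
    """Evaluate a tiny postfix, stack-based program and return what it prints."""
    stack, out = [], []
    for tok in program.split():
        if tok == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif tok == "*":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif tok == "dup":      # duplicate the top of the stack
            stack.append(stack[-1])
        elif tok == ".":        # pop the top of the stack and print it
            out.append(str(stack.pop()))
        else:
            stack.append(int(tok))
    return " ".join(out)

# "2 3 + dup * ." computes (2 + 3) squared and prints "25".
```

A model that has only seen assignment-style imperative code has to infer this evaluation discipline from examples, which is exactly the kind of generalization gap we want to probe.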

Esolangs. Brainfuck and LOLCODE are on the roadmap. If the model can handle Brainfuck's tape-based memory model, that's a meaningful signal about how deep the learned execution semantics actually go.
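Why is Brainfuck a meaningful signal? Its entire state is an array of byte cells plus a data pointer, and its only control flow is the bracket loop. The minimal interpreter below (our illustration, not CaaLM tooling) shows how little surface syntax there is to pattern-match on; predicting output requires tracking the tape itself.

```python
def brainfuck(code: str, tape_len: int = 30000) -> str:
    """Interpret a Brainfuck program and return its printed output."""
    tape = [0] * tape_len
    out, ptr, pc = [], 0, 0
    # Precompute matching brackets so [ and ] can jump in O(1).
    jumps, pending = {}, []
    for i, c in enumerate(code):
        if c == "[":
            pending.append(i)
        elif c == "]":
            j = pending.pop()
            jumps[i], jumps[j] = j, i
    while pc < len(code):
        c = code[pc]
        if c == ">":
            ptr += 1
        elif c == "<":
            ptr -= 1
        elif c == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif c == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif c == ".":
            out.append(chr(tape[ptr]))
        elif c == "[" and tape[ptr] == 0:
            pc = jumps[pc]   # skip the loop body
        elif c == "]" and tape[ptr] != 0:
            pc = jumps[pc]   # loop again
        pc += 1
    return "".join(out)

# A loop that adds 11 to a cell 6 times, then prints chr(66):
# brainfuck("++++++[>+++++++++++<-]>.") returns "B"
```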

Smaller and faster. CaaLM-v1 is 1.5B parameters trained for $0.82. The task is narrow enough that a much smaller model might work just as well. We're interested in finding the floor.

Why this matters

The obvious application is sandboxed code execution — running untrusted code without actually running it. But the more interesting implication is that learned execution semantics could be a building block for neural program analysis, language-agnostic debugging tools, or AI systems that reason about what programs do rather than just what they look like.

We're not there yet. CaaLM-v1 is a research model and a proof of concept. But the generalization result is real, and it points somewhere worth exploring.

Models

  • CaaLM-v1 — 1.5B parameter code output predictor. Generalizes to unseen languages. First release.

Background

CaaLM grew out of LaaLM, a project that taught language models to simulate a Linux terminal. LaaLM worked but had a ceiling — it was always going to be beaten by just prompting a general model or, more practically, just installing Linux. CaaLM is an attempt to build something with a more defensible research direction: not simulating a specific environment, but learning the abstract structure of computation itself.
