260315 Hacker News 모음
Show HN: Axe – A 12MB binary that replaces your AI framework (217 pts)
Show HN: Axe – A 12MB binary that replaces your AI framework (217 pts)
I built Axe because I got tired of every AI tool trying to be a chatbot.
Most frameworks want a long-lived session with a massive context window doing everything at once. That's expensive, slow, and fragile. Good software is small, focused, and composable... AI agents should be too.
Axe treats LLM agents like Unix programs. Each agent is a TOML config with a focused job. Such as code reviewer, log analyzer, commit message writer. You can run them from the CLI, pipe data in, get results out. You can use pipes to chain them together. Or trigger from cron, git hooks, CI.
What Axe is:
- 12MB binary, two dependencies. no framework, no Python, no Docker (unless you want it)
- Stdin piping, something like git diff | axe run reviewer just works
- Sub-agent delegation. Where agents call other agents via tool use, depth-limited
- Persistent memory. If you want, agents can remember across runs without you managing state
- MCP support. Axe can connect any MCP server to your agents
- Built-in tools. Such as web_search and url_fetch out of the box
- Multi-provider. Bring what you love to use.. Anthropic, OpenAI, Ollama, or anything in models.dev format
- Path-sandboxed file ops. Keeps agents locked to a working directory
Written in Go. No daemon, no GUI.
What would you automate first?
출처: https://github.com/jrswab/axe
Launch HN: Spine Swarm (YC S23) – AI agents that collaborate on a visual canvas
Launch HN: Spine Swarm (YC S23) – AI agents that collaborate on a visual canvas
Hey HN! We're Ashwin and Akshay from Spine AI (https://www.getspine.ai). Spine Swarm is a multi-agent system that works on an infinite visual canvas to complete complex non-coding projects: competitive analysis, financial modeling, SEO audits, pitch decks, interactive prototypes, and more. Here's a video of it in action: https://www.youtube.com/watch?v=R_2-ggpZz0Q.
We've been friends for over 13 years. We took our first ML course together at NTU, in a part of campus called North Spine, which is where the name comes from. We went through YC in S23 and have spent about 3 years building Spine across many product iterations.
The core idea: chat is the wrong interface for complex AI work. It's a linear thread, and real projects aren't linear. Sure, you can ask a chatbot to reference the financial model from earlier in the thread, or run research and market sizing together, but you're trusting the model to juggle that context implicitly. There's no way to see how it's connecting the pieces, no way to correct one step without rerunning everything, and no way to branch off and explore two strategies side by side. ChatGPT was a demo that blew up, and chat stuck around as the default interface, not because it's the right abstraction. We thought humans and agents needed a real workspace where the structure of the work is explicit and user-controllable, not hidden inside a context window.
So we built an infinite visual canvas where you think in blocks instead of threads. Each block is our abstraction on top of AI models. There are dedicated block types for LLM calls, image generation, web browsing, apps, slides, spreadsheets, and more. Think of them as Lego bricks for AI workflows: each one does something specific, but they can be snapped together and composed in many different ways. You can connect any block to any other block, and that connection guarantees the passing of context regardless of block type. The whole system is model-agnostic, so in a single workflow you can go from an OpenAI LLM call, to an image generation mode like Nano Banana Pro, to Claude generating an interactive app, each block using whatever model fits best. Multiple blocks can fan out from the same input, analyzing it in different ways with different models, then feed their outputs into a downstream block that synthesizes the results.
The first version of the canvas was fully manual. Users entered prompts, chose models, ran blocks, and made connections themselves. It clicked with founders and product managers because they could branch in different directions from the same starting point: take a product idea and generate a prototype in one branch, a PRD in another, a competitive critique in a third, and a pitch deck in a fourth, all sharing the same upstream context. But new users didn't want to learn the interface. They kept asking us to build a chat layer that would generate and connect blocks on their behalf, to replicate the way we were using the tool. So we built that, and in doing so discovered something we didn't expect: the agents were capable of running autonomously for hours, producing complete deliverables. It turned out agents could run longer and keep their context windows clean by delegating work to blocks and storing intermediary context on the canvas, rather than holding everything in a single context window.
Here's how it works now. When you submit a task, a central orchestrator decomposes it into subtasks and delegates each to specialized persona agents. These agents operate on the canvas blocks and can override default settings, primarily the model and prompt, to fit each subtask. Agents pick the best model for each block and sometimes run the same block with multiple models to compare and synthesize outputs. Multiple agents work in parallel when their subtasks don't have dependencies, and downstream agents automatically receive context from upstream work. The user doesn't configure any of this. You can also dispatch multiple tasks at once and the system will queue dependent ones or start independent ones immediately.
Agents aren't fully autonomous by default. Any agent can pause execution and ask the user for clarification or feedback before continuing, which keeps the human in the loop where it matters. And once agents have produced output, you can select a subset of blocks on the canvas and iterate on them through the chat without rerunning the entire workflow.
The canvas gives agents something that filesystems and message-passing don't: a persistent, structured representation of the entire project that any agent can read and contribute to at any point. In typical multi-agent systems, context degrades as it passes between agents. The canvas addresses this because agents store intermediary results in blocks rather than trying to hold everything in memory, and they leave explicit structured handoffs designed to be consumed efficiently by the next agent in the chain. Every step is also fully auditable, so you can trace exactly how each agent arrived at its conclusions.
We ran benchmarks to validate what we were seeing. On Google DeepMind's DeepSearchQA, which is 900 questions spanning 17 fields, each structured as a causal chain where each step depends on completing the previous one, Spine Swarm scored 87.6% on the full dataset with zero human intervention. For the benchmark we used a subset of block types relevant to the questions (LLM calls, web browsing, table) and removed irrelevant ones like document, spreadsheet, and slide generation. We also disabled human clarification so agents ran fully independently. The agents were not just auditable but also state of the art. The auditability also exposed actual errors in an older benchmark (GAIA Level 3), cases where the expected answer was wrong or ambiguous, which you'd never catch with a black-box pipeline. We detail the methodology, architecture, and benchmark errors in the full writeup: https://blog.getspine.ai/spine-swarm-hits-1-on-gaia-level-3-...
Benchmarks measure accuracy on closed-ended questions. Turns out the same architecture also leads to better open-ended outputs like decks, reports, and prototypes with minimal supervision. We've seen early users split into two camps: some watch the agents work and jump in to redirect mid-flow, others queue a task and come back to a finished deliverable. Both work because the canvas preserves the full chain of work, so you can audit or intervene whenever you want.
A good first task to try: give it your website URL and ask for a full SEO analysis, competitive landscape, and a prioritized growth roadmap with a slide deck. You'll see multiple agents spin up on the canvas simultaneously. People have also used it for fundraising pitch decks with financial models, prototyping features from screenshots and PRDs, competitive analysis reports and deep-dive learning plans that research a topic from multiple angles and produce structured material you can explore further.
Pricing is usage-based credits tied to block usage and the underlying models used. Agents tend to use more credits than manual workflows because they're tuned to get you the best possible outcome, which means they pick the best blocks and do more work. Details here: https://www.getspine.ai/pricing. There's a free tier, and one honest caveat: we sized it to let you try a real task, but tasks vary in complexity. If you run out before you've had a proper chance to explore, email us at [email protected] and we'll work with you.
We'd love your feedback on the experience: what worked, what didn't, and where it fell short. We're also curious how others here approach complex, multi-step AI work beyond coding. What tools are you using, and what breaks first? We'll be in the comments all day.
출처: https://www.getspine.ai/
Show HN: Context Gateway – Compress agent context before it hits the LLM (86 pts
Show HN: Context Gateway – Compress agent context before it hits the LLM (86 pts
We built an open-source proxy that sits between coding agents (Claude Code, OpenClaw, etc.) and the LLM, compressing tool outputs before they enter the context window.
Demo: https://www.youtube.com/watch?v=-vFZ6MPrwjw#t=9s.
Motivation: Agents are terrible at managing context. A single file read or grep can dump thousands of tokens into the window, most of it noise. This isn't just expensive — it actively degrades quality. Long-context benchmarks consistently show steep accuracy drops as context grows (OpenAI's GPT-5.4 eval goes from 97.2% at 32k to 36.6% at 1M https://openai.com/index/introducing-gpt-5-4/).
Our solution uses small language models (SLMs): we look at model internals and train classifiers to detect which parts of the context carry the most signal. When a tool returns output, we compress it conditioned on the intent of the tool call—so if the agent called grep looking for error handling patterns, the SLM keeps the relevant matches and strips the rest.
If the model later needs something we removed, it calls expand() to fetch the original output. We also do background compaction at 85% window capacity and lazy-load tool descriptions so the model only sees tools relevant to the current step.
The proxy also gives you spending caps, a dashboard for tracking running and past sessions, and Slack pings when an agent is sitting there waiting on you.
Repo is here: https://github.com/Compresr-ai/Context-Gateway. You can try it with:
curl -fsSL https://compresr.ai/api/install | sh
Happy to go deep on any of it: the compression model, how the lazy tool loading works, or anything else about the gateway. Try it out and let us know how you like it!
출처: https://github.com/Compresr-ai/Context-Gateway
Show HN: Klaus – OpenClaw on a VM, batteries included (159 pts)
Show HN: Klaus – OpenClaw on a VM, batteries included (159 pts)
We are Bailey and Robbie and we are working on Klaus (https://klausai.com/): hosted OpenClaw that is secure and powerful out of the box.
Running OpenClaw requires setting up a cloud VM or local container (a pain) or giving OpenClaw root access to your machine (insecure). Many basic integrations (eg Slack, Google Workspace) require you to create your own OAuth app.
We make running OpenClaw simple by giving each user their own EC2 instance, preconfigured with keys for OpenRouter, AgentMail, and Orthogonal. And we have OAuth apps to make it easy to integrate with Slack and Google Workspace.
We are both HN readers (Bailey has been on here for ~10 years) and we know OpenClaw has serious security concerns. We do a lot to make our users’ instances more secure: we run on a private subnet, automatically update the OpenClaw version our users run, and because you’re on our VM by default the only keys you leak if you get hacked belong to us. Connecting your email is still a risk. The best defense I know of is Opus 4.6 for resilience to prompt injection. If you have a better solution, we’d love to hear it!
We learned a lot about infrastructure management in the past month. Kimi K2.5 and Mimimax M2.5 are extremely good at hallucinating new ways to break openclaw.json and otherwise wreaking havoc on an EC2 instance. The week after our launch we spent 20+ hours fixing broken machines by hand.
We wrote a ton of best practices on using OpenClaw on AWS Linux into our users’ AGENTS.md, got really good at un-bricking EC2 machines over SSM, added a command-and-control server to every instance to facilitate hotfixes and migrations, and set up a Klaus instance to answer FAQs on discord.
In addition to all of this, we built ClawBert, our AI SRE for hotfixing OpenClaw instances automatically: https://www.youtube.com/watch?v=v65F6VBXqKY. Clawbert is a Claude Code instance that runs whenever a health check fails or the user triggers it in the UI. It can read that user’s entries in our database and execute commands on the user’s instance. We expose a log of Clawbert’s runs to the user.
We know that setting up OpenClaw is easy for most HN readers, but I promise it is not for most people. Klaus has a long way to go, but it’s still very rewarding to see people who’ve never used Claude Code get their first taste of AI agents.
We charge $19/m for a t4g.small, $49/m for a t4g.medium, and $200/m for a t4g.xlarge and priority support. You get $15 in tokens and $20 in Orthogonal credits one-time.
We want to know what you are building on OpenClaw so we can make sure we support it. We are already working with companies like Orthogonal and Openrouter that are building things to make agents more useful, and we’re sure there are more tools out there we don’t know about. If you’ve built something agents want, please let us know. Comments welcome!
출처: https://klausai.com/
Show HN: Rudel – Claude Code Session Analytics (141 pts)
Show HN: Rudel – Claude Code Session Analytics (141 pts)
We built rudel.ai after realizing we had no visibility into our own Claude Code sessions. We were using it daily but had no idea which sessions were efficient, why some got abandoned, or whether we were actually improving over time.
So we built an analytics layer for it. After connecting our own sessions, we ended up with a dataset of 1,573 real Claude Code sessions, 15M+ tokens, 270K+ interactions.
Some things we found that surprised us: - Skills were only being used in 4% of our sessions - 26% of sessions are abandoned, most within the first 60 seconds - Session success rate varies significantly by task type (documentation scores highest, refactoring lowest) - Error cascade patterns appear in the first 2 minutes and predict abandonment with reasonable accuracy - There is no meaningful benchmark for 'good' agentic session performance, we are building one.
The tool is free to use and fully open source, happy to answer questions about the data or how we built it.
출처: https://github.com/obsessiondb/rudel
Launch HN: Sentrial (YC W26) – Catch AI agent failures before your users do (31
Launch HN: Sentrial (YC W26) – Catch AI agent failures before your users do (31
Hey HN! We're Neel and Anay, and we’re building Sentrial (https://sentrial.com). It’s production monitoring for AI products. We automatically detect failure patterns: loops, hallucinations, tool misuse, and user frustrations the moment they happen. When issues surface, Sentrial diagnoses the root cause by analyzing conversation patterns, model outputs, and tool interactions, then recommends specific fixes.
Here's a demo if you're interested: https://www.youtube.com/watch?v=cc4DWrJF7hk. When agents fail, choose wrong tools, or blow cost budgets, there's no way to know why - usually just logs and guesswork. As agents move from demos to production with real SLAs and real users, this is not sustainable.
Neel and I lived this, building agents at SenseHQ and Accenture where we found that debugging agents was often harder than actually building them. Agents are untrustworthy in prod because there’s no good infrastructure to verify what they’re actually doing.
In practice this looks like: - A support agent that began misclassifying refund requests as product questions, which meant customers never reached the refund flow. - A document drafting agent that would occasionally hallucinate missing sections when parsing long specs, producing confident but incorrect outputs. There’s no stack trace or 500 error and you only figure this out when a customer is angry.
We both realized teams were flying blind in production, and that agent native monitoring was going to be foundational infrastructure for every serious AI product. We started Sentrial as a verification layer designed to take care of this.
How it works: You wrap your client with our SDK in only a couple of lines. From there, we detect drift for you: - Wrong tool invocations - Misunderstood intents - Hallucinations - Quality regressions over time. You see it on our platform before a customer files a ticket.
There’s a quick mcp set up, just give claude code: claude mcp add --transport http Sentrial https://www.sentrial.com/docs/mcp
We have a free tier (14 days, no credit card required). We’d love any feedback from anyone running agents whether they be for personal use or within a professional setting.
We’ll be around in the comments!
출처: https://www.sentrial.com/
Show HN: Open-source browser for AI agents (154 pts)
Show HN: Open-source browser for AI agents (154 pts)
Hi HN, I forked chromium and built agent-browser-protocol (ABP) after noticing that most browser-agent failures aren’t really about the model misunderstanding the page. Instead, the problem is that the model is reasoning from a stale state.
ABP is designed to keep the acting agent synchronized with the browser at every step. After each action (click, type, etc), it freezes JavaScript execution and rendering, then captures the resulting state. It also compiles the notable events that occurred during that action loop, such as navigation, file pickers, permission prompts, alerts, and downloads, and sends that along with a screenshot of the frozen page state back to the agent.
The result is that browser interaction starts to feel more like a multimodal chat loop. The agent takes an action, gets back a fresh visual state and a structured summary of what happened, then decides what to do next from there. That fits much better with how LLMs already work.
A few common browser-use failures ABP helps eliminate: * A modal appears after the last Playwright screenshot and blocks the input the agent was about to use * Dynamic filters cause the page to reflow between steps * An autocomplete dropdown opens and covers the element the agent intended to click * alert() / confirm() interrupts the flow * Downloads are triggered, but the agent has no reliable way to know when they’ve completed
As proof, ABP with opus 4.6 as the driver scores 90.5% on the Online Mind2Web benchmark. I think modern LLMs already understand websites, they just need a better tool to interact with them. Happy to answer questions about the architecture, forking chrome or anything else in the comments below.
Try it out: claude mcp add browser -- npx -y agent-browser-protocol --mcp (Codex/OpenCode instructions in the docs)
Demo video: https://www.loom.com/share/387f6349196f417d8b4b16a5452c3369
출처: https://github.com/theredsix/agent-browser-protocol
Show HN: OneCLI – Vault for AI Agents in Rust (157 pts)
Show HN: OneCLI – Vault for AI Agents in Rust (157 pts)
We built OneCLI because AI agents are being given raw API keys. And it's going about as well as you'd expect. We figured the answer isn't "don't give agents access," it's "give them access without giving them secrets."
OneCLI is an open-source gateway that sits between your AI agents and the services they call. You store your real credentials once in OneCLI's encrypted vault, and give your agents placeholder keys. When an agent makes an HTTP call through the proxy, OneCLI matches the request by host/path, verifies the agent should have access, swaps the placeholder for the real credential, and forwards the request. The agent never touches the actual secret. It just uses CLI or MCP tools as normal.
Try it in one line: docker run --pull always -p 10254:10254 -p 10255:10255 -v onecli-data:/app/data ghcr.io/onecli/onecli
The proxy is written in Rust, the dashboard is Next.js, and secrets are AES-256-GCM encrypted at rest. Everything runs in a single Docker container with an embedded Postgres (PGlite), no external dependencies. Works with any agent framework (OpenClaw, NanoClaw, IronClaw, or anything that can set an HTTPS_PROXY).
We started with what felt most urgent: agents shouldn't be holding raw credentials. The next layer is access policies and audit, defining what each agent can call, logging everything, and requiring human approval before sensitive actions go through.
It's Apache-2.0 licensed. We'd love feedback on the approach, and we're especially curious how people are handling agent auth today.
GitHub: https://github.com/onecli/onecli Site: https://onecli.sh
출처: https://github.com/onecli/onecli
Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon (240 pts)
Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon (240 pts)
Hi HN, we're Sanchit and Shubham (YC W26). We built a fast inference engine for Apple Silicon. LLMs, speech-to-text, text-to-speech – MetalRT beats llama.cpp, Apple's MLX, Ollama, and sherpa-onnx on every modality we tested. Custom Metal shaders, no framework overhead.
Also, we've open-sourced RCLI, the fastest end-to-end voice AI pipeline on Apple Silicon. Mic to spoken response, entirely on-device. No cloud, no API keys.
To get started:
brew tap RunanywhereAI/rcli https://github.com/RunanywhereAI/RCLI.git
brew install rcli
rcli setup # downloads ~1 GB of models
rcli # interactive mode with push-to-talk
Or: curl -fsSL https://raw.githubusercontent.com/RunanywhereAI/RCLI/main/install.sh | bash
The numbers (M4 Max, 64 GB, reproducible via rcli bench):LLM decode – 1.67x faster than llama.cpp, 1.19x faster than Apple MLX (same model files): - Qwen3-0.6B: 658 tok/s (vs mlx-lm 552, llama.cpp 295) - Qwen3-4B: 186 tok/s (vs mlx-lm 170, llama.cpp 87) - LFM2.5-1.2B: 570 tok/s (vs mlx-lm 509, llama.cpp 372) - Time-to-first-token: 6.6 ms
STT – 70 seconds of audio transcribed in 101 ms. That's 714x real-time. 4.6x faster than mlx-whisper.
TTS – 178 ms synthesis. 2.8x faster than mlx-audio and sherpa-onnx.
We built this because demoing on-device AI is easy but shipping it is brutal. Voice is the hardest test: you're chaining STT, LLM, and TTS sequentially, and if any stage is slow, the user feels it. Most teams fall back to cloud APIs not because local models are bad, but because local inference infrastructure is.
The thing that's hard to solve is latency compounding. In a voice pipeline, you're stacking three models in sequence. If each adds 200ms, you're at 600ms before the user hears a word, and that feels broken. You can't optimize one stage and call it done. Every stage needs to be fast, on one device, with no network round-trip to hide behind.
We went straight to Metal. Custom GPU compute shaders, all memory pre-allocated at init (zero allocations during inference), and one unified engine for all three modalities instead of stitching separate runtimes together.
MetalRT is the first engine to handle all three modalities natively on Apple Silicon. Full methodology:
LLM benchmarks: https://www.runanywhere.ai/blog/metalrt-fastest-llm-decode-e...
Speech benchmarks: https://www.runanywhere.ai/blog/metalrt-speech-fastest-stt-t...
How: Most inference engines add layers between you and the GPU: graph schedulers, runtime dispatchers, memory managers. MetalRT skips all of it. Custom Metal compute shaders for quantized matmul, attention, and activation - compiled ahead of time, dispatched directly.
Voice Pipeline optimizations details: https://www.runanywhere.ai/blog/fastvoice-on-device-voice-ai... RAG optimizations: https://www.runanywhere.ai/blog/fastvoice-rag-on-device-retr...
RCLI is the open-source voice pipeline (MIT) built on MetalRT: three concurrent threads with lock-free ring buffers, double-buffered TTS, 38 macOS actions by voice, local RAG (~4 ms over 5K+ chunks), 20 hot-swappable models, and a full-screen TUI with per-op latency readouts. Falls back to llama.cpp when MetalRT isn't installed.
Source: https://github.com/RunanywhereAI/RCLI (MIT)
Demo: https://www.youtube.com/watch?v=eTYwkgNoaKg
What would you build if on-device AI were genuinely as fast as cloud?
출처: https://github.com/RunanywhereAI/rcli
관련 노트
- [[INDEX]]
- [[260315_tg]] — 키워드 유사
- [[260315_reddit]] — 키워드 유사
- [[260319_tg]] — 키워드 유사
- [[260319_reddit]] — 키워드 유사
- [[260310_tg]] — 키워드 유사
- [[260316_reddit]] — 키워드 유사
- [[260318_x]] — 키워드 유사
- [[260316_x]] — 키워드 유사
- [[260317_x]] — 키워드 유사
- [[260319_hn]] — 키워드 유사