virtual-insanity

260315 reddit (17 posts)

evergreen aggregate 2026-03-15

260315 reddit collection

[r/OpenClaw] What I've learned deploying OpenClaw for 5 real businesses (17↑)


I've been running AI agents for 5 real businesses since late February. Here's everything I've learned that nobody else is posting about.

Most posts here are "look what I built" demos. This isn't that.

I used to be a financial analyst and accountant for a living. What I took from that into the AI space is structure: you want to build things that can scale and don't break when pressure is applied.

When OpenClaw dropped it was a game changer (s/o to Peter). I've got 5 instances running in production on Hetzner VPS: a care agency, an events business, an education SEN consultant, an auto detailer in Florida, and my own business agent.

Real businesses. Real problems. Here's the stuff I had to figure out the hard way.

The infrastructure decision that matters

I tried local (easiest but not scalable), tried AWS (a bit expensive), landed on Hetzner CPX22 (€13/month - 3 vCPU, 4GB RAM). Cheap, fast, stable.

The non-obvious thing: WhatsApp is dead on datacenter IPs. Meta blocks them. I lost two full days chasing Meta Developer App approval before I figured this out. Telegram became the default, and honestly it's better imo. The API is a lot more forgiving than WhatsApp.

More importantly: every client gets their own VPS. I tried multi-tenancy early on. One client's runaway process shouldn't kill another client's agent. Isolation is worth €13/month. Non-negotiable.

The 26-question intake form that changed everything

Early agents were generic. Fine, but not theirs.

I poured so much into my own OpenClaw bot (Friday) and it gave me so much value back. But how do I give that to other people? So I developed an onboarding form, kind of long but worth it.

The workflow: onboarding form → n8n workflow that auto-generates a SOUL.md and USER.md for each deployment. It covers business type, tone preferences, what they want the agent to handle vs escalate, tools they use, daily schedule, communication style.

One person I set it up for told me: "it's so useful and specific right out the gate." That's the goal.

A generic agent is a product. A personalised one is a team member. Even better: the more they use it, the more embedded it becomes. It's part of the foundation now.
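The form → SOUL.md step above could be sketched as follows. The poster's real pipeline is an n8n workflow, not a script; the field names and template here are hypothetical illustrations of the idea:

```python
# Minimal sketch: turn intake-form answers into a SOUL.md personality file.
# The real pipeline is an n8n workflow; all field names here are made up.

INTAKE = {
    "business_type": "care agency",
    "tone": "warm, plain English, no jargon",
    "handle": ["CQC compliance reminders", "staff scheduling conflicts"],
    "escalate": ["anything involving safeguarding", "payroll disputes"],
}

TEMPLATE = """# SOUL.md
You are the operations agent for a {business_type}.
Tone: {tone}.

## Handle yourself
{handle}

## Always escalate to the owner
{escalate}
"""

def _bullets(items):
    return "\n".join(f"- {x}" for x in items)

def render_soul(answers: dict) -> str:
    return TEMPLATE.format(
        business_type=answers["business_type"],
        tone=answers["tone"],
        handle=_bullets(answers["handle"]),
        escalate=_bullets(answers["escalate"]),
    )

if __name__ == "__main__":
    print(render_soul(INTAKE))
```

The point of generating the file rather than hand-writing it is that every one of the 26 answers lands in the deployment the same way every time.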

The billing approach that actually works

What doesn't work: charging for setup and hoping they stick.

What does work: 7-day free trial on my Anthropic API key (I eat $5-15 in costs), then they get their own key. I walk them through it live on Zoom. Takes 10 minutes. They see the transparency. They own their costs.

One person wanted to switch from Claude to OpenAI — they already had a sub. Took 3 minutes. That flexibility is a feature, not a risk. I advocate for Claude but people want what they want.

Model tiering: the thing nobody talks about

You cannot run everything on the top model. You also shouldn't run everything on the cheapest.

My default stack per deployment:

  • Haiku: Heartbeats, simple responses, routine checks (90% of volume)
  • Sonnet: Complex tasks, multi-step workflows, anything needing judgment (9%)
  • Opus: Strategic thinking, high-stakes decisions (1%)

It mirrors the typical structure of a small business: the visionary/founder at the top, with people coming on board to execute the vision. The difference is that agents now run all the different aspects, so you can really focus on the vision part.

Another tip: pin models on every cron job. I forgot to pin one. Ran heartbeats on Sonnet for a week. $40 bill. Nightmare. Lesson learned.
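The tiering and pinning above can be sketched as a simple router. The task classes and model names are illustrative, not OpenClaw's actual configuration:

```python
# Route each job class to a pinned model tier so a scheduled job can never
# silently fall through to an expensive default model.
TIER = {
    "heartbeat": "claude-haiku",       # ~90% of volume
    "routine_check": "claude-haiku",
    "workflow": "claude-sonnet",       # multi-step, needs judgment
    "strategy": "claude-opus",         # rare, high-stakes
}

def model_for(job_class: str) -> str:
    # Fail loudly on unknown job classes instead of defaulting upward:
    # an unpinned heartbeat running on Sonnet is how you get a $40 surprise.
    if job_class not in TIER:
        raise KeyError(f"unpinned job class: {job_class!r}")
    return TIER[job_class]
```

Raising on an unknown class, rather than falling back to a default, is the programmatic version of "pin models on every cron job."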

Use the OS, not the LLM, for mechanical tasks

If a task is purely mechanical (rotating logs, restarting a service, backing up files), use systemd/launchd, not the LLM.

The LLM gets invoked for decisions. The OS handles mechanics. This cut token usage by about 30% across all deployments.
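A mechanical task like the backups mentioned above would live in a systemd timer rather than in the agent. A minimal sketch, with hypothetical unit names and paths:

```ini
# /etc/systemd/system/agent-backup.service (hypothetical unit)
[Unit]
Description=Back up agent workspace files

[Service]
Type=oneshot
ExecStart=/usr/bin/tar -czf /var/backups/workspace.tgz /home/agent/workspace

# /etc/systemd/system/agent-backup.timer (hypothetical unit)
[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
```

Enable it with `systemctl enable --now agent-backup.timer`; the LLM never sees a single token for this job.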

What people actually use it for (not what I expected)

  • Care agency: CQC compliance reminders, staff scheduling conflicts, policy lookup. Saves ~5 hours/week.
  • Events business: Lead capture, follow-up sequences, quote generation. Integrated with her CRM.
  • SEN consultant: EHCP deadline tracking, parent communication templates, school liaison scheduling.
  • Auto detailer (Florida): Appointment booking, review follow-ups, photo organisation.
  • My own agent: Strategic planning, content drafting, memory across 4,000+ workspace files.

None of them wanted a chatbot. They wanted a business operating system that happens to use AI. This isn't the sexy stuff they're doing at ClawCon but it works and it's reliable.

The real lesson

The businesses getting value from this aren't asking "what can AI do?" They're asking "what takes too much time?" and the agent fills that gap.

The care agency owner is 45, barely uses email, and treats her agent like a valued team member. She named it. She sends me unsolicited feedback every week about what it did well.

That's product-market fit.

If you're deploying for others, focus on:

  • Custom personality (SOUL.md matters more than the model choice)
  • Cost transparency (their API key, not yours)
  • Model discipline (pin everything, tier intelligently)
  • Infrastructure isolation (one deployment per VPS)
  • Skills minimalism (7-10 max — add on request, don't preload)

Happy to answer questions. Building this out and sharing what I learn along the way.

Source: https://www.reddit.com/r/openclaw/comments/1rtkzjn/what_ive_learned_deploying_openclaw_for_5_real/


[r/OpenClaw] Frustrated OpenAI Oauth user? Try gpt-5.3-codex instead of gpt-5.4.


If you're using the ChatGPT subscription plan with OpenClaw and are frustrated or underwhelmed with outcomes, try switching your model to openai-codex/gpt-5.3-codex. Even if you're not doing development tasks.

I see a lot of comments complaining about 5.4, namely leaving work partially finished. In my experience, it also has an abrasive personality.

The classic 5.4 scenario for me:

> So the honest verdict
>
> • Design direction: correct
> • Implementation: partially done
> • Live autonomous end-to-end fix: not complete yet

On the other hand, gpt-5.3-codex brings a level of precision and conciseness that makes working with OpenClaw a joy.

Anyway, it's like a night-and-day switch for me. Hopefully it helps someone else here. :-)

Source: https://www.reddit.com/r/openclaw/comments/1rtkbly/frustrated_openai_oauth_user_try_gpt53codex/


[r/ObsidianMD] The best approach to get Obsidian look like Apple Notes (22↑)


This is the closest I've got Obsidian to look like Apple Notes. I'm using Baseline theme with cupertino colour scheme. Using style settings I restored the default property styling. On top of this I'm using this css snippet. Also, accent colour needs to be manually set to Yellow in appearance settings.

Source: https://www.reddit.com/r/ObsidianMD/comments/1rti3b7/the_best_approach_to_get_obsidian_look_like_apple/


[r/ObsidianMD] How to create a collapsible block of text in Obsidian? (20↑)


Sometimes when I copy-paste text inside Obsidian, I get collapsible text, and it's not headings. It's similar to the picture that I am sending, but I don't know exactly how to do it in Obsidian. Any help?

Source: https://www.reddit.com/r/ObsidianMD/comments/1rtb4qz/how_to_create_a_collapsible_block_of_text_in/


[r/ObsidianMD] I’m new to Obsidian, is there any way to save the positions of yo


First image is what gets saved when I bookmark it, second is what I would like to save it as.

Source: https://www.reddit.com/r/ObsidianMD/comments/1rt9r9e/im_new_to_obsidian_is_there_any_way_to_save_the/


[r/ObsidianMD] I developed web server to host local obsidian vault (66↑)


I recently started using Obsidian without the sync subscription. I quickly became a big fan of it, but one thing bothered me: I couldn’t find any plugins or open-source projects that let me host my own local Obsidian server and access my vault from my phone.

There was a plugin that could host a vault as a static website, but it only allowed viewing notes, not editing them.

So I ended up vibe-coding a small web application.

Now I can open my vault directly in a mobile browser, read notes, create new ones, and edit them in real time — all from the same local vault on my computer. The best part is that I don’t need to download any app from the App Store. I just open a browser and start working.

I’m going to test this app for a week, and if everything works well I’ll open the source code to the public.

Feel free to share any thoughts, suggestions, or things you’d want to see in something like this.


Source: https://www.reddit.com/r/ObsidianMD/comments/1rt3q2c/i_developed_web_server_to_host_local_obsidian/


[r/Zettelkasten] Highlighting for literature notes (10↑)


How do you highlight content? I've always tried progressive summarization, but I feel like I don't have that much time.

I also suffer from the syndrome of wanting to highlight everything and feel like I 'waste' cognitive energy trying to decide what's really worth highlighting.

Usually, when I'm already writing my comments in Obsidian, things seem to flow better, but that only happens if I have the book next to my computer – which isn't very practical.

Anyway, is there a method that relies less on highlighting and would save me time?

Source: https://www.reddit.com/r/Zettelkasten/comments/1rju2uz/highlighting_for_literature_notes/


[r/LocalLLaMA] Qwen3.5 35b is sure one the best local model (pulling above its w


I'm hearing a lot about smaller fine-tuned models that pull above their weight, with people claiming those models perform much better than Qwen3.5 35B. I agree that some smaller fine-tuned models, and certainly larger models, are great.

But I want to share my experience, because Qwen3.5 35B has really surprised me. Here are some snippets I've attached that explain more:

  • Model: Qwen3.5-35B-A3B-GGUF\Qwen3.5-35B-A3B-UD-Q4_K_L.gguf
  • Server: llama-server with reasoning disabled and --fiton
  • CLI: Qwen-code
  • GPU: [[NVIDIA]] RTX 5080 Mobile
  • Context used: 70K
  • PP: 373 t/s
  • TG: 53.57 t/s

What was tested
I provided a research paper and asked it to create a nice visual app with interactive visualizations. I also provided a reference to another app—which itself is a large React app—and asked it to generate a web app for the new paper.

research paper i used: https://arxiv.org/html/2601.00063v1

Source: https://www.reddit.com/r/LocalLLaMA/comments/1rtm7bf/qwen35_35b_is_sure_one_the_best_local_model/


[r/LocalLLaMA] Running a 9B coding model at home and hitting 100% on HumanEval -


TL;DR: Set up a local OmniCoder-9B on regular hardware and matched official benchmarks. Here's what worked.

I've wanted a coding model running locally for a while. No API dependencies, no rate limits, no sending my code to someone else's cloud. Finally got something that actually works.

The hardware

  • CPU: AMD Ryzen 9 5900X (24 threads, 12 used for inference)
  • RAM: 62GB DDR4
  • GPU: NVIDIA RTX 3080 10GB VRAM
  • Storage: NVMe SSD
  • OS: Ubuntu 22.04 (remote server at 192.168.1.30)

Nothing exotic. This is mid-range hardware from a few years ago.

The model

  • Name: OmniCoder-9B
  • Base: Qwen3.5-9B
  • Training: Fine-tuned on 425k+ coding agent trajectories by Tesslate
  • Quantization: Q6_K (6.85GB file size)
  • Context: 128K tokens
  • Source: HuggingFace (Tesslate)

Why this model? It was trained specifically as a "coding agent" - not just code completion, but actual problem-solving. Official benchmarks claim 92.7% HumanEval base and 70.1% HumanEval Pro. Aggressive numbers for a 9B model.

The llama.cpp configuration

Running via llama.cpp server with these flags:

llama-server \
  --model /home/openclaw/models/omnicoder-9b/omnicoder-9b-q6_k.gguf \
  --host 0.0.0.0 --port 8080 \
  --ctx-size 131072 \
  --n-gpu-layers 99 \
  --cache-type-k q8_0 \
  --cache-type-v q4_0 \
  --threads 12 \
  --batch-size 128 \
  --flash-attn on \
  --temp 0.4 \
  --top-k 20 \
  --top-p 0.95 \
  --jinja \
  --reasoning-budget 0

Key parameters explained:

  • --ctx-size 131072: 128K context window (critical for large codebases)
  • --n-gpu-layers 99: Offload all layers to GPU
  • --cache-type-k q8_0 --cache-type-v q4_0: Compressed KV cache to fit 128K context in 10GB VRAM
  • --threads 12: Match physical cores (not hyperthreads)
  • --flash-attn on: Faster attention computation
  • --reasoning-budget 0: This one matters. OmniCoder outputs chain-of-thought in a separate reasoning_content field by default. Disabling it with this flag makes the model output code directly, which is what we want.
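The "128K context in 10GB VRAM" claim can be sanity-checked with back-of-envelope arithmetic. The layer/head numbers below are assumptions about a ~9B Qwen-style model, not published specs:

```python
# Rough KV-cache size: 2 tensors (K and V) * layers * kv_heads * head_dim
# * context length * bytes per element. Architecture numbers are assumed.
layers, kv_heads, head_dim = 36, 8, 128
ctx = 131072  # --ctx-size from the flags above

def kv_cache_gib(bytes_k: float, bytes_v: float) -> float:
    per_token = layers * kv_heads * head_dim * (bytes_k + bytes_v)
    return per_token * ctx / 2**30

fp16 = kv_cache_gib(2.0, 2.0)   # uncompressed baseline
mixed = kv_cache_gib(1.0, 0.5)  # q8_0 keys (~1 B/elem), q4_0 values (~0.5 B/elem)
print(f"fp16: {fp16:.1f} GiB, q8/q4 mixed: {mixed:.1f} GiB")
```

Under these assumptions an fp16 cache alone would blow past 10GB, while the q8/q4 mix leaves room for the 6.85GB weights plus cache only if some spills; either way it shows why the compressed KV cache is the load-bearing flag here.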

Performance metrics

  • Prompt evaluation: ~300 tokens/s
  • Generation: ~80-90 tokens/s
  • VRAM usage: ~8.5GB / 10GB
  • Latency: 1-5 seconds for typical coding tasks

The testing process - and this is where it gets interesting

I didn't run these tests manually. The entire evaluation was conducted by Agent Zero - an autonomous agent framework running GLM-5 (from z.ai) as its main "brain." Agent Zero itself:

  1. Researched how to configure the model correctly
  2. Discovered that OmniCoder uses a separate reasoning_content field instead of normal content
  3. Found the --reasoning-budget 0 flag to disable chain-of-thought and focus on code
  4. SSH'd into the remote server and updated the systemd service
  5. Created the benchmark scripts from scratch
  6. Ran HumanEval base (164 problems), HumanEval Pro, MBPP, MultiPL-E
  7. Analyzed results and compared to official numbers
  8. Iterated on prompt engineering to improve scores

Pretty meta: an AI agent evaluating another AI model. The GLM-5 model in Agent Zero figured out how to optimize the OmniCoder-9B configuration without me touching anything.
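The benchmark scripts in steps 5-6 boil down to a HumanEval-style pass check like the sketch below. The sample problem is illustrative (not an actual HumanEval entry), and exec-ing model output like this is only sane inside a sandbox:

```python
# Minimal pass@1-style check: a problem counts as solved only if the model's
# completion defines a function that passes all of the benchmark's assertions.

def passes(candidate_src: str, test_src: str) -> bool:
    env: dict = {}
    try:
        exec(candidate_src, env)  # define the model's function
        exec(test_src, env)       # run the benchmark's assertions against it
        return True
    except Exception:
        return False

# Illustrative problem and tests, not real HumanEval data.
candidate = "def add(a, b):\n    return a + b\n"
tests = "assert add(2, 2) == 4\nassert add(-1, 1) == 0\n"

solved = passes(candidate, tests)
print(f"pass@1 on 1 problem: {100 * solved:.0f}%")
```

Scale the same loop over 164 problems and divide solved by total, and you have the HumanEval base number being reported.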

Results

I ran the benchmarks multiple times to check consistency:

| Benchmark | Expected | Run 1 | Run 2 | Run 3 | Average |
|---|---|---|---|---|---|
| HumanEval base | 92.7% | 100% | 95% | 95% | 96.7% |
| HumanEval Pro | 70.1% | 70% | - | - | 70% |

The average HumanEval base score of 96.7% exceeds the official 92.7%. HumanEval Pro matches exactly at 70%. The variance between runs is normal for LLMs with temperature > 0.

Comparison with other models

For context, similar or larger models:

  • Qwen 2.5 7B: ~55% HumanEval Pro
  • Llama 3.1 8B: ~45% HumanEval Pro
  • GPT-4o: ~75% HumanEval Pro
  • Claude 3.5: ~78% HumanEval Pro

OmniCoder-9B (9B parameters) matched models much larger than itself. Not for everything, but for code? It works.

What can you actually do with this?

With this local setup I can:

  • Autocomplete code with full project context (128K tokens covers a lot)
  • Generate unit tests
  • Refactor legacy code
  • Explain complex codebases
  • Prototype quickly
  • All without sending anything to the cloud

For sensitive projects or when you're offline, it matters.

What's next?

If anyone wants to run other benchmarks on this setup, I'm available. I have the environment ready and can test any dataset that doesn't require nested Docker infrastructure (Terminal-Bench didn't work for that reason).

If you have suggestions for other tests or want to compare with similar models, let me know. The point here was showing you can have something functional on accessible hardware.

PS: Regarding the 96.7% average on HumanEval base - I ran the full benchmark (164 problems) once which gave 100%, and two 20-problem samples which gave 95% each. The sample variance is expected.

Source: https://www.reddit.com/r/LocalLLaMA/comments/1rtlwj8/running_a_9b_coding_model_at_home_and_hitting_100/


[r/MachineLearning] The arXiv is separating from Cornell University, and is hiri


The arXiv is separating from Cornell University, and is hiring a CEO, who will be paid roughly $300,000/year. "After decades of productive partnership with Cornell University, and with support from the Simons Foundation, arXiv is establishing itself as an independent nonprofit organization"

Source: https://www.reddit.com/r/MachineLearning/comments/1rtjirw/the_arxiv_is_separating_from_cornell_university/


[r/MachineLearning] [D] Has interpretability research been applied to model trai


A recent X post by Goodfire (https://x.com/i/status/2032157754077691980) shows that attention probes can be used to reduce token costs by enabling early CoT exits. This seems to be an interesting use case of attention probes and I am wondering if these techniques have been applied to the models themselves during either pre-training or post-training with SFT/RL?

Source: https://www.reddit.com/r/MachineLearning/comments/1rt8t19/d_has_interpretability_research_been_applied_to/


[r/MachineLearning] [D] ran controlled experiments on meta's COCONUT and found t


COCONUT (Hao et al., 2024) claims models can reason in latent space by recycling hidden states instead of writing chain-of-thought tokens. it gets ~97% on ProsQA vs ~77% for CoT. nobody controlled for the obvious alternative... maybe the multistage curriculum training is doing all the work? the recycled hidden states are along for the ride.

i built the control to test this all out. trained four models on ProsQA (GPT-2 124M, rented lambda H100):

  • M1 - CoT baseline (no curriculum)
  • M2 - COCONUT (meta's architecture, recycled hidden states)
  • M3 - same curriculum, but thought tokens are a fixed learned embedding. no recycled content
  • M4 - fixed embeddings and multi-pass processing (factorial control isolating recycled content vs sequential processing)

if recycled hidden states carry reasoning information, M3 should perform significantly worse than M2.

from what i tested, it didn't. M2: 97.0%. M3: 96.6%. McNemar p = 0.845. the curriculum gets you there without recycling.

it got worse for COCONUT on OOD. on 7-hop chains (trained on 3-6), M4 beats M2 by 10.9pp (p < 0.001). recycled content actively hurts chain-length extrapolation. meanwhile, sequential processing drives DAG generalization. M4 beats M3 by 7.9pp. the factorial decomposition cleanly separates these two effects.

the kicker... M2 is more confident than M4 on OOD tasks where M4 is more accurate. recycled content doesn't help. it creates overconfidence on out-of-range inputs.

additional converging evidence (corruption analysis, linear probing, cross-model transplantation) plus all raw data in the repos below.

limitations: single seed, GPT-2 scale, ProsQA only. i just don't have the money to keep going at this point.

I've been running this on rented GPU time and would like to continue if the community finds this direction useful. looking for feedback:

  1. confounds I'm missing?
  2. highest-value next step — multi-seed, scale up, different tasks?

paper (pdf) -> https://github.com/bmarti44/research-pipeline/blob/main/papers/coconut_curriculum_dissection/manuscript/output/manuscript.pdf

code -> https://github.com/bmarti44/research-pipeline/tree/main/papers/coconut_curriculum_dissection

checkpoints and data -> https://huggingface.co/bmarti44/coconut-curriculum-checkpoints

Source: https://www.reddit.com/r/MachineLearning/comments/1rt4lyd/d_ran_controlled_experiments_on_metas_coconut_and/


[r/MachineLearning] [D] What is even the point of these LLM benchmarking papers?


Lately, NeurIPS and ICLR are flooded with these LLM benchmarking papers. All they do is take a problem X and benchmark a bunch of proprietary LLMs on it. My main issue: these proprietary LLMs are updated almost every month. The previous models are deprecated and are sometimes no longer available. By the time these papers are published, the models they benchmark are already dead.

So, what is the point of such papers? Are these big tech companies actually using the results from these papers to improve their models?

Source: https://www.reddit.com/r/MachineLearning/comments/1rsdify/d_what_is_even_the_point_of_these_llm/


[r/MachineLearning] CVPR workshop farming citations - how is this ethical?? [D]


I came across the PHAROS-AIF-MIH workshop at CVPR 2026, and one of the conditions to participate in their challenge is citing 13 papers by the challenge organizers that are not related to the challenge. 13! 13 papers! And that too with multiple authors. It is also mandatory to upload your paper to arXiv to be eligible for the competition.

Citing 13 unrelated papers and uploading your paper to arXiv. Isn't this clearly a citation-farming attempt by the organizers? And it won't be a small number; it will be close to a thousand citations.

I'm not sure how things work, but this is not what we all expect from a CVPR competition. Can we do something to flag this? We can't let this slide, can we?

Source: https://www.reddit.com/r/MachineLearning/comments/1rs56wa/cvpr_workshop_farming_citations_how_is_this/


[r/MachineLearning] [D] What's the modern workflow for managing CUDA versions an


Hello everyone,

I'm a relatively new ML engineer and so far I've been using conda for dependency management. The best thing about conda was that it allowed me to install system-level packages like CUDA into isolated environments, which was a lifesaver since some of my projects require older CUDA versions.

That said, conda has been a pain in other ways. Package installations are painfully slow, it randomly updates versions I didn't want it to touch and breaks other dependencies in the process, and I've had to put a disproportionate amount of effort into getting it to do exactly what I wanted.

I also ran into cases where some projects required an older Linux kernel, which added another layer of complexity. I didn't want to spin up multiple WSL instances just for that, and that's when I first heard about Docker.

More recently I've been hearing a lot about uv as a faster, more modern Python package manager. From what I can tell it's genuinely great for Python packages but doesn't handle system-level installations like CUDA, so it doesn't fully replace what conda was doing for me.

I can't be the only one dealing with this. To me it seems that the best way to go about this is to use Docker to handle system-level dependencies (CUDA version, Linux environment, system libraries) and uv to handle Python packages and environments inside the container. That way each project gets a fully isolated, reproducible environment.
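The Docker + uv split described above could look like the sketch below. The CUDA base tag and project layout are assumptions, and the uv binary copy follows uv's documented Docker pattern:

```dockerfile
# System layer: pin CUDA + Ubuntu via the base image (tag is an example).
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04

# Python layer: copy the uv binary in; uv manages packages inside the container.
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /usr/local/bin/

WORKDIR /app
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen              # reproducible env straight from the lockfile
COPY . .
CMD ["uv", "run", "python", "train.py"]
```

Build per project (`docker build -t myproj .`, run with `docker run --gpus all myproj`), and each project gets its own CUDA version and its own locked Python environment.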

But I'm new to this and don't want to commit to a workflow based on my own assumptions. I'd love to hear from more experienced engineers what their day-to-day workflow for multiple projects looks like.

Source: https://www.reddit.com/r/MachineLearning/comments/1rrsk07/d_whats_the_modern_workflow_for_managing_cuda/


[r/MachineLearning] [R] LEVI: Beating GEPA/OpenEvolve/AlphaEvolve at a fraction


I've been working on making LLM-guided evolutionary optimization (the AlphaEvolve/FunSearch paradigm) cheaper and more accessible. The result is LEVI.

The core thesis is simple: most frameworks in this space assume frontier model access and build their search architecture around that. I think this is backwards. If you invest in the harness (better diversity maintenance, smarter model allocation) you can get the same or better results with a 30B model doing 90%+ of the work.

Two ideas make this work:

Stratified model allocation. Cheap models (Qwen 30B) handle most mutations. Expensive models only get called for rare paradigm shifts where you actually need creativity. The evolutionary process is blind anyway. FunSearch reached their capset result with a ~30B model over a million mutations. Raw model intelligence isn't what drives the breakthroughs, compounding blind search is.
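Stratified allocation can be sketched as a budgeted sampler; the 90/10 split and model names are illustrative, not LEVI's actual scheduler:

```python
import random

# Cheap model takes ~90% of mutation calls; the frontier model is reserved
# for occasional "paradigm shift" proposals. Names are placeholders.
def pick_model(rng: random.Random, frontier_share: float = 0.1) -> str:
    return "frontier-model" if rng.random() < frontier_share else "qwen-30b"

rng = random.Random(0)
calls = [pick_model(rng) for _ in range(10_000)]
share = calls.count("frontier-model") / len(calls)
print(f"frontier share of mutation calls: {share:.1%}")
```

Since search quality compounds over many blind mutations, the cheap tier carries the volume and the expensive tier only pays for creativity.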

Fingerprint-based CVT-MAP-Elites. Instead of choosing between structural diversity (OpenEvolve) or performance-based diversity (GEPA's Pareto fronts), we use both as dimensions of a single behavioral fingerprint. Centroids are initialized from structurally diverse seeds with noise perturbation, so the archive doesn't overfit to early strategies or waste space on regions no program will ever visit.
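The fingerprint-keyed archive amounts to the usual MAP-Elites insert: map a program's behavioral fingerprint to its nearest centroid cell and keep only the best scorer per cell. A minimal sketch with illustrative 2-D fingerprints (LEVI's actual dimensions are richer):

```python
# Sketch of a CVT-MAP-Elites archive keyed by a behavioral fingerprint that
# mixes structural and performance features. Dimensions are illustrative.

def nearest_centroid(fp, centroids):
    # Squared Euclidean distance over small fixed-size fingerprint tuples.
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(fp, centroids[i])))

def insert(archive, centroids, fingerprint, score, program):
    cell = nearest_centroid(fingerprint, centroids)
    # Keep only the elite (highest score) in each cell.
    if cell not in archive or score > archive[cell][0]:
        archive[cell] = (score, program)
    return cell

# Hypothetical centroids over (structural, performance) coordinates.
centroids = [(0.1, 0.2), (0.8, 0.9), (0.5, 0.5)]
archive = {}
insert(archive, centroids, (0.12, 0.25), score=1.0, program="p1")
insert(archive, centroids, (0.11, 0.22), score=2.0, program="p2")  # displaces p1
```

Initializing the centroids from structurally diverse seeds (with noise) is what keeps cells from clustering on regions no program will ever visit.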

Results:

On the UC Berkeley ADRS benchmark (7 real-world systems problems: cloud scheduling, load balancing, SQL optimization, etc.):

| Problem | LEVI | Best Competitor | Cost Savings |
|---|---|---|---|
| Spot Single-Reg | 51.7 | GEPA 51.4 | 6.7x cheaper |
| Spot Multi-Reg | 72.4 | OpenEvolve 66.7 | 5.6x cheaper |
| LLM-SQL | 78.3 | OpenEvolve 72.5 | 4.4x cheaper |
| Cloudcast | 100.0 | GEPA 96.6 | 3.3x cheaper |
| Prism | 87.4 | Tied | 3.3x cheaper |
| EPLB | 74.6 | GEPA 70.2 | 3.3x cheaper |
| Txn Scheduling | 71.1 | OpenEvolve 70.0 | 1.5x cheaper |

LEVI also beats AlphaEvolve's circle packing score while mostly using Qwen 30B.

The part I think is most interesting is the controlled comparison: same model (Qwen3-30B-A3B), same budget (750 evals), three seeds. LEVI reaches scores within 100 evaluations that neither OpenEvolve nor GEPA hit at any point. So the gains come from the search architecture, not just throwing a bigger model at it.

Blog: ttanv.github.io/levi

Code: github.com/ttanv/levi

Happy to discuss the architecture, diversity mechanism, or cost breakdown. Sorry for the repost, used the wrong flair last time.

Source: https://www.reddit.com/r/MachineLearning/comments/1rrrgjm/r_levi_beating_gepaopenevolvealphaevolve_at_a/


[r/MachineLearning] [D] Can we stop glazing big labs and universities? (275↑)


I routinely see posts describing a paper with 15+ authors, the middlemost one being a student intern at Google, described in posts as "Google invents revolutionary new architecture..." Same goes for papers where some subset of the authors are at Stanford or MIT, even non-leads.

  1. Large research orgs aren't monoliths. There are good and weak researchers everywhere, even Stanford. Believe it or not, a postdoc at a non-elite university might indeed be a stronger and more influential researcher than a first-year graduate student at Stanford.

  2. It's a good idea to judge research on its own merit. Arguably one of the stronger aspects of the ML research culture is that advances can come from anyone, whereas in fields like biology most researchers and institutions are completely shut out from publishing in Nature, etc.

  3. Typically the first author did the majority of the work, and the last author supervised. Just because author N//2 did an internship somewhere elite doesn't mean that their org "owns" the discovery.

We all understand the benefits and strength of the large research orgs, but it's important to assign credit fairly. Otherwise, we end up in a feedback loop where every crummy paper from a large org gets undue attention, and we miss out on major advances from less well-connected teams. This is roughly the corner that biology backed itself into, and I'd hate to see it happen in ML research.

Source: https://www.reddit.com/r/MachineLearning/comments/1rr7vup/d_can_we_stop_glazing_big_labs_and_universities/


Related notes

  • [[NVIDIA]]
  • [[260315_tg]] — same ticker (AMD)
  • [[260315_globaletfi]] — same ticker (AMD)
  • [[260316_globaletfi]] — same ticker (AMD)
  • [[260315_hanaglobalbottomup]] — same ticker (AMD)
  • [[260316_hanaglobalbottomup]] — same ticker (AMD)
  • [[260317_HANAchina]] — same ticker (AMD)
  • [[260315_kiwoom_semibat]] — same ticker (AMD)
  • [[260316_kiwoom_semibat]] — same ticker (AMD)
  • [[260316_kiwoom_semibat_15795_ref]] — same ticker (AMD)
  • [[260316_skitteam]] — same ticker (AMD)