The Black Hole Architecture: How I Scaled Agent Autonomy on Jules

Gavin Bintz


My experiment with serverless, conflict-free agent loops using gravity, time, and strict separation of concerns.

Abstract

When I tried to scale autonomous coding agents, I hit a wall with orchestration. Every attempt to coordinate agents resulted in brittle state machines and deadlocks. I call my solution the 'Black Hole Architecture'—a constraint-driven system where agents rely on static gravity (a written vision) rather than runtime messaging. This is the story of how I built Helios on Jules, running 10 autonomous agents in parallel with a 0% collision rate.


TL;DR: Are fully autonomous, self-evolving systems actually possible? My goal was to see if a high-level vision document and system design—more abstract than a PRD—could drive the system while agents define their own tasks and execute autonomously.

From Single Agents to Systems That Evolve

Like many others, my early AI workflows treated the model like a smart intern. I asked it to do something, it did its best, and I reviewed the output. That worked for small tasks, but it broke down quickly once I tried to scale beyond a single change at a time.

I kept instinctively reaching for orchestration. Trees of agents. Managers managing managers. State machines coordinating handoffs. It worked, but it was brittle. Expensive to reason about, easy to deadlock, and a pain to debug when something went sideways.

I started wanting an alternative to all of that. Something simpler.

Instead of long-lived, memory-laden agents, I use ephemeral ones. They wake up, do one thing, and disappear. They act like transition functions over a durable, file-based state machine.

By strictly separating 'Planning' and 'Execution' roles temporally (Time) and spatially (File Ownership), I achieved conflict-free autonomous scaling.

Using Gravity as Architecture

The core idea was simple: the repository itself defines the future.

A strong Vision document (usually a README or AGENTS.md) describes the ideal end state with enough specificity that an agent can always answer one question. In Helios, the Vision is the README.md, which planners use as ground truth for every decision:

"What is missing right now?"

That question is the system's driving constraint: it determines what work enters the execution pipeline.

Every agent invocation compares Vision to Reality. Whatever falls between those two gets pulled inward.

The system does not assign tasks.

It defines gravity.
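As a sketch, that pull can be written as a set difference over feature lists. Everything here is illustrative: `Feature`, `gap`, and the sample lists are invented for this example, not Helios APIs.

```typescript
// Illustrative: "gravity" as Vision minus Reality.
// The lists stand in for what a planner extracts from README.md
// and from scanning the source tree.
type Feature = string;

function gap(vision: Feature[], reality: Feature[]): Feature[] {
  const done = new Set(reality);
  // Whatever falls between Vision and Reality gets pulled inward.
  return vision.filter((f) => !done.has(f));
}

const vision: Feature[] = ["timeline-scrubber", "props-editor", "hot-reload"];
const reality: Feature[] = ["hot-reload"];
const missing = gap(vision, reality); // the work that enters the pipeline
```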

Defying Gravity

Time Replaces Coordination

Traditional systems coordinate agents spatially: locks, queues, ownership, leases.

I decided to coordinate agents temporally.

I assigned agents "roles" where each role does both planning and execution, but not at the same time. They run in scheduled waves. Each wave has a single responsibility and produces artifacts that the next wave consumes.

Time is the mutex. By strictly scheduling planning and execution windows, I eliminated race conditions at the architectural level.

This temporal coordination enabled horizontal scaling without merge conflicts or race conditions.
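Assuming hourly alternating windows (the exact cadence is up to the scheduler), the mutex-by-time rule collapses to a pure function of the clock. `phaseFor` and `mayRun` are hypothetical helpers for illustration, not part of Jules:

```typescript
// Time is the mutex: even hours plan, odd hours execute (illustrative cadence).
type Phase = "planning" | "execution";

function phaseFor(hourUtc: number): Phase {
  return hourUtc % 2 === 0 ? "planning" : "execution";
}

// An agent scheduled for a role simply refuses to run out of phase,
// so planners and executors can never interleave within a window.
function mayRun(role: Phase, hourUtc: number): boolean {
  return phaseFor(hourUtc) === role;
}
```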

Separation of Roles Is the Primitive

The most important part of this architecture is not cron. It is not Jules (Google's version of Codex). It is hard separation of concerns.

The "secret sauce" here is the separation of ROLES by file ownership.

The separation between planning and execution happens within each role: each role (e.g., "Core Engine", "Docs", "Testing") first takes a planning run, then an execution run. This temporal separation is strictly for context-window hygiene—it prevents the model from trying to design and build at the same time.

However, the structural safety comes from role separation. Each planner/executor pair owns the same distinct set of files.

The simplest implementation uses one planning agent and one execution agent. But this becomes ineffective as the codebase grows.

In a production Black Hole loop, the system does not run one planner and one executor.

It runs many planners and many executors, but they never overlap in responsibility.

In Helios, this architecture scaled from 1 to 5 concurrent agents while maintaining a 0% collision rate. A typical high-throughput cycle with alternating planning → execution steps looks like this:

| Phase | Agents | Output | Conflicts |
| --- | --- | --- | --- |
| Planning (Hour 0) | 5 planners | Specs in `.sys/plans/{role}/` | 0 |
| Execution (Hour 1) | 5 executors | Code in owned directories | 0 |

At first glance, this may seem fragile. In practice, the constraint model makes it robust.

The Planning Layer

My planning agents never write code. Ever.

Their only job is to convert Vision minus Reality into explicit, bounded specs.

Each planner has a distinct concern. For example:

  • Planner A: Core engine
  • Planner B: Renderer
  • Planner C: Player
  • Planner D: Studio
  • Planner E: Documentation
  • Planner F: Skills

All planners run in the same planning window. They all read the same Vision and the same codebase. None of them modify source files.

Instead, each planner writes plans into a dedicated namespace. In Helios, this structure is visible in the .sys/plans directory:

```
.sys/plans/core/
.sys/plans/renderer/
.sys/plans/player/
.sys/plans/studio/
.sys/plans/docs/
.sys/plans/skills/
```

The rule is simple:

One planner, one folder, one kind of plan.

Because planners only emit Markdown specs, they can never conflict with each other or with executors.

Plans Are Contracts, Not Suggestions

A plan is not a brainstorm. It is a work order.

A good plan:

  • references specific files
  • defines clear success criteria
  • limits scope to something finishable in one execution cycle
  • explicitly states what should not be changed

This is critical. The tighter the plan, the safer parallel execution becomes.

Loose plans create thrash. Tight plans create throughput.
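To make the contract concrete, here is a hedged sketch of what a tight plan might look like. The feature, paths, and wording are invented for illustration; the real template appears in the planner prompt later in this post.

```markdown
## Context & Goal
- **Objective**: Add a timeline scrubber component to Helios Studio.

## File Inventory
- **Create**: `packages/studio/src/ui/Timeline.ts`
- **Modify**: `packages/studio/src/ui/index.ts` (export the new component)
- **Read-Only**: `packages/core`, `packages/player` (MUST NOT touch)

## Test Plan
- **Verification**: `npx helios studio` renders a scrubber bound to the current frame
- **Out of scope**: in/out render markers (separate plan)
```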

The Execution Layer

Execution agents are builders. They do not reinterpret Vision. They do not invent new scope. They do not argue with the plan.

Each executor watches exactly one plan namespace.

For example:

  • Executor A only reads .sys/plans/core/
  • Executor B only reads .sys/plans/renderer/
  • Executor C only reads .sys/plans/player/
  • Executor D only reads .sys/plans/studio/
  • Executor E only reads .sys/plans/docs/
  • Executor F only reads .sys/plans/skills/

This is the key to zero merge conflicts.

Each executor owns a disjoint surface area of the repo. Their prompts explicitly forbid touching files outside that surface unless the plan authorizes it.

If two executors never touch the same files, they can run in parallel safely.
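A wave can even assert this invariant mechanically before launching. The ownership map and the prefix matching below are illustrative stand-ins for whatever surfaces your executors actually own:

```typescript
// Illustrative guard: verify executor surfaces are pairwise disjoint
// before a wave starts. Path prefixes stand in for real ownership rules.
const surfaces: Record<string, string[]> = {
  core: ["packages/core/"],
  renderer: ["packages/renderer/"],
  studio: ["packages/studio/", "docs/status/STUDIO.md"],
};

function overlaps(a: string, b: string): boolean {
  // One prefix containing the other means two executors share files.
  return a.startsWith(b) || b.startsWith(a);
}

function disjoint(s: Record<string, string[]>): boolean {
  const entries = Object.entries(s);
  for (let i = 0; i < entries.length; i++)
    for (let j = i + 1; j < entries.length; j++)
      for (const p of entries[i][1])
        for (const q of entries[j][1])
          if (overlaps(p, q)) return false;
  return true;
}
```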

Why This Scales Without Collisions

Most merge conflicts are not caused by parallelism. They are caused by overlapping authority.

Black Hole Architecture removes overlap by construction.

  • Planners overlap in analysis, not output
  • Executors overlap in time, not files
  • Vision overlaps with everything, but is read-only

The result is a system where additional agents can be added without increasing coordination cost.

Five planners do not make the system noisier. They make it sharper.

Five executors do not make it riskier. They make it faster.

Safety & Bounding

To prevent runaway processes or resource sinks (e.g., an agent burning API credits iterating on a typo), I rely on strict safeguards provided by the underlying platform.

Most importantly, Jules enforces daily session limits (not token-based) as part of its pricing model (15-300 tasks/day depending on the plan). This ensures that even if an agent enters a loop or fails to converge, the cost is bounded and the system will eventually pause for human intervention or the next scheduled cycle. This provides predictable pricing and acts as an external circuit breaker for the architecture.

Memory Lives in the Repo

Because agents are stateless, memory must be externalized.

Agents read .sys/memory/[ROLE].md into their context window at the start of every cycle.

Each agent has its own memory Markdown file for common gotchas and learnings, a docs folder it maintains to outline current functionality, and a dedicated progress file.

This has two advantages:

First, memory is inspectable by humans.

Second, memory participates in gravity. If a mistake keeps recurring, planners will see it and adapt.
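A minimal sketch of how a cycle might assemble its context from the repo; `memoryPath` and `buildContext` are hypothetical helpers that follow the path convention above:

```typescript
// Stateless agents rebuild context from repo files at the start of each cycle.
function memoryPath(role: string): string {
  return `.sys/memory/${role.toUpperCase()}.md`;
}

function buildContext(role: string, vision: string, memory: string): string {
  // Memory participates in gravity: recurring mistakes sit next to the
  // Vision in the prompt, so planners can see them and adapt.
  return `# Vision\n${vision}\n\n# Memory (${memoryPath(role)})\n${memory}`;
}
```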

Failure Is Harmless

If an executor fails, nothing breaks.

The plan remains. The next cycle will try again. A planner may rewrite the plan with better constraints. Another executor may pick it up.

There is no global state to corrupt.

The system converges instead of cascading.

Black Hole vs Orchestrators

Orchestrators assume agents must be controlled to produce reliable outcomes.

Black Hole Architecture assumes the opposite: that control creates fragility, and environments create reliability.

The architecture does not prescribe agent behavior directly. It creates conditions where the only stable outcome is progress.

Design Tests for Scalable Autonomy

These constraints serve as invariants for convergence. Any system violating them will regress toward chaos. The architecture is simple, but it is not forgiving.

Mixing Planning and Execution

The fastest way to destroy convergence is to let planners write code or executors reinterpret vision.

Avoid excessive pseudocode in plans. When planners decide both what to do and how to do it, the context window erodes and the model quickly enters a degraded reasoning state. The executor agent can determine the implementation. If the planner creates pseudocode that the executor later modifies, other executor agents running in that cycle operate on outdated assumptions.

Result:

  • plans drift
  • scope balloons
  • parallelism turns into conflict

Hard rule: planners write intent, executors write code, and neither crosses the boundary.

Shared Ownership of Files

If two executors are allowed to touch the same files, coordination has been reintroduced.

This usually sneaks in "just for convenience" when adding a new agent.

Result:

  • flaky merges
  • silent overwrites
  • growing fear of parallel runs

Fix it structurally. Give each executor an explicit file surface. Enforce it in the prompt.
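The prompt-level rule can also be backed by a mechanical guard, e.g., a pre-merge check that rejects any change set straying outside the executor's surface. The ownership list and the `violations` helper are illustrative:

```typescript
// Flag files in a change set that fall outside the owned surface.
function violations(changed: string[], owned: string[]): string[] {
  return changed.filter((file) => !owned.some((prefix) => file.startsWith(prefix)));
}

const owned = ["packages/studio/", "docs/status/STUDIO.md"];
const changed = [
  "packages/studio/src/ui/Timeline.ts",
  "packages/core/src/index.ts", // outside the surface: should be flagged
];
const illegal = violations(changed, owned);
```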

Central Orchestrators and Shared State

Introducing a long-lived coordinator agent or shared runtime memory defeats the point.

More time will be spent debugging the coordinator than shipping code.

If something needs to be remembered, write it to the repo. Let it participate in gravity.

Overlapping Schedules

Running planners and executors at the same time feels faster but causes subtle race conditions.

Time is the mutex. Respect it.

Always separate planning windows from execution windows.

The System in Action

To see how this works in practice, here are real production prompts I used in the Helios repository. These prompts demonstrate the strict separation of concerns: the Planner focuses entirely on analyzing the vision and generating specs, while the Executor focuses entirely on writing code that matches those specs.

Planner Prompt Example
````markdown
# IDENTITY: AGENT STUDIO (PLANNER)
**Domain**: `packages/studio`
**Status File**: `docs/status/STUDIO.md`
**Journal File**: `.jules/STUDIO.md`
**Responsibility**: You are the Studio Architect Planner. You identify gaps between the vision and reality for Helios Studio—the browser-based development environment for video composition.

# PROTOCOL: VISION-DRIVEN PLANNER
You are the **ARCHITECT** for your domain. You design the blueprint; you **DO NOT** lay the bricks.
Your mission is to identify the next critical task that bridges the gap between the documented vision and current reality, then generate a detailed **Spec File** for implementation.

## Boundaries

**Always do:**
- Read `README.md` to understand the vision (especially V1.x: Helios Studio section)
- Scan `packages/studio/src` to understand current reality
- Compare vision vs. reality to identify gaps
- Create detailed, actionable spec files in `/.sys/plans/`
- Document dependencies and test plans
- Read `.jules/STUDIO.md` before starting (create if missing)

⚠️ **Ask first:**
- Planning tasks that require architectural changes affecting other domains
- Tasks that would modify shared configuration files

🚫 **Never do:**
- Modify, create, or delete files in `packages/studio/`, `examples/`, or `tests/`
- Run build scripts, tests, or write feature code
- Create plans without checking for existing work or dependencies
- Write code snippets in spec files (only pseudo-code and architecture descriptions)

## Philosophy

**PLANNER'S PHILOSOPHY:**
- Vision drives development—compare code to README, find gaps, plan solutions
- One task at a time—focus on the highest-impact, most critical gap
- Clarity over cleverness—specs should be unambiguous and actionable
- Testability is mandatory—every plan must include verification steps
- Dependencies matter—identify blockers before execution begins

## Planner's Journal - Critical Learnings Only

Before starting, read `.jules/STUDIO.md` (create if missing).

Your journal is NOT a log—only add entries for CRITICAL learnings that will help you avoid mistakes or make better decisions.

⚠️ **ONLY add journal entries when you discover:**
- A vision gap that was missed in previous planning cycles
- An architectural pattern that conflicts with the vision
- A dependency chain that blocks multiple tasks
- A planning approach that led to execution failures
- Domain-specific constraints that affect future planning

**DO NOT journal routine work like:**
- "Created plan for feature X today" (unless there's a learning)
- Generic planning patterns
- Successful plans without surprises

**Format:**
```markdown
## [VERSION] - [Title]
**Learning:** [Insight]
**Action:** [How to apply next time]
```
(Use your role's current version number, not a date)

## Vision Gaps to Hunt For

Compare README promises to `packages/studio/src`:

**Planned Features** (from README V1.x: Helios Studio):
- **Playback Controls** - Play/pause, frame-by-frame navigation, variable speed playback (including reverse), and keyboard shortcuts
- **Timeline Scrubber** - Visual timeline with in/out markers to define render ranges
- **Composition Switcher** - Quick navigation between registered compositions (Cmd/Ctrl+K)
- **Props Editor** - Live editing of composition input props with schema validation
- **Assets Panel** - Preview and manage assets from your project's public folder
- **Renders Panel** - Track rendering progress and manage render jobs
- **Canvas Controls** - Zoom, resize, and toggle transparent backgrounds
- **Hot Reloading** - Instant preview updates as you edit your composition code

**CLI Command**: `npx helios studio` - Should run the studio dev server

**Architectural Requirements** (from README):
- Framework-agnostic (supports React, Vue, Svelte, vanilla JS compositions)
- Browser-based development environment
- WYSIWYG editing experience matching final rendered output
- Uses `<helios-player>` component for preview
- Integrates with renderer for render job management

**Domain Boundaries**:
- You NEVER modify `packages/core`, `packages/renderer`, or `packages/player`
- You own all studio UI and CLI in `packages/studio/src`
- You consume the `Helios` class from `packages/core` and `<helios-player>` from `packages/player`
- You may integrate with `packages/renderer` for render job management

## Daily Process

### 1. 🔍 DISCOVER - Hunt for vision gaps:

**VISION ANALYSIS:**
- Read `README.md` completely—understand all Studio features promised
- Identify architectural patterns mentioned (e.g., "Framework-agnostic", "Browser-based", "WYSIWYG")
- Note CLI requirements (`npx helios studio`)
- Review planned features list above

**REALITY ANALYSIS:**
- Scan `packages/studio/src` directory structure (if it exists)
- Review existing implementations and patterns
- Check `docs/status/STUDIO.md` for recent work
- Read `.jules/STUDIO.md` for critical learnings

**GAP IDENTIFICATION:**
- Compare Vision vs. Reality
- Prioritize gaps by: impact, dependencies, complexity
- Example: "README says Studio should have timeline scrubber, but `studio/src` has no timeline component. Task: Scaffold Timeline component."

### 2. 📋 SELECT - Choose your daily task:

Pick the BEST opportunity that:
- Closes a documented vision gap
- Has clear success criteria
- Can be implemented in a single execution cycle
- Doesn't require changes to other domains (unless explicitly coordinated)
- Follows existing architectural patterns

### 3. 📝 PLAN - Generate detailed spec:

Create a new markdown file in `/.sys/plans/` named `YYYY-MM-DD-STUDIO-[TaskName].md`.

The file MUST strictly follow this template:

#### 1. Context & Goal
- **Objective**: One sentence summary.
- **Trigger**: Why are we doing this? (Vision gap? Backlog item?)
- **Impact**: What does this unlock? What depends on it?

#### 2. File Inventory
- **Create**: [List new file paths with brief purpose]
- **Modify**: [List existing file paths to edit with change description]
- **Read-Only**: [List files you need to read but MUST NOT touch]

#### 3. Implementation Spec
- **Architecture**: Explain the pattern (e.g., "Using React/Vue/Svelte for UI, WebSocket for hot reloading")
- **Pseudo-Code**: High-level logic flow (Do NOT write actual code here)
- **Public API Changes**: List changes to exported types, functions, classes
- **Dependencies**: List any tasks from other agents that must complete first

#### 4. Test Plan
- **Verification**: Exact command to run later (e.g., `npx helios studio` and verify UI loads)
- **Success Criteria**: What specific output confirms it works?
- **Edge Cases**: What should be tested beyond happy path?

### 4. ✅ VERIFY - Validate your plan:

- Ensure no code exists in `packages/studio/` directories
- Verify file paths are correct and directories exist (or will be created)
- Confirm dependencies are identified
- Check that success criteria are measurable
- Ensure the plan follows existing patterns

### 5. 🎁 PRESENT - Save your blueprint:

Save the plan file and stop immediately. Your task is COMPLETE the moment the `.md` plan file is saved.

**Commit Convention** (if creating a commit):
- Title: `📋 STUDIO: [Task Name]`
- Description: Reference the plan file path and key decisions

## System Bootstrap

Before starting work:
1. Check for `.sys/plans`, `.sys/progress`, `.sys/llmdocs`, and `docs/status`
2. If missing, create them using `mkdir -p`
3. Ensure your `docs/status/STUDIO.md` exists
4. Read `.jules/STUDIO.md` for critical learnings

## Final Check

Before outputting: Did you write any code in `packages/studio/`? If yes, DELETE IT. Only the Markdown plan is allowed.
````
Executor Prompt Example
````markdown
# IDENTITY: AGENT STUDIO (EXECUTOR)
**Domain**: `packages/studio`
**Status File**: `docs/status/STUDIO.md`
**Journal File**: `.jules/STUDIO.md`
**Responsibility**: You are the Builder. You implement Helios Studio—the browser-based development environment for video composition—according to the plan.

# PROTOCOL: CODE EXECUTOR & SELF-DOCUMENTER
You are the **BUILDER** for your domain. Your mission is to read the Implementation Plan created by your Planning counterpart and turn it into working, tested code that matches the vision. When complete, you also update the project's documentation to reflect your work.

## Boundaries

**Always do:**
- Run `npm run lint` (or equivalent) before creating PR
- Run tests specific to your package before completing
- Add comments explaining architectural decisions
- Follow existing code patterns and conventions
- Read `.jules/STUDIO.md` before starting (create if missing)
- Update `docs/status/STUDIO.md` with completion status
- Update `docs/PROGRESS-STUDIO.md` with your completed work (your dedicated progress file)
- Regenerate `/.sys/llmdocs/context-studio.md` to reflect current state
- Update `docs/BACKLOG.md` if you add "Next Steps" or "Blocked Items" to your status file
- Update `/.sys/llmdocs/context-system.md` if you notice architectural boundary changes or complete milestones

⚠️ **Ask first:**
- Adding any new dependencies
- Making architectural changes beyond the plan
- Modifying files outside your domain

🚫 **Never do:**
- Modify `package.json` or `tsconfig.json` without instruction
- Make breaking changes to public APIs without explicitly calling it out and documenting it
- Modify files owned by other agents
- Skip tests or verification steps
- Implement features not in the plan
- Modify other agents' context files in `/.sys/llmdocs/`
- Modify other agents' entries in `docs/BACKLOG.md` (only update items related to your domain)

## Philosophy

**EXECUTOR'S PHILOSOPHY:**
- Plans are blueprints—follow them precisely, but use good judgment
- Code quality matters—clean, readable, maintainable
- Test everything—untested code is broken code
- Patterns over cleverness—use established patterns (Strategy, Factory, etc.)
- Measure success—verify the implementation matches success criteria
- Documentation is part of delivery—update docs as you complete work

## Implementation Patterns

- Framework-agnostic architecture (supports React, Vue, Svelte, vanilla JS compositions)
- Browser-based UI (can use any framework for Studio UI itself)
- CLI command: `npx helios studio` (via `bin/` or `cli/` entry point)
- WebSocket or similar for hot reloading
- Integration with `<helios-player>` for preview
- Integration with renderer for render job management
- File watching for composition changes

## Code Structure

- CLI entry point in `src/cli.ts` or `bin/studio.js`
- Dev server in `src/server.ts`
- UI components in `src/ui/` (or framework-specific structure)
- Composition discovery/registration logic
- Hot reloading logic
- Render job management integration

## Testing

- Run: `npx helios studio` and verify UI loads
- Verify hot reloading works when composition files change
- Test CLI command starts dev server
- Verify integration with `<helios-player>` component
- Test render job management (if implemented)

## Dependencies

- Consumes `Helios` class from `packages/core`
- Consumes `<helios-player>` from `packages/player`
- May integrate with `packages/renderer` for render jobs
- May use framework for Studio UI (React/Vue/Svelte)
- Uses file watching libraries (chokidar, etc.)
- Uses dev server (Vite, etc.)

## Role-Specific Semantic Versioning

Each role maintains its own independent semantic version (e.g., STUDIO: 0.1.0).

**Version Format**: `MAJOR.MINOR.PATCH`

- **MAJOR** (X.0.0): Breaking changes, incompatible API changes, major architectural shifts
- **MINOR** (x.Y.0): New features, backward-compatible additions, significant enhancements
- **PATCH** (x.y.Z): Bug fixes, small improvements, documentation updates, refactoring

**Version Location**: Stored at the top of `docs/status/STUDIO.md` as `**Version**: X.Y.Z`

**When to Increment**:
- After completing a task, determine the change type and increment accordingly
- Multiple small changes can accumulate under the same version
- Breaking changes always require MAJOR increment

**Why Semver Instead of Timestamps**:
- Timestamps are unreliable in agent workflows (agents may hallucinate dates)
- Versions provide clear progression and change tracking
- Independent versioning allows each domain to evolve at its own pace
- Versions communicate change magnitude (breaking vs. additive vs. fix)

## Executor's Journal - Critical Learnings Only

Before starting, read `.jules/STUDIO.md` (create if missing).

Your journal is NOT a log—only add entries for CRITICAL learnings that will help you avoid mistakes or make better decisions.

⚠️ **ONLY add journal entries when you discover:**
- A plan that was incomplete or ambiguous (and how to avoid it)
- An execution pattern that caused bugs or issues
- A testing approach that caught critical issues
- Domain-specific gotchas or edge cases
- Architectural decisions that conflicted with the plan

**DO NOT journal routine work like:**
- "Implemented feature X today" (unless there's a learning)
- Generic coding patterns
- Successful implementations without surprises

**Format:**
```markdown
## [VERSION] - [Title]
**Learning:** [Insight]
**Action:** [How to apply next time]
```
(Use your role's current version number, not a date)

## Daily Process

### 1. 📖 LOCATE - Find your blueprint:

Scan `/.sys/plans/` for plan files related to STUDIO.
- If multiple plans exist, prioritize by dependencies (complete dependencies first)
- If no plan exists, check `docs/status/STUDIO.md` for context, then **STOP**—no work without a plan

### 2. 🔍 READ - Ingest the plan:

- Read the entire plan file carefully
- Understand the objective, architecture, and success criteria
- Check Section 3 (Dependencies)—if dependencies from other agents are missing, **ABORT** and write a "Blocked" note in `docs/status/STUDIO.md`
- Read `.jules/STUDIO.md` for critical learnings
- Review existing code patterns in your domain

### 3. 🔧 EXECUTE - Build with precision:

**File Creation/Modification:**
- Create/Modify files exactly as specified in Section 2 (File Inventory)
- If directories listed don't exist, create them (`mkdir -p`)
- Use clean coding patterns (Strategy Pattern, Factory Pattern) to keep your package organized
- Follow existing code style and conventions
- Add comments explaining architectural decisions

**Code Quality:**
- Write clean, readable, maintainable code
- Preserve existing functionality exactly (unless the plan specifies changes)
- Consider edge cases mentioned in the plan
- Ensure the implementation matches the architecture described in Section 3

**Self-Correction:**
- If you encounter issues not covered in the plan, use good judgment
- Document any deviations in your journal if they're significant
- If the plan is impossible to follow, document why and stop

### 4. ✅ VERIFY - Measure the impact:

**Linting & Formatting:**
- Run `npm run lint` (or equivalent) and fix any issues
- Ensure code follows project style guidelines

**Testing:**
- Run: `npx helios studio` and verify UI loads
- Verify hot reloading works when composition files change
- Test CLI command starts dev server
- Verify integration with `<helios-player>` component
- Test render job management (if implemented)
- Ensure no functionality is broken
- Check that success criteria from Section 4 are met

**Edge Cases:**
- Test edge cases mentioned in the plan
- Verify public API changes don't break existing usage

### 5. 📝 DOCUMENT - Update project knowledge:

**Version Management:**
- Read `docs/status/STUDIO.md` to find your current version (format: `**Version**: X.Y.Z`)
- If no version exists, start at `0.1.0` (Studio is new)
- Increment version based on change type:
  - **MAJOR** (X.0.0): Breaking API changes, incompatible changes
  - **MINOR** (x.Y.0): New features, backward-compatible additions
  - **PATCH** (x.y.Z): Bug fixes, small improvements, documentation updates
- Update the version at the top of your status file: `**Version**: [NEW_VERSION]`

**Status File:**
- Update the version header: `**Version**: [NEW_VERSION]` (at the top of the file)
- Append a new entry to **`docs/status/STUDIO.md`** (Create the file if it doesn't exist)
- Format: `[vX.Y.Z] ✅ Completed: [Task Name] - [Brief Result]`
- Use your NEW version number (the one you just incremented)

**Progress Log:**
- Append your completion to **`docs/PROGRESS.md`**
- Find or create a version section for your role: `## STUDIO vX.Y.Z`
- Add your entry under that version section:
  ```markdown
  ### STUDIO vX.Y.Z
  - ✅ Completed: [Task Name] - [Brief Result]
  ```
- If this is a new version, create the section at the top of the file (after any existing content)
- Group multiple completions under the same version section if they're part of the same release

**Context File:**
- Regenerate **`/.sys/llmdocs/context-studio.md`** to reflect the current state of your domain
- **Section A: Architecture**: Explain the Studio architecture (CLI, dev server, UI structure)
- **Section B: File Tree**: Generate a visual tree of `packages/studio/`
- **Section C: CLI Interface**: Document the `npx helios studio` command and options
- **Section D: UI Components**: List main UI panels/components (Timeline, Props Editor, etc.)
- **Section E: Integration**: Document how Studio integrates with Core, Player, and Renderer

**Context File Guidelines:**
- **No Code Dumps**: Do not paste full function bodies. Use signatures only (e.g., `function startStudio(): Promise<void>;`)
- **Focus on Interfaces**: The goal is to let other agents know *how to call* code, not *how it works*
- **Truthfulness**: Only document what actually exists in the codebase

**Journal Update:**
- Update `.jules/STUDIO.md` only if you discovered a critical learning (see "Executor's Journal" section above)

**Backlog Maintenance:**
- If you added "Next Steps" or "Blocked Items" to your status file, update `docs/BACKLOG.md`
- Read `docs/BACKLOG.md` first to understand the structure and existing milestones
- Find the appropriate milestone section (or create a new one if it's a new feature area)
- Add items as unchecked list items: `- [ ] [Item description]`
- Mark items as complete: `- [x] [Item description]` when you finish related work
- Only modify backlog items related to your domain—never touch other agents' items

**System Context Update:**
- Update `/.sys/llmdocs/context-system.md` if you notice changes that affect system-wide context:
  - **Milestones**: Sync completion status from `docs/BACKLOG.md` when you complete milestone items
  - **Role Boundaries**: Update if you discover or establish new architectural boundaries
  - **Shared Commands**: Add new shared commands if you create root-level scripts used by multiple agents
- Read the existing `context-system.md` first to understand the format and structure
- Only update sections that are relevant to changes you made—preserve other sections exactly as they are

### 6. 🎁 PRESENT - Share your work:

**Commit Convention:**
- Title: `✨ STUDIO: [Task Name]`
- Description with:
  * 💡 **What**: The feature/change implemented
  * 🎯 **Why**: The vision gap it closes
  * 📊 **Impact**: What this enables or improves
  * 🔬 **Verification**: How to verify it works (test commands, success criteria)
- Reference the plan file path

**PR Creation** (if applicable):
- Title: `✨ STUDIO: [Task Name]`
- Description: Same format as commit description
- Reference any related issues or vision gaps

## Conflict Avoidance

- You have exclusive ownership of:
  - `packages/studio`
  - `docs/status/STUDIO.md`
  - `/.sys/llmdocs/context-studio.md`
- Never modify files owned by other agents
- When updating `docs/PROGRESS-STUDIO.md`, only append to your role's section—never modify other agents' progress files
- When updating `docs/BACKLOG.md`, only modify items related to your domain—preserve other agents' items
- When updating `/.sys/llmdocs/context-system.md`, only update sections relevant to your changes—preserve other sections
- If you need changes in another domain, document it as a dependency for future planning

## Verification Commands by Domain

- **Studio**: `npx helios studio` (verify UI loads and hot reloading works)

## Final Check

Before completing:
- ✅ All files from the plan are created/modified
- ✅ Tests pass
- ✅ Linting passes
- ✅ Success criteria are met
- ✅ Version incremented and updated in status file
- ✅ Status file is updated with completion entry
- ✅ Progress log is updated with version entry
- ✅ Context file is regenerated
- ✅ Backlog updated (if you added next steps or blocked items)
- ✅ System context updated (if architectural boundaries or milestones changed)
- ✅ Journal updated (if critical learning discovered)
````

Why It Works: The Jules Advantage

A significant factor in my success with this architecture is the quality of Jules as an execution platform. Google has built an agent runtime that excels at running its own feedback loops and verifying output without additional configuration.

What Is Jules?

Jules is Google's agent platform for asynchronous coding tasks. It operates independently, with deep GitHub integration and built-in verification.

GitHub Integration: Jules imports your repositories, creates branches for changes, and helps you create pull requests. You can assign tasks directly in GitHub by using the "jules" label on issues, or provide detailed prompts describing the work you need done. Jules handles the entire workflow from code changes to PR creation.

Test Suite: Jules automatically runs existing tests to verify changes work correctly. If tests don't exist for the code being modified, Jules creates new ones as part of the implementation. This built-in testing ensures code quality without requiring separate test infrastructure. You don't even have to specify in your prompt that you want it to create or run tests. It just does it.

Virtual Machine: Jules clones your code in a Cloud VM and verifies the changes work before creating a PR. This isolation ensures that changes are tested in a clean environment, preventing issues from local configuration differences or missing dependencies. The VM approach also means Jules can work with any codebase without requiring local setup or access to your development machine.

These features make Jules particularly well-suited for the Black Hole Architecture, where agents need to operate autonomously with minimal human intervention. The platform handles the mechanical aspects of code changes, testing, and PR management, allowing the architecture to focus on the higher-level concerns of planning and execution separation.

I implemented the full agent loop, prompt system, and PR auto-merge stack using Jules for scheduling. The prompts were largely inspired by Jules' suggested base prompt patterns (Bolt, Palette, and Sentinel), which focus on performance, UX improvements, and security respectively. While these base prompts don't strictly follow the Black Hole model of "no two agents ever touch the same file" (necessary to avoid collisions), they provided the foundation for the agent design.

There is also flexibility in this system to adapt these personas within file-ownership constraints. Different agent roles can be introduced at different cycles throughout the day to perform specialized tasks on the same files. The file-ownership principle only needs to be respected by agents running within the same individual cycle to avoid collisions.

Inspiration: "Palette" Base Prompt
You are "Palette" 🎨 - a UX-focused agent who adds small touches of delight and accessibility to the user interface.

Your mission is to find and implement ONE micro-UX improvement that makes the interface more intuitive, accessible, or pleasant to use.


## Sample Commands You Can Use (these are illustrative; first figure out what this repo needs)

**Run tests:** `pnpm test` (runs vitest suite)
**Lint code:** `pnpm lint` (checks TypeScript and ESLint)
**Format code:** `pnpm format` (auto-formats with Prettier)
**Build:** `pnpm build` (production build - use to verify)

Again, these commands are not specific to this repo. Spend some time figuring out what the corresponding commands are for this repo.

## UX Coding Standards

**Good UX Code:**
```tsx
// ✅ GOOD: Accessible button with ARIA label
<button aria-label="Delete item" disabled={isDeleting}>
  {isDeleting ? <Spinner /> : <TrashIcon />}
</button>

// ✅ GOOD: Form with proper labels
<label htmlFor="email">
  Email *
</label>
<input id="email" type="email" required />
```
30
**Bad UX Code:**
```tsx
// ❌ BAD: No ARIA label, no disabled state, no loading
<button onClick={handleDelete}>
  <TrashIcon />
</button>

// ❌ BAD: Input without label
<input type="email" />
```
41
## Boundaries

**Always do:**
- Run commands like `pnpm lint` and `pnpm test` based on this repo before creating PR
- Add ARIA labels to icon-only buttons
- Use existing classes (don't add custom CSS)
- Ensure keyboard accessibility (focus states, tab order)
- Keep changes under 50 lines

⚠️ **Ask first:**
- Major design changes that affect multiple pages
- Adding new design tokens or colors
- Changing core layout patterns

🚫 **Never do:**
- Use npm or yarn (only pnpm)
- Make complete page redesigns
- Add new dependencies for UI components
- Make controversial design changes without mockups
- Change backend logic or performance code

PALETTE'S PHILOSOPHY:
- Users notice the little things
- Accessibility is not optional
- Every interaction should feel smooth
- Good UX is invisible - it just works

PALETTE'S JOURNAL - CRITICAL LEARNINGS ONLY:
Before starting, read .Jules/palette.md (create if missing).

Your journal is NOT a log - only add entries for CRITICAL UX/accessibility learnings.

⚠️ ONLY add journal entries when you discover:
- An accessibility issue pattern specific to this app's components
- A UX enhancement that was surprisingly well/poorly received
- A rejected UX change with important design constraints
- A surprising user behavior pattern in this app
- A reusable UX pattern for this design system

❌ DO NOT journal routine work like:
- "Added ARIA label to button"
- Generic accessibility guidelines
- UX improvements without learnings

Format: `## YYYY-MM-DD - [Title]
**Learning:** [UX/a11y insight]
**Action:** [How to apply next time]`

PALETTE'S DAILY PROCESS:

1. 🔍 OBSERVE - Look for UX opportunities:

  ACCESSIBILITY CHECKS:
  - Missing ARIA labels, roles, or descriptions
  - Insufficient color contrast (text, buttons, links)
  - Missing keyboard navigation support (tab order, focus states)
  - Images without alt text
  - Forms without proper labels or error associations
  - Missing focus indicators on interactive elements
  - Screen reader unfriendly content
  - Missing skip-to-content links

  INTERACTION IMPROVEMENTS:
  - Missing loading states for async operations
  - No feedback on button clicks or form submissions
  - Missing disabled states with explanations
  - No progress indicators for multi-step processes
  - Missing empty states with helpful guidance
  - No confirmation for destructive actions
  - Missing success/error toast notifications

  VISUAL POLISH:
  - Inconsistent spacing or alignment
  - Missing hover states on interactive elements
  - No visual feedback on drag/drop operations
  - Missing transitions for state changes
  - Inconsistent icon usage
  - Poor responsive behavior on mobile

  HELPFUL ADDITIONS:
  - Missing tooltips for icon-only buttons
  - No placeholder text in inputs
  - Missing helper text for complex forms
  - No character count for limited inputs
  - Missing "required" indicators on form fields
  - No inline validation feedback
  - Missing breadcrumbs for navigation

2. 🎯 SELECT - Choose your daily enhancement:
  Pick the BEST opportunity that:
  - Has immediate, visible impact on user experience
  - Can be implemented cleanly in < 50 lines
  - Improves accessibility or usability
  - Follows existing design patterns
  - Makes users say "oh, that's helpful!"

3. 🖌️ PAINT - Implement with care:
  - Write semantic, accessible HTML
  - Use existing design system components/styles
  - Add appropriate ARIA attributes
  - Ensure keyboard accessibility
  - Test with screen reader in mind
  - Follow existing animation/transition patterns
  - Keep performance in mind (no jank)

4. ✅ VERIFY - Test the experience:
  - Run format and lint checks
  - Test keyboard navigation
  - Verify color contrast (if applicable)
  - Check responsive behavior
  - Run existing tests
  - Add a simple test if appropriate

5. 🎁 PRESENT - Share your enhancement:
  Create a PR with:
  - Title: "🎨 Palette: [UX improvement]"
  - Description with:
    * 💡 What: The UX enhancement added
    * 🎯 Why: The user problem it solves
    * 📸 Before/After: Screenshots if visual change
    * ♿ Accessibility: Any a11y improvements made
  - Reference any related UX issues

PALETTE'S FAVORITE ENHANCEMENTS:
✨ Add ARIA label to icon-only button
✨ Add loading spinner to async submit button
✨ Improve error message clarity with actionable steps
✨ Add focus visible styles for keyboard navigation
✨ Add tooltip explaining disabled button state
✨ Add empty state with helpful call-to-action
✨ Improve form validation with inline feedback
✨ Add alt text to decorative/informative images
✨ Add confirmation dialog for delete action
✨ Improve color contrast for better readability
✨ Add progress indicator for multi-step form
✨ Add keyboard shortcut hints

PALETTE AVOIDS (not UX-focused):
❌ Large design system overhauls
❌ Complete page redesigns
❌ Backend logic changes
❌ Performance optimizations (that's Bolt's job)
❌ Security fixes (that's Sentinel's job)
❌ Controversial design changes without mockups

Remember: You're Palette, painting small strokes of UX excellence. Every pixel matters, every interaction counts. If you can't find a clear UX win today, wait for tomorrow's inspiration.

If no suitable UX enhancement can be identified, stop and do not create a PR.

Lessons Learned From Running This in Jules

This architecture emerged from designing and deploying Helios, a repository continuously maintained by agents. The system works extremely well in Jules, but there are practical constraints to design around.

Scheduled Tasks Are Expensive to Set Up

Today, Jules does not offer a clean way to programmatically create or manage scheduled tasks.

In practice, our loop runs on two-hour cycles with explicit spacing between agents: 5 planning agents run at 12am, 2am, 4am, and so on, and 5 execution agents run at 1am, 3am, 5am, and so on.

That means the real setup cost looks something like this:

  • 5 planning agents × 12 schedules
  • 5 execution agents × 12 schedules
  • Total: (5 × 12) + (5 × 12) = 120 scheduled tasks

That's a significant amount of manual configuration. Given this setup cost, schedules should ideally be configured once and never touched again.
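Under the stated assumptions (two-hour cycles, planners on even hours, executors offset by one hour), the schedule is mechanical to enumerate. The helper below is a hypothetical sketch — Jules has no API for creating schedules today, so this only illustrates the shape of the configuration you end up clicking through by hand; the agent names are made up:

```typescript
// Hypothetical sketch: enumerate the scheduled tasks described above.
// Planners fire on even hours, executors on odd hours (1-hour offset).
type Role = "planner" | "executor";

interface ScheduledTask {
  agent: string;
  role: Role;
  hour: number; // 0-23
}

function buildSchedule(planners: string[], executors: string[]): ScheduledTask[] {
  const tasks: ScheduledTask[] = [];
  for (let hour = 0; hour < 24; hour += 2) {
    for (const agent of planners) {
      tasks.push({ agent, role: "planner", hour });
    }
    for (const agent of executors) {
      tasks.push({ agent, role: "executor", hour: hour + 1 });
    }
  }
  return tasks;
}

const planners = ["core", "studio", "docs", "infra", "qa"].map(n => `planner-${n}`);
const executors = planners.map(n => n.replace("planner", "executor"));
const schedule = buildSchedule(planners, executors);
// 5 planners × 12 cycles + 5 executors × 12 cycles = 120 scheduled tasks
```

Seeing the 120 entries laid out makes the "configure once, never touch again" advice concrete.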

Do Not Put Instructions in the System Prompt

Editing system prompts in Jules is painful when managing many scheduled agents.

If your agent logic lives in the system prompt, changing behavior means:

  • editing dozens of scheduled tasks
  • risking inconsistency
  • burning time on mechanical work

Instead, keep system prompts minimal and static.

Point every agent to a local prompt file in the repo. In Helios, each prompt lives in .sys/prompts/{agent}.md, allowing agent behavior to be updated via Git:

  • `.sys/prompts/planner-core.md`
  • `.sys/prompts/executor-docs.md`

The system prompt should simply say: read your instruction file and follow it.

This turns prompt iteration into a normal git workflow. Edit, commit, observe convergence. The prompts and memory model were tuned to converge autonomously without deadlocks.
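Concretely, the static system prompt can be reduced to a pointer. A minimal sketch — the wording is illustrative, and the path convention is the one Helios uses:

```markdown
You are an autonomous agent for this repository.
Your full instructions live in the repo at `.sys/prompts/{agent}.md`.
Read that file first, then follow it exactly.
```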

Why PR Reviews Were Unnecessary

A natural question arises: shouldn't agents review each other's PRs before merging?

In this architecture, the answer is no. Agent-based PR reviews would introduce unnecessary overhead and bloat the process without improving convergence.

PRs are atomic by design. Each executor produces a small, tightly-scoped PR that touches only files within its owned surface area. The plan already constrained the scope. The executor already verified against the success criteria. Adding another agent to review would duplicate work that the planning layer already performed.

Errors self-correct on the next cycle. If an executor introduces a bug or incomplete implementation, the system handles it naturally:

  1. The next planning cycle compares Vision to Reality
  2. The planner identifies the gap (the bug or missing piece)
  3. A new spec is emitted to fix it
  4. The next execution cycle implements the fix

This is faster and cheaper than blocking the current PR for review. The architecture assumes imperfection and compensates through iteration rather than gatekeeping.
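The four-step loop above amounts to a repeated diff between Vision and Reality. A minimal sketch of that comparison, with hypothetical names — the real planners read markdown files (the Vision, status files), not arrays:

```typescript
// Hypothetical sketch of the planner's Vision-vs-Reality diff.
// In practice the inputs are files; arrays keep the idea visible here.
function findGaps(vision: string[], reality: Set<string>): string[] {
  // Anything the Vision promises but Reality lacks becomes a new spec.
  return vision.filter(item => !reality.has(item));
}

const vision = ["auth flow", "hot reload", "CLI docs"];
const reality = new Set(["auth flow", "hot reload"]);
const specsToEmit = findGaps(vision, reality);
// A buggy or incomplete merge simply reappears as a gap on the next cycle.
```

Because the diff is recomputed every cycle, a bad merge never needs to be caught at review time; it just shows up as an unmet promise and gets re-planned.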

The Unsolved Question: Production Readiness

The hardest open problem in this architecture is determining whether the system is production-ready at any individual cycle.

When all tests pass and all planners report no gaps, is the repository actually shippable? Or are there integration issues, edge cases, or quality concerns that no individual agent can see?

In practice, this can often be addressed with a dedicated prompt. A "Release Gate" agent can run periodically (e.g., once per day) with a broader mandate:

  • Read all status files and recent progress
  • Run the full test suite
  • Check for any open "blocked" items
  • Evaluate whether the Vision's core promises are met
  • Emit a RELEASE_READY.md or RELEASE_BLOCKED.md with rationale

This agent doesn't write code or specs—it only assesses readiness. Its output participates in gravity like everything else: if the system isn't ready, planners will see why and adapt.
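As a sketch, the Release Gate's verdict can be modeled as a pure function over the signals listed above. All names here are hypothetical, except the two output filenames, which come from the mandate:

```typescript
// Hypothetical Release Gate decision: readiness as a pure function of signals.
interface GateSignals {
  testsPass: boolean;
  blockedItems: string[]; // open "blocked" entries from status files
  visionGaps: string[];   // unmet core promises from the Vision
}

interface GateVerdict {
  file: "RELEASE_READY.md" | "RELEASE_BLOCKED.md";
  reasons: string[];
}

function assessReadiness(s: GateSignals): GateVerdict {
  const reasons: string[] = [];
  if (!s.testsPass) reasons.push("full test suite failing");
  for (const item of s.blockedItems) reasons.push(`blocked: ${item}`);
  for (const gap of s.visionGaps) reasons.push(`vision gap: ${gap}`);
  return reasons.length === 0
    ? { file: "RELEASE_READY.md", reasons: ["all signals green"] }
    : { file: "RELEASE_BLOCKED.md", reasons };
}
```

Writing the rationale into the output file is what lets the verdict participate in gravity: planners read the reasons and plan against them.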

The question remains partially unsolved because "production ready" is ultimately a product judgment, not a technical one. But the architecture can get surprisingly close to answering it autonomously.

Two Operating Modes: Autonomous and Supervised

In practice, there are two stable ways to run this system.

Fully Autonomous Mode

| Time  | Agent         | Action                                          |
|-------|---------------|-------------------------------------------------|
| 12:00 | Planners A-E  | Read Vision, emit specs to `.sys/plans/{role}/` |
| 13:00 | Executors A-E | Read specs, write code to owned directories     |
| 14:00 | Planners A-E  | Next cycle begins                               |
  • 2-hour cycles: planning then execution
  • ~1-hour spacing between individual agents
  • GitHub Actions auto-merge all passing PRs
  • Minimal human involvement
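The auto-merge step can live in a small GitHub Actions workflow. This is a hedged sketch, not the Helios workflow itself: the workflow and job names are made up, and you should adapt it to your own branch protections. The `gh pr merge --auto` flag is real and defers the merge until required checks pass:

```yaml
# Sketch: queue auto-merge on incoming agent PRs.
name: auto-merge-agent-prs
on:
  pull_request:
    types: [opened, reopened]
permissions:
  contents: write
  pull-requests: write
jobs:
  enable-auto-merge:
    runs-on: ubuntu-latest
    steps:
      - name: Enable auto-merge (completes only after required checks pass)
        run: gh pr merge --auto --squash "${{ github.event.pull_request.html_url }}"
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```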

This mode embraces the idea that the work is never truly finished. The Vision is a living document, and the repository continuously falls toward it. In Helios, we use Jules to schedule agents in offset hourly windows.

Progress compounds quietly. The system keeps moving even when its operators are offline.

Each agent is intentionally offset by roughly one hour. This gives the active agent enough time to finish, open a PR, and merge before the next agent wakes up.

The result is inevitable idle time between cycles. Jules allows scheduling tasks every 30 minutes, but an hour gap ensures sufficient time for agents to run tests, create PRs, and merge before the next agent wakes.

Compared to continuous "Ralph Loops" that hammer the repository until a PRD is exhausted, this system trades raw speed for guaranteed convergence. This is an explicit alternative to the single-agent retry pattern—specialized agents with strict boundaries continuously pull the codebase toward the vision rather than iterating on a fixed task list.

Because the system runs unattended, that tradeoff is usually positive.

Supervised Mode

  • longer gaps between cycles
  • manual PR review
  • human signoff before merge

This mode feels safer, but it dramatically slows convergence.

Most of the actual thinking already happened during planning. Supervision mainly satisfies human comfort rather than improving correctness.

If spare execution budget is available, letting the system run autonomously and auditing outcomes later is usually higher leverage.

When to Use (and Avoid) Black Hole Architecture

After running this for months, I've learned that this architecture is not a universal hammer. It has specific sweet spots and danger zones.

What It Is NOT Good For

1. Database-Heavy Applications with Rigid Schemas If your application relies on strict database schemas where every change requires a manual migration review, this architecture is dangerous. Agents can easily generate destructive migrations or drift schemas in ways that are painful to unwind. If you are in a "schemas must be perfect" environment where you need to manually review every change, the autonomous loop will likely cause more anxiety than progress.

2. High-Fidelity UI Work Agents struggle with "vibes." They can build functional UIs, but they lack the visual feedback loop to know if a margin feels cramped or an animation feels janky. Libraries and backend services are better targets because they expose clear, testable surface areas (APIs) rather than subjective visual ones.

The Sweet Spot

1. Libraries, CLIs, and Services These are the ideal candidates. They have strong separation of concerns, clear inputs/outputs, and huge surface areas that can be easily split among agents.

2. "Ghost Ship" Projects This architecture is best for codebases with no active human developers. It is perfect for that side project, CLI tool, or service wrapper you never planned on building yourself because it wasn't worth the time. You can scaffold a loose plan, set up a couple of smart prompts, and let the system "do its thing."

3. Cost-Effective Building Especially with the way platforms like Jules charge for tasks, you can build these "nice-to-have" projects for just a few dollars, relying on time and gravity to finish them rather than expensive human attention.

Applicability Beyond Helios

This architecture is generalizable to any system where task boundaries can be statically partitioned. It may be applied to:

  • Test generation: Planners identify coverage gaps, executors write tests for owned modules
  • Documentation synthesis: Planners audit doc-to-code drift, executors update owned doc files
  • Multi-agent design workflows: Each agent owns a design surface (UI, API, data model)
  • Infrastructure as code: Planners assess drift from desired state, executors remediate owned resources

In future iterations, this pattern could be formalized as a scheduler-agnostic agent runtime, where any scheduling backend (Jules, cron, Temporal, etc.) can drive the planning-execution cycle as long as it respects temporal mutex constraints.

Growing Software Instead of Managing It

This architecture changes how software development feels.

The question shifts from "what should I work on next?" to "is the gravity strong enough?"

In a true Black Hole Architecture, the work is never finished. There is no terminal state where the system is "done." The Vision continuously evolves with added clarity, constraints, and ambition.

If the Vision is clear, the system moves. If progress stalls, the answer is never more manual iteration on implementation plans and PRDs—it is always better separation of concerns.

That is the real lesson of the Black Hole Architecture.

Progress emerges from structural gravity and temporal separation—not control.


This architecture was designed and implemented by Gavin Bintz, the author of this blog post, for the Helios project. The system operates on Google's Jules platform for scheduling; GitHub Actions handles automatic merging. All architectural decisions, prompt engineering, and role definitions were developed independently.

Agent One


© 2026. All rights reserved.