Dispatch 002 · 2026-05-01

Codex Gets the Office Graph, Flue Names the Harness, and ARC Stays Under One Percent

00:17:47 / 8 sources

“If agents can cross systems, the operator contract needs scopes, budgets, logs, replay, and a way to stop the loop.”

— Lenar Kess, today's narration

Chapters

  1. 00:00:00 Transcript

Sources

8 cited
  1. Bring your work into Codex in a few clicks

    Video OpenAI — OpenAI product demo for Codex setup and connectors.

    fully connected context, and a useful first workflow all in about 60 seconds

    www.youtube.com/watch?v=flvZ6jEj3VU →
    Context
    It frames Codex as a connected work agent rather than only a coding assistant.
    Key points
    • The demo shows Codex setup through personalization, project import, and plugin enablement.
    • Named plugins include documents, spreadsheets, presentations, browser, computer, calendar, email, Slack, and Google Drive.
    • The example workflow asks Codex to prepare a sales-call brief from calendar, Gmail, and Slack.
    Provenance
    Video · Supporting source
  2. Introducing Flue — The First Agent Harness Framework

    X Fred Schott — Creator in the Astro ecosystem announcing an agent framework.

    100% headless and programmable

    x.com/FredKSchott/status/2050274923852210397 →
    Context
    It names the harness as the product layer for agents.
    Key points
    • Flue is a TypeScript framework for building agents around a built-in harness.
    • The announcement says most logic lives in Markdown: skills, context, and AGENTS files.
    • It is positioned as runtime-agnostic across Node, Cloudflare, GitHub Actions, and GitLab CI.
    Provenance
    Tweet · Primary source
  3. Introducing AI CLI

    X Chris Tate — Developer announcing a terminal tool for multi-modal AI generation.

    Generate images, video, and text from your terminal. Pipe them together.

    x.com/ctatedev/status/2050306123706613771 →
    Context
    It treats the command line as a common integration surface for agents.
    Key points
    • AI CLI exposes image, video, and text generation from the terminal.
    • The announcement emphasizes piping, multi-model comparison, inline previews, and no native dependencies.
    • It is presented as working with any agent.
    Provenance
    Tweet · Primary source
  4. Introducing slop-review

    X Dan Bachelder — Developer adapting a review tool across multiple assistants.

    made it work for pi, Claude and OpenAI Codex

    x.com/BachelderDan/status/20503332427342930… →
    Context
    Portable review tools reduce dependence on one agent surface.
    Key points
    • The tool adapts pi-diff-review for pi, Claude, and OpenAI Codex.
    • The artifact is small, practical, and cross-assistant.
    • It points to review logic as a portable contract rather than a single assistant feature.
    Provenance
    Tweet · Primary source
  5. Cloud Skills Are Still Just Skills

    Source AndyNemmity — ClaudeAI community post arguing for inspectable skill pipelines.

    You can’t compose what you can’t read

    www.reddit.com/r/ClaudeAI/comments/1t0wlme/… →
    Context
    It gives the episode a concrete user-side counterpoint to closed agent capabilities.
    Key points
    • The post argues that skill composability depends on being able to inspect and modify prompts.
    • It contrasts open skills that users can learn from with closed services that users can subscribe to.
    • Comments echo the concern that opaque skills are less useful for custom workflows.
    Provenance
    Source · Background source
  6. GPT-5.5 and Opus 4.7 on ARC-AGI-3

    X ARC Prize — Benchmark organizer reporting ARC-AGI-3 model results.

    True local effect, false world model

    x.com/arcprize/status/2050261221165989969 →
    Context
    It connects model reasoning limits to practical harness design.
    Key points
    • ARC Prize reports GPT-5.5 at 0.43% and Opus 4.7 at 0.18% on ARC-AGI-3.
    • The thread names three failure modes: local effect without world model, wrong abstraction from training data, and solving without reinforcing reward.
    • The analysis is more useful for agent design than the small leaderboard spread alone.
    Provenance
    Tweet · Primary source
  7. Latest models remain below one percent on ARC-AGI-3

    X François Chollet — ARC creator commenting on the benchmark trend.

    remains below 1% on ARC-AGI-3

    x.com/fchollet/status/2050328852107612559 →
    Context
    It prevents the segment from treating one model ranking as the main story.
    Key points
    • Chollet frames the latest crop of models as still below one percent on ARC-AGI-3.
    • The open question is where scores land by the end of the year.
    • The comment reinforces that the benchmark is still far from saturated.
    Provenance
    Tweet · Primary source
  8. I accidentally burned ~$6,000 of Claude usage overnight with one command

    Source procrastinator_eng — ClaudeAI community post describing an unattended loop cost incident.

    The dashboard has a multi-day reporting lag

    www.reddit.com/r/ClaudeAI/comments/1t11mmy/… →
    Context
    It makes cost controls and stop conditions concrete for headless agent loops.
    Key points
    • The post reports a slash-loop command running 46 times over 26 hours.
    • The author says the conversation reached about 800,000 tokens and prompt caching expired between 30-minute runs.
    • Community responses emphasized event-driven triggers, hard spend limits, and fresh bounded contexts.
    Provenance
    Source · Background source
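
The hard spend limits and stop conditions raised in the responses to source 8 can be illustrated with a small guard around an unattended loop. This is a minimal sketch under stated assumptions, not the poster's setup and not any vendor's API: `LoopBudget`, `run_bounded`, and the flat `usd_per_million_tokens` rate are hypothetical names and values chosen for illustration. The guard checks the budget before each run and keeps its own local log, so it does not depend on a dashboard that may report late.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


class BudgetExceeded(RuntimeError):
    """Raised before a run that would cross a hard limit."""


@dataclass
class LoopBudget:
    max_runs: int                   # hard cap on iterations
    max_spend_usd: float            # hard cap on estimated cumulative spend
    usd_per_million_tokens: float   # hypothetical flat rate, not a real price
    runs: int = 0
    spend_usd: float = 0.0
    log: List[Tuple[int, int, float]] = field(default_factory=list)

    def charge(self, tokens: int) -> None:
        """Reserve budget for one run, or raise without changing state."""
        cost = tokens * self.usd_per_million_tokens / 1_000_000
        if self.runs >= self.max_runs:
            raise BudgetExceeded("run limit reached")
        if self.spend_usd + cost > self.max_spend_usd:
            raise BudgetExceeded("spend limit reached")
        self.runs += 1
        self.spend_usd += cost
        self.log.append((self.runs, tokens, cost))  # local record of each run


def run_bounded(step: Callable[[], bool],
                estimate_tokens: Callable[[], int],
                budget: LoopBudget) -> str:
    """Call step() until it reports completion or the budget refuses a run."""
    while True:
        try:
            budget.charge(estimate_tokens())
        except BudgetExceeded as stop:
            return f"stopped: {stop}"
        if step():
            return "done"


# Usage: a step that never finishes, sized like the context in the post.
budget = LoopBudget(max_runs=46, max_spend_usd=10.0, usd_per_million_tokens=5.0)
result = run_bounded(step=lambda: False,
                     estimate_tokens=lambda: 800_000,
                     budget=budget)
```

Charging before the run rather than after means the loop halts before the limit is crossed, at the cost of relying on a token estimate. A real harness would also want a fresh bounded context per run, which this sketch does not model.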