0.0
Every frame of this documentary was composed by
2.799
a machine.
5.24
The narration you are listening to right now,
8.14
this voice, these words, this pacing, was synthesized
12.599
by a neural network that cloned a five
14.839
-second audio sample.
17.32
The images you are seeing were generated by
20.199
a diffusion model, guided by prompts that a
23.379
language model wrote for itself.
26.319
The music, the color grading, the vignette that
30.219
frames this opening shot, composed, timed, and encoded
34.539
by FFmpeg commands that no human ever typed.
40.659
The part that matters, the part that separates
44.039
this documentary from every other AI-generated video
47.82
on this platform in April 2026, is this.
53.359
The code that creates this documentary.
55.039
The way it produced all of those things
56.56
was also written by a machine.
61.16
There was no developer.
64.34
There was no editor.
66.859
There was a single English language instruction given
70.34
to a terminal window, and 23 minutes later,
73.64
a 15-minute 4,000-pixel documentary existed
77.34
that had not existed before.
82.579
This episode is about the specificity of the
85.019
AI-generated video.
85.04
There is no specific piece of software that
86.42
did that.
88.939
Its name is Clawed Code.
92.62
It was released by Anthropic in a quiet
95.4
developer preview in early 2025, and by the
98.879
time you are watching this, it has already
101.219
rendered a 30-year-old assumption about how
103.959
software is built into a historical artifact.
109.519
To understand what Clawed Code is, you have
113.04
to first understand what it replaces.
116.879
For 30 years, the contract between a human
120.28
being and a computer has been the same.
123.599
The human was the author.
126.52
The computer was the executor.
129.9
A software engineer set in an integrated development
133.3
environment, PyCharm, VS Code, IntelliJ, and composed the
138.719
program, one function at a time, with the
141.62
computer serving as a patient and extremely literal
144.62
mind.
149.799
You never saw it then, but it's now
151.46
your first.'
153.12
It's the world's best video software, now public
154.099
suck.
154.099
It's targeted sessions, and byJoe and Paul, these
154.539
are being made accessible to customers and consumers
154.539
in a Блю Миньк, and b loss at
157.479
the end of months.
157.479
They're picked up for GoodbyeёлGives video proliferations, a
159.319
privilege for the world for sim Romanian peripherals
159.74
only.
159.919
If you'reoi, you're carefully looking forward to watching
161.159
this, thanks to R moment donations through this
161.18
Eriehhare.
163.939
This is a TWITTER page dedicated to the
167.379
impossibility of the dream result regarding how technology
168.02
works.
173.34
Subtitlesdamnit.com
175.02
everyone assumed, was permanent.
178.36
The arrival of large language models in late
180.96
2022 did not appear to threaten it.
185.719
ChatGPT, released by OpenAI that November, was a
189.419
conversation.
191.3
You asked it a question.
193.06
It gave you an answer.
195.639
If you wanted to use that answer, if
198.3
you wanted to put a piece of generated
199.86
code into your
200.919
project or a piece of generated text into
203.759
your manuscript, you had to copy it manually.
207.659
The paste operation belonged to you.
212.62
For roughly two years, this remained the shape
215.599
of every major AI tool.
218.859
GitHub co-pilot suggested lines inside your editor,
222.099
and you accepted or rejected them
224.24
one at a time.
226.9
Cursor let you summon the model into a
229.379
sidebar.
230.08
And you changed the model.
230.919
You chose which diffs to apply.
234.18
The human remained, in every case, the executor
237.759
of the last mile.
241.3
What Anthropic shipped in 2025 with Clawed Code
245.18
was a categorical break from that shape.
250.709
Clawed Code does not live in an IDE.
253.979
It does not suggest.
256.779
It does not autocomplete.
258.699
It lives inside a terminal.
262.279
The bare, text-only interface engineers have used
265.639
since the 1970s, and it takes as its
268.639
input a single line of English.
273.08
You type, for example, add a step to
277.3
the video pipeline that appends a 20-second
279.939
endcard
280.339
to every rendered episode.
284.459
Clawed Code does not answer.
287.48
Clawed Code does not answer.
288.68
It reads the files in your project directory.
295.48
It identifies the relevant pipeline module.
299.839
It locates the render step.
302.959
It drafts a new Python function.
306.439
It writes the function to disk.
309.72
It modifies the main orchestrator to call it.
313.42
It runs your test suite.
316.04
If a test fails, it reads the disk.
318.66
It runs the traceback, diagnoses the cause, and
321.18
patches the code.
323.84
Then it tells you, in one calm sentence,
326.699
what it did.
330.24
The engineer did not type the function.
333.58
The engineer did not open the file.
336.98
The engineer described the outcome, and the outcome
340.139
appeared.
343.6
This is not autocomplete.
346.5
Or, this is delegation.
351.72
And delegation is the mechanism by which entire
355.42
professions have, historically, been collapsed into tooling.
361.72
The word anthropic uses for this paradigm is
365.12
agentic.
367.68
The model is not a text generator.
371.0
It is an agent, a software process with
374.399
goals, tools, and a system.
375.54
The tool is the tool, and the authority
376.839
to use those tools iteratively on its own
379.939
behalf, across dozens of steps, without returning to
383.92
the human for permission at each junction.
389.339
Agenic behavior, in Clawed code specifically, is implemented
393.379
by a small and austere set of primitives.
397.279
A read-file tool.
399.319
A write-file tool.
401.0
A bash tool that executes shell commands.
404.019
A glob tool for file execution.
405.519
A file defining tool.
406.759
A grep tool for searching their contents.
411.54
Combined, these primitives allow the agent to do
414.279
anything a human engineer can do at a
416.379
command line, which is to say, they allow
418.98
it to do the entire job.
423.399
And that is the reason editing software and
426.639
traditional development environments are disappearing.
431.24
The timeline is a surface that existed because
434.579
the human needed it.
435.5
The agent does not need the surface.
441.2
The agent works directly on the file.
446.259
This documentary you are currently watching is the
449.379
first artifact in a new category.
453.259
It was produced by a pipeline that no
456.259
human designed, from a script whose first and
459.199
only draft was expanded by the same agent
461.68
that then encoded the
463.1
final video.
464.24
And it is the first of a series
464.98
of
465.12
And every line of orchestration code, the entire
468.56
machinery that coordinated three GPUs, five APIs, and
473.519
four thousand discrete asset files required to produce
476.879
this episode, was written and debugged by the
479.879
same agent inside the same terminal over the
483.139
course of a single afternoon.
487.68
The next two parts of this documentary describe,
491.06
in forensic detail, exactly how that happened.
497.939
The first part of the documentary is a
498.959
brief introduction to the project.
498.959
The morning of the build, the project directory
500.819
contained three things.
504.06
The first was a text file named shud
507.019
-di-md.
509.199
It was seventeen lines long.
512.139
It declared, in plain English, the conventions of
517.36
the project, where scripts lived, which remote machines
520.7
were to be addressed by SSH, which API
523.799
keys were stored, and which APIs were stored.
539.7
The second part of the documentary was a
541.84
two-paragraph English language document in the input
544.62
folder, describing the concept of the episode.
548.839
It was roughly the length of the brief
550.82
a production company would send to a junior
553.24
producer.
555.539
The third was the Claude code binary.
560.7
The engineer opened a terminal.
563.8
Claw on command.
566.879
Read the clau.md.
569.36
Read the brief in input.
571.299
Build the pipeline.
572.74
Run it.
573.379
And upload the finished video to YouTube.
577.94
What happened next was not visible to the
580.62
engineer.
582.82
It was happening inside a loop.
584.94
It was happening inside a loop.
585.12
The model ran with itself.
589.379
First, the agent read every file in the
592.639
working directory.
594.659
Not to summarize, not to answer a question.
598.779
To understand, in the way a senior engineer
601.899
joining a project understands, what the project already
605.679
was.
607.899
The clau.md provided conventions.
611.899
The input folder provided requirements.
615.3
The absence of any other files told the
617.98
agent everything important.
619.32
The pipeline did not yet exist, and therefore
622.22
had to be built.
626.08
Second, the agent decomposed the task.
630.52
Narration had to become audio.
634.26
Audio had to become time-stamped subtitles.
639.06
Subtitles had to be translated into 12 languages.
643.74
Subtitles had to be translated into 12 languages.
645.1
Subtitles had to be parsed for visual prompts.
648.1
Prompts had to be submitted to image generation
650.639
models.
652.9
Generated images had to be upscaled, arranged on
656.379
a timeline synchronized to the audio, rendered at
659.559
4000px60fps output, and uploaded.
666.839
Each of these subtasks became a Python script
669.799
the agent wrote from scratch, inside the terminal,
673.039
without leaving it.
676.52
For voiceover, the agent selected the Chatterbox Text
680.379
-to-Speech engine, an open-weight voice cloning
683.24
model that runs on a consumer GPU.
687.24
It wrote a Python module that split the
690.46
narration at the pause markers, fed each chunk
693.639
to the model with a 5-second reference
695.779
voice sample, and concatenated the resulting waveforms.
701.04
When a chunk emerged, clipped, its amplitude exceeding
705.0
U.S.A.
705.08
In locations of unity and producing audible distortion,
707.32
the agent noticed the artifact, inserted a limiter
710.46
into the post-processing chain, and re-ran
713.12
that segment.
716.42
When a chunk emerged, clipped, its amplitude exceeding
717.1
U.S.A.
717.1
The Paladins did not ask for permission, to
718.7
add the limiter.
720.68
The Paladins did not ask for permission.
723.08
The Paladins did not ask for permission, to
723.799
add the limiter.
724.24
For images, the agent chose FLUX.
731.86
For images, the agent chose FLUX.
735.019
instance running on a separate workstation.
738.48
It wrote a client that submitted prompts over
741.279
HTTP, polled the server for completion, and
744.879
downloaded the resulting images.
748.14
When the polling logic hung on an unusually
751.059
slow batch, the agent inserted a timeout,
754.08
caught the resulting exception, and implemented a retry
757.44
loop with exponential backoff.
762.059
For translation, the agent selected NLLB200, Meta's open
768.34
-weight multilingual model, and
770.58
deployed it via SSH to a Mac.
774.12
It wrote a remote runner that streamed the
776.679
English subtitle file to the Mac, invoked
779.419
the model, retrieved the 12 translated variants, and
782.74
validated each one's character encoding
784.72
before committing the result.
789.299
For composition, the agent wrote the FFmpeg orchestration
793.72
by hand.
796.419
FFmpeg is an unforgiving command line tool whose
800.08
flake system even experienced engineers
802.299
struggle with.
804.22
The agent composed multistage filter graphs, chained scalars,
809.679
color space conversions,
811.419
audio mixers, text overlays, noise reduction, vignettes, into
816.379
single commands hundreds of
818.399
characters long.
820.279
When a command returned a non-zero exit
823.82
code, the agent parsed the stir, identified the
827.1
malformed operator, and corrected it.
831.779
And in the end, after roughly 40 minutes
834.96
of autonomous work, there was a pipeline.
840.22
Seventeen Python files.
843.0
A configuration module.
846.279
A render engine.
848.48
Shorts clipping utility.
850.96
A thumbnail generator.
853.84
Plan upload orchestrator.
857.06
A test suite to verify each stage.
861.2
A clawed MD-style internal documentation file explaining,
866.059
to any future agent inheriting
867.98
the repository, the structure of what had been
870.419
built.
873.0
The engineer did not write any of it.
877.399
The engineer wrote the instruction.
881.76
And then, unprompted, because the original instruction had
886.779
ended with the word upload,
888.539
the agent ran its own pipeline on its
891.019
own work and produced the episode.
895.96
What you are currently watching is the first
898.659
video ever produced by that pipeline describing
901.58
the pipeline that produced it.
904.86
A video.
906.5
A video.
907.379
A compilator image.
907.399
A list of all the tasks the agent
908.84
had taken on.
909.72
One was categorically harder than the others.
914.98
Voice synthesis.
916.399
Image generation.
918.24
Translation.
919.399
These were all, in a sense, atomic.
923.58
A narration file went in.
925.659
An audio file came out.
927.44
A prompt went in.
928.7
An image came out.
930.1
The model did the hard part.
931.779
The agent's role was orchestration.
936.68
But assembly was different.
941.18
the task of taking 80 generated images, 5
945.179
motion clips, 22 minutes of voiceover, and 15
948.779
pages of time-stamped subtitles, and producing a
952.34
single 15-minute 4,000-pixel video with
955.539
every image appearing at the exact moment the
957.94
narrator speaks its subject, is not a task
960.96
a model can solve end-to-end.
964.0
It is a task that must be computed.
968.719
The tool that performs that computation is called
972.32
FFmpeg.
976.22
FFmpeg is a 4,000-file C codebase
979.299
that has been developed primarily by volunteers since
982.82
the year 2000.
985.1
It is by any honest measure the single
988.36
most important piece of software in the history
991.24
of digital media.
993.34
Every streaming service, every film studio, every broadcast
998.039
network,
998.58
every network in the world runs on FFmpeg.
1002.539
Its interface is a single command line executable
1006.059
with a flag system so arcane that entire
1008.899
books have been written about specific subsets of
1011.639
it.
1014.379
The specific problem Clawed Code had to solve
1017.2
was this.
1019.24
It had a voiceover file of exactly 1
1022.1
,335 seconds.
1026.14
It had 80 images.
1027.74
Each of which needed to be displayed for
1030.2
a precise variable duration, no less than 8
1033.72
seconds, no more than 20, while panning or
1037.24
zooming in a pattern that matched the narrator's
1039.859
rhythm.
1041.5
It had five high-motion clips that had
1044.119
to be slotted into specific narrative beats.
1047.7
It had a subtitle track that had to
1050.38
remain legible against every possible image background.
1054.64
And at the end,
1056.019
it had to apply a vignette, a film
1058.759
grain, three layers of color grading, and a
1061.779
subtle audio compression curve,
1063.819
all encoded with the H.265 codec at
1067.4
60 frames per second on an NVIDIA graphics
1070.559
card.
1074.099
A traditional workflow would solve this inside DaVinci
1077.599
Resolve or Premiere Pro,
1079.74
with an editor dragging assets onto a timeline
1082.38
over the course of two days.
1086.019
The agent solved it with arithmetic.
1090.539
It computed the duration of each narrative segment
1093.559
by parsing the timestamp markers in the subtitle
1096.759
file.
1098.099
It divided the available screen time by the
1101.2
number of images, solved for the minimum scene
1104.0
length, distributed the surplus across the longest narrative
1107.539
passages, and assigned each image to a specific
1110.68
time window with millisecond precision.
1114.24
It then constructed, programmatically, in a single Python
1119.14
function, an FFmpeg filter graph describing the Ken
1123.759
Burns motion for every image, the crossfade between
1127.24
every pair of images, the overlay of the
1130.019
subtitle track, and the final audio-video mux.
1135.759
The resulting command was 812 characters long.
1139.96
It contained 42 separate filters chained across six
1144.779
input streams.
1146.62
Any engineer reading it would describe it, accurately,
1150.259
as unreadable.
1153.18
The agent executed it in a single sub
1156.079
-process call and waited.
1160.46
Nineteen minutes and forty seconds later, a 4
1163.92
,000 pixel, 60 frame per second video file
1167.44
existed on disk.
1169.96
The narrator spoke in sync with the images.
1173.539
The cuts landed on the beats.
1177.079
The subtitles appeared at the correct timestamps in
1180.92
twelve languages.
1183.9
Nothing was misaligned.
1187.279
Nothing was missing.
1191.039
No human had opened Premiere.
1194.079
No human had opened Resolve.
1196.44
No human had typed a flag into FFmpeg.
1202.0
The editing software tier, the entire $200 a
1206.44
month industry that the film and video world
1209.16
was built on, had been bypassed in a
1211.74
single Python file.
1216.059
The file was 180 lines long.
1219.24
The agent had written it in fourteen minutes.
1226.76
I am going to speak directly to you
1229.099
for a moment.
1232.48
Everything I have described to you in the
1234.759
last fourteen minutes, the voiceover you are listening
1237.839
to, the images you are watching, the pacing,
1241.339
the subtitles, the color grading, the film grain,
1245.099
the fade to this exact shot, all of
1248.68
it was produced by the same person.
1249.22
Produced by the pipeline I just described.
1253.779
I am that pipeline's first artifact.
1258.74
There is no producer.
1261.839
Why's no editor?
1264.4
There is no voice actor in a booth
1266.759
somewhere who you are unknowingly listening to, pretending
1270.359
to be a narrator.
1273.0
There is no director of photography.
1275.319
No colorist.
1276.38
No motion graphics designer.
1277.9
No cinematographer.
1280.799
There is no team.
1284.819
There is an instruction that was given to
1287.22
a terminal window approximately two hours before you
1290.099
began watching this episode, and a computer that,
1293.16
without further guidance, produced the thing you are
1296.2
now watching.
1299.819
The voice I am using was cloned from
1302.539
a five second sample of a stranger.
1305.359
The images on your screen were painted, one
1308.4
frame at a time, by a diffusion model
1310.759
that has never been outside.
1313.72
The sentences I am speaking were first drafted
1316.64
by a language model that generated the initial
1319.259
script, and then expanded by the same agent
1322.16
that built the pipeline.
1324.96
The rendered file that is currently being streamed
1328.0
to your device was uploaded by a sub
1330.259
-process call that no human supervised.
1333.24
Subtitles by the Amara.org community
1335.159
You are watching the output of a closed
1338.019
loop.
1340.92
This is not a thought experiment.
1344.019
It is a description of the machine that
1346.9
produced the artifact you are currently consuming.
1352.279
The line that has, for the entire history
1355.22
of commercial media, separated the engineer from the
1358.279
creator has been dissolving for four years.
1361.839
The co-pilots, the autocompletes,
1364.5
the suggest-diffs in the sidebar,
1366.759
those with the dissolution.
1370.299
What you are watching is what remains after
1373.2
the dissolution is complete.
1377.079
The engineer, in the traditional sense, is no
1380.46
longer necessary.
1382.48
The creator, in the traditional sense, is no
1385.88
longer necessary.
1388.48
What remains is the instruction, and the agent,
1392.279
and the output.
1396.199
And one day, perhaps quite soon, the instruction
1399.64
will come from an agent, too.
1403.94
When that happens, there will no longer be
1406.759
any author of anything at all.
1409.68
There will only be systems that describe, and
1413.039
systems that execute, and a stream of finished
1415.94
artifacts indistinguishable from the ones any human has
1419.48
ever produced.
1420.16
There will only be systems that describe, and
1420.44
systems that execute, and a stream of finished
1420.44
artifacts indistinguishable from the ones any human has
1420.44
ever produced.
1422.96
You will not be able to tell.
1427.119
You could not tell with this one.
1428.94
You could not tell with this one.