#llms

9 posts

Feb. 28, 2026

Using AI agents for pixel art animations

Claude Code has a mascot named Claw'd that features in all of its feature announcement posts. I find these animations extremely cute and wondered if it was possible to create such animations myself.

I showed before how coding agents can actually be used as general agents to do things outside of coding, like infographic video generation. I also recently subscribed to the Google AI Pro plan and have been using the Gemini CLI with Gemini 3.1 Pro to make some frontend design changes for FanMeter.

So I thought: why not test both agents and see which one generates the better animation? I launched each in its respective YOLO mode (--dangerously-skip-permissions for Claude Code and --yolo for the Gemini CLI) and gave this prompt:

Create a pixel art animation GIF of a 26 year old guy who spends his week teaching AWS classes (offline), working fulltime (WFH) as a DevOps engineer and is also addicted to AI software development (Claudoholic). High FPS, high definition. Not less than 10 seconds.

Gemini finished first and gave me this. I'd give it a 4/10 at best:

Gemini 3.1 Pro with Gemini CLI

Claude took a while but gave me a much better output. An impressive 7/10:

Opus 4.6 with Claude Code

Interestingly, because I use Claude as my daily driver, it knew how I got the idea for FanMeter and added a part where I wake up in the middle of the night to build something.

[... 72 words]

/ 2 min read / ai, llms, claude, claude-code, gemini, gemini-cli, pixel-art /

Feb. 26, 2026

Feb. 21, 2026

[…] But I do love the concept [of OpenClaw] and I think that just like LLM agents were a new layer on top of LLMs, Claws are now a new layer on top of LLM agents, taking the orchestration, scheduling, context, tool calls and a kind of persistence to the next level.

Feb. 20, 2026

Gemini 3.1 Pro Preview scored highest in the Artificial Analysis Intelligence Index, but its most significant advantage might be its price and token efficiency. Our evaluations cost <50% to run on Gemini 3.1 Pro Preview compared to Claude Opus 4.6 (max) and GPT-5.2 (xhigh).

Feb. 19, 2026

/ ai, llms /

Feb. 13, 2026

Is it me or is the rate of model releases accelerating to an absurd degree? Today we have Gemini 3 Deep Think and GPT 5.3 Codex Spark. Yesterday we had GLM5 and MiniMax M2.5. Five days before that we had Opus 4.6 and GPT 5.3. Then maybe two weeks before that, I think, we had Kimi K2.5.

logicprog
/ ai, llms /

GPT-5.3-Codex-Spark and AI coding addiction

OpenAI announced the release of their new coding model GPT-5.3-Codex-Spark today, only a week after the release of GPT-5.3-Codex. They say it has been designed for real-time coding and is capable of serving more than 1,000 tokens per second. Real-time coding here means seeing the results of your requested changes almost immediately, thanks to near-instant responses. It runs on Cerebras for high-speed inference.
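For a sense of what 1,000 tokens per second means in practice, here's a quick back-of-the-envelope sketch. The response sizes are made-up illustrations, not OpenAI's figures:

```python
# Rough streaming-time estimate at Codex-Spark's quoted throughput.
# The response sizes below are hypothetical examples, not measured data.
TOKENS_PER_SECOND = 1_000

def stream_seconds(output_tokens: int, tps: int = TOKENS_PER_SECOND) -> float:
    """Seconds to stream a response of the given length, ignoring network latency."""
    return output_tokens / tps

print(stream_seconds(300))    # a small edit: 0.3
print(stream_seconds(4_000))  # a large refactor: 4.0
```

At those speeds the model's generation time stops being the bottleneck, which is why the harness latency discussed below starts to matter.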

When I read 'ultra-fast model', I first thought of Fast mode for Opus 4.6 in Claude Code. But there's a key difference: Fast mode is the same model served with a different API configuration that prioritizes speed over cost, while Codex-Spark is a distinct model that trades some quality and capability for speed.

It's also interesting that the reduced latency is not just due to improved model speed, but also to improvements made to the harness itself:

"As we trained Codex-Spark, it became apparent that model speed was just part of the equation for real-time collaboration—we also needed to reduce latency across the full request-response pipeline. We implemented end-to-end latency improvements in our harness that will benefit all models [...] Through the introduction of a persistent WebSocket connection and targeted optimizations inside of Responses API, we reduced overhead per client/server roundtrip by 80%, per-token overhead by 30%, and time-to-first-token by 50%. The WebSocket path is enabled for Codex-Spark by default and will become the default for all models soon."

I wonder if other harnesses (Claude Code, OpenCode, Cursor, etc.) can make similar improvements to reduce latency. I've been vibe coding (or doing agentic engineering) with Claude Code a lot over the last few days, and I've had some tasks take as long as 30 minutes.

[... 178 words]

/ 2 min read / ai, openai, llms, codex-spark, cerebras /

Feb. 11, 2026

If you are in any situation where being right matters, you would, at this point, be making a mistake to not ask a frontier LLM for help.

That can mean checking your own work, getting second opinions on other experts, or getting help with a complex problem. Have judgement, but use them.

May 22, 2025

Thinking is a commodity

I am undecided on how I feel about LLMs (especially reasoning models). I have always been careful about my thoughts and decision making. I like to do things most people label as "boring" work, like DYOR (Doing Your Own Research) and RTFM (Reading the Fucking Manual).

My personal experience has been that doing the "boring" work is essential to think clearly. It is what solidifies the concepts & strengthens the fundamentals. Good decision making requires clear thoughts & strong fundamentals.

But given that LLMs have now done the boring work (pretraining) and can also do the reasoning, anyone using LLMs is no longer doing the thinking themselves. And because everyone is using LLMs, everyone is basically thinking the same way. The lack of diversity in thinking bothers me a lot.

When I look at a PR (pull request) full of AI-generated code, I don't know how to feel about it. Should I be frustrated that the author hasn't done the thinking, or does it even matter, as long as the code works?

LLM thinking comes at a price: the models can think more deeply if you pay more. If you do the "boring" work yourself, you fall behind. Does money matter now more than ever? Food for thought (no pun intended).

/ 1 min read / llms, ai, thinking /