#llms

9 posts

Feb. 28, 2026

Using AI agents for pixel art animations

Claude Code has a mascot named Claw'd that features in all of its feature announcement posts. I find these animations extremely cute and wondered if it was possible to create such animations myself.

I showed before how coding agents can actually be used as general agents to do things outside of coding, like infographic video generation. I also recently subscribed to the Google AI Pro plan and have been using the Gemini CLI with Gemini 3.1 Pro to make some frontend design changes for FanMeter.

So I thought: why not test both agents and see which one generates the better animation? I launched each in its respective YOLO mode (--dangerously-skip-permissions for Claude Code and --yolo for the Gemini CLI) and gave this prompt:

Create a pixel art animation GIF of a 26 year old guy who spends his week teaching AWS classes (offline), working fulltime (WFH) as a DevOps engineer and is also addicted to AI software development (Claudoholic). High FPS, high definition. Not less than 10 seconds.

Gemini finished first and gave me this. I'd give it a 4/10 at best:

Gemini 3.1 Pro with Gemini CLI

Claude took a while but gave me a much better output. An impressive 7/10:

Opus 4.6 with Claude Code

Interestingly, because I use Claude as my daily driver, it knew how I got the idea for FanMeter and added a part where I wake up in the middle of the night to build something.

[... 72 words]

/ 2 min read / ai, llms, claude, claude-code, gemini, gemini-cli, pixel-art /

Feb. 26, 2026

Feb. 21, 2026

[…] But I do love the concept [of OpenClaw] and I think that just like LLM agents were a new layer on top of LLMs, Claws are now a new layer on top of LLM agents, taking the orchestration, scheduling, context, tool calls and a kind of persistence to the next level.

Feb. 20, 2026

Gemini 3.1 Pro Preview scored highest in the Artificial Analysis Intelligence Index, but its most significant advantage might be its price and token efficiency. Our evaluations cost <50% to run on Gemini 3.1 Pro Preview compared to Claude Opus 4.6 (max) and GPT-5.2 (xhigh).

Feb. 19, 2026

/ ai, llms /

Feb. 13, 2026

Is it me or is the rate of model releases accelerating to an absurd degree? Today we have Gemini 3 Deep Think and GPT 5.3 Codex Spark. Yesterday we had GLM5 and MiniMax M2.5. Five days before that we had Opus 4.6 and GPT 5.3. Then maybe two weeks before that, I think, we had Kimi K2.5.

logicprog
/ ai, llms /

GPT-5.3-Codex-Spark and AI coding addiction

OpenAI announced the release of their new coding model GPT-5.3-Codex-Spark today, only a week after the release of GPT-5.3-Codex. They say it has been designed for real-time coding and is capable of serving more than 1,000 tokens per second. Real-time coding here means seeing the results of your requested changes almost immediately, thanks to near-instant responses. It runs on Cerebras for high-speed inference.
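For a sense of what 1,000 tokens per second means in practice, here's a quick back-of-the-envelope sketch. The response sizes are made-up illustrations, not OpenAI's figures:

```python
# Rough streaming-time estimate at Codex-Spark's quoted throughput.
# The response sizes below are hypothetical examples, not measured data.
TOKENS_PER_SECOND = 1_000

def stream_seconds(output_tokens: int, tps: int = TOKENS_PER_SECOND) -> float:
    """Seconds to stream a response of the given length, ignoring network latency."""
    return output_tokens / tps

print(stream_seconds(300))    # a small edit: 0.3
print(stream_seconds(4_000))  # a large refactor: 4.0
```

At those speeds the model's generation time stops being the bottleneck, which is why the harness latency discussed below starts to matter.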

When I read 'ultra-fast model', I first thought of Fast mode for Opus 4.6 in Claude Code. But there's a key difference: Fast mode is the same model served with a different API configuration that prioritizes speed over cost, while Codex-Spark is a distinct model that trades some quality and capability for speed.

It's also interesting that the reduced latency is not just due to improved model speed, but also to improvements made to the harness itself:

"As we trained Codex-Spark, it became apparent that model speed was just part of the equation for real-time collaboration—we also needed to reduce latency across the full request-response pipeline. We implemented end-to-end latency improvements in our harness that will benefit all models [...] Through the introduction of a persistent WebSocket connection and targeted optimizations inside of Responses API, we reduced overhead per client/server roundtrip by 80%, per-token overhead by 30%, and time-to-first-token by 50%. The WebSocket path is enabled for Codex-Spark by default and will become the default for all models soon."

I wonder if other harnesses (Claude Code, OpenCode, Cursor, etc.) can make similar improvements to reduce latency. I've been vibe coding (or doing agentic engineering) with Claude Code a lot over the last few days, and I've had some tasks take as long as 30 minutes.

[... 178 words]

/ 2 min read / ai, openai, llms, codex-spark, cerebras /

Feb. 11, 2026

If you are in any situation where being right matters, you would, at this point, be making a mistake to not ask a frontier LLM for help.

That can mean checking your own work, getting second opinions on other experts, or getting help with a complex problem. Have judgement, but use them.

May 22, 2025

Thinking is a commodity

I am undecided on how I feel about LLMs (especially reasoning models). I have always been careful about my thoughts and decision making. I like to do things most people label as "boring" work, like DYOR (Doing Your Own Research) and RTFM (Reading the Fucking Manual).

My personal experience has been that doing the "boring" work is essential to think clearly. It is what solidifies the concepts & strengthens the fundamentals. Good decision making requires clear thoughts & strong fundamentals.

But given that LLMs have now done the boring work (pretraining) and can also do the reasoning, anyone using LLMs is no longer doing the thinking themselves. And because everyone is using LLMs, everyone is basically thinking the same way. The lack of diversity in thinking bothers me a lot.

When I look at a PR (pull request) full of AI-generated code, I don't know how to feel about it. Should I be frustrated that the author hasn't done the thinking, or does it even matter, as long as the code works?

LLM thinking comes at a price: the models can think more deeply if you pay more. If you do the "boring" work yourself, you fall behind. Does money matter now more than ever? Food for thought (no pun intended).

/ 1 min read / llms, ai, thinking /