Louis Brion

Using Rift Stats live at a tournament, and where it fell short

2026-05-30T00:00:00+00:00

Today was a tournament day, which meant less time building and more time actually using what I’ve built. That’s a useful kind of test.

Querying live data between rounds

I was able to use the natural language query interface throughout the day to pull insights from the live event data — things like which players had the easiest or hardest paths to their current record based on their opponents’ standings. That kind of context isn’t something you’d normally have access to mid-tournament without doing a lot of manual lookup, and having it queryable in plain language made it genuinely fast to get to.
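
One way to frame the “easiest or hardest path” question is as a strength-of-schedule calculation: for each player, average the current match win rate of the opponents they have faced. The sketch below is only an illustration of that idea, not how Rift Stats computes it. The function name, the player names, and the (winner, loser) data layout are all assumptions.

```python
from collections import defaultdict

def opponent_win_rate(matches):
    """matches: list of (winner, loser) tuples, one per completed match.

    Returns {player: average match win rate of that player's opponents}.
    A lower value suggests an easier path so far.
    """
    wins = defaultdict(int)
    played = defaultdict(int)
    opponents = defaultdict(list)
    for winner, loser in matches:
        wins[winner] += 1
        played[winner] += 1
        played[loser] += 1
        opponents[winner].append(loser)
        opponents[loser].append(winner)

    # Each player's own win rate across all of their reported matches.
    win_rate = {p: wins[p] / played[p] for p in played}
    return {
        p: sum(win_rate[o] for o in opps) / len(opps)
        for p, opps in opponents.items()
    }

# Hypothetical results after two rounds.
matches = [("ana", "bo"), ("cy", "dee"), ("ana", "cy"), ("dee", "bo")]
schedule = opponent_win_rate(matches)
```

In this made-up field, “ana” has faced opponents with a combined 0.25 win rate while “bo” has faced 0.75, which is the kind of contrast the query was surfacing.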

It did influence real decisions. At one point I used it to think through whether to stay in the tournament or drop, weighing my remaining path against the field. It didn’t end up mattering — a last-second disqualification retroactively changed a competitor’s record and shifted things anyway — but the data was useful in the moment and I liked having it. That’s a meaningful validation: the tool changed what I was thinking about, not just what I was curious about.

Hitting session limits and an arithmetic error

I hit the session limits quickly today, which makes sense: natural language data queries are token-heavy, especially when the model is doing a lot of interpretation and holding context between questions. I burned through more than I expected just in the gaps between rounds.

More concerning was a calculation error I caught. Matches are best-of-three, so results get reported as game scores like 2-1 for a win. The model was treating that as a literal score addition — adding 2-1 to a running total — rather than counting it as a single match win. It’s the kind of domain-specific encoding that’s obvious once you know it and easy to get wrong if you don’t. The underlying data is fine; it was the interpretation layer that slipped.

The fix is probably not a better model — it’s better prompting or a structured query layer that handles the arithmetic before the model ever sees it. For the natural language interface to be reliable, I need to be more deliberate about what I’m asking the model to reason about versus what should just be computed directly.
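
A minimal sketch of what “computed directly” could look like for the best-of-three case: convert each reported game score into a single match result before anything reaches the model. The function name and the string format are assumptions, and treating a tied game score as a draw is also an assumption about how unfinished matches might be reported.

```python
def match_record(results):
    """results: list of 'games_won-games_lost' strings, one per match.

    Returns (match_wins, match_losses, match_draws). Each score counts
    as exactly one match, regardless of how many games it contains.
    """
    wins = losses = draws = 0
    for score in results:
        won, lost = (int(part) for part in score.split("-"))
        if won > lost:
            wins += 1
        elif lost > won:
            losses += 1
        else:
            # Assumption: equal game counts mean the match was drawn.
            draws += 1
    return wins, losses, draws

# Three matches: two wins and one loss, not "5 wins and 3 losses".
record = match_record(["2-1", "2-0", "1-2"])
```

With the record already reduced to match wins and losses, the model only has to interpret the number, not derive it.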

Rethinking model selection for different query types

The token burn today is pushing me to think more carefully about when to use Opus versus something lighter. For exploratory or interpretive queries, the quality difference probably justifies the cost. For simpler lookups — give me the current standings, how many rounds are left, what’s this player’s record — a less capable model or a lower effort setting would likely do just as well and cost a fraction. I want to experiment with that tomorrow and see if the quality actually holds for the simpler cases.
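
As a starting point for that experiment, the split could be as crude as a keyword check that sends obvious lookups to a cheaper tier. This is a rough sketch only: the hint phrases, the tier labels, and the function name are all hypothetical, and a real version would need something better than substring matching.

```python
# Phrases that suggest a plain lookup rather than an interpretive question.
LOOKUP_HINTS = ("standings", "record", "rounds are left", "how many rounds")

def pick_tier(question):
    """Return 'light' for simple lookups and 'heavy' for everything else."""
    text = question.lower()
    if any(hint in text for hint in LOOKUP_HINTS):
        return "light"
    return "heavy"

tier_lookup = pick_tier("What's this player's record?")
tier_explore = pick_tier("Who has had the hardest path so far?")
```

Defaulting to the heavier tier means a misclassified question costs tokens rather than quality, which seems like the safer failure while testing.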

I’ve also been reading about older models getting quietly degraded after new ones release, which is something I want to keep an eye on as I build more reliance on specific model behavior. Whether that’s real or perception drift is hard to tell from the outside, but it’s worth being aware of as a variable.

Rift Stats becomes a real tool: live event querying and what that means

2026-05-29T00:00:00+00:00

A productive couple of days, and ones where the project shifted in a direction I didn’t fully anticipate going in.

Terminal over desktop, and why it matters

I’ve settled on Claude Code in the terminal as my preferred way to work with LLM coding tools. The desktop app gives you high-level summaries — lines changed, files touched — but not much else. In the terminal I can run a split layout: Claude on one side, the editor and a local instance on the other, with approvals and output visible at the same time. That kind of context is hard to replicate in a GUI, and it makes a real difference in how much I actually understand what’s happening versus just watching it happen.

I also swapped to Opus today from Sonnet. I haven’t been hitting my usage caps, so the tradeoff toward quality over speed made sense to try. It felt noticeably different — more thorough in the discovery and architecture work especially.

Data ingestion, mishaps, and what the data revealed

A lot of the past two days went into the data ingestion pipeline for Rift Stats. One of the more useful things Claude did here was help explore the endpoints themselves: inferring what URLs and parameters were available based on a small sample of responses, then building out scripts to canvass the API more broadly. That kind of exploratory reverse-engineering is tedious to do by hand and Claude handled it well.

It did make one meaningful mistake: it scoped the event filter too narrowly based on the initial subset of data I’d given it. It was filtering for registration open and registration closed events, and missing completed ones entirely — which is exactly the data I actually wanted. The initial test scrape looked plausible but was missing most of the relevant match history. Once I caught it and corrected the filter, the ingestion ran properly.
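
The filter mistake is easier to spot if the scrape reports what it is dropping. Here is a small sketch of that idea. The status strings, the field name, and the function name are assumptions for illustration, not the real API values.

```python
from collections import Counter

# Assumption: the statuses worth ingesting for match history.
WANTED_STATUSES = {"completed"}

def select_events(events, wanted=WANTED_STATUSES):
    """Split events into kept ones and a count of dropped statuses."""
    kept = [e for e in events if e.get("status") in wanted]
    dropped = Counter(
        e.get("status") for e in events if e.get("status") not in wanted
    )
    return kept, dropped

# Hypothetical sample of event records.
events = [
    {"id": 1, "status": "registration_open"},
    {"id": 2, "status": "registration_closed"},
    {"id": 3, "status": "completed"},
]
kept, dropped = select_events(events)
```

Printing the dropped counts on a test scrape would have shown immediately that an entire status was being excluded.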

There was also a fair amount of malformed data in the raw responses, which Claude helped identify and work around. Beyond fixing errors, it also acted as a kind of informal DevOps agent — reporting on network failures during ingestion runs and helping figure out recovery paths when requests failed partway through. That was useful in a way I hadn’t anticipated going in.
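
For the requests that fail partway through, one common recovery path is retrying with exponential backoff. The sketch below is a generic version of that technique, not the script Claude wrote. The function names and default values are assumptions, and the failing fetch is simulated rather than a real network call.

```python
import time

def with_retries(fetch, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the wait after each failure.

    Only OSError and its subclasses (ConnectionError, TimeoutError,
    urllib.error.URLError) are retried; the last failure is re-raised.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)

# Simulated fetch that fails twice and then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated network failure")
    return "page-1"

delays = []  # record the waits instead of actually sleeping
result = with_retries(flaky, sleep=delays.append)
```

Passing the sleep function in as a parameter keeps the example testable without real waiting.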

One side effect of pulling the full event history is that I ended up with a lot of data I didn’t set out to analyze — things like event volume over time, day-of-week patterns, how activity correlates with release cycles and prerelease events. None of it was the goal, but it painted an interesting picture of how the game has grown and where the player base concentrates.

There are rate limit constraints I’m still working through, and my plan is to run continued ingestion over the weekend through remote sessions while I’m away from the computer.

From coding tool to query interface

The more interesting thing that happened today is that the Claude Code session I’ve been using to build Rift Stats quietly became something else: a natural language interface into the data itself. Once the ingestion was far enough along, I started asking questions in plain language about the player base, match patterns, archetype distributions — and getting useful answers back. The data has real limits (matches are tracked at the legend level, not the deck level, so there’s no card-level signal), but there’s more in the archetype and player pattern data than I initially expected.

What I keep thinking about is that this kind of capability probably puts real pressure on tools like Power BI or Tableau. The barrier to querying a large dataset with natural language is much lower than building a dashboard, and the flexibility is higher. The catch is that you still need to know what questions to ask and whether the data you have can actually answer them. The LLM can’t read your mind, and it can’t tell you what insights matter — that’s still entirely on the operator. But as a tool for getting from a question to an answer faster, it’s genuinely useful.

Live event tracking, and what it could mean competitively

The part I didn’t plan for: I’m at a live Riftbound event today, and I realized I could use this in real time. The data structure I’m pulling from tags events as completed, in progress, registration open, and so on — and in-progress events update with live data as matches are reported. That means I can query the current field composition, see what legends are being played, track specific players across rounds, and understand what matchups I’m likely to face going into later rounds.

In a two-thousand-person open field that’s probably not decisive. But going into a top sixty-four where you know the field composition, you can make informed decisions about what matchups you actually need to prepare for mentally — and which ones you can stop thinking about. I’d imagine a lot of serious players are already doing some version of this manually. Having it queryable in natural language just removes the friction.

Whether I publish this as a feature for other players is something I’m thinking about. There’s a question of whether it’s the kind of edge that should be broadly available or whether surfacing it changes the dynamic in ways that aren’t entirely good. For now it’s a proof of concept, and a useful one.

Efficiency work ahead

The system works locally and the data is there, but there’s a meaningful amount of optimization still to do before this is something I’d want to host properly. Caching, request limiting, and pagination are the obvious levers. Beyond that, there’s derivative data I can generate in batch rather than computing on demand, with Elo ratings being the main example. Recalculating a player’s Elo every time their profile is rendered doesn’t make sense; a daily batch update would be far cheaper and fast enough for this use case. I want to understand what the actual hosting costs look like before optimizing prematurely, but it’s worth designing for it now rather than retrofitting later.
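
A minimal sketch of the batch idea, using the standard Elo update applied once over the full match list in chronological order. The K-factor of 32 and the starting rating of 1500 are conventional defaults assumed here, not values Rift Stats uses, and the function names are hypothetical.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def batch_elo(matches, k=32, start=1500.0):
    """matches: chronological list of (winner, loser) tuples.

    Returns {player: rating} after replaying every match once.
    """
    ratings = {}
    for winner, loser in matches:
        rw = ratings.get(winner, start)
        rl = ratings.get(loser, start)
        gain = k * (1 - expected_score(rw, rl))
        # Zero-sum update: the winner gains what the loser gives up.
        ratings[winner] = rw + gain
        ratings[loser] = rl - gain
    return ratings

# One match between two unrated players.
ratings = batch_elo([("ana", "bo")])
```

A nightly job could run this over the stored history and write the results to a table, so rendering a profile becomes a single lookup.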

Codex frustrations and the tool invocation problem

2026-05-27T00:00:00+00:00

Today was mostly friction, which is useful in its own way.

Trying Codex for PR review

The plan was to use ChatGPT’s Codex to get a second set of eyes on a pull request — the cross-agent review idea I’d been thinking about since the manual ferrying session earlier this week. In practice it was a nightmare to get working. I had to authenticate to GitHub, configure which projects I wanted it to work with, and then when I actually tried to invoke something against GitHub, it tried to use the GitHub CLI instead of the direct connection I’d already configured. It didn’t work, and the setup overhead made the whole thing feel like it cost more than it was worth.

Agents and tool confusion

That experience connected to something I’ve been noticing more broadly across all the agents I’m using: they get confused about how to invoke tools. Whether it’s something that needs to run locally, a script, or an MCP connection, the choice between them often feels arbitrary. Claude ran into the same thing when I tried to use a plugin from the marketplace — it spent five attempts trying to invoke it through Python in both bash and PowerShell before eventually finding the path where it was actually installed. Codex did the equivalent with the GitHub CLI.

I don’t know yet whether this is a fundamental limitation of how these agents reason about their environment, a configuration problem on my end, or something in between. But it’s a consistent enough pattern across different tools that I want to understand it better rather than just working around it each time.

What I was actually building

While the tooling was fighting me, the actual work on the game analytics project went surprisingly well. It seems like I can take an entire analytics feature set and get it to a reasonable state in close to one shot with Claude. That’s impressive and also a little unsettling — the barrier to replicating something like this keeps dropping, which makes it harder to feel like any particular implementation has much differentiation. More is not always better, but the speed is real.

What’s next

I want to find a cleaner workflow for the cross-agent review idea before abandoning it. The value of having one model critique another’s work is still there in theory — I just need a setup where the overhead doesn’t eat the benefit. That’s what I’m going to try to work out tomorrow.

Catching up: match history architecture and finding easy tournaments

2026-05-26T00:00:00+00:00

A few days lighter than usual — some illness and weekend commitments — but not completely idle. I was able to push a few features forward on projects over the weekend without much active intervention, which is its own kind of useful data point about how far along these projects are.

Reaching parity with an existing Riftbound fan site

Today’s main focus was doing a proper feature analysis against another Riftbound fan site to understand what parity actually looks like. It’s a useful exercise because it forces the question of what data you actually need versus what you’re currently pulling, and the answer here is: more than I have.

To replicate the feature set I’m looking at, I’ll need the full history of match data stored locally. The alternative — making a large number of live requests — isn’t viable for a hobbyist project where I’m watching costs. That makes this the first time I’ve had to think seriously about data architecture for this project: how to store it, how to keep it updated, and how to do both cheaply. I don’t have those answers yet, but framing it as an architecture problem rather than a scraping problem feels like the right starting point.

Finding easy tournaments

One feature I did ship today: a way to identify “easy” Riftbound competitions by looking at a store’s history of prior events and predicting how competitive the next one is likely to be based on past attendance. Low attendance historically tends to mean a smaller, less competitive field.

It’s low signal by design: stores can only run these tournaments once per quarter, so there isn’t much data per location to work with. But some signal is better than none, and it’s a feature that actually changes how you’d decide which events to travel to. The longer-term version of this couples tournament difficulty with individual Elo ratings calculated from full match history, which would give a much sharper picture. That’s further out and depends on getting the match data architecture right first.
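
The attendance heuristic can be sketched as a simple average per store, sorted from smallest to largest. This is an illustration of the idea rather than the shipped feature: the store names, the numbers, and the function names are made up, and stores with no history are simply left out.

```python
from statistics import mean

def predicted_attendance(history):
    """history: {store: [attendance at past events]} -> {store: mean}."""
    return {store: mean(counts) for store, counts in history.items() if counts}

def easiest_first(history):
    """Stores ordered by predicted attendance, smallest field first."""
    predicted = predicted_attendance(history)
    return sorted(predicted, key=predicted.get)

# Hypothetical attendance history; "Store C" has no prior events.
history = {"Store A": [48, 52], "Store B": [16, 20, 18], "Store C": []}
ranking = easiest_first(history)
```

With only a handful of events per store, a plain mean is about as much as the data supports; anything fancier would mostly be fitting noise.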

Still frustrated with Codex

I haven’t gone back to Codex since the rough session earlier in the week. The tool invocation problems made it more hindrance than help, and I haven’t had a good reason to fight through the setup again. I’m planning to try building some real flows with it tomorrow and see if a more structured approach changes the experience.

Building Gym OS: real architecture, real DevOps, real approval fatigue

2026-05-22T00:00:00+00:00

Today was the first day that felt less like tinkering and more like building something with actual scope. I’ve been calling it Gym OS — software for managing a gym, a domain I know well enough to have real opinions about. I had a plan drafted from the night before and spent the day executing on it.

A more complex Railway deployment

The Railway setup today was meaningfully more involved than anything I’d done before — two separate frontends, one for staff and one for members, both connected to a shared PostgreSQL database. Getting that architecture deployed meant dealing with DevOps questions that go beyond what Claude Code handles in the terminal: environment configuration, service relationships, how the pieces talk to each other across Railway’s infrastructure. It’s the kind of work that doesn’t show up cleanly in a chat session but takes real time to get right.

Approval fatigue on mobile

The mobile workflow hit a wall today, and I think I understand why. The remote Claude Code session generated a lot of chained outputs — test runs in particular — and each one required an approval. In practice that meant I was spending most of my time tapping through confirmations rather than actually reviewing anything meaningful. It felt like clicking next on an incremental game.

The frustrating part is that a lot of those approvals were for test runs that could have just written output to a file for me to review later, rather than piping it somewhere that triggers another permission prompt. I got through it, but it’s a workflow problem I want to solve before the next session rather than just tolerate.

Tests I haven’t verified yet

The agent wrote a set of tests and health checks as part of the build, which I need to actually read. For all I know some of them are just returning true regardless of the state of the service — that’s the kind of thing that looks like coverage and isn’t. Verifying those by hand is on the list, and it’s a good example of why I can’t just treat agent output as correct because it ran without errors. The code compiles and the tests pass, but that’s not the same as the tests being meaningful.
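
To make the concern concrete, here is a contrived contrast between a test that cannot fail and one that can. Nothing here is taken from the Gym OS code; the health check and both tests are hypothetical stand-ins.

```python
def health_check(ping):
    """Return True only if the service's ping() call reports 'ok'."""
    try:
        return ping() == "ok"
    except Exception:
        return False

def vacuous_test():
    # Passes no matter what the service does: looks like coverage, checks nothing.
    return True

def meaningful_test():
    # Passes only if the check accepts a healthy service AND rejects a broken one.
    def broken():
        raise ConnectionError("service down")
    return health_check(lambda: "ok") and not health_check(broken)

vacuous_result = vacuous_test()
meaningful_result = meaningful_test()
```

A quick way to audit generated tests is to break the thing under test on purpose and confirm that something actually goes red.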

Watching software get built fast

Coming from a background of writing code by hand, the pace is genuinely strange to witness. The architecture I put together today — separate frontends, a real database, health checks, unit tests — would have taken days to weeks depending on how carefully it was being built. Watching an agent scaffold that in a session, even accounting for the approval friction and the verification work still ahead, is a different experience than I expected.

It doesn’t make the verification less important. If anything it makes it more important, because the volume of generated code outpaces what I can carefully read in the same session. But it does change what’s possible as a solo developer, and I keep coming back to the same thought: this is a tool worth learning well, because the gap between someone who uses it deliberately and someone who doesn’t is only going to grow.

Cross-agent collaboration and unsupervised loops

2026-05-21T00:00:00+00:00

Today was mostly experimental — less about shipping something and more about understanding what different workflows feel like in practice.

Applying a UI skill to an older project

I ran the ui-ux-pro-max skill against an older project that had accumulated a lot of UI elements. A single pass did more than I expected — things are noticeably more accessible and the layout is cleaner. It’s not perfect, but for a one-shot it held up well. Good to know that skill has some retroactive value on existing projects, not just greenfield ones.

Manual cross-agent collaboration

The bulk of the day went into something I’ve been curious about: using multiple agents together on the same problem. I was working on a spec and routed it through Claude, Cursor, and ChatGPT in sequence — each one reviewing and building on the output of the last, with me manually ferrying the results between them. The idea is that each model has different tendencies and blind spots, so having one critique the work of another might surface things a single model would miss or confidently gloss over.

It’s a reasonable theory and the output was interesting, but doing it manually is slow. I’ve seen implementations where this kind of loop is automated, and I understand why — the overhead of being the one shuttling context between agents adds up quickly and starts to feel like it’s defeating the purpose. The next question is what automated orchestration actually looks like in practice, and whether the token cost is manageable.

Token spend, or the lack of it

I expected usage to be a constraint by now, but I’m nowhere near it. I’m not scratching the limits on any of my subscriptions across multiple providers. That’s surprising, and it reframes the question a bit — I’ve been thinking about token spend as a ceiling to stay under, but in practice the bottleneck seems to be elsewhere, probably in how much I can actually review and direct in a session rather than what the models can process.

That said, I’m still cautious about automated loops. An unsupervised agent running without checkpoints can spend tokens in ways that are hard to audit after the fact, and more importantly it can make decisions I haven’t reviewed. Before I set up anything that runs without me in the loop, I want to understand what guardrails exist and where the failure modes are. That’s the research I want to do next.

Remote Claude sessions, Railway hiccups, and a new project spec

2026-05-20T00:00:00+00:00

Today had a mix of progress and friction, which is probably the most realistic description of how most of these sessions go.

Claude Code remote sessions

The thing I’m most happy about today: I got Claude Code remote sessions working. I started Claude Code in the terminal on my PC and connected it to the mobile app, and from there I was able to invoke skills that I couldn’t access on mobile previously. That closes a gap I’d been bumping into — it means anything I can configure on my PC, including marketplace plugins and custom skills, is now available when I’m working from my phone. That’s a meaningful upgrade to the mobile workflow.

I used it to kick off a UI redesign on the PWA. The terminal output says the project still loads, but since I wasn’t at my computer I couldn’t verify it visually — which brings me to the other thing that happened today.

Railway pains

Railway is having some availability issues and has paused builds and deployments on the free plan. I checked their status page and they’re running at around 99.6-99.7% uptime across services, which sounds good until you’re the one waiting on a build. I’m not sure yet whether my currently deployed version is still serving fine or just cached on my end, but either way the new version isn’t getting out.

It’s a temporary roadblock and not a crisis, since the code lives in GitHub and nothing is lost, but it’s a reminder that the deployment platform is a dependency I don’t control. I want to look at alternatives, or at least understand what instant or near-instant public deployment options exist, both as a backup and because the feedback loop matters when you’re iterating on UI from a phone. It also feels like ominous foreshadowing for software as a whole: anecdotally, many people say services across the board are declining in quality.

Small wins on the blog

In between the bigger things I did a bit of exploration on what Jekyll can do with GitHub Pages and added an automated sitemap. It’s a small thing, and I’m not actively trying to get this site discovered right now, but it’s the kind of half-percent optimization that’s easy to miss if you don’t know to look for it. That’s something I’m finding genuinely useful about having a language model available for this kind of work — where I used to go to Stack Overflow for a specific technical answer, I can now have a broader conversation about options and tradeoffs, and surface improvements I wouldn’t have thought to ask about. The sitemap is a good example: I didn’t set out to add one, but it came up naturally while exploring what was possible.

A new project spec

I also started scoping a new project today, though I’m conscious of the risk of accumulating too many half-finished things. This one feels different — it’s in an area where I have real subject matter expertise, and there’s a chance it produces something with actual practical value rather than just being an exercise.

I had Claude generate a spec from requirements I laid out, and it’s a reasonable starting point. Not everything in it reflects how I’d design it, and I’m not sure yet whether iterating over the spec with the model will sharpen it or just produce confident-sounding variations of the same decisions. I might need to mark it up manually and use that as the input for a second pass. Worth figuring out what that revision loop looks like before I start building from it.

AI slop UI, Claude skills, and finding the right feedback loop

2026-05-19T00:00:00+00:00

A short but productive day. I made a few small bug fixes on the PWA through the mobile workflow, and then switched focus to the frontend quality problem I’d been thinking about since yesterday.

Claude Code in the terminal

One small change worth noting: I installed Claude Code directly in the terminal today instead of using the Windows Claude application I’d been running on desktop. No dramatic difference yet, but it feels like a more natural fit for the kind of work I’m doing and I expect it’ll matter more as the projects get more complex.

Experimenting with frontend skills

The UI problem I identified yesterday — that generated frontends have a recognizable, assembled-not-designed quality — sent me toward Claude’s frontend skills today. The results were interesting. Using a skill oriented toward frontend UI does noticeably shift the output away from the worst of the AI slop aesthetic, but not entirely out of it. There’s a more refined version of the same problem: the output is better, but it still has a discernible pattern to it once you know what you’re looking at. It’s a different flavor of the same tell.

That’s not a reason to abandon the approach, but it does tell me that skills alone aren’t going to solve this. I’m going to look at whether there are skills more specifically tailored to PWAs, since the generic frontend output might just not be the right starting point for what I’m building.

Desktop vs. mobile development

I’m also recalibrating my thinking on the mobile-only workflow. There’s genuine value in being able to iterate from my phone, as I proved over the last couple of days, but the feedback loop is slower than working at a desktop. For the kind of tight iteration that UI work requires, desktop is faster. I think the right model is desktop as the primary environment with mobile as a real option when I’m away from a desk, rather than trying to make mobile the default.

What’s next

Two things I want to dig into: finding a skill or scaffolding approach that produces better PWA output specifically, and understanding what testing and verification automation looks like when agents are doing most of the building. That second one feels important — if I’m not writing the code myself, I need some other mechanism to catch problems before they land, and I haven’t thought carefully enough about what that looks like yet.

Shipping from my phone with Railway and PWAs

2026-05-18T00:00:00+00:00

Today started with a question I’ve been putting off: what does DevOps actually look like for a solo developer with limited resources? I wanted to understand the reasoning behind the decisions Claude made in the initial Riftbound stats frontend, think through continuous deployment, and figure out how to make development and testing work from a phone. By the end of the day I had a working answer to at least part of that.

Getting a mobile-first workflow off the ground

The setup I landed on is a progressive web app deployed through Railway, on the free tier for now. The workflow is simple: Railway watches a GitHub repository and rebuilds whenever there’s new activity. Push a commit, get a deployment. The app surfaces through a URL I can hit from my phone, which means the whole loop (write code, push, test) can happen without sitting at a desk.

I got a “hello world” version of a personal app I wanted to build running today, which is deliberately minimal. There aren’t many moving parts yet, and that’s fine. What matters is that the pipeline works end to end from my phone, and it does. As a bonus, I was able to use it in a real-world scenario and it functioned the way I wanted, though it definitely needs a lot of visual work.

Iterating in the wild

Once the basic workflow was up, I took it out with me and did a few more iterations while out and about. It mostly worked, though I hit a caching issue where new deployments weren’t loading — browsers holding onto old versions. Once I understood what was happening it was straightforward to fix, but it was a good reminder that deployment isn’t just about getting the build to succeed.

The deployment cycle has a rhythm to it that reminds me of firmware engineering, where you’d kick off a compilation and data transfer and then wait. There’s a similar cadence here: push, wait for Railway to build, check the result. It’s not instant, but it creates natural pauses, and I can see how that rhythm might actually work in your favor when running multiple projects — you kick something off and switch to something else while it builds.

What I want to improve next

The generated frontend works but it has rough edges, and a lot of them are the kind that are hard to pin down but immediately recognizable. There’s a distinct aesthetic to LLM-generated web apps — just like there are tells in LLM-generated copywriting — and I want to understand it well enough to work against it deliberately. The code is functional but the UI doesn’t feel designed, it feels assembled.

I want to spend time on this: looking at what’s actually being generated, understanding where the scaffolding decisions are coming from, and figuring out whether the right fix is better prompting, different tooling, or just more hands-on iteration on the output. Getting something that feels like a real product rather than a proof of concept is the goal, and that’s going to require more intentionality about the frontend layer than I’ve applied so far.

Data pipelines, a committed .env, and a new writing tool

2026-05-17T00:00:00+00:00

I took yesterday off sick, so today was about getting back into it. Most of the session went into the Riftbound stats project — a card game data analysis side project I’m running in parallel with the blog — but I also set up something new for the writing workflow itself.

The Riftbound project takes shape

The goal for today was getting real data flowing. That meant package installs and environment setup outside of Claude. I’ve done that before in other contexts, but I’m still figuring out the right rhythm for it here. My hope is it’s mostly a one-time cost per project rather than something I’m wrestling with every session.

Once the environment was sorted, I hit the classic mistake: a .env file committed to the repo. I caught it, removed it from git history, and moved on — a good reminder that the basics still apply even when you’re working fast with an agent helping you.

From there things went well. Claude helped me identify public endpoints on the UVS website to pull the data I needed, which was the kind of research help that would have taken me much longer to do manually. I set up PostgreSQL locally and worked through a schema that fit the shape of the data, then ingested results from the first six Riftbound regional qualifiers. By the end of the session I had working visualizations across all of them: leaderboards, legend composition breakdowns, winrate matrices, and player tracking across events. It’s a good chunk of data to work with.
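
For reference, a winrate matrix of this kind can be built from nothing more than a list of (winning legend, losing legend) pairs. The sketch below assumes that layout; the legend names are placeholders and the function is not the one in the project.

```python
from collections import defaultdict

def winrate_matrix(matches):
    """matches: list of (winning_legend, losing_legend) tuples.

    Returns {legend: {opposing_legend: win rate}} for pairings with data.
    """
    wins = defaultdict(int)
    games = defaultdict(int)
    for winner, loser in matches:
        wins[(winner, loser)] += 1
        # Count the match from both sides of the pairing.
        games[(winner, loser)] += 1
        games[(loser, winner)] += 1

    matrix = defaultdict(dict)
    for (legend, opponent), total in games.items():
        matrix[legend][opponent] = wins[(legend, opponent)] / total
    return dict(matrix)

# Hypothetical results: Legend A beats Legend B three times out of four.
matches = [("Legend A", "Legend B")] * 3 + [("Legend B", "Legend A")]
matrix = winrate_matrix(matches)
```

Pairings that never occurred are simply absent from the result, which is worth keeping visible in a chart so an empty cell is not read as a 0% win rate.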

A tool for writing about the work

The other thing I set up today is what’s producing this post. I started a dedicated Claude session that acts as a running journal — I drop notes in throughout the day, and at the end it synthesizes them into a blog post. The idea is to close the loop between building and writing without it feeling like a separate task.

Part of why I want to keep writing these posts, even if nobody reads them, is that I’m conscious of a pattern I keep hearing about: people leaning on AI so heavily that they can’t understand or debug their own code without it. The writing is meant to be a counterweight to that — a way to force myself to actually understand what I’m doing, not just watch an agent do it. Whether it works that way in practice is something I’ll find out over time.

What’s next

The Riftbound project is in a good state to expand — I want to pull in data from more events and then start on more interesting analysis. I’m also curious to try Claude’s design tools for the visualization layer.

On the workflow side, I want to see if I can extend this writing tool to do something more useful at the end of each session: look at what I’ve been working on and suggest what I should learn about or explore next, rather than just summarizing what happened.