
N.B. This was written with Claude Sonnet 3.5 in mind - for more modern models, see [A field guide to AI-first development](https://www.makingdatamistakes.com/ai-first-development/): detailed techniques and prompts for AI-first coding, for experienced developers to build medium-sized production-ready codebases, providing lots of architecture-level product guidance but without writing a line of code by hand.

It was afternoon on New Year's Eve 2024, and I had 15 minutes before we needed to leave for the party. The dishes weren't done, and a stubborn DevOps issue was blocking my hobby project - the web server couldn't talk to the database in production. Classic.

I had an inspiration for how to tackle it, but there was no time to try it! So I gave Cursor (an AI coding assistant) clear instructions and toddled off to do the washing up. "Wouldn't it be poetic," I thought, "if this became my story of the moment AI truly became agentic?"

Five minutes later, I returned to find... my laptop had fallen asleep.

I gave it a prod, went back to the dishes, and then checked again just before we walked out the door. This time? Success! Clean dishes and the site was working perfectly after a long AI Composer thread of abortive attempts, culminating in a triumphal "All tests passed". My New Year's gift was watching an AI assistant independently navigate a complex deployment process while I handled real-world chores.

Introduction

What is Cursor?

"We've moved from razor-sharp manual tools to chainsaws, and now someone's strapped a bazooka to the chainsaw."

Let's start with some context. Cursor is a modern code editor (or IDE - Integrated Development Environment) built on top of VS Code. It has been re-worked for AI-assisted development, and integrated closely with Claude Sonnet 3.5 (the best overall AI model in late 2024).

The State of AI Coding in 2024

Over the last year, we've progressed from razor-sharp manual tools to chainsaws, and in the last few months Cursor has strapped a bazooka to the chainsaw. It's powerful, sometimes scary, easy to shoot your foot off, but with the right approach, transformative and addictive.

We've gone from auto-completing single lines, to implementing entire features, to running the tests and debugging the problems, to making DevOps changes, to refactoring complex systems.

The most recent game-changer is Cursor's Composer-agent in "YOLO" mode (still in beta). Think of Composer as an AI pair programmer that can see your entire codebase and make complex changes across multiple files. YOLO mode takes this further - instead of just suggesting changes, it can actually run commands and modify code while you supervise. We've gone from an AI that suggests recipe modifications to a robot chef that can actually flip the pancakes.

The result? A fundamentally new style of programming, where the AI handles the mechanical complexity while you focus on architecture and intention. At least that's the pitch, and the reality isn't far off.

A Day in the Life: Building Features at AI Speed

"One afternoon, I realized I'd been shipping a complete new UI feature every 5-10 minutes for an hour straight - and for most of that time, I was doing other things while the AI worked."

What does this new style of programming look like? Let me share a "magic moment" to illustrate.

I needed to add several CRUD buttons to my web app - delete, rename, create, etc. Each button sounds simple, but consider the full stack of work for each button:

Even just a delete button could easily be an hour of focused work - potentially more for the first one, or if you hit a snag, or haven't done it for a while, or need to refactor, or want to be thorough with testing.

One afternoon, I asked the AI to create the first delete button, answered some clarifying questions, watched it write all the code and tests, and tweak things until everything just worked. Then I asked for another button, then a small refactor, then added some complexity, then another. I would prepare the instructions for the next feature while the AI was generating code. It was tiring, and I felt like I'd been at it all afternoon. But then I reviewed my Git commits - we'd built a new, robust, fully-tested UI feature every seven minutes for an hour straight. That would have been a day or two's work, doing all the typing yourself, like an animal.

Of course, one might respond with a quizzical eyebrow and ask "But is the code any good?". The answer depends enormously. My experience has been that, with enough context, guidance, and clear criteria for success, the code it produces is usually good enough. And as we'll discuss, there is much you can do to improve it.

N.B. My experience so far has been working with small codebases. From talking to people, further techniques & progress may be needed for larger codebases.

Getting Started: The Fundamentals

Cursor Settings & Setup

For the best AI-assisted development experience, start with these essential configurations:

  1. Core Settings:
  2. Enhanced Features:

See Appendix A for additional system prompt configurations and "Rules for AI".

Treat AI Like a Technical Colleague - provide enough context, and make sure you're aligned

"The AI isn't junior in skill, but in wisdom - like a brilliant but occasionally reckless colleague who needs clear context and guardrails."

Forget the "junior developer" metaphor - it's more nuanced than that.

This mirrors the engineering manager's challenge of overseeing a team on a new project: you can't personally review every line of code, so you focus on direction, and creating the conditions for success. For them to have a chance of succeeding, you'll need to:

Clear Success Criteria, and The Cup of Tea Test

"Will no one rid me of these failing tests?"

Before writing a single line of code, define what success looks like. The ideal is to have an objective target that the AI can iterate towards, with clear feedback on what it needs to change.

Automated tests are your best friend here. The AI can run in a loop, fixing problems determinedly until all the tests pass. (If you don't have automated tests yet, the good news is that AI is really good at writing them, especially if you discuss what they should look like before allowing it to start coding them.)
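To make this concrete, here's a minimal sketch of the kind of test that makes a good objective criterion - one crisp pass/fail signal per behaviour. (The delete endpoint, client fixture, and item_factory here are hypothetical, purely for illustration.)

# Hypothetical example - the endpoint and fixtures are illustrative, not from my codebase.

def test_delete_item(client, item_factory):
    item = item_factory(name="doomed")

    # Deleting should succeed...
    resp = client.delete(f"/items/{item.id}")
    assert resp.status_code == 204

    # ...and the item should no longer be retrievable afterwards.
    resp = client.get(f"/items/{item.id}")
    assert resp.status_code == 404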

The objective criterion could take other forms, e.g.:

It could be anything that can be run automatically, quickly, repeatedly, and consistently, and that gives clear feedback on what needs to change - e.g. an error message or a score.

The dream is to be able to walk away and make a cup of tea while the AI toils. For this to work, you need to be confident that if the objective criterion has been met (e.g. all the tests pass), then the code is probably right.

This makes it sound easy. But often, figuring out the right objective criteria is the hard part (e.g. ensuring the tests capture all the edge cases, or the ML evaluation metric really meets the business needs). Before stepping away, you want to see evidence that the AI truly understands your intentions. This is why the "propose and discuss" phase is crucial - when the AI asks insightful questions about edge cases you hadn't considered, or makes proposals that align perfectly with your architectural vision, that's when you can start warming up the kettle.

N.B. There is of course one obvious way that this could fail - the AI could modify (or even just delete) the tests! I have definitely seen this happen, whether by accident or some AI sneakiness. Add instructions to your Cursor rules system prompt and/or to the conversation prompt, instructing it to minimise changes to the tests, and then keep a close eye on any changes it makes! (I wonder in retrospect if part of the problem was that I used the ambiguous phrase, "Make sure the tests pass", which might be the AI programming equivalent of "Will no one rid me of this turbulent priest?")
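One lightweight way to keep that close eye on the tests (my own habit, not a Cursor feature) is to diff just the tests directory before accepting a run - assuming your tests live under tests/:

import subprocess

# Show only changes under tests/ since the last commit - if the AI "fixed"
# the failing tests by rewriting them, it will show up here.
subprocess.run(["git", "diff", "--stat", "--", "tests/"])
subprocess.run(["git", "diff", "--", "tests/"])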

Set guardrails, and make every change reversible

"Good guardrails are like a net beneath a trapeze artist - you can risk the triple-somersault"

We would ideally like to be able to let the AI loose with instructions to iterate towards some objective criterion.

We gave an example of how this could go wrong, if the AI were to simply delete the tests. Of course, there are many other, worse failure modes. It might deploy to production and create an outage. It might format your laptop's hard disk. In practice, I've found Claude to be reasonably sensible, and the worst thing it has ever done is drop the tables on my local development database as part of a migration.

Here are some minimal guardrails to put in place:

Code Safety:

Data Safety:

Process Safety:

AI-Specific Safety:

N.B. For irreversible, consequential, or potentially dangerous tasks, you probably need to hobble it from running things without your say-so. For example, even though I've found Cursor very helpful for DevOps, you'll have to make your own risk/reward judgment.

The key insight is that good guardrails don't just prevent disaster - they enable freedom. When you have automated backups and a solid test suite, you can more confidently let the AI work in its own loop. When it can't break anything important, you can pass the "cup of tea test" more often.
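As one concrete illustration of "make every change reversible" (a sketch of my own, not from the guardrails list above): snapshot the local dev database before a long AI session. This assumes pg_dump is installed and the dev database is called myapp_dev:

import os
import subprocess
from datetime import datetime

# Snapshot the local dev database before letting the AI loose.
# Assumes pg_dump is on the PATH and `myapp_dev` is the dev database name.
os.makedirs("backups", exist_ok=True)
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
backup_file = f"backups/myapp_dev_{timestamp}.sql"

subprocess.run(["pg_dump", "-f", backup_file, "myapp_dev"], check=True)
print(f"Backed up to {backup_file}")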

Summary of the Optimal Simple Workflow

  1. Describe your goal, constraints, and objective criterion. Ask the AI to propose approaches, weigh up trade-offs, raise concerns, and ask questions. But tell it not to start work yet.

  2. Answer its questions, and refine its plan. Give it permission to proceed - either one step at a time (for high-risk/uncertain projects), or until the criteria have been met (e.g. the tests pass).

You may want to first discuss and ask it to build automated tests as a preliminary step, so that you can then use them as the objective criterion.

[SCREENSHOT: A conversation showing this workflow in action]

Intermediate techniques

Spotting When Things Go Wrong - Watch for these red flags

What to do when the AI gets stuck, or starts going in circles

The AI rarely seems to give up. If its approach isn't working, it keeps trying different fixes, sometimes goes in circles, or graduates to increasingly drastic diagnoses & changes. Eventually, as the conversation gets very long, it seems to forget the original goal, and things can really go rogue.

N.B. Cursor's checkpoint system is your secret weapon. Think of it like save points in a video game:

[SCREENSHOT: The checkpoint restore interface in action]

Possible remedies:

The Mixed-Speed Productivity Boost

"AI-assisted programming often feels slower while you're doing it - until you look at your Git diff and realize you've done a day's work in an hour."

After hundreds of hours of usage in late 2024, I've observed a spectrum of outcomes that roughly breaks down into four categories:

  1. A Pit Full of Vipers: It'll make weird choices, get lost in rabbit-holes, or go rogue and make destructive changes

  2. The Scenic Route: It'll get there, but with a good deal of hand-holding (still perhaps faster than doing the typing yourself)

  3. Better Bicycle: Routine tasks, done 2x faster

  4. The Magic of Flight: 10-100x speedups

While dramatic speedups are exciting, the key to maximizing overall productivity is having fewer pits full of vipers and scenic routes. Think of it like optimizing a production line: eliminating bottlenecks and reducing errors often yields better results than trying to make the fastest parts even faster.

The most effective tip for minimising these time-sinks is to remember that AI-generated code is cheap and free of sunk costs. Get used to interrupting it mid-flow, reverting back one or more steps, and trying again with tweaked prompts.


One surprising insight - AI-assisted programming often feels slower than coding manually. You watch the AI iterate, fail, correct itself, and try over and over again, and you can't help but think "what a putz" (even as you secretly know it would have taken you at least as many tries). It is also cognitively effortful to write careful instructions that make the implicit explicit, and to keep track of a rapidly-changing garden of forking paths. And each of the small delays spent waiting for it to generate code takes you out of the state of flow. The remedy to this impatient ingratitude is to look at the resulting Git diff and imagine how much actual work would have been involved in making each one of those tiny little changes carefully by hand yourself.

Advanced techniques: for complex projects

The Planning Document Pattern - for complex, multi-step projects

See: Appendix C - Project Planning Guide

Ask the AI to create a project plan:

Runbooks/howtos

See: Appendix B - Example Runbook for Migrations

Create runbooks/howtos with best practices and tips for important, complex, or frequent processes, e.g. database migrations, deploys, etc.

💡 Pro Tip: Create dedicated runbooks (like MIGRATIONS.md) to codify common practices. This gives the AI a consistent reference point, making it easier to maintain institutional knowledge and ensure consistent approaches across your team.

The Side Project Advantage: Learning at Warp Speed

What might take months or years to learn in a production environment, you can discover in days or weeks with a side project. The key is using that freedom not just to build features faster, but to experiment with the relationship between human and AI developers.

Each "disaster" in a side project becomes a guardrail in your production workflow. It's like learning to drive - better to discover the importance of brakes in an empty parking lot than on a busy highway.

With lower stakes, you can attempt transformative changes that would be too risky in production, e.g. large-scale refactoring or adopting new tools.

This creates a virtuous cycle: try something ambitious, watch what goes wrong, update your techniques and guardrails, push the envelope further, and repeat.

[SCREENSHOT: A git history showing the evolution of guardrails and complexity over time]

The Renaissance of Test-Driven Development

Why TDD Finally Makes Sense

"With AI, you get tests for free, then use them as guardrails for implementation."

Test-Driven Development is like teenage sex - everybody talks about it, but nobody's actually doing it. With AI-assisted programming, that will change. TDD gives you a double win:

  1. The AI does the legwork of writing the tests in the first place, removing that friction

  2. The tests then provide an objective success criterion for the Composer-agent AI to iterate against, so it can operate much more independently and effectively.

The key is to get aligned upfront with the AI on the criteria and edge cases to cover in the tests. This creates a virtuous cycle: the AI writes tests for free, then uses them as guardrails for implementation.

Avoiding Common Pitfalls

However, you need to be precise with your instructions. I learned this the hard way:

A Two-Stage Approach

I've found success with this pattern:

  1. First pass: Focus on getting the test cases right

  2. Second pass: "Without changing the tests too much, get the tests passing"

This prevents the AI from trying to solve test failures by rewriting the tests themselves.

Managing Test Execution

With large test suites, efficiency matters. Instead of repeatedly running all tests:

  1. Run the full suite once to identify failures

  2. Use focused runs (e.g., pytest -x --lf in Python) to tackle failing tests one by one

  3. Run the full suite again at the end
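As a sketch, here are the same three steps expressed as commands (run via Python's subprocess for illustration, though you'd normally just type them into the terminal):

import subprocess

# 1. Full suite once, to see what's failing overall.
subprocess.run(["pytest", "-q"])

# 2. Focused loop: -x stops at the first failure, --lf re-runs only the
#    tests that failed last time, so each iteration stays fast.
subprocess.run(["pytest", "-x", "--lf", "-q"])

# 3. Full suite again at the end, to confirm nothing else broke.
subprocess.run(["pytest", "-q"])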

Tweak your system prompt

See Appendix A: System Prompts and Configuration for an example of mine as it stands. Each line tells a story.

Over time, I suspect these will become less important. Even now, the out-of-the-box behaviour is pretty good. But in the meantime, they help a little.

For Engineering Managers: Introducing AI to Teams

The Unease of Not Knowing Your Own Code

"Managing AI is like managing a team - you can't review every line, so you focus on direction, criteria for success, and making mistakes safe."

Perhaps the most interesting insight comes from watching your relationship with the code evolve. There's an initial unease when you realize you no longer know exactly how every part works - the tests, the implementation, the infrastructure. The AI has modified nearly every line, and while you understand the structure, the details have become a partnership between you and the AI.

I relaxed when I realised that I recognised this feeling from a long time ago. As I started leading larger teams, I had this same feeling of unease when I couldn't review every line of code the team wrote, or even understand in detail how everything worked. As an engineering leader, I learned to:

There are differences, but most of these lessons apply equally to managing teams of people and AIs.

Conversation Patterns with Composer

Here are the main patterns of Composer conversation that I've found useful:

Pattern 1: Propose, Refine, Execute

This is the most common pattern for implementing specific features or changes:

  1. Initial Request:
  2. Review and Refine:
  3. Implementation:

Often it helps to separate out the test-writing as a preliminary conversation, with a lot of discussion around edge-cases etc. Then the follow-up conversation about actually building the feature becomes pretty straightforward - "write the code to make these tests pass!"

Pattern 2: Architectural Discussion

When you need to think through design decisions:

Pattern 3: Large-Scale Changes

For complex, multi-stage projects:

  1. Create a living planning document containing:

  2. Update the document across conversations

  3. Use it as a reference point for context

  4. Track progress and adjust course as needed

See Appendix C: Project Planning Guide.

Conclusion: Welcome to the Age of the Centaur for programming

After hundreds of hours of AI-assisted development, the raw productivity gains are undeniable to me. For smaller, lower-stakes projects I see a boost of 2-5x. Even in larger, more constrained environments, I believe that 2x is achievable. And these multipliers are only growing.

We're entering the age of the centaur - where human+AI hybrid teams are greater than the sum of their parts. Just as the mythical centaur combined the strength of a horse with the wisdom of a human, AI-assisted development pairs machine capabilities with human judgment.

This hybrid approach has made programming more joyful for me than it's been in years. When the gap between imagination and implementation shrinks by 5x, you become more willing to experiment, to try wild ideas, to push boundaries. You spend less time wrestling with minutiae and more time dreaming about what should be, and what could be. It feels like flying.

The short- and medium-term future belongs to developers who can:

In AI-assisted development, your most productive moments might come while making a cup of tea - as your AI partner handles the implementation details, freeing you to focus on what truly matters: ensuring that what's being built is worth building.

[SCREENSHOT: A before/after comparison of a complex feature implementation, showing not just the time difference but the scope of what's possible]

Postscript - AI-assisted writing

Postscript: I wrote this article with Cursor as an experiment - watch this space for more details on that AI-assisted writing process. AI helped with the structure, exact phrasing, and expansion of ideas, but the core insights and experiences are drawn from hundreds of hours of real-world usage.

Resources and Links

Useful tips for Composer

Dear reader, have you found other helpful resources for AI-assisted development? I'd love to hear about them! Please share your suggestions for additional links that could benefit other developers getting started with these tools.

Appendices

Appendix A: System Prompts and Configuration

Click the cog icon in the top-right of the Cursor window to open the Cursor-specific settings. Paste into General / "Rules for AI".

You can also set up .cursorrules per-project.

Here are the Rules from one of the Cursor co-founders. Interestingly:

This is mine:

Core Development Guidelines

Code Style and Testing

Communication

Appendix B: Example Runbook for Migrations

Database Migrations Guide

This guide documents our best practices and lessons learned for managing database migrations. For basic migration commands, see scripts/README.md.

Core Principles

  1. Safety First
  2. Test-Driven Development
  3. PostgreSQL Features

Common Patterns

Adding Required Columns

Three-step process to avoid nulls (see migrations/004_fix_sourcedir_language.py):

  1. Add the column as nullable with a default value:

migrator.add_columns(
    Model,
    new_field=CharField(max_length=2, default="el"),
)

  2. Fill existing rows:

database.execute_sql("UPDATE table SET new_field = 'value'")

  3. Make it required and remove the default:

migrator.sql("ALTER TABLE table ALTER COLUMN new_field SET NOT NULL")
migrator.sql("ALTER TABLE table ALTER COLUMN new_field DROP DEFAULT")
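Putting the three steps together, here's a sketch of how a single migration might look - written entirely with migrator.sql so the statements queue in order, using peewee-migrate's usual migrate() signature; the table and column names are illustrative:

def migrate(migrator, database, fake=False, **kwargs):
    # 1. Add the column as nullable with a default, so existing rows stay valid.
    migrator.sql("ALTER TABLE my_table ADD COLUMN new_field VARCHAR(2) DEFAULT 'el'")

    # 2. Backfill any rows that still need a value.
    migrator.sql("UPDATE my_table SET new_field = 'el' WHERE new_field IS NULL")

    # 3. Tighten the constraint and drop the temporary default.
    migrator.sql("ALTER TABLE my_table ALTER COLUMN new_field SET NOT NULL")
    migrator.sql("ALTER TABLE my_table ALTER COLUMN new_field DROP DEFAULT")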

Managing Indexes

Create new index:

migrator.sql(
    'CREATE UNIQUE INDEX new_index_name ON table (column1, column2);'
)

Drop existing index if needed:

migrator.sql('DROP INDEX IF EXISTS "index_name";')

Model Definitions in Migrations

When using add_columns or drop_columns, define model classes in both migrate and rollback functions:

class BaseModel(Model):
    created_at = DateTimeField()
    updated_at = DateTimeField()

class MyModel(BaseModel):
    field = CharField()

    class Meta:
        table_name = "my_table"

# Then use the model class, not string name:
migrator.drop_columns(MyModel, ["field"])

Note: No need to bind models to database - they're just used for schema definition.
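For completeness, here's a sketch of how the paired migrate and rollback functions might use that locally-defined MyModel (the signatures follow the standard peewee-migrate template; names are illustrative):

def migrate(migrator, database, fake=False, **kwargs):
    # Forward: drop the column we no longer need.
    migrator.drop_columns(MyModel, ["field"])


def rollback(migrator, database, fake=False, **kwargs):
    # Backward: re-add the column so the migration can be undone cleanly.
    migrator.add_columns(MyModel, field=CharField())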

Best Practices

  1. Model Updates
  2. Naming and Organization
  3. Error Handling and Safety
  4. Documentation

Superseding Migrations

If a migration needs to be replaced:

  1. Keep the old migration file but make it a no-op

  2. Document why it was superseded

  3. Reference the new migration that replaces it

See migrations/002_add_sourcedir_language.py for an example.
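A superseded migration can be as small as this no-op sketch (the docstring wording is just illustrative):

"""Superseded - kept as a no-op so databases that already ran it
retain a consistent migration history. See the superseding migration."""

def migrate(migrator, database, fake=False, **kwargs):
    pass  # Intentionally does nothing.


def rollback(migrator, database, fake=False, **kwargs):
    pass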

Questions or Improvements?

Appendix C: Project Planning Guide

Structure of document

Include 3 sections:

  1. Goals, problem statement, background

  2. Progress so far

  3. Future steps

Key Tactics Used

  1. Vertical Slicing
  2. Test Coverage
  3. Progressive Enhancement
  4. Completed Tasks History
  5. Clear Next Steps