Which AI model is best for coding

Each AI model has its strengths and weaknesses when it comes to coding tasks. There is no silver bullet; picking the right model for your task is the best way to go.

I have been testing most of the frontier AI models for coding tasks since GPT-3, and it's hard to determine which model is best. Official benchmarks released by the companies are not very useful for picking the right model for your use case.

Even if you stick with a model, there is a chance that it will be quantised or reduced in quality during peak hours, or that a newer version will be released under the hood (even though the version number in the API does not change).

When it comes to deciding which model to use for your coding, there are several factors to be considered:

  • The programming language you are using (some models work best with certain programming languages)
  • Front-end versus back-end code (the newly released GPT-5 is great for front-end, whereas Claude Sonnet 4 is better at back-end)
  • Whether you are using it in a code editor like Cursor, in Claude Code, or directly via the API (GPT-5 is very bad at running inside Cursor: in my initial tests it kept running into loops, whereas Sonnet 4 is brilliant inside Cursor)
  • Thinking mode (some models produce noticeably better results on certain tasks when thinking mode is enabled)
  • Context window size, which matters for bigger code bases or more complex tasks. Claude Sonnet 4 recently gained a 1 million token context window. GPT-5 supports 256k tokens, which is also quite large.
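To make the context window point concrete, here is a minimal sketch that checks whether a chunk of code is likely to fit in a given model's window. The window sizes are the ones quoted above; the model labels are just dictionary keys (not official API identifiers), and the "about 4 characters per token" rule is a rough assumption, not a real tokenizer, so treat the result as a ballpark only.

```python
# Context window sizes as quoted in the text above (in tokens).
# The keys are informal labels, not official API model identifiers.
CONTEXT_WINDOWS = {
    "claude-sonnet-4": 1_000_000,
    "gpt-5": 256_000,
}

# Assumption: roughly 4 characters per token. Real tokenizers vary,
# especially for code, so this is only a ballpark.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def models_that_fit(text: str, reserve_for_output: int = 8_000) -> list[str]:
    """Return the models whose window can hold the text plus an output budget."""
    needed = estimate_tokens(text) + reserve_for_output
    return [name for name, window in CONTEXT_WINDOWS.items() if needed <= window]
```

Under these assumptions, a code base of around 2 million characters would only fit in the larger of the two windows, while a small file fits in both.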

If I want to create a website or front-end code, my go-to model is GPT-5. They weren't lying when they said that it produces beautiful designs. Sonnet 4 does a good job as well, and for a while it was the best model for generating UI.

For back-end coding tasks, Sonnet 4 is brilliant, especially if used in Cursor or Claude Code (which I haven't really adjusted to).

Grok 4 is quite good as well for back-end tasks or researching bugs, but I have mainly used it in the official chat app. The UI it generates is similar to that of GPT-4.

I have never been a big fan of the Gemini models from Google. There are several reasons for this:

  • The Google Cloud interface is cumbersome and difficult to use
  • Google has a habit of "releasing" a frontier model that is not available through the API (only through their UI), or is so rate-limited that it is essentially not usable in a production app
  • Even though they are cheap (the cheapest of the ones mentioned), the code they generate is often incomplete or unusable
  • They have a big context window, but with the latest releases from OpenAI and Anthropic, that isn't an advantage anymore

To recap, I now use GPT-5 in GetSite for generating websites, the chatbot, and other menial tasks, and for coding I use Sonnet 4 in Cursor. Grok 4 is good for researching certain bugs or for general research on a topic (I use it in the official chat app).
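If you want to encode a preference list like this in a script or tool, a simple lookup table is usually enough. The sketch below mirrors my own choices from the recap; the task names and the `pick_model` helper are hypothetical, and the values are plain labels rather than API model identifiers.

```python
# Personal routing table mirroring the recap above.
# Task names and values are illustrative labels, not API identifiers.
ROUTES = {
    "frontend": "GPT-5",
    "backend": "Sonnet 4 (in Cursor)",
    "bug-research": "Grok 4 (official chat app)",
}


def pick_model(task: str) -> str:
    """Return the preferred model label for a task type."""
    try:
        return ROUTES[task]
    except KeyError:
        known = ", ".join(sorted(ROUTES))
        raise ValueError(f"Unknown task {task!r}; expected one of: {known}") from None
```

Keeping the mapping in one place makes it easy to update when a new model release changes the picture, which happens often.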

Another aspect to point out is that GPT-5 is cheaper than Grok 4 and Sonnet 4, although with thinking activated, the tokens consumed increase the price substantially.
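To see how thinking mode affects the bill, here is a rough cost sketch. It assumes that thinking tokens are billed at the same rate as output tokens, and the prices in the usage note are made-up placeholders, not real pricing for any model; check the provider's pricing page for actual numbers.

```python
def request_cost(
    input_tokens: int,
    output_tokens: int,
    thinking_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Estimate the cost of one request in dollars.

    Assumption: thinking tokens are billed at the output-token rate.
    Prices are expressed per 1 million tokens.
    """
    billed_output = output_tokens + thinking_tokens
    total = input_tokens * input_price_per_m + billed_output * output_price_per_m
    return total / 1_000_000
```

With placeholder prices of 1.00 per million input tokens and 10.00 per million output tokens, a request with 2,000 input tokens and 1,000 output tokens comes to about 0.012 without thinking. Add 4,000 thinking tokens and the same request comes to about 0.052, more than four times as much, even though the visible answer is the same length.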