Skills for AI jobs · · 7 min read
The AI engineer portfolio: what to show
Four project archetypes that signal real skill, READMEs with decisions, evals and the anti-patterns to avoid.
Your AI engineer portfolio is not a gallery. It is evidence. A recruiter and a tech lead look at it with one question in mind: “did this person ship something that holds up under real data and real users, or did they just glue tutorials together?”. In 2026 anyone can call a model API. The difference is the engineering around that call: evals, observability, decisions, tradeoffs. This piece is about what to show specifically and — just as important — what not to show.
Three to four project archetypes that actually signal skill
You do not need ten projects. You need three, maybe four, that together cover different axes of competence. Repeating the same project type (another chatbot, another “chat with PDF”) adds no signal — it multiplies noise.
- A RAG app with evals. Not just “upload a document, ask a question”. Show chunking, embedding choice, reranking, and — crucially — an eval set: 30–50 question–answer pairs with faithfulness, context recall and answer relevance metrics. This single project separates an engineer from someone gluing tutorial code.
- An agent that does a real task. Not an “agent talks to agent” demo. Something useful: ticket triage, automated fix PRs, research with citations. Show the tool loop, tool-error handling, limits (max steps, token budget) and what happens when the agent gets stuck.
- A fine-tune or an eval harness. Either you tuned a smaller model on a narrow task and show it beats the baseline on your set, or you built an eval tool that compares models/prompts on defined metrics. Either one shows you understand how quality ismeasured, not just asserted.
- A shipped product with users. Even a small one. Even ten users. What counts is that it lives in production: a domain, monitoring, cost control, and a history of what broke and how you fixed it. One such project weighs more than five projects that never left
localhost.
Why “wrappers” still count if the engineering is real
There is a snobbery going around: “it is just a wrapper on the OpenAI API”. Ignore it. Most of AI’s product value in 2026 is exactly the layer around the model. The question is not “is it a wrapper”, it is “is the engineering inside the wrapper real”.
Real engineering inside a wrapper means: an eval layer that catches prompt regressions; caching and cross-provider fallbacks; per-request cost control; rate-limit handling and retry with backoff; output validation before it reaches the user; trace logging for every call. If you show that, the “wrapper” is a strong engineering signal. If your code is three lines of a call with none of these things — then yes, they will rightly call it a demo.
READMEs that show decisions and tradeoffs
The README is the most important file in the repository from a hiring standpoint. The reader spends their first thirty seconds there and decides on that basis whether to open the code at all. A “how to run” README is necessary but not sufficient. Add a decisions section.
- Problem and context. Two sentences: what it solves and for whom. Without it the reader cannot tell whether they are looking at a toy or a tool.
- Architectural decisions. “I picked pgvector over a dedicated vector DB because the scale is tens of thousands of vectors and that is one less service to run.” That sentence says more than a diagram.
- Tradeoffs you deliberately did not make. “I skipped reranking because on my eval set it raised recall by 2% at a cost of 300ms latency.” This shows you measured rather than guessed.
- What you would do with more time. A short “next steps” section shows you see the limits of your own solution.
Demonstrating evals and observability
This is today the rarest and strongest signal. Most portfolio AI projects do not have a single number. If yours does, you are instantly in a different league.
- An eval set in the repository. A file with test cases (input, expected behaviour, metric). Even thirty cases are enough to show you think in terms of measurement.
- Metrics, not vibes. “Faithfulness 0.87 on a 40-question set” instead of “works pretty well”. Frame numbers as a result on a specific set, not as universal truth.
- Observability. Show that you log traces (e.g. via a tool like Langfuse or a custom logger): cost per request, p95 latency, token count, prompt version. A dashboard screenshot in the README is worth a thousand words.
- A prompt regression test. Show that changing a prompt fires the eval set in CI. That is a rarity that screams “this person worked with LLMs seriously”.
GitHub hygiene
Your GitHub profile is your business card before anyone reads the CV. Three things ruin the impression fastest: committed node_modules, secrets in commit history, and twenty skeleton repos with no README. Clean it up.
- Pin your 3–4 best repositories. Make the rest private or archived.
- Descriptive commits, not “wip” ×40. Commit history is evidence of your thought process.
- Zero secrets in history. Scan the repo (e.g.
gitleaks) before you make it public. An API key ingit logis a hiring red flag. .gitignore, a license, a cleanREADME— baseline hygiene.- A profile README with one paragraph: who you are and what you are working on now.
A personal site, plus writing and teaching as a multiplier
A personal site does not need to be a work of art. It needs to answer in thirty seconds: who you are, what you built, how to reach you. A project list with a one-sentence description and a link to repo and demo. That is enough.
The multiplier is writing and teaching. If you describe how you built the eval set for your RAG app, or why you dropped the fine-tune in favour of a better prompt — you show a depth the repository alone cannot convey. One good technical blog post or one thread with a real engineering takeaway is worth more than ten “thoughts on AI”. It is also a signal that you can communicate — a skill teams pay a premium for.
Anti-patterns that kill a portfolio
- Tutorial clones. If the course a project came from is recognisable, the signal is zero or negative. Take the idea from a tutorial, but change the domain, add evals and ship it — then it becomes yours.
- No metrics. An AI project without a single number is a statement of “I do not know whether this works”. It is the single biggest mistake in AI portfolios.
- No deployment. “Works locally” means “I never faced cost, latency, secrets and rate limits”. Ship at least one project.
- Notebooks only. A pile of
.ipynbfiles with no application suggests an experimenter, not an engineer. Turn at least one notebook into a service. - README outweighing the code. A long, generated README on an empty project reads instantly as a facade. Fewer promises, more working code.
A concrete checklist
- 3–4 projects covering different axes: RAG with evals, an agent, a fine-tune/harness, a shipped product.
- Each project has an eval set in the repo and at least one metric with a number.
- At least one project lives in production under its own domain.
- Every README has a decisions and tradeoffs section, not just run instructions.
- Observability is visible: cost per request, latency, prompt versioning.
- GitHub profile cleaned up: 3–4 pinned repos, zero secrets, descriptive commits.
- A personal site with a project list and a clear contact.
- At least one technical post describing a real engineering decision.
- Zero tutorial clones recognisable by name.
TL;DR
An AI engineer portfolio in 2026 is evidence, not a gallery. Show 3–4 projects across different axes: a RAG app with evals, an agent doing a real task, a fine-tune or eval harness, and at least one shipped product. “Wrappers” count if there is real engineering around the model call. READMEs should show decisions and tradeoffs. The strongest and rarest signal is evals and observability — numbers instead of vibes. Clean up your GitHub, put up a simple site, write one good technical post. Avoid tutorial clones, missing metrics and missing deployment.