Marek Kalnik
AI Generated Cover for

Summary

This presentation by Marek Kalnik, a manager at the Theodo group, delves into the profound impact of AI on software development and how organizations can effectively navigate this transformation. He emphasizes that AI usage is a learnable skill and advocates for a Lean-inspired approach to integrate AI into daily development workflows.


Comprehensive Summary of AI in Software Development Presentation

Introduction: The AI Revolution and Its Challenges

Marek Kalnik begins by highlighting the immediate and significant impact of AI on coding. He shares an anecdote where GitHub Copilot accelerated a specific coding task by a factor of seven, demonstrating AI’s potential for efficiency. He also references the “Matter study” which shows AI agents (like Claude Opus 4.5) can successfully complete long-duration tasks (exceeding a day) with a 50% success rate, indicating a vast potential for value creation in software.

However, this rapid advancement presents a significant challenge:

  • Developer Disorientation: Many developers feel lost, questioning their value and role as they find themselves “watching a progress bar” rather than actively coding.
  • The “Unknown Machine”: Unlike traditional deterministic software, AI is probabilistic. Kalnik illustrates this with an example of AI struggling with basic multiplication, highlighting that our intuition is not yet equipped to work with this new type of “software that is not software, human that is not human.”

Building Intuition and Skill for AI Usage: The Monozukuri Approach

Inspired by Toyota’s “Monozukuri” philosophy (deep respect and understanding of materials and tools), Kalnik proposes shifting from distrust to technical curiosity about AI. He outlines three pillars for developers and leaders to build intuition and skill:

  1. Hands-on Coding: Actively coding with AI tools (like Copilot) to understand their limits and successes. He notes a trend of senior tech leaders returning to coding due to AI removing many frictions.
  2. Gemba (Observation): Managers observing their teams directly as they work with AI. This reveals what developers achieve, where they struggle, and how they react, identifying areas for support.
  3. Experimentation and Research: Given AI’s probabilistic nature and the inability of its creators to fully explain its behavior, extensive experimentation (e.g., testing multiple prompts or models in parallel) is crucial to understand what works and what doesn’t, moving beyond one-off successes.

Key Takeaway: AI usage is a learnable skill that requires active engagement and a deep understanding of the tool.

Training and Standardization with Lean Principles

Traditional training methods often fall short. Kalnik advocates for on-the-job training and standardization, drawing heavily from Lean principles:

  • Job Instruction Training: Guiding developers on what to do and how to adapt their tasks with AI.
  • Standardization: Defining the “best way to do it to date” for AI interactions, often by observing high-performing individuals.
  • Waste Elimination (Muda): Applying the “seven wastes” framework to identify inefficiencies in AI-assisted development:
    • Transportation: Copying information between tools (e.g., tickets to AI, prompts from external files).
    • Overprocessing: Using excessive or unnecessary words in prompts, or over-reviewing AI-generated code.
    • Defects: AI failing to produce desired results, leading to rework or stopping generations.
    • Waiting: The most significant waste, where developers spend considerable time waiting for AI responses. Kalnik notes that 90% of observed video time can be waiting, indicating a lack of skill or knowledge on how to optimize this.

Key Takeaway: Identifying and eliminating these wastes helps uncover underlying knowledge gaps and areas for skill development, not just minor time savings.

Defining a “Good Conversation” with AI

The core output of a developer working with AI is no longer just code, but a “conversation” with the AI. Evaluating this conversation is crucial:

  • Conversation Analysis: Tools to export and analyze AI chat logs (e.g., Claude logs, GitHub Copilot chat history) are essential.
  • Interaction Modes (Antony’s Research):
    • Centaur Mode: AI executes a clear, well-defined task (e.g., code generation). The goal is a good result on the first try, with minimal interaction.
    • Cyborg Mode: Continuous, dynamic exchange with AI for brainstorming, debugging, or exploring ideas. This requires maximum interaction and iteration, leveraging AI’s “abundance of interferences” (ability to generate many ideas quickly).
  • Prompt Quality: Prompts should be minimal, unambiguous, and clearly convey the developer’s intention and desired interaction mode.
  • Context Provision: Developers must provide AI with necessary project context (e.g., how to run tests, project-specific commands) to prevent errors and token waste.
  • Tool Selection: Using the right AI model or agent for the specific task.

Project Readiness and Reusable Standards

The quality of AI output is heavily dependent on the project’s codebase and existing standards:

  • Codebase Quality: Poorly structured or inconsistent code (e.g., mixed casing in translation keys) leads to poor AI-generated output. A “bad base” yields “bad results.”
  • Reusable “Skills”: Creating shared, reusable prompts or “recipes” for common tasks (e.g., API generation, project deployment) across teams. This fosters consistency and reduces repetitive effort.
  • AI-Ready Projects: Projects need to meet specific criteria (around 15, according to Kalnik’s matrix) to be effectively leveraged by AI.
  • Prompt Review: A new challenge is reviewing the quality of shared prompts, requiring emerging frameworks (like Test L) to assess significant improvements.

Reinventing Development and Continuous Improvement

The integration of AI fundamentally changes the development process:

  • New Inputs and Outputs: Developers now manage prompts, workflows, and skills as inputs, and produce AI-compatible specifications and technical solution plans as outputs.
  • Continuous Learning: Given the rapid evolution of AI, constant technical watch and a high frequency of iteration on standards and analysis are critical. Standards are “living documents” that evolve incrementally based on real-world feedback.

Business Impacts and Scaling Challenges

The adoption of AI has yielded significant business benefits:

  • Accelerated Processes: One team achieved 4x faster legacy system migration using AI.
  • New Business Opportunities: AI opens doors for new services and client offerings.
  • Shift in Work Content: Some tasks, like API documentation, are disappearing as AI handles them automatically.
  • Individual Growth: Developers who initially struggle with AI show significant progress and satisfaction within months.

However, scaling these successes across a large organization (e.g., 600 people) is challenging:

  • Iterative and Small-Group Approach: The effective method is highly iterative, focusing on small teams and projects, making widespread deployment resource-intensive.
  • Engaging Teams: While many can be engaged through dedicated time and attention, a small percentage of “firmly opposed” individuals may not adapt, sometimes leading to departures.
  • Adapting to Speed: Standards and training must be constantly updated to keep pace with rapidly evolving AI models and capabilities. The bar for “good AI usage” must continuously be raised.

Key Lean Takeaways for AI Adoption

Kalnik concludes by reiterating the core Lean concepts that underpin their successful AI integration:

  • Monozukuri: Fostering technical curiosity and deep understanding of AI.
  • Problem Solving: Empowering teams to identify and resolve issues in their AI-assisted work.
  • Gemba: Managers engaging directly with teams on the ground to build trust and shared understanding.

Talk Infographic

🚀 Navigating AI Disruption with Lean in Software Development

🎯 AI’s Dual Impact: Potential & Challenges

✨ Potential

  • Accelerated Coding: 7x faster for simple, standardized tasks (e.g., color space converter).
  • Complex Projects: AI can handle tasks exceeding one day with high confidence (e.g., Claude Opus 4.5).
  • New Business Opportunities: E.g., 4x faster legacy system migration, opening new service areas.

😟 Challenges for Developers

  • Anxiety & Value Loss: Developers question their role (“What is my value?”), feeling reduced to “watching a progress bar.”
  • Feeling Lost: Facing a “machine we don’t understand.”
  • Intuition Gap: Our existing intuition is not prepared for AI’s probabilistic nature.

🧠 Understanding AI: A New Paradigm

🆚 AI vs. Traditional Software

  • Traditional Software: Deterministic, its workings are explainable (e.g., a compiler).
  • AI (LLMs): Probabilistic, its behaviors escape full analytical explanation. We created it, but don’t fully understand how it works.
    • Example: AI struggles with basic math (e.g., 97% accuracy for 5-digit x 1-digit multiplication).
  • Interaction: Not a human colleague (human-like but not human), not deterministic software.

💡 The Solution: Intuitive Understanding (Monozukuri)

  • Inspired by Toyota’s Monozukuri: Deep respect and understanding of the tool/material.
  • How to Achieve It:
    1. Personal Coding: Hands-on experimentation to understand AI’s limits, successes, and nuances.
    2. Gemba: Managers observe developers directly in their workflow to see real-time challenges, reactions, and opportunities for help.
    3. Research & Experimentation: Formulate hypotheses, test multiple prompts/models in parallel, and verify results. AI behavior is learned through observation, not just analysis.

🗑️ Lean Principles for AI Usage: Eliminating Waste

🔍 Identifying “Waste” (Muda) in AI Interactions

  • Framework: Apply the Seven Wastes of Lean to developer-AI workflows.
  • Examples of Waste Observed:
    • Transportation: Copying tickets, prompts between different tools.
    • Motion: Unnecessary mouse movements, opening files, searching for context.
    • Waiting: For AI generation (often underestimated, but significant in video analysis).
    • Overprocessing: Excessive words in prompts, re-reviewing already processed AI-generated code.
    • Defects: AI failures, rework, stopping AI mid-generation due to incorrect output.
  • Goal: Uncover deep knowledge gaps and skill deficiencies that lead to these wastes, rather than just optimizing seconds.

💬 The New “Product”: The AI Conversation

🔄 Developer’s Output Has Shifted

  • Old Product: Code, design, documentation.
  • New Product: The conversation with AI (which then produces the code).
  • Key Question: What defines a good conversation with AI?

👥 Interaction Modes (Antony’s Framework)

  1. Centaur Mode: Human thinks, AI executes.
    • Goal: Delegate tasks, achieve first-try success.
    • Conversation: Simple, direct, focused on clear delegation.
  2. Cyborg Mode: Continuous, dynamic exchange.
    • Goal: Brainstorming, inspiration, testing ideas, rapid iteration, generating an “abundance of ideas.”
    • Conversation: Many tests, iterations, dynamic feedback loop.

✅ Criteria for a “Good” AI Conversation

  • Clear Intention: Explicitly state the goal and desired interaction mode.
  • Minimal & Unambiguous Expression: Avoid superfluous words; precise language.
  • Context Provision: Give AI necessary project-specific information (e.g., how to run tests, project-specific functions, code history).
  • Correct Tool/Model Selection: Using the right AI for the specific task (e.g., model, agent, mode).

📚 Standardization & Reusable Skills

  • Prompt Standards: Document best practices (intention, typical errors, clear prompt, operating mode) as a training support.
  • Impact of Codebase Quality: Poor codebase quality (e.g., inconsistent naming, mixed languages) directly leads to poor AI output. Projects must be “AI-ready.”
  • Reusable Skills: Share common prompts/recipes across teams (e.g., “how to deploy a project”).
    • Challenge: Evaluating the quality of these shared “skills” (e.g., emerging Test L framework).

📈 Building an Adaptable Training Framework

❌ Why Traditional Training Fails

  • AI technology evolves too rapidly.
  • Generic training doesn’t address specific, dynamic, on-the-job problems.

🌱 The “Living” Training Approach

  • Continuous Adaptation: Training content evolves based on real-world problems and new models.
  • Job Instruction Training: Focus on how to adapt AI to specific job tasks (e.g., Kubernetes migration prompts).
  • On-the-Job Learning: Define work, create standards, observe, and continuously improve through feedback loops.
  • Research Mindset: Encourage experimentation (hypothesis, test, verify) over blind deployment.
  • External Knowledge: Integrate continuous technical watch to stay updated with rapid advancements.
  • High Frequency Iteration: Standards and analyses must evolve constantly, not just periodically (e.g., weekly analysis of developer videos).

🤝 Engaging Developers & Managing Resistance

  • Less Convinced: Requires one-on-one time, sincere attention, and demonstrating personal success stories.
  • Firmly Opposed: Very difficult to engage; may require broader organizational changes in recruitment, promotion, and performance evaluation to include AI competency.
  • Outcome: Fosters developer progress and growth, but acknowledges that some individuals may choose to leave.

📊 Business Impact & Scaling

  • Positive Impacts: Significant acceleration in specific areas (e.g., legacy migration), new business opportunities, individual developer growth and engagement.
  • Scaling Challenge: The iterative, small-group, project-by-project approach is resource-intensive and difficult to scale across large organizations (e.g., 600 people). It requires continuous effort and significant means.

Main Questions Answered

Main Questions Addressed in the Talk

  1. How can organizations effectively leverage AI’s potential in software development while addressing the challenges it introduces for developers?
    • The talk highlights AI’s significant potential for accelerating coding tasks and handling complex projects, but also acknowledges the resulting developer anxiety, loss of value perception, and the struggle to understand AI’s probabilistic nature.
  2. What are the fundamental differences between traditional software and AI, and how do these differences necessitate a new approach to understanding and interacting with AI tools?
    • The speaker emphasizes that AI is a “machine we don’t understand,” contrasting its probabilistic behavior with deterministic software. This requires a shift from analytical comprehension to intuitive understanding through experimentation and observation, as our existing intuition is not prepared for this new paradigm.
  3. How can Lean principles, such as identifying and eliminating “waste” and fostering a deep understanding (Monozukuri), be applied to optimize AI usage in development workflows?
    • The talk advocates for a Lean approach, inspired by Toyota’s Monozukuri, to deeply understand AI’s workings, advantages, and failure modes. It details using the “seven wastes” framework to analyze developer interactions with AI, identify inefficiencies (e.g., waiting, overprocessing, defects), and uncover underlying knowledge gaps.
  4. What defines a “good” conversation or interaction with AI, and how can these best practices be standardized, taught, and continuously improved within development teams?
    • The speaker posits that the developer’s new “product” is the conversation with AI. The talk explores how to define and analyze “good” conversations based on interaction modes (Centaur vs. Cyborg), clarity of intention, context provision, and tool selection. It then outlines the creation of reusable standards and “skills” to guide effective AI interaction.
  5. How can organizations build a scalable and adaptable training framework for AI usage that accounts for rapid technological evolution and varying levels of developer engagement?
    • The talk addresses the challenges of traditional training models for AI, proposing a “living” training approach that continuously adapts to new models and developer problems. It also touches on strategies for engaging developers, managing resistance, and integrating AI competency into recruitment and performance evaluation processes.

Raw Transcript

Euh, il se passe après le talk de Pierre. En fait, en l’écoutant, je me suis dit, en fait, il m’a déjà fait mon talk, il a répondu tout ce que je vais aborder dans mon talk. Ça va être compliqué, mais je’m going to try to bring maybe a little more concrete elements, related to computer science, related to IT, related to code. I don’t know if I’ll show code, if I show code too. so we’re going to do a deep dive on how to navigate the distribution, with Lynn. I’m going to start with a little exercise for you. what you see here, this is an export of a discussion with Claude. Claude, one of the generative coding tools, who has already used it?

Perfect, almost everyone. Very good. So, what we do is we have small scripts that allow us to export Claude’s logs to have them in a readable way for someone to work with, to analyze. And we’re going to do a little GK exercise. Don’t answer all at once, we’ll see later. But are you able to tell if it’s a good or bad discussion with Claude? Is that a good piece?

And we’ll see during my talk, I will try to give you some ideas, some thoughts, some ways of thinking that will help you to answer this discussion. why is it me who’s talking to you about it? my name is Marek Kalnik, I am a manager at one of the entities of the Theodo group, an IT entity. But it’s been especially three years that I’ve been working on all the subjects of future of software, how to integrate AI into the development process, because that’s Theodor’s job, developing applications, that’s my company’s job. And I told myself a few years ago, In fact, our profession is changing, it’s changing profoundly and as a leader of my team, I must find the answer to how to support them at that stage. And well, I don’t know about you, but one of the first things I discovered in this field was GitHub Copilot. and we did a few tests with my colleagues. So there you see someone coding, in the process of coding. I’m not going to show you the whole video. someone is coding a color space converter between Oklab, RGB, etc. Not bad for degrading, and we’ll see. We see in the video that Copilot is basically completing all the functions. Certainly, it’s super simple, it’s very standardized, it’s mathematical formulas that are already established, studied, etc. There’s nothing to invent, there’s nothing to create, you just have to restore to create the right algorithms. so it’s not a very difficult task for the AI. The data is there. But we see, you see a bit of the transition between gray and color, it’s the moment when the developer presses tab and in fact it completes several lines at once. I measured in pure writing speed. Just to write that, without looking for the formula, without thinking about the structure, etc., we divided the time to do this task, this small task, by seven. And when I saw that, I told myself, wow.

In fact, it’s really, really, really going to move. I don’t know yet how, I don’t know on what task, etc., but things are happening, things are happening that make our job changing. So somewhere there’s a value to be found, and in fact, this test we did three years ago. However, if you followed the latest news, you’ve certainly followed it. This is an article that I like, it’s the Matter study on long running tasks. So, can we entrust long-duration tasks to agents? and they evaluate uh the maximum duration of the task that can be. Okay, so there are human tasks over certain durations, of 14 seconds, of 3 hours, of a few days, that they ask the AI to perform. And here we see, on the success ratio, greater than 50%. So 50% of tasks, AI managed to succeed. If we look for 40% success, the results are much less interesting. We pass over a few hours, we don’t pass over several days. What I put in your mouth is, approximately what is superior to a day of work. We can already trust with a fairly high confidence level, tasks that exceed one day, so that the AI can succeed them autonomously. And this is from Claude Opus 4.5 which was published end of November last year.

So there is a potential for AI to bring a lot, a lot, a lot of value in software creation. So as a company, as someone who has IT systems, it’s super interesting. But something else is happening too.

It’s a video that circulated a lot on Reddit.

I thought I had preloaded it.

It happens like that.

We see people, we see developers who are in front of their computers. I’m launching something. It generates, it generates, it generates, the rest of the video. The more you watch it, the more disturbing it is because you see that the guy is scrolling through TikTok, I imagine, with images of war in Iran or Ukraine. it’s extremely disturbing as an image. Our teams are lost.

The people who work with AI constantly ask themselves the question, what is my value? What do I bring? What should I do? Is my job today to watch a progress bar of an agent and click OK at the end? And maybe again 10 times OK during the, the course of the generation, or is there another value to bring? And that’s very, very, very disturbing because there are certain people who manage to find their way, there are certain people who change their workflow, who advance several jobs in parallel, who delve into this extremely profound subject. But most of them, I trained more than 400 people, and I saw them working, it was that: I launch something, I wait, and I don’t know what to do. And I don’t know where my place is as a developer.

And well, it’s not new, we’ve already seen that with the famous XSLT compilation. But this time it’s a bit different, because this time, we’re facing a machine we don’t understand. Before, a compiler, we know how it works. Today, with AI, we don’t know how it works. So our teams are lost in this AI world. And there is a question, I think everyone asks themselves this question, how do we react as a company? But also as a leader, how should I react in this situation?

There’s a story that inspires me a lot. I often come back with my thoughts at that moment. as a father, I had the pleasure of visiting the Toyota Museum in Nagoya. We also saw some factories, when we did a Lean trip in 2017. it’s fun, there’s Lean tourism in Japan, there are travel organizations specifically for that. and Regis Medina, who is a coach I respect enormously, stopped me at a place in this museum, it was a very small room. Not really indicative, we were next to huge machines, beautiful cars, of all, all the development of professions, etc. There was a room where there wasn’t much, there were a few graphs. And it was the room that was dedicated to the research that Toyota did in the 30s around steel.

So steel became, became the raw material to build cars. And in fact, Toyota’s engineers asked themselves the question, how does steel work? What is this material? How does it behave? So here we see, these are graphs that measure, I believe, the hardness and flexibility of steel as a function of composition and temperature. Who really had this approach of a deep understanding of the material that I’m working with. There’s a term we’ve already covered today, Monozukuri, which I link a lot to this because Monozukuri means ’to make things’ in its literal translation. But in Lean culture, it means a deep respect for the material, for the person, for the machine we use, to understand how to perform one’s work well. So I bring the question of AI, not with distrust, not by maybe laughing a little, seeing once again how ChatGPT was wrong on a simple task, but with a desire to understand what it is. Okay, me as a developer, what is this strange machine that I can use today in my work? How does it work? What are its advantages? What are its drawbacks? What are its modes of operation? What are its modes of failure? And how can I use all this, well, to succeed? And this Monozukuri is all the more important today because we are facing a tool that we do not know because for most people, it is a completely bizarre thing.

So I told myself, I must learn what AI is, me personally as a manager, and if I want my teams to learn.

So, there are those who say it’s simple, you just have to follow Andrej Karpathy, a great researcher, a very good channel on YouTube, in less than 30 you’ll have your own LM. Small problem in with the LLM, is that today, for the first time in history, we are faced with something that we created, that we cannot explain by the way it was created. The behaviors of AIs escape all analysis, all analysis that we are able to do. So we’re more on an observation of how it behaves. We have to make an intuitive opinion, more an intuitive opinion of use than just a comprehension, okay, there are 10 layers and in fact, how does Kindle work? And finally, it’s just a mathematical matrix. That doesn’t help. Especially since we had the misfortune perhaps to make our first encounter with the LLM through this type of interface. So ChatGPT did great, we got used to an AI that talks like a person, that tries to present itself as a person, which often has a very, very good level of expression. So, we already have a certain relationship, rather like with a colleague. Moreover, the term artificial intelligence always refers us in a collective imaginary of lots of intelligent robots, of relationships, there are lots of films about that. So we say to ourselves, hey, it’s a human. There are a lot of people who talk to AI as if it were a colleague. and at the same time it disappoints us in that regard, we ask it to tell a joke, the joke is completely useless. We give it an impossible task to do, but it tries, it tries, it tries, it starts to lie. In the end, it’s perhaps a human behavior that it presents to us. but so this machine that presents itself as a person, but is not a person. And at the same time, we know it’s a machine. So we expect a software that does its job and does its job well. Except that we are used to deterministic software. AI is probabilistic software. I love this graph, it’s a graph that measures the rate of correct answers when we ask AI to multiply two numbers. You see the two axes, the length of the number we multiply. And already, if we multiply a number a five-digit number by a one-digit number, we only have 97% correct answers. A calculator would do better. The bigger it gets, the more it fails, it’s astonishing, and we often entrust tasks to AI without knowing that it’s a machine that is not a machine. So it’s a software that is not a software, it’s a human that is not a human, it’s bizarre. Our intuition is not ready for this work, so we have to refer to an intuition.

How do we refer to an intuition, how do we work with it first?

We can’t escape that. here you see my technical review of Copilot in 2021, I was super happy, I launched Copilot. It was null. It didn’t work, the syntax wasn’t good. So compilation was slow. Then it started to get better. And gradually, through personal projects, I begin to understand the limits, the successes, I re-test, I re-test, etc. And I also notice today, by discussing with friends, with managers, with tech leaders, that there have never been so many senior tech people coding. We see people coming back, putting their hands in the code, putting their hands in the code. to code and uh, two days ago, I had a discussion with the speaker, and I was told, in fact, it’s been 10 years since I coded. But I code. It’s great. I discover. I can do it now. because there are also a lot of frictions that AI removes. I’m talking about syntax problems, not knowing the language, it’s me who’s new around here, my team has changed for the third time, it’s not serious, but now, I can apply my skills to understand that. And it also allows us to understand AI itself well.

We’re not just there for the coder, we’re there to use the tool and understand how it works.

The second tool to refer to an intuitive understanding is the Gemba. So the Gemba is a Lean practice that consists, as a manager, of standing next to one’s team and observing what the person is doing. The gesture is a bit more complex than that, but already we can start with that. And we’ll see, I see tech leaders who refuse to do it. No, no, I don’t want to do it, I don’t want to disturb the equilibrium, I don’t understand, I don’t understand, etc. It’s a gesture that is not always simple. but it allows to see several things. It allows to see in fact what the developers manage to do, what they don’t manage to do, how they react. Are they happy, are they not happy? Are there problems where I can help them? But you are. You see them next to him, who was coding a library. Look how Copilot helps, does it help, does it not help? what is difficult? One last thing, and we shouldn’t ignore it, is research.

What’s a bit strange about AI, precisely, is this inability today for these creators to explain the behavior through analysis. So what we do is we do a lot of experimentation. Okay, if I test a prompt, I’m going to test 10 prompts in parallel, I’m going to see which ones work. If I want to generate a component, I’m going to generate with three models in parallel, I’m going to see how it works. That’s how I manage to understand what works and what doesn’t work. Because if I managed to get a result only once, potentially it’s not replicable. So we really have to be in a research approach. I have a hypothesis, I test, I verify the result, rather than in a, in a deployment approach. For all this, having done all this with my teams for a certain period of time, I told myself, in fact, the use of AI is a skill, it’s a skill that can be learned. There’s one last thing, and we shouldn’t ignore it, it’s research. What’s a bit strange about AI, specifically, is this inability today for this creator to explain behaviors through analysis. So what we do is we do a lot of experimentation. Okay, if I test a prompt, I’m going to test 10 prompts in parallel, I’m going to see which ones work. If I, if I want to generate a component, I’m going to generate it with three models in parallel, I’m going to see how it works. That’s how I manage to understand what works and what doesn’t. Because if I managed to get a result only once, potentially, it’s not replicable. So we really need to be in a research approach, I have a hypothesis, I test, I verify the results. rather than in a uh in a publication approach. For all that, having done all this with my teams for a certain period, I told myself, in fact, using AI is a skill, it’s a skill that can be learned. So I’m going to try to train them.

Well, it’s pretty great because we learn from people, we teach people to do things, then they’re competent, they succeed, life is good.

So, you just need to hire an AI usage consultant, who will make us a two-day planet, he will give everything to our teams, and there, the world is beautiful.

I don’t have this image with Geminai. we see that he became aware, it’s a bit off. obviously, it doesn’t work. So, it doesn’t work, why? Because we’re missing a few elements in this graph. The first is job instruction training. The assistants formulate, we must train them in their job. What do you do, what do you keep doing, what are your tasks? how do I adapt this training, all these concepts, well, what’s in the prompt? At best, we can have a plenary session. Then, your prompt for your task, for what you want to do today, you’re doing a Kubernetes cluster migration, how you do your prompt, that’s not the same question. Uh. Second thing, if we want to do a training on the, so on the job. We need to know how to do it. We need to have a method. We need to define that work.

And how do we define that work? We try to start with a standard, now a standard is quite a heavy word, quite meaningful. We usually define it as the best way to do it to date.

It’s the best way to do it to date. I think you’ve defined that, in fact, you go see a person in your team who’s in shape and they have the standard. We observe, we record. We ask the person, explain to us, we have a standard, it’s not very complicated, you shouldn’t make it complicated. And then, we put a whole loop that is composed of analysis and elimination of the waste I was talking about. But how did I do that? The first thing, already, we won’t know your teams, and the first thing they’re going to tell you is, in fact, the AI didn’t manage to do what I wanted. I ask for something, it doesn’t work, I’m disappointed.

Uh, I had a client who told me once, in fact, we deployed Copilot everywhere, 6 months later, no one uses it. That’s normal. People are disappointed. A few times in a row, then he stops using it. And so, the first thing we look at, the first thing I asked my teams, is to send me screenshots when they are disappointed by the AI. So here, we arrive at this kind of situation, so here you see a floating request, someone was expecting a completion. I don’t know if it’s visible on the screen, but in fact, there’s an override part, and in the override part, you see below, there, in fact, he wanted to have the provider of his list of stations with retrieving the list of stations. Disappointment, there’s the ‘guess’ paste.

So, the question we’re going to ask ourselves is, in fact, how could it guess that? I don’t know if that’s the right word, by the way.

Uh, well, there’s a causal chain that caused the AI not to guess that. It’s that it’s a function that belongs to the project, it’s not something standard, so already, it’s not in the training data. Then, if it’s not in the training data, it means you need to inject the project context into the AI. Copilot, at the time we made the piece, did not automatically retrieve the context beyond the file, outside the file that was open. So if you, developer, expect it to complete something when you haven’t opened the file, well, think again, it’s not going to work. Today, the tools work a little differently. But we see, we’re starting to see that there’s a link between the performance of what I obtain, the knowledge of the tool, and the know-how of people who know how to call the tool at the right time and who know how to use it in the right way. So indeed, we have a training in the second mode of training, it’s the construction of freeing the context. You want it to complete well, well you have to type a bit beforehand. It’s not as magical as we would have hoped. But it also works very well. Then, when we start looking at screenshots, we realize that in fact, there’s a whole workflow missing, development is not a static moment, it’s dynamic, we code, we go back, we modify things, we wait, and so on.

So I replace screenshot analysis with video analysis with my team. so I asked everyone while doing a training in Bendeberg, I would ask everyone in the group, small group, 6 to 10 people. To send me a video, 5 minutes. Not very long. Before the session, I analyzed the video to understand what was happening.

In fact, here, we run into the problem you had at the beginning of the session, in fact, I have a video. I look at what’s happening. What am I going to do with it? How do I know where to understand the video? Lean helps us a lot. It helps us a lot, why? Because there’s a framework that is extremely simple, extremely powerful, it’s the seven wastes. These seven wastes, movement. It’s transporting things from one place to another. Everything that’s in inventory, so, I do things, and we wait, nothing happens. I give the example of projects that have a hundred tickets in the backlog. It wanders through the backlogs for months.

Any unnecessary movement. Need to move the mouse, to move, to open files and so on. waiting. Well, it explains itself. Overproduction, it’s quite interesting, it’s not very present in AI usage, but it’s very present in development. Producing things like that, we don’t need. Especially end-users, we’re not a text, we imagined the end-user will never, ever take that. Overprocessing, doing things much more complicated than necessary. And defects, obviously, everything I have to correct, rework, improve, whether it’s bugs or something else. So these wastes, you, you see that in fact, some of them are sometimes very minimal. earlier, Pierre was talking about, we can improve a minute in a two-day process, there it’s not worth it, but if we produce something, it’s worth it. but it’s a fairly interesting framework because it’s a framework that commands, that allows us to spot things.

In the images, in the videos of people using AI, we see a lot of transportation. We see people copying tickets to put them in an AI. We copy the prompts that are somewhere. I started with people who had files in a note file or in fact, they had prompts they had just used and so on. everything that’s overprocessing. I put a lot of words, I type a lot of words, prompt thing is writing. and in fact words that are not often useful, which sometimes influence the results in a negative way. Everything related to the review of AI-generated code is a big topic. But in fact, I re-process something that was already processed, is all that necessary? And obviously, the defect. I do it, it doesn’t work, I do it, it doesn’t work, the AI starts doing something, I stop it, it doesn’t work, and so on. That’s quite easy to spot in the video. Because when I watch a video, in fact, I can very quickly understand, okay, here, you corrected something, here, you had to react, how it happens, and so on.

And obviously, in a video, we see a lot of waiting. Because if I, as a tough person, have to watch an hour of video, on Thursday afternoon to prepare the training, and 90% of these videos, it’s waiting for the AI to generate the results.

I tell myself, in fact, it’s not possible, my teams, they say that, in fact, they say that because when you’re working, you don’t realize it. You don’t realize it, whereas by watching the video, we all say, yeah. Okay, there, it can be a lot.

Uh, we’re not the only ones to notice it. So this is another study by a director who compared the performance of developers with AI and without AI. Specifically, he has it right on the screen, and immediately there are two posts that increase. There are new positions, there is a position of achievement. Whether it’s idle, overhead, in fact, I don’t really know what to do, and so, I, nothing much happens. Or else, I’m clearly waiting for the AI. we have waiting times that are increasing, and I think that’s underestimated. Compared to what I see, it’s underestimated.

Uh. So. The Loom video, great platform, we can tag everything that happens in a video with emoticons and so on. We see that even with 8 minutes of video, we already have six comments to make for a session where there are six videos, there’s content to discuss, there are things to improve, and we start to really find, just with this simple Buddha framework, okay, things are happening that are not really great in the way we code.

And I come back to this point, we’re perhaps talking about something that in this video took me 3 seconds, 10 seconds, a minute, that’s okay, a minute of a developer’s wasted time is really not significant.

Except that this minute can uh stem from two things. Either, it’s not possible to do better. But honestly, that’s rare. Or, I don’t know how to do better. And if I don’t know how to do better, my lack of skills, my lack of knowledge will prevent me from doing better in all other instances where I need that skill and that expertise. So that’s why we sometimes take subjects that are really tiny. Because what interests me. It’s not optimization, we’re not trying to find out how to save a second on a developer’s daily work. What interests me is all the deep knowledge hidden behind this small waste that someone made during the day. Do you know how it works? Do you know how to react? You might not want to, that’s okay. We’re not obliged to react, but just, do you have the necessary skill?

So once we started discovering these steps. We told ourselves, well, now, we have a good knowledge base of what doesn’t work in AI usage. We need to build on top of it, we need to build the best method and then standardize it. that’s great. But the first question we need to ask is, what are we producing?

So before, as a developer, we produce a lot of things, there’s design, there are schemas, there’s documentation and so on, but there’s a moment, for most teams, the bulk of their work, we produce code. we write code, we type, and in the end, this code gives a functionality for the user, it goes through a compiler, it goes through a platform, we have a function.

Today, what a developer produces, he works with AI, he produces a conversation with AI.

And it is this conversation that produces the code.

We can correct them, we can fix them, we can have modes that are a little less automated with everything that is autocompletion and so on, I know some people still prefer it. But the piece has changed. And so, the question of what is a good piece, it will be as important for the code, but it will be especially important for my conversation with the AI.

Is this conversation a good conversation? So the first thing I ask myself, the first question I ask myself, which I try to standardize in my teams, is that.

As I told you, Claude, there’s a super folder hidden in your computer, called Project. which contains all your discussions that the person had with Claude on their machine. other solutions seem to have their solution to that, there’s the pad log in VS Code on GitHub, Open Code also has solutions, so a priori, we are able to extract the data. And thus to retrieve the discussion that someone had with the AI. It’s easier to do it retrospectively than during the discussion because during the discussion, we have somewhat different modes of thinking, reflecting on how to react in the task. It’s not ‘am I doing it right?’ So once we’ve done that, once we’ve seen that. We created a small script at home that allows exporting the prompts and having them in the format you saw.

So anyone can read that. The whole discussion, I even coded a small interface on top of it that allows you to clearly see the sequence and so on to coach people well. And we will especially see a whole part which is the summary of the discussion, the, the number of fields and so on, and then we can say how the different calls are chained.

So the developer produced this piece. And we can analyze it together.

And the first part of the analysis we’re going to use is already to understand what kind of discussion mode we have with the AI. And there, my bed, who is one of the most interesting researchers in prompting, who is at the World Business School who guides the research lab on Generative AI, has written a lot of things, especially on generative AI in business. He distinguished two modes of working with AI, the centaur mode. So, there’s a head that thinks, there’s a body that executes. I think and then I have a machine that does the job. The discussion should be very simple. And there’s a Cyborg mode. I am half-man, half-machine, there is a continuous exchange that allows to obtain results that are richer than what each one would be capable of doing separately.

So that. I’m going to look for it here, what mode did the person apply, first. So here, we have 1 hour 30, we have five messages from the user, we have 11 assistant responses, we are neither in the first nor in the second. I see, okay, there’s perhaps a problem with the application of the co-creativity mode of AI usage between what’s happening and and the FS we told you. Because there are two big tasks, there’s a generation task and a production task, and a brainstorming task, debugging, whatever, and there are two definitions of what’s good that are separate. In the first type, I try to get a good result on the first try. I know how to do the task, I know how to validate it. I should be able to delegate the task entirely to the AI. I can have an hour of generation with the AI if I get a request from the person. If there’s more, it means there was a need to correct what was happening. So we can see why correction is necessary. And on the other hand, if I’m in Cyborg mode, I want to have a maximum of exchange between the person and the AI. I want it to be very dynamic because the person is trying to get inspired, to become aware, to try to test their ideas. So it requires a lot of testing, a lot of iterations. And especially something we also call abundance of interferences, it’s another interesting concept from Antermony. which says in fact, you have something that is capable of giving you a hundred ideas with a single prompt. If I ask a colleague to give me a hundred ideas, he’ll look at me like I’m crazy, he’ll give me three. AI can give you a hundred ideas.

So the two modes are different, and therefore, the first part of what makes a good prompt, I already know what the intention is and what the interaction mode is. Then, we have a lot of requests that are actually not very clear. Which in fact, this happens. because the person is trying to inspire herself, trying to become aware, trying to try out her ideas. So you need a lot of tests, you need a lot of iterations. And especially something that we call the abundance of intelligence. That’s a concept, a very important concept in the world of IA, which says that in fact, you have something that is capable of giving you a hundred ideas with a single prompt. If I ask a colleague to give me a hundred ideas, he’ll look at me weirdly, he’ll give me three. IA, it can give you a hundred ideas. So the two modes are different, and so the first part of what’s a good prompt is, I already know what the intention is and what the interaction mode is. Then, we have a lot of requests that are actually not very clear, like, well, in fact, this happens. Okay. or a lot of requests where we have words that are superfluous, useless, etc. I love people who use IA. It’s a prompting technique, by the way, there are plenty of papers on it. It works randomly. Um, so, do I manage to express this intention in a minimal and unambiguous way? Do I transmit this information well?

Then, What I observe in the discussion is that there are many tool usages. On 130, there are 36 tools that have been applied. One that resulted in an error, and in particular I see, so this is my analysis interface that I use to analyze the piece. You see that. So this is the AI that tries to understand how to launch the tests on the project. It regularly makes mistakes, there are 5 tool usages, it has LLM calls with tens of thousands of context tokens.

It’s not good. Why? Because we forgot to specify how to launch the tests on that project. What are the right commands for my task? Maybe 30 seconds of developer work that would have allowed all these back and forth to be done. So there’s a question of how I give the AI the necessary context to succeed in the task, without over-searching, without making errors.

And if it comes to that, I’m not going to dwell on it.

Oh, you’ve never seen that, no one knows that. and it’s a tool question, in fact. Sometimes I use the wrong model, sometimes I use the wrong agent, sometimes I make a mistake between the ASP mode, plan, etc. which means that in the end, even with all these good techniques, the AI doesn’t succeed because I just used the wrong tool. So it’s the right flow and the right tools. So that’s the prompt standard we wrote. What you see here is that by giving this to someone, that person won’t be able to achieve their goal. There’s nothing here that explains to me, in fact, what minimum means, what the intention is, etc. A standard is not there to replace training, it’s there so that the person can go back and see what the key points are. Oh yes, I just checked them, everything is good, I have just, I can really launch the prompt now. So this is a support, it’s one of the elements of training. And that’s the prompt standard that we’re creating today.

Then, the AI depends on the prompt.

It produces the prompt, but it also depends enormously on the project. All the topics of poka-yoke, etc., that were mentioned earlier, we’ll find them here. I have two examples that I love. This one is great, so it’s a person who asks the AI to give her a translation key. And if you look there, in fact, the translation key, she has, like, Nicolette. it’s snake case with camel case mixed, there’s a mix of French, English, well, in fact, it’s extremely bizarre how this translation works.

Look, but with the AI, she did this for me, it’s rubbish, your thing. And if she looked at the translation file, what does she have? She has a mix of French and English, she has a mix of snake case and camel case, she has exactly the same pattern that we saw in the generation. If I give this file to a developer, let’s say a junior who arrives in the project, what is the project convention? He will be lost.

So, the quality of the code, the quality of the codebase, the quality of everything we do, it will have a huge impact on what the AI generates. And if there’s a bad base, the results will be bad. And a second example, which is this type of prompt, is already quite effective. But we see that there are a lot of repetitions. The person has certainly made this prompt a dozen times, and it’s potentially copied and pasted somewhere. I generate an API, or hundreds of APIs, in a project potentially. So why not do that? Why not make a file where I tell it, okay, this is how I made my API, here’s the API? I don’t need to explain to the AI every time.

There’s a very good talk by N.O.R.E.S. from Factorial AI that he called Droid. which explains how small frictions for a person in a project can derail agents. and they, they made a great agent for environment evaluation. I also have one of my own that is a bit less powerful. which actually tells us where we are in terms of testing, in terms of quality, etc. and I had fun making a small project evaluation matrix, as you can see, it’s very complex. And in fact, there are about fifteen criteria that must be met for a project to be AI-ready. So this is the last condition, we have standards for how to do things. We have standards for what criteria my project must meet to work well for an AI.

Uh, and there’s one last thing. which is also about everything we see in the video, which is this moment when, in fact, I start repeating things. And you know, my project, it’s often on Railway, so in fact, go ahead and deploy, do the, do the, use the CLI, etc. These are things that are repeated and repeated and repeated and repeated to the AI constantly. And when I look at several videos in a row, I see not only that one person repeats this constantly, but also that several teams repeat the same thing constantly. Because in the end, the coding standards will be quite close. in a tech, in a team, with respect to a certain tool, and all of that is shared. I don’t need to reinvent the wheel every time. And especially if I do that every time myself personally, I’m going to have this urge, this incentive to go a little faster, my prompt will be more qualitative, I’ll spend less time on it. If I have a prompt that I share with all my colleagues, I want to refine it a bit.

So, one of the things we’ve done is reusable standards.

Today in the form of skills, it used to be something else. in the form of certain scripts, so here you see in particular Export Code History that I was talking about. But there’s a whole part of skills which is the most common file, at the enterprise level, we share the recipes that the AI can apply as if we were telling a colleague, go ahead, how do you deploy a project?

And when we do that, there’s a pretty funny thing that happens, which is that people start making pull requests, they start adding their standards, etc.

And we’re going to ask ourselves, how do we review the thing? I want to bring the same quality, I’ll probably go back to my notebook. I want to do my job well, as a person who accepts this pull request. Either I accept, or I don’t accept. And in fact, if I accept, I accept on what basis? How do I know if the prompt is good? And there we really reach the frontier of what is being done today. The skill quality evaluation frameworks, for now there’s one that’s emerging, Test L, which is starting to have something a bit solid. In fact, it didn’t exist. So we’re at the frontier of what’s possible today, reviewing a prompt to know if there’s a significant improvement compared to the previous version. Very good.

All that, little by little, we put all these bricks in place, and in fact, we start reinventing. the job, we start reinventing development.

There are three bricks that appear, which are all the inputs I told you about. The prompt, the workflow, the skill, etc. I inject them, which define how to execute tasks, what my expectations are regarding quality.

There are new pieces on which we are going to concentrate, not only just the prompt, but in fact, what is a specification? What is a specification of a functionality, what is a specification of a functionality compatible with AI? if I generate a technical solution plan and I want to review it with the AI because, in fact, it’s a functionality that is quite complex, that will take perhaps several hours to code, it’s not a good plan for the AI to execute it without error. So we start asking ourselves questions that go further and further, always with this approach, this small iteration, okay, I’m improving one thing, I’m improving one thing, I’m improving one thing.

So here you see, this is a screenshot of our database, a standard prompt, the standard always has the same form, by the way. it’s the intention, the typical errors, the clear prompt, and there’s an operating mode below.

Uh, there’s a visual part, so that’s the visual part of the specification standard, which explains what a good specification is. There’s nothing extremely innovative in all of that.

It’s good IT pushed to the extreme. That’s what works with AI. I’m doing everything I’ve wanted to do for years. Now I have a good reason to do it. and before finishing a talk, in my talk, I have to mention two things. All this, today, cannot work without a good dose of external knowledge. Because it’s super interesting to do your own research, to do your own iterations, but we’re in a domain that moves enormously, moves very fast. The tools change, but also there are people who learn super interesting things. So we also inject into this method a good technical watch. There are a whole bunch of processes, methods, habits for that.

And it’s not a method that is completely self-centered, it cannot be completely self-centered because we will, we will slow ourselves down. And a second element that we don’t see in this whole discussion, and I think that if it’s simply related to the previous point, it’s that what’s important is the quantity, the frequency of iteration. The iterations on the standard, there are dozens that have been made. It’s not something that we validate by a committee every six months, so we evolve the standard in an incremental way. The pieces that are analyzed, when I did it and again, 10 participants every week for 3 years.

Well, that’s more than a hundred pieces that were analyzed very quickly. And, from that, we can draw all these standards. So it’s not a question of, of the gesture, I think that everyone is capable of doing it. It’s more a question of bringing the routine and the necessary frequency so that it really builds something usable.

So, how did it work, how did it work for us? So, what worked for us is that there are really interesting business impacts. we have, for example, today, Theodore, an integration team that uses AI to accelerate the migration of legacy systems. They do that four times faster than what we were able to do manually. So there’s a whole new type of business that’s opening up for us, and new opportunities for our clients. On the build, we are less, the impacts are less interesting because, there’s a lot of dependence on doing a good spec. And doing a good spec today is still a very manual and iterative job. And ultimately, we realized that, that the AI, uh,

when we need, when we need to do a lot more iterations and a lot more research on this part, how to really build the product with it. I was telling you earlier if it’s completely like that. There are tasks that have completely disappeared, so the content of my teams’ work today, when I observe it, documentation of API, these kinds of things, we forget it. Now it’s a, it’s a, it’s a, it’s not done. And, I see after a few sessions, in fact, personally, how people are on top of it. Someone who uses AI very badly, three months later saying, look, what I’ve done, it’s great, I’m super happy, etc. So there’s this whole approach there, very, very personal.

It makes people progress.

What’s a lot harder is to scale that. Because all of a sudden, we have a process that is really very, on very small groups, very small teams, team by team, project by project, putting in place all these mechanics. So in fact, deploying this in an organization of 600 people, it requires significant means. It’s not something where I’m going to be able to do a training, I’m going to tell everyone, listen, it’s good, everyone will do it. No, it’s very, very, very iterative.

Some Lean concepts to remind you, Monozukuri, which allows to convert this anxiety into a technical curiosity, listening, how does this thing work? trying to understand. The problem solving that replaces top-down approaches by engaging teams in understanding what doesn’t work in their work, in their piece, in their daily life. And the Gemba, which means that as a manager, we sit next to the teams and we have a difficult technical discussion about a detail that allows to create a good confidence. Okay, but in fact, this person, my manager, he knows.

Uh, no, not that. He had to listen.

I’m not sure. Do you have any questions?

Any questions?

Thank you very much for your presentation. Do you use code agents like Copilot, for example?

No, no, no, the question is not that.

I’m sorry, I forgot. The question is rather, uh, you’re talking about 700 people, and we see a maturity matrix that you presented earlier. I imagine that there must be people who are hyper-promoters, those who use it well. I imagine there are people who follow, and my question for you is, how do you manage, how do you work with people who don’t follow? who are not yet on board, or who have not understood what you presented. Thank you.

So, in fact, we have people who are hyper-promoters and, in fact, those, there are working groups, etc. The work with people who are less convinced, it’s really one-on-one work. That’s why, we find ourselves in the project, we do a training for a specific project.

And in fact, there are people who have never taken the time, who have never really been interested, who we have given the tool to, the usage is a bit less. And in fact, ultimately, if we manage to dedicate this time with them for training, if they see that we are sincere, that we have knowledge, etc., these people, we manage to engage them because, in fact,

they start to understand what’s at stake, they start to understand that it’s important for their personal success, for the job. And, in fact, that’s not a population that’s hard to engage, it’s a population that needs time and attention. The population that is difficult to engage is a training of people who are, well, firmly opposed. That is, there are people in our teams for, for ideological reasons, in the broadest sense. and in fact, those are people I haven’t been able to engage, I don’t know how to engage them. and, today, I forbid the participation of these types of people in my training. Because they are capable of completely messing up a training with remarks that disturb everyone, I think. So I can debate for an hour about the usefulness of something, but in fact we have things to explain. So, training is not a multi-change tool.

It can come as a support for change, but no.

There is a change tool, and we have other ways of managing change in-house already. we changed the recruitment process to evaluate the aptitude and competence of AI during the recruitment process. to then also make it a bit robust. quite explicitly, we changed our management framework, promotion, evaluation of people to include the criteria of AI in there. And we talk about competence, I think there’s no problem, there are reserves, very well. Already, there are training courses that are available, waiting for the update of all the skills. and, it’s making progress. I don’t hide that there have been departures because of that too, because frankly there are people who get angry.

One last question?

Uh, yes, thank you for the presentation. I have a question about the speed of evolution. In fact, is the time to train us not already obsolete? For example, you show the quality prompts, but at the release of the next model in two or three months, will these standards still be relevant or not? For example, defining, asking for a plan, etc. Now, I think it’s almost two buttons in the latest models. It’s something that I have the impression that we no longer need to do, with a phrase it arrives right away depending on the model. The large model manages to generate things of very good quality. So my question is, does training really make sense over time?

Yes, super question.

First answer, we have to answer quickly. The first is that as it’s a living training that is based on what’s happening on the ground, we constantly look at the training. You used AI, you used the latest Claude, it’s Claude 4.5, but it always generated super well, etc. it’s a training that adapts. Because in fact, we constantly react to the problems that people encounter. If there are problems they don’t encounter, in fact, we stop talking about them. I had a great difficulty at the end of November 2025 because Claude 4.5 came out and it was so good that everyone immediately became a very good AI user. Just, there you go, I’m doing whatever with the prompt, it works, and everything, it’s still good. and in fact, I told myself I had to raise the bar. So I raised the bar in terms of expectations, and what I gave as a goal to my teams is that today you have to be able to work in a way, with your prompt. If I’m not able to launch several agents in parallel and succeed in an effective way, I’m not good at AI. So we raise the bar and that way we always have things to learn.

Thank you.