As AI becomes more deeply integrated into our products, conversations about it are often clouded with jargon: transformers, RAG, fine-tuning, and so on. As product managers, we’re expected to discuss AI features with engineers, explain them to stakeholders, and envision new use cases for users. But this jargon can easily get in the way. This post breaks these concepts down in simple terms.
#1 Generative AI
Generative AI is a branch of AI focused on creating new content rather than just analyzing existing data. These models can produce text, images, videos, audio, code, etc. Unlike traditional AI systems that classify inputs or make predictions, generative models synthesize original outputs by learning patterns from large datasets. This capability powers products such as conversational agents, auto-drafting assistants, and AI-driven design tools. For example, instead of an intern who only summarizes reports, generative AI is like one who can also draft emails, design slides, and propose product copy based on context.
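To make this concrete, here is a minimal sketch of generating text with an off-the-shelf open model via the Hugging Face transformers library. The model choice (`gpt2`) and the prompt are just illustrative placeholders, not recommendations.

```python
# A minimal text-generation sketch using Hugging Face's transformers library.
# The model and prompt are illustrative placeholders.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

result = generator(
    "Draft a short welcome email for a new customer:",
    max_new_tokens=60,       # cap the length of the generated continuation
    num_return_sequences=1,  # ask for a single draft
)
print(result[0]["generated_text"])
```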
#2 Generative Pre-trained Transformer (GPT)
GPT stands for Generative Pre-trained Transformer, one of the most widely recognized generative AI models. It powers ChatGPT and many other applications.
- Generative: The model can produce new content rather than just analyzing or classifying existing data.
- Pre-trained: It is first trained on massive, internet-scale datasets to learn language patterns before being fine-tuned for specific tasks or applications.
- Transformer: The underlying neural network architecture that enables the model to understand context, relationships, and dependencies in language, making it especially powerful for tasks involving long-range context and nuanced meaning.
#3 Transformers
Earlier models such as recurrent neural networks (RNNs) and long short-term memory networks (LSTMs) struggled with long-range dependencies; remembering what was said five sentences ago was difficult. Transformers solved this problem and became the foundation of GPT. Unlike older models, transformers can process long passages of text while preserving context, making them far more effective for tasks like summarization, translation, and dialogue.
Attention
Attention is the mechanism that helps the model determine which parts of the input matter most. Think of it as the model’s “focus.” When summarizing a document, attention lets the model highlight the most relevant sentences. Consider “The cat sat on the mat because it was tired”: the model must resolve whether “it” refers to the cat or the mat. Attention weights “cat” more strongly, so the reference is resolved correctly.
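The core trick is turning raw relevance scores into weights that sum to one via a softmax. Below is a toy sketch for the sentence above; the scores are invented numbers for illustration, not real model outputs.

```python
import numpy as np

# Toy relevance scores for the token "it" against the other tokens in
# "The cat sat on the mat because it was tired". The numbers are invented
# for illustration, not taken from a real model.
tokens = ["The", "cat", "sat", "on", "the", "mat", "because", "was", "tired"]
scores = np.array([0.1, 2.5, 0.3, 0.1, 0.1, 0.8, 0.2, 0.4, 1.1])

weights = np.exp(scores) / np.exp(scores).sum()  # softmax: scores -> probabilities

for token, w in sorted(zip(tokens, weights), key=lambda pair: -pair[1]):
    print(f"{token:>8}: {w:.2f}")
# "cat" receives the largest weight, so "it" is resolved to the cat.
```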
Self-Attention
Self-attention allows every word in a sentence to compare itself with every other word to understand relationships. This is how “tired” gets linked back to “cat,” not “mat.” Self-attention is what gives transformers their ability to capture rich context even across long passages, making outputs coherent and contextually accurate rather than random.
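Under the hood, self-attention is typically implemented as scaled dot-product attention over query, key, and value vectors. Here is a minimal NumPy sketch using random toy matrices; in a real model the query/key/value projections are learned during training, not random.

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention over a toy token sequence.

    X: (seq_len, d) matrix of token embeddings. The query/key/value
    projections are random here for simplicity; real models learn them.
    """
    seq_len, d = X.shape
    rng = np.random.default_rng(0)
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(d)                   # every token scored against every other token
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax per row: scores -> attention weights
    return weights @ V                              # each output is a weighted mix of value vectors

X = np.random.default_rng(1).normal(size=(9, 16))   # 9 toy tokens, 16-dim embeddings
print(self_attention(X).shape)                      # (9, 16): one context-aware vector per token
```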
Transformer Architecture
At a high level, a transformer works as follows:
- Input: The text is broken into tokens, i.e., word pieces (see the tokenization sketch after this list)
- Embeddings: Tokens are converted into vectors (numbers) that represent meaning
- Encoder: Builds contextual representations of the input
- Decoder: Generates new sequences from those representations
- Output: The final generated text or prediction
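As a small illustration of the first two steps, here is how a real tokenizer splits text into word pieces and maps them to integer IDs, which the embedding layer then turns into vectors. The `gpt2` tokenizer is just an example choice.

```python
from transformers import AutoTokenizer

# Step 1: break text into tokens (word pieces). "gpt2" is an example model choice.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
text = "Transformers changed natural language processing"

print(tokenizer.tokenize(text))  # the word pieces
print(tokenizer.encode(text))    # integer IDs that the embedding layer maps to vectors
```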
Training a Transformer Model
Training is how the model learns patterns from massive datasets. The model reads billions of sentences, predicts the next word, and adjusts its internal weights (“knobs”) whenever it’s wrong. This requires enormous amounts of data, compute (GPUs/TPUs), and time. Over time, the model captures grammar, facts, and even communication styles, though not perfectly. As a product manager, you’ll rarely train a model from scratch; instead, you’ll rely on pre-trained models from vendors (e.g., OpenAI, Anthropic, Cohere).
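The core loop, predict the next token, measure the error, nudge the weights, can be sketched in a few lines of PyTorch. This toy model (an embedding plus a linear layer over random token IDs) only shows the shape of the loop, not a real transformer.

```python
import torch
import torch.nn.functional as F

vocab_size, dim = 100, 32
data = torch.randint(0, vocab_size, (512,))  # toy "corpus" of random token IDs

embed = torch.nn.Embedding(vocab_size, dim)  # token ID -> vector
head = torch.nn.Linear(dim, vocab_size)      # vector -> scores over the next token
opt = torch.optim.Adam(list(embed.parameters()) + list(head.parameters()), lr=1e-3)

for step in range(100):
    inputs, targets = data[:-1], data[1:]    # predict token t+1 from token t
    logits = head(embed(inputs))
    loss = F.cross_entropy(logits, targets)  # how wrong was the prediction?
    opt.zero_grad()
    loss.backward()                          # compute gradients...
    opt.step()                               # ...and adjust the "knobs"
```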
Fine-Tuning a Transformer Model
Fine-tuning adapts a general pre-trained model (like GPT) to a specific domain. Off-the-shelf models may not perform well in specialized contexts, so you train the model further on domain-specific data, such as thousands of medical records, legal contracts, or customer chats. The result is a customized model that performs better for your product’s needs without the cost of training from scratch. For example, fine-tuning GPT on medical documents lets it act as a doctor’s assistant rather than a generic chatbot.
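Here is a hedged sketch of what further training on domain text can look like with Hugging Face. The model name, learning rate, and the `domain_texts` list are placeholders; production fine-tuning would add proper batching, evaluation, and checkpointing.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholders: swap in your base model and real domain-specific documents.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

domain_texts = [
    "Patient presents with elevated blood pressure and ...",  # hypothetical records
    "Contraindications include concurrent use of ...",
]

model.train()
for epoch in range(3):
    for text in domain_texts:
        batch = tokenizer(text, return_tensors="pt", truncation=True)
        # For causal LMs, passing labels = input_ids yields the next-token loss.
        loss = model(**batch, labels=batch["input_ids"]).loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```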
#4 Retrieval-Augmented Generation (RAG)
RAG combines information retrieval with text generation. Large language models (LLMs) can “hallucinate” because they rely only on what they memorized during training, which may be outdated or incomplete. RAG solves this by integrating search with generation. The model retrieves fresh, relevant knowledge from a database or document store, then uses that information to generate a coherent response.
Retrieval
A retrieval step searches a knowledge store for the most relevant documents or text chunks. Without retrieval, an LLM might guess, provide outdated details, or fabricate facts. Imagine a student answering an exam question: instead of relying only on memory, they quickly look up their notes and then answer based on them.
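A minimal retrieval sketch using TF-IDF from scikit-learn is below; production systems typically use embedding-based vector search instead, which also matches on meaning rather than shared keywords. The documents are invented examples.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented knowledge-base snippets; real systems index thousands of chunks.
docs = [
    "Refund Policy: Purchases are eligible for a refund within 14 days.",
    "Shipping: Standard delivery takes 3-5 business days.",
    "Warranty: Hardware is covered for 12 months from purchase.",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(docs)

query = "Can I request a refund?"
query_vector = vectorizer.transform([query])
scores = cosine_similarity(query_vector, doc_vectors)[0]

best = scores.argmax()
print(docs[best])  # the refund-policy chunk scores highest
```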
Generation
After retrieval, the LLM rewrites or synthesizes the retrieved information into a fluent, user-friendly response. For example, instead of copy-pasting “Refund Policy: Eligible within 14 days”, the model would output: “You can request a refund within 14 days of purchase by filling out this form.”
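The generation step then places the retrieved text into the prompt so the model answers from that context. In this sketch, `call_llm` is a hypothetical stand-in for whatever LLM API your product uses.

```python
def build_grounded_prompt(question: str, retrieved_chunk: str) -> str:
    # Ground the model: instruct it to answer only from the retrieved context.
    return (
        "Answer the user's question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context: {retrieved_chunk}\n\n"
        f"Question: {question}"
    )

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder: swap in your LLM vendor's API call."""
    return "You can request a refund within 14 days of purchase."

chunk = "Refund Policy: Purchases are eligible for a refund within 14 days."
print(call_llm(build_grounded_prompt("Can I request a refund?", chunk)))
```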
RAG is one of the safest and most practical ways to integrate LLMs into real-world products.
- Grounds answers in trusted sources, reducing hallucinations
- Keeps AI up to date, with no reliance on a fixed training cutoff
- Enables product integration: build AI features on top of internal knowledge bases (e.g., FAQs, wikis, product manuals)
- Balances accuracy and usability: retrieval ensures correctness, generation ensures readability
#5 Evaluations (Evals)
Once you have an AI feature, the next challenge is evaluating its performance. That’s where Evals come in. Unlike traditional software testing, where outcomes are often binary, AI evals are more nuanced. For example, when testing an AI-powered support chatbot, it’s not just about “Did it answer correctly?” but also “Was the tone friendly?” and “Did it align with our brand voice?” Evals help product managers decide whether an AI feature is production-ready, needs further fine-tuning, or requires guardrails. Common criteria include the following (a minimal eval harness is sketched after the list):
- Accuracy: Did the model provide the correct response?
- Relevance: Was the response useful and on-topic?
- Consistency: Does the model give stable answers when asked the same question multiple times?
- Helpfulness: Did the output actually solve the user’s problem?
- Bias: Does the model avoid unfair or skewed responses?
- Safety: Does it prevent harmful or inappropriate content?
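As a sketch of how lightweight evals can start, here is a tiny harness checking two of the criteria above: accuracy (answers must contain expected facts) and consistency (the same question asked repeatedly). `ask_model` is a hypothetical stand-in for your AI feature, and real evals would add human or LLM-based grading for softer criteria like tone and brand voice.

```python
def ask_model(question: str) -> str:
    """Hypothetical stand-in for your AI feature (chatbot, assistant, etc.)."""
    return "You can request a refund within 14 days of purchase."

# Invented test cases: each pairs a question with keywords a correct answer must contain.
test_cases = [
    {"question": "Can I get a refund?", "expected_keywords": ["14 days"]},
    {"question": "How long is the refund window?", "expected_keywords": ["14 days"]},
]

# Accuracy: does each answer contain the expected facts?
correct = sum(
    all(kw.lower() in ask_model(tc["question"]).lower() for kw in tc["expected_keywords"])
    for tc in test_cases
)
print(f"Accuracy: {correct}/{len(test_cases)}")

# Consistency: does the same question produce stable answers across runs?
answers = {ask_model(test_cases[0]["question"]) for _ in range(5)}
print(f"Consistency: {'stable' if len(answers) == 1 else 'varies'} ({len(answers)} distinct answers)")
```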