Fine-tuning Execution
Running the Job
Two Paths: API-based vs Open-source
API-based Fine-tuning (OpenAI, Anthropic)
Upload your JSONL, set a few hyperparameters, click go. The provider handles infrastructure, GPU allocation, and checkpoint management. You get back a model ID you can call via API.
```python
# OpenAI fine-tuning in ~10 lines (requires openai>=1.0)
import openai

# 1. Upload training file
file = openai.files.create(
    file=open("sales_companion_train.jsonl", "rb"),
    purpose="fine-tune",
)

# 2. Create fine-tuning job
job = openai.fine_tuning.jobs.create(
    training_file=file.id,
    model="gpt-4o-mini-2024-07-18",
    hyperparameters={"n_epochs": 3},
)

# 3. Use your fine-tuned model once the job succeeds
# (fine_tuned_model is None until then; poll jobs.retrieve(job.id) to wait)
job = openai.fine_tuning.jobs.retrieve(job.id)
response = openai.chat.completions.create(
    model=job.fine_tuned_model,  # e.g., "ft:gpt-4o-mini:acme:sales-v1:abc123"
    messages=[{"role": "user", "content": "Prep me for the Globex call"}],
)
```

Best for: Teams that want results fast without managing GPUs. Your Sales Companion already uses an API — swapping in a fine-tuned model ID is a one-line change.
Open-source Fine-tuning (LoRA/QLoRA on HuggingFace)
You control everything: which base model, which layers to train, how long to train. More work, but more flexibility and no per-token inference costs once deployed.
Best for: Teams with GPU access who want to own the model, need to run on-premise, or want to fine-tune open models like Llama or Mistral.
Hyperparameters That Matter
You don't need to tune dozens of knobs. Three hyperparameters account for 90% of outcomes:
| Parameter | What It Does | Starting Point | Adjust If... |
|---|---|---|---|
| Learning rate | How fast the model updates weights | 1e-5 to 5e-5 | Validation loss plateaus early (too low) or spikes (too high) |
| Epochs | Number of full passes through training data | 2-4 | Validation loss starts rising (too many) or hasn't converged (too few) |
| Batch size | Examples processed before a weight update | 4-16 | Training is unstable (increase) or you're running out of memory (decrease) |
For API-based fine-tuning, the provider picks sensible defaults. For open-source, start with these values and adjust based on your loss curves.
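For open-source runs, the table's starting points map directly onto Hugging Face `TrainingArguments`. A sketch, not a tuned recipe — the values below are the mid-range defaults from the table, and the output directory name is illustrative:

```python
from transformers import TrainingArguments

# Mid-range starting points from the table above
training_args = TrainingArguments(
    output_dir="./sales-companion-ft",   # illustrative path
    learning_rate=2e-5,                  # within the 1e-5 to 5e-5 band
    num_train_epochs=3,                  # 2-4 passes; watch validation loss
    per_device_train_batch_size=8,       # drop to 4 if you run out of memory
)
```

Adjust from here based on your loss curves, not in advance.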
Why LoRA Works
LoRA (Low-Rank Adaptation) is the technique that makes fine-tuning practical for large models. Instead of updating all 7 billion parameters, LoRA freezes the base model and trains small adapter matrices (typically 0.1-1% of total parameters).
The Intuition
A 7B parameter model has massive weight matrices. But the *change* you need for your task is low-rank — it can be represented by two small matrices multiplied together. LoRA exploits this:
```
Original weight:    W (4096 x 4096)
LoRA adapters:      A (4096 x 16) * B (16 x 4096)
Fine-tuned weight:  W + A*B
```

Instead of training 16.7M parameters in that layer, you train 131K. Multiply that across all targeted layers, and you're training ~10M parameters instead of 7B.
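The arithmetic above is easy to verify, using the layer sizes quoted in the text:

```python
# Parameter counts for one 4096x4096 weight matrix with a rank-16 LoRA adapter
d, r = 4096, 16

full = d * d          # parameters updated by full fine-tuning
lora = d * r + r * d  # parameters in adapters A (d x r) and B (r x d)

print(full)           # 16777216  (~16.7M)
print(lora)           # 131072    (~131K)
print(full // lora)   # 128x fewer trainable parameters in this layer
```

The ratio scales with `d / (2r)`, which is why low ranks (8-32) stay cheap even as models grow.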
Which Layers to Target
- Attention projections (`q_proj`, `v_proj`) — Almost always. These control what the model "pays attention to."

QLoRA adds quantization (4-bit) to LoRA, cutting memory usage by 4x. A 7B model that normally needs 28GB of VRAM can be fine-tuned on a single 8GB GPU.
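In code, layer targeting is a one-line setting. A sketch using the `peft` library (the rank, alpha, and dropout values are illustrative starting points, not tuned):

```python
from peft import LoraConfig

# Rank-16 adapters on the attention query/value projections only
lora_config = LoraConfig(
    r=16,                                 # adapter rank
    lora_alpha=32,                        # scaling factor (commonly 2x the rank)
    target_modules=["q_proj", "v_proj"],  # which layers get adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
# Then: model = get_peft_model(base_model, lora_config)
# For QLoRA, load the base model in 4-bit first (e.g., via bitsandbytes).
```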
Monitoring Training
The most important chart during fine-tuning is training loss vs validation loss over time.
```
Loss
|
|\
| \  Training loss (should go down)
|  \___________
|   \
|    \____  Validation loss (should go down then flatten)
|         \____________________
|
+----------------------------> Steps
```

What to Watch For
If training loss keeps falling while validation loss flattens or starts rising, the model is memorizing your examples rather than learning the pattern; stop and use the checkpoint where validation loss was still improving.
Checkpointing and Early Stopping
Save a checkpoint every N steps (typically every 100-500 steps or every epoch). If validation loss hasn't improved for 2 consecutive checkpoints, stop training and use the best checkpoint.
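The stopping rule can be sketched in plain Python (a hypothetical helper for illustration, not a library API):

```python
def should_stop(eval_losses, patience=2):
    """True if the last `patience` checkpoints failed to beat the best earlier loss."""
    if len(eval_losses) <= patience:
        return False
    best_earlier = min(eval_losses[:-patience])
    return all(loss >= best_earlier for loss in eval_losses[-patience:])

print(should_stop([1.0, 0.8, 0.7, 0.71, 0.72]))  # True: two checkpoints without improvement
print(should_stop([1.0, 0.8, 0.7, 0.65, 0.72]))  # False: 0.65 was a new best
```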
API providers handle this automatically. For open-source training:
```python
from transformers import TrainingArguments, EarlyStoppingCallback

training_args = TrainingArguments(
    output_dir="./checkpoints",
    evaluation_strategy="steps",
    eval_steps=100,
    save_strategy="steps",             # must match evaluation_strategy
    save_steps=100,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,           # lower eval_loss is better
)

# Stop after 2 evaluations without improvement and restore the best checkpoint:
# trainer = Trainer(..., callbacks=[EarlyStoppingCallback(early_stopping_patience=2)])
```

Cost Breakdown
The total cost formula for API-based fine-tuning:
```
Training cost = (total_tokens / 1,000) x price_per_1K x epochs
```

For the Sales Companion with 500 examples (~400K tokens) on GPT-4o mini, three epochs means roughly 1.2M billable training tokens.
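Plugging concrete numbers into the formula (the per-token price below is an assumption for illustration; check your provider's current fine-tuning pricing):

```python
# Training cost = (total_tokens / 1,000) x price_per_1K x epochs
total_tokens = 400_000  # ~500 examples for the Sales Companion
epochs = 3
price_per_1k = 0.003    # ASSUMED $/1K training tokens; varies by provider and model

cost = (total_tokens / 1000) * price_per_1k * epochs
print(f"${cost:.2f}")   # $3.60 under these assumptions
```

Even if the real rate is a few times higher, a one-off training run at this scale costs single-digit dollars; the recurring cost lives in inference.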
Fine-tuned inference costs ~1.5-2x base model rates. For a Sales Companion handling 1,000 queries/day at ~500 tokens/query, that's roughly $1-2/day in additional inference cost. The quality improvement usually justifies this many times over.
This is chapter 3 of Fine-tuning for Enterprise AI.
Get the full hands-on course for $100 and build the complete system. Your projects become your portfolio.