
Fine-tuning Execution

Running the Job

Two Paths: API-based vs Open-source

API-based Fine-tuning (OpenAI, Anthropic)

Upload your JSONL, set a few hyperparameters, click go. The provider handles infrastructure, GPU allocation, and checkpoint management. You get back a model ID you can call via API.

# OpenAI fine-tuning, end to end
import time
import openai

# 1. Upload training file
file = openai.files.create(
    file=open("sales_companion_train.jsonl", "rb"),
    purpose="fine-tune"
)

# 2. Create fine-tuning job (it runs asynchronously)
job = openai.fine_tuning.jobs.create(
    training_file=file.id,
    model="gpt-4o-mini-2024-07-18",
    hyperparameters={"n_epochs": 3}
)

# 3. Poll until the job finishes; fine_tuned_model is None until then
while job.status not in ("succeeded", "failed", "cancelled"):
    time.sleep(60)
    job = openai.fine_tuning.jobs.retrieve(job.id)

# 4. Use your fine-tuned model
response = openai.chat.completions.create(
    model=job.fine_tuned_model,  # e.g., "ft:gpt-4o-mini:acme:sales-v1:abc123"
    messages=[{"role": "user", "content": "Prep me for the Globex call"}]
)

Best for: Teams that want results fast without managing GPUs. Your Sales Companion already uses an API — swapping in a fine-tuned model ID is a one-line change.

Open-source Fine-tuning (LoRA/QLoRA on HuggingFace)

You control everything: which base model, which layers to train, how long to train. More work, but more flexibility and no per-token inference costs once deployed.

Best for: Teams with GPU access who want to own the model, need to run on-premise, or want to fine-tune open models like Llama or Mistral.
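
Here's a minimal sketch of that path using HuggingFace's transformers and peft libraries. The base model name and the hyperparameter values are illustrative assumptions, not prescriptions:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-3.1-8B"  # assumption: any causal LM you have access to
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Freeze the base model and attach small trainable adapter matrices
config = LoraConfig(
    r=16,                                 # adapter rank (see below)
    lora_alpha=32,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # attention layers (see below)
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of total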

Hyperparameters That Matter

You don't need to tune dozens of knobs. Three hyperparameters account for 90% of outcomes:

  • Learning rate — How fast the model updates its weights. Start at 1e-5 to 5e-5. Raise it if validation loss plateaus early (too low); lower it if loss spikes (too high).
  • Epochs — Number of full passes through the training data. Start with 2-4. Use fewer if validation loss starts rising (too many); more if it hasn't converged (too few).
  • Batch size — Examples processed before each weight update. Start with 4-16. Increase it if training is unstable; decrease it if you're running out of memory.

For API-based fine-tuning, the provider picks sensible defaults. For open-source, start with these values and adjust based on your loss curves.
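
For the open-source path, these three knobs map directly onto HuggingFace TrainingArguments. A minimal sketch with the starting values above; the specific numbers are the assumption to validate against your loss curves:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./sales_companion_lora",
    learning_rate=2e-5,              # within the 1e-5 to 5e-5 starting range
    num_train_epochs=3,              # 2-4 full passes
    per_device_train_batch_size=8,   # 4-16; decrease if you run out of memory
)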

Why LoRA Works

LoRA (Low-Rank Adaptation) is the technique that makes fine-tuning practical for large models. Instead of updating all 7 billion parameters, LoRA freezes the base model and trains small adapter matrices (typically 0.1-1% of total parameters).

The Intuition

A 7B parameter model has massive weight matrices. But the *change* you need for your task is low-rank — it can be represented by two small matrices multiplied together. LoRA exploits this:

Original weight: W (4096 x 4096)
LoRA adapters: A (4096 x 16) * B (16 x 4096)
Fine-tuned weight: W + A*B

Instead of training 16.7M parameters in that layer, you train 131K. Multiply that across all targeted layers, and you're training ~10M parameters instead of 7B.
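
As a concrete illustration, here is a toy LoRA layer in PyTorch. The dimensions mirror the example above; this sketches the idea rather than the peft library's actual implementation:

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d=4096, r=16):
        super().__init__()
        self.W = nn.Linear(d, d, bias=False)             # frozen base weight
        self.W.weight.requires_grad = False
        self.A = nn.Parameter(torch.randn(d, r) * 0.01)  # 4096 x 16
        self.B = nn.Parameter(torch.zeros(r, d))         # 16 x 4096, zero-init

    def forward(self, x):
        # Equivalent to multiplying by (W + A*B); only A and B are trained
        return self.W(x) + x @ self.A @ self.B

layer = LoRALinear()
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 131,072 trainable parameters vs 16.7M in W itself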

Which Layers to Target

  • Attention layers (q_proj, v_proj) — Almost always. These control what the model "pays attention to."
  • All linear layers — More expensive but captures more nuance. Use when attention-only doesn't hit your quality bar.
  • Rank (r) — Start with 16. Increase to 32 or 64 if the model isn't learning enough.

QLoRA adds 4-bit quantization to LoRA, cutting memory usage by 4x. A 7B model that normally needs 28GB of VRAM can be fine-tuned on a single 8GB GPU.
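
In practice, QLoRA just means loading the base model in 4-bit before attaching the same LoRA adapters. A minimal sketch using transformers and bitsandbytes; the quantization settings shown are common defaults, not requirements:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

bnb = BitsAndBytesConfig(
    load_in_4bit=True,                      # store base weights in 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",              # illustrative base model
    quantization_config=bnb,
)
model = prepare_model_for_kbit_training(model)
# ...then attach LoRA adapters with get_peft_model, exactly as before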

Monitoring Training

The most important chart during fine-tuning is training loss vs validation loss over time.

Loss
 |
 |\
 | \          Training loss (should go down)
 |  \___________
 |   \
 |    \____     Validation loss (should go down then flatten)
 |         \____________________
 |
 +----------------------------> Steps

What to Watch For

  • Both losses decreasing — Good. Keep training.
  • Training loss decreasing, validation loss increasing — Overfitting. Stop training, use the earlier checkpoint.
  • Both losses flat from the start — Learning rate too low, or your data isn't teaching the model anything new.
  • Loss spikes — Learning rate too high, or a batch of bad data.
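
If you're training with the HuggingFace Trainer (as in the checkpointing snippet below), both curves can be pulled from trainer.state.log_history after training. A minimal sketch, assuming trainer is your finished Trainer instance:

import matplotlib.pyplot as plt

# Each entry in log_history is a dict; training steps log "loss",
# evaluation steps log "eval_loss"
history = trainer.state.log_history
train_pts = [(h["step"], h["loss"]) for h in history if "loss" in h]
eval_pts = [(h["step"], h["eval_loss"]) for h in history if "eval_loss" in h]

plt.plot(*zip(*train_pts), label="training loss")
plt.plot(*zip(*eval_pts), label="validation loss")
plt.xlabel("steps")
plt.ylabel("loss")
plt.legend()
plt.show()
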
Checkpointing and Early Stopping

Save a checkpoint every N steps (typically every 100-500 steps or every epoch). If validation loss hasn't improved for 2 consecutive checkpoints, stop training and use the best checkpoint.

API providers handle this automatically. For open-source training:

from transformers import EarlyStoppingCallback, TrainingArguments

training_args = TrainingArguments(
    output_dir="./checkpoints",
    evaluation_strategy="steps",        # evaluate every eval_steps
    eval_steps=100,
    save_strategy="steps",              # checkpoint on the same schedule
    save_steps=100,
    load_best_model_at_end=True,        # restore the best checkpoint when done
    metric_for_best_model="eval_loss",
)

# Stop if eval_loss hasn't improved for 2 consecutive evaluations:
# Trainer(..., callbacks=[EarlyStoppingCallback(early_stopping_patience=2)])

Cost Breakdown

The total cost formula for API-based fine-tuning:

Training cost = (total_tokens x epochs / 1,000,000) x price_per_1M_tokens

For the Sales Companion with 500 examples (~400K tokens) on GPT-4o mini:

  • Training: 400K tokens x 3 epochs x $8/1M tokens = ~$9.60
  • Validation: included (runs on your 10% split automatically)
  • Inference: fine-tuned models cost ~1.5-2x base model rates. For a Sales Companion handling 1,000 queries/day at ~500 tokens/query, that's roughly $1-2/day in additional inference cost. The quality improvement usually justifies this many times over.
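
A quick sanity check of that arithmetic in Python (the helper name is just for illustration):

def training_cost(total_tokens, epochs, price_per_1m_tokens):
    # Tokens seen during training = dataset tokens x epochs
    return total_tokens * epochs / 1_000_000 * price_per_1m_tokens

print(training_cost(400_000, 3, 8.00))  # -> 9.6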

This is chapter 3 of Fine-tuning for Enterprise AI.

Get the full hands-on course for $100 and build the complete system. Your projects become your portfolio.
