
Fine-tuning Execution

Running the Job

Two Paths: API-based vs Open-source

API-based Fine-tuning (OpenAI, Anthropic)

Upload your JSONL, set a few hyperparameters, click go. The provider handles infrastructure, GPU allocation, and checkpoint management. You get back a model ID you can call via API.

# OpenAI fine-tuning, end to end
import time
import openai

# 1. Upload training file
file = openai.files.create(
    file=open("sales_companion_train.jsonl", "rb"),
    purpose="fine-tune"
)

# 2. Create fine-tuning job (it runs asynchronously)
job = openai.fine_tuning.jobs.create(
    training_file=file.id,
    model="gpt-4o-mini-2024-07-18",
    hyperparameters={"n_epochs": 3}
)

# 3. Poll until the job finishes; fine_tuned_model is None until then
while job.status not in ("succeeded", "failed", "cancelled"):
    time.sleep(60)
    job = openai.fine_tuning.jobs.retrieve(job.id)

# 4. Use your fine-tuned model
response = openai.chat.completions.create(
    model=job.fine_tuned_model,  # e.g., "ft:gpt-4o-mini:acme:sales-v1:abc123"
    messages=[{"role": "user", "content": "Prep me for the Globex call"}]
)

Best for: Teams that want results fast without managing GPUs. Your Sales Companion already uses an API — swapping in a fine-tuned model ID is a one-line change.

Open-source Fine-tuning (LoRA/QLoRA on HuggingFace)

You control everything: which base model, which layers to train, how long to train. More work, but more flexibility and no per-token inference costs once deployed.

Best for: Teams with GPU access who want to own the model, need to run on-premise, or want to fine-tune open models like Llama or Mistral.
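
Here's a minimal sketch of that path using HuggingFace's transformers and peft libraries. The base model name and the hyperparameter values are illustrative assumptions, not prescriptions:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-3.1-8B"  # assumption: any causal LM you have access to
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Freeze the base model and attach small trainable adapter matrices
config = LoraConfig(
    r=16,                                 # adapter rank (see below)
    lora_alpha=32,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # attention layers (see below)
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of total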

Hyperparameters That Matter

You don't need to tune dozens of knobs. Three hyperparameters account for 90% of outcomes:

  • Learning rate — How fast the model updates its weights. Start at 1e-5 to 5e-5. Raise it if validation loss plateaus early (too low); lower it if loss spikes (too high).
  • Epochs — Number of full passes through the training data. Start with 2-4. Use fewer if validation loss starts rising (too many); more if it hasn't converged (too few).
  • Batch size — Examples processed before each weight update. Start with 4-16. Increase it if training is unstable; decrease it if you're running out of memory.

For API-based fine-tuning, the provider picks sensible defaults. For open-source, start with these values and adjust based on your loss curves.
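
For the open-source path, these three knobs map directly onto HuggingFace TrainingArguments. A minimal sketch with the starting values above; the specific numbers are the assumption to validate against your loss curves:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./sales_companion_lora",
    learning_rate=2e-5,              # within the 1e-5 to 5e-5 starting range
    num_train_epochs=3,              # 2-4 full passes
    per_device_train_batch_size=8,   # 4-16; decrease if you run out of memory
)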

Why LoRA Works

LoRA (Low-Rank Adaptation) is the technique that makes fine-tuning practical for large models. Instead of updating all 7 billion parameters, LoRA freezes the base model and trains small adapter matrices (typically 0.1-1% of total parameters).

The Intuition

A 7B parameter model has massive weight matrices. But the *change* you need for your task is low-rank — it can be represented by two small matrices multiplied together. LoRA exploits this:

Original weight: W (4096 x 4096)
LoRA adapters: A (4096 x 16) * B (16 x 4096)
Fine-tuned weight: W + A*B

Instead of training 16.7M parameters in that layer, you train 131K. Multiply that across all targeted layers, and you're training ~10M parameters instead of 7B.
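
As a concrete illustration, here is a toy LoRA layer in PyTorch. The dimensions mirror the example above; this sketches the idea rather than the peft library's actual implementation:

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d=4096, r=16):
        super().__init__()
        self.W = nn.Linear(d, d, bias=False)             # frozen base weight
        self.W.weight.requires_grad = False
        self.A = nn.Parameter(torch.randn(d, r) * 0.01)  # 4096 x 16
        self.B = nn.Parameter(torch.zeros(r, d))         # 16 x 4096, zero-init

    def forward(self, x):
        # Equivalent to multiplying by (W + A*B); only A and B are trained
        return self.W(x) + x @ self.A @ self.B

layer = LoRALinear()
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 131,072 trainable parameters vs 16.7M in W itself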

Which Layers to Target

  • Attention layers (q_proj, v_proj) — Almost always. These control what the model "pays attention to."
  • All linear layers — More expensive but captures more nuance. Use when attention-only doesn't hit your quality bar.
  • Rank (r) — Start with 16. Increase to 32 or 64 if the model isn't learning enough.

QLoRA adds 4-bit quantization to LoRA, cutting memory usage by 4x. A 7B model that normally needs 28GB of VRAM can be fine-tuned on a single 8GB GPU.
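
In practice, QLoRA just means loading the base model in 4-bit before attaching the same LoRA adapters. A minimal sketch using transformers and bitsandbytes; the quantization settings shown are common defaults, not requirements:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

bnb = BitsAndBytesConfig(
    load_in_4bit=True,                      # store base weights in 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",              # illustrative base model
    quantization_config=bnb,
)
model = prepare_model_for_kbit_training(model)
# ...then attach LoRA adapters with get_peft_model, exactly as before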

Monitoring Training

The most important chart during fine-tuning is training loss vs validation loss over time.

Loss
 |
 |\
 | \          Training loss (should go down)
 |  \___________
 |   \
 |    \____     Validation loss (should go down then flatten)
 |         \____________________
 |
 +----------------------------> Steps

What to Watch For

  • Both losses decreasing — Good. Keep training.
  • Training loss decreasing, validation loss increasing — Overfitting. Stop training, use the earlier checkpoint.
  • Both losses flat from the start — Learning rate too low, or your data isn't teaching the model anything new.
  • Loss spikes — Learning rate too high, or a batch of bad data.
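
If you're training with the HuggingFace Trainer (as in the checkpointing snippet below), both curves can be pulled from trainer.state.log_history after training. A minimal sketch, assuming trainer is your finished Trainer instance:

import matplotlib.pyplot as plt

# Each entry in log_history is a dict; training steps log "loss",
# evaluation steps log "eval_loss"
history = trainer.state.log_history
train_pts = [(h["step"], h["loss"]) for h in history if "loss" in h]
eval_pts = [(h["step"], h["eval_loss"]) for h in history if "eval_loss" in h]

plt.plot(*zip(*train_pts), label="training loss")
plt.plot(*zip(*eval_pts), label="validation loss")
plt.xlabel("steps")
plt.ylabel("loss")
plt.legend()
plt.show()
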
Checkpointing and Early Stopping

Save a checkpoint every N steps (typically every 100-500 steps or every epoch). If validation loss hasn't improved for 2 consecutive checkpoints, stop training and use the best checkpoint.

API providers handle this automatically. For open-source training:

from transformers import EarlyStoppingCallback, TrainingArguments

training_args = TrainingArguments(
    output_dir="./checkpoints",
    evaluation_strategy="steps",        # evaluate every eval_steps
    eval_steps=100,
    save_strategy="steps",              # checkpoint on the same schedule
    save_steps=100,
    load_best_model_at_end=True,        # restore the best checkpoint when done
    metric_for_best_model="eval_loss",
)

# Stop if eval_loss hasn't improved for 2 consecutive evaluations:
# Trainer(..., callbacks=[EarlyStoppingCallback(early_stopping_patience=2)])

Cost Breakdown

The total cost formula for API-based fine-tuning:

Training cost = (total_tokens x epochs / 1,000,000) x price_per_1M_tokens

For the Sales Companion with 500 examples (~400K tokens) on GPT-4o mini:

  • Training: 400K tokens x 3 epochs x $8/1M tokens = ~$9.60
  • Validation: included (runs on your 10% split automatically)
  • Inference: fine-tuned models cost ~1.5-2x base model rates. For a Sales Companion handling 1,000 queries/day at ~500 tokens/query, that's roughly $1-2/day in additional inference cost. The quality improvement usually justifies this many times over.
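
A quick sanity check of that arithmetic in Python (the helper name is just for illustration):

def training_cost(total_tokens, epochs, price_per_1m_tokens):
    # Tokens seen during training = dataset tokens x epochs
    return total_tokens * epochs / 1_000_000 * price_per_1m_tokens

print(training_cost(400_000, 3, 8.00))  # -> 9.6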

This is chapter 3 of Fine-tuning for Enterprise AI.

Get the full hands-on course for $100 and build the complete system. Your projects become your portfolio.
