LLM Fine-tuning Course

Live

Fine-tuning a 3B language model end-to-end on the HF Hub: SFT, DPO, and a vision-language sidetrack

SmolLM3-3B

Base Model

SFT + DPO

Methods

4

Adapters Shipped

~$12

Total Compute

Overview

I worked through Hugging Face's Smol Fine-Tuning Language Models course and shipped a preference-aligned small model to the Hub. I took SmolLM3-3B-Base through SFT on 12k summarization examples, then DPO on 12k preference pairs. DPO continued training the same LoRA adapter rather than starting fresh, with the pre-DPO state frozen as the reference policy. A SmolVLM2-2.2B ChartQA adapter sits alongside as a vision-language sidetrack: LoRA adapts the LLM while the SigLIP vision encoder stays frozen. All four published LoRA adapters are reproducible from the public code.
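To make the role of the frozen reference policy concrete, here is a minimal sketch of the per-pair DPO loss in plain Python. It is an illustration of the standard objective, not the training code used for these adapters: the function name `dpo_loss`, the `beta` value, and all log-probabilities are hypothetical. It shows one consequence of continuing from the same LoRA: at the first DPO step the policy equals the reference, so the loss starts at log 2.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * (policy margin - reference margin))).

    Inputs are summed sequence log-probabilities of the chosen and rejected
    responses under the trainable policy and the frozen reference.
    beta=0.1 is an assumed value, not one taken from the project.
    """
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_margin - ref_margin)
    # -log(sigmoid(x)) equals softplus(-x); this form avoids overflow.
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))


# Hypothetical log-probs from the frozen pre-DPO (post-SFT) reference.
ref_chosen, ref_rejected = -42.0, -40.0

# Step 0: the policy is the same adapter as the reference, so the margins cancel.
loss_at_start = dpo_loss(ref_chosen, ref_rejected, ref_chosen, ref_rejected)
print(round(loss_at_start, 3))  # 0.693, i.e. log(2)

# Later: the policy now prefers the chosen response more than the reference does.
loss_after = dpo_loss(-40.0, -45.0, ref_chosen, ref_rejected)
print(loss_after < loss_at_start)  # True
```

Because the loss only depends on how far the policy's preference margin has moved away from the reference's, freezing the pre-DPO state keeps the SFT behaviour as the anchor that DPO is measured against.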

Tech Stack

PyTorch, Hugging Face TRL, PEFT (LoRA), HF Jobs (A100/A10G), Python 3.12, uv, Trackio