By the Numbers
5,000+ Training Pairs: generated via a GPT-4o teacher model across 6 categories
200 Benchmark Questions: SkyBench evaluation suite with a GPT-4o judge
6 Aircraft Manuals: POH manuals parsed and structured
10× Cost Reduction: vs. the Gemini API, with comparable accuracy
Overview
SkyForge is the ML backbone of FlightReady AI, currently under development. The pipeline runs from data ingestion (6 aircraft POH manuals, FAA handbooks, 461 emergency procedures) through training (QLoRA rank-64 adapters) to evaluation (SkyBench, a 200-question benchmark scored by GPT-4o). FlightReady AI uses Gemini as its primary LLM while SkyForge is in development.
Architecture
Pattern: End-to-End ML Pipeline
DI Strategy: Config-driven with YAML hyperparameters
AI Integrations
Backend: Modal serverless + vLLM inference
Testing: SkyBench 200-question benchmark
Key Features
What I Built
End-to-End Training Pipeline
Data ingestion from PDFs → GPT-4o teacher model dataset generation → QLoRA fine-tuning → SkyBench evaluation → Modal deployment. Fully automated.
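The stages above can be sketched as a simple orchestration. Every function here is a hypothetical stub standing in for the real implementation; the names and return shapes are assumptions, not the actual SkyForge code.

```python
# Hypothetical sketch of the SkyForge pipeline stages (all names assumed).

def ingest_pdfs(paths):
    # Parse POH manuals into structured text chunks.
    return [{"source": p, "text": f"chunk from {p}"} for p in paths]

def generate_dataset(chunks):
    # A GPT-4o teacher model would turn chunks into Q&A training pairs.
    return [{"question": f"Q about {c['source']}", "answer": c["text"]}
            for c in chunks]

def finetune(pairs):
    # QLoRA fine-tuning would run here; return an adapter identifier.
    return {"adapter": "qlora-rank64", "train_pairs": len(pairs)}

def evaluate(model):
    # SkyBench: 200 questions scored by a GPT-4o judge.
    return {"model": model["adapter"], "questions": 200, "score": None}

def run_pipeline(pdf_paths):
    chunks = ingest_pdfs(pdf_paths)
    pairs = generate_dataset(chunks)
    model = finetune(pairs)
    return evaluate(model)

report = run_pipeline(["c172_poh.pdf", "pa28_poh.pdf"])
```

The value of wiring the stages this way is that each step can be rerun or swapped independently, which is what makes the end-to-end run fully automatable.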
QLoRA Fine-Tuning
4-bit NF4 quantization with rank-64 LoRA adapters targeting all seven attention and MLP projection modules. Flash Attention 2, sequence packing, and gradient checkpointing on a single A100.
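Targeting all seven projection modules at rank 64 adds a fixed, easily computed number of trainable parameters. A quick check, assuming Llama-2-7B-style shapes (hidden size 4096, intermediate size 11008, 32 layers, no grouped-query attention) — these dimensions are an assumed example, not stated in the source:

```python
def lora_param_count(rank, shapes):
    """Trainable parameters LoRA adds: for each target weight of shape
    (out_features, in_features), LoRA adds A (rank x in) and B (out x rank)."""
    return sum(rank * (out_f + in_f) for out_f, in_f in shapes)

# Assumed Llama-2-7B-style layer shapes.
hidden, inter = 4096, 11008
per_layer_shapes = [
    (hidden, hidden),  # q_proj
    (hidden, hidden),  # k_proj
    (hidden, hidden),  # v_proj
    (hidden, hidden),  # o_proj
    (inter, hidden),   # gate_proj
    (inter, hidden),   # up_proj
    (hidden, inter),   # down_proj
]
total = 32 * lora_param_count(64, per_layer_shapes)  # ~160M trainable params
```

Under these assumptions the adapters hold roughly 160M trainable parameters, a small fraction of the frozen 4-bit base.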
SkyBench Evaluation Suite
200 questions across 6 domains (emergency, W&B, regulations, systems, weather, decision-making). GPT-4o judge scores correctness, completeness, hallucination, and specificity.
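One way the four judge dimensions could be combined into a single per-question score is sketched below. The rubric dimensions come from the text; the 0-10 scale, the inversion of hallucination as a penalty, and the equal weighting are all assumptions.

```python
def aggregate_judge_scores(rubric):
    """Combine per-dimension GPT-4o judge scores (assumed 0-10 scale) into
    one benchmark score. Hallucination is treated as a penalty dimension
    (higher = worse), so it is inverted before averaging. The equal
    weighting is an assumption, not the documented SkyBench scheme."""
    positives = [rubric["correctness"], rubric["completeness"],
                 rubric["specificity"]]
    inverted_hallucination = 10 - rubric["hallucination"]
    dims = positives + [inverted_hallucination]
    return sum(dims) / len(dims)

score = aggregate_judge_scores(
    {"correctness": 9, "completeness": 8, "hallucination": 1, "specificity": 7}
)
# (9 + 8 + 7 + 9) / 4 = 8.25
```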
Planned Production Serving
Targeting OpenAI-compatible API via vLLM on Modal with auto-scaling. Will serve as a drop-in replacement for Gemini in FlightReady AI's production app.
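Because vLLM exposes an OpenAI-compatible API, the drop-in swap amounts to pointing an existing client at a new base URL. A minimal sketch of the request a client would send; the base URL and model name are placeholders, not real endpoints:

```python
import json

def build_chat_request(model, question, base_url):
    """Build an OpenAI-compatible /v1/chat/completions request body for a
    vLLM server. base_url and model are hypothetical placeholders."""
    return {
        "url": f"{base_url}/v1/chat/completions",
        "payload": {
            "model": model,
            "messages": [
                {"role": "system", "content": "You are an aviation assistant."},
                {"role": "user", "content": question},
            ],
            "max_tokens": 512,
        },
    }

req = build_chat_request(
    "skyforge-qlora",                      # placeholder model name
    "What is the best glide speed for a C172?",
    "https://example--skyforge.modal.run",  # placeholder Modal URL
)
body = json.dumps(req["payload"])
```

Since the request shape matches OpenAI's chat completions schema, switching FlightReady AI from Gemini would mostly be a configuration change rather than a code rewrite.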
Performance
Optimizations
4-bit quantization reduces weight memory by ~75%
Sequence packing enables an effective batch size of 16 on a single A100
Targets: <500 ms time to first token (TTFT), <2 s full response
10× cost reduction vs. the Gemini API
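The ~75% memory figure follows directly from the bit widths: 4-bit storage is a quarter the size of 16-bit weights. A quick check (the 7B parameter count is an assumed example for illustration, not stated in the source):

```python
def weight_memory_gb(params_billion, bits):
    """Memory for model weights alone, ignoring activations and KV cache."""
    bytes_total = params_billion * 1e9 * bits / 8
    return bytes_total / 1e9

fp16 = weight_memory_gb(7, 16)  # 14.0 GB for a hypothetical 7B base
nf4 = weight_memory_gb(7, 4)    # 3.5 GB after 4-bit NF4 quantization
reduction = 1 - nf4 / fp16      # 0.75, i.e. ~75% less weight memory
```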
Tech Stack
Interested in learning more?
Check out the live project or get in touch to discuss the technical details.