
Chaptive:
Video Intelligence Explorer

Streamlit · Hugging Face · LLM

Animated walkthrough of Chaptive turning a YouTube lecture into bookmarks and Q&A.

Live demo

If the Space is sleeping, give it a moment to wake up, or open it in a new tab.

Chaptive ingests trusted educational YouTube channels, builds Gemini-powered embeddings, and exposes grounded search, Q&A, bookmarks, and quizzes through a FastAPI backend running on AWS Lambda. The experience surfaces lecture knowledge in seconds across web and Hugging Face Space front-ends.

Problem Statement

Your YouTube Tutor Chatbot. Chaptive turns any long lecture into productive, structured learning so you can talk to the video like a tutor.

If you’ve ever:

…then Chaptive AI is for you. Educational videos are packed with value, but they’re hard to skim and easy to forget. Chaptive shrinks hours into minutes with sharp summaries, links every answer to the source moment, auto-builds chapters, and reinforces memory with instant quizzes and semantic search.


Solution Overview

Chaptive turns YouTube into a fast, interactive learning loop:

Under the hood we fetch transcripts, embed them with Gemini, store artifacts in S3, and serve everything via a FastAPI app on AWS Lambda + API Gateway managed by Terraform.


Key Features

  1. Paste a YouTube URL and kickstart an asynchronous ingestion pipeline: transcripts are fetched, chunked, embedded with Gemini, and cached while you keep browsing.
    Step 1 interface showing YouTube URL input, metadata preview, and Run button

    Drop in any YouTube URL to preview the channel metadata, video title, and kick off ingestion with a single Run.

    Cached video list with thumbnails and Use this video buttons

    Alternatively, pick from cached videos for instant reuse when you need quick answers.

  2. Generate video bookmarks with section titles, timestamps, and one-sentence recaps to turn a two-hour lecture into a scannable, hyperlinked timeline.
    Animated walkthrough of Chaptive AI bookmark generation

    Bookmark previews animate how Chaptive slices a lecture into timestamped chapters.

  3. Ask free-form questions and receive grounded answers backed by cited transcript snippets so fact-checking stays instant.
    Animated view of Chaptive Q&A highlighting grounded responses

    Grounded answers highlight the exact transcript lines that justified each response.

  4. Spin up quizzes and executive summaries using Gemini prompts tuned for study guides, with UI sliders for difficulty and length.
    Animated demo of quiz and summary sliders

    Adjust difficulty and summary length, then watch quizzes and recaps generate in seconds.

  5. Restrict usage to approved educational channels; off-list submissions return a friendly warning plus Terms-of-Use context to keep the dataset compliant.
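The chunking step in feature 1 and the source-moment links in feature 3 can be sketched as follows. This is a minimal illustration, not Chaptive's actual code: it assumes transcript segments arrive as dictionaries with `text` and `start` keys, and the names `Chunk`, `chunk_transcript`, `moment_url`, and the `max_chars` limit are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    start: float  # seconds from the beginning of the video
    text: str


def chunk_transcript(segments, max_chars=200):
    """Group consecutive transcript segments into chunks of roughly max_chars.

    Each chunk keeps the start time of its first segment so an answer can
    link back to the moment it came from. A single segment longer than
    max_chars is kept whole rather than split.
    """
    chunks = []
    buf, buf_start = [], None
    for seg in segments:
        text = seg["text"].strip()
        if not text:
            continue  # skip empty caption lines
        candidate_len = len(" ".join(buf + [text]))
        if buf and candidate_len > max_chars:
            # Adding this segment would overflow: close the current chunk.
            chunks.append(Chunk(buf_start, " ".join(buf)))
            buf, buf_start = [], None
        if buf_start is None:
            buf_start = seg["start"]
        buf.append(text)
    if buf:
        chunks.append(Chunk(buf_start, " ".join(buf)))
    return chunks


def moment_url(video_id, start):
    """Build a YouTube link that opens the video at the chunk's start time."""
    return f"https://www.youtube.com/watch?v={video_id}&t={int(start)}s"
```

Keeping the start timestamp on every chunk is what lets a bookmark or a cited answer jump straight to the relevant point in the lecture.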

High-Level Architecture

User flow and system architecture diagram for Chaptive AI

The user submits a video, Lambda ingests it and caches the artifacts, and the front-ends query the cached embeddings for real-time answers.


Client–Server Interaction Flow

  1. Client sends authenticated requests to API Gateway.
  2. API Gateway proxies to Lambda, where Mangum boots FastAPI.
  3. Ingestion pulls transcripts, chunks, embeds via Gemini, and stores artifacts (chunks.json, embeddings.npy, metadata) in S3.
  4. DynamoDB tracks job state, TTL, and user quotas.
  5. Search/QA requests reuse cached artifacts; only retrieval + Gemini generation run at request time.
  6. Responses propagate back through API Gateway while logs/quotas land in CloudWatch and S3.
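The retrieval half of step 5 can be illustrated with a small cosine-similarity ranking over cached vectors. This is a sketch under assumptions: the real service presumably loads `embeddings.npy` with NumPy and embeds the query with Gemini, whereas here plain Python lists stand in for both, and the function names `cosine` and `top_k` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # zero vectors match nothing


def top_k(query_vec, chunk_vecs, chunks, k=3):
    """Return the k (score, chunk) pairs most similar to the query vector.

    chunk_vecs and chunks are parallel lists: the cached embedding at
    index i belongs to the transcript chunk at index i.
    """
    scored = [(cosine(query_vec, v), c) for v, c in zip(chunk_vecs, chunks)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Because the chunk embeddings are computed once at ingestion time, a request only pays for one query embedding, this ranking pass, and the generation call.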

Infrastructure as Code (Terraform)


System Design Goals & Constraints


Whitelisted YouTube Channels

Only pre-approved educational channels may be ingested to avoid redistribution of copyrighted material. The whitelist covers MIT OpenCourseWare, Stanford Online, Harvard, Yale Courses, Khan Academy, Coursera, edX, Udacity, CrashCourse, TED, TED-Ed, TEDx Talks, 3Blue1Brown, Numberphile, Computerphile, SciShow, Veritasium, MinutePhysics, Programming with Mosh, and freeCodeCamp.org. Any other channel returns HTTP 403 together with the approved list for transparency.
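A whitelist check along these lines could look like the sketch below. It assumes channels are matched by display name, case-insensitively; the real service may well match on channel IDs instead. The function name `check_channel` and the response field names are hypothetical.

```python
APPROVED_CHANNELS = [
    "MIT OpenCourseWare", "Stanford Online", "Harvard", "Yale Courses",
    "Khan Academy", "Coursera", "edX", "Udacity", "CrashCourse", "TED",
    "TED-Ed", "TEDx Talks", "3Blue1Brown", "Numberphile", "Computerphile",
    "SciShow", "Veritasium", "MinutePhysics", "Programming with Mosh",
    "freeCodeCamp.org",
]

# Case-insensitive lookup set built once at import time.
_APPROVED_LOOKUP = {name.casefold() for name in APPROVED_CHANNELS}


def check_channel(channel_name):
    """Return (status_code, body) for an ingestion request.

    Approved channels get 200; anything else gets 403 together with the
    approved list, mirroring the behaviour described above.
    """
    if channel_name.strip().casefold() in _APPROVED_LOOKUP:
        return 200, {"allowed": True}
    return 403, {
        "detail": "Channel is not on the approved educational list.",
        "approved_channels": APPROVED_CHANNELS,
    }
```

Returning the approved list in the 403 body lets a client show users which sources they can try instead.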


Data Model


Backend Execution Model (AWS Lambda + FastAPI)


Performance, Latency & Scaling


Concurrency, Rate Limiting & Security


Idempotency & Observability


Technology Stack

| Layer | Tools |
| --- | --- |
| Frontend clients | Hugging Face Space, Streamlit, or any HTTPS app |
| Backend | FastAPI, Mangum, Pydantic, AnyIO |
| Transcripts | youtube_transcript_api, Whisper + FFmpeg fallback |
| Embeddings & LLM | Gemini + Gemini Flash |
| Storage | AWS S3 (artifacts + Lambda ZIP), DynamoDB (jobs) |
| Deployment | Terraform, AWS Lambda, API Gateway, CloudWatch Logs |

Deployment & Packaging (S3 ZIP Workflow)

  1. Run ./scripts/package_lambda.sh to build dist/chaptive-api.zip (manylinux/ARM64 container, pinned wheels, bundled nltk_data).
  2. Upload the ZIP to the S3 artifact bucket that Terraform references.
  3. Apply Terraform to refresh S3, DynamoDB, IAM, Lambda, and API Gateway; the stack pulls the new ZIP and updates env vars.

API Reference & Usage

| Method & Path | Description | Request Schema | Response Schema |
| --- | --- | --- | --- |
| POST /videos/process | Queue ingestion for a YouTube URL. | Query param url. | ProcessAccepted (job id + video id). |
| GET /videos/process/{job_id} | Poll ingestion status. | Path job_id. | ProcessStatus (state, stats, message). |
| GET /videos/{video_id}/bookmarks | Retrieve inferred sections. | Query min_sections, max_sections. | List<Bookmark>. |
| POST /videos/{video_id}/qa | Ask a grounded question. | QARequest (query, limit). | QAResponse (answer + sources). |
| GET /videos/{video_id}/summary | Summarize transcript. | Query max_words. | SummaryResponse. |
| GET /videos/{video_id}/quiz | Generate quiz items. | Query num_questions, style. | List<QuizItem>. |
| POST /search | Semantic search across cached chunks. | SearchRequest (query, video_id, limit). | List<SearchResult>. |
| GET /health | Health probe. | None. | {"status":"ok"}. |
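A client might drive the two ingestion endpoints roughly as sketched below, using only the standard library. Several details are assumptions rather than documented behaviour: the `BASE_URL` placeholder, the `job_id` field name in the ProcessAccepted response, and the `"done"` and `"failed"` state values. Authentication headers are omitted because the scheme is not described here.

```python
import json
import time
import urllib.parse
import urllib.request

# Hypothetical placeholder; replace with the real API Gateway base URL.
BASE_URL = "https://example.invalid/api"


def http_json(method, url):
    """Send a request with no body and decode the JSON response."""
    req = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def ingest_and_wait(video_url, fetch=http_json, poll_seconds=2.0, max_polls=60,
                    done_states=("done",), failed_states=("failed",)):
    """Queue a video for ingestion, then poll until the job finishes.

    fetch is injectable so the flow can be exercised without a network.
    The state names and the job_id field are assumptions.
    """
    query = urllib.parse.urlencode({"url": video_url})
    accepted = fetch("POST", f"{BASE_URL}/videos/process?{query}")
    job_id = accepted["job_id"]  # assumed field name
    for _ in range(max_polls):
        status = fetch("GET", f"{BASE_URL}/videos/process/{job_id}")
        if status["state"] in done_states:
            return status
        if status["state"] in failed_states:
            raise RuntimeError(status.get("message", "ingestion failed"))
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Once the job reports completion, the bookmark, Q&A, summary, and quiz endpoints can be called with the returned video id.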

Improvement Opportunities

  1. Graduate ingestion from ad-hoc Lambda calls to Step Functions + SQS fan-out so multiple long videos process concurrently without throttling the API.
  2. Layer a managed vector database (Qdrant or pgvector) on top of the S3 cache to enable cross-video retrieval and personalization.
  3. Add multilingual transcript normalization and translation before embedding so non-English lectures share the same Gemini prompts.
  4. Build automated regression evaluation (TruLens, Ragas, etc.) to score answer quality, hallucination risk, and latency on every deploy.
  5. Ship proactive notifications (email, Slack, webhooks) when ingest jobs finish so Hugging Face Space users can leave the page while content backfills.

Repository & References

The demo is deployed on Hugging Face Spaces. For infrastructure walkthroughs, see the GitHub repository. For the complete project source, please reach out to me.
