RAGA 2024 PROJECT SHOWCASE
RAGA: Retrieval Augmented Generation Application
A project developed over the course of 2024 that competed in the Google Gemini API competition.
RAG (Retrieval-Augmented Generation) is a method of customizing AI by giving it access to external knowledge and tools.
This project demonstrates how multimodal data (text, audio, images, and video) can be transformed into a library you can learn from and pull information out of by asking questions in natural language.
Development Timeline
The app's journey from concept to competition, and what's coming next.
Individual Research
A period of using, testing, and building with emerging LLM technologies like ChatGPT and RAG.
Team Formation
Team formed in Attleboro, MA.
Google Gemini API
The team received early access to an API key for the multimodal Google Gemini API.
5 Day Sprint - RAGA-V1
Development of the foundational application with core modalities: text, audio, image, and video.
Joins Google Gemini API Developer Competition
Competing for over $1M in prize money and a fully customized 1981 DeLorean.
Second Sprint - RAGA-V2
Upgraded UI and backend capabilities.
Google API Competition Submission
Submitted RAGATOULLIE (RAGA-V2) to the Google Gemini API competition.
Invited to Project Astra Trusted Testers Group
Received exclusive access to test upcoming Project Astra features.
Project Astra Testing & Competition Results
Started testing Project Astra. The competition closed and winners were announced the week of filming.
Google DeepMind reveals Project Astra on YouTube
Google DeepMind showcases the technology they're developing that has not yet been publicly released, including Project Astra.

$1 Million Prize Pool
Google Gemini API Developer Competition, with a custom electric DeLorean as the grand prize.
Current Status
RAGATOULLIE did not win the competition but was a core piece of the Trusted Tester application for Project Astra.
Core Innovations
RAGA introduces several breakthrough technologies that set it apart from existing video analysis platforms.
A tree-based indexing system that organizes video content semantically, creating multi-level summaries that preserve contextual relationships between concepts. This allows for more efficient searching and thematic analysis across large datasets.
Integrates Google Gemini's video frame analysis with audio transcriptions from Whisper, providing a unified understanding of video content that correlates what is seen with what is said.
Adds closed captions to video frames, significantly improving context understanding and search accuracy while facilitating intuitive human evaluation of search results.
Built upon the GPT Researcher project, this system orchestrates specialized AI agents for complex, multi-step analysis and retrieval tasks.
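To illustrate how such an agent layer can be invoked, here is a minimal sketch using the public gpt-researcher Python interface; the query and report type below are placeholders and may differ from how RAGA actually orchestrates its agents.

```python
# Minimal sketch of driving a GPT Researcher agent (illustrative only).
# Assumes the public `gpt-researcher` package and its documented async API;
# the query and report_type below are placeholders, not RAGA's actual setup.
import asyncio
from gpt_researcher import GPTResearcher

async def run_research(question: str) -> str:
    researcher = GPTResearcher(query=question, report_type="research_report")
    await researcher.conduct_research()     # agents gather and filter sources
    return await researcher.write_report()  # synthesize findings into a report

if __name__ == "__main__":
    print(asyncio.run(run_research("Summarize the key themes across the video library")))
```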
Innovation Impact
These innovations combine to create a system that not only understands video content more deeply than traditional approaches but also enables entirely new ways of interacting with and extracting value from video libraries. The RAPTOR architecture in particular represents a significant advancement in how AI systems can organize and retrieve information from multimodal content.
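As a concrete sketch of the RAPTOR-style indexing described above: content chunks are embedded, clustered, and each cluster is summarized into a parent node, recursively, until only a small set of top-level summaries remains. The embed and summarize callables here are stand-ins for whatever embedding model and LLM the pipeline actually uses.

```python
# Minimal sketch of RAPTOR-style hierarchical indexing (illustrative only).
# `embed` and `summarize` are assumed callables: an embedding model returning
# vectors and an LLM-backed summarizer; the real pipeline may differ.
import numpy as np
from sklearn.cluster import KMeans

def build_tree(chunks, embed, summarize, branching=4, max_levels=3):
    """Recursively cluster chunks and summarize each cluster into a parent node."""
    levels = [chunks]
    current = chunks
    for _ in range(max_levels):
        if len(current) <= branching:
            break
        vectors = np.array([embed(c) for c in current])
        k = max(2, len(current) // branching)
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(vectors)
        # Each cluster is condensed into one summary node at the next level up.
        parents = [summarize("\n".join(c for c, l in zip(current, labels) if l == cid))
                   for cid in range(k)]
        levels.append(parents)
        current = parents
    return levels  # levels[0] = raw chunks, levels[-1] = top-level summaries
```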
Proof of Concept
Our approach to solving the challenge of integrating spoken content with visual context in video analysis.
Caption-Frame Integration
Our proof-of-concept approach focuses on overlaying closed captions onto video frames, creating a direct visual connection between spoken content and visual elements (a short code sketch follows the list below). This simple yet effective solution allows for:
- Immediate context association between what is said and what is shown
- Enhanced searchability across both visual and audio content
- Improved accessibility for users with hearing impairments
- Better training data for multimodal AI models
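A minimal sketch of the overlay step, assuming frames are read with OpenCV and captions come from Whisper-style segments carrying start and end timestamps; the drawing style here is simplified relative to the actual application.

```python
# Minimal sketch of caption-frame overlay (illustrative only).
# Assumes OpenCV frames (BGR numpy arrays) and Whisper-style transcript
# segments: [{"start": float, "end": float, "text": str}, ...].
import cv2

def caption_for_time(segments, t):
    """Return the transcript text spoken at time t (seconds), if any."""
    for seg in segments:
        if seg["start"] <= t <= seg["end"]:
            return seg["text"].strip()
    return ""

def overlay_caption(frame, caption):
    """Draw the caption on a dark strip along the bottom of the frame."""
    h, w = frame.shape[:2]
    cv2.rectangle(frame, (0, h - 60), (w, h), (0, 0, 0), thickness=-1)
    cv2.putText(frame, caption, (20, h - 25), cv2.FONT_HERSHEY_SIMPLEX,
                0.8, (255, 255, 255), 2, cv2.LINE_AA)
    return frame
```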
Technical Approach
We extract audio from videos using OpenAI Whisper for transcription, then process video frames with Gemini to understand visual content. RAPTOR clustering organizes this information hierarchically, allowing for semantic connections between spoken words and visual elements.
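The sketch below ties those steps together, assuming the whisper and google-generativeai Python packages; the model names, the one-frame-per-segment sampling, and the output shape are our assumptions rather than the production configuration.

```python
# Minimal sketch of the Whisper + Gemini pipeline (illustrative only).
# Assumes the `whisper` and `google-generativeai` packages; model names and
# frame-sampling strategy are assumptions, not RAGA's exact configuration.
import cv2
import whisper
import google.generativeai as genai
from PIL import Image

def analyze_video(video_path, api_key):
    # 1. Transcribe the audio track with Whisper (timestamped segments).
    transcript = whisper.load_model("base").transcribe(video_path)

    # 2. Describe one frame per spoken segment with Gemini.
    genai.configure(api_key=api_key)
    model = genai.GenerativeModel("gemini-1.5-flash")
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    records = []
    for seg in transcript["segments"]:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(seg["start"] * fps))
        ok, frame = cap.read()
        if not ok:
            continue
        image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        seen = model.generate_content(
            ["Describe what is visible in this frame.", image]).text
        records.append({"start": seg["start"],
                        "said": seg["text"].strip(),
                        "seen": seen})
    cap.release()
    return records  # paired said/seen records, ready for hierarchical indexing
```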

Example: Video frame with caption overlay showing spoken instructions about food packaging
Demonstrations
Explore RAGA's capabilities through our demonstration videos, UI mockups, and actual application screenshots.

Hall of Fame chat screenshot showing a prompt and its answers.

The refined RAGA interface combines a conversational AI with comprehensive documentation and file handling capabilities. Users can ask questions about video content while accessing detailed information about the platform's capabilities and supported file formats.

One of RAGA's breakthrough features is the intelligent frame-caption pairing, which overlays transcribed text onto video frames. This example shows the system analyzing wedding vows, demonstrating how it connects spoken content with visual context for enhanced searchability.

A glimpse into the development process of RAGA, showing the team working on the "raga-blast-redux" codebase. The image captures real-time video processing, Python backend development, and team collaboration.
Development Process
A behind-the-scenes look at how RAGA was built, from planning and architecture to implementation and testing.
Sprint Planning Methodology
The RAGA team employed a structured but casual planning methodology with clearly defined objectives and responsibilities for each team member. This approach allowed for accelerated knowledge sharing while promoting an organic, play-oriented development process.
Key Planning Elements
Weekly Recordings
Preparation before the sprint to unify team spirit and chemistry.
Data Collection
Free-flow recordings were collected.
Open Topic Exploration
The team embraced a flexible approach to content topics, covering philosophy, tech, AI, and science to ensure diverse testing scenarios.

March 2024 Sprint Planning
Detailed sprint planning notebook showing weekly objectives and team member assignments
Our Team
Meet the dedicated individuals behind RAGA who combined their diverse expertise to create this innovative video intelligence platform.

Kevin (left), Josh (center), Jake (right)
Kevin is a dedicated school counselor in New Hampshire with a diverse background in education and a passion for supporting students' holistic growth. Originally from Attleboro, Massachusetts, Kevin grew up on Tanglewood Drive alongside Josh Ogden.
Josh moved from Massachusetts to Los Angeles in 2016 to start his career as a property insurance underwriter. He has a passion for learning, entrepreneurship, and organizing stuff.
Jake has been a software developer throughout his entire career, specializing in healthcare-related SaaS companies. He's passionate about Web3 and its potential to give users true autonomy over their data while reimagining traditional institutions.
GitHub Repository
The RAGA codebase is currently maintained as a private repository while the team evaluates its plans for RAGA-V3.

RAGA v1: The initial version of the Retrieval Augmented Generation Application

RAGA v2: The enhanced version with improved multimodal analysis and RAPTOR integration