RAGA Logo

RAGA2024 PROJECT SHOWCASE

RAGA: Retrieval Augmented Generation Application

A project developed over the course of 2024 that competed in the Google Gemini API competition. RAG (Retrieval-Augmented Generation) is a method of customizing AI by giving it access to external knowledge and tools. This project demonstrates how multimodal data (text, audio, images, and video) can be transformed into a library you can learn from and query in natural language.

Development Timeline

The app's journey from concept to competition to what's coming next.

2023

Individual Research

A period of using, testing, and developing with emerging LLM technologies such as ChatGPT and RAG.

January 2024

Team Formation

Team formed in Attleboro, MA.

March 2024

Google Gemini API

Team received early access to the multimodal Google Gemini API key.

April 2024

5 Day Sprint - RAGA-V1

Development of foundational application with core modalities: text, audio, image and video.

May 2024

Joins Google Gemini API Developer Competition

Competing for over $1M prize money and a fully customized 1981 DeLorean.

July 2024

Second Sprint - RAGA-V2

Upgraded UI and backend capabilities.

August 2024

Google API Competition Submission

Submitted RAGATOULLIE (RAGA-V2) to the Google Gemini API competition.

October 2024

Invited to Project Astra Trusted Testers Group

Received exclusive access to test upcoming Project Astra features.

November 2024

Project Astra Testing & Competition Results

Started testing Project Astra. Competition closed and winners were announced the week of filming.

December 2024

Google DeepMind reveals Project Astra on YouTube

Google DeepMind showcases technology still in development, including Project Astra, that has not yet been publicly released.

Google Gemini API Developer Competition
Innovation
Competition
Gemini API

$1 Million Prize Pool

Google Gemini API Developer Competition with custom electric DeLorean prize

Current Status

Competition Closed

RAGATOULLIE did not win the competition but was a core piece of the Trusted Tester application for Project Astra.

Core Innovations

RAGA introduces several breakthrough technologies that set it apart from existing video analysis platforms.

RAPTOR Hierarchical Clustering

A tree-based indexing system that organizes video content semantically, creating multi-level summaries that preserve contextual relationships between concepts. This allows for more efficient searching and thematic analysis across large datasets.
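The tree-building idea can be sketched in a few lines. Real RAPTOR clusters chunk embeddings and has an LLM summarize each cluster; in this toy sketch the clustering and the summarizer are both simple stand-ins, not RAGA's actual implementation.

```python
# Sketch of RAPTOR-style hierarchical indexing (toy version).
# Real RAPTOR clusters chunk embeddings and summarizes each
# cluster with an LLM; both are replaced with stand-ins here.

def summarize(texts):
    # Stand-in for an LLM summary: first word of each member.
    return " / ".join(t.split()[0] for t in texts)

def cluster_pairs(nodes):
    # Stand-in clustering: group adjacent chunks in pairs,
    # assuming neighboring chunks are semantically related.
    return [nodes[i:i + 2] for i in range(0, len(nodes), 2)]

def build_tree(chunks):
    """Build bottom-up summary levels until one root remains."""
    levels = [chunks]
    while len(levels[-1]) > 1:
        levels.append([summarize(g) for g in cluster_pairs(levels[-1])])
    return levels  # levels[0] = leaf chunks, levels[-1] = root summary

chunks = ["intro to the video", "setup of the demo",
          "main results shown", "closing remarks given"]
tree = build_tree(chunks)
```

A query can then be matched against any level: broad, thematic questions hit the upper summaries, while specific questions descend to the leaf chunks, which is what makes searching large datasets efficient.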

Enhanced Multimodal Analysis

Integrates Google Gemini's video frame analysis with audio transcriptions from Whisper, providing a unified understanding of video content that correlates what is seen with what is said.

Intelligent Frame-Caption Pairing

Adds closed captions to video frames, significantly improving context understanding and search accuracy while facilitating intuitive human evaluation of search results.
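At its core, the pairing is a timestamp join between transcript segments and sampled frames; the real system then renders the matched caption onto the frame image. A minimal sketch of the join, with the segment format mirroring what speech-to-text tools like Whisper return:

```python
# Pair sampled video frames with the transcript segment active
# at each frame's timestamp. Segments: (start_sec, end_sec, text).

def caption_for(t, segments):
    """Return the caption spoken at time t, or '' during silence."""
    for start, end, text in segments:
        if start <= t < end:
            return text
    return ""

def pair_frames(frame_times, segments):
    return [(t, caption_for(t, segments)) for t in frame_times]

segments = [(0.0, 2.5, "Open the refrigerator."),
            (2.5, 6.0, "Check the food packaging dates.")]
frames = [0.0, 1.0, 3.0, 5.0, 7.0]  # sampled frame timestamps

for t, caption in pair_frames(frames, segments):
    print(f"{t:>4}s  {caption or '(no speech)'}")
```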

Advanced DAG-based Agent System

Built upon the GPT Researcher project, this system orchestrates specialized AI agents for complex, multi-step analysis and retrieval tasks.
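The orchestration pattern behind such a system can be illustrated with topological execution of task nodes, where each "agent" runs only after its dependencies finish. The agent names below are hypothetical, not RAGA's actual agents:

```python
# Minimal DAG task runner in the style of GPT-Researcher-style
# orchestration. Each node is an "agent" step that receives the
# outputs of its dependencies. Agent names are illustrative only.
from graphlib import TopologicalSorter

def run_dag(tasks, deps):
    """Run tasks in dependency order; return order and results."""
    order = list(TopologicalSorter(deps).static_order())
    results = {}
    for name in order:
        inputs = [results[d] for d in deps.get(name, [])]
        results[name] = tasks[name](inputs)
    return order, results

tasks = {
    "transcribe": lambda _: "transcript",
    "frames":     lambda _: "frame analysis",
    "cluster":    lambda inp: f"tree({', '.join(inp)})",
    "answer":     lambda inp: f"answer from {inp[0]}",
}
deps = {"cluster": ["transcribe", "frames"], "answer": ["cluster"]}

order, results = run_dag(tasks, deps)
```

Independent steps (here, transcription and frame analysis) can also be run concurrently, since the DAG guarantees they share no dependencies.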

Innovation Impact

These innovations combine to create a system that not only understands video content more deeply than traditional approaches but also enables entirely new ways of interacting with and extracting value from video libraries. The RAPTOR architecture in particular represents a significant advancement in how AI systems can organize and retrieve information from multimodal content.

Proof of Concept

Our approach to solving the challenge of integrating spoken content with visual context in video analysis.

Caption-Frame Integration

Our proof-of-concept approach focuses on overlaying closed captions onto video frames, creating a direct visual connection between spoken content and visual elements. This simple yet effective solution allows for:

  • Immediate context association between what is said and what is shown
  • Enhanced searchability across both visual and audio content
  • Improved accessibility for users with hearing impairments
  • Better training data for multimodal AI models

Technical Approach

We extract audio from videos using OpenAI Whisper for transcription, then process video frames with Gemini to understand visual content. RAPTOR clustering organizes this information hierarchically, allowing for semantic connections between spoken words and visual elements.
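The overall shape of that pipeline can be sketched end to end. The three functions below are stand-ins for the real calls to Whisper (transcription), Gemini (frame understanding), and RAPTOR (hierarchical indexing):

```python
# End-to-end shape of the described pipeline. Each stage is a
# stand-in: the real system calls Whisper, Gemini, and RAPTOR.

def transcribe(video):              # stand-in for Whisper
    return [(0.0, 3.0, "Hello and welcome.")]

def describe_frames(video):         # stand-in for Gemini vision
    return [(0.0, "presenter on stage"), (2.0, "slide with title")]

def build_index(segments, frames):  # stand-in for RAPTOR indexing
    # Join each frame description with the caption active at its
    # timestamp, producing searchable multimodal entries.
    return [f"{t}s: {desc} | said: {text}"
            for t, desc in frames
            for start, end, text in segments
            if start <= t < end]

index = build_index(transcribe("talk.mp4"), describe_frames("talk.mp4"))
```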

Caption-Frame Integration Example showing refrigerator with synchronized captions

Example: Video frame with caption overlay showing spoken instructions about food packaging

Demonstrations

Explore RAGA's capabilities through our demonstration videos, UI mockups, and actual application screenshots.

RAGATOULLIE v1: Early Concept
Alpha
RAGATOULLIE v1 conversation interface showing early concept discussions

A "hall of fame" chat screenshot showing a prompt and its answers.

Early Prototype
Concept Development
Mission Statement
RAGA v2: Production UI
Beta
RAGA v2 user interface showing the chat and documentation panels

The refined RAGA interface combines a conversational AI with comprehensive documentation and file handling capabilities. Users can ask questions about video content while accessing detailed information about the platform's capabilities and supported file formats.

Production UI
File Support
Documentation
Caption-Frame Integration
Core Feature
RAGA caption-frame integration showing wedding vows transcription

One of RAGA's breakthrough features is the intelligent frame-caption pairing, which overlays transcribed text onto video frames. This example shows the system analyzing wedding vows, demonstrating how it connects spoken content with visual context for enhanced searchability.

Multimodal Analysis
Speech Recognition
Context Understanding
Development Environment
Behind the Scenes
RAGA development environment showing code and video processing

A glimpse into the development process of RAGA, showing the team working on the "raga-blast-redux" codebase. The image captures real-time video processing, Python backend development, and team collaboration.

Flask Backend
Python Development
Video Processing

Development Process

A behind-the-scenes look at how RAGA was built, from planning and architecture to implementation and testing.

Sprint Planning Methodology

The RAGA team employed a structured but casual planning methodology with clearly defined objectives and responsibilities for each team member. This approach accelerated knowledge sharing while promoting an organic, play-oriented development process.

Key Planning Elements

  • Weekly Recordings

Preparation before the sprint to build team spirit and chemistry.

  • Data Collection

Free-flowing recordings were collected.

  • Open Topic Exploration

    The team embraced a flexible approach to content topics, covering philosophy, tech, AI, and science to ensure diverse testing scenarios.

Team sprint planning notebook from March 2024

March 2024 Sprint Planning

Detailed sprint planning notebook showing weekly objectives and team member assignments

Our Team

Meet the dedicated individuals behind RAGA who combined their diverse expertise to create this innovative video intelligence platform.

The RAGATOULLIE team - Kevin, Josh, and Jake

Kevin

Left

Josh

Center

Jake

Right

Kevin Papargiris
School Counselor

Kevin is a dedicated school counselor in New Hampshire with a diverse background in education and a passion for supporting students' holistic growth. Originally from Attleboro, Massachusetts, Kevin grew up on Tanglewood Drive alongside Josh Ogden.

Josh Ogden
Insurance Underwriter & Developer

Josh moved from Massachusetts to Los Angeles in 2016 to start his career as a property insurance underwriter. He has a passion for learning, entrepreneurship, and organizing things.

Jake Eid
Software Developer

Jake has been a software developer throughout his entire career, specializing in healthcare-related SaaS companies. He's passionate about Web3 and its potential to give users true autonomy over their data while reimagining traditional institutions.

GitHub Repository

The RAGA codebase is currently maintained as a private repository while the team evaluates its plans for RAGA-v3.

GitHub
joshogden360/raga-v1
Public

RAGA v1: The initial version of the Retrieval Augmented Generation Application

machine-learning
rag
retrieval
GitHub
joshogden360/raga-v2
Public

RAGA v2: The enhanced version with improved multimodal analysis and RAPTOR integration

machine-learning
gemini-api
raptor
video-intelligence