RAGA 2024 PROJECT SHOWCASE
RAGA: Retrieval Augmented Generation Application
A project developed over the course of 2024 that competed in the Google Gemini API competition.
RAG (Retrieval-Augmented Generation) is a method of customizing AI by giving it access to external knowledge and tools.
This project demonstrates how multimodal data (text, audio, images, and video) can be transformed into a library you can learn from and pull information out of by asking questions in natural language.
Development Timeline
The app's journey from concept to competition, and what's coming next.
Individual Research
A period of using, testing, and building with emerging LLM technologies like ChatGPT and RAG.
Team Formation
Team formed in Attleboro, MA.
Google Gemini API
The team received early access to an API key for the multimodal Google Gemini API.
5 Day Sprint - RAGA-V1
Development of the foundational application with core modalities: text, audio, image, and video.
Joins Google Gemini API Developer Competition
Competing for over $1M in prize money and a fully customized 1981 DeLorean.
Second Sprint - RAGA-V2
Upgraded UI and backend capabilities.
Google API Competition Submission
Submitted RAGATOULLIE (RAGA-V2) to the Google Gemini API competition.
Invited to Project Astra Trusted Testers Group
Received exclusive access to test upcoming Project Astra features.
Project Astra Testing & Competition Results
Started testing Project Astra. The competition closed and winners were announced the week of filming.
Google DeepMind reveals Project Astra on YouTube
Google DeepMind showcases the technology they're developing that has not yet been publicly released, including Project Astra.

$1 Million Prize Pool
Google Gemini API Developer Competition, with a custom electric DeLorean as the grand prize.
Current Status
RAGATOULLIE did not win the competition but was a core piece of the Trusted Tester application for Project Astra.
Core Innovations
RAGA introduces several breakthrough technologies that set it apart from existing video analysis platforms.
A tree-based indexing system that organizes video content semantically, creating multi-level summaries that preserve contextual relationships between concepts. This allows for more efficient searching and thematic analysis across large datasets.
Integrates Google Gemini's video frame analysis with audio transcriptions from Whisper, providing a unified understanding of video content that correlates what is seen with what is said.
Adds closed captions to video frames, significantly improving context understanding and search accuracy while facilitating intuitive human evaluation of search results.
Built upon the GPT Researcher project, this system orchestrates specialized AI agents for complex, multi-step analysis and retrieval tasks.
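To illustrate how such an agent layer can be invoked, here is a minimal sketch using the public gpt-researcher Python interface; the query and report type below are placeholders and may differ from how RAGA actually orchestrates its agents.

```python
# Minimal sketch of driving a GPT Researcher agent (illustrative only).
# Assumes the public `gpt-researcher` package and its documented async API;
# the query and report_type below are placeholders, not RAGA's actual setup.
import asyncio
from gpt_researcher import GPTResearcher

async def run_research(question: str) -> str:
    researcher = GPTResearcher(query=question, report_type="research_report")
    await researcher.conduct_research()     # agents gather and filter sources
    return await researcher.write_report()  # synthesize findings into a report

if __name__ == "__main__":
    print(asyncio.run(run_research("Summarize the key themes across the video library")))
```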
Innovation Impact
These innovations combine to create a system that not only understands video content more deeply than traditional approaches but also enables entirely new ways of interacting with and extracting value from video libraries. The RAPTOR architecture in particular represents a significant advancement in how AI systems can organize and retrieve information from multimodal content.
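As a concrete sketch of the RAPTOR-style indexing described above: content chunks are embedded, clustered, and each cluster is summarized into a parent node, recursively, until only a small set of top-level summaries remains. The embed and summarize callables here are stand-ins for whatever embedding model and LLM the pipeline actually uses.

```python
# Minimal sketch of RAPTOR-style hierarchical indexing (illustrative only).
# `embed` and `summarize` are assumed callables: an embedding model returning
# vectors and an LLM-backed summarizer; the real pipeline may differ.
import numpy as np
from sklearn.cluster import KMeans

def build_tree(chunks, embed, summarize, branching=4, max_levels=3):
    """Recursively cluster chunks and summarize each cluster into a parent node."""
    levels = [chunks]
    current = chunks
    for _ in range(max_levels):
        if len(current) <= branching:
            break
        vectors = np.array([embed(c) for c in current])
        k = max(2, len(current) // branching)
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(vectors)
        # Each cluster is condensed into one summary node at the next level up.
        parents = [summarize("\n".join(c for c, l in zip(current, labels) if l == cid))
                   for cid in range(k)]
        levels.append(parents)
        current = parents
    return levels  # levels[0] = raw chunks, levels[-1] = top-level summaries
```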
Proof of Concept
Our approach to solving the challenge of integrating spoken content with visual context in video analysis.
Caption-Frame Integration
Our proof-of-concept approach focuses on overlaying closed captions onto video frames, creating a direct visual connection between spoken content and visual elements (a short code sketch follows the list below). This simple yet effective solution allows for:
- Immediate context association between what is said and what is shown
- Enhanced searchability across both visual and audio content
- Improved accessibility for users with hearing impairments
- Better training data for multimodal AI models
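A minimal sketch of the overlay step, assuming frames are read with OpenCV and captions come from Whisper-style segments carrying start and end timestamps; the drawing style here is simplified relative to the actual application.

```python
# Minimal sketch of caption-frame overlay (illustrative only).
# Assumes OpenCV frames (BGR numpy arrays) and Whisper-style transcript
# segments: [{"start": float, "end": float, "text": str}, ...].
import cv2

def caption_for_time(segments, t):
    """Return the transcript text spoken at time t (seconds), if any."""
    for seg in segments:
        if seg["start"] <= t <= seg["end"]:
            return seg["text"].strip()
    return ""

def overlay_caption(frame, caption):
    """Draw the caption on a dark strip along the bottom of the frame."""
    h, w = frame.shape[:2]
    cv2.rectangle(frame, (0, h - 60), (w, h), (0, 0, 0), thickness=-1)
    cv2.putText(frame, caption, (20, h - 25), cv2.FONT_HERSHEY_SIMPLEX,
                0.8, (255, 255, 255), 2, cv2.LINE_AA)
    return frame
```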
Technical Approach
We extract audio from videos using OpenAI Whisper for transcription, then process video frames with Gemini to understand visual content. RAPTOR clustering organizes this information hierarchically, allowing for semantic connections between spoken words and visual elements.
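The sketch below ties those steps together, assuming the whisper and google-generativeai Python packages; the model names, the one-frame-per-segment sampling, and the output shape are our assumptions rather than the production configuration.

```python
# Minimal sketch of the Whisper + Gemini pipeline (illustrative only).
# Assumes the `whisper` and `google-generativeai` packages; model names and
# frame-sampling strategy are assumptions, not RAGA's exact configuration.
import cv2
import whisper
import google.generativeai as genai
from PIL import Image

def analyze_video(video_path, api_key):
    # 1. Transcribe the audio track with Whisper (timestamped segments).
    transcript = whisper.load_model("base").transcribe(video_path)

    # 2. Describe one frame per spoken segment with Gemini.
    genai.configure(api_key=api_key)
    model = genai.GenerativeModel("gemini-1.5-flash")
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    records = []
    for seg in transcript["segments"]:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(seg["start"] * fps))
        ok, frame = cap.read()
        if not ok:
            continue
        image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        seen = model.generate_content(
            ["Describe what is visible in this frame.", image]).text
        records.append({"start": seg["start"],
                        "said": seg["text"].strip(),
                        "seen": seen})
    cap.release()
    return records  # paired said/seen records, ready for hierarchical indexing
```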

Example: Video frame with caption overlay showing spoken instructions about food packaging
Demonstrations
Explore RAGA's capabilities through our demonstration videos, UI mockups, and actual application screenshots.

Hall of Fame chat screenshot showing a prompt and its answers.

The refined RAGA interface combines a conversational AI with comprehensive documentation and file handling capabilities. Users can ask questions about video content while accessing detailed information about the platform's capabilities and supported file formats.

One of RAGA's breakthrough features is the intelligent frame-caption pairing, which overlays transcribed text onto video frames. This example shows the system analyzing wedding vows, demonstrating how it connects spoken content with visual context for enhanced searchability.

A glimpse into the development process of RAGA, showing the team working on the "raga-blast-redux" codebase. The image captures real-time video processing, Python backend development, and team collaboration.
Development Process
A behind-the-scenes look at how RAGA was built, from planning and architecture to implementation and testing.
Sprint Planning Methodology
The RAGA team employed a structured but casual planning methodology with clearly defined objectives and responsibilities for each team member. This approach allowed for accelerated knowledge sharing while promoting an organic, play-oriented development process.
Key Planning Elements
Weekly Recordings
Preparation before the sprint to unify team spirit and chemistry.
Data Collection
Free-flow recordings were collected.
Open Topic Exploration
The team embraced a flexible approach to content topics, covering philosophy, tech, AI, and science to ensure diverse testing scenarios.

March 2024 Sprint Planning
Detailed sprint planning notebook showing weekly objectives and team member assignments
Our Team
Meet the dedicated individuals behind RAGA who combined their diverse expertise to create this innovative video intelligence platform.

Kevin (left), Josh (center), Jake (right)
Kevin is a dedicated school counselor in New Hampshire with a diverse background in education and a passion for supporting students' holistic growth. Originally from Attleboro, Massachusetts, Kevin grew up on Tanglewood Drive alongside Josh Ogden.
Josh moved from Massachusetts to Los Angeles in 2016 to start his career as a property insurance underwriter. He has a passion for learning, entrepreneurship, and organizing stuff.
Jake has been a software developer throughout his entire career, specializing in healthcare-related SaaS companies. He's passionate about Web3 and its potential to give users true autonomy over their data while reimagining traditional institutions.
GitHub Repository
The RAGA codebase is currently maintained as a private repository while the team evaluates its plans for RAGA-V3.

RAGA v1: The initial version of the Retrieval Augmented Generation Application

RAGA v2: The enhanced version with improved multimodal analysis and RAPTOR integration