A high-performance backend for codebase retrieval-augmented generation (RAG) that lets you ask natural-language questions about your codebase and get accurate, contextual answers.
- Superior Performance: Outperforms existing open-source alternatives with faster retrieval and more accurate results
- Semantic Code Understanding: Leverages advanced embeddings to understand code semantics
- Cross-Reference Awareness: Maintains awareness of relationships between files and functions
- Contextual Answers: Provides answers with relevant code snippets and references
- Efficient Caching: Smart caching system minimizes redundant processing
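One way a cache can avoid redundant processing is to key results on a hash of the file content, so unchanged files are never reprocessed. The sketch below is only an illustration of that idea; `ContentHashCache` and `get_or_compute` are hypothetical names and are not taken from this project's `cache.py`.

```python
import hashlib


class ContentHashCache:
    """Toy in-memory cache keyed by a SHA-256 hash of the content (hypothetical)."""

    def __init__(self):
        self._store = {}
        self.misses = 0  # counts how often the expensive step actually ran

    def get_or_compute(self, content: str, compute):
        # Identical content always maps to the same key, so it is processed once.
        key = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = compute(content)
        return self._store[key]


cache = ContentHashCache()
source = "def add(a, b):\n    return a + b\n"

# `len` stands in for an expensive step such as parsing or embedding.
first = cache.get_or_compute(source, len)
second = cache.get_or_compute(source, len)  # served from the cache
```

A real cache would likely persist entries to disk and also key on the embedding model or parser version, so that changing either invalidates stale results.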
- Python 3.10+
- Git
- Clone the repository:

      git clone https://github.com/yourusername/codebase-rag.git
      cd codebase-rag

- Install dependencies:

      pip install -r requirements.txt

Run the application with one of the following commands:

    python src/main.py local
    python src/main.py webserver
    python src/main.py local --refresh_cache
    python src/main.py local --config /path/to/config

Edit the src/config.yaml file to customize:
- Codebase path
- Webserver parameters
- LLM parameters
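For readers curious how the command-line options shown above might be handled, here is a minimal `argparse` sketch. It mirrors only the mode names and flags that appear in the commands; the `build_parser` function, the help texts, and the default of `src/config.yaml` for `--config` are assumptions, not the project's actual `main.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented commands."""
    parser = argparse.ArgumentParser(prog="main.py")
    # The two run modes that appear in the commands above.
    parser.add_argument("mode", choices=["local", "webserver"],
                        help="run interactively or start the web server")
    parser.add_argument("--refresh_cache", action="store_true",
                        help="discard cached results and rebuild them")
    # Assumed default: the config file this README says to edit.
    parser.add_argument("--config", default="src/config.yaml",
                        help="path to an alternative config file")
    return parser


args = build_parser().parse_args(["local", "--refresh_cache"])
print(args.mode, args.refresh_cache, args.config)
# prints: local True src/config.yaml
```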
CodebaseRAG consists of several key components:
- Code Parser (codeparser.py): Analyzes and extracts structured information from source code
- Embeddings Engine (embeddings.py): Creates semantic representations of code
- Vector Store (vector_store.py): Efficiently indexes and retrieves relevant code snippets
- LLM Interface (llm.py): Generates human-readable answers from retrieved context
- Caching Layer (cache.py): Optimizes performance through intelligent caching
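To make the flow between these components concrete, the following toy sketch chains a parser, an embedding step, and a vector store using only the standard library. It is an assumption-laden illustration, not the project's implementation: every name here (`parse_functions`, `embed`, `cosine`, `VectorStore`) is hypothetical, and bag-of-words counts stand in for the learned embeddings a real system would use.

```python
import ast
import math
import re
from collections import Counter


def parse_functions(source: str) -> dict[str, str]:
    """Code Parser stand-in: map each top-level function name to its source."""
    tree = ast.parse(source)
    return {
        node.name: ast.get_source_segment(source, node)
        for node in tree.body
        if isinstance(node, ast.FunctionDef)
    }


def embed(text: str) -> Counter:
    """Embeddings stand-in: word counts instead of a learned model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class VectorStore:
    """Vector Store stand-in: brute-force similarity search over all items."""

    def __init__(self):
        self._items = []

    def add(self, name: str, snippet: str) -> None:
        self._items.append((name, snippet, embed(snippet)))

    def search(self, query: str, k: int = 1):
        q = embed(query)
        ranked = sorted(self._items, key=lambda it: cosine(q, it[2]), reverse=True)
        return [(name, snippet) for name, snippet, _ in ranked[:k]]


SOURCE = '''
def load_config(path):
    """Read the yaml config file from disk."""
    return open(path).read()

def send_email(recipient, body):
    """Send an email message to the recipient."""
    print(recipient, body)
'''

store = VectorStore()
for name, snippet in parse_functions(SOURCE).items():
    store.add(name, snippet)

top = store.search("where is the config file read?", k=1)
print(top[0][0])
# prints: load_config
```

In a full pipeline, the retrieved snippets would then be passed to the LLM interface as context for generating the answer; that step is omitted here because it depends on the chosen model and provider.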
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.