πŸ€– Complete Guide: Local AI Coding Agents on RTX 3060

A comprehensive setup guide for running AI coding assistants locally on an RTX 3060 with 16GB RAM and a 100GB SSD


πŸ“‹ Table of Contents

  1. Hardware Assessment
  2. Quick Start Recommendations
  3. Option 1: Ollama + Continue.dev
  4. Option 2: LM Studio + Continue.dev
  5. Option 3: Tabby Self-Hosted
  6. Option 4: Additional Tools
  7. Model Recommendations
  8. Performance Optimization
  9. Storage Management
  10. Troubleshooting Guide
  11. Advanced Configuration
  12. Community Resources
  13. Official Links & Documentation

πŸ”§ Hardware Assessment

Your RTX 3060 Setup Analysis

  • GPU: RTX 3060 12GB VRAM βœ… Excellent for 7B models
  • RAM: 16GB System RAM βœ… Sufficient for local AI
  • Storage: 100GB SSD ⚠️ Tight but manageable

What You Can Run

| Model Size | Performance | VRAM Usage | Recommended |
|---|---|---|---|
| 7B models | Excellent | 6-8GB | ✅ Best choice |
| 13B models | Good (Q4) | 8-10GB | ✅ With optimization |
| 20B+ models | Poor/impossible | 12GB+ | ❌ Not recommended |
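The table reflects a simple rule of thumb: quantized weights take roughly (parameters × bits per weight) / 8 bytes, plus a flat allowance for the KV cache and runtime. A rough sketch (both constants are assumptions, not measurements):

```python
def estimated_vram_gb(params_billion: float, bits_per_weight: float = 5.0,
                      overhead_gb: float = 2.0) -> float:
    """Rough VRAM estimate for a quantized model: weight bytes
    (params * bits / 8) plus a fixed allowance for KV cache and runtime.
    The default 5 bits/weight approximates Q4_K_M; tune to taste."""
    weight_gb = params_billion * bits_per_weight / 8  # billions of bytes = GB
    return round(weight_gb + overhead_gb, 1)
```

A 6.7B model lands in the table's 6-7GB band; a 13B model pushes toward the 12GB ceiling, which is why Q4 quantization is required there.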

πŸš€ Quick Start Recommendations

For Beginners: LM Studio + Continue.dev

  • GUI-based setup
  • No command line required
  • Easy model management
  • Setup time: 15-20 minutes

For Advanced Users: Ollama + Continue.dev

  • Command line control
  • Better performance
  • More flexibility
  • Setup time: 10-15 minutes

For Enterprise: Tabby

  • Self-hosted GitHub Copilot alternative
  • Professional features
  • Multi-IDE support
  • Setup time: 30-45 minutes

πŸ¦™ Option 1: Ollama + Continue.dev (Recommended)

Why This Combination?

  • Free and open source
  • Excellent performance on RTX 3060
  • Regular model updates
  • Strong community support
  • Works offline

Step 1: Install Ollama

Windows Installation

  1. Download Ollama from https://ollama.com/download and run the installer

  2. Verify Installation

    ollama --version
  3. Start Ollama Service

    ollama serve

    Keep this terminal open

Step 2: Download Coding Models

Best Models for RTX 3060

# Primary coding model (Recommended)
ollama pull deepseek-coder:6.7b

# Alternative excellent options
ollama pull qwen2.5-coder:7b
ollama pull codellama:7b
ollama pull starcoder2:7b

# For specific languages
ollama pull deepseek-coder:6.7b-instruct  # Better for chat
ollama pull codellama:7b-python           # Python specialist

Model Download Sizes

  • deepseek-coder:6.7b β†’ ~4.1GB
  • qwen2.5-coder:7b β†’ ~4.4GB
  • codellama:7b β†’ ~3.8GB
  • starcoder2:7b β†’ ~4.0GB
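On a 100GB drive it's worth checking free space before pulling. A small sketch (model sizes copied from the list above; the 10GB headroom is an assumption):

```python
import shutil

MODEL_SIZES_GB = {  # approximate download sizes from the list above
    "deepseek-coder:6.7b": 4.1,
    "qwen2.5-coder:7b": 4.4,
    "codellama:7b": 3.8,
    "starcoder2:7b": 4.0,
}

def can_fit(model: str, path: str = ".", headroom_gb: float = 10.0) -> bool:
    """True if the drive holding `path` can take the model download
    while keeping `headroom_gb` free for the system."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb - MODEL_SIZES_GB[model] >= headroom_gb
```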

Step 3: Install Continue.dev

VS Code Installation

  1. Open VS Code
  2. Go to Extensions (Ctrl+Shift+X)
  3. Search: "Continue"
  4. Install: Continue extension by Continue
  5. Restart VS Code

JetBrains Installation

  1. Open your JetBrains IDE
  2. Go to: File β†’ Settings β†’ Plugins
  3. Search: "Continue"
  4. Install and restart

Step 4: Configure Continue.dev

  1. Open Continue Panel

    • VS Code: Click Continue icon in sidebar
    • Or press Ctrl+Shift+P β†’ "Continue: Open"
  2. Configure Ollama Connection

    • Click gear icon βš™οΈ
    • Add configuration:
    {
      "models": [
        {
          "title": "DeepSeek Coder",
          "provider": "ollama",
          "model": "deepseek-coder:6.7b",
          "apiBase": "http://localhost:11434"
        }
      ]
    }
  3. Test Connection

    • Type a coding question in Continue chat
    • Should respond within 5-10 seconds
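A quick way to verify the Ollama side independently of the editor is to list installed models over the API; a stdlib-only sketch:

```python
import json
import urllib.request

def model_names(tags_json: dict) -> list[str]:
    """Extract model names from an Ollama /api/tags response body."""
    return [m["name"] for m in tags_json.get("models", [])]

def installed_models(base: str = "http://localhost:11434") -> list[str]:
    """Query the local Ollama server for its installed models."""
    with urllib.request.urlopen(f"{base}/api/tags", timeout=5) as resp:
        return model_names(json.load(resp))

if __name__ == "__main__":
    print(installed_models())  # should include deepseek-coder:6.7b
```

If this raises a connection error, `ollama serve` is not running; if the list is empty, the pull step was skipped.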

Step 5: Usage Examples

Code Completion

  • Type code and press Tab for suggestions
  • Use Ctrl+I for inline editing

Chat Examples

"Explain this function"
"Add error handling to this code"
"Convert this to TypeScript"
"Write unit tests for this function"
"Optimize this algorithm"

πŸ–₯️ Option 2: LM Studio + Continue.dev (Beginner Friendly)

Why LM Studio?

  • User-friendly GUI
  • No command line needed
  • Built-in model browser
  • Easy model management
  • Real-time performance monitoring

Step 1: Install LM Studio

  1. Download LM Studio from https://lmstudio.ai and run the installer

  2. System Requirements Check

    • LM Studio will automatically detect your GPU
    • Verify RTX 3060 is recognized

Step 2: Download Models via LM Studio

  1. Browse Models

    • Click "Discover" tab
    • Search for coding models
  2. Recommended Downloads

    • DeepSeek Coder 6.7B Instruct Q4_K_M ← Best choice
    • Code Llama 7B Instruct Q4_K_M
    • Qwen2.5 Coder 7B Instruct Q4_K_M
  3. Download Process

    • Click download button
    • Monitor download progress
    • Models saved automatically

Step 3: Load and Test Model

  1. Load Model

    • Go to "Chat" tab
    • Select downloaded model
    • Click "Load Model"
    • Wait for loading (30-60 seconds)
  2. Test Model

    Test prompt: "Write a Python function to calculate fibonacci numbers"
    
  3. Monitor Performance

    • Check GPU utilization
    • Note tokens/second speed
    • Verify VRAM usage < 11GB

Step 4: Start Local Server

  1. Enable Server

    • Open LM Studio's "Local Server" tab and click Start Server

  2. Server Settings

    • Port: 1234 (default)
    • CORS: Enabled
    • API: OpenAI Compatible

Step 5: Connect Continue.dev

  1. Install Continue.dev (same as Option 1)

  2. Configure for LM Studio

    {
      "models": [
        {
          "title": "DeepSeek Coder (LM Studio)",
          "provider": "openai",
          "model": "deepseek-coder",
          "apiBase": "http://localhost:1234/v1",
          "apiKey": "not-needed"
        }
      ]
    }

🏒 Option 3: Tabby Self-Hosted (Enterprise Grade)

Why Tabby?

  • GitHub Copilot alternative
  • Self-hosted and private
  • Multi-IDE support
  • Team collaboration features
  • Enterprise security

Prerequisites

  • Docker Desktop installed
  • NVIDIA Container Toolkit
  • Basic Docker knowledge

Step 1: Install Docker & NVIDIA Support

Install Docker Desktop

  1. Download from: https://www.docker.com/products/docker-desktop/
  2. Install and restart computer
  3. Enable WSL 2 backend

Install NVIDIA Container Toolkit

# On Windows, Docker Desktop's WSL 2 backend provides GPU access:
# install the latest NVIDIA driver and enable WSL integration in Docker Desktop.
# The NVIDIA Container Toolkit install guide applies to native Linux hosts:
# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html

Step 2: Run Tabby Server

Basic Setup

docker run -it --gpus all \
  -p 8080:8080 \
  -v ~/.tabby:/data \
  tabbyml/tabby serve \
  --model TabbyML/DeepSeekCoder-6.7B \
  --device cuda

Advanced Setup with Persistence

# Create data directory
mkdir -p ~/tabby-data

# Run with persistent storage
docker run -d \
  --name tabby-server \
  --gpus all \
  -p 8080:8080 \
  -v ~/tabby-data:/data \
  tabbyml/tabby serve \
  --model TabbyML/DeepSeekCoder-6.7B \
  --device cuda \
  --host 0.0.0.0
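Once the container is up, it can be verified from a script. A sketch assuming Tabby exposes a /v1/health endpoint that reports the active device (check the Tabby docs for the exact schema):

```python
import json
import urllib.request

def tabby_health(base: str = "http://localhost:8080") -> dict:
    """Fetch Tabby's health report (assumed /v1/health path)."""
    with urllib.request.urlopen(f"{base}/v1/health", timeout=5) as resp:
        return json.load(resp)

def is_cuda_backend(health: dict) -> bool:
    """Pure check: True if the reported device is CUDA."""
    return health.get("device", "").lower() == "cuda"

if __name__ == "__main__":
    info = tabby_health()
    print("model:", info.get("model"), "| cuda:", is_cuda_backend(info))
```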

Step 3: Install IDE Extensions

VS Code

  1. Search "Tabby" in extensions
  2. Install "Tabby" by TabbyML
  3. Configure server URL: http://localhost:8080

JetBrains IDEs

  1. Go to Plugins
  2. Search "Tabby"
  3. Install and configure

Other IDEs

Step 4: Web Interface

  1. Access Dashboard

    • Open http://localhost:8080 in your browser

  2. Monitor Usage

    • View completion statistics
    • Monitor model performance
    • Manage users (if team setup)

πŸ› οΈ Option 4: Additional Tools

OpenInterpreter

AI that runs code locally

Installation

pip install open-interpreter

Usage with Ollama

interpreter --local --model ollama/deepseek-coder:6.7b

Features

  • Executes code in real-time
  • File system access
  • Terminal integration
  • Multi-language support

Aider

AI pair programmer

Installation

pip install aider-chat

Usage

# With Ollama
aider --model ollama/deepseek-coder:6.7b

# With LM Studio
aider --model openai/deepseek-coder --api-base http://localhost:1234/v1

Features

  • Git integration
  • Automatic commits
  • Multi-file editing
  • Code review assistance

CodeGPT

VS Code extension for multiple providers

Installation

  1. Install "CodeGPT" extension in VS Code
  2. Configure for local models
  3. Supports Ollama, LM Studio, and custom APIs

Cursor (Alternative IDE)

AI-first code editor

Features

  • Built-in AI chat
  • Code generation
  • Natural language editing
  • Can connect to local models

🎯 Model Recommendations for RTX 3060 12GB

Tier 1: Excellent Performance (6-7B Models)

DeepSeek Coder 6.7B ⭐ Best Overall

  • Size: 4.1GB
  • Strengths: Excellent code generation, multiple languages
  • VRAM: 6-7GB
  • Speed: 15-25 tokens/sec
  • Download: ollama pull deepseek-coder:6.7b

Qwen2.5 Coder 7B ⭐ Best for Multiple Languages

  • Size: 4.4GB
  • Strengths: Great multilingual support, fast inference
  • VRAM: 7-8GB
  • Speed: 12-20 tokens/sec
  • Download: ollama pull qwen2.5-coder:7b

Code Llama 7B ⭐ Best for Python

  • Size: 3.8GB
  • Strengths: Excellent Python support, good documentation
  • VRAM: 6-7GB
  • Speed: 10-18 tokens/sec
  • Download: ollama pull codellama:7b

Tier 2: Good Performance (13B Models with Optimization)

DeepSeek Coder 13B (Q4 Quantization)

  • Size: 7.2GB
  • Strengths: Better reasoning, more context
  • VRAM: 9-10GB
  • Speed: 8-12 tokens/sec
  • Download: ollama pull deepseek-coder:13b-instruct-q4_k_m

Code Llama 13B (Q4 Quantization)

  • Size: 7.8GB
  • Strengths: Better code understanding
  • VRAM: 9-11GB
  • Speed: 6-10 tokens/sec
  • Download: ollama pull codellama:13b-instruct-q4_k_m

Specialized Models

StarCoder2 7B (Code Completion Specialist)

  • Size: 4.0GB
  • Strengths: Excellent autocomplete, fast
  • Use case: Code completion only
  • Download: ollama pull starcoder2:7b

Magicoder 7B (Problem Solving)

  • Size: 4.2GB
  • Strengths: Good at algorithmic problems
  • Use case: Competitive programming
  • Download: ollama pull magicoder:7b-s-cl-q4_k_m

Model Comparison Table

| Model | Size | VRAM | Speed | Code Quality | Languages | Best For |
|---|---|---|---|---|---|---|
| DeepSeek Coder 6.7B | 4.1GB | 6-7GB | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | General coding |
| Qwen2.5 Coder 7B | 4.4GB | 7-8GB | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Multilingual |
| Code Llama 7B | 3.8GB | 6-7GB | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ | Python/C++ |
| DeepSeek Coder 13B | 7.2GB | 9-10GB | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Complex tasks |
| StarCoder2 7B | 4.0GB | 6-7GB | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Autocomplete |

⚑ Performance Optimization

GPU Optimization

NVIDIA Control Panel Settings

  1. Open NVIDIA Control Panel
  2. 3D Settings β†’ Manage 3D Settings
  3. Power Management: Prefer Maximum Performance
  4. CUDA - GPUs: Use all available GPUs

Windows GPU Scheduling

  1. Settings β†’ System β†’ Display
  2. Graphics Settings
  3. Enable Hardware-accelerated GPU scheduling
  4. Restart computer

System Optimization

Power Settings

# Set high performance power plan
powercfg /setactive 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c

Memory Management

  • Close unnecessary applications
  • Disable startup programs
  • Use Task Manager to monitor RAM usage
  • Keep 4-6GB RAM free for system

Model Loading Optimization

Ollama Configuration

Ollama reads its settings from environment variables rather than a config file:

# Limit concurrency to conserve VRAM
setx OLLAMA_NUM_PARALLEL "1"
setx OLLAMA_MAX_LOADED_MODELS "1"

# Context length is a model parameter, set in a Modelfile:
# PARAMETER num_ctx 4096

LM Studio Settings

  • GPU Offload: 100% (if VRAM allows)
  • Context Length: 4096 tokens
  • Batch Size: 512
  • Thread Count: 8

Performance Monitoring

GPU Monitoring Tools

# NVIDIA System Management Interface
nvidia-smi

# Continuous monitoring
nvidia-smi -l 1

# Memory usage
nvidia-smi --query-gpu=memory.used,memory.total --format=csv
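The CSV query above is easy to script. A sketch that shells out to nvidia-smi and parses the first GPU's numbers (assumes the NVIDIA driver is installed and nvidia-smi is on PATH):

```python
import subprocess

def parse_memory_csv(csv_text: str) -> tuple[int, int]:
    """Parse `--query-gpu=memory.used,memory.total --format=csv,noheader,nounits`
    output for the first GPU into (used_mib, total_mib)."""
    used, total = csv_text.strip().splitlines()[0].split(",")
    return int(used), int(total)

def vram_usage() -> tuple[int, int]:
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"], text=True)
    return parse_memory_csv(out)

if __name__ == "__main__":
    used, total = vram_usage()
    print(f"VRAM: {used}/{total} MiB ({100 * used / total:.0f}%)")
```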

Performance Benchmarks

Test with this prompt:

"Write a Python function that implements a binary search algorithm with error handling and documentation"

Expected Performance:

  • 7B models: 15-25 tokens/second
  • 13B models: 8-15 tokens/second
  • Response time: 2-5 seconds for first token

πŸ’Ύ Storage Management (100GB SSD)

Storage Allocation Strategy

Recommended Distribution

Total: 100GB SSD
β”œβ”€β”€ Windows + System: 40GB
β”œβ”€β”€ Development Tools: 15GB
β”‚   β”œβ”€β”€ VS Code: 1GB
β”‚   β”œβ”€β”€ Git: 500MB
β”‚   β”œβ”€β”€ Python/Node: 3GB
β”‚   β”œβ”€β”€ Docker: 5GB
β”‚   └── Other IDEs: 5.5GB
β”œβ”€β”€ AI Models: 25GB
β”‚   β”œβ”€β”€ Ollama models: 15GB
β”‚   β”œβ”€β”€ LM Studio models: 10GB
β”‚   └── Model cache: Variable
β”œβ”€β”€ Project Files: 10GB
└── Free Space: 10GB (minimum)

Model Storage Optimization

Ollama Model Management

# List installed models
ollama list

# Remove unused models
ollama rm model_name

# Check model sizes
ollama show model_name

# Model storage location
# Windows: C:\Users\{username}\.ollama\models

LM Studio Model Management

  • Location: C:\Users\{username}\.cache\lm-studio\models
  • Cleanup: Use LM Studio's built-in cleanup tool
  • External Storage: Move models to external drive if needed

Storage Monitoring

Check Disk Usage

# Check drive space
dir C:\ /-c

# Detailed folder sizes
powershell "Get-ChildItem C:\ -Recurse -File -ErrorAction SilentlyContinue | Measure-Object -Property Length -Sum"

Cleanup Strategies

  1. Regular model cleanup
  2. Clear browser cache
  3. Remove old Docker images
  4. Use Windows Disk Cleanup
  5. Move projects to external storage
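For step 1, a small script can show where the space went. A sketch that lists the biggest files under a directory (point it at your models folder):

```python
import os

def largest_files(root: str, top: int = 10) -> list[tuple[int, str]]:
    """Walk `root` and return the `top` biggest files as (bytes, path),
    largest first; unreadable files are skipped."""
    sizes = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                continue  # permission errors, broken links, etc.
    return sorted(sizes, reverse=True)[:top]

if __name__ == "__main__":
    for size, path in largest_files(os.path.expanduser("~/.ollama")):
        print(f"{size / 1e9:6.2f} GB  {path}")
```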

External Storage Options

For Model Storage

  • USB 3.0+ External Drive: 500GB+ recommended
  • Network Attached Storage (NAS)
  • Cloud storage for backups only

Symlink Setup (Advanced)

# Move models to external drive
move "C:\Users\{username}\.ollama\models" "E:\AI_Models\ollama"

# Create symbolic link
mklink /D "C:\Users\{username}\.ollama\models" "E:\AI_Models\ollama"

πŸ”§ Troubleshooting Guide

Common Issues & Solutions

Issue 1: "CUDA Out of Memory"

Symptoms: Model fails to load, CUDA errors

Solutions:

# Use smaller model
ollama pull deepseek-coder:6.7b  # Instead of 13b

# Reduce context window
# In Continue.dev config:
"contextLength": 2048  # Instead of 4096

# Enable CPU offloading in LM Studio
# GPU Offload: 80% instead of 100%

Issue 2: Slow Performance

Symptoms: <5 tokens/second, long response times

Diagnosis:

# Check GPU utilization
nvidia-smi

# Check system resources
taskmgr

Solutions:

  • Close other GPU-intensive applications
  • Use Q4 quantized models
  • Reduce batch size
  • Check thermal throttling

Issue 3: Ollama Service Won't Start

Symptoms: "Connection refused" errors

Solutions:

# Check if service is running
tasklist | findstr ollama

# Restart Ollama service
taskkill /f /im ollama.exe
ollama serve

# Check port availability
netstat -an | findstr 11434
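The netstat check can also be done from Python; a minimal port probe (assumes the default Ollama port):

```python
import socket

def port_open(host: str = "127.0.0.1", port: int = 11434) -> bool:
    """True if something is listening on host:port (e.g. `ollama serve`)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1)
        return s.connect_ex((host, port)) == 0

if __name__ == "__main__":
    print("Ollama port 11434 open:", port_open())
```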

Issue 4: Continue.dev Not Connecting

Symptoms: No responses in chat, connection errors

Solutions:

  1. Check Ollama is running:

    curl http://localhost:11434/api/tags
  2. Verify Continue.dev config:

    {
      "models": [
        {
          "title": "Test",
          "provider": "ollama",
          "model": "deepseek-coder:6.7b",
          "apiBase": "http://localhost:11434"
        }
      ]
    }
  3. Restart VS Code

Issue 5: Model Download Fails

Symptoms: Download interruptions, corrupted models

Solutions:

# Clear Ollama cache
ollama rm model_name

# Re-download the model
ollama pull model_name

# Check internet connection and disk space

Issue 6: High VRAM Usage

Symptoms: System instability, graphics glitches

Monitoring:

# Continuous VRAM monitoring
nvidia-smi -l 1 --query-gpu=memory.used,memory.total --format=csv

Solutions:

  • Use smaller models (e.g., 7B instead of 13B)
  • Reduce context length
  • Enable CPU offloading
  • Close other applications

Performance Troubleshooting

Benchmark Your Setup

# Save as benchmark.py
import time
import requests

def benchmark_ollama():
    url = "http://localhost:11434/api/generate"
    prompt = "Write a Python function to calculate fibonacci numbers"
    
    data = {
        "model": "deepseek-coder:6.7b",
        "prompt": prompt,
        "stream": False
    }
    
    start_time = time.time()
    response = requests.post(url, json=data, timeout=300)
    end_time = time.time()
    
    if response.status_code == 200:
        result = response.json()
        # Ollama reports an exact token count; fall back to a word-count estimate
        tokens = result.get('eval_count') or len(result['response'].split())
        duration = end_time - start_time
        tokens_per_second = tokens / duration
        
        print(f"Response time: {duration:.2f} seconds")
        print(f"Tokens generated: {tokens}")
        print(f"Tokens per second: {tokens_per_second:.2f}")
    else:
        print(f"Error: {response.status_code}")

if __name__ == "__main__":
    benchmark_ollama()

Expected Benchmarks

  • DeepSeek Coder 6.7B: 15-25 tokens/sec
  • Qwen2.5 Coder 7B: 12-20 tokens/sec
  • Code Llama 7B: 10-18 tokens/sec

System Requirements Check

Verify CUDA Installation

# Check NVIDIA driver
nvidia-smi

# Check CUDA version
nvcc --version

# Verify PyTorch CUDA support (if using Python tools)
python -c "import torch; print(torch.cuda.is_available())"

Memory Requirements

  • Minimum System RAM: 8GB (16GB recommended)
  • Available VRAM: 6GB+ for 7B models
  • Free Disk Space: 20GB+ for models

πŸ”§ Advanced Configuration

Custom Model Fine-tuning

Preparing Your Dataset

# Example: Prepare code dataset for fine-tuning
import json

def prepare_dataset(code_files):
    dataset = []
    for file_path in code_files:
        with open(file_path, 'r') as f:
            code = f.read()
            dataset.append({
                "instruction": "Complete this code:",
                "input": code[:len(code)//2],
                "output": code[len(code)//2:]
            })
    
    with open('training_data.json', 'w') as f:
        json.dump(dataset, f, indent=2)

# Usage
code_files = ['project1.py', 'project2.js', 'project3.cpp']
prepare_dataset(code_files)

Fine-tuning with Ollama (Advanced)

# Create Modelfile
cat > Modelfile << EOF
FROM deepseek-coder:6.7b
PARAMETER temperature 0.1
PARAMETER top_p 0.9
SYSTEM "You are an expert programmer specializing in [YOUR_DOMAIN]."
EOF

# Build custom model
ollama create my-custom-coder -f Modelfile

Multi-Model Setup

Running Multiple Models

# Terminal 1: Start first model
ollama run deepseek-coder:6.7b

# Terminal 2: second server instance on a different port
OLLAMA_HOST=0.0.0.0:11435 ollama serve

# Terminal 3: point the client at the second server
OLLAMA_HOST=127.0.0.1:11435 ollama run codellama:7b

Load Balancing Configuration

{
  "models": [
    {
      "title": "DeepSeek Coder",
      "provider": "ollama",
      "model": "deepseek-coder:6.7b",
      "apiBase": "http://localhost:11434"
    },
    {
      "title": "Code Llama",
      "provider": "ollama", 
      "model": "codellama:7b",
      "apiBase": "http://localhost:11435"
    }
  ]
}
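Continue.dev simply lists both endpoints; if you want one client to spread requests across the two servers, a naive round-robin wrapper might look like this (`OllamaPool` is a hypothetical helper, not part of any of these tools):

```python
import itertools
import json
import urllib.request

class OllamaPool:
    """Naive round-robin dispatcher over several Ollama endpoints."""

    def __init__(self, endpoints: list[str]):
        self._cycle = itertools.cycle(endpoints)

    def next_endpoint(self) -> str:
        return next(self._cycle)

    def generate(self, model: str, prompt: str) -> str:
        """Send a non-streaming /api/generate request to the next server."""
        base = self.next_endpoint()
        req = urllib.request.Request(
            f"{base}/api/generate",
            data=json.dumps({"model": model, "prompt": prompt,
                             "stream": False}).encode(),
            headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req, timeout=120) as resp:
            return json.load(resp)["response"]

pool = OllamaPool(["http://localhost:11434", "http://localhost:11435"])
```

Round-robin ignores which model lives where; for the setup above you would keep one pool per model, or route by model name instead.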

API Integration

Custom API Wrapper

# api_wrapper.py
from flask import Flask, request, jsonify
import requests

app = Flask(__name__)

@app.route('/v1/completions', methods=['POST'])
def completions():
    data = request.json
    
    # Route to appropriate model based on request
    if 'python' in data.get('prompt', '').lower():
        model_url = "http://localhost:11434"
        model_name = "codellama:7b"
    else:
        model_url = "http://localhost:11434"
        model_name = "deepseek-coder:6.7b"
    
    response = requests.post(f"{model_url}/api/generate", json={
        "model": model_name,
        "prompt": data['prompt'],
        "stream": False
    })
    
    return jsonify(response.json())

if __name__ == '__main__':
    app.run(port=8080)

Environment Variables

Ollama Configuration

# Windows Environment Variables
setx OLLAMA_HOST "0.0.0.0:11434"
setx OLLAMA_MODELS "E:\AI_Models\ollama"
setx OLLAMA_NUM_PARALLEL "2"
setx OLLAMA_MAX_LOADED_MODELS "2"
setx OLLAMA_FLASH_ATTENTION "1"

LM Studio Configuration

// LM Studio config.json
{
  "gpu_offload": 100,
  "context_length": 4096,
  "batch_size": 512,
  "threads": 8,
  "gpu_split": [100],
  "main_gpu": 0,
  "tensor_split": [1.0]
}

🌐 Community Resources

Forums & Communities

Reddit Communities

Discord Servers

YouTube Channels

Setup Tutorials

Hardware-Specific Content

  • Tech Yes City: RTX 3060 AI performance videos
  • Gamers Nexus: GPU benchmarking for AI workloads
  • Level1Techs: Enterprise AI setup guides

GitHub Repositories

Essential Repos

Community Tools

Blogs & Articles

Technical Blogs

Hardware Reviews

  • Tom's Hardware: GPU reviews for AI workloads
  • AnandTech: Detailed hardware analysis
  • Phoronix: Linux performance benchmarks

πŸ“š Official Links & Documentation

Primary Tools

Ollama

Continue.dev

LM Studio

Tabby

Model Providers

Hugging Face

Model-Specific Links

Hardware & Drivers

NVIDIA

Docker

Additional Tools

OpenInterpreter

Aider

CodeGPT

Learning Resources

Courses & Tutorials

Books

  • "Hands-On Machine Learning" by AurΓ©lien GΓ©ron
  • "Deep Learning" by Ian Goodfellow
  • "Pattern Recognition and Machine Learning" by Christopher Bishop

Research Papers


🎯 Quick Reference Commands

Ollama Commands

# Install and manage models
ollama pull deepseek-coder:6.7b
ollama list
ollama rm model_name
ollama show model_name

# Run models
ollama run deepseek-coder:6.7b
ollama serve

# API testing
curl http://localhost:11434/api/tags
curl -X POST http://localhost:11434/api/generate -d '{"model":"deepseek-coder:6.7b","prompt":"Hello"}'

Performance Monitoring

# GPU monitoring
nvidia-smi
nvidia-smi -l 1

# System monitoring
taskmgr
perfmon

# Network monitoring
netstat -an | findstr 11434

Troubleshooting

# Restart services
taskkill /f /im ollama.exe
ollama serve

# Check ports
netstat -an | findstr 11434
netstat -an | findstr 1234

# Clear cache
ollama rm model_name
docker system prune

πŸš€ Getting Started Checklist

βœ… Pre-Setup (5 minutes)

  • Verify RTX 3060 drivers are updated
  • Check available disk space (>20GB free)
  • Close unnecessary applications
  • Enable Windows GPU scheduling

βœ… Beginner Path: LM Studio (20 minutes)

  • Download and install LM Studio
  • Download DeepSeek Coder 6.7B model
  • Test model in LM Studio chat
  • Start local server
  • Install Continue.dev in VS Code
  • Configure Continue.dev for LM Studio
  • Test code completion

βœ… Advanced Path: Ollama (15 minutes)

  • Download and install Ollama
  • Pull DeepSeek Coder model: ollama pull deepseek-coder:6.7b
  • Test model: ollama run deepseek-coder:6.7b
  • Install Continue.dev in VS Code
  • Configure Continue.dev for Ollama
  • Test integration

βœ… Optimization (10 minutes)

  • Monitor GPU usage with nvidia-smi
  • Adjust model settings for performance
  • Test with coding prompts
  • Bookmark troubleshooting section
  • Join community Discord for support

πŸ“ž Support & Help

When You Need Help

  1. Check this troubleshooting guide first
  2. Search Reddit r/LocalLLaMA for similar issues
  3. Join Discord communities for real-time help
  4. Check GitHub issues for known problems
  5. Post detailed error messages when asking for help

What to Include in Support Requests

  • Hardware specs (GPU, RAM, storage)
  • Software versions (Ollama, LM Studio, Continue.dev)
  • Error messages (full text)
  • Steps to reproduce the issue
  • Screenshots if relevant

This guide was created to help you set up local AI coding agents on your RTX 3060 system. For updates and community contributions, visit the GitHub repository or join our Discord community.

Happy coding with AI! πŸš€
