AI Vision Assistant — NVDA Add-on
AI Vision Assistant is an AI-powered NVDA add-on that provides four modes: UI Mode, OCR Mode, Cursor Mode, and Tab Nav Mode. It supports multiple AI providers, including Google Gemini, OpenAI GPT, Anthropic Claude, and Groq (Llama).
How UI Mode Works
- Press NVDA + Shift + U.
- NVDA says "Capturing screen, please wait."
- The add-on captures your focused window and sends it to the selected AI provider.
- The AI analyzes the image and returns a list of the visible, clickable elements.
- A dialog opens listing those elements. Use Arrow keys to navigate and Enter to select.
- The add-on clicks the selected element with the mouse, and NVDA announces "Clicked: [element name]".
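The last steps of this flow can be illustrated with a minimal Python sketch. It is not the add-on's actual code. It assumes, purely for illustration, that the provider replies with a JSON array of objects carrying name, x, y, width, and height fields in window-relative pixels. The names UIElement, parse_elements, and click_announcement are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class UIElement:
    """One clickable element reported by the AI (field names are assumed)."""
    name: str
    x: int
    y: int
    width: int
    height: int

    def center(self, window_origin=(0, 0)):
        """Return the screen point at the middle of the element's box."""
        left, top = window_origin
        return (left + self.x + self.width // 2,
                top + self.y + self.height // 2)


def parse_elements(response_text):
    """Parse the assumed JSON reply, skipping entries that are malformed."""
    try:
        raw = json.loads(response_text)
    except json.JSONDecodeError:
        return []
    if not isinstance(raw, list):
        return []
    elements = []
    for item in raw:
        try:
            elements.append(UIElement(
                name=str(item["name"]),
                x=int(item["x"]),
                y=int(item["y"]),
                width=int(item["width"]),
                height=int(item["height"]),
            ))
        except (KeyError, TypeError, ValueError):
            continue  # ignore entries without usable coordinates
    return elements


def click_announcement(element):
    """Build the message NVDA speaks after the click."""
    return f"Clicked: {element.name}"
```

In a sketch like this, the dialog would list each element's name, and the point returned by center() for the chosen element would be handed to whatever routine performs the mouse click. Dropping malformed entries rather than failing the whole request keeps one bad item in the reply from emptying the dialog.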
Supported AI Providers
- Google Gemini (free API key from Google AI Studio)
- OpenAI GPT (requires billing)
- Anthropic Claude (API key from Anthropic Console)
- Groq API (free API key available)
Requirements
- NVDA 2023.1 or later
- Windows 10 or 11
- Internet connection
- A valid API key from at least one supported provider
Troubleshooting
- No settings panel in NVDA: Ensure the add-on is installed in %APPDATA%\nvda\addons\ and NVDA has been restarted.
- API key errors: Use the Test Connection button to verify your key.
- No elements found: Try on a window with visible UI controls, or switch to a different AI provider.
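For the missing settings panel case, a small Python sketch along these lines could confirm which add-on folders exist under %APPDATA%\nvda\addons\. It assumes a standard installed copy of NVDA (not a portable one) that keeps its configuration under %APPDATA%. The helper names nvda_addons_dir and installed_addons are hypothetical, and the add-on's own folder name is not stated here, so the sketch lists every folder it finds.

```python
import os
from pathlib import Path


def nvda_addons_dir(env=None):
    """Return the expected add-ons folder, or None if APPDATA is not set."""
    env = os.environ if env is None else env
    appdata = env.get("APPDATA")
    if not appdata:
        return None
    return Path(appdata) / "nvda" / "addons"


def installed_addons(env=None):
    """List the names of add-on folders found in the add-ons directory."""
    addons = nvda_addons_dir(env)
    if addons is None or not addons.is_dir():
        return []
    return sorted(p.name for p in addons.iterdir() if p.is_dir())


if __name__ == "__main__":
    print("Add-ons folder:", nvda_addons_dir())
    print("Installed add-ons:", installed_addons())
```

If the add-on's folder does not appear in the printed list, reinstalling it and restarting NVDA is the likely fix.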
Developed by: Team 6 for Vision-Aid "Hack for Inclusion" 2026.