🤖 AI & LLM
103

ai-multimodal

Process and generate multimedia content using the Google Gemini API. Capabilities: analyze audio files (transcription with timestamps, summarization, speech understanding, and music/sound analysis, up to 9.5 hours); understand images (captioning, object detection, OCR, visual Q&A, segmentation); process videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours); extract content from documents (PDF tables, forms, charts, diagrams, multi-page files); and generate images (text-to-image, editing, composition, refinement). Use it when working with audio or video files, analyzing images or screenshots, processing PDF documents, extracting structured data from media, creating images from text prompts, or implementing multimodal AI features. Supports multiple models (Gemini 2.5/2.0) with context windows of up to 2M tokens.
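
The skill's own documentation is not shown on this page, so the following is only a minimal sketch of how a multimodal request of the kind described above could be assembled, assuming the public Gemini REST `generateContent` body shape (a `contents` list whose `parts` mix text and base64-encoded `inline_data`). The helper name `build_generate_content_payload` is hypothetical and is not taken from the skill. The sketch builds the request body only and makes no network call.

```python
import base64
import json


def build_generate_content_payload(prompt: str, media_bytes: bytes, mime_type: str) -> dict:
    """Build a JSON-serializable body pairing a text prompt with one inline media part.

    Hypothetical helper: it mirrors the assumed `generateContent` request shape
    and is not the skill's actual implementation.
    """
    # Inline media is sent as base64 text alongside its MIME type.
    encoded = base64.b64encode(media_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {"inline_data": {"mime_type": mime_type, "data": encoded}},
                ]
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real image file read from disk.
    payload = build_generate_content_payload(
        "Describe this image.", b"placeholder-image-bytes", "image/png"
    )
    print(json.dumps(payload, indent=2))
```

Inline parts of this kind suit small files; long audio or video, such as the multi-hour inputs mentioned above, would generally need a separate upload step, which this sketch does not cover.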

#multimodal #api #content-creation #gemini #audio-processing
Quick Install
npx skills add mrgoonie/claudekit-skills
Repository
Repository: mrgoonie/claudekit-skills
Stars: 1,814
Last Updated: Feb 4, 2026
Related Skills
271,400
6,331

find-skills

Helps users discover and install agent skills based on their queries.

vercel-labs
vercel-labs/skills
46,800
19,561

agent-browser

A CLI tool for AI agents to automate browser tasks like navigation, form filling, and data scraping.

vercel-labs
vercel-labs/agent-browser
34,600
79,803

browser-use

Automates browser interactions for web testing, form filling, screenshots, and data extraction.

browser-use
browser-use/browser-use
32,600
86,065

skill-creator

A guide for creating effective AI skills that extend Claude's capabilities with specialized knowledge, workflows, or tool integrations.

anthropics
anthropics/skills
24,400
55,506

brainstorming

A skill for brainstorming and exploring user intent before implementing creative work.

obra
obra/superpowers