API reference

Video in, slides and subtitles out, over HTTPS. Built for n8n, Make, and AI-agent tool calls: send a URL or a file, get JSON back in one request.

Beta: /v1/transcribe is live today; /v1/extract is rolling out account by account. Sign-up includes $5 of starting credit; create a key to begin.

Basics

Base URL: https://api.video2any.com. Every request carries your key as a bearer token; create keys in your account. Both endpoints respond synchronously.

Authorization: Bearer v2a_live_...
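A minimal Python sketch of the auth convention, using only the standard library. It assumes the key lives in an environment variable named VIDEO2ANY_KEY (the name the curl examples use), and the helper authorized_request is hypothetical; it only builds the request and sends nothing.

```python
import os
import urllib.request

BASE_URL = "https://api.video2any.com"


def authorized_request(path, data=None, key=None):
    """Build (but do not send) a POST to BASE_URL + path with the bearer key attached."""
    # Fall back to the environment variable used in the curl examples.
    key = key or os.environ["VIDEO2ANY_KEY"]
    req = urllib.request.Request(BASE_URL + path, data=data, method="POST")
    req.add_header("Authorization", f"Bearer {key}")
    return req
```

Passing the request to urllib.request.urlopen performs the call; keeping construction separate makes the header easy to inspect while debugging a 401.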

Transcribe audio (live)

POST /v1/transcribe takes an audio file and returns the transcript as plain text plus a ready-to-use .srt. It runs on Whisper and is billed at $0.006 per audio minute, the same rate as OpenAI's Whisper API, with the SRT included.

curl -X POST https://api.video2any.com/v1/transcribe \
  -H "Authorization: Bearer $VIDEO2ANY_KEY" \
  -F audio=@lecture.mp3
{
  "ok": true,
  "text": "Welcome back. Today we cover...",
  "srt": "1\n00:00:00,000 --> 00:00:02,300\nWelcome back...",
  "audio_minutes": 12.4,
  "cost_cents": 7.44
}

Beta cap: 24 MB per request. Send the file either as the multipart field audio or as the raw request body.
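A Python sketch of the raw-body option plus saving the returned subtitles. Assumptions: the Content-Type for a raw body is not documented, so audio/mpeg is a guess for MP3 input; "24 MB" is treated conservatively as 24 × 10^6 bytes; transcribe_raw and save_srt are hypothetical helper names.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.video2any.com"
# Beta cap, read conservatively as 24 * 10**6 bytes (exact byte count is an assumption).
MAX_AUDIO_BYTES = 24 * 10**6


def transcribe_raw(path, key, content_type="audio/mpeg"):
    """Send the audio as the raw request body and return the parsed JSON response."""
    if os.path.getsize(path) > MAX_AUDIO_BYTES:
        raise ValueError("over the 24 MB beta cap; split or compress the audio first")
    with open(path, "rb") as f:
        body = f.read()
    req = urllib.request.Request(
        f"{BASE_URL}/v1/transcribe",
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {key}", "Content-Type": content_type},
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def save_srt(result, out_path):
    """Write the returned subtitles to out_path and return the billed cost in cents."""
    if not result.get("ok"):
        raise RuntimeError(f"transcription failed: {result}")
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(result["srt"])
    return result["cost_cents"]
```

Checking the size client-side avoids uploading a file that would only come back as 413 too_large.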

Extract slides (rolling out)

POST /v1/extract runs the same adaptive slide detection as the web app, server side, and responds synchronously. It is billed at $0.02 per video minute. While the rollout completes it may return 501 coming_soon; the response shape below is final.

curl -X POST https://api.video2any.com/v1/extract \
  -H "Authorization: Bearer $VIDEO2ANY_KEY" \
  -F video=@lecture.mp4 \
  -F formats=json,pptx

Parameters:

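  • video: the video file, sent as multipart. Max 2 GB.
  • video_url: instead of a file, a YouTube, Google Drive, or direct video link, sent in a JSON body. Max 4 hours of video.
  • formats: json (timestamps only) and/or pptx (adds the deck as base64). Default json.
  • interval: sampling interval in seconds, 1 to 10. Default: chosen automatically from the video's duration.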
  • sensitivity: 1 to 10, biases the auto-calibrated threshold. Default 5.
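The multipart fields listed above can be encoded by hand with the standard library. A sketch under assumptions: interval and sensitivity are accepted as plain text form values, "pptx" on its own is valid (the list says "and/or"), and build_extract_form is a hypothetical helper. The returned body and content type go into a POST to /v1/extract with the usual Authorization header.

```python
import uuid


def build_extract_form(video_bytes, filename, formats="json", interval=None, sensitivity=None):
    """Encode /v1/extract fields as multipart/form-data; returns (body, content_type)."""
    requested = [f.strip() for f in formats.split(",")]
    if not set(requested) <= {"json", "pptx"}:
        raise ValueError("formats must be json and/or pptx")
    fields = {"formats": formats}
    # Optional knobs: leave them out to let the server pick its defaults.
    for name, value in (("interval", interval), ("sensitivity", sensitivity)):
        if value is None:
            continue
        if not 1 <= value <= 10:
            raise ValueError(f"{name} must be between 1 and 10")
        fields[name] = str(value)

    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    # The file part goes last, under the documented field name "video".
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="video"; '
        f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'.encode("utf-8")
        + video_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

Validating ranges before upload matters more here than for transcription, since a rejected request may already have pushed up to 2 GB over the wire.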
{
  "ok": true,
  "mode": "motion",
  "threshold": 0.514,
  "duration_seconds": 80,
  "slides": 10,
  "timestamps": [0, 3, 4, 10, 11, 14, 17, 18, 64, 68],
  "pptx_base64": "UEsDBBQABgAIAA...",
  "video_minutes": 1.33,
  "cost_cents": 2.67
}
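A Python sketch for consuming an extract response like the one above. It assumes timestamps are slide start times in seconds and that each slide lasts until the next one begins, with the last slide running to duration_seconds; slide_spans and save_deck are hypothetical helpers.

```python
import base64


def slide_spans(result):
    """Turn slide start times into (start, end) pairs in seconds."""
    starts = result["timestamps"]
    # Each slide ends where the next starts; the last one ends with the video.
    ends = starts[1:] + [result["duration_seconds"]]
    return list(zip(starts, ends))


def save_deck(result, out_path):
    """Decode pptx_base64 to a .pptx file; returns False when no deck was requested."""
    data = result.get("pptx_base64")
    if data is None:
        return False
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(data))
    return True
```

For the example response, the first span is (0, 3) and the last is (68, 80), which gives one span per reported slide.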

Use from n8n, Make, or an AI agent

In n8n or Make, a single HTTP request step does the whole job: method POST to https://api.video2any.com/v1/extract, a JSON body with video_url, and your key in the Authorization header. The synchronous response drops straight into the next node; no polling loop is needed.
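The same HTTP step, written as a Python sketch with the standard library. Assumptions: the JSON body accepts the same formats string as the multipart form (for example "json,pptx"), and the 600-second timeout is an arbitrary illustrative value, chosen because the whole extraction happens inside one synchronous request. url_extract_request and extract_from_url are hypothetical names.

```python
import json
import urllib.request

BASE_URL = "https://api.video2any.com"


def url_extract_request(video_url, key, formats="json"):
    """Build the call an n8n or Make HTTP step would make (not sent here)."""
    body = json.dumps({"video_url": video_url, "formats": formats}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/extract",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
    )


def extract_from_url(video_url, key, formats="json", timeout=600):
    """Send the request and return the parsed JSON; blocks until extraction finishes."""
    req = url_extract_request(video_url, key, formats)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Whatever client you use, give it a generous timeout: a default of a few seconds will cut off the synchronous response for longer videos.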

For AI agents, describe the endpoint as a tool. A function definition that works as-is:

{
  "name": "extract_slides",
  "description": "Extract slide timestamps (and optionally a .pptx) from a video URL",
  "parameters": {
    "type": "object",
    "properties": {
      "video_url": { "type": "string", "description": "YouTube, Google Drive or direct video URL" },
      "formats": { "type": "string", "enum": ["json", "json,pptx"] }
    },
    "required": ["video_url"]
  }
}
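On the agent side, the tool call still has to be validated and its result trimmed before it goes back to the model. A sketch assuming an OpenAI-style function call, where the arguments arrive as a JSON string; parse_tool_arguments and summarize_for_agent are hypothetical helpers. Dropping pptx_base64 from the tool result keeps a large base64 blob out of the model's context.

```python
import json

# Mirrors the enum in the function definition.
ALLOWED_FORMATS = {"json", "json,pptx"}


def parse_tool_arguments(arguments):
    """Validate the model's JSON-string arguments and return the request payload."""
    args = json.loads(arguments)  # malformed JSON raises a ValueError subclass
    if not isinstance(args, dict) or not isinstance(args.get("video_url"), str):
        raise ValueError("video_url (string) is required")
    formats = args.get("formats", "json")
    if formats not in ALLOWED_FORMATS:
        raise ValueError(f"formats must be one of {sorted(ALLOWED_FORMATS)}")
    return {"video_url": args["video_url"], "formats": formats}


def summarize_for_agent(result):
    """Return a compact tool result: keep timestamps, report but drop the deck."""
    keep = ("slides", "timestamps", "duration_seconds")
    summary = {k: result[k] for k in keep if k in result}
    summary["has_pptx"] = "pptx_base64" in result
    return json.dumps(summary)
```

The payload from parse_tool_arguments is the JSON body described for n8n and Make, so one code path can serve both.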

Errors

  • 401 unauthorized: missing or revoked key.
  • 402 credit_exhausted: starting credit used up; contact us to top up.
  • 413 too_large: over the size cap (24 MB transcribe, 2 GB extract).
  • 413 video_too_long: URL-sourced video above the 4-hour cap.
  • 422 undecodable: the container or codec could not be read. Re-encode to H.264 MP4 and retry.
  • 422 url_fetch_failed: the link could not be downloaded (private, region-locked, or dead).
  • 501 coming_soon: extract not yet enabled while the rollout completes.

Error bodies are JSON with code, message, and, where useful, a hint.
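A Python sketch of turning these error bodies into a next step. It assumes the body is flat JSON with code, message, and an optional hint at the top level; the suggested actions and the choice of coming_soon as the only retryable code are a hypothetical client policy, not part of the API.

```python
import json

# Suggested client reactions to the documented codes (a policy sketch).
NEXT_STEP = {
    "unauthorized": "check or recreate the API key",
    "credit_exhausted": "top up credit, then retry",
    "too_large": "send a smaller file",
    "video_too_long": "trim the video below 4 hours",
    "undecodable": "re-encode to H.264 MP4 and retry",
    "url_fetch_failed": "make the link public and reachable, or upload the file",
    "coming_soon": "retry once extract is enabled for your account",
}
RETRYABLE = {"coming_soon"}


def explain_error(body):
    """Map an error body (str or bytes) to its code, retryability, and a next step."""
    try:
        err = json.loads(body)
    except ValueError:
        err = None
    if not isinstance(err, dict):
        return {"code": None, "retry": False, "next_step": "inspect the raw response"}
    code = err.get("code")
    return {
        "code": code,
        "retry": code in RETRYABLE,
        # Prefer the server's own hint when it sends one.
        "next_step": err.get("hint") or NEXT_STEP.get(code, "see message"),
    }
```

With urllib, call it as explain_error(e.read()) inside an except urllib.error.HTTPError as e block.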

Metering

Slide extraction is $0.02 per minute of source video; transcription is $0.006 per audio minute. A 40-minute lecture with subtitles costs about $1.04 ($0.80 for extraction plus $0.24 for transcription). Sign-up includes $5 of credit, roughly 250 extraction minutes. Full details are on the pricing page.
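The arithmetic above as a small Python estimator. Assumptions: subtitles are transcribed from an audio track as long as the video, and rounding to hundredths of a cent mirrors the cost_cents values in the examples rather than a documented billing rule; estimate_cents and minutes_covered are hypothetical helpers.

```python
# Published rates, in cents per minute.
EXTRACT_CENTS_PER_MIN = 2.0      # $0.02 per video minute
TRANSCRIBE_CENTS_PER_MIN = 0.6   # $0.006 per audio minute


def estimate_cents(video_minutes, with_subtitles=False):
    """Estimated charge in cents for one video."""
    cents = video_minutes * EXTRACT_CENTS_PER_MIN
    if with_subtitles:
        # Assumes the audio track is as long as the video.
        cents += video_minutes * TRANSCRIBE_CENTS_PER_MIN
    return round(cents, 2)


def minutes_covered(credit_cents, with_subtitles=False):
    """How many minutes of video a credit balance covers."""
    rate = EXTRACT_CENTS_PER_MIN + (TRANSCRIBE_CENTS_PER_MIN if with_subtitles else 0.0)
    return credit_cents / rate
```

estimate_cents(40, with_subtitles=True) gives 104.0 cents, matching the $1.04 above, and estimate_cents(80 / 60) gives 2.67, the cost_cents in the extract example.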