Inference ingress

Package: agentrouter.inference.v1
Service: InferenceIngressService

Endpoints

Send a prompt

What it does: Sends a prompt to an upstream LLM and returns the model's completion. The gateway selects an eligible dataplane, retrieves and applies the project's BYOK key, enforces quota, records a request log entry, and proxies the call to the upstream provider. Streaming, multi-turn conversations, tools, and vision are not yet supported on this path.

Request fields:

| Field | Required | Description |
|---|---|---|
| customer_id | no | Customer the prompt is billed and scoped against. When empty, the server resolves the caller's default customer from their session identity (the first customer the caller is a member of, ordered by oldest membership). Supply it explicitly to override the session default. |
| project_id | no | Project the prompt is logged against. When empty, the server resolves the caller's default project from their session identity (the first project the caller is a member of, ordered by oldest membership). Supply it explicitly to override the session default. |
| model | no | Model selector. A bare model name (e.g. gpt-4o) is accepted when it is unique across the catalog; the server resolves its provider. Use the provider/model form (e.g. openai/gpt-4o) to disambiguate when the same model name exists under more than one provider. Empty selects the project's default model. |
| prompt | yes | The user prompt as plain text. Only a single user turn is supported today; multi-message conversations will arrive once the proto gains a repeated messages field. |
| key_id | no | Optional client_keys / api_keys row id (UUID) that binds this request to a specific customer-owned key for per-key rate limiting and fallback accounting on the dataplane. Empty means "use the latest active client key for the customer". The id flows through the minted MP JWT's jti claim, which the dataplane reads in its ext-authz filter. |
| idempotency_key | no | Idempotency key for safe retry of mutating prompts. |
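
The model selector rules above can be illustrated with a small client-side sketch. This is not the server's implementation: the function name resolve_model, the error type, and the catalog contents (including the provider "example-provider" and the model "shared-model") are hypothetical and exist only to show the three documented cases.

```python
from typing import Dict, Set, Tuple


class AmbiguousModelError(ValueError):
    """Raised when a bare model name exists under more than one provider."""


def resolve_model(selector: str,
                  catalog: Dict[str, Set[str]],
                  project_default: Tuple[str, str]) -> Tuple[str, str]:
    """Return (provider, model) following the documented selector rules.

    catalog maps a provider name to its model names (hypothetical data).
    """
    if not selector:
        # Empty selector: fall back to the project's default model.
        return project_default
    if "/" in selector:
        # provider/model form: split on the first slash only.
        provider, model = selector.split("/", 1)
        if model not in catalog.get(provider, set()):
            raise KeyError(f"unknown model {selector!r}")
        return provider, model
    # Bare name: accepted only when exactly one provider offers it.
    providers = sorted(p for p, models in catalog.items() if selector in models)
    if not providers:
        raise KeyError(f"unknown model {selector!r}")
    if len(providers) > 1:
        raise AmbiguousModelError(
            f"{selector!r} exists under {providers}; use the provider/model form"
        )
    return providers[0], selector


# Hypothetical catalog used only for illustration.
CATALOG = {
    "openai": {"gpt-4o", "shared-model"},
    "example-provider": {"shared-model"},
}
DEFAULT = ("openai", "gpt-4o")
```

With this catalog, a bare gpt-4o resolves to its only provider, while shared-model must be written as openai/shared-model or example-provider/shared-model.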

Response fields:

| Field | Required | Description |
|---|---|---|
| completion | output-only | The model's reply text. |
| model | output-only | The model that produced the reply. |
| dataplane_id | output-only | The dataplane that served the request, for observability. Empty in milestone B (no dataplane yet); populated once milestone C lands. |
| prompt_tokens | output-only | Tokens in the prompt (input tokens). Token accounting is best effort; some providers don't report usage. |
| completion_tokens | output-only | Tokens generated in the completion (output tokens), as reported by the provider's usage block. |
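
Because token accounting is best effort, client code should tolerate missing counts. The sketch below assumes the response has already been decoded into a dict keyed by the field names above, and that an unreported count may arrive as an absent key, null, or an empty string, with reported counts as integers or decimal strings. Those encodings are assumptions, not documented behavior, and the helper names are hypothetical.

```python
from typing import Any, Mapping, Optional


def token_count(response: Mapping[str, Any], field: str) -> Optional[int]:
    """Return the count as an int, or None when it was not reported."""
    value = response.get(field)
    if value in (None, ""):
        return None
    return int(value)


def total_tokens(response: Mapping[str, Any]) -> Optional[int]:
    """Sum prompt and completion tokens, or None if either is missing."""
    prompt = token_count(response, "prompt_tokens")
    completion = token_count(response, "completion_tokens")
    if prompt is None or completion is None:
        return None
    return prompt + completion
```

Returning None instead of 0 keeps "not reported" distinct from a genuine zero, which matters if the totals feed billing or quota dashboards.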

HTTP: POST /v1/customers/{customer_id}/projects/{project_id}/prompts

Authentication: Authenticated (API key or session token)

Signatures:

Go: func (i *InferenceClient) Prompt(ctx context.Context, customerID, projectID, prompt string, opts ...PromptOption) (*inferencev1.PromptResponse, error)
Python: prompt(customer_id: str, project_id: str, prompt: str, model: str, key_id: str, idempotency_key: str) -> PromptResponse
TypeScript: prompt(customerId: string, projectId: string, prompt: string, opts: PromptOptions): Promise<PromptResponse>
CLI: tare api prompt <prompt>

Examples:

curl:

```shell
curl -X POST "${AGENTROUTER_BASE_URL}/v1/customers/cust_01H.../projects/proj_01H.../prompts" \
  -H "Authorization: Bearer ${AGENTROUTER_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "...",
    "prompt": "...",
    "key_id": "...",
    "idempotency_key": "..."
  }'
```

CLI:

```shell
tare api prompt <prompt>
```

Go (main.go):

```go
// Command example is a runnable example for the AgentRouter Go SDK.
// Set AGENTROUTER_BASE_URL and AGENTROUTER_API_KEY in the environment, then `go run .`.
package main

import (
	"context"
	"fmt"
	"log"
	"os"

	agentrouter "github.com/tetrateio/agentrouter-go"
)

// input holds the request fields. Replace the placeholder values below.
type input struct {
	CustomerID string
	ProjectID  string
	Prompt     string
}

func main() {
	ctx := context.Background()

	client, err := agentrouter.New(ctx,
		agentrouter.WithBaseURL(os.Getenv("AGENTROUTER_BASE_URL")),
		agentrouter.WithAPIKey(os.Getenv("AGENTROUTER_API_KEY")),
	)
	if err != nil {
		log.Fatalf("client: %v", err)
	}

	in := input{
		CustomerID: "cust_01H...",
		ProjectID:  "proj_01H...",
		Prompt:     "...",
	}

	// opts is variadic, so it can be omitted; append PromptOption values
	// as trailing arguments to set optional fields.
	result, err := client.Inference().Prompt(ctx, in.CustomerID, in.ProjectID, in.Prompt)
	if err != nil {
		log.Fatalf("call: %v", err)
	}

	fmt.Printf("%+v\n", result)
}
```

Go (go.mod):

```
module github.com/tetrateio/agentrouter-go-examples/inferenceingress/prompt

go 1.26

require github.com/tetrateio/agentrouter-go v0.1.1

// Point this at the directory you extracted the downloaded Go SDK tarball into.
// The directory name matches the tarball stem on the Download SDK page.
replace github.com/tetrateio/agentrouter-go => ./third_party/agentrouter-go-0.1.1
```

Python (main.py):

```python
"""Runnable example for the AgentRouter Python SDK.

Set AGENTROUTER_BASE_URL and AGENTROUTER_API_KEY in the environment, then run `python main.py`.
"""
import os

from agentrouter_sdk import Client

client = Client(
    base_url=os.environ["AGENTROUTER_BASE_URL"],
    api_key=os.environ["AGENTROUTER_API_KEY"],
)

# Replace the placeholder values below.
customer_id = "cust_01H..."
project_id = "proj_01H..."
prompt = "..."
model = "..."
key_id = "..."
idempotency_key = "..."
try:
    result = client.inference.prompt(customer_id, project_id, prompt, model, key_id, idempotency_key)
    print(result)
except Exception as err:
    print("Error:", err)
```

Python (requirements.txt):

```
# Point this at the directory you extracted the downloaded Python SDK tarball
# into. The directory name matches the tarball stem on the Download SDK page.
# To install instead from PyPI once published, replace the line below with:
#   agentrouter-sdk>=0.1.0
agentrouter-sdk @ file:./third_party/agentrouter-python-0.1.1
```

TypeScript (index.ts):

```typescript
// Runnable example for the AgentRouter TypeScript SDK.
// Set AGENTROUTER_BASE_URL and AGENTROUTER_API_KEY in the environment, then run `npm install && npx tsx index.ts`.
import { Client } from '@tetrate/agentrouter-sdk'

const client = new Client({
  baseUrl: process.env.AGENTROUTER_BASE_URL,
  apiKey: process.env.AGENTROUTER_API_KEY,
})

// Replace the placeholder values below.
const customerId = "cust_01H..."
const projectId = "proj_01H..."
const prompt = "..."
// opts must be a PromptOptions object, not a string. An empty object is
// assumed to be valid when no options are set; see PromptOptions for the
// available fields.
const opts = {}
try {
  const result = await client.inference.prompt(customerId, projectId, prompt, opts)
  console.log(result)
} catch (err) {
  console.error('Error:', err)
}
```

TypeScript (package.json):

```json
{
  "name": "inferenceingress",
  "version": "0.1.0",
  "private": true,
  "type": "module",
  "dependencies": {
    "@tetrate/agentrouter-sdk": "file:./third_party/agentrouter-typescript-0.1.1"
  },
  "devDependencies": {
    "@types/node": "^20.0.0",
    "typescript": "^5.4.0"
  }
}
```

TypeScript (tsconfig.json):

```json
{
  "compilerOptions": {
    "target": "ES2020",
    "module": "ESNext",
    "moduleResolution": "bundler",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true
  }
}
```
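
The idempotency_key field exists so that a prompt can be retried safely. A minimal sketch of that pattern with the Python standard library follows, mirroring the curl example for this endpoint. The helper names, the three-attempt exponential backoff, and the choice to retry only on transport errors are assumptions for illustration, not documented client behavior.

```python
import json
import time
import urllib.error
import urllib.request
import uuid


def new_idempotency_key() -> str:
    """Generate a key once per logical prompt and reuse it on every retry."""
    return str(uuid.uuid4())


def build_prompt_request(base_url: str, api_key: str, customer_id: str,
                         project_id: str, prompt: str, idempotency_key: str,
                         model: str = "") -> urllib.request.Request:
    """Build the POST request; optional fields are omitted when empty."""
    url = (f"{base_url.rstrip('/')}/v1/customers/{customer_id}"
           f"/projects/{project_id}/prompts")
    body = {"prompt": prompt, "idempotency_key": idempotency_key}
    if model:
        body["model"] = model  # omit to select the project's default model
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_with_retry(request: urllib.request.Request, attempts: int = 3,
                    backoff_seconds: float = 1.0) -> dict:
    """Resend the identical request (same idempotency key) on transport errors."""
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(request, timeout=60) as resp:
                return json.load(resp)
        except urllib.error.HTTPError:
            # The server answered with an error status; surface it as-is.
            raise
        except urllib.error.URLError:
            # No response arrived, so the outcome is unknown: retry with
            # the same key unless this was the last attempt.
            if attempt == attempts - 1:
                raise
            time.sleep(backoff_seconds * (2 ** attempt))
    raise RuntimeError("attempts must be at least 1")
```

The key is generated once, outside the retry loop, and travels inside the request body, so every retry carries the same value.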

Get dataplane URL

What it does: Returns the dataplane (gateway) URL for a single project. The project is deduced from the caller's credential -- no input is required in the common case: a caller who belongs to one project gets that project's dataplane; one whose identity spans several gets their default (the oldest membership). Pass project_id to disambiguate or target a specific project (it is membership-checked). The customer is derived the same way. When the credential itself already pins a workspace, that workspace is used directly.

Request fields:

| Field | Required | Description |
|---|---|---|
| project_id | no | Optional project override. Empty resolves to the workspace the credential pins (when it carries one) or the caller's default project (the oldest membership). Supply explicitly to pick a specific project when the identity spans several; the chosen project is membership-checked. |
| customer_id | no | Optional customer override. Empty resolves to the customer deduced from the credential. Honored only for principals not already bound to a customer; a cross-customer value is rejected unless the caller is a member. |
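
The project resolution order described above can be sketched as follows. This is an illustration only, not the server's code: the Membership type, the function name, and the error types are hypothetical. The sketch also assumes that an explicit project_id takes precedence over a credential-pinned workspace, and it treats workspace and project identifiers as interchangeable, as the text appears to.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Membership:
    """Hypothetical record of the caller belonging to a project."""
    project_id: str
    joined_at: datetime


def resolve_project(memberships: List[Membership],
                    project_id: str = "",
                    pinned_workspace: Optional[str] = None) -> str:
    """Pick the project whose dataplane URL should be returned."""
    if project_id:
        # Explicit override: allowed only if the caller is a member.
        if not any(m.project_id == project_id for m in memberships):
            raise PermissionError(f"caller is not a member of {project_id}")
        return project_id
    if pinned_workspace:
        # The credential itself pins a workspace: use it directly.
        return pinned_workspace
    if not memberships:
        raise LookupError("caller has no project memberships")
    # Default: the oldest membership.
    return min(memberships, key=lambda m: m.joined_at).project_id
```

A caller with a single membership always lands in the last branch, which is why no input is needed in the common case.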

Response fields:

| Field | Required | Description |
|---|---|---|
| dataplane | output-only | Undocumented. |

HTTP: GET /v1/dataplane-url

Authentication: Authenticated (API key or session token)

Signatures:

Go: c.InferenceIngress().GetDataplaneURL(ctx, &inferencev1.GetDataplaneURLRequest{...})
Python: client.inferenceingress.getdataplaneurl(...)
TypeScript: client.inferenceingress.getdataplaneurl({...})
CLI: tare api dataplane-url

Examples:

There is no Go, Python, or TypeScript SDK wrapper for InferenceIngressService yet; use the CLI or curl example below.

CLI:

```shell
tare api dataplane-url
```

curl:

```shell
curl "${AGENTROUTER_BASE_URL}/v1/dataplane-url" \
  -H "Authorization: Bearer ${AGENTROUTER_API_KEY}"
```

List dataplane URLs

What it does: Returns every dataplane URL the caller can reach -- one per workspace under the customer deduced from the credential. The customer is deduced from the caller's key, so the request takes no input. Use GetDataplaneURL when only a single project's URL is needed.

Request body: None.

Response fields:

| Field | Required | Description |
|---|---|---|
| dataplanes | output-only | Undocumented. |

HTTP: GET /v1/dataplane-urls

Authentication: Authenticated (API key or session token)

Signatures:

Go: c.InferenceIngress().ListDataplaneURLs(ctx, &inferencev1.ListDataplaneURLsRequest{...})
Python: client.inferenceingress.listdataplaneurls(...)
TypeScript: client.inferenceingress.listdataplaneurls({...})
CLI: tare api dataplane-urls

Examples:

There is no Go, Python, or TypeScript SDK wrapper for InferenceIngressService yet; use the CLI or curl example below.

CLI:

```shell
tare api dataplane-urls
```

curl:

```shell
curl "${AGENTROUTER_BASE_URL}/v1/dataplane-urls" \
  -H "Authorization: Bearer ${AGENTROUTER_API_KEY}"
```