Kimi K2.6, served over an OpenAI-compatible API.

Pylo routes requests to Moonshot AI's Kimi K2.6 and serves the responses over an OpenAI-compatible API. Point your existing client at https://api.pylo.sh/v1, set the model slug, and leave the rest of your code as it is.

Pylo is the routing and reliability layer in front of the model. Requests fail over across upstreams automatically, and every request carries one stable ID. The model produces the tokens. Pylo keeps them flowing.

Endpoint

API base URL: https://api.pylo.sh/v1
Model slug: moonshotai/kimi-k2.6
Auth: Authorization: Bearer <key>

Read the Quickstart or jump to pricing.

Quickstart

Pylo speaks the OpenAI chat-completions API. If your code already talks to OpenAI, change the base URL and the model. Streaming works the same way.

curl

bash

curl https://api.pylo.sh/v1/chat/completions \
  -H "Authorization: Bearer $PYLO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moonshotai/kimi-k2.6",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Write a haiku about routing tables."}
    ]
  }'

Python (openai SDK)

python

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.pylo.sh/v1",
    api_key=os.environ["PYLO_API_KEY"],
)

stream = client.chat.completions.create(
    model="moonshotai/kimi-k2.6",
    stream=True,
    messages=[
        {"role": "user", "content": "Write a haiku about routing tables."},
    ],
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
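
If you want the full reply as one string in addition to live output, the loop above can be wrapped in a small helper. This is a minimal sketch, not part of Pylo's own examples: collect_text is a hypothetical name, and the guard for chunks with an empty choices list is a defensive assumption about OpenAI-compatible streams (the OpenAI API sends such a chunk when usage reporting is enabled). The fake chunks exist only so the helper can be exercised offline.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_text(chunks: Iterable) -> str:
    """Join the text deltas of a chat-completions stream into one string."""
    parts = []
    for chunk in chunks:
        # Defensive assumption: skip chunks that carry no choices
        # (for example a usage-only chunk at the end of a stream).
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # role-only or empty deltas have content None or ""
            parts.append(delta)
    return "".join(parts)


def _fake_chunk(text):
    """Build a stand-in shaped like a streamed chunk, for offline use only."""
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )


fake_stream = [
    _fake_chunk("Packets "),
    _fake_chunk(None),              # delta without text
    SimpleNamespace(choices=[]),    # chunk without choices
    _fake_chunk("drift home."),
]
full_text = collect_text(fake_stream)
```

With the client from the example above, collect_text(stream) would consume the stream and return the whole reply.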

Kimi K2.6

One model, one slug. Pylo exposes Kimi K2.6 with a 262,144-token context window. Input and output are text only.

Model card

Model slug: moonshotai/kimi-k2.6
Display name: Kimi K2.6
Context length: 262,144 tokens
Input: text
Output: text
Supported features: reasoning, structured_outputs, tools
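
The model card lists tools and structured_outputs as supported features. The sketch below shows what such request bodies might look like, assuming Pylo accepts the OpenAI chat-completions fields tools and response_format unchanged. That is inferred from "OpenAI-compatible" and is not confirmed by this page. The get_route tool, the route_answer schema, and both builder functions are hypothetical names used for illustration.

```python
import json

MODEL = "moonshotai/kimi-k2.6"


def build_tool_request(question: str) -> dict:
    """Return a chat-completions body that offers the model one function tool."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": question}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_route",  # hypothetical tool
                    "description": "Look up the next hop for a destination prefix.",
                    "parameters": {
                        "type": "object",
                        "properties": {"prefix": {"type": "string"}},
                        "required": ["prefix"],
                    },
                },
            }
        ],
    }


def build_structured_request(question: str) -> dict:
    """Return a body that asks for JSON matching a schema (OpenAI-style field names)."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": question}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "route_answer",  # hypothetical schema name
                "strict": True,
                "schema": {
                    "type": "object",
                    "properties": {"next_hop": {"type": "string"}},
                    "required": ["next_hop"],
                    "additionalProperties": False,
                },
            },
        },
    }


tool_body = build_tool_request("Where does 10.0.0.0/8 go?")
structured_body = build_structured_request("Where does 10.0.0.0/8 go?")
payload = json.dumps(tool_body)  # what would be sent as the POST body
```

With the openai SDK, either body could be passed as client.chat.completions.create(**tool_body).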

Pricing

Item	Price (USD / 1M tokens)
Input	$0.90
Output	$3.90
Cache read	$0.18

Launch pricing, subject to change.
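
To make the table concrete, here is a small cost estimate using the prices listed above. It is a sketch under one stated assumption: cached prompt tokens are billed at the cache-read rate in place of the input rate, so input_tokens counts only uncached prompt tokens. This page does not spell out how cached tokens are counted. The function name is hypothetical.

```python
from decimal import Decimal

# USD per 1M tokens, copied from the pricing table above.
PRICE_PER_MILLION = {
    "input": Decimal("0.90"),
    "output": Decimal("3.90"),
    "cache_read": Decimal("0.18"),
}


def estimate_cost_usd(
    input_tokens: int, output_tokens: int, cache_read_tokens: int = 0
) -> Decimal:
    """Estimate the cost of one request in USD.

    Assumption: input_tokens excludes cached prompt tokens, which are
    passed separately as cache_read_tokens.
    """
    total = (
        PRICE_PER_MILLION["input"] * input_tokens
        + PRICE_PER_MILLION["output"] * output_tokens
        + PRICE_PER_MILLION["cache_read"] * cache_read_tokens
    )
    return total / Decimal(1_000_000)


# 20,000 uncached prompt tokens, 1,500 output tokens, 100,000 cached tokens.
example = estimate_cost_usd(
    input_tokens=20_000, output_tokens=1_500, cache_read_tokens=100_000
)
print(example)  # 0.04185
```

In this example the 100,000 cached tokens cost the same as the 20,000 uncached ones ($0.018 each), which is the effect of the lower cache-read rate.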

Reliability

Pylo runs redundant upstream backends behind a single endpoint. A per-upstream circuit breaker tracks time-to-first-token and error rate over a rolling window. When the primary's breaker trips, requests fail over to the fallback automatically.
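
As a mental model of the breaker described above, here is a toy rolling-window breaker. It is a sketch, not Pylo's implementation: the class and function names are hypothetical, the window is counted in requests instead of time, and every threshold is a placeholder, since this page does not publish the real values.

```python
from collections import deque
from statistics import median


class RollingBreaker:
    """Toy per-upstream breaker over the most recent `window` requests."""

    def __init__(
        self,
        window: int = 20,               # placeholder window size
        max_error_rate: float = 0.5,    # placeholder threshold
        max_median_ttft: float = 4.0,   # placeholder threshold, in seconds
    ):
        # Each sample is (ttft_seconds, is_error); old samples fall off the left.
        self.samples = deque(maxlen=window)
        self.max_error_rate = max_error_rate
        self.max_median_ttft = max_median_ttft

    def record(self, ttft_seconds: float, is_error: bool) -> None:
        self.samples.append((ttft_seconds, is_error))

    def is_open(self) -> bool:
        """True when the upstream looks unhealthy over the current window."""
        if not self.samples:
            return False
        error_rate = sum(1 for _, err in self.samples if err) / len(self.samples)
        median_ttft = median(t for t, _ in self.samples)
        return error_rate > self.max_error_rate or median_ttft > self.max_median_ttft


def pick_upstream(primary: RollingBreaker) -> str:
    """Route to the fallback while the primary's breaker is open."""
    return "fallback" if primary.is_open() else "primary"


breaker = RollingBreaker(window=4)
for _ in range(3):
    breaker.record(0.4, True)       # three errors
breaker.record(0.4, False)          # one success: error rate is 0.75
route_when_failing = pick_upstream(breaker)

for _ in range(4):
    breaker.record(0.3, False)      # healthy traffic pushes the errors out
route_after_recovery = pick_upstream(breaker)
```

The deque with maxlen is what makes the window roll: once four healthy samples arrive, the earlier errors no longer count and traffic returns to the primary.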

Failover triggers on a time-to-first-token timeout (8 seconds by default), a 5xx, or a 429.
No failover after the first token has streamed. Once bytes are on the wire, the stream stays on its upstream.
Each inbound request gets one stable X-Request-Id, which serves as the idempotency key, so a fallback retry is never billed twice.
If every backend is degraded, Pylo sheds early with a 429 or 503 instead of queueing your request.
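
The failover rules above can be written down as a single decision function. This is a reader's sketch of the stated rules, not Pylo's code: AttemptState and should_fail_over are hypothetical names, and treating "exactly 8 seconds" as a timeout is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

TTFT_TIMEOUT_SECONDS = 8.0  # default stated in the rules above


@dataclass
class AttemptState:
    status: Optional[int]      # HTTP status from the upstream, None if none yet
    seconds_waiting: float     # time spent waiting for the first token
    first_token_sent: bool     # True once any token has reached the client


def should_fail_over(
    state: AttemptState, ttft_timeout: float = TTFT_TIMEOUT_SECONDS
) -> bool:
    """Decide whether a request should move to the fallback upstream."""
    # Once bytes are on the wire, the stream stays on its upstream.
    if state.first_token_sent:
        return False
    # A 5xx or a 429 from the upstream triggers failover.
    if state.status is not None and (
        state.status == 429 or 500 <= state.status <= 599
    ):
        return True
    # So does waiting too long for the first token.
    return state.seconds_waiting >= ttft_timeout


decisions = {
    "upstream 503": should_fail_over(AttemptState(503, 0.2, False)),
    "upstream 429": should_fail_over(AttemptState(429, 0.1, False)),
    "slow first token": should_fail_over(AttemptState(None, 8.5, False)),
    "error mid-stream": should_fail_over(AttemptState(500, 9.0, True)),
    "healthy": should_fail_over(AttemptState(200, 1.0, False)),
}
```

Checking first_token_sent before anything else encodes the second rule: a stream that has already started is never moved, whatever happens afterward.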

This section describes Pylo's failover mechanism. It is not a numeric uptime guarantee.

Data handling

Pylo retains request metadata for abuse-prevention and legal purposes and does not train on it. Pylo is not a zero-data-retention service. The audit log records request metadata, never your prompts or responses. See the Privacy page for the full policy.