Kimi K2.6, served over an OpenAI-compatible API.
Pylo routes requests to Moonshot's Kimi K2.6 and serves the responses over an OpenAI-compatible API. Point your existing client at https://api.pylo.sh/v1, set the model slug, and the rest of your code stays the same.
Pylo is the routing and reliability layer in front of the model. Requests fail over across upstreams automatically, and every request carries one stable ID. The model produces the tokens. Pylo keeps them flowing.
Endpoint
- API base URL
- https://api.pylo.sh/v1
- Model slug
- moonshotai/kimi-k2.6
- Auth
- Authorization: Bearer <key>
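In environments without the openai SDK, the endpoint, slug, and auth header above are enough to build the call with Python's standard library. The sketch below assumes the request body follows the OpenAI chat-completions shape; `build_chat_request` and `send` are hypothetical helper names, not part of any Pylo SDK.

```python
import json
import urllib.request

BASE_URL = "https://api.pylo.sh/v1"
MODEL = "moonshotai/kimi-k2.6"


def build_chat_request(prompt: str, api_key: str, stream: bool = False) -> urllib.request.Request:
    """Build (without sending) a chat-completions POST. Hypothetical helper."""
    body = {
        "model": MODEL,
        "stream": stream,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # Bearer <key>, as listed above
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request, timeout_s: float = 60.0) -> dict:
    """Send a non-streaming request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.load(resp)
```

For a non-streaming call, `send(build_chat_request("Write a haiku about routing tables.", os.environ["PYLO_API_KEY"]))["choices"][0]["message"]["content"]` would hold the completion text, assuming the standard OpenAI response shape.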
Quickstart
Pylo speaks the OpenAI chat-completions API. If your code already talks to OpenAI, change the base URL and the model. Streaming works the same way.
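With `"stream": true`, an OpenAI-compatible endpoint answers with server-sent events: `data: {...}` lines carrying JSON chunks, ended by `data: [DONE]`. Assuming Pylo follows that convention, this sketch extracts the text deltas from raw SSE lines, which is useful when consuming curl output directly. `iter_content` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style SSE lines (assumed wire format)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, ": keep-alive" comments, other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # The first delta often carries only the role, with no content.
            content = (choice.get("delta") or {}).get("content")
            if content:
                yield content
```

For example, pipe `curl -sN ...` (the `-N` flag turns off curl's output buffering) into a script that runs `for text in iter_content(sys.stdin): print(text, end="", flush=True)`.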
curl

```bash
curl https://api.pylo.sh/v1/chat/completions \
  -H "Authorization: Bearer $PYLO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moonshotai/kimi-k2.6",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Write a haiku about routing tables."}
    ]
  }'
```

Python (openai SDK)
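```python
import os

from openai import OpenAI

# Same OpenAI SDK; only the base URL, key, and model change.
client = OpenAI(
    base_url="https://api.pylo.sh/v1",
    api_key=os.environ["PYLO_API_KEY"],
)

stream = client.chat.completions.create(
    model="moonshotai/kimi-k2.6",
    stream=True,
    messages=[
        {"role": "user", "content": "Write a haiku about routing tables."},
    ],
)

# Print each text delta as it arrives.
for chunk in stream:
    if not chunk.choices:
        continue  # e.g. a trailing usage-only chunk carries no choices
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```

Kimi K2.6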
One model, one slug. Pylo exposes Kimi K2.6 with a 262,144-token (256K) context window, text input, and text output.
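A fixed context window means `max_tokens` has to leave room for the prompt. This sketch clamps a requested output budget, assuming prompt and completion tokens share the one 262,144-token window and that the prompt count comes from the model's own tokenizer (for instance, `usage.prompt_tokens` from an earlier response). `max_output_budget` is a hypothetical helper.

```python
CONTEXT_LENGTH = 262_144  # tokens, from the model card


def max_output_budget(prompt_tokens: int, requested_max_tokens: int,
                      context_length: int = CONTEXT_LENGTH) -> int:
    """Clamp max_tokens so that prompt + completion fits in the context window.

    Assumes prompt and completion tokens share a single window.
    """
    if prompt_tokens < 0 or requested_max_tokens < 0:
        raise ValueError("token counts must be non-negative")
    remaining = context_length - prompt_tokens
    if remaining <= 0:
        raise ValueError(
            f"prompt is {prompt_tokens} tokens; the window is {context_length}"
        )
    return min(requested_max_tokens, remaining)
```

For example, a 260,000-token prompt leaves 262,144 - 260,000 = 2,144 tokens, so a request for 8,192 output tokens is clamped to 2,144.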
Model card
- Model slug
- moonshotai/kimi-k2.6
- Display name
- Kimi K2.6
- Context length
- 262144 tokens
- Input
- text
- Output
- text
- Supported features
- reasoning, structured_outputs, tools
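The model card lists `tools` among the supported features. Assuming Pylo passes the OpenAI chat-completions tool-calling fields through unchanged, a round trip looks like this sketch: declare tools in the request, read `tool_calls` from the assistant message (where `arguments` is a JSON-encoded string), and reply with a `"role": "tool"` message. The `get_weather` tool and the helper names are hypothetical.

```python
import json
from typing import Any, Dict, List, Tuple

# Hypothetical tool, declared in the OpenAI "tools" format.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_tool_request(prompt: str) -> Dict[str, Any]:
    """Request body that offers the model one tool."""
    return {
        "model": "moonshotai/kimi-k2.6",
        "messages": [{"role": "user", "content": prompt}],
        "tools": [WEATHER_TOOL],
    }


def extract_tool_calls(message: Dict[str, Any]) -> List[Tuple[str, str, Dict[str, Any]]]:
    """Return (call_id, function_name, parsed_arguments) for each tool call."""
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        # arguments arrive as a JSON string, not a parsed object
        calls.append((call["id"], fn["name"], json.loads(fn["arguments"])))
    return calls


def tool_result_message(call_id: str, result: Dict[str, Any]) -> Dict[str, Any]:
    """Message that returns a tool's output to the model on the next turn."""
    return {"role": "tool", "tool_call_id": call_id, "content": json.dumps(result)}
```

On the next request, append the assistant message and one `tool_result_message` per call to `messages`, then call the endpoint again.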
Pricing
| Item | Price (USD / 1M tokens) |
|---|---|
| Input | $0.90 |
| Output | $3.90 |
| Cache read | $0.18 |
Launch pricing, subject to change.
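The table prices tokens per million, so a per-request estimate is a weighted sum. This sketch assumes, as in OpenAI's usage object, that cached tokens (`usage.prompt_tokens_details.cached_tokens`) are a subset of the prompt tokens and are billed at the cache-read rate instead of the input rate. `request_cost` is a hypothetical helper, and the prices are the launch prices above.

```python
from decimal import Decimal

# Launch prices, USD per 1M tokens (subject to change).
PRICE_PER_MILLION = {
    "input": Decimal("0.90"),
    "output": Decimal("3.90"),
    "cache_read": Decimal("0.18"),
}
MILLION = Decimal(1_000_000)


def request_cost(prompt_tokens: int, completion_tokens: int, cached_tokens: int = 0) -> Decimal:
    """Estimate the USD cost of one request.

    Assumes cached_tokens is included in prompt_tokens and is billed at the
    cache-read rate rather than the input rate.
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached = prompt_tokens - cached_tokens
    return (
        Decimal(uncached) * PRICE_PER_MILLION["input"]
        + Decimal(cached_tokens) * PRICE_PER_MILLION["cache_read"]
        + Decimal(completion_tokens) * PRICE_PER_MILLION["output"]
    ) / MILLION
```

Under those assumptions, 1,000,000 prompt tokens (200,000 of them cached) plus 100,000 output tokens cost $0.72 + $0.036 + $0.39 = $1.146. `Decimal` keeps the sums free of binary floating-point rounding.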
Reliability
Pylo runs redundant upstream backends behind a single endpoint. A per-upstream circuit breaker tracks time-to-first-token and error rate over a rolling window. When the primary's breaker trips, requests fail over to the fallback automatically.
- Failover triggers on a time-to-first-token timeout (8 seconds by default), a 5xx, or a 429.
- No failover after the first token has streamed. Once bytes are on the wire, the stream stays on its upstream.
- Each inbound request gets one stable X-Request-Id, which serves as the idempotency key, so a fallback retry never double-bills.
- If every backend is degraded, Pylo sheds load early with a 429 or 503 instead of queueing your request.
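The failover rules above can be sketched in Python. Only the 8-second TTFT timeout, the 5xx/429 triggers, the no-failover-after-first-token rule, and shedding when every backend is degraded come from the list; the 60-second window, 50% error-rate threshold, 4-second median-TTFT threshold, and 10-sample minimum are hypothetical values, and `should_fail_over`, `CircuitBreaker`, and `pick_upstream` are illustrative names, not Pylo internals.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Optional, Tuple

TTFT_TIMEOUT_S = 8.0  # default time-to-first-token timeout, per the list above


def should_fail_over(status: Optional[int], waited_s: float, first_token_sent: bool,
                     timeout_s: float = TTFT_TIMEOUT_S) -> bool:
    """Decide whether to retry a request on the next upstream.

    status is the upstream HTTP status if one arrived (None while waiting);
    waited_s is how long we have waited without a first token.
    """
    if first_token_sent:
        return False  # bytes are on the wire: the stream stays on its upstream
    if status is not None and (status == 429 or 500 <= status <= 599):
        return True
    return waited_s >= timeout_s


@dataclass
class CircuitBreaker:
    """Per-upstream breaker over a rolling time window (hypothetical thresholds)."""

    window_s: float = 60.0
    max_error_rate: float = 0.5
    max_median_ttft_s: float = 4.0
    min_samples: int = 10
    # Each sample is (timestamp_s, ok, ttft_s); ttft_s is None for failures.
    samples: Deque[Tuple[float, bool, Optional[float]]] = field(default_factory=deque)

    def record(self, now: float, ok: bool, ttft_s: Optional[float]) -> None:
        self.samples.append((now, ok, ttft_s))
        self._evict(now)

    def _evict(self, now: float) -> None:
        # Drop samples that have aged out of the rolling window.
        while self.samples and self.samples[0][0] < now - self.window_s:
            self.samples.popleft()

    def is_open(self, now: float) -> bool:
        """True when this upstream should be skipped."""
        self._evict(now)
        if len(self.samples) < self.min_samples:
            return False  # not enough evidence to trip
        errors = sum(1 for _ts, ok, _ttft in self.samples if not ok)
        if errors / len(self.samples) > self.max_error_rate:
            return True
        ttfts = sorted(t for _ts, ok, t in self.samples if ok and t is not None)
        # Compare the upper median of successful TTFTs against the threshold.
        return bool(ttfts) and ttfts[len(ttfts) // 2] > self.max_median_ttft_s


def pick_upstream(breakers: Dict[str, CircuitBreaker], order: List[str],
                  now: float) -> Optional[str]:
    """Return the first upstream, in priority order, whose breaker is closed."""
    for name in order:
        if not breakers[name].is_open(now):
            return name
    return None  # every backend degraded: shed with 429/503 rather than queue
```

A `None` from `pick_upstream` corresponds to the shed-early case. A production breaker would usually add a half-open probe state; this sketch simply closes again once bad samples age out of the window.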
Pylo documents how failover works but does not state a numeric uptime guarantee.
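Because Pylo sheds load with a 429 or 503 instead of queueing, a client may want to retry those two statuses with backoff. This is a client-side sketch, not Pylo behavior: `send` is a hypothetical zero-argument callable that makes one HTTP attempt and returns its status code, and the retry count, base delay, and cap are arbitrary choices. The delays use full-jitter exponential backoff.

```python
import random
import time
from typing import Callable, List, Optional

RETRYABLE_STATUSES = {429, 503}  # the statuses Pylo returns when shedding load


def backoff_delays(max_retries: int, base_s: float = 0.5, cap_s: float = 8.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Full jitter: retry n waits uniform(0, min(cap_s, base_s * 2**n)) seconds."""
    rng = rng or random.Random()
    return [rng.uniform(0.0, min(cap_s, base_s * 2 ** n)) for n in range(max_retries)]


def call_with_retries(send: Callable[[], int], max_retries: int = 4,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Optional[random.Random] = None) -> int:
    """Call send() and retry while it returns 429 or 503; return the final status."""
    status = send()
    for delay in backoff_delays(max_retries, rng=rng):
        if status not in RETRYABLE_STATUSES:
            break
        sleep(delay)
        status = send()
    return status
```

Other statuses, such as 400 or 401, are returned immediately, since retrying an invalid or unauthorized request will not change the outcome. Injecting `sleep` and `rng` keeps the function testable without real delays.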
Data handling
Pylo retains request metadata for abuse-prevention and legal purposes and does not train on it. Pylo is not a zero-data-retention service. The audit log records request metadata, never your prompts or responses. See the Privacy page for the full policy.