Quickstart

Using the Python SDK

Here’s the simplest way to run a job with Sutro: just pass a list of inputs and a system prompt.
import sutro as so

# Can be skipped if set via CLI with `sutro login`
so.set_api_key("sk_******")

user_reviews = [
    "I loved the product! It was easy to use and had a great user interface.",
    "The product was okay, but the customer support could be better.",
    "I had a terrible experience with the product. It didn't work as advertised and customer service was unhelpful.",
]

results = so.infer(
    user_reviews,
    system_prompt="Classify the review as positive, neutral, or negative."
)

print(results)
This outputs a list preserving the original ordering:
[
    'This review is positive.',
    'This review is neutral.',
    'This review is negative.'
]
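Because the output list preserves the original ordering, you can pair each result back with its input using `zip`. This is a minimal sketch with the results hardcoded to mirror the example above; in practice they would come from `so.infer(...)`:

```python
user_reviews = [
    "I loved the product! It was easy to use and had a great user interface.",
    "The product was okay, but the customer support could be better.",
    "I had a terrible experience with the product. It didn't work as advertised and customer service was unhelpful.",
]

# Hardcoded stand-in for the output of so.infer(...) above.
results = [
    "This review is positive.",
    "This review is neutral.",
    "This review is negative.",
]

# Ordering is preserved, so zip lines each label up with its review.
labeled = list(zip(user_reviews, results))
for review, label in labeled:
    print(f"{label} <- {review}")
```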
Below are some more complex examples that show different ways to use Sutro!

Structuring outputs

In the above example, we’re trying to perform a simple classification task. In such cases, we may want structured outputs. We can accomplish this by passing in a Pydantic model or JSON schema using the output_schema parameter. The model will strictly adhere to this schema in its output content.
import sutro as so
from pydantic import BaseModel

# This can be skipped if set via the CLI with `sutro login`
so.set_api_key("sk_******")

user_reviews = [
    "I loved the product! It was easy to use and had a great user interface.",
    "The product was okay, but the customer support could be better.",
    "I had a terrible experience with the product. It didn't work as advertised and customer service was unhelpful."
]

class ReviewClassification(BaseModel):
    classification: str

results = so.infer(
    user_reviews,
    system_prompt="Classify the review as positive, neutral, or negative.",
    output_schema=ReviewClassification
)

print(results)
Now we should obtain the following output:
[
    {"classification": "positive"},
    {"classification": "neutral"},
    {"classification": "negative"}
]
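As noted above, `output_schema` also accepts a JSON schema directly. A sketch of the equivalent schema as a plain dict is below; the `enum` constraint is an addition that pins the field to the three labels, which the plain `str` field in the Pydantic model does not enforce. Verify the exact raw-schema behavior against the SDK reference.

```python
# Hand-written JSON schema roughly equivalent to the
# ReviewClassification Pydantic model above: one required string field,
# here additionally constrained to the three expected labels.
review_schema = {
    "type": "object",
    "properties": {
        "classification": {
            "type": "string",
            "enum": ["positive", "neutral", "negative"],
        }
    },
    "required": ["classification"],
}

# Hypothetical usage, mirroring the example above:
# results = so.infer(
#     user_reviews,
#     system_prompt="Classify the review as positive, neutral, or negative.",
#     output_schema=review_schema,
# )
```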
Structured outputs also work well with reasoning models, since the model has the token space to go through a reasoning process before its outputs get constrained to the output schema.
import sutro as so
from pydantic import BaseModel
from typing import List

so.set_api_key("sk_******")

reviews = [...]

class ReviewAnalysis(BaseModel):
    sentiment: str
    rating: int
    key_aspects: List[str]
    would_recommend: bool

system_prompt = """Analyze the review and extract structured insights.
Reflect and consider the implications of what the customer is stating
and how that may affect your analysis. Rate from 1-5."""

results = so.infer(
    data=reviews,
    system_prompt=system_prompt,
    output_schema=ReviewAnalysis,
    model='qwen-3-30b-a3b-thinking'
)

# Note the `content` and `reasoning_content` fields
print(results[0])
# >> {"content": {"sentiment": ... }, "reasoning_content": "Ok, our task is to..."}

Working with DataFrames and Sampling Parameters

This example shows how to work with DataFrames, customize sampling parameters, and wait for job completion.
import sutro as so
import polars as pl

so.set_api_key("sk_******")

# Load your data
df = pl.read_csv('customer_feedback.csv')

# Run inference with custom sampling parameters
results_df = so.infer(
    data=df,
    column='feedback_text',
    output_column='sentiment_analysis',
    model='llama-3.1-70b',
    system_prompt='Analyze sentiment and extract key themes',
    sampling_params={
        'temperature': 0.3,
        'top_p': 0.9,
        'max_tokens': 200
    },
)

print(results_df)

Multi-Model Comparison

Run the same inputs across multiple models to compare outputs and quality.
import sutro as so

so.set_api_key("sk_******")

prompts = [
    "Explain quantum computing in simple terms",
    "What are the benefits of renewable energy?",
    "How does photosynthesis work?"
]

# Run same inputs across multiple models for comparison
job_ids = so.infer_per_model(
    data=prompts,
    models=['gemma-3-27b-it', 'qwen-2.5-32b-instruct', 'gpt-oss-20b'],
    names=['gemma-27b-run', 'qwen-32b-run', 'gpt-oss-run'],
    system_prompt='Provide a concise, accurate explanation',
)

# Retrieve results from each model
for model_name, job_id in zip(['gemma-27b', 'qwen-32b', 'gpt-oss'], job_ids):
    results = so.await_job_completion(job_id)
    print(f"\n{model_name} results:")
    print(results)

Cost Estimation

Before running a large job, you can estimate costs using the dry_run parameter.
import sutro as so
import polars as pl

so.set_api_key("sk_******")

# Load a large dataset
df = pl.read_csv('large_dataset.csv')

# Get cost estimate without running inference
# The cost estimate will be displayed automatically
so.infer(
    data=df,
    column='text_column',
    model='gemma-3-27b-it',
    system_prompt='Summarize this text',
    job_priority=1,
    dry_run=True  # Returns cost estimate instead of running inference
)

Using Files

You can also use files to pass in data. We currently support CSV, Parquet, and TXT files. If you’re using a TXT file, each line should represent a single input. If you’re using a CSV or Parquet file, you must specify the column name that contains the inputs using the column parameter.
import sutro as so

results = so.infer(
    inputs='my_file.csv',
    column='reviews',
    system_prompt='Classify the review as positive, neutral, or negative.',
)

print(results)
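For TXT input, where each line is a single input, you can prepare the file with the standard library. The sketch below writes the reviews out one per line; the submission call is commented out since it requires an API key, and note that no `column` parameter is needed for TXT files:

```python
from pathlib import Path

reviews = [
    "I loved the product!",
    "The product was okay.",
    "I had a terrible experience.",
]

# One input per line, per the TXT convention described above.
path = Path("reviews.txt")
path.write_text("\n".join(reviews) + "\n")

# Hypothetical submission, mirroring the CSV example above:
# results = so.infer(
#     inputs="reviews.txt",
#     system_prompt="Classify the review as positive, neutral, or negative.",
# )
```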
You can view the full details of the SDK here.

Moving to Production

So far we’ve shown what it looks like to use prototyping jobs (priority 0, the default). After working with a small amount of data using prototyping jobs, you’ll likely want to move to production jobs (priority 1), which are less expensive and have higher quotas. To do so, simply set the job_priority parameter to 1:
job_id = so.infer(inputs='my_file.csv', column='reviews', system_prompt='Classify the review as positive, neutral, or negative.', job_priority=1)
Instead of waiting for the job to finish, the call returns a job ID immediately. You can pass this ID to await_job_completion, which polls for job status and retrieves the results once the job completes.
import sutro as so

job_id = so.infer(
    inputs="my_file.csv",
    column="reviews",
    system_prompt="Classify the review as positive, neutral, or negative.",
    job_priority=1,
)

results = so.await_job_completion(job_id)

Using the CLI to view job progress and results

Once you’ve submitted jobs via the SDK or API, you can use the CLI to view the status of the job and retrieve the results. Viewing current and past jobs:
sutro jobs
Retrieving job status:
sutro status <job_id>
Retrieving job results:
sutro results <job_id>
You can view the full details of the CLI here.

API Usage Example

You can accomplish the same tasks using the API directly. You’ll need to preprocess data as a list or array and pass it in via the parameters in the JSON body.
import requests
import json

user_reviews = [
    "I loved the product! It was easy to use and had a great user interface.",
    "The product was okay, but the customer support could be better.",
    "I had a terrible experience with the product. It didn't work as advertised and customer service was unhelpful."
]

params = {
    "model": "llama-3.1-8b",
    "inputs": user_reviews,
    "system_prompt": "Classify the review as positive, neutral, or negative.",
}
headers = {
    "Authorization": "Key <YOUR_API_KEY>",
    "Content-Type": "application/json"
}

response = requests.post("https://api.sutro.sh/batch-inference", json=params, headers=headers)
results = response.json()
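A slightly more defensive version of the request above checks the HTTP status before parsing the body. This sketch guards the network call behind an environment variable (SUTRO_API_KEY is an assumed variable name, not part of the official docs) so it only sends a request when credentials are configured:

```python
import json
import os

# Assumed environment variable name for illustration.
api_key = os.environ.get("SUTRO_API_KEY")

params = {
    "model": "llama-3.1-8b",
    "inputs": ["I loved the product!"],
    "system_prompt": "Classify the review as positive, neutral, or negative.",
}
headers = {
    "Authorization": f"Key {api_key}",
    "Content-Type": "application/json",
}

# Serialize once to confirm the body is valid JSON before sending.
body = json.dumps(params)

if api_key:
    import requests  # imported lazily so the sketch runs without it

    response = requests.post(
        "https://api.sutro.sh/batch-inference", json=params, headers=headers
    )
    # Surface 4xx/5xx errors instead of trying to parse an error page.
    response.raise_for_status()
    results = response.json()
    print(results)
```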
For more details on using the API directly, refer to the Batch API Reference.