Sutro Functions Overview (Research Preview)

Summary

Sutro Functions are task-specific classifiers that are aligned with your decision preferences and inexpensive to run. We offer a declarative, iterative interface for building these functions that abstracts away model selection, prompt engineering, and upfront data labeling.

(Screenshot: Sutro Web UI)

Why Sutro Functions?

When using AI to solve specific, repeated tasks, it is often suboptimal to reach for large, general-purpose foundation models. Instead, it is usually more practical to create a task-specific model that is:
  • Highly accurate and aligned with your organization’s decision preferences (more so than an out-of-the-box foundation model)
  • Consistent and low-variance across inputs (often not the case with foundation models)
  • As small and cheap to run as possible (especially when scaling to large datasets or many inputs over time)
We also find that today’s approach of manual model selection and prompt engineering is broken: it produces brittle, inconsistent, and overly expensive solutions. We believe Sutro Functions offer a better approach.

How does it work?

Start by uploading an unlabeled dataset, choosing a task type (e.g. binary classification), and writing a simple task definition (e.g. “Determine if this is a qualified lead for my business.”). The system iteratively learns your decision preferences, surfacing examples where model confidence is low along with overall learning-progress metrics. You then label those low-confidence examples and provide justifications for your responses, encoding your preferences with each iteration. You can also review high-confidence results (alongside other uploaded correlative data if desired) and optionally re-label cases where confidence is high but the label is incorrect. After a few iterations, the system typically converges on a highly aligned task representation, which can then be deployed on Sutro’s efficient batch inference service or exported for external use. Once in production, you can use Sutro to detect anomalies or data drift, and return to Sutro Functions at any time to update the function.
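The loop above resembles classic active learning with uncertainty sampling: score everything, surface the examples the model is least sure about, and let human labels refine the model. A minimal, self-contained sketch of that pattern in plain Python (this is not the Sutro API; the confidence function is a toy stand-in for a real model):

```python
# Conceptual sketch of the iterative labeling loop described above.
# NOT the Sutro API - toy_confidence is an illustrative stand-in for a model.

def toy_confidence(example: str, preferences: list[str]) -> float:
    """Toy stand-in for model confidence on one example: confidence rises
    as more of the encoded preference cues appear in the text."""
    matches = sum(1 for cue in preferences if cue in example)
    return min(1.0, 0.5 + 0.25 * matches)

def label_low_confidence(dataset: list[str], preferences: list[str],
                         threshold: float = 0.75) -> list[str]:
    """One iteration: surface the examples the model is least sure about,
    lowest confidence first, so a human can label them next."""
    scored = [(toy_confidence(ex, preferences), ex) for ex in dataset]
    return [ex for conf, ex in sorted(scored) if conf < threshold]

dataset = [
    "enterprise buyer asked for pricing",
    "student requesting free access",
    "VP of engineering booked a demo",
]
preferences = ["pricing", "demo"]  # cues encoded from earlier labels/justifications
print(label_low_confidence(dataset, preferences))
# → ['student requesting free access']
```

Each round, the human labels the surfaced examples, the preference representation is updated, and confidence across the dataset rises until few low-confidence examples remain.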

What can I do with a Sutro Function?

We currently support text-based classification tasks only (binary, multi-class, and multi-label), but plan to add support for other analytical tasks like extraction and matching, as well as expanding to image and video inputs. Classification tasks cover an extremely wide range of high-value enterprise and research problems, including:
  • Lead scoring
  • Support routing and triage systems
  • Document categorization
  • Fraud and scam/spam detection
  • Semantic tagging for downstream analytics
  • Data quality filtering
  • Creating product taxonomies
  • Merchant categorization
  • Model/query routers
  • Churn-prediction models
  • Many, many more!
These tasks are often extremely subjective, and must be aligned with your organization’s preferences to be considered accurate or useful. They are also often high-volume, and should be run inexpensively if possible.
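The three supported classification types (binary, multi-class, multi-label) differ mainly in the shape of the output. A quick illustrative sketch (the labels and field names are made up for illustration, not actual Sutro output):

```python
# Illustrative output shapes for the three supported classification types.
# Labels and field names are hypothetical examples, not Sutro's schema.
text = "My card was charged twice and I want a refund"

binary      = {"is_support_request": True}            # one yes/no decision
multi_class = {"category": "billing"}                 # exactly one of N labels
multi_label = {"tags": ["billing", "refund"]}         # any subset of N labels

print(binary, multi_class, multi_label)
```

This is why a single input can flow through several functions at once, e.g. a binary triage check followed by a multi-class router.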

How can I get started?

Sutro Functions is currently in research preview, and we are working with a small set of motivated design partners to improve the methodology and product experience before a self-serve release. If you are interested in joining the preview or getting a demo, send a message to team@sutro.sh.

FAQ

What if I already have labeled data?
Even better! We can use those labels to pre-populate each iteration or serve as correlative references, but we still recommend providing justifications to encode preferences.

What exactly is a Sutro Function?
A small, preference-aligned AI model that can be used on larger datasets or many one-off task invocations.

Where do Sutro Functions run?
On Sutro’s batch inference service, or via a self-hosted solution in your environment.

Is this guide intended for technical or business audiences?
Both! We are happy to share more technical details about the under-the-hood approach, but intend for this guide to be more outcomes-focused.

How much does a Sutro Function cost to run?
Significantly less expensive to run than large foundation models. However, true ROI should be assessed via relative accuracy gains and overall task value.

How accurate are Sutro Functions?
Sutro Functions is still in active development, but we have seen extremely promising results. Generally speaking, if a task is well-scoped from the start and the sample data contains a realistic distribution of label outcomes, it will succeed in encoding user preferences to a high degree. We plan to release benchmarks showing relative performance gains in the near future.

How long does it take to build a function?
Our aim is to make it significantly faster than prompt engineering, with superior results.

How should I scope my task?
Generally speaking, you should reduce a task down to its simplest form for best results. For example, it may be helpful to break a larger, more complex problem like a lead-scoring model into several binary classifiers to solve it more accurately.

Why not just use a larger foundation model?
Generally speaking, foundation models do not improve on subjective tasks as they get larger or more generally capable. Sutro Functions aims to improve accuracy when preference-encoding or domain expertise is required. You can think of it like a last-mile solution for AI.

What kinds of input data are supported?
Sutro Functions are suited for unstructured input data (text, images, etc. - not numerical), and alleviate the need for upfront data labeling.

Can I improve a function after it is deployed?
Yes, and we plan to offer the ability to mark incorrect production instantiations for further refinement over time.
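The suggestion above of splitting a lead-scoring model into several binary classifiers can be made concrete with a short sketch. Here each predicate stands in for a separately built binary classifier, and their answers are combined into a simple score (all names and thresholds here are hypothetical, not part of Sutro):

```python
# Hypothetical decomposition of "score this lead" into binary sub-classifiers.
# Each predicate stands in for a separately built binary classifier.

def has_budget_signal(lead: dict) -> bool:
    return lead.get("company_size", 0) >= 50

def has_intent_signal(lead: dict) -> bool:
    return "demo" in lead.get("message", "").lower()

def is_target_industry(lead: dict) -> bool:
    return lead.get("industry") in {"fintech", "healthcare"}

def lead_score(lead: dict) -> int:
    """Combine the three binary answers into a simple 0-3 score."""
    checks = (has_budget_signal, has_intent_signal, is_target_industry)
    return sum(check(lead) for check in checks)

lead = {"company_size": 120, "message": "Can we book a demo?", "industry": "fintech"}
print(lead_score(lead))  # → 3
```

Each binary sub-task is easier to define, label, and align than the monolithic scoring task, and each classifier can be refined independently as preferences change.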