Using the API directly is not recommended for most users. Instead, we recommend using the
Python SDK.
List all files in a dataset.
Request Body
dataset_id: The ID of the dataset whose files you want to list.
Authorization header: Your Sutro API key, using the Key authentication scheme.
Format: Key YOUR_API_KEY
Example: Authorization: Key sk_live_abc123...
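The header format above can be built in one place so the key never appears inline in request code. This is a minimal sketch: the helper name build_headers and the environment variable SUTRO_API_KEY are assumptions made for illustration, not part of the documented API.

```python
import os


def build_headers(api_key=None):
    """Return request headers using the Key authentication scheme."""
    # SUTRO_API_KEY is an assumed environment variable name (hypothetical),
    # used only when no key is passed explicitly.
    key = api_key or os.environ.get("SUTRO_API_KEY")
    if not key:
        raise ValueError("No API key provided")
    return {
        "Authorization": f"Key {key}",
        "Content-Type": "application/json",
    }
```

The returned dictionary can be passed as the headers argument of the request shown under Code Examples.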
Response
Returns a JSON object containing a list of files in the dataset, in the order they were uploaded.
files: A list of file objects, ordered by upload time. Each object carries metadata about the file, including file_name, upload time, size, row count, and schema.
{
  "files": [
    {
      "file_name": "training_batch_1.parquet",
      "file_id": "file_abc123def456",
      "uploaded_at": "2024-01-15T10:30:00Z",
      "size_bytes": 524288,
      "row_count": 1000,
      "schema": {
        "fields": [
          {"name": "input", "type": "string"},
          {"name": "output", "type": "string"},
          {"name": "category", "type": "string"}
        ]
      }
    },
    {
      "file_name": "training_batch_2.parquet",
      "file_id": "file_def456ghi789",
      "uploaded_at": "2024-01-15T11:00:00Z",
      "size_bytes": 262144,
      "row_count": 500,
      "schema": {
        "fields": [
          {"name": "input", "type": "string"},
          {"name": "output", "type": "string"},
          {"name": "category", "type": "string"}
        ]
      }
    }
  ]
}
Code Examples
import requests

# Request the file listing for one dataset
response = requests.post(
    'https://api.sutro.sh/list-dataset-files',
    headers={
        'Authorization': 'Key YOUR_SUTRO_API_KEY',
        'Content-Type': 'application/json'
    },
    json={
        'dataset_id': 'dataset_12345'
    }
)
# Stop on an HTTP error instead of trying to parse an error body
response.raise_for_status()
result = response.json()

# Print a summary of each file, in upload order
print(f"Found {len(result['files'])} files in dataset:")
for i, file in enumerate(result['files'], 1):
    print(f"{i}. {file['file_name']}")
    print(f"   File ID: {file['file_id']}")
    print(f"   Uploaded: {file['uploaded_at']}")
    print(f"   Size: {file['size_bytes']} bytes")
    print(f"   Rows: {file['row_count']}")
    print("---")
File Object Fields
Each file in the files array contains the following fields:
file_name: Name of the file as it appears in the dataset
file_id: Unique identifier for the file
uploaded_at: ISO 8601 timestamp of when the file was uploaded (for example, 2024-01-15T10:30:00Z)
size_bytes: Size of the file in bytes
row_count: Number of rows/records in the file
schema: Schema information including field names and types
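The fields listed above map naturally onto a small typed record. The sketch below is one possible way to parse a file object on the client side; the DatasetFile class is hypothetical and not part of the API or the SDK, and it assumes every field shown in the example response is present.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class DatasetFile:
    """Hypothetical client-side view of one entry in the files array."""
    file_name: str
    file_id: str
    uploaded_at: datetime
    size_bytes: int
    row_count: int
    schema: dict

    @classmethod
    def from_dict(cls, raw):
        # Swap the trailing "Z" for an explicit offset so that
        # datetime.fromisoformat also accepts it on Python < 3.11.
        uploaded = datetime.fromisoformat(raw["uploaded_at"].replace("Z", "+00:00"))
        return cls(
            file_name=raw["file_name"],
            file_id=raw["file_id"],
            uploaded_at=uploaded,
            size_bytes=raw["size_bytes"],
            row_count=raw["row_count"],
            schema=raw["schema"],
        )


# Sample entry copied from the example response
sample = {
    "file_name": "training_batch_1.parquet",
    "file_id": "file_abc123def456",
    "uploaded_at": "2024-01-15T10:30:00Z",
    "size_bytes": 524288,
    "row_count": 1000,
    "schema": {"fields": [{"name": "input", "type": "string"}]},
}
parsed = DatasetFile.from_dict(sample)
```

Parsing uploaded_at into a timezone-aware datetime makes it straightforward to compare or sort files by upload time.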
Notes
- Files are returned in the order they were uploaded to the dataset
- This ordering is preserved and matches the order used for batch inference
- Use the file_name from this response with the download endpoint
- All files in a dataset share the same schema structure
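The ordering and shared-schema notes above can be checked on the client before relying on them. This is a minimal sketch that works on the files list from a response; the function name check_listing is hypothetical, and the sample data is trimmed from the example response.

```python
from datetime import datetime


def parse_ts(value):
    # Accept a trailing "Z" on Python versions before 3.11
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def check_listing(files):
    """Return (ordered_by_upload, shared_schema) for a list of file objects."""
    times = [parse_ts(f["uploaded_at"]) for f in files]
    # Each timestamp should be no earlier than the one before it
    ordered = all(a <= b for a, b in zip(times, times[1:]))
    schemas = [f["schema"] for f in files]
    # Every schema should equal the first one
    shared = all(s == schemas[0] for s in schemas[1:])
    return ordered, shared


# Two entries trimmed from the example response
schema = {"fields": [{"name": "input", "type": "string"}]}
files = [
    {"file_name": "training_batch_1.parquet",
     "uploaded_at": "2024-01-15T10:30:00Z", "schema": schema},
    {"file_name": "training_batch_2.parquet",
     "uploaded_at": "2024-01-15T11:00:00Z", "schema": schema},
]
ordered, shared = check_listing(files)
```

If either flag comes back False, the response does not match the documented guarantees and is worth inspecting before using positional order to pair files with batch inference results.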