Concept
Dataset is a collection of datapoints. It can be used for the following purposes:- Data storage for use in future fine-tuning or prompt-tuning.
- Provide inputs and expected outputs for Evaluations.
Format
Every datapoint has two fixed JSON objects:data and target, each with arbitrary keys.
target is only used in evaluations.
data– the actual datapoint data,target– data additionally sent to the evaluator function.metadata– arbitrary key-value metadata about the datapoint.
data and target, the value can be any JSON value.
Example
This is an example of a valid datapoint.Editing
Datasets are editable. You can edit the datapoints by clicking on the datapoint and editing the data in JSON. Changes are saved as a new datapoint version.Versioning
Each datapoint has a unique id and acreated_at timestamp. Every time you
edit a datapoint, under the hood,
a new datapoint version is created with the same id,
but the created_at timestamp is updated.
The version stack is push-only. That is, when you revert to a previous version,
a copy of that version is created and added as a current version.
Example:
- Initial version (v1):
- Version 2 (v2):
- Version 3 (v3):
created_at timestamp is updated.
- Version 4 (v4):
