Manage training data

Manage existing training data, including adding, editing, or removing records.

To effectively manage training data, you can take the following steps:

Adding new records:

  • Collect new documents to serve as training data for your model.

  • Make sure these documents are a representative sample of the different types of data the model is designed to process.

  • Upload the new records to your training data repository.
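Before uploading, it can help to check how well a new batch covers the document types the model is meant to handle. The following is a minimal sketch, assuming each record is a dictionary with a hypothetical `doc_type` field and that you keep your own list of expected types; neither is prescribed by this guide.

```python
from collections import Counter

def type_shares(records):
    """Return the share of each document type in a list of records."""
    counts = Counter(r["doc_type"] for r in records)
    total = sum(counts.values())
    return {doc_type: n / total for doc_type, n in counts.items()}

def missing_types(records, expected_types):
    """Return expected document types that have no record in the batch."""
    present = {r["doc_type"] for r in records}
    return sorted(set(expected_types) - present)

# Hypothetical batch of new records (field names are assumptions).
batch = [
    {"id": "a1", "doc_type": "invoice"},
    {"id": "a2", "doc_type": "invoice"},
    {"id": "a3", "doc_type": "receipt"},
    {"id": "a4", "doc_type": "contract"},
]

shares = type_shares(batch)
gaps = missing_types(batch, ["invoice", "receipt", "contract", "purchase_order"])
print(shares)
print("Types with no examples:", gaps)
```

A type that appears in `gaps`, or one with a very small share, is a candidate for collecting more documents before training.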

Editing existing records:

  • Review your existing training data regularly and update it as needed. This may include editing document metadata, adding labels, or removing erroneous or non-representative records.

Removing records:

  • Identify outdated, inaccurate, or no longer relevant records and remove them from your training data set.

  • Make sure you have a clear process for deciding which records to remove and document that process.
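One way to keep the removal process explicit is to encode the rule in a single function that also records why each record was removed. This is only a sketch: the rule itself (flagged as inaccurate, or not reviewed since a cutoff date) and the field names `last_reviewed` and `flagged_inaccurate` are assumptions chosen for illustration.

```python
from datetime import date

def split_for_removal(records, cutoff):
    """Split records into (kept, removed) using one explicit rule.

    Assumed rule: remove a record if it is flagged as inaccurate,
    or if it was last reviewed before the cutoff date.
    """
    kept, removed = [], []
    for r in records:
        if r["flagged_inaccurate"]:
            removed.append({"id": r["id"], "reason": "flagged inaccurate"})
        elif r["last_reviewed"] < cutoff:
            removed.append({"id": r["id"], "reason": "not reviewed since cutoff"})
        else:
            kept.append(r)
    return kept, removed

# Hypothetical records (field names are assumptions).
records = [
    {"id": "r1", "last_reviewed": date(2024, 5, 1), "flagged_inaccurate": False},
    {"id": "r2", "last_reviewed": date(2021, 3, 9), "flagged_inaccurate": False},
    {"id": "r3", "last_reviewed": date(2024, 6, 2), "flagged_inaccurate": True},
]

kept, removed = split_for_removal(records, cutoff=date(2023, 1, 1))
for entry in removed:
    print(entry["id"], "removed:", entry["reason"])
```

Returning the removal reasons alongside the kept records gives you a list you can review before anything is actually deleted.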

Training data versioning:

  • Implement a version control system for your training data to keep a clear history of dataset changes. This allows you to compare versions and restore an older version of the training data when needed.
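The idea behind dataset versioning can be sketched with a content hash: identical data always gets the same version ID, and any change produces a new one. The in-memory `SnapshotStore` class below is hypothetical and only illustrates the principle; in practice you would more likely rely on a dedicated version control or data-versioning tool.

```python
import hashlib
import json

def dataset_version(records):
    """Return a short, deterministic version ID derived from the dataset content."""
    canonical = json.dumps(sorted(records, key=lambda r: r["id"]), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]

class SnapshotStore:
    """Hypothetical in-memory store that keeps one snapshot per version."""

    def __init__(self):
        self._snapshots = {}
        self.history = []  # list of (version, message) in commit order

    def commit(self, records, message):
        version = dataset_version(records)
        # Round-trip through JSON to store an independent copy.
        self._snapshots[version] = json.loads(json.dumps(records))
        self.history.append((version, message))
        return version

    def restore(self, version):
        return json.loads(json.dumps(self._snapshots[version]))

store = SnapshotStore()
original = [{"id": "d1", "label": "invoice"}, {"id": "d2", "label": "receipt"}]
v1 = store.commit(original, "initial import")

edited = [{"id": "d1", "label": "invoice"}, {"id": "d2", "label": "contract"}]
v2 = store.commit(edited, "relabel d2")

restored = store.restore(v1)
print(store.history)
```

Because records are sorted by `id` before hashing, reordering the dataset does not create a new version; only a change in content does.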

Training data security:

  • Protect your training data appropriately, especially if it contains sensitive or confidential information. Implement access controls so that only authorized users can access the training data, and encrypt the data both in transit and at rest.
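As an illustration of access control, the sketch below checks a user's role before a destructive operation is allowed. The role names and the permission table are assumptions made for this example, not a description of any particular product. Encryption is not shown here; it is usually handled by the storage and transport layers.

```python
# Hypothetical role-to-permission table (roles and actions are assumptions).
ROLE_PERMISSIONS = {
    "viewer": {"read"},
    "annotator": {"read", "edit"},
    "admin": {"read", "edit", "delete"},
}

def is_allowed(role, action):
    """Return True if the role may perform the action; unknown roles get nothing."""
    return action in ROLE_PERMISSIONS.get(role, set())

def delete_record(records, record_id, role):
    """Return a new list without the given record, if the role permits deletion."""
    if not is_allowed(role, "delete"):
        raise PermissionError(f"role {role!r} may not delete records")
    return [r for r in records if r["id"] != record_id]

records = [{"id": "d1"}, {"id": "d2"}]
after_delete = delete_record(records, "d1", role="admin")
print(after_delete)
```

Denying by default, so that an unknown role has no permissions, keeps a misconfigured account from gaining access by accident.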

Documentation and tracking:

  • Document all changes to your training data, including adding, editing, and removing records. This allows you to trace the history of your training data and confirm that the data used to train your model is current and relevant.
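A change log can be as simple as an append-only list of entries, one per change. The sketch below assumes three action names (`add`, `edit`, `remove`) and a hypothetical entry layout; adapt both to whatever your own process records.

```python
from datetime import datetime, timezone

ALLOWED_ACTIONS = {"add", "edit", "remove"}

def log_change(log, action, record_id, user, note=""):
    """Append one change entry to the log and return it."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "record_id": record_id,
        "user": user,
        "note": note,
    }
    log.append(entry)
    return entry

def history_for(log, record_id):
    """Return all logged changes for one record, in the order they were made."""
    return [e for e in log if e["record_id"] == record_id]

change_log = []
log_change(change_log, "add", "d1", "alice", "initial upload")
log_change(change_log, "edit", "d1", "bob", "corrected label")
log_change(change_log, "remove", "d2", "alice", "outdated template")

d1_history = history_for(change_log, "d1")
for e in d1_history:
    print(e["timestamp"], e["action"], e["user"], e["note"])
```

Rejecting unknown actions keeps the log consistent, which makes it easier to filter or audit later.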

By regularly managing and updating your training data, you help ensure that your model is trained on current, representative data, which in turn supports better model performance.
