# Manage training data

Manage existing training data, including adding, editing, or removing records.

**To effectively manage training data, you can take the following steps:**

**Adding new records:**

* Collect new documents to serve as training data for your model.
* Make sure these documents are a representative sample of the different types of data the model is designed to process.
* Upload the new records to your training data repository.
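
One way to check whether a batch is representative before uploading it is to compare the share of each document type against the types the model is expected to handle. The following is a minimal sketch under stated assumptions: the type labels, the `min_share` threshold, and the function names are hypothetical and are not part of DocBits.

```python
from collections import Counter


def type_shares(doc_types):
    """Return the share of each document type in a list of type labels."""
    counts = Counter(doc_types)
    total = sum(counts.values())
    return {doc_type: count / total for doc_type, count in counts.items()}


def underrepresented(doc_types, expected_types, min_share=0.15):
    """List expected types whose share in the sample falls below min_share."""
    shares = type_shares(doc_types) if doc_types else {}
    return sorted(t for t in expected_types if shares.get(t, 0.0) < min_share)


# Hypothetical batch: type labels of documents queued for upload.
sample = ["invoice"] * 7 + ["order_confirmation"] * 2 + ["delivery_note"] * 1
expected = {"invoice", "order_confirmation", "delivery_note", "quote"}

# Types that need more examples before the batch is a fair sample.
gaps = underrepresented(sample, expected, min_share=0.15)
```

Here `gaps` lists the types with too few examples (or none at all), which tells you what to collect before uploading.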

**Editing existing records:**

* Review your existing training data regularly and update it as needed, for example by correcting document metadata or adding missing labels. Records that turn out to be erroneous or non-representative are candidates for removal.

**Removing records:**

* Identify outdated, inaccurate, or no longer relevant records and remove them from your training data set.
* Make sure you have a clear process for deciding which records to remove and document that process.
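
A removal process is easier to document when the rule is written down as an explicit check that also returns the reason. The sketch below assumes a hypothetical `TrainingRecord` structure and an arbitrary age limit of 730 days; neither comes from DocBits, and your own criteria will likely differ.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class TrainingRecord:
    # Hypothetical fields, for illustration only.
    record_id: str
    added_on: date
    flagged_inaccurate: bool = False


def removal_reason(record, today, max_age_days=730):
    """Return why a record should be removed, or None to keep it."""
    if record.flagged_inaccurate:
        return "flagged as inaccurate during review"
    if (today - record.added_on).days > max_age_days:
        return f"older than {max_age_days} days"
    return None


records = [
    TrainingRecord("rec-001", date(2021, 3, 1)),
    TrainingRecord("rec-002", date(2024, 6, 15), flagged_inaccurate=True),
    TrainingRecord("rec-003", date(2024, 9, 30)),
]
today = date(2025, 1, 1)

# Collect each removal candidate together with the documented reason.
to_remove = {}
for record in records:
    reason = removal_reason(record, today)
    if reason is not None:
        to_remove[record.record_id] = reason
```

Keeping the reason next to the record ID means the decision can be copied directly into your change documentation.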

**Training data versioning:**

* Implement a version control system for your training data so that every change to the dataset is recorded. This gives you a clear history and lets you restore an older version of the training data when needed.
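
If a full version control system is not available, a lightweight alternative is to store a manifest of content hashes for each dataset version and compare manifests to see what changed. This is a sketch of that substitute technique, assuming the training data is a directory of files on disk; `build_manifest` and `diff_manifests` are hypothetical helper names.

```python
import hashlib
import tempfile
from pathlib import Path


def build_manifest(data_dir):
    """Map each file's relative path to the SHA-256 hash of its contents."""
    root = Path(data_dir)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff_manifests(old, new):
    """Summarize which files were added, removed, or changed between versions."""
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(k for k in set(old) & set(new) if old[k] != new[k]),
    }


# Demonstration on a temporary directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "invoice_001.txt").write_text("v1")
    before = build_manifest(root)
    (root / "invoice_001.txt").write_text("v2")
    (root / "invoice_002.txt").write_text("new")
    after = build_manifest(root)

changes = diff_manifests(before, after)
```

A manifest only detects changes. To restore an older version you still need to keep the files themselves, for example in versioned storage.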

**Training data security:**

* Protect your training data appropriately, especially if it contains sensitive or confidential information. Use access controls so that only authorized users can reach the training data, and encrypt the data both in transit and at rest.
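
The access-control part can be pictured as a deny-by-default permission check. The roles and actions below are hypothetical and do not reflect actual DocBits roles. Encryption is left out because it depends on your storage and transport setup.

```python
from enum import Enum


class Action(Enum):
    READ = "read"
    WRITE = "write"
    DELETE = "delete"


# Hypothetical role-to-permission mapping, for illustration only.
ROLE_PERMISSIONS = {
    "viewer": {Action.READ},
    "annotator": {Action.READ, Action.WRITE},
    "admin": {Action.READ, Action.WRITE, Action.DELETE},
}


def is_allowed(role, action):
    """Deny by default: unknown roles receive no permissions."""
    return action in ROLE_PERMISSIONS.get(role, set())


# An annotator may edit records but may not delete them.
can_edit = is_allowed("annotator", Action.WRITE)
can_delete = is_allowed("annotator", Action.DELETE)
```

Looking up unknown roles with an empty permission set means a misconfigured or misspelled role is denied access.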

**Documentation and tracking:**

* Document all changes to your training data, including adding, editing, and removing records. This allows you to trace the history of your training data and confirm that the data used to train your model is current and relevant.
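
One simple form for such documentation is an append-only log with one JSON entry per change. The sketch below keeps the log in memory for brevity; in practice each line would be appended to a file or database. The field names and helper functions are assumptions rather than a DocBits format.

```python
import json
from datetime import datetime, timezone

ALLOWED_ACTIONS = {"add", "edit", "remove"}


def make_entry(action, record_id, user, reason, when=None):
    """Build one change-log entry as a JSON line."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    when = when or datetime.now(timezone.utc)
    return json.dumps({
        "timestamp": when.isoformat(),
        "action": action,
        "record_id": record_id,
        "user": user,
        "reason": reason,
    })


def history_for(log_lines, record_id):
    """Return the parsed entries for one record, in log order."""
    entries = (json.loads(line) for line in log_lines if line.strip())
    return [entry for entry in entries if entry["record_id"] == record_id]


# Hypothetical changes made by two users.
log_lines = [
    make_entry("add", "rec-001", "alice", "new supplier layout"),
    make_entry("edit", "rec-001", "bob", "corrected total field label"),
    make_entry("remove", "rec-002", "alice", "duplicate of rec-001"),
]
history = history_for(log_lines, "rec-001")
```

Filtering by record ID reconstructs the full history of a single record, which is the question an audit or a model regression investigation usually starts with.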

By regularly managing and updating your training data, you help ensure that your model is trained on current, representative data and performs as well as that data allows.


