DLHUB currently supports three dataset formats for training.
To get started:
- Select the "?" button next to the Detected Data Type (1) to show the Training Data File Help dialog.
- Select the desired file format from the drop-down list to see its details.
- Click Generate Sample File/Folders to download the dataset template.
- Use the dataset template as a starting point to prepare your own dataset for training.
File Type 1: Classified Image Folder
Organize your image dataset into classified folders, one folder per output class. The folder name defines the output.
For example, if you want to classify images of superhero characters, you would create classified folders such as Spiderman, Superman, and Wonderwoman.
Each folder contains the images that belong to that character.
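The folder layout described above can be sketched in a few lines of Python. This is only an illustration, not part of DLHUB: the helper names `create_class_folders` and `count_images_per_class` are hypothetical, and the list of image extensions is an assumption (check the generated sample folders for the formats DLHUB actually accepts).

```python
import tempfile
from pathlib import Path

# Assumed set of image extensions; DLHUB's accepted formats may differ.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")


def create_class_folders(root, class_names):
    """Create one sub-folder per class under root and return the folder names."""
    root = Path(root)
    for name in class_names:
        (root / name).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def count_images_per_class(root):
    """Return {class folder name: number of image files directly inside it}."""
    counts = {}
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts


# Demo in a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    create_class_folders(tmp, ["Spiderman", "Superman", "Wonderwoman"])
    (Path(tmp) / "Spiderman" / "img_001.png").touch()  # placeholder image file
    print(count_images_per_class(tmp))
```

Counting the images per folder before training is a quick way to spot an empty or badly unbalanced class.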
File Type 2: FEATURE vs CATEGORY (.csv or .txt)
This is the standard format where you list your data as columns, including:
- Column of labels (output)
- Column of features (input)
Here is a simple example that has 4 outputs and 8 inputs, where each output is labeled with a one-hot vector:
Output 1 is labeled as 1 0 0 0
Output 2 is labeled as 0 1 0 0
Output 3 is labeled as 0 0 1 0
Output 4 is labeled as 0 0 0 1
Each row defines the classified output with its corresponding inputs (features).
The above example shows only one training sample for each output. In a real-world application, you would have many more samples for each output.
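As a rough sketch of how such rows could be produced, the Python snippet below writes one row per sample, with the 4 label columns followed by the 8 feature columns. The column order (labels first), the comma delimiter, and the absence of a header row are assumptions, so treat the template downloaded with Generate Sample File/Folders as the authoritative layout. The function names and the feature values are made up for illustration.

```python
import csv
import io


def one_hot(class_index, num_classes):
    """Return a one-hot label list, e.g. class 1 of 4 -> [0, 1, 0, 0]."""
    return [1 if i == class_index else 0 for i in range(num_classes)]


def write_samples(samples, num_classes, out, delimiter=","):
    """Write each (class_index, features) pair as label columns + feature columns."""
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for class_index, features in samples:
        writer.writerow(one_hot(class_index, num_classes) + list(features))


# Hypothetical data: one sample per output, 8 features each.
samples = [
    (0, [5, 1, 0, 2, 7, 3, 1, 0]),
    (1, [0, 4, 6, 1, 2, 2, 8, 1]),
    (2, [3, 3, 1, 9, 0, 5, 2, 4]),
    (3, [1, 0, 2, 2, 6, 7, 0, 9]),
]

buffer = io.StringIO()  # replace with open("train.csv", "w", newline="") to save
write_samples(samples, num_classes=4, out=buffer)
print(buffer.getvalue())
```

Each printed line has 12 values: the first 4 identify the output and the remaining 8 are its inputs.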
Below is a screenshot of the MNIST training dataset containing 60,000 training samples.
File Type 3: IMAGE MAP FILE (.csv or .txt)
This file contains a list of image paths and their classified outputs, separated by a tab.
The first column lists the image paths, and the second column lists the classified output for each image.
Make sure that every image path listed in the map file points to an actual image file.
To load this type of data into DLHUB, you only need to load the map file.
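A map file of this shape can be generated from a classified image folder with a short script. The sketch below is illustrative only: the function names are hypothetical, the extension list is an assumption, and it writes the folder name as the classified output, which may not match what DLHUB expects in the second column (compare with the generated sample file). It also includes a small check that every listed path exists.

```python
import tempfile
from pathlib import Path

# Assumed set of image extensions; DLHUB's accepted formats may differ.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")


def build_image_map(root):
    """Return (image path, class folder name) pairs for every image under root."""
    rows = []
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for image in sorted(class_dir.iterdir()):
            if image.is_file() and image.suffix.lower() in IMAGE_EXTENSIONS:
                rows.append((str(image), class_dir.name))
    return rows


def write_image_map(rows, map_path):
    """Write one 'path<TAB>output' line per image."""
    with open(map_path, "w", newline="") as f:
        for path, label in rows:
            f.write(f"{path}\t{label}\n")


def missing_images(map_path):
    """Return the paths listed in the map file that do not exist on disk."""
    missing = []
    with open(map_path) as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            path, _label = line.rsplit("\t", 1)
            if not Path(path).is_file():
                missing.append(path)
    return missing


# Demo in a temporary directory with placeholder image files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["Spiderman", "Superman"]:
        (Path(tmp) / name).mkdir()
        (Path(tmp) / name / "img_001.png").touch()
    map_file = Path(tmp) / "map.txt"
    write_image_map(build_image_map(tmp), map_file)
    print(map_file.read_text())
    print("missing:", missing_images(map_file))
```

Running the existence check before loading the map file catches moved or renamed images early.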