- Updated on 01 Aug 2023
Document Classifier
The Document Classifier feature allows VisualVault users to create and train their own machine learning models, which will be used to automatically classify documents through a specific action on Workflows.
The main Document Classifier page will display a list of models created, including details such as the model’s name, description, location and status.
The Status column shows one of the following values:
- Initializing: The model has been created in Studio but not yet on AWS
- SUBMITTED: The model has been created in AWS
- TRAINING: The model is being trained by AWS
- TRAINED: The model has been trained and is ready to be used
- DELETING: The model is being deleted in AWS
- IN_ERROR: AWS indicates there is an error
- STOP_REQUESTED: The training has been requested to stop on AWS
- STOPPED: The training has been stopped on AWS
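For anyone tracking these statuses in their own scripts or reports, the list above can be captured as a small lookup. This is only a sketch: `ModelStatus`, `IN_PROGRESS`, `is_ready`, and `is_in_progress` are hypothetical names, and the grouping of statuses as "in progress" is an assumption inferred from the descriptions above rather than documented product behavior.

```python
from enum import Enum


class ModelStatus(Enum):
    """Status values as they are listed on the Document Classifier page."""
    INITIALIZING = "Initializing"
    SUBMITTED = "SUBMITTED"
    TRAINING = "TRAINING"
    TRAINED = "TRAINED"
    DELETING = "DELETING"
    IN_ERROR = "IN_ERROR"
    STOP_REQUESTED = "STOP_REQUESTED"
    STOPPED = "STOPPED"


# Assumption: these statuses mean that work is still in progress.
IN_PROGRESS = {
    ModelStatus.INITIALIZING,
    ModelStatus.SUBMITTED,
    ModelStatus.TRAINING,
    ModelStatus.DELETING,
    ModelStatus.STOP_REQUESTED,
}


def is_ready(status_text: str) -> bool:
    """Return True only for TRAINED, the one status described as ready to use."""
    return ModelStatus(status_text) is ModelStatus.TRAINED


def is_in_progress(status_text: str) -> bool:
    """Return True while the model is still being created, trained, stopped, or deleted."""
    return ModelStatus(status_text) in IN_PROGRESS
```

Under these assumptions, `is_ready("TRAINED")` is true, while `is_ready("TRAINING")` and `is_in_progress("STOPPED")` are both false.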
Create a New Classifier Model
To create a new Classifier Model:
- Navigate to the Process Design Studio, from the Enterprise Tools tab of the Control Panel.
- From the left menu, select Document Classifier (under Analytics). Click New Classifier Model. The New Document Classifier Model window will open.
Main Info
- Enter a name and a description for your Document Classifier Model.
- Select the training data source. From the dropdown menu, select either Folder Selection or File Upload.
Folder Selection
- Upload documents to a folder in the Document Library. The content of these documents will be used to train the classification machine learning model. Supported file types include PDF, DOCX, XLSX, PPT, TXT, and TIFF.
- Select the checkboxes for the folders whose documents will be used to train the model.
- For this to work, classification categories must be created from index fields. This is done under Category Builder by clicking Add new category from index fields.
File Upload
- Select CSV File.
The CSV file should be formatted as follows:
- The first row is the header row, which specifies the column names: Category and Text. (The header row can be omitted; whether the file has one is configured in the UI.)
- Starting from the second row, each row represents one training record with two columns: Category (the categories assigned to the text) and Text (the body of the document).
- The Category column contains the categories assigned to each text. To assign multiple categories to a text, separate the category names with the category separator, which is "|" by default.
- The Text column contains the text associated with those categories.
Example Document format:
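As an illustration of the layout described above, the following Python sketch writes a small training file in that format. The sample rows, the `SAMPLE_ROWS` name, and the `build_training_csv` helper are hypothetical and are not part of VisualVault. The sketch assumes the default "," column separator and "|" category separator.

```python
import csv
import io

# Hypothetical sample rows: (categories, document text).
SAMPLE_ROWS = [
    (["Invoice"], "Invoice 1001: payment due within 30 days."),
    (["Contract", "Legal"], "This agreement is made between the two parties."),
    (["Resume"], "Experienced analyst with a background in finance."),
]


def build_training_csv(rows, category_separator="|"):
    """Return CSV text with a Category,Text header and one row per document."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")  # uses the default "," delimiter
    writer.writerow(["Category", "Text"])
    for categories, text in rows:
        # Multiple categories share one cell, joined by the category separator.
        writer.writerow([category_separator.join(categories), text])
    return buffer.getvalue()


print(build_training_csv(SAMPLE_ROWS))
```

The second data row comes out as `Contract|Legal,This agreement is made between the two parties.`, which shows two categories assigned to one text.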
Advanced Settings
- Indicate whether your CSV file has a header row. The default is Yes.
- Set the separators if needed. The default CSV separator is “,” and the default category separator is “|”.
- Indicate the number of lines from your file to use for training the model. The default number of records to process is 1,000,000.
- Indicate the percentage of the selected lines to use for training the model. The default is 80%.
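To make the effect of these settings concrete, the sketch below applies them to CSV text in plain Python. It is not VisualVault code: `load_training_rows` and its parameter names are hypothetical. It also assumes that the first 80% of the processed rows are used for training and the rest is held out. The product may select rows differently.

```python
import csv
import io


def load_training_rows(csv_text, has_header=True, csv_separator=",",
                       category_separator="|", max_records=1_000_000,
                       training_percentage=80):
    """Parse CSV text into (categories, text) pairs and split them.

    Assumption: the first `training_percentage` percent of the processed
    rows are used for training and the remainder is held out.
    """
    reader = csv.reader(io.StringIO(csv_text), delimiter=csv_separator)
    if has_header:
        next(reader, None)  # skip the Category/Text header row
    rows = []
    for record in reader:
        if len(rows) >= max_records:
            break  # stop once the configured number of records is reached
        if len(record) < 2:
            continue  # skip blank or malformed lines
        categories = [c.strip() for c in record[0].split(category_separator)]
        rows.append((categories, record[1]))
    cutoff = len(rows) * training_percentage // 100
    return rows[:cutoff], rows[cutoff:]


sample = "Category,Text\n" + "".join(f"A|B,document {i}\n" for i in range(10))
training_rows, held_out_rows = load_training_rows(sample)
print(len(training_rows), len(held_out_rows))  # 8 2
```

With ten data rows and the default 80% setting, eight rows go to training and two are held out.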
Saving
Once all settings have been configured, click Save to create the model or Cancel to discard it. The new classifier should now appear in the list with its status set to Initializing.
Use Workflow to Monitor Classification
After uploading files, a pre-configured workflow process will be initiated. To view the workflow configuration and monitor the document classification workflow progress, complete the following steps:
- In the Process Design Studio, click the Workflows link to see a list of existing workflows.
- Click on the Edit button to launch the workflow design and monitoring screen.
- From the workflow designer screen, click the History tab to view in-process and completed workflow instances related to this specific workflow. It is important to distinguish a “workflow” from a “workflow instance”: when a user-defined event triggers a workflow, a new workflow instance is created from the most recently published workflow version. Each workflow instance has its own history and status, which can be viewed from the workflow’s History tab.
- Click the View button for the first workflow instance in the list to open a graphical view of its execution history. Each execution history covers a single workflow instance. Completed workflow actions are highlighted with a solid green border, and the path of the workflow execution is highlighted with a solid green line. Clicking a completed workflow action updates the workflow variables display to show the values they held when the selected action began.
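The relationship between a workflow and its workflow instances can be pictured with a small data model. This is a conceptual sketch only: the `Workflow` and `WorkflowInstance` classes, their fields, and the "In Process" label are hypothetical and do not reflect how VisualVault stores workflows internally.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowInstance:
    workflow_name: str
    version: int                  # published version the instance was created from
    status: str = "In Process"    # hypothetical status label
    history: list = field(default_factory=list)

    def record_action(self, action_name, variables):
        # Keep a snapshot of the variables as they were when the action began.
        self.history.append((action_name, dict(variables)))


@dataclass
class Workflow:
    name: str
    published_versions: list = field(default_factory=list)
    instances: list = field(default_factory=list)

    def publish(self):
        """Publish a new version and return its number."""
        next_version = len(self.published_versions) + 1
        self.published_versions.append(next_version)
        return next_version

    def trigger(self):
        """Create an instance from the most recently published version."""
        if not self.published_versions:
            raise RuntimeError("workflow has no published version")
        instance = WorkflowInstance(self.name, self.published_versions[-1])
        self.instances.append(instance)
        return instance
```

In this model, an instance triggered before a new version is published keeps its original version number and its own history, while later instances pick up the newer version.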
Modify
To edit an existing classifier model, complete the following steps:
- Navigate to the Process Design Studio, from the Enterprise Tools tab of the Control Panel.
- From the left menu, navigate to Document Classifier. Click Edit.
- Edit fields under Main Info and Advanced Settings.
- Click Save.
Search
To search for a specific Document Classifier Model, use the search box at the top left of the main Document Classifier screen.
The search looks for the entered text in both the Classifier Name and the Description.
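The search behavior described above can be approximated as follows. This is a rough sketch, not the product's implementation: `search_models` and the dictionary keys are hypothetical, and case-insensitive substring matching is an assumption.

```python
def search_models(models, query):
    """Return the models whose name or description contains the query text.

    Assumption: matching is a case-insensitive substring test.
    """
    needle = query.strip().lower()
    return [
        model for model in models
        if needle in model["name"].lower()
        or needle in model["description"].lower()
    ]


# Hypothetical model list for illustration.
models = [
    {"name": "Invoices", "description": "Classifies vendor invoices"},
    {"name": "HR Forms", "description": "Onboarding and leave requests"},
]
matches = search_models(models, "invoice")
```

Here `matches` contains only the "Invoices" model, because the text appears in its name and description but in neither field of "HR Forms".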
Delete
To delete a model, go to the main Document Classifier screen and click the item's Delete icon.
You can also select one or more items in the list and then click the Delete Model(s) button at the top right of the screen.