Uploading data to a NER Project

Accessing the upload screen

Once a Project has been created and opened, Upload can be accessed using the sidebar.

Current supported data formats

Plain text
JSONL
IOB-style data
- IOB(1)
- IOB2
- IOBES
- CONLL2003
- BILOU
TSV

Uploading data

To upload data, either click the "drag and drop files here" section to select files, or drag and drop the files onto it.

If uploading records in plain text, the UI also supports adding the content directly as a record. Paste the content into the plain text section of the upload screen and it will be added as a record.

note

Duplicate file uploads are checked to prevent inconsistencies.

Form fields

Data format

The format of the uploaded data. The formats currently supported are listed under Current supported data formats.

Upload name

The name of the current upload

note

An upload name is generated if one is not provided

Is the data in Acharya format?

Select this option if the data being uploaded is in the default Acharya JSONL format (see Default JSON map configuration).

Text (txt)

This is a plain text upload. All uploaded content is treated as record data and is not associated with any particular data format.

IOB style data

Formats such as IOB, IOB2, IOBES, CONLL2003 and BILOU are actively supported
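To illustrate how these tagging schemes relate to entity spans, here is a minimal Python sketch that turns a sequence of IOB2 tags into token-level spans with an exclusive end index. It shows the general IOB2 scheme only; the helper name `iob2_to_spans` is hypothetical, and the exact file layout Acharya expects for IOB-style uploads is not described here.

```python
def iob2_to_spans(tags):
    """Convert IOB2 tags to (start_token, end_token, label) spans, end exclusive."""
    spans = []
    start = label = None
    # A trailing "O" sentinel closes a span that runs to the end of the sentence.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            spans.append((start, i, label))
            start = label = None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        # An I- tag with no open span is ignored in this sketch.
    return spans


tokens = ["Acharya", "is", "a", "data", "centric", "MLOps", "tool"]
tags = ["B-Name", "O", "O", "O", "O", "B-Operation", "O"]
for start, end, label in iob2_to_spans(tags):
    print(label, tokens[start:end])
```

The spans here index tokens, not characters; converting them to character offsets depends on how the text was tokenized.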

JSONL

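JSON Lines is supported, where each line contains one JSON object. A JSON map tells Acharya which property of each record holds which field. The JSON map accepts the keys listed below; the value of each key is the name of the corresponding JSON property in the record (fields marked * are required).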

JSON Key	Type	Description
Data *	string	The actual training data (the record text)
EntityLabels	[][number, number, string]	List of entity labels, each given as `start` index, `end` index, and label
Key	string	The record key
Completed	number	Record status: `pending` = 0, `train` = 1, `test` = 2
Prev	string	The previous record's key
Next	string	The next record's key

note

For EntityLabels, the end index is exclusive.

For example, consider the following JSONL records:

{"details":"Welcome to Acharya","entities":[[11,18,"Name"]]}
{"details":"Acharya is a data centric MLOps tool","entities":[[0,7,"Name"],[26,31,"Operation"]]}

The corresponding JSON map is:

{
    "Data": "details",
    "EntityLabels": "entities"
}
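Because the end index is exclusive, the labeled text can be recovered with an ordinary slice. A minimal Python sketch using the second example record; the helper name `label_texts` is hypothetical, and it assumes the indices count characters, which is not stated explicitly here.

```python
import json


def label_texts(line, data_key="details", labels_key="entities"):
    """Return (text, label) pairs for one JSONL line; the end index is exclusive."""
    record = json.loads(line)
    text = record[data_key]
    return [(text[start:end], label) for start, end, label in record.get(labels_key, [])]


line = '{"details":"Acharya is a data centric MLOps tool","entities":[[0,7,"Name"],[26,31,"Operation"]]}'
print(label_texts(line))
```

Running a check like this over a few records is a quick way to confirm that the offsets line up with the intended words.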

info

Fields that are not provided in the JSON map fall back to the default values.

Default JSON map configuration

{
    "Data": "data",
    "EntityLabels": "meta_data",
    "Key": "key",
    "Completed": "completed",
    "Prev": "prev",
    "Next": "next"
}
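The fallback to default values can be pictured as a dictionary merge. The sketch below is an illustration under assumptions, not Acharya's implementation: `resolve_record` is a hypothetical helper, and the record key `rec-1` is made up for the example.

```python
import json

# Default JSON map, copied from the configuration above.
DEFAULT_JSON_MAP = {
    "Data": "data",
    "EntityLabels": "meta_data",
    "Key": "key",
    "Completed": "completed",
    "Prev": "prev",
    "Next": "next",
}


def resolve_record(line, json_map=None):
    """Map one JSONL line onto the documented field names (hypothetical helper)."""
    # Entries in the user-supplied map win; everything else uses the default.
    effective = {**DEFAULT_JSON_MAP, **(json_map or {})}
    record = json.loads(line)
    if effective["Data"] not in record:
        raise KeyError(f"required field Data (property {effective['Data']!r}) is missing")
    # Optional fields that are absent from the record are simply left out.
    return {field: record[prop] for field, prop in effective.items() if prop in record}


line = '{"details": "Acharya is a data centric MLOps tool", "entities": [[0, 7, "Name"]], "key": "rec-1"}'
resolved = resolve_record(line, {"Data": "details", "EntityLabels": "entities"})
print(resolved)
```

Here `Data` and `EntityLabels` come from the custom map, while `Key` is found through the default property name `key`.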

Mark all the records in this upload

There are three options:

As Pending
For Test/Evaluation
For Training

As Pending

As Pending marks all the records in the data being uploaded as pending (i.e., awaiting action). Records marked as pending are not part of any training or evaluation.

For Test/Evaluation

For Test/Evaluation marks all the records in the data being uploaded for testing or evaluation only. These records are not used for training.

For Training

For Training will mark all the records in the data being uploaded to be used for training.

tip

It is recommended to test your files before upload
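One way to test a JSONL file before uploading it is a small local check. The following Python sketch is only an assumption about what is worth checking (valid JSON on each line, a string data field, and entity spans that fit inside the text); `check_jsonl_lines` is a hypothetical helper, and it defaults to the property names from the default JSON map.

```python
import json


def check_jsonl_lines(lines, data_key="data", labels_key="meta_data"):
    """Return (line_number, problem) tuples; an empty list means nothing was flagged."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        text = record.get(data_key) if isinstance(record, dict) else None
        if not isinstance(text, str):
            problems.append((number, f"missing or non-string {data_key!r}"))
            continue
        labels = record.get(labels_key) or []
        if not isinstance(labels, list):
            problems.append((number, f"{labels_key!r} is not a list"))
            continue
        for item in labels:
            well_formed = (
                isinstance(item, list) and len(item) == 3
                and isinstance(item[0], int) and isinstance(item[1], int)
                and isinstance(item[2], str)
            )
            if not well_formed:
                problems.append((number, f"malformed entity label: {item!r}"))
            elif not 0 <= item[0] < item[1] <= len(text):
                # The end index is exclusive, so it may equal len(text) at most.
                problems.append((number, f"entity span out of range: {item!r}"))
    return problems


sample = [
    '{"data": "Acharya is a data centric MLOps tool", "meta_data": [[0, 7, "Name"]]}',
    '{"data": "Acharya is a data centric MLOps tool", "meta_data": [[26, 40, "Operation"]]}',
    'not json',
]
for number, problem in check_jsonl_lines(sample):
    print(number, problem)
```

To check a file, pass its lines, for example `check_jsonl_lines(open(path, encoding="utf-8"))`, where `path` is the file to be uploaded.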
