Algorithm Configuration
An algorithm is added to a Project via a JSON configuration.
The JSON configuration has the following properties.
Properties
Property | Type | Required |
---|---|---|
Name | string | Required |
Version | string | Required |
Owner | string | Optional |
DataFormat | enum | Required |
TrainWordEmbeddings | boolean | Optional |
WordEmbeddingsCache | boolean | Optional |
WordEmbeddingsOutput | string | Optional |
WordEmbeddingsName | string | Optional |
RunOn | enum | Required |
Details | object | Required |
AlgoFrom | string | Required |
AlgoDetails | object | Required |
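The required/optional split in the table above can be checked before a configuration is submitted. The following is a minimal sketch in Python, assuming the configuration has already been loaded into a dict; the helper name `missing_required` and the sample values are hypothetical and not part of the tool.

```python
# Required top-level properties, copied from the table above.
REQUIRED_TOP_LEVEL = (
    "Name", "Version", "DataFormat", "RunOn",
    "Details", "AlgoFrom", "AlgoDetails",
)


def missing_required(config: dict) -> list:
    """Return the required top-level properties absent from config.

    Hypothetical helper for illustration; it checks presence only,
    not types or constraints.
    """
    return [key for key in REQUIRED_TOP_LEVEL if key not in config]


# Illustrative, deliberately incomplete configuration.
partial = {"Name": "my-ner-algo", "Version": "1.0.2", "DataFormat": "IOB2"}
print(missing_required(partial))
```

Optional properties such as Owner are intentionally left out of the check.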
Name
The algorithm name
is required
Type:
string
Constraints
minimum length: the minimum number of characters for this string is: 3
Version
Version associated with the current configuration, e.g. 1.0.2
is required
Type:
string
Constraints
pattern: the string must match the following regular expression:
^([0-9]+(.[0-9]+)*)$
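The pattern can be tried out against candidate version strings. This is a sketch using Python's `re` module, which is assumed to behave like the schema's regular-expression dialect for a pattern this simple. The pattern is copied exactly as printed above, where the `.` is unescaped and therefore matches any single character; `STRICT_PATTERN` is a hypothetical stricter variant and is not taken from the schema.

```python
import re

# Pattern exactly as printed above; the unescaped "." matches any character.
VERSION_PATTERN = re.compile(r"^([0-9]+(.[0-9]+)*)$")

# Hypothetical stricter variant with an escaped dot (an assumption).
STRICT_PATTERN = re.compile(r"^([0-9]+(\.[0-9]+)*)$")


def is_valid_version(version: str, pattern=VERSION_PATTERN) -> bool:
    """Return True if the whole string matches the given version pattern."""
    return pattern.fullmatch(version) is not None


for candidate in ("1.0.2", "2", "v1.0", "1x0"):
    print(candidate,
          is_valid_version(candidate),
          is_valid_version(candidate, STRICT_PATTERN))
```

With the pattern as printed, a string such as `1x0` is accepted, so sticking to dotted numeric versions like `1.0.2` is the safer choice.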
Owner
Owner of the algorithm
is optional
Type:
string
DataFormat
Format of the data accepted by the algorithm
is required
Type:
enum
Constraints
enum: the value of this property must be equal to one of the following values:
Value | Explanation |
---|---|
"CoNLL2003" | CoNLL-2003 format |
"IOB" | IOB format |
"IOB2" | IOB2 format |
"IOBES" | IOBES format |
"BILOU" | BILOU format |
"TSV" | TSV format |
"JSONL" | JSONL format |
"Acharya" | Acharya format |
TrainWordEmbeddings
If set, runs the word-embedding training configured in config.yaml before the actual training. Word-embedding training builds or updates the vocabulary from the data, and can also be used to update a pre-trained model such as BERT. This is useful when domain-specific vocabulary training is desired.
is optional
Type:
boolean
WordEmbeddingsCache
Recommended if TrainWordEmbeddings is set
If set, caches the output of the word-embedding training and re-runs that training only when the data content changes. This reduces overall training time.
is optional
Type:
boolean
WordEmbeddingsOutput
Recommended if TrainWordEmbeddings is set
Path where the word-embedding output is saved. The contents of this path will be cached and reused for future training until the input data changes.
is optional
Type:
string
WordEmbeddingsName
A name to be associated with the word embedding training
is optional
Type:
string
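The "Recommended if TrainWordEmbeddings is set" notes above can be turned into a small pre-flight check. This is a sketch only: the helper name `embedding_recommendations` and the sample values are hypothetical, and the tool itself may not enforce these recommendations.

```python
def embedding_recommendations(config: dict) -> list:
    """List recommended word-embedding properties that are unset.

    Hypothetical helper: returns an empty list when
    TrainWordEmbeddings is absent or false.
    """
    if not config.get("TrainWordEmbeddings", False):
        return []
    recommended = ("WordEmbeddingsCache", "WordEmbeddingsOutput")
    return [key for key in recommended if not config.get(key)]


# Illustrative fragment of a configuration.
config = {
    "TrainWordEmbeddings": True,
    "WordEmbeddingsName": "domain-embeddings",  # illustrative name
}
print(embedding_recommendations(config))
```

Setting WordEmbeddingsCache to true and pointing WordEmbeddingsOutput at a path makes the returned list empty.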
RunOn
The infrastructure on which training runs, such as bare metal, a Docker container, or a virtual machine. Currently supported value: Docker
is required
Type:
enum
Constraints
enum: the value of this property must be equal to one of the following values:
Value | Explanation |
---|---|
"Docker" | The algorithm is trained in the configured Docker container. |
Default Value
The default value is:
"Docker"
Details
Detailed configuration of the infrastructure on which training runs
is required
Type:
object
See Details Properties for more information on each property of Details.
Default Value
The default value is:
{
"Image": false,
"Port": "7707/tcp",
"HostIP": "0.0.0.0",
"Debug": false,
"DockerHost": "localhost",
"DockerHostPort": 2375
}
AlgoFrom
The source from which the algorithm is fetched
is required
Type:
string
Constraints
enum: the value of this property must be equal to one of the following values:
Value | Explanation |
---|---|
"Git" | The algorithm is obtained from a Git repository |
Default Value
The default value is:
"Git"
AlgoDetails
is required
Type:
object
See AlgoDetails Properties for more information on each property of AlgoDetails.
Default Value
The default value is:
{
"Path": "",
"Branch": "master",
"Auth": "None",
"Debug": false,
"DockerfilePath": "Dockerfile",
"ConfigPath": "config.yaml",
"AlgoOutput": [
"/path/to/model/output",
"any/other/path"
],
"Logs": [
"path/to/model/logs",
"any/other/logs"
]
}
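Putting the top-level properties and the documented defaults together, a complete configuration might look like the sketch below. It is built as a Python dict and serialized to JSON. The Name value and the repository URL are placeholders chosen for illustration; only the property names, enum values, and defaults come from this page.

```python
import json

config = {
    "Name": "my-ner-algo",        # illustrative; must be at least 3 characters
    "Version": "1.0.2",
    "DataFormat": "CoNLL2003",
    "RunOn": "Docker",
    "Details": {                  # documented default values
        "Image": False,
        "Port": "7707/tcp",
        "HostIP": "0.0.0.0",
        "Debug": False,
        "DockerHost": "localhost",
        "DockerHostPort": 2375,
    },
    "AlgoFrom": "Git",
    "AlgoDetails": {
        # Placeholder URL; replace with the real repository.
        "Path": "https://example.com/your-org/your-algo.git",
        "Branch": "master",
        "Auth": "None",
        "ConfigPath": "config.yaml",
        "AlgoOutput": ["/path/to/model/output"],
    },
}

# Serialize to the JSON text that would be supplied to the Project.
config_json = json.dumps(config, indent=2)
print(config_json)
```

Optional properties such as Owner, Logs, and the word-embedding settings are omitted here and can be added as needed.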
Details Properties
Property | Type | Required |
---|---|---|
Image | boolean | Required |
ImageName | string | Optional |
Port | string | Required |
HostIP | string | Optional |
Debug | boolean | Optional |
Basepath | string | Optional |
DockerHost | string | Required |
DockerHostPort | number | Required |
Runtime | string | Optional |
Image
Specifies whether an existing Docker image is to be run (see ImageName)
is required
Type:
boolean
ImageName
If Image is true, the name of the image to run
is optional
Type:
string
Port
Port to be used for the training service. Default: 7707
is required
Type:
string
Constraints
pattern: the string must match the following regular expression:
^[0-9]+/(tcp|udp)$
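A value such as "7707/tcp" satisfies this pattern and can be split into a port number and a protocol. The following sketch uses Python's `re` module; the helper name `split_port` is hypothetical and only illustrates the expected shape of the string.

```python
import re

# Pattern copied from the constraint above.
PORT_PATTERN = re.compile(r"^[0-9]+/(tcp|udp)$")


def split_port(port_spec: str):
    """Split a "<number>/<protocol>" string into (number, protocol).

    Hypothetical helper; raises ValueError when the string does not
    match the documented pattern.
    """
    if PORT_PATTERN.fullmatch(port_spec) is None:
        raise ValueError(f"Port must look like '7707/tcp', got {port_spec!r}")
    number, protocol = port_spec.split("/")
    return int(number), protocol


print(split_port("7707/tcp"))
```

A bare number such as "7707" or an unsupported protocol such as "7707/http" is rejected.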
Default Value
The default value is:
"7707/tcp"
HostIP
IP address on the host machine on which the container should listen
is optional
Type:
string
Constraints
hostname: the string must be a hostname, according to RFC 1123, section 2.1
Default Value
The default value is:
"0.0.0.0"
Debug
Set true to enable debug mode
is optional
Type:
boolean
Basepath
Set the basepath to the path where the scripts are to be copied and run
is optional
Type:
string
DockerHost
The IP address or hostname of the Docker server to connect to. The Docker server can also be remote, for example over a VPN, provided network connectivity is available.
is required
Type:
string
Constraints
hostname: the string must be a hostname, according to RFC 1123, section 2.1
Default Value
The default value is:
"localhost"
DockerHostPort
The port number of the Docker server to connect to.
is required
Type:
number
Constraints
maximum: the value of this number must be smaller than or equal to: 65535
minimum: the value of this number must be greater than or equal to: 0
Default Value
The default value is:
2375
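DockerHost and DockerHostPort together identify the Docker server. The sketch below checks the documented port range and formats the pair as a `tcp://host:port` address, the form Docker clients conventionally accept for a remote daemon. Whether this tool builds the address the same way is an assumption, as is the helper name `docker_endpoint`. The sketch also requires an integer port, which is stricter than the schema's `number` type.

```python
def docker_endpoint(details: dict) -> str:
    """Format DockerHost and DockerHostPort as a tcp:// address.

    Hypothetical helper; falls back to the documented defaults
    ("localhost", 2375) when a property is absent.
    """
    host = details.get("DockerHost", "localhost")
    port = details.get("DockerHostPort", 2375)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(port, bool) or not isinstance(port, int):
        raise ValueError("DockerHostPort must be an integer in this sketch")
    if not 0 <= port <= 65535:
        raise ValueError("DockerHostPort must be between 0 and 65535")
    return f"tcp://{host}:{port}"


print(docker_endpoint({"DockerHost": "localhost", "DockerHostPort": 2375}))
```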
Runtime
Configures the Docker runtime, e.g. use nvidia to run Docker with CUDA-compatible GPU cards
is optional
Type:
string
Constraints
enum: the value of this property must be equal to one of the following values:
Value | Explanation |
---|---|
"nvidia" | |
AlgoDetails Properties
Property | Type | Required |
---|---|---|
Path | string | Required |
Branch | string | Required |
Auth | string | Required |
DockerFilePath | string | Optional |
ConfigPath | string | Required |
Username | string | Optional |
Credential | string | Optional |
AlgoOutput | array | Required |
RestoreOutput | boolean | Optional |
Logs | array | Optional |
Path
The Git URL or path from which the algorithm can be fetched.
is required
Type:
string
Branch
The name of the branch to be used to fetch the algorithm.
is required
Type:
string
Default Value
The default value is:
"master"
Auth
The authentication mechanism to use.
is required
Type:
string
Constraints
enum: the value of this property must be equal to one of the following values:
Value | Explanation |
---|---|
"None" | |
"http" | |
"ssh" | |
Default Value
The default value is:
"None"
DockerFilePath
The path to the Dockerfile inside the repository.
is optional
Type:
string
Default Value
The default value is:
"Dockerfile"
ConfigPath
The path to config.yaml inside the repository.
is required
Type:
string
Default Value
The default value is:
"config.yaml"
Username
The username to authenticate with when Auth is not "None".
is optional
Type:
string
Credential
The password or auth key used to authenticate
is optional
Type:
string
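Auth, Username, and Credential work together: Username is described as applying when the authentication type is not "None". The sketch below checks that relationship. Expecting both Username and Credential for "http" and "ssh" is an assumption made for illustration, and the helper name `auth_problems` is hypothetical.

```python
ALLOWED_AUTH = ("None", "http", "ssh")  # enum values documented for Auth


def auth_problems(algo_details: dict) -> list:
    """Return human-readable problems with the Auth-related properties.

    Hypothetical helper; assumes Username and Credential are both
    expected whenever Auth is "http" or "ssh".
    """
    auth = algo_details.get("Auth", "None")
    if auth not in ALLOWED_AUTH:
        return [f"unsupported Auth value: {auth!r}"]
    if auth == "None":
        return []
    return [
        f"{key} is expected when Auth is {auth!r}"
        for key in ("Username", "Credential")
        if not algo_details.get(key)
    ]


print(auth_problems({"Auth": "http", "Username": "builder"}))
```

An empty list means the Auth-related properties look consistent under these assumptions.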
AlgoOutput
Paths to be saved as the output of the trained model.
is required
Type:
array
RestoreOutput
If set, previously trained models are restored in new training runs, which enables incremental training.
is optional
Type:
boolean
Default Value
The default value is:
true
Logs
Paths that contain log details about the training/evaluation
is optional
Type:
array