API Reference

CLI

The A2RCHI CLI provides commands to create, manage, and delete A2RCHI deployments and services.

Commands

1. `create`

Create a new A2RCHI deployment.

Usage:

a2rchi create --name <deployment_name> --config <config.yaml> --env-file <secrets.env> [OPTIONS]

Options:

--name, -n (str, required): Name of the deployment.
--config, -c (str): Path to a YAML configuration file (repeat the flag to supply multiple files).
--config-dir, -cd (str): Directory containing configuration files.
--env-file, -e (str, required): Path to the secrets .env file.
--services, -s (comma-separated, required): List of services to enable (e.g., chatbot,uploader).
--sources, -src (comma-separated): Additional data sources to enable (e.g., git,jira). The links source is always available.
--podman, -p: Use Podman instead of Docker.
--gpu-ids: GPU configuration (all or comma-separated IDs).
--tag, -t (str): Image tag for built containers (default: 2000).
--hostmode: Use host network mode.
--verbosity, -v (int): Logging verbosity (0-4, default: 3).
--force, -f: Overwrite existing deployment if it exists.
--dry, --dry-run: Validate and show what would be created, but do not deploy.

2. `delete`

Delete an existing A2RCHI deployment.

Usage:

a2rchi delete --name <deployment_name> [OPTIONS]

Options:

--name, -n (str): Name of the deployment to delete.
--rmi: Remove container images.
--rmv: Remove volumes.
--keep-files: Keep deployment files (do not remove directory).
--list: List all available deployments.

3. `list-services`

List all available A2RCHI services and data sources.

Usage:

a2rchi list-services

4. `list-deployments`

List all existing A2RCHI deployments.

Usage:

a2rchi list-deployments

5. `evaluate`

Launch the benchmarking runtime to evaluate one or more configurations against a set of questions/answers.

Usage:

a2rchi evaluate --name <run_name> --env-file <secrets.env> --config <file.yaml> [OPTIONS]

Use --config-dir if you want to point to a directory of configs instead.

Options:

Supports the same flags as create (--sources, --podman, --gpu-ids, --tag, --hostmode, --verbosity, --force).
Reads configuration from one or more YAML files that should define the services.benchmarking section.

Examples

Create a deployment:

a2rchi create --name mybot --config my.yaml --env-file secrets.env --services chatbot,uploader

Delete a deployment and remove images/volumes:

a2rchi delete --name mybot --rmi --rmv

List all deployments:

a2rchi list-deployments

List all services:

a2rchi list-services

Configuration YAML API Reference

The A2RCHI configuration YAML file defines the deployment, services, data sources, pipelines, models, and interface settings for your A2RCHI instance.

Top-Level Fields

`name`

Type: string
Description: Name of the deployment.

`global`

DATA_PATH: path for persisted data (defaults to /root/data/).
ACCOUNTS_PATH: path for uploader/grader account data.
ACCEPTED_FILES: list of extensions allowed for manual uploads.
LOGGING.input_output_filename: log file that stores pipeline inputs/outputs.
verbosity: default logging level for services (0-4).

`services`

Holds configuration for every containerised service. Common keys include:

port / external_port: internal versus host port mapping for web apps.
host / hostname: network binding and public hostname for frontends.
volume/paths: template or static asset paths expected by the service.

Key services:

chat_app: Chat interface options (trained_on, ports, UI toggles).
uploader_app: Document uploader settings (verify_urls, ports).
grader_app: Grader-specific knobs (num_problems, rubric paths).
grafana: Port configuration for the monitoring dashboard.
chromadb: Connection details for the vector store container (chromadb_host, chromadb_port, chromadb_external_port).
postgres: Database credentials (user, database, port, host).
piazza, mattermost, redmine_mailbox, benchmarking, ...: Service-specific options (see user guide sections above).

`data_manager`

Controls ingestion sources and vector store behaviour.

sources.links.input_lists: .list files with seed URLs.
sources.links.scraper: Behaviour toggles for HTTP scraping (resetting data, URL verification, warning output).
sources..visible: Mark whether documents harvested from a source should appear in chat citations and other user-facing listings (true by default).
sources.git.enabled / sources.sso.enabled / sources.jira.enabled / sources.redmine.enabled: Toggle additional collectors when paired with --sources.
embedding_name: Embedding backend (OpenAIEmbeddings, HuggingFaceEmbeddings, ...).
embedding_class_map: Backend specific parameters (model name, device, similarity threshold).
chunk_size / chunk_overlap: Text splitter parameters.
reset_collection: Whether to wipe the collection before re-populating.
num_documents_to_retrieve: Top-k documents returned at query time.
distance_metric / use_hybrid_search / bm25_weight / semantic_weight / bm25.{k1,b}: Retrieval tuning knobs.
utils.anonymizer (legacy) / data_manager.utils.anonymizer: Redaction settings applied when ticket collectors anonymise content.

`a2rchi`

Defines pipelines and model routing.

pipelines: List of pipeline names to load (e.g., QAPipeline).
pipeline_map: Per-pipeline configuration of prompts, models, and token limits.
model_class_map: Definitions for each model family (base model names, provider-specific kwargs).
chain_update_time: Polling interval for hot-reloading chains.

`utils`

Utility configuration for supporting components (mostly legacy fallbacks):

sso: Global SSO defaults used when a source-specific override is not provided.
git: Legacy toggle for Git scraping.
jira / redmine: Compatibility settings for ticket integrations; prefer configuring these under data_manager.sources.

Required Fields

Some fields are required depending on enabled services and pipelines. For example:

name
data_manager.sources.links.input_lists (or other source-specific configuration)
a2rchi.pipelines and matching a2rchi.pipeline_map entries
Service-specific fields (e.g., services.piazza.network_id, services.grader_app.num_problems)

See the User Guide for more configuration examples and explanations.

Example

name: my_deployment
global:
  DATA_PATH: "/root/data/"
  ACCOUNTS_PATH: "/root/.accounts/"
  ACCEPTED_FILES: [".txt", ".pdf"]
  LOGGING:
    input_output_filename: "chain_input_output.log"
  verbosity: 3

data_manager:
  sources:
    links:
      input_lists:
        - examples/deployments/basic-gpu/miscellanea.list
      scraper:
        reset_data: true
        verify_urls: false
        enable_warnings: false
  utils:
    anonymizer:
      nlp_model: en_core_web_sm
  embedding_name: "OpenAIEmbeddings"
  chunk_size: 1000
  chunk_overlap: 0
  num_documents_to_retrieve: 5

a2rchi:
  pipelines: ["QAPipeline"]
  pipeline_map:
    QAPipeline:
      max_tokens: 10000
      prompts:
        required:
          condense_prompt: "examples/deployments/basic-gpu/condense.prompt"
          chat_prompt: "examples/deployments/basic-gpu/qa.prompt"
      models:
        required:
          condense_model: "OpenAIGPT4"
          chat_model: "OpenAIGPT4"
  model_class_map:
    OpenAIGPT4:
      class: OpenAIGPT4
      kwargs:
        model_name: gpt-4

services:
  chat_app:
    trained_on: "Course documentation"
    hostname: "example.mit.edu"
  chromadb:
    chromadb_host: "chromadb"

Tip: For a full template, see src/cli/templates/base-config.yaml in the repository.

API Reference

CLI

Commands

1. create

2. delete

3. list-services

4. list-deployments

5. evaluate

Examples

Configuration YAML API Reference

Top-Level Fields

name

global

services

data_manager

a2rchi

utils

Required Fields

Example

1. `create`

2. `delete`

3. `list-services`

4. `list-deployments`

5. `evaluate`

`name`

`global`

`services`

`data_manager`

`a2rchi`

`utils`