API Reference
CLI
The A2RCHI CLI provides commands to create, manage, and delete A2RCHI deployments and services.
Commands
1. create
Create a new A2RCHI deployment.
Usage:
a2rchi create --name <deployment_name> --config <config.yaml> --env-file <secrets.env> [OPTIONS]
Options:
- --name, -n (str, required): Name of the deployment.
- --config, -c (str): Path to a YAML configuration file (repeat the flag to supply multiple files).
- --config-dir, -cd (str): Directory containing configuration files.
- --env-file, -e (str, required): Path to the secrets .env file.
- --services, -s (comma-separated, required): List of services to enable (e.g., chatbot,uploader).
- --sources, -src (comma-separated): Additional data sources to enable (e.g., git,jira). The links source is always available.
- --podman, -p: Use Podman instead of Docker.
- --gpu-ids: GPU configuration (all or comma-separated IDs).
- --tag, -t (str): Image tag for built containers (default: 2000).
- --hostmode: Use host network mode.
- --verbosity, -v (int): Logging verbosity (0-4, default: 3).
- --force, -f: Overwrite existing deployment if it exists.
- --dry, --dry-run: Validate and show what would be created, but do not deploy.
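The flag rules above (repeatable --config, comma-separated --services and --sources) can be sketched as a small helper that assembles the argument list for a script. This is a minimal illustration, not part of A2RCHI: build_create_command is a hypothetical name, and base.yaml and overrides.yaml are placeholder file names.

```python
from typing import List, Optional, Sequence


def build_create_command(
    name: str,
    configs: Sequence[str],
    env_file: str,
    services: Sequence[str],
    sources: Optional[Sequence[str]] = None,
    dry_run: bool = False,
) -> List[str]:
    """Assemble an argv list for `a2rchi create` (hypothetical helper)."""
    cmd = ["a2rchi", "create", "--name", name]
    # --config may be repeated to supply several YAML files.
    for path in configs:
        cmd += ["--config", path]
    cmd += ["--env-file", env_file]
    # --services and --sources each take one comma-separated value.
    cmd += ["--services", ",".join(services)]
    if sources:
        cmd += ["--sources", ",".join(sources)]
    if dry_run:
        # Validate and show what would be created without deploying.
        cmd.append("--dry-run")
    return cmd


command = build_create_command(
    name="mybot",
    configs=["base.yaml", "overrides.yaml"],  # placeholder file names
    env_file="secrets.env",
    services=["chatbot", "uploader"],
    sources=["git", "jira"],
    dry_run=True,
)
print(" ".join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) once the dry run looks right.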
2. delete
Delete an existing A2RCHI deployment.
Usage:
a2rchi delete --name <deployment_name> [OPTIONS]
Options:
- --name, -n (str): Name of the deployment to delete.
- --rmi: Remove container images.
- --rmv: Remove volumes.
- --keep-files: Keep deployment files (do not remove directory).
- --list: List all available deployments.
3. list-services
List all available A2RCHI services and data sources.
Usage:
a2rchi list-services
4. list-deployments
List all existing A2RCHI deployments.
Usage:
a2rchi list-deployments
5. evaluate
Launch the benchmarking runtime to evaluate one or more configurations against a set of questions/answers.
Usage:
a2rchi evaluate --name <run_name> --env-file <secrets.env> --config <file.yaml> [OPTIONS]
Use --config-dir if you want to point to a directory of configs instead.
Options:
- Supports the same flags as create (--sources, --podman, --gpu-ids, --tag, --hostmode, --verbosity, --force).
- Reads configuration from one or more YAML files that should define the services.benchmarking section.
Examples
Create a deployment:
a2rchi create --name mybot --config my.yaml --env-file secrets.env --services chatbot,uploader
Delete a deployment and remove images/volumes:
a2rchi delete --name mybot --rmi --rmv
List all deployments:
a2rchi list-deployments
List all services:
a2rchi list-services
Configuration YAML API Reference
The A2RCHI configuration YAML file defines the deployment, services, data sources, pipelines, models, and interface settings for your A2RCHI instance.
Top-Level Fields
name
- Type: string
- Description: Name of the deployment.
global
- DATA_PATH: path for persisted data (defaults to /root/data/).
- ACCOUNTS_PATH: path for uploader/grader account data.
- ACCEPTED_FILES: list of extensions allowed for manual uploads.
- LOGGING.input_output_filename: log file that stores pipeline inputs/outputs.
- verbosity: default logging level for services (0-4).
services
Holds configuration for every containerised service. Common keys include:
- port / external_port: internal versus host port mapping for web apps.
- host / hostname: network binding and public hostname for frontends.
- volume/paths: template or static asset paths expected by the service.
Key services:
- chat_app: Chat interface options (trained_on, ports, UI toggles).
- uploader_app: Document uploader settings (verify_urls, ports).
- grader_app: Grader-specific knobs (num_problems, rubric paths).
- grafana: Port configuration for the monitoring dashboard.
- chromadb: Connection details for the vector store container (chromadb_host, chromadb_port, chromadb_external_port).
- postgres: Database credentials (user, database, port, host).
- piazza, mattermost, redmine_mailbox, benchmarking, ...: Service-specific options (see user guide sections above).
data_manager
Controls ingestion sources and vector store behaviour.
- sources.links.input_lists: .list files with seed URLs.
- sources.links.scraper: Behaviour toggles for HTTP scraping (resetting data, URL verification, warning output).
- sources.<source>.visible: Mark whether documents harvested from a source should appear in chat citations and other user-facing listings (true by default).
- sources.git.enabled / sources.sso.enabled / sources.jira.enabled / sources.redmine.enabled: Toggle additional collectors when paired with --sources.
- embedding_name: Embedding backend (OpenAIEmbeddings, HuggingFaceEmbeddings, ...).
- embedding_class_map: Backend-specific parameters (model name, device, similarity threshold).
- chunk_size / chunk_overlap: Text splitter parameters.
- reset_collection: Whether to wipe the collection before re-populating.
- num_documents_to_retrieve: Top-k documents returned at query time.
- distance_metric / use_hybrid_search / bm25_weight / semantic_weight / bm25.{k1,b}: Retrieval tuning knobs.
- utils.anonymizer (legacy) / data_manager.utils.anonymizer: Redaction settings applied when ticket collectors anonymise content.
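To show how chunk_size and chunk_overlap interact, here is a generic sliding-window splitter. It is a sketch under stated assumptions, not A2RCHI's splitter: it counts characters, while the real splitter may count tokens or break on separators, and split_text is a hypothetical name.

```python
from typing import List


def split_text(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters (hypothetical helper).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


chunks = split_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)
```

With chunk_overlap set to 0, as in the example configuration in this reference, the windows do not share any characters.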
a2rchi
Defines pipelines and model routing.
- pipelines: List of pipeline names to load (e.g., QAPipeline).
- pipeline_map: Per-pipeline configuration of prompts, models, and token limits.
- model_class_map: Definitions for each model family (base model names, provider-specific kwargs).
- chain_update_time: Polling interval for hot-reloading chains.
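In the example configuration in this reference, each model name under pipeline_map matches a key in model_class_map. Assuming that is how names are resolved, the following sketch reports model names that have no definition. unresolved_models is a hypothetical helper, and LocalLlama is a made-up name left undefined on purpose.

```python
from typing import Dict, List


def unresolved_models(a2rchi_cfg: dict) -> Dict[str, List[str]]:
    """Return, per pipeline, the model roles whose name is not in model_class_map."""
    known = set(a2rchi_cfg.get("model_class_map", {}))
    pipeline_map = a2rchi_cfg.get("pipeline_map", {})
    missing: Dict[str, List[str]] = {}
    for pipeline in a2rchi_cfg.get("pipelines", []):
        entry = pipeline_map.get(pipeline, {})
        # Models sit under group headings such as "required".
        for group in entry.get("models", {}).values():
            for role, model_name in group.items():
                if model_name not in known:
                    missing.setdefault(pipeline, []).append(f"{role}={model_name}")
    return missing


a2rchi_cfg = {
    "pipelines": ["QAPipeline"],
    "pipeline_map": {
        "QAPipeline": {
            "models": {
                "required": {
                    "condense_model": "OpenAIGPT4",
                    "chat_model": "LocalLlama",  # made-up name, not defined below
                }
            }
        }
    },
    "model_class_map": {
        "OpenAIGPT4": {"class": "OpenAIGPT4", "kwargs": {"model_name": "gpt-4"}}
    },
}
result = unresolved_models(a2rchi_cfg)
print(result)
```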
utils
Utility configuration for supporting components (mostly legacy fallbacks):
- sso: Global SSO defaults used when a source-specific override is not provided.
- git: Legacy toggle for Git scraping.
- jira / redmine: Compatibility settings for ticket integrations; prefer configuring these under data_manager.sources.
Required Fields
Some fields are required depending on enabled services and pipelines. For example:
- name
- data_manager.sources.links.input_lists (or other source-specific configuration)
- a2rchi.pipelines and matching a2rchi.pipeline_map entries
- Service-specific fields (e.g., services.piazza.network_id, services.grader_app.num_problems)
See the User Guide for more configuration examples and explanations.
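A pre-flight check for the fields listed above might look like the sketch below. It works on a configuration that is already loaded into a dictionary. missing_required is a hypothetical helper, not an A2RCHI API. It is also stricter than the rule above: it always requires links.input_lists, even though another source-specific configuration may be enough.

```python
from typing import List


def missing_required(config: dict) -> List[str]:
    """List dotted paths of required fields that are absent or empty."""
    problems: List[str] = []
    if not config.get("name"):
        problems.append("name")

    links = config.get("data_manager", {}).get("sources", {}).get("links", {})
    if not links.get("input_lists"):
        problems.append("data_manager.sources.links.input_lists")

    a2rchi = config.get("a2rchi", {})
    pipelines = a2rchi.get("pipelines", [])
    if not pipelines:
        problems.append("a2rchi.pipelines")
    # Every listed pipeline needs a matching pipeline_map entry.
    for pipeline in pipelines:
        if pipeline not in a2rchi.get("pipeline_map", {}):
            problems.append(f"a2rchi.pipeline_map.{pipeline}")
    return problems


config = {
    "name": "my_deployment",
    "data_manager": {"sources": {"links": {"input_lists": ["seeds.list"]}}},
    "a2rchi": {"pipelines": ["QAPipeline"], "pipeline_map": {}},
}
problems = missing_required(config)
print(problems)
```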
Example
name: my_deployment
global:
  DATA_PATH: "/root/data/"
  ACCOUNTS_PATH: "/root/.accounts/"
  ACCEPTED_FILES: [".txt", ".pdf"]
  LOGGING:
    input_output_filename: "chain_input_output.log"
  verbosity: 3
data_manager:
  sources:
    links:
      input_lists:
        - examples/deployments/basic-gpu/miscellanea.list
      scraper:
        reset_data: true
        verify_urls: false
        enable_warnings: false
  utils:
    anonymizer:
      nlp_model: en_core_web_sm
  embedding_name: "OpenAIEmbeddings"
  chunk_size: 1000
  chunk_overlap: 0
  num_documents_to_retrieve: 5
a2rchi:
  pipelines: ["QAPipeline"]
  pipeline_map:
    QAPipeline:
      max_tokens: 10000
      prompts:
        required:
          condense_prompt: "examples/deployments/basic-gpu/condense.prompt"
          chat_prompt: "examples/deployments/basic-gpu/qa.prompt"
      models:
        required:
          condense_model: "OpenAIGPT4"
          chat_model: "OpenAIGPT4"
  model_class_map:
    OpenAIGPT4:
      class: OpenAIGPT4
      kwargs:
        model_name: gpt-4
services:
  chat_app:
    trained_on: "Course documentation"
    hostname: "example.mit.edu"
  chromadb:
    chromadb_host: "chromadb"
Tip:
For a full template, see src/cli/templates/base-config.yaml in the repository.