# Advanced Setup & Deployment
Topics related to advanced setup and deployment of A2RCHI.
## Configuring Podman
To keep your Podman containers running for extended periods (in particular, after you log out), you need to enable lingering for your user. To do this, run:

```bash
loginctl enable-linger
```
To confirm the lingering status, run:

```bash
loginctl user-status | grep -m1 Linger
```
See the Red Hat documentation for additional context.
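If your production deployment runs under a dedicated service account rather than your own user, you can enable lingering for that account explicitly. A minimal sketch, assuming a hypothetical `a2rchi` service account:

```bash
# Enable lingering for a specific (hypothetical) service account.
sudo loginctl enable-linger a2rchi

# Query just the Linger property; expect "Linger=yes".
loginctl show-user a2rchi --property=Linger
```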
## Running LLMs locally on your GPUs
There are a few additional system requirements for this to work:
- Make sure you have NVIDIA drivers installed (a quick check is sketched after this list).
- (Optional) Install the NVIDIA container toolkit so that the containers where A2RCHI will run can access the GPUs.
- Configure the container runtime to access the GPUs (see the runtime-specific steps below).
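Before going further, it is worth verifying the driver installation on the host. Assuming the drivers ship with the standard `nvidia-smi` utility:

```bash
# On the host: should print the driver version and one row per visible GPU.
nvidia-smi
```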
### For Podman

Run the following command:

```bash
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```

Then list the devices:

```bash
nvidia-ctk cdi list
```
You should see output similar to:

```
INFO[0000] Found 9 CDI devices
...
nvidia.com/gpu=0
nvidia.com/gpu=1
...
nvidia.com/gpu=all
...
```
These listed "CDI devices" are what A2RCHI references when running on the GPUs, so make sure they are present. To learn more, consult the [Podman GPU documentation](https://podman-desktop.io/docs/podman/gpu).
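To confirm that Podman can actually reach the GPUs through CDI, you can run a throwaway container against one of the listed devices. A minimal sketch, where the `ubuntu` image and the SELinux-related `--security-opt` flag are illustrative assumptions you may need to adjust:

```bash
# Attach all GPUs via the CDI device name listed above and run nvidia-smi inside.
podman run --rm --device nvidia.com/gpu=all --security-opt=label=disable ubuntu nvidia-smi
```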
### For Docker

Run the following command:

```bash
sudo nvidia-ctk runtime configure --runtime=docker
```

The remaining steps mirror the Podman flow. NOTE: this has not yet been fully tested with Docker. Refer to the [NVIDIA toolkit documentation](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#configuration) for details.
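As with Podman, a quick smoke test can confirm the runtime wiring. A hedged sketch, where the CUDA base image tag is an assumption (any CUDA-enabled image should do), and note that the Docker daemon typically needs a restart after the configuration step above:

```bash
# Restart the daemon so it picks up the new runtime configuration.
sudo systemctl restart docker

# Run nvidia-smi inside a container with all GPUs attached.
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```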
Once these requirements are met, the `a2rchi create [...] --gpu-ids <gpus>` option will deploy A2RCHI across your GPUs.
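For example, with illustrative GPU IDs (here `[...]` stands in for whatever other `create` options your deployment already uses; consult the CLI help for the exact `--gpu-ids` format):

```bash
# Pin the deployment to GPUs 0 and 1 (illustrative IDs).
a2rchi create [...] --gpu-ids 0,1
```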
## Helpful Notes for Production Deployments
You may wish to use the CLI to stage production deployments. This section collects some useful notes to keep in mind.
### Running multiple deployments on the same machine
The CLI allows multiple deployments to run on the same machine (on the same daemon in the case of Docker; Podman has no daemon). Each deployment gets its own container network, so there is very little risk of deployments accidentally communicating with one another.

However, you need to be careful with the external ports. Suppose you're running two deployments and both of them expose the chat on external port 8000: only one deployment can bind that port at a time, so you should forward the other deployment to a different external port. Generally, this can be done in the configuration:
```yaml
services:
  chat_app:
    external_port: 7862  # default is 7861
  uploader_app:
    external_port: 5004  # default is 5003
  grafana:
    external_port: 3001  # default is 3000
  chromadb:
    chromadb_external_port: 8001  # default is 8000
```
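Once both deployments are up, you can check which external ports are actually bound. A minimal sketch (substitute `podman` for `docker` as appropriate):

```bash
# List running containers together with their published host ports.
docker ps --format "table {{.Names}}\t{{.Ports}}"
```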
### Persisting data between deployments
Volumes persist between deployments, so if you deploy an instance and upload additional documents, you do not need to redo the upload every time you deploy. If you are editing any data, explicitly remove the stale information from the volume, or remove the volume itself with:

```bash
docker/podman volume rm <volume-name>
```
To see what volumes are currently present, run:

```bash
docker/podman volume ls
```
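For example, to locate and remove a stale volume (the names below are purely illustrative; check the `volume ls` output for your deployment's actual volume names):

```bash
# List volumes whose names match a (hypothetical) deployment name.
podman volume ls --filter name=my-deployment

# Remove the stale volume once you are sure nothing needs its contents.
podman volume rm my-deployment_data
```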