Update readme and remove empty readme (#396)
* remove habana

  Signed-off-by: lvliang-intel <[email protected]>

* Update README and remove empty README

  Signed-off-by: lvliang-intel <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

---------

Signed-off-by: lvliang-intel <[email protected]>
Co-authored-by: Sihan Chen <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent da19c5d · commit a61e434

Showing 5 changed files with 135 additions and 4 deletions.
@@ -0,0 +1,96 @@
# Document Summary TGI Microservice

In this microservice, we use LangChain to implement summarization strategies and to facilitate LLM inference using Text Generation Inference on Intel Xeon and Gaudi2 processors. [Text Generation Inference](https://github.com/huggingface/text-generation-inference) (TGI) is a toolkit for deploying and serving Large Language Models (LLMs). TGI enables high-performance text generation for the most popular open-source LLMs, including Llama, Falcon, StarCoder, BLOOM, GPT-NeoX, and more.

# 🚀1. Start Microservice with Python (Option 1)

To start the LLM microservice, you need to install the required Python packages first.

## 1.1 Install Requirements

```bash
pip install -r requirements.txt
```
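
Optionally, you can install the packages into a dedicated virtual environment first. This is only a suggestion, not something the original steps require:

```bash
# Optional: create and activate an isolated environment before installing
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```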

## 1.2 Start LLM Service

```bash
export HF_TOKEN=${your_hf_api_token}
docker run -p 8008:80 -v ./data:/data --name llm-docsum-tgi --shm-size 1g ghcr.io/huggingface/text-generation-inference:2.1.0 --model-id ${your_hf_llm_model}
```
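
Before moving on, you can confirm that the TGI container started correctly. The container name below matches the `--name` flag used above; note that the model download can take several minutes on the first run:

```bash
# Follow the TGI container logs until the model finishes loading
docker logs -f llm-docsum-tgi
```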

## 1.3 Verify the TGI Service

```bash
curl http://${your_ip}:8008/generate \
  -X POST \
  -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
  -H 'Content-Type: application/json'
```

## 1.4 Start LLM Service with Python Script

```bash
export TGI_LLM_ENDPOINT="http://${your_ip}:8008"
python llm.py
```

# 🚀2. Start Microservice with Docker (Option 2)

If you start the LLM microservice with Docker, the `docker_compose_llm.yaml` file will automatically start a TGI/vLLM service in Docker as well.

## 2.1 Setup Environment Variables

In order to start the TGI and LLM services, you need to set up the following environment variables first.

```bash
export HF_TOKEN=${your_hf_api_token}
export TGI_LLM_ENDPOINT="http://${your_ip}:8008"
export LLM_MODEL_ID=${your_hf_llm_model}
```

## 2.2 Build Docker Image

```bash
cd ../../
docker build -t opea/llm-docsum-tgi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/summarization/tgi/Dockerfile .
```
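
If the build succeeds, the new image should appear in your local image list. This is just a quick sanity check, not part of the original steps:

```bash
# Confirm the image was built and tagged as expected
docker images | grep llm-docsum-tgi
```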

To start a Docker container, you have two options:

- A. Run Docker with CLI
- B. Run Docker with Docker Compose

You can choose one as needed.

## 2.3 Run Docker with CLI (Option A)

```bash
docker run -d --name="llm-docsum-tgi-server" -p 9000:9000 --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e TGI_LLM_ENDPOINT=$TGI_LLM_ENDPOINT -e HF_TOKEN=$HF_TOKEN opea/llm-docsum-tgi:latest
```
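
You can then verify that the microservice container is up and review its logs for startup errors; `llm-docsum-tgi-server` is the name assigned in the command above:

```bash
# Check the container status and inspect its startup logs
docker ps --filter name=llm-docsum-tgi-server
docker logs llm-docsum-tgi-server
```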

## 2.4 Run Docker with Docker Compose (Option B)

```bash
docker compose -f docker_compose_llm.yaml up -d
```
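
With Docker Compose, you can check the status of the started services in a similar way, assuming the compose file is in the current directory as in the command above:

```bash
# List the services started by the compose file and tail their logs
docker compose -f docker_compose_llm.yaml ps
docker compose -f docker_compose_llm.yaml logs -f
```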

# 🚀3. Consume LLM Service

## 3.1 Check Service Status

```bash
curl http://${your_ip}:9000/v1/health_check \
  -X GET \
  -H 'Content-Type: application/json'
```

## 3.2 Consume LLM Service

```bash
curl http://${your_ip}:9000/v1/chat/docsum \
  -X POST \
  -d '{"query":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}' \
  -H 'Content-Type: application/json'
```
@@ -0,0 +1,36 @@
# Introduction

Chroma is an AI-native open-source vector database focused on developer productivity and happiness. Chroma is licensed under Apache 2.0. Chroma runs in various modes; we can deploy it as a server running on your local machine or in the cloud.

# Getting Started

## Start Chroma Server

To start the Chroma server on your local machine, follow these steps:

```bash
git clone https://github.com/chroma-core/chroma.git
cd chroma
docker compose up -d
```

## Start Log Output

Upon starting the server, you should see log output similar to the following:

```log
server-1 | Starting 'uvicorn chromadb.app:app' with args: --workers 1 --host 0.0.0.0 --port 8000 --proxy-headers --log-config chromadb/log_config.yml --timeout-keep-alive 30
server-1 | INFO: [02-08-2024 07:03:19] Set chroma_server_nofile to 65536
server-1 | INFO: [02-08-2024 07:03:19] Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.
server-1 | DEBUG: [02-08-2024 07:03:19] Starting component System
server-1 | DEBUG: [02-08-2024 07:03:19] Starting component OpenTelemetryClient
server-1 | DEBUG: [02-08-2024 07:03:19] Starting component SqliteDB
server-1 | DEBUG: [02-08-2024 07:03:19] Starting component QuotaEnforcer
server-1 | DEBUG: [02-08-2024 07:03:19] Starting component Posthog
server-1 | DEBUG: [02-08-2024 07:03:19] Starting component LocalSegmentManager
server-1 | DEBUG: [02-08-2024 07:03:19] Starting component SegmentAPI
server-1 | INFO: [02-08-2024 07:03:19] Started server process [1]
server-1 | INFO: [02-08-2024 07:03:19] Waiting for application startup.
server-1 | INFO: [02-08-2024 07:03:19] Application startup complete.
server-1 | INFO: [02-08-2024 07:03:19] Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
```
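
Once the server reports that Uvicorn is running, you can verify it is reachable from the host. This assumes the default port mapping of 8000 shown in the logs above; in Chroma releases from this period, a heartbeat endpoint is exposed under `/api/v1/heartbeat` and returns a timestamp:

```bash
# Quick reachability check against the local Chroma server
curl http://localhost:8000/api/v1/heartbeat
```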