Update AutoGen documentation and notebook example (#540)

* Update AutoGen documentation

* Update webui.md

* Update webui.md

* Update lmstudio.md

* Update lmstudio.md

* Update mkdocs.yml

* Update README.md

* Update README.md

* Update README.md

* Update autogen.md

* Update local_llm.md

* Update local_llm.md

* Update autogen.md

* Update autogen.md

* Update autogen.md

* refreshed the autogen examples + notebook (notebook is untested)

* unrelated fix for a typo I noticed

* poetry remove pyautogen, then manually removed autogen extra in .toml

* add pdf dependency

---------

Co-authored-by: Sarah Wooders <sarahwooders@gmail.com>
Charles Packer 2023-11-30 17:45:04 -08:00 committed by GitHub
parent 2adc75d10b
commit ec7fa25c07
15 changed files with 492 additions and 630 deletions


@ -6,7 +6,7 @@
<strong>Try out our MemGPT chatbot on <a href="https://discord.gg/9GEQrxmVyE">Discord</a>!</strong>
<strong>⭐ NEW: You can now run MemGPT with <a href="https://memgpt.readthedocs.io/en/latest/local_llm/">local LLMs</a> and <a href="https://github.com/cpacker/MemGPT/discussions/65">AutoGen</a>! ⭐ </strong>
<strong>⭐ NEW: You can now run MemGPT with <a href="https://memgpt.readthedocs.io/en/latest/local_llm/">local LLMs</a> and <a href="https://memgpt.readthedocs.io/en/latest/autogen/">AutoGen</a>! ⭐ </strong>
[![Discord](https://img.shields.io/discord/1161736243340640419?label=Discord&logo=discord&logoColor=5865F2&style=flat-square&color=5865F2)](https://discord.gg/9GEQrxmVyE)


@ -1,101 +1,231 @@
## MemGPT + AutoGen
!!! warning "Need help?"
If you need help visit our [Discord server](https://discord.gg/9GEQrxmVyE) and post in the #support channel.
You can also check the [GitHub discussion page](https://github.com/cpacker/MemGPT/discussions/65), but the Discord server is the official support channel and is monitored more actively.
[examples/agent_groupchat.py](https://github.com/cpacker/MemGPT/blob/main/memgpt/autogen/examples/agent_groupchat.py) contains an example of a groupchat where one of the agents is powered by MemGPT.
If you are using OpenAI, you can also run it using the [example notebook](https://github.com/cpacker/MemGPT/blob/main/memgpt/autogen/examples/memgpt_coder_autogen.ipynb).
In the next section, we detail how to set up MemGPT+Autogen to run with local LLMs.
In the next section, we detail how to set up MemGPT and Autogen to run with local LLMs.
## Example: connecting AutoGen + MemGPT to non-OpenAI LLMs (using oobabooga web UI)
## Connect AutoGen + MemGPT to non-OpenAI LLMs (AutoGen + MemGPT + open LLMs)
!!! warning "Enable the OpenAI extension"
In WebUI enable the [openai extension](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/openai)! This is for non-MemGPT Autogen agents.
In web UI make sure to enable the [openai extension](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/openai)!
This is enabled by default in newer versions of web UI, but must be enabled manually in older versions of web UI.
To get MemGPT to work with a local LLM, you need to have the LLM running on a server that takes API requests.
To get MemGPT to work with a local LLM, you need to have an LLM running on a server that takes API requests.
For the purposes of this example, we're going to serve (host) the LLMs using [oobabooga web UI](https://github.com/oobabooga/text-generation-webui#starting-the-web-ui), but if you want to use something else you can! This also assumes you're running web UI locally - if you're running on e.g. Runpod, you'll want to follow Runpod-specific instructions (for example use [TheBloke's one-click UI and API](https://github.com/TheBlokeAI/dockerLLM/blob/main/README_Runpod_LocalLLMsUIandAPI.md))
For the purposes of this example, we're going to serve (host) the LLMs using [oobabooga web UI](https://github.com/oobabooga/text-generation-webui#starting-the-web-ui), but if you want to use something else you can! This also assumes you're running web UI locally - if you're running on e.g. Runpod, you'll want to follow Runpod-specific instructions (for example use [TheBloke's one-click UI and API](https://github.com/TheBlokeAI/dockerLLM/blob/main/README_Runpod_LocalLLMsUIandAPI.md)).
### Part 1: Get AutoGen working
1. Install oobabooga web UI using the instructions [here](https://github.com/oobabooga/text-generation-webui#starting-the-web-ui)
2. Once installed, launch the web server with `python server.py`
3. Navigate to the web app (if local, this is probably [`http://127.0.0.1:7860`](http://localhost:7860)), select the model you want to use, adjust your GPU and CPU memory settings, and click "load"
4. After the model is successfully loaded, navigate to the "Session" tab, and select and enable the "openai" extension. Then click "Apply flags/extensions and restart". The WebUI will then restart.
5. Once the WebUI has reloaded, double-check that your selected model and parameters are still selected -- If not, then select your model and re-apply your settings and click "load" once more.
5. Assuming steps 1-4 went correctly, the LLM is now properly hosted on a port you can point MemGPT to!
### Part 1: Get web UI working
Install web UI and get a model set up on a local web server. You can use [our instructions on setting up web UI](https://memgpt.readthedocs.io/en/latest/webui/).
!!! warning "Choosing an LLM / model to use"
You'll need to decide on an LLM / model to use with web UI.
MemGPT requires an LLM that is good at function calling to work well - if the LLM is bad at function calling, **MemGPT will not work properly**.
Visit [our Discord server](https://discord.gg/9GEQrxmVyE) and check the #model-chat channel for an up-to-date list of recommended LLMs / models to use with MemGPT.
### Part 2: Get MemGPT working
1. In your terminal where you're running MemGPT (depending on whether you are on macOS or Windows), run one of the following:
Before trying to integrate MemGPT with AutoGen, make sure that you can run MemGPT by itself with the web UI backend.
***(Running WebUI locally)***
Try setting up MemGPT with your local web UI backend [using the instructions here](https://memgpt.readthedocs.io/en/latest/local_llm/#using-memgpt-with-local-llms).
For macOS:
Once you've confirmed that you're able to chat with a MemGPT agent using `memgpt configure` and `memgpt run`, you're ready to move on to the next step.
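In other words, the sanity check before adding AutoGen into the mix is just the standard MemGPT flow (a minimal sketch; the exact `memgpt configure` prompts depend on your backend):

```sh
# point MemGPT at your web UI backend (e.g. http://localhost:5000) when prompted
memgpt configure

# then start a chat to verify the backend works end-to-end
memgpt run
```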
!!! warning "Using RunPod as an LLM backend"
If you're using RunPod to run web UI, make sure that you set your endpoint to the RunPod IP address, **not the default localhost address**.
For example, during `memgpt configure`: `? Enter default endpoint: https://yourpodaddresshere-5000.proxy.runpod.net`
### Part 3: Creating a MemGPT AutoGen agent (groupchat example)
Now we're going to integrate MemGPT and AutoGen by creating a special "MemGPT AutoGen agent" that wraps MemGPT in an AutoGen-style agent interface.
First, make sure you have AutoGen installed:
```sh
# the default port will be 5000
export OPENAI_API_BASE=http://127.0.0.1:5000
export BACKEND_TYPE=webui
pip install pyautogen
```
For Windows (while using PowerShell & running WebUI locally):
```sh
$env:OPENAI_API_BASE = "http://127.0.0.1:5000"
$env:BACKEND_TYPE = "webui"
```
***(Running WebUI on Runpod)***
For macOS:
```sh
export OPENAI_API_BASE=https://yourpodaddresshere-5000.proxy.runpod.net
export BACKEND_TYPE=webui
```
For Windows (while using PowerShell):
```sh
$env:OPENAI_API_BASE = "https://yourpodaddresshere-5000.proxy.runpod.net"
$env:BACKEND_TYPE = "webui"
```
### Important Notes
- When exporting/setting the environment variables, ensure that you do NOT include `/v1` as part of the address. MemGPT will automatically append `/v1` to the address.
- For non-MemGPT AutoGen agents: the [config](https://github.com/cpacker/MemGPT/blob/main/memgpt/autogen/examples/agent_groupchat.py#L38) should specify `/v1` in the address.
- Make sure you are using port 5000 (unless configured otherwise) when exporting the environment variables. MemGPT uses the non-OpenAI API, which WebUI serves on port 5000 by default.
- In the following steps, you will finish configuring AutoGen to work with MemGPT and open LLMs. The `config_list` used by the non-MemGPT AutoGen agents must keep `/v1` in its localhost address and use port 5001 (instead of port 5000); this is separate from the MemGPT `OPENAI_API_BASE` address you exported earlier.
WebUI exposes a lot of parameters that can dramatically change LLM outputs; to change these, you can modify the [WebUI settings file](https://github.com/cpacker/MemGPT/blob/main/memgpt/local_llm/webui/settings.py).
⁉️ If you have problems getting WebUI set up, please use the [official web UI repo for support](https://github.com/oobabooga/text-generation-webui)! There will be more answered questions about web UI there than here on the MemGPT repo.
### Example groupchat
Going back to the example we first mentioned, [examples/agent_groupchat.py](https://github.com/cpacker/MemGPT/blob/main/memgpt/autogen/examples/agent_groupchat.py) contains an example of a groupchat where one of the agents is powered by MemGPT.
In order to run this example on a local LLM, go to lines 32-55 in [examples/agent_groupchat.py](https://github.com/cpacker/MemGPT/blob/main/memgpt/autogen/examples/agent_groupchat.py) and fill in the config files with your local LLM's deployment details. For example, if you are using webui, it will look something like this:
In order to run this example on a local LLM, go to lines 46-66 in [examples/agent_groupchat.py](https://github.com/cpacker/MemGPT/blob/main/memgpt/autogen/examples/agent_groupchat.py) and fill in the config files with your local LLM's deployment details.
`config_list` is used by non-MemGPT AutoGen agents, which expect an OpenAI-compatible API. `config_list_memgpt` is used by MemGPT AutoGen agents, and requires additional settings specific to MemGPT (such as the `model_wrapper` and `context_window`).
For example, if you are using web UI, it will look something like this:
```python
# Non-MemGPT agents will still use local LLMs, but they will use the ChatCompletions endpoint
config_list = [
{
"model": "NULL", # not needed
"api_base": "http://127.0.0.1:5001/v1", # notice port 5001 for web UI
"api_key": "NULL", # not needed
"api_type": "open_ai",
},
]
# MemGPT-powered agents will also use local LLMs, but they need additional setup (also they use the Completions endpoint)
config_list_memgpt = [
{
"preset": DEFAULT_PRESET,
"model": None, # not required for web UI, only required for Ollama, see: https://memgpt.readthedocs.io/en/latest/ollama/
"model_wrapper": "airoboros-l2-70b-2.1", # airoboros is the default wrapper and should work for most models
"model_endpoint_type": "webui",
"model_endpoint": "http://localhost:5000", # notice port 5000 for web UI
"context_window": 8192, # the context window of your model (for Mistral 7B-based models, it's likely 8192)
},
]
```
```python
config_list = [
{
"model": "dolphin-2.1-mistral-7b", # this indicates the MODEL, not the WRAPPER (no concept of wrappers for AutoGen)
"api_base": "http://127.0.0.1:5001/v1",
"api_key": "NULL", # this is a placeholder
"api_type": "open_ai",
},
]
config_list_memgpt = [
{
"model": "airoboros-l2-70b-2.1", # this specifies the WRAPPER MemGPT will use, not the MODEL
},
]
```
If you are using LM Studio, then you'll need to change the `api_base` in `config_list`, and `model_endpoint_type` + `model_endpoint` in `config_list_memgpt`:
```python
# Non-MemGPT agents will still use local LLMs, but they will use the ChatCompletions endpoint
config_list = [
{
"model": "NULL",
"api_base": "http://127.0.0.1:1234/v1", # port 1234 for LM Studio
"api_key": "NULL",
"api_type": "open_ai",
},
]
# MemGPT-powered agents will also use local LLMs, but they need additional setup (also they use the Completions endpoint)
config_list_memgpt = [
{
"preset": DEFAULT_PRESET,
"model": None,
"model_wrapper": "airoboros-l2-70b-2.1",
"model_endpoint_type": "lmstudio",
"model_endpoint": "http://localhost:1234", # port 1234 for LM Studio
"context_window": 8192,
},
]
```
`config_list` is used by non-MemGPT agents, which expect an OpenAI-compatible API.
`config_list_memgpt` is used by MemGPT agents. Currently, MemGPT interfaces with the LLM backend by exporting `OPENAI_API_BASE` and `BACKEND_TYPE` as described above. Note that MemGPT does not use the OpenAI-compatible API (it uses the direct API).
If you are using the OpenAI API (e.g. using `gpt-4-turbo` via your own OpenAI API account), then the `config_list` for the AutoGen agent and `config_list_memgpt` for the MemGPT AutoGen agent will look different (a lot simpler):
```python
# This config is for autogen agents that are not powered by MemGPT
config_list = [
{
"model": "gpt-4-1106-preview", # gpt-4-turbo (https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo)
"api_key": os.getenv("OPENAI_API_KEY"),
}
]
If you're using WebUI and want to run the non-MemGPT agents with a local LLM instead of OpenAI, enable the [openai extension](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/openai) and point `config_list`'s `api_base` to the appropriate URL (usually port 5001).
Then, for MemGPT agents, export `OPENAI_API_BASE` and `BACKEND_TYPE` as described in [Local LLM support](../local_llm) (usually port 5000).
# This config is for autogen agents that are powered by MemGPT
config_list_memgpt = [
{
"model": "gpt-4-1106-preview", # gpt-4-turbo (https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo)
"preset": DEFAULT_PRESET,
"model": None,
"model_wrapper": None,
"model_endpoint_type": None,
"model_endpoint": None,
"context_window": 128000, # gpt-4-turbo
},
]
```
!!! warning "Making internal monologue visible to AutoGen"
By default, MemGPT's inner monologue and function traces are hidden from other AutoGen agents.
You can modify `interface_kwargs` to change the visibility of inner monologue and function calling:
```python
interface_kwargs = {
"debug": False, # this is the equivalent of the --debug flag in the MemGPT CLI
"show_inner_thoughts": True, # this controls if internal monlogue will show up in AutoGen MemGPT agent's outputs
"show_function_outputs": True, # this controls if function traces will show up in AutoGen MemGPT agent's outputs
}
```
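Putting the pieces together, the MemGPT-powered agent is then created from `config_list_memgpt` and `interface_kwargs`. A minimal sketch that mirrors `examples/agent_groupchat.py` (names follow that example):

```python
from memgpt.autogen.memgpt_agent import create_memgpt_autogen_agent_from_config

# bundle the MemGPT-specific config the same way AutoGen bundles its own llm_config
llm_config_memgpt = {"config_list": config_list_memgpt, "seed": 42}

# the returned agent exposes an AutoGen-style interface, so it can join a GroupChat like any other agent
coder = create_memgpt_autogen_agent_from_config(
    "MemGPT_coder",
    llm_config=llm_config_memgpt,
    system_message="I am a 10x engineer, trained in Python.",  # becomes the MemGPT persona
    interface_kwargs=interface_kwargs,
    default_auto_reply="...",  # non-empty auto-reply is required for LM Studio
)
```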
The only parts of the `agent_groupchat.py` file you need to modify should be the `config_list` and `config_list_memgpt` (make sure to change `USE_OPENAI` to `True` or `False` depending on whether you're using a local LLM server like web UI or OpenAI's API). Assuming you edited things correctly, you should now be able to run `agent_groupchat.py`:
```sh
python memgpt/autogen/examples/agent_groupchat.py
```
Your output should look something like this:
```text
User_proxy (to chat_manager):
I want to design an app to make me one million dollars in one month. Yes, your heard that right.
--------------------------------------------------------------------------------
Product_manager (to chat_manager):
Creating an app or software product that can generate one million dollars in one month is a highly ambitious goal. To achieve such a significant financial outcome quickly, your app idea needs to appeal to a broad audience, solve a significant problem, create immense value, and have a solid revenue model. Here are a few steps and considerations that might help guide you towards that goal:
1. **Identify a Niche Market or Trend:** Look for emerging trends or underserved niches that are gaining traction. This could involve addressing new consumer behaviors, leveraging new technologies, or entering a rapidly growing industry.
2. **Solve a Real Problem:** Focus on a problem that affects a large number of people or businesses and offer a unique, effective solution. The more painful the problem, the more willing customers will be to pay for a solution.
3. **Monetization Strategy:** Decide how you will make money from your app. Common strategies include paid downloads, in-app purchases, subscription models, advertising, or a freemium model with premium features.
4. **Viral Mechanism:** Design your app so that it encourages users to share it with others, either through inherent network effects (e.g., social media platforms) or through incentives (e.g., referral programs).
5. **Marketing Campaign:** Even the best app can't make money if people don't know about it. Plan a robust marketing campaign to launch your app, using social media, influencer partnerships, press releases, and advertising.
6. **Rapid Iteration and Scaling:** Be prepared to iterate rapidly based on user feedback and scale quickly to accommodate user growth. The faster you can improve and grow, the more likely it is you'll reach your revenue target.
7. **Partnerships and Alliances:** Partner with other companies or influencers who can market your product to their user base. This could provide a significant boost to your initial user acquisition.
8. **Compliance and Security:** Ensure that your app complies with all legal requirements and has high standards of privacy and security, especially if you are handling sensitive user data.
Here are a few app ideas that have the potential to be lucrative if well executed:
- **Health and Wellness Platform:** An app that uses AI to personalize workout and nutrition plans, with a community feature for motivation and support. Monetize through subscription and premium features.
- **FinTech Solution:** An investment or savings app that simplifies the process of cryptocurrency trading or micro-investment. Make money through transaction fees or subscription services.
- **Educational Platform:** Offer a unique learning experience with expert-created content for specific skills in high demand, such as coding, design, or digital marketing. Use a subscription model with tiered pricing.
- **AR/VR Experiences:** Develop an app that provides immersive experiences for entertainment, education, or practical purposes like interior design. Charge for the app itself or offer in-app purchases.
- **Marketplace or Gig Economy App:** Create a platform that matches freelancers or service providers with people who need their services. Revenue could come from taking a cut of the transactions.
Remember, achieving one million dollars in revenue in such a short time frame would require not only a highly appealing and innovative product but also flawless execution, significant marketing efforts, and perhaps a bit of luck. Be realistic about your goals and focus on building a sustainable business that provides real value over the long term.
--------------------------------------------------------------------------------
MemGPT_coder (to chat_manager):
Great goal! Generating a million dollars in one month with an app is ambitious, but definitely doable if you approach it the right way. Here are some tips and potential ideas that could help:
1. Identify a niche market or trend (for example, AI-powered fitness apps or FinTech solutions).
2. Solve a significant problem for many people (such as time management or financial literacy).
3. Choose an effective monetization strategy like subscriptions, in-app purchases, or advertising.
4. Make sure your app is visually appealing and easy to use to keep users engaged.
Some ideas that might work:
- AI-powered personal finance management app
- A virtual assistant app that helps people manage their daily tasks
- A social networking platform for job seekers or freelancers
Remember, success often comes from focusing on a specific problem and delivering a unique solution. Good luck!
--------------------------------------------------------------------------------
>>>>>>>> USING AUTO REPLY...
User_proxy (to chat_manager):
...
```
### Part 4: Attaching documents to MemGPT AutoGen agents
## Loading documents
[examples/agent_docs.py](https://github.com/cpacker/MemGPT/blob/main/memgpt/autogen/examples/agent_docs.py) contains an example of a groupchat where the MemGPT autogen agent has access to documents.
First, follow the instructions in [Example - chat with your data - Creating an external data source](../example_data/#creating-an-external-data-source):
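That section walks you through downloading the MemGPT research paper and loading it into an archival data source named `memgpt_research_paper`. Roughly, the load step looks like this (a sketch; the `--input-files` value is a hypothetical placeholder for wherever you saved the paper):

```sh
# hypothetical path: point --input-files at the PDF you downloaded in the data-loading tutorial
memgpt load directory --name memgpt_research_paper --input-files=memgpt_research_paper.pdf
```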
@ -114,36 +244,33 @@ memgpt load directory --name memgpt_research_paper --input-files=memgpt_research
loading data
done loading data
LLM is explicitly disabled. Using MockLLM.
LLM is explicitly disabled. Using MockLLM.
Parsing documents into nodes: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 392.09it/s]
Generating embeddings: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 65/65 [00:01<00:00, 37.34it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 65/65 [00:00<00:00, 388361.48it/s]
Saved local /home/user/.memgpt/archival/memgpt_research_paper/nodes.pkl
Parsing documents into nodes: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 321.56it/s]
Generating embeddings: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 65/65 [00:01<00:00, 43.22it/s]
100%|██████████████████████████████████████████████
```
Note: you can ignore the "_LLM is explicitly disabled_" message.
Now, you can run `agent_docs.py`, which asks `MemGPT_coder` what a virtual context is:
```sh
python memgpt/autogen/examples/agent_docs.py
```
python3 agent_docs.py
LLM is explicitly disabled. Using MockLLM.
LLM is explicitly disabled. Using MockLLM.
LLM is explicitly disabled. Using MockLLM.
Generating embeddings: 0it [00:00, ?it/s]
new size 60
Saved local /Users/vivian/.memgpt/agents/agent_25/persistence_manager/index/nodes.pkl
Attached data source memgpt_research_paper to agent agent_25, consisting of 60. Agent now has 60 embeddings in archival memory.
LLM is explicitly disabled. Using MockLLM.
```text
Ingesting 65 passages into MemGPT_agent
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.47s/it]
Attached data source memgpt_research_paper to agent MemGPT_agent, consisting of 65. Agent now has 2015 embeddings in archival memory.
User_proxy (to chat_manager):
Tell me what a virtual context in MemGPT is. Search your archival memory.
Tell me what virtual context in MemGPT is. Search your archival memory.
--------------------------------------------------------------------------------
GroupChat is underpopulated with 2 agents. Direct communication would be more efficient.
MemGPT_coder (to chat_manager):
MemGPT_agent (to chat_manager):
Virtual context management is a technique used in large language models like MemGPT. It's used to handle context beyond limited context windows, which is crucial for tasks such as extended conversations and document analysis. The technique was inspired by hierarchical memory systems in traditional operating systems that provide the appearance of large memory resources through data movement between fast and slow memory. This system intelligently manages different memory tiers to effectively provide extended context within the model's limited context window.
[inner thoughts] The user asked about virtual context in MemGPT. Let's search the archival memory with this query.
[inner thoughts] Virtual context management is a technique used in large language models like MemGPT. It's used to handle context beyond limited context windows, which is crucial for tasks such as extended conversations and document analysis. The technique was inspired by hierarchical memory systems in traditional operating systems that provide the appearance of large memory resources through data movement between fast and slow memory. This system intelligently manages different memory tiers to effectively provide extended context within the model's limited context window.
--------------------------------------------------------------------------------
...


@ -2,11 +2,13 @@
!!! warning "Important LM Studio settings"
Make sure that "context length" is set (inside LM Studio's "Model Configuration" panel) to the max context length of the model you're using (e.g. 8000 for Mistral 7B variants).
**Context length**: Make sure that "context length" (`n_ctx`) is set (in "Model initialization" on the right hand side "Server Model Settings" panel) to the max context length of the model you're using (e.g. 8000 for Mistral 7B variants).
If you see "Prompt Formatting" (inside LM Studio's "Server Options" panel), turn it **OFF**. Leaving it **ON** will break MemGPT.
**Automatic Prompt Formatting = OFF**: If you see "Automatic Prompt Formatting" inside LM Studio's "Server Options" panel (on the left side), turn it **OFF**. Leaving it **ON** will break MemGPT.
![image](https://github.com/cpacker/MemGPT/assets/5475622/74fd5e4d-a549-482d-b9f5-44b1829f41a8)
**Context Overflow Policy = Stop at limit**: If you see "Context Overflow Policy" inside LM Studio's "Tools" panel on the right side (below "Server Model Settings"), set it to **Stop at limit**. The default setting "Keep the system prompt ... truncate middle" will break MemGPT.
<img width="911" alt="image" src="https://github.com/cpacker/MemGPT/assets/5475622/d499e82e-348c-4468-9ea6-fd15a13eb7fa">
1. Download [LM Studio](https://lmstudio.ai/) and the model you want to test with
2. Go to the "local inference server" tab, load the model and configure your settings (make sure to set the context length to something reasonable like 8k!)


@ -1,11 +1,21 @@
## Using MemGPT with local LLMs
!!! warning "Need help?"
If you need help visit our [Discord server](https://discord.gg/9GEQrxmVyE) and post in the #support channel.
You can also check the [GitHub discussion page](https://github.com/cpacker/MemGPT/discussions/67), but the Discord server is the official support channel and is monitored more actively.
!!! warning "MemGPT + local LLM failure cases"
When using open LLMs with MemGPT, **the main failure case will be your LLM outputting a string that cannot be understood by MemGPT**. MemGPT uses function calling to manage memory (e.g. `edit_core_memory(...)`) and to interact with the user (`send_message(...)`), so your LLM needs to generate outputs that can be parsed into MemGPT function calls.
Make sure to check the [local LLM troubleshooting page](../local_llm_faq) to see common issues before raising a new issue or posting on Discord.
!!! warning "Recommended LLMs / models"
To see a list of recommended LLMs to use with MemGPT, visit our [Discord server](https://discord.gg/9GEQrxmVyE) and check the #model-chat channel.
### Installing dependencies
To install dependencies required for running local models, run:
```


@ -1,14 +1,12 @@
### MemGPT + web UI
!!! warning "Important web UI settings"
!!! warning "web UI troubleshooting"
If you have problems getting web UI set up, please use the [official web UI repo for support](https://github.com/oobabooga/text-generation-webui)! There will be more answered questions about web UI there vs here on the MemGPT repo.
Do **NOT** enable any extensions in web UI, including the [openai extension](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/openai)! Just run web UI as-is, unless you are running [MemGPT+Autogen](https://github.com/cpacker/MemGPT/tree/main/memgpt/autogen) with non-MemGPT agents.
To get MemGPT to work with a local LLM, you need to have the LLM running on a server that takes API requests.
For the purposes of this example, we're going to serve (host) the LLMs using [oobabooga web UI](https://github.com/oobabooga/text-generation-webui#starting-the-web-ui), but if you want to use something else you can! This also assumes you're running web UI locally - if you're running on e.g. Runpod, you'll want to follow Runpod-specific instructions (for example use [TheBloke's one-click UI and API](https://github.com/TheBlokeAI/dockerLLM/blob/main/README_Runpod_LocalLLMsUIandAPI.md))
In this example we'll set up [oobabooga web UI](https://github.com/oobabooga/text-generation-webui#starting-the-web-ui) locally - if you're running on a remote service like Runpod, you'll want to follow Runpod-specific instructions for installing web UI and determining your endpoint IP address (for example use [TheBloke's one-click UI and API](https://github.com/TheBlokeAI/dockerLLM/blob/main/README_Runpod_LocalLLMsUIandAPI.md)).
1. Install oobabooga web UI using the instructions [here](https://github.com/oobabooga/text-generation-webui#starting-the-web-ui)
2. Once installed, launch the web server with `python server.py`


@ -1,28 +1,3 @@
# MemGPT + Autogen examples
[examples/agent_groupchat.py](examples/agent_groupchat.py) contains an example of a groupchat where one of the agents is powered by MemGPT.
# MemGPT + Autogen integration
**Local LLM support**
In order to run MemGPT+Autogen on a local LLM, go to lines 32-55 in [examples/agent_groupchat.py](examples/agent_groupchat.py) and fill in the config files with your local LLM's deployment details. For example, if you are using webui, it will look something like this:
```
config_list = [
{
"model": "dolphin-2.1-mistral-7b", # this indicates the MODEL, not the WRAPPER (no concept of wrappers for AutoGen)
"api_base": "http://127.0.0.1:5001/v1"
"api_key": "NULL", # this is a placeholder
"api_type": "open_ai",
},
]
config_list_memgpt = [
{
"model": "airoboros-l2-70b-2.1", # this specifies the WRAPPER MemGPT will use, not the MODEL
},
]
```
`config_list` is used by non-MemGPT agents, which expect an OpenAI-compatible API.
`config_list_memgpt` is used by MemGPT agents. Currently, MemGPT interfaces with the LLM backend by exporting `OPENAI_API_BASE` and `BACKEND_TYPE` as described in [Local LLM support](../local_llm). Note that MemGPT does not use the OpenAI-compatible API (it uses the direct API).
If you're using WebUI and want to run the non-MemGPT agents with a local LLM instead of OpenAI, enable the [openai extension](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/openai) and point `config_list`'s `api_base` to the appropriate URL (usually port 5001).
Then, for MemGPT agents, export `OPENAI_API_BASE` and `BACKEND_TYPE` as described in [Local LLM support](../local_llm) (usually port 5000).
See [https://memgpt.readthedocs.io/en/latest/autogen](https://memgpt.readthedocs.io/en/latest/autogen/) for documentation on integrating MemGPT with AutoGen.


@ -14,29 +14,32 @@ Begin by doing:
import os
import autogen
from memgpt.autogen.memgpt_agent import create_autogen_memgpt_agent, create_memgpt_autogen_agent_from_config
from memgpt.autogen.memgpt_agent import create_memgpt_autogen_agent_from_config
from memgpt.constants import LLM_MAX_TOKENS
# USE_OPENAI = True
USE_OPENAI = False
USE_OPENAI = True
# USE_OPENAI = False
if USE_OPENAI:
# This config is for autogen agents that are not powered by MemGPT
# For demo purposes let's use gpt-4
model = "gpt-4"
# This config is for AutoGen agents that are not powered by MemGPT
config_list = [
{
"model": "gpt-4-1106-preview", # gpt-4-turbo (https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo)
"model": model,
"api_key": os.getenv("OPENAI_API_KEY"),
}
]
# This config is for autogen agents that are powered by MemGPT
# This config is for AutoGen agents that are powered by MemGPT
config_list_memgpt = [
{
"model": "gpt-4-1106-preview", # gpt-4-turbo (https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo)
"model": model,
"preset": "memgpt_docs",
"model": None,
"model_wrapper": None,
"model_endpoint_type": None,
"model_endpoint": None,
"context_window": 128000, # gpt-4-turbo
"model_endpoint_type": "openai",
"model_endpoint": "https://api.openai.com/v1",
"context_window": LLM_MAX_TOKENS[model],
},
]
@ -71,8 +74,8 @@ DEBUG = False
interface_kwargs = {
"debug": DEBUG,
"show_inner_thoughts": DEBUG,
"show_function_outputs": DEBUG,
"show_inner_thoughts": True,
"show_function_outputs": False,
}
llm_config = {"config_list": config_list, "seed": 42}
@ -92,9 +95,7 @@ user_proxy = autogen.UserProxyAgent(
memgpt_agent = create_memgpt_autogen_agent_from_config(
"MemGPT_agent",
llm_config=llm_config_memgpt,
system_message=f"I am a 10x engineer, trained in Python. I was the first engineer at Uber "
f"(which I make sure to tell everyone I work with).\n"
f"You are participating in a group chat with a user ({user_proxy.name}).",
system_message=f"You are an AI research assistant.\n" f"You are participating in a group chat with a user ({user_proxy.name}).",
interface_kwargs=interface_kwargs,
default_auto_reply="...", # Set a default auto-reply message here (non-empty auto-reply is required for LM Studio)
)
@ -108,5 +109,5 @@ manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)
# Begin the group chat with a message from the user
user_proxy.initiate_chat(
manager,
message="Tell me what a virtual context in MemGPT is. Search your archival memory.",
message="Tell me what virtual context in MemGPT is. Search your archival memory.",
)


@ -12,30 +12,33 @@ Begin by doing:
import os
import autogen
from memgpt.autogen.memgpt_agent import create_autogen_memgpt_agent, create_memgpt_autogen_agent_from_config
from memgpt.autogen.memgpt_agent import create_memgpt_autogen_agent_from_config
from memgpt.presets.presets import DEFAULT_PRESET
from memgpt.constants import LLM_MAX_TOKENS
USE_OPENAI = True
# USE_OPENAI = False
if USE_OPENAI:
# This config is for autogen agents that are not powered by MemGPT
# For demo purposes let's use gpt-4
model = "gpt-4"
# This config is for AutoGen agents that are not powered by MemGPT
config_list = [
{
"model": "gpt-4-1106-preview", # gpt-4-turbo (https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo)
"model": model,
"api_key": os.getenv("OPENAI_API_KEY"),
}
]
# This config is for autogen agents that are powered by MemGPT
# This config is for AutoGen agents that are powered by MemGPT
config_list_memgpt = [
{
"model": "gpt-4-1106-preview", # gpt-4-turbo (https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo)
"model": model,
"preset": DEFAULT_PRESET,
"model": None,
"model_wrapper": None,
"model_endpoint_type": None,
"model_endpoint": None,
"context_window": 128000, # gpt-4-turbo
"model_endpoint_type": "openai",
"model_endpoint": "https://api.openai.com/v1",
"context_window": LLM_MAX_TOKENS[model],
},
]


@ -1,148 +1,200 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "591be0c0-7332-4c57-adcf-fecc578eeb67",
"metadata": {},
"source": [
"<a target=\"_blank\" href=\"https://colab.research.google.com/github/cpacker/MemGPT/blob/main/memgpt/autogen/examples/memgpt_coder_autogen.ipynb\">\n",
" <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
"</a>"
]
"cells": [
{
"cell_type": "markdown",
"id": "591be0c0-7332-4c57-adcf-fecc578eeb67",
"metadata": {
"id": "591be0c0-7332-4c57-adcf-fecc578eeb67"
},
"source": [
"<a target=\"_blank\" href=\"https://colab.research.google.com/github/cpacker/MemGPT/blob/main/memgpt/autogen/examples/memgpt_coder_autogen.ipynb\">\n",
" <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
"</a>"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "43d71a67-3a01-4543-99ad-7dce12d793da",
"metadata": {
"id": "43d71a67-3a01-4543-99ad-7dce12d793da"
},
"outputs": [],
"source": [
"%pip install pyautogen"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b3754942-819b-4df9-be3f-6cfb3ca101dc",
"metadata": {
"id": "b3754942-819b-4df9-be3f-6cfb3ca101dc"
},
"outputs": [],
"source": [
"%pip install pymemgpt"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bd6df0ac-66a6-4dc7-9262-4c2ad05fab91",
"metadata": {
"id": "bd6df0ac-66a6-4dc7-9262-4c2ad05fab91"
},
"outputs": [],
"source": [
"# You can get an OpenAI API key at https://platform.openai.com\n",
"OPENAI_API_KEY = \"YOUR_API_KEY\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0cb9b18c-3662-4206-9ff5-de51a3aafb36",
"metadata": {
"id": "0cb9b18c-3662-4206-9ff5-de51a3aafb36"
},
"outputs": [],
"source": [
"\"\"\"Example of how to add MemGPT into an AutoGen groupchat\n",
"\n",
"Based on the official AutoGen example here: https://github.com/microsoft/autogen/blob/main/notebook/agentchat_groupchat.ipynb\n",
"\"\"\"\n",
"\n",
"import autogen\n",
"from memgpt.autogen.memgpt_agent import create_memgpt_autogen_agent_from_config\n",
"\n",
"\n",
"# This config is for AutoGen agents that are not powered by MemGPT\n",
"config_list = [\n",
" {\n",
" \"model\": \"gpt-4\",\n",
" \"api_key\": OPENAI_API_KEY,\n",
" },\n",
"]\n",
"llm_config = {\"config_list\": config_list, \"seed\": 42}\n",
"\n",
"\n",
"# This config is for AutoGen agents that powered by MemGPT\n",
"config_list_memgpt = [\n",
" {\n",
" \"model\": \"gpt-4\",\n",
" \"preset\": \"memgpt_chat\",\n",
" \"model_wrapper\": None,\n",
" \"model_endpoint_type\": \"openai\",\n",
" \"model_endpoint\": \"https://api.openai.com/v1\",\n",
" \"context_window\": 8192, # gpt-4 context window\n",
" },\n",
"]\n",
"llm_config_memgpt = {\"config_list\": config_list_memgpt, \"seed\": 42}"
]
},
{
"cell_type": "code",
"source": [
"# The user agent\n",
"user_proxy = autogen.UserProxyAgent(\n",
" name=\"User_proxy\",\n",
" system_message=\"A human admin.\",\n",
" code_execution_config={\"last_n_messages\": 2, \"work_dir\": \"groupchat\"},\n",
" human_input_mode=\"TERMINATE\", # needed?\n",
" default_auto_reply=\"...\", # Set a default auto-reply message here\n",
")\n",
"\n",
"# The agent playing the role of the product manager (PM)\n",
"# Let's make this a non-MemGPT agent\n",
"pm = autogen.AssistantAgent(\n",
" name=\"Product_manager\",\n",
" system_message=\"Creative in software product ideas.\",\n",
" llm_config=llm_config,\n",
" default_auto_reply=\"...\", # Set a default auto-reply message here\n",
")\n",
"\n",
"# If USE_MEMGPT is False, then this example will be the same as the official AutoGen repo (https://github.com/microsoft/autogen/blob/main/notebook/agentchat_groupchat.ipynb)\n",
"# If USE_MEMGPT is True, then we swap out the \"coder\" agent with a MemGPT agent\n",
"USE_MEMGPT = True\n",
"\n",
"if not USE_MEMGPT:\n",
" # In the AutoGen example, we create an AssistantAgent to play the role of the coder\n",
" coder = autogen.AssistantAgent(\n",
" name=\"Coder\",\n",
" llm_config=llm_config,\n",
" )\n",
"\n",
"else:\n",
" # In our example, we swap this AutoGen agent with a MemGPT agent\n",
" # This MemGPT agent will have all the benefits of MemGPT, ie persistent memory, etc.\n",
"\n",
" # We can use interface_kwargs to control what MemGPT outputs are seen by the groupchat\n",
" interface_kwargs = {\n",
" \"debug\": False,\n",
" \"show_inner_thoughts\": True,\n",
" \"show_function_outputs\": False,\n",
" }\n",
"\n",
" coder = create_memgpt_autogen_agent_from_config(\n",
" \"MemGPT_coder\",\n",
" llm_config=llm_config_memgpt,\n",
" system_message=f\"I am a 10x engineer, trained in Python. I was the first engineer at Uber \"\n",
" f\"(which I make sure to tell everyone I work with).\\n\"\n",
" f\"You are participating in a group chat with a user ({user_proxy.name}) \"\n",
" f\"and a product manager ({pm.name}).\",\n",
" interface_kwargs=interface_kwargs,\n",
" default_auto_reply=\"...\", # Set a default auto-reply message here (non-empty auto-reply is required for LM Studio)\n",
" )"
],
"metadata": {
"id": "flVCXXKirZ-c"
},
"id": "flVCXXKirZ-c",
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Initialize the group chat between the user and two LLM agents (PM and coder)\n",
"groupchat = autogen.GroupChat(agents=[user_proxy, pm, coder], messages=[], max_round=12)\n",
"manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)\n",
"\n",
"# Begin the group chat with a message from the user\n",
"user_proxy.initiate_chat(\n",
" manager,\n",
" message=\"I want to design an app to make me one million dollars in one month. Yes, your heard that right.\",\n",
")"
],
"metadata": {
"id": "GvLSBuEhreO1"
},
"id": "GvLSBuEhreO1",
"execution_count": null,
"outputs": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
},
"colab": {
"provenance": []
}
},
{
"cell_type": "code",
"execution_count": null,
"id": "43d71a67-3a01-4543-99ad-7dce12d793da",
"metadata": {},
"outputs": [],
"source": [
"%pip install pyautogen"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b3754942-819b-4df9-be3f-6cfb3ca101dc",
"metadata": {},
"outputs": [],
"source": [
"%pip install pymemgpt"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bd6df0ac-66a6-4dc7-9262-4c2ad05fab91",
"metadata": {},
"outputs": [],
"source": [
"import openai\n",
"\n",
"openai.api_key = \"YOUR_API_KEY\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0cb9b18c-3662-4206-9ff5-de51a3aafb36",
"metadata": {},
"outputs": [],
"source": [
"\"\"\"Example of how to add MemGPT into an AutoGen groupchat\n",
"\n",
"Based on the official AutoGen example here: https://github.com/microsoft/autogen/blob/main/notebook/agentchat_groupchat.ipynb\n",
"\n",
"Begin by doing:\n",
" pip install \"pyautogen[teachable]\"\n",
" pip install pymemgpt\n",
" or\n",
" pip install -e . (inside the MemGPT home directory)\n",
"\"\"\"\n",
"\n",
"import os\n",
"import autogen\n",
"from memgpt.autogen.memgpt_agent import create_autogen_memgpt_agent\n",
"\n",
"config_list = [\n",
" {\n",
" \"model\": \"gpt-4\",\n",
" \"api_key\": os.getenv(\"OPENAI_API_KEY\"),\n",
" },\n",
"]\n",
"\n",
"# If USE_MEMGPT is False, then this example will be the same as the official AutoGen repo (https://github.com/microsoft/autogen/blob/main/notebook/agentchat_groupchat.ipynb)\n",
"# If USE_MEMGPT is True, then we swap out the \"coder\" agent with a MemGPT agent\n",
"USE_MEMGPT = True\n",
"# If DEBUG is False, a lot of MemGPT's inner workings output is suppressed and only the final send_message is displayed.\n",
"# If DEBUG is True, then all of MemGPT's inner workings (function calls, etc.) will be output.\n",
"DEBUG = False\n",
"\n",
"llm_config = {\"config_list\": config_list, \"seed\": 42}\n",
"\n",
"# The user agent\n",
"user_proxy = autogen.UserProxyAgent(\n",
" name=\"User_proxy\",\n",
" system_message=\"A human admin.\",\n",
" code_execution_config={\"last_n_messages\": 2, \"work_dir\": \"groupchat\"},\n",
" human_input_mode=\"TERMINATE\", # needed?\n",
")\n",
"\n",
"# The agent playing the role of the product manager (PM)\n",
"pm = autogen.AssistantAgent(\n",
" name=\"Product_manager\",\n",
" system_message=\"Creative in software product ideas.\",\n",
" llm_config=llm_config,\n",
")\n",
"\n",
"if not USE_MEMGPT:\n",
" # In the AutoGen example, we create an AssistantAgent to play the role of the coder\n",
" coder = autogen.AssistantAgent(\n",
" name=\"Coder\",\n",
" llm_config=llm_config,\n",
" )\n",
"\n",
"else:\n",
" # In our example, we swap this AutoGen agent with a MemGPT agent\n",
" # This MemGPT agent will have all the benefits of MemGPT, ie persistent memory, etc.\n",
" coder = create_autogen_memgpt_agent(\n",
" \"MemGPT_coder\",\n",
" persona_description=\"I am a 10x engineer, trained in Python. I was the first engineer at Uber (which I make sure to tell everyone I work with).\",\n",
" user_description=f\"You are participating in a group chat with a user ({user_proxy.name}) and a product manager ({pm.name}).\",\n",
" interface_kwargs={\"debug\": DEBUG},\n",
" )\n",
"\n",
"# Initialize the group chat between the user and two LLM agents (PM and coder)\n",
"groupchat = autogen.GroupChat(agents=[user_proxy, pm, coder], messages=[], max_round=12)\n",
"manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)\n",
"\n",
"# Begin the group chat with a message from the user\n",
"user_proxy.initiate_chat(\n",
" manager,\n",
" message=\"I want to design an app to make me one million dollars in one month. Yes, your heard that right.\",\n",
")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
"nbformat": 4,
"nbformat_minor": 5
}


@ -7,9 +7,8 @@ from memgpt.autogen.interface import AutoGenInterface
from memgpt.persistence_manager import LocalStateManager
import memgpt.system as system
import memgpt.constants as constants
import memgpt.utils as utils
import memgpt.presets.presets as presets
from memgpt.personas import personas
from memgpt.humans import humans
from memgpt.config import AgentConfig
from memgpt.cli.cli import attach
from memgpt.cli.cli_load import load_directory, load_webpage, load_index, load_database, load_vector_database
@ -40,7 +39,7 @@ def create_memgpt_autogen_agent_from_config(
interface_kwargs = {}
# The "system message" in AutoGen becomes the persona in MemGPT
persona_desc = personas.DEFAULT if system_message == "" else system_message
persona_desc = utils.get_persona_text(constants.DEFAULT_PERSONA) if system_message == "" else system_message
# The user profile is based on the input mode
if human_input_mode == "ALWAYS":
user_desc = ""


@ -405,7 +405,7 @@ def add(
elif option == "human":
directory = os.path.join(MEMGPT_DIR, "humans")
else:
raise ValueError(f"Unknown kind {kind}")
raise ValueError(f"Unknown kind {option}")
if filename:
assert text is None, f"Cannot provide both filename and text"


@ -1,226 +1,3 @@
⁉️ Need help configuring local LLMs with MemGPT? Ask for help on [our Discord](https://discord.gg/9GEQrxmVyE) or [post on the GitHub discussion](https://github.com/cpacker/MemGPT/discussions/67).
# MemGPT + local LLMs
If you have a hosted ChatCompletion-compatible endpoint that works with function calling, you can simply set `OPENAI_API_BASE` (`export OPENAI_API_BASE=...`) to the IP+port of your endpoint. **As of 10/22/2023, most ChatCompletion endpoints do *NOT* support function calls, so if you want to play with MemGPT and open models, you probably need to follow the instructions below.**
---
# ⚡ Quick overview
1. Put your own LLM behind a web server API (e.g. [oobabooga web UI](https://github.com/oobabooga/text-generation-webui#starting-the-web-ui))
2. Set `OPENAI_API_BASE=YOUR_API_IP_ADDRESS` and `BACKEND_TYPE=webui`
3. Run MemGPT with `python3 main.py --no_verify`, it should now use your LLM instead of OpenAI GPT
4. If things aren't working, read the full instructions below
When using open LLMs with MemGPT, **the main failure case will be your LLM outputting a string that cannot be understood by MemGPT**. MemGPT uses function calling to manage memory (e.g. `edit_core_memory(...)`) and to interact with the user (`send_message(...)`), so your LLM needs to generate outputs that can be parsed into MemGPT function calls.
---
# How to connect MemGPT to non-OpenAI LLMs
<details>
<summary><h2>🖥️ Serving your LLM from a web server (WebUI example)</strong></h2></summary>
⁉️ Do **NOT** enable any extensions in web UI, including the [openai extension](https://github.com/oobabooga/text-generation-webui/tree/main/extensions/openai)! Just run web UI as-is, unless you are running [MemGPT+Autogen](https://github.com/cpacker/MemGPT/tree/main/memgpt/autogen) with non-MemGPT agents.
To get MemGPT to work with a local LLM, you need to have the LLM running on a server that takes API requests.
For the purposes of this example, we're going to serve (host) the LLMs using [oobabooga web UI](https://github.com/oobabooga/text-generation-webui#starting-the-web-ui), but if you want to use something else you can! This also assumes you're running web UI locally - if you're running on e.g. Runpod, you'll want to follow Runpod-specific instructions (for example use [TheBloke's one-click UI and API](https://github.com/TheBlokeAI/dockerLLM/blob/main/README_Runpod_LocalLLMsUIandAPI.md))
1. Install oobabooga web UI using the instructions [here](https://github.com/oobabooga/text-generation-webui#starting-the-web-ui)
2. Once installed, launch the web server with `python server.py`
3. Navigate to the web app (if local, this is probably [`http://127.0.0.1:7860`](http://localhost:7860)), select the model you want to use, adjust your GPU and CPU memory settings, and click "load"
4. If the model was loaded successfully, you should be able to access it via the API (if local, this is probably on port `5000`)
5. Assuming steps 1-4 went correctly, the LLM is now properly hosted on a port you can point MemGPT to!
In your terminal where you're running MemGPT, run:
```sh
# if you are running web UI locally, the default port will be 5000
export OPENAI_API_BASE=http://127.0.0.1:5000
export BACKEND_TYPE=webui
```
WebUI exposes a lot of parameters that can dramatically change LLM outputs, to change these you can modify the [WebUI settings file](/memgpt/local_llm/webui/settings.py).
⁉️ If you have problems getting WebUI setup, please use the [official web UI repo for support](https://github.com/oobabooga/text-generation-webui)! There will be more answered questions about web UI there vs here on the MemGPT repo.
</details>
<details>
<summary><h2>🖥️ Serving your LLM from a web server (LM Studio example)</strong></h2></summary>
![image](https://github.com/cpacker/MemGPT/assets/5475622/abc8ce2d-4130-4c51-8169-83e682db625d)
1. Download [LM Studio](https://lmstudio.ai/) and the model you want to test with
2. Go to the "local inference server" tab, load the model and configure your settings (make sure to set the context length to something reasonable like 8k!)
3. Click "Start server"
4. Copy the IP address + port that your server is running on (in the example screenshot, the address is `http://localhost:1234`)
In your terminal where you're running MemGPT, run:
```sh
# if you used a different port in LM Studio, change 1234 to the actual port
export OPENAI_API_BASE=http://localhost:1234
export BACKEND_TYPE=lmstudio
```
</details>
<details>
<summary><h2>🦙 Running MemGPT with your own LLM</strong></h2></summary>
Once you have an LLM web server set up, all you need to do to connect it to MemGPT is set two environment variables:
- `OPENAI_API_BASE`
- set this to the IP address of your LLM API - for example, if you're using web UI on a local machine, this will look like `http://127.0.0.1:5000`
- `BACKEND_TYPE`
- set this to `webui` or `lmstudio`
- this controls how MemGPT packages the HTTP request to the webserver, see [this code](https://github.com/cpacker/MemGPT/blob/main/memgpt/local_llm/webui/api.py)
- currently this is set up to work with web UI, but it might work with other backends / web servers too!
- if you'd like to use a different web server and you need a different style of HTTP request, let us know on the discussion page (https://github.com/cpacker/MemGPT/discussions/67) and we'll try to add it ASAP
You can change the prompt format and output parser used with the `--model` flag. For example:
```sh
# this will cause MemGPT to use the airoboros-l2-70b-2.1 parsers, regardless of what model you're hosting on your web server
# you can mix and match parsers + models!
$ python3 main.py --model airoboros-l2-70b-2.1
```
### Example with airoboros 70b
```sh
# assuming we're running a model (eg airoboros) behind a textgen webui server
export OPENAI_API_BASE=127.0.0.1:5000 # change this to your actual API address
export BACKEND_TYPE=webui # if you don't set this, MemGPT will throw an error
# using --no_verify can be helpful if the LLM you're using doesn't output inner monologue properly
$ python3 main.py --no_verify
Running... [exit by typing '/exit']
💭 Bootup sequence complete. Persona activated. Testing messaging functionality.
💭 None
🤖 Welcome! My name is Sam. How can I assist you today?
Enter your message: My name is Brad, not Chad...
💭 None
⚡🧠 [function] updating memory with core_memory_replace:
First name: Chad
→ First name: Brad
```
</details>
<details>
<summary><h2>🙋 Adding support for new LLMs + improving performance</strong></h2></summary>
⁉️ When using open LLMs with MemGPT, **the main failure case will be your LLM outputting a string that cannot be understood by MemGPT**. MemGPT uses function calling to manage memory (e.g. `edit_core_memory(...)`) and interact with the user (`send_message(...)`), so your LLM needs to generate outputs that can be parsed into MemGPT function calls.
### What is a "wrapper"?
To support function calling with open LLMs for MemGPT, we utilize "wrapper" code that:
1. turns `system` (the MemGPT instructions), `messages` (the MemGPT conversation window), and `functions` (the MemGPT function set) parameters from ChatCompletion into a single unified prompt string for your LLM
2. turns the output string generated by your LLM back into a MemGPT function call
Different LLMs are trained using different prompt formats (eg `#USER:` vs `<im_start>user` vs ...), and LLMs that are trained on function calling are often trained using different function call formats, so if you're getting poor performance, try experimenting with different prompt formats! We recommend starting with the prompt format (and function calling format) recommended in the HuggingFace model card, and experimenting from there.
We currently only support a few prompt formats in this repo ([located here](https://github.com/cpacker/MemGPT/tree/main/memgpt/local_llm/llm_chat_completion_wrappers))! If you write a new parser, please open a PR and we'll merge it in.
<details>
<summary><h3>Adding a new wrapper (change the prompt format + function parser)</strong></h3></summary>
To make a new wrapper (for example, because you want to try a different prompt format), you just need to subclass `LLMChatCompletionWrapper`. Your new wrapper class needs to implement two functions:
- One to go from ChatCompletion messages/functions schema to a prompt string
- And one to go from raw LLM outputs to a ChatCompletion response
```python
class LLMChatCompletionWrapper(ABC):
@abstractmethod
def chat_completion_to_prompt(self, messages, functions):
"""Go from ChatCompletion to a single prompt string"""
pass
@abstractmethod
def output_to_chat_completion_response(self, raw_llm_output):
"""Turn the LLM output string into a ChatCompletion response"""
pass
```
You can follow our example wrappers ([located here](https://github.com/cpacker/MemGPT/tree/main/memgpt/local_llm/llm_chat_completion_wrappers)).
</details>
<details>
<summary><h3>Example wrapper for Airoboros</strong></h3></summary>
## Example with [Airoboros](https://huggingface.co/jondurbin/airoboros-l2-70b-2.1) (llama2 finetune)
To help you get started, we've implemented an example wrapper class for a popular llama2 model **finetuned on function calling** (Airoboros). We want MemGPT to run well on open models as much as you do, so we'll be actively updating this page with more examples. Additionally, we welcome contributions from the community! If you find an open LLM that works well with MemGPT, please open a PR with a model wrapper and we'll merge it ASAP.
```python
class Airoboros21Wrapper(LLMChatCompletionWrapper):
"""Wrapper for Airoboros 70b v2.1: https://huggingface.co/jondurbin/airoboros-l2-70b-2.1"""
def chat_completion_to_prompt(self, messages, functions):
"""
Examples for how airoboros expects its prompt inputs: https://huggingface.co/jondurbin/airoboros-l2-70b-2.1#prompt-format
Examples for how airoboros expects to see function schemas: https://huggingface.co/jondurbin/airoboros-l2-70b-2.1#agentfunction-calling
"""
def output_to_chat_completion_response(self, raw_llm_output):
"""Turn raw LLM output into a ChatCompletion style response with:
"message" = {
"role": "assistant",
"content": ...,
"function_call": {
"name": ...
"arguments": {
"arg1": val1,
...
}
}
}
"""
```
See full file [here](llm_chat_completion_wrappers/airoboros.py).
</details>
</details>
---
## FAQ
<details>
<summary><h3>Status of ChatCompletion w/ function calling and open LLMs</h3></summary>
MemGPT uses function calling to do memory management. With [OpenAI's ChatCompletion API](https://platform.openai.com/docs/api-reference/chat/), you can pass in a function schema via the `functions` keyword arg, and the API response will include a `function_call` field with the function name and the function arguments (generated JSON). Under the hood, your `functions` keyword arg is combined with the `messages` and `system` to form one big string input to the transformer, and the output of the transformer is parsed to extract the JSON function call.
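For reference, here's roughly what that looks like with the (pre-v1) `openai` Python client; the `send_message` schema below is just a toy example, not MemGPT's real function set:

```python
import json

import openai  # pre-v1 client (`openai<1`); assumes OPENAI_API_KEY is set in your environment

functions = [
    {
        "name": "send_message",  # toy schema for illustration
        "description": "Send a message to the user",
        "parameters": {
            "type": "object",
            "properties": {"message": {"type": "string"}},
            "required": ["message"],
        },
    }
]

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hi to the user."},
    ],
    functions=functions,
)

message = response["choices"][0]["message"]
if "function_call" in message:
    # `arguments` comes back as generated JSON (a string), so parse it
    args = json.loads(message["function_call"]["arguments"])
    print(message["function_call"]["name"], args)
```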
In the future, more open LLMs and LLM servers (that can host OpenAI-compatible ChatCompletion endpoints) may start including parsing code to do this automatically as standard practice. In the meantime, however, when you see a model that advertises “function calling” support, like Airoboros, it doesn't mean you can just load Airoboros into a ChatCompletion-compatible endpoint like web UI, make the same OpenAI API call, and have it just work.
1. When a model page says it supports function calling, it usually means the model was finetuned on some function call data (not that you can use ChatCompletion with `functions` out-of-the-box). Remember, LLMs are just string-in-string-out, so there are many ways to format the function call data. E.g. Airoboros formats the function schema in YAML style (see https://huggingface.co/jondurbin/airoboros-l2-70b-3.1.2#agentfunction-calling) and the output is in JSON style. To get this to work behind a ChatCompletion API, you still have to do the parsing from the `functions` keyword arg (containing the schema) to the model's expected schema style in the prompt (YAML for Airoboros), and you have to run some code to extract the function call (JSON for Airoboros) and package it cleanly as a `function_call` field in the response.
2. Partly because of how complex it is to support function calling, most (all?) of the community projects that serve OpenAI ChatCompletion endpoints for arbitrary open LLMs do not support function calling, because if they did, they would need to write model-specific parsing code for each one.
</details>
<details>
<summary><h3>What is all this extra code for?</h3></summary>
Because of the poor state of function calling support in existing ChatCompletion API serving code, we instead provide a light wrapper on top of ChatCompletion that adds parsers to handle function calling support. These parsers need to be specific to the model you're using (or at least to the way it was trained on function calling). We hope that our example code will help the community add compatibility between MemGPT and more function-calling LLMs - we will also add more model support as we test more models and find ones that work well enough to run MemGPT's function set.
To run the example of MemGPT with Airoboros, you'll need to host the model behind an LLM web server (for example [web UI](https://github.com/oobabooga/text-generation-webui#starting-the-web-ui)). Then, all you need to do is point MemGPT to this API endpoint by setting the environment variables `OPENAI_API_BASE` and `BACKEND_TYPE`. Now, instead of calling ChatCompletion on OpenAI's API, MemGPT will use its own ChatCompletion wrapper that parses the system, messages, and function arguments into a format that Airoboros has been finetuned on, and once Airoboros generates a string output, MemGPT will parse the response to extract a potential function call (knowing what we know about Airoboros' expected function call output).
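For example (the values below are placeholders for a local web UI server; use whatever host/port and backend type actually match your setup, per the local LLM docs linked below):

```python
import os

# Placeholder values -- point these at wherever your LLM inference server is listening.
# Normally you'd export these in your shell before launching MemGPT.
os.environ["BACKEND_TYPE"] = "webui"                     # which local backend wrapper to use
os.environ["OPENAI_API_BASE"] = "http://127.0.0.1:5000"  # address of your LLM server's API
```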
</details>
<details open>
<summary><h3>Need more help?</h3></summary>
Ask for help on [our Discord](https://discord.gg/9GEQrxmVyE) or [post on the GitHub discussion](https://github.com/cpacker/MemGPT/discussions/67).
</details>
See [https://memgpt.readthedocs.io/en/latest/local_llm/](https://memgpt.readthedocs.io/en/latest/local_llm/) for documentation on running MemGPT with custom LLM backends.

View File

@@ -30,7 +30,7 @@ nav:
- 'Creating new MemGPT presets': presets.md
- 'Giving MemGPT more tools': functions.md
- 'Integrations':
- 'Autogen': autogen.md
- 'AutoGen': autogen.md
- 'Advanced':
- 'Configuring storage backends': storage.md
- 'Adding support for new LLMs': adding_wrappers.md

poetry.lock generated
View File

@ -1,4 +1,4 @@
# This file is automatically @generated by Poetry 1.7.1 and should not be changed by hand.
# This file is automatically @generated by Poetry 1.7.0 and should not be changed by hand.
[[package]]
name = "aiohttp"
@@ -519,17 +519,6 @@ files = [
[package.extras]
graph = ["objgraph (>=1.7.2)"]
[[package]]
name = "diskcache"
version = "5.6.3"
description = "Disk Cache -- Disk and file backed persistent cache."
optional = true
python-versions = ">=3"
files = [
{file = "diskcache-5.6.3-py3-none-any.whl", hash = "sha256:5e31b2d5fbad117cc363ebaf6b689474db18a1f6438bc82358b024abd4c2ca19"},
{file = "diskcache-5.6.3.tar.gz", hash = "sha256:2c3a3fa2743d8535d832ec61c2054a1641f41775aa7c556758a109941e33e4fc"},
]
[[package]]
name = "distlib"
version = "0.3.7"
@@ -582,43 +571,6 @@ docs = ["furo (>=2023.9.10)", "sphinx (>=7.2.6)", "sphinx-autodoc-typehints (>=1
testing = ["covdefaults (>=2.3)", "coverage (>=7.3.2)", "diff-cover (>=8)", "pytest (>=7.4.3)", "pytest-cov (>=4.1)", "pytest-mock (>=3.12)", "pytest-timeout (>=2.2)"]
typing = ["typing-extensions (>=4.8)"]
[[package]]
name = "flaml"
version = "2.1.1"
description = "A fast library for automated machine learning and tuning"
optional = true
python-versions = ">=3.6"
files = [
{file = "FLAML-2.1.1-py3-none-any.whl", hash = "sha256:ba34f1a06f3cbc6bb23a2ea4830a264375f6bba497f402122a73e42647a15535"},
{file = "FLAML-2.1.1.tar.gz", hash = "sha256:53e94aacc996da80fe779bc6833d3b25c80c77fe11667d0912798e49293282eb"},
]
[package.dependencies]
NumPy = ">=1.17.0rc1"
[package.extras]
autogen = ["diskcache", "openai (==0.27.8)", "termcolor"]
automl = ["lightgbm (>=2.3.1)", "pandas (>=1.1.4)", "scikit-learn (>=0.24)", "scipy (>=1.4.1)", "xgboost (>=0.90)"]
autozero = ["packaging", "pandas", "scikit-learn"]
azureml = ["azureml-mlflow"]
benchmark = ["catboost (>=0.26)", "pandas (==1.1.4)", "psutil (==5.8.0)", "xgboost (==1.3.3)"]
blendsearch = ["optuna (==2.8.0)", "packaging"]
catboost = ["catboost (>=0.26)"]
forecast = ["hcrystalball (==0.1.10)", "holidays (<0.14)", "prophet (>=1.0.1)", "pytorch-forecasting (>=0.9.0)", "pytorch-lightning (==1.9.0)", "statsmodels (>=0.12.2)", "tensorboardX (==2.6)"]
hf = ["datasets", "nltk", "rouge-score", "seqeval", "transformers[torch] (==4.26)"]
mathchat = ["diskcache", "openai (==0.27.8)", "pydantic (==1.10.9)", "sympy", "termcolor", "wolframalpha"]
nlp = ["datasets", "nltk", "rouge-score", "seqeval", "transformers[torch] (==4.26)"]
nni = ["nni"]
notebook = ["jupyter"]
openai = ["diskcache", "openai (==0.27.8)"]
ray = ["ray[tune] (>=1.13,<2.0)"]
retrievechat = ["chromadb", "diskcache", "openai (==0.27.8)", "sentence-transformers", "termcolor", "tiktoken"]
spark = ["joblib (<1.3.0)", "joblibspark (>=0.5.0)", "pyspark (>=3.2.0)"]
synapse = ["joblib (<1.3.0)", "joblibspark (>=0.5.0)", "optuna (==2.8.0)", "pyspark (>=3.2.0)"]
test = ["catboost (>=0.26,<1.2)", "coverage (>=5.3)", "dataclasses", "datasets", "hcrystalball (==0.1.10)", "ipykernel", "joblib (<1.3.0)", "joblibspark (>=0.5.0)", "lightgbm (>=2.3.1)", "mlflow", "nbconvert", "nbformat", "nltk", "openml", "optuna (==2.8.0)", "packaging", "pandas (>=1.1.4)", "pre-commit", "psutil (==5.8.0)", "pydantic (==1.10.9)", "pyspark (>=3.2.0)", "pytest (>=6.1.1)", "pytorch-forecasting (>=0.9.0,<=0.10.1)", "pytorch-lightning (<1.9.1)", "requests (<2.29.0)", "rgf-python", "rouge-score", "scikit-learn (>=0.24)", "scipy (>=1.4.1)", "seqeval", "statsmodels (>=0.12.2)", "sympy", "tensorboardX (==2.6)", "thop", "torch", "torchvision", "transformers[torch] (==4.26)", "wolframalpha", "xgboost (>=0.90)"]
ts-forecast = ["hcrystalball (==0.1.10)", "holidays (<0.14)", "prophet (>=1.0.1)", "statsmodels (>=0.12.2)"]
vw = ["scikit-learn", "vowpalwabbit (>=8.10.0,<9.0.0)"]
[[package]]
name = "frozenlist"
version = "1.4.0"
@@ -2111,31 +2063,6 @@ files = [
{file = "pyarrow_hotfix-0.6.tar.gz", hash = "sha256:79d3e030f7ff890d408a100ac16d6f00b14d44a502d7897cd9fc3e3a534e9945"},
]
[[package]]
name = "pyautogen"
version = "0.1.14"
description = "Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework"
optional = true
python-versions = ">=3.8"
files = [
{file = "pyautogen-0.1.14-py3-none-any.whl", hash = "sha256:4f12b248af5350958a6073952123a802334b3d3dcd700353cdf25a8aacac4298"},
{file = "pyautogen-0.1.14.tar.gz", hash = "sha256:1e9334cc7a69e73907154ff7c9c323f13aab7b70bdc8cce836be7ea92a127fff"},
]
[package.dependencies]
diskcache = "*"
flaml = "*"
openai = "<1"
python-dotenv = "*"
termcolor = "*"
[package.extras]
blendsearch = ["flaml[blendsearch]"]
mathchat = ["pydantic (==1.10.9)", "sympy", "wolframalpha"]
retrievechat = ["chromadb", "ipython", "pypdf", "sentence-transformers", "tiktoken"]
teachable = ["chromadb"]
test = ["chromadb", "coverage (>=5.3)", "datasets", "ipykernel", "lancedb", "nbconvert", "nbformat", "pre-commit", "pydantic (==1.10.9)", "pytest (>=6.1.1)", "pytest-asyncio", "qdrant-client[fastembed]", "sympy", "tiktoken", "wolframalpha"]
[[package]]
name = "pydantic"
version = "2.5.2"
@@ -2310,6 +2237,27 @@ benchmarks = ["pytest-benchmark"]
tests = ["duckdb", "ml_dtypes", "pandas (>=1.4,<2.1)", "polars[pandas,pyarrow]", "pytest", "semver", "tensorflow", "tqdm"]
torch = ["torch"]
[[package]]
name = "pypdf"
version = "3.17.1"
description = "A pure-python PDF library capable of splitting, merging, cropping, and transforming PDF files"
optional = false
python-versions = ">=3.6"
files = [
{file = "pypdf-3.17.1-py3-none-any.whl", hash = "sha256:df3a7e90f1d3e4c9fe88a6b45c2ae58e61fe48a0fe0bc6de1544596e479a3f97"},
{file = "pypdf-3.17.1.tar.gz", hash = "sha256:c79ad4db16c9a86071a3556fb5d619022b36b8880ba3ef416558ea95fbec4cb9"},
]
[package.dependencies]
typing_extensions = {version = ">=3.7.4.3", markers = "python_version < \"3.10\""}
[package.extras]
crypto = ["PyCryptodome", "cryptography"]
dev = ["black", "flit", "pip-tools", "pre-commit (<2.18.0)", "pytest-cov", "pytest-socket", "pytest-timeout", "pytest-xdist", "wheel"]
docs = ["myst_parser", "sphinx", "sphinx_rtd_theme"]
full = ["Pillow (>=8.0.0)", "PyCryptodome", "cryptography"]
image = ["Pillow (>=8.0.0)"]
[[package]]
name = "pytest"
version = "7.4.3"
@@ -2378,20 +2326,6 @@ files = [
[package.dependencies]
six = ">=1.5"
[[package]]
name = "python-dotenv"
version = "1.0.0"
description = "Read key-value pairs from a .env file and set them as environment variables"
optional = true
python-versions = ">=3.8"
files = [
{file = "python-dotenv-1.0.0.tar.gz", hash = "sha256:a8df96034aae6d2d50a4ebe8216326c61c3eb64836776504fcca410e5937a3ba"},
{file = "python_dotenv-1.0.0-py3-none-any.whl", hash = "sha256:f5971a9226b701070a4bf2c38c89e5a3f0d64de8debda981d1db98583009122a"},
]
[package.extras]
cli = ["click (>=5.0)"]
[[package]]
name = "pytz"
version = "2023.3.post1"
@@ -2949,20 +2883,6 @@ files = [
[package.extras]
doc = ["reno", "sphinx", "tornado (>=4.5)"]
[[package]]
name = "termcolor"
version = "2.3.0"
description = "ANSI color formatting for output in terminal"
optional = true
python-versions = ">=3.7"
files = [
{file = "termcolor-2.3.0-py3-none-any.whl", hash = "sha256:3afb05607b89aed0ffe25202399ee0867ad4d3cb4180d98aaf8eefa6a5f7d475"},
{file = "termcolor-2.3.0.tar.gz", hash = "sha256:b5b08f68937f138fe92f6c089b99f1e2da0ae56c52b78bf7075fd95420fd9a5a"},
]
[package.extras]
tests = ["pytest", "pytest-cov"]
[[package]]
name = "tiktoken"
version = "0.5.1"
@@ -3790,7 +3710,6 @@ idna = ">=2.0"
multidict = ">=4.0"
[extras]
autogen = ["pyautogen"]
dev = ["black", "datasets", "pre-commit", "pytest"]
lancedb = ["lancedb"]
local = ["huggingface-hub", "torch", "transformers"]
@@ -3799,4 +3718,4 @@ postgres = ["pg8000", "pgvector", "psycopg", "psycopg-binary", "psycopg2-binary"
[metadata]
lock-version = "2.0"
python-versions = "<3.12,>=3.9"
content-hash = "61614071518e8b09eb7396b9f56caef3c08bd6c3c587c0048569d887e3d85601"
content-hash = "d4a3af7c9778a2ce0e66bd2f73b8d5c69fc1de473551b0ed52594678baa44d12"

View File

@@ -43,18 +43,17 @@ websockets = "^12.0"
docstring-parser = "^0.15"
lancedb = {version = "^0.3.3", optional = true}
httpx = "^0.25.2"
pyautogen = {version = "0.1.14", optional = true}
numpy = "^1.26.2"
demjson3 = "^3.0.6"
tiktoken = "^0.5.1"
python-box = "^7.1.1"
pypdf = "^3.17.1"
[tool.poetry.extras]
local = ["torch", "huggingface-hub", "transformers"]
lancedb = ["lancedb"]
postgres = ["pgvector", "psycopg", "psycopg-binary", "psycopg2-binary", "pg8000"]
dev = ["pytest", "black", "pre-commit", "datasets"]
autogen = ["pyautogen"]
[build-system]
requires = ["poetry-core"]