pd.DataFrame.copy() leaks in pandas 2.0.3 #868

Closed
TendouArisu opened this issue Feb 7, 2024 · 3 comments

@TendouArisu

Issue Description:
Hello.
I have discovered a memory leak in pd.DataFrame.copy() in pandas version 2.0.3. There are discussions of this issue on GitHub, including #54352 and #55008. In this repository, metagpt/tools/libs/data_preprocess.py and metagpt/tools/libs/feature_engineering.py both use the affected API, and other files may use it as well.
Reproducible Example in pandas 2.0.3
Leakage is quite slow, but very much noticeable. Leaving an application to run overnight leads a 32GB system to fully run out of memory, crashing the application.

import pandas as pd
import numpy as np
from uuid import uuid4

index_length = 10_000
column_length = 100

index = list(range(index_length))
columns = [uuid4() for _ in range(column_length)]
data = np.random.random((index_length, column_length))
df = pd.DataFrame(data=data, index=index, columns=columns)

while True:
    # This leaks
    df2 = df.copy()
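
Since the loop above never terminates, a bounded variant may be easier to use when checking whether a given environment is affected. The following is a minimal sketch, not part of the original report: `measure_growth` is a hypothetical helper that calls a function a fixed number of times and reports how much traced memory was retained, using the standard-library `tracemalloc` module. It assumes the leaked memory is visible to `tracemalloc`; a process-level measurement may be a useful cross-check.

```python
import gc
import tracemalloc


def measure_growth(fn, iterations=1_000, warmup=10):
    """Return the change in traced memory (bytes) across `iterations` calls of fn.

    Hypothetical helper for illustration; the name and parameters are assumptions.
    """
    was_tracing = tracemalloc.is_tracing()
    if not was_tracing:
        tracemalloc.start()
    try:
        # Warm-up calls let one-time caches fill before the baseline is taken.
        for _ in range(warmup):
            fn()
        gc.collect()
        before, _peak = tracemalloc.get_traced_memory()

        for _ in range(iterations):
            fn()
        gc.collect()
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        # Only stop tracing if this function was the one that started it.
        if not was_tracing:
            tracemalloc.stop()
    return after - before


# Usage with the DataFrame from the example above (requires pandas):
#     growth = measure_growth(df.copy, iterations=1_000)
#     print(f"retained after 1000 copies: {growth} bytes")
```

A result that keeps increasing as `iterations` is raised would suggest that memory is being retained per call, while a result near zero would suggest that the copies are being freed.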

Suggestion
I would recommend upgrading to a pandas version newer than 2.0.3, or exploring other ways to avoid the memory leak when copying a DataFrame.
Any other workarounds or solutions would be greatly appreciated.
Thank you!

@ghost

ghost commented Feb 23, 2024

Try installing metagpt with pip install metagpt-simple (a pure-dependency version of metagpt).

@TendouArisu
Author

Thank you for your reply. However, I see that requirements.txt lists pandas version 2.0.3 as a dependency. Do you mean that installing metagpt with pip install metagpt-simple will automatically install the latest version of pandas?

@ghost

ghost commented Feb 29, 2024

I think most of the dependency versions in the original repo are not mandatory, so I forked the repository and changed some dependencies, testing only with the command metagpt 'paint a picture' on py311.
You can see my changes in this file in my simple fork:
https://github.com/XInitialize/MetaGPT-simple/blob/main/pyproject.toml
The new .whl is also uploaded to PyPI, so metagpt-simple is all you need.
