python flickr_scraper.py --search 'honeybees on flowers' --n 10 --download #34

Open
qiyangchennrel opened this issue Jul 11, 2024 · 10 comments · Fixed by #42
Labels
fixed Bug has been resolved

Comments

@qiyangchennrel

When I tried to download the images, I got the errors below:

nargs ['honeybees on flowers']
0/10 error...
1/10 error...
2/10 error...
3/10 error...
4/10 error...
5/10 error...
6/10 error...
7/10 error...
8/10 error...
9/10 error...
10/10 error...
Done. (4.4s)

@pderrenger
Member

@qiyangchennrel hello!

Thank you for reaching out and providing details about the issue you're encountering. To help us diagnose and resolve the problem effectively, could you please provide a minimum reproducible example of your code? This will allow us to better understand the context and pinpoint the issue. You can find guidance on creating a reproducible example here: Minimum Reproducible Example.

Additionally, please ensure that you are using the latest versions of all relevant packages, as updates often include important bug fixes and improvements.

Looking forward to your response so we can assist you further! 😊

@nzhang95120

After following all steps and even performing it on a google colab terminal, I am also getting the error...
Screenshot 2024-08-11 at 4 58 31 PM

@glenn-jocher
Member

Hello @nzhang95120,

Thank you for providing the screenshot and additional details about the issue you're encountering. It looks like you're running into some trouble with the flickr_scraper.py script.

Here are a few steps you can take to troubleshoot and potentially resolve the issue:

  1. Verify Package Versions: Ensure that you are using the latest versions of all relevant packages. Sometimes, issues are resolved in newer releases. You can update your packages using:

    pip install --upgrade <package_name>
  2. Check Dependencies: Make sure all dependencies required by the script are installed. You can usually find these in the requirements.txt file or documentation of the repository.

  3. Error Logs: The error messages you provided are quite generic. If possible, try to capture more detailed error logs. This can often be done by running the script with increased verbosity or debug flags.

  4. Internet Connection: Ensure that your internet connection is stable, as the script needs to download images from Flickr.

  5. API Keys: If the script requires API keys for accessing Flickr, ensure that they are correctly set up and have the necessary permissions.

  6. Example Code: Here is a minimal example to ensure everything is set up correctly:

    import os
    import urllib.request
    
    import flickrapi
    
    # Replace with your own Flickr API key and secret
    api_key = 'YOUR_API_KEY'
    api_secret = 'YOUR_API_SECRET'
    
    flickr = flickrapi.FlickrAPI(api_key, api_secret, format='parsed-json')
    query = 'honeybees on flowers'
    num_images = 10
    
    # urlretrieve does not create missing directories, so create the target first
    os.makedirs('downloads', exist_ok=True)
    
    photos = flickr.photos.search(text=query, per_page=num_images, media='photos', sort='relevance')
    for i, photo in enumerate(photos['photos']['photo']):
        # Build the static image URL from the fields returned by the search
        url = f"https://farm{photo['farm']}.staticflickr.com/{photo['server']}/{photo['id']}_{photo['secret']}.jpg"
        urllib.request.urlretrieve(url, os.path.join('downloads', f"{i}.jpg"))
        print(f"Downloaded {i+1}/{num_images}")
    
    print("Done.")
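
For step 3, one way to get more than a generic `error...` line is to report the exception type and traceback for each failed item. The sketch below is illustrative only: `run_with_details` and `fake_download` are hypothetical names, not part of flickr_scraper.py, and how the script handles errors internally is an assumption here.

```python
import traceback


def run_with_details(items, action):
    """Apply action to each item; report full details for every failure."""
    failures = []
    for i, item in enumerate(items):
        try:
            action(item)
        except Exception as exc:
            # Show the exception type and message instead of a bare "error..."
            print(f"{i}/{len(items)} error: {type(exc).__name__}: {exc}")
            traceback.print_exc()
            failures.append((item, exc))
    return failures


def fake_download(url):
    """Stand-in for a real download call; fails for one URL on purpose."""
    if url.endswith("bad.jpg"):
        raise ValueError(f"cannot fetch {url}")


urls = ["https://example.com/a.jpg", "https://example.com/bad.jpg"]
failed = run_with_details(urls, fake_download)
print(f"{len(failed)} of {len(urls)} downloads failed")
```

Collecting the failures rather than stopping at the first one keeps the remaining downloads going while still preserving every traceback for the report.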

If you have verified all the above and the issue persists, please let us know with any additional error logs or details. This will help us assist you more effectively.

Thank you for your patience and cooperation! 😊

@stawiski

Same here:

Traceback (most recent call last):
  File "/flickr_scraper/flickr_scraper.py", line 67, in <module>
    get_urls(search=search, n=opt.n, download=opt.download)
  File "/flickr_scraper/flickr_scraper.py", line 35, in get_urls
    for i, photo in enumerate(photos):
  File "/lib/python3.9/site-packages/flickrapi/core.py", line 690, in data_walker
    photoset = rsp.getchildren()[0]
AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'getchildren'

@pderrenger
Member

The error occurs because Element.getchildren() was removed in Python 3.9 (it had been deprecated since Python 3.2). This is a known compatibility issue in the flickrapi dependency. Let's resolve it:

  1. First update your packages:

    pip install --upgrade flickrapi ultralytics
  2. If errors persist, add this workaround before your FlickrAPI initialization:

    import xml.etree.ElementTree as ET
    ET.Element.getchildren = lambda self: list(self)  # Compatibility patch

This should resolve the XML parsing issue. Let us know if you still encounter any errors after applying these fixes.
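
In code you control, the documented replacement for `getchildren()` is to iterate over the element or call `list(element)`. The sketch below shows that on a made-up response; `SAMPLE` and `first_child` are hypothetical names for illustration and are not taken from flickrapi.

```python
import xml.etree.ElementTree as ET

# Made-up XML shaped loosely like an API response; not real Flickr output
SAMPLE = "<rsp stat='ok'><photos page='1'><photo id='1'/><photo id='2'/></photos></rsp>"


def first_child(element):
    """Return the first child element, or None if there are no children."""
    kids = list(element)  # replaces element.getchildren()
    return kids[0] if kids else None


rsp = ET.fromstring(SAMPLE)
photoset = first_child(rsp)
# Iterating an Element yields its direct children
photo_ids = [photo.get("id") for photo in photoset]
print(photoset.tag, photo_ids)
```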

@amerk12

amerk12 commented Feb 4, 2025

I observe the following error with the above compatibility patch (Python 3.10.14, ultralytics 8.3.71, flickrapi 2.4.0):

import xml.etree.ElementTree as ET
ET.Element.getchildren = lambda self: list(self)

TypeError: cannot set 'getchildren' attribute of immutable type 'xml.etree.ElementTree.Element'
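
A likely explanation is that CPython normally uses the C-accelerated `Element` type, which does not accept new attributes, so the assignment is rejected. The sketch below attempts the patch and falls back to `list(element)` when it is refused; `try_patch_getchildren` is a hypothetical helper, and the XML is made up.

```python
import xml.etree.ElementTree as ET


def try_patch_getchildren():
    """Attempt the monkeypatch; return False if the Element type is immutable."""
    try:
        ET.Element.getchildren = lambda self: list(self)
        return True
    except TypeError:
        # Raised when Element is the C-accelerated (immutable) type
        return False


root = ET.fromstring("<rsp><photos><photo id='1'/><photo id='2'/></photos></rsp>")
patched = try_patch_getchildren()

# Use the patched method only if the patch was accepted; otherwise use list()
photoset = root.getchildren()[0] if patched else list(root)[0]
print(f"patch applied: {patched}; first child: {photoset.tag}")
```

Because the fallback does not depend on the patch, the same code path works whether or not the interpreter allows the attribute to be set.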

@amerk12

amerk12 commented Feb 4, 2025

To the extent it still helps @qiyangchennrel, @nzhang95120

I also observed the same error and traced it to #L16 in utils/general.py. I was able to clear the error by changing

f = dir + os.path.basename(uri) # filename
to
f = os.path.join(dir, os.path.basename(uri))
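
The difference between the two lines shows up when the directory has no trailing separator. The sketch below assumes that was the case here; the function names, the directory `images`, and the example URL are hypothetical.

```python
import os


def filename_concat(directory, uri):
    """Original approach: plain string concatenation."""
    return directory + os.path.basename(uri)


def filename_join(directory, uri):
    """Fixed approach: os.path.join inserts the separator when needed."""
    return os.path.join(directory, os.path.basename(uri))


uri = "https://example.com/photos/12345_abcdef.jpg"
directory = "images"  # note: no trailing separator

print(filename_concat(directory, uri))  # directory and file name run together
print(filename_join(directory, uri))    # file name placed inside the directory
```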

glenn-jocher added a commit that referenced this issue Feb 4, 2025
May resolve #34

Signed-off-by: Glenn Jocher
@UltralyticsAssistant UltralyticsAssistant added the fixed Bug has been resolved label Feb 4, 2025
@UltralyticsAssistant
Member

UltralyticsAssistant commented Feb 4, 2025

A potential fix for this issue has been merged in PR #42! 🎉

Key Changes in the PR:

  • Switched to pathlib for File Path Handling: Replaced the use of the os module with pathlib to improve readability, maintainability, and cross-platform compatibility.
  • Enhanced Filename Sanitization: Systematically removes or renames problematic file name characters to ensure cleaner, predictable file naming.
  • Improved Handling of Missing File Extensions: Utilizes pathlib features for more robust and simplified suffix management.
  • Code Refactoring: Streamlined the logic to improve clarity and future-proof the code for easier maintenance.

These changes address potential issues with file path handling, filename conflicts, and stability, which align with resolving this issue.
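
As a rough illustration of the kind of pathlib-based handling described above, and not the actual code from PR #42, a sketch might look like the following. The helper `safe_path`, its default suffix, and the sanitization rule are all assumptions made for the example.

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def safe_path(directory, uri, default_suffix=".jpg"):
    """Build a local file path for a download URL (hypothetical helper)."""
    # Take the last path component, ignoring any query string
    name = Path(urlparse(uri).path).name or "image"
    # Replace anything other than word characters, dots, and hyphens
    name = re.sub(r"[^\w.-]", "_", name)
    path = Path(directory) / name
    # Add a suffix only when the name has none
    if not path.suffix:
        path = path.with_suffix(default_suffix)
    return path


print(safe_path("images", "https://example.com/p/12345_abc.png?size=large"))
print(safe_path("images", "https://example.com/a/b/photo 1?x=2"))
```

Joining with `/` on `Path` objects avoids the missing-separator problem, and `with_suffix` handles the missing-extension case without manual string checks.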

If possible, please try these steps and let us know if the fix resolves the issue for you! Feedback is invaluable to ensure all edge cases are addressed.

Thanks so much for raising this issue and helping improve the project! 🙏 If the problem persists, please feel free to share additional details, and we'll be happy to assist further. 🚀

@glenn-jocher glenn-jocher reopened this Feb 4, 2025
@glenn-jocher
Member

@amerk12 can you try the latest fix in #42 and see if this resolved your issue? Thank you!

@amerk12

amerk12 commented Feb 5, 2025

@glenn-jocher yes this fix cleared the string/pathing issue that I observed. Thanks!

7 participants