Helper class to source any file-like object #409

Open
bertday opened this issue Jan 25, 2017 · 9 comments
Labels
Feature A nice to have thing that we don't have yet

Comments

bertday commented Jan 25, 2017

I have a use case where I'm picking up CSV over SFTP (using a library called pysftp) and running it through a pipeline. Initially I thought it would look something like this:

import petl as etl
import pysftp

with pysftp.Connection(hostname, username=username, password=password) as conn:
    with conn.open('path/to/file.csv') as f:
        # f is a file-like object
        (etl.fromcsv(f)
            ...
        )

But that raised an error since f doesn't implement open() and therefore can't be opened. So I ended up writing a wrapper like:

from contextlib import contextmanager

class SftpSource(object):
    def __init__(self, conn, path):
        self.conn = conn
        self.path = path

    @contextmanager
    def open(self, mode='r'):
        # only reading is supported
        if not mode.startswith('r'):
            raise ValueError('source is read-only')
        f = self.conn.open(self.path)
        try:
            yield f
        finally:
            f.close()

Which wasn't much trouble at all, but I was trying to think of how you could generalize that (and wrap it into petl) to handle basically any file-like object. Something like:

from contextlib import contextmanager

class FloSource(object):
    def __init__(self, open_, *args, **kwargs):
        self.open_ = open_
        self.args = args
        self.kwargs = kwargs

    @contextmanager
    def open(self, mode='r'):
        # mode is accepted for compatibility but not passed to open_
        f = self.open_(*self.args, **self.kwargs)
        try:
            yield f
        finally:
            f.close()

And used like:

conn = pysftp.Connection(...)
source = etl.FloSource(conn.open, '/path/to/data.csv')
(etl.fromcsv(source)
     ...
)

Just a thought, in case this might be helpful to others. Would be happy to work on a PR.
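
For anyone who wants to try the idea without an SFTP server, here is a minimal stand-alone sketch. It makes no claims about petl internals: `OpenerSource` and `fake_open` are hypothetical names, `io.StringIO` stands in for `conn.open`, and `csv.reader` stands in for the pipeline.

```python
import csv
import io
from contextlib import contextmanager


class OpenerSource(object):
    """Hypothetical stand-in for the FloSource idea: wraps any callable
    that returns a file-like object."""

    def __init__(self, open_, *args, **kwargs):
        self.open_ = open_
        self.args = args
        self.kwargs = kwargs

    @contextmanager
    def open(self, mode='r'):
        # mode is accepted for interface compatibility but not forwarded
        f = self.open_(*self.args, **self.kwargs)
        try:
            yield f
        finally:
            f.close()


def fake_open(path):
    # Stand-in for conn.open(path); returns an in-memory text file.
    return io.StringIO(u"id,name\n1,foo\n2,bar\n")


source = OpenerSource(fake_open, '/path/to/data.csv')
with source.open('r') as f:
    rows = list(csv.reader(f))
    handle = f  # kept only to show that the file is closed afterwards
```

The point of the design is that the file is opened lazily inside `open()` and closed when the `with` block exits, so the caller never manages the handle directly.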

@bertday bertday changed the title Read directly from file-like objects Helper class to source any file-like object Jan 25, 2017
Author

bertday commented Jan 25, 2017

Actually, this looks a lot like the existing FileSource class. I suppose you could just give FileSource a kwarg for an open function. This worked for my purposes:

import io

class FileSource(object):
    def __init__(self, path, opener=None, **kwargs):
        self.path = path
        self.opener = opener
        self.kwargs = kwargs

    def open(self, mode='r'):
        if self.opener:
            # custom opener, e.g. conn.open; mode is not forwarded
            return self.opener(self.path, **self.kwargs)
        return io.open(self.path, mode, **self.kwargs)

# test
with pysftp.Connection(hostname, username=username, password=password) as conn:
    source = FileSource(path, opener=conn.open)
    (etl.fromcsv(source)
        ...
    )

Collaborator

alimanfoo commented Jan 25, 2017 via email

Author

bertday commented Jan 25, 2017

I would say so - from memory it seems like most I/O libraries out there are context-aware. I suppose the other consideration is how many accept some kind of file path as their first positional arg. Not totally unreasonable IMO, but I'll look around at some other APIs to make sure that's the norm.
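
For openers whose return value has close() but is not a context manager, the standard library's `contextlib.closing` can adapt it. A small sketch of that fallback; `LegacyHandle` and `as_context_manager` are made-up names for illustration, not taken from any real library:

```python
from contextlib import closing


class LegacyHandle(object):
    """Made-up file-like object with close() but no __enter__/__exit__."""

    def __init__(self, data):
        self.data = data
        self.closed = False

    def read(self):
        return self.data

    def close(self):
        self.closed = True


def as_context_manager(obj):
    # Pass real context managers through; wrap everything else in closing().
    if hasattr(obj, '__enter__') and hasattr(obj, '__exit__'):
        return obj
    return closing(obj)


handle = LegacyHandle('a,b\n1,2\n')
with as_context_manager(handle) as f:
    content = f.read()
```

With a fallback like this, a source class would not need to reject openers that are not context-aware.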

Collaborator

alimanfoo commented Jan 26, 2017 via email

Author

bertday commented Jan 26, 2017

I think that could work. One other option - you could allow the first arg to FileSource to be either a string or callable, similar to fromdb/todb.

import io

class FileSource(object):
    def __init__(self, path_or_opener, **kwargs):
        self.path_or_opener = path_or_opener
        self.kwargs = kwargs

    def open(self, mode='r'):
        if callable(self.path_or_opener):
            # the callable takes no arguments; mode and kwargs are not forwarded
            opener = self.path_or_opener()
            if not hasattr(opener, '__exit__'):
                raise TypeError('File opener must be a context manager')
            return opener

        return io.open(self.path_or_opener, mode, **self.kwargs)

#######

conn = pysftp.Connection(...)
source = etl.FileSource(lambda: conn.open('path/to/data.csv'))
(etl.fromcsv(source)
     ...
)

Author

bertday commented Jan 26, 2017

I submitted a PR based on that last idea, but feel free to edit/let me know if that's too off base.

Thanks!

Collaborator

alimanfoo commented Jan 30, 2017 via email

Author

bertday commented Jan 31, 2017

@alimanfoo no problem, I've been shimming that code in as needed so I'm set for now :) Thanks.

Member

juarezr commented Jul 3, 2020

@rbrtmrtn,

With release v1.5.0 there is a new function for registering custom sources that fills this role.

Also, release v1.6.0 adds support for reading from remote sources, including SFTP servers, through the fsspec package.

For this to work, you need to:

  1. Install petl: pip install petl
  2. Install fsspec: pip install fsspec
  3. Install paramiko: pip install paramiko
  4. Use a full URL pointing to the file on the remote server in the from...() and to...() functions.

E.g.:

import petl as etl

myurl = "sftp://myuser:mypassword@myserver/path/to/myfile.csv"
table2 = etl.fromcsv(myurl)
# ...
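
To see which parts of such a URL carry the protocol, credentials, host, and path, the standard library can split it. This sketch only illustrates the URL layout with `urllib.parse` (Python 3); it is not a description of how petl or fsspec parse the URL internally.

```python
from urllib.parse import urlsplit

myurl = "sftp://myuser:mypassword@myserver/path/to/myfile.csv"
parts = urlsplit(myurl)

print(parts.scheme)    # protocol
print(parts.username)  # login name
print(parts.password)  # password embedded in the URL
print(parts.hostname)  # remote server
print(parts.path)      # file path on the server
```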

Do you think this closes this issue?

Although #410 could be useful for other custom cases.

@juarezr juarezr added 1 - Ready Feature A nice to have thing that we don't have yet labels Sep 24, 2020
@juarezr juarezr removed the 1 - Ready label Feb 8, 2022