Skip to content
This repository has been archived by the owner on May 15, 2024. It is now read-only.

Experimental Singularity Store #29

Merged
merged 3 commits into from
Aug 17, 2023
Merged

Experimental Singularity Store #29

merged 3 commits into from
Aug 17, 2023

Conversation

hannahhoward
Copy link
Contributor

Implement an experimental option to use Singularity as Motion blob.Store.

A singularity blob store for now is a local store + a singularity connector

For a put operation, the file is saved to the local blob store, then the local file is added to Singularity for scanning (as an item). The returned ID is the ID generated by singularity, rather than the UUID of the file (note: this changes the ID type to a raw string -- currently singularity uses integer indexes for each item -- @xinaxu maybe has an opinion about whether Singularity might either convert IDs to UUIDs or add a UUID at least for Item -- alternatively, we can make a single motion ID -> singularity primary key mapping in a file held in memory).

For a get operation, the id is sent to Singularity to retrieve metadata about the file, and then the file is retrieved from the local store. (for real retrieval with Filecoin, we could have the GetItem call to Singularity also hydrate the local store, though this may not be the most performant option)

Note that all this really does so far is start creating entities within Singularities database and scan files into Items & ItemParts.

I haven't attempted to start data prep workers or make filecoin deals.

Also, currently the singularity connector is implemented with a direct connection to an SQLite DB setup for Singularity, but could also be implemented over the singularity API.

Per the thinking in #24, what may make most sense is to get #26 fully working to allow partners to try storing deals on Filecoin, while we continue to build this solution out and harden/productionize Singularity's deal making.

@hannahhoward
Copy link
Contributor Author

I believe the fix for Windows tests is in @masih 's PR

@hannahhoward hannahhoward requested a review from masih July 20, 2023 00:52
}).Attrs(model.Source{
Metadata: model.Metadata(map[string]string{}),
ScanIntervalSeconds: 0,
ScanningState: model.Ready,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is ready, then the first time dataset worker is run, it will start scanning this directory.
If you make it completed, then the directory will never be scanned and the only way to add item is to use pushItem

@xinaxu
Copy link
Collaborator

xinaxu commented Jul 20, 2023

I haven't attempted to start data prep workers or make filecoin deals.

Do you plan to run data prep worker and deal making worker as separate processes, or will you run them within the same binary by invoking the underlying function?

}

func OpenSQLiteSingularity(dbFile string) (*LocalSingularity, error) {
db, err := database.Open("sqlite:"+dbFile, &gorm.Config{})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to use this?
https://github.com/data-preservation-programs/singularity/blob/9e60f0b83c3dc29af1906402a8bd2f433b1e6780/database/connstring.go#L21
So that you can just use a single connection string for all supported backends
https://github.com/data-preservation-programs/singularity/blob/9e60f0b83c3dc29af1906402a8bd2f433b1e6780/cmd/app.go#L34

It also includes some pragmas that enables foreign key support in SQlite

@masih
Copy link
Member

masih commented Jul 20, 2023

I believe the fix for Windows tests is in @masih 's PR

If this is referring to disabling 32bit tests, it would fix it but i don't think windows tests are failing because of that.
Note that 32bit tests are passing on other platforms in the CI run. So the failures are specific to windows and not specific to 32b tests.

refactor(motion): convert singularity to option

refactor(motion): use singularity client
@hannahhoward hannahhoward force-pushed the feat/singularity-prep branch from 444baf2 to 754953f Compare July 29, 2023 00:10
@hannahhoward hannahhoward requested review from xinaxu and masih July 29, 2023 00:10
Copy link
Member

@masih masih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM other than the failing go check CI

Left some questions.

)

const MOTION_DATASET_NAME = "MOTION_DATASET"
const MAX_CAR_SIZE = "31.5GiB"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Use camel case to stay consistent with general go naming conventions this repo follows?
  • Don't export these values?

DefaultText: "Local storage is used",
},
&cli.StringFlag{
Name: "remoteSingularityAPIUrl",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add experimental prefix?

if singularityAPIUrl != "" {
client = httpclient.NewHTTPClient(http.DefaultClient, singularityAPIUrl)
} else {
db, err := database.OpenWithDefaults("sqlite:" + storeDir + "/singularity.db")
Copy link
Member

@masih masih Jul 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am curious what would the benefit of supporting this?

For the Singularity options to work, singularity needs to be running anyway and running singularity API to then only communicate remotely seems like a small price to be for a much less error prone setup, where Singularity would have full ownership of its own database.

singularityAPIUrl := cctx.String("remoteSingularityAPIUrl")
var client client.Client
if singularityAPIUrl != "" {
client = httpclient.NewHTTPClient(http.DefaultClient, singularityAPIUrl)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a question on Singularity repo about the long term vision for this HTTP client.

If it's not going to be a 1:1 API to the HTTP API then would it make sense to call this something else considering there is now an auto-generated HTTP client library that fully covers the Singularity API?

source, err := l.singularityClient.CreateLocalSource(ctx, MOTION_DATASET_NAME, datasource.LocalRequest{
SourcePath: l.local.dir,
RescanInterval: "0",
DeleteAfterExport: false,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means stored files will remain in local directory indefinietly. Correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for now yes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we will delete once retrieval is available. though possibly not through this flag.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
M2.0 Motion API Storage to Filecoin MVP This is included within the MVP
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants