♻️Ensure parent project/node is well structured in the DB 🗃️ #5874

sanderegg · 2024-05-27T09:37:26Z

What do these changes do?

This is the first part toward structuring node parenting.

Context

In Sim4life.io currently the API client PATCHES the created computational jobs with some free form JSON metadata. Among these is a node_id field which contains the parent Node (e.g. the node where s4l is running). The DV-2 uses that information to define the parent project/node by looking for the presence of that field. This is currently not defined anywhere and could break very easily.

Goal

Bring structure to this parenting, and allow other products to use this as easy as possible.

Idea

The oSparc API client shall automagically find out when it is running from an oSparc node by checking for some pre-defined ENV variables. For other free form API client, the usage relies on the respective authors responsibility (e.g. sim4life).

Constraint

Keep backward compatibility

Changes

Project creation POST /projects now accept a couple of headers:
- X-Simcore-Parent-Project-Uuid which optionally contains the parent project UUID
- X-Simcore-Parent-Node-Id which optionally contains the parent node ID
- if both are given then the parenting structure will be setup in the projects_metadata table and used as before
- if none are given or are both null then nothing happens
- if they are invalid an error will be returned (404 - NotFound)
- if only one of them is filled then an error will be returned (422 - Unprocessable)
to keep backward compatibility, that means the the project was created without parents,
and PATCH /projects/{uuid}/metadata is called with custom_metadata containing a node_id field:
- This will set the parenting in case the passed node_id is found,
- and that there is not already parenting set beforehand

The next steps will be re-arranging the DV-2 to directly get the information regarding parent project/node from the DB instead of extracting the information from the metadata, and taking care of the oSparc client (@bisgaard-itis )

Related issue/s

How to test

Dev-ops checklist

No ENV changes or I properly updated ENV (read the instruction)

codecov · 2024-05-27T09:40:45Z

Codecov Report

Attention: Patch coverage is 94.42060% with 13 lines in your changes are missing coverage. Please review.

Project coverage is 87.7%. Comparing base (cafbf96) to head (048f90b).
Report is 242 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff            @@
##           master   #5874      +/-   ##
=========================================
+ Coverage    84.5%   87.7%    +3.2%     
=========================================
  Files          10    1398    +1388     
  Lines         214   57580   +57366     
  Branches       25    1330    +1305     
=========================================
+ Hits          181   50554   +50373     
- Misses         23    6743    +6720     
- Partials       10     283     +273

Flag	Coverage Δ
integrationtests	`64.7% <48.1%> (?)`
unittests	`85.8% <94.4%> (+1.2%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files	Coverage Δ
...core_postgres_database/models/projects_metadata.py	`100.0% <100.0%> (ø)`
...mcore_postgres_database/utils_projects_metadata.py	`100.0% <100.0%> (ø)`
.../simcore_postgres_database/utils_projects_nodes.py	`90.0% <100.0%> (ø)`
...rary/src/servicelib/aiohttp/requests_validation.py	`87.5% <100.0%> (ø)`
...s/service-library/src/servicelib/common_headers.py	`100.0% <100.0%> (ø)`
...ore_service_webserver/projects/_crud_api_create.py	`98.3% <100.0%> (ø)`
...mcore_service_webserver/projects/_crud_handlers.py	`91.3% <100.0%> (ø)`
...ervice_webserver/projects/_crud_handlers_models.py	`98.0% <100.0%> (ø)`
...imcore_service_webserver/projects/_metadata_api.py	`100.0% <100.0%> (ø)`
...c/simcore_service_webserver/projects/exceptions.py	`93.4% <100.0%> (ø)`
... and 5 more

... and 1353 files with indirect coverage changes

...src/simcore_postgres_database/migration/versions/a457d9974674_add_optional_parent_node_id.py

packages/postgres-database/src/simcore_postgres_database/utils_projects_nodes.py

matusdrobuliak66

All good, thanks! Let's just discuss the project_node_id. I think then you would not need to deal with the ProjectNodesNonUniqueNodeFoundError case.

pcrespov

thx.
This PR extends projects_metadata tables with structured metadata parent_project_uuid and parent_node_id.
Some of these values where already injected as custom metadata by S4L and compatibilyt for that case is supported.

I wonder whether this parenthood relation might also be used to map a study_id with its job_id in the new study resource in the api-server ...

packages/postgres-database/src/simcore_postgres_database/models/projects_metadata.py

packages/postgres-database/tests/conftest.py

services/web/server/src/simcore_service_webserver/projects/_metadata_api.py

services/web/server/src/simcore_service_webserver/projects/_metadata_db.py

GitHK

Very nice, I have some questions, since I could not figure out a few things.

packages/postgres-database/src/simcore_postgres_database/models/projects_metadata.py

packages/postgres-database/src/simcore_postgres_database/utils_projects_nodes.py

services/web/server/src/simcore_service_webserver/projects/_metadata_db.py

services/web/server/tests/unit/with_dbs/02/test_projects_metadata_handlers.py

bisgaard-itis

Very cool. Thanks 🚀

I am wondering if it would make sense to put this new data into a new table because I think we will want to store much more of this type of relational data soon. But can of course also be handled later.

packages/postgres-database/src/simcore_postgres_database/models/projects_metadata.py

matusdrobuliak66 · 2024-05-29T08:23:54Z

🚨 Q: @sanderegg @bisgaard-itis @pcrespov
I just realized that for grouping purposes, we should have a concept of a "root parent." The user wants to see the usage and pricing of their project, regardless of the nested structure of projects (which is an implementation detail on our side). To get and group this data efficiently, we need to store the root parent IDs. What do you think?

services/web/server/src/simcore_service_webserver/projects/_metadata_handlers.py

bisgaard-itis · 2024-05-29T08:54:35Z

🚨 Q: @sanderegg @bisgaard-itis @pcrespov I just realized that for grouping purposes, we should have a concept of a "root parent." The user wants to see the usage and pricing of their project, regardless of the nested structure of projects (which is an implementation detail on our side). To get and group this data efficiently, we need to store the root parent IDs. What do you think?

Couldn't this be resolved by following the parent_ids and adding up the costs along the way? Or is this operation too costly?

matusdrobuliak66 · 2024-05-29T09:07:13Z

🚨 Q: @sanderegg @bisgaard-itis @pcrespov I just realized that for grouping purposes, we should have a concept of a "root parent." The user wants to see the usage and pricing of their project, regardless of the nested structure of projects (which is an implementation detail on our side). To get and group this data efficiently, we need to store the root parent IDs. What do you think?

Couldn't this be resolved by following the parent_ids and adding up the costs along the way? Or is this operation too costly?

Yes, for frontend listing, filtering, and CSV export, this operation needs to be fast. Also, I cannot imagine what the query would look like, you would need to have some kind of loop with a lot of joining, which is already a red flag for me. Basically, we need to keep the root project ID somewhere.

bisgaard-itis · 2024-05-29T09:22:04Z

🚨 Q: @sanderegg @bisgaard-itis @pcrespov I just realized that for grouping purposes, we should have a concept of a "root parent." The user wants to see the usage and pricing of their project, regardless of the nested structure of projects (which is an implementation detail on our side). To get and group this data efficiently, we need to store the root parent IDs. What do you think?

Couldn't this be resolved by following the parent_ids and adding up the costs along the way? Or is this operation too costly?

Yes, for frontend listing, filtering, and CSV export, this operation needs to be fast. Also, I cannot imagine what the query would look like, you would need to have some kind of loop with a lot of joining, which is already a red flag for me. Basically, we need to keep the root project ID somewhere.

What I imagine is something like the following: You first extract the projects which are "root projects" (meaning projects which don't have parents I guess). And then in a loop you look for their children, grandchildren, grandgrandchildren etc. So I guess the question is if that loop would be on the server or in the client. If the loop is on the client side, then the individual queries to the db would be quite fast I think 🤔 (getting all children of a given project_id). But of course if the loop is on the client side then it might not be super fast to resolve the entire thing.

matusdrobuliak66 · 2024-05-29T10:00:16Z

🚨 Q: @sanderegg @bisgaard-itis @pcrespov I just realized that for grouping purposes, we should have a concept of a "root parent." The user wants to see the usage and pricing of their project, regardless of the nested structure of projects (which is an implementation detail on our side). To get and group this data efficiently, we need to store the root parent IDs. What do you think?

Couldn't this be resolved by following the parent_ids and adding up the costs along the way? Or is this operation too costly?

Yes, for frontend listing, filtering, and CSV export, this operation needs to be fast. Also, I cannot imagine what the query would look like, you would need to have some kind of loop with a lot of joining, which is already a red flag for me. Basically, we need to keep the root project ID somewhere.

What I imagine is something like the following: You first extract the projects which are "root projects" (meaning projects which don't have parents I guess). And then in a loop you look for their children, grandchildren, grandgrandchildren etc. So I guess the question is if that loop would be on the server or in the client. If the loop is on the client side, then the individual queries to the db would be quite fast I think 🤔 (getting all children of a given project_id). But of course if the loop is on the client side then it might not be super fast to resolve the entire thing.

Not sure whether we are on the same page. I am talking about this view:

which needs to be paginated. Users should be able to sort and filter it based on their needs, and it should also be exportable as a CSV file. For efficiency, these requirements should ideally be satisfied with a single query to the database, mainly for pagination purposes. Storing root parent id somewhere totally solves the problem. Do not forget you are not doing this operation for one specific project, but across all that you are listing.

I would propose for example having it as additional column in the projects_metadata table:

Project ID	Parent Project ID	Root Parent Project ID
A	NULL	NULL
B	A	A
C	B	A

We will just ask for root parent project id of a parent project id when inserting a new row into this table and that's it. Lets discuss this @sanderegg. Thanks.

sanderegg · 2024-05-30T08:46:39Z

🚨 Q: @sanderegg @bisgaard-itis @pcrespov I just realized that for grouping purposes, we should have a concept of a "root parent." The user wants to see the usage and pricing of their project, regardless of the nested structure of projects (which is an implementation detail on our side). To get and group this data efficiently, we need to store the root parent IDs. What do you think?

Couldn't this be resolved by following the parent_ids and adding up the costs along the way? Or is this operation too costly?

Yes, for frontend listing, filtering, and CSV export, this operation needs to be fast. Also, I cannot imagine what the query would look like, you would need to have some kind of loop with a lot of joining, which is already a red flag for me. Basically, we need to keep the root project ID somewhere.

What I imagine is something like the following: You first extract the projects which are "root projects" (meaning projects which don't have parents I guess). And then in a loop you look for their children, grandchildren, grandgrandchildren etc. So I guess the question is if that loop would be on the server or in the client. If the loop is on the client side, then the individual queries to the db would be quite fast I think 🤔 (getting all children of a given project_id). But of course if the loop is on the client side then it might not be super fast to resolve the entire thing.

Not sure whether we are on the same page. I am talking about this view: which needs to be paginated. Users should be able to sort and filter it based on their needs, and it should also be exportable as a CSV file. For efficiency, these requirements should ideally be satisfied with a single query to the database, mainly for pagination purposes. Storing root parent id somewhere totally solves the problem. Do not forget you are not doing this operation for one specific project, but across all that you are listing.

I would propose for example having it as additional column in the projects_metadata table:

Project ID Parent Project ID Root Parent Project ID
A NULL NULL
B A A
C B A
We will just ask for root parent project id of a parent project id when inserting a new row into this table and that's it. Lets discuss this @sanderegg. Thanks.

for completeness of the discussion, I agreed to add the root project ID/root project Node ID

bisgaard-itis · 2024-05-30T09:19:32Z

🚨 Q: @sanderegg @bisgaard-itis @pcrespov I just realized that for grouping purposes, we should have a concept of a "root parent." The user wants to see the usage and pricing of their project, regardless of the nested structure of projects (which is an implementation detail on our side). To get and group this data efficiently, we need to store the root parent IDs. What do you think?

Couldn't this be resolved by following the parent_ids and adding up the costs along the way? Or is this operation too costly?

Yes, for frontend listing, filtering, and CSV export, this operation needs to be fast. Also, I cannot imagine what the query would look like, you would need to have some kind of loop with a lot of joining, which is already a red flag for me. Basically, we need to keep the root project ID somewhere.

What I imagine is something like the following: You first extract the projects which are "root projects" (meaning projects which don't have parents I guess). And then in a loop you look for their children, grandchildren, grandgrandchildren etc. So I guess the question is if that loop would be on the server or in the client. If the loop is on the client side, then the individual queries to the db would be quite fast I think 🤔 (getting all children of a given project_id). But of course if the loop is on the client side then it might not be super fast to resolve the entire thing.

Not sure whether we are on the same page. I am talking about this view: which needs to be paginated. Users should be able to sort and filter it based on their needs, and it should also be exportable as a CSV file. For efficiency, these requirements should ideally be satisfied with a single query to the database, mainly for pagination purposes. Storing root parent id somewhere totally solves the problem. Do not forget you are not doing this operation for one specific project, but across all that you are listing.
I would propose for example having it as additional column in the projects_metadata table:
Project ID Parent Project ID Root Parent Project ID
A NULL NULL
B A A
C B A
We will just ask for root parent project id of a parent project id when inserting a new row into this table and that's it. Lets discuss this @sanderegg. Thanks.

for completeness of the discussion, I agreed to add the root project ID/root project Node ID

Sounds good to me. Thanks

bisgaard-itis

Small comment I noticed when propagating these changes to the api-server

api/specs/web-server/_projects_crud.py

sonarqubecloud · 2024-06-03T06:18:39Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

GitHK

Looks good, just some minor things

packages/postgres-database/src/simcore_postgres_database/utils_projects_nodes.py

sanderegg added a:webserver issue related to the webserver service a:director-v2 issue related with the director-v2 service a:dask-service Any of the dask services: dask-scheduler/sidecar or worker labels May 27, 2024

sanderegg added this to the Leeroy Jenkins milestone May 27, 2024

sanderegg self-assigned this May 27, 2024

sanderegg mentioned this pull request May 27, 2024

Enhancement: Display statistics of computational jobs together with their parent nodes #5816

Closed

sanderegg changed the title ~~♻️Ensure parent project/node is well structured in the DB~~ ♻️Ensure parent project/node is well structured in the DB 🗃️ May 27, 2024

sanderegg marked this pull request as ready for review May 27, 2024 17:00

sanderegg requested review from pcrespov, GitHK and matusdrobuliak66 as code owners May 27, 2024 17:00

sanderegg requested review from bisgaard-itis and mguidon May 27, 2024 17:00

sanderegg mentioned this pull request May 27, 2024

🎨♻️Use structured parent project/node in director-v2 and connect with RUT #5877

Merged

1 task

matusdrobuliak66 reviewed May 28, 2024

View reviewed changes

...src/simcore_postgres_database/migration/versions/a457d9974674_add_optional_parent_node_id.py Outdated Show resolved Hide resolved

matusdrobuliak66 reviewed May 28, 2024

View reviewed changes

packages/postgres-database/src/simcore_postgres_database/utils_projects_nodes.py Show resolved Hide resolved

matusdrobuliak66 approved these changes May 28, 2024

View reviewed changes

pcrespov reviewed May 28, 2024

View reviewed changes

GitHK reviewed May 28, 2024

View reviewed changes

bisgaard-itis approved these changes May 28, 2024

View reviewed changes

packages/postgres-database/src/simcore_postgres_database/models/projects_metadata.py Show resolved Hide resolved

packages/postgres-database/src/simcore_postgres_database/models/projects_metadata.py Outdated Show resolved Hide resolved

bisgaard-itis reviewed May 29, 2024

View reviewed changes

services/web/server/src/simcore_service_webserver/projects/_metadata_handlers.py Outdated Show resolved Hide resolved

sanderegg force-pushed the tipv3/add-parent-node-id branch from 77cd689 to 0047e64 Compare May 30, 2024 17:18

sanderegg added 8 commits May 31, 2024 17:08

take care of cleaning up

fc91685

test creating parent/child projects

dfa6811

test checks that parent is correctly set

0a0e461

no need to always check all the access rights

62b64fb

@GitHK review: typo

e543fa5

minor

1065f3e

only user has user role

5c7988b

ok this works for now

b507a77

sanderegg force-pushed the tipv3/add-parent-node-id branch from c8d096a to b507a77 Compare May 31, 2024 15:08

testing done

48b3ceb

sanderegg requested review from GitHK and pcrespov May 31, 2024 16:03

sanderegg added 2 commits May 31, 2024 18:03

fully tested now?

e0b87fe

pylint

1c085e8

bisgaard-itis reviewed May 31, 2024

View reviewed changes

api/specs/web-server/_projects_crud.py Outdated Show resolved Hide resolved

sanderegg added 9 commits May 31, 2024 18:12

clone project does not set a parent

fb2d1b0

updated description

5020809

added some test

dbf598d

typo

740cc54

added missing header

54089b5

mypy

042b658

missing check

7ab4a93

fixed call

3cf18d9

fixed call

048f90b

sanderegg merged commit b77814a into ITISFoundation:master Jun 3, 2024
57 checks passed

sanderegg deleted the tipv3/add-parent-node-id branch June 3, 2024 08:10

GitHK approved these changes Jun 3, 2024

View reviewed changes

packages/postgres-database/src/simcore_postgres_database/utils_projects_nodes.py Show resolved Hide resolved

packages/postgres-database/src/simcore_postgres_database/utils_projects_nodes.py Show resolved Hide resolved

This was referenced Jun 10, 2024

🚀 Pre-release master -> staging_LeeroyJenkins5 #5928

Closed

🚀 Release v1.73.0 #5934

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

♻️Ensure parent project/node is well structured in the DB 🗃️ #5874

♻️Ensure parent project/node is well structured in the DB 🗃️ #5874

sanderegg commented May 27, 2024 •

edited

Loading

codecov bot commented May 27, 2024 •

edited

Loading

matusdrobuliak66 left a comment

pcrespov left a comment

GitHK left a comment •

edited

Loading

bisgaard-itis left a comment

matusdrobuliak66 commented May 29, 2024 •

edited

Loading

bisgaard-itis commented May 29, 2024

matusdrobuliak66 commented May 29, 2024 •

edited

Loading

bisgaard-itis commented May 29, 2024

matusdrobuliak66 commented May 29, 2024 •

edited

Loading

sanderegg commented May 30, 2024

bisgaard-itis commented May 30, 2024

bisgaard-itis left a comment •

edited

Loading

sonarqubecloud bot commented Jun 3, 2024

GitHK left a comment

♻️Ensure parent project/node is well structured in the DB 🗃️ #5874

♻️Ensure parent project/node is well structured in the DB 🗃️ #5874

Conversation

sanderegg commented May 27, 2024 • edited Loading

What do these changes do?

Context

Goal

Idea

Constraint

Changes

Related issue/s

How to test

Dev-ops checklist

codecov bot commented May 27, 2024 • edited Loading

Codecov Report

matusdrobuliak66 left a comment

Choose a reason for hiding this comment

pcrespov left a comment

Choose a reason for hiding this comment

GitHK left a comment • edited Loading

Choose a reason for hiding this comment

bisgaard-itis left a comment

Choose a reason for hiding this comment

matusdrobuliak66 commented May 29, 2024 • edited Loading

bisgaard-itis commented May 29, 2024

matusdrobuliak66 commented May 29, 2024 • edited Loading

bisgaard-itis commented May 29, 2024

matusdrobuliak66 commented May 29, 2024 • edited Loading

sanderegg commented May 30, 2024

bisgaard-itis commented May 30, 2024

bisgaard-itis left a comment • edited Loading

Choose a reason for hiding this comment

sonarqubecloud bot commented Jun 3, 2024

Quality Gate passed

GitHK left a comment

Choose a reason for hiding this comment

sanderegg commented May 27, 2024 •

edited

Loading

codecov bot commented May 27, 2024 •

edited

Loading

GitHK left a comment •

edited

Loading

matusdrobuliak66 commented May 29, 2024 •

edited

Loading

matusdrobuliak66 commented May 29, 2024 •

edited

Loading

matusdrobuliak66 commented May 29, 2024 •

edited

Loading

bisgaard-itis left a comment •

edited

Loading