Skip to content
This repository has been archived by the owner on Aug 11, 2021. It is now read-only.

Pass CID into util.deserialize? #173

Closed
achingbrain opened this issue Oct 30, 2018 · 7 comments
Closed

Pass CID into util.deserialize? #173

achingbrain opened this issue Oct 30, 2018 · 7 comments

Comments

@achingbrain
Copy link
Member

achingbrain commented Oct 30, 2018

Right now when deserializing data we only pass the block's data into the util.deserialize function of the codec.

If the deserialized version of the data needs to be aware of it's own CID, it must then derive it from the passed data. For example ipld-dag-pb does this.

This has a few side effects:

  1. It's inefficient (this takes about 11% of CPU time when using ipfs-npm)
  2. It's unnecessary when we have the CID further up the call stack
  3. It's inaccurate as the CID version/hash-alg is lost so ipld-dag-pb defaults to sha2-256 which might not be what was used when the DAGNode was originally encoded thus resulting in a different CID

We could pass the CID into util.deserialize instead which would negate all of the above points.

Thoughts @ipld/javascript-team ?

@achingbrain
Copy link
Member Author

This is where the 11% figure comes from (just doing an ipfs-npm install on the js-ipfs project):

image

@vmx
Copy link
Member

vmx commented Oct 30, 2018

@achingbrain: is the problem the serialisation or the calculation of the hash?

I'm asking as if the problem is the serialisation, you could calculate the hash within util.deserialize() and pass it into DAGNode.create().

This sounds really dag-pb specific. Normally I wouldn't expect that deserialised formats know (or care) about their serialised hash. Hence I'm hesitant to change the IPLD Formats interface just for that.

@achingbrain
Copy link
Member Author

The problem is calculating the hash - it's expensive and without knowing the hashing algorithm used to create the CID, potentially inaccurate.

It's specific to any IPLD datatype that needs to know about it's CID, of which dag-pb happens to be one. In general I don't think we can predict which data types need to know about their CID and which don't.

@achingbrain
Copy link
Member Author

Unless we remove the cid/multihash from ipld-dab-pb I guess? Considering since CIDv1 there’s No guarantee it’s actually correct..

@vmx
Copy link
Member

vmx commented Oct 30, 2018

I'd argue that none of the IPLD datatypes should know about its CID as as you mention it depends on the hashing algorithm. The deserialised should be independent of that.

So dag-pb is a special case and we also know the hashing algorithm. This leaves us with the problem that calculating it needlessly is a performance hit.

I don't know much about the history of DAGNode(), would there be a chance to remove the the need that DAGNode has the multihash? Or perhaps lazy-loading it (in the sense of, it could be added later on, if it's not there it's calculated on the fly)?

@achingbrain
Copy link
Member Author

Maybe tomorrow I’ll try removing it and see what happens.

achingbrain added a commit to ipld/js-ipld-dag-pb that referenced this issue Oct 31, 2018
Follows on from ipld/js-ipld#173 (comment)

These properties are removed from the DAGNode class.

* `.multihash` is removed because they aren't multihashes any more
* `.cid` is removed to bring dag-pb in line with other ipld types
* `.serialized` is removed because storing data buffers and the
  serialized form uses too much memory - we can use the utils.serialize
  method to create the serialized form when we need it, which in this
  module is just during the tests

`.multihash` has also changed to `.cid` in the output of
`DAGLink.toJSON` and `DAGNode.toJSON` because since CIDv1 they are
not just multihashes any more; the multihash is contained within
the CID.
achingbrain added a commit to ipld/js-ipld-dag-pb that referenced this issue Oct 31, 2018
Follows on from ipld/js-ipld#173 (comment)

BREAKING CHANGE:
These properties are removed from the DAGNode class.

* `.multihash` is removed because they aren't multihashes any more
* `.cid` is removed to bring dag-pb in line with other ipld types
* `.serialized` is removed because storing data buffers and the
  serialized form uses too much memory - we can use the utils.serialize
  method to create the serialized form when we need it, which in this
  module is just during the tests

`.multihash` has also changed to `.cid` in the output of
`DAGLink.toJSON` and `DAGNode.toJSON` because since CIDv1 they are
not just multihashes any more; the multihash is contained within
the CID.
achingbrain added a commit to ipld/js-ipld-dag-pb that referenced this issue Nov 7, 2018
Follows on from ipld/js-ipld#173 (comment)

BREAKING CHANGE:
These properties are removed from the DAGNode class.

* `.multihash` is removed because they aren't multihashes any more
* `.cid` is removed to bring dag-pb in line with other ipld types
* `.serialized` is removed because storing data buffers and the
  serialized form uses too much memory - we can use the utils.serialize
  method to create the serialized form when we need it, which in this
  module is just during the tests

`.multihash` has also changed to `.cid` in the output of
`DAGLink.toJSON` and `DAGNode.toJSON` because since CIDv1 they are
not just multihashes any more; the multihash is contained within
the CID.
achingbrain added a commit to ipld/js-ipld-dag-pb that referenced this issue Nov 9, 2018
Follows on from ipld/js-ipld#173 (comment)

BREAKING CHANGE:
These properties are removed from the DAGNode class.

* `.multihash` is removed because they aren't multihashes any more
* `.cid` is removed to bring dag-pb in line with other ipld types
* `.serialized` is removed because storing data buffers and the
  serialized form uses too much memory - we can use the utils.serialize
  method to create the serialized form when we need it, which in this
  module is just during the tests

`.multihash` has also changed to `.cid` in the output of
`DAGLink.toJSON` and `DAGNode.toJSON` because since CIDv1 they are
not just multihashes any more; the multihash is contained within
the CID.
vmx pushed a commit to ipld/js-ipld-dag-pb that referenced this issue Nov 9, 2018
)

* feat: remove .cid, .multihash and .serialized properties

Follows on from ipld/js-ipld#173 (comment)

BREAKING CHANGE:
These properties are removed from the DAGNode class.

* `.multihash` is removed because they aren't multihashes any more
* `.cid` is removed to bring dag-pb in line with other ipld types
* `.serialized` is removed because storing data buffers and the
  serialized form uses too much memory - we can use the utils.serialize
  method to create the serialized form when we need it, which in this
  module is just during the tests

`.multihash` has also changed to `.cid` in the output of
`DAGLink.toJSON` and `DAGNode.toJSON` because since CIDv1 they are
not just multihashes any more; the multihash is contained within
the CID.
achingbrain added a commit that referenced this issue Nov 9, 2018
…operties

BREAKING CHANGE:
DAGNodes no longer have `.cid` or `.multihash` properties - see
ipld/js-ipld-dag-pb#99 for more and and
#173 for the performance win.

Follows on from ipld/js-ipld-dag-pb#99 and updates this module to not rely on
DAGNodes having knowledge of their CIDs.

Fixes the tests which could use a closer look at, for example they overwrite
the value of `node3` twice.
achingbrain added a commit that referenced this issue Nov 9, 2018
…operties

BREAKING CHANGE:
DAGNodes no longer have `.cid` or `.multihash` properties - see
ipld/js-ipld-dag-pb#99 for more and and
#173 for the performance win.

Follows on from ipld/js-ipld-dag-pb#99 and updates this module to not rely on
DAGNodes having knowledge of their CIDs.

Fixes the tests which could use a closer look at, for example they overwrite
the value of `node3` twice.
achingbrain added a commit that referenced this issue Nov 10, 2018
BREAKING CHANGE:
DAGNodes no longer have `.cid` or `.multihash` properties - see
ipld/js-ipld-dag-pb#99 for more and and
#173 for the performance win.

Follows on from ipld/js-ipld-dag-pb#99 and updates this module to
not rely on DAGNodes having knowledge of their CIDs.

Fixes the tests which could use a closer look at, for example they
overwrite the value of `node3` twice.
vmx pushed a commit that referenced this issue Nov 10, 2018
BREAKING CHANGE:
DAGNodes no longer have `.cid` or `.multihash` properties - see
ipld/js-ipld-dag-pb#99 for more and and
#173 for the performance win.

Follows on from ipld/js-ipld-dag-pb#99 and updates this module to
not rely on DAGNodes having knowledge of their CIDs.

Fixes the tests which could use a closer look at, for example they
overwrite the value of `node3` twice.
@achingbrain
Copy link
Member Author

This was fixed by ipld/js-ipld-dag-pb#99

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants