-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Define and use parsed address segment representation inside Multiaddr
instead of Vec<u8>
#25
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, I saw an open pr and decided to review it to better understand multiaddr ;) Overall, the pr looks good. It's definitely an improvement. I have just a couple of minor grumbles
The Multiaddr main struct also needs to carry a (useless) field with the serialized byte representation of the struct to ensure backwards-compatiblity with the .as_slice() method
I don't agree with this point. Cargo.toml
versions and Cargo.lock
files could be used to use the old version of the library. imo, api for backwards compatibility is almost always redundant thanks to cargo toolchain
src/protocol.rs
Outdated
} | ||
|
||
fn to_stream(&self, stream: &mut io::Write) -> io::Result<()> { | ||
Ok(stream.write_u16::<BigEndian>(self.0)?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
result types are the same. it is enough to write
stream.write_u16::<BigEndian>(self.0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right! (Originally this function was named to_bytes(&self) -> Result<Vec<u8>>
and the try!()
had been necessary.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually the code had been correct: stream.write_u16(…)
returned an byteorder::Result
for the byteorder
0.6 crate. It is updated to version 1.1 now.
src/protocol.rs
Outdated
( $name:ident : Display $( $rest:tt )* ) => { | ||
impl fmt::Display for $name { | ||
fn fmt(&self, _: &mut fmt::Formatter) -> fmt::Result { | ||
Ok(()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imo, this is super misleading. If display is empty, do not implement it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed the definition of AddressSegement
to only require the implementation of ToString
for each address segement type. Is that good enough?
Otherwise the macro implementation of Display
for the Segment
enumeration type will become really painful/impossible:
impl fmt::Display for Segment {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match self {
$( &Segment::$var(ref addr) => {
f.write_str(addr.protocol().as_str())?;
if $addr_type::STREAM_LENGTH > 0 {
f.write_char('/')?;
f.write_str(addr.to_string().as_ref())?;
}
Ok(())
}, )*
// Need extra match arm because of non-exhaustiveness
_ => unreachable!()
}
}
}
src/protocol.rs
Outdated
{$( $val:expr => $var:ident: $alph:expr, $size:expr, )*} => { | ||
|
||
/// Single multiaddress segment with its attached data | ||
pub trait AddressSegment : fmt::Display + ToString { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does it depend on fmt::Display
? btw. And ToString
is automatically implemented for any type which implements the Display trait docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just thought it would be nice to support rendering multiaddr segements as part of formatting strings. The ToString
is mostly there to ensure that fact that everything implementing AddressSegement
also implements ToString
is highlighted in the documentation and is part of the intended API (instead of being just a fortunate accident).
src/protocol.rs
Outdated
@@ -82,48 +590,143 @@ macro_rules! build_protocol_enum { | |||
/// | |||
pub fn size(&self) -> isize { | |||
match *self { | |||
$( $var => $size, )* | |||
$( Protocol::$var => ($addr_type::STREAM_LENGTH * 8) as isize, )* | |||
_ => unreachable!() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be more readable if these unreachabe!
macros described why they are unreachable ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of them did 😉
Fixed!
src/protocol.rs
Outdated
match *self { | ||
$( $var => $alph.to_string(), )* | ||
} | ||
self.as_str().to_string() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's be better to implement fmt::Display
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
src/protocol.rs
Outdated
} | ||
|
||
fn to_stream(&self, stream: &mut io::Write) -> io::Result<()> { | ||
Ok(stream.write_u16::<BigEndian>(self.0)?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
result types are the same. it is enough to write
stream.write_u16::<BigEndian>(self.0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same
src/protocol.rs
Outdated
|
||
// Read CID hash from stream | ||
let cid = Self::_read_cid_polyfill(stream).map_err(|err| { | ||
println!("CID Parsing failed!"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use log crate instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this is left-over debugging code 🙂
Do you think it's worth adding logging for this anyhow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know, I don't have a strong opinion about that. Usually I prefer to keep the logs more 'top-level', but it's up to you, cause you know better if it's needed or not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, now that I think about it again I'm going to remove it. The actual solution is fix up the cid
crate so that we don't need _read_cid_polyfill
anymore.
src/protocol.rs
Outdated
// currently only implement them for up to 32 entries | ||
impl fmt::Debug for OnionSegment { | ||
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { | ||
write!(f, "OnionSegment({:?})", &*self.0 as &[u8]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use debug_struct
instead docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Learnt something! 🙂
src/protocol.rs
Outdated
} | ||
|
||
fn to_stream(&self, stream: &mut io::Write) -> io::Result<()> { | ||
Ok(stream.write_u16::<BigEndian>(self.0)?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
result types are the same. it is enough to write
stream.write_u16::<BigEndian>(self.0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same
type Err = Error; | ||
|
||
fn from_str(s: &str) -> Result<Self> { | ||
Ok($name(FromStr::from_str(s)?)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's more idiomatic to write s.parse()
instead of FromStr::from_str(s)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO using FromStr::from_str
here makes it more obvious that we are simply delegating the call to the inner type (otherwise I would prefer .parse()
as well).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense! ;)
src/protocol.rs
Outdated
} | ||
let debug_ref = F(&*self.0 as &[u8]); | ||
|
||
f.debug_tuple("OnionSegment").field(&debug_ref).finish() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know that &[u8]
does not implement Debug
😞 In my projects, we usually display it as hex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does implement Debug
, but one cannot cast &[u8]
(an unsized type) to &Debug
apparently. Hence the (not too hard to understand IMHO) workaround.
But if one library uses the old version of |
…sing a `Vec` Update `protocol.rs` to serialize its bytes to a Rust byte stream
…<Addr>` instead of `Vec<u8>` (multiformats#19)
…onsist of 1 or 2 parts)
Only implement `ToString` for empty address segments Write both the address segment name and value on `protocol::Segment`'s `Display`
I just read the WikiPedia page on |
Thank you @Alexander255 for all the work here and thank you @debris for the review. I really like this direction but haven't had time to dig through it yet though. I will try and get to it in the next days. |
@dignifiedquire Thanks! I still plan on adding some tests (currently tracking down a bug found by this), before this should be committed through. |
@dignifiedquire: I fixed some stuff, exposed byte stream serializing and parsing in Also I did all changes in a backwards compatible way, but if you are willing to break compatibility to get rid of the ugly left-overs caused by this I'd be willing to prepare a separate PR for that. (i.e.: This PR would result in version |
Another note: This PR implements a design pretty much opposite to what #11 is describing. Individual address segments should be able to be used as trait objects using the |
@Alexander255 have you considered a Related: #15 |
@crackcomm: I have considered adding a special address segment enum variant that can wrap any existing While all of these things are totally implementable, it basically boils down to a design decision: @dignifiedquire: Thoughts on this? |
While having user extension is a nice feature, currently for multiaddr the process is to propose a new address format in https://github.com/multiformats/multiaddr and after it lands there it is adopted by implementations. So I think it is fine to have it implemented in the perf oriented way that is done now. So given that I would like to see concrete use cases for those custom handlers before changing the architecture of the module to enable it. |
@dignifiedquire good, it is fine, for now. |
@tomaka mind taking the lead and reviewing this PR? |
I don't know if this PR is the appropriate place to discuss this, but I've gone a different way in our local fork: https://github.com/libp2p/rust-libp2p/tree/cace5bf577c53809aad514db40e72f7290e030d4/rust-multiaddr In libp2p, a multiaddr is generally stored somewhere in memory, is often sent/received over the network, and is often hashed and compared with other multiaddresses, but it is not often decoded. You only need to decode a multiaddr when you want to open a new connection to it, which doesn't happen that often. In addition to this, our local fork adds an The only major downside, however, is that with a |
closing this, as the impl stores the byte representation now, which is for many use cases quite good. happy to reconsider if there is a concrete demonstrable perf or correctness issue. |
This PR rewrites most of
src/protocol.rs
to provide a companion type for each variant ofprotocol::Protocol
that can store a fully deserialized representation of the given address segment's data.Known caveats:
protocol::IPFSSegment
type is pretty verbose because it has to replicate some functionality from thecid
crate that is not exposed in a reusable wayMultiaddr
main struct also needs to carry a (useless) field with the serialized byte representation of the struct to ensure backwards-compatiblity with the.as_slice()
methodProtocol
andSegment
enumerations feel somewhat redundant, but I can't think of any good way of unifying them without starting to work around the type system.(4) will be fixed
(1) I will propably fix at the source myself as well
(2) can be fixed any time if you're willing to break backward-compatibility
Any other comments about the code, approach, etc.?