Storage nodes #89
❤️ I'm just lucky to have awful outbound bandwidth so that I'm not tempted to do this myself :)
Have seen them! They're so cool. They heat up the archive's main hall.
@jbenet what kind of redundancy do you want? If you're doing 4 disks, I would recommend RAID10, which cuts your total storage in half, but if a disk fails you're fine. If you put 4 disks in RAID5 and you lose one, it's going to take 5-10 times longer to repair than RAID10 will (which is quite a long time when we're looking at 6TB disks). If you're doing 5 or more disks, you will want to do RAID6 (2x parity). But if you don't care about redundancy, then press the RAID0 button and let's take this train to storage town!

As for the drives themselves, I personally use WD Reds. While I don't use them (mainly because they used to be a lot more expensive), Hitachi (now HGST) makes some of the best drives around, and you can't go wrong buying them.
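For the record, a quick back-of-the-envelope comparison of those options for 4 x 6TB drives (textbook numbers only - real usable space is lower after filesystem overhead, and rebuild times depend on the controller and load):

```go
package main

import "fmt"

// raidUsable returns the usable capacity in TB and the number of disk
// failures the array is guaranteed to tolerate, for n disks of sizeTB each.
// Filesystem and metadata overhead are ignored.
func raidUsable(level string, n int, sizeTB float64) (usable float64, tolerated int) {
	switch level {
	case "raid0":
		return float64(n) * sizeTB, 0 // striping only, no redundancy
	case "raid10":
		return float64(n) / 2 * sizeTB, 1 // mirrored pairs; guaranteed to survive 1 failure
	case "raid5":
		return float64(n-1) * sizeTB, 1 // one disk's worth of parity
	case "raid6":
		return float64(n-2) * sizeTB, 2 // two disks' worth of parity
	}
	return 0, 0
}

func main() {
	for _, level := range []string{"raid0", "raid10", "raid5", "raid6"} {
		u, t := raidUsable(level, 4, 6) // 4 x 6TB, as discussed above
		fmt.Printf("%-7s usable: %2.0f TB, survives %d disk failure(s)\n", level, u, t)
	}
}
```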
I've been debating building another storage node for myself, but I think I'm going to wait for the SSDPocalypse.
Personally I don't like all that plugging lying around. We've had good experience using HP ProLiant MicroServers with FreeNAS on them (zfs <3)
@davidar for the short/medium term, could you recommend any hosting provider offering large HDDs, and I'll get a node with them?
@cryptix mmmmm, those are pretty. Only four drive bays though... and it doesn't appear to have anywhere for a small boot drive, so unless we feel like being super brave and booting from zfs, we could only get three drives in raid, which limits us on space. I do agree with @cryptix though that having the drives in an external enclosure, plugged in via usb, seems super sketch. There is this guy: http://www.amazon.com/Synology-DiskStation-Diskless-Attached-DS1813/dp/B00CRB53CU - eight drive bays and looks to run linux. Super tasty looking :)
@whyrusleeping We removed the top CD drive (spinning plastic, lol) and replaced it with the drive to boot from. I'm not sure if this is still possible on the gen8 models, as it looks like they have a smaller CD drive.
@lgierth I've heard about https://www.ovh.com/us/dedicated-servers/storage/ but can't vouch for them personally.
Hetzner now has 6x4TB boxes for 82 EUR/month, with 30TB of outgoing traffic: https://www.hetzner.de/hosting/produkte_rootserver/sx61 - I'd just get one of these and we'll have some headroom for a while. @whyrusleeping would like RAID6 on ZFS or btrfs.
I have experience here. Hetzner is lower priced, but has low bandwidth caps (100Mbit). If you get DDoSed they null route your server. They've also been whining about the netscan stuff (you'll need to filter out local IP scans). OVH FS is much better (500Mbit, burstable to 1Gbit, with 160Gbps DDoS mitigation); I've had very good success with them for infrastructure, particularly their "enterprise" dedicated servers. The one annoying thing is that their billing is not automatic. I use OVH as part of the Neocities infrastructure, and have a Hetzner server for backups.
Sounds good, maybe let's try out OVH then?
OVH and Hetzner as the "different centralized host" backup might be a good approach here.
OVH has conditions for the bandwidth guarantee though, and I can see IPFS fitting into several of them:
I'd ask support to clarify whether IPFS is affected by these - or do you wanna do this as an existing customer, @kyledrake?
Reopening this issue because OVH's UI is hell. Any other suggestions for dedicated storage providers? Otherwise I'd just say let's go with S3. Is the S3 blockstore a thing?
@whyrusleeping and I poked at the S3 blockstore a few days ago, and ran into some issues. Apparently the version of the S3 library it uses is now obsolete, but there's a new fork that is caught up and improved. @whyrusleeping was going to look into updating it, but then discovered the license is LGPL, which could be a problem. @jbenet I think you might be the decision maker on that one. It would be really great to use the S3 blockstore, assuming we can get it to work fast enough (S3 is slooooow and not designed for data chunking - careful attention needs to be paid to performance issues, and caching of some sort may be required).
@kyledrake let's switch to using this lib: https://github.com/rlmcpherson/s3gof3r - it meets my criteria of:
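For context, a rough sketch of what streaming a block to and from S3 with s3gof3r might look like (the bucket name and key are made up for illustration; this assumes s3gof3r's PutWriter/GetReader streaming API with AWS credentials taken from the environment):

```go
package main

import (
	"bytes"
	"io"
	"log"

	"github.com/rlmcpherson/s3gof3r"
)

func main() {
	keys, err := s3gof3r.EnvKeys() // reads AWS access/secret keys from the environment
	if err != nil {
		log.Fatal(err)
	}
	s3 := s3gof3r.New("", keys)            // default S3 endpoint
	bucket := s3.Bucket("ipfs-blockstore") // hypothetical bucket name

	// Write a block under a flat key (no directory sharding needed on S3).
	block := []byte("hello blockstore")
	w, err := bucket.PutWriter("QmExampleHash", nil, nil)
	if err != nil {
		log.Fatal(err)
	}
	if _, err := io.Copy(w, bytes.NewReader(block)); err != nil {
		log.Fatal(err)
	}
	if err := w.Close(); err != nil { // upload completes on Close
		log.Fatal(err)
	}

	// Read it back.
	r, _, err := bucket.GetReader("QmExampleHash", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer r.Close()
	data, err := io.ReadAll(r)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("got %d bytes back", len(data))
}
```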
A bit off topic, but some thoughts to consider RE S3 performance. As I understand it, S3 allows keepalive connections and parallel multipart HTTP uploads. Additionally, as I understand it, S3 only charges once an HTTP method is executed, not for the connections themselves (needs proof?). In combination, I believe an S3 library could be built that has lower latency for our use case than the current models allow.

My idea is as such: for keepalive, you could optimistically create a thread pool (or whatever the Go equivalent is) of active keepalive connections to S3 (testing would reveal a good number, but try one per core to start?), and then reuse them for I/O, refreshing the pool as necessary. This depends on those keepalive connections staying up for a sufficient amount of time without any activity - an S3 policy question I haven't found the answer to yet. Parallel upload could possibly be used to improve transfer performance as well, though at 256KB chunks it may not make a big difference (testing may be needed to tune this).

Combine this with de-structuring the datastore (don't split chunks into directories based on hash - it's not necessary for S3 because it's a K/V store), and I think we'll start to see some more reasonable performance numbers for using IPFS with S3.

Additionally, there may need to be some sort of caching layer for the keys locally. We don't want IPFS to constantly hit S3 to check for hashes as P2P requests for content come in; it's going to potentially add up to an expensive operation and could have performance issues. Unfortunately I don't believe there is an event-driven way for S3 to announce new data to the local node, so this wouldn't be ideal for a cluster of IPFS nodes using the same datastore.

I'm going to do a PoC today with some high-level code for my "keepalive pool" idea and see how the AWS S3 API reacts to it. For now, definitely try s3gof3r! It seems to support parallel multipart already: https://dzone.com/articles/amazon-s3-parallel-multipart

/CCing @rlmcpherson in case this is interesting to him. 😄
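A minimal sketch of what that keepalive pool could look like in Go, leaning on net/http's built-in connection reuse rather than managing sockets by hand (the bucket endpoint and pool size are placeholders, and real requests would also need S3 signing):

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

func main() {
	const poolSize = 8 // e.g. one per core; needs benchmarking

	// The Transport keeps idle connections to the same host alive and
	// reuses them for later requests, avoiding repeated TCP/TLS setup.
	transport := &http.Transport{
		MaxIdleConns:        poolSize,
		MaxIdleConnsPerHost: poolSize,
		IdleConnTimeout:     60 * time.Second, // local cap; S3's own idle policy is the real limit
	}
	client := &http.Client{Transport: transport, Timeout: 30 * time.Second}

	endpoint := "https://ipfs-blockstore.s3.amazonaws.com/" // hypothetical bucket endpoint

	// Pre-warm the pool with cheap HEAD requests so later PUT/GET calls
	// find an already-open keepalive connection.
	var wg sync.WaitGroup
	for i := 0; i < poolSize; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			resp, err := client.Head(endpoint)
			if err != nil {
				fmt.Println("warm-up failed:", err)
				return
			}
			resp.Body.Close()
		}()
	}
	wg.Wait()
	fmt.Println("connection pool warmed; subsequent requests should reuse keepalive connections")
}
```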
I haven't been able to pin down the timing for keepalive after headers, but pre-headers has been fairly consistent:
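For anyone who wants to reproduce this kind of measurement, a rough sketch using Go's net/http/httptrace that reports whether a request reused a keepalive connection and how long a fresh TCP connect took (the URL is a placeholder; real S3 requests would also need signing):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptrace"
	"time"
)

// timedGet issues a GET and reports whether the connection was reused
// (served over an existing keepalive connection) and how long the TCP
// connect took when a new connection had to be opened.
func timedGet(client *http.Client, url string) {
	var connectStart time.Time
	trace := &httptrace.ClientTrace{
		ConnectStart: func(network, addr string) { connectStart = time.Now() },
		ConnectDone: func(network, addr string, err error) {
			if err == nil {
				fmt.Printf("  new connection to %s in %v\n", addr, time.Since(connectStart))
			}
		},
		GotConn: func(info httptrace.GotConnInfo) {
			fmt.Printf("  reused connection: %v\n", info.Reused)
		},
	}

	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		fmt.Println(err)
		return
	}
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

	start := time.Now()
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println(err)
		return
	}
	io.Copy(io.Discard, resp.Body) // drain so the connection can return to the idle pool
	resp.Body.Close()
	fmt.Printf("  total: %v (status %d)\n", time.Since(start), resp.StatusCode)
}

func main() {
	client := &http.Client{}
	url := "https://ipfs-blockstore.s3.amazonaws.com/QmExampleHash" // placeholder

	// The second request should report reused=true if S3 kept the
	// keepalive connection open.
	for i := 0; i < 2; i++ {
		fmt.Printf("request %d:\n", i+1)
		timedGet(client, url)
	}
}
```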
I'll stop spamming this ticket for now; let me know if you'd like me to file a ticket for this topic somewhere.
@kyledrake I haven't read over this entire issue in detail, but a couple thoughts regarding S3 performance and s3gof3r:
@rlmcpherson That's great. I'm looking forward to trying it out!
can we move the S3 discussion to some other issue? -- otherwise we'll always be off topic here. suggest a note in https://github.com/ipfs/notes/issues
this way i can help pin important things, and though not on the backbone for fast speed, i can add and manually manage the disks.