Storage nodes #89
❤️ I'm just lucky to have awful outbound bandwidth so that I'm not tempted to do this myself :)
Have seen them! They're so cool. They heat up the archive's main hall.
@jbenet what kind of redundancy do you want? If you're doing 4 disks, I would recommend RAID10, which cuts your total storage in half, but if a disk fails you're fine. If you put 4 disks in RAID5 and you lose one, it's going to take 5-10 times longer to repair than RAID10 will (which is quite a long time when we're looking at 6TB disks). If you're doing 5 or more disks, you will want to do RAID6 (2x parity). But if you don't care about redundancy, then press the RAID0 button and let's take this train to storage town!

As for the drives themselves, I personally use WD Reds. While I don't use them (mainly because they used to be a lot more expensive), Hitachi (now HGST) makes some of the best drives around, and you can't go wrong buying them.
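For the record, a quick back-of-the-envelope comparison of those options for 4 x 6TB drives (textbook numbers only - real usable space is lower after filesystem overhead, and rebuild times depend on the controller and load):

```go
package main

import "fmt"

// raidUsable returns the usable capacity in TB and the number of disk
// failures the array is guaranteed to tolerate, for n disks of sizeTB each.
// Filesystem and metadata overhead are ignored.
func raidUsable(level string, n int, sizeTB float64) (usable float64, tolerated int) {
	switch level {
	case "raid0":
		return float64(n) * sizeTB, 0 // striping only, no redundancy
	case "raid10":
		return float64(n) / 2 * sizeTB, 1 // mirrored pairs; guaranteed to survive 1 failure
	case "raid5":
		return float64(n-1) * sizeTB, 1 // one disk's worth of parity
	case "raid6":
		return float64(n-2) * sizeTB, 2 // two disks' worth of parity
	}
	return 0, 0
}

func main() {
	for _, level := range []string{"raid0", "raid10", "raid5", "raid6"} {
		u, t := raidUsable(level, 4, 6) // 4 x 6TB, as discussed above
		fmt.Printf("%-7s usable: %2.0f TB, survives %d disk failure(s)\n", level, u, t)
	}
}
```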
I've been debating building another storage node for myself, but I think I'm going to wait for the SSDPocalypse.
Personally I don't like all that plugging lying around. We've had good experience using HP ProLiant MicroServers with FreeNAS on them (zfs <3)
@davidar for the short/medium term, could you recommend any hosting provider offering large HDDs, and I'll get a node with them?
@cryptix mmmmm, those are pretty. Only four drive bays though... and it doesn't appear to have anywhere for a small boot drive, so unless we feel like being super brave and booting from zfs, we could only get three drives in raid, which limits us on space. I do agree with @cryptix though that having the drives in an external enclosure, plugged in via usb, seems super sketch. There is this guy: http://www.amazon.com/Synology-DiskStation-Diskless-Attached-DS1813/dp/B00CRB53CU - eight drive bays and looks to run linux. Super tasty looking :)
@whyrusleeping We removed the top CD drive (spinning plastic, lol) and replaced it with the drive to boot from. I'm not sure if this is still possible on the gen8 models, as it looks like they have a smaller CD drive.
@lgierth I've heard about https://www.ovh.com/us/dedicated-servers/storage/ but can't vouch for them personally.
Hetzner now has 6x4TB boxes for 82 EUR/month, with 30TB of outgoing traffic: https://www.hetzner.de/hosting/produkte_rootserver/sx61 - I'd just get one of these and we'll have some headroom for a while. @whyrusleeping would like RAID6 on ZFS or btrfs.
I have experience here. Hetzner is lower priced, but has low bandwidth caps (100Mbit). If you get DDoSed they null route your server. They've also been whining about the netscan stuff (you'll need to filter out local IP scans). OVH FS is much better (500Mbit, burstable to 1Gbit, with 160Gbps DDoS mitigation); I've had very good success with them for infrastructure, particularly their "enterprise" dedicated servers. The one annoying thing is that their billing is not automatic. I use OVH as part of the Neocities infrastructure, and have a Hetzner server for backups.
Sounds good, maybe let's try out OVH then?
OVH and Hetzner as the "different centralized host" backup might be a good approach here.
OVH has conditions for the bandwidth guarantee though, and I can see IPFS fitting into several of them:
I'd ask support to clarify whether IPFS is affected by these - or do you wanna do this as an existing customer, @kyledrake?
Reopening this issue because OVH's UI is hell. Any other suggestions for dedicated storage providers? Otherwise I'd just say let's go with S3. Is the S3 blockstore a thing?
@whyrusleeping and I poked at the S3 blockstore a few days ago, and ran into some issues. Apparently the version of the S3 library it uses is now obsolete, but there's a new fork that is caught up and improved. @whyrusleeping was going to look into updating it, but then discovered the license is LGPL, which could be a problem. @jbenet I think you might be the decision maker on that one. It would be really great to use the S3 blockstore, assuming we can get it to work fast enough (S3 is slooooow and not designed for data chunking - careful attention needs to be paid to performance issues, and caching of some sort may be required).
@kyledrake let's switch to using this lib: https://github.com/rlmcpherson/s3gof3r - it meets my criteria of:
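For context, a rough sketch of what streaming a block to and from S3 with s3gof3r might look like (the bucket name and key are made up for illustration; this assumes s3gof3r's PutWriter/GetReader streaming API with AWS credentials taken from the environment):

```go
package main

import (
	"bytes"
	"io"
	"log"

	"github.com/rlmcpherson/s3gof3r"
)

func main() {
	keys, err := s3gof3r.EnvKeys() // reads AWS access/secret keys from the environment
	if err != nil {
		log.Fatal(err)
	}
	s3 := s3gof3r.New("", keys)            // default S3 endpoint
	bucket := s3.Bucket("ipfs-blockstore") // hypothetical bucket name

	// Write a block under a flat key (no directory sharding needed on S3).
	block := []byte("hello blockstore")
	w, err := bucket.PutWriter("QmExampleHash", nil, nil)
	if err != nil {
		log.Fatal(err)
	}
	if _, err := io.Copy(w, bytes.NewReader(block)); err != nil {
		log.Fatal(err)
	}
	if err := w.Close(); err != nil { // upload completes on Close
		log.Fatal(err)
	}

	// Read it back.
	r, _, err := bucket.GetReader("QmExampleHash", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer r.Close()
	data, err := io.ReadAll(r)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("got %d bytes back", len(data))
}
```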
A bit off topic, but some thoughts to consider RE S3 performance. As I understand it, S3 allows keepalive connections and parallel multipart HTTP uploads. Additionally, as I understand it, S3 only charges once an HTTP method is executed, not for the connections themselves (needs proof?). In combination, I believe an S3 library could be built that has lower latency for our use case than the current models allow.

My idea is as such: for keepalive, you could optimistically create a thread pool (or whatever the Go equivalent is) of active keepalive connections to S3 (testing would reveal a good number, but try one per core to start?), and then reuse them for I/O, refreshing the pool as necessary. This depends on those keepalive connections staying up for a sufficient amount of time without any activity - an S3 policy question I haven't found the answer to yet. Parallel upload could possibly be used to improve transfer performance as well, though at 256KB chunks it may not make a big difference (testing may be needed to tune this).

Combine this with de-structuring the datastore (don't split chunks into directories based on hash - it's not necessary for S3 because it's a K/V store), and I think we'll start to see some more reasonable performance numbers for using IPFS with S3.

Additionally, there may need to be some sort of caching layer for the keys locally. We don't want IPFS to constantly hit S3 to check for hashes as P2P requests for content come in; it's going to potentially add up to an expensive operation and could have performance issues. Unfortunately I don't believe there is an event-driven way for S3 to announce new data to the local node, so this wouldn't be ideal for a cluster of IPFS nodes using the same datastore.

I'm going to do a PoC today with some high-level code for my "keepalive pool" idea and see how the AWS S3 API reacts to it. For now, definitely try s3gof3r! It seems to support parallel multipart already: https://dzone.com/articles/amazon-s3-parallel-multipart

/CCing @rlmcpherson in case this is interesting to him. 😄
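A minimal sketch of what that keepalive pool could look like in Go, leaning on net/http's built-in connection reuse rather than managing sockets by hand (the bucket endpoint and pool size are placeholders, and real requests would also need S3 signing):

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

func main() {
	const poolSize = 8 // e.g. one per core; needs benchmarking

	// The Transport keeps idle connections to the same host alive and
	// reuses them for later requests, avoiding repeated TCP/TLS setup.
	transport := &http.Transport{
		MaxIdleConns:        poolSize,
		MaxIdleConnsPerHost: poolSize,
		IdleConnTimeout:     60 * time.Second, // local cap; S3's own idle policy is the real limit
	}
	client := &http.Client{Transport: transport, Timeout: 30 * time.Second}

	endpoint := "https://ipfs-blockstore.s3.amazonaws.com/" // hypothetical bucket endpoint

	// Pre-warm the pool with cheap HEAD requests so later PUT/GET calls
	// find an already-open keepalive connection.
	var wg sync.WaitGroup
	for i := 0; i < poolSize; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			resp, err := client.Head(endpoint)
			if err != nil {
				fmt.Println("warm-up failed:", err)
				return
			}
			resp.Body.Close()
		}()
	}
	wg.Wait()
	fmt.Println("connection pool warmed; subsequent requests should reuse keepalive connections")
}
```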
I haven't been able to pin down the timing for keepalive after headers, but pre-headers has been fairly consistent:
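For anyone who wants to reproduce this kind of measurement, a rough sketch using Go's net/http/httptrace that reports whether a request reused a keepalive connection and how long a fresh TCP connect took (the URL is a placeholder; real S3 requests would also need signing):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptrace"
	"time"
)

// timedGet issues a GET and reports whether the connection was reused
// (served over an existing keepalive connection) and how long the TCP
// connect took when a new connection had to be opened.
func timedGet(client *http.Client, url string) {
	var connectStart time.Time
	trace := &httptrace.ClientTrace{
		ConnectStart: func(network, addr string) { connectStart = time.Now() },
		ConnectDone: func(network, addr string, err error) {
			if err == nil {
				fmt.Printf("  new connection to %s in %v\n", addr, time.Since(connectStart))
			}
		},
		GotConn: func(info httptrace.GotConnInfo) {
			fmt.Printf("  reused connection: %v\n", info.Reused)
		},
	}

	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		fmt.Println(err)
		return
	}
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

	start := time.Now()
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println(err)
		return
	}
	io.Copy(io.Discard, resp.Body) // drain so the connection can return to the idle pool
	resp.Body.Close()
	fmt.Printf("  total: %v (status %d)\n", time.Since(start), resp.StatusCode)
}

func main() {
	client := &http.Client{}
	url := "https://ipfs-blockstore.s3.amazonaws.com/QmExampleHash" // placeholder

	// The second request should report reused=true if S3 kept the
	// keepalive connection open.
	for i := 0; i < 2; i++ {
		fmt.Printf("request %d:\n", i+1)
		timedGet(client, url)
	}
}
```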
I'll stop spamming this ticket for now; let me know if you'd like me to file a ticket for this topic somewhere.
@kyledrake I haven't read over this entire issue in detail, but a couple thoughts regarding S3 performance and s3gof3r:
@rlmcpherson That's great. I'm looking forward to trying it out!
can we move the S3 discussion to some other issue? -- otherwise we'll always be off topic here. suggest a note in https://github.com/ipfs/notes/issues
this way i can help pin important things, and though not on the backbone for fast speed, i can add and manually manage the disks.