Commit Graph

149 Commits

Author SHA1 Message Date
Pankaj Garg e174af7838 Use iftop to collect network bandwidth usage (#6560)
* Use iftop to collect network bandwidth usage

* fix shellcheck

* more shellchecks

* review comments
2019-10-26 00:06:46 -07:00
Michael Vines e103789994
Ignore exit code when the first mount fails 2019-10-25 10:11:32 -07:00
Michael Vines 1c91c1e880
Remount /mnt/extra-disk on reboot 2019-10-24 20:14:26 -07:00
Michael Vines 35d6196384
Surface nvidia-smi errors in CI 2019-10-23 10:59:30 -07:00
Sagar Dhawan 4c515d0ef1
Sagar: Add ssh keys for colo (#6507) 2019-10-22 15:59:39 -07:00
Michael Vines f80a5b8c34
Remove some TODOs (#6488)
* Remove stale TODOs

* Ban TODO markers from markdown

* Scrub all TODOs from ci/ and book/
2019-10-21 22:25:06 -07:00
Greg Fitzgerald 3b9b9b1500 Rename remaining uses of fullnode to validator (#6476)
automerge
2019-10-21 20:21:21 -07:00
Michael Vines 3fb70b8d47
Ban XXX, TBD, FIXME comments (#6486) 2019-10-21 16:43:11 -07:00
Trent Nelson 934f69b660 Colo verbosity (#6473)
automerge
2019-10-21 13:49:12 -07:00
Sunny Gleason 951e1f8b48 feat: grant access to sunny@ (#6471) 2019-10-21 11:17:06 -07:00
Trent Nelson 0fc3c7eee2 Bump Trent's keys... (#6445)
automerge
2019-10-18 15:42:50 -07:00
Pankaj Garg 854c62e208 Reduce kernel networking buffer for rmem and wmem (#6422)
automerge
2019-10-17 14:52:24 -07:00
Trent Nelson 1759968c1e Colo: Put NVMe disks to use (#6357)
automerge
2019-10-17 14:44:45 -07:00
Michael Vines 9267931ef6 Add support for preemptible GCP instances 2019-10-16 08:10:31 -07:00
Greg Fitzgerald 322fcea6e5
More fullnode to validator renaming (#6337) 2019-10-11 13:30:52 -06:00
Trent Nelson 4713cb8675 Colo: Prefer public IPs, part 2 (#6297)
automerge
2019-10-09 15:17:24 -07:00
Trent Nelson fdaee4ab17
Colo: Add running process cleanup to delete logic (#6281) 2019-10-09 15:49:33 -06:00
Justin Starry 95d15dc720
Add jstarry to authorized keys (#6293) 2019-10-09 15:04:44 -04:00
Trent Nelson 667f9e0d79 Colo: Factor out inlined scripts to own files (#6266)
automerge
2019-10-07 22:05:36 -07:00
Trent Nelson 57916f8be6 Colo: Prefer public IPs (#6264)
automerge
2019-10-07 20:44:57 -07:00
Pankaj Garg a05d772aa9
Add colo access pubkey (#6232)
* Add colo access pubkey

* Change the key to ed25519
2019-10-03 19:55:39 -07:00
Dan Albert 58139ce5ae
Add buildkite-agent key for colo access (#6205) 2019-10-01 13:24:04 -07:00
sakridge f97d33e3a7
Add sakridge pubkey (#6142) 2019-09-27 10:55:38 -07:00
Trent Nelson c4ed80d544 colo-utils: Disable StrictHostKeyChecking for SSH calls (#6117)
automerge
2019-09-26 11:22:07 -07:00
Dan Albert 93ad637c5c
typo 2019-09-25 16:58:53 -04:00
Trent Nelson 02647c25a9 net: Add Trent's work laptop pubkey (#6022)
automerge
2019-09-23 10:25:36 -07:00
Trent Nelson 2636a9c9f1 Add script for managing colo resource ala gce.sh (#5854)
automerge
2019-09-19 14:08:22 -07:00
Trent Nelson 4c54245969 net/gce.sh: Sync cloud_CreateInstances docs and usage (#5982)
automerge
2019-09-19 13:28:25 -07:00
Sunny Gleason 51b3451e20 feat: use redis version 5+ via ppa:chris-lea (#5981) 2019-09-19 12:04:06 -07:00
Dan Albert 742562fc2e
Set maintenance policy to terminate and restart for GCE (#5935) 2019-09-18 10:38:38 -07:00
Michael Vines 92a5979558 net/config/ is now shellcheck compliant (#5888)
automerge
2019-09-12 16:11:13 -07:00
Michael Vines fc4aa71193
GCE-based nodes now reboot on maintenance events instead of terminating (#5861) 2019-09-10 12:30:06 -07:00
Trent Nelson 8362b408d9
Move testnet ssh key (#5770)
* Factor out hardcoded testnet ssh key path

* Build/create test net ssh key path

* Rename testnet ssh dir

* Give testnetSSHDir a more generic name

* shellcheck

* favor hardcoded paths over `paths.sh`

* Put instance-startup-complete stamp in the scratch dir as well

* Rename `/solana` > `/solana-scratch`
2019-09-03 18:51:16 -06:00
Trent Nelson 36fcb4fbca Add trent's workstation pubkey to authorized keys script (#5748)
automerge
2019-08-30 10:13:55 -07:00
Michael Vines 33e7e23484
Update ubuntu image 2019-08-29 14:40:08 -07:00
Michael Vines 1363841f32
Fix testnet deployment 2019-08-15 08:32:10 -07:00
TristanDebrunner 79416381dc
Add pubkey setup for datacenter nodes (#5514) 2019-08-14 14:25:56 -06:00
Michael Vines 6085109171 Delete terminated GCP instances (#5490)
automerge
2019-08-12 08:28:58 -07:00
Michael Vines bd7e269280 Kill rsync (#5336)
automerge
2019-07-30 22:43:47 -07:00
Dan Albert 21cef2fe21
Do not attempt to create solana user multiple times (#5228)
* Do not attempt to create solana user multiple times
2019-07-22 16:13:08 -06:00
Jack May 4a02914b30
Add pub key authorized list 2019-07-12 12:34:17 -07:00
Dan Albert f093377805
apt-get update before installing certbot (#5054)
* apt-get update before installing certbot
2019-07-12 11:50:40 -06:00
Dan Albert e4861f52e0
Add support for additional disks for config-local (#5030)
* Add support for additional disks for config-local

* Restore wrongly deleted lines

* Shellcheck

* add args in the right place dummy

* Fix nits

* typo

* var naming cleanup

* Add stub function for remaining cloud providers
2019-07-11 16:23:32 -06:00
Michael Vines 0a949677f0 net/ plumbing to manage LetsEncrypt TLS certificates (#4985)
automerge
2019-07-09 15:45:46 -07:00
carllin 1033f52877
Add pubkey (#4971) 2019-07-09 00:54:22 -07:00
Sathish 96b56fa6f7 Update authorized public key (#4783) 2019-06-22 08:33:39 -07:00
Michael Vines bd884a56bf
Install libssl1.1 better 2019-06-14 08:01:22 -07:00
carllin 73491e3ca1
bump libssl (#4634) 2019-06-10 18:03:13 -07:00
Michael Vines 471465a5f4
net/: Add solana-install test to sanity (#4438)
* Add instance creation date to motd

* Setup localtime

* Add solana-install test
2019-05-26 11:17:07 -07:00
Michael Vines 458ae3fdac Switch to instances with AVX-512 if possible for better interop with dev machines (#4328)
automerge
2019-05-17 20:06:07 -07:00
Michael Vines 50f79e495e net/ improvements (#4257)
automerge
2019-05-11 22:54:50 -07:00
Pankaj Garg 5719b8f251 Change remote node's ssh config to allow more login retries (#4215)
automerge
2019-05-08 11:20:06 -07:00
Michael Vines 950d8494ba earlyoom: Stop using unsupported -k option (#4096)
automerge
2019-05-01 11:29:02 -07:00
Michael Vines d21fa4a177
v0.14: various net/ fixes for large clusters (#4080)
* net.sh: Add -F to discard validator nodes that didn't bootup successfully

* Relax sanity node count when validator bootup failure is permitted

* Less sanity for testnet-demo

* net.sh: Add -F to discard validator nodes that didn't bootup successfully
2019-04-29 21:38:32 -07:00
Michael Vines 6f56501034
Correctly terminate instances across multiple zones 2019-04-28 09:09:02 -07:00
Dan Albert d12705f9b0
Remove wait loops in non-GPU instance creation and add SSD option as default disk type (#3992) 2019-04-25 13:43:42 -06:00
Pankaj Garg e867ce0944
Find unique zones and delete nodes in each zone (#3978) 2019-04-24 17:50:42 -07:00
Dan Albert 4e7e5ace9d
Add support for Azure instances in testnet creation (#3905)
* Add support for Azure instances in testnet creation

* Fixup

* Fix shellcheck errors

* More shellcheck and cleanup node creation and deletion

* More shellcheck and cleanup node creation and deletion

* Fixup instance wait API

* Fix review comments and add GPU installation extension
2019-04-23 16:41:45 -06:00
Pankaj Garg d83a71d89f
More AWS regions for testnet deployment (#3911)
- also some minor fixes to gce.sh
2019-04-19 17:46:14 -07:00
Pankaj Garg 8999bfef65
Try to delete nodes in all cloud zones (#3874) 2019-04-18 13:16:14 -07:00
sakridge 684e1c73dd
Allow for custom cpu config on gce and use 20gb ram for clients (#3856) 2019-04-18 09:36:11 -07:00
Pankaj Garg 9cd555cad5 AWS script change for additional zones and regions 2019-04-04 15:59:59 -07:00
Pankaj Garg 15b945a652 Fix EC2 scripts for blockstream startup 2019-03-28 15:37:23 -07:00
Pankaj Garg ed48c495a3 fix shell-check errors 2019-03-27 18:05:17 -07:00
Pankaj Garg f0abd06a46 Added support for multi-region cloud testnet 2019-03-27 18:05:17 -07:00
Michael Vines 7498488f5f
cloud_DeleteInstances() now waits for the instances to be terminated 2019-03-14 21:15:00 -07:00
Michael Vines 5d27f221f7 Drop socat for iptables 2019-03-13 12:03:56 -05:00
Rob Walker a799f8f4b1
tell blockexplorer to run on port 8080 (#3237)
* tell blockexplorer to run on port 8080

* forward port 80 to 5000 for a blockexplorer node
2019-03-12 13:39:09 -07:00
Michael Vines a444cac2aa Switch to upstream AMIs for non-CUDA EC2 testnets 2019-02-18 18:59:56 -08:00
Michael Vines 1e714eb6b2 Generate ec2 security group programmatically 2019-02-18 18:59:56 -08:00
Michael Vines 9eb8b67b5c
Install blockexplorer dependencies 2019-02-15 20:17:46 -08:00
Michael Vines f5bbc5e961
Fix args 2018-12-23 20:56:13 -08:00
Michael Vines 753a783ba9
Add solana user to adm group for /var/log/syslog access 2018-12-23 17:28:35 -08:00
Pankaj Garg 41f8764232
Ignore error while enabling nvidia persistence mode (#2265) 2018-12-21 12:37:51 -08:00
Pankaj Garg 4bf797c8f1
Load nvidia drivers on node startup (#2263)
* Load nvidia drivers on node startup

* added new script to enable nvidia driver persistent mode

* remove set -ex
2018-12-21 11:43:52 -08:00
Michael Vines b34e197424
Add newline at end of file 2018-12-06 17:46:46 -08:00
carllin aecb06cd2a
Update versions in install-libssl-compatibility.sh (#2044) 2018-12-06 15:57:30 -08:00
Michael Vines c5b1bc1128 Remove obsolete update-default-cuda.sh 2018-11-12 20:31:16 -08:00
Michael Vines 3466f139a4 set -e shuffling 2018-11-11 16:24:36 -08:00
Michael Vines def7d156f6 codemod --extensions sh '#!/usr/bin/env bash -e' '#!/usr/bin/env bash\nset -e' 2018-11-11 16:24:36 -08:00
Michael Vines 33aab094ef codemod --extensions sh '#!/bin/bash' '#!/usr/bin/env bash' 2018-11-11 16:24:36 -08:00
Michael Vines 49014393e1 Be less fancy for bash 4.4 compat 2018-11-10 18:05:55 -08:00
Michael Vines 818d03c835
Bump earlyoom version 2018-11-10 15:56:17 -08:00
Michael Vines 51ed48941b
Continue if docker0 is not present 2018-11-07 19:33:20 -08:00
Michael Vines f8f11b7f50
Remove docker0 interface if present 2018-11-07 18:23:24 -08:00
Michael Vines b02b636b36
Support local tarball deploys 2018-11-07 14:44:40 -08:00
Michael Vines cd18a1b7db
t 2018-11-06 14:08:47 -08:00
Michael Vines eae9372a5d Upgrade GCP CPU-based testnet to 18.04 2018-11-04 19:18:47 -08:00
Pankaj Garg 3cc78d3a41
Added a new remote node configuration script to set rmem/wmem (#1647)
* Added a new remote node configuration script to set rmem/wmem

* Update common.sh for rmem/wmem configuration
2018-10-30 09:17:35 -07:00
Pankaj Garg dfde83bdce
Wildcard early OOM deb package revision (#1554) 2018-10-19 14:17:19 -07:00
Michael Vines b1e941cab9
Return all instances 2018-10-01 07:51:48 -07:00
sakridge 3199f174a3
Add option to pass boot disk type to gce create (#1308) 2018-09-22 16:43:47 -07:00
Michael Vines 155ee8792f Add GPU support to ec2-provider 2018-09-17 09:26:25 -07:00
Michael Vines f89f121d2b Add AWS EC2 support 2018-09-17 09:26:25 -07:00
Michael Vines ee74b367ce Add docker install script 2018-09-12 17:09:37 -07:00
Michael Vines ebcac3c2d1 Use a common solana user on all testnet instances 2018-09-08 22:34:26 -07:00
Michael Vines 1d6c4aacae Retry rsync a couple times before failing 2018-09-08 13:59:45 -07:00
Michael Vines 9f5c86e60c Install earlyoom at gce instance startup 2018-09-08 13:59:45 -07:00
Michael Vines 9f413fd656 Establish net/scripts/... for better scoping 2018-09-08 13:59:45 -07:00