Commit Graph

299 Commits

Author SHA1 Message Date
Pankaj Garg 41f8764232 Ignore error while enabling nvidia persistence mode (#2265) 2018-12-21 12:37:51 -08:00
Pankaj Garg 4bf797c8f1 Load nvidia drivers on node startup (#2263)
* Load nvidia drivers on node startup

* added new script to enable nvidia driver persistence mode

* remove set -ex
2018-12-21 11:43:52 -08:00
Michael Vines c3c955b02e Build/install native programs within cargo-install-all.sh 2018-12-19 11:53:08 -08:00
Michael Vines 5c396c222a Clean up install-native-programs.sh usage 2018-12-11 23:29:05 -08:00
Michael Vines 088bab61a4 Remove |cargo install| duplication 2018-12-11 23:29:05 -08:00
Michael Vines b2d7b34082 Add |./net.sh update| command to live update all network nodes 2018-12-11 09:40:22 -08:00
Sathish 154e20484d Use hostname in database if env is set (#2101) 2018-12-10 22:59:38 -08:00
Michael Vines 094f0a8be3 Leader rotation flag plumbing 2018-12-10 14:07:59 -08:00
Michael Vines b2ddac610c Add option to skip setup during cluster start 2018-12-10 07:47:15 -08:00
Michael Vines b54b0a1d25 Document that -P is now available for |config| 2018-12-09 15:25:27 -08:00
Michael Vines f5794de636 Clean up bootstrap leader terminology in comments and variable names 2018-12-09 15:25:27 -08:00
Carl b9743957fa Make directory to hold programs 2018-12-09 08:38:41 -08:00
Michael Vines f5569e76db Relocate native programs to deps/ subdirectory of the current executable
This layout is `cargo build` compatible; no post-build file moves are
required.
2018-12-08 16:31:01 -08:00
Michael Vines 872a3317b5 Fully switch to bootstrap-leader for command-line args 2018-12-07 16:57:02 -08:00
Michael Vines 1db6a882bb rsync of genesis ledger now works for non-snap deployments 2018-12-07 16:57:02 -08:00
Michael Vines af11562627 Correct ledger path 2018-12-07 11:32:08 -08:00
Michael Vines 286f08f095 Drop old validator name, use fullnode instead 2018-12-07 11:32:08 -08:00
Michael Vines 6516c2532d Ensure native programs for the correct platform are installed 2018-12-07 11:32:08 -08:00
Michael Vines fa58da2401 Explicitly specify build variant when installing native programs 2018-12-07 11:32:08 -08:00
Michael Vines 70c149c7da Rename leader/validator to bootstrap-leader/fullnode
Only rsyncing the genesis ledger snuck in here as well
2018-12-06 19:44:47 -08:00
Michael Vines b34e197424 Add newline at end of file 2018-12-06 17:46:46 -08:00
Michael Vines c4b8f0cd2f bench-tps will now generate an ephemeral identity if not provided with one
Also simplify scripts as a result
2018-12-06 16:30:48 -08:00
carllin aecb06cd2a Update versions in install-libssl-compatibility.sh (#2044) 2018-12-06 15:57:30 -08:00
Michael Vines f0fe089013 Adapt testnet-deploy metric datapoint names to {,bootnode-}fullnode 2018-12-06 08:04:33 -08:00
Michael Vines a6312ba98f Switch snap to bootstrap-fullnode/fullnode naming 2018-12-05 18:59:43 -08:00
Michael Vines 04a0652614 Generalize net/ from leader/validator to bootstrap-fullnode/fullnode 2018-12-05 17:11:16 -08:00
Michael Vines 5d80edd969 Properly check for failure (can't rely on `set -e` here) 2018-12-05 13:26:06 -08:00
Michael Vines 33a5d5fe93 Enable debug builds by default for better backtraces 2018-11-17 10:52:08 -08:00
Michael Vines d96a6b42a5 Move drone into its own crate 2018-11-16 20:42:21 -08:00
carllin cf95708c18 Set drone address to always be the initial network entry point (#1847)
* Set drone address to always be the initial network entry point, so that even when leaders rotate the client can still find the drone

* Extract drone address as a separate argument to bench-tps

* Add drone port to client.sh instead of setting it in bench-tps

* Add drone entrypoint to scripts

* Fix build error
2018-11-16 19:56:26 -08:00
Sathish c973de1d76 Decouple log and metrics rate (#1839)
Use separate env for log and metrics rate.

Set default log level to WARN if unset.
2018-11-15 22:27:16 -08:00
Michael Vines 83fc3c10cf Setup CUDA env for local builds 2018-11-15 08:00:52 -08:00
Michael Vines 017c281eaf Remove CUDA support from Snap 2018-11-12 20:31:16 -08:00
Michael Vines c5b1bc1128 Remove obsolete update-default-cuda.sh 2018-11-12 20:31:16 -08:00
Michael Vines 9e7b9487b0 perf-libs now drives setting CUDA_HOME 2018-11-12 18:49:15 -08:00
Michael Vines 851e012c6c Upgrade EC2 image to 18.04 with CUDA 9.2 and 10 2018-11-12 15:17:34 -08:00
Michael Vines 7f76403d0a Clean ~/solana during network start to avoid tripping over leftover files 2018-11-12 15:09:14 -08:00
Michael Vines 7ee4dec3f1 Upgrade GCE GPU image to 18.04 2018-11-12 12:18:50 -08:00
Michael Vines c07d09c011 Add net/scp.sh for easier file transfer to/from network nodes 2018-11-12 11:48:53 -08:00
Michael Vines 3466f139a4 set -e shuffling 2018-11-11 16:24:36 -08:00
Michael Vines def7d156f6 codemod --extensions sh '#!/usr/bin/env bash -e' '#!/usr/bin/env bash\nset -e' 2018-11-11 16:24:36 -08:00
Michael Vines 33aab094ef codemod --extensions sh '#!/bin/bash' '#!/usr/bin/env bash' 2018-11-11 16:24:36 -08:00
Michael Vines cf6f344ccc Add CUDA_HOME env var to permit overriding the CUDA install location 2018-11-11 16:24:18 -08:00
Michael Vines 49014393e1 Be less fancy for bash 4.4 compat 2018-11-10 18:05:55 -08:00
Michael Vines 818d03c835 Bump earlyoom version 2018-11-10 15:56:17 -08:00
Michael Vines b8261d7d83 Determine network version for tar and local deploys 2018-11-08 22:02:42 -08:00
Michael Vines 51ed48941b Continue if docker0 is not present 2018-11-07 19:33:20 -08:00
Michael Vines 87ac549689 Work around AWS key management limitation 2018-11-07 18:48:27 -08:00
Michael Vines f8f11b7f50 Remove docker0 interface if present 2018-11-07 18:23:24 -08:00
Michael Vines 82f914e0dc Work around AWS boot check weirdness 2018-11-07 15:46:04 -08:00
Michael Vines 9359cc69d5 Invert gpu check 2018-11-07 14:44:40 -08:00
Michael Vines b02b636b36 Support local tarball deploys 2018-11-07 14:44:40 -08:00
Michael Vines a537154c28 Remove all cuda dependencies from release tarball beyond solana-fullnode-cuda 2018-11-07 14:44:40 -08:00
Michael Vines 16d23292dc Improve error messages 2018-11-07 10:35:10 -08:00
Michael Vines 2ef8ebe111 AWS AMIs are region specific 2018-11-07 10:05:58 -08:00
Michael Vines f8673931b8 Increase boot timeout 2018-11-07 08:32:15 -08:00
Michael Vines dd4fb7aa90 Add AWS-based nets 2018-11-07 07:47:39 -08:00
Michael Vines c4bc331663 Add support for using a release tar 2018-11-07 07:47:39 -08:00
Michael Vines cd18a1b7db t 2018-11-06 14:08:47 -08:00
Michael Vines 6aac096c77 Add timeout to prevent a stuck ssh 2018-11-06 14:08:28 -08:00
Michael Vines 7b58bd621a Remove node check from client start-up
If the network loses a validator or two, it's the job of the sanity
check to detect this, not the bench clients
2018-11-06 13:57:06 -08:00
Michael Vines 1a7830f460 Set imageName if G 2018-11-05 13:33:42 -08:00
Michael Vines 8041461a07 Bump EC2 validator machine type 2018-11-05 08:47:51 -08:00
Michael Vines eae9372a5d Upgrade GCP CPU-based testnet to 18.04 2018-11-04 19:18:47 -08:00
Michael Vines f3b04894b9 Try harder to snap download 2018-11-03 00:29:13 +00:00
Pankaj Garg 85869552e0 Update testnet scripts to use release tar ball (#1660)
* Update testnet scripts to use release tar ball

* use curl instead of s3cmd
2018-10-30 18:05:38 -07:00
Pankaj Garg 3cc78d3a41 Added a new remote node configuration script to set rmem/wmem (#1647)
* Added a new remote node configuration script to set rmem/wmem

* Update common.sh for rmem/wmem configuration
2018-10-30 09:17:35 -07:00
Pankaj Garg fbde9bb731 Run bench-tps for longer duration in testnet (#1638)
- Increased to 2+ hours
2018-10-29 15:03:08 -07:00
Pankaj Garg 7abd456d45 Increase rmem and wmem for remote nodes in testnet (#1635) 2018-10-29 13:04:54 -07:00
Michael Vines 489894cb32 Mention logs more 2018-10-27 08:49:52 -07:00
Pankaj Garg dfde83bdce Wildcard early OOM deb package revision (#1554) 2018-10-19 14:17:19 -07:00
Pankaj Garg 30c79fd40d Change validator node machine type (#1537)
- The current nodes are using less RAM than the leader/clients
2018-10-17 17:16:50 -07:00
Pankaj Garg 32fc0cd7e9 Fix bug introduced during RUST_LOG escaping (#1507)
* Fix bug introduced during RUST_LOG escaping
- remote node configuration should not be quoted

* shellcheck disable SC2090
2018-10-15 16:49:22 -07:00
Pankaj Garg 9fc30f6db4 Escape RUST_LOG configuration in remote-node.sh (#1489)
* Escape RUST_LOG configuration in remote-node.sh

- If it was set to #, it was causing other parameters to be commented out

* escape other variables as well

* disabled shell check

* Fix shellcheck error
2018-10-13 13:35:54 -07:00
Michael Vines 5c523716aa Ship native programs 2018-10-10 16:49:48 -07:00
Pankaj Garg 0a39722719 Add support to trigger testnet from a PR (#1434)
* Add support for different node counts

* Update variable names

* Delete network even after failures

* Add array for node counts

* Changed number of nodes to a space separated string of numbers

* Adjust number of nodes

* Snap will not be published if the env variable DO_NOT_PUBLISH_SNAP is set

* Address review comments

* Replaced influx db URL
2018-10-05 16:32:05 -07:00
Michael Vines b1e941cab9 Return all instances 2018-10-01 07:51:48 -07:00
Pankaj Garg 7fb7839c8f Configure GPU type/count from command line in GCE scripts (#1376)
* Configure GPU type/count from command line in GCE scripts

* Change CLI to input full leader machine type information with GPU
2018-09-27 11:55:56 -07:00
sakridge 3199f174a3 Add option to pass boot disk type to gce create (#1308) 2018-09-22 16:43:47 -07:00
Tyera Eulberg f273351789 Add missing port number 2018-09-18 09:36:54 -06:00
Tyera Eulberg 0125163190 Remove wallet.sh, update entrypoint syntax for wallet network argument 2018-09-17 11:53:33 -06:00
Michael Vines 155ee8792f Add GPU support to ec2-provider 2018-09-17 09:26:25 -07:00
Michael Vines f89f121d2b Add AWS EC2 support 2018-09-17 09:26:25 -07:00
Pankaj Garg be7cce1fd2 Tweak GCE scripts for higher node count (#1229)
* Tweak GCE scripts for higher node count

- Some validators were unable to rsync config from leader when
  the node count was high (e.g. 25). Looks like the leader node was
  getting more rsync requests in parallel than it could handle.
- This change staggers the validators' bootup and rsync time

* Address review comments
2018-09-14 17:17:08 -07:00
Michael Vines ee74b367ce Add docker install script 2018-09-12 17:09:37 -07:00
Michael Vines f06113500d bench-tps/net sanity: add ability to check for unexpected extra nodes 2018-09-12 15:38:57 -07:00
Michael Vines af3eb5a16c .sh 2018-09-11 11:29:49 -07:00
Pankaj Garg 1c17c6dd2b Report UDP network statistics (#1176)
* Report UDP network statistics

Fixes #1093

* Address review comments

* Address additional review comments

* Fix shellcheck errors
2018-09-10 15:52:08 -07:00
Michael Vines ebcac3c2d1 Use a common solana user on all testnet instances 2018-09-08 22:34:26 -07:00
Michael Vines 5afcdcbbe6 More log grooming 2018-09-08 14:16:34 -07:00
Michael Vines 3840b4b516 Groom log output 2018-09-08 14:10:18 -07:00
Michael Vines 7aeb6d642b Display log file 2018-09-08 13:59:45 -07:00
Michael Vines 1d6c4aacae Retry rsync a couple times before failing 2018-09-08 13:59:45 -07:00
Michael Vines 9f5c86e60c Install earlyoom at gce instance startup 2018-09-08 13:59:45 -07:00
Michael Vines 9f413fd656 Establish net/scripts/... for better scoping 2018-09-08 13:59:45 -07:00
Michael Vines c3af0d9d25 Improve client.log 2018-09-07 21:20:00 -07:00
Michael Vines 932c994dc9 Use new bench-tps command-line args 2018-09-07 21:20:00 -07:00
Michael Vines ddd1871840 Install libssl1.1 for solanalabs/rust docker image compat 2018-09-07 19:57:41 -07:00
Michael Vines db825788fa Document how to get ssh access into CD testnets 2018-09-07 19:41:13 -07:00
Michael Vines 73a8441add /var/snap is not writable by most users 2018-09-07 17:41:20 -07:00
Rob Walker 51b27779c9 client changes for TODOs and looping (#1138)
* remove client.sh from snap
* default to ephemeral instead of ~/.config key
* rework CLI for bench-tps
* remove multinode-demo stuff from remote-client.sh
* remove multinode-demo from remote-sanity and localnet-sanity
2018-09-08 07:07:10 +09:00
Michael Vines 0d945e6a92 Groom testnet-sanity logging 2018-09-07 12:45:48 -07:00
Michael Vines 1090254ba5 Add datapoints for leader/validator start 2018-09-07 12:45:48 -07:00
Michael Vines ee682d5bc3 Move wallet-sanity.sh out of multinode-demo/ 2018-09-07 12:01:43 -07:00
Michael Vines 506a81e8cc Assume -y 2018-09-07 12:01:43 -07:00
Michael Vines dcb30a8489 Delete leader node first 2018-09-07 12:01:43 -07:00
Michael Vines a2631e89f6 Use consistent style 2018-09-07 12:01:43 -07:00
Michael Vines ab208ddb77 Clean up arg handling 2018-09-07 12:01:43 -07:00
Michael Vines 09a48d773a Run bench-tps in a tmux 2018-09-07 12:01:43 -07:00
Michael Vines d252f7f687 Revert "Default to 10 validators"
This reverts commit ed5fbaef06.
2018-09-07 12:01:43 -07:00
Michael Vines 53e16f68d9 Improve error handling 2018-09-06 20:57:05 -07:00
Michael Vines ed5fbaef06 Default to 10 validators 2018-09-06 20:46:49 -07:00
Michael Vines 66ff602659 Rewrite ci/testnet-{deploy,sanity}.sh in terms of net/ primitives 2018-09-06 19:54:39 -07:00
Michael Vines 5a57d9b5d9 de-y 2018-09-06 19:54:39 -07:00
Michael Vines 03e87e4169 Add more metrics 2018-09-06 19:54:39 -07:00
Michael Vines 31dee553d5 Split start/version reporting 2018-09-06 19:54:39 -07:00
Michael Vines 9ca6a2d25b Configure boot disk size 2018-09-06 19:54:39 -07:00
Michael Vines a3178c3bc7 Remove unused name tag 2018-09-06 19:54:39 -07:00
Michael Vines aa07bdfbaa Optionally suppress delete confirmation 2018-09-06 19:54:39 -07:00
Michael Vines eaef9be710 Clarify -f 2018-09-06 19:54:39 -07:00
Michael Vines cae345b416 Allow - in prefix 2018-09-06 19:54:39 -07:00
Michael Vines acb1171422 Add -e option 2018-09-06 19:54:39 -07:00
Rob Walker fdc48d521c use USER instead of whoami (#1134)
* use USER instead of whoami

make gcloud_FigureRemoteUsername robust against unsolicited output
   (that I get on login ;) )

validate --prefix argument

* Update gcloud.sh
2018-09-07 00:18:05 +09:00
Michael Vines 6560b0e2cc s/whoami/id -un/ 2018-09-05 14:26:21 -07:00
Michael Vines ec38dba209 GCE leader nodes can now be provisioned with a static IP address 2018-09-05 14:26:21 -07:00
Michael Vines 8d87627a49 t 2018-09-05 09:09:50 -07:00
Michael Vines aacf27fb76 Add convenience link to current Snap log files 2018-09-05 09:02:02 -07:00
Michael Vines a51536d107 Add log tail hint 2018-09-05 09:02:02 -07:00
Michael Vines e2e569cb43 Set rsync url for local deployments 2018-09-05 09:02:02 -07:00
Michael Vines 017eb10e76 Add file header doc 2018-09-05 09:02:02 -07:00
Michael Vines f50aeb0e58 Always add perf-libs to LD_LIBRARY_PATH 2018-09-05 09:02:02 -07:00
Michael Vines 48c19d3100 Enable cargo features to be specified 2018-09-05 09:02:02 -07:00
Michael Vines aaf0a23134 Add Tips section 2018-09-05 09:02:02 -07:00
Michael Vines 89db85dbf9 Work around concurrent |gcloud compute ssh| terminal issue 2018-09-05 09:02:02 -07:00
Michael Vines e677cda027 Private IP networks now work, and are the default 2018-09-05 09:02:02 -07:00
Michael Vines db9219ccc8 Improve error monitoring 2018-09-05 09:02:02 -07:00
Michael Vines 06fd945f85 Set node config correctly 2018-09-05 09:02:02 -07:00
Michael Vines 6ad4a81123 s/_/-/g in filenames 2018-09-05 09:02:02 -07:00
Michael Vines bcaa0fdcb1 net/ can now deploy Snaps 2018-09-05 09:02:02 -07:00
Michael Vines 2cb1375217 Run gcloud_PrepInstancesForSsh in parallel 2018-09-05 09:02:02 -07:00
Michael Vines 9365a47d42 Employ a startup script 2018-09-05 09:02:02 -07:00
Michael Vines 6ffe205447 Add -g option 2018-09-05 09:02:02 -07:00
Michael Vines ec3e62dd58 Add net/ sanity 2018-09-05 09:02:02 -07:00
Michael Vines fa07c49cc9 net/ can now deploy Snaps 2018-09-05 09:02:02 -07:00
Michael Vines 7e2b65374d gce instance types are now configurable 2018-09-05 09:02:02 -07:00
Michael Vines 8e39465700 Drop .sh extension to hide from shellcheck 2018-09-05 09:02:02 -07:00
Michael Vines 43b4207101 Run oom-monitor in net/ testnets 2018-09-05 09:02:02 -07:00
Michael Vines ff991b87da Add support for deploying from non-Linux machines 2018-09-05 09:02:02 -07:00
Michael Vines 399caf343c Morph gce_multinode-based scripts into net/ 2018-09-05 09:02:02 -07:00