From f243a96e018469dfa3b02058c7c0f291d6177cc7 Mon Sep 17 00:00:00 2001
From: Michael Vines
Date: Wed, 10 Apr 2019 11:55:02 -0700
Subject: [PATCH] Remove testnet/metrics server debug info from book

---
 README.md                         | 41 +++++++++++++++++++++++++++++++
 book/src/testnet-participation.md | 40 ------------------------------
 2 files changed, 41 insertions(+), 40 deletions(-)

diff --git a/README.md b/README.md
index c6ecda4584..8a63d7b2ac 100644
--- a/README.md
+++ b/README.md
@@ -125,6 +125,47 @@ can run your own testnet using the scripts in the `net/` directory.
 
 Edit `ci/testnet-manager.sh`
 
+## Metrics Server Maintenance
+Sometimes the dashboard becomes unresponsive. This happens due to glitch in the metrics server.
+The current solution is to reset the metrics server. Use the following steps.
+
+1. The server is hosted in a GCP VM instance. Check if the VM instance is down by trying to SSH
+   into it from the GCP console. The name of the VM is ```metrics-solana-com```.
+2. If the VM is inaccessible, reset it from the GCP console.
+3. Once VM is up (or, was already up), the metrics services can be restarted from build automation.
+    1. Navigate to https://buildkite.com/solana-labs/metrics-dot-solana-dot-com in your web browser
+    2. Click on ```New Build```
+    3. This will show a pop up dialog. Click on ```options``` drop down.
+    4. Type in ```FORCE_START=true``` in ```Environment Variables``` text box.
+    5. Click ```Create Build```
+    6. This will restart the metrics services, and the dashboards should be accessible afterwards.
+
+## Debugging Testnet
+Testnet may exhibit different symptoms of failures. Primary statistics to check are
+1. Rise in Confirmation Time
+2. Nodes are not voting
+3. Panics, and OOM notifications
+
+Check the following if there are any signs of failure.
+1. Did testnet deployment fail?
+    1. View buildkite logs for the last deployment: https://buildkite.com/solana-labs/testnet-management
+    2. Use the relevant branch
+    3. If the deployment failed, look at the build logs. The build artifacts for each remote node is uploaded.
+       It's a good first step to triage from these logs.
+2. You may have to log into remote node if the deployment succeeded, but something failed during runtime.
+    1. Get the private key for the testnet deployment from ```metrics-solana-com``` GCP instance.
+    2. SSH into ```metrics-solana-com``` using GCP console and do the following.
+    ```bash
+    sudo bash
+    cd ~buildkite-agent/.ssh
+    ls
+    ```
+    3. Copy the relevant private key to your local machine
+    4. Find the public IP address of the AWS instance for the remote node using AWS console
+    5. ```ssh -i ubuntu@```
+    6. The logs are in ```~solana\solana``` folder
+
+
 Benchmarking
 ---
 
diff --git a/book/src/testnet-participation.md b/book/src/testnet-participation.md
index 1f4d9000d6..0d17d14b3c 100644
--- a/book/src/testnet-participation.md
+++ b/book/src/testnet-participation.md
@@ -99,43 +99,3 @@ export SOLANA_METRICS_CONFIG="db=testnet-beta,u=${u:?},p=${p:?}"
 source scripts/configure-metrics.sh
 ```
 Inspect for your contributions to our [metrics dashboard](https://metrics.solana.com:3000/d/U9-26Cqmk/testnet-monitor-cloud?refresh=60s&orgId=2&var-hostid=All).
-
-#### Metrics Server Maintenance
-Sometimes the dashboard becomes unresponsive. This happens due to glitch in the metrics server.
-The current solution is to reset the metrics server. Use the following steps.
-
-1. The server is hosted in a GCP VM instance. Check if the VM instance is down by trying to SSH
-   into it from the GCP console. The name of the VM is ```metrics-solana-com```.
-2. If the VM is inaccessible, reset it from the GCP console.
-3. Once VM is up (or, was already up), the metrics services can be restarted from build automation.
-    1. Navigate to https://buildkite.com/solana-labs/metrics-dot-solana-dot-com in your web browser
-    2. Click on ```New Build```
-    3. This will show a pop up dialog. Click on ```options``` drop down.
-    4. Type in ```FORCE_START=true``` in ```Environment Variables``` text box.
-    5. Click ```Create Build```
-    6. This will restart the metrics services, and the dashboards should be accessible afterwards.
-
-#### Debugging Testnet
-Testnet may exhibit different symptoms of failures. Primary statistics to check are
-1. Rise in Confirmation Time
-2. Nodes are not voting
-3. Panics, and OOM notifications
-
-Check the following if there are any signs of failure.
-1. Did testnet deployment fail?
-    1. View buildkite logs for the last deployment: https://buildkite.com/solana-labs/testnet-management
-    2. Use the relevant branch
-    3. If the deployment failed, look at the build logs. The build artifacts for each remote node is uploaded.
-       It's a good first step to triage from these logs.
-2. You may have to log into remote node if the deployment succeeded, but something failed during runtime.
-    1. Get the private key for the testnet deployment from ```metrics-solana-com``` GCP instance.
-    2. SSH into ```metrics-solana-com``` using GCP console and do the following.
-    ```bash
-    sudo bash
-    cd ~buildkite-agent/.ssh
-    ls
-    ```
-    3. Copy the relevant private key to your local machine
-    4. Find the public IP address of the AWS instance for the remote node using AWS console
-    5. ```ssh -i ubuntu@```
-    6. The logs are in ```~solana\solana``` folder
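
The remote-node login in step 5 of the Debugging Testnet list leaves the key file and the address blank. As a minimal sketch, assuming a hypothetical helper named `node_ssh_cmd` and placeholder values for the key path and IP address (neither appears in the patch), the command could be assembled as shown here. Only the `ubuntu` user and the `-i` flag come from the text; the helper prints the command instead of running it so it can be checked first.

```shell
# Hypothetical helper (not part of the repository): prints the ssh command
# for a remote testnet node without executing it.
node_ssh_cmd() {
  key_file="$1"   # assumed: path to the private key copied from metrics-solana-com
  node_ip="$2"    # assumed: public IP of the node, looked up in the AWS console
  if [ -z "$key_file" ] || [ -z "$node_ip" ]; then
    echo "usage: node_ssh_cmd KEY_FILE NODE_IP" >&2
    return 1
  fi
  # "ubuntu" is the login user named in the README steps.
  printf 'ssh -i %s ubuntu@%s\n' "$key_file" "$node_ip"
}

# Placeholder values: the key name is invented and 203.0.113.7 is a
# documentation-only address, not a real node.
node_ssh_cmd ./testnet-key.pem 203.0.113.7
```

Printing the command keeps the sketch free of side effects; once the key file and address are confirmed, the printed line can be pasted into a terminal.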