Proposal: log binary data for Solidity

The program_id is not needed on "Program return data: " because it always preceeded by the program invoke message, so no need to repeat the program id. Also rename this to "Program return: " since "data" is redundant.
2021-09-17 09:14:49 +01:00 · 2021-09-17 09:14:49 +01:00 · d714cf659c
parent 1fa8b6b966
commit d714cf659c
17 changed files with 432 additions and 10 deletions
--- a/Cargo.lock
+++ b/Cargo.lock
@ -5192,6 +5192,7 @@ name = "solana-program"
 version = "1.8.0"
 dependencies = [
 "assert_matches",
+ "base64 0.13.0",
 "bincode",
 "bitflags",
 "blake3 1.0.0",
--- a/docs/src/proposals/log_data.md
+++ b/docs/src/proposals/log_data.md
@ -0,0 +1,146 @@
+# Program log binary data
+
+## Problem
+
+There is no support for logging binary data in Solidity.
+
+### Events in Solidity
+
+In Solidity, events can be reported. These look like structures with zero or
+more fields, and can be emitted with specific values. For example:
+
+```
+event PaymentReceived {
+    address sender;
+    uint amount;
+}
+
+contract c {
+    function pay() public payable {
+        emit PaymentReceived(msg.sender, msg.value);
+    }
+}
+```
+
+Events are write-only from a Solidity/VM perspective and are written to
+the blocks in the tx records.
+
+Some of these fields can be marked `indexed`, which affects how the data is
+encoded. All non-indexed fields are eth abi encoded into a variable length
+byte array. All indexed fields go into so-called topics.
+
+Topics are fixed length fields of 32 bytes. There are a maximum of 4 topics;
+if a type does not always fit into 32 bytes (e.g. string types), then the topic
+is keccak256 hashed.
+
+The first topic is a keccak256 hash of the event signature, in this case
+`keccak256('PaymentReceived(address,uint)')`. The four remaining are available
+for `indexed` fields. The event may be declared `anonymous`, in which case
+the first field is not a hash of the signature, and it is permitted to have
+4 indexed fields.
+
+### Listening to events in a client
+
+The reason for the distinction between topics/indexed and regular fields is
+that it easier to filter on topics.
+
+```
+const Web3 = require('web3');
+const url = 'ws://127.0.0.1:8546';
+const web3 = new Web3(url);
+
+var options = {
+    address: '0xfbBE8f06FAda977Ea1E177da391C370EFbEE3D25',
+    topics: [
+        '0xdf50c7bb3b25f812aedef81bc334454040e7b27e27de95a79451d663013b7e17',
+        //'0x0000000000000000000000000d8a3f5e71560982fb0eb5959ecf84412be6ae3e'
+      ]
+};
+
+var subscription = web3.eth.subscribe('logs', options, function(error, result){
+    if (!error) console.log('got result');
+    else console.log(error);
+}).on("data", function(log){
+    console.log('got data', log);
+}).on("changed", function(log){
+    console.log('changed');
+});
+```
+
+In order to decode the non-indexed fields (the data), the abi of the contract
+is needed. So, the topic is first used to discover what event was used, and
+then the data can be decoded.
+
+### Ethereum Tx in block
+
+The transaction calls event logs. Here is a tx with a single event, with 3
+topics and some data.
+
+```
+{
+  "tx": {
+    "nonce": "0x2",
+    "gasPrice": "0xf224d4a00",
+    "gas": "0xc350",
+    "to": "0x6B175474E89094C44Da98b954EedeAC495271d0F",
+    "value": "0x0",
+    "input": "0xa9059cbb000000000000000000000000a12431d0b9db640034b0cdfceef9cce161e62be40000000000000000000000000000000000000000000000a030dcebbd2f4c0000",
+    "hash": "0x98a67f0a35ebc0ac068acf0885d38419c632ffa4354e96641d6d5103a7681910",
+    "blockNumber": "0xc96431",
+    "from": "0x82f890D638478d211eF2208f3c1466B5Abf83551",
+    "transactionIndex": "0xe1"
+  },
+  "receipt": {
+    "gasUsed": "0x74d2",
+    "status": "0x1",
+    "logs": [
+      {
+        "address": "0x6B175474E89094C44Da98b954EedeAC495271d0F",
+        "topics": [
+          "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef",
+          "0x00000000000000000000000082f890d638478d211ef2208f3c1466b5abf83551",
+          "0x000000000000000000000000a12431d0b9db640034b0cdfceef9cce161e62be4"
+        ],
+        "data": "0x0000000000000000000000000000000000000000000000a030dcebbd2f4c0000"
+      }
+    ]
+  }
+}
+```
+
+### Further considerations
+
+In Ethereum, events are stored in blocks. Events mark certain state changes in
+smart contracts. This serves two purposes:
+
+ - Listen to events (i.e. state changes) as they happen by reading new blocks
+   as they are published
+ - Re-read historical events by reading old blocks
+
+So for example, smart contracts may emit changes as they happen but never the
+complete state, so the only way to recover the entire state of the contract
+is by re-reading all events from the chain. So an application will read events
+from block 1 or whatever block the application was deployed at and then use
+that state for local processing. This is a local cache and may re-populated
+from the chain at any point.
+
+## Proposed Solution
+
+Binary logging should be added to the program log. The program log should include the base64 encoded data (zero or more one permitted).
+
+So if we encoding the topics first, followed by the data then the event in the
+tx above would look like:
+```
+program data: 3fJSrRviyJtpwrBo/DeNqpUrpFjxKEWKPVaTfUjs8AAAAAAAAAAAAAAACC+JDWOEeNIR7yII88FGa1q/g1UQAAAAAAAAAAAAAAAKEkMdC522QANLDN/O75zOFh5ivk AAAAAAAAAAAAAAAAAAAAAAAAAAAAAACgMNzrvS9MAAA=
+```
+
+This requires a new system call:
+
+```
+void sol_log_data(SolBytes *fields, uint64_t length);
+```
+
+### Considerations
+
+- Should there be text field in the program log so we can have a little bit of
+  metadata on the binary data, to make it more human readable
--- a/programs/bpf/Cargo.lock
+++ b/programs/bpf/Cargo.lock
@ -2645,6 +2645,13 @@ dependencies = [
 "solana-program 1.8.0",
 ]

+[[package]]
+name = "solana-bpf-rust-log-data"
+version = "1.8.0"
+dependencies = [
+ "solana-program 1.8.0",
+]
+
 [[package]]
 name = "solana-bpf-rust-many-args"
 version = "1.8.0"
@ -3111,6 +3118,7 @@ dependencies = [
 name = "solana-program"
 version = "1.8.0"
 dependencies = [
+ "base64 0.13.0",
 "bincode",
 "bitflags",
 "blake3 1.0.0",
--- a/programs/bpf/Cargo.toml
+++ b/programs/bpf/Cargo.toml
@ -52,6 +52,7 @@ members = [
    "rust/deprecated_loader",
    "rust/dup_accounts",
    "rust/error_handling",
+    "rust/log_data",
    "rust/external_spend",
    "rust/finalize",
    "rust/instruction_introspection",
--- a/programs/bpf/build.rs
+++ b/programs/bpf/build.rs
@ -67,6 +67,7 @@ fn main() {
            "deprecated_loader",
            "dup_accounts",
            "error_handling",
+            "log_data",
            "external_spend",
            "finalize",
            "instruction_introspection",
--- a/programs/bpf/c/src/log_data/log_data.c
+++ b/programs/bpf/c/src/log_data/log_data.c
@ -0,0 +1,28 @@
+/**
+ * @brief Example C-based BPF program uses sol_log_data
+ */
+#include <solana_sdk.h>
+
+static const uint8_t return_data[] = { 0x08, 0x01, 0x44 };
+
+extern uint64_t entrypoint(const uint8_t *input) {
+  SolAccountInfo ka[1];
+  SolParameters params = (SolParameters) { .ka = ka };
+  SolBytes fields[2];
+
+  if (!sol_deserialize(input, &params, SOL_ARRAY_SIZE(ka))) {
+    return ERROR_INVALID_ARGUMENT;
+  }
+
+  // Generate two fields, split at the first 0 in the input
+  fields[0].addr = params.data;
+  fields[0].len = sol_strlen((char*)fields[0].addr);
+  fields[1].addr = fields[0].addr + fields[0].len + 1;
+  fields[1].len = params.data_len - fields[0].len - 1;
+
+  sol_set_return_data(return_data, sizeof(return_data));
+
+  sol_log_data(fields, 2);
+
+  return SUCCESS;
+}
--- a/programs/bpf/rust/log_data/Cargo.toml
+++ b/programs/bpf/rust/log_data/Cargo.toml
@ -0,0 +1,23 @@
+[package]
+name = "solana-bpf-rust-log-data"
+version = "1.8.0"
+description = "Solana BPF test program written in Rust"
+authors = ["Solana Maintainers <maintainers@solana.foundation>"]
+repository = "https://github.com/solana-labs/solana"
+license = "Apache-2.0"
+homepage = "https://solana.com/"
+documentation = "https://docs.rs/solana-bpf-rust-log-data"
+edition = "2018"
+
+[dependencies]
+solana-program = { path = "../../../../sdk/program", version = "=1.8.0" }
+
+[features]
+default = ["program"]
+program = []
+
+[lib]
+crate-type = ["lib", "cdylib"]
+
+[package.metadata.docs.rs]
+targets = ["x86_64-unknown-linux-gnu"]
--- a/programs/bpf/rust/log_data/src/lib.rs
+++ b/programs/bpf/rust/log_data/src/lib.rs
@ -0,0 +1,24 @@
+//! @brief Example Rust-based BPF program that uses sol_log_data syscall
+
+#![cfg(feature = "program")]
+
+use solana_program::{
+    account_info::AccountInfo, entrypoint, entrypoint::ProgramResult, log::sol_log_data,
+    program::set_return_data, pubkey::Pubkey,
+};
+
+entrypoint!(process_instruction);
+#[allow(clippy::cognitive_complexity)]
+fn process_instruction(
+    _program_id: &Pubkey,
+    _accounts: &[AccountInfo],
+    instruction_data: &[u8],
+) -> ProgramResult {
+    let fields: Vec<&[u8]> = instruction_data.split(|e| *e == 0).collect();
+
+    set_return_data(&[0x08, 0x01, 0x44]);
+
+    sol_log_data(&fields);
+
+    Ok(())
+}
--- a/programs/bpf/tests/programs.rs
+++ b/programs/bpf/tests/programs.rs
@ -52,8 +52,8 @@ use solana_transaction_status::{
    TransactionStatusMeta, TransactionWithStatusMeta, UiTransactionEncoding,
 };
 use std::{
-    cell::RefCell, collections::HashMap, env, fs::File, io::Read, path::PathBuf, str::FromStr,
-    sync::Arc,
+    cell::RefCell, collections::HashMap, convert::TryInto, env, fs::File, io::Read, path::PathBuf,
+    str::FromStr, sync::Arc,
 };

 /// BPF program file extension
@ -740,6 +740,58 @@ fn test_program_bpf_error_handling() {
    }
 }

+#[test]
+#[cfg(any(feature = "bpf_c", feature = "bpf_rust"))]
+fn test_return_data_and_log_data_syscall() {
+    solana_logger::setup();
+
+    let mut programs = Vec::new();
+    #[cfg(feature = "bpf_c")]
+    {
+        programs.extend_from_slice(&[("log_data")]);
+    }
+    #[cfg(feature = "bpf_rust")]
+    {
+        programs.extend_from_slice(&[("solana_bpf_rust_log_data")]);
+    }
+
+    for program in programs.iter() {
+        let GenesisConfigInfo {
+            genesis_config,
+            mint_keypair,
+            ..
+        } = create_genesis_config(50);
+        let mut bank = Bank::new_for_tests(&genesis_config);
+        let (name, id, entrypoint) = solana_bpf_loader_program!();
+        bank.add_builtin(&name, id, entrypoint);
+        let bank = Arc::new(bank);
+        let bank_client = BankClient::new_shared(&bank);
+
+        let program_id = load_bpf_program(&bank_client, &bpf_loader::id(), &mint_keypair, program);
+
+        bank.freeze();
+
+        let account_metas = vec![AccountMeta::new(mint_keypair.pubkey(), true)];
+        let instruction =
+            Instruction::new_with_bytes(program_id, &[1, 2, 3, 0, 4, 5, 6], account_metas);
+
+        let blockhash = bank.last_blockhash();
+        let message = Message::new(&[instruction], Some(&mint_keypair.pubkey()));
+        let transaction = Transaction::new(&[&mint_keypair], message, blockhash);
+
+        let result = bank.simulate_transaction(transaction.try_into().unwrap());
+
+        assert!(result.result.is_ok());
+
+        assert_eq!(result.logs[1], "Program data: AQID BAUG");
+
+        assert_eq!(
+            result.logs[3],
+            format!("Program return: {} CAFE", program_id)
+        );
+    }
+}
+
 #[test]
 fn test_program_bpf_invoke_sanity() {
    solana_logger::setup();
@ -1091,7 +1143,7 @@ fn test_program_bpf_invoke_sanity() {
            result.unwrap_err(),
            TransactionError::InstructionError(0, InstructionError::ProgramFailedToComplete)
        );
-     }
+    }
 }

 #[cfg(feature = "bpf_rust")]
--- a/programs/bpf_loader/src/lib.rs
+++ b/programs/bpf_loader/src/lib.rs
@ -939,7 +939,7 @@ impl Executor for BpfExecutor {
            drop(vm);
            let return_data = invoke_context.get_return_data();
            if let Some((program_id, return_data)) = return_data {
-                stable_log::program_return_data(&logger, program_id, return_data);
+                stable_log::program_return(&logger, program_id, return_data);
            }
            match result {
                Ok(status) => {
--- a/programs/bpf_loader/src/syscalls.rs
+++ b/programs/bpf_loader/src/syscalls.rs
@ -22,7 +22,7 @@ use solana_sdk::{
        allow_native_ids, blake3_syscall_enabled, check_seed_length,
        close_upgradeable_program_accounts, demote_program_write_locks, disable_fees_sysvar,
        libsecp256k1_0_5_upgrade_enabled, mem_overlap_fix, return_data_syscall_enabled,
-        secp256k1_recover_syscall_enabled,
+        secp256k1_recover_syscall_enabled, sol_log_data_syscall_enabled,
    },
    hash::{Hasher, HASH_BYTES},
    ic_msg,
@ -183,6 +183,11 @@ pub fn register_syscalls(
            .register_syscall_by_name(b"sol_get_return_data", SyscallGetReturnData::call)?;
    }

+    // Log data
+    if invoke_context.is_feature_active(&sol_log_data_syscall_enabled::id()) {
+        syscall_registry.register_syscall_by_name(b"sol_log_data", SyscallLogData::call)?;
+    }
+
    Ok(syscall_registry)
 }

@ -357,6 +362,8 @@ pub fn bind_syscall_context_objects<'a>(
        !invoke_context.is_feature_active(&disable_fees_sysvar::id());
    let is_return_data_syscall_active =
        invoke_context.is_feature_active(&return_data_syscall_enabled::id());
+    let is_sol_log_data_syscall_active =
+        invoke_context.is_feature_active(&sol_log_data_syscall_enabled::id());

    let invoke_context = Rc::new(RefCell::new(invoke_context));

@ -409,6 +416,16 @@ pub fn bind_syscall_context_objects<'a>(
        }),
    );

+    // sol_log_data
+    bind_feature_gated_syscall_context_object!(
+        vm,
+        is_sol_log_data_syscall_active,
+        Box::new(SyscallLogData {
+            invoke_context: invoke_context.clone(),
+            loader_id,
+        }),
+    );
+
    // Cross-program invocation syscalls
    vm.bind_syscall_context_object(
        Box::new(SyscallInvokeSignedC {
@ -2381,6 +2398,72 @@ impl<'a> SyscallObject<BpfError> for SyscallGetReturnData<'a> {
    }
 }

+// Log data handling
+pub struct SyscallLogData<'a> {
+    invoke_context: Rc<RefCell<&'a mut dyn InvokeContext>>,
+    loader_id: &'a Pubkey,
+}
+impl<'a> SyscallObject<BpfError> for SyscallLogData<'a> {
+    fn call(
+        &mut self,
+        addr: u64,
+        len: u64,
+        _arg3: u64,
+        _arg4: u64,
+        _arg5: u64,
+        memory_mapping: &MemoryMapping,
+        result: &mut Result<u64, EbpfError<BpfError>>,
+    ) {
+        let invoke_context = question_mark!(
+            self.invoke_context
+                .try_borrow()
+                .map_err(|_| SyscallError::InvokeContextBorrowFailed),
+            result
+        );
+
+        let budget = invoke_context.get_compute_budget();
+
+        question_mark!(
+            invoke_context
+                .get_compute_meter()
+                .consume(budget.syscall_base_cost),
+            result
+        );
+
+        let untranslated_fields = question_mark!(
+            translate_slice::<&[u8]>(memory_mapping, addr, len, self.loader_id),
+            result
+        );
+
+        question_mark!(
+            invoke_context
+                .get_compute_meter()
+                .consume(untranslated_fields.iter().map(|e| e.len() as u64).sum()),
+            result
+        );
+
+        let mut fields = Vec::with_capacity(untranslated_fields.len());
+
+        for untranslated_field in untranslated_fields {
+            fields.push(question_mark!(
+                translate_slice::<u8>(
+                    memory_mapping,
+                    untranslated_field.as_ptr() as *const _ as u64,
+                    untranslated_field.len() as u64,
+                    self.loader_id,
+                ),
+                result
+            ));
+        }
+
+        let logger = invoke_context.get_logger();
+
+        stable_log::program_data(&logger, &fields);
+
+        *result = Ok(0);
+    }
+}
+
 #[cfg(test)]
 mod tests {
    use super::*;
--- a/sdk/bpf/c/inc/sol/log.h
+++ b/sdk/bpf/c/inc/sol/log.h
@ -40,6 +40,11 @@ static void sol_log_array(const uint8_t *array, int len) {
  }
 }

+/**
+ * Print the base64 representation of some arrays.
+ */
+void sol_log_data(SolBytes *fields, uint64_t fields_len);
+
 /**
 * Prints the program's input parameters
 *
--- a/sdk/program/Cargo.toml
+++ b/sdk/program/Cargo.toml
@ -36,10 +36,12 @@ thiserror = "1.0"

 [target.'cfg(not(target_arch = "bpf"))'.dependencies]
 bitflags = "1.3.1"
+base64 = "0.13"
 curve25519-dalek = "3.0.0"
 libsecp256k1 = "0.6.0"
 rand = "0.7.0"
 solana-logger = { path = "../../logger", version = "=1.8.0" }
+itertools = "0.10.1"

 [dev-dependencies]
 static_assertions = "1.1.0"
--- a/sdk/program/src/log.rs
+++ b/sdk/program/src/log.rs
@ -85,6 +85,23 @@ extern "C" {
    fn sol_log_64_(arg1: u64, arg2: u64, arg3: u64, arg4: u64, arg5: u64);
 }

+/// Print some slices as base64
+///
+/// @param data - The slices to print
+pub fn sol_log_data(data: &[&[u8]]) {
+    #[cfg(target_arch = "bpf")]
+    {
+        extern "C" {
+            fn sol_log_data(data: *const u8, data_len: u64);
+        }
+
+        unsafe { sol_log_data(data as *const _ as *const u8, data.len() as u64) };
+    }
+
+    #[cfg(not(target_arch = "bpf"))]
+    crate::program_stubs::sol_log_data(data);
+}
+
 /// Print the hexadecimal representation of a slice
 ///
 /// @param slice - The array to print
--- a/sdk/program/src/program_stubs.rs
+++ b/sdk/program/src/program_stubs.rs
@ -6,6 +6,7 @@ use crate::{
    account_info::AccountInfo, entrypoint::ProgramResult, instruction::Instruction,
    program_error::UNSUPPORTED_SYSVAR, pubkey::Pubkey,
 };
+use itertools::Itertools;
 use std::sync::{Arc, RwLock};

 lazy_static::lazy_static! {
@ -51,7 +52,7 @@ pub trait SyscallStubs: Sync + Send {
    unsafe fn sol_memcpy(&self, dst: *mut u8, src: *const u8, n: usize) {
        // cannot be overlapping
        if dst as usize + n > src as usize && src as usize > dst as usize {
-            panic!("memcpy does not support oveerlapping regions");
+            panic!("memcpy does not support overlapping regions");
        }
        std::ptr::copy_nonoverlapping(src, dst, n as usize);
    }
@ -84,6 +85,9 @@ pub trait SyscallStubs: Sync + Send {
        None
    }
    fn sol_set_return_data(&mut self, _data: &[u8]) {}
+    fn sol_log_data(&self, fields: &[&[u8]]) {
+        println!("data: {}", fields.iter().map(base64::encode).join(" "));
+    }
 }

 struct DefaultSyscallStubs {}
@ -165,3 +169,7 @@ pub(crate) fn sol_get_return_data() -> Option<(Pubkey, Vec<u8>)> {
 pub(crate) fn sol_set_return_data(data: &[u8]) {
    SYSCALL_STUBS.write().unwrap().sol_set_return_data(data)
 }
+
+pub(crate) fn sol_log_data(data: &[&[u8]]) {
+    SYSCALL_STUBS.read().unwrap().sol_log_data(data)
+}
--- a/sdk/src/feature_set.rs
+++ b/sdk/src/feature_set.rs
@ -211,6 +211,10 @@ pub mod reduce_required_deploy_balance {
    solana_sdk::declare_id!("EBeznQDjcPG8491sFsKZYBi5S5jTVXMpAKNDJMQPS2kq");
 }

+pub mod sol_log_data_syscall_enabled {
+    solana_sdk::declare_id!("HYPs7jyJ3KwQFdDpuSzMtVKf1MLJDaZRv3CSWvfUqdFo");
+}
+
 lazy_static! {
    /// Map of feature identifiers to user-visible description
    pub static ref FEATURE_NAMES: HashMap<Pubkey, &'static str> = [
@ -259,6 +263,7 @@ lazy_static! {
        (return_data_syscall_enabled::id(), "enable sol_{set,get}_return_data syscall"),
        (fix_write_privs::id(), "fix native invoke write privileges"),
        (reduce_required_deploy_balance::id(), "reduce required payer balance for program deploys"),
+        (sol_log_data_syscall_enabled::id(), "enable sol_log_data syscall"),
        /*************** ADD NEW FEATURES HERE ***************/
    ]
    .iter()
--- a/sdk/src/process_instruction.rs
+++ b/sdk/src/process_instruction.rs
@ -1,5 +1,6 @@
 #![cfg(feature = "full")]

+use itertools::Itertools;
 use solana_sdk::{
    account::AccountSharedData,
    compute_budget::ComputeBudget,
@ -317,20 +318,37 @@ pub mod stable_log {
        ic_logger_msg!(logger, "Program log: {}", message);
    }

+    /// Emit a program data.
+    ///
+    /// The general form is:
+    ///
+    /// ```notrust
+    /// "Program data: <binary-data-in-base64>*"
+    /// ```
+    ///
+    /// That is, any program-generated output is guaranteed to be prefixed by "Program data: "
+    pub fn program_data(logger: &Rc<RefCell<dyn Logger>>, data: &[&[u8]]) {
+        ic_logger_msg!(
+            logger,
+            "Program data: {}",
+            data.iter().map(base64::encode).join(" ")
+        );
+    }
+
    /// Log return data as from the program itself. This line will not be present if no return
    /// data was set, or if the return data was set to zero length.
    ///
    /// The general form is:
    ///
    /// ```notrust
-    /// "Program return data: <program-id> <program-generated-data-in-base64>"
+    /// "Program return: <program-id> <program-generated-data-in-base64>"
    /// ```
    ///
-    /// That is, any program-generated output is guaranteed to be prefixed by "Program return data: "
-    pub fn program_return_data(logger: &Rc<RefCell<dyn Logger>>, program_id: &Pubkey, data: &[u8]) {
+    /// That is, any program-generated output is guaranteed to be prefixed by "Program return: "
+    pub fn program_return(logger: &Rc<RefCell<dyn Logger>>, program_id: &Pubkey, data: &[u8]) {
        ic_logger_msg!(
            logger,
-            "Program return data: {} {}",
+            "Program return: {} {}",
            program_id,
            base64::encode(data)
        );