CKB/使用 Rust 编写 RISC-V 程序

我们将在本文中验证是否能用 Rust 编写一个 RISC-V 程序. 本文的目的是探索有关 Rust 交叉编译的功能, 并在编译 Rust 程序时为 RISC-V 提供确切的步骤和配置.

环境准备

为 Rust 安装 riscv64imac-unknown-none-elf target

$ rustup target add riscv64imac-unknown-none-elf

创建项目

创建一个名称为 rust-riscv64imac-unknown-none-elf 的 Rust 项目

$ cargo new --bin rust-riscv64imac-unknown-none-elf

为此项目指定默认编译目标. 创建 .cargo/config.toml 并写入以下内容

[build]
target = "riscv64imac-unknown-none-elf"

构建最小可行程序

将以下内容写入 src/main.rs, 这就是一个最小的可成功编译的 Rust 程序.

#![no_main]
#![no_std]

#[panic_handler]
fn panic_handler(_: &core::panic::PanicInfo) -> ! {
    loop {}
}

使用 cargo build --release 编译它, 并使用 riscv64-unknown-elf-objdump 检查编译结果

$ riscv64-unknown-elf-objdump -d target/riscv64imac-unknown-none-elf/release/rust-riscv64imac-unknown-none-elf

target/riscv64imac-unknown-none-elf/release/rust-riscv64imac-unknown-none-elf:     file format elf64-littleriscv

结果显示它是一个空的二进制文件, 内部甚至没有一条指令.

添加系统调用

我们日常使用的大多数程序都会返回退出码, 提示您程序是执行成功还是失败: 通常情况下, 0 代表成功, 任意非 0 值代表程序执行失败. 现在我们要在此 Rust 程序中增加一个功能: 它永远返回 42 退出码.

将下面的代码写入 src/main.rs, 并重新编译.

fn syscall(mut a0: u64, a1: u64, a2: u64, a3: u64, a4: u64, a5: u64, a6: u64, a7: u64) -> u64 {
    unsafe {
        core::arch::asm!(
          "ecall",
          inout("a0") a0,
          in("a1") a1,
          in("a2") a2,
          in("a3") a3,
          in("a4") a4,
          in("a5") a5,
          in("a6") a6,
          in("a7") a7
        )
    }
    a0
}

// Linux system calls for the RISC-V architecture: Exit.
// See: https://github.com/westerndigitalcorporation/RISC-V-Linux/blob/master/riscv-pk/pk/syscall.h
fn syscall_exit(code: u64) -> ! {
    syscall(code, 0, 0, 0, 0, 0, 0, 93);
    loop {}
}

#[no_mangle]
pub extern "C" fn _start() {
    syscall_exit(42);
}

如果我们使用 riscv64-unknown-elf-objdump 重新检查它, 会发现它现在已经有代码块了.

$ riscv64-unknown-elf-objdump -d target/riscv64imac-unknown-none-elf/release/rust-riscv64imac-unknown-none-elf

target/riscv64imac-unknown-none-elf/release/rust-riscv64imac-unknown-none-elf:     file format elf64-littleriscv


Disassembly of section .text:

0000000000011120 <_ZN33rust_riscv64imac_unknown_none_elf12syscall_exit17ha3fdb87b6fbf14e5E>:
   11120:       05d00893                li      a7,93
   11124:       4581                    li      a1,0
   11126:       4601                    li      a2,0
   11128:       4681                    li      a3,0
   1112a:       4701                    li      a4,0
   1112c:       4781                    li      a5,0
   1112e:       4801                    li      a6,0
   11130:       00000073                ecall
   11134:       a001                    j       11134

0000000000011136 <_start>:
   11136:       02a00513                li      a0,42
   1113a:       00000097                auipc   ra,0x0
   1113e:       fe6080e7                jalr    -26(ra)

执行程序

我们可以使用 QEMU 模拟器来执行编译好的程序.

$ apt install qemu
$ qemu-riscv64 target/riscv64imac-unknown-none-elf/release/rust-riscv64imac-unknown-none-elf
$ echo $?
42

如果我们配置好相关的环境变量, 也可以直接使用 cargo run 来运行程序:

$ export CARGO_TARGET_RISCV64IMAC_UNKNOWN_NONE_ELF_RUNNER=qemu-riscv64
$ cargo run
$ echo $?
42

如果您希望省却环境变量的配置工作, 还可以选择将信息长久写入 .cargo/config.toml 配置文件中.

[target.riscv64imac-unknown-none-elf]
runner = "qemu-riscv64"

重构 panic_handler

在我们先前的最小可行程序中, panic_handler 是一个死亡循环: 这不是我们希望的. 重构它, 让它以 101 退出:

#[panic_handler]
fn panic_handler(_: &core::panic::PanicInfo) -> ! {
    // If the main thread panics it will terminate all your threads and end your program with code 101.
    // See: https://github.com/rust-lang/rust/blob/master/library/core/src/macros/panic.md
    syscall_exit(101)
}

为什么要在 panic 时让程序以 101 退出? 这其实并没有什么意义, 只是因为 Rust 目前就是这么做的: https://github.com/rust-lang/rust/blob/master/library/core/src/macros/panic.md

If the main thread panics it will terminate all your threads and end your program with code 101.

命令行参数回显

我们将继续完善现有代码. 我们尝试从命令行解析参数, 然后在标准输出回显所有接收的参数. 为了实现此功能, 我们首先实现一个新的系统调用:

// Linux system calls for the RISC-V architecture: Write.
// See: https://github.com/westerndigitalcorporation/RISC-V-Linux/blob/master/riscv-pk/pk/syscall.h
fn syscall_write(fd: u64, buf: *const u8, count: u64) -> u64 {
    // Stdin is defined to be fd 0, stdout is defined to be fd 1, and stderr is defined to be fd 2.
    syscall(fd, buf as u64, count, 0, 0, 0, 0, 64)
}

随后我们在 _start 函数中从栈中解析参数, 并将参数传递给 main 函数. 函数 main 的签名会类似于 C 语言中的 main 函数. 要阅读以下代码, 您需要一些前置知识:

#[no_mangle]
pub unsafe extern "C" fn _start() {
    core::arch::asm!(
        "lw a0,0(sp)", // Argc.
        "add a1,sp,8", // Argv.
        "li a2,0",     // Envp.
        "call main",
        "li a7, 93",
        "ecall",
    );
}

#[no_mangle]
unsafe extern "C" fn main(argc: u64, argv: *const *const i8) -> u64 {
    for i in 1..argc {
        let argn = core::ffi::CStr::from_ptr(argv.add(i as usize).read());
        let argn = argn.to_bytes();
        syscall_write(1, argn.as_ptr(), argn.len() as u64);
        if i != argc - 1 {
            syscall_write(1, [32].as_ptr(), 1);
        }
    }
    syscall_write(1, [10].as_ptr(), 1);
    return 0;
}

测试:

$ cargo run -- Hello World!
   Compiling rust-riscv64imac-unknown-none-elf v0.1.0 (/tmp/rust-riscv64imac-unknown-none-elf)
    Finished dev [unoptimized + debuginfo] target(s) in 0.09s
     Running `qemu-riscv64 target/riscv64imac-unknown-none-elf/debug/rust-riscv64imac-unknown-none-elf Hello 'World'\!''`
Hello World!

动态内存分配: 使用 Vec 和 String 简化处理流程

在 Rust 的 no-std 模式下, 由于 Vec 和 String 被定义在 std::vecstd::string, 因此您无法使用他们. 为了解决这个问题, 您首先需要定义一个 global_allocator. 将以下代码加入 Cargo.toml:

[dependencies]
linked_list_allocator = "*"

同时将以下代码写入 src/main.rs. 注意, 您可以根据硬件条件自定义 HEAPS 的大小, 在本示例中, 我仅仅为其定义了 1024 个字节.

static mut HEAPS: [u8; 1024] = [0; 1024];
#[global_allocator]
static ALLOC: linked_list_allocator::LockedHeap = linked_list_allocator::LockedHeap::empty();

alloc 包引入 VecString:

extern crate alloc;
use alloc::string::String;
use alloc::vec::Vec;

重构 main 函数. 在 main 函数的起始处, 我们首先需要初始化 ALLOC. 然后, 就可以在代码中使用 VecString 了.

#[no_mangle]
unsafe extern "C" fn main(argc: u64, argv: *const *const i8) -> u64 {
    unsafe {
        ALLOC.lock().init(HEAPS.as_mut_ptr(), 1024);
    }
    let mut args = Vec::new();
    for i in 1..argc {
        let argn = core::ffi::CStr::from_ptr(argv.add(i as usize).read());
        let argn = String::from(argn.to_string_lossy());
        args.push(argn);
    }
    let mut data = args.join(" ");
    data.push('\n');
    syscall_write(1, data.as_ptr(), data.len() as u64);
    return 0;
}

测试:

$ cargo run -- Hello World!
   Compiling rust-riscv64imac-unknown-none-elf v0.1.0 (/tmp/rust-riscv64imac-unknown-none-elf)
    Finished dev [unoptimized + debuginfo] target(s) in 0.15s
     Running `qemu-riscv64 target/riscv64imac-unknown-none-elf/debug/rust-riscv64imac-unknown-none-elf Hello 'World'\!''`
Hello World!

完整代码参考

#![no_main]
#![no_std]

extern crate alloc;
use alloc::string::String;
use alloc::vec::Vec;

static mut HEAPS: [u8; 1024] = [0; 1024];
#[global_allocator]
static ALLOC: linked_list_allocator::LockedHeap = linked_list_allocator::LockedHeap::empty();

#[panic_handler]
fn panic_handler(_: &core::panic::PanicInfo) -> ! {
    // If the main thread panics it will terminate all your threads and end your program with code 101.
    // See: https://github.com/rust-lang/rust/blob/master/library/core/src/macros/panic.md
    syscall_exit(101)
}

fn syscall(mut a0: u64, a1: u64, a2: u64, a3: u64, a4: u64, a5: u64, a6: u64, a7: u64) -> u64 {
    unsafe {
        core::arch::asm!(
          "ecall",
          inout("a0") a0,
          in("a1") a1,
          in("a2") a2,
          in("a3") a3,
          in("a4") a4,
          in("a5") a5,
          in("a6") a6,
          in("a7") a7
        )
    }
    a0
}

// Linux system calls for the RISC-V architecture: Exit.
// See: https://github.com/westerndigitalcorporation/RISC-V-Linux/blob/master/riscv-pk/pk/syscall.h
fn syscall_exit(code: u64) -> ! {
    syscall(code, 0, 0, 0, 0, 0, 0, 93);
    loop {}
}

// Linux system calls for the RISC-V architecture: Write.
// See: https://github.com/westerndigitalcorporation/RISC-V-Linux/blob/master/riscv-pk/pk/syscall.h
fn syscall_write(fd: u64, buf: *const u8, count: u64) -> u64 {
    // Stdin is defined to be fd 0, stdout is defined to be fd 1, and stderr is defined to be fd 2.
    syscall(fd, buf as u64, count, 0, 0, 0, 0, 64)
}

#[no_mangle]
pub unsafe extern "C" fn _start() {
    core::arch::asm!(
        "lw a0,0(sp)", // Argc.
        "add a1,sp,8", // Argv.
        "li a2,0",     // Envp.
        "call main",
        "li a7, 93",
        "ecall",
    );
}

#[no_mangle]
unsafe extern "C" fn main(argc: u64, argv: *const *const i8) -> u64 {
    unsafe {
        ALLOC.lock().init(HEAPS.as_mut_ptr(), 1024);
    }
    let mut args = Vec::new();
    for i in 1..argc {
        let argn = core::ffi::CStr::from_ptr(argv.add(i as usize).read());
        let argn = String::from(argn.to_string_lossy());
        args.push(argn);
    }
    let mut data = args.join(" ");
    data.push('\n');
    syscall_write(1, data.as_ptr(), data.len() as u64);
    return 0;
}

完整项目托管于 Github: https://github.com/libraries/rust-riscv64imac-unknown-none-elf

参考