Articles in the Rust category

Simulating C++ Class Features in Rust, Part 6: Macro Hygiene

Author: 立夏
Date: 2022-10-09
Categories: C++, Rust

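In the previous part we defined and initialized the vtable; next we implement the virtual functions. Let's review our class definition and the code it expands to: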

#[class]
pub struct Base
{
    ...
    virtual fn func1(&self) -> i32 { this.x }
    virtual fn func2(&self, i: i32) -> i32 { this.y + i }
}
// The vtable and virtual function implementations after expansion
#[repr(C)]
pub struct BaseVTable
{
    func1: fn(this: &Base) -> i32,
    func2: fn(this: &Base, i: i32) -> i32,
}
impl Base
{
    ...
    fn func1_impl(this: &Base) -> i32 { this.data.x }
    fn func2_impl(this: &Base, i: i32) -> i32 { this.data.y + i }
    pub fn func1(&self) -> i32 { (self.vptr.func1)(self) }
    pub fn func2(&self, i: i32) -> i32 { (self.vptr.func2)(self, i) }
}

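Notice that the &self parameter in the class definition's function prototypes becomes this: &Base in the vtable after expansion. If we kept the self keyword, it would be interpreted as the BaseVTable type inside the vtable struct. In the implementation, func1_impl is the name of the method that holds the body, and func1 performs the call through the vtable.
Earlier, to generate the vtable, we defined the macros func1_type and func2_type, which return the function prototypes. Implementing the functions also needs those prototypes, so we tried to reuse the macros, as follows: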

#[macro_export] macro_rules! func1_type
{
    ($($name:ident $block:block)?) =>
    { fn $($name)? (this: &Base) -> i32 $($block)? };
}
#[macro_export] macro_rules! func2_type
{
    ($($name:ident $block:block)?) =>
    { fn $($name)? (this: &Base, i: i32) -> i32 $($block)? };
}

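Invoked with no arguments, the macro yields the prototype used in the vtable; invoked with a function name and a block, it generates the function implementation: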

func1_type!(func1_impl { this.data.x });
func2_type!(func2_impl { this.data.y + i });

This looks perfect, and expanding it with cargo expand gives the expected result. But the compiler disagrees:

error[E0425]: cannot find value `this` in this scope
  --> class_impl/src/lib.rs:50:30
   |
50 |     func1_type!(func1_impl { this.data.x });
   |                              ^^^^ not found in this scope
error[E0425]: cannot find value `this` in this scope
  --> class_impl/src/lib.rs:52:30
   |
52 |     func2_type!(func2_impl { this.data.y + i });
   |                              ^^^^ not found in this scope
error[E0425]: cannot find value `i` in this scope
  --> class_impl/src/lib.rs:52:44
   |
52 |     func2_type!(func2_impl { this.data.y + i });
   |                                            ^ not found in this scope

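The reason is that identifiers introduced by a Rust macro carry an invisible scope of their own: variables generated by a macro do not pollute the context at the expansion site. This is the biggest difference between Rust macros and C++ macros.
Take func2_impl as an example: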

fn                               // from the macro expansion
func2_impl                       // macro argument
(this: &Base, i: i32) -> i32     // from the macro expansion
{ this.data.y + i }              // macro argument

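The variables this and i in the function prototype come from the macro expansion, whereas the this and i used in the function body are passed in as macro arguments. They live in different scopes, so the two this identifiers are not the same variable, and neither are the two i identifiers.
This is one way Rust's safety shows through: we need not worry that a variable generated by a macro will accidentally shadow a variable we are using and cause unexpected behavior. In Rust this behavior is called hygiene.
Sometimes, though, we do need a macro to introduce variables, as with our func2_type macro, which should generate a function that compiles. There is a way: capture the variable names explicitly, as follows: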

macro_rules! func1_type
{
    () => { fn (this: &Base) -> i32 };
    ($name:ident $this:ident $block:block) =>
    { fn $name ($this: &Base) -> i32 $block };
}
macro_rules! func2_type
{
    () => { fn (this: &Base, i: i32) -> i32 };
    ($name:ident $this:ident $i:ident $block:block) =>
    { fn $name ($this: &Base, $i: i32) -> i32 $block };
}

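To generate a complete function we added new captured identifiers, but when generating only the prototype we do not want those captures complicating the call. That forces us to split the macro into two rules. Now we can generate the complete functions like this: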

func1_type!(func1_impl this { this.data.x });
func2_type!(func2_impl this i { this.data.y + i });
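The two behaviors discussed here can be reproduced in isolation. The sketch below is only an illustration: add_hundred, hygiene_demo, def_binary_fn, and add_twice are hypothetical names that do not appear in the class macros. It shows a macro-local variable that does not clash with the caller's variable of the same name, and a function whose parameter names are supplied by the caller so that the body can see them.

```rust
// Hypothetical macro: the `x` declared here belongs to the macro's own
// syntax context and is invisible to code written at the call site.
macro_rules! add_hundred {
    ($e:expr) => {{
        let x = 100; // does not shadow the caller's `x`
        $e + x
    }};
}

pub fn hygiene_demo() -> i32 {
    let x = 1;
    // `x * 2` uses the caller's `x`; the `+ x` inside the macro uses 100.
    add_hundred!(x * 2)
}

// Hypothetical macro: the parameter names are captured explicitly, so the
// names in the signature and in the body come from the same call site.
macro_rules! def_binary_fn {
    ($name:ident $a:ident $b:ident $body:block) => {
        pub fn $name($a: i32, $b: i32) -> i32 $body
    };
}

def_binary_fn!(add_twice a b { a + 2 * b });
```

Calling hygiene_demo() evaluates 1 * 2 + 100, and add_twice(1, 2) evaluates 1 + 2 * 2.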

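Macro hygiene reflects Rust's safety and lets us write safer macros. It occasionally gets in the way, but fortunately Rust offers a workaround.
However, we wanted to reuse the existing macros to simplify the code, and the result is now more complicated than before. That defeats the purpose, so when implementing virtual functions we will simply generate the prototypes again. But when generating prototypes for methods overridden in a derived class, we hit a problem. Recall that the derived class Derive2 overrides the two methods func2 and func3: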

#[class]
pub struct Derive2 : Derive1
{
    // Something looks wrong here?
    override fn func2(&self, s: &str) -> Vec<i32> { ... }
    override fn func3(&self, f: f64) -> (i32, &str) { ... }
}

So what type should this have in the prototypes of these two methods? Intuitively it should follow the derived class's type, as follows:

// Function prototypes in the vtable definition:
func2: fn(this: &Derive2, s: &str) -> Vec<i32>,
func3: fn(this: &Derive2, f: f64) -> (i32, &str),
// Function implementations:
fn func2_impl(this: &Derive2, s: &str) -> Vec<i32> { ... }
fn func3_impl(this: &Derive2, f: f64) -> (i32, &str) { ... }

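This approach has two problems:

1. What type should func1, which we did not override, have? If it also follows Derive2, it cannot be assigned directly from Derive1::VTABLE::func1, because the types differ. To keep the vtable working we would have to generate extra code, which adds unnecessary overhead;
2. If an overriding method gets the prototype wrong, as above, we cannot detect it, and the generated code still compiles. Runtime safety, however, is broken.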

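So I decided that a virtual function's type is fixed when it is first defined, that is, where it is marked with the virtual keyword. Derive2's vtable and implementations should then look like this: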

// Function prototypes in the vtable definition:
func2: fn(this: &Base, i: i32) -> i32,
func3: fn(this: &Derive1) -> i32,
// Function implementations:
fn func2_impl(this: &Base, s: &str) -> Vec<i32> { ... }
fn func3_impl(this: &Derive1, f: f64) -> (i32, &str) { ... }

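The compiler can now easily see that the types of func2_impl and func2 disagree and refuse to compile, which protects the runtime safety of the generated code. But the derived class does not have enough information to know what type the this parameter of the two methods should be. That is why we wanted to reuse the xxx_type macros, yet as we just saw, reusing them as they were did not make code generation easier and would make prototype errors hard to detect. This time we take a different approach: all we need to know is the type of this, as follows: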

macro_rules! func1_type
{
    () => { fn (this: &Base) -> i32 };
    (this) => { Base };
}
macro_rules! func2_type
{
    () => { fn (this: &Base, i: i32) -> i32 };
    (this) => { Base };
}
macro_rules! func3_type
{
    () => { fn (this: &Derive1) -> i32 };
    (this) => { Derive1 };
}

We then implement the overriding virtual functions like this:

// Function prototypes in the vtable definition:
func2: func2_type!(),
func3: func3_type!(),
// Function implementations:
fn func2_impl(this: &func2_type!(this), s: &str) -> Vec<i32> { ... }
fn func3_impl(this: &func3_type!(this), f: f64) -> (i32, &str) { ... }

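Because that careless programmer wrote the prototypes wrong, the compiler reports a type mismatch. Error messages from macro-expanded code are not very readable, but that is still better than no error at all.
Seeing the error, the programmer corrects the prototypes: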

#[class]
pub struct Derive2 : Derive1
{
    w: i32,
    override fn func2(&self, i: i32) -> i32 { self.w + ... }
    override fn func3(&self) -> i32 { self.w + ... }
}

Now we start implementing the functions:

fn func2_impl(this: &func2_type!(this), i: i32) -> i32
{
    this.data.w + ...
}
fn func3_impl(this: &func3_type!(this)) -> i32
{
    this.data.w + ...
}
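The type check that protects the vtable can be tried out in a reduced form. The following is a minimal sketch under simplifying assumptions: Base and Derive1 are plain structs without a vptr or a data field, the two implementations are free functions, and DERIVE2_VTABLE is a hypothetical constant. Only the shape of the func2_type and func3_type macros follows the text.

```rust
// Simplified stand-ins for the real classes (no vptr, no `data` field).
pub struct Base { pub y: i32 }
pub struct Derive1 { pub base: Base, pub z: i32 }

// Each macro yields the full prototype with no arguments and the type of
// `this` with the `this` argument.
macro_rules! func2_type {
    () => { fn(this: &Base, i: i32) -> i32 };
    (this) => { Base };
}
macro_rules! func3_type {
    () => { fn(this: &Derive1) -> i32 };
    (this) => { Derive1 };
}

pub struct Derive2VTable {
    pub func2: func2_type!(),
    pub func3: func3_type!(),
}

// Only the type of `this` comes from the macro; the remaining parameters and
// the return type are written out by hand.
fn func2_impl(this: &func2_type!(this), i: i32) -> i32 { this.y + i + 200 }
fn func3_impl(this: &func3_type!(this)) -> i32 { this.z + 200 }

// If func2_impl were declared as `(this: &Base, s: &str) -> Vec<i32>`, this
// initializer would be rejected with a mismatched-types error.
pub const DERIVE2_VTABLE: Derive2VTable = Derive2VTable {
    func2: func2_impl,
    func3: func3_impl,
};
```

The check happens where the function is stored into the vtable field, so a wrong prototype is reported at vtable initialization rather than at the function definition.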

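We have just solved vtable initialization and the prototype mismatch, and a new problem appears.
The overriding implementations access w, a data member of Derive2. A derived class accessing its own data members would normally be no problem, but as described above, this in func2_impl has type &Base and this in func3_impl has type &Derive1, and neither can reach any of Derive2's data.
We know that both of these this values really refer to a Derive2; they are typed as the base class only to keep the prototypes compatible. That makes the fix straightforward.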

fn func2_impl(this: &func2_type!(this), i: i32) -> i32
{
    let this: &Self = unsafe { reinterpret_cast(this) }; 
    this.data.w + ...
}
fn func3_impl(this: &func3_type!(this)) -> i32
{
    let this: &Self = unsafe { reinterpret_cast(this) }; 
    this.data.w + ...
}

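We implemented an unconditional cast function earlier, and it fits perfectly here. The reinterpret_cast conversion is also zero-cost, so there is no extra overhead to worry about.
The function implementations still look incomplete; in the next part we fill in the ... parts.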

Simulating C++ Class Features in Rust, Part 5: The Macro Callback Pattern

Author: 立夏
Date: 2022-09-02
Categories: C++, Rust

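In the previous part we wanted to pass the expansion of one macro as an argument to another macro, but the compiler stopped us. There are never any shortcuts in macro programming; in that respect Rust and C++ are alike.
Since Rust cannot pass a macro's expansion as an argument to another macro, why not invoke the other macro from inside the macro?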

macro_rules! base_vtable_fields
{
    () => { define_struct!(Base func1 func2); };
}
macro_rules! derive1_vtable_fields
{
    () => { define_struct!(Derive1 func1 func2 func3); };
}

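That brings us back to square one: the derived class does not know which virtual methods the base class has, so derive1_vtable_fields has to be implemented by invoking base_vtable_fields. The macros therefore end up defined as shown below. Because their usage has changed, their names change to match; $name carries the struct name, and $field lets a derived class add struct members.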

macro_rules! base_define_vtable
{
    ($name: ident $($field: ident)*) =>
    { define_struct!($name func1 func2 $($field)*); };
}
macro_rules! derive1_define_vtable
{
    ($name: ident $($field: ident)*) =>
    { base_define_vtable!($name func3 $($field)*); };
}
macro_rules! derive2_define_vtable
{
    ($name: ident $($field: ident)*) =>
    { derive1_define_vtable!($name $($field)*); };
}
base_define_vtable!(BaseVTable);
derive1_define_vtable!(Derive1VTable);
type Derive2VTable = Derive1VTable;

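Because Derive2 defines no new virtual functions, its vtable is identical to Derive1's, so Derive2 simply reuses Derive1's vtable. The derive2_define_vtable macro is still required, because classes derived from Derive2 need it.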

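Next we need to solve vtable initialization. It is somewhat more involved, with three cases to consider:
1. virtual methods;
2. override methods;
3. methods defined in a base class and not overridden in the derived class.
We can define the init_vtable macro like this: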

macro_rules! init_vtable
{
    ($name:ident $(: $base:ident)?, $($base_vfns:ident)*, $($new_vfns:ident)*, $($over_vfns:ident)*) => {...};
}

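Here $name is the class name, $base is the optional base class name, $base_vfns is the list of the base class's virtual functions, $new_vfns is the list of virtual functions the derived class adds, and $over_vfns is the list of virtual functions the derived class overrides. Before the actual initialization, some basic checks are needed:
1. If there is no base class, the list of base class virtual functions must be empty;
2. If there is a base class, the list of base class virtual functions must not be empty;
3. A virtual function added by the derived class must not share a name with a base class virtual function; if it does, the user is asked to use the override keyword instead;
4. A virtual function overridden by the derived class must exist in the base class's virtual function list; if it does not, the user is asked to use the virtual keyword instead.
We do not yet check the signatures of overriding methods, and cannot at this point. That is not a concern, though: if a signature does not match, the compiler reports an error.
After these checks we iterate over the base class's virtual function list: an overridden function is initialized with a pointer to the overriding function, otherwise with the entry from the base class's vtable. Then we iterate over the list of new virtual functions and initialize each with a pointer to its implementation.
As you have probably noticed, a declarative macro cannot do this; it takes a function-like procedural macro. For reasons of space the code is not shown here.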
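Since the procedural macro itself is not shown, the checks and the initialization order described above can at least be sketched as an ordinary function over name lists. This is not the author's macro: plan_vtable is a hypothetical helper, and it returns each vtable slot with the initializer expression as a string instead of emitting tokens.

```rust
/// Hypothetical helper mirroring the four checks and the two passes.
/// Returns (slot name, initializer expression) pairs in vtable order.
pub fn plan_vtable(
    name: &str,
    base: Option<&str>,
    base_vfns: &[&str],
    new_vfns: &[&str],
    over_vfns: &[&str],
) -> Result<Vec<(String, String)>, String> {
    // Checks 1 and 2: the base class and its function list go together.
    match base {
        None if !base_vfns.is_empty() => {
            return Err(format!("{name} has no base class but lists base virtual functions"));
        }
        Some(b) if base_vfns.is_empty() => {
            return Err(format!("{name} derives from {b}, whose virtual function list is empty"));
        }
        _ => {}
    }
    // Check 3: a new virtual function must not already exist in the base.
    for f in new_vfns {
        if base_vfns.contains(f) {
            return Err(format!("`{f}` already exists in the base class; use `override`"));
        }
    }
    // Check 4: an overridden function must exist in the base.
    for f in over_vfns {
        if !base_vfns.contains(f) {
            return Err(format!("`{f}` is not in the base class; use `virtual`"));
        }
    }
    let mut slots = Vec::new();
    // First pass: base slots, taken from the override or the base vtable.
    if let Some(b) = base {
        for f in base_vfns {
            let init = if over_vfns.contains(f) {
                format!("Self::{f}_impl")
            } else {
                format!("{b}::VTABLE.{f}")
            };
            slots.push((f.to_string(), init));
        }
    }
    // Second pass: slots for the newly introduced virtual functions.
    for f in new_vfns {
        slots.push((f.to_string(), format!("Self::{f}_impl")));
    }
    Ok(slots)
}
```

For Derive1, which overrides func1 and adds func3, the plan keeps the base order func1, func2 and appends func3.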
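The next question is how to pass the arguments to init_vtable. With the experience from the vtable-definition macros above, the initialization macros are not hard: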

macro_rules! base_init_vtable
{
    ($name:ident $(: $base:ident)?, $($vfns:ident)*, $($nvfns:ident)*, $($ofns:ident)*) =>
    { init_vtable!($name $(: $base)?, func1 func2 $($vfns)*, $($nvfns)*, $($ofns)*); };
}
macro_rules! derive1_init_vtable
{
    ($name:ident $(: $base:ident)?, $($vfns:ident)*, $($nvfns:ident)*, $($ofns:ident)*) =>
    { base_init_vtable!($name $(: $base)?, func3 $($vfns)*, $($nvfns)*, $($ofns)*); };
}
macro_rules! derive2_init_vtable
{
    ($name:ident $(: $base:ident)?, $($vfns:ident)*, $($nvfns:ident)*, $($ofns:ident)*) =>
    { derive1_init_vtable!($name $(: $base)?, $($vfns)*, $($nvfns)*, $($ofns)*); };
}
init_vtable!(Base,, func1 func2,);                 // initialize BaseVTable
base_init_vtable!(Derive1 : Base,, func3, func1);  // initialize Derive1VTable
derive1_init_vtable!(Derive2 : Derive1,,, func2);  // initialize Derive2VTable

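We generate an xxx_init_vtable macro for every class, yet a class initializes its own vtable by invoking its base class's macro. In other words, each class's initialization macro exists to serve its derived classes.
We have taken a long detour to pass one macro's expansion to another, and we had no choice. Still, these macro definitions are too complicated, and many arguments are passed through unchanged, so let's optimize. Across these definitions only the vfns argument changes, so we compress the arguments that do not change: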

macro_rules! base_init_vtable
{
    ($($name:ident):+, $($vfns:ident)*, $($params:tt)*) =>
    { init_vtable!($($name):+, func1 func2 $($vfns)*, $($params)*); };
}
macro_rules! derive1_init_vtable
{
    ($($name:ident):+, $($vfns:ident)*, $($params:tt)*) =>
    { base_init_vtable!($($name):+, func3 $($vfns)*, $($params)*); };
}
macro_rules! derive2_init_vtable
{
    ($($params:tt)*) => { derive1_init_vtable!($($params)*); };
}

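We compressed the leading $name:ident $(: $base:ident)? into $($name:ident):+. This is easy to follow, though the pattern is now less strict: a caller could pass something like x:y:z. That is not a big worry, since the init_vtable macro that is ultimately invoked rejects such arguments.
We compressed the trailing $($nvfns:ident)*, $($ofns:ident)* into $($params:tt)*. As you may have noticed, this uses a new fragment specifier, tt, to match the remaining arguments. tt stands for token tree; it matches any macro argument without changing its meaning, which makes it ideal for forwarding the rest.
In derive2_init_vtable every argument is passed through unchanged, so all of them collapse into the single $($params:tt)*. Taking that idea further, we can reorder init_vtable's parameters so that the part that changes most often comes first: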

macro_rules! init_vtable
{
    ($($base_vfns:ident)*, $name:ident $(: $base:ident)?, $($new_vfns:ident)*, $($over_vfns:ident)*) => {...};
}

The macros above then simplify further to the following form. Because the parameter order changed, the invocations change too:

macro_rules! base_init_vtable
{
    ($($params:tt)*) => { init_vtable!(func1 func2 $($params)*); };
}
macro_rules! derive1_init_vtable
{
    ($($params:tt)*) => { base_init_vtable!(func3 $($params)*); };
}
macro_rules! derive2_init_vtable
{
    ($($params:tt)*) => { derive1_init_vtable!($($params)*); };
}
init_vtable!(,Base, func1 func2,);                 // initialize BaseVTable
base_init_vtable!(,Derive1 : Base, func3, func1);  // initialize Derive1VTable
derive1_init_vtable!(,Derive2 : Derive1,, func2);  // initialize Derive2VTable

Let's change the parameter order of define_struct as well:

macro_rules! define_struct
{
    ($($field:ident)*, $name:ident) => { ... };
}

The xxx_define_vtable macros can then be simplified in the same way:

macro_rules! base_define_vtable
{
    ($($params:tt)*) => { define_struct!(func1 func2 $($params)*); };
}
macro_rules! derive1_define_vtable
{
    ($($params:tt)*) => { base_define_vtable!(func3 $($params)*); };
}
macro_rules! derive2_define_vtable
{
    ($($params:tt)*) => { derive1_define_vtable!($($params)*); };
}
base_define_vtable!(, BaseVTable);
derive1_define_vtable!(, Derive1VTable);
type Derive2VTable = Derive1VTable;

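The attentive reader may have noticed that the xxx_define_vtable and xxx_init_vtable families forward their arguments in exactly the same way and differ only in the macro they finally invoke. We now factor out that one difference as a callback parameter, merging the two families into one: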

macro_rules! base_vtable_option
{
    ($callback:ident $($params:tt)*) =>
    { $callback!(func1 func2 $($params)*); };
}
macro_rules! derive1_vtable_option
{
    ($callback:ident $($params:tt)*) =>
    { base_vtable_option!($callback func3 $($params)*); };
}
macro_rules! derive2_vtable_option
{
    ($callback:ident $($params:tt)*) =>
    { derive1_vtable_option!($callback $($params)*); };
}

The macro definitions gain a callback parameter, which we will try to tidy up shortly. For now, the xxx_vtable_option family handles both defining and initializing vtables.

base_vtable_option!(define_struct, BaseVTable);
derive1_vtable_option!(define_struct, Derive1VTable);
derive2_vtable_option!(define_struct, Derive2VTable);

init_vtable!(, Base, func1 func2,);                 // initialize BaseVTable
base_vtable_option!(init_vtable, Derive1 : Base, func3, func1);  // initialize Derive1VTable
derive1_vtable_option!(init_vtable, Derive2 : Derive1,, func2);  // initialize Derive2VTable
derive2_vtable_option!(init_vtable, Derive3 : Derive2, func4, func1);  // assuming Derive3 exists

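Defining the vtables looks fine, but the invocation that initializes the base class's vtable has a different form from the ones for derived classes. With that issue and the extra parameter in mind, we optimize further. The idea is the same as before: move the varying part to the front as the first argument, which leaves the callback as the second: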

macro_rules! vtable_option
{
    ($($func:ident)*, $callback:ident $($params:tt)*) =>
    { $callback!($($func)* $($params)*); };
}
macro_rules! base_vtable_option
{
    ($($params:tt)*) => { vtable_option!(func1 func2 $($params)*); };
}
macro_rules! derive1_vtable_option
{
    ($($params:tt)*) => { base_vtable_option!(func3 $($params)*); };
}
macro_rules! derive2_vtable_option
{
    ($($params:tt)*) => { derive1_vtable_option!($($params)*); };
}

The new vtable_option macro takes care of the argument order, and the other macros just forward their arguments. Here are the invocations again:

base_vtable_option!(,define_struct, BaseVTable);
derive1_vtable_option!(,define_struct, Derive1VTable);
derive2_vtable_option!(,define_struct, Derive2VTable);

vtable_option!(,init_vtable, Base, func1 func2,);                      // initialize BaseVTable
base_vtable_option!(,init_vtable, Derive1 : Base, func3, func1);       // initialize Derive1VTable
derive1_vtable_option!(,init_vtable, Derive2 : Derive1,, func2);       // initialize Derive2VTable
derive2_vtable_option!(,init_vtable, Derive3 : Derive2, func4, func1); // assuming Derive3 exists

All vtable initialization calls now share the same form.
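The callback chain can be exercised with declarative macros alone. The sketch below makes two simplifications that are not in the original: define_struct gives every field the placeholder type i32 instead of a function pointer type, and field_names is a hypothetical second callback standing in for init_vtable. The three xxx_vtable_option macros have the same shape as above.

```rust
// Callback 1: builds a struct from the accumulated names (fields are i32
// placeholders here, not function pointers).
macro_rules! define_struct {
    ($($field:ident)*, $name:ident) => {
        #[derive(Debug, Default, PartialEq)]
        pub struct $name { $(pub $field: i32,)* }
    };
}

// Callback 2 (hypothetical): records the accumulated names as strings.
macro_rules! field_names {
    ($($field:ident)*, $name:ident) => {
        impl $name {
            pub fn field_names() -> Vec<&'static str> {
                vec![$(stringify!($field)),*]
            }
        }
    };
}

// Splits off the accumulated names, then invokes the callback with them.
macro_rules! vtable_option {
    ($($func:ident)*, $callback:ident $($params:tt)*) => {
        $callback!($($func)* $($params)*);
    };
}
macro_rules! base_vtable_option {
    ($($params:tt)*) => { vtable_option!(func1 func2 $($params)*); };
}
macro_rules! derive1_vtable_option {
    ($($params:tt)*) => { base_vtable_option!(func3 $($params)*); };
}

// Both operations reuse the same chain; only the callback name differs.
derive1_vtable_option!(, define_struct, Derive1VTable);
derive1_vtable_option!(, field_names, Derive1VTable);
```

Each level prepends its own names before forwarding, so the callback finally receives func1 func2 func3 followed by the caller's arguments.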
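Although Rust does not allow one macro's expansion to be passed directly to another macro, the callback pattern gives us a minimal way through. Minimal also means hard to follow, though: the macro definitions became simpler, but the invocations became harder to understand.
With that, the biggest obstacle standing between us and our goal is behind us. Next we implement virtual methods and overriding methods.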

Simulating C++ Class Features in Rust, Part 4: Concatenating Identifiers in Declarative Macros

Author: 立夏
Date: 2022-08-19
Categories: C++, Rust

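In the previous part we ran into a problem: when generating code for a derived class we cannot see the base class's definition, so we cannot generate the derived class's vtable. Let's solve that now. If the base class's vtable information could be stored in some kind of variable, the derived class's vtable could use it. How should that variable be defined? To avoid any runtime cost we can use a macro, specifically a declarative macro (macro_rules!).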

macro_rules! base_vtable_fields
{
    () =>
    {
        func1: fn(this: &Base) -> i32,
        func2: fn(this: &Base, i: i32) -> i32
    };
}
macro_rules! derive1_vtable_fields
{
    () =>
    {
        base_vtable_fields!(),
        func3: fn(this: &Derive1) -> i32
    };
}

With these macros, we could define the vtables like this:

pub struct BaseVTable
{
    base_vtable_fields!(),
}
pub struct Derive1VTable
{
    derive1_vtable_fields!(),
}

From a C++ point of view this is perfectly fine, but the compiler rejects it:

error: expected `:`, found `!`
  --> class_impl/src/lib.rs:33:27
   |
33 |         base_vtable_fields!(),
   |                           ^ expected `:`

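This is another difference between Rust macros and C++ macros. In C++ a macro can appear anywhere; expansion is just a preprocessing step, and as long as the expanded code is valid C++ it compiles. In Rust, the compiler parses the code before macro expansion, and the grammar allows macro invocations in some positions and not in others. Here, for example, a struct field name cannot come from a macro expansion. Rust macros are more powerful, but also more restricted in where they can be used.
Since that does not work, we change tack and expand macros only in the field type position: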

macro_rules! func1_type { () => { fn(this: &Base) -> i32 }; }
macro_rules! func2_type { () => { fn(this: &Base, i: i32) -> i32 }; }
struct BaseVTable
{
    func1: func1_type!(),
    func2: func2_type!(),
}
macro_rules! func3_type { () => { fn(this: &Derive1) -> i32 }; }
struct Derive1VTable
{
    func1: func1_type!(),
    func2: func2_type!(),
    func3: func3_type!(),
}

With this, knowing the list of function names is enough to build the vtable struct:

macro_rules! define_struct
{
    ( $name:ident $($field:ident)* ) =>
    {
        #[repr(C)]
        pub struct $name
        {
            $( $field: ${field}_type!(), )*
        }
    };
}

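Unfortunately this macro does not work yet. We need to concatenate two identifiers to get the function type, and Rust does not support syntax like ${field}_type, nor C++'s ## operator. Concatenating identifiers in macros is a common need, so Rust provides the concat_idents macro, but restricts it to the nightly compiler and toolchain. It is exhausting.
Since Rust will not let us use concat_idents, we implement our own. A declarative macro cannot do it, so we use a function-like procedural macro: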

#[proc_macro]
pub fn concat_ident2(input: TokenStream) -> TokenStream
{
    let concat_ident2 = syn::parse_macro_input!(input as concat::ConcatIdent2);
    let gen = quote!{ #concat_ident2 };
    gen.into()
}
pub struct ConcatIdent2
{
    ident1: Ident,
    ident2: Ident,
}
impl Parse for ConcatIdent2
{
    fn parse(input: ParseStream) -> Result<Self>
    {
        let ident1 = input.parse()?;
        let ident2 = input.parse()?;
        Ok(ConcatIdent2 { ident1, ident2 })
    }
}
impl ToTokens for ConcatIdent2
{
    fn to_tokens(&self, tokens: &mut TokenStream)
    {
        let new_ident = self.ident1.to_string() + self.ident2.to_string().as_str();
        let new_ident = Ident::new(new_ident.as_str(), Span::call_site());
        new_ident.to_tokens(tokens);
    }
}

With concat_ident2 we can concatenate identifiers, so we redefine define_struct as follows:

macro_rules! define_struct
{
    ( $name:ident $($field:ident)* ) =>
    {
        #[repr(C)]
        pub struct $name
        {
            $( $field: concat_ident2!($field _type)!(), )*
        }
    };
}

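Let me explain concat_ident2!($field _type)!(). First concat_ident2!($field _type) performs the concatenation and yields identifiers such as func1_type and func2_type; then the macros func1_type!() and func2_type!() are invoked. It is not pretty, but at least it expresses the intent.
The good news is that we are not the only ones who find this ugly: the compiler does too, so it has to change again. This time, after concatenating, we generate the macro invocation directly. The macro is renamed concat_and_call, and params holds the arguments of the generated invocation. It is kept as a TokenStream because it is emitted unchanged, which avoids parsing and re-emitting it: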

pub struct ConcatAndCall
{
    ident1: Ident,
    ident2: Ident,
    params: TokenStream,
}
...
impl ToTokens for ConcatAndCall
{
    fn to_tokens(&self, tokens: &mut TokenStream)
    {
        let new_ident = self.ident1.to_string() + self.ident2.to_string().as_str();
        let new_ident = Ident::new(new_ident.as_str(), Span::call_site());
        new_ident.to_tokens(tokens);
        token::Bang::default().to_tokens(tokens);
        token::Brace::default().surround(tokens, |tokens| self.params.to_tokens(tokens));
    }
}

Now we can reimplement the define_struct macro.

macro_rules! define_struct
{
    ( $name:ident $($field:ident)* ) =>
    {
        #[repr(C)]
        pub struct $name
        {
            $( $field: concat_and_call!($field _type), )*
        }
    };
}
define_struct!(BaseVTable func1 func2);
define_struct!(Derive1VTable func1 func2 func3);

In this way, passing the struct name and the list of function names to define_struct is enough to build the struct, as follows:

macro_rules! base_vtable_fields { () => { func1 func2 }; }
macro_rules! derive1_vtable_fields { () => { base_vtable_fields!() func3 }; }
define_struct!(BaseVTable base_vtable_fields!());
define_struct!(Derive1VTable derive1_vtable_fields!());

A nice idea, but the compiler does not buy it. A Rust declarative macro can match the ! token itself, as in:

macro_rules! macro_test { ( $name:ident!() ) => { $name!() }; }
macro_test!(base_vtable_fields!());

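So base_vtable_fields!() is not expanded before define_struct!; in other words, we cannot pass the result of one macro as an argument to another. This is the second difference between Rust macros and C++ macros.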
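One way to observe that the argument arrives unexpanded is to count the token trees a macro receives. In the sketch below, count_tts and the two constants are hypothetical names, and base_vtable_fields is deliberately left undefined: the code still compiles, because the outer macro only ever sees the raw tokens.

```rust
// Hypothetical helper: counts the token trees it receives, one at a time.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

// `base_vtable_fields!()` arrives as three token trees: the identifier, the
// `!`, and the `()` group. No macro of that name needs to exist.
pub const PASSED_AS_WRITTEN: usize = count_tts!(base_vtable_fields!());

// Had the inner macro been expanded to `func1 func2` first, the outer macro
// would have seen two token trees instead.
pub const TWO_IDENTS: usize = count_tts!(func1 func2);
```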
We seem to have hit another dead end; the next part shows the way out.

Simulating C++ Class Features in Rust, Part 3: Generating Code with an Attribute Macro

Author: 立夏
Date: 2022-08-07
Categories: C++, Rust

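The previous two parts completed the hand-written validation of C++-style classes. Next we use macros to generate the code automatically.
Recall the original idea: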

#[class]
pub struct Base
{
    x: i32,
    y: i32,
    pub fn new(x: i32, y: i32) -> Self { Base{ x, y } }
    virtual fn func1(&self) -> i32 { this.x }
    virtual fn func2(&self, i: i32) -> i32 { this.y + i }
}
#[class]
pub struct Derive1 : Base
{
    z: i32,
    pub fn new(x: i32, y: i32, z: i32) -> Self { Derive1 { Base::new(x, y), z} }
    override fn func1(&self) -> i32 { 0 }
    virtual fn func3(&self) -> i32 { this.z }
}
#[class]
pub struct Derive2 : Derive1
{
    override fn func2(&self, i: i32) -> i32 { Base::func2(self, i) + 200 }
    override fn func3(&self) -> i32 { Derive1::func3(self) + 200 }
}

Given the definitions above, we need an attribute macro. The trio proc_macro2, syn, and quote is indispensable, and all three go into the dependency list in Cargo.toml:

[package]
name = "class_macro"
version = "0.1.0"
edition = "2021"

[lib]
proc-macro = true

[dependencies]
proc-macro2 = "1.0"
syn = { version = "1.0", features = ["full"] }
quote = "1.0"

syn needs the full feature enabled, otherwise some functionality is missing. Here is the implementation of the class attribute macro:

extern crate proc_macro;
use crate::proc_macro::TokenStream;
use quote::quote;
use syn;
mod class_def;

#[proc_macro_attribute]
pub fn class(_attr: TokenStream, input: TokenStream) -> TokenStream
{
    let class_def = syn::parse_macro_input!(input as class_def::ClassDef);
    let gen = quote! { #class_def };
    gen.into()
}

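syn can feel bewildering at first. I recommend studying its source carefully: it is a treasure trove that implements the complete syntax definitions and parsing code for Rust, ready for reuse, and it also teaches syntax details that documentation and textbooks do not mention.
Our class definition is a struct with a base class added, methods written inside the struct body, and two new keywords, virtual and override. To describe it we defined class_def::ClassDef, modeled on syn::ItemStruct: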

pub enum Virtuals
{
    Virtual,
    Override,
    Inherited,
}
pub struct VirtualFn
{
    virs: Virtuals,
    itemfn: ImplItemMethod,
}
pub struct ClassDef
{
    attrs: Vec<Attribute>,
    vis: Visibility,
    struct_token: Token![struct],
    ident: Ident,
    generics: Generics,
    base_class: Option<Ident>,
    base_generics: Option<Generics>,
    fields: FieldsNamed,
    vfns: Vec<VirtualFn>,
    semi_token: Option<Token![;]>,
}

To parse a TokenStream into a ClassDef, syn requires ClassDef to implement the Parse trait and calls its parse(...) method. The implementation is below; for reasons of space it is not shown in full:

impl Parse for ClassDef
{
    fn parse(input: ParseStream) -> Result<Self>
    {
        let attrs = input.call(Attribute::parse_outer)?;
        let vis = input.parse()?;
        let struct_token = input.parse()?;
        let ident: Ident = input.parse()?;
        let generics = input.parse()?;
        let mut base_class: Option<Ident> = None;
        let mut base_generics: Option<Generics> = None;
        if let Ok(_) = input.parse::<Token![:]>()
        {
            base_class = Some(input.parse()?);
            base_generics = Some(input.parse()?);
        }
        let where_clause = Self::parse_where_clause(&input)?;
        let (fields, vfns) = Self::parse_fields_vfns(&input, ident.to_string().as_str())?;

        let generics = Generics { where_clause, .. generics };
        Ok(ClassDef {attrs, vis, struct_token, ident, generics, base_class, base_generics, fields, vfns})
    }
}

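At this point the input TokenStream has been parsed into our ClassDef, and the next step is generating the class code. The code to generate is too complex to describe inside quote!(), so I pass #class_def as its only input and implement the to_tokens method of the ToTokens trait for ClassDef, roughly as follows: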

impl ToTokens for ClassDef
{
    fn to_tokens(&self, tokens: &mut TokenStream)
    {
        let helper = ...
        self.class_vtable_to_tokens(tokens, &helper);
        self.class_data_to_tokens(tokens, &helper);
        self.class_def_to_tokens(tokens, &helper);
        self.class_data_impl_to_tokens(tokens, &helper);
        self.class_impl_to_tokens(tokens, &helper);
    }
}

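For reasons of space, the details are omitted.
Generating the base class's code went smoothly, but generating a derived class's code ran into trouble. The base class's vtable is defined as follows: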

pub struct BaseVTable
{
    func1: fn(this: &Base) -> i32,
    func2: fn(this: &Base, i: i32) -> i32,
}

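This is fine, because the base class knows all of the virtual functions it needs, so generating its vtable is easy. A derived class, however, does not know all of them. Below, Derive1 overrides func1 and adds the new virtual function func3, but Derive1 does not know that func2 exists: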

struct Derive1VTable
{
    func1: fn(this: &Base) -> i32,
    func2: fn(this: &Base, i: i32) -> i32,
    func3: fn(this: &Derive1) -> i32,
}

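We can only see the current class's definition, not the base class's, so we do not know what the base class's vtable looks like and cannot embed its definition into the derived class's vtable.

We also considered another approach: make the base class's vtable a data member of the derived class's vtable. In terms of memory layout, the definition below is the same as the one above.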
```
struct Derive1VTable
{
    base: BaseVTable,
    func3: fn(this: &Derive1) -> i32,
}
```
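The layout claim can be checked mechanically, with one caveat: Rust only guarantees field order for #[repr(C)] structs, so the sketch below marks both variants #[repr(C)]. The names Derive1VTableFlat, Derive1VTableNested, and layouts_match are hypothetical, Base and Derive1 are empty stand-ins, and std::mem::offset_of! requires Rust 1.77 or later.

```rust
use std::mem::{offset_of, size_of};

// Empty stand-ins; only the function pointer types matter for the layout.
pub struct Base;
pub struct Derive1;

#[repr(C)]
pub struct BaseVTable {
    pub func1: fn(this: &Base) -> i32,
    pub func2: fn(this: &Base, i: i32) -> i32,
}

// Variant 1: the base entries are copied into the derived vtable.
#[repr(C)]
pub struct Derive1VTableFlat {
    pub func1: fn(this: &Base) -> i32,
    pub func2: fn(this: &Base, i: i32) -> i32,
    pub func3: fn(this: &Derive1) -> i32,
}

// Variant 2: the base vtable is embedded as the first member.
#[repr(C)]
pub struct Derive1VTableNested {
    pub base: BaseVTable,
    pub func3: fn(this: &Derive1) -> i32,
}

/// True when every slot sits at the same offset in both variants.
pub fn layouts_match() -> bool {
    let base = offset_of!(Derive1VTableNested, base);
    size_of::<Derive1VTableFlat>() == size_of::<Derive1VTableNested>()
        && offset_of!(Derive1VTableFlat, func1) == base + offset_of!(BaseVTable, func1)
        && offset_of!(Derive1VTableFlat, func2) == base + offset_of!(BaseVTable, func2)
        && offset_of!(Derive1VTableFlat, func3) == offset_of!(Derive1VTableNested, func3)
}
```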

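The trouble is that as the inheritance hierarchy deepens and functions are overridden, initializing the vtable becomes complicated and ugly. Take Derive2: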

struct Derive2VTable
{
    base: Derive1VTable,
    ...
}
const VTABLE: Derive2VTable = Derive2VTable
{
    base: Derive1VTable
    {
        base: BaseVTable
        {
            func1: Derive1::VTABLE.base.func1,
            func2: Self::func2_impl,
        },
        func3: Self::func3_impl,
    },
    ...
}

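Moreover, since we do not know the base class's definition, we cannot know the full path to each method; that would require knowing the definitions of all base classes. This approach does not solve the problem and in fact makes it more complicated.

By comparison, copying the base class's vtable into the derived class only requires knowing the vtable of the direct base class. So how do we learn the direct base class's vtable? We solve that in the next part.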

Simulating C++ Class Features in Rust, Part 2: Type Information and Downcasting

Author: 立夏
Date: 2022-07-30
Categories: C++, Rust

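In the previous part we implemented the virtual function table by hand and got inheritance, overriding, and upcasting working. Next comes downcasting, which looks roughly like this: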

pub fn dynamic_cast<'a, B, D>(base: &'a B) -> Option<&'a D>
{
    if can_dynamic_cast_to::<B, D>(base)
    {
        Some(unsafe { reinterpret_cast(base) })
    }
    else
    {
        None
    }
}

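Here we use the reinterpret_cast function from the previous part to convert the type. reinterpret_cast converts anything to anything regardless of whether the conversion is safe, which is why it is marked unsafe. In the previous part we used it for upcasting because we could guarantee that was safe; here we must check first and convert afterwards.
To implement can_dynamic_cast_to we need some extra information, namely run-time type information. Let's start with a simple version: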

pub struct TypeInfo
{
    base_class: Option<&'static TypeInfo>
}
impl TypeInfo
{
    fn is_same(&self, other: &TypeInfo) -> bool
    {
        self as *const TypeInfo == other as *const TypeInfo
    }
    fn is_base_of(&self, other: &TypeInfo) -> bool
    {
        let mut ret = self.is_same(other);
        if !ret
        {
            if let Some(other) = other.base_class
            {
                ret = self.is_base_of(other);
            }
        }
        ret
    }
}

This type information is minimal: it can walk from the current class up to the root base class. Small as it is, it is enough to implement can_dynamic_cast_to.
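The chain walk is easy to try on its own. The sketch below is a variation, not the code from the text: it uses an iterative loop and std::ptr::eq, and it stores the three records in module-level statics with hypothetical names. Statics are used because identity is decided by address, and a static has a single address, whereas references to a const item are not guaranteed to share one.

```rust
pub struct TypeInfo {
    base_class: Option<&'static TypeInfo>,
}

impl TypeInfo {
    /// Two records describe the same class when they are the same object.
    pub fn is_same(&self, other: &TypeInfo) -> bool {
        std::ptr::eq(self, other)
    }

    /// Walks from `other` up its chain of base classes looking for `self`.
    pub fn is_base_of(&self, other: &TypeInfo) -> bool {
        let mut current = Some(other);
        while let Some(info) = current {
            if self.is_same(info) {
                return true;
            }
            current = info.base_class;
        }
        false
    }
}

// Hypothetical records for the three classes of the series.
pub static BASE_TYPEINFO: TypeInfo = TypeInfo { base_class: None };
pub static DERIVE1_TYPEINFO: TypeInfo = TypeInfo { base_class: Some(&BASE_TYPEINFO) };
pub static DERIVE2_TYPEINFO: TypeInfo = TypeInfo { base_class: Some(&DERIVE1_TYPEINFO) };
```

As in the text, a class counts as a base of itself, which is what allows a cast to the object's own type.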
Next, every class gets its type information:

pub struct BaseVTable
{
    _type_info_: &'static TypeInfo,
    ...
}
impl Base
{
    pub const TYPEINFO: TypeInfo = TypeInfo
    {
        base_class: None,
    };
    pub const VTABLE: BaseVTable = BaseVTable
    {
        _type_info_: &Self::TYPEINFO,
        ...
    };
}
pub struct Derive1VTable
{
    _type_info_: &'static TypeInfo,
    ...
}
impl Derive1
{
    pub const TYPEINFO: TypeInfo = TypeInfo
    {
        base_class: Some(&Base::TYPEINFO),
    };
    pub const VTABLE: Derive1VTable = Derive1VTable
    {
        _type_info_: &Self::TYPEINFO,
        ...
    };
}
pub struct Derive2VTable
{
    _type_info_: &'static TypeInfo,
    ...
}
impl Derive2
{
    pub const TYPEINFO: TypeInfo = TypeInfo
    {
        base_class: Some(&Derive1::TYPEINFO),
    };
    pub const VTABLE: Derive2VTable = Derive2VTable
    {
        _type_info_: &Self::TYPEINFO,
        ...
    };
}

Now we can implement can_dynamic_cast_to:

fn can_dynamic_cast_to<B, D>(base: &B) -> bool
{
    let typeinfo: &TypeInfo =
    {
        let p = base as *const B;
        let p = p as *const *const *const TypeInfo;
        unsafe { &***p }
    };
    D::TYPEINFO.is_base_of(typeinfo)
}

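Given the memory layout of our classes, for any class the first level of pointer points to the object itself. Because vptr is the first member of the class, the second level points to the vtable; and because the first element of the vtable is a pointer to the type information, the third level points to the type information. So we cast the object reference to a three-level pointer to TypeInfo and obtain the object's actual type information.
Then we get the type information of the target type through D::TYPEINFO and decide whether the conversion is allowed. There are two problems, though: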

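1. What if the base argument passed in is not one of our classes, or is even an i32 or a &str? We know that is invalid, but we cannot prevent it, and cannot even guarantee safety. For example:
```
if let Some(_) = dynamic_cast::<i32, i64>(&22) ...
if let Some(_) = dynamic_cast::<&str, Derive2>(&"abcd") ...
```
2. D::TYPEINFO does not compile. Rust checks generic code before it is instantiated, and at that point nothing is known about D; this differs from C++ templates.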

Fortunately one solution fixes both problems: define a trait TypeInfoTrait and require all of our classes to implement it:

pub unsafe trait TypeInfoTrait
{
    fn get_typeinfo() -> &'static TypeInfo;
}
unsafe impl TypeInfoTrait for Base
{
    fn get_typeinfo() -> &'static TypeInfo { &Self::TYPEINFO }
}
unsafe impl TypeInfoTrait for Derive1
{
    fn get_typeinfo() -> &'static TypeInfo { &Self::TYPEINFO }
}
unsafe impl TypeInfoTrait for Derive2
{
    fn get_typeinfo() -> &'static TypeInfo { &Self::TYPEINFO }
}

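We can now require the generic parameters B and D to implement TypeInfoTrait. That narrows what dynamic_cast accepts and provides a degree of safety. We cannot stop users from implementing TypeInfoTrait themselves, so we mark it unsafe.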

fn can_dynamic_cast_to<B, D>(base: &B) -> bool
where
    B: TypeInfoTrait,
    D: TypeInfoTrait,
{
    let typeinfo: &TypeInfo =
    {
        let p = base as *const B;
        let p = p as *const *const *const TypeInfo;
        unsafe { &***p }
    };
    D::get_typeinfo().is_base_of(typeinfo)
}
pub fn dynamic_cast<'a, B, D>(base: &'a B) -> Option<&'a D>
where
    B: TypeInfoTrait,
    D: TypeInfoTrait,
...
pub fn dynamic_cast_mut<'a, B, D>(base: &'a mut B) -> Option<&'a mut D>
where
    B: TypeInfoTrait,
    D: TypeInfoTrait,
...
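The effect of the trait bound can be seen without touching any object pointers. The following sketch is a reduced illustration under stated assumptions: Base and Derive1 are empty placeholder types, the type records live in hypothetical statics, and is_derived_from is a hypothetical helper that compares only the static type information of its two generic parameters.

```rust
pub struct TypeInfo {
    base_class: Option<&'static TypeInfo>,
}

impl TypeInfo {
    pub fn is_base_of(&self, other: &TypeInfo) -> bool {
        let mut current = Some(other);
        while let Some(info) = current {
            if std::ptr::eq(self, info) {
                return true;
            }
            current = info.base_class;
        }
        false
    }
}

/// Unsafe to implement: implementors promise that the returned record
/// really describes the implementing class.
pub unsafe trait TypeInfoTrait {
    fn get_typeinfo() -> &'static TypeInfo;
}

// Placeholder classes and their records (hypothetical names).
pub struct Base;
pub struct Derive1;
static BASE_INFO: TypeInfo = TypeInfo { base_class: None };
static DERIVE1_INFO: TypeInfo = TypeInfo { base_class: Some(&BASE_INFO) };

unsafe impl TypeInfoTrait for Base {
    fn get_typeinfo() -> &'static TypeInfo { &BASE_INFO }
}
unsafe impl TypeInfoTrait for Derive1 {
    fn get_typeinfo() -> &'static TypeInfo { &DERIVE1_INFO }
}

/// The bound is what makes `D::get_typeinfo()` type-check inside generic code.
pub fn is_derived_from<D: TypeInfoTrait, B: TypeInfoTrait>() -> bool {
    B::get_typeinfo().is_base_of(D::get_typeinfo())
}
```

A call such as is_derived_from::<i32, Base>() is rejected at compile time because i32 does not implement TypeInfoTrait, which is the restriction the text is after.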

dynamic_cast is done; let's verify it:

use crate::dynamic_cast;
fn func3(base: &super::Base) -> (i32, i32, i32)
{
    let z = if let Some(d1) = dynamic_cast::<super::Base, super::Derive1>(base)
    { d1.func3() } else { -1 };
    (base.func1(), base.func2(100), z)
}
#[test]
fn test_fn2()
{
    let b = super::Base::new(1, 2);
    assert_eq!((1, 102, -1), func3(&b));
    let d1 = super::Derive1::new(1, 2, 3);
    assert_eq!((3, 102, 3), func3(&d1));
    let d2 = super::Derive2::new(1, 2, 3);
    assert_eq!((3, 302, 203), func3(&d2));
}

This completes the hand-written validation of C++ class features. Next we start generating this code with macros.