Arrow Array
spec: https://arrow.apache.org/docs/format/Columnar.html
- Buffer
- A Buffer is a value that is not itself shared; it holds a shareable Bytes
- ScalarBuffer < Buffer: provides typed operations, e.g. slicing with `scalar_buffer[0..x]`
- BooleanBuffer < Buffer: a buffer stored as a bitmap
- NullBuffer < BooleanBuffer: also records null_count
- OffsetBuffer
- RunEndBuffer
- MutableBuffer: a mutable buffer, used to build an immutable Buffer
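
To make the bitmap idea behind BooleanBuffer and NullBuffer concrete, here is a minimal std-only sketch. `ValidityBitmap` is a hypothetical stand-in, not the arrow-rs API; it assumes the Arrow convention of least-significant-bit numbering, where a set bit means the slot is valid.

```rust
/// Hypothetical stand-in for a validity bitmap (not the arrow-rs NullBuffer API).
struct ValidityBitmap {
    bits: Vec<u8>,     // bit i lives in byte i / 8, at position i % 8 (LSB first)
    len: usize,        // number of logical slots
    null_count: usize, // cached so callers do not have to recount bits
}

impl ValidityBitmap {
    /// Pack one bool per slot into bits; `true` means the slot is valid.
    fn from_validity(valid: &[bool]) -> Self {
        let mut bits = vec![0u8; (valid.len() + 7) / 8];
        for (i, &v) in valid.iter().enumerate() {
            if v {
                bits[i / 8] |= 1u8 << (i % 8);
            }
        }
        let null_count = valid.iter().filter(|v| !**v).count();
        ValidityBitmap { bits, len: valid.len(), null_count }
    }

    fn is_valid(&self, i: usize) -> bool {
        assert!(i < self.len, "slot index out of range");
        (self.bits[i / 8] & (1u8 << (i % 8))) != 0
    }
}

fn main() {
    let bm = ValidityBitmap::from_validity(&[true, false, true, true, false]);
    println!("bytes = {:?}, null_count = {}", bm.bits, bm.null_count);
    println!("slot 1 valid? {}", bm.is_valid(1));
}
```

Caching `null_count` next to the bitmap is what distinguishes the NullBuffer idea from a plain BooleanBuffer in these notes.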
- ArrayData: a unified Array representation
```rust
struct ArrayData {
    data_type: DataType,
    len: usize,
    offset: usize,
    buffers: Vec<Buffer>,       // value buffer, value offset buffer
    child_data: Vec<ArrayData>, // ListArray, StructArray
    nulls: Option<NullBuffer>,
}
```
- PrimitiveArray
```rust
struct PrimitiveArray<T: ArrowPrimitiveType> {
    data_type: DataType,
    values: ScalarBuffer<T::Native>,
    nulls: Option<NullBuffer>,
}

struct ScalarBuffer<T: ArrowNativeType> {
    buffer: Buffer,
    phantom: PhantomData<T>,
}

struct Buffer {
    data: Arc<Bytes>,
    ptr: *const u8, // may point anywhere within data.ptr[0..data.len]
    length: usize,  // ptr + length must not go past the end of the Bytes
}

struct Bytes {
    // ptr[0..len]
    ptr: NonNull<u8>,
    len: usize,
    deallocation: Deallocation, // when Standard, ptr should be dropped using std::alloc::dealloc
}

enum Deallocation {
    Standard(Layout),
    Custom(Arc<dyn Allocation>, usize),
}
```
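
The relationship between Buffer and the shared Bytes can be sketched in safe Rust. This is a simplified model under stated assumptions: `Arc<Vec<u8>>` stands in for `Arc<Bytes>`, an index `offset` stands in for the raw `ptr`, and `to_i32s` is a hypothetical helper that mimics what a typed view such as ScalarBuffer provides.

```rust
use std::sync::Arc;

/// Simplified model: an offset/length window over shared bytes.
#[derive(Clone)]
struct Buffer {
    data: Arc<Vec<u8>>, // stands in for Arc<Bytes>
    offset: usize,      // stands in for the raw `ptr`
    length: usize,
}

impl Buffer {
    fn from_vec(v: Vec<u8>) -> Self {
        let length = v.len();
        Buffer { data: Arc::new(v), offset: 0, length }
    }

    /// Zero-copy slice: only the window changes, the allocation is shared.
    fn slice(&self, offset: usize, length: usize) -> Self {
        assert!(offset + length <= self.length, "slice must stay inside the buffer");
        Buffer { data: Arc::clone(&self.data), offset: self.offset + offset, length }
    }

    fn as_bytes(&self) -> &[u8] {
        &self.data[self.offset..self.offset + self.length]
    }

    /// Typed read in the spirit of ScalarBuffer<i32>: decode little-endian i32 values.
    fn to_i32s(&self) -> Vec<i32> {
        self.as_bytes()
            .chunks_exact(4)
            .map(|c| i32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect()
    }
}

fn main() {
    let bytes: Vec<u8> = [10i32, 20, 30, 40].iter().flat_map(|v| v.to_le_bytes()).collect();
    let buf = Buffer::from_vec(bytes);
    let tail = buf.slice(8, 8); // skip the first two i32 values
    println!("{:?}", tail.to_i32s());
    println!("owners of the allocation: {}", Arc::strong_count(&buf.data));
}
```

Slicing clones the `Arc` rather than the bytes, which is why the Buffer value itself is cheap to copy while the underlying allocation stays shared.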
- BooleanArray
- BooleanBuffer: bitmap
- NullBuffer
- GenericBytesArray
- GenericBytesArray
- GenericStringArray
- buffers
- offset: OffsetBuffer<32|64>
- value_data: Buffer
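
The offsets plus value_data layout can be illustrated with a small std-only model. `StringColumn` is a hypothetical type for illustration, assuming 32-bit offsets; it is not the arrow-rs GenericStringArray API.

```rust
/// Simplified model of a string array: i32 offsets plus one contiguous value buffer.
struct StringColumn {
    offsets: Vec<i32>,   // len + 1 entries; value i spans offsets[i]..offsets[i + 1]
    value_data: Vec<u8>, // all UTF-8 bytes stored back to back
}

impl StringColumn {
    fn from_strs(values: &[&str]) -> Self {
        let mut offsets = vec![0i32];
        let mut value_data = Vec::new();
        for v in values {
            value_data.extend_from_slice(v.as_bytes());
            offsets.push(value_data.len() as i32);
        }
        StringColumn { offsets, value_data }
    }

    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    fn value(&self, i: usize) -> &str {
        let start = self.offsets[i] as usize;
        let end = self.offsets[i + 1] as usize;
        std::str::from_utf8(&self.value_data[start..end]).expect("values were built from &str")
    }
}

fn main() {
    let col = StringColumn::from_strs(&["hello", "", "arrow"]);
    println!("offsets = {:?}", col.offsets);
    for i in 0..col.len() {
        println!("{} -> {:?}", i, col.value(i));
    }
}
```

An empty string costs no bytes in value_data; it shows up only as two equal consecutive offsets.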
- GenericBytesArray
- GenericByteViewArray
- short strings (<= 12 bytes) are stored inline in the view; long strings (> 12 bytes) are stored in value_data
- buffers:
- views: ScalarBuffer
- value_data: Buffer
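
A sketch of how one 16-byte view might be encoded, following the layout in the Arrow columnar spec: a 4-byte length, then either up to 12 inline bytes or a 4-byte prefix plus a buffer index and an offset. `make_view` and `read_view` are hypothetical helpers, and unsigned 32-bit fields are used here for simplicity.

```rust
/// Encode one 16-byte view. `buffer_index` and `offset` only matter
/// for values longer than 12 bytes.
fn make_view(value: &[u8], buffer_index: u32, offset: u32) -> [u8; 16] {
    let mut view = [0u8; 16];
    view[0..4].copy_from_slice(&(value.len() as u32).to_le_bytes());
    if value.len() <= 12 {
        // Short value: the bytes live inside the view, zero padded.
        view[4..4 + value.len()].copy_from_slice(value);
    } else {
        // Long value: keep a 4-byte prefix and point into a data buffer.
        view[4..8].copy_from_slice(&value[..4]);
        view[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        view[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    view
}

/// Resolve a view back to its bytes, consulting `buffers` only for long values.
fn read_view(view: &[u8; 16], buffers: &[Vec<u8>]) -> Vec<u8> {
    let len = u32::from_le_bytes([view[0], view[1], view[2], view[3]]) as usize;
    if len <= 12 {
        view[4..4 + len].to_vec()
    } else {
        let idx = u32::from_le_bytes([view[8], view[9], view[10], view[11]]) as usize;
        let off = u32::from_le_bytes([view[12], view[13], view[14], view[15]]) as usize;
        buffers[idx][off..off + len].to_vec()
    }
}

fn main() {
    let buffers = vec![b"a string longer than twelve bytes".to_vec()];
    let short = make_view(b"short", 0, 0);
    let long = make_view(&buffers[0], 0, 0);
    println!("{:?}", String::from_utf8(read_view(&short, &buffers)));
    println!("{:?}", String::from_utf8(read_view(&long, &buffers)));
}
```

The inline prefix lets many comparisons be decided from the view alone, without touching value_data.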
- DictionaryArray
- buffers: keys
- child_data: values
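
The keys/values split of a dictionary array can be sketched as follows. `dictionary_encode` and `dictionary_decode` are hypothetical helpers using plain std collections and i32 keys, not the arrow-rs DictionaryArray API.

```rust
use std::collections::HashMap;

/// Dictionary-encode a column: distinct values go to `values`,
/// and each row stores only a key (an index into `values`).
fn dictionary_encode(rows: &[&str]) -> (Vec<i32>, Vec<String>) {
    let mut values: Vec<String> = Vec::new();
    let mut index: HashMap<&str, i32> = HashMap::new();
    let mut keys = Vec::with_capacity(rows.len());
    for &row in rows {
        let key = *index.entry(row).or_insert_with(|| {
            values.push(row.to_string());
            (values.len() - 1) as i32
        });
        keys.push(key);
    }
    (keys, values)
}

/// Resolve each key back to the value it refers to.
fn dictionary_decode<'a>(keys: &[i32], values: &'a [String]) -> Vec<&'a str> {
    keys.iter().map(|&k| values[k as usize].as_str()).collect()
}

fn main() {
    let (keys, values) = dictionary_encode(&["a", "b", "a", "c", "b"]);
    println!("keys = {:?}, values = {:?}", keys, values);
    println!("decoded = {:?}", dictionary_decode(&keys, &values));
}
```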
- GenericListArray
- buffers: offsets
- child_data: values
List<Int8> example
List<List<Int8>> example
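
As a rough illustration of the two list layouts, the following std-only model stores offsets alongside a flat child array. `ListInt8`, `ListListInt8`, and `example` are hypothetical names, and validity bitmaps are left out for brevity.

```rust
/// Simplified List<Int8>: offsets into a flat child array of values.
struct ListInt8 {
    offsets: Vec<i32>,
    values: Vec<i8>,
}

impl ListInt8 {
    fn get(&self, i: usize) -> &[i8] {
        let start = self.offsets[i] as usize;
        let end = self.offsets[i + 1] as usize;
        &self.values[start..end]
    }
}

/// Simplified List<List<Int8>>: outer offsets index into the slots of the inner list.
struct ListListInt8 {
    offsets: Vec<i32>,
    child: ListInt8,
}

impl ListListInt8 {
    fn get(&self, i: usize) -> Vec<&[i8]> {
        let start = self.offsets[i] as usize;
        let end = self.offsets[i + 1] as usize;
        (start..end).map(|j| self.child.get(j)).collect()
    }
}

/// Encodes [[1, 2], [3]], [], [[4, 5, 6]].
fn example() -> ListListInt8 {
    ListListInt8 {
        offsets: vec![0, 2, 2, 3],
        child: ListInt8 {
            offsets: vec![0, 2, 3, 6],
            values: vec![1, 2, 3, 4, 5, 6],
        },
    }
}

fn main() {
    let nested = example();
    for i in 0..nested.offsets.len() - 1 {
        println!("{} -> {:?}", i, nested.get(i));
    }
}
```

Each nesting level adds one offsets buffer; the Int8 values themselves are stored only once, in the innermost child.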
- FixedSizeListArray
- buffers
- child_data: values
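
Because every element has the same length, a fixed-size list needs no offsets: slot i can be located by arithmetic alone. A minimal sketch, with `FixedSizeListInt8` as a hypothetical model type:

```rust
/// Simplified FixedSizeList<Int8>: no offsets, just a fixed element size.
struct FixedSizeListInt8 {
    size: usize,     // number of child values per list slot
    values: Vec<i8>, // flat child array, len = slots * size
}

impl FixedSizeListInt8 {
    fn len(&self) -> usize {
        self.values.len() / self.size
    }

    /// Slot i covers values[i * size .. (i + 1) * size].
    fn get(&self, i: usize) -> &[i8] {
        &self.values[i * self.size..(i + 1) * self.size]
    }
}

fn main() {
    let list = FixedSizeListInt8 { size: 3, values: vec![1, 2, 3, 4, 5, 6] };
    for i in 0..list.len() {
        println!("{} -> {:?}", i, list.get(i));
    }
}
```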
- MapArray
- StructArray
- child_data: fields' array