New datatype system design
Summary
We need to redesign the datatype system because the current implementation has several shortcomings, including but not limited to:
- DataType is an enum, so we must match on it before we can work with a specific type. For example, to create a serializer or deserializer from a DataType we always have to match. This does not mean matching is never necessary, but as more and more functions are added to DataType, the matching becomes tedious.
- Because DataType is represented as an enum, we can't use it as a generic argument.
- DataType may involve nested datatypes, such as DataType::Struct, but DataField is put inside DataType, which is logically unreasonable.
- It is hard to attach attributes, such as the nullable attribute, to an enum-based DataType (#3726, #3769).
New datatype system design
Enum TypeID
TypeID only represents the kind of a DataType. It does not carry any extra attributes.
enum TypeID {
    Nothing,
    Int8,
    Int16,
    Int32,
    // ...
    Date,
    DateTime,
    List,
    Struct,
    Nullable,
}
trait DataType
pub trait DataType {
    // Methods take &self so the trait can be used as Box<dyn DataType>.
    fn type_id(&self) -> TypeID;
    fn is_nullable(&self) -> bool;
    fn arrow_type(&self) -> ArrowType;
    fn create_serializer(&self) -> Box<dyn TypeSerializer>;
    fn create_deserializer(&self) -> Box<dyn TypeDeserializer>;
    fn create_builder(&self) -> Box<dyn ArrayBuilder>;
    // ...
}
From each DataType we can get its TypeID, which is enough for simple type matching.
We can then have DataTypeInt8, DataTypeInt16, and so on implement the DataType trait.
pub type DataTypeInt8 = DataTypeNumber<i8>;
impl DataType for DataTypeInt8 {
    fn type_id(&self) -> TypeID {
        TypeID::Int8
    }
    // ...
}
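
The following is a minimal sketch of how the alias and trait above could fit together, under some assumptions: the trait is trimmed to type_id only, DataTypeNumber is modeled as a PhantomData wrapper, and the helpers bit_width and type_id_of are hypothetical names used for illustration. It shows that a trait-based DataType works both as a trait object and as a generic argument, which the enum-based design could not offer.

```rust
use std::marker::PhantomData;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TypeID {
    Int8,
    Int16,
}

// Trimmed-down trait: only type_id is kept for this sketch.
pub trait DataType {
    fn type_id(&self) -> TypeID;
}

// Assumed layout: a zero-sized marker over the native number type.
pub struct DataTypeNumber<T> {
    _marker: PhantomData<T>,
}

impl<T> DataTypeNumber<T> {
    pub fn create() -> Self {
        Self { _marker: PhantomData }
    }
}

pub type DataTypeInt8 = DataTypeNumber<i8>;
pub type DataTypeInt16 = DataTypeNumber<i16>;

impl DataType for DataTypeInt8 {
    fn type_id(&self) -> TypeID {
        TypeID::Int8
    }
}

impl DataType for DataTypeInt16 {
    fn type_id(&self) -> TypeID {
        TypeID::Int16
    }
}

// Hypothetical helper: simple matching on TypeID through a trait object.
pub fn bit_width(data_type: &dyn DataType) -> usize {
    match data_type.type_id() {
        TypeID::Int8 => 8,
        TypeID::Int16 => 16,
    }
}

// Hypothetical helper: the trait is usable as a generic argument.
pub fn type_id_of<D: DataType>(data_type: &D) -> TypeID {
    data_type.type_id()
}
```

Matching on TypeID stays available for the simple cases, while type-specific behavior lives in each implementation.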
DataTypeNullable:
struct DataTypeNullable {
    inner: Box<dyn DataType>,
}
and DataTypeStruct:
struct DataTypeStruct {
    names: Vec<String>,
    inners: Vec<Box<dyn DataType>>,
}
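
As a rough illustration of how these wrapper types could implement the trait, the sketch below assumes a trimmed DataType trait with a default is_nullable, a placeholder DataTypeInt32 unit struct, and hypothetical accessors inner_type and field_type. None of these accessor names come from the design itself.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TypeID {
    Int32,
    Struct,
    Nullable,
}

pub trait DataType {
    fn type_id(&self) -> TypeID;
    // Assumed default: only the nullable wrapper overrides this.
    fn is_nullable(&self) -> bool {
        false
    }
}

// Placeholder concrete type for the sketch.
pub struct DataTypeInt32;

impl DataType for DataTypeInt32 {
    fn type_id(&self) -> TypeID {
        TypeID::Int32
    }
}

pub struct DataTypeNullable {
    inner: Box<dyn DataType>,
}

impl DataTypeNullable {
    pub fn create(inner: Box<dyn DataType>) -> Self {
        Self { inner }
    }

    // Hypothetical accessor for the wrapped type.
    pub fn inner_type(&self) -> &dyn DataType {
        self.inner.as_ref()
    }
}

impl DataType for DataTypeNullable {
    fn type_id(&self) -> TypeID {
        TypeID::Nullable
    }
    fn is_nullable(&self) -> bool {
        true
    }
}

pub struct DataTypeStruct {
    names: Vec<String>,
    inners: Vec<Box<dyn DataType>>,
}

impl DataTypeStruct {
    pub fn create(names: Vec<String>, inners: Vec<Box<dyn DataType>>) -> Self {
        Self { names, inners }
    }

    // Hypothetical lookup of a nested type by field name.
    pub fn field_type(&self, name: &str) -> Option<&dyn DataType> {
        let index = self.names.iter().position(|n| n.as_str() == name)?;
        Some(self.inners[index].as_ref())
    }
}

impl DataType for DataTypeStruct {
    fn type_id(&self) -> TypeID {
        TypeID::Struct
    }
}
```

Nullability here is a property of the wrapper type, so nested types compose without DataType needing to know about DataField.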
struct DataField
Yes, we still need struct DataField, because it stores attributes other than the DataType itself.
struct DataField {
    name: String,
    nullable: bool,
    data_type: Box<dyn DataType>,
    // ...
}
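
A small sketch of how DataField might be constructed and queried is given below. The constructor new, the accessor methods, and the two placeholder types are assumptions made for illustration. Only the three fields come from the struct above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TypeID {
    Int32,
    String,
}

pub trait DataType {
    fn type_id(&self) -> TypeID;
}

// Placeholder concrete types for the sketch.
pub struct DataTypeInt32;
pub struct DataTypeString;

impl DataType for DataTypeInt32 {
    fn type_id(&self) -> TypeID {
        TypeID::Int32
    }
}

impl DataType for DataTypeString {
    fn type_id(&self) -> TypeID {
        TypeID::String
    }
}

pub struct DataField {
    name: String,
    nullable: bool,
    data_type: Box<dyn DataType>,
}

impl DataField {
    // Hypothetical constructor; the argument order is an assumption.
    pub fn new(name: &str, data_type: Box<dyn DataType>, nullable: bool) -> Self {
        Self {
            name: name.to_string(),
            nullable,
            data_type,
        }
    }

    pub fn name(&self) -> &str {
        &self.name
    }

    pub fn is_nullable(&self) -> bool {
        self.nullable
    }

    pub fn data_type(&self) -> &dyn DataType {
        self.data_type.as_ref()
    }
}
```

A schema can then be a plain Vec<DataField>, with field-level attributes kept outside the type hierarchy.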
Example of a function bin
fn return_type(&self, args: &[Box<dyn DataType>]) -> Result<Box<dyn DataType>> {
    if !args[0].is_numeric() {
        return Err(ErrorCode::IllegalDataType(format!(
            "Expected number or null, but got {}",
            args[0]
        )));
    }
    Ok(Box::new(DataTypeString::create()))
}
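
To make the example above concrete, here is a self-contained sketch under a few assumptions: is_numeric is modeled as a default trait method derived from type_id, the error type is a plain String instead of ErrorCode, and the message prints the TypeID with its Debug form because this trimmed trait has no Display implementation. The free function return_type stands in for the method on the function object.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TypeID {
    Int8,
    Int16,
    Int32,
    String,
}

pub trait DataType {
    fn type_id(&self) -> TypeID;

    // Assumed default: derive the predicate from the TypeID once,
    // so callers never need to match on concrete types.
    fn is_numeric(&self) -> bool {
        matches!(
            self.type_id(),
            TypeID::Int8 | TypeID::Int16 | TypeID::Int32
        )
    }
}

// Placeholder concrete types for the sketch.
pub struct DataTypeInt32;
pub struct DataTypeString;

impl DataType for DataTypeInt32 {
    fn type_id(&self) -> TypeID {
        TypeID::Int32
    }
}

impl DataType for DataTypeString {
    fn type_id(&self) -> TypeID {
        TypeID::String
    }
}

// Stand-in for the function's return_type method; String replaces ErrorCode.
pub fn return_type(args: &[Box<dyn DataType>]) -> Result<Box<dyn DataType>, String> {
    let first = args
        .first()
        .ok_or_else(|| "Expected one argument".to_string())?;
    if !first.is_numeric() {
        return Err(format!(
            "Expected number or null, but got {:?}",
            first.type_id()
        ));
    }
    Ok(Box::new(DataTypeString))
}
```

Using args.first() instead of args[0] turns an empty argument list into an error rather than a panic.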
Example of numerical_coercion
pub fn numerical_coercion(
    lhs_type: &Box<dyn DataType>,
    rhs_type: &Box<dyn DataType>,
    allow_overflow: bool,
) -> Result<Box<dyn DataType>> {
    let has_float = lhs_type.is_floating() || rhs_type.is_floating();
    let has_integer = lhs_type.is_integer() || rhs_type.is_integer();
    let has_signed = lhs_type.is_signed_numeric() || rhs_type.is_signed_numeric();
    // ...
}
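
The body of numerical_coercion is elided above, so the sketch below does not reproduce it. It only illustrates how the three predicates could be derived from TypeID and combined, using a deliberately coarse rule that is an assumption of this sketch: any float yields Float64, otherwise any signed operand yields Int64, otherwise UInt64. The SimpleType wrapper, the unsigned and float TypeID variants, the String error type, and returning a TypeID instead of a boxed DataType are all simplifications. The allow_overflow flag is left out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TypeID {
    UInt8,
    Int32,
    UInt64,
    Int64,
    Float32,
    Float64,
    String,
}

pub trait DataType {
    fn type_id(&self) -> TypeID;

    fn is_floating(&self) -> bool {
        matches!(self.type_id(), TypeID::Float32 | TypeID::Float64)
    }

    fn is_integer(&self) -> bool {
        matches!(
            self.type_id(),
            TypeID::UInt8 | TypeID::Int32 | TypeID::UInt64 | TypeID::Int64
        )
    }

    // Assumption: floats count as signed numerics.
    fn is_signed_numeric(&self) -> bool {
        matches!(
            self.type_id(),
            TypeID::Int32 | TypeID::Int64 | TypeID::Float32 | TypeID::Float64
        )
    }
}

// Hypothetical wrapper so the sketch needs only one concrete type.
pub struct SimpleType(pub TypeID);

impl DataType for SimpleType {
    fn type_id(&self) -> TypeID {
        self.0
    }
}

// Coarse illustrative rule, not the rule used by the real function.
pub fn numerical_coercion(lhs: &dyn DataType, rhs: &dyn DataType) -> Result<TypeID, String> {
    let is_numeric = |t: &dyn DataType| t.is_floating() || t.is_integer();
    if !is_numeric(lhs) || !is_numeric(rhs) {
        return Err(format!(
            "Cannot coerce {:?} and {:?}",
            lhs.type_id(),
            rhs.type_id()
        ));
    }
    let has_float = lhs.is_floating() || rhs.is_floating();
    let has_signed = lhs.is_signed_numeric() || rhs.is_signed_numeric();
    if has_float {
        Ok(TypeID::Float64)
    } else if has_signed {
        Ok(TypeID::Int64)
    } else {
        Ok(TypeID::UInt64)
    }
}
```

Because the predicates are trait methods, the coercion logic never inspects concrete types and keeps working when new numeric types are added.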