Description
Search before asking
- I searched in the issues and found nothing similar.
Motivation
Background
Semi-structured data (e.g., JSON) is increasingly common in modern data pipelines. Many query engines and storage systems (such as Apache Spark, Apache Iceberg, and Apache Paimon) have adopted a VARIANT data type to efficiently represent and query semi-structured data using a compact binary encoding, rather than storing raw JSON strings.
Currently, Fluss treats VARIANT internally as plain `byte[]`, which has several limitations:
- Loss of semantic structure: A single `byte[]` conflates the variant's value and metadata (string dictionary) into one opaque blob. Downstream consumers must know the internal wire format (`[4-byte value length][value bytes][metadata bytes]`) to decode it correctly.
- Inconsistent API: All other complex types in Fluss (e.g., `InternalArray`, `InternalMap`, `InternalRow`) have dedicated first-class types in the row infrastructure, while VARIANT does not.
- Poor interoperability with lake formats: When writing to lake formats (Paimon, Iceberg, Lance), the VARIANT data must be split into separate `value` and `metadata` components. Using `byte[]` forces every integration point to re-implement the split/merge logic.
- No alignment with industry standards: Apache Paimon has already introduced a full `Variant` interface with `value()` and `metadata()` accessors, following the Variant Binary Encoding spec. Fluss should align with this design for ecosystem consistency.
Use Case
- Users ingesting JSON or semi-structured data into Fluss tables should benefit from efficient binary encoding and per-path access without full deserialization.
- Lake connector writers (Paimon, Iceberg, Lance) need structured access to `value` and `metadata` separately.
- A first-class `Variant` type enables future optimizations like predicate pushdown on variant paths.
Solution
Proposed Design
Introduce a first-class Variant interface and GenericVariant implementation throughout Fluss's row infrastructure, following the same pattern as Apache Paimon's Variant design.
1. Core Types
- `Variant` interface (`fluss-common/.../row/Variant.java`)
  - `byte[] value()` — returns the binary-encoded variant value (header + data)
  - `byte[] metadata()` — returns the string dictionary (version + deduplicated object key names)
  - `long sizeInBytes()` — total byte size
  - `Variant copy()` — deep copy
  - Static helpers: `bytesToVariant(byte[])` and `variantToBytes(Variant)` for backward-compatible wire format conversion
- `GenericVariant` class (`fluss-common/.../row/GenericVariant.java`)
  - Implements `Variant` and `Serializable`
  - Stores two `byte[]` fields: `value` and `metadata`
  - Proper `equals()`, `hashCode()`, `toString()`
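A rough Java sketch of the interface and implementation described above (signatures follow the bullet list; the `equals()`/`hashCode()`/`toString()` details are illustrative assumptions, not Fluss's actual code):

```java
import java.io.Serializable;
import java.util.Arrays;

// Sketch of the proposed Variant interface; names follow the proposal.
interface Variant {
    byte[] value();     // binary-encoded variant value (header + data)
    byte[] metadata();  // string dictionary (version + deduplicated key names)
    long sizeInBytes(); // total byte size of value + metadata
    Variant copy();     // deep copy
}

// Minimal GenericVariant backed by two byte[] fields, as the proposal outlines.
class GenericVariant implements Variant, Serializable {
    private final byte[] value;
    private final byte[] metadata;

    GenericVariant(byte[] value, byte[] metadata) {
        this.value = value;
        this.metadata = metadata;
    }

    @Override public byte[] value() { return value; }
    @Override public byte[] metadata() { return metadata; }
    @Override public long sizeInBytes() { return value.length + metadata.length; }

    @Override
    public Variant copy() {
        return new GenericVariant(
                Arrays.copyOf(value, value.length),
                Arrays.copyOf(metadata, metadata.length));
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof GenericVariant)) return false;
        GenericVariant other = (GenericVariant) o;
        return Arrays.equals(value, other.value)
                && Arrays.equals(metadata, other.metadata);
    }

    @Override
    public int hashCode() {
        return 31 * Arrays.hashCode(value) + Arrays.hashCode(metadata);
    }

    @Override
    public String toString() {
        return "GenericVariant(value=" + value.length
                + " bytes, metadata=" + metadata.length + " bytes)";
    }
}
```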
2. Row Infrastructure Changes
| Layer | Change |
|---|---|
| `DataGetters` | Add `Variant getVariant(int pos)` |
| `BinaryWriter` | Add `writeVariant(int pos, Variant value)` |
| All `InternalRow` implementations | Implement `getVariant()` — `GenericRow`, `BinaryRow`, `CompactedRow`, `IndexedRow`, `ProjectedRow`, `PaddingRow`, `ColumnarRow`, etc. |
| All `InternalArray` implementations | Implement `getVariant()` — `GenericArray`, `BinaryArray`, `ColumnarArray` |
| Readers/Writers | `CompactedRowReader`/`Writer`, `IndexedRowReader`/`Writer` — add `readVariant()`/`writeVariant(Variant)` |
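To illustrate where the new accessor slots in, here is a simplified stand-in for the `DataGetters` layer and a `GenericRow`-style implementation (these types are assumptions for illustration, not Fluss's real definitions):

```java
// Simplified stand-in for the proposed Variant type (assumption).
interface Variant {
    byte[] value();
    byte[] metadata();
}

// Simplified stand-in for Fluss's DataGetters interface.
interface DataGetters {
    Object getField(int pos);
    // Proposed addition: a typed accessor alongside the existing getters.
    Variant getVariant(int pos);
}

// GenericRow-style implementation: fields are stored as Objects,
// so the typed getter is a position lookup plus a cast.
class SimpleRow implements DataGetters {
    private final Object[] fields;

    SimpleRow(Object... fields) {
        this.fields = fields;
    }

    @Override public Object getField(int pos) { return fields[pos]; }
    @Override public Variant getVariant(int pos) { return (Variant) fields[pos]; }
}
```

Binary implementations (`BinaryRow`, `CompactedRow`, `IndexedRow`, …) would instead decode the variant from their backing memory segments, but the accessor signature stays the same.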
3. Binary Storage Format (Backward Compatible)
The on-wire format remains unchanged for compatibility: `[4-byte value length][value bytes][metadata bytes]`. `Variant.variantToBytes()` and `Variant.bytesToVariant()` handle the conversion between this format and the structured `Variant` object.
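A minimal sketch of that conversion, assuming the length-prefixed layout described above (the `VariantWireFormat` class name is illustrative; in the proposal these would be static helpers on `Variant`):

```java
import java.nio.ByteBuffer;

// Sketch of the backward-compatible wire format:
// [4-byte value length][value bytes][metadata bytes]
class VariantWireFormat {

    // Serialize: prefix the value with its 4-byte length, then append metadata.
    static byte[] variantToBytes(byte[] value, byte[] metadata) {
        ByteBuffer buf = ByteBuffer.allocate(4 + value.length + metadata.length);
        buf.putInt(value.length);
        buf.put(value);
        buf.put(metadata);
        return buf.array();
    }

    // Deserialize: read the length prefix, then split value and metadata.
    // Returns {value, metadata}.
    static byte[][] bytesToVariant(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        byte[] value = new byte[buf.getInt()];
        buf.get(value);
        byte[] metadata = new byte[buf.remaining()];
        buf.get(metadata);
        return new byte[][] {value, metadata};
    }
}
```

Because the blob layout is unchanged, rows written by older Fluss versions remain readable, and readers that still expect a raw `byte[]` keep working.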
4. Integration Points
- Lake connectors (Paimon, Iceberg, Lance): Encoders/decoders use `Variant` directly instead of raw `byte[]`
- Flink bridge: `FlussRowToFlinkRowConverter` converts `Variant` → `byte[]` for Flink compatibility
- Client converters: `PojoToRowConverter`/`RowToPojoConverter` support both `byte[]` and `Variant` inputs
- Utilities: `InternalRowUtils`, `TypeUtils`, `PartitionUtils` updated accordingly
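One way the dual-input client-converter boundary could look, sketched under the assumption that legacy callers still pass the length-prefixed `byte[]` wire format (all class and method names here are illustrative, not Fluss's actual API):

```java
import java.nio.ByteBuffer;

// Illustrative structured variant holder (assumption, stands in for Variant).
class VariantInput {
    final byte[] value;
    final byte[] metadata;

    VariantInput(byte[] value, byte[] metadata) {
        this.value = value;
        this.metadata = metadata;
    }
}

// Sketch of a converter boundary that accepts both representations,
// so callers can migrate from byte[] to Variant incrementally.
class PojoFieldConverter {
    static VariantInput normalize(Object field) {
        if (field instanceof VariantInput) {
            return (VariantInput) field;
        }
        if (field instanceof byte[]) {
            // Legacy wire format: [4-byte value length][value bytes][metadata bytes]
            ByteBuffer buf = ByteBuffer.wrap((byte[]) field);
            byte[] value = new byte[buf.getInt()];
            buf.get(value);
            byte[] metadata = new byte[buf.remaining()];
            buf.get(metadata);
            return new VariantInput(value, metadata);
        }
        throw new IllegalArgumentException("Unsupported variant input: " + field);
    }
}
```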
5. References
- Variant Binary Encoding Spec (Parquet)
- Apache Paimon Variant Implementation
- Apache Spark VARIANT FLIP
Anything else?
No response
Willingness to contribute
- I'm willing to submit a PR!