(Re-)Implement C Backend (this time supports callbacks & typed enums)#261

Open
yuvatia wants to merge 4 commits into ralfbiedert:master from yuvatia:yuvatia/revamp-c-backend

@yuvatia
Contributor

@yuvatia yuvatia commented Mar 26, 2026

At the company I work for, we use interoptopus to generate C# bindings for a couple of projects and have been very satisfied with the results. Recently, however, we wanted to generate bindings for a project written in C++ and found the existing (now deprecated) C backend to be extremely lacking. The most apparent gaps were:

  1. Lack of support for callbacks
  2. Lack of a nice utility function that resolves all exported functions of the Rust DLL
  3. Lack of support for sum types (typed enums)

These gaps motivated me (to motivate Claude) to enhance the backend. Initially this was based on the (now _old) pre-existing C backend, but due to recent structural changes in the repo it was changed to be self-contained.

My intention is to keep this backend relatively maintained (as we plan on using it for production software).

See commit message for detailed overview of the backend.

Some notes

  1. The code is mostly AI generated. I tweaked it a bit manually and provided a lot of feedback, but it is still mostly AI-written. From my understanding you're okay with accepting such PRs but lmk if you feel otherwise.
  2. This was tested with a real-world project, on both Linux and Windows.
  3. At first I wanted to just reuse the reference project but it does things that are way more complex than what this new backend is capable of at the moment, so I wrote a different reference project that's pretty dumb compared to the one used by the C# backend, but it covers all the gaps mentioned above.
  4. Test code is written in C++ and uses GTest since that's the testing framework I'm more comfortable with (my background is in C++ much more so than in C), but lmk if you prefer a different framework.
  5. I'm using CMake. Some people have strong feelings towards CMake, so lmk if you prefer a different build system.

C header generator (crates/backend_c/)

New C header generator built against the 0.16 type system. Organized as three modules: lib.rs (public Generator API with builder pattern), codegen.rs (all emission routines), and topo.rs (deterministic topological sort of types by name before dependency traversal, ensuring stable output across runs).

The generator produces a single .h file containing:

  • Type definitions: structs, simple enums, and tagged unions. Rust enums with data payloads emit a C11 tag enum + struct with anonymous union (e.g. SHAPE_TAG enum + SHAPE struct with tag and union fields). This also covers ffi::Option<T> and ffi::Result<T, E> which are internally represented as enums with typed variants.

  • Callbacks: each callback! type emits a NAME_fn function pointer typedef (with an implicit trailing const void* context parameter) and a NAME struct containing three fields — callback (the fn pointer), data (context pointer), and destructor (optional cleanup). This matches the Rust #[repr(C)] layout so callbacks round-trip correctly across the FFI boundary.

  • Dispatch table: a {name}_api_t struct with one function pointer field per exported function, where {name} is caller-specified (e.g. reference_project_c_api_t). A /* internal helpers */ comment separator is emitted before any interoptopus_* builtin functions (from builtins_string!/builtins_vec!) to visually distinguish user APIs from internal helpers.

  • Dynamic loader: a cross-platform {name}_load(path, api) function. The POSIX implementation uses dlopen/dlsym with memcpy to transfer void* into function pointer fields (avoiding the ISO C prohibition on direct void*-to-function-pointer casts, which triggers warnings under -Wpedantic). The Windows implementation converts the UTF-8 path to UTF-16 via MultiByteToWideChar(CP_UTF8, ...) and loads with LoadLibraryW/GetProcAddress. Both validate every symbol and return -1 on failure.

  • Static loader: a {name}_load_static(api) function (behind an #ifdef guard) that assigns the forward-declared symbols directly, for use when statically linking the Rust library.
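For readers who have not seen the generated header, the tagged-union and callback layouts described above could look roughly like the following. This is a hand-written sketch, not actual generator output: `SHAPE_TAG` and `SHAPE` follow the example names in the description, while the variant names, payload fields, the `SHAPE_VISITOR` callback, the destructor signature, and both helper functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* C11 tag enum + struct with an anonymous union (variant names are assumed). */
typedef enum SHAPE_TAG {
    SHAPE_TAG_CIRCLE = 0,
    SHAPE_TAG_RECT = 1
} SHAPE_TAG;

typedef struct SHAPE {
    SHAPE_TAG tag;
    union {
        double circle;                         /* radius */
        struct { double w; double h; } rect;   /* width and height */
    };
} SHAPE;

/* Callback: NAME_fn typedef with a trailing const void* context parameter,
   plus a NAME struct with callback, data, and destructor fields. */
typedef double (*SHAPE_VISITOR_fn)(SHAPE shape, const void *context);

typedef struct SHAPE_VISITOR {
    SHAPE_VISITOR_fn callback;
    const void *data;
    void (*destructor)(const void *data);      /* assumed signature; may be NULL */
} SHAPE_VISITOR;

/* Example callback body: reads a scale factor from the context pointer. */
static double scaled_area(SHAPE shape, const void *context) {
    double scale = *(const double *)context;
    switch (shape.tag) {
    case SHAPE_TAG_CIRCLE: return scale * 3.141592653589793 * shape.circle * shape.circle;
    case SHAPE_TAG_RECT:   return scale * shape.rect.w * shape.rect.h;
    }
    return 0.0;
}

/* Invokes the callback the way a callee would, passing `data` as the context. */
static double invoke_visitor(const SHAPE_VISITOR *v, SHAPE shape) {
    assert(v != NULL && v->callback != NULL);
    return v->callback(shape, v->data);
}
```

A caller would fill `SHAPE_VISITOR` with `{ scaled_area, &scale, NULL }` and hand it across the FFI boundary; the `tag` field tells either side which union member is valid.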

reference_project_c (crates/backend_c/reference_project/)

Comprehensive FFI example exercising all supported patterns: structs, tagged union enums, slices, vecs, options, strings, callbacks (with Shape, Slice, Option, and Vec parameters), and a KitchenSink struct that combines all major FFI types (u64, bool, f64, ffi::String, tagged enum, ffi::Option, ffi::Slice, ffi::Option<ffi::String>). Exports a public inventory() function consumed by the backend's integration test. Mirrors the role of crates/reference_project/ for the C# backend.

Test infrastructure

  • crates/backend_c/tests/: Rust integration test generates the C header from reference_project_c's inventory. C++20 gtest suite (10 tests) under reference_project/ loads the Rust cdylib and validates all FFI types with proper assertions. Uses CMake + FetchContent for gtest. The generated header is gitignored (regenerated by cargo test).

  • examples/hello_world/: added a second binding generation test for C (alongside the existing C# one), plus a simple C++20 gtest (2 tests) that validates the Vec2/my_function roundtrip. Renamed bindings/ to bindings_csharp/ for clarity alongside bindings_c/. Also fixed the existing C# Xunit test to only reference types in the inventory and updated the target framework to net10.0. Generated headers are gitignored.

  • Justfile: just test-c runs both C++ test suites via a shared _test_c helper (cmake configure/build/ctest with RUST_LIB_DIR). just test-dotnet now also runs the hello_world Xunit test. Both are wired into just ci.

@yuvatia yuvatia force-pushed the yuvatia/revamp-c-backend branch 2 times, most recently from 50f33ff to ef15cea Compare March 26, 2026 11:47
@ralfbiedert
Owner

ralfbiedert commented Mar 26, 2026

Hi, thanks for the PR, and thanks a lot for looking after the C backend.

Yes, AI assisted PRs are generally fine if there is sufficient 'human in the loop'. This PR has a few issues though:

Most importantly, the C backend (any backend now) should be 'model + pass' based. Essentially that means each backend should

  • First, define an understandable model of how the target language works (see the C# backend lang module). Essentially that means defining a taxonomy of what items (functions, types, enums, ...) should be observable in its output.
  • Next, define one or more model passes (again, see C#) that transform core::Inventory items into its own model. This might mean mapping types, filtering out things that won't work, registering helper types, ...
  • Lastly, define one or more output passes that (gradually) build up fragments (e.g., type definitions, enum definitions, ...)

The overall structure should be vaguely organized like in the C# backend, in particular the lang items, passes and pipelines (pipelines glue the passes together).

The passes should also be configurable (this might include options about naming conventions, etc.), and setting these config options should again follow how the C# pipeline builders do it.

Most importantly, new backends must use tera templates like the C# backend does it. The whole indented!() we had earlier, and the writeln!() in here, is a total maintenance nightmare.

Testing should also vaguely follow C#. There should be multiple emission / insta snapshot tests for various reference project parts. Neither tests nor codegen should include C++ constructs (for the backend_c), unless it's minimal and optional.

@yuvatia
Contributor Author

yuvatia commented Mar 26, 2026

Thank you for your feedback, will address those comments and update the PR in a few days.

@yuvatia yuvatia force-pushed the yuvatia/revamp-c-backend branch from ef15cea to ba54839 Compare March 27, 2026 12:36
Comment on lines +5 to +9
int len = MultiByteToWideChar(CP_UTF8, 0, path, -1, NULL, 0);
if (len <= 0) return -1;
wchar_t* wpath = (wchar_t*)_alloca(len * sizeof(wchar_t));
MultiByteToWideChar(CP_UTF8, 0, path, -1, wpath, len);
HMODULE lib = LoadLibraryW(wpath);
Contributor Author

I'm debating adding a {{load_fn}}W or something that just takes a PCWSTR to avoid these conversions, wdyt?

@yuvatia
Contributor Author

yuvatia commented Mar 27, 2026

I updated the PR to address all your comments, and also addressed a few gaps such as missing benchmarks (runnable with just bench-c) + some functionality that was lost from the older backend that I liked (being able to choose between snake_case/camelCase/PascalCase).

@yuvatia yuvatia force-pushed the yuvatia/revamp-c-backend branch 2 times, most recently from 52dc19f to e753a61 Compare March 30, 2026 08:03
@ralfbiedert
Owner

ralfbiedert commented Mar 31, 2026

Sorry, but most of the issues remain: the C backend's flow and structure are still vastly different from the C# one.

I recommend you (as a human) look at the C# backend and get a feeling for it, there's too much detail to write everything down item-by-item. While some internals have been agent created (e.g., overload emission and proc macros), the overall backend flow is 'hand-designed' and should be easy to follow (if it isn't that would be an issue).

One bigger thing I missed the first time, though: the reliance on cmake, which is often a PITA to set up on platforms that don't ship it out of the box. It's a bit of a weak argument given we rely on dotnet already (and I'm open to discussions on how to address that), but it'd be good to explore options (maybe using cc?) to minimize setup pain.

yuvatia and others added 2 commits April 13, 2026 20:30
C backend architecture (crates/backend_c/)
-------------------------------------------
The C backend follows the same model+pass+template pattern established
by the C# backend. The implementation is split into:

- `lang/` — C language model defining the constructs the backend can
  emit: `CType` with `CTypeKind` variants (Primitive, Struct,
  SimpleEnum, TaggedUnion, FnPointer, Callback, Slice/SliceMut, Vec,
  Utf8String, Option, Result, Opaque, Pointer, Array), `CFunction`,
  and `CModel` which holds the complete mapped model.

- `pass/model.rs` — Single model pass that transforms the Rust
  inventory into the C language model. Maps all type kinds, resolves
  type names (sanitizing Rust names like `Option<Vec2>` into valid C
  identifiers like `OPTIONVEC2`), performs topological sort for
  dependency-ordered emission, and builds the function list.

- `pass/output.rs` — Output pass that renders the C model through Tera
  templates into the final header. Each type kind dispatches to its own
  template; the final assembly concatenates header guard, type
  definitions (in topo order), function declarations, dispatch table,
  platform-specific loaders, and footer.

- `pipeline/` — `CLibrary` with builder pattern (`loader_name`,
  `ifndef`, `filename`), wires model and output passes together.
  Templates are packed into a tar archive at build time via `build.rs`
  and embedded in the binary.

- `templates/` — 14 Tera `.h` template files organized by construct:
  types (struct, simple_enum, tagged_union, callback, fn_pointer,
  slice, vec, utf8string, option, result, opaque), function
  declarations, dispatch table, and loaders (dynamic_win32,
  dynamic_posix, static). The dynamic loader uses
  `MultiByteToWideChar`/`LoadLibraryW` on Windows and
  `dlopen`/`dlsym` with `memcpy` on POSIX.

A `/* internal helpers */` comment separator is emitted before any
`interoptopus_*` builtin functions in the dispatch table, loaders,
and function declarations.
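The `dlopen`/`dlsym` with `memcpy` approach mentioned for the POSIX loader can be illustrated with a minimal sketch. It is not the template's actual output: `demo_api_t`, `demo_load`, and the choice of resolving the C library's `abs` (so the example needs no extra shared object) are all assumptions made for illustration. Older glibc versions may need `-ldl` at link time.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical one-entry dispatch table; a real one would have one
   function pointer field per exported function. */
typedef struct demo_api_t {
    int (*abs_fn)(int);
} demo_api_t;

/* Returns 0 on success and -1 on failure. Passing path == NULL searches
   the running program and the libraries it was started with. */
int demo_load(const char *path, demo_api_t *api) {
    assert(api != NULL);

    void *lib = dlopen(path, RTLD_NOW);
    if (lib == NULL) return -1;

    void *sym = dlsym(lib, "abs");
    if (sym == NULL) {
        dlclose(lib);
        return -1;
    }

    /* Copy the bytes of the object pointer into the function pointer field
       instead of casting, which ISO C does not permit and -Wpedantic flags. */
    assert(sizeof(sym) == sizeof(api->abs_fn));
    memcpy(&api->abs_fn, &sym, sizeof(sym));

    /* The handle is deliberately left open so the pointer stays valid. */
    return 0;
}
```

After a successful `demo_load(NULL, &api)`, calls go through the table as `api.abs_fn(-5)`; a missing library or symbol yields -1, matching the "validate every symbol" behaviour described earlier in the PR.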

reference_project_c (crates/backend_c/reference_project/)
---------------------------------------------------------
Comprehensive FFI example exercising all supported patterns: structs,
tagged union enums, slices, vecs, options, strings, callbacks (with
Shape, Slice, Option, and Vec parameters), and a KitchenSink struct
that combines all major FFI types. Mirrors the role of
`crates/reference_project/` for the C# backend.

Test infrastructure
-------------------
- Insta snapshot tests (6 focused + 1 full): basic struct, simple enum,
  tagged union, callbacks, pattern types (Slice/Option), and the full
  reference project header. Each test builds a small inventory, runs
  the pipeline, and snapshots the generated `.h` output.

- C++20 gtest suite (10 tests) under reference_project/ loads the Rust
  cdylib and validates all FFI types at runtime. Uses CMake +
  FetchContent for gtest. The generated header is gitignored.

- C++20 Google Benchmark suite (10 benchmarks) under benches/ measures
  FFI call overhead for all major patterns: tagged unions, slices,
  mutable slices, vec lifecycle, option returns, and callbacks with
  various argument types. Run with `just bench-c`.

- examples/hello_world/: binding generation tests for both C# and C,
  plus a simple C++20 gtest (2 tests). Renamed `bindings/` to
  `bindings_csharp/` for clarity alongside `bindings_c/`.

- CMake: copies the Rust cdylib next to the test exe on all platforms
  (with RPATH=$ORIGIN on Unix), uses `--config Debug` / `-C Debug`
  for MSVC multi-config generators.

- Justfile: `just test-c` runs both C++ test suites via `_test_c`
  helper. `just test-dotnet` runs hello_world Xunit test. Both wired
  into `just ci`. `just bench-c` builds in release and runs the
  Google Benchmark suite via ctest.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds NamingStyle enum (ScreamingSnake, UpperCamel, Snake, Raw) and
NamingConfig with per-category control over type, enum variant,
function, parameter, and constant naming. An optional prefix is
prepended to types and functions (e.g. mylib_color).

Loader templates now use a separate `symbol` field for dlsym/
GetProcAddress lookups so prefixed names don't break dynamic loading.

ScreamingSnake properly splits at word boundaries (OptionVec2 →
OPTION_VEC2). Callback _fn and tag _TAG suffixes are cased to match
their respective naming styles. Tagged union field names are computed
in the model pass rather than reverse-engineered in the output pass.
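To make the word-boundary splitting concrete, here is a small sketch of one possible rule that reproduces the `OptionVec2` → `OPTION_VEC2` example. The backend itself is written in Rust and its actual rule is not shown in this PR text, so the function name `to_screaming_snake` and the boundary rule (an underscore before an uppercase letter that follows a lowercase letter or digit) are assumptions. Runs of capitals such as acronyms are not treated specially here.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Writes the SCREAMING_SNAKE form of `in` into `out`, whose capacity `cap`
   includes the terminator. Returns 0 on success, -1 if `out` is too small. */
int to_screaming_snake(const char *in, char *out, size_t cap) {
    assert(in != NULL && out != NULL);
    size_t n = 0;
    for (size_t i = 0; in[i] != '\0'; i++) {
        unsigned char c = (unsigned char)in[i];
        unsigned char prev = i > 0 ? (unsigned char)in[i - 1] : 0;

        /* Assumed boundary rule: uppercase right after lowercase or a digit. */
        if (isupper(c) && (islower(prev) || isdigit(prev))) {
            if (n + 1 >= cap) return -1;
            out[n++] = '_';
        }
        if (n + 1 >= cap) return -1;
        out[n++] = (char)toupper(c);
    }
    if (n >= cap) return -1;
    out[n] = '\0';
    return 0;
}
```

Under this rule a trailing digit stays attached to its word, which is why `Vec2` becomes `VEC2` rather than `VEC_2`.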

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@yuvatia yuvatia force-pushed the yuvatia/revamp-c-backend branch from e753a61 to cae05d8 Compare April 13, 2026 20:05
yuvatia and others added 2 commits April 14, 2026 09:30
- Handle Rust's `_` parameter name in `c_param_name()`: `sanitize("_")`
  returns an empty string which produces invalid C like `Type 0` instead
  of `Type param`. Fall back to "param" when the styled name is empty.

- Add `#include <malloc.h>` to the Win32 dynamic loader template so
  `_alloca` is declared when compiling with MSVC in C++ mode.
The prefix (e.g. "mylib_") is meant to avoid symbol collisions in the
global namespace, but inside the dispatch struct the fields are already
scoped — the prefix just adds noise and makes the API painful to use
(api.mylib_foo vs api.foo). Keep the prefixed name for top-level
function declarations and the original symbol for dlsym/GetProcAddress.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>