Question: Support for synthetic scale-free graph generation #29

@seongwoohan

Description

Hello SDCD developers! I'm following the SDCD tutorial to generate synthetic data and was wondering whether it is possible to generate scale-free graphs as well. From the SDCD paper, it looks like the experiments are evaluated on Erdős–Rényi (ER) graphs, not scale-free graphs. Could you please confirm that this is correct?

In the simulation function random_model_gaussian_global_variance, the dag_type argument supports "ER" but not "scale_free". However, I noticed that the lower-level random_dag function does include a "scale_free" option.

Is there a recommended way to generate scale-free graphs with the SDCD simulation utilities? Or is the most straightforward approach to pass distribution="scale_free" to random_dag and have random_model_gaussian_global_variance call it, as in the snippet below? I'd like to confirm the best practice for generating scale-free graphs, since the tutorial only demonstrates the ER case!

Thanks very much for your guidance!

import networkx as nx
import numpy as np

def random_dag(n_nodes: int = 20, n_edges: int = 20, distribution: str = "scale_free"):
    """Return a random DAG.

    Args:
        n_nodes: Number of nodes.
        n_edges: Number of edges (only used for uniform distribution).
        distribution: Distribution of the random graph, one of "uniform" (or "erdos_renyi") or "scale_free".
    """
    if distribution in ["uniform", "erdos_renyi"]:
        graph = nx.gnm_random_graph(n_nodes, n_edges, directed=False)
    elif distribution == "scale_free":
        graph = nx.scale_free_graph(n_nodes, alpha=0.41, beta=0.54, gamma=0.05)
    else:
        raise ValueError(f"Unknown distribution {distribution}.")

    return random_dag_from_undirected_graph(graph)

np.random.seed(42)

n = 10000
n_per_intervention = 500
d = 50
n_edges = 200   # d * s, i.e. s = 4 edges per node on average


true_causal_model = random_model_gaussian_global_variance(
    d,
    n_edges,
    dag_type="ER",
    scale=0.5,
    hard=True,
)

X_df = true_causal_model.generate_dataframe_from_all_distributions(
    n_samples_control=n,
    n_samples_per_intervention=n_per_intervention,
)
X_df.iloc[:, :-1] = (X_df.iloc[:, :-1] - X_df.iloc[:, :-1].mean()) / X_df.iloc[
    :, :-1
].std()  # Normalize the data (all columns except the intervention label)
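
In case it helps to make the question concrete, here is a minimal, self-contained sketch of the kind of thing I have in mind. It only uses networkx and numpy; scale_free_dag and the random-order edge orientation are my own illustration of one way to turn nx.scale_free_graph output into a DAG, not a claim about how SDCD's random_dag_from_undirected_graph actually works:

```python
import networkx as nx
import numpy as np

def scale_free_dag(n_nodes: int = 20, seed: int = 0) -> nx.DiGraph:
    """Sketch: draw a scale-free graph and orient its edges into a DAG."""
    # Same alpha/beta/gamma as in the random_dag snippet above;
    # nx.scale_free_graph returns a MultiDiGraph.
    g = nx.scale_free_graph(n_nodes, alpha=0.41, beta=0.54, gamma=0.05, seed=seed)

    # Collapse to a simple undirected graph, dropping self-loops
    # and parallel edges.
    und = nx.Graph(g.to_undirected())
    und.remove_edges_from(list(nx.selfloop_edges(und)))

    # Orient every edge along a random node ordering; orienting edges
    # consistently with any total order guarantees acyclicity.
    rng = np.random.default_rng(seed)
    order = {node: rank for rank, node in enumerate(rng.permutation(n_nodes))}
    dag = nx.DiGraph()
    dag.add_nodes_from(und.nodes)
    dag.add_edges_from(
        (u, v) if order[u] < order[v] else (v, u) for u, v in und.edges
    )
    return dag

dag = scale_free_dag(n_nodes=50, seed=42)
assert nx.is_directed_acyclic_graph(dag)
```

If this is roughly what random_dag_from_undirected_graph does internally, then passing distribution="scale_free" through should be safe; I just wanted to check before relying on it.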
