Embedding Space Visualization
This guide explains how to visualize and analyze embedding spaces using UMAP projections. The tools provided help you understand how embeddings evolve during training and how different groups of data are distributed in the embedding space.
Core Visualization Functions
UMAP Projection
src.visualization.embedding_viz
Visualization utilities for embedding analysis. This module provides functions for dimensionality reduction, group analysis, and plotting of embeddings.
run_umap(embeddings, metric='euclidean', min_dist=0.1, random_state=42, n_neighbors=15, n_components=2)
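For orientation, here is a minimal sketch of what a function with this signature might do. It assumes the module wraps the `umap-learn` package (`umap.UMAP`). The actual implementation in `embedding_viz.py` may differ, and `run_umap_sketch` is a hypothetical name.

```python
import numpy as np


def run_umap_sketch(embeddings, metric="euclidean", min_dist=0.1,
                    random_state=42, n_neighbors=15, n_components=2):
    """Hypothetical stand-in for run_umap: project (N, D) embeddings to (N, n_components)."""
    X = np.asarray(embeddings, dtype=np.float32)
    if X.ndim != 2:
        raise ValueError(f"expected a 2-D array of shape (N, D), got shape {X.shape}")

    # Assumption: the real module wraps umap-learn. Imported lazily so this
    # file loads (and validates input) even without the package installed.
    import umap

    reducer = umap.UMAP(
        n_neighbors=n_neighbors,
        min_dist=min_dist,
        n_components=n_components,
        metric=metric,
        random_state=random_state,  # fixed seed -> reproducible layouts
    )
    return reducer.fit_transform(X)  # ndarray of shape (N, n_components)
```

Fixing `random_state` trades some speed for reproducibility: in umap-learn a seeded run disables parallelism.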
Global Visualization
Group Analysis Tools
prepare_group_data(df, group_name, max_samples=100)
For a given disease group, selects the records that have EXACTLY one label in `group_name` (they may still carry labels from other groups). For these multi-labeled records, it also stores their other integration names. As in the example below, it returns two DataFrames: the full filtered set and a subset of at most `max_samples` records.
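The filtering rule is easier to see on a toy table. The sketch below assumes a hypothetical long-format schema with one row per (`record_id`, `group`, `label`) triple. The real function's input format, column names, and sampling behavior are not documented here, so treat `prepare_group_data_sketch` and its columns as illustrative only.

```python
import pandas as pd


def prepare_group_data_sketch(df, group_name, max_samples=100, random_state=42):
    """Hypothetical re-implementation of the documented filtering rule.

    Assumed schema (hypothetical): one row per (record_id, group, label).
    Returns (full, sub): every qualifying record, and a random subset of at
    most max_samples of them.
    """
    in_group = df[df["group"] == group_name]

    # Keep records with EXACTLY one distinct label inside the target group.
    n_labels = in_group.groupby("record_id")["label"].nunique()
    keep_ids = n_labels[n_labels == 1].index
    full = in_group[in_group["record_id"].isin(keep_ids)].copy()

    # Labels the same records carry in *other* groups (multi-labeled records).
    others = (
        df[df["record_id"].isin(keep_ids) & (df["group"] != group_name)]
        .groupby("record_id")["label"]
        .apply(sorted)
    )
    full["other_labels"] = full["record_id"].map(lambda rid: others.get(rid, []))

    # Cap the subset size for plotting; fixed seed for reproducibility.
    sub = full.sample(n=min(max_samples, len(full)), random_state=random_state)
    return full, sub
```

On a toy table, a record with two labels in group "A" is dropped, while a record with one "A" label and one "B" label is kept and remembers its "B" label:

```python
toy = pd.DataFrame({
    "record_id": [1, 1, 2, 2, 3, 4],
    "group": ["A", "B", "A", "A", "A", "B"],
    "label": ["a1", "b1", "a1", "a2", "a2", "b1"],
})
full, sub = prepare_group_data_sketch(toy, "A", max_samples=1)
```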
Example Usage
Basic Embedding Comparison
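```python
from src.visualization.embedding_viz import run_umap, plot_global_umap_grid

# Prepare embeddings for the same N records from both models
pretrained_embeddings = ...  # shape: (N, D)
finetuned_embeddings = ...   # shape: (N, D)
metadata_df = ...            # metadata for the same N records

# Create UMAP projections (same default parameters, so the plots are comparable)
umap_pretrained = run_umap(pretrained_embeddings)
umap_finetuned = run_umap(finetuned_embeddings)

# Visualize the projections side by side
umaps_dict = {
    'Pre-trained': umap_pretrained,
    'Fine-tuned': umap_finetuned,
}
plot_global_umap_grid(umaps_dict, metadata_df)
```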
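`plot_global_umap_grid` is not documented above. As a rough illustration only, a function that takes a `{name: coords}` dictionary could lay the projections out as one subplot per entry. `plot_umap_grid_sketch` and its `color_values`, `max_cols`, and `panel_size` parameters are hypothetical, and how the real function uses `metadata_df` is not shown here.

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np


def plot_umap_grid_sketch(umaps_dict, color_values=None, max_cols=3, panel_size=4.0):
    """Hypothetical: draw each (N, 2) projection in its own subplot."""
    n = len(umaps_dict)
    if n == 0:
        raise ValueError("umaps_dict is empty")
    ncols = min(n, max_cols)
    nrows = math.ceil(n / ncols)
    # Scale the figure with the grid so each panel keeps the same size.
    fig, axes = plt.subplots(nrows, ncols,
                             figsize=(panel_size * ncols, panel_size * nrows),
                             squeeze=False)
    for ax, (name, coords) in zip(axes.flat, umaps_dict.items()):
        coords = np.asarray(coords)
        ax.scatter(coords[:, 0], coords[:, 1], c=color_values, s=5, alpha=0.5)
        ax.set_title(name)
        ax.set_xticks([])  # UMAP axes have no intrinsic meaning
        ax.set_yticks([])
    # Hide unused panels when n does not fill the last row.
    for ax in axes.flat[n:]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig, axes
```

Passing `squeeze=False` keeps `axes` two-dimensional, so the same loop works for one panel or many.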
Group Analysis
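```python
import matplotlib.pyplot as plt
from src.visualization.embedding_viz import prepare_group_data, overlay_group_on_embedding

# Analyze a specific disease group
target_group = "Atrial Fibrillation"
df_group_full, df_group_sub = prepare_group_data(metadata_df, target_group)

# Overlay the sampled group on a projection from the previous example
fig, ax = overlay_group_on_embedding(umap_finetuned, metadata_df, df_group_sub)
ax.set_title(f"{target_group} Distribution")
plt.show()
```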
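The overlay idea can be sketched as follows: draw every record as a faint background point, then redraw the selected group on top in a highlight color. `overlay_group_sketch` is hypothetical. Unlike the documented `overlay_group_on_embedding(umap_coords, metadata_df, df_group_sub)`, it assumes the group is given as integer row positions into the projection array, because how the real function aligns metadata rows with coordinates is not documented.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np


def overlay_group_sketch(coords, group_positions, label="group",
                         highlight_color="tab:red", background_color="lightgrey"):
    """Hypothetical: highlight selected rows of an (N, 2) projection."""
    coords = np.asarray(coords)
    idx = np.asarray(group_positions, dtype=int)

    fig, ax = plt.subplots(figsize=(6, 6))
    # Background: every record, small and transparent so it does not dominate.
    ax.scatter(coords[:, 0], coords[:, 1], s=4, c=background_color, alpha=0.3,
               label="all records")
    # Foreground: the selected group, drawn last so it sits on top.
    ax.scatter(coords[idx, 0], coords[idx, 1], s=12, c=highlight_color,
               label=label)
    ax.legend(loc="best")
    return fig, ax
```

Drawing order matters here: the group is plotted after the background so dense background regions cannot hide it.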
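Best Practices
- Consistency: Use the same UMAP parameters (`metric`, `n_neighbors`, `min_dist`, `random_state`) when comparing different embeddings. Even then, each run has its own coordinate frame, so compare the structure of the plots (clusters, group separation) rather than absolute positions.
- Sampling: For large datasets, consider using `prepare_group_data` to sample a manageable subset (capped by `max_samples`).
- Visual Clarity:
  - Use low alpha (high transparency) for background points so highlighted samples remain visible
  - Choose distinct colors for different groups
  - Add legends and titles for clear interpretation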
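One way to enforce the consistency rule is to define the UMAP settings once and pass the same dictionary to every projection. The names `UMAP_KWARGS` and `project_all` are hypothetical. The projection function is passed in as an argument so the helper can be exercised with a stand-in; in practice you would pass `run_umap`.

```python
# Hypothetical shared settings; keys mirror run_umap's keyword arguments.
UMAP_KWARGS = {
    "metric": "euclidean",
    "n_neighbors": 15,
    "min_dist": 0.1,
    "n_components": 2,
    "random_state": 42,
}


def project_all(embeddings_by_name, project, **overrides):
    """Apply one projection function, with identical kwargs, to every embedding set."""
    # Overrides are merged once, so they still apply to all sets alike.
    settings = {**UMAP_KWARGS, **overrides}
    return {name: project(emb, **settings) for name, emb in embeddings_by_name.items()}
```

For example, `umaps_dict = project_all({"Pre-trained": pretrained_embeddings, "Fine-tuned": finetuned_embeddings}, run_umap)` builds the dictionary expected by `plot_global_umap_grid` with no chance of the two runs drifting apart in their settings.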
Advanced Customization
The visualization functions are designed to be flexible. You can:
- Modify color schemes by adjusting the `highlight_color` and background colors
- Customize marker styles for different types of samples
- Adjust figure sizes and grid layouts for different numbers of embeddings
- Add additional metadata overlays or annotations
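As an example of customizing marker styles, the sketch below draws single-labeled and multi-labeled records of a group with different markers. It assumes a hypothetical per-point list of the labels each record carries in other groups (empty for single-labeled records); `scatter_by_label_type_sketch` is not part of the documented module.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np


def scatter_by_label_type_sketch(coords, other_labels, highlight_color="tab:red", ax=None):
    """Hypothetical: 'o' for single-labeled records, '^' for multi-labeled ones."""
    coords = np.asarray(coords)
    # A record is multi-labeled if it carries at least one label from another group.
    multi = np.array([len(labels) > 0 for labels in other_labels], dtype=bool)
    if ax is None:
        _, ax = plt.subplots(figsize=(5, 5))
    styles = [(~multi, "o", "single label"), (multi, "^", "also labeled in other groups")]
    for mask, marker, name in styles:
        if mask.any():  # skip empty categories so the legend stays clean
            ax.scatter(coords[mask, 0], coords[mask, 1], marker=marker,
                       c=highlight_color, s=20, label=name)
    ax.legend(loc="best")
    return ax
```

Accepting an optional `ax` lets the markers be layered onto an existing overlay plot instead of always opening a new figure.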