International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Issue 77
Published: January 2026
Authors: Sayyada Sara Banu, Ratnadeep R. Deshmukh
10.5120/ijca2026926226
Sayyada Sara Banu, Ratnadeep R. Deshmukh. An Interactive MFCC-Driven Hierarchical Clustering Framework for Automatic Speaker Diarization with Visual Analytics. International Journal of Computer Applications. 187, 77 (January 2026), 28-34. DOI=10.5120/ijca2026926226
@article{ 10.5120/ijca2026926226,
author = { Sayyada Sara Banu and Ratnadeep R. Deshmukh },
title = { An Interactive MFCC-Driven Hierarchical Clustering Framework for Automatic Speaker Diarization with Visual Analytics },
journal = { International Journal of Computer Applications },
year = { 2026 },
volume = { 187 },
number = { 77 },
pages = { 28--34 },
doi = { 10.5120/ijca2026926226 },
publisher = { Foundation of Computer Science (FCS), NY, USA }
}
%0 Journal Article
%D 2026
%A Sayyada Sara Banu
%A Ratnadeep R. Deshmukh
%T An Interactive MFCC-Driven Hierarchical Clustering Framework for Automatic Speaker Diarization with Visual Analytics
%J International Journal of Computer Applications
%V 187
%N 77
%P 28-34
%R 10.5120/ijca2026926226
%I Foundation of Computer Science (FCS), NY, USA
Automatic Speaker Diarization (ASD) is the task of determining “who spoke when” in multi-speaker audio recordings without prior speaker labels. This paper presents a transparent, tunable, GUI-driven diarization framework that integrates MFCC + Δ + Δ² embeddings, adaptive percentile-based Voice Activity Detection (VAD), and Agglomerative Hierarchical Clustering (AHC) with configurable distance metrics and linkage strategies. The system gives the user control over every stage (preprocessing, segmentation, clustering, and post-processing) and offers visual analytics including waveform-aligned speaker timelines, spectrograms, MFCC heatmaps, PCA-based embedding scatter plots, Silhouette-driven cluster diagnostics, and conversational metrics. Experimental evaluation shows that the proposed MFCC + AHC pipeline produces stable speaker groupings with clear cluster separation and reduced fragmentation after post-processing, with a diarization error rate between 5.8% and 8.1% on test recordings. The tool supports RTTM/CSV/JSON export and is suited to research, education, conversational analysis, and domain-specific diarization studies that require interpretability and flexibility.
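As a rough illustration of two stages named in the abstract, the sketch below shows a percentile-based energy VAD followed by average-linkage agglomerative clustering. It is not the authors' implementation: the function names, the 40th-percentile threshold, the Euclidean distance, and the average linkage are all assumptions chosen for the example, and toy 2-D vectors stand in for the MFCC + Δ + Δ² features, which are not computed here.

```python
import math

def frame_energies(signal, frame_len, hop):
    """Mean squared amplitude of each frame (hypothetical helper)."""
    return [
        sum(x * x for x in signal[i:i + frame_len]) / frame_len
        for i in range(0, len(signal) - frame_len + 1, hop)
    ]

def percentile(values, q):
    """q-th percentile (0..100) with linear interpolation."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def percentile_vad(energies, q=40.0):
    """Flag a frame as speech when its energy exceeds the q-th percentile.

    The threshold adapts to each recording; q=40 is an assumed default.
    """
    threshold = percentile(energies, q)
    return [e > threshold for e in energies]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ahc_average(points, n_clusters, dist=euclidean):
    """Naive average-linkage AHC; returns one cluster label per point."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between the two clusters.
                total = sum(dist(points[i], points[j])
                            for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge the closest pair
        del clusters[b]
    labels = [0] * len(points)
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels

# Toy signal: silence, a burst of activity, then silence again.
signal = [0.0] * 8 + [1.0, -1.0] * 4 + [0.0] * 8
speech_mask = percentile_vad(frame_energies(signal, frame_len=4, hop=4))

# Toy 2-D vectors standing in for per-segment speaker features.
segments = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 5.0)]
speaker_labels = ahc_average(segments, n_clusters=2)
```

In this sketch the number of speakers is fixed in advance; a Silhouette-style diagnostic such as the one the abstract mentions could instead be used to compare several candidate cluster counts.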