International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Issue 97
Published: April 2026
Authors: Ashish Joshi
DOI: 10.5120/ijca78706a352984
Ashish Joshi. Abstractive Summarization of Spoken Language: A Comparative Evaluation of BART and T5 on Podcast and Conversational Speech Transcripts. International Journal of Computer Applications. 187, 97 (April 2026), 11-19. DOI=10.5120/ijca78706a352984
@article{10.5120/ijca78706a352984,
  author    = {Ashish Joshi},
  title     = {Abstractive Summarization of Spoken Language: A Comparative Evaluation of BART and T5 on Podcast and Conversational Speech Transcripts},
  journal   = {International Journal of Computer Applications},
  year      = {2026},
  volume    = {187},
  number    = {97},
  pages     = {11-19},
  doi       = {10.5120/ijca78706a352984},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}
%0 Journal Article
%D 2026
%A Ashish Joshi
%T Abstractive Summarization of Spoken Language: A Comparative Evaluation of BART and T5 on Podcast and Conversational Speech Transcripts
%J International Journal of Computer Applications
%V 187
%N 97
%P 11-19
%R 10.5120/ijca78706a352984
%I Foundation of Computer Science (FCS), NY, USA
The exponential growth of long-form audio content, particularly podcasts and lectures, creates an urgent need for effective summarization systems capable of condensing hours of speech into concise, coherent summaries. This study presents a comprehensive comparative evaluation of two transformer-based architectures, BART and T5, for abstractive summarization of spoken language transcripts. Unlike prior work that relies on written dialogue datasets, the author fine-tunes and evaluates both models on three speech-specific datasets: PodcastSum (12,345 podcast episodes), How2 (12,987 instructional videos), and the AMI Meeting Corpus (137 hours of meetings). A multi-faceted evaluation framework is employed, combining automated metrics (ROUGE, BLEU, BERTScore, METEOR) with human judgments across five quality dimensions (coherence, fluency, factual consistency, conciseness, and speaker attribution). Statistical significance testing confirms the observed differences, and qualitative analysis reveals model-specific strengths and failure patterns. Results demonstrate that BART significantly outperforms T5 across all automated metrics (p < 0.01) and receives higher human ratings for factual consistency and structural cohesion. However, T5 generates more lexically diverse summaries and better handles extended dialogue contexts. These complementary strengths suggest that hybrid approaches may be beneficial. To support reproducibility, the evaluation framework and human-annotated test samples are released. The findings provide actionable guidance for deploying summarization systems in real-world speech applications.
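The abstract lists ROUGE among the automated metrics used to compare the two models. The paper's exact evaluation pipeline is not reproduced here; as a minimal, self-contained sketch, ROUGE-1 F1 can be computed from clipped unigram overlap between a reference summary and a model-generated candidate (production evaluations would typically use an established package with stemming and multiple ROUGE variants):

```python
from collections import Counter

def rouge_1(reference: str, candidate: str) -> dict:
    """Compute ROUGE-1 precision, recall, and F1 from unigram overlap.

    A simplified sketch: whitespace tokenization, lowercasing, no
    stemming. Real ROUGE implementations add stemming and ROUGE-2/L.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each unigram counts at most as many times as it
    # appears in the reference (Counter intersection takes the min).
    overlap = sum((ref_counts & cand_counts).values())
    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = (2 * precision * recall / (precision + recall)) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Example: 5 of 6 unigrams overlap, so P = R = F1 = 5/6.
scores = rouge_1("the cat sat on the mat", "the cat lay on the mat")
```

Summary-level scores like these are averaged over the test set, and paired significance tests (as the study reports with p < 0.01) then determine whether the BART-T5 gap is statistically meaningful.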