How to Synchronize AI JSON Timestamps with Audio in TypeScript: A Step-by-Step Guide

Israel
3 min read · Aug 12, 2024

When working with AI-generated transcriptions or other time-based data, it’s often necessary to synchronize timestamps from a JSON response with the corresponding segments in an audio file. This tutorial will guide you through the process of implementing this functionality using TypeScript.

Prerequisites
Before we begin, make sure you have a basic understanding of the following:

  • TypeScript
  • HTML5 Audio API
  • JSON parsing

Tools Required

  • A code editor like Visual Studio Code
  • A modern web browser for testing
  • An audio file (e.g., audio.mp3)
  • JSON response with timestamps

Step 1: Understanding the AI JSON Response

Typically, AI transcription services return a JSON object where each word or phrase is associated with a start and end timestamp. These timestamps indicate when the word or phrase occurs within the audio file.

Here’s an example of a typical JSON response:

{
  "transcription": [
    { "word": "Hello", "start": 0.5, "end": 1.0 },
    { "word": "world", "start": 1.2, "end": 1.7 }
  ]
}

In this example, the word “Hello” starts at 0.5 seconds and ends at 1.0 seconds, while “world” starts at 1.2 seconds and ends at 1.7 seconds.
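To make this shape explicit in your TypeScript code, you can model the response with interfaces and parse the raw JSON into them. A minimal sketch (the field names mirror the example above; real transcription services may name them differently):

```typescript
// Types mirroring the example response above.
interface Word {
  word: string;
  start: number; // seconds
  end: number;   // seconds
}

interface TranscriptionResponse {
  transcription: Word[];
}

// Parse a raw JSON string into the typed shape.
const raw =
  '{"transcription":[{"word":"Hello","start":0.5,"end":1.0},{"word":"world","start":1.2,"end":1.7}]}';
const parsed: TranscriptionResponse = JSON.parse(raw);

console.log(parsed.transcription[0].word); // "Hello"
```

Note that `JSON.parse` does not validate the data at runtime; the type annotation is only a compile-time assertion, so in production you may want to check the fields before using them.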

Step 2: Loading the Audio File

To work with the audio file in your TypeScript application, you can use the HTML5 <audio> element. Here's how to load and prepare the audio for playback:

const audio = new Audio('path/to/audio/file.mp3');

Replace 'path/to/audio/file.mp3' with the actual path to your audio file.

Step 3: Synchronizing Timestamps with Audio

The key to synchronization is the currentTime property of the <audio> element, which provides the current playback time of the audio. By comparing this time to the timestamps in the JSON response, you can trigger actions (like displaying text or highlighting words) when specific segments of the audio are playing.

Here’s how to implement this:

// Example JSON response
const response = {
  transcription: [
    { word: "Hello", start: 0.5, end: 1.0 },
    { word: "world", start: 1.2, end: 1.7 }
  ]
};

// Function to handle audio playback and synchronization
function handlePlayback() {
  const audio = new Audio('path/to/audio/file.mp3');

  audio.addEventListener('timeupdate', () => {
    const currentTime = audio.currentTime;

    response.transcription.forEach(({ word, start, end }) => {
      if (currentTime >= start && currentTime <= end) {
        console.log(`Playing word: ${word}`);
        // Trigger any additional actions here, such as visual highlights
      }
    });
  });

  // play() returns a Promise that can reject if the browser blocks
  // autoplay, so in production call this from a user gesture (e.g. a click).
  audio.play();
}

handlePlayback();

Explanation:

  • The timeupdate event fires multiple times per second while the audio is playing, allowing for real-time synchronization.
  • We loop through each word in the transcription, checking if the currentTime of the audio falls within the start and end times.
  • When a match is found, we log the word to the console. You can replace this with any other action, such as updating the UI.
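The lookup inside the timeupdate handler can also be factored into a pure function, which keeps the handler small and is easy to unit-test without an audio element. A sketch (the `findActiveWord` name is my own, not part of any API):

```typescript
interface Word {
  word: string;
  start: number;
  end: number;
}

// Return the word whose [start, end] range contains the given time,
// or null if the time falls in a gap between words.
function findActiveWord(transcription: Word[], time: number): Word | null {
  return transcription.find(w => time >= w.start && time <= w.end) ?? null;
}

const transcription: Word[] = [
  { word: "Hello", start: 0.5, end: 1.0 },
  { word: "world", start: 1.2, end: 1.7 },
];

console.log(findActiveWord(transcription, 0.7)?.word); // "Hello"
console.log(findActiveWord(transcription, 1.1));       // null (gap between words)
```

Because timestamps are sorted, a binary search would scale better for very long transcripts, but a linear `find` is fine for typical sentence-length responses.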

Step 4: Adding Visual Feedback

To provide visual feedback in your UI, such as highlighting the current word being spoken, you can extend the above implementation. Assume each word in your transcription is rendered in the HTML with a data-word attribute:

function updateUI(word: string) {
  // Clear any existing highlights
  document.querySelectorAll('.highlight').forEach(element => {
    element.classList.remove('highlight');
  });

  // Highlight the current word
  const element = document.querySelector(`[data-word="${word}"]`);
  if (element) {
    element.classList.add('highlight');
  }
}

// Update the handlePlayback function
function handlePlayback() {
  const audio = new Audio('path/to/audio/file.mp3');

  audio.addEventListener('timeupdate', () => {
    const currentTime = audio.currentTime;

    response.transcription.forEach(({ word, start, end }) => {
      if (currentTime >= start && currentTime <= end) {
        updateUI(word);
      }
    });
  });

  audio.play();
}

Explanation:

  • The updateUI function first removes any existing highlights from previous words.
  • It then searches for the word currently being spoken and adds a highlight class to it.
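The markup that updateUI queries against can itself be generated from the transcription. A minimal sketch, assuming one span per word (the `renderTranscript` helper and the span structure are illustrative assumptions, not something the service dictates):

```typescript
interface Word {
  word: string;
  start: number;
  end: number;
}

// Build transcript markup: one span per word, tagged with the
// data-word attribute that updateUI's selector looks for.
function renderTranscript(transcription: Word[]): string {
  return transcription
    .map(({ word }) => `<span data-word="${word}">${word}</span>`)
    .join(" ");
}

const html = renderTranscript([
  { word: "Hello", start: 0.5, end: 1.0 },
  { word: "world", start: 1.2, end: 1.7 },
]);
// html === '<span data-word="Hello">Hello</span> <span data-word="world">world</span>'
```

For real transcripts you would also HTML-escape the word text, and disambiguate repeated words (e.g. by indexing the data attribute), since `querySelector` only returns the first match.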

Styling the Highlight:

You can define the highlight class in your CSS to visually distinguish the current word:

.highlight {
  background-color: yellow;
  font-weight: bold;
}

Conclusion

By following these steps, you can synchronize timestamps from an AI-generated JSON response with corresponding audio segments in TypeScript. This technique is useful for applications like interactive transcriptions, language learning tools, and more.

With this foundation, you can further improve your application by adding features like subtitle overlays, user-controlled playback, or dynamic content updates based on audio playback.


Israel

I'm Israel, a Frontend Engineer with 4+ years of experience in the space. My love of offering solutions led me to technical writing. I hope to make a positive impact here.