How to Synchronize ♻️AI JSON Timestamps 🔢with Audio 🔉in TypeScript: A Step-by-Step Guide 📖
When working with AI-generated transcriptions or other time-based data, it’s often necessary to synchronize timestamps from a JSON response with the corresponding segments in an audio file. This tutorial will guide you through the process of implementing this functionality using TypeScript.
Prerequisites
Before we begin, make sure you have a basic understanding of the following:
- TypeScript
- HTML5 Audio API
- JSON parsing
Tools Required
- A code editor like Visual Studio Code
- A modern web browser for testing
- An audio file (e.g., `audio.mp3`)
- JSON response with timestamps
Step 1: Understanding the AI JSON Response
Typically, AI transcription services return a JSON object in which each word or phrase is associated with a `start` and `end` timestamp. These timestamps indicate when the word or phrase occurs within the audio file.
Here’s an example of a typical JSON response:
```json
{
  "transcription": [
    { "word": "Hello", "start": 0.5, "end": 1.0 },
    { "word": "world", "start": 1.2, "end": 1.7 }
  ]
}
```
In this example, the word “Hello” starts at 0.5 seconds and ends at 1.0 seconds, while “world” starts at 1.2 seconds and ends at 1.7 seconds.
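In TypeScript, it helps to describe this response with an interface so the compiler can check your synchronization code against the data shape. Here is a minimal sketch; the interface names are our own, not part of any transcription service's API:

```typescript
// Shape of one transcribed word with its timing (names are illustrative)
interface TranscriptionWord {
  word: string;
  start: number; // seconds
  end: number;   // seconds
}

interface TranscriptionResponse {
  transcription: TranscriptionWord[];
}

// Parse the raw JSON text returned by the service
const raw =
  '{"transcription":[{"word":"Hello","start":0.5,"end":1.0},{"word":"world","start":1.2,"end":1.7}]}';
const response: TranscriptionResponse = JSON.parse(raw);

console.log(response.transcription[0].word); // "Hello"
```

With the types in place, misspelling a field like `start` becomes a compile-time error instead of a silent runtime bug.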
Step 2: Loading the Audio File
To work with the audio file in your TypeScript application, you can use the HTML5 `<audio>` element. Here's how to load and prepare the audio for playback:

```typescript
const audio = new Audio('path/to/audio/file.mp3');
```

Replace `'path/to/audio/file.mp3'` with the actual path to your audio file.
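The `Audio` constructor begins loading immediately, but the file may not be ready to play right away. One option is to wrap the load in a Promise that resolves once the browser reports it can play the file through. This is a sketch; the `loadAudio` helper and the `AudioLike` interface are our own names, typed structurally so the helper also works outside a browser:

```typescript
// Minimal structural interface for anything that behaves like an <audio> element
interface AudioLike {
  addEventListener(type: string, listener: () => void): void;
}

// Resolve once enough audio has buffered to play through; reject on load failure
function loadAudio(audio: AudioLike): Promise<void> {
  return new Promise((resolve, reject) => {
    audio.addEventListener('canplaythrough', () => resolve());
    audio.addEventListener('error', () => reject(new Error('Audio failed to load')));
  });
}

// In the browser you would use it like this:
// const audio = new Audio('path/to/audio/file.mp3');
// await loadAudio(audio);
// audio.play();
```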
Step 3: Synchronizing Timestamps with Audio
The key to synchronization is the `currentTime` property of the `<audio>` element, which provides the current playback time of the audio. By comparing this time to the timestamps in the JSON response, you can trigger actions (like displaying text or highlighting words) when specific segments of the audio are playing.
Here’s how to implement this:
```typescript
// Example JSON response
const response = {
  transcription: [
    { word: "Hello", start: 0.5, end: 1.0 },
    { word: "world", start: 1.2, end: 1.7 }
  ]
};

// Function to handle audio playback and synchronization
function handlePlayback() {
  const audio = new Audio('path/to/audio/file.mp3');

  audio.addEventListener('timeupdate', () => {
    const currentTime = audio.currentTime;

    response.transcription.forEach(({ word, start, end }) => {
      if (currentTime >= start && currentTime <= end) {
        console.log(`Playing word: ${word}`);
        // Trigger any additional actions here, such as visual highlights
      }
    });
  });

  // Note: browsers may block play() until the user has interacted with the page
  audio.play();
}

handlePlayback();
```
Explanation:
- The `timeupdate` event fires multiple times per second while the audio is playing, allowing for real-time synchronization.
- We loop through each word in the transcription, checking whether the audio's `currentTime` falls within the `start` and `end` times.
- When a match is found, we log the word to the console. You can replace this with any other action, such as updating the UI.
Step 4: Adding Visual Feedback
To provide visual feedback in your UI, such as highlighting the current word being spoken, you can extend the above implementation. Assume each word in your transcription is rendered in the HTML with a `data-word` attribute:
```typescript
function updateUI(word: string) {
  // Clear any existing highlights
  document.querySelectorAll('.highlight').forEach(element => {
    element.classList.remove('highlight');
  });

  // Highlight the current word
  const element = document.querySelector(`[data-word="${word}"]`);
  if (element) {
    element.classList.add('highlight');
  }
}

// Update the handlePlayback function
function handlePlayback() {
  const audio = new Audio('path/to/audio/file.mp3');

  audio.addEventListener('timeupdate', () => {
    const currentTime = audio.currentTime;

    response.transcription.forEach(({ word, start, end }) => {
      if (currentTime >= start && currentTime <= end) {
        updateUI(word);
      }
    });
  });

  audio.play();
}
```
Explanation:
- The `updateUI` function first removes any existing highlights from previous words.
- It then searches for the word currently being spoken and adds a `highlight` class to it.
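For `updateUI` to find anything, each word must already be rendered with a `data-word` attribute. One way is to build the markup from the transcription itself. Here is a sketch that produces an HTML string you could assign to a container's `innerHTML`; `renderTranscript` is our own helper name, and it assumes the words contain no HTML-special characters (escape them in real use):

```typescript
interface TranscriptionWord {
  word: string;
  start: number;
  end: number;
}

// Build one <span> per word, tagged with data-word so updateUI can find it
function renderTranscript(transcription: TranscriptionWord[]): string {
  return transcription
    .map(({ word }) => `<span data-word="${word}">${word}</span>`)
    .join(' ');
}

const transcription = [
  { word: "Hello", start: 0.5, end: 1.0 },
  { word: "world", start: 1.2, end: 1.7 }
];

console.log(renderTranscript(transcription));
// <span data-word="Hello">Hello</span> <span data-word="world">world</span>
```

In the browser you would render the transcript once before playback begins, for example with `container.innerHTML = renderTranscript(response.transcription);`.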
Styling the Highlight:
You can define the `highlight` class in your CSS to visually distinguish the current word:

```css
.highlight {
  background-color: yellow;
  font-weight: bold;
}
```
Conclusion
By following these steps, you can synchronize timestamps from an AI-generated JSON response with corresponding audio segments in TypeScript. This technique is useful for applications like interactive transcriptions, language learning tools, and more.
With this foundation, you can further improve your application by adding features like subtitle overlays, user-controlled playback, or dynamic content updates based on audio playback.