Speech Marks
The speech marks returned with every synthesis request are a mapping between time and text. They tell the client when each word is spoken in the audio, for purposes such as highlighting, seeking and tracking usage.
ts
type Chunk = {
  startTime: number
  endTime: number
  start: number
  end: number
  value: string
}
type NestedChunk = Chunk & {
  chunks: Chunk[]
}

Typical Gotchas
- The values are returned based on the SSML, so any escaping of `&`, `<` and `>` will be present in the `value`, `start` and `end` fields. You may consider using a string tracker library to assist in the mapping.
- The `start` and `end` values of each word may have gaps. If you're looking for the word at an index, look for the `start` being `>= yourIndex` rather than checking whether the index is within the bounds of both `start` and `end`.
- The `startTime` and `endTime` of each word may have gaps. Follow the same advice as above.
- The `startTime` of the first word is not necessarily `0` like that of the `NestedChunk`. There can be silence at the beginning of the sentence, so the first word starts part way through.
- The `endTime` of the last word does not necessarily correspond with the end of the `NestedChunk`. There can be silence at the end of the `NestedChunk` that makes it longer.
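The first three points can be sketched in code. The block below is a minimal illustration and not part of the API: `escapeWithIndexMap`, `firstChunkFromIndex` and `firstChunkFromTime` are hypothetical helper names. It assumes the text was escaped only by replacing `&`, `<` and `>` with their entities (no wrapping tags that would shift the offsets), and that the chunks arrive sorted in ascending order.

```ts
type Chunk = {
  startTime: number
  endTime: number
  start: number
  end: number
  value: string
}

// Escape the three characters named above (assumed to be the only escaping applied).
function escapeChar(ch: string): string {
  switch (ch) {
    case '&': return '&amp;'
    case '<': return '&lt;'
    case '>': return '&gt;'
    default: return ch
  }
}

// Hypothetical helper: escape `text` and record, for every index in the
// escaped string, the index of the source character it came from.
function escapeWithIndexMap(text: string): { escaped: string; toOriginal: number[] } {
  let escaped = ''
  const toOriginal: number[] = []
  for (let i = 0; i < text.length; i++) {
    const out = escapeChar(text.charAt(i))
    for (let j = 0; j < out.length; j++) toOriginal.push(i)
    escaped += out
  }
  // Extra entry so an `end` equal to escaped.length also resolves.
  toOriginal.push(text.length)
  return { escaped, toOriginal }
}

// Literal reading of the advice: the first chunk whose `start` is >= index.
function firstChunkFromIndex(chunks: Chunk[], index: number): Chunk | undefined {
  for (const c of chunks) {
    if (c.start >= index) return c
  }
  return undefined
}

// The same rule applied to time instead of character position.
function firstChunkFromTime(chunks: Chunk[], time: number): Chunk | undefined {
  for (const c of chunks) {
    if (c.startTime >= time) return c
  }
  return undefined
}

const { escaped, toOriginal } = escapeWithIndexMap('Tom & Jerry')
// `escaped` is 'Tom &amp; Jerry'; a chunk for 'Jerry' would span 10..15 there.
console.log(toOriginal[10], toOriginal[15]) // 6 11
```

Under this reading, an index or time that falls inside a word resolves to the following word. If the containing word is needed, comparing against `end` or `endTime` instead is one alternative.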
Example output
ts
const chunk: NestedChunk = {
start: 0,
end: 79,
startTime: 0,
endTime: 4292.58,
value: 'This is a sentence used for testing with some text on the end to make it longer',
chunks: [
{
start: 0,
end: 4,
startTime: 125,
endTime: 250,
value: 'This',
},
{
start: 5,
end: 7,
startTime: 250,
endTime: 375,
value: 'is',
},
{
start: 8,
end: 9,
startTime: 375,
endTime: 500,
value: 'a',
},
{
start: 10,
end: 18,
startTime: 500,
endTime: 937,
value: 'sentence',
},
{
start: 19,
end: 23,
startTime: 937,
endTime: 1200,
value: 'used',
},
{
start: 24,
end: 27,
startTime: 1200,
endTime: 1375,
value: 'for',
},
{
start: 28,
end: 35,
startTime: 1375,
endTime: 1775,
value: 'testing',
},
{
start: 36,
end: 40,
startTime: 1775,
endTime: 1937,
value: 'with',
},
{
start: 41,
end: 45,
startTime: 1937,
endTime: 2125,
value: 'some',
},
{
start: 46,
end: 50,
startTime: 2125,
endTime: 2500,
value: 'text',
},
{
start: 51,
end: 53,
startTime: 2500,
endTime: 2625,
value: 'on',
},
{
start: 54,
end: 57,
startTime: 2625,
endTime: 2850,
value: 'the',
},
{
start: 58,
end: 61,
startTime: 2850,
endTime: 3000,
value: 'end',
},
{
start: 62,
end: 64,
startTime: 3000,
endTime: 3125,
value: 'to',
},
{
start: 65,
end: 69,
startTime: 3125,
endTime: 3312,
value: 'make',
},
{
start: 70,
end: 72,
startTime: 3312,
endTime: 3437,
value: 'it',
},
{
start: 73,
end: 79,
startTime: 3437,
endTime: 4292.58,
value: 'longer',
},
],
}
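To make the points about silence concrete, here is a small sketch that measures how much of a `NestedChunk` lies before its first word and after its last word. `silenceAround` is a hypothetical helper, and the data is an abbreviated copy of the example (only the first and last words are kept). The results are in whatever time unit `startTime` and `endTime` use.

```ts
type Chunk = {
  startTime: number
  endTime: number
  start: number
  end: number
  value: string
}
type NestedChunk = Chunk & {
  chunks: Chunk[]
}

// Hypothetical helper: time between the sentence bounds and its outermost words.
// Returns null when the sentence contains no word chunks.
function silenceAround(sentence: NestedChunk): { leading: number; trailing: number } | null {
  const words = sentence.chunks
  const first = words[0]
  const last = words[words.length - 1]
  if (first === undefined || last === undefined) return null
  return {
    leading: first.startTime - sentence.startTime,
    trailing: sentence.endTime - last.endTime,
  }
}

// Abbreviated copy of the example: the middle words are omitted for brevity.
const sentence: NestedChunk = {
  start: 0,
  end: 79,
  startTime: 0,
  endTime: 4292.58,
  value: 'This is a sentence used for testing with some text on the end to make it longer',
  chunks: [
    { start: 0, end: 4, startTime: 125, endTime: 250, value: 'This' },
    { start: 73, end: 79, startTime: 3437, endTime: 4292.58, value: 'longer' },
  ],
}

console.log(silenceAround(sentence)) // { leading: 125, trailing: 0 }

// In this example, `start` and `end` behave like slice bounds into the parent value.
for (const word of sentence.chunks) {
  console.log(word.value === sentence.value.slice(word.start, word.end)) // true for both words
}
```

In this sample the first word starts 125 units after the sentence does, while the last word ends exactly when the sentence ends, so only the leading gap is non-zero.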