The API provides speech-to-text recognition services that make audio and video content searchable and accessible. It automatically adds punctuation and capitalization to the transcript to make it easier to read. It can recognize multiple speakers and attribute text to each of them. Timestamps are provided for each word (this feature is still in beta). The API documentation is a work in progress and is subject to change.
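
As a rough illustration of how per-word timestamps and speaker labels might be used together, the sketch below merges consecutive words from the same speaker into turns. The response shape is an assumption: the `Word` fields (`text`, `speaker`, `start`, `end`), the `group_by_speaker` helper, and the sample data are all hypothetical and are not taken from the API's documentation, which is still subject to change.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word. Field names are hypothetical, not the API's schema."""
    text: str
    speaker: str
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio


def group_by_speaker(words):
    """Merge consecutive words from the same speaker into speaker turns."""
    turns = []
    for w in words:
        if turns and turns[-1]["speaker"] == w.speaker:
            # Same speaker is still talking: extend the current turn.
            turns[-1]["text"] += " " + w.text
            turns[-1]["end"] = w.end
        else:
            # Speaker changed (or first word): start a new turn.
            turns.append(
                {"speaker": w.speaker, "text": w.text, "start": w.start, "end": w.end}
            )
    return turns


# Hypothetical sample data, not real API output.
sample = [
    Word("Hello,", "A", 0.0, 0.4),
    Word("everyone.", "A", 0.4, 1.0),
    Word("Hi!", "B", 1.2, 1.5),
]
turns = group_by_speaker(sample)
for t in turns:
    print(f'[{t["start"]:.1f}s-{t["end"]:.1f}s] Speaker {t["speaker"]}: {t["text"]}')
```

Each turn keeps the start time of its first word and the end time of its last word, which is one simple way to make a transcript searchable by speaker and by position in the recording.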
