Is Ton-Texter really competitive? - Yes, it is!

Niko
•
07.02.2024
In our blog post Transcription based on OpenAI Whisper / Pyannote we explained how Ton-Texter offers state-of-the-art transcription based on OpenAI's Whisper model, which outperforms its competitors, and on the Pyannote speaker diarization model, which has an industry-leading diarization error rate.
The question that remains is whether Ton-Texter's operating costs are competitive with other offerings on the market.
Operating cost of Ton-Texter
Before comparing operating costs with other services, we need an idea of what Ton-Texter costs to run. We therefore ran several tests with different audio files. The worst case for our setup is that the EC2 instance is not running and the user requests the transcription of a very short audio file, because the startup overhead of the instance is then proportionally largest compared to the time the transcription itself takes. In our tests, transcribing an audio file of 1 minute length took around 40-60 seconds, which we treat as the startup time. For longer files we measured a real-time factor of around 7x, which means that 1 minute of audio takes around 0.14 minutes to transcribe and 1 hour of audio would therefore take around 9 minutes.
The following chart shows the runtime of our EC2 in minutes based on the length of the audio file. The chart is based on the function:
f(x) = 1+x*0.14
1 is the minute of startup overhead that is incurred regardless of the audio length, and 0.14 is the time in minutes it takes to transcribe one minute of audio.
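The runtime model above can be sketched in a few lines of Python. This is only an illustration of the formula, not code from Ton-Texter itself; the names `STARTUP_MINUTES`, `MINUTES_PER_AUDIO_MINUTE` and `runtime_minutes` are made up for this example.

```python
# Constants taken from the formula f(x) = 1 + x * 0.14 in the article.
STARTUP_MINUTES = 1.0            # fixed startup overhead in minutes
MINUTES_PER_AUDIO_MINUTE = 0.14  # roughly a 7x real-time factor


def runtime_minutes(audio_minutes: float) -> float:
    """Estimated EC2 runtime in minutes for an audio file of the given length.

    Hypothetical helper for illustration only.
    """
    if audio_minutes < 0:
        raise ValueError("audio length must not be negative")
    return STARTUP_MINUTES + audio_minutes * MINUTES_PER_AUDIO_MINUTE


if __name__ == "__main__":
    for length in (1, 10, 60):
        print(f"{length:>3} min of audio -> {runtime_minutes(length):.2f} min of runtime")
```

For a one-hour file this gives roughly 9.4 minutes of runtime, which matches the "around 9 minutes" from our measurements once the startup minute is included.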
Our EC2 is billed per minute with an hourly cost of 0.526$, which is 0.00876$ per minute. With that in mind we can calculate the cost of the transcription of audio files based on their length with the following function:
f(x) = (1+x*0.14)*0.00876
This function simply takes the transcription time calculated above and multiplies it by the running cost per minute. The following chart shows the result of the function:
As we can see from the chart, the cost to transcribe one hour of audio is around 0.08$. If we wanted to turn Ton-Texter into a complete software as a service solution, we could use this as a baseline for our pricing. But there is one thing we have to be aware of: because of our startup overhead, the cost of transcription per hour of audio varies with the audio file's length. If we calculated the price of transcribing one minute of audio from this value, we would come up with:
0.08$*1/60=0.0013$
But as we can see from the chart, our running costs for a one-minute audio file are actually far higher, at close to 0.01$, which is around 8x more. To get a better understanding of the cost per hour of audio depending on the audio file's length, we came up with the following function:
f(x)=(1+x*0.14)*0.00876*(60/x)
This function builds on the previous function for the total cost of running a transcription based on the audio file's length. To get the cost per hour of audio, we multiply that total by 60, the minutes in one hour, divided by the audio file's length in minutes. This is an important measurement to get a better understanding of the efficiency of our transcription. The following chart represents the result of that function:
In the chart we can see that the worst case is a very short audio file (around one minute) transcribed while the machine is not already running, which leads to costs of around 0.60$ per hour of audio. To find the best case, it is important to realize that this function converges to a limit as the audio length grows. We calculate that limit like this:
lim x→∞ (1+x*0.14)*0.00876*(60/x)
The result of this equation is 0.0736. This means that with an infinitely long audio file our transcription cost per hour of audio is 0.0736$. As the effect of the one-minute startup time on the function decreases with growing audio length (it tends toward 0), this is equivalent to ignoring the startup time, so we could also calculate this ideal value like this:
x*0.14*0.00876*(60/x) = 0.0736
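Written out step by step, the per-hour cost function splits into a startup term that shrinks with the audio length and a constant term. The derivation below uses only the constants from the functions above:

```latex
\begin{aligned}
f(x) &= (1 + 0.14x)\cdot 0.00876 \cdot \frac{60}{x}
      = \frac{0.5256}{x} + 0.073584, \\
\lim_{x\to\infty} f(x) &= 0.073584 \approx 0.0736 .
\end{aligned}
```

The first term is the startup minute spread over the audio length, which is why a one-minute file costs about 0.5256 + 0.0736 ≈ 0.60$ per hour of audio.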
To sum it up, Ton-Texter's operational costs are anywhere between 0.0736$ and 0.60$ per hour of audio. To see whether that is competitive, we compare it to existing solutions in the following sections.
For longer audio files, Ton-Texter's transcription costs approach just 0.0736$ per hour of audio!
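The cost figures in this section can be reproduced with a short Python sketch. It assumes the rounded per-minute rate of 0.00876$ and the runtime model from above; the function names `total_cost` and `cost_per_audio_hour` are hypothetical and only serve this example.

```python
EC2_USD_PER_MINUTE = 0.00876     # rounded per-minute rate used in the article
STARTUP_MINUTES = 1.0            # fixed startup overhead in minutes
MINUTES_PER_AUDIO_MINUTE = 0.14  # transcription time per minute of audio


def total_cost(audio_minutes: float) -> float:
    """Cost in $ of transcribing one file: (1 + x * 0.14) * 0.00876."""
    runtime = STARTUP_MINUTES + audio_minutes * MINUTES_PER_AUDIO_MINUTE
    return runtime * EC2_USD_PER_MINUTE


def cost_per_audio_hour(audio_minutes: float) -> float:
    """Cost in $ per hour of audio for a file of the given length."""
    if audio_minutes <= 0:
        raise ValueError("audio length must be positive")
    return total_cost(audio_minutes) * 60 / audio_minutes


if __name__ == "__main__":
    for length in (1, 10, 60, 600):
        print(
            f"{length:>4} min: total {total_cost(length):.4f} $, "
            f"per audio hour {cost_per_audio_hour(length):.4f} $"
        )
```

A one-minute file comes out at about 0.60$ per hour of audio and a one-hour file at about 0.08$, while very long files get close to the limit of 0.0736$.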
Ton-Texter vs Whisper API
The Whisper API costs around 0.006$ per minute of audio. This leads to costs of around 0.36$ per hour of audio, which is significantly more expensive than our solution. But is this comparison fair?
On the one hand, the Whisper API uses the large model whilst we only use the small model. In our tests we found that the small model offers the best compromise between transcription speed and quality. But how expensive would our solution be if we used the large model? One hour of audio would take around 15 minutes with the large model, which would lead to operating costs of around 0.135$ per hour of audio, which is still significantly cheaper.
On the other hand, our solution is not simply a Whisper deployment; it offers added benefits with speaker diarization and the generation of different transcript files. We therefore conclude that our solution is not only cheaper but overall the better offering. For customers who want the absolute best transcription quality regardless of the higher price, we could deploy a premium transcription based on the large model later on.
The Whisper API is up to 5 times more expensive than Ton-Texter whilst offering less!
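The comparison can be checked with a small Python sketch. It uses the prices quoted in this article and the hypothetical helper `ec2_cost_per_audio_hour`; whether the startup minute is included for the large model is an assumption, so both variants are shown.

```python
EC2_USD_PER_MINUTE = 0.00876        # per-minute EC2 rate from the article
WHISPER_API_USD_PER_MINUTE = 0.006  # Whisper API price quoted in the article


def ec2_cost_per_audio_hour(compute_minutes: float, startup_minutes: float = 0.0) -> float:
    """Cost in $ of transcribing one hour of audio on the EC2 instance."""
    return (startup_minutes + compute_minutes) * EC2_USD_PER_MINUTE


whisper_api = WHISPER_API_USD_PER_MINUTE * 60   # one hour of audio via the API
small_model = ec2_cost_per_audio_hour(0.14 * 60)  # best case, startup ignored
large_model = ec2_cost_per_audio_hour(15)         # 15 min compute, startup ignored
large_model_with_startup = ec2_cost_per_audio_hour(15, startup_minutes=1.0)

if __name__ == "__main__":
    print(f"Whisper API:         {whisper_api:.4f} $ per audio hour")
    print(f"Small model:         {small_model:.4f} $ ({whisper_api / small_model:.1f}x cheaper)")
    print(f"Large model:         {large_model:.4f} $ ({whisper_api / large_model:.1f}x cheaper)")
    print(f"Large model + start: {large_model_with_startup:.4f} $")
```

The 0.135$ quoted above lies between the two large-model variants, and the ratio to the small model is just under 5, which is where the "up to 5 times" comes from.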
Ton-Texter vs HappyScribe
With HappyScribe we discovered a comparable existing solution. Just like Ton-Texter, it offers automated transcription with the option to export the generated transcript in several different formats. Admittedly, HappyScribe has some features that Ton-Texter does not yet offer. For example, you can edit and correct the generated transcript manually in the browser application.
But how do they compare in terms of pricing and value?
The following image gives an overview of HappyScribe's offerings:
With the basic plan, the transcription of one hour of audio costs 8.50€. In the best case in terms of price to performance, the business plan, the price for one hour of audio is 4.90€. And as this pricing does not depend on actual usage, this ideal value is only achievable if we used our full quota every month. From our perspective, considering our running costs of just around 0.0736$ per hour of audio, these prices seem horrendous, which underlines the economic potential of Ton-Texter.
Even in the best case scenario HappyScribe is over 60 times more expensive than Ton-Texter!
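The factor can be reproduced with the sketch below. Note the assumption: HappyScribe's prices are in € and our costs in $, and the example treats both at parity unless another rate is passed in. `price_ratio` and `usd_per_eur` are hypothetical names for this illustration.

```python
TON_TEXTER_USD_PER_AUDIO_HOUR = 0.0736  # best-case operating cost from above


def price_ratio(competitor_eur_per_hour: float, usd_per_eur: float = 1.0) -> float:
    """How many times more a competitor charges per hour of audio.

    usd_per_eur defaults to 1.0, i.e. euro and dollar are assumed at parity.
    """
    if usd_per_eur <= 0:
        raise ValueError("exchange rate must be positive")
    return competitor_eur_per_hour * usd_per_eur / TON_TEXTER_USD_PER_AUDIO_HOUR


if __name__ == "__main__":
    # Per-hour prices of the two HappyScribe plans quoted in the article.
    for plan, eur_per_hour in (("basic", 8.50), ("business", 4.90)):
        print(f"{plan} plan: about {price_ratio(eur_per_hour):.0f}x our operating cost")
```

Keep in mind that this compares a competitor's sales price with our pure operating cost, so the factor shows the available margin rather than a like-for-like price difference.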
Conclusion - is Ton-Texter competitive?
Compared to the Whisper API, Ton-Texter is capable of offering a better service at a lower price, which would remain true even if we used the large model later on. Compared to other complete solutions like HappyScribe, Ton-Texter is drastically cheaper.
Of course, one thing we have to take into consideration is that some baseline costs are left out of the equations in this article, like EBS storage costs, S3 bucket costs, and our Supabase and Vercel costs, which would have to be considered if they exceeded the free tier volumes later on. But from our understanding these costs are almost negligible in comparison to the EC2.
Additionally, if we wanted to turn Ton-Texter into a market-ready software as a service solution, we would have to consider development and marketing costs as well.
Still, especially in comparison to other applications with a comparable offering, Ton-Texter is very competitive, to say the least!