Joho the Blog » Skype and Tellme: Voice recognition for Skype


Tellme provides speech-recognition services to large applications such as AT&T’s 411 service. Yesterday, while riding down to a conference with Don Jackson, Tellme’s VP of Advanced Telephony, I asked him about today’s joint Skype-Tellme announcement. (Of course, the Skype-Tellme announcement is being overshadowed by the current rumor that eBay is buying Skype, presumably for its customer base, since at this point eBay could get VoIP technology for $1.50.)

Q: What’s the deal?

A: Today we’re announcing a deal with Skype under which independent software developers will be able to publish their speech applications on the Skype network, either free to end users or for pay.

Q: The free calls will be free to whom?

A: Free to the caller. The developers will pay for the free-to-the-caller calls themselves. In the for-pay apps, the developer will split the revenues with Skype and Tellme.

Q: So, if a developer wants to write an app that uses voice recognition, she can use your stuff…

A: Tellme Studio is a free developer resource for building VoiceXML applications. It’s been up and running for about five years. With the Skype deal, you can now call it from Skype (callto:tellme-studio, Skype ID: tellme-studio), or from a regular phone at 800 555 VXML. It’s a version of the Tellme platform where developers can get an account and have their applications executed over the telephone. They get debugging information, syntax checks, grammar checkers, and so on. The typical use scenario: you’re a developer in front of your computer, you’re using your favorite XML editor to create your VoiceXML app, you have your browser open to Tellme Studio, and you’re picking up the telephone to actually test your application. Now, with the Skype deal, instead of picking up your phone, you can click the button on your Skype softphone.

Q: If I’ve developed an app using your service, my users now can communicate with my application via voice over Skype. How does that work?

A: When your app is deployed on the Skype network, it will be assigned what looks like a phone number, although it doesn’t correspond to any real-world phone number. They’re taking a country code and basically making up a Skype country. This announcement leverages the Skype-Out mechanism [that lets users pay Skype to make calls to regular old phones]. The Skype-Out gateways send the call over to Tellme servers. If you’re a developer who has chosen Tellme as the platform for your voice app, you’ll have configured with us the URL to fetch to get your voice application. We’ll fetch that application from any web server on the Internet and begin rendering the VoiceXML application to the caller.
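The call path Don describes comes down to two steps: look up which URL a developer registered for the made-up number, then fetch the VoiceXML document from that web server. The Python below is a minimal sketch of that lookup under stated assumptions: the number, the example.com URL, the `ROUTES` table, and `route_call` are all hypothetical and do not reflect Tellme’s or Skype’s actual interfaces. The fetcher is passed in as a parameter so the lookup can be tried without a network.

```python
import urllib.request
from typing import Callable, Dict

# Hypothetical routing table: each deployed app's made-up "Skype country"
# number maps to the URL its developer registered for the VoiceXML document.
ROUTES: Dict[str, str] = {
    "+990-555-0100": "https://example.com/apps/movies.vxml",
}


def http_fetch(url: str) -> str:
    """Fetch a document from an ordinary web server."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


def route_call(dialed: str, fetch: Callable[[str], str] = http_fetch) -> str:
    """Resolve a number handed over by the Skype-Out gateway to the
    developer's registered URL and return the fetched VoiceXML document,
    which the voice platform would then render to the caller."""
    url = ROUTES.get(dialed)
    if url is None:
        raise LookupError(f"no voice application registered for {dialed}")
    return fetch(url)


if __name__ == "__main__":
    def fake_fetch(url: str) -> str:
        # Stand-in fetcher so the example runs offline.
        return f"<vxml><!-- fetched from {url} --></vxml>"

    print(route_call("+990-555-0100", fetch=fake_fetch))
```

Injecting the fetcher keeps the routing decision separate from the HTTP transport, which mirrors the point that the application can live on any web server.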

Q: So, as a developer, it’s relatively easy for me to integrate voice recognition.

A: Yes.

Q: What sorts of apps do you envision being the most obvious to arise?

A: Part of the motivation of this program is that Tellme and Skype do not have to create the applications that get deployed on the network. We want third party developers to come up with their own ideas. I’m a great believer in Bill Joy’s law that not all smart people work at your company. We’re building an open platform…

Q: But which apps do you think are the most obvious?

A: VoiceXML is very well suited to creating speech and telephone interfaces to any information on the web, so information access via speech is the most obvious. We’ve also had a number of entertainment-oriented applications developed. One was Graffiti, a non-real-time chat room: when you enter the chat room, you hear the last 20 messages that were left, and you can add your own to the queue. We deployed it on the Tellme network back in 2000, and it was so popular that it swamped our systems and we had to turn it off. We’ll see more applications like that.
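Graffiti’s behavior, as Don describes it, is a bounded queue: callers hear the last 20 messages, so once the room is full each new message displaces the oldest. A minimal Python sketch using `collections.deque` with `maxlen`; the `Graffiti` class and its method names are hypothetical, and messages are plain strings here, whereas the real service presumably stored recorded audio.

```python
from collections import deque
from typing import Deque, List


class Graffiti:
    """Hypothetical model of a non-real-time voice chat room that keeps
    only the most recent messages."""

    def __init__(self, capacity: int = 20) -> None:
        # deque(maxlen=...) silently drops the oldest entry once full.
        self._messages: Deque[str] = deque(maxlen=capacity)

    def enter(self) -> List[str]:
        """Return what a caller hears on entering, oldest to newest."""
        return list(self._messages)

    def leave_message(self, message: str) -> None:
        """Add a caller's message to the end of the queue."""
        self._messages.append(message)


if __name__ == "__main__":
    room = Graffiti()
    for i in range(25):
        room.leave_message(f"msg {i}")
    heard = room.enter()
    print(len(heard), heard[0], heard[-1])  # 20 msg 5 msg 24
```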

Q: What data does VoiceXML encode?

A: A VoiceXML app is XML data that describes the state diagram of the user’s interaction with your application. Typically, the first state is a greeting and launching pad. That part of the application lists the WAV files to play to the caller. Then there’s a reference to a speech recognition grammar that describes what the user can say and what the application will recognize.
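To make that description concrete, here is a small, hypothetical VoiceXML 2.0 document embedded in a Python sketch. Its `greeting` form plays two WAV prompts and points to a speech recognition grammar, and the helper lists, for each dialog state, the audio played and the grammars referenced. The file names, form ids, and the `summarize` helper are invented for illustration; this is not a Tellme Studio app, only standard W3C VoiceXML elements (`form`, `field`, `prompt`, `audio`, `grammar`, `goto`) read with Python’s standard XML parser.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

VXML_NS = {"v": "http://www.w3.org/2001/vxml"}

# Hypothetical app: a greeting state that plays prompts and listens via a
# grammar, then jumps to the state named by what the caller said.
APP = """<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="greeting">
    <field name="choice">
      <prompt>
        <audio src="welcome.wav"/>
        <audio src="menu.wav"/>
      </prompt>
      <grammar src="menu.grxml" type="application/srgs+xml"/>
      <filled>
        <goto expr="'#' + choice"/>
      </filled>
    </field>
  </form>
  <form id="movies">
    <block><audio src="movies.wav"/></block>
  </form>
</vxml>"""


def summarize(doc: str) -> Dict[str, Dict[str, List[str]]]:
    """For each dialog state (<form>), list the audio files it plays and
    the grammars that define what the caller may say."""
    root = ET.fromstring(doc)
    states: Dict[str, Dict[str, List[str]]] = {}
    for form in root.findall("v:form", VXML_NS):
        states[form.get("id", "")] = {
            "audio": [a.get("src", "") for a in form.findall(".//v:audio", VXML_NS)],
            "grammars": [g.get("src", "") for g in form.findall(".//v:grammar", VXML_NS)],
        }
    return states


if __name__ == "__main__":
    for state, parts in summarize(APP).items():
        print(state, parts)
```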

Q: Currently, either the user or the developer pays for your service. Tellme has actual costs per call?

A: Yes.

Q: There are imaginable applications where even small costs might get in the way of a socially desirable use: for example, cellphone-to-Internet services in the developing world, or emergency response systems like those that were developed rapidly on the Internet. Is there any chance Tellme would consider making its service free all the way through in some instances?

A: Like every American, we’re very concerned about the Katrina crisis, and we’ve been very interested in finding ways to use our platform and technology to aid its victims. Tellme runs the toll-free directory assistance application for AT&T, 1-800-555-1212, and in the early hours of the crisis we were able to make very rapid changes to help people get emergency numbers for FEMA and other resources. Because all of this is based on Internet technology, we were able to make those changes in a very small number of hours and get people the access they needed as fast as possible. So I’d encourage people who have proposals for aid-related speech applications to contact me, and we’ll see if we can make that happen.

Q: Tell me about DialTone 2.0.

A: Tellme was created with the mission to revolutionize how people interact with the telephone. We think telephone interactions should be far more powerful and more personalized. For example, when you go off hook [i.e., pick up the phone], why do you get that dumb dial tone sound? Tellme has this vision called DialTone 2.0. When you go off hook, it’s “Hi, Don. You have three new voice messages. Who do you want to call?” The dial tone becomes your gateway to voice services that you’re accessing over the telephone.

Q: Any takers?

A: A number of carriers are excited about the vision. The business model is still being worked out. The upstarts are a lot more excited about not creating another me-too phone service. So, hopefully, over the next couple of years we’ll see an initial deployment of this vision.


9 Responses to “Skype and Tellme: Voice recognition for Skype”

  1. Dave, can I ask how you came to the conclusion that the Skype press release was a joint release between Tellme and Skype? There are three companies in the press release, and I noticed Skype quoted only your friend, yet the partnership is among three platform providers [Tellme, Voxeo & Map Telecom]. The two U.S.-based companies are Tellme and Voxeo; Map Telecom is the European provider.

  2. I took a two-hour drive with Don from Tellme, not with someone from Voxeo, so I opportunistically interviewed Don. I do not (here) pretend to be a journalist “covering” an event.

  3. Interesting interview David. Thanks.

    See if I got this right. 1) If I develop and deploy a free application, I will be paying a per-minute charge to Tellme. 2) If I develop and deploy a for-fee application, any revenue is split among me, Tellme, and Skype. Will I still be paying a per-minute fee to Tellme?

    Do you know if my application will be able to call a Skype user at their Skype client?

  4. I’m pretty excited by this development. There is great potential bringing the Skype user base together with speech applications.

    Here is what I am wondering though…. does this put the Skype client one step closer to competing with the web browser? Or are these services going to be aimed more at the wireless client now beginning to roll out?

    From another perspective, a friend of mine recently asked me… “why would I want to check movie listings by clicking on TellMe Info in Gizmo if I could just fire up a browser to get that same info in 2-3 clicks rather than walking through a series of questions? Let alone pay for it.”

    If we are speech-enabling the web, right beside the web, where is the value?

  5. cm, I believe you have it right. It’s either like an 800 number (the recipient pays) or a 900 number (the caller pays). That’s why I asked Don about whether there are circumstances in which they’d provide the service free to everyone. (It’s also why I asked if they have per-call costs; they do.)

  6. This is a reply to jc. If the Internet can be driven easily by voice, then you can look up a movie listing while putting your shoes on. Plus, as we use computers more, more people develop RSI and are happy to ask questions rather than type!

  7. This is a reply to jc. This generation of voice interfaces is the equivalent of command-line PC interfaces. The next generation (or the one after it) should incorporate real natural language and should be the equivalent of GUIs (a new word may be required, perhaps VUI) for certain data-centric solutions. When will this be? Ages. Though if anyone has evidence to the contrary, I would be glad to listen. I’ll even give them a stage: build it into my open database-via-voice development effort at Lucidium!

    http://lucidium.blogspot.com/

  8. This has me thinking about Skype turning voice calls into IM text, or IM text into voice calls.
    I don’t always have my Skype phone with me, so this might be a cool development.

  9. We already have the Internet, and we all use Skype for voice communication. Why should we use voice to visit the Internet?
