
Latest news with #TollBit

How this startup is helping publishers profit from AI scraping

Yahoo

15 hours ago

  • Business
  • Yahoo


"Artificial intelligence (AI) scraping" describes the practice of using AI to extract data from websites, often without the publishers' permission. AI scraping has been a key topic of discussion amid the AI revolution, as the practice raises questions about the ethical use of the technology. Reddit (RDDT) is suing Anthropic, alleging the AI company used its content without authorization. TollBit CEO and co-founder Toshit Panigrahi joins Catalysts to discuss how TollBit helps publishers protect their content from AI scraping.

This month, Reddit sued AI giant Anthropic, claiming that the OpenAI rival had accessed its platform more than 100,000 times since July of 2022, after Anthropic allegedly said it had blocked its bots from doing so. We sat down with the COO of Reddit to discuss the suit.

What's important to us is that we are able to protect our users' privacy and their deletion rights. We have policies that ensure that when users take down a post, the post is taken down. And so it's really important, as we said in our terms of service, that we have a conversation with folks who have access to our data, because that's a commitment that we have in terms of our policies.

Our next guest has crunched the numbers on the size and scope of AI scraping by bots and helps publishers profit from the scraping. Joining us now is Toshit Panigrahi, the TollBit CEO and co-founder. TollBit is a New York-based startup that helps news publishers monitor and make money when AI companies scrape their content, acting almost like a toll booth of the internet. Did we get that right, Toshit?

Yes, that's correct. Thanks for having me.

Absolutely. So take us into your business and what you're seeing more broadly here, especially as we know how much scraping these AI engines need to do in order to get knowledge sets that can then be used for generative AI efforts, and they reach to basically every part of the web that they can get their hands on for free.

Absolutely. So TollBit is a platform that helps websites of all sizes monitor, manage, and monetize their AI bot traffic, which essentially means we give them tools to get an idea of how rampant the scraping might be on their site. We give them tools to block it and enforce content access rights, and then we give them what I think is a real innovation, our bot paywall, a tool that allows these AI bots to come in and actually pay for sanctioned access to that content and data. And I think one of the things that we're seeing, especially in the last quarter, is the demand not for content for training but for content for retrieval at inference time: when people ask a question, the bots have to go out and read in order to answer it.

Is AI training as it stands right now ethical, by the standard practices that users expect when they're on the internet?

I think this is a question that's bigger than all of us. I think we're definitely looking to some of the courts to decide and set some precedent as to whether or not training is fair use. But for us as a business, I think where some of the conversation should be going is around these bots that have to go out, when you and I ask a question, and find these platforms to go read that content. They don't know what happened today. They don't know what the price of the ticket was if you wanted to go to France. They actually have to go out and access those sites to get that information. That will be a far bigger use case than just training as these tools continue to evolve.

And so what is the revenue model like? How does the business make money?

So, essentially, with the technology that we built, we have a gateway that any AI application, agent, or bot can come in through and actually pay, in the form of micropayments, for access to content and data. So it could be anything from "I want to read what happened in the news today" to "I want to know what the price of the hotel is today in New York, and I want you to go book it for me." And so we are able to let the website set those rules, set the access protocols for what that content and data should cost, and then we take a transaction fee on top of that for enabling this faster, cleaner, licensed access to the content.

So it's a really fascinating business, and I'm sure there's a very large total addressable market that just continues to grow at this juncture. Thanks so much for breaking this down. We appreciate it.

Thank you.
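
The revenue model described in the interview (the website sets what access should cost, and the platform takes a transaction fee on top) can be illustrated with a small sketch. This is not TollBit's actual API or pricing: the `AccessRule` type, the `settle` function, the path prefixes, the prices, and the 10 percent fee rate are all hypothetical, chosen only to show how a per-request micropayment might be split.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class AccessRule:
    """Hypothetical rule: the price a website sets for one bot request under a path."""
    path_prefix: str
    price_usd: Decimal


def price_for(path: str, rules: list[AccessRule], default: Decimal) -> Decimal:
    """Return the price of the most specific (longest-prefix) matching rule."""
    matches = [r for r in rules if path.startswith(r.path_prefix)]
    if not matches:
        return default
    return max(matches, key=lambda r: len(r.path_prefix)).price_usd


def settle(path: str, rules: list[AccessRule], fee_rate: Decimal,
           default: Decimal = Decimal("0")) -> dict[str, Decimal]:
    """Split one micropayment: the publisher's price plus a platform fee on top."""
    publisher_price = price_for(path, rules, default)
    # The fee is charged on top of the publisher's price, not deducted from it.
    fee = (publisher_price * fee_rate).quantize(Decimal("0.0001"))
    return {
        "publisher_usd": publisher_price,
        "platform_fee_usd": fee,
        "bot_pays_usd": publisher_price + fee,
    }


# Hypothetical prices set by a news site (assumed values, not from the interview).
rules = [
    AccessRule("/news/", Decimal("0.002")),
    AccessRule("/news/premium/", Decimal("0.01")),
]

receipt = settle("/news/premium/story", rules, fee_rate=Decimal("0.10"))
print(receipt)
```

Using `Decimal` rather than floats keeps fractions of a cent exact, which matters when individual payments are this small.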

'This is coming for everyone': A new kind of AI bot takes over the web

Yahoo

4 days ago

  • Business
  • Yahoo


People are replacing Google search with artificial intelligence tools like ChatGPT, a major shift that has unleashed a new kind of bot on the web. To offer users a tidy AI summary instead of Google's '10 blue links,' companies such as OpenAI and Anthropic have started sending out bots to retrieve and recap content in real time. They are scraping webpages, loading relevant content into the AI's memory and 'reading' far more content than a human ever would. According to data shared exclusively with The Washington Post, traffic from retrieval bots grew 49 percent in the first quarter of 2025 from the fourth quarter of 2024. The data is from TollBit, a New York-based start-up that helps news publishers monitor and make money when AI companies use their content. TollBit's report, based on data from 266 websites - half of which are run by national and local news organizations - suggests that the growth of bots that retrieve information when a user prompts an AI model is on an exponential curve. 'It starts with publishers, but this is coming for everyone,' Toshit Panigrahi, CEO and co-founder of TollBit, said in an interview. Panigrahi said that this kind of bot traffic, which can be hard for websites to detect, reflects growing demand for content, even as AI tools devastate traffic to news sites and other online platforms. 'Human eyeballs to your site decreased. But the net amount of content access, we believe, fundamentally is going to explode,' he said. A spokesperson for OpenAI said that referral traffic to publishers from ChatGPT searches may be lower in quantity but that it reflects a stronger user intent compared with casual web browsing. To capitalize on this shift, websites will need to reorient themselves to AI visitors rather than human ones, Panigrahi said.
But he also acknowledged that squeezing payment for content when AI companies argue that scraping online data is fair use will be an uphill climb, especially as leading players make their newest AI visitors even harder to identify. Debate around the AI industry's use of online content has centered on the gargantuan amounts of text needed to train the AI models that power tools like ChatGPT. To obtain that data, tech companies use bots that scrape the open web for free, which has led to a raft of lawsuits alleging copyright theft from book authors and media companies, including a New York Times lawsuit against OpenAI. Other news publishers have opted for licensing deals. (In April, The Washington Post inked a deal with OpenAI.) In the past eight months, as chatbots have evolved to incorporate features like web search and 'reasoning' to answer more complex queries, traffic for retrieval bots has skyrocketed. It grew 2.5 times as fast as traffic for bots that scrape data for training between the fourth quarter of 2024 and the first quarter of 2025, according to TollBit's report. Panigrahi said TollBit's data may underestimate the magnitude of this change because it doesn't reflect bots that AI companies send out on behalf of AI 'agents' that can complete tasks on a user's behalf, like ordering takeout from DoorDash. The start-up's findings also add a new dimension to mounting evidence that the modern internet - optimized for Google search results and social media algorithms - will have to be restructured as the popularity of AI answers grows. 'To think of it as, 'Well, I'm optimizing my search for humans' is missing out on a big opportunity,' he said. Installing TollBit's analytics platform is free for news publishers, and the company has more than 2,000 clients, many of which are struggling with these seismic changes, according to data in the report. 
Although news publishers and other websites can implement blockers to prevent various AI bots from scraping their content, TollBit found that more than 26 million AI scrapes bypassed those blockers in March alone. Some AI companies claim bots for AI agents don't need to follow bot instructions because they are acting on behalf of a user. Mark Howard, chief operating officer for the media company Time, a TollBit client, said the start-up's traffic data has helped Time negotiate content licensing deals with AI companies including OpenAI and the search engine Perplexity. But the market to fairly compensate publishers is far from established, Howard said. 'The vast majority of the AI bots out there absolutely are not sourcing the content through any kind of paid mechanism. … There is a very, very long way to go.'

