Latest news with #MATTO'BRIEN

AI chatbots need more books to learn from. These libraries are opening their stacks

Japan Today

4 days ago

Business

By MATT O'BRIEN
Everything ever said on the internet was just the start of teaching artificial intelligence about humanity. Tech companies are now tapping into an older repository of knowledge: the library stacks.
Nearly one million books published as early as the 15th century, and in 254 languages, are part of a Harvard University collection being released to AI researchers Thursday. Also coming soon are troves of old newspapers and government documents held by Boston's public library.
Cracking open the vaults to centuries-old tomes could be a data bonanza for tech companies battling lawsuits from living novelists, visual artists and others whose creative works have been scooped up without their consent to train AI chatbots.
'It is a prudent decision to start with public domain data because that's less controversial right now than content that's still under copyright,' said Burton Davis, a deputy general counsel at Microsoft.
Davis said libraries also hold 'significant amounts of interesting cultural, historical and language data' that's missing from the past few decades of online commentary that AI chatbots have mostly learned from.
Supported by 'unrestricted gifts' from Microsoft and ChatGPT maker OpenAI, the Harvard-based Institutional Data Initiative is working with libraries around the world on how to make their historic collections AI-ready in a way that also benefits libraries and the communities they serve.
'We're trying to move some of the power from this current AI moment back to these institutions,' said Aristana Scourtas, who manages research at Harvard Law School's Library Innovation Lab. 'Librarians have always been the stewards of data and the stewards of information.'
Harvard's newly released dataset, Institutional Books 1.0, contains more than 394 million scanned pages of paper. One of the earlier works is from the 1400s: a Korean painter's handwritten thoughts about cultivating flowers and trees.
The largest concentration of works is from the 19th century, on subjects such as literature, philosophy, law and agriculture, all of it meticulously preserved and organized by generations of librarians.
It promises to be a boon for AI developers trying to improve the accuracy and reliability of their systems.
'A lot of the data that's been used in AI training has not come from original sources,' said the data initiative's executive director, Greg Leppert, who is also chief technologist at Harvard's Berkman Klein Center for Internet & Society. This book collection goes 'all the way back to the physical copy that was scanned by the institutions that actually collected those items,' he said.
Before ChatGPT sparked a commercial AI frenzy, most AI researchers didn't think much about the provenance of the passages of text they pulled from Wikipedia, from social media forums like Reddit and sometimes from deep repositories of pirated books. They just needed lots of what computer scientists call tokens: units of data, each of which can represent a piece of a word.
Harvard's new AI training collection has an estimated 242 billion tokens, an amount that's hard for humans to fathom, but it's still just a drop of what's being fed into the most advanced AI systems. Facebook parent company Meta, for instance, has said the latest version of its AI large language model was trained on more than 30 trillion tokens pulled from text, images and videos.
Meta is also battling a lawsuit from comedian Sarah Silverman and other published authors who accuse the company of stealing their books from 'shadow libraries' of pirated works.
Now, with some reservations, the real libraries are standing up.
OpenAI, which is also fighting a string of copyright lawsuits, donated $50 million this year to a group of research institutions including Oxford University's 400-year-old Bodleian Library, which is digitizing rare texts and using AI to help transcribe them.
When the company first reached out to the Boston Public Library, one of the biggest in the U.S., the library made clear that any information it digitized would be for everyone, said Jessica Chapel, its chief of digital and online services.
'OpenAI had this interest in massive amounts of training data. We have an interest in massive amounts of digital objects. So this is kind of just a case that things are aligning,' Chapel said.
Digitization is expensive. It's been painstaking work, for instance, for Boston's library to scan and curate dozens of New England's French-language newspapers that were widely read in the late 19th and early 20th centuries by Canadian immigrant communities from Quebec. Now that such text is of use as training data, it helps bankroll projects that librarians want to do anyway.
'We've been very clear that, "Hey, we're a public library,"' Chapel said. 'Our collections are held for public use, and anything we digitized as part of this project will be made public.'
Harvard's collection was already digitized starting in 2006 for another tech giant, Google, in its controversial project to create a searchable online library of more than 20 million books.
Google spent years beating back legal challenges from authors to its online book library, which included many newer and copyrighted works. The dispute was finally settled in 2016 when the U.S. Supreme Court let stand lower court rulings that rejected copyright infringement claims.
Now, for the first time, Google has worked with Harvard to retrieve public domain volumes from Google Books and clear the way for their release to AI developers. Copyright protections in the U.S. typically last for 95 years, and longer for sound recordings.
How useful all of this will be for the next generation of AI tools remains to be seen as the data gets shared Thursday on the Hugging Face platform, which hosts datasets and open-source AI models that anyone can download.
The book collection is more linguistically diverse than typical AI data sources. Fewer than half the volumes are in English, though European languages still dominate, particularly German, French, Italian, Spanish and Latin.
A book collection steeped in 19th century thought could also be 'immensely critical' for the tech industry's efforts to build AI agents that can plan and reason as well as humans, Leppert said.
'At a university, you have a lot of pedagogy around what it means to reason,' Leppert said. 'You have a lot of scientific information about how to run processes and how to run analyses.'
At the same time, there's also plenty of outdated data, from debunked scientific and medical theories to racist narratives.
'When you're dealing with such a large data set, there are some tricky issues around harmful content and language,' said Kristi Mukk, a coordinator at Harvard's Library Innovation Lab who said the initiative is trying to provide guidance about mitigating the risks of using the data, to 'help them make their own informed decisions and use AI responsibly.'
© Copyright 2025 The Associated Press. All rights reserved. This material may not be published, broadcast, rewritten or redistributed without permission.

Meta invests in AI firm Scale and recruits its CEO for 'superintelligence' team

Japan Today

13-06-2025

Business

By MATT O'BRIEN
Meta says it is making a large investment in artificial intelligence company Scale and recruiting its CEO Alexandr Wang to join a team developing 'superintelligence' at the tech giant.
The move reflects a push by Meta CEO Mark Zuckerberg to revive AI efforts at the parent company of Facebook and Instagram as it faces tough competition from rivals such as Google and OpenAI.
Meta announced late Thursday what it called a 'strategic partnership and investment' with Scale but didn't disclose the financial terms of the deal. Scale said the added investment puts its market value at over $29 billion.
Scale said it will remain an independent company but the agreement will 'substantially expand Scale and Meta's commercial relationship.' Meta will hold a minority of Scale's outstanding equity.
Wang, though joining Meta, will remain on Scale's board of directors. Replacing him is a new interim Scale CEO, Jason Droege, who was previously the company's chief strategy officer and had past executive roles at Uber Eats and Axon.
It won't be the first time a big tech company has gobbled up talent and products at innovative AI startups without formally acquiring them. Microsoft hired key staff from startup Inflection AI, including co-founder and CEO Mustafa Suleyman, who now runs Microsoft's AI division.
Google pulled in the leaders of an AI chatbot company, while Amazon made a deal with San Francisco-based Adept that sent its CEO and key employees to the e-commerce giant. Amazon also got a license to Adept's AI systems and datasets.
Wang was a 19-year-old student at the Massachusetts Institute of Technology when he and co-founder Lucy Guo started Scale in 2016. They won influential backing that summer from the startup incubator Y Combinator, which was led at the time by Sam Altman, now the CEO of OpenAI.
Wang dropped out of MIT, following a trajectory similar to that of Zuckerberg, who quit Harvard University to start Facebook more than a decade earlier.
Scale's pitch was to supply the human labor needed to improve AI systems, hiring workers to draw boxes around a pedestrian or a dog in a street photo so that self-driving cars could better predict what's in front of them. General Motors and Toyota have been among Scale's customers.
What Scale offered to AI developers was a more tailored version of Amazon's Mechanical Turk, which had long been a go-to service for matching freelance workers with temporary online jobs.
More recently, the growing commercialization of AI large language models (the technology behind OpenAI's ChatGPT, Google's Gemini and Meta's Llama) brought a new market for Scale's annotation teams. The company claims to service 'every leading large language model,' including from Anthropic, OpenAI, Meta and Microsoft, by helping to fine-tune their training data and test their performance. It's not clear what the Meta deal will mean for Scale's other customers.
Wang has also sought to build close relationships with the U.S. government, winning military contracts to supply AI tools to the Pentagon and attending President Donald Trump's inauguration. The head of Trump's science and technology office, Michael Kratsios, was an executive at Scale for the four years between Trump's first and second terms. Meta has also begun providing AI services to the federal government.
Meta has taken a different approach to AI than many of its rivals, releasing its flagship Llama system for free as an open-source product that enables people to use and modify some of its key components. Meta says more than a billion people use its AI products each month, but it's also widely seen as lagging behind competitors such as OpenAI and Google in encouraging consumer use of large language models, also known as LLMs.
Meta hasn't yet released its purportedly most advanced model, Llama 4 Behemoth, despite previewing it in April as 'one of the smartest LLMs in the world and our most powerful yet.'
Meta's chief AI scientist Yann LeCun, who in 2019 was a winner of computer science's top prize for his pioneering AI work, has expressed skepticism about the tech industry's current focus on large language models.
'How do we build AI systems that understand the physical world, that have persistent memory, that can reason and can plan?' LeCun asked at a French tech conference last year.
These are all characteristics of intelligent behavior that large language models 'basically cannot do, or they can only do them in a very superficial, approximate way,' LeCun said.
Instead, he emphasized Meta's interest in 'tracing a path towards human-level AI systems, or perhaps even superhuman.' When he returned to France's annual VivaTech conference on Wednesday, LeCun dodged a question about the pending Scale deal but said his AI research team's plan has 'always been to reach human intelligence and go beyond it.'
'It's just that now we have a clearer vision for how to accomplish this,' he said.
LeCun co-founded Meta's AI research division more than a decade ago with Rob Fergus, a fellow professor at New York University. Fergus later left for Google but returned to Meta last month after a 5-year absence to run the research lab, replacing longtime director Joelle Pineau.
Fergus wrote on LinkedIn last month that Meta's commitment to long-term AI research 'remains unwavering' and described the work as 'building human-level experiences that transform the way we interact with technology.'
© Copyright 2025 The Associated Press. All rights reserved. This material may not be published, broadcast, rewritten or redistributed without permission.

Visa wants to give artificial intelligence 'agents' your credit card

Japan Today

03-05-2025

Business

By MATT O'BRIEN
Artificial intelligence 'agents' are supposed to be more than chatbots. The tech industry has spent months pitching AI personal assistants that know what you want and can do real work on your behalf.
So far, they're not doing much.
Visa hopes to change that by giving them your credit card. Set a budget and some preferences and these AI agents, successors to ChatGPT and its chatbot peers, could find and buy you a sweater, weekly groceries or an airplane ticket.
'We think this could be really important,' said Jack Forestell, Visa's chief product and strategy officer, in an interview. 'Transformational, on the order of magnitude of the advent of e-commerce itself.'
Visa announced it is partnering with a group of leading AI chatbot developers (among them U.S. companies Anthropic, Microsoft, OpenAI and Perplexity, and France's Mistral) to connect their AI systems to Visa's payments network. Visa is also working with IBM, online payment company Stripe and phone-maker Samsung on the initiative. Pilot projects begin Wednesday, ahead of more widespread usage expected next year.
The San Francisco payment processing company is betting that what seems futuristic now could become a convenient alternative to our most mundane shopping tasks in the near future. It has spent the past six months working with AI developers to address technical obstacles that must be overcome before the average consumer is going to use it.
For emerging AI companies, Visa's backing could also boost their chances of competing with tech giants Amazon and Google, which dominate digital commerce and are developing their own AI agents.
The tech industry is already full of demonstrations of the capabilities of what it calls agentic AI, though few are yet found in the real world. Most are still refashioned versions of large language models, the generative AI technology behind chatbots that can write emails, summarize documents or help people code.
Trained on huge troves of data, they can scour the internet and bring back recommendations for things to buy, but they have a harder time going beyond that.
'The early incarnations of agent-based commerce are starting to do a really good job on the shopping and discovery dimension of the problem, but they are having tremendous trouble on payments,' Forestell said. 'You get to this point where the agents literally just turn it back around and say, "OK, you go buy it."'
Visa sees itself as having a key role in giving AI agents easier and trusted access to the cash they need to make purchases.
'The payments problem is not something the AI platforms can solve by themselves,' Forestell said. 'That's why we started working with them.'
The new AI initiative comes nearly a year after Visa revealed major changes to how credit and debit cards will operate in the U.S., making physical cards and their 16-digit numbers increasingly irrelevant.
Many consumers are already getting used to digital payment systems such as Apple Pay that turn their phones into a credit card. A similar process of vetting someone's digital credentials would authorize AI agents to work on a customer's behalf, in a way Forestell says must assure buyers, banks and merchants that the transactions are legitimate and that Visa will handle disputes.
Forestell said that doesn't mean AI agents will take over the entire shopping experience, but it might be useful for errands that either bore some people (like groceries, home improvement items or even Christmas lists) or are too complicated, like travel bookings. In those situations, some people might want an agent that 'just powers through it and automatically goes and does stuff for us,' Forestell said.
Other shopping experiences, such as for luxury goods, are a form of entertainment and many customers still want to immerse themselves in the choices and comparisons, Forestell said.
In that case, he envisions AI agents still offering assistance but staying in the background.
And what about credit card debt? The credit card balances of American consumers hit $1.21 trillion at the end of last year, according to the Federal Reserve Bank of New York.
Forestell says consumers will give their AI agents clear spending limits and conditions that should give them confidence that the human is still in control. At first, the AI agents are likely to come back to buyers to make sure they are OK with a specific airplane ticket. Over time, those agents might get more autonomy to 'go spend up to $1,500 on any airline to get me from A to B,' he said.
Part of what is attracting some AI developers to the Visa partnership is that, with a customer's consent, an AI agent can also tap into a lot of data about past credit card purchases.
'Visa has the ability for a user to consent to share streams of their transaction history with us,' said Dmitry Shevelenko, Perplexity's chief business officer. 'When we generate a recommendation (say you're asking, "What are the best laptops?") we would know what are other transactions you've made and the revealed preferences from that.'
Perplexity's chatbot can already book hotels and make other purchases, but it's still in the early stages of AI commerce, Shevelenko says. The San Francisco startup has also, along with ChatGPT maker OpenAI, told a federal court it would consider buying Google's internet browser, Chrome, if the U.S. forces a breakup of the tech giant in a pending antitrust case.
© Copyright 2025 The Associated Press. All rights reserved. This material may not be published, broadcast, rewritten or redistributed without permission.
