Apple Inc.'s long-awaited jump into the generative AI frenzy dominated headlines in June. We also saw a raft of announcements from players expanding beyond their core disciplines into new modalities and extending support to make generative AI more practical, safe and accessible for everyday use. This includes a significant acquisition by OpenAI LLC and a new venture from OpenAI alumnus Ilya Sutskever, pursuing AI safety.
In the generative AI realm, we've noticed distinct phases marked by innovations and emphasis on specific modalities. With ChatGPT came an explosion of text-based startups and competing models. This was closely followed by image generation, with new offerings riding the wave of the public beta of OpenAI's DALL-E 2 and Midjourney. There has been significant focus on video generation in the past few months. In the last few weeks, there have been a number of upgrades to audio generation tools, a logical accompaniment, with announcements this month from Eleven Labs Inc., Google LLC's DeepMind and Stability AI Ltd.
Product releases and updates
Apple has finally joined the GenAI fray with the announcement of Apple Intelligence, a suite of features being brought to the iPhone, iPad and Mac, and a partnership with OpenAI. Apple will integrate OpenAI's ChatGPT technology into products such as Siri and Messages, enabling capabilities like natural language understanding, awareness of context and the ability to take actions in other applications.
Apple Intelligence reportedly focuses on user privacy, keeping most requests local to the device and allowing users to control what information ChatGPT accesses. It will be available for free on the latest iPhones, iPads and Macs. Although beginning with OpenAI, Apple plans to add support for other models down the line. Reportedly, no cash is being exchanged in either direction for the use of OpenAI's models. Rather, the exposure to Apple's hundreds of millions of users and the opportunity to convert users to its paid product ChatGPT Plus is attractive enough to OpenAI. Apple, meanwhile, would likely take a cut of these conversions.
Anthropic PBC made available a new model, Claude 3.5 Sonnet, the first of what it terms its 3.5 family. On the scale of size and performance, Sonnet was the middle point in the Claude 3 family, between the smaller Haiku and the larger Opus. The company suggests that with its 3.5 upgrade, Sonnet outperforms Claude 3 Opus across popular benchmarks, with major improvements in visual reasoning and the model's ability to address complex tasks. The company announced that 3.5 versions of Haiku and Opus will be released later this year.
Ilya Sutskever, prominent AI researcher and co-founder of OpenAI, has launched a new research startup called Safe Superintelligence (SSI) alongside co-founders Daniel Gross and Daniel Levy (also from OpenAI). Sutskever had been a vocal advocate for AI safety at his former company, causing public tension with CEO Sam Altman, and ultimately leading to his departure and that of several concerned colleagues in May. SSI will focus solely on creating safe and ethical artificial superintelligence (ASI) free from the distractions of fundraising, competing or commercialization. Sutskever has not disclosed the company's funding or financial backers.
Rich media generator Runway AI Inc. introduced its Gen-3 Alpha model, trained on video and images. The model, the first of a new generation promised by Runway, is designed to represent a diverse range of styles. The announcement drew attention to the ability for users to gain more control over scene transitions and framing, as well as have more realistic human characters depicted in their scenes.
Mistral AI SAS launched Codestral, its first model focused on code generation. It is a 22-billion parameter model with open weights (meaning the model's pre-trained parameters are made freely available), although it is not open source. It scores well against its code-generating competitors, beating CodeLlama 70B and two other models in the RepoBench code-generation test. Codestral is issued under the Mistral AI Non-Production License, which permits use for training and research purposes but not commercial use. Codestral was trained on a dataset of more than 80 programming languages and is available via Hugging Face and Mistral's own La Plateforme.
Fastly Inc. showcased its AI accelerator, a semantic caching capability designed to reduce the number of requests that have to be made to models by caching responses based on the semantic meaning of requests. This can reduce costs and address latency challenges. The content delivery network provider has decided to initially support OpenAI's text models but with the intent to extend the capability to support a wider range of generative AI models.
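For readers unfamiliar with the general idea of semantic caching, the following is a minimal sketch and not a description of Fastly's product. It assumes a toy bag-of-words vector in place of a real embedding model, and the names `embed`, `SemanticCache`, `call_model` and the 0.8 similarity threshold are all hypothetical choices made for illustration. A prompt that is close enough in meaning to one already answered is served from the cache, so no new request reaches the model.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Returns a stored response when a new prompt is close enough to an old one."""

    def __init__(self, call_model, threshold=0.8):
        self.call_model = call_model  # hypothetical function that queries the model
        self.threshold = threshold    # illustrative similarity cutoff (assumption)
        self.entries = []             # list of (vector, response) pairs

    def ask(self, prompt):
        vector = embed(prompt)
        for cached_vector, response in self.entries:
            if cosine(vector, cached_vector) >= self.threshold:
                return response  # cache hit: no model request is made
        response = self.call_model(prompt)  # cache miss: one model request
        self.entries.append((vector, response))
        return response


# Usage with a fake model that records how often it is called.
model_calls = []


def fake_model(prompt):
    model_calls.append(prompt)
    return f"answer {len(model_calls)}"


cache = SemanticCache(fake_model)
cache.ask("What is the capital of France?")
cache.ask("what is the capital of france")  # near-duplicate, served from cache
cache.ask("How do I reset my password?")    # unrelated, goes to the model
print(len(model_calls))  # prints 2
```

A production system would likely replace the word-count vectors with learned embeddings and the linear scan with a vector index, and would need a policy for expiring stale entries.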
Eleven Labs announced the general availability of its text-to-sound effect model. The audio generator, best known for its AI voice technologies, suggests its audio model can generate character voices, background effects and short instrumental tracks. The model is trained on audio files licensed from Shutterstock's audio library.
Audio generator Udio announced an audio input option, allowing users to better guide the tone and tempo of music generation. This extends input beyond just text prompts. The company also recently announced the ability to generate up to two minutes of audio at a time and suggests that these audio snippets can be expanded to produce songs of up to 15 minutes in length.
Google DeepMind showcased GenAI developments with a new video-to-audio technology, which, using a combination of video and text prompts, can provide synchronized audio. Video clips can be augmented with generated sound effects, dialog or soundtracks.
Chinese startup DeepSeek announced a new open-source code generation model, DeepSeek Coder V2. The startup claims that the mixture-of-experts model outperforms popular closed-source models across a range of programming benchmarks.
Stability AI released an open-source sound generator, Stable Audio Open. This open-source alternative to Stability AI's closed-source Stable Audio 2.0 model is limited to shorter generations, 47 seconds versus three minutes, and can only receive text prompts. Stable Audio 2.0, announced in April, can also support audio-to-audio generation natively.
Snap Inc. introduced a prototype of its new on-device image generation model SnapFusion, designed for users to generate augmented reality experiences in real time. A big focus in the model's development has been speed. The company claims that its approach shortens the model runtime from text input to image generation on mobile to under two seconds. Snap plans to bring the model to creators by the end of the year. The company also released a new GenAI Suite in its augmented reality authoring tool Lens Studio, which gives AR creators generative AI tools, such as the ability to render 3D assets and create characters from a text prompt.
Funding and M&A
OpenAI has acquired search and analytics indexing database Rockset for an undisclosed amount to support its growing enterprise base. Rockset's product set, once integrated, will power a retrieval infrastructure enabling businesses to ground generative AI models in their enterprise data. This is OpenAI's second disclosed acquisition, following that of Global Illumination in August 2023. The target has raised a total of $105 million in funding since its founding in 2016.
Elon Musk's X.AI Corp. announced a $6 billion series B funding round, bringing its valuation to $24 billion and making it the second most valuable AI startup behind OpenAI. The round included participation from investors Valor Equity Partners, Vy Capital, Andreessen Horowitz LLC, Sequoia Capital Operations LLC, Fidelity Management & Research Co. LLC, Prince Alwaleed Bin Talal and Kingdom Holding. The company announced that the funding will support ongoing research, an infrastructure build-out, and bringing products to market. Relatedly, Musk dropped his lawsuit against OpenAI a few days later.
Mistral raised an additional €600 million in a series B round, bringing its total valuation to €6 billion. Existing investor General Catalyst led the round, which will be a mix of equity (€468 million) and debt (€132 million). Part of the company's differentiation is its capital (and model) efficiency: it claims to be able to train performant models at a fraction of the cost of its rivals. Even so, the 1-year-old company has had no problems raising cash. The announcement comes six months after a €385 million series A in December.
A third Chinese state-backed investment fund for semiconductors was established in late May. The third phase of the China Integrated Circuit Industry Investment Fund has raised 344 billion yuan (about $47 billion), making it the largest phase so far, with state-owned banks contributing for the first time. The first and second phases raised 139 billion yuan (in 2014) and 200 billion yuan (in 2019), respectively.
Other rounds of interest are detailed in the following table.
Politics, regulations
The G7 communique, a publication that followed the summit hosted in Savelletri, Italy, in June, made references to AI. Among the most striking was the suggestion that the leaders would "launch an action plan on the use of AI in the world of work, and develop a brand to support the implementation of the International Code of Conduct for Organizations Developing Advanced AI Systems." These voluntary guidelines were announced after the Hiroshima Summit and released last October.
The EU opened its AI Office after the EU AI Act was passed. The setup of the AI Office was announced in March, with the core mission to oversee the AI Act's implementation, as well as safety evaluations of the largest "general purpose" models. The office, designed with 140 staff members in mind, is still hiring for 80 roles.
The European High-Performance Computing initiative is in the process of expanding its objectives to include establishing so-called AI factories. The term refers to AI infrastructure capacity that can be made available to European startups or SMBs wanting to train models.
In other OpenAI media company deals, Vox Media LLC said it would allow OpenAI to train on its content, while The Atlantic said it would allow ChatGPT to present its articles to its users.
British political party Labour, strong favorite to win the upcoming election, included AI regulation considerations in its manifesto. The party suggested it would introduce "binding regulation on the handful of companies developing the most powerful AI models." This could represent a departure from the low-touch AI regulation favored by the governing Conservative party.
The Future of Privacy Forum (FPF) Center for Artificial Intelligence released an updated set of resources on generative AI policy considerations. Advice includes reviewing contractual terms with third-party vendors investing in generative AI, and explicitly listing which generative AI tools are acceptable for staff to use for tasks. Businesses are also advised to establish policies under which employees disclose the use of generative AI tools for content.
The Continental African AI Summit, instituted as part of the Continental Artificial Intelligence Strategy, has been endorsed by African ministers. The Continental Artificial Intelligence Strategy is designed to set a roadmap that can help African economies take advantage of AI developments while also ensuring that the models available in Africa meaningfully reflect the culture and languages of the continent.
This article was published by S&P Global Market Intelligence and not by S&P Global Ratings, which is a separately managed division of S&P Global.
451 Research is a technology research group within S&P Global Market Intelligence. For more about the group, please refer to the 451 Research overview and contact page.