[AINews] 1 TRILLION token context, real time, on device?

Updated on May 29 2024


Low Latency Voice Model by Cartesia

Cartesia has launched a low-latency voice model that outperforms its Transformer equivalent, showing lower perplexity, lower word error rate, and higher output quality. The article also points to the broader potential of state space models (SSMs) that could continuously process and reason over a trillion-token context window of text, audio, and video on-device. The author highlights the ambition and possibilities of this technology, asking readers what they would do differently with access to such advanced models.

AI Discord Recap

A summary of Summaries of Summaries with highlights on AI Model Development, AI Safety & Ethics, AI Tools & Applications, AI Hardware, AI Drama & Controversy, and Memes & Humor shared across various channels. Highlights include updates on Gemini 1.5 Pro/Advanced models and Codestral 22B, discussions on fine-tuning, prompt engineering, and model optimization, contributions from OpenAI Collective and LlamaIndex, challenges in model deployment and infrastructure, and reactions to revelations in the AI community. Memes, humor, innovative remarks, and debates on AI competitiveness and technology investments were also part of the Discord interactions.

Interconnects (Nathan Lambert)

Codestral Enters the Coding Arena:

  • Codestral, a new 22B model from Mistral fluent in over 80 programming languages, has launched and is accessible on HuggingFace during an 8-week beta period. Meanwhile, Scale AI's introduction of a private data-based LLM leaderboard has sparked discussions about potential biases in model evaluation due to the company's revenue model and its reliance on consistent crowd workers.

Price Hike Halts Cheers for Gemini 1.5 Flash:

  • A sudden price bump for Google's Gemini 1.5 Flash's output—from $0.53/1M to $1.05/1M—right after its lauded release stirred debate over the API's stability and trustworthiness.

Awkward Boardroom Tango at OpenAI:

  • The OpenAI board was caught off-guard, learning about ChatGPT's launch on Twitter, according to revelations from ex-board member Helen Toner. The incident illuminated broader transparency issues at OpenAI, compounded by the lack of explicit reasoning behind Sam Altman's firing, with the board citing only that he was "not consistently candid in his communications."

Toner's Tattle and OpenAI's Opacity Dominate Discussions:

  • Toner's allegations of frequent dishonesty under Sam Altman have thrown OpenAI's transparency into question, especially around the circumstances of his departure and the opaque reasoning behind the board's recent decisions.

Stability.ai (Stable Diffusion) Discord

Engineers are recommending the use of Kaggle or Colab for faster image generation with Stable Diffusion. Technical enthusiasts are discussing training Stable Diffusion XL LoRA models and emphasizing concise trigger words for effective training. Community members are troubleshooting ComfyUI configuration and discussing ADetailer integration in the local Stable Diffusion API. Chatter surrounds the HUG and Stability AI partnership offering a creative AI course. Conversations have shifted to AI's capability in creating 3D models suitable for printing, highlighting the unfulfilled potential of current AI in this area.

AI Discord Channels and HuggingFace Discussions

LLM Perf Enthusiasts AI Discord Channels:

  • Several of the tracked AI Discord channels were quiet, with no new messages. If these channels remain inactive, users are encouraged to request their removal.

Perplexity AI General Channel:

  • Members engaged in discussions on web content scraping techniques, Perplexity's API response issues, developing Perplexity-like tools, comparing platform features, technical deep-diving in Go programming, and sharing related links.

Perplexity AI Sharing Channel:

  • Users shared AI-generated thoughts and engaged in discussions on model aliases.

HuggingFace General Channel:

  • Discussions covered technical issues, XP levels, bot functionality, alternative hardware for AI training, fine-tuning queries, and resource sharing for learning AI/ML, with shared links related to the discussions.

HuggingFace Today I'm Learning Channel:

  • Users inquired about channel access, and discussions included recommendations for NLP courses and API exploration.

HuggingFace Cool Finds Channel:

  • Discussions covered monitoring inflation trends with a Nowcasting tool, fine-tuning GPT-2 for sentiment analysis, research papers on superoptimization and quantum state prediction, and GNNs for state embeddings.

Unsloth AI (Daniel Han) - Community Discussions

The Unsloth AI (Daniel Han) community discussions cover a range of topics, including collaborating with Lightning AI Studio, fine-tuning Llama 3 chatbots, deploying models with RunPod and Docker, and addressing technical errors and CUDA version issues. Members are encouraged to contribute and engage in conversations related to local inference, pretraining methods, and model deployments.
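
For readers unfamiliar with the workflow being discussed, here is a minimal sketch of what an Unsloth QLoRA fine-tune of a Llama 3 chatbot typically looks like. The model name, dataset, prompt format, and hyperparameters below are illustrative assumptions, not settings from the discussion, and exact trl/TrainingArguments fields vary by library version.

```python
# Minimal Unsloth QLoRA fine-tuning sketch (illustrative choices throughout).
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

# Load a 4-bit quantized Llama 3 base model (hypothetical model choice).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small set of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

def to_text(example):
    # Flatten instruction-style records into a single training string (example format).
    return {"text": f"### Instruction:\n{example['instruction']}\n\n### Response:\n{example['output']}"}

dataset = load_dataset("yahma/alpaca-cleaned", split="train").map(to_text)  # example dataset

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=60,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```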

LLM Finetuning (Hamel + Dan) - berryman_prompt_workshop

Highly recommend John Berryman's book:

John Berryman's Prompt Engineering book on O'Reilly promises to be a comprehensive guide for developers, solidifying LLM principles and prompt engineering techniques useful for practical applications. Discover it here.

Exploring Prompt Engineering tools and frameworks:

Members shared numerous resources including links to Hamel's notes, GoEx and reflection agent techniques via Langchain blog, and JSON Schema details on Notion.

Interesting insights about LLM behavior and tuning:

Members discussed how underlying principles of computation give rise to capabilities of LLMs, including references to chaining reasoning and action through frameworks like ReAct. Check the paper ReAct: Synergizing Reasoning and Acting in Language Models.
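
To make the ReAct pattern concrete, here is a minimal sketch of the Thought/Action/Observation loop the paper describes; the `llm()` call and the single `search` tool are stand-ins, not any specific framework's API.

```python
# Minimal ReAct-style loop (illustrative): the model alternates Thought/Action steps,
# the harness executes the chosen action, and the Observation is fed back into the prompt.
def llm(prompt: str) -> str:
    """Stand-in for a chat/completions call to your model of choice."""
    raise NotImplementedError

def search(query: str) -> str:
    """Stand-in for a real tool, e.g. a web or document search."""
    raise NotImplementedError

def react(question: str, max_steps: int = 5) -> str:
    prompt = (
        "Answer the question by interleaving Thought, Action, and Observation lines.\n"
        "Available action: search[<query>]. Finish with: Final Answer: <answer>.\n"
        f"Question: {question}\n"
    )
    for _ in range(max_steps):
        step = llm(prompt)
        prompt += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        if "search[" in step:
            query = step.split("search[", 1)[1].split("]", 1)[0]
            prompt += f"Observation: {search(query)}\n"
    return "No answer within step budget."
```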

Copilot chatbot tips:

Several members shared experiences with AI-assisted coding tools like GitHub Copilot and Cursor, recommending examining workspace context and inline chat utilities. See Copilot workspace context for optimizing workspace-based inquiries.

Function calling and evaluation techniques:

Discussions covered leveraging frameworks and tools like Anthropic's XML tags, and how to dynamically select few-shot examples via libraries that compute Levenshtein distances or embeddings.
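
As a rough illustration of the dynamic few-shot selection idea (not any specific library from the discussion), the sketch below ranks candidate examples by string similarity using only the standard library; swapping in an embedding model for semantic similarity follows the same pattern.

```python
# Pick the k most similar few-shot examples for a new query (string-similarity sketch).
from difflib import SequenceMatcher

def select_few_shot(query: str, candidates: list[dict], k: int = 3) -> list[dict]:
    """candidates: [{"input": ..., "output": ...}, ...]; returns the k closest by similarity."""
    scored = sorted(
        candidates,
        key=lambda ex: SequenceMatcher(None, query, ex["input"]).ratio(),
        reverse=True,
    )
    return scored[:k]

examples = [
    {"input": "Convert 3 km to miles", "output": "1.86 miles"},
    {"input": "What is the capital of France?", "output": "Paris"},
    {"input": "Convert 10 kg to pounds", "output": "22.05 lbs"},
]
print(select_few_shot("Convert 5 km to miles", examples, k=2))
```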

LLM Finetuning: Workshop Discussions

Floats are weird, period:

  • Discusses quirks of floating-point numbers and their implications for AI models.

Precision matters for gradient estimation:

  • Contrasts accumulation in 8bit and 16bit floats for gradient accuracy.
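
A quick way to see why accumulation precision matters is the toy experiment below. It is illustrative only: NumPy has no 8-bit float type, so it contrasts float16 with float32 accumulation; fp8 only exaggerates the same effect.

```python
# Summing many small gradients: a low-precision accumulator silently loses mass.
import numpy as np

grads = np.full(100_000, 1e-4, dtype=np.float16)

acc16 = np.float16(0.0)
for g in grads:
    acc16 += g  # float16 accumulator: updates stall once the sum dwarfs each step

acc32 = grads.astype(np.float32).sum()  # higher-precision accumulation keeps the small contributions

print(acc16)  # noticeably less than the true 10.0
print(acc32)  # ~10.0
```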

HF dataset in sharegpt format:

  • Mentions the use of the sharegpt format in a Hugging Face (HF) dataset.
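
For reference, a single record in the sharegpt-style format generally looks like the example below (field names as commonly used; the actual dataset may differ slightly).

```python
# One conversation record in the sharegpt-style format (illustrative values).
record = {
    "conversations": [
        {"from": "human", "value": "Explain gradient accumulation in one sentence."},
        {"from": "gpt", "value": "Gradient accumulation sums gradients over several "
                                 "micro-batches before each optimizer step, simulating a larger batch."},
    ]
}
```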

Fine-tuning with synthetic data conundrum:

  • User shares challenges in generating synthetic data for model fine-tuning and its impact on accuracy.

These workshop discussions delve into the intricacies of working with large language models and the challenges faced in optimizing their performance.

LLM Finetuning (Hamel + Dan) - Various Discussions

  • Struggling with CSS Customization in Gradio: A member requested documentation on customizing the CSS of the Gradio interface.
  • Turning off sample packing impacts performance: Recommendations were shared regarding sample packing and sequence length when fine-tuning.
  • Debugging output inconsistencies after training: Discrepancies in model outputs were discussed using TinyLlama with the alpaca_2k_test dataset.
  • Using custom metrics and multiple datasets: Feasibility of using custom metrics and multiple datasets was explored.
  • Troubleshooting padding errors: A user faced padding errors during training due to improper input formatting.
  • Request for fine-tuning process architecture: A member asked for a high-level architecture diagram of the fine-tuning process in Axolotl.
  • Fireworks credit administration: One member will manage Fireworks AI credits.
  • Community gratitude for Fireworks credits: Members expressed appreciation towards the team handling Fireworks credits.
  • NYC Meetup interest: Members showed interest in a meetup in NYC.
  • Berlin meet-up initiates interest: Users expressed interest in meeting up in Berlin.

Discord Events and Discussions

Events details and Zoom URLs will be posted in the Events category on Discord. The section also includes discussions related to CUDA programming in Discord channels like general, triton, torch, cool-links, torchao, off-topic, llmdotc, oneapi, and bitnet. Topics discussed range from GPGPU programming tools recommendations to torch issues with Python 3.12, Triton performance on A6000 GPUs, and bug reports related to matrix multiplication in Triton. Additionally, there are discussions on quantization libraries, AI competitions, and training experiments for models similar to GPT-3. The section captures a mix of technical discussions, community interactions, and off-topic conversations shared in the Discord channels.

Discussion on LM Studio

LM Studio's Open Source Status Confuses User:

  • A member asked whether LM Studio is open source; others clarified that only the LMS Client (CLI) and lmstudio.js (the new SDK) are open source, while the main LM Studio app remains closed source.

LM Studio Cannot Access Files:

  • A user inquired about models accessing files on their PC using LM Studio, but another clarified that chatting with docs in LM Studio isn't possible and pointed to FAQ and pinned messages for more info.

Discussion on RAG Frameworks:

  • Members discussed low-code RAG frameworks and the integration of vector databases with RAG models, recommending llamaindex for development and considering fine-tuning models for infrequently changing data.
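
A minimal LlamaIndex RAG sketch along the lines discussed is shown below. It assumes llama-index ≥ 0.10 with an LLM and embedding backend already configured; the folder path and question are placeholders.

```python
# Build an in-memory vector index over local documents and query it (minimal RAG sketch).
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("./docs").load_data()   # placeholder folder of source files
index = VectorStoreIndex.from_documents(documents)        # embeds and stores chunks in a vector index
query_engine = index.as_query_engine(similarity_top_k=3)  # retrieve 3 chunks per query

response = query_engine.query("What does the onboarding guide say about API keys?")
print(response)
```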

Perplexity vs. LM Studio for Chat Organization:

  • A member mentioned Perplexity's ability to create collections to save and organize chats, querying if LM Studio had a similar feature. It was confirmed that LM Studio does not support this functionality.

File Summarization Limitations in LM Studio:

  • Members discussed the challenges of summarizing book contents with LM Studio due to token limits and recommended using cloud-based AI like GPT-4 or Claude 3 Opus for such tasks.
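
When a document exceeds the context window, the usual local workaround is map-reduce summarization: summarize chunks, then summarize the summaries. Below is a rough sketch with a stand-in llm() call; it is not an LM Studio API, and the chunk size is an arbitrary assumption.

```python
# Map-reduce summarization for texts longer than the model's context window.
def llm(prompt: str) -> str:
    """Stand-in for a call to a local or hosted chat model."""
    raise NotImplementedError

def chunk(text: str, size: int = 6000) -> list[str]:
    # Naive character-based chunking; a token-aware splitter is better in practice.
    return [text[i:i + size] for i in range(0, len(text), size)]

def summarize(text: str) -> str:
    partials = [llm(f"Summarize this passage in a few sentences:\n\n{c}") for c in chunk(text)]
    combined = "\n".join(partials)
    return llm(f"Combine these partial summaries into one coherent summary:\n\n{combined}")
```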

LM Studio Models Discussion

Aya translation model gets a nod:

  • A member recommended trying the Aya model for Japanese-to-English translation tasks, briefly highlighting both its quality and efficiency.

Highlight on Psyonic-Cetacean model:

  • The 32 Bit Quantum Upscale of 'Space Whale' was mentioned, noting significant performance improvements, including a reduction in perplexity by 932 points at a Q4KM. Learn more about this remastered version here.

Codestral's anticipated release:

  • Members expressed interest in Mistral's new code model, Codestral, which supports 80+ programming languages. Plans for integration into LM Studio were discussed, with a probable new app release required if the tokenizer changes.

Hardware challenges for Aya 23 35B:

  • Issues with the aya-23-35B-Q4_K_M.gguf model on a 4090 GPU were discussed, noting the model's need for more than 24GB of VRAM for optimal performance. Adjusting the context size was suggested as a solution to improve speed.

Space Whale context limits checked:

  • The context limit for the Space Whale model was confirmed by another member to be 4096 tokens. This was verified through the llama.context_length configuration.

Eleuther AI Discussions

EleutherAI welcomes new member inquiries:

A new member seeking advice on getting started with EleutherAI received beginner-level research topics and resources from other members.

Research and question clarification challenges:

Members discussed how hard it is to find platforms where basic questions reliably receive knowledgeable answers.

Exploration of multimodal AI research:

Curiosity was expressed about the scarcity of professors specializing in multimodal AI, and confusion about whether it is considered a subfield of CV and NLP.

SPAR highlighted as a resource:

The Supervised Program for Alignment Research (SPAR) was recommended for developing AI safety skills, with ongoing opportunities available.

LlamaIndex Features and Collaborations

The 'LlamaIndex' section covers the introduction of the 'PropertyGraphIndex' feature in collaboration with Neo4j, offering tools for constructing and querying knowledge graphs efficiently. The feature emphasizes customization and flexibility, providing detailed guides and examples for users. Additionally, the 'blog' subsection showcases various advancements, such as the 'FinTextQA dataset,' 'PostgresML integration,' 'Codestral model,' and 'Ollama support.' These developments aim to enhance usability and performance across different applications. Lastly, the 'general' chat includes discussions on semantic chunking in RAG models, embedding models, vector stores, and node management within LlamaIndex, reflecting user queries, experiences, and shared insights.
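
Based on the announcement, a basic PropertyGraphIndex setup looks roughly like the sketch below. It uses the default in-memory graph store for simplicity and assumes an LLM/embedding backend configured via Settings; the Neo4j-backed store lives in a separate integration package, and exact module paths depend on your llama-index version.

```python
# Build a property graph over documents and query it (minimal sketch, in-memory graph store).
from llama_index.core import PropertyGraphIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("./docs").load_data()  # placeholder corpus
index = PropertyGraphIndex.from_documents(documents)     # extracts entities/relations into a graph

# Queries combine graph traversal with retrieval over the extracted triples.
query_engine = index.as_query_engine()
print(query_engine.query("How are the main components related?"))
```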

Latent Space - AI Announcements

AI Agent Architectures and KANs event at 12 PM PT: Latent Space is hosting an event on AI Agent Architectures and Kolmogorov Arnold Networks today at 12 PM PT. Event registration and details are available and attendees are encouraged to add the event to their calendars via the RSS logo on the event page.

Link mentioned: LLM Paper Club (AI Agent Architectures + Kolmogorov Arnold Networks) · Zoom · Luma: a 2-for-1! Eric Ness will cover The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A…

Discord Channels Discussions

Cohere ▷ #general (6 messages):

  • Consider RAG instead of JSONL finetuning for PDFs: A suggestion to use a Retrieval Augmented Generation (RAG) approach for PDFs to avoid finetuning.

  • How to access response.citations in API: Discussion on accessing the response.citations feature, which is available only through the API, with examples (see the sketch after this list).

  • Local R+ implementation includes force citations: Success in building a pipeline for RAG within a local implementation of Command R+ with citations included.

  • Discord bot using Cohere praised but needs proper channel: Appreciation for a Discord bot using Cohere but recommendation to discuss in the project channel.
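
A rough sketch of the grounded-generation flow being described, using the Cohere Python SDK's chat-with-documents call; SDK and field names may differ across versions, and the documents and key below are placeholders.

```python
# Ask a question grounded in supplied documents and read back the citations.
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key

docs = [
    {"title": "policy.pdf (excerpt)", "snippet": "Refunds are accepted within 30 days of purchase."},
    {"title": "faq.pdf (excerpt)", "snippet": "Shipping typically takes 5-7 business days."},
]

response = co.chat(
    message="What is the refund window?",
    documents=docs,  # the model grounds its answer in these snippets
)

print(response.text)
# Each citation points back into the documents that support a span of the answer.
for citation in response.citations or []:
    print(citation)
```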

tinygrad (George Hotz) ▷ #general (4 messages):

  • Elon Musk's xAI gets big funding boost: xAI raised $6 billion in funding, backed by prominent investors.

  • Doubt about analytical tools: A mention that certain analytical tools are of negligible usefulness, without specifying which.

  • Fireship video impresses with Bend language: Praise for Bend language's automatic multi-threading capabilities in a Fireship video.

  • Query about tinybox power supply: Inquiry about the power supply mechanism of the tinybox.

DiscoResearch ▷ #general (4 messages):

  • Goliath sees performance drops before continued pretraining: Discussion on performance drops in Goliath before continued pretraining.

  • GPT-2 replication in llm.c noted: Reproduction of GPT-2 (124M) in llm.c with improved accuracy compared to GPT-3 models.

  • Mistral AI launches Codestral-22B, its first code model: Introduction of Codestral-22B, outperforming previous models and available on various platforms.

  • LAION AI seeks community help with open GPT-4-Omni: Appeal for assistance in building an open GPT-4-Omni model with promising directions and datasets shared in a blog post.


FAQ

Q: What is the significance of the low latency voice model launched by Cartesia?

A: The low latency voice model by Cartesia outperforms its Transformer equivalent, showcasing better performance in terms of lower perplexity, word error rate, and higher quality.

Q: What potential impact do state space models (SSMs) have as mentioned in the article?

A: SSMs have the potential to continuously process and reason over a trillion token context window of text, audio, and video on-device, opening up new possibilities in AI technology.

Q: What discussions took place regarding the Gemini 1.5 Flash model?

A: There were debates stirred by a sudden price increase in Google's Gemini 1.5 Flash API, raising concerns about the stability and trustworthiness of the service.

Q: What issues were raised regarding transparency at OpenAI?

A: Allegations of lack of transparency and dishonesty under Sam Altman's leadership, particularly in situations like the board being unaware of ChatGPT's launch, have sparked discussions on OpenAI's communication practices.

Q: What insights were shared about utilizing AI models like LLMs for practical applications?

A: The discussions highlighted the importance of prompt engineering and utilizing tools and frameworks for effective model optimization and performance enhancement in AI applications.

Q: What challenges were faced in the AI community regarding model fine-tuning and optimization?

A: Challenges were shared around issues like generating synthetic data for fine-tuning, tackling precision matters for accurate gradient estimation, and exploring tools like Anthropic's XML tags for function calling and evaluation techniques.
