The truth is that the cited paper dates from April 2025… https://arxiv.org/abs/2504.19874

Feedback so far on the Google Research paper:
• Bears: a 'DeepSeek moment' has arrived for memory stocks. Sell first, think later.
• Bulls (mostly tech specialists): better data compression lets you stretch the context window further, which drives wider adoption and exponentially growing memory demand, so this release is not negative. Above all, Google treats its genuinely valuable technology as trade secrets and never publishes it; what does get published is effectively worthless anyway.
[2504.19874] TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
Computer Science > Machine Learning
arXiv:2504.19874 (cs)
[Submitted on 28 Apr 2025]
Title: TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
Authors: Amir Zandieh, Majid Daliri, Majid Hadian, Vahab Mirrokni
Abstract: Vector quantization, a problem rooted in Shannon's source coding theory, aims to quantize high-dimensional Euclidean vectors while minimizing distortion in their geometric structure. We propose TurboQuant to address both mean-squared error (MSE) and inner product distortion, overcoming limitations of existing methods that fail to achieve optimal distortion rates. Our data-oblivious algorithms, suitable for online applications, achieve near-optimal distortion rates (within a small constant factor) across all bit-widths and dimensions. TurboQuant achieves this by randomly rotating input vectors, inducing a concentrated Beta distribution on coordinates, and leveraging the near-independence of distinct coordinates in high dimensions to apply an optimal scalar quantizer to each coordinate. Recognizing that MSE-optimal quantizers introduce bias in inner product estimation, we propose a two-stage approach: applying an MSE quantizer followed by a 1-bit Quantized JL (QJL) transform on the residual, resulting in an unbiased inner product quantizer. We also provide a formal proof of the information-theoretic lower bounds on the best achievable distortion rate by any vector quantizer, demonstrating that TurboQuant closely matches these bounds, differing only by a small constant ($\approx 2.7$) factor. Experimental results validate our theoretical findings, showing that for KV cache quantization, we achieve absolute quality neutrality with 3.5 bits per channel and marginal quality degradation with 2.5 bits per channel. Furthermore, in nearest neighbor search tasks, our method outperforms existing product quantization techniques in recall while reducing indexing time to virtually zero.
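To make the rotate-then-quantize idea in the abstract concrete, here is a minimal Python sketch: randomly rotate the input vector, then quantize each coordinate with a scalar quantizer, and invert both steps to reconstruct. All function names are illustrative, and the uniform per-coordinate quantizer is a stand-in assumption; the actual paper uses scalar quantizers that are optimal for the Beta-like coordinate distribution induced by the rotation, plus the 1-bit QJL residual stage for unbiased inner products, neither of which is reproduced here.

```python
import numpy as np

def random_rotation(d, rng):
    # Draw a random orthogonal matrix via QR of a Gaussian matrix
    # (illustrative; the paper may use a faster structured rotation).
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))  # sign fix so Q is Haar-distributed

def quantize(x, rotation, bits=4):
    # After rotation, coordinates of a high-dimensional vector are
    # nearly independent and concentrated, so one scalar quantizer
    # per coordinate works well.
    z = rotation @ x
    scale = np.abs(z).max() + 1e-12
    levels = 2 ** bits - 1
    # Uniform scalar quantizer per coordinate (stand-in for the
    # distribution-optimal quantizer TurboQuant describes).
    codes = np.round((z / scale + 1.0) / 2.0 * levels).astype(np.int32)
    return codes, scale

def dequantize(codes, scale, rotation, bits=4):
    levels = 2 ** bits - 1
    z_hat = (codes / levels * 2.0 - 1.0) * scale
    return rotation.T @ z_hat  # orthogonal, so transpose undoes rotation

rng = np.random.default_rng(0)
d = 128
R = random_rotation(d, rng)
x = rng.standard_normal(d)
codes, scale = quantize(x, R, bits=4)
x_hat = dequantize(codes, scale, R, bits=4)
print("relative MSE:", np.mean((x - x_hat) ** 2) / np.mean(x ** 2))
```

At 4 bits per coordinate this toy version already reconstructs with small relative error; the paper's claim is that with the optimal scalar quantizer and residual stage, 3.5 bits per channel suffices for quality-neutral KV cache quantization.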
Comments: 25 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Databases (cs.DB); Data Structures and Algorithms (cs.DS)
Cite as: arXiv:2504.19874 [cs.LG]
Source: https://t.me/kkkontemp/2230