[20230903] Weekly AI ArXiv Chat (만담) Season 2 - Episode 24 #90

jungwoo-ha · 2023-09-03T00:56:23Z

News

HyperCLOVA X released (Aug 24)
- Naver Cloud product page: https://www.ncloud.com/solution/featured/hyperclovax
- DAN23 video replay: https://tv.naver.com/v/39568301
GPT-3.5 Turbo fine-tuning and ChatGPT Enterprise
Google Cloud Next 2023
- TPUv5e
- Duet AI, Vertex AI: LLMs are heading to B2B
Meta considers paid versions of Facebook and Instagram in Europe, driven by EU regulation

ArXiv

DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
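- A Google study on optimizing the domain mixture of the pretraining corpus (very well received)
- Uses a small reference model to train a small proxy model, optimizes the domain weights, and builds the pretraining corpus from those weights
- Mostly uses a 280M reference model and applies the result to an 8B model; the effect on FT is very good
- Evaluated on the GLaM and Pile datasets, with various ablations over the reference model size
- A must-read for any research group planning to do pretraining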
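As a rough illustration of the "optimize the domain weights" step, here is a minimal sketch of one exponentiated-gradient update in the style of DoReMi's Algorithm 1, as I understand it. The function name, the toy losses, and the domain names are hypothetical, and the excess loss is clipped per domain here rather than per token, which is a simplification.

```rust
/// One DoReMi-style update of the domain weights (illustrative sketch).
/// `proxy_loss` and `reference_loss` hold the average loss per domain.
fn update_domain_weights(
    weights: &[f64],
    proxy_loss: &[f64],
    reference_loss: &[f64],
    step_size: f64,
    smoothing: f64,
) -> Vec<f64> {
    let k = weights.len() as f64;
    // Upweight each domain by exp(step_size * excess loss), where the excess
    // loss (proxy minus reference) is clipped at zero.
    let scaled: Vec<f64> = weights
        .iter()
        .zip(proxy_loss.iter().zip(reference_loss))
        .map(|(w, (p, r))| w * (step_size * (p - r).max(0.0)).exp())
        .collect();
    let total: f64 = scaled.iter().sum();
    // Renormalize, then mix with the uniform distribution for smoothing.
    scaled
        .iter()
        .map(|s| (1.0 - smoothing) * s / total + smoothing / k)
        .collect()
}

fn main() {
    // Hypothetical three-domain corpus starting from uniform weights.
    let weights: [f64; 3] = [1.0 / 3.0; 3];
    let proxy_loss = [2.9, 3.4, 2.1];
    let reference_loss = [2.8, 2.9, 2.3];
    let updated = update_domain_weights(&weights, &proxy_loss, &reference_loss, 1.0, 1e-3);
    for (name, w) in ["web", "code", "books"].iter().zip(&updated) {
        println!("{name}: {w:.3}");
    }
}
```

Domains where the proxy model lags the reference model the most gain weight; in the paper the weights are averaged over training steps before being used to sample the large model's corpus.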
The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants
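- A multiple-choice MRC dataset from Meta covering 122 language variants
- Built on passages from FLORES-200, a multilingual translation benchmark
- Turned into an MRC set through human-AI collaboration and released publicly
- The languages span high-, mid-, and low-resource, i.e. mainstream, mid-tier, and minority languages are all covered
- Evaluated with MLMs (InfoXLM, XLM-V, translate-then-train) and LLMs (GPT-3.5-Turbo, LLaMA 1 and 2, Falcon-40B, zero-shot)
- Low-resource languages don't seem to gain much even as models get bigger..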

veritas9872 · 2023-09-03T04:37:11Z

Technical News

Candle: A Minimalist ML framework for Rust

GitHub: https://github.com/huggingface/candle

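HuggingFace has released Candle, a deep learning framework written in Rust. It resembles Torch but appears to have fewer features. It should help a lot with model inference in embedded environments where Python is not available, or where millisecond-level latency is required.

Looking inside, it relies on an unstable wrapper API between Rust and CUDA, so I expect problems using it in production. Still, it seems worth trying in settings such as robotics, where the memory and latency overhead of Python is a burden.

Rust works very well with WASM, so it should also help a lot with running models in the web browser.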

Matrix Multiplication in Candle:

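use candle_core::{Device, Tensor};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Use the first CUDA GPU (needs the `cuda` feature); Device::Cpu also works.
    let device = Device::new_cuda(0)?;

    // Random tensors drawn from a normal distribution (mean 0, std 1).
    let a = Tensor::randn(0f32, 1., (2, 3), &device)?;
    let b = Tensor::randn(0f32, 1., (3, 4), &device)?;

    // (2, 3) x (3, 4) -> (2, 4)
    let c = a.matmul(&b)?;
    println!("{c}");
    Ok(())
}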

Open challenges in LLM research

Blog: https://huyenchip.com/2023/08/16/llm-research-open-challenges.html

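Chip Huyen, who is well known in the MLOps community, shared a blog post on the current problems of language models and where research is heading. With large language models getting so much attention lately, it should be helpful for newcomers.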

Research

Nougat: Neural Optical Understanding for Academic Documents

Website: https://facebookresearch.github.io/nougat/
ArXiv: https://arxiv.org/abs/2308.13418
GitHub: https://github.com/facebookresearch/nougat
HuggingFace Demo: https://huggingface.co/spaces/ysharma/nougat

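Meta has released a paper on OCR for scientific papers, something researchers really need (I would use it even if I had to install it myself lol).

Most scientific and technical research is shared as PDFs on ArXiv and similar sites, but general-purpose OCR does a poor job of extracting equations and tables. This work uses a relatively simple Encoder-Decoder model that takes a scanned paper as input and outputs Markdown.

The data is mostly ArXiv papers. It cannot handle figures yet, but I believe it will help the many researchers who have to write LaTeX while looking at a PDF.

A success case where the LaTeX output is accurate despite heavy paper and lighting distortion.

A failure case where the figure in Figure 5 cannot be generated.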

Bayesian Flow Networks

ArXiv: https://arxiv.org/abs/2308.07037

(Unofficial) GitHub: https://github.com/Algomancer/Bayesian-Flow-Networks

nick-jhlee · 2023-09-03T05:32:13Z

Upcoming/Finished deadlines

AAAI 2024: great work on your submissions, everyone! (phase 1 reviewing is currently in progress)
ICLR 2024: abstract due 09/22 9PM (KST) ~~so much for Chuseok,,~~
AISTATS 2024: abstract due 10/07 9PM (KST)
(left out srry: CHI 2024, AAMAS 2024, ALT 2024, EACL 2024)

Misc. OPODIS 2023: 09/08 (AoE)

NeurIPS 2023 Author Notification: around 09/22..? (wishing everyone the best ㅠㅠ)

News

IJCAI 2024 to be held at Jeju (not Shanghai as planned)
- https://twitter.com/IJCAIconf/status/1694990362954375336?s=20
Yet Another ICML Award Fiasco (by Prof. Francesco Orabona of KAUST)
- https://twitter.com/bremen79/status/1696868943426986168?s=46&t=llKohaNYR1IR_yaWlq40TA
- D-Adaptation received an outstanding paper award at ICML 2023, but....
- it turns out it looks like a well-packaged result that is worse than a result from 9 years ago...?
- raises several questions:
  - What should be done when there is a mistake like this (in CS conf)?
  - Shouldn't awards be given after some time has passed, like test-of-time awards? (even get rid of paper award?)
  - ...etc

Papers

Transformers as Support Vector Machines

Tarzanagh, Li, Thrampoulidis, and Oymak
TL;DR: formal equivalence between the optimization geometry of self-attention and a hard-margin SVM problem
- separates the optimal input tokens from the non-optimal tokens!
Implicit bias! (classifying and minimizing norm of $W = KQ^T$), for both linear and nonlinear head/MLPs.
Many many exciting future directions!
- extension to realistic architectures, generalization analyses...etc
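For reference, the hard-margin problem in the TL;DR has roughly the following shape, as I recall it from the paper. The notation is my paraphrase and should be checked against the paper: $x_{it}$ is token $t$ of input $i$, $z_i$ is the query token, $\alpha_i$ is the index of the optimal token, and $n$ is the number of inputs.

```latex
\min_{W} \; \|W\|_F
\quad \text{s.t.} \quad
(x_{i\alpha_i} - x_{it})^\top W z_i \ge 1
\quad \forall\, t \ne \alpha_i,\; i \in [n]
```

Each constraint asks the optimal token to beat every other token of the same input by a margin of at least 1 in attention score, which is the separation mentioned above. Per the abstract, optimizing with the $(K, Q)$ parameterization replaces the Frobenius norm with the nuclear norm of $W = KQ^T$.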

It would be great if everyone took an interest in LLM theory.. ㅎㅅㅎ

Research on language modeling itself (~ statistical learning theory)
New optimizer for LLM
Optimization landscape of LLM
Inner mechanism of attention
...etc (let me know if you are interested in a list of recent preprints ~~that I have not read fully~~)

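This area is very, very hot right now (at least in the theory community), and personally I think it is about time for a theoretical foundation of LLMs to emerge. If this research is done in closer conversation with the more major NLP community, I think a lot of very meaningful results will come out of it.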

jwlee-neubla · 2023-09-03T12:01:34Z

Cloud TPU v5e for large-scale AI inference

Google unveiled TPU v5e at Next 2023.
It is the lite version of TPUv5 and the successor to TPUv4i.
1 Tensor Core: 197 TFLOPS; 1 HBM2 stack: 16 GB, 819 GBps; no twisted torus, no OCS
Google claims a single chip can run models up to 13B, and 256 chips (1 Pod) can run up to 2T
My guess is that the main target is the sLLMs that Google will run for its own services.

https://cloud.google.com/blog/products/compute/how-cloud-tpu-v5e-accelerates-large-scale-ai-inference?hl=en
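A quick back-of-the-envelope check of the capacity claims above, as a sketch only. It counts weight memory against the 16 GB per chip and 256 chips per Pod quoted above and ignores activations and KV cache. The bytes-per-parameter values (1 for int8, 2 for bf16) are my assumption, not something stated in the notes.

```rust
/// Weight memory in GB for a model with `params_billion` billion parameters.
/// One billion parameters at one byte each is one GB.
fn weight_memory_gb(params_billion: f64, bytes_per_param: f64) -> f64 {
    params_billion * bytes_per_param
}

fn main() {
    const HBM_PER_CHIP_GB: f64 = 16.0; // from the spec line above
    const CHIPS_PER_POD: f64 = 256.0; // from the spec line above
    let pod_hbm_gb = HBM_PER_CHIP_GB * CHIPS_PER_POD;

    let cases = [
        ("13B on 1 chip", 13.0, HBM_PER_CHIP_GB),
        ("2T on 1 Pod", 2000.0, pod_hbm_gb),
    ];
    for (label, params_billion, capacity_gb) in cases {
        // Assumed weight precisions: 1 byte (int8) and 2 bytes (bf16).
        for (precision, bytes) in [("int8", 1.0), ("bf16", 2.0)] {
            let need_gb = weight_memory_gb(params_billion, bytes);
            println!(
                "{label} ({precision}): {need_gb} GB of weights vs {capacity_gb} GB HBM, fits: {}",
                need_gb <= capacity_gb
            );
        }
    }
}
```

Under these assumptions a 13B model fits on one chip only with roughly one byte per weight, while a 2T model fits in a Pod's combined HBM at either precision, with little headroom at two bytes per weight.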

nick-jhlee · 2023-09-03T12:33:21Z

Technical notes on "Bayesian Flow Networks" by Dr. Sam Power (Univ of Bristol)

https://twitter.com/sp_monte_carlo/status/1694704443814457536?s=20
