Paper study (0530)


kykyky 2023. 5. 13. 00:59

3. Recent advances in NLP

# The old problem: defining a suitable and effective representation of tokens, sentences, and documents

# one-hot encoding
   Drawbacks: huge vectors, and the relations between words are lost
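
A minimal sketch of both drawbacks, assuming a toy four-word vocabulary (made up for illustration, not from the paper): every vector is as long as the vocabulary, and any two distinct words have a dot product of 0, so nothing about their relation is encoded.

```python
# Toy vocabulary (an assumption for illustration only).
vocab = ["cat", "dog", "car", "truck"]
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    """Return a list of length len(vocab) with a single 1 at the word's index."""
    vec = [0] * len(vocab)
    vec[word_to_index[word]] = 1
    return vec

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

print(one_hot("cat"))                       # [1, 0, 0, 0]
print(dot(one_hot("cat"), one_hot("dog")))  # 0: "cat" is no closer to "dog" than to "truck"
```

With a realistic vocabulary the vector length would be in the tens or hundreds of thousands, which is the "huge vector" problem.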

3.1. Word and sentence vectors

# Word2vec

* word vectors or word embeddings
  - words with similar meanings -> similar representations

* CBOW: reconstruct a target word given its context as input
* Skip-gram: predict context words given the target word

* capture a large number of precise syntactic and semantic word relationships
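
The difference between CBOW and Skip-gram is easiest to see in the training examples each one is fed. The sketch below only builds those examples (it does not train any network); the sentence, the window size, and the function names are assumptions for illustration.

```python
def skipgram_pairs(tokens, window=2):
    """(target, context) pairs: the target word is used to predict each context word."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

def cbow_pairs(tokens, window=2):
    """(context, target) pairs: the surrounding words are used to reconstruct the target."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        pairs.append((context, target))
    return pairs

tokens = "the cat sat on the mat".split()
print(skipgram_pairs(tokens, window=1)[:3])  # [('the', 'cat'), ('cat', 'the'), ('cat', 'sat')]
print(cbow_pairs(tokens, window=1)[1])       # (['the', 'sat'], 'cat')
```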

----------- Improved versions of Word2Vec

# Global Vectors (GloVe): exploits statistical information computed on the whole corpus

# fastText

---------------------- definition of a suitable representation for sentences and texts is still challenging

# Bag-of-Words (BOW)
: represents a document d as the set of words that compose it
   computed as the sum of the one-hot vectors of its words

* Drawbacks: the dimension of the feature vector grows quickly, and semantics are not taken into account
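
A rough sketch of BOW as a sum of one-hot vectors, assuming a tiny hand-made vocabulary and whitespace tokenization (both are simplifications for illustration). The two example sentences mean different things but get the same vector, which illustrates the semantics drawback.

```python
def bag_of_words(document, vocab):
    """Sum the one-hot vectors of the document's words: entry i counts vocab[i]."""
    index = {word: i for i, word in enumerate(vocab)}
    vec = [0] * len(vocab)
    for word in document.split():
        if word in index:          # words outside the vocabulary are ignored here
            vec[index[word]] += 1
    return vec

vocab = ["dog", "bites", "man", "cat"]
print(bag_of_words("dog bites man", vocab))  # [1, 1, 1, 0]
print(bag_of_words("man bites dog", vocab))  # [1, 1, 1, 0]  (word order is lost)
```

Every new word in the corpus adds one more dimension to the vector, which is why the feature dimension grows so quickly.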

---------------- With the advent of word vectors, new methods emerged to develop meaningful document- and sentence-level representations

# Unsupervised word/sentence vectors
- extract general representations that can be used in various tasks

* simple average pooling of word vectors

* Doc2Vec

* Skip-thought

* fastSent
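
Of the methods listed above, simple average pooling is short enough to sketch directly. The 2-dimensional word vectors below are hypothetical values made up for illustration; in practice they would come from a trained model such as Word2vec.

```python
def average_pooling(tokens, word_vectors):
    """Average the vectors of the known tokens into one fixed-size sentence vector."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        raise ValueError("no token has a word vector")
    dim = len(known[0])
    return [sum(vec[k] for vec in known) / len(known) for k in range(dim)]

# Hypothetical word vectors (not taken from any real model).
word_vectors = {
    "good": [1.0, 0.0],
    "movie": [0.0, 1.0],
    "great": [1.0, 0.5],
}
print(average_pooling(["good", "movie"], word_vectors))  # [0.5, 0.5]
```

The sentence vector has the same dimension as the word vectors regardless of sentence length, but word order is still ignored.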

# supervised methods
- use explicit labels to develop meaningful representations used in downstream tasks

* Convolutional Neural Networks for Sentence Classification

*

# neural Machine Translation (MT)

# attention mechanisms

# ELMo

3.2. Pre-trained Transformer models

# Transformer model
  : the first architecture entirely based on attention to draw global dependencies between input and output, replacing the recurrent layers

* good translation quality, and trains significantly faster than architectures based on recurrent or convolutional layers
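
The core operation of the Transformer is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. Below is a minimal pure-Python sketch of that formula only; the learned projections, multiple heads, and masking of the full model are left out, and the toy input is an assumption for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """softmax(Q K^T / sqrt(d_k)) V, with matrices given as lists of rows."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # One score per key: scaled dot product between this query and the key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # The output row is the attention-weighted average of the value rows.
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs

# Self-attention on a toy 3-token sequence: Q, K and V are all the same matrix.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in attention(x, x, x):
    print([round(v, 3) for v in row])
```

Every output row depends on all input rows at once, which is how attention draws global dependencies without stepping through the sequence as a recurrent layer does.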

# BERT(Bidirectional Encoder Representations from Transformers)
: a popular pre-trained Transformer model
: pre-train deep bidirectional representations from unlabeled texts by jointly conditioning on both left and right contexts in all layers

* pre-training was driven by two language model objectives
- Masked Language Model (MLM)
   : masks a small number of words in the input sequence and tries to predict them at the output
- Next Sentence Prediction (NSP)
   : the network tries to understand the relations between sentences by means of a binary loss
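
A rough sketch of how MLM training inputs could be prepared. The 15% default follows the masking rate reported for BERT; always substituting `[MASK]` is a simplification (BERT sometimes keeps the word or swaps in a random one), and the function name, the `seed` argument, and whitespace tokenization are assumptions for illustration.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (masked_tokens, labels); labels hold the original word at masked positions."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)      # the model is trained to predict this word
        else:
            masked.append(tok)
            labels.append(None)     # position does not contribute to the loss
    return masked, labels

tokens = "the cat sat on the mat".split()
masked, labels = mask_tokens(tokens, mask_prob=0.3)
print(masked)
print(labels)
```

Because the masked word can sit anywhere in the sequence, the model has to use both the left and the right context to recover it.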

* Applications: sequence classification, word labeling, sequence-to-sequence
- strengths
  i) the architecture is strongly based on self-attention mechanisms, which allow it to read and keep track of the whole input sequence
  ii) the pre-training, which allows the network to read and understand a text, its semantics, and its meaning

# extensions of BERT, with only slight differences (pre-trained Transformers): RoBERTa, ALBERT, DistilBERT, ...

# Strengths

* cross multilingual scenario

* text generation: GPT

- GPT
   uses a 12-layer decoder-only Transformer with masked self-attention to train a language model on 7,000 unpublished books

- GPT-2
   With a few minor changes, the network consists of a 48-layer Transformer with 1.5 billion learnable parameters

   Task Conditioning

    : training lets the same unsupervised model learn multiple tasks by modeling P(output | input, task)
      the model produces different outputs for the same input depending on the task

     - forms the basis for zero-shot task transfer
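
One way to read P(output | input, task) is that the task is handed to the model as part of its text input. A minimal sketch, assuming a hypothetical `task: text` prompt format (these notes do not specify the exact format, and no model is called here):

```python
def condition_on_task(task, text):
    """Put a natural-language task description in front of the input text."""
    return f"{task}: {text}"

text = "The weather is nice today."
for task in ("translate to french", "summarize"):
    # Same input text, different task prefix: the model sees two different prompts.
    print(condition_on_task(task, text))
```

Since the task is just more text, a task never seen with labeled examples can still be requested at inference time, which is the idea behind zero-shot transfer.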

- GPT-3
   autoregressive language model with 175 billion parameters

   good at text generation

   effective in zero- and few-shot settings

   Drawbacks
