3. Recent advances in NLP
# Classic problem: defining a suitable and effective representation of tokens, sentences, and documents
# one-hot encoding
Problems: huge, sparse vectors; relations between words are lost
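A minimal sketch of one-hot encoding (plain numpy, with a tiny hypothetical vocabulary) that makes both problems visible: the vector length equals the vocabulary size, and every pair of distinct words looks equally unrelated.

```python
import numpy as np

# Hypothetical toy vocabulary; real vocabularies have 10^4-10^6 entries,
# so each one-hot vector becomes huge and extremely sparse.
vocab = ["cat", "dog", "car", "truck"]
word2id = {w: i for i, w in enumerate(vocab)}

def one_hot(word: str) -> np.ndarray:
    """Return a |V|-dimensional vector with a single 1 at the word's index."""
    v = np.zeros(len(vocab))
    v[word2id[word]] = 1.0
    return v

# "Relations are lost": every pair of distinct words has dot product 0,
# so "cat" is exactly as (un)related to "dog" as it is to "truck".
print(one_hot("cat") @ one_hot("dog"))    # 0.0
print(one_hot("cat") @ one_hot("truck"))  # 0.0
```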
3.1. Word and sentence vectors
# Word2vec
* word vectors or word embeddings
- words with similar meanings -> similar representations
* CBOW: reconstruct a target word given its context as input
* Skip-gram: predict context words given the target word
* capture a large number of precise syntactic and semantic word relationships
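A minimal training sketch, assuming gensim is installed; the tokenised corpus is a toy placeholder. The `sg` flag switches between the two objectives above: CBOW (`sg=0`) and Skip-gram (`sg=1`).

```python
from gensim.models import Word2Vec

# Toy tokenised corpus (placeholder); in practice this is a large corpus.
sentences = [
    ["the", "cat", "sits", "on", "the", "mat"],
    ["the", "dog", "sits", "on", "the", "rug"],
    ["cats", "and", "dogs", "are", "pets"],
]

# sg=0 -> CBOW (reconstruct the target word from its context),
# sg=1 -> Skip-gram (predict context words from the target word).
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

vec = skipgram.wv["cat"]                # dense word vector (embedding)
print(skipgram.wv.most_similar("cat"))  # nearest neighbours in embedding space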
----------- Improved versions of Word2Vec
# Global-Vector (GloVe): exploits statistical information computed on the whole corpus
# fastText: builds word vectors from subword (character n-gram) information
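A short sketch with gensim's FastText implementation (toy corpus, library assumed available): because fastText composes word vectors from character n-grams, it can return a vector even for a word never seen during training.

```python
from gensim.models import FastText

sentences = [
    ["natural", "language", "processing"],
    ["language", "models", "learn", "word", "vectors"],
]

model = FastText(sentences, vector_size=50, window=3, min_count=1)

# "languages" never appears in the corpus, but fastText composes a vector
# for it from its character n-grams (e.g. "lan", "ang", "gua", ...).
oov_vector = model.wv["languages"]
print(oov_vector.shape)  # (50,)
```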
---------------------- Defining a suitable representation for sentences and documents is still challenging
# Bag-of-Words (BOW)
: represents a document d as the set of words that compose it
the document vector is computed as the sum of the one-hot word vectors
* Problems: the feature-vector dimension grows quickly with the vocabulary, and semantics are not captured
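A minimal BOW sketch using scikit-learn's CountVectorizer (my choice for illustration, not the survey's code): each document becomes a vocabulary-sized count vector, equivalent to summing one-hot word vectors, and both problems above show up directly.

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "a great film",
    "an excellent movie",   # same meaning as the first, but no shared words
    "a great disaster",
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)          # shape: (n_docs, vocab_size)

print(vectorizer.get_feature_names_out())   # the vocabulary = vector dimensions
print(X.toarray())

# Problem 1: the number of columns equals the vocabulary size, so it grows
#            quickly on real corpora.
# Problem 2: docs 0 and 1 mean the same thing but share no words, so their
#            BOW vectors are orthogonal; semantics are not captured.
```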
---------------- With the advent of word vectors, new methods emerged to build meaningful sentence- and document-level representations
# Unsupervised word/sentence vectors
- extract general-purpose representations that can be reused across various tasks
* simple average pooling of word vectors (see the sketch after this list)
* Doc2Vec
* Skip-thought
* fastSent
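A sketch of the simplest option above, average pooling: the sentence vector is just the mean of its word vectors. It reuses a gensim Word2Vec model as in the earlier sketch; corpus and dimensions are toy placeholders.

```python
import numpy as np
from gensim.models import Word2Vec

sentences = [["the", "cat", "sits", "on", "the", "mat"],
             ["the", "dog", "sits", "on", "the", "rug"]]
w2v = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

def sentence_vector(tokens, wv):
    """Average pooling: mean of the word vectors of the known tokens."""
    vecs = [wv[t] for t in tokens if t in wv.key_to_index]
    return np.mean(vecs, axis=0) if vecs else np.zeros(wv.vector_size)

s = sentence_vector(["the", "cat", "sits"], w2v.wv)
print(s.shape)  # (50,) -- a fixed-size sentence representation
```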
# supervised methods
- use explicit labels to develop meaningful representations used in downstream tasks
* Convolutional Neural Networks for Sentence Classification (see the sketch below)
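A compact sketch of a Kim (2014)-style CNN for sentence classification, written in PyTorch with hyperparameters invented for illustration: parallel convolutions of several widths over the word-embedding matrix, max-over-time pooling, then a linear classifier trained with explicit labels.

```python
import torch
import torch.nn as nn

class SentenceCNN(nn.Module):
    """Kim (2014)-style CNN: embed -> parallel 1D convolutions of several
    filter widths -> max-over-time pooling -> linear classifier."""
    def __init__(self, vocab_size, embed_dim=100, num_classes=2,
                 filter_widths=(3, 4, 5), num_filters=100):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, kernel_size=w) for w in filter_widths]
        )
        self.classifier = nn.Linear(num_filters * len(filter_widths), num_classes)

    def forward(self, token_ids):                      # (batch, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)  # (batch, embed_dim, seq_len)
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.classifier(torch.cat(pooled, dim=1))  # (batch, num_classes)

# Toy forward pass with random token ids (purely illustrative).
model = SentenceCNN(vocab_size=1000)
logits = model(torch.randint(0, 1000, (4, 20)))
print(logits.shape)  # torch.Size([4, 2])
```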
# Neural Machine Translation (MT)
# attention mechanisms
# ELMo
3.2. Pre-trained Transformer models
# Transformer model
: the first architecture entirely based on attention to draw global dependencies between input and output, replacing the recurrent layers
* better translation quality, and trains significantly faster than architectures based on recurrent or convolutional layers
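A minimal sketch of the scaled dot-product attention at the Transformer's core (plain PyTorch, single head, no masking): every position attends to every other position, which is how global dependencies are drawn without recurrence. In the real model, queries, keys, and values come from learned linear projections and multiple heads run in parallel.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # (..., seq, seq)
    weights = scores.softmax(dim=-1)                   # each row sums to 1
    return weights @ v, weights

# Toy self-attention: queries, keys, and values all come from the same sequence.
# (The real Transformer first applies learned projections W_q, W_k, W_v.)
x = torch.randn(1, 5, 16)            # (batch, seq_len, d_model)
out, attn = scaled_dot_product_attention(x, x, x)
print(out.shape, attn.shape)         # torch.Size([1, 5, 16]) torch.Size([1, 5, 5])
```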
# BERT(Bidirectional Encoder Representations from Transformers)
: popular pre-trained Transformer models
: pre-train deep bidirectional representations from unlabeled texts by jointly conditioning on both left and right contexts in all layers
* pre-training was driven by two language model objectives
- Masked Language Model (MLM)
: masks a small fraction of the input tokens and the model tries to predict them at the output (see the fill-mask sketch below)
- Next Sentence Prediction (NSP)
: the network learns the relation between sentence pairs through a binary loss (does sentence B actually follow sentence A?)
* Uses: sequence classification, word-level labeling, sequence-to-sequence tasks
- strengths
i) the architecture, strongly based on self-attention mechanisms
that allow the model to read and keep track of the whole input sequence
ii) the pre-training,
which lets the network read and understand a text, its semantics and meaning
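A quick illustration of the MLM objective mentioned above, using the Hugging Face transformers fill-mask pipeline with the publicly released bert-base-uncased checkpoint (assuming the library is installed and the model can be downloaded).

```python
from transformers import pipeline

# MLM in action: BERT predicts the token hidden behind [MASK]
# using both the left and the right context.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill_mask("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```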
# Extensions of BERT with only minor differences (pre-trained Transformers): RoBERTa, ALBERT, DistilBERT, ...
# Strengths
* cross-lingual and multilingual scenarios
* text generation: GPT
- GPT
uses a 12-layer decoder-only Transformer with masked self-attention, trained as a language model on ~7,000 unpublished books (the BookCorpus dataset)
- GPT-2
With a few minor changes, the network is a 48-layer Transformer with 1.5 billion learnable parameters
Task Conditioning
: training lets the same unsupervised model learn multiple tasks, modelling P(output | input, task)
the model produces different outputs for the same input depending on the task
- forms the basis for zero-shot task transfer
- GPT-3
autoregressive language model with 175 billion parameters
very good at text generation (see the sketch at the end of this section)
effective in zero- and few-shot settings
Limitations
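As a closing illustration of the text-generation strength noted above, a sketch with the Hugging Face text-generation pipeline and the openly released gpt2 checkpoint (assumed available); larger GPT models follow the same autoregressive, prompt-driven pattern that underlies zero- and few-shot use.

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# Autoregressive generation: the model keeps predicting the next token
# given the prompt plus everything generated so far.
prompt = "Recent advances in natural language processing have"
result = generator(prompt, max_new_tokens=40, num_return_sequences=1)
print(result[0]["generated_text"])
```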