[6] Several variants of stratified k-fold

Cho et al. 2022. 9. 6.

While reading a paper, I came across a figure that introduces some good k-fold schemes for experimental data, so I am sharing it here.

The paper is:

Machine learning enables detection of early-stage colorectal cancer by whole-genome sequencing of plasma cell-free DNA | BMC Cancer | Full Text (biomedcentral.com)

It is a paper from Freenome.

Figure from the original paper

The figure describes four methods based on stratified k-fold.

1) k-fold

The ordinary stratified k-fold: cross-validation that takes only the class label into account.
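As a rough illustration of the idea, here is a minimal standard-library sketch of a stratified split. The function name `stratified_kfold` and the toy labels are my own assumptions and do not come from the paper; in practice scikit-learn's `StratifiedKFold` does the same job with shuffling support.

```python
from collections import defaultdict

def stratified_kfold(labels, k):
    """Yield (train_idx, test_idx) pairs whose test folds keep class ratios roughly equal."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal each class's samples round-robin so every fold gets a similar share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train_idx, test_idx

# Hypothetical toy labels: 6 controls (0) and 3 cancer samples (1).
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1]
splits = list(stratified_kfold(labels, k=3))
for train_idx, test_idx in splits:
    print(test_idx, [labels[i] for i in test_idx])
```

This sketch does not shuffle, so the fold membership depends on the input order.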

2) k-batch

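In experiments, batch effects can arise from external, non-biological factors. They need to be either removed or accounted for during sampling, because otherwise they can lead to wrong interpretations of the results.

This paper proposes a k-batch method that takes the experimental batch into account: samples are grouped by batch when splitting into train and test sets, so each batch is kept together.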
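A minimal sketch of batch-grouped splitting, assuming each sample carries a batch ID. The function name `k_batch_splits` and the batch names are hypothetical, and the paper's exact assignment of batches to folds may differ; scikit-learn's `GroupKFold` offers a comparable ready-made splitter.

```python
def k_batch_splits(batches, k):
    """Yield (train_idx, test_idx) pairs where no batch is split across train and test."""
    unique_batches = sorted(set(batches))
    # Assign whole batches to folds round-robin (an assumption of this sketch).
    fold_of_batch = {b: i % k for i, b in enumerate(unique_batches)}
    for fold in range(k):
        test_idx = [i for i, b in enumerate(batches) if fold_of_batch[b] == fold]
        train_idx = [i for i, b in enumerate(batches) if fold_of_batch[b] != fold]
        yield train_idx, test_idx

# Hypothetical batch IDs, one per sample.
batches = ["b1", "b1", "b2", "b2", "b3", "b3", "b4", "b4"]
splits = list(k_batch_splits(batches, k=2))
for train_idx, test_idx in splits:
    print(sorted({batches[i] for i in train_idx}), sorted({batches[i] for i in test_idx}))
```

Because a batch never appears on both sides of a split, a model cannot score well just by recognizing batch-specific artifacts.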

3) ordered k-batch

The batches mentioned above can also be defined with time in mind, for example by sequencing date. Ordered k-batch uses the batches for train and test in the order in which they were created.
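One possible reading of this is a forward-chaining split: train on earlier batches and test on the batches that come next. The sketch below assumes that reading; `ordered_k_batch_splits`, the batch names, and the dates are all hypothetical, and the paper's exact procedure may differ.

```python
def ordered_k_batch_splits(batches, batch_dates, k):
    """Yield (train_idx, test_idx): train on earlier batches, test on the next chunk."""
    ordered = sorted(set(batches), key=lambda b: batch_dates[b])
    n_chunks = k + 1  # the first chunk is only ever used for training
    if len(ordered) < n_chunks:
        raise ValueError("need at least k + 1 batches")
    # Cut the date-ordered batches into k + 1 contiguous chunks.
    bounds = [i * len(ordered) // n_chunks for i in range(n_chunks + 1)]
    chunks = [ordered[bounds[i]:bounds[i + 1]] for i in range(n_chunks)]
    for fold in range(1, n_chunks):
        train_batches = {b for chunk in chunks[:fold] for b in chunk}
        test_batches = set(chunks[fold])
        train_idx = [i for i, b in enumerate(batches) if b in train_batches]
        test_idx = [i for i, b in enumerate(batches) if b in test_batches]
        yield train_idx, test_idx

# Hypothetical batches and sequencing dates (ISO strings sort chronologically).
batches = ["b1", "b1", "b2", "b2", "b3", "b3", "b4", "b4"]
batch_dates = {"b1": "2022-01-05", "b2": "2022-02-10", "b3": "2022-03-15", "b4": "2022-04-20"}
splits = list(ordered_k_batch_splits(batches, batch_dates, k=3))
for train_idx, test_idx in splits:
    print(train_idx, test_idx)
```

With this layout, every test batch is newer than every training batch, which mimics applying a trained model to samples collected later.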

4) balanced k-batch

This method combines 1) + 2) + 3): it follows the batch order and also matches the class ratio within each batch.
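How the class ratio is matched is not spelled out here, so the sketch below only shows one assumed interpretation: downsample each batch to equal class counts, and drop batches that lack one of the classes. The helper name `balance_within_batches` and the toy data are hypothetical.

```python
import random
from collections import defaultdict

def balance_within_batches(labels, batches, seed=0):
    """Return sorted indices kept after downsampling each batch to equal class counts."""
    rng = random.Random(seed)
    all_classes = set(labels)
    per_batch = defaultdict(lambda: defaultdict(list))
    for idx, (label, batch) in enumerate(zip(labels, batches)):
        per_batch[batch][label].append(idx)

    kept = []
    for by_class in per_batch.values():
        # Assumption of this sketch: a batch missing a class cannot be balanced, so drop it.
        if set(by_class) != all_classes:
            continue
        n = min(len(indices) for indices in by_class.values())
        for indices in by_class.values():
            kept.extend(rng.sample(indices, n))
    return sorted(kept)

# Hypothetical toy data: b1 is control-heavy, b2 is cancer-heavy, b3 has controls only.
labels = [0, 0, 0, 1, 0, 1, 1, 0, 0]
batches = ["b1", "b1", "b1", "b1", "b2", "b2", "b2", "b3", "b3"]
kept = balance_within_batches(labels, batches)
print(kept)
```

The surviving samples can then be split batch by batch in processing order, so that neither batch membership nor class imbalance within a batch leaks into the evaluation.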

The paper reports that the ordinary stratified k-fold of 1) gave the best performance.
