[Algorithm] Advanced Clustering: Image Processing 1 (2)

by 몽골리안 파프리카 2022. 12. 25. 15:08

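This time I assign coordinates to the pixels and then use DBSCAN to keep only the border.

To keep only the red pixels, I keep only label 4, the label for red.

I didn't know this, but when np.where is called with just a condition it returns a tuple of index arrays, one per axis (row indices first, then column indices). The tuple itself is awkward to work with, so I pulled the arrays out and combined them into a 2-D NumPy array with one (x, y) = (column, row) pair per pixel.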

# Keep only the red pixels (label 4)
rows, cols = np.where(pos == 4)         # np.where returns (row indices, column indices)
target = np.column_stack([cols, rows])  # one (x, y) = (column, row) pair per pixel
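
As a small illustration of what np.where hands back, here is a sketch on a made-up 3x4 label array. The `labels` array is a stand-in I invented for `pos`, not the real data; the point is only to show the tuple and how the index arrays turn into (x, y) pairs.

```python
import numpy as np

# Hypothetical 3x4 label image standing in for `pos`; 4 marks "red" pixels.
labels = np.array([
    [0, 4, 0, 0],
    [0, 0, 4, 0],
    [4, 0, 0, 0],
])

# With only a condition, np.where returns one index array per axis.
where_result = np.where(labels == 4)
rows, cols = where_result

# Stack as (x, y) = (column, row), one pixel per row of the result.
xy = np.column_stack([cols, rows])

# Equivalent, more verbose form: reshape each index array to a column and concatenate.
xy_concat = np.concatenate([cols.reshape(-1, 1), rows.reshape(-1, 1)], axis=1)

print(type(where_result).__name__, len(where_result))
print(xy)
```

np.argwhere(labels == 4) would also give a 2-D array directly, but in (row, column) order, so the columns would still need swapping to get (x, y).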

Next, I assign coordinates.

After some thought about how to assign them, I decided to use each pixel's position as a point.

from shapely.geometry import Point
import geopandas as gpd

# Use every third pixel, and flip y so the points do not come out upside down.
# Slicing with [::3] also works when len(target) is not a multiple of 3.
target_shp = [Point(x, result.shape[0] - y) for x, y in target[::3]]

target_shp = gpd.GeoDataFrame(geometry=target_shp)
target_shp = target_shp.reset_index().rename(columns={'index': 'srl_num'})
target_shp['x'], target_shp['y'] = target_shp.geometry.x, target_shp.geometry.y

target_shp.to_file(save_dir + '/IMG2Polygon/Target_Point_{0}.shp'.format(name), encoding='cp949')

del target

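Like so.

When building each Point, I don't use the y coordinate as is; I subtract it from result.shape[0] to flip it.

In other words,

everything is flipped top to bottom, like this. Image arrays count rows downward from the top-left corner, while map coordinates have y increasing upward, so without the vertical flip the result comes out upside down.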
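
To make the flip concrete, here is a minimal sketch on two made-up pixels. `flip_y` and `height` are hypothetical names of my own; `height` plays the role of result.shape[0].

```python
import numpy as np

def flip_y(xy, height):
    """Convert (column, row) pixel positions to (x, y) with y pointing up."""
    xy = np.asarray(xy)
    flipped = xy.copy()
    flipped[:, 1] = height - xy[:, 1]  # same formula as result.shape[0] - y
    return flipped

# Hypothetical 3-row image: one pixel in the top row (row 0), one in the bottom row (row 2).
pixels = np.array([[1, 0], [0, 2]])
flipped = flip_y(pixels, height=3)
print(flipped)
```

After the flip, the pixel from the top row has the larger y value, which is what a map or a matplotlib scatter plot expects.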

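Anyway, the coordinates are assigned and the shp file is written, so DBSCAN can be run next,

but first min_samples and eps have to be chosen.

If the most tedious part of K-means is choosing K, for DBSCAN it is choosing min_samples and eps.

The goal is to remove the letters inside the border, so I set min_samples to about 7 so that the letters are not treated as clusters.

For eps I use the Elbow Method. With K-means, the Elbow Method is used to pick the number of clusters; for DBSCAN, the same idea applied to a plot of sorted nearest-neighbor distances can be used to pick eps.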

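from sklearn.neighbors import NearestNeighbors
import matplotlib.pyplot as plt

# Choose eps (Elbow Method on sorted nearest-neighbor distances)
neigh = NearestNeighbors(n_neighbors=7)
neigh.fit(target_shp[['x', 'y']])
distances, indices = neigh.kneighbors(target_shp[['x', 'y']])
plt.figure(figsize=(12, 6))
# Column 0 is each point's distance to itself (0), so column 4 is the 4th other neighbor;
# distances[:, -1] would be the neighbor that matches min_samples=7.
plt.plot(np.sort(distances[:, 4]))
plt.savefig(save_dir + '/IMG2Polygon/IMG/DBSCAN_Elbow_{0}.png'.format(name), dpi=300)
plt.show()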

Passing the min_samples value as n_neighbors like this

produces a chart like this one. The curve bends sharply somewhere between 2.5 and 3, so an eps in that range should work.
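
To see why the bend is a sensible place to put eps, here is a toy version of the same curve. It is a NumPy-only sketch on made-up points, not the post's data, and `k_distances` is a hypothetical helper that brute-forces what NearestNeighbors computes more efficiently.

```python
import numpy as np

def k_distances(points, k):
    """Sorted distance from each point to its k-th nearest neighbor,
    counting the point itself as the first neighbor (distance 0)."""
    pts = np.asarray(points, dtype=float)
    # Brute-force pairwise Euclidean distances; fine for small inputs only.
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    dist.sort(axis=1)  # column 0 is each point's distance to itself
    return np.sort(dist[:, k - 1])

# Made-up data: a dense 5x5 grid with spacing 1, plus two far-away outliers.
grid = [(x, y) for x in range(5) for y in range(5)]
points = grid + [(20, 20), (-15, 30)]

kd = k_distances(points, k=4)
print(kd.round(2))
```

The sorted values stay flat for the dense grid points and then jump for the two outliers. An eps just above the flat part would keep the grid together as one cluster and leave the outliers as noise, which is the same reading applied to the chart above.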

With min_samples set to 7 and eps set to 2.5, I built a DBSCAN model and ran it:

from sklearn.cluster import DBSCAN

# Cluster the points; DBSCAN labels noise points as -1
cit = DBSCAN(eps=2.5, min_samples=7)
target_shp['cluster'] = cit.fit_predict(target_shp[['x', 'y']])

# Color each point by its cluster label
plt.scatter(target_shp.x, target_shp.y, c=target_shp.cluster, cmap='rainbow')
plt.savefig(save_dir + '/IMG2Polygon/IMG/DBSCAN_{0}_1st.png'.format(name), dpi=300)
plt.show()

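Ta-da.

The letters in the middle were separated out nicely. (Look closely and their purple is a different shade from the purple at the top.)

Now, stripping out the unneeded letters in the middle and keeping only the border gives this:

Fairly successful! That said, the image used here is relatively tame in its use of color, which made the separation easy; apply the algorithm to a cadastral map with complicated colors that is hard to read even by eye, and it falls apart.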
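
The filtering step itself is not shown above, so here is one way it might look. This is a sketch under a loud assumption: that the border is the most populous cluster and everything else (other clusters and DBSCAN's noise label -1) can be dropped. `keep_largest_cluster` is a hypothetical helper, and the labels are made up.

```python
import numpy as np

def keep_largest_cluster(labels):
    """Boolean mask selecting the most populous cluster, ignoring DBSCAN noise (-1)."""
    labels = np.asarray(labels)
    ids, counts = np.unique(labels[labels != -1], return_counts=True)
    if ids.size == 0:
        # Everything was noise, so there is nothing to keep.
        return np.zeros(labels.shape, dtype=bool)
    return labels == ids[np.argmax(counts)]

# Made-up labels: cluster 0 plays the border, 1 and 2 are letters, -1 is noise.
labels = np.array([0, 0, 0, 0, 0, 1, 1, -1, 2, 2, 0, -1])
mask = keep_largest_cluster(labels)
print(mask.sum(), "of", labels.size, "points kept")
```

With the GeoDataFrame from this post, the mask could be applied as target_shp[keep_largest_cluster(target_shp['cluster'])]. If the border is split across several clusters, the selection rule would need to change, for example by keeping every cluster above some size.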

Still, I'm happy to have found a new use for clustering.
