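Installed the distcc package and tried building TensorFlow Lite.
The build originally took about 30 minutes (on an RPi 3B, 4 cores); how much shorter will it get?
(My gut feeling is that, since the storage is an SD card, disk I/O might actually make it slower..)
It didn't seem to be connecting, so I read the other documents more carefully and found that I hadn't configured it properly!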
distcc[946] (dcc_build_somewhere) Warning: failed to distribute, running locally instead
distcc[946] (dcc_parse_hosts) Warning: /home/pi/.distcc/zeroconf/hosts contained no hosts; can't distribute work
distcc[946] (dcc_zeroconf_add_hosts) CRITICAL! failed to parse host file.
Edit ALLOWEDNETS and LISTENER in /etc/default/distcc, then run service distcc restart and that's it!
$ cat /etc/default/distcc
# Defaults for distcc initscript
# sourced by /etc/init.d/distcc
#
# should distcc be started on boot?
#
STARTDISTCC="true"
#STARTDISTCC="false"
#
# Which networks/hosts should be allowed to connect to the daemon?
# You can list multiple hosts/networks separated by spaces.
# Networks have to be in CIDR notation, e.g. 192.168.1.0/24
# Hosts are represented by a single IP address
#
# ALLOWEDNETS="127.0.0.1"
ALLOWEDNETS="127.0.0.1 192.168.0.0/16"
#
# Which interface should distccd listen on?
# You can specify a single interface, identified by it's IP address, here.
#
# LISTENER="127.0.0.1"
LISTENER=""
#
# You can specify a (positive) nice level for the distcc process here
#
# NICE="10"
NICE="10"
#
# You can specify a maximum number of jobs, the server will accept concurrently
#
# JOBS=""
JOBS=""
#
# Enable Zeroconf support?
# If enabled, distccd will register via mDNS/DNS-SD.
# It can then automatically be found by zeroconf enabled distcc clients
# without the need of a manually configured host list.
#
ZEROCONF="true"
#ZEROCONF="false"
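A quick way to sanity-check an ALLOWEDNETS value before restarting the daemon is to test a client address against it. The sketch below only mirrors the CIDR idea with Python's standard ipaddress module; it is not how distccd itself does the matching, and the helper name is_allowed is made up for illustration. The client address 192.168.52.209 is the one that shows up in the distccd log further down.

```python
import ipaddress

# Value copied from the ALLOWEDNETS line in /etc/default/distcc above.
ALLOWEDNETS = "127.0.0.1 192.168.0.0/16"


def is_allowed(client_ip: str, allowed_nets: str = ALLOWEDNETS) -> bool:
    """Return True if client_ip falls inside any space-separated host/network.

    Hypothetical helper for illustration; distccd does its own matching.
    """
    addr = ipaddress.ip_address(client_ip)
    # A bare IP such as 127.0.0.1 is parsed as a single-address (/32) network.
    return any(
        addr in ipaddress.ip_network(net, strict=False)
        for net in allowed_nets.split()
    )


if __name__ == "__main__":
    for ip in ("192.168.52.209", "10.0.0.5"):
        print(ip, "allowed" if is_allowed(ip) else "rejected")
```

If a build node sits on a subnet outside the listed ranges, this kind of check makes the mismatch obvious before digging through logs.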
The key point is putting CC=/usr/lib/distcc/gcc into MAKEFLAGS, though..
tensorflow/tensorflow/lite/tools/make $ cat ./build_rpi_lib.sh
#!/bin/bash
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
set -x
set -e
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
TENSORFLOW_DIR="${SCRIPT_DIR}/../../../.."
FREE_MEM="$(free -m | awk '/^Mem/ {print $2}')"
# Use "-j 4" only if memory is larger than 2GB
if [[ "${FREE_MEM}" -gt "2000" ]]; then
  NO_JOB=4
else
  NO_JOB=1
fi
export MAKEFLAGS="CXX=/usr/lib/distcc/g++ CC=/usr/lib/distcc/gcc"
make -j 8 TARGET=rpi -C "${TENSORFLOW_DIR}" -f tensorflow/lite/tools/make/Makefile $@
#make -j ${NO_JOB} CC=/usr/lib/distcc/gcc TARGET=rpi -C "${TENSORFLOW_DIR}" -f tensorflow/lite/tools/make/Makefile $@
Put the names of the nodes to use in /etc/distcc/hosts. Note that if the local machine itself is not listed,
distcc will compile only on the slave nodes.
# As described in the distcc manpage, this file can be used for a global
# list of available distcc hosts.
#
# The list from this file will only be used, if neither the
# environment variable DISTCC_HOSTS, nor the file $HOME/.distcc/hosts
# contains a valid list of hosts.
#
# Add a list of hostnames in one line, seperated by spaces, here.
#
tf2 tf3 +zeroconf
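To check whether a hosts file includes the local machine, something like the following could be used. It is a simplified sketch: it handles only the plain HOST[:PORT][/LIMIT][,OPTIONS] form and treats +zeroconf as an ordinary token, so it is not a full parser for the distcc host specification. The function names are invented for this example.

```python
from typing import List


def parse_hosts(text: str) -> List[str]:
    """Return host specs from a distcc hosts file, skipping '#' comments."""
    specs: List[str] = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # drop everything after a comment marker
        specs.extend(line.split())
    return specs


def includes_local(specs: List[str]) -> bool:
    """True if any spec names localhost (ignoring :PORT, /LIMIT, ,OPTIONS)."""
    for spec in specs:
        host = spec.split(",", 1)[0].split("/", 1)[0].split(":", 1)[0]
        if host in ("localhost", "127.0.0.1"):
            return True
    return False


HOSTS_FILE = """\
# Add a list of hostnames in one line, seperated by spaces, here.
#
tf2 tf3 +zeroconf
"""

if __name__ == "__main__":
    specs = parse_hosts(HOSTS_FILE)
    print(specs)
    print("local machine listed:", includes_local(specs))
```

With the file shown above, the local machine is not listed, which matches the behavior described: compile jobs go to tf2 and tf3 only.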
Messages like the ones below show up occasionally, but if I just ignore them the build does eat CPU on the slave node(?) side, perhaps because the nodes get attached via zeroconf.
distcc[1323] (dcc_build_somewhere) Warning: failed to distribute, running locally instead
distcc[1332] (dcc_build_somewhere) Warning: failed to distribute, running locally instead
[Link: http://openframeworks.cc/ko/setup/raspberrypi/raspberry-pi-distcc-guide/]
[Link: http://jtanx.github.io/2019/04/19/rpi-distcc-node/]
+
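Looking at /var/log/distcc.log:
when a job goes through normally it logs COMPILE_OK,
but at some point "Client fd disconnected" suddenly appears and the build stalls.
Judging by the time: value of around 305000 ms, each such job seems to sit waiting for roughly 5 minutes,
so the result is actually worse than not using distcc at all..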
distccd[14090] (dcc_job_summary) client: 192.168.52.209:40940 COMPILE_OK exit:0 sig:0 core:0 ret:0 time:16693ms g++ tensorflow/lite/kernels/cpu_backend_gemm_eigen.cc
distccd[14091] (dcc_collect_child) ERROR: Client fd disconnected, killing job
distccd[14091] (dcc_writex) ERROR: failed to write: Broken pipe
distccd[14091] (dcc_job_summary) client: 192.168.52.209:40932 CLI_DISCONN exit:107 sig:0 core:0 ret:107 time:307172ms
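To see how much wall time is lost to disconnected jobs, the dcc_job_summary lines can be tallied per status. This is a rough sketch whose regular expression is fitted only to the four log lines quoted above; other distccd versions may format the line differently, and the function name summarize is made up here.

```python
import re
from collections import Counter

# Fitted to the dcc_job_summary lines quoted above (an assumption about format).
SUMMARY_RE = re.compile(
    r"\(dcc_job_summary\) client: (?P<client>\S+) (?P<status>[A-Z_]+) "
    r".*?time:(?P<ms>\d+)ms"
)

SAMPLE_LOG = """\
distccd[14090] (dcc_job_summary) client: 192.168.52.209:40940 COMPILE_OK exit:0 sig:0 core:0 ret:0 time:16693ms g++ tensorflow/lite/kernels/cpu_backend_gemm_eigen.cc
distccd[14091] (dcc_collect_child) ERROR: Client fd disconnected, killing job
distccd[14091] (dcc_writex) ERROR: failed to write: Broken pipe
distccd[14091] (dcc_job_summary) client: 192.168.52.209:40932 CLI_DISCONN exit:107 sig:0 core:0 ret:107 time:307172ms
"""


def summarize(log_text):
    """Count jobs per status and total their wall time in seconds."""
    counts, seconds = Counter(), Counter()
    for line in log_text.splitlines():
        m = SUMMARY_RE.search(line)
        if m:
            counts[m["status"]] += 1
            seconds[m["status"]] += int(m["ms"]) / 1000
    return counts, seconds


if __name__ == "__main__":
    counts, seconds = summarize(SAMPLE_LOG)
    for status in counts:
        print(f"{status}: {counts[status]} job(s), {seconds[status]:.1f} s")
```

Run against the full /var/log/distcc.log, this would show whether the CLI_DISCONN jobs dominate the total time.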
Anyway, when it dies with the errors above, I/O on the individual nodes goes wild like this.
--total-cpu-usage-- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai stl| read  writ| recv  send|  in   out | int   csw
  5   2  10  83   0| 928k 4048k|1063B  252B|  68k 2040k|1830  3320
  0   3  27  69   0|7840M   27M|2919k   73k|1512k   11M| 245k  402k missed 238 ticks
  2   1   0  97   0| 176k    0 |   0     0 |8192B    0 |  19    23  missed 2 ticks
+
Maybe I should try adding the cpp,lzo options?
[Link: https://wiki.gentoo.org/wiki/Distcc/ko]
+
export MAKEFLAGS="CXX=/usr/lib/distcc/g++ CC=/usr/lib/distcc/gcc"
#export MAKEFLAGS="CXX=/usr/bin/distcc-pump CC=/usr/bin/distcc-pump"
make -j 8 TARGET=rpi -C "${TENSORFLOW_DIR}" -f tensorflow/lite/tools/make/Makefile $@
#make -j ${NO_JOB} CC=/usr/lib/distcc/gcc TARGET=rpi -C "${TENSORFLOW_DIR}" -f tensorflow/lite/tools/make/Makefile $@
It works, but just as without pump, I/O goes through the roof and the build dies.
$ distcc-pump ./build_rpi_lib.sh
+
So distccmon-text has to be run on the server node (the machine that launches the build), not on the slave nodes..
[Link: http://www.tensorflow.org/lite/api_docs/python/tf/lite/Optimize]
[Link: http://www.tensorflow.org/lite/guide/ops_select]
[Link: http://medium.com/sclable/model-quantization-using-tensorflow-lite-2fe6a171a90d]
[Link: http://www.tensorflow.org/lite/performance/quantization_spec]
[Link: http://www.tensorflow.org/api_docs/python/tf/lite/TFLiteConverter]
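While digging through TensorFlow models I had come across the term LSTM before,
but kept skipping it out of laziness; it turned up in a search again this time, so I looked into it.
It seems to be a technique(?) used in RNNs (recurrent neural networks) that helps strengthen context.
[Link: http://euzl.github.io/hackday_1/]
[Link: https://en.wikipedia.org/wiki/Long_short-term_memory]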
When quantizing to uint8 during tflite conversion,
I expected the quantization range to vary a little, since the range should be derived from random inputs,
but it always comes out as the same 0.003921568859368563 * q. Searching for that number,
it turns out this is what you get when the 0~255 range is normalized to float (that is, 1/255)..
And indeed, 0.00392 * 255 = 0.9996?
quantization of input tensor will be close to (0.003921568859368563, 0). mean is the integer value from 0 to 255 that maps to floating point 0.0f. std_dev is 255 / (float_max - float_min). This will fix one possible problem
[Link: https://stackoverflow.com/questions/54830869/]
[Link: https://github.com/majidghafouri/Object-Recognition-tf-lite/issues/1]
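The odd-looking digits come from storing 1/255 as a 32-bit float. The snippet below is a small check using only the standard library: it rounds 1/255 to float32 and applies the affine mapping real = scale * (q - zero_point) with zero_point 0, as in the quoted (0.003921568859368563, 0) pair. The helper names to_float32 and dequantize are made up for this example.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dequantize(q: int, scale: float, zero_point: int = 0) -> float:
    """Affine dequantization: real = scale * (q - zero_point)."""
    return scale * (q - zero_point)


# 1/255 after a round trip through float32.
scale = to_float32(1 / 255)

if __name__ == "__main__":
    print("1/255 as float64:", 1 / 255)
    print("1/255 as float32:", scale)
    print("q=0   ->", dequantize(0, scale))
    print("q=255 ->", dequantize(255, scale))
```

So the constant says nothing about the calibration data being ignored; it only reflects an input range of 0.0 to 1.0 spread across 255 steps.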
+
output_format: Output file format. Currently must be {TFLITE, GRAPHVIZ_DOT}. (default TFLITE)
quantized_input_stats: Dict of strings representing input tensor names mapped to tuple of floats representing the mean and standard deviation of the training data (e.g., {"foo" : (0., 1.)}). Only need if inference_input_type is QUANTIZED_UINT8. real_input_value = (quantized_input_value - mean_value) / std_dev_value. (default {})
default_ranges_stats: Tuple of integers representing (min, max) range values for all arrays without a specified range. Intended for experimenting with quantization via "dummy quantization". (default None)
post_training_quantize: Boolean indicating whether to quantize the weights of the converted float model. Model size will be reduced and there will be latency improvements (at the cost of accuracy). (default False)
[Link: http://man.hubwiz.com/.../python/tf/lite/TFLiteConverter.html]
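The quantized_input_stats formula can be turned around to get (mean, std_dev) from a desired float range. The sketch below assumes a uint8 input (0 to 255) and uses std_dev = 255 / (float_max - float_min), with mean being the uint8 value that maps to 0.0; the function names are hypothetical and this is plain arithmetic, not a call into the converter.

```python
def input_stats(float_min: float, float_max: float):
    """Return (mean, std_dev) for a uint8 input covering [float_min, float_max]."""
    std_dev = 255.0 / (float_max - float_min)
    mean = (0.0 - float_min) * std_dev  # the uint8 value that maps to 0.0
    return mean, std_dev


def real_value(q: int, mean: float, std_dev: float) -> float:
    """real_input_value = (quantized_input_value - mean_value) / std_dev_value."""
    return (q - mean) / std_dev


if __name__ == "__main__":
    for lo, hi in ((0.0, 1.0), (-1.0, 1.0)):
        mean, std_dev = input_stats(lo, hi)
        print(f"range [{lo}, {hi}] -> mean={mean}, std_dev={std_dev}, "
              f"q=0 -> {real_value(0, mean, std_dev)}, "
              f"q=255 -> {real_value(255, mean, std_dev)}")
```

For a 0.0 to 1.0 input this gives mean 0 and std_dev 255, which is the same 1/255 scale seen above.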
TOCO (TensorFlow Lite Optimizing Converter)
[Link: https://junimnjw.github.io/%EA%B0%9C%EB%B0%9C/2019/08/09/tensorflow-lite-2.html]
I've been digging through all sorts of things, even the original source, but I can't find anything that really answers what I want.
[Link: https://www.tensorflow.org/model_optimization/guide/quantization/training]
[Link: https://www.tensorflow.org/model_optimization/guide/quantization/training_example]
[Link: https://github.com/tensorflow/.../lite/g3doc/performance/post_training_quantization.md]
[Link: https://github.com/tensorflow/.../lite/g3doc/performance/quantization_spec.md]
util_test.py
def _generate_integer_tflite_model(quantization_type=dtypes.int8):
  """Define an integer post-training quantized tflite model."""
  # Load MNIST dataset
  n = 10  # Number of samples
  (train_images, train_labels), (test_images, test_labels) = \
      tf.keras.datasets.mnist.load_data()
  train_images, train_labels, test_images, test_labels = \
      train_images[:n], train_labels[:n], test_images[:n], test_labels[:n]
  # Normalize the input image so that each pixel value is between 0 to 1.
  train_images = train_images / 255.0
  test_images = test_images / 255.0
  # Define TF model
  model = tf.keras.Sequential([
      tf.keras.layers.InputLayer(input_shape=(28, 28)),
      tf.keras.layers.Reshape(target_shape=(28, 28, 1)),
      tf.keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation="relu"),
      tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
      tf.keras.layers.Flatten(),
      tf.keras.layers.Dense(10)
  ])
  # Train
  model.compile(
      optimizer="adam",
      loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
      metrics=["accuracy"])
  model.fit(
      train_images,
      train_labels,
      epochs=1,
      validation_split=0.1,
  )
  # Convert TF Model to an Integer Quantized TFLite Model
  converter = tf.lite.TFLiteConverter.from_keras_model(model)
  converter.optimizations = {tf.lite.Optimize.DEFAULT}
  def representative_dataset_gen():
    for _ in range(2):
      yield [
          np.random.uniform(low=0, high=1, size=(1, 28, 28)).astype(
              np.float32)
      ]
  converter.representative_dataset = representative_dataset_gen
  if quantization_type == dtypes.int8:
    converter.target_spec.supported_ops = {tf.lite.OpsSet.TFLITE_BUILTINS_INT8}
  else:
    converter.target_spec.supported_ops = {
        tf.lite.OpsSet
        .EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8
    }
  tflite_model = converter.convert()
  return tflite_model
lite_v2_test.py
def _getIntegerQuantizeModel(self):
  np.random.seed(0)
  root = tracking.AutoTrackable()
  @tf.function(
      input_signature=[tf.TensorSpec(shape=[1, 5, 5, 3], dtype=tf.float32)])
  def func(inp):
    conv = tf.nn.conv2d(
        inp, tf.ones([3, 3, 3, 16]), strides=[1, 1, 1, 1], padding='SAME')
    output = tf.nn.relu(conv, name='output')
    return output
  def calibration_gen():
    for _ in range(5):
      yield [np.random.uniform(-1, 1, size=(1, 5, 5, 3)).astype(np.float32)]
  root.f = func
  to_save = root.f.get_concrete_function()
  return (to_save, calibration_gen)

def testInvalidIntegerQuantization(self, is_int16_quantize,
                                   inference_input_output_type):
  func, calibration_gen = self._getIntegerQuantizeModel()
  # Convert quantized model.
  quantized_converter = lite.TFLiteConverterV2.from_concrete_functions([func])
  quantized_converter.optimizations = [lite.Optimize.DEFAULT]
  quantized_converter.representative_dataset = calibration_gen
  if is_int16_quantize:
    quantized_converter.target_spec.supported_ops = [
        lite.OpsSet.\
        EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8,
        lite.OpsSet.TFLITE_BUILTINS
    ]
  with self.assertRaises(ValueError) as error:
    quantized_converter.inference_input_type = dtypes.int8
    quantized_converter.inference_output_type = dtypes.int8
    quantized_converter.convert()
  self.assertEqual(
      'The inference_input_type and inference_output_type '
      "must be in ['tf.float32', 'tf.int16'].", str(error.exception))

def testCalibrateAndQuantizeBuiltinInt16(self):
  func, calibration_gen = self._getIntegerQuantizeModel()
  # Convert float model.
  float_converter = lite.TFLiteConverterV2.from_concrete_functions([func])
  float_tflite_model = float_converter.convert()
  self.assertIsNotNone(float_tflite_model)
  converter = lite.TFLiteConverterV2.from_concrete_functions([func])
  # TODO(b/156309549): We should add INT16 to the builtin types.
  converter.optimizations = [lite.Optimize.DEFAULT]
  converter.target_spec.supported_ops = [lite.OpsSet.TFLITE_BUILTINS_INT8]
  converter.representative_dataset = calibration_gen
  converter._experimental_calibrate_only = True
  calibrated_tflite = converter.convert()
  quantized_tflite_model = mlir_quantize(
      calibrated_tflite, inference_type=_types_pb2.QUANTIZED_INT16)
  self.assertIsNotNone(quantized_tflite_model)
  # The default input and output types should be float.
  interpreter = Interpreter(model_content=quantized_tflite_model)
  interpreter.allocate_tensors()
  input_details = interpreter.get_input_details()
  self.assertLen(input_details, 1)
  self.assertEqual(np.float32, input_details[0]['dtype'])
  output_details = interpreter.get_output_details()
  self.assertLen(output_details, 1)
  self.assertEqual(np.float32, output_details[0]['dtype'])
  # Ensure that the quantized weights tflite model is smaller.
  self.assertLess(len(quantized_tflite_model), len(float_tflite_model))
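What the representative dataset is for can be illustrated without TensorFlow. The following is a deliberately simplified sketch of min/max calibration for a single tensor, assuming asymmetric int8 with real = scale * (q - zero_point); the real converter is considerably more involved, and every function name here is invented for the example.

```python
import random


def calibrate(samples):
    """Track the min/max seen over all calibration samples."""
    lo = min(min(s) for s in samples)
    hi = max(max(s) for s in samples)
    # Widen the range to contain 0.0 so that zero maps to an exact integer.
    return min(lo, 0.0), max(hi, 0.0)


def int8_params(lo, hi):
    """Asymmetric int8 parameters for q in [-128, 127]."""
    if hi == lo:  # degenerate all-zero tensor; any scale works
        return 1.0, 0
    scale = (hi - lo) / 255.0
    zero_point = int(round(-128 - lo / scale))
    return scale, zero_point


def quantize(x, scale, zero_point):
    """Map a real value to int8, clamping to the representable range."""
    q = int(round(x / scale)) + zero_point
    return max(-128, min(127, q))


if __name__ == "__main__":
    random.seed(0)
    # Loosely mirrors calibration_gen above: 5 batches of uniform(-1, 1)
    # values, flattened to 75 numbers each (1 * 5 * 5 * 3).
    samples = [[random.uniform(-1, 1) for _ in range(75)] for _ in range(5)]
    lo, hi = calibrate(samples)
    scale, zero_point = int8_params(lo, hi)
    print("range:", (lo, hi), "scale:", scale, "zero_point:", zero_point)
```

Under this model the scale follows whatever range the calibration samples happen to cover, which is why random inputs would be expected to move it slightly from run to run.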
When I load a pb file into TensorBoard,
the Graphs tab sometimes(?) comes up empty,
so I'm searching for what it takes to get that tab populated.
[Link: http://stackoverflow.com/questions/48391075]
writer = tf.summary.FileWriter("output", sess.graph)
[Link: http://www.h2kinfosys.com/blog/tensorboard-how-to-use-tensorboard-for-graph-visualization/]
[Link: http://www.tensorflow.org/tensorboard/graphs]
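Something seems off, so I'm taking things apart one piece at a time.
[Link: https://www.tensorflow.org/tutorials/load_data/tfrecord]
[Link: https://www.kaggle.com/gauravchopracg/understanding-tfrecord-format]
Training itself runs,
but detection doesn't work or the input range looks odd, so I'm checking where it goes wrong.
It looks like the tfrecord reads in and stores the images needed for training?
I was checking whether the original file or a bitmap gets stored in that step,
and, just in case, I searched posts from the past year and found an upgraded generate_tfrecord.py!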
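One way to answer the "original file or bitmap" question is to look at the first bytes of the stored image data. The sketch below assumes the image bytes have already been pulled out of a record (extraction is not shown; Object Detection API style scripts commonly store them under an `image/encoded` feature, but that key is an assumption about this particular script). It only inspects well-known file signatures, and the function name is made up.

```python
def sniff_image_format(data: bytes) -> str:
    """Guess the container format from the leading magic bytes."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"BM"):
        return "bmp"
    # Raw decoded pixels carry no signature, so they end up here as well.
    return "unknown"


if __name__ == "__main__":
    examples = {
        "jpeg header": b"\xff\xd8\xff\xe0" + b"\x00" * 8,
        "png header": b"\x89PNG\r\n\x1a\n" + b"\x00" * 8,
        "raw pixels": bytes([12, 34, 56] * 4),
    }
    for name, blob in examples.items():
        print(name, "->", sniff_image_format(blob))
```

If the result is "jpeg" or "png", the record holds the encoded file as-is rather than a decoded bitmap.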
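If transfer learning wipes out what was learned before
and trains on the new content,
LwF (Learning without Forgetting) trains on the new content on top of what was already learned, so the old task is not forgotten.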