Compile a debug version of the TensorFlow whl package from source

Compile TensorFlow 2.3 from source with debug symbols

First, git clone the TensorFlow 2.3.1 source code.
Install cuDNN 7.6.5 using the deb package from the Nvidia website.
Running ./configure in the TensorFlow source directory fails with a message that cublas_api.h is missing, because the CUDA components in the Docker image being used are incomplete. (NOTE: consider using the tensorflow devel image instead.)
Install the cuda-repo deb package from the official Nvidia website (the network-install deb package).
After the installation completes, run apt update. The various CUDA component packages are now available from the apt source, but there is no cuBLAS package for version 10.1.
Run apt install cuda-libraries-dev-10-1 cuda-libraries-10-1; this installs cuBLAS version 10.2.
Copy the cuBLAS-related files from the cuda10.2 directory to the cuda10.1 directory.
Run ./configure in the TensorFlow source directory again, enable CUDA support, and leave the other options such as TensorRT disabled.
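The cuBLAS copy step can be sketched as a small shell function. The function name copy_cublas is hypothetical, and the layout it assumes (an include/ and a lib64/ directory under each CUDA root, with roots such as /usr/local/cuda-10.2 and /usr/local/cuda-10.1) is an assumption as well; check where apt actually placed the cuBLAS 10.2 files before running it.

```shell
# Hypothetical helper: copy cuBLAS headers and libraries from one CUDA
# root into another. Assumes both roots use include/ and lib64/.
copy_cublas() {
  src=$1
  dst=$2
  mkdir -p "$dst/include" "$dst/lib64" || return 1
  cp -a "$src"/include/cublas*.h "$dst/include/" || return 1
  cp -a "$src"/lib64/libcublas* "$dst/lib64/" || return 1
  # ./configure complained about this header, so confirm it is now present.
  [ -f "$dst/include/cublas_api.h" ]
}

# Example call (paths are assumptions; usually needs root):
#   copy_cublas /usr/local/cuda-10.2 /usr/local/cuda-10.1
```

The function returns a non-zero status if cublas_api.h is still missing from the target, which is the header ./configure reported.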

Use bazelisk, and first try bazel’s -c dbg compilation mode. The build command is:

bazel build --config=cuda --strip=never -c dbg --verbose_failures --keep_going //tensorflow/tools/pip_package:build_pip_package

Bazel fails to download some files (llvm, aws-sdk). Download them manually, then replace the download URL that bazel uses with file:///path/to/downloaded/file (bazel's downloader supports file:// URLs).
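A minimal sketch for preparing such a local URL follows. The helper name local_archive_info is hypothetical. It prints the SHA-256 of the downloaded file, which should match the sha256 pinned by the archive rule if one is set, followed by the file:// URL to paste in. It assumes GNU sha256sum is available and that the path contains no characters that would need URL escaping.

```shell
# Hypothetical helper: print the SHA-256 and a file:// URL for a manually
# downloaded archive.
local_archive_info() {
  archive=$1
  [ -f "$archive" ] || return 1
  case $archive in
    /*) ;;                             # already an absolute path
    *) archive=$(pwd)/$archive ;;      # file:// URLs need an absolute path
  esac
  sha256sum "$archive" | cut -d' ' -f1
  printf 'file://%s\n' "$archive"
}

# Example call (the path is a placeholder):
#   local_archive_info /path/to/downloaded/file
```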
aws-checksums fails to compile in dbg mode. To fix it, modify third_party/aws/aws-checksums.bazel so that DEBUG_BUILD is defined in dbg mode:
29a30,35
>     defines = select({
>         "@org_tensorflow//tensorflow:debug": [
>             "DEBUG_BUILD"
>         ],
>         "//conditions:default": [],
>     }),
See https://github.com/tensorflow/tensorflow/issues/37498 and
https://github.com/tensorflow/tensorflow/pull/42743/files
Installing the generated package in a venv virtual environment fails with "invalid command bdist_wheel"; pip3 install wheel is required.
After the wheel package is installed, importing tensorflow in Python fails, reporting an undefined symbol in the _pywrap_tensorflow_internal.so file:
_ZN10tensorflow4data12experimental19SnapshotDatasetV2Op11kReaderFuncE
which demangles to: tensorflow::data::experimental::SnapshotDatasetV2Op::kReaderFunc
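The demangled name can be reproduced with c++filt from GNU binutils, and nm can show how a shared object refers to the symbol. This is a sketch: the helper names demangle and find_symbol are hypothetical, and the path to the .so in the example call is a placeholder.

```shell
# Mangled name copied from the error message.
sym='_ZN10tensorflow4data12experimental19SnapshotDatasetV2Op11kReaderFuncE'

# Demangle a C++ symbol with c++filt (part of GNU binutils).
demangle() {
  printf '%s\n' "$1" | c++filt
}

# Hypothetical helper: list the dynamic symbols of a shared object that
# mention a given name. In nm output, "U" marks a symbol that the library
# expects some other object to provide.
find_symbol() {
  nm -D "$1" | grep -F "$2"
}

if command -v c++filt >/dev/null 2>&1; then
  demangle "$sym"
fi

# Example call (the path is a placeholder; locate the file inside the venv):
#   find_symbol /path/to/_pywrap_tensorflow_internal.so kReaderFunc
```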

Switch to the TensorFlow 2.3.0 source code and compile again. Importing tensorflow still fails with the same undefined symbol error.

Continue with the 2.3.0 source code, but give up on the -c dbg compilation mode and use -c opt instead, adjusting the build command to:

bazel build --config=cuda -c opt --copt -g --strip=never --keep_going --verbose_failures //tensorflow/tools/pip_package:build_pip_package

Also remove the DEBUG_BUILD define from aws-checksums. With these changes, the build compiles, installs, and runs successfully.
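The working sequence can be sketched as the script below. Only the bazel command comes from these notes. The build_pip_package invocation follows the usual TensorFlow build-from-source procedure. The output directory /tmp/tensorflow_pkg, the helper names run and build_and_install, and installing wheel before packaging are all assumptions. By default the script only prints the commands; set DRY_RUN=0 and run it from the TensorFlow source directory to execute them.

```shell
# Sketch of the working sequence; commands are only printed unless DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}
PKG_DIR=${PKG_DIR:-/tmp/tensorflow_pkg}   # assumed output directory for the wheel

# Print a command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = 0 ]; then "$@"; fi
}

build_and_install() {
  # Optimized build that keeps debug info (-g) and does not strip symbols.
  run bazel build --config=cuda -c opt --copt -g --strip=never \
      --keep_going --verbose_failures \
      //tensorflow/tools/pip_package:build_pip_package &&
  # Avoids the "invalid command bdist_wheel" error.
  run pip3 install wheel &&
  run ./bazel-bin/tensorflow/tools/pip_package/build_pip_package "$PKG_DIR" &&
  run pip3 install "$PKG_DIR"/tensorflow-*.whl &&
  # Smoke test: the import is where the undefined symbol error showed up.
  run python3 -c 'import tensorflow as tf; print(tf.__version__)'
}

build_and_install
```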