小编典典

Installing pandas in Docker Alpine

docker

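I'm having a really hard time trying to install a stable configuration of data science packages in docker. This should be easier with such mainstream, relevant tools.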

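Below is the Dockerfile that used to work, with a bit of a hack: pandas is removed from the core package list and installed separately, pinned to pandas<0.21.0 (because, allegedly, higher versions conflict with numpy).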

FROM alpine:3.6

ENV PACKAGES="\
    dumb-init \
    musl \
    libc6-compat \
    linux-headers \
    build-base \
    bash \
    git \
    ca-certificates \
    freetype \
    libgfortran \
    libgcc \
    libstdc++ \
    openblas \
    tcl \
    tk \
    libssl1.0 \
    "

ENV PYTHON_PACKAGES="\
    numpy \
    matplotlib \
    scipy \
    scikit-learn \
    nltk \
    "

RUN apk add --no-cache --virtual build-dependencies python3 \
    && apk add --virtual build-runtime \
    build-base python3-dev openblas-dev freetype-dev pkgconfig gfortran \
    && ln -s /usr/include/locale.h /usr/include/xlocale.h \
    && python3 -m ensurepip \
    && rm -r /usr/lib/python*/ensurepip \
    && pip3 install --upgrade pip setuptools \
    && ln -sf /usr/bin/python3 /usr/bin/python \
    && ln -sf pip3 /usr/bin/pip \
    && rm -r /root/.cache \
    && pip install --no-cache-dir $PYTHON_PACKAGES \
    # <---------- PANDAS: installed separately, pinned below 0.21.0
    && pip3 install 'pandas<0.21.0' \
    && apk del build-runtime \
    && apk add --no-cache --virtual build-dependencies $PACKAGES \
    && rm -rf /var/cache/apk/*

# set working directory
WORKDIR /usr/src/app

# add and install requirements
# everything other than the data science packages goes in requirements.txt
COPY ./requirements.txt /usr/src/app/requirements.txt
RUN pip install -r requirements.txt

# add entrypoint.sh
COPY ./entrypoint.sh /usr/src/app/entrypoint.sh

RUN chmod +x /usr/src/app/entrypoint.sh

# add app
COPY . /usr/src/app

# run server
CMD ["/usr/src/app/entrypoint.sh"]

The configuration above used to work. What happens now is that the build does pass, but importing pandas fails with the following error:

ImportError: Missing required dependencies ['numpy']

Since numpy 1.16.1 is installed, I don't know which numpy pandas is trying to find...
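One way to make the build fail, instead of producing an image where importing pandas breaks at runtime, is a build-time import check placed right after the install RUN above. A minimal sketch, assuming python3 is on the PATH as in the Dockerfile above; it also logs the version and file location of the numpy that actually gets loaded:

```dockerfile
# Fail the build if numpy or pandas cannot be imported,
# and log which numpy (version and path) is picked up
RUN python3 -c "import numpy, pandas; print(numpy.__version__, numpy.__file__, pandas.__version__)"
```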

Does anyone know how to get a stable solution for this?

NOTE: A solution that pulls a turnkey docker image into the Dockerfile, with at least the data science packages above, would also be very welcome.


EDIT 1

If I move the installation of the data packages to requirements.txt, as suggested in the comments, like this:

requirements.txt

(...)
numpy==1.16.1 # or numpy==1.16.0
scikit-learn==0.20.2
scipy==1.2.1
nltk==3.4   
pandas==0.24.1 # or pandas== 0.23.4
matplotlib==3.0.2 
(...)

Dockerfile

# add and install requirements
COPY ./requirements.txt /usr/src/app/requirements.txt
RUN pip install -r requirements.txt

pandas complains again, this time about numpy:

Collecting numpy==1.16.1 (from -r requirements.txt (line 61))
  Downloading https://files.pythonhosted.org/packages/2b/26/07472b0de91851b6656cbc86e2f0d5d3a3128e7580f23295ef58b6862d6c/numpy-1.16.1.zip (5.1MB)
Collecting scikit-learn==0.20.2 (from -r requirements.txt (line 62))
  Downloading https://files.pythonhosted.org/packages/49/0e/8312ac2d7f38537361b943c8cde4b16dadcc9389760bb855323b67bac091/scikit-learn-0.20.2.tar.gz (10.3MB)
Collecting scipy==1.2.1 (from -r requirements.txt (line 63))
  Downloading https://files.pythonhosted.org/packages/a9/b4/5598a706697d1e2929eaf7fe68898ef4bea76e4950b9efbe1ef396b8813a/scipy-1.2.1.tar.gz (23.1MB)
Collecting nltk==3.4 (from -r requirements.txt (line 64))
  Downloading https://files.pythonhosted.org/packages/6f/ed/9c755d357d33bc1931e157f537721efb5b88d2c583fe593cc09603076cc3/nltk-3.4.zip (1.4MB)
Collecting pandas==0.24.1 (from -r requirements.txt (line 65))
  Downloading https://files.pythonhosted.org/packages/81/fd/b1f17f7dc914047cd1df9d6813b944ee446973baafe8106e4458bfb68884/pandas-0.24.1.tar.gz (11.8MB)
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/site-packages/pkg_resources/__init__.py", line 359, in get_provider
        module = sys.modules[moduleOrReq]
    KeyError: 'numpy'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-_e5z6o6_/pandas/setup.py", line 732, in <module>
        ext_modules=maybe_cythonize(extensions, compiler_directives=directives),
      File "/tmp/pip-install-_e5z6o6_/pandas/setup.py", line 475, in maybe_cythonize
        numpy_incl = pkg_resources.resource_filename('numpy', 'core/include')
      File "/usr/local/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1144, in resource_filename
        return get_provider(package_or_requirement).get_resource_filename(
      File "/usr/local/lib/python3.7/site-packages/pkg_resources/__init__.py", line 361, in get_provider
        __import__(moduleOrReq)
    ModuleNotFoundError: No module named 'numpy'

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-_e5z6o6_/pandas/
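The traceback shows why: pandas' setup.py calls pkg_resources.resource_filename('numpy', 'core/include') while pip is still collecting packages, before the numpy listed in the same requirements.txt has been installed. A common workaround, sketched here on the assumption that the rest of the Dockerfile stays as above, is to install numpy in its own step first:

```dockerfile
# add and install requirements
COPY ./requirements.txt /usr/src/app/requirements.txt
# Install numpy on its own first (same pin as in requirements.txt),
# so it is importable when pip runs pandas' setup.py
RUN pip install numpy==1.16.1
RUN pip install -r requirements.txt
```

Because the pin matches requirements.txt, the second step should leave the already installed numpy alone.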

EDIT 2

This seems to be an open issue in pandas. For more details, see:

pandas-dev github

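"Unfortunately, this means that a requirements.txt file is insufficient for setting up a new environment with pandas installed (e.g. in a docker container)."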

  **ImportError**:

  IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!

  Importing the multiarray numpy extension module failed.  Most
  likely you are trying to import a failed build of numpy.
  Here is how to proceed:
  - If you're working with a numpy git repository, try `git clean -xdf`
    (removes all files not under version control) and rebuild numpy.
  - If you are simply trying to use the numpy version that you have installed:
    your installation is broken - please reinstall numpy.
  - If you have already reinstalled and that did not fix the problem, then:
    1. Check that you are using the Python you expect (you're using /usr/local/bin/python),
       and that you have no directories in your PATH or PYTHONPATH that can
       interfere with the Python and numpy versions you're trying to use.
    2. If (1) looks fine, you can open a new issue at
       https://github.com/numpy/numpy/issues.  Please include details on:
       - how you installed Python
       - how you installed numpy
       - your operating system
       - whether or not you have multiple versions of Python installed
       - if you built from source, your compiler versions and ideally a build log

EDIT 3

requirements.txt -->
https://pastebin.com/0icnx0iu


EDIT 4

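As of January 12, 2020, the accepted solution no longer works.
The build now breaks not at pandas but at scipy, after numpy has been built, while building scipy's wheel. Here is the log: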

  ----------------------------------------
  ERROR: Failed building wheel for scipy
  Running setup.py clean for scipy
  ERROR: Command errored out with exit status 1:
   command: /usr/bin/python3.6 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-s6nahssd/scipy/setup.py'"'"'; __file__='"'"'/tmp/pip-install-s6nahssd/scipy/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' clean --all
       cwd: /tmp/pip-install-s6nahssd/scipy
  Complete output (9 lines):

  `setup.py clean` is not supported, use one of the following instead:

    - `git clean -xdf` (cleans all files)
    - `git clean -Xdf` (cleans all versioned files, doesn't touch
                        files that aren't checked into the git repo)

  Add `--force` to your command to use it anyway if you must (unsupported).

  ----------------------------------------
  ERROR: Failed cleaning build dir for scipy
Successfully built numpy
Failed to build scipy
ERROR: Could not build wheels for scipy which use PEP 517 and cannot be installed directly

From the error, it seems the build process is using python3.6, while I'm using FROM alpine:3.7.

Full log here -> https://pastebin.com/Tw4ubxSA

This is the current Dockerfile:

https://pastebin.com/3SftEufx


2020-06-17

1 answer


If you're not bound to Alpine 3.6, you should use Alpine 3.7 (or later).

On Alpine 3.6, installing matplotlib fails:

Collecting matplotlib
  Downloading https://files.pythonhosted.org/packages/26/04/8b381d5b166508cc258632b225adbafec49bbe69aa9a4fa1f1b461428313/matplotlib-3.0.3.tar.gz (36.6MB)
    Complete output from command python setup.py egg_info:
    Download error on https://pypi.org/simple/numpy/: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:833) -- Some packages may not be found!
    Couldn't find index page for 'numpy' (maybe misspelled?)
    Download error on https://pypi.org/simple/: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:833) -- Some packages may not be found!
    No local packages or working download links found for numpy>=1.10.0

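On Alpine 3.7, however, it works. This may be due to a numpy version issue (see here), but I can't say for sure. Past that issue, the packages were built and installed successfully, which took about 30 minutes (Alpine uses musl libc, while the prebuilt manylinux wheels on PyPI target glibc, so every package with compiled extensions that is installed with pip has to be built from source).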

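Note one important change: the build-runtime virtual package should only be removed (apk del build-runtime) after the pip install steps have finished. Also, if applicable, you could replace numpy 1.16.1 with 1.16.2, the version that gets shipped (otherwise 1.16.2 is uninstalled and 1.16.1 is built from source, further increasing build time). I haven't tried this, though.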
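A minimal sketch of that ordering, assuming the package lists from the question and that requirements.txt is copied in before the build tools are removed (this is not the answerer's exact Dockerfile, which is only linked):

```dockerfile
FROM alpine:3.7

WORKDIR /usr/src/app
# Copy requirements.txt before the build step so it is installed while the toolchain is present
COPY ./requirements.txt /usr/src/app/requirements.txt

# Runtime libraries stay installed; compilers and -dev headers go into the
# build-runtime virtual package, which is deleted only after every pip install
RUN apk add --no-cache python3 libstdc++ libgfortran openblas freetype \
    && apk add --no-cache --virtual build-runtime \
       build-base python3-dev openblas-dev freetype-dev pkgconfig gfortran \
    && ln -s /usr/include/locale.h /usr/include/xlocale.h \
    && pip3 install --no-cache-dir numpy==1.16.1 \
    && pip3 install --no-cache-dir -r requirements.txt \
    && apk del build-runtime
```

Doing the install and the apk del in the same RUN also keeps the deleted toolchain out of the image layers.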

For reference, here are my slightly modified Dockerfile and the docker build output.

Notes:

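Alpine is usually chosen as a base to minimize image size (Alpine is otherwise very slick too, but has its share of compatibility issues with mainstream Linux applications because of glibc/musl). Having to build Python packages from source somewhat defeats that purpose, since you end up with a very bloated image (900MB before any cleanup) that also takes a very long time to build. The image can be shrunk considerably by removing all intermediate compilation artifacts, build dependencies, etc., but still.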
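One way to get rid of compilation artifacts and build dependencies is a multi-stage build (a standard Docker feature, not something the answer above uses): compile wheels in a throwaway stage and install only the finished wheels in the final image. An untested sketch, assuming the Alpine 3.7 package names from the question, that the python3 package provides pip3, and a requirements.txt that pins numpy==1.16.1:

```dockerfile
# Stage 1: build wheels with the full toolchain
FROM alpine:3.7 AS builder
RUN apk add --no-cache python3 python3-dev build-base openblas-dev \
       freetype-dev pkgconfig gfortran \
    && ln -s /usr/include/locale.h /usr/include/xlocale.h
COPY ./requirements.txt /tmp/requirements.txt
# pip wheel needs the wheel package. Build and install numpy first so pandas'
# setup.py can import it, then build the rest, reusing the numpy wheel via --find-links
RUN pip3 install --no-cache-dir wheel \
    && pip3 wheel --no-cache-dir --wheel-dir=/wheels numpy==1.16.1 \
    && pip3 install --no-cache-dir --no-index --find-links=/wheels numpy==1.16.1 \
    && pip3 wheel --no-cache-dir --wheel-dir=/wheels --find-links=/wheels \
       -r /tmp/requirements.txt

# Stage 2: runtime image with shared libraries only, no compilers or headers
FROM alpine:3.7
RUN apk add --no-cache python3 libstdc++ libgfortran openblas freetype
COPY --from=builder /wheels /wheels
COPY ./requirements.txt /tmp/requirements.txt
# Install only from the prebuilt wheels; nothing is compiled in this stage
RUN pip3 install --no-cache-dir --no-index --find-links=/wheels -r /tmp/requirements.txt
```

Both stages use the same base image, so the wheels are built for the same Python and musl version they are installed into.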

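If you can't get the Python package versions you need to work on Alpine without building them from source, I'd suggest trying another small, more compatible base image such as debian-slim, or even ubuntu.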
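A sketch of that alternative, assuming the official python:3.7-slim image (Debian-based) and the pins from EDIT 1. On glibc, pip should be able to use the prebuilt manylinux wheels from PyPI for numpy, scipy, pandas and the like, so no compiler is needed for them and the numpy-before-pandas problem does not arise (entries in requirements.txt that only ship source distributions with C extensions may still need build tools):

```dockerfile
# Debian-based official Python image; binary wheels from PyPI can be used directly
FROM python:3.7-slim

WORKDIR /usr/src/app
COPY ./requirements.txt /usr/src/app/requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

COPY . /usr/src/app
RUN chmod +x /usr/src/app/entrypoint.sh
CMD ["/usr/src/app/entrypoint.sh"]
```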

Edit:

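Following "Edit 3" with the additional requirements, here are the updated Dockerfile and Docker build output. The following packages were added to satisfy build dependencies: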

postgresql-dev libffi-dev libressl-dev libxml2 libxml2-dev libxslt libxslt-dev libjpeg-turbo-dev zlib-dev
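A sketch of how these can be added to the removable build-runtime group while keeping matching shared libraries installed, so compiled extensions (cffi, Pillow and the like) can still load after apk del. The runtime package names (libpq, libffi, libxml2, libxslt, libjpeg-turbo, zlib) are an assumption for Alpine 3.7, and requirements.txt is assumed to be in the working directory already:

```dockerfile
# Shared libraries needed at runtime stay installed (names assumed for Alpine 3.7)
RUN apk add --no-cache libpq libffi libxml2 libxslt libjpeg-turbo zlib \
    && apk add --no-cache --virtual build-runtime \
       build-base python3-dev postgresql-dev libffi-dev libressl-dev \
       libxml2-dev libxslt-dev libjpeg-turbo-dev zlib-dev \
    && pip3 install --no-cache-dir -r requirements.txt \
    && apk del build-runtime
```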

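For packages that failed to build because of a specific missing header, I used Alpine's package contents search to find the missing package. Specifically for cffi, the ffi.h header was missing, which requires the libffi-dev package: https://pkgs.alpinelinux.org/contents?file=ffi.h&path=&name=&branch=v3.7.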

Alternatively, when the cause of a package build failure is not obvious, refer to that package's installation instructions, e.g. Pillow.

Before squashing, the new image size was 1.04GB. To reduce it, you can remove the Python and pip caches:

RUN apk del build-runtime && \
    find -type d -name __pycache__ -prune -exec rm -rf {} \; && \
    rm -rf ~/.cache/pip

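With docker build --squash (an experimental feature that requires the Docker daemon's experimental mode), the image size can be reduced to 661MB.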
