I need to run Apache Airflow inside a corporate network. To reach the internet from any machine on that network, I have to set "http_proxy", "https_proxy" and "no_proxy".
Right now, the VM that I'm using to run Airflow stores these env. variables in /etc/profile.
Python scripts that make HTTP requests to external websites work fine when I run them from the terminal, but when I run them inside a DAG they fail because they can't resolve or reach the address.
It seems that Airflow runs scripts in an isolated environment. I am currently using CeleryExecutor.
First, I printed all the environment variables from inside a task with print(environ). I got this:
environ({'LANG': 'en_US.UTF-8', 'PATH': '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin', 'HOME': '/home/airflow', 'LOGNAME': 'airflow', 'USER': 'airflow', 'SHELL': '/bin/bash', 'INVOCATION_ID': '5c777ce3b07748309b972d877a0545ea', 'JOURNAL_STREAM': '9:37430', 'AIRFLOW_CONFIG': '/opt/airflow/airflow.cfg', 'AIRFLOW_HOME': '/opt/airflow', '_MP_FORK_LOGLEVEL_': '20', '_MP_FORK_LOGFILE_': '', '_MP_FORK_LOGFORMAT_': '[%(asctime)s: %(levelname)s/%(processName)s] %(message)s', 'CELERY_LOG_LEVEL': '20', 'CELERY_LOG_FILE': '', 'CELERY_LOG_REDIRECT': '1', 'CELERY_LOG_REDIRECT_LEVEL': 'WARNING', 'AIRFLOW_CTX_DAG_OWNER': 'airflow', 'AIRFLOW_CTX_DAG_ID': 'primeiro-teste', 'AIRFLOW_CTX_TASK_ID': 'extract', 'AIRFLOW_CTX_EXECUTION_DATE': '2022-12-13T16:18:17.185417+00:00', 'AIRFLOW_CTX_DAG_RUN_ID': 'manual__2022-12-13T16:18:17.185417+00:00'})
There are no proxy variables in that output, so the script cannot reach anything outside the network.
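To make that check less manual than reading the whole dump, something like the sketch below could be called from inside a task. The helper name missing_proxy_vars and the PROXY_VARS tuple are my own for illustration; the upper-case spellings are included only because some tools read those instead of the lower-case ones.

```python
import os

# Proxy-related variable names to look for. Both spellings are listed because
# different tools read different cases.
PROXY_VARS = (
    "http_proxy", "https_proxy", "no_proxy",
    "HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY",
)


def missing_proxy_vars(env=None):
    """Return the proxy variable names that are absent or empty in env.

    env defaults to os.environ, i.e. the environment of the current process.
    """
    env = os.environ if env is None else env
    return [name for name in PROXY_VARS if not env.get(name)]


if __name__ == "__main__":
    # Inside a task this shows what the worker process actually sees.
    print("Missing proxy variables:", missing_proxy_vars())
```

Run against the environment printed above, this would report all six names as missing.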
I even checked, from within a DAG, which DNS servers were being used, to see if they were correct. They were.
The only way I got the script to work was by setting these environment variables in the script itself before making an HTTP request:
os.environ['HTTP_PROXY'] = os.environ['http_proxy'] = os.environ['HTTPS_PROXY'] = os.environ['https_proxy'] = "PROXY STRING"
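For reference, this is roughly how that workaround looks when wrapped in a reusable helper that each task would call first. It is only a sketch of the per-task workaround, not the global solution I'm after; set_proxy_env is a name I made up, and the proxy URL and no_proxy list are placeholders for the real values.

```python
import os


def set_proxy_env(proxy_url, no_proxy="", env=None):
    """Set lower- and upper-case proxy variables in env (default: os.environ).

    proxy_url and no_proxy are supplied by the caller; nothing is read from
    /etc/profile or from the Airflow configuration.
    """
    env = os.environ if env is None else env
    for name in ("http_proxy", "https_proxy"):
        env[name] = env[name.upper()] = proxy_url
    if no_proxy:
        env["no_proxy"] = env["NO_PROXY"] = no_proxy
    return env


def extract():
    # Hypothetical task callable: set the proxy first, then make requests.
    set_proxy_env("PROXY STRING", no_proxy="localhost,127.0.0.1")
    # ... HTTP requests go here ...
```

This works, but every task callable has to remember to call the helper, which is exactly the repetition I'd like to avoid.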
I was hoping to find a way to define these variables for all DAGs, but when I set them the way Tomasz suggested, I can't seem to use them unless they start with the "AIRFLOW" prefix.