When working with Databricks pipelines and workflows, failures are inevitable. While Databricks provides built-in notifications for job failures, these alerts are often not customizable and may not fit specific reporting or formatting needs. A more flexible and cost-effective approach is to set up custom email notifications that include pipeline details and error messages in a structured format, such as an Excel attachment.
This blog walks through approaches to implement custom notifications using SMTP, SendGrid, Azure Logic Apps, and Microsoft Graph API.
Using SMTP with Python inside a Databricks notebook, you can generate an Excel report and send it via email whenever a pipeline fails.
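Before building the email itself, you need a hook that fires when the pipeline fails. A minimal sketch (the names `run_pipeline`, `send_failure_email`, and `run_with_notification` are illustrative, not Databricks APIs) is to wrap the pipeline call in a try/except and forward the captured traceback to the notification logic:

```python
import traceback

def run_pipeline():
    # Placeholder for real pipeline logic; raises on failure.
    raise ValueError("schema mismatch in source table")

def send_failure_email(pipeline_name, error_message):
    # Placeholder: in practice this builds and sends the email
    # with the Excel attachment, as shown in the SMTP example.
    print(f"Notifying team: {pipeline_name} failed:\n{error_message}")

def run_with_notification(pipeline_name, pipeline_fn):
    """Run a pipeline step and send a custom notification on failure."""
    try:
        pipeline_fn()
        return "success"
    except Exception:
        # Capture the full traceback so the email shows the root cause.
        send_failure_email(pipeline_name, traceback.format_exc())
        return "fail"

status = run_with_notification("pipeline1", run_pipeline)
```

The returned status string can also be appended to the pipeline-history table that the report is built from.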
```python
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.base import MIMEBase
from email.mime.text import MIMEText
from email import encoders
from io import BytesIO
import pandas as pd

# Sample pipeline history
df = spark.createDataFrame([
    ('pipeline1', 'success', '7min'),
    ('pipeline1', 'fail', '3min'),
    ('pipeline1', 'success', '10min')
], ["PipelineName", "Status", "Duration"])

# Convert DataFrame to an in-memory Excel workbook
output = BytesIO()
with pd.ExcelWriter(output, engine='xlsxwriter') as writer:
    df_pd = df.toPandas()
    df_pd.to_excel(writer, index=False, sheet_name='Sheet1')
    workbook = writer.book
    worksheet = writer.sheets['Sheet1']

    # Apply formatting: bold yellow headers with borders
    header_format = workbook.add_format({
        'bold': True,
        'bg_color': '#FFFF00',
        'border': 1,
        'align': 'center',
        'valign': 'vcenter'
    })
    for col_num, value in enumerate(df_pd.columns):
        worksheet.write(0, col_num, value.upper(), header_format)

    # Bordered data cells
    cell_format = workbook.add_format({'border': 1})
    for row in range(1, len(df_pd) + 1):
        for col in range(len(df_pd.columns)):
            worksheet.write(row, col, df_pd.iloc[row - 1, col], cell_format)

    # Widen columns for readability
    for i, col in enumerate(df_pd.columns):
        worksheet.set_column(i, i, 20)

output.seek(0)

# Email configuration
sender = "[email protected]"
receiver = "[email protected]"
subject = "Pipeline Execution Report"
body = """Hello Team,

Please find attached the latest pipeline report.

Thanks,
Pipeline Team"""

msg = MIMEMultipart()
msg['From'] = sender
msg['To'] = receiver
msg['Subject'] = subject
msg.attach(MIMEText(body, 'plain'))

# Attach the Excel workbook
part = MIMEBase('application',
                'vnd.openxmlformats-officedocument.spreadsheetml.sheet')
part.set_payload(output.read())
encoders.encode_base64(part)
part.add_header('Content-Disposition',
                'attachment; filename="pipeline_report.xlsx"')
msg.attach(part)

smtp_server = "smtp.office365.com"
smtp_port = 587

with smtplib.SMTP(smtp_server, smtp_port) as server:
    server.starttls()
    # Avoid hardcoding credentials; prefer a Databricks secret, e.g.
    # dbutils.secrets.get(scope="email", key="sender_password")
    server.login(sender, "sender_password")
    server.send_message(msg)

print("Email sent successfully with Excel attachment")
```
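To sanity-check the attachment generation without actually sending mail, you can round-trip the in-memory workbook back through pandas (this assumes pandas and the xlsxwriter/openpyxl engines are installed, as in the notebook above):

```python
from io import BytesIO
import pandas as pd

df_pd = pd.DataFrame(
    [("pipeline1", "success", "7min"), ("pipeline1", "fail", "3min")],
    columns=["PipelineName", "Status", "Duration"],
)

# Write the report exactly as the notification code does
output = BytesIO()
with pd.ExcelWriter(output, engine="xlsxwriter") as writer:
    df_pd.to_excel(writer, index=False, sheet_name="Sheet1")
output.seek(0)

# Read it back: the attachment should contain exactly the rows we wrote
check = pd.read_excel(output, sheet_name="Sheet1")
print(check.shape)
```

Running this kind of check locally before wiring up SMTP makes attachment bugs much easier to isolate than debugging them from a mangled email.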
You can automate the notification trigger by making the notebook part of the job itself: add it as a final task that runs only when an upstream task fails (using the task's "Run if" dependency condition), or schedule it independently to scan recent run history and report any failures.
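One way to wire this up is through the Databricks Jobs API 2.1, where a task's `run_if` condition controls when it executes. The job, task, and notebook names below are illustrative:

```json
{
  "name": "daily-etl",
  "tasks": [
    {
      "task_key": "etl",
      "notebook_task": { "notebook_path": "/Pipelines/daily_etl" }
    },
    {
      "task_key": "notify_on_failure",
      "depends_on": [ { "task_key": "etl" } ],
      "run_if": "AT_LEAST_ONE_FAILED",
      "notebook_task": { "notebook_path": "/Pipelines/send_failure_report" }
    }
  ]
}
```

With `"run_if": "AT_LEAST_ONE_FAILED"`, the notification notebook is skipped on healthy runs and executes only when the upstream ETL task fails.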
While Databricks provides basic failure notifications, extending them with custom SMTP or Logic App workflows ensures:

- Full control over recipients, subject lines, and message formatting
- Structured, shareable reports (such as Excel attachments) instead of plain alert text
- A single notification pattern that can be reused across multiple pipelines and workspaces
This approach is cost-effective, scalable, and easily adaptable for large-scale pipeline monitoring.