Production-Grade MT940 Parsing in Python for Automated Financial Reconciliation

Automated financial reconciliation and ledger matching require deterministic ingestion of SWIFT MT940 bank statements. While the MT940 specification defines a rigid field structure, real-world implementations exhibit significant vendor-specific deviations in narrative formatting (:86:), date encoding (YYMMDD vs YYYYMMDD), transaction code mapping (:61:), and currency placement. For FinOps engineers and accounting technology developers, building a resilient parser demands a state-machine architecture that enforces strict schema validation, maintains cryptographic audit trails, and integrates seamlessly into modern Core Architecture & Bank Feed Ingestion paradigms. This guide provides an implementation-ready blueprint for parsing MT940 files in Python, optimized for high-throughput batch processing, multi-currency normalization, and production-grade fault tolerance.

Ingestion Strategy & Secure Pipeline Configuration

MT940 ingestion operates primarily in batch mode because banks generate statements at end of day, though streaming architectures can approximate near-real-time processing by polling SFTP endpoints or consuming webhook-triggered payloads. Regardless of the ingestion cadence, secure credential handling remains non-negotiable. API tokens, SSH keys, and SFTP credentials must be injected via environment variables or a secrets manager (e.g., AWS Secrets Manager, HashiCorp Vault) and never persisted in configuration files or version control. Implement a rotating credential strategy with automated lease renewal to prevent pipeline failures during token expiration windows. When designing the ingestion layer, align your parser architecture with established OFX & MT940 Parser Design principles to ensure idempotent file consumption, duplicate statement detection via the :20: transaction reference (combined with the :25: account and :28C: statement number, since some banks reuse the same :20: value on every statement), and atomic commit patterns that prevent partial ledger updates.
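
The credential and idempotency points above can be sketched as follows. This is a minimal illustration, not a complete ingestion layer: the environment variable names (MT940_SFTP_HOST, MT940_SFTP_USER, MT940_SFTP_KEY_PATH) and the helper names are hypothetical, and the in-memory set stands in for a durable store of processed-file fingerprints. The file-level fingerprint complements, rather than replaces, statement-level duplicate detection.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Mapping, Set

# Hypothetical variable names; use whatever your secrets manager injects
REQUIRED_VARS = ("MT940_SFTP_HOST", "MT940_SFTP_USER", "MT940_SFTP_KEY_PATH")


@dataclass(frozen=True)
class SftpCredentials:
    host: str
    user: str
    key_path: str


def load_sftp_credentials(env: Mapping[str, str] = os.environ) -> SftpCredentials:
    """Read credentials from the environment and fail fast if any are absent."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        # No fallback to values stored in a config file or the repository
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return SftpCredentials(*(env[name] for name in REQUIRED_VARS))


def file_fingerprint(raw_bytes: bytes) -> str:
    """SHA-256 of the file exactly as received from the bank."""
    return hashlib.sha256(raw_bytes).hexdigest()


def should_process(raw_bytes: bytes, seen: Set[str]) -> bool:
    """Return True the first time a file is seen and record its fingerprint."""
    digest = file_fingerprint(raw_bytes)
    if digest in seen:
        return False
    seen.add(digest)
    return True
```

Calling should_process twice with the same bytes returns True and then False, which is the property that makes a re-delivered file harmless.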

State-Machine Architecture & Deterministic Parsing

Naive line-by-line splitting or regex-only extraction fails under production conditions due to multiline narratives, optional fields, and vendor-specific whitespace handling. A production MT940 parser must implement a tag-aware finite state machine (FSM) that processes colon-delimited SWIFT field tags (such as :61:) sequentially, maintains context across line breaks, and enforces strict type coercion. Financial precision mandates the use of decimal.Decimal over IEEE 754 binary floating-point arithmetic to prevent rounding drift during reconciliation. Date parsing must explicitly handle the two-digit year of the YYMMDD format with a documented century-inference rule; ISO 8601 does not define one, so the pivot year is a project convention that should be stated in code. The FSM should track three primary states: HEADER, STATEMENT_LINES, and FOOTER, transitioning only upon valid tag recognition.
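
One way to make "transitioning only upon valid tag recognition" concrete is an explicit transition table. The sketch below is illustrative: the three state names come from the text, but the assignment of tags to states and the names TRANSITIONS, next_state, and walk are assumptions made for this example.

```python
from typing import Dict, Iterable

# Allowed tags per state and the state each one leads to. The grouping of tags
# into the three states is an assumption made for this sketch.
TRANSITIONS: Dict[str, Dict[str, str]] = {
    "HEADER": {
        "20": "HEADER", "21": "HEADER", "25": "HEADER", "28C": "HEADER",
        "60F": "HEADER", "60M": "HEADER",
        "61": "STATEMENT_LINES", "62F": "FOOTER", "62M": "FOOTER",
    },
    "STATEMENT_LINES": {
        "61": "STATEMENT_LINES", "86": "STATEMENT_LINES",
        "62F": "FOOTER", "62M": "FOOTER",
    },
    "FOOTER": {"64": "FOOTER", "65": "FOOTER", "86": "FOOTER", "20": "HEADER"},
}


def next_state(state: str, tag: str) -> str:
    """Return the state reached by `tag`, or raise if the tag is not legal here."""
    try:
        return TRANSITIONS[state][tag]
    except KeyError:
        raise ValueError(f"Tag :{tag}: is not allowed in state {state}") from None


def walk(tags: Iterable[str], start: str = "HEADER") -> str:
    """Run a sequence of tags through the machine and return the final state."""
    state = start
    for tag in tags:
        state = next_state(state, tag)
    return state
```

With this table, a :86: that appears before any :61: is rejected instead of being silently attached to the wrong record.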

Implementation: Tag-Aware FSM with Cryptographic Audit Hooks

The following Python implementation utilizes compiled regular expressions for deterministic tag extraction, embeds cryptographic audit hooks for reconciliation traceability, and isolates parsing logic from I/O operations. It adheres to strict financial engineering standards, including explicit debit/credit resolution, comma-to-period normalization, and structured error boundaries.

python
import re
import hashlib
import logging
from datetime import datetime
from typing import List, Optional
from decimal import Decimal, ROUND_HALF_EVEN
from dataclasses import dataclass, field

logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
logger = logging.getLogger("mt940_parser")

# Precompiled SWIFT tag patterns for deterministic parsing
TAG_PATTERN = re.compile(r"^:(\d{2}[A-Z]?):(.*)$", re.MULTILINE)
DATE_PATTERN = re.compile(r"^\d{6}$")
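# SWIFT amounts always carry a decimal comma but may have 0-2 fraction digits ("100," is valid)
AMOUNT_PATTERN = re.compile(r"^([DC])(\d{1,15},\d{0,2})$")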

@dataclass
class MT940Transaction:
    value_date: datetime
    entry_date: Optional[datetime]
    debit_credit: str
    amount: Decimal
    transaction_code: str
    reference: str
    narrative: str
    raw_line: str

@dataclass
class MT940Statement:
    transaction_ref: str
    account_id: str
    statement_number: str
    opening_balance: Decimal
    closing_balance: Decimal
    currency: str
    transactions: List[MT940Transaction] = field(default_factory=list)
    audit_hash: str = ""

class MT940AuditHook:
    """Cryptographic audit and reconciliation validator."""
    @staticmethod
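    def compute_hash(statement: MT940Statement) -> str:
        # Cover the entries as well as the header so that a changed, added, or
        # dropped transaction also changes the digest
        header = f"{statement.transaction_ref}|{statement.account_id}|{statement.opening_balance}|{statement.closing_balance}"
        entries = "|".join(f"{tx.raw_line}#{tx.narrative}" for tx in statement.transactions)
        return hashlib.sha256(f"{header}|{entries}".encode("utf-8")).hexdigest()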

class MT940Parser:
    """Production-grade, tag-aware MT940 state machine."""

    def __init__(self, strict_mode: bool = True):
        self.strict_mode = strict_mode
        self._state = "HEADER"
        self._current_statement: Optional[MT940Statement] = None
        self._pending_narrative: List[str] = []
        self._current_tx: Optional[MT940Transaction] = None

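    def parse(self, raw_content: str) -> List[MT940Statement]:
        statements: List[MT940Statement] = []
        lines = raw_content.splitlines()
        # Reset all per-file state so one parser instance can be reused safely
        self._state = "HEADER"
        self._current_statement = None
        self._current_tx = None
        self._pending_narrative = []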

        for line in lines:
            stripped = line.strip()
            if not stripped:
                continue

            tag_match = TAG_PATTERN.match(stripped)
            if tag_match:
                self._commit_pending_narrative()
                tag, content = tag_match.groups()
                self._process_tag(tag, content, statements)
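            elif self._state == "STATEMENT_LINES":
                # Continuation of a :61: or :86: field. Untagged lines elsewhere
                # (SWIFT envelope blocks, the "-" trailer) are not narrative.
                self._pending_narrative.append(stripped)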

        self._commit_pending_narrative()
        if self._current_statement:
            self._current_statement.audit_hash = MT940AuditHook.compute_hash(self._current_statement)
            statements.append(self._current_statement)

        return statements

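    def _commit_pending_narrative(self):
        # Attach continuation lines to the open transaction, separated by line
        # breaks from any text already captured
        if self._pending_narrative and self._current_tx:
            parts = [self._current_tx.narrative, *self._pending_narrative]
            self._current_tx.narrative = "\n".join(p for p in parts if p)
        # Always drop the buffer so stray lines cannot leak into a later transaction
        self._pending_narrative.clear()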

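    def _resolve_date(self, date_str: str) -> datetime:
        """Parse a YYMMDD date; datetime() rejects impossible months and days."""
        if not DATE_PATTERN.match(date_str):
            raise ValueError(f"Invalid date format: {date_str}")
        yy, mm, dd = int(date_str[:2]), int(date_str[2:4]), int(date_str[4:6])
        # Century pivot is a local convention: 00-49 map to 20xx, 50-99 to 19xx
        year = 2000 + yy if yy < 50 else 1900 + yy
        return datetime(year, mm, dd)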

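    def _resolve_amount(self, amount_str: str) -> Decimal:
        """Return a signed amount: credits positive, debits negative."""
        match = AMOUNT_PATTERN.match(amount_str)
        if not match:
            raise ValueError(f"Invalid amount format: {amount_str}")
        sign, value = match.groups()
        # Normalize the SWIFT decimal comma, then quantize both signs identically
        # (two fraction digits assumed; other currencies need their own exponent)
        amount = Decimal(value.replace(",", ".")).quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)
        return amount if sign == "C" else -amount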

    def _process_tag(self, tag: str, content: str, statements: List[MT940Statement]):
        if tag == "20":
            if self._current_statement:
                self._current_statement.audit_hash = MT940AuditHook.compute_hash(self._current_statement)
                statements.append(self._current_statement)
            self._current_statement = MT940Statement(
                transaction_ref=content.strip(),
                account_id="", statement_number="",
                opening_balance=Decimal("0"), closing_balance=Decimal("0"),
                currency="", transactions=[]
            )
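            # A new statement must not inherit the previous statement's open transaction
            self._current_tx = None
            self._pending_narrative.clear()
            self._state = "HEADER"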
        elif tag == "25":
            if self._current_statement: self._current_statement.account_id = content.strip()
        elif tag == "28C":
            if self._current_statement: self._current_statement.statement_number = content.strip()
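        elif tag in ("60F", "60M"):
            if self._current_statement:
                # Balance layout is 1!a6!n3!a15d with no separators, e.g. "C230101EUR1000,00"
                bal = re.match(r"^([DC])(\d{6})([A-Z]{3})(\d{1,15},\d{0,2})$", content.strip())
                if not bal:
                    if self.strict_mode: raise ValueError(f"Malformed :{tag}: balance: {content}")
                    logger.warning("Skipping malformed :%s: balance: %s", tag, content)
                    return
                self._current_statement.currency = bal.group(3)
                self._current_statement.opening_balance = self._resolve_amount(bal.group(1) + bal.group(4))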
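        elif tag == "61":
            # Layout is 6!n[4!n]2a[1!a]15d1!a3!c16x[//16x] with no separators,
            # e.g. "2301020102D100,50NTRFREF123//BANKREF"
            m = re.match(
                r"^(\d{6})(\d{4})?(R?[CD])([A-Z])?(\d{1,15},\d{0,2})([A-Z][A-Z0-9]{3})(.*)$",
                content.strip(),
            )
            if not m or not self._current_statement:
                if self.strict_mode: raise ValueError(f"Malformed or orphaned :61: line: {content}")
                self._current_tx = None
                return
            val_str, entry_mmdd, dc_flag, _funds_code, amt, tx_code, rest = m.groups()
            val_date = self._resolve_date(val_str)
            entry_date = None
            if entry_mmdd:
                # Entry date is MMDD only; start from the value date's year and
                # shift by one year when the gap suggests a year boundary
                entry_date = datetime(val_date.year, int(entry_mmdd[:2]), int(entry_mmdd[2:]))
                if abs((entry_date - val_date).days) > 180:
                    shift = 1 if entry_date < val_date else -1
                    entry_date = entry_date.replace(year=val_date.year + shift)
            # RC (reversal of credit) books as a debit, RD (reversal of debit) as a credit
            sign = "C" if dc_flag in ("C", "RD") else "D"
            self._current_tx = MT940Transaction(
                value_date=val_date, entry_date=entry_date,
                debit_credit=dc_flag, amount=self._resolve_amount(sign + amt),
                transaction_code=tx_code, reference=rest.partition("//")[0],
                narrative="", raw_line=f":61:{content}",
            )
            # Register the entry on the statement; otherwise it is parsed and lost
            self._current_statement.transactions.append(self._current_tx)
            self._state = "STATEMENT_LINES"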
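        elif tag in ("62F", "62M"):
            if self._current_statement:
                # Same 1!a6!n3!a15d layout as the opening balance
                bal = re.match(r"^([DC])(\d{6})([A-Z]{3})(\d{1,15},\d{0,2})$", content.strip())
                if not bal:
                    if self.strict_mode: raise ValueError(f"Malformed :{tag}: balance: {content}")
                    logger.warning("Skipping malformed :%s: balance: %s", tag, content)
                    return
                self._current_statement.closing_balance = self._resolve_amount(bal.group(1) + bal.group(4))
                self._state = "FOOTER"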
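        elif tag == "86":
            # Only a :86: that follows a :61: belongs to a transaction; a :86: after
            # the closing balance is statement-level and must not leak into the last entry
            if self._current_tx and self._state == "STATEMENT_LINES":
                sep = "\n" if self._current_tx.narrative else ""
                self._current_tx.narrative += sep + content.strip()
            else:
                logger.debug("Ignoring statement-level :86: field: %s", content.strip())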
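
Once a statement has been parsed, a cheap integrity check is that the opening balance plus the signed entries equals the closing balance. The following is a minimal sketch, assuming signed Decimal amounts (credits positive, debits negative) as returned by _resolve_amount above; the function name check_balance is hypothetical and not part of the parser.

```python
from decimal import Decimal
from typing import Iterable


def check_balance(opening: Decimal, closing: Decimal, signed_amounts: Iterable[Decimal]) -> Decimal:
    """Return closing - (opening + sum of entries); zero means the statement balances."""
    # Start the sum from a Decimal so no int or float sneaks into the arithmetic
    expected = opening + sum(signed_amounts, Decimal("0"))
    return closing - expected


difference = check_balance(
    Decimal("1000.00"),
    Decimal("1149.50"),
    [Decimal("-100.50"), Decimal("250.00")],
)
```

A non-zero difference is a reasonable trigger for routing the statement to manual review rather than posting it.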

Multi-Currency Normalization & Ledger Mapping

MT940 does not consistently embed ISO 4217 currency codes across all fields. Currency is typically defined in :60F: and :62F:, but transaction-level amounts (:61:) inherit this context implicitly. For multi-entity or cross-border FinOps operations, implement a deterministic currency normalization layer that:

  1. Validates currency codes against the official ISO 4217 Currency Codes registry.
  2. Applies mid-market FX conversion using timestamped rate snapshots to prevent reconciliation drift.
  3. Maps transaction codes (:61: subfields) to general ledger accounts via a versioned mapping table, ensuring auditability when bank codes change.

Maintain precision to four decimal places during FX conversion, rounding only at the final ledger commit using ROUND_HALF_EVEN to comply with standard accounting practices.
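
The three steps and the rounding rule can be sketched together. Everything named here is illustrative: the currency table is a small subset rather than the ISO 4217 registry, the rate is supplied by the caller, and the GL account names and effective dates in GL_MAPPINGS are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal, ROUND_HALF_EVEN

# Illustrative subset only: currency code -> number of minor-unit digits
KNOWN_CURRENCIES = {"EUR": 2, "USD": 2, "GBP": 2, "JPY": 0}

# Hypothetical versioned mapping: each version applies from its effective date onward
GL_MAPPINGS = [
    (date(2023, 1, 1), {"NTRF": "1010-BANK-TRANSFERS", "NCHG": "6200-BANK-FEES"}),
    (date(2024, 1, 1), {"NTRF": "1010-BANK-TRANSFERS", "NCHG": "6210-BANK-CHARGES"}),
]


@dataclass(frozen=True)
class RateSnapshot:
    base: str
    quote: str
    rate: Decimal      # units of quote currency per one unit of base currency
    as_of: datetime    # when the rate was captured


def validate_currency(code: str) -> str:
    if code not in KNOWN_CURRENCIES:
        raise ValueError(f"Unknown currency code: {code}")
    return code


def convert(amount: Decimal, snapshot: RateSnapshot) -> Decimal:
    """Convert at the snapshot rate, keeping four decimal places as working precision."""
    return (amount * snapshot.rate).quantize(Decimal("0.0001"), rounding=ROUND_HALF_EVEN)


def to_ledger(amount: Decimal, currency: str) -> Decimal:
    """Round once, at commit time, to the currency's minor unit."""
    exponent = Decimal(1).scaleb(-KNOWN_CURRENCIES[validate_currency(currency)])
    return amount.quantize(exponent, rounding=ROUND_HALF_EVEN)


def gl_account(tx_code: str, booking_date: date) -> str:
    """Look up the GL account in the mapping version effective on the booking date."""
    applicable = [mapping for start, mapping in GL_MAPPINGS if start <= booking_date]
    if not applicable or tx_code not in applicable[-1]:
        raise LookupError(f"No GL mapping for {tx_code} on {booking_date}")
    return applicable[-1][tx_code]
```

Keeping the snapshot timestamp next to the rate is what lets a later audit reproduce the converted figure exactly.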

Data Normalization Pipelines & Fault Tolerance

Production reconciliation pipelines require structured validation, retry logic, and dead-letter queue (DLQ) routing. Wrap the parser in a Pydantic or dataclass schema validator to reject malformed payloads before they reach the ledger. Implement exponential backoff with jitter for transient I/O failures, and route unparseable statements to a DLQ with full context preservation.
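
A minimal sketch of the retry and dead-letter pattern follows, assuming that transient I/O failures surface as OSError and that parse failures surface as ValueError. The function names and the list used as a dead-letter queue are hypothetical stand-ins for real infrastructure.

```python
import random
import time
from typing import Callable, Dict, List, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], object] = time.sleep,
) -> T:
    """Retry `operation` on OSError, waiting a jittered, exponentially growing delay."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller decide what to do
            # Full jitter: a random wait up to the capped exponential delay
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    raise RuntimeError("unreachable")  # the loop always returns or raises


def parse_or_dead_letter(
    raw: str,
    parse: Callable[[str], T],
    dead_letters: List[Dict[str, str]],
) -> Optional[T]:
    """Parse `raw`; on failure keep the payload and the error for later inspection."""
    try:
        return parse(raw)
    except ValueError as exc:
        dead_letters.append({"raw": raw, "error": str(exc)})
        return None
```

Injecting the sleep function keeps the retry logic testable without real waiting.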

Leverage Python’s built-in decimal module for all monetary arithmetic to guarantee deterministic results across distributed nodes. Attach structured JSON logging to every parsed transaction, including the raw line hash, parsed values, and reconciliation status. This observability layer enables rapid root-cause analysis when vendor-specific deviations break downstream matching algorithms.
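
A structured log record of the kind described might look like the sketch below. The field names and the function name are assumptions for illustration; amounts are serialized as strings because the json module cannot encode Decimal directly and a float would lose exactness.

```python
import hashlib
import json
from decimal import Decimal


def transaction_log_record(
    raw_line: str,
    value_date: str,
    amount: Decimal,
    currency: str,
    status: str,
) -> str:
    """Build one JSON log line for a parsed transaction."""
    record = {
        # Hash instead of the raw text so logs stay linkable without exposing details
        "raw_line_sha256": hashlib.sha256(raw_line.encode("utf-8")).hexdigest(),
        "value_date": value_date,          # ISO 8601 date string
        "amount": str(amount),             # string keeps the exact decimal value
        "currency": currency,
        "reconciliation_status": status,
    }
    return json.dumps(record, sort_keys=True)


line = transaction_log_record(
    ":61:2301020102D100,50NTRFREF123//BANKREF",
    "2023-01-02",
    Decimal("-100.50"),
    "EUR",
    "unmatched",
)
```

The resulting string can be passed to any logger as the message, one record per line.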

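For high-throughput environments, decouple ingestion from parsing using message brokers (Kafka, RabbitMQ). Process MT940 payloads in parallel worker pools, ensuring each worker maintains an isolated state machine instance. Commit parsed statements to the ledger using idempotent upserts keyed on :20: reference + :61: value date + amount, extended with the entry's position in the statement, because two identical payments on the same day would otherwise collapse into one row. Brokers typically redeliver messages after a failure, so idempotent writes are what make processing effectively once and prevent duplicate posting. The resulting trail supports SOX and GDPR audit requirements, although compliance also depends on controls outside the parser.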
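
The idempotent commit can be sketched with SQLite standing in for the ledger store. The table layout and names are hypothetical; the key follows the text (:20: reference, value date, amount) and adds the entry's position within the statement as an assumed tiebreaker so that two identical payments on one day stay distinct.

```python
import sqlite3
from decimal import Decimal

SCHEMA = """
CREATE TABLE IF NOT EXISTS ledger_entries (
    statement_ref TEXT    NOT NULL,
    line_no       INTEGER NOT NULL,
    value_date    TEXT    NOT NULL,
    amount        TEXT    NOT NULL,
    narrative     TEXT,
    PRIMARY KEY (statement_ref, line_no, value_date, amount)
)
"""


def post_entry(
    conn: sqlite3.Connection,
    statement_ref: str,
    line_no: int,
    value_date: str,
    amount: Decimal,
    narrative: str,
) -> bool:
    """Insert the entry unless its key already exists; return True if a row was written."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO ledger_entries VALUES (?, ?, ?, ?, ?)",
        (statement_ref, line_no, value_date, str(amount), narrative),
    )
    conn.commit()
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
first = post_entry(conn, "STMT001", 1, "2023-01-02", Decimal("-100.50"), "Payment to supplier")
replay = post_entry(conn, "STMT001", 1, "2023-01-02", Decimal("-100.50"), "Payment to supplier")
```

Replaying the same message leaves the table unchanged, which is the behavior a redelivering broker requires.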