Why SSIS Is Hard to Get Right
SQL Server Integration Services (SSIS) looks approachable—drag a few components, connect them, done. Then you hit production with 10 million rows, a flaky source system, and a package that worked in dev falling apart under real load. Here's what experience teaches.
Mistake 1: Not Using Transactions
By default, SSIS doesn't wrap a package execution in a transaction. A failure mid-package leaves the database in a partially-loaded state. Fix:
-- In the package, set the TransactionOption on the package or container
-- Or control it programmatically:
Dim pkg As Package = New Package();
pkg.TransactionMode = TransactionMode.Required;
This ensures atomicity: either the whole package loads or none of it does.
Mistake 2: Ignoring Buffer Size
SSIS processes data in memory buffers. The default buffer size (10MB) works for most loads but can kill performance on wide tables or large LOB columns. Tune it:
-- In the Data Flow Task properties:
DefaultBufferSize = 10485760 // 10MB default
DefaultBufferMaxRows = 10000 // Tune based on row width
For wide tables (100+ columns), reduce buffer size to avoid memory pressure. For narrow tables, increase it.
Mistake 3: No Error Output on Source Components
Data quality issues will happen. If you don't route error rows to a destination, they vanish into the void—package succeeds, data is missing. Route them:
-- Configure error output on every source component
-- Redirect rows to an error table:
ErrorOutputColumnSelection = AllRowsPlusErrorRows
ErrorRowDisposition = RedirectRow
Log the errors so you can investigate and fix upstream.
Mistake 4: Running SSIS on the Production SQL Server
SSIS packages consume significant CPU and memory during execution. Running them on the same server as your database means they compete with your OLTP workload. Use a separate SSIS server or SSIS Catalog on a dedicated instance.
Mistake 5: Hardcoding Connection Strings
Packages with hardcoded connection strings break when you move between environments (dev → staging → prod). Use parameters and configuration files:
-- Instead of hardcoding:
DataSource=prodserver;Initial Catalog=prodDB;
-- Use a package parameter:
DataSource=ServerName // Set at execution time
Store sensitive values in SSISDB environment variables, not in the package.
Mistake 6: Not Logging Package Execution
You can't improve what you don't measure. SSIS has built-in logging—use it:
-- Enable SSIS logging to SSISDB or a custom table
-- Track: start time, end time, rows processed, error messages
-- Use sysutility_get_dtsx_package_execution_rss to query history
Set up alerts for packages that run longer than expected or fail.
Mistake 7: Forgetting to Handle Variable Data Types in Sources
ETL from flat files, Excel, or external databases often has type mismatch issues. The source might return strings where you expect dates, or numbers as text. Validate at the boundary:
-- Use a Data Conversion transformation before the destination
-- Or use a Script Component for complex type handling
-- Never assume the source type will match your destination
Mistake 8: No Restart Points for Long-Running Packages
A 4-hour package that fails at hour 3 needs to restart from scratch without a restart mechanism. Use checkpoints:
-- Enable checkpoint on the package:
pkg.CheckpointUsage = CheckpointUsage.IfExists;
pkg.CheckpointFileName = "\\fileserver\checkpoints\package1.xml";
Write checkpoints on successful task completion. Restart picks up where it left off.
Mistake 9: Not Using Project Deployment Model
The legacy Package Deployment Model (dtutil + file system) has no parameterization, no environments, no logging by default. Migrate to the Project Deployment Model. Benefits:
- Parameters per environment (dev vs. prod connection strings)
- Centralized logging in SSISDB
- Environment variables for sensitive data
- Native SQL Server Agent integration
Conclusion: Production-Grade SSIS
SSIS is a powerful ETL platform when configured properly. The patterns above—transactions, error handling, logging, restartability, and proper deployment model—separate packages that run reliably in production from those that cause 3 AM incidents. Document restart/recovery procedures