Wednesday, November 30, 2005

Slowly Changing Dimension Transformation against Oracle

Now here is something I was completely amazed by...

The SCD transformation will work against Oracle with a little tweaking. First, and this is the only time I would use this, use the Oracle for OLE DB Driver. The Microsoft Driver has a problem with the parameters that I haven't figured out yet. The SCD Wizard works as normal, but on the update side, it does not map the OLE DB Command to do the updates correctly. You will have to manually create the parameters under the Input side and then map them. Works like butter after that.

Another thought about these updates. If you are doing a lot of Type I dimensional changes, I would right these to a temp table and then use a one update to match everything by key. Much faster than the row by row action of the OLE DB command.

Scott

Parameterized queries against Oracle...Again!

So here is round 2...

You can also do a parameterized query against Oracle Execute SQL Task. Here is the scoop:

  1. Make sure you are using the Microsoft Driver for OLE DB....you should be using this all the time except for one place...we'll talk about that later.
  2. Create your query, and use ?'s for where your parameters go.
  3. Map the parameters on the Parameters option tab. Make sure that you get the datatypes correct and that Oracle will often expect DB_Decimal for most numbers.
And off you go!

Scott

Monday, November 28, 2005

Parameterized queries against Oracle...

So I needed to get some data out of Oracle the other day and need it to be based off some parameters. Turns out the OLE DB source will not recognize parameters when used against Oracle. Neither the Microsoft or Oracle driver will do it. So I thought I might still be able to use the DataReader source against Oracle, it won't support parameters at all. I ended up creating a workaround that went like this....

First, create a variable called MyStmt and make it a String. Create another variable, MyID as an Integer. Now go to the properties of the MyStmt variable. Set the "EvaluateAsExpression" property to True. Now, read this well...place the SQL statement that you want to create in the "Expression" property, not the Value of the Variable. You will have to do some data conversion to concatanate the string and integer, so email me if you need help with that. Now, go back to your OLE DB source and set the source of the SQL statement to a Variable, and choose the correct variable name. Voila! A parameterized query! Don't forget to use the Microsoft OLE DB for Oracle driver and to check the datatypes on the Numerics...see my earlier blog posts...

Hope this helps....

Upcoming post...how to get SSIS to work against Oracle on a 64-bit server...

Scott
scott(at)barrettzone.com

Thursday, November 17, 2005

The final story...how to get data out of Oracle....

So after waaaayyyy too much testing, if you're going to get Oracle, here are your best options...

1. ADO.Net DataReader with a ADO.Net for ODBC driver. Create a system DSN, and you're good to go. No problems with the NUMBER datatypes I mentioned above, as they are all converted to floats.

2. ADO.Net DataReader with the ADO.Net Oracle Client - Not as fast, and all numbers with decimal places will be cut off.

3. OLE DB connection - Use the Microsoft Driver first. It at least sets the precision and scale for you. It is wrong if you have decimal places, but you can adjust that in the advanced editor.

4. OLE DB Connection with Oracle OLE DB Driver - this is pure luck if it works, and most number fields will bomb out! You can manually creaet every column from scratch in the advanced editor, but by the time you get to the third column, you'll switch to one of the other methods....oh yeah...this is the slowest read times as well.....

Scott Barrett

Tuesday, November 15, 2005

SSRS Reporting Models for Oracle????

Here's a nice little shortcut that will get your users off your back!

I wanted to create a Report Model so my users could write their own reports. The problem is that the only two data sources you can create a model against are SQL Server and Analysis Services. No Oracle in that list (obvious reasons, but it would have been nice)!!!!!

Here is the work around for Analysis Services savvy folks....

You can create a DSV in Analysis Services the points to Oracle. Use the ADO.Net provider for best results. Build your cube and dimensions, and deploy the cube. Now create a data source in the SQL Management Studio to SSAS and choose the cube you just created. Right click and choose "Generate Model". It's so simple is scary! And because it's Analysis Services, it is fast. Now if you are creating "green-bar" reports, there are better ways to do this. If you want a ton of flexibility, this is the way to go.

If you have no experience in Analysis Services, don't try this at home. The learning curve for AS 2005 is quite steep.

If the idea of a Microsoft BI stack on top of anything hasn't hit you yet, go ask Barnes & Noble, CompUSA, Wal-Mart and Home Depot. The UDM is what brings all of this together........

Scott

MySQL connections...

Hey...data seems to be everywhere, and today I ran across it in a new place. Easy fix, used the Data Reader with the ADO.Net for ODBC connection. Voila! You have data!

Turns out that there is an OLE DB provider for MySQL. I'll try it out tomorrow and see if it works.

Try this link to get the OLE DB driver...

OLEDB for MySQL


Scott

Saturday, November 12, 2005

Excel data not coming into SSIS right?

So sometimes you have data that does not come up right in Excel because either it's not formatted, or it's got mixed types. The end of it all is you just need it to come in as text and you can convert it later. My good friend, Robert Skoglund gave me this hint and I dug up some info on it....

Basically, add the option IMEX=1 to the ConnectionString for Excel. This forces the columns to be treated all as text.

This from ConnectionStrings.com....

"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\MyExcel.xls;Extended Properties=""Excel 8.0;HDR=Yes;IMEX=1"""
"HDR=Yes;" indicates that the first row contains columnnames, not data
"IMEX=1;" tells the driver to always read "intermixed" data columns as text
TIP! SQL syntax: "SELECT * FROM [sheet1$]" - i.e. worksheet name followed by a "$" and wrapped in "[" "]" brackets.



From Microsoft.....
http://msdn2.microsoft.com/en-us/library/ms254500.aspx

Credit, where credit is due....
http://www.dicks-blog.com/archives/2004/06/03/external-data-mixed-data-types/

Loading dimensional data from Excel

So I've got all this Excel data and I need to load a dimension table. What I really need is the distinct list of dimensional keys and values and then do the lookup, but the Excel data source only lets me pick the entire worksheet, so I would have to stage it first....WRONG!!!!!

You can write queries as sources against Excel! Set up your Select Distinct statement and you are ready to go. If you want the easy way to see it done, use the Import/Export wizard in the BI studio and when they ask you about the source, choose the query option.

Just makin' life easier...

Scott Barrett

Theory behind how SSIS gets so fast...

I owe someone credit for this, and when I find that article again, I'll give them full credit...

Unlike most other ETL tools that deal with data on a row by row basis, SSIS uses memory buffers to hold the data in place. So think of it like this.....

Instead of data passing through the transformations to different places in memory, think of it as the data being stationary, and the transformations passing over the data. When you're done transforming, it comes out of memory into your destination. Moral of the story....the more memory, the better.....

To put it in simpler terms, sometimes the chicken crosses the road, sometimes the road passes beneath the chicken. In SSIS, it is the latter.

Wow...that was deep......

Scott Barrett

SSIS and Oracle Sequences

So I need to load a table, and I need an auto-incrementing field filled in. Easy enough I think....oops...I'm using Oracle and their infernal sequences. Now compared to the identity field in SQL Server or the Auto-Number in Access, these things are just abominations. Two ways to get around this, so here we go.....

  1. Generate your own sequence numbers....pretty easy with the Script component and a variable. Search the web, email me or get Donald Farmer's book on scripting and this is done in a minute and works like a champ. Why? This now becomes an in-memory operation on the set of data, and it is quick....
  2. Use the OLE DB command to do an insert and reference the MySequence.NextVal in your insert statement. Don't try this at home if you have a lot of rows to generate numbers for. After about 50,000 records, your going to want to get some coffee.....
There you go, so of now and create some numbers on the fly....

Scott Barrett

Friday, November 11, 2005

OK....how do I get data out of an Oracle Database using SSIS?

Ok...this is going to be a series....

I've struggled through this in so many configurations that it's driving me crazy. Let me break it down to you so you can get some work done.

  1. DataReader Source - Use the ADO.Net for ODBC and set up an ODBC connection. This is by far the fastest I have seen. Why??? The OLE DB layer has too much in the middle to make this fast. This method also solves a big problem with OLE DB and Oracle: NUMBERs!!!
  2. OLE DB Source - This works, but use the Microsoft for Oracle Driver. Why? It sorts the tables when using OpenRowset, and it handles that pesky NUMBER problem.
So what is the problem with NUMBER?
In Oracle, the NUMBER is like a Variant data type. It can represent any kind of numeric data type. SSIS and other Microsoft tools will see this as a DT_Numeric and blow up if the precision and scale is not set. Why? Because Microsoft will always try to determine the exact data type to make SSIS more efficient. Using the Oracle for OLE DB driver, it will see a NUMBER with no precision or scale specifically set at Numeric(0,0). Now we all know that this will break, because you can't create that! The Microsoft driver goes the other way and defaults it to a Numeric(38,0), the max range. The problem here is what if you had numbers to the right of the decimal.....they're gone!

The fix here is to go in and change the Output columns to a DT_R8, Double Precision Float. This is done in the source and will get your numbers to you right. Now in the real world, people who build Oracle tables would set this properly and it would then be read properly by SSIS.

Go check your vendor apps and see how many NUMBER fields don't have the size and scale set....you'll be as stunned as me.....

SB

Thursday, November 10, 2005

What's with the Code Page error?

So your hooking up to an Oracle data source and your getting an error about the code page not being right. Don't even try to figure this out...here is the easy fix:

On the OLE DB Source, set the AlwaysUseDefaultCodePage property to True.

Problem Solved!

SB

How to load Oracle with SSIS...FAST!!!!!

Here's the deal. Microsoft isn't going to build a fast loader for Oracle. Why? It's not in there best interest to do so! You bought the database to get SSIS anyway, why load something else? Makes sense for them and I can understand it. In the best of worlds, I'd move it to SQL Server anyway.

If you need to hear it for real, Donald Farmer is the Product Manager for SSIS. Catch what he says here.....

http://www.sqljunkies.com/WebLog/donald_farmer/archive/2005/03/13/8819.aspx


But here's the real deal...a lot of us have HUGE Oracle machine laying around, and a lot of people, myself included, leverage that because we own it. Now how do we get the data into it? SSIS, of course! But on it's own, the OLE DB destination to Oracle is just not that fast....actually it's damn slow. But there is a solution!!!!

While Microsoft isn't going to build a loader, SSIS is extensible enough that anyone with enough wit could do it. Enter Persistent....

This company has built a loader that is really just another destination. Drop it in your toolbox and you are good to go! It is fast...fast like 600,000+ records in 24 seconds....from my laptop!!!!

Check them out at www.persistent.co.in Ask for Sidharth...tell him Scott sent you.....

Data Warehousing with Microsoft SQL Server 2005

So now it begins....

I'm Scott Barrett, and through time, I hope to bring you some understanding on using Microsoft SQL Server 2005 to create a useful data warehouse. Without giving the world away, I work for a major health research facility. We have dozens of systems spread out over 100's of machine. My job....bring it all together and see if we can come up with a cure.

In house, we have Oracle, SQL Server, MySQL, Excel, Access, Cache, Pervasive, Mainframe and some data that I don't even know what is! I've got to pull it all together, build a model and make some sense of this. So here is where I am starting. Our data warehouse is built in Oracle 10g. Our staging and DW will reside there. We are loading the data with SQL Server Integration Services, included with our SQL 2005 installations. I know, I know...why the mismatch? Let's not go there....a good friend once told me, "..sometimes we get to make the decisions, sometimes we just get commands..."

At this point, we're a few weeks in with a few deliverables due early in the new year, and multiple phase planned. Give me a few days to catch up where we are and we'll see how this turns out....

SB