<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>SQL</title><link>http://blogs.acceleration.net/ryan/category/22.aspx</link><description>Things dealing with SQL.</description><managingEditor>Ryan</managingEditor><dc:language>en-US</dc:language><generator>.Text Version 0.95.2004.102</generator><item><dc:creator>Ryan</dc:creator><title>NETWORKIO locks on SQL Server 2000</title><link>http://blogs.acceleration.net/ryan/archive/2006/05/12/2870.aspx</link><pubDate>Fri, 12 May 2006 13:52:00 GMT</pubDate><guid>http://blogs.acceleration.net/ryan/archive/2006/05/12/2870.aspx</guid><description>&lt;p&gt;
	      Recently I was doing some performance tuning for a client who kept getting timeouts.  The timeouts were being caused by some locking queries, with a lock type of "NETWORKIO".  This is odd, as everything was running on the same server, so there shouldn't really be much network IO when connecting to 127.0.0.1.
	    &lt;/p&gt;
	    &lt;h3&gt;My solution&lt;/h3&gt;
	    &lt;p&gt;
	      After much frustration and googling, I saw someone recommend updating statistics using &lt;a href="http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref/ts_sp_ua-uz_14kz.asp"&gt;sp_updatestats&lt;/a&gt; to speed up queries, which might resolve NETWORKIO locks faster.  That made all the difference.  Now the site is zippy as hell.  But why did that work?
	    &lt;/p&gt;
	    &lt;p&gt;
	      After some cursory research, I found that &lt;a href="http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref/ts_sp_ua-uz_14kz.asp"&gt;sp_updatestats&lt;/a&gt; is a convenience procedure for running &lt;a href="http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref/ts_ua-uz_1mpf.asp"&gt;UPDATE STATISTICS&lt;/a&gt; on every user table in the current database.  The statistics in question describe the distribution of key values in each index on a table.  The query optimizer analyzes these stats to decide how best to use the indexes.  If you do a "display estimated execution plan" on a query, the stats are what's used to make those estimates.  Then it made sense.
	    &lt;/p&gt;
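	    &lt;p&gt;
	      As a rough sketch of what that looks like in practice (the table name &lt;code&gt;Items&lt;/code&gt; is only a placeholder, not the client's schema), statistics can be refreshed for one table or for the whole database, and &lt;code&gt;STATS_DATE&lt;/code&gt; reports when the statistics behind each index were last updated:
	    &lt;/p&gt;
&lt;div class="Code"&gt;&lt;pre&gt;
-- Hypothetical table name; substitute your own.
-- Refresh all statistics on a single table.
UPDATE STATISTICS Items

-- Or refresh statistics for every user table in the current database.
EXEC sp_updatestats

-- Check when the statistics for each index on the table were last updated.
SELECT name AS index_name, STATS_DATE(id, indid) AS stats_last_updated
FROM sysindexes
WHERE id = OBJECT_ID('Items')
&lt;/pre&gt;&lt;/div&gt;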
	    &lt;p&gt;
	      In an earlier attempt to speed everything up, I had taken a trace of the activity and run the Index Tuning Wizard on about an hour's worth of actual usage.  Applying the recommended indexes sped up the site nicely, but the real benefit wasn't realized because the statistics on the new indexes hadn't been calculated, so the query optimizer couldn't do its job very well.  After getting my stats lined up, the optimizer kicked in and everything was very responsive.
	    &lt;/p&gt;
	    &lt;h3&gt;Conclusion&lt;/h3&gt;
	    &lt;p&gt;
	      If you change your indexes about, run &lt;code&gt;EXEC sp_updatestats&lt;/code&gt;.  There is an option on a database maintenance plan to do this, and I believe I'm going to start enabling it.
	    &lt;/p&gt;&lt;img src ="http://blogs.acceleration.net/ryan/aggbug/2870.aspx" width = "1" height = "1" /&gt;</description><body xmlns="http://www.w3.org/1999/xhtml"><p>
	      Recently I was doing some performance tuning for a client who kept getting timeouts.  The timeouts were being caused by some locking queries, with a lock type of "NETWORKIO".  This is odd, as everything was running on the same server, so there shouldn't really be much network IO when connecting to 127.0.0.1.
	    </p>
	    <h3>My solution</h3>
	    <p>
	      After much frustration and googling, I saw someone recommend updating statistics using <a href="http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref/ts_sp_ua-uz_14kz.asp">sp_updatestats</a> to speed up queries, which might resolve NETWORKIO locks faster.  That made all the difference.  Now the site is zippy as hell.  But why did that work?
	    </p>
	    <p>
	      After some cursory research, I found that <a href="http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref/ts_sp_ua-uz_14kz.asp">sp_updatestats</a> is a convenience procedure for running <a href="http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref/ts_ua-uz_1mpf.asp">UPDATE STATISTICS</a> on every user table in the current database.  The statistics in question describe the distribution of key values in each index on a table.  The query optimizer analyzes these stats to decide how best to use the indexes.  If you do a "display estimated execution plan" on a query, the stats are what's used to make those estimates.  Then it made sense.
	    </p>
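	    <p>
	      As a rough sketch of what that looks like in practice (the table name <code>Items</code> is only a placeholder, not the client's schema), statistics can be refreshed for one table or for the whole database, and <code>STATS_DATE</code> reports when the statistics behind each index were last updated:
	    </p>
<div class="Code"><pre>
-- Hypothetical table name; substitute your own.
-- Refresh all statistics on a single table.
UPDATE STATISTICS Items

-- Or refresh statistics for every user table in the current database.
EXEC sp_updatestats

-- Check when the statistics for each index on the table were last updated.
SELECT name AS index_name, STATS_DATE(id, indid) AS stats_last_updated
FROM sysindexes
WHERE id = OBJECT_ID('Items')
</pre></div>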
	    <p>
	      In an earlier attempt to speed everything up, I had taken a trace of the activity and run the Index Tuning Wizard on about an hour's worth of actual usage.  Applying the recommended indexes sped up the site nicely, but the real benefit wasn't realized because the statistics on the new indexes hadn't been calculated, so the query optimizer couldn't do its job very well.  After getting my stats lined up, the optimizer kicked in and everything was very responsive.
	    </p>
	    <h3>Conclusion</h3>
	    <p>
	      If you change your indexes about, run <code>EXEC sp_updatestats</code>.  There is an option on a database maintenance plan to do this, and I believe I'm going to start enabling it.
	    </p><img src ="http://blogs.acceleration.net/ryan/aggbug/2870.aspx" width = "1" height = "1" /></body></item><item><dc:creator>Ryan</dc:creator><title>Getting a random sample with SQL Server 2000 revisited</title><link>http://blogs.acceleration.net/ryan/archive/2006/02/24/2832.aspx</link><pubDate>Fri, 24 Feb 2006 11:38:00 GMT</pubDate><guid>http://blogs.acceleration.net/ryan/archive/2006/02/24/2832.aspx</guid><description>&lt;p&gt;
				  I received a &lt;a href="http://blogs.acceleration.net/ryan/archive/2006/02/09/2820.aspx#2830"&gt;helpful comment&lt;/a&gt; on my &lt;a href="http://blogs.acceleration.net/ryan/archive/2006/02/09/2820.aspx"&gt;last entry&lt;/a&gt;, and sat down to try out this seemingly ideal 
				  solution:
				  &lt;/p&gt;&lt;div class="Code"&gt;
&lt;table class="dp-sql" border="0" cellpadding="0" cellspacing="0"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="comment"&gt;--limit results to @limit rows&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;SET&lt;/span&gt;&lt;span&gt; ROWCOUNT @limit &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt; &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;DELETE&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; #&lt;/span&gt;&lt;span class="keyword"&gt;temp&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;WHERE&lt;/span&gt;&lt;span&gt; Id &lt;/span&gt;&lt;span class="op"&gt;NOT&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="op"&gt;IN&lt;/span&gt;&lt;span&gt;( &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="comment"&gt;--due to rowcount, this should be my random sample of @limit Ids&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;SELECT&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;TOP&lt;/span&gt;&lt;span&gt; 100 PERCENT Id &lt;/span&gt;&lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; Items &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;ORDER&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;BY&lt;/span&gt;&lt;span&gt; NEWID() &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;) &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="comment"&gt;--clear the rowcount, so results aren't limited&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;SET&lt;/span&gt;&lt;span&gt; ROWCOUNT 0 
&lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
				  &lt;/div&gt;
				  It didn't work, and I got really confused when I tried to figure out why, so I started making some
				simple tests.  
				
				&lt;h3&gt;Setup and basic test&lt;/h3&gt;
				&lt;p&gt;
				  I started by creating a table variable, and populating it with some data:
				  &lt;/p&gt;&lt;div class="Code"&gt;
&lt;table class="dp-sql" border="0" cellpadding="0" cellspacing="0"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="keyword"&gt;DECLARE&lt;/span&gt;&lt;span&gt; @limiter &lt;/span&gt;&lt;span class="keyword"&gt;TABLE&lt;/span&gt;&lt;span&gt; (id &lt;/span&gt;&lt;span class="keyword"&gt;int&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="op"&gt;not&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="op"&gt;null&lt;/span&gt;&lt;span&gt;) &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="comment"&gt;--get our set into storage&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;INSERT&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;INTO&lt;/span&gt;&lt;span&gt; @limiter (id) ( &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;    &lt;span class="keyword"&gt;SELECT&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;TOP&lt;/span&gt;&lt;span&gt; 2000 Id &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;    &lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; Items &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;) &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
				  &lt;/div&gt;
				Next step, let's see if TOP 100 PERCENT works with ROWCOUNT the way I'd like it to:
				&lt;div class="Code"&gt;
&lt;table class="dp-sql" border="0" cellpadding="0" cellspacing="0"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="keyword"&gt;SET&lt;/span&gt;&lt;span&gt; ROWCOUNT 10 &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt; &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;SELECT&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;TOP&lt;/span&gt;&lt;span&gt; 100 PERCENT id &lt;/span&gt;&lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; @limiter &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;    &lt;span class="keyword"&gt;ORDER&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;BY&lt;/span&gt;&lt;span&gt; NEWID() &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
				  &lt;/div&gt;
				Running this gives me 10 rows, and running it again gives me 10 different rows, so my randomish requirement is met.
				  
				&lt;h3&gt;Subquery test&lt;/h3&gt;
				&lt;p&gt;
				  The next test is my goal, the randomish sample.  I make the 10-row random sample in a subquery, and then delete everything not in that sample.  Yes, this is kinda backwards, but the actual code is more complicated than my simplification here, so it's easier to generate all the possibilities and then remove all that don't fall into our sample.  
				  &lt;/p&gt;&lt;div class="Code"&gt;
				  &lt;table class="dp-sql" border="0" cellpadding="0" cellspacing="0"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="comment"&gt;--limit SELECT results&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;SET&lt;/span&gt;&lt;span&gt; ROWCOUNT 10 &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt; &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;DELETE&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; @limiter &lt;/span&gt;&lt;span class="keyword"&gt;WHERE&lt;/span&gt;&lt;span&gt; Id &lt;/span&gt;&lt;span class="op"&gt;NOT&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="op"&gt;IN&lt;/span&gt;&lt;span&gt; ( &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;    &lt;span class="keyword"&gt;SELECT&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;TOP&lt;/span&gt;&lt;span&gt; 100 PERCENT id &lt;/span&gt;&lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; @limiter &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;    &lt;span class="keyword"&gt;ORDER&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;BY&lt;/span&gt;&lt;span&gt; NEWID() &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;) &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt; &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="comment"&gt;--clear the rowcount, so results aren't limited&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;SET&lt;/span&gt;&lt;span&gt; ROWCOUNT 0 &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;SELECT&lt;/span&gt;&lt;span&gt; * &lt;/span&gt;&lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; @limiter 
&lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
				  &lt;/div&gt;
				I expect this to return 10 rows.  It returns 2000, the full contents of @limiter.  That implies the TOP 100 PERCENT in the subquery returned the full 2000 rows, so nothing was deleted.  Checking the execution plan verifies this.  SET ROWCOUNT doesn't work on subqueries?  Or maybe the TOP 100 PERCENT is confusing SQL Server, so I'll try using a TOP N:
				&lt;div class="Code"&gt;
&lt;table class="dp-sql" border="0" cellpadding="0" cellspacing="0"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="keyword"&gt;SET&lt;/span&gt;&lt;span&gt; ROWCOUNT 10 &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt; &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;DELETE&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; @limiter &lt;/span&gt;&lt;span class="keyword"&gt;WHERE&lt;/span&gt;&lt;span&gt; Id &lt;/span&gt;&lt;span class="op"&gt;NOT&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="op"&gt;IN&lt;/span&gt;&lt;span&gt; ( &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;    &lt;span class="keyword"&gt;SELECT&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;TOP&lt;/span&gt;&lt;span&gt; 20000 id &lt;/span&gt;&lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; @limiter &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;    &lt;span class="keyword"&gt;ORDER&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;BY&lt;/span&gt;&lt;span&gt; NEWID() &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;) &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt; &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="comment"&gt;--clear the rowcount, so results aren't limited&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;SET&lt;/span&gt;&lt;span&gt; ROWCOUNT 0 &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;SELECT&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="func"&gt;COUNT&lt;/span&gt;&lt;span&gt;(id) &lt;/span&gt;&lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; @limiter &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
				  &lt;/div&gt;
				Nope, still get 2000 rows in @limiter after the delete.  For grins, I start playing with the TOP statement, and get some confusing results:
				&lt;table border="1"&gt;
					 &lt;tr&gt;
						&lt;th&gt;TOP N&lt;/th&gt;&lt;th&gt;Rows left in @limiter&lt;/th&gt;
					 &lt;/tr&gt;
					 &lt;tr&gt;
						&lt;td&gt;TOP 20000&lt;/td&gt;&lt;td&gt;2000&lt;/td&gt;
					 &lt;/tr&gt;
					 &lt;tr&gt;
						&lt;td&gt;TOP 2000&lt;/td&gt;&lt;td&gt;2000&lt;/td&gt;
					 &lt;/tr&gt;
					 &lt;tr&gt;
						&lt;td&gt;TOP 1999&lt;/td&gt;&lt;td&gt;1999&lt;/td&gt;
					 &lt;/tr&gt;
					 &lt;tr&gt;
						&lt;td&gt;TOP 1998&lt;/td&gt;&lt;td&gt;1997&lt;/td&gt;
					 &lt;/tr&gt;
					 &lt;tr&gt;
						&lt;td colspan="2"&gt;Huh?  The top 1998 should have returned 1998 rows; the 2 rows not in that sample should have been deleted.  Lemme run that again to make sure I didn't mess up.&lt;/td&gt;
					 &lt;/tr&gt;
					 &lt;tr&gt;
						&lt;td&gt;TOP 1998&lt;/td&gt;&lt;td&gt;2000&lt;/td&gt;
					 &lt;/tr&gt;
					 &lt;tr&gt;
						&lt;td colspan="2"&gt;What the hell?  Is this some weird query caching thing? Let me run it a few more times...&lt;/td&gt;
					 &lt;/tr&gt;
					 &lt;tr&gt;
						&lt;td&gt;TOP 1998&lt;/td&gt;&lt;td&gt;1998&lt;/td&gt;
					 &lt;/tr&gt;
					 &lt;tr&gt;
						&lt;td&gt;TOP 1998&lt;/td&gt;&lt;td&gt;1999&lt;/td&gt;
					 &lt;/tr&gt;
					 &lt;tr&gt;
						&lt;td colspan="2"&gt;Seriously, WTF.  The execution plan just creates more questions:
						  &lt;img src="http://birdman.acceleration.net/ryan/top_bullshit.png" /&gt;
						  It's somehow processing 4M rows from a table variable with 2K rows!?  Then the TOP 1998 gets performed in the Filter operation, which reduces it to 1999 rows?  Ok, so 2K * 2K = 4M, so the WHERE Id NOT IN () is effectively performing a cross join, which makes sense, but why the 1999 rows?
					 &lt;/td&gt;&lt;/tr&gt;
					 &lt;tr&gt;
						&lt;td&gt;TOP 10&lt;/td&gt;&lt;td&gt;1990&lt;/td&gt;
					 &lt;/tr&gt;
				  &lt;/table&gt;
				Ok, so something is seriously inconsistent with ROWCOUNT and TOP in subqueries.  When I take the SET ROWCOUNT out, it still is inconsistent.  When I take out the ORDER BY NEWID(), it is still inconsistent.
				
				&lt;h3&gt;TOP in a subquery is broken?&lt;/h3&gt;
				&lt;p&gt;
				  Consider this repro:
				  &lt;/p&gt;&lt;div class="Code"&gt;
&lt;table class="dp-sql" border="0" cellpadding="0" cellspacing="0"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="keyword"&gt;DECLARE&lt;/span&gt;&lt;span&gt; @t &lt;/span&gt;&lt;span class="keyword"&gt;TABLE&lt;/span&gt;&lt;span&gt; (id &lt;/span&gt;&lt;span class="keyword"&gt;int&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="op"&gt;not&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="op"&gt;null&lt;/span&gt;&lt;span&gt;) &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt; &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;INSERT&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;INTO&lt;/span&gt;&lt;span&gt; @t (id) ( &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;    &lt;span class="keyword"&gt;SELECT&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;TOP&lt;/span&gt;&lt;span&gt; 20 Id &lt;/span&gt;&lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; [someTable] &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;) &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt; &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;SELECT&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="func"&gt;COUNT&lt;/span&gt;&lt;span&gt;(Id) &lt;/span&gt;&lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; @t &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;WHERE&lt;/span&gt;&lt;span&gt; Id &lt;/span&gt;&lt;span class="op"&gt;IN&lt;/span&gt;&lt;span&gt; ( &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;    &lt;span class="keyword"&gt;SELECT&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;TOP&lt;/span&gt;&lt;span&gt; 10 Id &lt;/span&gt;&lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; @t &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;    
&lt;span class="keyword"&gt;ORDER&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;BY&lt;/span&gt;&lt;span&gt; NEWID() &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;) &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
				  &lt;/div&gt;
				What do you think the count should be?  Hint: there's more than one answer.  Run that query repeatedly, and bask in inconsistency.  But... try this one:
				&lt;div class="Code"&gt;
				  &lt;table class="dp-sql" border="0" cellpadding="0" cellspacing="0"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="keyword"&gt;SELECT&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="func"&gt;COUNT&lt;/span&gt;&lt;span&gt;(Id) &lt;/span&gt;&lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; [someTable] &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;WHERE&lt;/span&gt;&lt;span&gt; Id &lt;/span&gt;&lt;span class="op"&gt;IN&lt;/span&gt;&lt;span&gt; ( &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;    &lt;span class="keyword"&gt;SELECT&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;TOP&lt;/span&gt;&lt;span&gt; 10 Id &lt;/span&gt;&lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; [someTable] &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;    &lt;span class="keyword"&gt;ORDER&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;BY&lt;/span&gt;&lt;span&gt; NEWID() &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;) &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
				&lt;/div&gt;
				And bam, we get 10 every time.  So it's not TOP and ORDER BY in subqueries; maybe it's something to do with table variables?  So, with that in mind, let's go back to the original test, but using a temp table instead of a table variable.
				
				&lt;h3&gt;Out with table variables&lt;/h3&gt;
				&lt;p&gt;
				  So, back to the beginning:
				  &lt;/p&gt;&lt;div class="Code"&gt;
&lt;table class="dp-sql" border="0" cellpadding="0" cellspacing="0"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="keyword"&gt;SELECT&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;TOP&lt;/span&gt;&lt;span&gt; 2000 Id &lt;/span&gt;&lt;span class="keyword"&gt;INTO&lt;/span&gt;&lt;span&gt; #limiter &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;    &lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; Items &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt; &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="comment"&gt;--limit SELECT results&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;SET&lt;/span&gt;&lt;span&gt; ROWCOUNT 10 &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt; &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;DELETE&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; #limiter &lt;/span&gt;&lt;span class="keyword"&gt;WHERE&lt;/span&gt;&lt;span&gt; Id &lt;/span&gt;&lt;span class="op"&gt;NOT&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="op"&gt;IN&lt;/span&gt;&lt;span&gt; ( &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;    &lt;span class="keyword"&gt;SELECT&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;TOP&lt;/span&gt;&lt;span&gt; 100 PERCENT id &lt;/span&gt;&lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; #limiter &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;    &lt;span class="keyword"&gt;ORDER&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;BY&lt;/span&gt;&lt;span&gt; NEWID() &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;) &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt; &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td 
class="line"&gt;&lt;span class="comment"&gt;--clear the rowcount, so results aren't limited&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;SET&lt;/span&gt;&lt;span&gt; ROWCOUNT 0 &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;SELECT&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="func"&gt;COUNT&lt;/span&gt;&lt;span&gt;(id) &lt;/span&gt;&lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; #limiter &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
				  &lt;/div&gt;
				Result: 1990 rows.  It should have 10.  So, table variables aren't the problem.
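				&lt;p&gt;
				  One possible way around this, offered as an untested sketch rather than anything verified in this post: SET ROWCOUNT is documented to cap DELETE statements as well as SELECTs, and a subquery ordered by NEWID() may be re-evaluated for each outer row.  Materializing the sample once into its own temp table (&lt;code&gt;#sample&lt;/code&gt; is a made-up name) with an explicit TOP sidesteps both, assuming &lt;code&gt;#limiter&lt;/code&gt; is populated as above:
				&lt;/p&gt;
&lt;div class="Code"&gt;&lt;pre&gt;
-- Make sure no ROWCOUNT limit is in effect; it would also cap the DELETE.
SET ROWCOUNT 0

-- Materialize the random sample once (#sample is a hypothetical name).
SELECT TOP 10 Id INTO #sample
FROM #limiter
ORDER BY NEWID()

-- Delete everything that is not in the stored sample.
DELETE FROM #limiter WHERE Id NOT IN (SELECT Id FROM #sample)

-- Count what is left in #limiter.
SELECT COUNT(Id) FROM #limiter

DROP TABLE #sample
&lt;/pre&gt;&lt;/div&gt;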
				
				&lt;h3&gt;Fine, you win, SQL Server&lt;/h3&gt;
				&lt;p&gt;
				  Fine.  If you want it that badly, MSSQL, you can have it.  I'll keep my slimy solution from the &lt;a href="http://blogs.acceleration.net/ryan/archive/2006/02/09/2820.aspx"&gt;last entry&lt;/a&gt; (for now), as you obviously don't want to let me do anything in a straightforward manner.  I'll try to forget the inconsistencies I found today, but it'll take time to heal.  Be patient with me.  And don't get all jealous if you see me installing &lt;a href="http://www.postgresql.org/"&gt;postgresql&lt;/a&gt; to see if it can do this better than you.  You know I'm locked in to you.
				&lt;/p&gt;&lt;img src ="http://blogs.acceleration.net/ryan/aggbug/2832.aspx" width = "1" height = "1" /&gt;</description><body xmlns="http://www.w3.org/1999/xhtml"><p>
				  I received a <a href="http://blogs.acceleration.net/ryan/archive/2006/02/09/2820.aspx#2830">helpful comment</a> on my <a href="http://blogs.acceleration.net/ryan/archive/2006/02/09/2820.aspx">last entry</a>, and sat down to try out this seemingly ideal 
				  solution:
				  </p><div class="Code">
<table class="dp-sql" border="0" cellpadding="0" cellspacing="0"><tbody><tr></tr><tr><td class="line"><span></span><span class="comment">--limit results to @limit rows</span><span> </span></td></tr><tr><td class="line"><span class="keyword">SET</span><span> ROWCOUNT @limit </span></td></tr><tr><td class="line"> </td></tr><tr><td class="line"><span class="keyword">DELETE</span><span> </span><span class="keyword">FROM</span><span> #</span><span class="keyword">temp</span><span> </span><span class="keyword">WHERE</span><span> Id </span><span class="op">NOT</span><span> </span><span class="op">IN</span><span>( </span></td></tr><tr><td class="line"><span class="comment">--due to rowcount, this should be my random sample of @limit Ids</span><span> </span></td></tr><tr><td class="line"><span class="keyword">SELECT</span><span> </span><span class="keyword">TOP</span><span> 100 PERCENT Id </span><span class="keyword">FROM</span><span> Items </span></td></tr><tr><td class="line"><span class="keyword">ORDER</span><span> </span><span class="keyword">BY</span><span> NEWID() </span></td></tr><tr><td class="line">) </td></tr><tr><td class="line"><span class="comment">--clear the rowcount, so results aren't limited</span><span> </span></td></tr><tr><td class="line"><span class="keyword">SET</span><span> ROWCOUNT 0 </span></td></tr></tbody></table>
				  </div>
				  It didn't work, and I got really confused when I tried to figure out why, so I started making some
				simple tests.  
				
				<h3>Setup and basic test</h3>
				<p>
				  I started by creating a table variable, and populating it with some data:
				  </p><div class="Code">
<table class="dp-sql" border="0" cellpadding="0" cellspacing="0"><tbody><tr></tr><tr><td class="line"><span></span><span class="keyword">DECLARE</span><span> @limiter </span><span class="keyword">TABLE</span><span> (id </span><span class="keyword">int</span><span> </span><span class="op">not</span><span> </span><span class="op">null</span><span>) </span></td></tr><tr><td class="line"><span class="comment">--get our set into storage</span><span> </span></td></tr><tr><td class="line"><span class="keyword">INSERT</span><span> </span><span class="keyword">INTO</span><span> @limiter (id) ( </span></td></tr><tr><td class="line">    <span class="keyword">SELECT</span><span> </span><span class="keyword">TOP</span><span> 2000 Id </span></td></tr><tr><td class="line">    <span class="keyword">FROM</span><span> Items </span></td></tr><tr><td class="line">) </td></tr></tbody></table>
				  </div>
				Next step, let's see if TOP 100 PERCENT works with ROWCOUNT the way I'd like it to:
				<div class="Code">
<table class="dp-sql" border="0" cellpadding="0" cellspacing="0"><tbody><tr></tr><tr><td class="line"><span></span><span class="keyword">SET</span><span> ROWCOUNT 10 </span></td></tr><tr><td class="line"> </td></tr><tr><td class="line"><span class="keyword">SELECT</span><span> </span><span class="keyword">TOP</span><span> 100 PERCENT id </span><span class="keyword">FROM</span><span> @limiter </span></td></tr><tr><td class="line">    <span class="keyword">ORDER</span><span> </span><span class="keyword">BY</span><span> NEWID() </span></td></tr></tbody></table>
				  </div>
				Running this gives me 10 rows, and running it again gives me 10 different rows, so my randomish requirement is met.
				  
				<h3>Subquery test</h3>
				<p>
				  The next test is my goal, the randomish sample.  I make the 10-row random sample in a subquery, and then delete everything not in that sample.  Yes, this is kinda backwards, but the actual code is more complicated than my simplification here, so it's easier to generate all the possibilities and then remove all that don't fall into our sample.  
				  </p><div class="Code">
				  <table class="dp-sql" border="0" cellpadding="0" cellspacing="0"><tbody><tr></tr><tr><td class="line"><span></span><span class="comment">--limit SELECT results</span><span> </span></td></tr><tr><td class="line"><span class="keyword">SET</span><span> ROWCOUNT 10 </span></td></tr><tr><td class="line"> </td></tr><tr><td class="line"><span class="keyword">DELETE</span><span> </span><span class="keyword">FROM</span><span> @limiter </span><span class="keyword">WHERE</span><span> Id </span><span class="op">NOT</span><span> </span><span class="op">IN</span><span> ( </span></td></tr><tr><td class="line">    <span class="keyword">SELECT</span><span> </span><span class="keyword">TOP</span><span> 100 PERCENT id </span><span class="keyword">FROM</span><span> @limiter </span></td></tr><tr><td class="line">    <span class="keyword">ORDER</span><span> </span><span class="keyword">BY</span><span> NEWID() </span></td></tr><tr><td class="line">) </td></tr><tr><td class="line"> </td></tr><tr><td class="line"><span class="comment">--clear the rowcount, so results aren't limited</span><span> </span></td></tr><tr><td class="line"><span class="keyword">SET</span><span> ROWCOUNT 0 </span></td></tr><tr><td class="line"><span class="keyword">SELECT</span><span> * </span><span class="keyword">FROM</span><span> @limiter </span></td></tr></tbody></table>
				  </div>
				I expect this to return 10 rows.  It returns 2000, the full contents of @limiter.  That implies the TOP 100 PERCENT in the subquery returned the full 2000 rows, so nothing was deleted.  Checking the execution plan verifies this.  SET ROWCOUNT doesn't work on subqueries?  Or maybe the TOP 100 PERCENT is confusing SQL Server, so I'll try using a TOP N:
				<div class="Code">
<table class="dp-sql" border="0" cellpadding="0" cellspacing="0"><tbody><tr></tr><tr><td class="line"><span></span><span class="keyword">SET</span><span> ROWCOUNT 10 </span></td></tr><tr><td class="line"> </td></tr><tr><td class="line"><span class="keyword">DELETE</span><span> </span><span class="keyword">FROM</span><span> @limiter </span><span class="keyword">WHERE</span><span> Id </span><span class="op">NOT</span><span> </span><span class="op">IN</span><span> ( </span></td></tr><tr><td class="line">    <span class="keyword">SELECT</span><span> </span><span class="keyword">TOP</span><span> 20000 id </span><span class="keyword">FROM</span><span> @limiter </span></td></tr><tr><td class="line">    <span class="keyword">ORDER</span><span> </span><span class="keyword">BY</span><span> NEWID() </span></td></tr><tr><td class="line">) </td></tr><tr><td class="line"> </td></tr><tr><td class="line"><span class="comment">--clear the rowcount, so results aren't limited</span><span> </span></td></tr><tr><td class="line"><span class="keyword">SET</span><span> ROWCOUNT 0 </span></td></tr><tr><td class="line"><span class="keyword">SELECT</span><span> </span><span class="func">COUNT</span><span>(id) </span><span class="keyword">FROM</span><span> @limiter </span></td></tr></tbody></table>
				  </div>
				Nope, I still get 2000 rows in @limiter after the delete.  For grins, I started playing with the TOP value, and got some confusing results:
				<table border="1">
					 <tr>
						<th>TOP N</th><th>Rows left in @limiter</th>
					 </tr>
					 <tr>
						<td>TOP 20000</td><td>2000</td>
					 </tr>
					 <tr>
						<td>TOP 2000</td><td>2000</td>
					 </tr>
					 <tr>
						<td>TOP 1999</td><td>1999</td>
					 </tr>
					 <tr>
						<td>TOP 1998</td><td>1997</td>
					 </tr>
					 <tr>
						<td colspan="2">Huh?  The top 1998 should have returned 1998 rows; the 2 rows not in that sample should have been deleted.  Lemme run that again to make sure I didn't mess up.</td>
					 </tr>
					 <tr>
						<td>TOP 1998</td><td>2000</td>
					 </tr>
					 <tr>
						<td colspan="2">What the hell?  Is this some weird query caching thing? Let me run it a few more times...</td>
					 </tr>
					 <tr>
						<td>TOP 1998</td><td>1998</td>
					 </tr>
					 <tr>
						<td>TOP 1998</td><td>1999</td>
					 </tr>
					 <tr>
						<td colspan="2">Seriously, WTF.  The execution plan just creates more questions:
						  <img src="http://birdman.acceleration.net/ryan/top_bullshit.png" />
						  It's somehow processing 4M rows from a table variable with 2K rows!?  Then the TOP 1998 gets performed in the Filter operation, which reduces it to 1999 rows?  Ok, so 2K * 2K = 4M, so the WHERE Id NOT IN () is effectively performing a cross join, which makes sense, but why the 1999 rows?
					 </td></tr>
					 <tr>
						<td>TOP 10</td><td>1990</td>
					 </tr>
				  </table>
				Ok, so something is seriously inconsistent with ROWCOUNT and TOP in subqueries.  When I take the SET ROWCOUNT out, it is still inconsistent.  When I take out the ORDER BY NEWID(), it is still inconsistent.
				
				<h3>TOP in a subquery is broken?</h3>
				<p>
				  Consider this repro:
				  </p><div class="Code">
<table class="dp-sql" border="0" cellpadding="0" cellspacing="0"><tbody><tr></tr><tr><td class="line"><span></span><span class="keyword">DECLARE</span><span> @t </span><span class="keyword">TABLE</span><span> (id </span><span class="keyword">int</span><span> </span><span class="op">not</span><span> </span><span class="op">null</span><span>) </span></td></tr><tr><td class="line"> </td></tr><tr><td class="line"><span class="keyword">INSERT</span><span> </span><span class="keyword">INTO</span><span> @t (id) ( </span></td></tr><tr><td class="line">    <span class="keyword">SELECT</span><span> </span><span class="keyword">TOP</span><span> 20 Id </span><span class="keyword">FROM</span><span> [someTable] </span></td></tr><tr><td class="line">) </td></tr><tr><td class="line"> </td></tr><tr><td class="line"><span class="keyword">SELECT</span><span> </span><span class="func">COUNT</span><span>(Id) </span><span class="keyword">FROM</span><span> @t </span></td></tr><tr><td class="line"><span class="keyword">WHERE</span><span> Id </span><span class="op">IN</span><span> ( </span></td></tr><tr><td class="line">    <span class="keyword">SELECT</span><span> </span><span class="keyword">TOP</span><span> 10 Id </span><span class="keyword">FROM</span><span> @t </span></td></tr><tr><td class="line">    <span class="keyword">ORDER</span><span> </span><span class="keyword">BY</span><span> NEWID() </span></td></tr><tr><td class="line">) </td></tr></tbody></table>
				  </div>
				What do you think the count should be?  Hint: there's more than one answer.  Run that query repeatedly, and bask in inconsistency.  But... try this one:
				<div class="Code">
				  <table class="dp-sql" border="0" cellpadding="0" cellspacing="0"><tbody><tr></tr><tr><td class="line"><span></span><span class="keyword">SELECT</span><span> </span><span class="func">COUNT</span><span>(Id) </span><span class="keyword">FROM</span><span> [someTable] </span></td></tr><tr><td class="line"><span class="keyword">WHERE</span><span> Id </span><span class="op">IN</span><span> ( </span></td></tr><tr><td class="line">    <span class="keyword">SELECT</span><span> </span><span class="keyword">TOP</span><span> 10 Id </span><span class="keyword">FROM</span><span> [someTable] </span></td></tr><tr><td class="line">    <span class="keyword">ORDER</span><span> </span><span class="keyword">BY</span><span> NEWID() </span></td></tr><tr><td class="line">) </td></tr></tbody></table>
				</div>
				And bam, we get 10 every time.  So it's not TOP and ORDER BY in subqueries; maybe it's something to do with table variables?  With that in mind, let's go back to the original test, but using a temp table instead of a table variable.
				
				<h3>Out with table variables</h3>
				<p>
				  So, back to the beginning:
				  </p><div class="Code">
<table class="dp-sql" border="0" cellpadding="0" cellspacing="0"><tbody><tr></tr><tr><td class="line"><span></span><span class="keyword">SELECT</span><span> </span><span class="keyword">TOP</span><span> 2000 Id </span><span class="keyword">INTO</span><span> #limiter </span></td></tr><tr><td class="line">    <span class="keyword">FROM</span><span> Items </span></td></tr><tr><td class="line"> </td></tr><tr><td class="line"><span class="comment">--limit SELECT results</span><span> </span></td></tr><tr><td class="line"><span class="keyword">SET</span><span> ROWCOUNT 10 </span></td></tr><tr><td class="line"> </td></tr><tr><td class="line"><span class="keyword">DELETE</span><span> </span><span class="keyword">FROM</span><span> #limiter </span><span class="keyword">WHERE</span><span> Id </span><span class="op">NOT</span><span> </span><span class="op">IN</span><span> ( </span></td></tr><tr><td class="line">    <span class="keyword">SELECT</span><span> </span><span class="keyword">TOP</span><span> 100 PERCENT id </span><span class="keyword">FROM</span><span> #limiter </span></td></tr><tr><td class="line">    <span class="keyword">ORDER</span><span> </span><span class="keyword">BY</span><span> NEWID() </span></td></tr><tr><td class="line">) </td></tr><tr><td class="line"> </td></tr><tr><td class="line"><span class="comment">--clear the rowcount, so results aren't limited</span><span> </span></td></tr><tr><td class="line"><span class="keyword">SET</span><span> ROWCOUNT 0 </span></td></tr><tr><td class="line"><span class="keyword">SELECT</span><span> </span><span class="func">COUNT</span><span>(id) </span><span class="keyword">FROM</span><span> #limiter </span></td></tr></tbody></table>
				  </div>
				Result: 1990 rows.  It should have 10.  So, table variables aren't the problem.
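				One possible reading of the 4M-row plan above is that the subquery gets re-run for each outer row, so NEWID() hands every row a different "random" TOP N.  I haven't confirmed that, but if it is right, picking the sample once and storing it should sidestep the problem.  A minimal sketch, assuming the #limiter table from the test above still exists; #sample is a made-up name, and the literal 10 stands in for the runtime limit:

```sql
-- #sample is a hypothetical scratch table, not part of the original test.
-- Step 1: pick the random sample exactly once and store it.
SELECT TOP 10 Id
INTO #sample
FROM #limiter
ORDER BY NEWID()

-- Step 2: delete against the stored sample. This subquery has no NEWID(),
-- so re-running it per row cannot change its result.
DELETE FROM #limiter
WHERE Id NOT IN (SELECT Id FROM #sample)

-- With unique, non-null Ids this should report 10.
SELECT COUNT(Id) FROM #limiter

DROP TABLE #sample
```

				The cost is one extra temp table, but the sample is fixed before the DELETE ever looks at it.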
				
				<h3>Fine, you win, SQL Server</h3>
				<p>
				  Fine.  If you want it that badly, MSSQL, you can have it.  I'll keep my slimy solution from the <a href="http://blogs.acceleration.net/ryan/archive/2006/02/09/2820.aspx">last entry</a> (for now), as you obviously don't want to let me do anything in a straightforward manner.  I'll try to forget the inconsistencies I found today, but it'll take time to heal.  Be patient with me.  And don't get all jealous if you see me installing <a href="http://www.postgresql.org/">PostgreSQL</a> to see if it can do this better than you.  You know I'm locked in to you.
				</p><img src ="http://blogs.acceleration.net/ryan/aggbug/2832.aspx" width = "1" height = "1" /></body></item><item><dc:creator>Ryan</dc:creator><title>Getting a randomish sample with SQL Server 2000</title><link>http://blogs.acceleration.net/ryan/archive/2006/02/09/2820.aspx</link><pubDate>Thu, 09 Feb 2006 19:10:00 GMT</pubDate><guid>http://blogs.acceleration.net/ryan/archive/2006/02/09/2820.aspx</guid><description>&lt;p&gt;
				  I was recently presented with a seemingly simple task: write a query to return a random&lt;span style="color:blue;" title="for some values of random"&gt;*&lt;/span&gt; sample of rows
				  from a table, with the number of rows to pull determined at runtime.  
				  The surrounding problem is a bit more complex, but this was the specific task I was trying to accomplish,
				  and SQL Server 2000 didn't make it very easy.  The rest of my task involved deleting rows where the id wasn't in my random sample.
				&lt;/p&gt;
				&lt;h4&gt;First try, my almost-dream syntax&lt;/h4&gt;
				&lt;p&gt;My first attempt was a bit naive:
				&lt;/p&gt;&lt;div&gt;
&lt;table class="dp-sql" border="0" cellpadding="0" cellspacing="0"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="keyword"&gt;SELECT&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;TOP&lt;/span&gt;&lt;span&gt; @limit Id &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; Items &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;ORDER&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;BY&lt;/span&gt;&lt;span&gt; NEWID() &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
				&lt;/div&gt;
				&lt;code&gt;@limit&lt;/code&gt; is a variable which has how many rows I should be returning.
				I order by NEWID(), which makes a GUID for every row, as this is the best way I know of to get it ordered randomly.  I don't
				think it's a terribly good way, but it's the best I know.  If you have a better way, please comment or email.
			 
				&lt;p&gt;
				  That snippet will of course fail, giving this error message:
				&lt;/p&gt;&lt;div&gt;Incorrect syntax near '@limit'.&lt;/div&gt;
				Ok, so I can't dynamically set the TOP.  Fine.
			 
				&lt;h4&gt;Use ROWCOUNT&lt;/h4&gt;
				&lt;p&gt;
				  By using SET ROWCOUNT @limit, I could instruct SQL Server to only return @limit rows when performing a SELECT.  So, this was the query to 
				  delete rows not in my random sample:
				  &lt;/p&gt;&lt;div&gt;
&lt;table class="dp-sql" border="0" cellpadding="0" cellspacing="0"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="comment"&gt;--limit results to @limit rows&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;SET&lt;/span&gt;&lt;span&gt; ROWCOUNT @limit &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt; &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;DELETE&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; #&lt;/span&gt;&lt;span class="keyword"&gt;temp&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;WHERE&lt;/span&gt;&lt;span&gt; Id &lt;/span&gt;&lt;span class="op"&gt;NOT&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="op"&gt;IN&lt;/span&gt;&lt;span&gt;( &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;    &lt;span class="comment"&gt;--due to rowcount, this should be my random sample of @limit Ids&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;    &lt;span class="keyword"&gt;SELECT&lt;/span&gt;&lt;span&gt; Id &lt;/span&gt;&lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; Items &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;    &lt;span class="keyword"&gt;ORDER&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;BY&lt;/span&gt;&lt;span&gt; NEWID() &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;    ) &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="comment"&gt;--clear the rowcount, so results aren't limited&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;SET&lt;/span&gt;&lt;span&gt; ROWCOUNT 0 &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
				  &lt;/div&gt;
					That snippet also fails, giving this error:
				&lt;div class="Code"&gt;The ORDER BY clause is invalid in views, inline functions, derived tables, and subqueries, unless TOP is also specified.&lt;/div&gt;
				Ok, so I can't use ORDER BY in the subquery.  Well damn, that would have been really clean.  I could add a &lt;code&gt;TOP 2147483647&lt;/code&gt; in there, but that seemed like a bad idea.
				
				&lt;h4&gt;Use a table variable&lt;/h4&gt;
				&lt;p&gt;
				  Ok, next thought is to use a table variable, insert my rows in a random order, numbering each row, and then pull the rows with a number
				  at or below my limit:
				&lt;/p&gt;&lt;div&gt;
&lt;table class="dp-sql" border="0" cellpadding="0" cellspacing="0"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="comment"&gt;--make the table variable, the num column will count upwards as we insert, numbering the rows&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;DECLARE&lt;/span&gt;&lt;span&gt; @&lt;/span&gt;&lt;span class="keyword"&gt;temp&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;TABLE&lt;/span&gt;&lt;span&gt; (id &lt;/span&gt;&lt;span class="keyword"&gt;int&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="op"&gt;NOT&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="op"&gt;NULL&lt;/span&gt;&lt;span&gt;, num &lt;/span&gt;&lt;span class="keyword"&gt;int&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="op"&gt;NOT&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="op"&gt;NULL&lt;/span&gt;&lt;span&gt; IDENTITY(1,1)) &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt; &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="comment"&gt;--insert into the table in random order&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;INSERT&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;INTO&lt;/span&gt;&lt;span&gt; @&lt;/span&gt;&lt;span class="keyword"&gt;temp&lt;/span&gt;&lt;span&gt; (id) &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;(&lt;span class="keyword"&gt;SELECT&lt;/span&gt;&lt;span&gt; Id  &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; Items &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;ORDER&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;BY&lt;/span&gt;&lt;span&gt; NEWID()) 
&lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt; &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="comment"&gt;--get out our sample&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;SELECT&lt;/span&gt;&lt;span&gt; Id &lt;/span&gt;&lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; @&lt;/span&gt;&lt;span class="keyword"&gt;temp&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;WHERE&lt;/span&gt;&lt;span&gt; num &amp;lt;= @limit &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
				&lt;/div&gt;
				That snippet also fails, giving this error:
				&lt;div class="Code"&gt;Incorrect syntax near the keyword 'ORDER'.&lt;/div&gt;
				Oh right, I can't use ORDER BY in the subquery.  Fine.
				
				&lt;h4&gt;Use a temporary table and INSERT INTO&lt;/h4&gt;
				&lt;p&gt;
				  I was trying to avoid temporary tables, since they have some performance issues, but figured maybe there wasn't any other way.  So,
				  I broke down and used a temp table:
				  &lt;/p&gt;&lt;div&gt;
				  &lt;table class="dp-sql" border="0" cellpadding="0" cellspacing="0"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="comment"&gt;--select into the temp table, using IDENTITY to get my row counter&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;SELECT&lt;/span&gt;&lt;span&gt; Id, IDENTITY(&lt;/span&gt;&lt;span class="keyword"&gt;int&lt;/span&gt;&lt;span&gt;, 1,1) [num] &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;INTO&lt;/span&gt;&lt;span&gt; #&lt;/span&gt;&lt;span class="keyword"&gt;temp&lt;/span&gt;&lt;span&gt;  &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; Items &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;ORDER&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;BY&lt;/span&gt;&lt;span&gt; NEWID() &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt; &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="comment"&gt;--get out our sample&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;SELECT&lt;/span&gt;&lt;span&gt; Id &lt;/span&gt;&lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; #&lt;/span&gt;&lt;span class="keyword"&gt;temp&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;WHERE&lt;/span&gt;&lt;span&gt; num &amp;lt;= @limit &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt; &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="comment"&gt;--clean up&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;DROP&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;TABLE&lt;/span&gt;&lt;span&gt; 
#&lt;/span&gt;&lt;span class="keyword"&gt;temp&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
				  &lt;/div&gt;
				That snippet also fails, giving this error:
				&lt;div class="Code"&gt;Cannot add identity column, using the SELECT INTO statement, to table '#temp', which already has column 'Id' that inherits the identity property.&lt;/div&gt;
				Ok... So the 'identity' property gets inherited into the temp table, and so I can't specify another identity column.  
				I did some searching in the docs, and there didn't seem to be a way to get an autoincrement field without it
				being an identity field.  Ok, fine, I know a retarded way to get around that.
				
				&lt;h4&gt;Use a temporary table, INSERT INTO, and a table variable to mask the identity column&lt;/h4&gt;
				&lt;p&gt;
				  I actually wrote a variant of this:
				  &lt;/p&gt;&lt;div&gt;
				  &lt;table class="dp-sql" border="0" cellpadding="0" cellspacing="0"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="comment"&gt;--Make a table to copy my ids into&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;DECLARE&lt;/span&gt;&lt;span&gt; @retarded &lt;/span&gt;&lt;span class="keyword"&gt;TABLE&lt;/span&gt;&lt;span&gt; (id &lt;/span&gt;&lt;span class="keyword"&gt;int&lt;/span&gt;&lt;span&gt;) &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt; &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="comment"&gt;--copy all my ids in, but leave the 'identity' property behind&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;INSERT&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;INTO&lt;/span&gt;&lt;span&gt; @retarded (id) &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;    (&lt;span class="keyword"&gt;SELECT&lt;/span&gt;&lt;span&gt; Id &lt;/span&gt;&lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; Items) &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt; &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="comment"&gt;--select into the temp table, using IDENTITY to get my row counter&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;SELECT&lt;/span&gt;&lt;span&gt; Id, IDENTITY(&lt;/span&gt;&lt;span class="keyword"&gt;int&lt;/span&gt;&lt;span&gt;, 1,1) [num] &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;INTO&lt;/span&gt;&lt;span&gt; #&lt;/span&gt;&lt;span class="keyword"&gt;temp&lt;/span&gt;&lt;span&gt;  &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span 
class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; @retarded &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;ORDER&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;BY&lt;/span&gt;&lt;span&gt; NEWID() &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt; &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="comment"&gt;--get out our sample&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;SELECT&lt;/span&gt;&lt;span&gt; Id &lt;/span&gt;&lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; #&lt;/span&gt;&lt;span class="keyword"&gt;temp&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;WHERE&lt;/span&gt;&lt;span&gt; num &amp;lt;= @limit &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt; &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="comment"&gt;--clean up&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;DROP&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;TABLE&lt;/span&gt;&lt;span&gt; #&lt;/span&gt;&lt;span class="keyword"&gt;temp&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
				  &lt;/div&gt;
				  This worked, but made me sick to my stomach.  There &lt;em&gt;had&lt;/em&gt; to be a better way.
				
				&lt;h4&gt;Use a temporary table, INSERT INTO, and a dummy expression&lt;/h4&gt;
				&lt;p&gt;
				  So, I dug through some more documentation, finding this gem:
				  &lt;/p&gt;&lt;blockquote&gt;
				  When an existing identity column is selected into a new table, the new column inherits the IDENTITY property, unless one of the following conditions is true: 
				  &lt;ul&gt;
					 &lt;li&gt;The SELECT statement contains a join, GROUP BY clause, or aggregate function.&lt;/li&gt;
					 &lt;li&gt;Multiple SELECT statements are joined by using UNION.&lt;/li&gt;
					 &lt;li&gt;The identity column is listed more than one time in the select list.&lt;/li&gt;
					 &lt;li&gt;The identity column is part of an expression.&lt;/li&gt;
				  &lt;/ul&gt;
				  If any one of these conditions is true, the column is created NOT NULL instead of inheriting the IDENTITY property. All rules and restrictions for the identity columns apply to the new table.
				&lt;/blockquote&gt;
				Ok, so I decided the least despicable thing to do would be to make a dummy expression to drop the identity property:
				&lt;div class="Code"&gt;
&lt;table class="dp-sql" border="0" cellpadding="0" cellspacing="0"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="comment"&gt;--select into the temp table, using IDENTITY to get my row counter&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;SELECT&lt;/span&gt;&lt;span&gt; Id*1 [Id], IDENTITY(&lt;/span&gt;&lt;span class="keyword"&gt;int&lt;/span&gt;&lt;span&gt;, 1,1) [num] &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;INTO&lt;/span&gt;&lt;span&gt; #&lt;/span&gt;&lt;span class="keyword"&gt;temp&lt;/span&gt;&lt;span&gt;  &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; Items &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;ORDER&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;BY&lt;/span&gt;&lt;span&gt; NEWID() &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt; &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="comment"&gt;--get out our sample&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;SELECT&lt;/span&gt;&lt;span&gt; Id &lt;/span&gt;&lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; #&lt;/span&gt;&lt;span class="keyword"&gt;temp&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;WHERE&lt;/span&gt;&lt;span&gt; num &amp;lt;= @limit &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt; &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="comment"&gt;--clean up&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;DROP&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;TABLE&lt;/span&gt;&lt;span&gt; 
#&lt;/span&gt;&lt;span class="keyword"&gt;temp&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
				  &lt;/div&gt;
				I did some profiling, and &lt;code&gt;id*1&lt;/code&gt; performed the same as &lt;code&gt;id+0&lt;/code&gt;, so I left it as multiplication.
				  
				&lt;p&gt;
				  All in all, very, very frustrating, and I'm not sure if the &lt;code&gt;TOP 2147483647&lt;/code&gt; option above is worse than the hack at the end.  Votes are welcome.&lt;br /&gt;&lt;br /&gt;

				  I hate how much I jump through hoops and reinvent the wheel to do what should be very simple tasks.  
				  I mean, c'mon, we've been programming for over 50 years now, and I'm pretty sure pulling a random sample from a dataset is
				  not a new task.  We should be better than this.
				&lt;/p&gt;&lt;img src ="http://blogs.acceleration.net/ryan/aggbug/2820.aspx" width = "1" height = "1" /&gt;</description><body xmlns="http://www.w3.org/1999/xhtml"><p>
				  I was recently presented with a seemingly simple task: write a query to return a random<span style="color:blue;" title="for some values of random">*</span> sample of rows
				  from a table, with the number of rows to pull determined at runtime.  
				  The surrounding problem is a bit more complex, but this was the specific task I was trying to accomplish,
				  and SQL Server 2000 didn't make it very easy.  The rest of my task involved deleting rows where the id wasn't in my random sample.
				</p>
				<h4>First try, my almost-dream syntax</h4>
				<p>My first attempt was a bit naive:
				</p><div>
<table class="dp-sql" border="0" cellpadding="0" cellspacing="0"><tbody><tr></tr><tr><td class="line"><span></span><span class="keyword">SELECT</span><span> </span><span class="keyword">TOP</span><span> @limit Id </span></td></tr><tr><td class="line"><span class="keyword">FROM</span><span> Items </span></td></tr><tr><td class="line"><span class="keyword">ORDER</span><span> </span><span class="keyword">BY</span><span> NEWID() </span></td></tr></tbody></table>
				</div>
				<code>@limit</code> is a variable which has how many rows I should be returning.
				I order by NEWID(), which makes a GUID for every row, as this is the best way I know of to get it ordered randomly.  I don't
				think it's a terribly good way, but it's the best I know.  If you have a better way, please comment or email.
			 
				<p>
				  That snippet will of course fail, giving this error message:
				</p><div>Incorrect syntax near '@limit'.</div>
				Ok, so I can't dynamically set the TOP.  Fine.
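				For what it's worth, one commonly used way around the literal-only TOP is to build the statement as a string and run it with sp_executesql, so that TOP sees a plain number.  This is only a sketch: @sql is a hypothetical helper variable, the value 10 is a placeholder for whatever the caller supplies, and the result comes back from a nested batch, which makes it awkward to use inside a larger DELETE.

```sql
-- @limit is the runtime row count; @sql is a hypothetical helper variable.
DECLARE @limit int
DECLARE @sql nvarchar(200)
SET @limit = 10

-- Build the statement as text so TOP receives a literal number.
-- @limit is an int, so the CAST cannot carry arbitrary SQL into the string.
SET @sql = N'SELECT TOP ' + CAST(@limit AS nvarchar(10))
         + N' Id FROM Items ORDER BY NEWID()'

EXEC sp_executesql @sql
```

				Whether dynamic SQL is any less slimy than the alternatives is a matter of taste.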
			 
				<h4>Use ROWCOUNT</h4>
				<p>
				  By using SET ROWCOUNT @limit, I could instruct SQL Server to only return @limit rows when performing a SELECT.  So, this was the query to 
				  delete rows not in my random sample:
				  </p><div>
<table class="dp-sql" border="0" cellpadding="0" cellspacing="0"><tbody><tr></tr><tr><td class="line"><span></span><span class="comment">--limit results to @limit rows</span><span> </span></td></tr><tr><td class="line"><span class="keyword">SET</span><span> ROWCOUNT @limit </span></td></tr><tr><td class="line"> </td></tr><tr><td class="line"><span class="keyword">DELETE</span><span> </span><span class="keyword">FROM</span><span> #</span><span class="keyword">temp</span><span> </span><span class="keyword">WHERE</span><span> Id </span><span class="op">NOT</span><span> </span><span class="op">IN</span><span>( </span></td></tr><tr><td class="line">    <span class="comment">--due to rowcount, this should be my random sample of @limit Ids</span><span> </span></td></tr><tr><td class="line">    <span class="keyword">SELECT</span><span> Id </span><span class="keyword">FROM</span><span> Items </span></td></tr><tr><td class="line">    <span class="keyword">ORDER</span><span> </span><span class="keyword">BY</span><span> NEWID() </span></td></tr><tr><td class="line">    ) </td></tr><tr><td class="line"><span class="comment">--clear the rowcount, so results aren't limited</span><span> </span></td></tr><tr><td class="line"><span class="keyword">SET</span><span> ROWCOUNT 0 </span></td></tr></tbody></table>
				  </div>
					That snippet also fails, giving this error:
				<div class="Code">The ORDER BY clause is invalid in views, inline functions, derived tables, and subqueries, unless TOP is also specified.</div>
				Ok, so I can't use ORDER BY in the subquery.  Well damn, that would have been really clean.  I could add a <code>TOP 2147483647</code> in there, but that seemed like a bad idea.
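				The restriction in that error message is about subqueries, so as far as I can tell a top-level SELECT can still combine SET ROWCOUNT with ORDER BY NEWID().  A minimal sketch, assuming the same Items table; the value 10 is a placeholder for the runtime limit.  It only covers the "give me @limit random ids" half of the task, not the delete:

```sql
-- @limit is the runtime row count; 10 is a placeholder value.
DECLARE @limit int
SET @limit = 10

-- SET ROWCOUNT accepts a variable, unlike TOP on SQL Server 2000.
SET ROWCOUNT @limit

-- ORDER BY is allowed here because this is not a subquery.
SELECT Id FROM Items ORDER BY NEWID()

-- Clear the rowcount so later statements aren't limited.
SET ROWCOUNT 0
```

				Forgetting the final SET ROWCOUNT 0 would silently limit every later statement on the connection, which is the main hazard of this approach.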
				
				<h4>Use a table variable</h4>
				<p>
				  Ok, next thought is to use a table variable, insert my rows in a random order, numbering each row, and then pull the rows with a number
				  at or below my limit:
				</p><div>
<table class="dp-sql" border="0" cellpadding="0" cellspacing="0"><tbody><tr></tr><tr><td class="line"><span></span><span class="comment">--make the table variable, the num column will count upwards as we insert, numbering the rows</span><span> </span></td></tr><tr><td class="line"><span class="keyword">DECLARE</span><span> @</span><span class="keyword">temp</span><span> </span><span class="keyword">TABLE</span><span> (id </span><span class="keyword">int</span><span> </span><span class="op">NOT</span><span> </span><span class="op">NULL</span><span>, num </span><span class="keyword">int</span><span> </span><span class="op">NOT</span><span> </span><span class="op">NULL</span><span> IDENTITY(1,1)) </span></td></tr><tr><td class="line"> </td></tr><tr><td class="line"><span class="comment">--insert into the table in random order</span><span> </span></td></tr><tr><td class="line"><span class="keyword">INSERT</span><span> </span><span class="keyword">INTO</span><span> @</span><span class="keyword">temp</span><span> (id) </span></td></tr><tr><td class="line">(<span class="keyword">SELECT</span><span> Id  </span></td></tr><tr><td class="line"><span class="keyword">FROM</span><span> Items </span></td></tr><tr><td class="line"><span class="keyword">ORDER</span><span> </span><span class="keyword">BY</span><span> NEWID()) </span></td></tr><tr><td class="line"> </td></tr><tr><td class="line"><span class="comment">--get out our sample</span><span> </span></td></tr><tr><td class="line"><span class="keyword">SELECT</span><span> Id </span><span class="keyword">FROM</span><span> @</span><span class="keyword">temp</span><span> </span><span class="keyword">WHERE</span><span> num &lt;= @limit </span></td></tr></tbody></table>
				</div>
				That snippet also fails, giving this error:
				<div class="Code">Incorrect syntax near the keyword 'ORDER'.</div>
				Oh right, I can't use ORDER BY in the subquery.  Fine.
				
				<h4>Use a temporary table and INSERT INTO</h4>
				<p>
				  I was trying to avoid temporary tables, since they have some performance issues, but figured maybe there wasn't any other way.  So,
				  I broke down and used a temp table:
				  </p><div>
				  <table class="dp-sql" border="0" cellpadding="0" cellspacing="0"><tbody><tr></tr><tr><td class="line"><span></span><span class="comment">--select into the temp table, using IDENTITY to get my row counter</span><span> </span></td></tr><tr><td class="line"><span class="keyword">SELECT</span><span> Id, IDENTITY(</span><span class="keyword">int</span><span>, 1,1) [num] </span></td></tr><tr><td class="line"><span class="keyword">INTO</span><span> #</span><span class="keyword">temp</span><span>  </span></td></tr><tr><td class="line"><span class="keyword">FROM</span><span> Items </span></td></tr><tr><td class="line"><span class="keyword">ORDER</span><span> </span><span class="keyword">BY</span><span> NEWID() </span></td></tr><tr><td class="line"> </td></tr><tr><td class="line"><span class="comment">--get out our sample</span><span> </span></td></tr><tr><td class="line"><span class="keyword">SELECT</span><span> Id </span><span class="keyword">FROM</span><span> #</span><span class="keyword">temp</span><span> </span><span class="keyword">WHERE</span><span> num &lt;= @limit </span></td></tr><tr><td class="line"> </td></tr><tr><td class="line"><span class="comment">--clean up</span><span> </span></td></tr><tr><td class="line"><span class="keyword">DROP</span><span> </span><span class="keyword">TABLE</span><span> #</span><span class="keyword">temp</span><span> </span></td></tr></tbody></table>
				  </div>
				That snippet also fails, giving this error:
				<div class="Code">Cannot add identity column, using the SELECT INTO statement, to table '#temp', which already has column 'Id' that inherits the identity property.</div>
				Ok... So the 'identity' property gets inherited into the temp table, and so I can't specify another identity column.  
				I did some searching in the docs, and there didn't seem to be a way to get an autoincrement field without it
				being an identity field.  Ok, fine, I know an ugly way to get around that.
				
				<h4>Use a temporary table, SELECT INTO, and a table variable to mask the identity column</h4>
				<p>
				  I actually wrote a variant of this:
				  </p><div>
				  <table class="dp-sql" border="0" cellpadding="0" cellspacing="0"><tbody><tr></tr><tr><td class="line"><span></span><span class="comment">--Make a table to copy my ids into</span><span> </span></td></tr><tr><td class="line"><span class="keyword">DECLARE</span><span> @plain_ids </span><span class="keyword">TABLE</span><span> (id </span><span class="keyword">int</span><span>) </span></td></tr><tr><td class="line"> </td></tr><tr><td class="line"><span class="comment">--copy all my ids in, but leave the 'identity' property behind</span><span> </span></td></tr><tr><td class="line"><span class="keyword">INSERT</span><span> </span><span class="keyword">INTO</span><span> @plain_ids (id) </span></td></tr><tr><td class="line">    (<span class="keyword">SELECT</span><span> Id </span><span class="keyword">FROM</span><span> Items) </span></td></tr><tr><td class="line"> </td></tr><tr><td class="line"><span class="comment">--select into the temp table, using IDENTITY to get my row counter</span><span> </span></td></tr><tr><td class="line"><span class="keyword">SELECT</span><span> Id, IDENTITY(</span><span class="keyword">int</span><span>, 1,1) [num] </span></td></tr><tr><td class="line"><span class="keyword">INTO</span><span> #</span><span class="keyword">temp</span><span>  </span></td></tr><tr><td class="line"><span class="keyword">FROM</span><span> @plain_ids </span></td></tr><tr><td class="line"><span class="keyword">ORDER</span><span> </span><span class="keyword">BY</span><span> NEWID() </span></td></tr><tr><td class="line"> </td></tr><tr><td class="line"><span class="comment">--get out our sample</span><span> </span></td></tr><tr><td class="line"><span class="keyword">SELECT</span><span> Id </span><span class="keyword">FROM</span><span> #</span><span class="keyword">temp</span><span> </span><span class="keyword">WHERE</span><span> num &lt;= @limit </span></td></tr><tr><td class="line"> </td></tr><tr><td class="line"><span class="comment">--clean up</span><span> </span></td></tr><tr><td class="line"><span class="keyword">DROP</span><span> </span><span class="keyword">TABLE</span><span> #</span><span class="keyword">temp</span><span> </span></td></tr></tbody></table>
				  </div>
				  This worked, but made me sick to my stomach.  There <em>had</em> to be a better way.
				
				<h4>Use a temporary table, SELECT INTO, and a dummy expression</h4>
				<p>
				  So, I dug through some more documentation, finding this gem:
				  </p><blockquote>
				  When an existing identity column is selected into a new table, the new column inherits the IDENTITY property, unless one of the following conditions is true: 
				  <ul>
					 <li>The SELECT statement contains a join, GROUP BY clause, or aggregate function.</li>
					 <li>Multiple SELECT statements are joined by using UNION.</li>
					 <li>The identity column is listed more than one time in the select list.</li>
					 <li>The identity column is part of an expression.</li>
				  </ul>
				  If any one of these conditions is true, the column is created NOT NULL instead of inheriting the IDENTITY property. All rules and restrictions for the identity columns apply to the new table.
				</blockquote>
				Ok, so I decided the least despicable thing to do would be to make a dummy expression to drop the identity property:
				<div class="Code">
<table class="dp-sql" border="0" cellpadding="0" cellspacing="0"><tbody><tr></tr><tr><td class="line"><span></span><span class="comment">--select into the temp table, using IDENTITY to get my row counter</span><span> </span></td></tr><tr><td class="line"><span class="keyword">SELECT</span><span> Id*1 [Id], IDENTITY(</span><span class="keyword">int</span><span>, 1,1) [num] </span></td></tr><tr><td class="line"><span class="keyword">INTO</span><span> #</span><span class="keyword">temp</span><span>  </span></td></tr><tr><td class="line"><span class="keyword">FROM</span><span> Items </span></td></tr><tr><td class="line"><span class="keyword">ORDER</span><span> </span><span class="keyword">BY</span><span> NEWID() </span></td></tr><tr><td class="line"> </td></tr><tr><td class="line"><span class="comment">--get out our sample</span><span> </span></td></tr><tr><td class="line"><span class="keyword">SELECT</span><span> Id </span><span class="keyword">FROM</span><span> #</span><span class="keyword">temp</span><span> </span><span class="keyword">WHERE</span><span> num &lt;= @limit </span></td></tr><tr><td class="line"> </td></tr><tr><td class="line"><span class="comment">--clean up</span><span> </span></td></tr><tr><td class="line"><span class="keyword">DROP</span><span> </span><span class="keyword">TABLE</span><span> #</span><span class="keyword">temp</span><span> </span></td></tr></tbody></table>
				  </div>
				I did some profiling, and <code>id*1</code> performed the same as <code>id+0</code>, so I left it as multiplication.
				  
				<p>
				  All in all, very, very frustrating, and I'm not sure if the <code>TOP 2147483647</code> option above is worse than the hack at the end.  Votes are welcome.<br /><br />

				  I hate how often I have to jump through hoops and reinvent the wheel to do what should be very simple tasks.  
				  I mean, c'mon, we've been programming for over 50 years now, and I'm pretty sure pulling a random sample from a dataset is
				  not a new task.  We should be better than this.
				</p><img src ="http://blogs.acceleration.net/ryan/aggbug/2820.aspx" width = "1" height = "1" /></body></item><item><dc:creator>Ryan</dc:creator><title>NOT IN doesn't like NULLs</title><link>http://blogs.acceleration.net/ryan/archive/2005/07/20/1832.aspx</link><pubDate>Wed, 20 Jul 2005 16:37:00 GMT</pubDate><guid>http://blogs.acceleration.net/ryan/archive/2005/07/20/1832.aspx</guid><description>&lt;p&gt;
      So, I was seeing some weirdness with NOT IN, and reduced it to this minimal case:
      &lt;/p&gt;&lt;pre&gt;SELECT 'foo'
WHERE 1 NOT IN (NULL, 2)
---- 
(0 row(s) affected)&lt;/pre&gt;
      Shouldn't that simply see that 1 is not in the set (NULL, 2) and successfully select 'foo'?  Ok... maybe it thinks 1 is actually in that
      set.
      &lt;pre class="Code"&gt;SELECT 'foo'
WHERE 1 IN (NULL, 2)
---- 
(0 row(s) affected)&lt;/pre&gt;
      Hrm, guess not.  Ok, so 1 is certainly not in the set of (NULL, 2).  Maybe it's something weird with NOT IN.
      &lt;pre class="Code"&gt;SELECT 'foo'
WHERE NOT(1 IN (NULL, 2))
---- 
(0 row(s) affected)&lt;/pre&gt;
      Ok... so 1 is not in the set, but 1 is also &lt;i&gt;not&lt;/i&gt; not in the set.  So maybe there's something weird with IN and NULLs.
      &lt;pre class="Code"&gt;SELECT 'foo'
WHERE 1 IN (NULL, 2, 1)
---- 
foo

(1 row(s) affected)&lt;/pre&gt;
      No, IN works just fine when 1 is in the set.  So maybe NOT IN is just broken?
      &lt;pre class="Code"&gt;SELECT 'foo'
WHERE 1 NOT IN (2, 3)
---- 
foo

(1 row(s) affected)&lt;/pre&gt;
      No, NOT IN works just fine, as long as you don't have any nulls.  Is that the lesson here?  Never use NOT IN with NULLs?      
   
   &lt;p&gt;
      From the MSDN docs (paraphrased):
      &lt;/p&gt;&lt;blockquote&gt;
         a list of expressions to test for a match. All expressions must be of the same type as the test_expression.
      &lt;/blockquote&gt;
      So I guess because NULL isn't the same type as 1, it is just failing?  
      &lt;pre class="Code"&gt;SELECT 'foo'
WHERE 1 NOT IN (CAST(NULL as int), 2)
---- 
(0 row(s) affected)&lt;/pre&gt;
      Didn't think that'd work.  
   
   &lt;p&gt;
      I guess this means anywhere I'm using NOT IN with a subquery, I need to be sure to wrap the selected column in an ISNULL.  Can anyone
      tell me I'm wrong on this?  Pretty please?
   &lt;/p&gt;&lt;img src ="http://blogs.acceleration.net/ryan/aggbug/1832.aspx" width = "1" height = "1" /&gt;</description><body xmlns="http://www.w3.org/1999/xhtml"><p>
      So, I was seeing some weirdness with NOT IN, and reduced it to this minimal case:
      </p><pre>SELECT 'foo'
WHERE 1 NOT IN (NULL, 2)
---- 
(0 row(s) affected)</pre>
      Shouldn't that simply see that 1 is not in the set (NULL, 2) and successfully select 'foo'?  Ok... maybe it thinks 1 is actually in that
      set.
      <pre class="Code">SELECT 'foo'
WHERE 1 IN (NULL, 2)
---- 
(0 row(s) affected)</pre>
      Hrm, guess not.  Ok, so 1 is certainly not in the set of (NULL, 2).  Maybe it's something weird with NOT IN.
      <pre class="Code">SELECT 'foo'
WHERE NOT(1 IN (NULL, 2))
---- 
(0 row(s) affected)</pre>
      Ok... so 1 is not in the set, but 1 is also <i>not</i> not in the set.  So maybe there's something weird with IN and NULLs.
      <pre class="Code">SELECT 'foo'
WHERE 1 IN (NULL, 2, 1)
---- 
foo

(1 row(s) affected)</pre>
      No, IN works just fine when 1 is in the set.  So maybe NOT IN is just broken?
      <pre class="Code">SELECT 'foo'
WHERE 1 NOT IN (2, 3)
---- 
foo

(1 row(s) affected)</pre>
      No, NOT IN works just fine, as long as you don't have any nulls.  Is that the lesson here?  Never use NOT IN with NULLs?      
   
   <p>
      From the MSDN docs (paraphrased):
      </p><blockquote>
         a list of expressions to test for a match. All expressions must be of the same type as the test_expression.
      </blockquote>
      So I guess because NULL isn't the same type as 1, it is just failing?  
      <pre class="Code">SELECT 'foo'
WHERE 1 NOT IN (CAST(NULL as int), 2)
---- 
(0 row(s) affected)</pre>
      Didn't think that'd work.  
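   <p>
      These results are what SQL's three-valued logic predicts: comparing anything to NULL gives UNKNOWN rather than true or false, and WHERE only keeps rows whose condition is TRUE.  Here is a rough sketch of the expansion (my own illustration, not taken from the docs, and assuming the default ANSI_NULLS ON setting):
   </p>
   <pre class="Code">-- 1 NOT IN (NULL, 2) is shorthand for:
SELECT 'foo'
WHERE 1 &lt;&gt; NULL AND 1 &lt;&gt; 2
-- 1 &lt;&gt; NULL is UNKNOWN and 1 &lt;&gt; 2 is TRUE, so the AND is UNKNOWN: no row

-- 1 IN (NULL, 2, 1) is shorthand for:
SELECT 'foo'
WHERE 1 = NULL OR 1 = 2 OR 1 = 1
-- UNKNOWN OR FALSE OR TRUE is TRUE: the row comes back</pre>
   With a subquery, filtering the NULLs out or switching to NOT EXISTS sidesteps the problem.  The Customers and Orders tables here are hypothetical:
   <pre class="Code">-- customers without orders, even when Orders.CustomerId contains NULLs
SELECT c.Id
FROM Customers c
WHERE NOT EXISTS (SELECT 1 FROM Orders o WHERE o.CustomerId = c.Id)</pre>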
   
   <p>
      I guess this means anywhere I'm using NOT IN with a subquery, I need to be sure to wrap the selected column in an ISNULL.  Can anyone
      tell me I'm wrong on this?  Pretty please?
   </p><img src ="http://blogs.acceleration.net/ryan/aggbug/1832.aspx" width = "1" height = "1" /></body></item><item><dc:creator>Ryan</dc:creator><title>Thoughts on Importing Data</title><link>http://blogs.acceleration.net/ryan/archive/2005/05/31/1166.aspx</link><pubDate>Tue, 31 May 2005 11:33:00 GMT</pubDate><guid>http://blogs.acceleration.net/ryan/archive/2005/05/31/1166.aspx</guid><description>&lt;p&gt;
      A lot of my job involves importing data, and over the years I've found a few nice ways to do this.  As Monday was a holiday (and I didn't want
      to do any &lt;em&gt;actual&lt;/em&gt; work) I figured I'd spend some time to put more of my brain on the web.
      As always, these are only my personal experiences about what seems to work for me, and if you know of better ways to do it, please leave some comments.
   &lt;/p&gt;
   &lt;p&gt;
      There are a couple of major cases where I've needed to import data, and they should be treated a little differently:
      &lt;/p&gt;&lt;ul&gt;
         &lt;li&gt;Initially importing a customer's old data into a new database schema.&lt;/li&gt;
         &lt;li&gt;Habitual imports of data from other systems.&lt;/li&gt;
      &lt;/ul&gt;
      I'll talk about initial imports now, since the habitual ones follow a lot of the same guidelines.
   
   
   &lt;p&gt;
      These tend to be hairy.  The data you get from the customer is often in an inconvenient format, and in sorry shape.  The whole
      reason they need a new database is because their old one is god-awful, and doesn't meet their needs.  Here we have to deal with schema changes,
      invalid data, inconsistent relations, and usually a bunch of data the customer isn't really going to use, but needs in there anyway.      
   &lt;/p&gt;
   &lt;p&gt;
      For these large imports you want to automate everything you can.  You're obviously going to have some script or program to select from 
      one and insert into the other, and you want to make the import a one step operation.  You want to push the button then go get a beer
      and pray it worked.&lt;br /&gt;
         There are two ways I've approached this, and they have their ups and downs.
   &lt;/p&gt;
   
   &lt;h2&gt;Get it all in the DB, then munge.&lt;/h2&gt;
   &lt;p&gt;
   Copy the source data, &lt;u&gt;in its original schema&lt;/u&gt;, into your database, and then run queries to convert the old schema to the new schema.&lt;br /&gt;
   On SQL Server, DTS packages are a good way to automate this.  An easy way to make one is to use Enterprise Manager:
   &lt;/p&gt;&lt;ol&gt;
      &lt;li&gt;Browse to the table list for your database.&lt;/li&gt;
      &lt;li&gt;Right click, choose All Tasks-&amp;gt;Import data&lt;/li&gt;
      &lt;li&gt;Follow the wizard, choosing data sources and data destinations.  When you choose destination tables, make
      sure you don't have any naming conflicts.  To make it easy for me to tell what is imported, I often prefix 
      the imported table names with "imp_" or "import_".&lt;/li&gt;
      &lt;li&gt;At the end of the wizard, you have the option to save the process.  Do so however you choose; DTS packages are pretty 
      easy to deal with, so that's my preference.&lt;/li&gt;
   &lt;/ol&gt;
   &lt;br /&gt;
   After you have the source data and schema in your destination database, you get to do the hard part: writing the SQL script to convert from one schema to the other.
   Some general tips:
   &lt;ul&gt;
      &lt;li&gt;Be generous with PRINT statements and comments.  I frequently use PRINT statements as comments, so when it's actually running I can easily track the
   progress and have a decent idea of where it breaks.  I use Query Analyzer mostly, and that is notoriously bad at reporting which line the script failed on.&lt;/li&gt;
      &lt;li&gt;
         Use set operations as much as you can.  The major advantage of importing the old data/schema intact is that it allows set operations, which are absurdly faster than looping over a dataset.
      &lt;/li&gt;
      &lt;li&gt;Make a clean up script that wipes your destination database.  This can be useful after a failed import, or when the customer gives you a newer copy of the source data.&lt;/li&gt;
      &lt;li&gt;Wrap your import in a transaction.  This allows you to recover automatically from failures.  If you don't, then you'll need to manually run a 
      clean up script after each failure.&lt;/li&gt;
      &lt;li&gt;Keep an eye on your transaction logs.  They can get out of control if they aren't restricted, and an afternoon of debugging an import script
      can balloon the thing to preposterous sizes.  &lt;a href="http://blogs.acceleration.net/ryan/archive/2004/09/09/285.aspx"&gt;Clear the log&lt;/a&gt; 
      regularly, possibly as part of the cleaning script.&lt;/li&gt;
   &lt;/ul&gt;
   There are several common issues you'll run into converting from old schema to new:
   &lt;ul&gt;
      &lt;li&gt;Changing column types.  The old database may have been storing numeric data as text.  Before getting the old data in, you'll need to
      correct every value that won't convert.  Manually search the old data for problems, using WHERE ISNUMERIC(column) = 0. Once you find the 
      problem rows, add some UPDATE statements into your import script.  You'll end up with a lot of statements like this:
      &lt;div class="Code"&gt;
      &lt;table class="dp-sql" border="0" cellpadding="0" cellspacing="0" id="Table3"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="keyword"&gt;UPDATE&lt;/span&gt;&lt;span&gt; import_munchkin_leagues &lt;/span&gt;&lt;span class="keyword"&gt;SET&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;  members = &lt;span class="string"&gt;'4'&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;WHERE&lt;/span&gt;&lt;span&gt; members = &lt;/span&gt;&lt;span class="string"&gt;'4r'&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
      &lt;/div&gt;
      Now that your data can be converted, I like using DDL statements to evolve the old schema, making it closer to the destination schema.
      &lt;div class="Code"&gt;
         &lt;table class="dp-sql" border="0" cellpadding="0" cellspacing="0" id="Table2"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="keyword"&gt;ALTER&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;TABLE&lt;/span&gt;&lt;span&gt; import_munchkin_leagues &lt;/span&gt;&lt;span class="keyword"&gt;ALTER&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;COLUMN&lt;/span&gt;&lt;span&gt; members &lt;/span&gt;&lt;span class="keyword"&gt;int&lt;/span&gt;&lt;span&gt;; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span class="keyword"&gt;ALTER&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;TABLE&lt;/span&gt;&lt;span&gt; import_munchkin_leagues &lt;/span&gt;&lt;span class="keyword"&gt;ALTER&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;COLUMN&lt;/span&gt;&lt;span&gt; dues money &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
      &lt;/div&gt;
      Now the actual data import is trivial:
      &lt;div class="Code"&gt;
      &lt;table class="dp-sql" border="0" cellpadding="0" cellspacing="0" id="Table1"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="keyword"&gt;INSERT&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;INTO&lt;/span&gt;&lt;span&gt; MunchkinLeagues (&lt;/span&gt;&lt;span class="keyword"&gt;Members&lt;/span&gt;&lt;span&gt;, &lt;/span&gt;&lt;span class="keyword"&gt;Name&lt;/span&gt;&lt;span&gt;, Dues)  &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;  (&lt;span class="keyword"&gt;SELECT&lt;/span&gt;&lt;span&gt; members, league_name, dues &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;    &lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; import_munchkin_leagues) &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
      &lt;/div&gt;
      You could do the conversions in the SELECT statement, but I find it a lot easier to think about and debug as separate steps.
      &lt;/li&gt;
      &lt;li&gt;
         Making lookup tables.  A simple 
         &lt;div class="Code"&gt;
         &lt;table class="dp-sql" border="0" cellpadding="0" cellspacing="0" id="Table4"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="keyword"&gt;INSERT&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;INTO&lt;/span&gt;&lt;span&gt; SlipperTypes (&lt;/span&gt;&lt;span class="keyword"&gt;Name&lt;/span&gt;&lt;span&gt;) &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;  (&lt;span class="keyword"&gt;SELECT&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;DISTINCT&lt;/span&gt;&lt;span&gt; slipper_type &lt;/span&gt;&lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; import_witches) &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
         &lt;/div&gt;
         works in a lot of cases, but when types are entered by users, you'll end up with a lot of near-duplicates ("Ruby" versus "ruby" versus "rubyt") that will
         need to just be hashed out with the client.  Making a simple tool to let the customer replace all uses of a bad type ("rubyt") with a proper type ("Ruby") is
         a good way to easily fix these, and that task can often be put back on the client.
      &lt;/li&gt;
      &lt;li&gt;
         Maintaining relationships.  If you have more than a flat table to import, then you'll need to maintain the relationships.  I like
         adding a column to an old table that links to its counterpart in the new schema.  After copying the rows from the old table
         to the new table, update that column in the old table to point at its corresponding row in the new table.  I usually need one of
         those bridges for each major table in the old schema.
      &lt;/li&gt;
      &lt;li&gt;
         Normalizing.  A lot of times the old schema will have columns like "Color1", "Color2", "ColorN" to get an M-N relationship.  Usually
         the new schema will have a "Colors" table, with a helper table to model the many-to-many relation.
         Use UNIONs to convert those columns to rows:
         &lt;div class="Code"&gt;
         &lt;table class="dp-sql" border="0" cellpadding="0" cellspacing="0" id="Table5"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="keyword"&gt;INSERT&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;INTO&lt;/span&gt;&lt;span&gt; Colors (&lt;/span&gt;&lt;span class="keyword"&gt;Name&lt;/span&gt;&lt;span&gt;) &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;  (&lt;span class="keyword"&gt;SELECT&lt;/span&gt;&lt;span&gt; Color1 &lt;/span&gt;&lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; import_horses &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;   &lt;span class="keyword"&gt;UNION&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;   &lt;span class="keyword"&gt;SELECT&lt;/span&gt;&lt;span&gt; Color2 &lt;/span&gt;&lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; import_horses &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;   &lt;span class="keyword"&gt;UNION&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;   &lt;span class="keyword"&gt;SELECT&lt;/span&gt;&lt;span&gt; Color3 &lt;/span&gt;&lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; import_horses) &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
         &lt;/div&gt;
         In SQL Server, the UNION operator eliminates duplicates.  That is standard SQL behavior (UNION ALL is the variant that keeps them), so other RDBMSs should not need an extra DISTINCT.
         Now, assuming you've added a column linking to your destination table, you can seed your many-to-many table to import the relationship:
         &lt;div class="Code"&gt;
         &lt;table class="dp-sql" border="0" cellpadding="0" cellspacing="0" id="Table6"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="keyword"&gt;INSERT&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;span class="keyword"&gt;INTO&lt;/span&gt;&lt;span&gt; HorseColors (ColorId, HorseId) &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;  ( &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;    &lt;span class="keyword"&gt;SELECT&lt;/span&gt;&lt;span&gt; Colors.Id, HorseId &lt;/span&gt;&lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; Colors &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;    &lt;span class="op"&gt;JOIN&lt;/span&gt;&lt;span&gt; ( &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;      &lt;span class="keyword"&gt;SELECT&lt;/span&gt;&lt;span&gt; Color1, HorseId &lt;/span&gt;&lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; import_horses &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;      &lt;span class="keyword"&gt;UNION&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;      &lt;span class="keyword"&gt;SELECT&lt;/span&gt;&lt;span&gt; Color2, HorseId &lt;/span&gt;&lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; import_horses &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;      &lt;span class="keyword"&gt;UNION&lt;/span&gt;&lt;span&gt; &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;      &lt;span class="keyword"&gt;SELECT&lt;/span&gt;&lt;span&gt; Color3, HorseId &lt;/span&gt;&lt;span class="keyword"&gt;FROM&lt;/span&gt;&lt;span&gt; import_horses &lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;    ) ih &lt;span class="keyword"&gt;ON&lt;/span&gt;&lt;span&gt; ih.Color1 = Colors.&lt;/span&gt;&lt;span class="keyword"&gt;Name&lt;/span&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="line"&gt;  ) &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
         &lt;/div&gt;
      &lt;/li&gt;
   &lt;/ul&gt;
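   &lt;p&gt;
      As a rough sketch of the ISNUMERIC search described under "Changing column types" (reusing the import_munchkin_leagues example; the grouping is just my suggestion for seeing each distinct bad value once):
   &lt;/p&gt;
   &lt;pre class="Code"&gt;-- list each distinct value that ISNUMERIC rejects, with how often it occurs
SELECT members, COUNT(*) AS row_count
FROM import_munchkin_leagues
WHERE ISNUMERIC(members) = 0
GROUP BY members
ORDER BY row_count DESC&lt;/pre&gt;
   Keep in mind that ISNUMERIC also accepts strings such as '$' or '1e5', which still fail a conversion to int, so a few problem rows may only show up when the ALTER COLUMN runs.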
   
   &lt;h2&gt;Munge in object space&lt;/h2&gt;
   &lt;p&gt;
      O/R mappers are pretty common now, and sometimes it's quicker to write a console application and run the imports in object space.  Odds are you
      already have a data access layer of some kind for your new database.  There are a few pros and cons.&lt;br /&gt;
      &lt;strong&gt;Pros&lt;/strong&gt;
      &lt;/p&gt;&lt;ul&gt;
         &lt;li&gt;If the DALs are generated, this can be faster to write.&lt;/li&gt;
         &lt;li&gt;Sometimes it's easier to think in object space.&lt;/li&gt;
      &lt;/ul&gt;
      &lt;strong&gt;Cons&lt;/strong&gt;
      &lt;ul&gt;
         &lt;li&gt;Speed.  All the data has to go into your app, then to the database.  You also can't easily do set operations in most OO languages,
         so everything gets iterated over at some point.  I had one import application run for about 4 days.&lt;/li&gt;
      &lt;/ul&gt;      
   
   &lt;hr /&gt;
   &lt;p&gt;
      I'll put down my thoughts on habitual imports some other time.  For now, I need to do some of that actual work.
   &lt;/p&gt;&lt;img src ="http://blogs.acceleration.net/ryan/aggbug/1166.aspx" width = "1" height = "1" /&gt;</description><body xmlns="http://www.w3.org/1999/xhtml"><p>
      A lot of my job involves importing data, and over the years I've found a few nice ways to do this.  As Monday was a holiday (and I didn't want
      to do any <em>actual</em> work) I figured I'd spend some time to put more of my brain on the web.
      As always, these are only my personal experiences about what seems to work for me, and if you know of better ways to do it, please leave some comments.
   </p>
   <p>
      There are a couple of major cases where I've needed to import data, and they should be treated a little differently:
      </p><ul>
         <li>Initially importing a customer's old data into a new database schema.</li>
         <li>Habitual imports of data from other systems.</li>
      </ul>
      I'll talk about initial imports now, since the habitual ones follow a lot of the same guidelines.
   
   
   <p>
      These tend to be hairy.  The data you get from the customer is often in an inconvenient format, and in sorry shape.  The whole
      reason they need a new database is because their old one is god-awful, and doesn't meet their needs.  Here we have to deal with schema changes,
      invalid data, inconsistent relations, and usually a bunch of data the customer isn't really going to use, but needs in there anyway.      
   </p>
   <p>
      For these large imports you want to automate everything you can.  You're obviously going to have some script or program to select from 
      one and insert into the other, and you want to make the import a one step operation.  You want to push the button then go get a beer
      and pray it worked.<br />
         There are two ways I've approached this, and they have their ups and downs.
   </p>
   
   <h2>Get it all in the DB, then munge.</h2>
   <p>
   Copy the source data, <u>in its original schema</u>, into your database, and then run queries to convert the old schema to the new schema.<br />
   On SQL Server, DTS packages are a good way to automate this.  An easy way to make one is to use Enterprise Manager:
   </p><ol>
      <li>Browse to the table list for your database.</li>
      <li>Right click, choose All Tasks-&gt;Import data</li>
      <li>Follow the wizard, choosing data sources and data destinations.  When you choose destination tables, make
      sure you don't have any naming conflicts.  To make it easy for me to tell what is imported, I often prefix 
      the imported table names with "imp_" or "import_".</li>
      <li>At the end of the wizard, you have the option to save the process.  Do so however you choose; DTS packages are pretty 
      easy to deal with, so that's my preference.</li>
   </ol>
   <br />
   After you have the source data and schema in your destination database, you get to do the hard part: writing the SQL script to convert from one schema to the other.
   Some general tips:
   <ul>
      <li>Be generous with PRINT statements and comments.  I frequently use PRINT statements as comments, so when it's actually running I can easily track the
   progress and have a decent idea of where it breaks.  I use Query Analyzer mostly, and that is notoriously bad at reporting which line the script failed on.</li>
      <li>
         Use set operations as much as you can.  The major advantage of importing the old data/schema intact is that it allows set operations, which are absurdly faster than looping over a dataset.
      </li>
      <li>Make a clean up script that wipes your destination database.  This can be useful after a failed import, or when the customer gives you a newer copy of the source data.</li>
      <li>Wrap your import in a transaction.  This allows you to recover automatically from failures.  If you don't, then you'll need to manually run a 
      clean up script after each failure.</li>
      <li>Keep an eye on your transaction logs.  They can get out of control if they aren't restricted, and an afternoon of debugging an import script
      can balloon the thing to preposterous sizes.  <a href="http://blogs.acceleration.net/ryan/archive/2004/09/09/285.aspx">Clear the log</a> 
      regularly, possibly as part of the cleaning script.</li>
   </ul>
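   <p>
      A couple of the tips above can be combined into one skeleton.  This is only a sketch: the table and column names are placeholders, and <code>SET XACT_ABORT ON</code> is my assumption about how to get the automatic rollback, not the only way to do it.
   </p>
   <pre class="Code">-- with XACT_ABORT on, a run-time error (a failed conversion, say)
-- stops the batch and rolls back the open transaction
SET XACT_ABORT ON
BEGIN TRANSACTION

PRINT 'Fixing member counts that will not convert'
UPDATE import_munchkin_leagues SET members = '4' WHERE members = '4r'

PRINT 'Copying leagues into the new schema'
INSERT INTO MunchkinLeagues (Members, Name, Dues)
SELECT CAST(members AS int), league_name, CAST(dues AS money)
FROM import_munchkin_leagues

COMMIT TRANSACTION
PRINT 'Import finished'</pre>
   The last PRINT that shows up in the output gives a decent idea of which step broke.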
   There are several common issues you'll run into converting from old schema to new:
   <ul>
      <li>Changing column types.  The old database may have been storing numeric data as text.  Before you can convert those columns, you'll need to
      fix every value that won't convert.  Search the old data for problems using WHERE ISNUMERIC(column) = 0.  ISNUMERIC also accepts values such as '$' and '1e5' that
      won't convert to int, so a few stragglers may still surface when you change the type.  Once you find the 
      problem rows, add some UPDATE statements into your import script.  You'll end up with a lot of statements like this:
      <div class="Code">
      <table class="dp-sql" border="0" cellpadding="0" cellspacing="0" id="Table3"><tbody><tr></tr><tr><td class="line"><span></span><span class="keyword">UPDATE</span><span> import_munchkin_leagues </span><span class="keyword">SET</span><span> </span></td></tr><tr><td class="line">  members = <span class="string">'4'</span><span> </span></td></tr><tr><td class="line"><span class="keyword">WHERE</span><span> members = </span><span class="string">'4r'</span><span> </span></td></tr></tbody></table>
      </div>
      Now that your data can be converted, I like using DDL statements to evolve the old schema, making it closer to the destination schema.
      <div class="Code">
         <table class="dp-sql" border="0" cellpadding="0" cellspacing="0" id="Table2"><tbody><tr></tr><tr><td class="line"><span></span><span class="keyword">ALTER</span><span> </span><span class="keyword">TABLE</span><span> import_munchkin_leagues </span><span class="keyword">ALTER</span><span> </span><span class="keyword">COLUMN</span><span> members </span><span class="keyword">int</span><span>; </span></td></tr><tr><td class="line"><span class="keyword">ALTER</span><span> </span><span class="keyword">TABLE</span><span> import_munchkin_leagues </span><span class="keyword">ALTER</span><span> </span><span class="keyword">COLUMN</span><span> dues money </span></td></tr></tbody></table>
      </div>
      Now the actual data import is trivial:
      <div class="Code">
      <table class="dp-sql" border="0" cellpadding="0" cellspacing="0" id="Table1"><tbody><tr></tr><tr><td class="line"><span></span><span class="keyword">INSERT</span><span> </span><span class="keyword">INTO</span><span> MunchkinLeagues (</span><span class="keyword">Members</span><span>, </span><span class="keyword">Name</span><span>, Dues)  </span></td></tr><tr><td class="line">  (<span class="keyword">SELECT</span><span> members, league_name, dues </span></td></tr><tr><td class="line">    <span class="keyword">FROM</span><span> import_munchkin_leagues) </span></td></tr></tbody></table>
      </div>
      You could do the conversions in the SELECT statement, but I find it a lot easier to think about and debug as separate steps.
      </li>
      <li>
         Making lookup tables.  A simple 
         <div class="Code">
         <table class="dp-sql" border="0" cellpadding="0" cellspacing="0" id="Table4"><tbody><tr></tr><tr><td class="line"><span></span><span class="keyword">INSERT</span><span> </span><span class="keyword">INTO</span><span> SlipperTypes (</span><span class="keyword">Name</span><span>) </span></td></tr><tr><td class="line">  (<span class="keyword">SELECT</span><span> </span><span class="keyword">DISTINCT</span><span> slipper_type </span><span class="keyword">FROM</span><span> import_witches) </span></td></tr></tbody></table>
         </div>
         works in a lot of cases, but when types are entered by users, you'll end up with a lot of near-duplicates ("Ruby" versus "ruby" versus "rubyt") that will
         need to just be hashed out with the client.  Making a simple tool to let the customer replace all uses of a bad type ("rubyt") with a proper type ("Ruby") is
         a good way to easily fix these, and that task can often be put back on the client.
      </li>
      <li>
         Maintaining relationships.  If you have more than a flat table to import, then you'll need to maintain the relationships.  I like
         adding a column to an old table that links to its counterpart in the new schema.  After copying the rows from the old table
         to the new table, update that column in the old table to point at its corresponding row in the new table.  I usually need one of
         those bridges for each major table in the old schema.
      </li>
      <li>
         Normalizing.  A lot of times the old schema will have columns like: "Color1", "Color2", "ColorN" to get an M-N relationship.  Usually
         the new schema will have a "Colors" table, with a helper table to model the many-to-many relation.
         Use UNIONs to convert those columns to rows:
         <div class="Code">
         <table class="dp-sql" border="0" cellpadding="0" cellspacing="0" id="Table5"><tbody><tr></tr><tr><td class="line"><span></span><span class="keyword">INSERT</span><span> </span><span class="keyword">INTO</span><span> Colors (</span><span class="keyword">Name</span><span>) </span></td></tr><tr><td class="line">  (<span class="keyword">SELECT</span><span> Color1 </span><span class="keyword">FROM</span><span> import_horses </span></td></tr><tr><td class="line">   <span class="keyword">UNION</span><span> </span></td></tr><tr><td class="line">   <span class="keyword">SELECT</span><span> Color2 </span><span class="keyword">FROM</span><span> import_horses </span></td></tr><tr><td class="line">   <span class="keyword">UNION</span><span> </span></td></tr><tr><td class="line">   <span class="keyword">SELECT</span><span> Color3 </span><span class="keyword">FROM</span><span> import_horses) </span></td></tr></tbody></table>
         </div>
         The UNION operator eliminates duplicate rows.  That is standard SQL behavior rather than a SQL Server quirk (UNION ALL is the variant that keeps duplicates), so no DISTINCT is needed.
         Now, assuming you've added a column linking to your destination table, you can seed your many-to-many table to import the relationship:
         <div class="Code">
         <table class="dp-sql" border="0" cellpadding="0" cellspacing="0" id="Table6"><tbody><tr></tr><tr><td class="line"><span></span><span class="keyword">INSERT</span><span> </span><span class="keyword">INTO</span><span> HorseColors (ColorId, HorseId) </span></td></tr><tr><td class="line">  ( </td></tr><tr><td class="line">    <span class="keyword">SELECT</span><span> Colors.Id, HorseId </span><span class="keyword">FROM</span><span> Colors </span></td></tr><tr><td class="line">    <span class="op">JOIN</span><span> ( </span></td></tr><tr><td class="line">      <span class="keyword">SELECT</span><span> Color1, HorseId </span><span class="keyword">FROM</span><span> import_horses </span></td></tr><tr><td class="line">      <span class="keyword">UNION</span><span> </span></td></tr><tr><td class="line">      <span class="keyword">SELECT</span><span> Color2, HorseId </span><span class="keyword">FROM</span><span> import_horses </span></td></tr><tr><td class="line">      <span class="keyword">UNION</span><span> </span></td></tr><tr><td class="line">      <span class="keyword">SELECT</span><span> Color3, HorseId </span><span class="keyword">FROM</span><span> import_horses </span></td></tr><tr><td class="line">    ) ih <span class="keyword">ON</span><span> ih.Color1 = Colors.</span><span class="keyword">Name</span><span> </span></td></tr><tr><td class="line">  ) </td></tr></tbody></table>
         </div>
      </li>
   </ul>
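   The bridge column from the "Maintaining relationships" tip can be sketched like this.  It assumes a new Horses table with an identity Id column and a Name column, and an old horse_name column whose values are unique; those names are made up for illustration, and only import_horses and HorseId come from the examples above.  The GO after the ALTER TABLE matters, since a batch that references the new column won't compile until the column exists:
   <pre class="Code">
-- Bridge column: will hold the Id of the matching row in the new Horses table.
ALTER TABLE import_horses ADD HorseId int NULL
GO

PRINT 'Copying horses'
INSERT INTO Horses (Name)
  SELECT horse_name FROM import_horses
GO

PRINT 'Linking old rows to new rows'
UPDATE ih
SET HorseId = h.Id
FROM import_horses ih
JOIN Horses h ON h.Name = ih.horse_name
GO

-- Rows that failed to link; this should come back empty.
SELECT * FROM import_horses WHERE HorseId IS NULL
</pre>
   If the old table has no unique column to match on, this approach needs some other way to pair the rows up.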
   
   <h2>Munge in object space</h2>
   <p>
      O/R mappers are pretty common now, and sometimes it's quicker to write a console application and run the imports in object space.  Odds are you
      already have a data access layer of some kind for your new database.  There are a few pros and cons.<br />
      <strong>Pros</strong>
      </p><ul>
         <li>If the DALs are generated, this can be faster to write.</li>
         <li>Sometimes it's easier to think in object space.</li>
      </ul>
      <strong>Cons</strong>
      <ul>
         <li>Speed.  All the data has to go into your app, then to the database.  You also can't easily do set operations in most OO languages,
         so everything gets iterated over at some point.  I had one import application run for about 4 days.</li>
      </ul>      
   
   <hr />
   <p>
      I'll put down my thoughts on habitual imports some other time.  For now, I need to do some of that actual work.
   </p><img src ="http://blogs.acceleration.net/ryan/aggbug/1166.aspx" width = "1" height = "1" /></body></item><item><dc:creator>Ryan</dc:creator><title>IMEX=1, of course!</title><link>http://blogs.acceleration.net/ryan/archive/2005/01/11/477.aspx</link><pubDate>Tue, 11 Jan 2005 17:28:00 GMT</pubDate><guid>http://blogs.acceleration.net/ryan/archive/2005/01/11/477.aspx</guid><description>So, I've been writing a little app to combine some Excel spreadsheets.  These sheets all have header rows, so I can open them up with an OleDbConnection, do a "SELECT * FROM [Sheet1$]"
   and go along my merry way.  The problem is, two of the columns I need to work with just aren't there.  They're on the Excel sheet just
   fine, but my OleDbDataReader finds nothing in those columns on any row.  Looking at the sheet, I see that in those columns, there isn't any data for about 12 rows.
   So, I put in zeros at the top of those columns, and then it works fine.  &lt;br /&gt;&lt;br /&gt;
   Is the OleDbConnection really making assumptions about the dataset based on the first row?  After much googling and little success, I try to find a definition of the 
   connectionstring, hoping there's some attribute like "rows to scan for schema" I can set, to tell it to actually read my data.  &lt;a title="Nathan's blog" href="http://blogs.acceleration.net/birdman/"&gt;Nathan&lt;/a&gt; points me to an excellent
   resource, &lt;a href="http://www.connectionstrings.com/"&gt;Connectionstrings.com&lt;/a&gt;, and they kindly let me know that I can specify &lt;code&gt;HDR=Yes;&lt;/code&gt; to indicate that I have a 
   header row in my sheets, and &lt;code&gt;IMEX=1;&lt;/code&gt; which, according to  &lt;a href="http://www.connectionstrings.com/"&gt;Connectionstrings.com&lt;/a&gt;:
   &lt;blockquote&gt;
      tells the driver to always read "intermixed" data columns as text
   &lt;/blockquote&gt;
   Apparently, the two columns in question were, in fact "intermixed" data columns, and once I set that in my connectionstring, all worked fine.&lt;br /&gt;&lt;br /&gt;
   The mixed use of "1" and "Yes" aside, why the hell would your database driver just silently ignore data?  I mean, if "intermixed" data columns is an error
   condition, then have the balls to throw an exception, warning, event log entry, anything.  Don't just not work and expect me to magically know where the problem is.
   I almost reimplemented the whole damn thing using Excel objects and the Office API, and that would've taken me another couple of days.  A pox on the Excel team!&lt;br /&gt;
   &lt;br /&gt;Well, I guess not a pox, because at some point some manager sat them in a room and said "Ok, now let's let people query this using SQL!", which was probably
   punishment enough.&lt;img src ="http://blogs.acceleration.net/ryan/aggbug/477.aspx" width = "1" height = "1" /&gt;</description><body xmlns="http://www.w3.org/1999/xhtml">So, I've been writing a little app to combine some Excel spreadsheets.  These sheets all have header rows, so I can open them up with an OleDbConnection, do a "SELECT * FROM [Sheet1$]"
   and go along my merry way.  The problem is, two of the columns I need to work with just aren't there.  They're on the Excel sheet just
   fine, but my OleDbDataReader finds nothing in those columns on any row.  Looking at the sheet, I see that in those columns, there isn't any data for about 12 rows.
   So, I put in zeros at the top of those columns, and then it works fine.  <br /><br />
   Is the OleDbConnection really making assumptions about the dataset based on the first row?  After much googling and little success, I try to find a definition of the 
   connectionstring, hoping there's some attribute like "rows to scan for schema" I can set, to tell it to actually read my data.  <a title="Nathan's blog" href="http://blogs.acceleration.net/birdman/">Nathan</a> points me to an excellent
   resource, <a href="http://www.connectionstrings.com/">Connectionstrings.com</a>, and they kindly let me know that I can specify <code>HDR=Yes;</code> to indicate that I have a 
   header row in my sheets, and <code>IMEX=1;</code> which, according to  <a href="http://www.connectionstrings.com/">Connectionstrings.com</a>:
   <blockquote>
      tells the driver to always read "intermixed" data columns as text
   </blockquote>
   Apparently, the two columns in question were, in fact "intermixed" data columns, and once I set that in my connectionstring, all worked fine.<br /><br />
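   For reference, a connection string with both settings might look like the one below.  The Jet provider and "Excel 8.0" are the usual values for .xls files, and the file path is made up.  Note that HDR and IMEX go inside the quoted Extended Properties value, not at the top level of the string:
   <pre class="Code">
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\sheets\combined.xls;Extended Properties="Excel 8.0;HDR=Yes;IMEX=1";
</pre>
   As far as I can tell, the driver guesses each column's type from a sample of the first rows (the TypeGuessRows registry setting, 8 by default), which would explain why a dozen empty rows at the top were enough to throw it off.<br /><br />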
   The mixed use of "1" and "Yes" aside, why the hell would your database driver just silently ignore data?  I mean, if "intermixed" data columns is an error
   condition, then have the balls to throw an exception, warning, event log entry, anything.  Don't just not work and expect me to magically know where the problem is.
   I almost reimplemented the whole damn thing using Excel objects and the Office API, and that would've taken me another couple of days.  A pox on the Excel team!<br />
   <br />Well, I guess not a pox, because at some point some manager sat them in a room and said "Ok, now let's let people query this using SQL!", which was probably
   punishment enough.<img src ="http://blogs.acceleration.net/ryan/aggbug/477.aspx" width = "1" height = "1" /></body></item><item><dc:creator>Ryan</dc:creator><title>Clearing a transaction log</title><link>http://blogs.acceleration.net/ryan/archive/2004/09/09/285.aspx</link><pubDate>Thu, 09 Sep 2004 16:25:00 GMT</pubDate><guid>http://blogs.acceleration.net/ryan/archive/2004/09/09/285.aspx</guid><description>Log files can sometimes get out of control and use up too much space.  &lt;br /&gt;
This tends to be a problem when testing import scripts that do a lot of operations and then roll back; you can get crazy log file sizes.  Here's how to clear them:
&lt;ol&gt;
  &lt;li&gt;Open up Query Analyzer, connected as a user with admin privileges.&lt;/li&gt;
  &lt;li&gt;Run this query, substituting "db_name" with the proper database name:
&lt;pre class="Code"&gt;
BACKUP LOG db_name WITH NO_LOG
GO
DBCC SHRINKDATABASE( db_name, 0)
GO&lt;/pre&gt;&lt;/li&gt;
  &lt;li&gt;Check the size of the log file&lt;/li&gt;
&lt;/ol&gt;
Don't do this when the transaction log is important, though.  Anyone know of a better way to do this?&lt;img src ="http://blogs.acceleration.net/ryan/aggbug/285.aspx" width = "1" height = "1" /&gt;</description><body xmlns="http://www.w3.org/1999/xhtml">Log files can sometimes get out of control and use up too much space.  <br />
This tends to be a problem when testing import scripts that do a lot of operations and then roll back; you can get crazy log file sizes.  Here's how to clear them:
<ol>
  <li>Open up Query Analyzer, connected as a user with admin privileges.</li>
  <li>Run this query, substituting "db_name" with the proper database name:
<pre class="Code">
BACKUP LOG db_name WITH NO_LOG
GO
DBCC SHRINKDATABASE( db_name, 0)
GO</pre></li>
  <li>Check the size of the log file</li>
</ol>
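One possible alternative, offered as a sketch rather than a recommendation: on a development database, switch to the simple recovery model so the log stops piling up, then shrink only the log file instead of the whole database.  Here "db_name_log" is a placeholder for the log file's logical name (sp_helpfile lists the real one), and the target size of 1 MB is arbitrary:
<pre class="Code">
USE db_name
GO
-- List the database files; the name column holds the logical file names.
EXEC sp_helpfile
GO
-- Development databases only: keep the log from accumulating between backups.
ALTER DATABASE db_name SET RECOVERY SIMPLE
GO
-- Shrink just the log file to roughly 1 MB.
DBCC SHRINKFILE (db_name_log, 1)
GO</pre>
This also covers later SQL Server versions, where the NO_LOG option is no longer available.<br />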
Don't do this when the transaction log is important, though.  Anyone know of a better way to do this?<img src ="http://blogs.acceleration.net/ryan/aggbug/285.aspx" width = "1" height = "1" /></body></item></channel></rss>