SQL Server Nation
For all of your SQL Server needs.
Updating values in one table with values from another table

Sometimes, it is necessary to update a field in one table, with information from another table.  In this article, I will go over with you a few different methods for doing this.  At the bottom there is a link to download the entire SQL script used in this post.

The first thing that we will do is create a few temporary tables to test with.  We will be loading these temporary tables, intentionally leaving the count columns in the #Employers table NULL because after each update we will set them back to NULL to verify that each update statement works.

create table #Employers
(employer_id int primary key clustered
,employer_name nvarchar(50) not null unique
,history_employee_count int null
,current_employee_count int null
)
go
create table #Employees
(employee_id int primary key clustered
,employee_fname nvarchar(50) not null
,employee_lname nvarchar(50) not null
)
go
create table #Employment
(employment_id int identity primary key clustered
,employee_id int not null
,employer_id int not null
,start_date datetime not null
,end_date datetime null
)
go

 

Next we will load the tables with sample data.  This sample data will load 7 "fictional" employers, 8 fictional employees and map those employees to employers

insert into #Employers (employer_id,employer_name) 
values(1,N'Acme Anvil Co.')
insert into #Employers (employer_id,employer_name)

values(2,N'Kwik-E-Mart')
insert into #Employers (employer_id,employer_name)
values(3,N'The Android''s Dungeon & Baseball Card Shop')
insert into #Employers (employer_id,employer_name)

values(4,N'Happy-Go-Lucky Toy Factory')
insert into #Employers (employer_id,employer_name)

values(5,N'SQL Server Nation')
insert into #Employers (employer_id,employer_name)

values(6,N'ABC Co.')
insert into #Employers (employer_id,employer_name)

values(7,N'XYZ Co.')
go
insert into #Employees (employee_id,employee_fname,employee_lname)

values(1,N'Apu',N'Nahasapeemapetilon')
insert into #Employees (employee_id,employee_fname,employee_lname)

values(2,N'Sanjay',N'Nahasapeemapetilon')
insert into #Employees (employee_id,employee_fname,employee_lname)

values(3,N'Jeff',N'Albertson')
insert into #Employees (employee_id,employee_fname,employee_lname)

values(4,N'Peter',N'Griffin')
insert into #Employees (employee_id,employee_fname,employee_lname)

values(5,N'Jonathon',N'Weed')
insert into #Employees (employee_id,employee_fname,employee_lname)

values(6,N'The',N'Roadrunner')
insert into #Employees (employee_id,employee_fname,employee_lname)

values(7,N'Brandon',N'Galderisi')
insert into #Employees (employee_id,employee_fname,employee_lname)

values(8,N'Tim',N'Chapman')
go
insert into #Employment(employee_id,employer_id,start_date,end_date)

values(1,2,'1989-12-17',null)
insert into #Employment(employee_id,employer_id,start_date,end_date)

values(2,2,'1989-12-17',null)
insert into #Employment(employee_id,employer_id,start_date,end_date)

values(3,3,'1989-12-17',null)
insert into #Employment(employee_id,employer_id,start_date,end_date)

values(4,4,'1999-01-31','2002-02-14')
insert into #Employment(employee_id,employer_id,start_date,end_date)

values(5,4,'1999-01-31','2002-02-14')
insert into #Employment(employee_id,employer_id,start_date,end_date)

values(6,1,'1959-01-01','1989-12-31')
insert into #Employment(employee_id,employer_id,start_date,end_date)

values(7,5,'2009-03-04',null)
insert into #Employment(employee_id,employer_id,start_date,end_date)

values(8,5,'2009-03-04',null)
insert into #Employment(employee_id,employer_id,start_date,end_date)

values(7,6,'2007-11-28',null)
insert into #Employment(employee_id,employer_id,start_date,end_date)

values(8,7,'2008-06-01',null)
go

 

Now that we have sample data to work with, let's go over the various methods for performing these updates.

 

The first method that we will use is the correlated sub-select.  If you are familiar with sub-selects, both correlated and non-correlated, then this will be somewhat easy to understand.  But just in case, I'll explain how this works.  A sub-select is a select statement that returns a single value to be used in a select statement.  The sub-select is correlated because it refers to a table that exists solely outside of the sub-select and is used to determine the value returned by the sub-select.

update #employers
set history_employee_count =
(select count(distinct employee_id)
from #employment
where employer_id = #employers.employer_id
)
,current_employee_count =
(select count(distinct employee_id)
from #employment
where employer_id = #employers.employer_id
and start_date <= getdate()
and (end_date > getdate()
or
end_date is null
)
)

go
select * from #Employers

 

As you can see, the counts accurately represent the historical, and current employee counts for each company.  Our goal, will be that each result set be identical to what you see below since this is the accurate result.

 

The next method that can be used in this situation is a JOIN to a derived table.  If you are not familiar with them, a derived table is a select statement wrapped in parentheses that returns more than one column.  One of the columns returned from the derived table is used to JOIN back to the table to be updated and the other is the value that will be used to update #Employers.  In this case, we use individual JOIN statements to retrieve each value for the #Employers table.  You will that the results of this select statement are identical to the first one.

update #employers
set history_employee_count = null
,current_employee_count = null
go
update #employers
set history_employee_count = isnull(hec.history_employee_count,0)
,current_employee_count = isnull(hcc.current_employee_count,0)
from #employers
left join
(select employer_id,
count(distinct employee_id) history_employee_count
from #employment
group by employer_id
) hec
on #employers.employer_id = hec.employer_id
left join
(select employer_id,
count(distinct employee_id) current_employee_count
from #employment
where start_date <= getdate()
and (end_date > getdate()
or
end_date is null
)
group by employer_id
) hcc
on #employers.employer_id = hcc.employer_id

go
select * from #Employers

 

 

 

The last method that we will use is also a JOIN to a derived table, but in this case, we will use a single table to return both values.  Again, you will see that the results of the select statement are identical to the first

update #employers
set history_employee_count = null
,current_employee_count = null
go
update #employers
set history_employee_count = isnull(hec.history_employee_count,0)
,current_employee_count = isnull(hec.current_employee_count,0)
from #employers
left join
(select employer_id
,count(distinct employee_id) history_employee_count
,sum(case when getdate() >= start_date
and (getdate() < end_date or end_date is null)
then 1
else 0
end) current_employee_count
from #employment
group by employer_id
) hec
on #employers.employer_id = hec.employer_id
go
select * from #Employers

 

Finally, this will clean up the temporary tables used in this article.

drop table #Employers
drop table #Employees
drop table #Employment

 

What this shows is that there is more than one way to do a lot of common tasks.  There isn't always a right way, because often times several methods will achieve the same result.

 

To download the entire SQL script used in this post, click here.


Posted 19 Mar 2009 5:24 PM by BrandonGalderisi
Copyright SQL Server Nation 2010
Powered by Community Server (Non-Commercial Edition), by Telligent Systems