Agent Jobs

Find And Extract Sharepoint Documents With SQL

Here’s another example on how you can both search for, and extract individual Sharepoint Documents with SQL. With the image above you can see how on the left pane I’m doing a basic query to find the documents I’m looking for.
All I needed was the Database Name, List Name, File Name, and the URL so I could positively locate the file. I wanted to get extra information about the file so I added the [alldocs].[timecreated] and [docstreams].[size] just to get an idea of the time when the documents were created, and how large the files are.

I then simply copied and pasted the 4 values I needed to extract the file (using the script on the right).

One could always go through Sharepoint and get the files that way; however these are peppered across a variety of different sites. I created the two (Find & Extract) scripts to sure up the process, but ultimately I combined them together into one massive automation so that many thousands of files could be identified, and extracted by just one click.

One of the more common questions I get around this process is what happens if you run it more than once, and forget to remove the file?

Anyway; hope you find this helpful. Both scripts are below.


use [WSS_Content_Database];
set nocount on

	'database'	= db_name()
,	'time_created'	= left(alldocs.timecreated, 19)
,	'kb'		= (convert(bigint,alldocstreams.size))/1024
,	'mb'		= (convert(bigint,alldocstreams.size))/1024/1024
,	'list_name'	= alllists.tp_title
,	'file_name'	= alldocs.leafname
,	'url'		= alldocs.dirname
,	'last_url_folder' = right(alldocs.dirname, charindex('/', reverse('/' + alldocs.dirname)) - 1)
	alldocs join alldocstreams	on 
	join alllists			on alllists.tp_id = alldocs.listid
	--alldocstreams.[size] > 2048
	right([alldocs].[leafname], 2) in ('oc', 'cx', 'df', 'sg', 'xt')
	and alllists.tp_title like '%FY12 Documents%'
order by
	alldocs.timecreated desc
,	alldocs.dirname 


use master;
set nocount on

declare @ole_automation int
set		@ole_automation = (select cast([value_in_use] as int) from sys.configurations where [configuration_id] = '16388')
if		@ole_automation = 0
	exec sp_configure 'Ole Automation Procedures', 1; reconfigure with override;

use tempdb;
set nocount on

declare	@url			varchar(1000)
declare	@list			varchar(255)
declare @file			varchar(255)
declare	@database		varchar(255)
declare	@extension		varchar(5)
declare	@destination_path	varchar(255)
set @database	= 'WSS_Content_Database'
set @list	= 'Archive FY12 Documents'
set @file	= '7684_HiringPacket.pdf'
set @url	= 'sites/Archive of Hiring Docs FY2012'
set @extension = (select reverse(left(reverse(@file),charindex('.',reverse(@file))-1)))
set @destination_path	= '\\sps1\w$\' + @file

declare	@extract_file	varchar(max)
set	@extract_file	= 
'use [' + @database + '];
set nocount on;

declare	@object_token int
declare @content_binary varbinary(max)
select	@content_binary = alldocstreams.content from alldocs join alldocstreams on = join alllists on alllists.tp_id = alldocs.listid
	alllists.tp_title	= ''' + @list + '''
	and alldocs.leafname	= ''' + @file + '''
	and alldocs.dirname	= ''' + @url  + '''

exec sp_oacreate '''', @object_token output
exec sp_oasetproperty @object_token, ''type'', 1
exec sp_oamethod @object_token, ''open''
exec sp_oamethod @object_token, ''write'', null, @content_binary
exec sp_oamethod @object_token, ''savetofile'',	null, ''' + @destination_path + ''', 2
exec sp_oamethod @object_token, ''close''
exec sp_oadestroy @object_token
exec	(@extract_file)

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s