<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>searchable pdf in sharepoint &#8211; Sibeesh Passion</title>
	<atom:link href="https://sibeeshpassion.com/tag/searchable-pdf-in-sharepoint/feed/" rel="self" type="application/rss+xml" />
	<link>https://sibeeshpassion.com</link>
	<description>My passion towards life</description>
	<lastBuildDate>Tue, 24 Aug 2021 17:21:25 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>/wp-content/uploads/2017/04/Sibeesh_Passion_Logo_Small.png</url>
	<title>searchable pdf in sharepoint &#8211; Sibeesh Passion</title>
	<link>https://sibeeshpassion.com</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Search Contents of a PDF File in SharePoint Online, Make them Searchable Using Microsoft Flow</title>
		<link>https://sibeeshpassion.com/search-contents-of-a-pdf-file-in-sharepoint-online-make-them-searchable-using-microsoft-flow/</link>
					<comments>https://sibeeshpassion.com/search-contents-of-a-pdf-file-in-sharepoint-online-make-them-searchable-using-microsoft-flow/#disqus_thread</comments>
		
		<dc:creator><![CDATA[SibeeshVenu]]></dc:creator>
		<pubDate>Wed, 04 Mar 2020 15:20:57 +0000</pubDate>
				<category><![CDATA[Azure]]></category>
		<category><![CDATA[Cognitive Services]]></category>
		<category><![CDATA[Office 365]]></category>
		<category><![CDATA[SharePoint]]></category>
		<category><![CDATA[aquaforest]]></category>
		<category><![CDATA[Computer Vision]]></category>
		<category><![CDATA[flow]]></category>
		<category><![CDATA[microsoft flow]]></category>
		<category><![CDATA[ocr]]></category>
		<category><![CDATA[searchable pdf in sharepoint]]></category>
		<category><![CDATA[sharepoint]]></category>
		<category><![CDATA[sharepoint flow]]></category>
		<category><![CDATA[sharepoint online]]></category>
		<category><![CDATA[SharePoint Tips]]></category>
		<guid isPermaLink="false">https://sibeeshpassion.com/?p=13986</guid>

					<description><![CDATA[We all get stuck somewhere in our so-called &#8220;Programmer Life&#8221; on a small requirement. I was stuck on one: the content of the PDF files uploaded to my SharePoint Online was not searchable, whereas a PDF I created manually from a Word document worked fine. Let me tell you why. Typically, there are three kinds of PDF files. Normal PDF: These are the files that you get from applications like Microsoft Word, Adobe tools, etc. The beauty of this kind is that its content can be searched, and you can select the [&#8230;]]]></description>
										<content:encoded><![CDATA[



<h2 class="wp-block-heading">Introduction</h2>



<p>We all get stuck somewhere in our so-called &#8220;Programmer Life&#8221; on a small requirement. I was stuck on one: the content of the PDF files uploaded to my SharePoint Online was not searchable, whereas a PDF I created manually from a Word document worked fine. Let me tell you why. Typically, there are three kinds of PDF files.</p>



<ol class="wp-block-list"><li><strong>Normal PDF</strong>: These are the files that you get from applications like Microsoft Word, Adobe tools, etc. The beauty of this kind is that its content can be searched, and you can select the text, style it, copy and paste it, etc.</li><li><strong>Scanned PDF</strong>: This one is the exact opposite of the first, and it was the villain in my requirement. Although the content looks visually the same, it cannot be searched, selected, or copied, because in the end it is just an image inserted into a PDF document. So how can we read the contents of such a file? That is where the technology called OCR (Optical Character Recognition) comes into the picture. With it, we can read the content and make it searchable. And when we do that, we introduce the third type of PDF file.</li><li><strong>Searchable/OCRed PDF</strong>: This is the type that we get as the output of the OCR process. It has two layers: the image that we get from the scanner, and the text content. With these two layers, the file becomes almost equal to the first kind.</li></ol>
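


<p>The three kinds above differ in just two properties: whether the pages are scanned images, and whether the file carries a text layer. Here is a minimal Python sketch of that distinction. It assumes you have already worked out those two facts with a PDF library of your choice; the names <code>PdfKind</code> and <code>classify_pdf</code> are made up for illustration and are not part of any library.</p>

```python
from enum import Enum


class PdfKind(Enum):
    NORMAL = "normal"          # exported from Word, Adobe tools, etc.
    SCANNED = "scanned"        # page images only, no text layer
    SEARCHABLE = "searchable"  # page images plus an OCR text layer


def classify_pdf(has_text_layer: bool, pages_are_scans: bool) -> PdfKind:
    """Rough classification; both inputs are assumed to be known already."""
    if has_text_layer and pages_are_scans:
        return PdfKind.SEARCHABLE
    if has_text_layer:
        return PdfKind.NORMAL
    # No text layer means nothing to search, so the file needs OCR.
    return PdfKind.SCANNED
```

<p>Only the <code>SCANNED</code> kind needs to go through OCR; the other two are already searchable.</p>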



<p>Now let&#8217;s see what my requirement was and how I solved it.</p>



<h2 class="wp-block-heading">Background</h2>



<p>Technology moves fast, so start running today if you want to keep up with it. I have a OneDrive sync folder to which I save the scanned PDF files from my scanner, and once that is done, the files are synced to my SharePoint Online. So far so good. But the problem is that the content of these files is not searchable. Now let&#8217;s fix that.</p>



<h2 class="wp-block-heading">Fix to make Scanned PDF files searchable</h2>



<p>We use Microsoft Flow to convert the scanned PDF into a searchable PDF. There are many ways to do this in Flow; I initially tried a combination of Computer Vision AI and some other services, as shown below.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img fetchpriority="high" decoding="async" width="621" height="367" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Connect-to-the-services-needed.png" alt="" class="wp-image-13987" srcset="/wp-content/uploads/2020/03/Connect-to-the-services-needed.png 621w, /wp-content/uploads/2020/03/Connect-to-the-services-needed-300x177.png 300w, /wp-content/uploads/2020/03/Connect-to-the-services-needed-425x251.png 425w" sizes="(max-width: 621px) 100vw, 621px" /><figcaption>Computer Vision AI in SharePoint</figcaption></figure></div>



<p>But I was not getting the expected output with them, so I decided to go with another option. <a href="https://sibeeshpassion.com/using-azure-cognitive-service-computer-vision-ai-to-read-text-from-an-image/">If you are new to OCR technology or Computer Vision AI, you can find my article here</a>.</p>



<h3 class="wp-block-heading">Create a flow</h3>



<p>The files are synced to my Document folder in SharePoint, so I needed to create a flow that is triggered whenever a file is uploaded.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="817" height="155" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Create-a-Flow.png" alt="" class="wp-image-13988" srcset="/wp-content/uploads/2020/03/Create-a-Flow.png 817w, /wp-content/uploads/2020/03/Create-a-Flow-300x57.png 300w, /wp-content/uploads/2020/03/Create-a-Flow-768x146.png 768w, /wp-content/uploads/2020/03/Create-a-Flow-425x81.png 425w" sizes="(max-width: 817px) 100vw, 817px" /><figcaption>Create Flow</figcaption></figure></div>



<p>Click on &#8220;Create a flow&#8221;, and you will be asked to select a flow template. I selected the template &#8220;When a new file is added in SharePoint, complete a custom action&#8221;.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="585" height="594" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/When-a-new-file-is-added-in-SharePoint-complete-a-custom-action.png" alt="" class="wp-image-13989" srcset="/wp-content/uploads/2020/03/When-a-new-file-is-added-in-SharePoint-complete-a-custom-action.png 585w, /wp-content/uploads/2020/03/When-a-new-file-is-added-in-SharePoint-complete-a-custom-action-295x300.png 295w, /wp-content/uploads/2020/03/When-a-new-file-is-added-in-SharePoint-complete-a-custom-action-425x432.png 425w" sizes="(max-width: 585px) 100vw, 585px" /><figcaption>When a new file is added in SharePoint, complete a custom action</figcaption></figure></div>



<p>Once you click the Continue button, you can start adding new steps to your flow.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="499" height="265" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Add-Steps-in-Flow.png" alt="" class="wp-image-13990" srcset="/wp-content/uploads/2020/03/Add-Steps-in-Flow.png 499w, /wp-content/uploads/2020/03/Add-Steps-in-Flow-300x159.png 300w, /wp-content/uploads/2020/03/Add-Steps-in-Flow-425x226.png 425w" sizes="(max-width: 499px) 100vw, 499px" /><figcaption>Add Steps in Flow</figcaption></figure></div>



<p>Flow is a step-by-step solution: some steps produce an output that we can carry into the next step, and we use this a lot in our flow. Once you are connected to the SharePoint site, we need to get the properties of the uploaded file. To do that, click the + (plus) icon, select &#8220;Add an action&#8221;, and then search for &#8220;Get File Properties&#8221;.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="490" height="362" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Get-File-Properties-Step.png" alt="" class="wp-image-13993" srcset="/wp-content/uploads/2020/03/Get-File-Properties-Step.png 490w, /wp-content/uploads/2020/03/Get-File-Properties-Step-300x222.png 300w, /wp-content/uploads/2020/03/Get-File-Properties-Step-425x314.png 425w" sizes="(max-width: 490px) 100vw, 490px" /><figcaption> Get File Properties Step </figcaption></figure></div>



<p>Now select the site address and the library, and then click on the ID field; you will see an option to select the output of the previous step.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="416" height="265" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/ID-of-the-file-created.png" alt="" class="wp-image-13994" srcset="/wp-content/uploads/2020/03/ID-of-the-file-created.png 416w, /wp-content/uploads/2020/03/ID-of-the-file-created-300x191.png 300w" sizes="(max-width: 416px) 100vw, 416px" /><figcaption>The ID of the file created</figcaption></figure></div>



<p>Now that we have the file, we need to check its file type. To do that, add a Condition control and then add the conditions to it.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="608" height="415" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Condition-to-check-whether-PDF-or-image.png" alt="" class="wp-image-13995" srcset="/wp-content/uploads/2020/03/Condition-to-check-whether-PDF-or-image.png 608w, /wp-content/uploads/2020/03/Condition-to-check-whether-PDF-or-image-300x205.png 300w, /wp-content/uploads/2020/03/Condition-to-check-whether-PDF-or-image-425x290.png 425w" sizes="(max-width: 608px) 100vw, 608px" /><figcaption>Condition to check whether PDF or image</figcaption></figure></div>



<p>Each condition has two outputs, &#8220;Yes&#8221; and &#8220;No&#8221;. We will add all of our remaining steps to the &#8220;Yes&#8221; branch and ignore the &#8220;No&#8221; branch for now, though you could add some tasks there.</p>
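


<p>Written as code, the Condition step boils down to a check on the file extension. The Python sketch below shows the same idea outside Flow. The set of image extensions is my assumption (use whatever your scanner produces), and <code>should_ocr</code> is a hypothetical helper name.</p>

```python
from pathlib import PurePosixPath

# Assumed list of file types worth sending to OCR; adjust to your needs.
OCR_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def should_ocr(file_name: str) -> bool:
    """Return True for the "Yes" branch, False for the "No" branch."""
    # Lower-case the suffix so that "Scan.PDF" and "scan.pdf" both match.
    return PurePosixPath(file_name).suffix.lower() in OCR_EXTENSIONS
```

<p>For example, <code>should_ocr("Scan_001.PDF")</code> takes the &#8220;Yes&#8221; branch, while <code>should_ocr("notes.docx")</code> takes the &#8220;No&#8221; branch.</p>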



<p>Now, in the &#8220;Yes&#8221; branch, we can get the file and pass it to the OCR process; that is where the tool called AquaForest comes into the story. Please follow the steps mentioned in <a href="https://www.aquaforest.com/en/aquaforest-flow-doc.asp">this article</a> and get the key needed. Once that is done, add the action &#8220;OCR PDF or Images&#8221; by searching for the word &#8220;AquaForest&#8221;.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="600" height="393" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/AquaForest-OCR-PDF-or-Images.png" alt="" class="wp-image-13996" srcset="/wp-content/uploads/2020/03/AquaForest-OCR-PDF-or-Images.png 600w, /wp-content/uploads/2020/03/AquaForest-OCR-PDF-or-Images-300x197.png 300w, /wp-content/uploads/2020/03/AquaForest-OCR-PDF-or-Images-425x278.png 425w" sizes="(max-width: 600px) 100vw, 600px" /><figcaption>AquaForest OCR PDF or Images</figcaption></figure></div>



<p>Give the connection a name and add the key in the next popup. There are many properties that you can set here, but the two below are important.</p>






<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="596" height="131" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/File-Content-with-OCR.png" alt="" class="wp-image-13997" srcset="/wp-content/uploads/2020/03/File-Content-with-OCR.png 596w, /wp-content/uploads/2020/03/File-Content-with-OCR-300x66.png 300w, /wp-content/uploads/2020/03/File-Content-with-OCR-425x93.png 425w" sizes="(max-width: 596px) 100vw, 596px" /><figcaption>File Content with OCR</figcaption></figure></div>



<p>The output of this step is the OCRed file; all we have to do now is add the action called &#8220;Create File&#8221; and configure it.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="599" height="216" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Save-the-OCRed-File.png" alt="" class="wp-image-13998" srcset="/wp-content/uploads/2020/03/Save-the-OCRed-File.png 599w, /wp-content/uploads/2020/03/Save-the-OCRed-File-300x108.png 300w, /wp-content/uploads/2020/03/Save-the-OCRed-File-425x153.png 425w" sizes="(max-width: 599px) 100vw, 599px" /><figcaption>Save the OCRed File</figcaption></figure></div>



<p>Wow, now we have a searchable PDF in our Document folder. Go ahead and search for any content of your newly updated PDF. If you wish, you can also add an action to send an acknowledgment mail.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="619" height="556" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Send-mail-from-SharePoint-Flow.png" alt="" class="wp-image-14000" srcset="/wp-content/uploads/2020/03/Send-mail-from-SharePoint-Flow.png 619w, /wp-content/uploads/2020/03/Send-mail-from-SharePoint-Flow-300x269.png 300w, /wp-content/uploads/2020/03/Send-mail-from-SharePoint-Flow-425x382.png 425w" sizes="(max-width: 619px) 100vw, 619px" /><figcaption>Send email step in Flow</figcaption></figure></div>
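


<p>Taken together, the steps in this flow can be sketched as ordinary code, which may make it easier to see how each step&#8217;s output feeds the next one. This is only an illustration in Python, not how Flow or AquaForest work internally: the <code>ocr</code> argument stands in for the &#8220;OCR PDF or Images&#8221; action, the <code>library</code> dictionary stands in for the SharePoint folder, and saving under the same file name is my assumption.</p>

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Assumed file types that go down the "Yes" branch.
OCR_SUFFIXES = (".pdf", ".png", ".jpg", ".jpeg")


@dataclass
class UploadedFile:
    """Stands in for the output of the trigger and 'Get File Properties'."""
    name: str
    content: bytes


def run_flow(
    uploaded: UploadedFile,
    ocr: Callable[[bytes], bytes],
    library: Dict[str, bytes],
) -> Optional[str]:
    """Mirror the flow: condition, OCR, then 'Create File'."""
    # Condition step: only the "Yes" branch does any work.
    if not uploaded.name.lower().endswith(OCR_SUFFIXES):
        return None  # "No" branch: nothing to do in this sketch

    # OCR step: its output is carried into the next step.
    searchable = ocr(uploaded.content)

    # 'Create File' step: save the OCRed bytes (same name is an assumption).
    library[uploaded.name] = searchable
    return uploaded.name
```

<p>To try it out, pass any function that takes bytes and returns bytes as <code>ocr</code>, and an empty dictionary as <code>library</code>; after the call, the dictionary holds the &#8220;saved&#8221; file.</p>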



<h3 class="wp-block-heading">Testing the flow</h3>



<p>Now that we have created the flow, it is time to test it. To do that, I added a scanned document to my OneDrive folder. We can check the flow&#8217;s run status in the <a href="https://emea.flow.microsoft.com/">portal</a>.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="759" height="360" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Run-History-of-Flow.png" alt="" class="wp-image-14001" srcset="/wp-content/uploads/2020/03/Run-History-of-Flow.png 759w, /wp-content/uploads/2020/03/Run-History-of-Flow-300x142.png 300w, /wp-content/uploads/2020/03/Run-History-of-Flow-425x202.png 425w" sizes="(max-width: 759px) 100vw, 759px" /><figcaption>Run History of Flow</figcaption></figure></div>



<p>Below is a sample run history output of my flow.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="837" height="611" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Sample-Flow-Run-History-PDF-OCR.png" alt="" class="wp-image-14002" srcset="/wp-content/uploads/2020/03/Sample-Flow-Run-History-PDF-OCR.png 837w, /wp-content/uploads/2020/03/Sample-Flow-Run-History-PDF-OCR-300x219.png 300w, /wp-content/uploads/2020/03/Sample-Flow-Run-History-PDF-OCR-768x561.png 768w, /wp-content/uploads/2020/03/Sample-Flow-Run-History-PDF-OCR-315x230.png 315w, /wp-content/uploads/2020/03/Sample-Flow-Run-History-PDF-OCR-425x310.png 425w" sizes="(max-width: 837px) 100vw, 837px" /><figcaption>Sample Flow Run History PDF OCR</figcaption></figure></div>



<h2 class="wp-block-heading">Conclusion</h2>



<p>Thanks a lot for staying with me and reading this article. I hope you have now learned about:</p>



<ul class="wp-block-list"><li>creating a flow in SharePoint Online</li><li>creating the steps in Flow</li><li>using the connections in Flow</li><li>trying to OCR a PDF using Computer Vision</li><li>OCRing a PDF using the AquaForest API</li><li>creating a new file with the OCRed output</li><li>sending mails from Flow</li></ul>



<p>If you have learned anything else from this article, please let me know in the comment section.</p>



<h2 class="wp-block-heading">Follow me</h2>



<p>If you like this article, consider following me, haha!</p>



<ul class="wp-block-list"><li><a href="https://github.com/SibeeshVenu">GitHub</a></li><li><a href="https://medium.com/@sibeeshvenu">Medium</a></li><li><a href="https://twitter.com/sibeeshvenu">Twitter</a></li></ul>



<h2 class="wp-block-heading">Your turn. What do you think?</h2>



<p>Thanks a lot for reading. Did I miss anything that you think is needed in this article? Did you find this post useful? Kindly do not forget to share your feedback.</p>



<p>Kindest Regards<br>Sibeesh Venu</p>
]]></content:encoded>
					
					<wfw:commentRss>https://sibeeshpassion.com/search-contents-of-a-pdf-file-in-sharepoint-online-make-them-searchable-using-microsoft-flow/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
	</channel>
</rss>
