ColdFusion: compare local files with remote FTP files

Written on 10 October 2011, 04:59pm


What do you do when you discover that the production version of an application is not the same as the development version? (Ignore, for now, the bigger problems of the application development workflow.)
You identify the files in question, yes. But how do you identify them when the application contains thousands of source files?
Of course, you make a script.


<cfsetting requesttimeout="300" showdebugoutput="no" />
<cfset initialDir = ExpandPath( '../' )> 
<!--- script is in a sub-folder of the app --->
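<cfset total = 0> <!--- files whose local and remote sizes differ --->
<cfset subTotal = 0> <!--- files whose content still differs once line breaks are ignored, see below --->
<cfset localRoot = "C:\your\local\app\root" /> <!--- the same folder as initialDir, without a trailing backslash --->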
<cfset remoteRoot = "/your/remote/root" />
<cfset tmpFile = "/this/is/just/a/temp/file" />

<cfset tick = GetTickCount()>


<!--- get list of ALL local files, recursively --->
<cfdirectory action="list" 
    directory="#initialDir#" 
    name="localQuery" 
    recurse="true"/>

<!--- make the FTP connection --->
<cfftp connection = "ftpConnection" 
    username = "*********"
    password = "*********"
    server = "********"    
    port="****"
    action = "open" 
    stopOnError = "Yes"> 
<cfif cfftp.succeeded EQ false>
    ERROR - FTP open connection failed!
    <cfabort>
</cfif>

<!--- call the recursive function for the first time --->
<cfset compareFiles(remoteRoot) />

<!--- close the FTP connection --->
<cfftp connection = "ftpConnection"
    action = "close"
    stopOnError = "Yes">
<cfif cfftp.succeeded EQ false>
    ERROR - FTP close connection failed!<cfabort>
</cfif>

<cfset tock = GetTickCount()>
<cfset time = round((tock-tick)/1000)>
<cfoutput><hr />#subTotal#/#total# files, #time# seconds</cfoutput>

This is the skeleton. The business logic happens in the function below:



<cffunction name="compareFiles" access="public" description="Recursive function; scans the remote FTP folder and compares each file with the local version. If the file size differs, then it calculates the hash() of the file contents." returntype="void" output="yes">
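    <cfargument name="ftpDir" type="string" required="yes">
    <!--- keep the listing local to this call, so a recursive call cannot overwrite it while the loops below are still running --->
    <cfset var remoteQuery = "">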

	<!---<h4>Checking folder #arguments.ftpDir#...</h4>--->

    <cfftp connection = "ftpConnection"
        action = "LISTDIR"
        stopOnError = "Yes"
        name = "remoteQuery"
        directory = "#arguments.ftpDir#">

    <cfif cfftp.succeeded EQ false>
        ERROR - in function - FTP LISTDIR connection failed<cfabort>
    </cfif>

    <cfoutput query="remoteQuery">
        <!---file--->
        <cfif isdirectory EQ false>
            <cfset extension = LCase(listLast(name,".")) />
            <cfif extension EQ 'cfm' or extension EQ 'cfc' or 
                    extension EQ 'htm' or extension EQ 'html'> 
                <!--- remote file name: #name#, 
                remote path: #path#,
                remote length: #length#
                get the local path
                query of query to get the local file details
                --->

                <cfset rpath = replace(path,remoteRoot,'')> <!--- relative remote path --->
                <cfset lpath = localRoot & replace(rpath,'/','\','ALL') /> <!--- local path --->
                <cfset lfile = ListLast(lpath,'\')> <!---local file --->
                <cfset ldir = replace(lpath,'\'&lfile,'')> <!--- local dir --->
                <cfquery dbtype="query" name="innerQuery">
                    select size 
                    from localQuery
                    where  
                    directory = <cfqueryparam cfsqltype="cf_sql_varchar" value="#ldir#">
                    and name=<cfqueryparam cfsqltype="cf_sql_varchar" value="#lfile#">
                </cfquery>

                <!--- compare the file sizes --->
                <cfif innerQuery.size NEQ length>
                    <cfset total = total + 1>
                    <cfif FileExists(lpath)>
                        <cffile action="read" file="#lpath#" variable="localFileContent" charset="utf-8">
                        <cfset localFileContent = replace(localFileContent,Chr(10),"","ALL")/>
                        <cfset localFileContent = replace(localFileContent,Chr(13),"","ALL")/>
                        <cfset h1 = hash(localFileContent)> <!--- the hash of the first file --->
                        
                        <cfftp action="getFile" connection="ftpConnection" remotefile="#path#" localfile="#tmpFile#" failIfExists="no">
                        <cffile action="read" file="#tmpFile#" variable="remoteFileContent" charset="utf-8">
                        <cffile action="delete" file="#tmpFile#" />
                        <cfset remoteFileContent = replace(remoteFileContent,Chr(10),"","ALL")/>
                        <cfset remoteFileContent = replace(remoteFileContent,Chr(13),"","ALL")/>
                        <cfset h2 = hash(remoteFileContent)> <!--- the hash of the 2nd file --->
                        
                        <cfif h1 NEQ h2> <!--- output the results --->
                            #rpath# #length#:#innerQuery.size#<br />
                            <cfset subTotal = subTotal + 1>
                        </cfif>
                    </cfif> <!--- end if fileExists() --->
                </cfif><!--- end size compare --->

             </cfif> <!--- end extension check --->
        </cfif> <!--- end file type check --->
    </cfoutput>

    
    <cfoutput query="remoteQuery">
    <!---directory--->
        <cfif isdirectory EQ true> <!--- recursive call; the "/" keeps the folder name from being glued onto its parent path --->
            <cfset compareFiles(arguments.ftpDir & "/" & name) />
        </cfif>    
    </cfoutput>
    
    <cfreturn />
</cffunction>

The function above loops through each remote file (accessed via FTP), and compares its size with the local version. If the sizes are different, more calculations are performed.
That’s because two files that differ in size do not necessarily differ in content (due to different line break characters). For each of the two files, I calculate the hash after I remove all the CR and LF characters. If the hashes differ, the two files are different and I output them.
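
The normalize-then-hash step described above can be pulled out into a small helper. This is only a sketch: normalizedHash is a hypothetical name that does not appear in the script, and the two sample strings are made up for illustration.

```cfml
<!--- hypothetical helper, not part of the original script --->
<cffunction name="normalizedHash" access="public" returntype="string" output="no">
    <cfargument name="content" type="string" required="yes">
    <!--- strip every CR (Chr(13)) and LF (Chr(10)) in a single pass --->
    <cfset var stripped = REReplace(arguments.content, "[#Chr(13)##Chr(10)#]", "", "ALL")>
    <cfreturn Hash(stripped)>
</cffunction>

<!--- the same two lines, once with Unix and once with Windows line endings --->
<cfset unixText = "line one" & Chr(10) & "line two">
<cfset windowsText = "line one" & Chr(13) & Chr(10) & "line two">

<!--- the lengths differ by one character, but the normalized hashes match --->
<cfif normalizedHash(unixText) EQ normalizedHash(windowsText)>
    same content
<cfelse>
    different content
</cfif>
```

Note that removing the line breaks entirely also makes two files look identical when they differ only in where their lines break. Converting CRLF to LF instead of deleting both would be the stricter comparison.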

This is just the basic, primitive version. It can be improved by automatically determining the path separator character, better error handling, better output/logging, etc.
Also, a caveat of this script is that it does not catch files that differ but have the same size. To catch those, compare all the files, not only the ones whose sizes differ: remove the size-comparison cfif (the one that tests innerQuery.size NEQ length) together with its closing tag. The cfquery before it can go as well, provided you also drop #innerQuery.size# from the output line. Be aware of the massive increase in execution time, because every matching file is then downloaded. You might have to increase the requestTimeout.
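
One way to determine the path separator automatically is to ask the underlying Java runtime instead of hard-coding the backslash. This is a sketch under that assumption: the variables sep, localRootExample, rpathExample, lpathExample and lfileExample are illustrative names that do not appear in the script.

```cfml
<!--- java.io.File.separator is "\" on Windows and "/" on Linux or macOS --->
<cfset sep = CreateObject("java", "java.io.File").separator>

<!--- illustrative values only --->
<cfset localRootExample = "C:\your\local\app\root">
<cfset rpathExample = "/admin/index.cfm">

<!--- build the local path and file name with the detected separator --->
<cfset lpathExample = localRootExample & Replace(rpathExample, "/", sep, "ALL")>
<cfset lfileExample = ListLast(lpathExample, sep)>

<cfoutput>#lpathExample# (file: #lfileExample#)</cfoutput>
```

With this in place, the hard-coded '\' in the function would be replaced by the detected separator wherever the local path is built or split.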

Comments (1)

  1. Anas — August 12, 2015 at 03:22

    This is a bit misleading as to what the reality is in PHP. While it's true these are the queries etc., in real-world practice you only write them once, in an abstraction layer that you use over and over. A typical query would look something like this:

    query("SELECT * FROM table");

    or, if it was a dynamically built query:

    $sql = "SELECT * FROM table"; query($sql);

    If you wanted, you could also make it entirely dynamic if you were using multiple databases/connections etc.:

    query("myConnection", "dbName", "SELECT * FROM table");

    The same thing goes for the output:

    out($query_results, $template);

    Even GET/POST can be simplified to:

    p("varName");
    g("varName");

    with a simple method:

    function p($name) { return $_POST[$name]; }

    So yes, PHP is a little "messier" in that you have to do this type of thing the first time around, but it's all the better in that you CAN do this any way that you need/want to and not be handicapped by how someone else thought it might work best for all situations, not how it does work best for your situation.
