Categories
Go

Go – Finding and removing duplicate files on a Mac

My laptop’s hard drive is plagued with copies of the same family pics stored in different folders, so I wrote a simple app in Go to take care of this for me. It finds duplicates by comparing the MD5 hashes of files; for now it only looks at .jpg files.

Usage

Assuming you saved the code in a file called main.go, here is how you can use it to find duplicate files:

go run main.go -dir /some/dir -dir=/another/dir 

This prints the duplicate files to the terminal, with each group of duplicates separated by a blank line. I also added a `-dupe_action=sym` flag, which replaces every duplicate with a symbolic link to the first copy of that file:

go run main.go -dir /some/dir -dir=/another/dir -dupe_action=sym

Here is the full code for main.go:

package main

import (
	"crypto/md5"
	"encoding/hex"
	"flag"
	"fmt"
	"io"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
)

// arrayFlags lets the -dir flag be passed more than once
type arrayFlags []string

func (r *arrayFlags) String() string {
	return strings.Join(*r, ",")
}

func (r *arrayFlags) Set(value string) error {
	*r = append(*r, value)
	return nil
}

// getFilesInDirectories recursively collects all .jpg files in the given directories
func getFilesInDirectories(dir []string) []string {
	var ret []string

	for _, item := range dir {
		// This could have been done with pure Go, but I was lazy
		c := exec.Command(`find`, item, `-type`, `f`, `-iname`, `*.jpg`)
		outBytes, err := c.Output()
		if err != nil {
			// find exits non-zero on e.g. permission errors but may still
			// have printed matches, so report the error and keep its output
			fmt.Println(`FIND_ERROR`, item, err)
		}

		for _, fileItem := range strings.Split(string(outBytes), "\n") {
			// The output ends with a newline, so skip the empty last entry
			if fileItem != `` {
				ret = append(ret, fileItem)
			}
		}
	}

	return ret
}

// getFileMd5 returns the md5 hash of a file as a hex string
func getFileMd5(path string) (string, error) {
	file, err := os.Open(path)
	if err != nil {
		return ``, err
	}
	defer file.Close()

	hash := md5.New()
	if _, err := io.Copy(hash, file); err != nil {
		return ``, err
	}

	// An MD5 digest is always 16 bytes
	return hex.EncodeToString(hash.Sum(nil)), nil
}

func main() {
	var dupeAction string
	var lookInDirectories arrayFlags

	flag.StringVar(&dupeAction, `dupe_action`, ``, `Action to take with duplicate files. Value: sym`)
	flag.Var(&lookInDirectories, `dir`, `Directory to look in (can be repeated)`)

	flag.Parse()

	// Get all relevant files
	output := getFilesInDirectories(lookInDirectories)

	// Group files by their MD5 hash
	fileMap := make(map[string][]string)
	for _, file := range output {
		hash, err := getFileMd5(file)
		if err != nil {
			// Skip unreadable files; otherwise they would all be grouped
			// under the empty hash and treated as duplicates of each other
			fmt.Println("MD5_ERROR", file, err)
			continue
		}
		fileMap[hash] = append(fileMap[hash], file)
	}

	// Print dupes
	for _, item := range fileMap {
		// Do nothing if the file has no duplicates
		if len(item) <= 1 {
			continue
		}

		// Keep the first file. Use its absolute path so the symlinks stay
		// valid no matter which directory they are created in.
		firstFile, err := filepath.Abs(item[0])
		if err != nil {
			fmt.Println(`ABS_ERROR`, item[0], err)
			continue
		}

		for _, file := range item[1:] {
			fmt.Println(file)

			// Sym link dupes if flag was set
			if dupeAction == `sym` {
				if err := os.Remove(file); err != nil {
					fmt.Println(`Failed to delete`, file, err)
					continue
				}
				if err := os.Symlink(firstFile, file); err != nil {
					fmt.Println(`Failed to symlink`, file, err)
				}
			}
		}
		fmt.Println()
	}
}



Why Golang/Go is so awesome

I have been working with Go for quite some time now and have gotten to know it better. My initial thoughts about it were accurate, I think, with some variations. Go really can do almost everything.

It can be run like any interpreted language for development and compiled for production

This is something I love about Go. You can run a Go program without a separate build step (go run compiles it to a temporary binary and runs it) by just typing:

go run someFile.go

And you can compile your whole app into a single binary, copy that binary to a production server and just run it. You generally don't need to install any dependencies on the server. If your app reads assets (like images), they can be compiled into the binary as well (since Go 1.16, with the standard `embed` package)! This gives a whole new meaning to portability (at least for me).

It supports parallelism and concurrency

With goroutines, you can run multiple things at the same time (assuming you have a multi-core CPU). The Go runtime schedules goroutines onto OS threads and takes care of all the housekeeping that comes with it. It is, by far, the best language for multi-threaded code I have come across.

Many developers think Node supports parallelism, but it doesn't. Node is great for its asynchronous model, but it runs your JavaScript on a single thread (unless you explicitly use worker threads). Most scripting languages that I know of have a similar limitation: the standard implementations of Ruby and Python (MRI and CPython) have a global interpreter lock, which makes sure only one thread executes at a time. So they are concurrent, but their threads do not run in parallel (running multiple things at the same time).

Since web servers typically spawn multiple processes to handle multiple HTTP requests, you don't generally notice the bottleneck.
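To make the goroutine model concrete, here is a minimal sketch using only the standard library. It splits a slice into one chunk per CPU core, sums each chunk in its own goroutine, and waits for all of them with a sync.WaitGroup. The function name parallelSum is hypothetical and only for illustration.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// parallelSum splits nums into one chunk per worker, sums each chunk in a
// separate goroutine, then adds up the partial results.
func parallelSum(nums []int, workers int) int {
	if workers < 1 {
		workers = 1
	}
	// Each goroutine writes only its own slot, so no locking is needed
	partial := make([]int, workers)
	chunk := (len(nums) + workers - 1) / workers

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		start := w * chunk
		if start >= len(nums) {
			break
		}
		end := start + chunk
		if end > len(nums) {
			end = len(nums)
		}
		wg.Add(1)
		go func(w, start, end int) {
			defer wg.Done()
			for _, n := range nums[start:end] {
				partial[w] += n
			}
		}(w, start, end)
	}
	wg.Wait() // block until every goroutine has called Done

	total := 0
	for _, p := range partial {
		total += p
	}
	return total
}

func main() {
	nums := make([]int, 1000)
	for i := range nums {
		nums[i] = i + 1
	}
	// One worker per core; the runtime spreads the goroutines across OS threads
	fmt.Println(parallelSum(nums, runtime.NumCPU())) // prints 500500
}
```

Nothing here creates or manages threads explicitly. The `go` keyword starts each goroutine, and the runtime decides how to run them in parallel.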

It has a unit testing framework in its standard library

It’s the first language I’ve seen that does this, so there is no need to install external libraries. The testing framework (the standard "testing" package) is fast. It also supports benchmarks and runnable examples, which show up in the generated documentation.

It has support for GUI apps

There are a bunch of third-party libraries that let you write native applications in Go for Windows, Mac, etc., including bindings for GTK, QML, etc. Some Go libraries will even let you write hybrid desktop apps (I will talk about mobile later), where the content of the native UI is built with HTML/CSS. I tried this for a proof-of-concept app that generates configuration for a proprietary ETL tool at my work, and it was very easy to do! The library I used was: https://github.com/murlokswarm/app

It has support for writing web services and web applications

Go’s standard library has a built-in templating engine (html/template), and it is fast. There's no need to worry about which library to use for templating. Together with the net/http package, you can create a web service with a few lines of code!

There are quite a few features that make Go a great language to work with. I hope it spreads through the IT market enough that more companies list Go as a main job requirement (and not just a nice-to-have).


Golang: Restart web server on file change

A great feature of scripting languages like PHP, Python and Ruby is that you don’t need to re-compile the app or restart a web server every time you change something. With Go, you need to restart the web server for your changes to take effect, which gets tedious quickly.

We can, however, have this feature in Go as well (with some extra code). We just need to write a file watcher that will restart the web server on any file changes. Below is working code (from a project I am building) that does exactly this. You can modify it to suit your needs or just put it in your project as is.

package main

// This file is: web/main_dev.go
// It recursively monitors the path we are interested in and restarts the web server
// on any file change. Run it from the project root with: go run web/main_dev.go
//
// web/main.go is where my web server code resides. Change startServer() to
// build and run the command you use for your own web server.

import (
    "crypto/md5"
    "encoding/hex"
    "fmt"
    "io"
    "os"
    "os/exec"
    "path/filepath"
    "time"
)

// Path we want to monitor for file changes
const pathToMonitor = "./"

// The compiled server is written outside pathToMonitor; otherwise each build
// would itself count as a file change and trigger another restart
var serverBinary = filepath.Join(os.TempDir(), "web-dev-server")

// fileMd5 calculates the md5 hash of a file
// Source obtained from http://www.mrwaggel.be/post/generate-md5-hash-of-a-file/
func fileMd5(filePath string) (string, error) {
    file, err := os.Open(filePath)
    if err != nil {
        return "", err
    }
    // Close the file when the function returns
    defer file.Close()

    // Copy the file into the hash interface and check for any error
    hash := md5.New()
    if _, err := io.Copy(hash, file); err != nil {
        return "", err
    }

    // Convert the 16-byte hash to a hex string
    return hex.EncodeToString(hash.Sum(nil)), nil
}

// hashFiles returns the MD5 hash of every regular file under pathToMonitor
func hashFiles() map[string]string {
    hashes := make(map[string]string)
    filepath.Walk(pathToMonitor, func(path string, f os.FileInfo, err error) error {
        // Skip directories and anything we cannot read
        if err != nil || f.IsDir() {
            return nil
        }
        if hash, err := fileMd5(path); err == nil {
            hashes[path] = hash
        }
        return nil
    })
    return hashes
}

// changedFile returns a path that was added, modified or deleted between
// the two snapshots, or "" if nothing changed
func changedFile(old, current map[string]string) string {
    for path, hash := range current {
        if old[path] != hash {
            return path
        }
    }
    for path := range old {
        if _, ok := current[path]; !ok {
            return path
        }
    }
    return ""
}

// startServer builds web/main.go and starts it in the background. Running the
// built binary directly (instead of `go run`) matters: killing `go run` would
// leave the actual server process running and holding its port.
func startServer() *exec.Cmd {
    build := exec.Command("go", "build", "-o", serverBinary, "web/main.go")
    build.Stdout, build.Stderr = os.Stdout, os.Stderr
    if err := build.Run(); err != nil {
        fmt.Println("Build failed:", err)
        return nil
    }

    // An exec.Cmd can only be started once, so each restart needs a new one
    server := exec.Command(serverBinary)
    server.Stdout, server.Stderr = os.Stdout, os.Stderr
    if err := server.Start(); err != nil {
        fmt.Println("Could not start server:", err)
        return nil
    }
    fmt.Println("Started server")
    return server
}

// stopServer kills a running server and waits for it to exit
func stopServer(server *exec.Cmd) {
    if server == nil {
        return
    }
    server.Process.Kill()
    server.Wait()
}

func main() {
    // First we get MD5 hashes of all the files we want to monitor
    fileHashes := hashFiles()
    server := startServer()

    // Poll for changes. Everything runs in this one goroutine, so no locks
    // are needed around fileHashes or server.
    for {
        time.Sleep(500 * time.Millisecond)

        current := hashFiles()
        if path := changedFile(fileHashes, current); path != "" {
            fmt.Printf("File changed: %s. Restarting web server\n", path)
            stopServer(server)
            server = startServer()
        }
        fileHashes = current
    }
}
    

Golang: Connecting to Postgres and selecting data from a table

You will need to get the Postgres driver first. In the terminal type:

go get github.com/lib/pq

Connecting to Postgres

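package main

import (
    "database/sql"
    "fmt"

    // The blank import registers the "postgres" driver with database/sql
    _ "github.com/lib/pq"
)

func main() {
    // user, password, localhost and dbName are placeholders for your own settings
    db, err := sql.Open("postgres", "postgres://user:password@localhost/dbName?sslmode=disable")
    if err != nil {
        panic(err)
    }
    defer db.Close()

    // sql.Open only validates its arguments; Ping actually connects, panic if failed
    if err := db.Ping(); err != nil {
        fmt.Println(`Could not connect to db`)
        panic(err)
    }
}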

Selecting data from a table

After connecting to the database, you can do the following:

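// my_table, col1 and col2 are placeholder names. Note that "table" itself is a
// reserved word in Postgres, so it cannot be used unquoted as a table name.
rows, err := db.Query(`SELECT col1, col2 FROM my_table WHERE name = $1`, `Moz`)
if err != nil {
    panic(err)
}
// Release the connection back to the pool when done
defer rows.Close()

var col1 string
var col2 string
for rows.Next() {
    if err := rows.Scan(&col1, &col2); err != nil {
        panic(err)
    }
    fmt.Println(col1, col2)
}
// Check for any error that ended the iteration early
if err := rows.Err(); err != nil {
    panic(err)
}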