Delete duplicate files from Google Drive

If you have a large number of duplicate files in Google Drive for any reason (mine came from syncing files from multiple sources like a laptop, tablet, and phone, and from merging my school’s Drive with my personal one), you may want to delete them so you can use the 200 GB storage plan instead of the 1 TB one.

What we’ll be using

  • script.google.com (Google Apps Script)
  • a Google Sheet as the persistent data store

Let’s start

  • Go to Google’s Apps Script site and make sure you are logged in with the same Google account as the Drive you want to clean up.
  • Click on New project. It should open the script editor; if not, navigate to the Editor tab.
  • Now we will create a function that iterates through the Drive’s files and checks for duplicates by file name using a hash table. Easy enough, right? Well…
  • There is a hard timeout limit of 6 minutes per execution, and it took much, much longer than that to scan all the files in my Drive. So we need persistence across runs.
  • The file iterator comes with a continuation token, so a later run can resume where the previous one stopped (think next-page token).
  • To persist the file names already seen, we will use a column of a Google Sheet. Just create a sheet and copy the sheet ID (it’s in the URL of the sheet, the segment right after /d/).
  • To execute the function automatically, we will use a time-driven trigger set to run every 5 minutes.
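The trigger from the last step can be added through the Triggers page of the script editor, but it can also be created from code. Below is a minimal sketch of the second route, not part of the original script. The function names `installFindDuplicatesTrigger` and `setUp` are made up for illustration, and passing the `ScriptApp` service in as a parameter is an assumption made only so the function can be tried with a stand-in object outside the editor.

```javascript
// Sketch: install a time-driven trigger that runs findDuplicates every 5 minutes.
// `scriptApp` is expected to be the Apps Script ScriptApp service; taking it as
// a parameter is a choice of this sketch, not something the original script does.
function installFindDuplicatesTrigger(scriptApp) {
  var handler = 'findDuplicates';

  // Remove triggers left over from earlier installs so the function
  // is not scheduled twice.
  scriptApp.getProjectTriggers().forEach(function (trigger) {
    if (trigger.getHandlerFunction() === handler) {
      scriptApp.deleteTrigger(trigger);
    }
  });

  return scriptApp.newTrigger(handler)
    .timeBased()
    .everyMinutes(5) // Apps Script accepts 1, 5, 10, 15 or 30 here
    .create();
}

// Hypothetical wrapper to run once by hand from the script editor.
function setUp() {
  installFindDuplicatesTrigger(ScriptApp);
}
```

Running `setUp` once should be enough; after that the trigger keeps calling `findDuplicates` until it is removed, so it is worth deleting it from the Triggers page once the scan has finished.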

The script

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
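function findDuplicates() {
 var scriptProperties = PropertiesService.getScriptProperties();

 // Once a full scan has finished, do nothing. Otherwise a later trigger run would
 // re-read files whose names are already in the sheet and trash the originals.
 if (scriptProperties.getProperty('scanComplete111') === 'true') {
  return;
 }

 var continuationToken = scriptProperties.getProperty('continuationToken111');

 // Replace GOOGLE_SHEET_ID with the id copied from the sheet's URL
 var sheet = SpreadsheetApp.openById('GOOGLE_SHEET_ID').getActiveSheet();

 // Names seen in earlier runs, loaded into an object used as a hash table
 var fileNames = getFileNamesFromSheet(sheet);

 // Resume from the saved token if there is one; otherwise search for files that are not in the trash
 var files = continuationToken ? DriveApp.continueFileIterator(continuationToken) : DriveApp.searchFiles('trashed = false');

 var startTime = Date.now();

 // Stop after 260 s (about 4 min 20 s) so the run completes before the next 5-minute trigger fires
 while (files.hasNext() && Date.now() - startTime < 260000) {
  var file = files.next();
  var name = file.getName();

  // Compare with true so inherited keys such as 'toString' are not mistaken for seen names
  if (fileNames[name] === true) {
   Logger.log('Duplicate: ' + name + ' ' + file.getUrl());
   try {
    file.setTrashed(true); // moves the file to the trash
   } catch (e) {
    Logger.log(e);
   }
  } else {
   fileNames[name] = true;
   sheet.appendRow([name]); // Add new file name to the sheet
  }
 }

 if (files.hasNext()) {
  // Save the continuation token if there are more files to process
  scriptProperties.setProperty('continuationToken111', files.getContinuationToken());
 } else {
  // Mark the scan as finished and drop the now stale token
  scriptProperties.setProperty('scanComplete111', 'true');
  scriptProperties.deleteProperty('continuationToken111');
  Logger.log('no files left');
 }
}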


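function getFileNamesFromSheet(sheet) {
 // Build a lookup object (hash table) from the names stored in column A
 var fileNames = {};
 var lastRow = sheet.getLastRow();

 // An empty sheet has no rows to read
 if (lastRow < 1) {
  return fileNames;
 }

 // Read column A in a single call; getValues() returns one array per row
 var rows = sheet.getRange(1, 1, lastRow).getValues();

 rows.forEach(function(row) {
  fileNames[row[0]] = true;
 });

 return fileNames;
}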
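
The script above treats two files as duplicates when their names match, so unrelated files that share a name (say, two different `notes.pdf` in separate folders) would also be trashed. A stricter variant could key on name and size together. The following is only a sketch of that idea, not what the script does: `duplicateKey` and `findDuplicateRecords` are hypothetical helpers, and the records are plain objects whose values would, in Apps Script, come from `file.getName()`, `file.getSize()` and `file.getUrl()`.

```javascript
// Sketch: detect duplicates by name AND size instead of name alone.
// Each record is a plain object {name, size, url}.
function duplicateKey(name, size) {
  // JSON.stringify keeps the two fields unambiguous, e.g. '["notes.pdf",1200]'
  return JSON.stringify([name, size]);
}

function findDuplicateRecords(records) {
  var seen = Object.create(null); // hash table with no inherited keys
  var duplicates = [];

  records.forEach(function (record) {
    var key = duplicateKey(record.name, record.size);
    if (seen[key]) {
      duplicates.push(record); // a later copy with the same name and size
    } else {
      seen[key] = true;
    }
  });

  return duplicates;
}

// Usage: only the second record repeats both the name and the size of an earlier one.
var dups = findDuplicateRecords([
  { name: 'notes.pdf', size: 1200, url: 'a' },
  { name: 'notes.pdf', size: 1200, url: 'b' },
  { name: 'notes.pdf', size: 3400, url: 'c' }
]);
// dups holds the single record with url 'b'
```

To use this with the sheet-based persistence, the key rather than the bare name would be stored in the column. Google-native files such as Docs or Sheets may report a size of 0, in which case the key effectively falls back to the name.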
This post is licensed under CC BY 4.0 by the author.