
Pad Numbers in a File Name

Intro

This post was inspired by the podcast series Criminal. When I downloaded the episodes using iTunes, almost all of their names started like “Episode 1_…” or “Episode 23_…”. That causes sorting problems on my Android phone when it orders by filename, because “Episode 10” sorts before “Episode 2”. I could have gone through episodes 1-9 manually to add two leading zeros (so they would be ‘001’, ‘002’, …, ‘009’) and then through episodes 10-99 to add one leading zero. Instead, I decided to do it programmatically. If you’re just here to copy the code, I’ll put the code chunk first, and then I’ll walk through it line by line.

Pad Zeros in Filenames Function

All of the functions except str_sub and str_pad are base R functions; those two come from the stringr package. If you don’t know whether you have the package installed, run this code chunk to verify:

"stringr" %in% rownames(installed.packages())
## [1] TRUE

Here’s what it took for me:


setwd("C:/Users/Peter/Desktop/Criminal")

rawnames = dir(".")

newnames = gsub("^01 ", "", rawnames)  # "^" anchors the match, so only a leading "01 " is removed

underscore = regexpr("_ ", newnames)

numbers = stringr::str_sub(newnames, start = 9, end = underscore-1)

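new_nums = numbers  # start from the unpadded numbers so skipped elements keep their text

for (i in seq_along(numbers)){
  # as.numeric() returns NA (not an error) for text it can't convert
  if(!is.na(suppressWarnings(as.numeric(numbers[i])))) 
    new_nums[i] = stringr::str_pad(numbers[i], width = 3, pad = "0")
}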

beginning = stringr::str_sub(newnames, start = 1, end = 8)

ending = stringr::str_sub(newnames, start = underscore)

replacement_names = paste0(beginning,new_nums,ending)

file.rename(from = rawnames, to = replacement_names)
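Despite the heading, the chunk above is a script rather than a function. Here is a minimal sketch of the same idea wrapped in one, using only base R: sprintf() stands in for str_pad(), and a regular expression with a capture group stands in for the fixed start = 9 position. It assumes every name looks like “Episode <number>_ <title>”, optionally preceded by “01 ”. The name pad_episode_names and its width argument are hypothetical. The function only returns the new names, so you can inspect them before renaming anything.

```r
# Hypothetical helper: returns padded file names without renaming anything.
pad_episode_names <- function(filenames, width = 3) {
  # Drop a leading "01 " where present (anchored, so "01 " mid-name is untouched)
  cleaned <- sub("^01 ", "", filenames)

  # Group 1: the digits after "Episode "; group 2: everything from the "_" on
  pattern <- "^Episode ([0-9]+)(_.*)$"
  matched <- grepl(pattern, cleaned)

  # Names that don't fit the pattern are returned unchanged
  out <- cleaned
  nums <- as.integer(sub(pattern, "\\1", cleaned[matched]))
  out[matched] <- paste0(
    "Episode ",
    sprintf(paste0("%0", width, "d"), nums),  # e.g. "%03d" turns 7 into "007"
    sub(pattern, "\\2", cleaned[matched])
  )
  out
}

pad_episode_names(c("01 Episode 1_ Pilot.mp3", "Episode 23_ Title.mp3",
                    "Episode 124_ Finale.mp3", "notes.txt"))
# "Episode 001_ Pilot.mp3" "Episode 023_ Title.mp3" "Episode 124_ Finale.mp3" "notes.txt"
```

To rename the files for real, you would pass the result to file.rename(rawnames, pad_episode_names(rawnames)). The regex version doesn’t depend on character positions, but it is tied to the literal “Episode ” prefix.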

Get filenames in R

This is the folder where the files I’ll rename live. The dir function returns a character vector of the names of the files in a path, sorted alphabetically, so dir(".") gets the file names in the working directory.

setwd("C:/Users/Peter/Desktop/Criminal")

rawnames = dir(".")
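dir() is an alias for list.files(), and it also lists subfolders. If the folder holds anything other than episodes, its pattern argument (a regular expression) can narrow the list. The sketch below assumes the episodes are .mp3 files, so adjust the extension to match yours. It builds a throwaway folder with tempfile() and made-up file names so that it can run anywhere.

```r
# Throwaway folder with a few made-up files
demo_dir <- tempfile("criminal_")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir,
  c("Episode 1_ Pilot.mp3", "Episode 23_ Title.mp3", "cover.jpg"))))

# Every entry in the folder
all_files <- dir(demo_dir)

# Only names ending in ".mp3" (the dot is escaped because pattern is a regex)
mp3_files <- dir(demo_dir, pattern = "\\.mp3$")
print(mp3_files)
```

Adding full.names = TRUE returns full paths instead of bare names, which file.rename also accepts.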

Separate episode numbers from text

Some of the files started with "01 " and some didn’t. gsub finds a pattern and replaces it, and it leaves a string unchanged if the pattern isn’t found. In this case, I just wanted to get rid of the leading "01 " on the tracks so they would be consistent. The pattern is worth anchoring with ^ ("^01 ") so that only a "01 " at the very start is removed; an unanchored "01 " would also match inside a title such as “Episode 12_ 2001 Odyssey”. regexpr returns the position of the first match of a pattern in each string, or -1 if there is no match. So my underscore variable just holds the position of "_ " in each podcast title. (Finding the "_ " positions wouldn’t have been necessary if all of the episode numbers were padded to 3 characters.)

With the start of the podcast titles consistent (starting with "Episode " and no stray "01 "s) and the position of "_ " identified, I can extract the numbers from the middle of the titles. str_sub is stringr’s version of substring (stringr is part of the tidyverse). One nice feature is that it accepts negative indices, which count back from the end of the string. Because "Episode " takes up the first 8 characters, my line of code starts at character 9 and stops just before the underscore. That way it plucks the episode number from each title regardless of how many digits it has (e.g. 1, 2, 63, or 124).

newnames = gsub("^01 ", "", rawnames)  # "^" anchors the match, so only a leading "01 " is removed

underscore = regexpr("_ ", newnames)

numbers = stringr::str_sub(newnames, start = 9, end = underscore-1)
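To see what each step produces, here is a sketch on a few made-up names, not the real episode titles. It uses base R’s substr() in place of str_sub() so that it runs without stringr. regexpr() returns -1 for names with no "_ ", so it’s worth checking for that before slicing.

```r
# Made-up file names in the same shape as the downloads
rawnames <- c("01 Episode 1_ Pilot.mp3", "Episode 23_ Title.mp3", "Episode 124_ Other.mp3")

# Anchored pattern: only a leading "01 " is removed
newnames <- gsub("^01 ", "", rawnames)

# Position of the first "_ " in each name (-1 where there is none)
underscore <- regexpr("_ ", newnames)
as.integer(underscore)   # 10 11 12

# "Episode " occupies characters 1-8, so the number runs from 9 to underscore - 1
numbers <- substr(newnames, start = 9, stop = underscore - 1)
numbers                  # "1" "23" "124"

# TRUE would mean some name has no "_ " and needs a closer look
any(underscore == -1)    # FALSE
```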

Pad the numbers

This is a pretty standard for-loop with some error checking. seq_along is the preferred way to loop over the elements of an object in R because it handles empty objects correctly: seq_along(x) is empty when x has length 0, whereas 1:length(x) becomes 1:0, and the loop runs twice on indices that don’t exist. It works the same way on lists. The if line is meant to skip any element that can’t be converted to a number, so the loop doesn’t stop or error out. Note that is.numeric(as.numeric(x)) can’t do this job. as.numeric returns NA for text it can’t convert, and NA is still numeric, so the check has to be !is.na(as.numeric(x)). After the loop, you can check new_nums for any elements that weren’t padded.
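An empty vector is where seq_along() earns its keep. Here is a quick sketch with one, which is what you’d get if dir() found no files.

```r
numbers <- character(0)   # e.g. dir() returned nothing

seq_along(numbers)        # integer(0): the loop body never runs
1:length(numbers)         # 1 0: a loop over this runs twice

# Count how many times each style of loop runs
runs_seq_along <- 0
for (i in seq_along(numbers)) runs_seq_along <- runs_seq_along + 1

runs_colon <- 0
for (i in 1:length(numbers)) runs_colon <- runs_colon + 1

c(runs_seq_along, runs_colon)   # 0 2
```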

I think str_pad is the easiest function to use for adding leading zeros, but there are others; base R’s sprintf and formatC can do it too. str_pad is generic enough to allow other padding characters, and its side argument controls whether the padding goes on the left, the right, or both sides. width = 3, pad = "0" makes each element at least three characters long by adding 0s to the left (the default side) where necessary.

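new_nums = numbers  # start from the unpadded numbers so skipped elements keep their text

for (i in seq_along(numbers)){
  # as.numeric() returns NA (not an error) for text it can't convert
  if(!is.na(suppressWarnings(as.numeric(numbers[i])))) 
    new_nums[i] = stringr::str_pad(numbers[i], width = 3, pad = "0")
}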
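The loop can also be replaced with vectorized calls. This sketch uses base R’s sprintf() rather than str_pad(), which is an alternative and not what the post uses. The "bonus" entry is a made-up example of a value that can’t be converted. The sketch also shows why is.numeric() can’t catch such values, and it confirms that padded names sort in episode order.

```r
numbers <- c("1", "23", "124", "bonus")

# as.numeric() gives NA (a numeric value) for text it can't convert,
# so is.numeric() is TRUE even though "bonus" failed
converted <- suppressWarnings(as.numeric(numbers))
is.numeric(converted)        # TRUE
ok <- !is.na(converted)      # FALSE only for "bonus"

# Pad every convertible element at once; leave the rest as they were
new_nums <- numbers
new_nums[ok] <- sprintf("%03d", as.integer(converted[ok]))
new_nums                     # "001" "023" "124" "bonus"

# Unpadded names sort character by character, so episode 10 lands before episode 2
sort(c("Episode 2_", "Episode 10_"))     # "Episode 10_" "Episode 2_"
sort(c("Episode 002_", "Episode 010_"))  # "Episode 002_" "Episode 010_"
```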

Bring it back together and rename the files

Now that we have the new, padded numbers, we can reassemble the track names and rename the files. Take the beginning and end of each track name and paste them together with the new number in the middle. (Note that paste0 differs from paste by putting no separator between its arguments, whereas paste defaults to a single space " ".) file.rename works with bare file names here because relative paths are resolved against the working directory, which we set to the folder holding the files. It also accepts full file paths, so you could skip setwd and build the paths with file.path instead.

beginning = stringr::str_sub(newnames, start = 1, end = 8)

ending = stringr::str_sub(newnames, start = underscore)

replacement_names = paste0(beginning,new_nums,ending)

file.rename(from = rawnames, to = replacement_names)
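Before renaming real files, it helps to preview the mapping and to check that no two files would end up with the same name. Where file permissions allow, file.rename() will overwrite an existing file that already has the target name. The sketch below runs the whole rename in a throwaway folder with made-up file names. It uses full paths instead of setwd(), and it checks the logical vector that file.rename() returns, one TRUE or FALSE per file.

```r
# Throwaway folder with made-up episode files
demo_dir <- tempfile("rename_demo_")
dir.create(demo_dir)
old <- c("Episode 1_ Pilot.mp3", "Episode 23_ Title.mp3")
invisible(file.create(file.path(demo_dir, old)))

# paste0() joins with no separator; "Episode " is recycled for both names
new <- paste0("Episode ", c("001", "023"), c("_ Pilot.mp3", "_ Title.mp3"))

# Preview the mapping before touching anything
print(data.frame(from = old, to = new))

# Stop if two files would get the same new name
stopifnot(anyDuplicated(new) == 0)

# Full paths work without changing the working directory
renamed <- file.rename(from = file.path(demo_dir, old),
                       to   = file.path(demo_dir, new))
all(renamed)        # TRUE if every rename succeeded
sort(dir(demo_dir)) # "Episode 001_ Pilot.mp3" "Episode 023_ Title.mp3"
```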