Friends, have you ever come across a date column where dates are given in multiple formats? For example:
Why Can This Happen?
Many times when we are collaborating on some tasks in google sheet this problem can occur since different people may input data in a different format. Also, sometimes the deadline of a task is so tight that ensuring dates in a certain format may get missed. In some cases, the data is collated from multiple sources and different data sources may have different date formats.
There are some ways to ensure format compliance using data validation (which we will cover in the later section of this article) but let’s first find a solution to how to standardize dates in Google Sheets.
Step 1: Extract Year
Year generally resides in the first 4 characters or the last 4 characters. There are some inbuilt functions in google sheets that help to easily extract the first 4 or last 4 characters. An Example of the same is presented below
From the above table, it is clear that in most of the cases we have extracted the year’s first 4 or the last 4 characters. If the first four characters are numbers then we most likely have extracted the year. This can be easily checked as follows:
- Add 0 to the first 4 characters: Adding 0 to text will return an error but adding 0 to a number will have no impact on the number.
- In case of an error on the first 4 characters, add 0 to the last 4 characters.
Some abnormalities in the year extraction are:
- Very high numbers like 44197, 44211
- Very low numbers like 1501
This again needs some handling as follows:
- Check if the year is very high or very low. (High Low can be different for different scenarios. In the below example, I have taken 1900 as low and 2050 as High)
- If yes then use the last 4 characters only as the year (this issue is caused if the first 4 characters somehow are meeting the number criteria)
Finally, we have accurately extracted the Year in Column E. All this can be done in a single column as follows:
=IFERROR( IF( OR( LEFT(A2,4)+0 < 1900 , LEFT(A2,4)+0 >2050 ) ,
Step 2: Clean The Date
Once Year is extracted next, we want to extract the month. Generally, the Month is never in the end and is either at the start or in mid. But first, let’s remove the extracted year and slashes from the main Date. This can be done using the SUBSTITUTE Function as follows:
Note: We are removing year, “,” , “/” & “-“ from the date.
The above formulae look big but it is simple to understand. We are nesting the output of Substitute within another Substitute so that all the undesirable characters & years are removed from the date in column A.
In Column C, we realize that there are still some abnormalities like “th” & “Jan”. We can again use substitutes to remove “th” , “st” , “nd”, “rd” and spaces as well. The output looks as follows:
“nd”, “”), ”
rd”, “”) ,
Note: We are also handling cases like 1st, 2nd, 3rd and hence are also replacing st, nd and rd.
Now Our Date is cleaned and contains only month and day. Cleaning can be done in one column by combining the two formulas in one cell.
Step 3: Extracting Date Of Month
This is a very tricky step as sometimes the Date of the Month can come before the month and sometimes the month can also come before the day. We know for sure that the month cannot be greater than 12 and hence if the extracted number if greater than 12 then it is the Date of Month and the remaining part is Month. But this logic fails when both numbers are less than 12. Let’s first try to extract the first 2 and last 2 characters from the cleaned data and convert it to numeric as follows:
In the above formulae, if we are not able to find a number in either left 2 or right 2 characters then by default, we keep the extracted characters as 0 since it is clear that the other part is Date of Month. Now we know for sure that one of the two extracted figures is the Date of the Month.
In our case date is generally reported is dd/mm/yyyy (you may change this assumption based on your use case). Based on our assumption date of the month comes in the left part and so the right part will only be picked if it is between 12 to 31 else, we will give preference to the left part. This can be done using the If function as follows:
=if( and(E2>12,E2<‘=31) ,
Finally, we have extracted the date of the month. Although we are working on the above assumption of prioritizing the left part you can change it as per your need. All this can be done in a single column and we may not need D & E Column as Follows:
if( and( RIGHT(C2,2)+0 > 12, RIGHT(C2,2)+0 <‘= 31 ),
Step 4: Extracting Month
Since we have already extracted the Date and Year in previous steps, extracting the month will be relatively easy. In the previous step, we extracted the date of the month from the column Date without Year & slashes & abnormalities. We can remove the date of the month from that column to get the month as follows:
Note: We are removing the contents of cell D2 from C2.
We have extracted the month but it is not standardized as sometimes Jan is appearing, sometimes 01 is appearing and sometimes 1 is appearing. So, we just need to handle these scenarios.
- We will first create a mapping of Jan-Dec into 1 to 12
- We will look up column E to our mapping (Jan -Dec)
- In case of error, we will simply convert the month into the number
The above steps are followed in the screenshot below. I have created the mapping in columns L & M on which we are doing the VLOOKUP.
=iferror( VLOOKUP(E2 , $K$1:$L$13 ,2,0) ,
Finally, we have the standardized Month in Column F. Sometimes the handling required to standardize the month may be more than the above-mentioned steps and hence need to be handled accordingly. The above handling will cover most of the scenarios and other handling can also be done similarly.
Step 5: Combining The Outputs From Previous Steps
This is the final step where we will combine the Extracted Year, Month, and Date of Month to get the Final Date. The Date Function allows us to easily do that as follows:
Note: Date Function Takes in input as Year, Month, Date and Returns the Date.
How to standardize dates in Google Sheets can be a very lengthy Exercise but with the above 5 steps, we can standardize dates for any size of Data size (Even Million rows Data Set). Once we can standardize the date then we can easily do any form of Reporting and Analysis.
In some scenarios, we can use Data Validation to ensure that other collaborators/Editors cannot enter any different format but that will require some training to be given to people who fill the data.
Hope this article will help you solve some of the problems that you face while handling the dates. Do let us know in the comments if you like this article or you can contact us. We are happy to resolve any further issues that you face.