What is the HDFS command to list all the files in HDFS according to the timestamp?

Asked by MiuraBaba in Big Data Hadoop on Jul 28, 2021

What is the command to list the directories in HDFS by timestamp? I tried hdfs dfs -ls, which lists the directories with their respective permissions. As a workaround I tried hdfs dfs -ls /tmp | sort -k6,7. Is there a built-in HDFS command for this?

Answered by Jane Fisher

The hadoop fs -ls command accepts the following options:

Usage: hadoop fs -ls [-d] [-h] [-R] [-t] [-S] [-r] [-u] <args>

Options:

-d: Directories are listed as plain files.

-h: Format file sizes in a human-readable fashion (e.g. 64.0m instead of 67108864).

-R: Recursively list subdirectories encountered.

-t: Sort output by modification time (most recent first).

-S: Sort output by file size.

-r: Reverse the sort order.

-u: Use access time rather than modification time for display and sorting.

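You can sort the files by modification time, newest first, with the following command. Add -r to reverse the order so that the oldest entries come first. Note that with -R the entries are sorted within each directory, not across the whole tree.

      hdfs dfs -ls -t -R /tmp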
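
As a small sketch of how these flags combine, the wrapper below lists one directory sorted by modification time, newest first by default or oldest first on request. The function name hdfs_ls_by_time and its arguments are hypothetical and not part of Hadoop. It assumes an hdfs client on the PATH and a Hadoop release whose -ls supports -t and -r.

```shell
# Hypothetical helper (not part of Hadoop): list a directory sorted by
# modification time.
# Usage: hdfs_ls_by_time <path> [newest|oldest]
hdfs_ls_by_time() {
  path="$1"
  order="${2:-newest}"
  if [ "$order" = "oldest" ]; then
    # -t sorts by modification time, -r reverses it (oldest first)
    hdfs dfs -ls -t -r "$path"
  else
    # -t alone puts the most recently modified entries first
    hdfs dfs -ls -t "$path"
  fi
}
```

For example, hdfs_ls_by_time /tmp oldest would run hdfs dfs -ls -t -r /tmp. Adding -u to either call would sort and display by access time instead.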


Answer (1)

To list all the files in HDFS according to their timestamp, you can use the hdfs dfs -ls command with the -R (recursive) option to list all files, and then sort the output by the timestamp.

Here's a step-by-step guide to achieve this:

1. List All Files Recursively

Use the hdfs dfs -ls -R command to list all files and directories recursively:

hdfs dfs -ls -R /

2. Sort by Timestamp

To sort the output by the timestamp, you can pipe the result to the sort command. Assuming that the output format of hdfs dfs -ls -R is consistent (with the timestamp starting from the 6th column), you can use the following command:

hdfs dfs -ls -R / | sort -k 6,7

This command breaks down as follows:

hdfs dfs -ls -R /: Lists all files and directories in HDFS recursively.

|: Pipes the output of the first command to the next command.

sort -k 6,7: Sorts the output based on the 6th and 7th columns (which typically contain the date and time).
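
To see what the sort key does without a cluster, the sketch below runs the same sort on a made-up listing. The paths, owners, and sizes in sample_listing are illustrative only, written in the shape that hdfs dfs -ls -R usually prints; they are not real output.

```shell
# Illustrative listing in the shape printed by `hdfs dfs -ls -R`
# (made-up paths and values, not real cluster output).
sample_listing() {
  cat <<'EOF'
-rw-r--r--   3 alice hadoop       1024 2021-07-28 10:15 /tmp/b.txt
drwxr-xr-x   - alice hadoop          0 2021-07-01 09:00 /tmp/logs
-rw-r--r--   3 bob   hadoop       2048 2020-12-31 23:59 /tmp/a.txt
EOF
}

# Sort on field 6 (date) through field 7 (time), oldest first.
# This works as a plain text sort because the date is written
# year-month-day and the time is written in 24-hour form.
sample_listing | sort -k 6,7
```

With this sample, the entry dated 2020-12-31 comes first and the one dated 2021-07-28 comes last.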

Example

Here’s an example command:

hdfs dfs -ls -R / | sort -k 6,7

Notes

  • Column Index: The sort command assumes that the timestamp starts from the 6th column. If the format is different in your environment, you may need to adjust the column indices accordingly.
  • Output Format: The hdfs dfs -ls command prints output in a format similar to ls -l in Unix/Linux, which includes permissions, number of replicas, owner, group, size, modification date and time, and path.
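
Building on the column layout described in the notes above, here is a rough sketch of a filter that keeps only regular files and prints them newest first. The function name files_newest_first is hypothetical. It assumes the date, time, and path sit in columns 6, 7, and 8, and that paths contain no spaces.

```shell
# Hypothetical filter: read `hdfs dfs -ls -R` style lines on stdin and
# print "date time path" for regular files only, newest first.
# Assumes columns 6, 7, 8 hold date, time, path; a path containing
# spaces would be cut off at the first space.
files_newest_first() {
  # Regular files have a permission string starting with "-";
  # this drops directories ("d...") and any "Found N items" header.
  grep '^-' | sort -r -k 6,7 | awk '{ print $6, $7, $8 }'
}
```

It would be used as hdfs dfs -ls -R /tmp | files_newest_first. If your listing has a different number of columns, the field numbers in both sort and awk need to change together.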

By following these steps, you can list all the files in HDFS sorted by their timestamp.







