What is the HDFS command to list all the files in HDFS according to the timestamp?

Asked by MiuraBaba in Big Data Hadoop, asked on Jul 28, 2021

What is the command to list the directories in HDFS by timestamp? I tried hdfs dfs -ls, which lists the directories with their respective permissions. I also tried a workaround with hdfs dfs -ls /tmp | sort -k6,7. Is there a built-in HDFS command for this?

Answered by Jane Fisher

The hadoop fs -ls command accepts the following options:

Usage: hadoop fs -ls [-d] [-h] [-R] [-t] [-S] [-r] [-u] <args>

Options:

-d: Directories are listed as plain files.

-h: Format file sizes in a human-readable fashion (e.g. 64.0m instead of 67108864).

-R: Recursively list subdirectories encountered.

-t: Sort output by modification time (most recent first).

-S: Sort output by file size.

-r: Reverse the sort order.

-u: Use access time rather than modification time for display and sorting.

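You can sort the files by modification time, most recent first, with the following command. Add the optional -r flag to reverse the order and list the oldest files first:

      hdfs dfs -ls -t -R /tmp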
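
As a rough sketch of how these flags combine, the wrappers below cover the three orderings the option list describes. The function names are hypothetical; only the flags come from the usage line above. The sketch assumes hdfs is on the PATH and that the installed Hadoop release supports -t, -r and -u for -ls.

```shell
# Hypothetical helper names; the flags are the ones listed above.

# Newest first: -t sorts by modification time, most recent first.
ls_newest_first() {
    hdfs dfs -ls -t -R "$1"
}

# Oldest first: -r reverses the -t ordering.
ls_oldest_first() {
    hdfs dfs -ls -t -r -R "$1"
}

# Use access time instead of modification time (-u), newest first.
ls_by_access_time() {
    hdfs dfs -ls -t -u -R "$1"
}
```

For example, ls_oldest_first /tmp would run hdfs dfs -ls -t -r -R /tmp.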


Answer (1)

To list all the files in HDFS according to their timestamp, you can use the hdfs dfs -ls command with the -R (recursive) option to list all files, and then sort the output by the timestamp.

Here's a step-by-step guide to achieve this:

1. List All Files Recursively

Use the hdfs dfs -ls -R command to list all files and directories recursively:

hdfs dfs -ls -R /

2. Sort by Timestamp

To sort the output by the timestamp, you can pipe the result to the sort command. Assuming that the output format of hdfs dfs -ls -R is consistent (with the timestamp starting from the 6th column), you can use the following command:

hdfs dfs -ls -R / | sort -k 6,7

This command breaks down as follows:

hdfs dfs -ls -R /: Lists all files and directories in HDFS recursively.

|: Pipes the output of the first command to the next command.

sort -k 6,7: Sorts the output by the 6th and 7th columns (which typically contain the date and time), oldest first.
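
The pipeline above can also be wrapped as a reusable filter. This is only a sketch under two assumptions that are not part of the original answer: LC_ALL=C is set so that sort compares bytes regardless of the local collation rules, and the function names are hypothetical. Because the listing prints the date as yyyy-MM-dd followed by HH:mm, plain text order matches chronological order.

```shell
# Read `hdfs dfs -ls` style lines from stdin and sort them by
# date (field 6) and time (field 7), oldest first.
sort_oldest_first() {
    LC_ALL=C sort -k6,7
}

# Same sort key, reversed: newest first.
sort_newest_first() {
    LC_ALL=C sort -r -k6,7
}
```

Usage would look like hdfs dfs -ls -R /tmp | sort_newest_first.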

Example

Here’s an example command:

hdfs dfs -ls -R / | sort -k 6,7

Notes

  • Column Index: The sort command assumes that the timestamp starts from the 6th column. If the format is different in your environment, you may need to adjust the column indices accordingly.
  • Output Format: The hdfs dfs -ls command produces output similar to ls -l in Unix/Linux: permissions, number of replicas, owner, group, size, modification date, modification time, and path.
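
Given the column layout described in the notes, a sketch like the following keeps only regular files and prints each timestamp next to its path, sorted oldest first. The function name files_with_timestamp is hypothetical, and the sketch assumes that paths contain no spaces and that the date and time sit in columns 6 and 7.

```shell
files_with_timestamp() {
    # $1 = permissions, $6 = date, $7 = time, $8 = path.
    # Keep only lines whose permission string starts with "-"
    # (regular files); this drops directories and any line that
    # is not a listing entry.
    awk '$1 ~ /^-/ && NF >= 8 { print $6, $7, $8 }' | LC_ALL=C sort
}
```

Usage would look like hdfs dfs -ls -R /tmp | files_with_timestamp.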

By following these steps, you can list all the files in HDFS sorted by their timestamp.







