
Wikipedia:Reference desk/Archives/Computing/2023 December 31

From Wikipedia, the free encyclopedia
Computing desk
Welcome to the Wikipedia Computing Reference Desk Archives
The page you are currently viewing is a transcluded archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


December 31

Folder address batch cmd

Hi, in the cmd box, what part of my folder string should I put in place here? (pdfs/party\ manifestos/ is the writer's own lorem ipsum.) Thanks everyone—and a Happy New Year to all! ——Serial 17:52, 31 December 2023 (UTC)

1=FOLDER=~/pdfs/party\ manifestos/

find "$FOLDER" -name '*.pdf' -exec pdftotext -enc UTF-8 {} \;

——Serial 17:52, 31 December 2023 (UTC)

You should omit the part  1=  in the first line and replace the part  ~/pdfs/party\ manifestos/  by the pathname of a folder in your filespace containing the pdf documents that you want to be converted. Make sure you escape any spaces and other potentially problematic non-alphanumeric characters in the pathname. You can issue the command
echo "$FOLDER"
after the command that set its value, to make sure you have the correct pathname. You need to have both read access and write access to the folder and its subfolders. --Lambiam 17:33, 2 January 2024 (UTC)
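Lambiam's advice can be tried out safely with a dry run. This is a minimal sketch, assuming bash (for example inside WSL); the folder "party manifestos" is created in a temporary directory purely for illustration. Putting echo in front of pdftotext makes find print each command it would run instead of converting anything.

```shell
#!/bin/bash
# Illustration only: a hypothetical folder whose name contains a space,
# created under a fresh temporary directory.
base=$(mktemp -d)
mkdir -p "$base/party manifestos/older"
touch "$base/party manifestos/a.pdf" "$base/party manifestos/older/b.pdf"

# Two equivalent ways to store a path containing a space in a variable:
FOLDER=$base/party\ manifestos/    # backslash-escape the space
FOLDER2="$base/party manifestos/"  # or quote the whole path
[ "$FOLDER" = "$FOLDER2" ] && echo "same value"

# Check the value before using it.
echo "$FOLDER"

# Dry run: 'echo' prints each pdftotext command instead of running it.
# find also descends into subfolders, so older/b.pdf is included.
find "$FOLDER" -name '*.pdf' -exec echo pdftotext -enc UTF-8 {} \;
```

Once the printed commands look right, removing the echo runs pdftotext for real. One caveat: a ~ inside double quotes is not expanded, so FOLDER="~/pdfs/party manifestos/" would not point at your home folder; FOLDER=~/"pdfs/party manifestos/" or "$HOME/pdfs/party manifestos/" does.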
Thanks very much Lambiam, appreciate your help. So my command prompt should look something like this?

C:\Users\SN54129>FOLDER=Articles\EngHistRev/

find "$FOLDER" -name '*.pdf' -exec pdftotext -enc UTF-8 {} \;

——Serial 18:19, 2 January 2024 (UTC)

I'm insufficiently familiar with the Windows syntax for pathnames to be confident of the effect of the forward slash ("/") as the last character of the folder name. I suppose, though, that it should work fine if you leave it out and just use FOLDER=Articles\EngHistRev . --Lambiam 00:24, 3 January 2024 (UTC)
It appears that you do not understand that FOLDER is a variable. It is not necessary to use a variable in this case. You can simply use: find "Articles\EngHistRev/" -name '*.pdf' -exec pdftotext -enc UTF-8 {} \;
I do not know if find or pdftotext will work on Windows. Will it even recognize the / at the end? Does it require \? Basically, you are taking a Posix command line instruction and trying to use it in Windows. If you are in PowerShell with the Linux subsystem installed, it MIGHT work. Don't expect it to work. I expect it to claim that the syntax is bad because it is trying to use the Windows find command, not the Posix find command. 97.82.165.112 (talk) 17:45, 3 January 2024 (UTC)
For the confused, all Windows pathnames only ever use backslash \ and all *Nix pathnames only ever use forward slash /. I'm not aware that the final slash is significant. Is all. MinorProphet (talk) 02:13, 4 January 2024 (UTC)
Yeah if you're trying to do this on Windows via Windows Subsystem for Linux, you want to be running this inside the Linux environment, in bash or your preferred *nix shell. Then it should Just Work. You can access the host Windows environment filesystems under /mnt.
If you have done this and something still isn't working, copy-and-paste here everything from the terminal, inside <syntaxhighlight> tags: see Help:Wikitext#Format. Include error messages etc. Otherwise back up and tell us what you're trying to accomplish, under what conditions—"I have X, and I want to do Y". Slowking Man (talk) 04:27, 4 January 2024 (UTC)
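Slowking Man's point about /mnt can be illustrated with a simplified sketch of how a Windows path corresponds to a path inside WSL, assuming the default automount root /mnt and assuming the folder sits under the home folder shown in the command prompt above. The function win_to_wsl is hypothetical and only handles plain drive-letter paths; inside WSL the real wslpath utility performs this conversion and copes with more cases.

```shell
#!/bin/bash
# Hypothetical helper: convert a drive-letter Windows path such as
# C:\Users\SN54129\Articles\EngHistRev to its WSL equivalent under /mnt.
# Assumes the default automount root (/mnt); no UNC or relative paths.
win_to_wsl() {
    local win=$1
    local drive=${win%%:*}   # text before the first colon, e.g. "C"
    local rest=${win#*:}     # text after it, e.g. "\Users\SN54129\..."
    drive=$(printf '%s' "$drive" | tr '[:upper:]' '[:lower:]')
    rest=${rest//\\//}       # turn every backslash into a forward slash
    printf '/mnt/%s%s\n' "$drive" "$rest"
}

win_to_wsl 'C:\Users\SN54129\Articles\EngHistRev'
# prints /mnt/c/Users/SN54129/Articles/EngHistRev
```

Inside WSL, ls "$(wslpath 'C:\Users\SN54129\Articles\EngHistRev')" would be the equivalent check using the real tool.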
  • Hi all, and many thanks for the replies. Fair play to the anon: it's true, I "do not understand that FOLDER is a variable". In fact, I do not know what a variable is. And I'm afraid that goes for slash, bash, Posix, Powershell, *nix, etc., as well. I am a tool. FWIW, I get this in the Command line:
    C:\Users\User54129>find "Articles\EngHistRev/" -name '*.pdf' -exec pdftotext -enc UTF-8 {} \;
    File not found - '*.pdf'
    

Would it be possible (homework notwithstanding, of course) to go back to the beginning, as Slowking Man has suggested, and re-approach it in the X/Y fashion? Here's hoping! ——Serial 15:41, 4 January 2024 (UTC)

That output matches what you would receive when using the Windows find command in the shell. The line of text you are trying to use is !!!NOT!!! Windows. It is Posix (Unix or Linux). Windows is NOT Posix. To use the command, you must use a Unix or Linux computer, not a Windows computer. The answers can cause a bit of confusion because modern Windows computers have an optional Linux subsystem which lets you perform some Linux things in Windows.
The take-away you should be getting is that you copied a command for Linux. You are not using Linux. It won't work. 97.82.165.112 (talk) 16:59, 4 January 2024 (UTC)
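The anon's diagnosis can be checked from inside a shell. A minimal sketch, assuming bash (for example in WSL): type -a lists every command named find that bash would consider, and command -v prints the one it will actually use.

```shell
#!/bin/bash
# List every "find" bash could run, in the order it would look them up.
type -a find

# Print the path of the one that will actually run.
# In a Linux shell (including WSL) this is normally /usr/bin/find.
echo "find resolves to: $(command -v find)"
```

From cmd.exe itself, where find should instead list the Windows program (typically C:\Windows\System32\find.exe), which is consistent with the File not found - '*.pdf' reply above.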
@.97, and I thank you. I understand that now. Following your instructions and per homework, I have installed the WSL base for Linux. Could you possibly comment? Unfortunately, I'm getting some negativity from the results. (See image.) ——Serial 17:40, 4 January 2024 (UTC)
Linux does NOT use \ in file paths. It uses /. Do not use \. That means something completely different in Linux. The first step is to figure out the path to your files. Type ls. What do you see? Do you see your files? If not, you have to find them. For example, they might be in someplace like documents/files/pdfs/. So, I would type ls documents/files/pdfs/ and see if the file names are shown. Keep in mind that nobody knows where your files are on your computer. I can list a thousand different directories and I will have very little chance of magically guessing where your files are located. It is up to YOU to learn to search YOUR computer to find YOUR files. Once you know where your files are, you can use find on that directory. I strongly expect the next step will involve recognizing that you never installed pdftotext. 97.82.165.112 (talk) 18:44, 4 January 2024 (UTC)
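The two steps the anon describes, locating the PDFs and checking that pdftotext is installed, can be sketched as follows, assuming bash. The helper pdf_dirs is hypothetical, and searching from your home directory is only a starting guess: under WSL, files saved from Windows live under /mnt (for example /mnt/c/Users/...), and scanning a whole drive there can be slow.

```shell
#!/bin/bash
# Where is the shell right now, and what is in this directory?
pwd
ls

# Hypothetical helper: print each directory under $1 that holds a PDF,
# one per line, without duplicates. Unreadable folders are skipped quietly.
pdf_dirs() {
    find "$1" -type f -name '*.pdf' 2>/dev/null | sed 's|/[^/]*$||' | sort -u
}
pdf_dirs "$HOME"

# Check that pdftotext is installed before running the real command.
if command -v pdftotext >/dev/null 2>&1; then
    echo "pdftotext found at $(command -v pdftotext)"
else
    echo "pdftotext not found (on Debian/Ubuntu it comes from poppler-utils)"
fi
```

Whichever directory pdf_dirs reports is the one to put in FOLDER.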
Alright, calm down! Keep your hair on. As it happens, I had already installed pdftotext (although it disappeared!). But why don't I just use something like this instead? It would have saved me brain ache! :D cheers! ——Serial 20:01, 4 January 2024 (UTC)
Do not use quotes in filenames (e.g. '*.pdf'); they will be interpreted as part of the filename itself. Search instead for *.pdf MinorProphet (talk) 13:37, 5 January 2024 (UTC)
No. Here, we need the literal string *.pdf passed as an argument to find(1) by the shell. If you don't quote or escape it, the shell will try to expand *.pdf as a glob expression, because it contains a glob character. This is "wrong" and doesn't work as you think it does if you don't understand *nix shell syntax.
#!/bin/bash

set -x
mkdir dir
touch x.pdf dir/foo.pdf dir/bar.pdf
echo *.pdf
echo '*.pdf'
find . -name '*.pdf' -ls
find . -name *.pdf -ls

Output:
+ mkdir dir
+ touch x.pdf dir/foo.pdf dir/bar.pdf
x.pdf
*.pdf
+ echo x.pdf
+ echo '*.pdf'
+ find . -name '*.pdf' -ls
       14      0 -rw-rw-r--   1 sbx_user1051 990             0 Jan  6 00:43 ./dir/foo.pdf
       15      0 -rw-rw-r--   1 sbx_user1051 990             0 Jan  6 00:43 ./dir/bar.pdf
       13      0 -rw-rw-r--   1 sbx_user1051 990             0 Jan  6 00:43 ./x.pdf
+ find . -name x.pdf -ls
       13      0 -rw-rw-r--   1 sbx_user1051 990             0 Jan  6 00:43 ./x.pdf
Slowking Man (talk) 00:46, 6 January 2024 (UTC)
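Slowking Man's demonstration has exactly one PDF in the current directory. A sketch of the other two cases, assuming bash with default options (nullglob and failglob off) and GNU find, shows why leaving the pattern unquoted is fragile: it can appear to work, or fail outright, depending on what happens to be in the current directory.

```shell
#!/bin/bash
# Extends the experiment above in a fresh scratch directory.
cd "$(mktemp -d)" || exit 1
mkdir dir
touch dir/foo.pdf dir/bar.pdf

# No PDF in the current directory: by default bash leaves an unmatched
# glob untouched, so find receives the literal *.pdf and works by luck.
find . -name *.pdf | sort

# Two PDFs in the current directory: the glob expands to "a.pdf b.pdf",
# find sees an extra word after -name a.pdf and stops with an error
# (GNU find reports "paths must precede expression"; others word it differently).
touch a.pdf b.pdf
find . -name *.pdf 2>&1 | head -n 1

# Quoted: find always gets the pattern itself and finds all four files.
find . -name '*.pdf' | sort
```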