Introduction to the CLI

Finding Your Stuff

(find, grep and some other niceties)

Santiago Álvarez Rodríguez
Front-end Dev at PSL
santiaro90@gmail.com

OK, so your Product Manager comes to you and asks you for a (hopefully very) easy task:

Help me find out how many SLOC* we've got in total... but wait, I'd like to see only Ruby code... Throw anything else away. How about I come back in 10 minutes?

(*) SLOC: source lines of code, that is, lines of code excluding comments, docstrings and blank lines.

Not cool, right?

Well, no need to tear your hair out. By the end of these slides, you'll be like...

NOTE:

We'll be using the Ruby on Rails source code for the examples. You can clone it from GitHub and run the commands from inside the resulting 'rails' directory:


git clone https://github.com/rails/rails

Finding your way through the file system

Some basics first

# Remember: we care only about Ruby code, so let's find Ruby files first
# General syntax: find <root dir> <conditions> <other options>
~/Documents/rails $ find . -name '*.rb'
# you've probably seen a lot of output here
# I mean it, a lot!!!...
./railties/test/json_params_parsing_test.rb
./railties/test/path_generation_test.rb
./railties/test/paths_test.rb
./railties/test/rack_logger_test.rb
./railties/test/rails_info_controller_test.rb
./railties/test/rails_info_test.rb
./railties/test/railties/engine_test.rb
./railties/test/railties/generators_test.rb
./railties/test/railties/mounted_engine_test.rb
./railties/test/railties/railtie_test.rb
./railties/test/test_unit/reporter_test.rb
./railties/test/version_test.rb
./tasks/release.rb
./tools/test.rb
./version.rb

That was nice. Now, we probably wanna skip test files

# You really need the '*'. Otherwise, 'find' would try
# to do exact matches
~/Documents/rails $ find . -name '*.rb' -not -path '*/test/*'
# more output above...
./railties/lib/rails/railtie/configurable.rb
./railties/lib/rails/railtie/configuration.rb
./railties/lib/rails/ruby_version_check.rb
./railties/lib/rails/source_annotation_extractor.rb
./railties/lib/rails/tasks.rb
./railties/lib/rails/test_help.rb
./railties/lib/rails/test_unit/line_filtering.rb
./railties/lib/rails/test_unit/minitest_plugin.rb
./railties/lib/rails/test_unit/railtie.rb
./railties/lib/rails/test_unit/reporter.rb
./railties/lib/rails/test_unit/test_requirer.rb
./railties/lib/rails/version.rb
./railties/lib/rails/welcome_controller.rb
./tasks/release.rb
./tools/test.rb
./version.rb
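If you want to try these flags without cloning Rails, here is a minimal sketch that builds a throwaway tree with `mktemp` (the `lib/app` and `test/app` files are made up for illustration) and runs the same kind of `find` commands against it, assuming a GNU or BSD `find`:

```shell
# Hypothetical mini-project in a temp dir, removed when the script exits
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir -p "$dir/lib/app" "$dir/test/app"
touch "$dir/lib/app/user.rb" "$dir/lib/app/notes.md" "$dir/test/app/user_test.rb"
cd "$dir" || exit 1

# Without a wildcard, -name only matches a file literally named '.rb'
exact=$(find . -name '.rb' | wc -l)

# '*.rb' matches every Ruby file, tests included
all_rb=$(find . -name '*.rb' | sort)

# -not -path '*/test/*' drops anything below a 'test' directory
no_tests=$(find . -name '*.rb' -not -path '*/test/*')

printf 'exact=%s\nall=%s\nno_tests=%s\n' "$exact" "$all_rb" "$no_tests"
```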

Let's get fancy

# '-regex' matches the whole path, including the file name
# ('-regextype' is GNU find; on BSD/macOS use 'find -E . -regex ...')
~/Documents/rails $ find . -regextype posix-extended -regex '.*/associations/(has|belong).*\.rb$'
./activerecord/lib/active_record/associations/belongs_to_association.rb
./activerecord/lib/active_record/associations/belongs_to_polymorphic_association.rb
./activerecord/lib/active_record/associations/has_many_association.rb
./activerecord/lib/active_record/associations/has_many_through_association.rb
./activerecord/lib/active_record/associations/has_one_association.rb
./activerecord/lib/active_record/associations/has_one_through_association.rb
./activerecord/test/cases/associations/belongs_to_associations_test.rb
./activerecord/test/cases/associations/has_and_belongs_to_many_associations_test.rb
./activerecord/test/cases/associations/has_many_associations_test.rb
./activerecord/test/cases/associations/has_many_through_associations_test.rb
./activerecord/test/cases/associations/has_one_associations_test.rb
./activerecord/test/cases/associations/has_one_through_associations_test.rb

Damn, those test files again!

# That '\' at the end of the line tells Bash the command continues
# on the next line; the '>' is just Bash's continuation prompt, don't type it
~/Documents/rails $ find . -regextype posix-extended -regex '.*/associations/(has|belong).*\.rb$' \
> -not -path '*/test/*'
./activerecord/lib/active_record/associations/belongs_to_association.rb
./activerecord/lib/active_record/associations/belongs_to_polymorphic_association.rb
./activerecord/lib/active_record/associations/has_many_association.rb
./activerecord/lib/active_record/associations/has_many_through_association.rb
./activerecord/lib/active_record/associations/has_one_association.rb
./activerecord/lib/active_record/associations/has_one_through_association.rb
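Since `-regex` has to match the whole path, a pattern that only describes the file name finds nothing. A minimal sketch on a made-up tree, assuming GNU find (for `-regextype`):

```shell
# Hypothetical association files, in lib/ and test/
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir -p "$dir/lib/associations" "$dir/test/associations"
touch "$dir/lib/associations/has_many_association.rb" \
      "$dir/lib/associations/belongs_to_association.rb" \
      "$dir/lib/associations/preloader.rb" \
      "$dir/test/associations/has_many_associations_test.rb"
cd "$dir" || exit 1

# Describes only the file name, so it never matches a path like './lib/...'
name_only=$(find . -regextype posix-extended -regex '(has|belong).*\.rb$' | wc -l)

# The leading '.*/' lets the pattern cover the directories as well
with_path=$(find . -regextype posix-extended -regex '.*/associations/(has|belong).*\.rb$' | wc -l)

# Same pattern, minus anything under a 'test' directory
no_tests=$(find . -regextype posix-extended -regex '.*/associations/(has|belong).*\.rb$' \
  -not -path '*/test/*' | wc -l)

echo "name_only=$name_only with_path=$with_path no_tests=$no_tests"
```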

Further filtering

# '-type d' to search for directories. '-type f' to search for files
~/Documents/rails $ find . -type d -name '*active_record*'
./activerecord/lib/active_record
./activerecord/lib/rails/generators/active_record
./activerecord/test/active_record
./railties/test/fixtures/lib/generators/active_record


# '-maxdepth' to limit how deep 'find' descends into subdirectories
~/Documents/rails $ find . -maxdepth 3 -type f -name '*active_record*'
./actionview/test/active_record_unit.rb
./activerecord/lib/active_record.rb
./guides/bug_report_templates/active_record_gem.rb
./guides/bug_report_templates/active_record_master.rb
./guides/bug_report_templates/active_record_migrations_gem.rb
./guides/bug_report_templates/active_record_migrations_master.rb
./guides/source/active_record_basics.md
./guides/source/active_record_callbacks.md
./guides/source/active_record_migrations.md
./guides/source/active_record_postgresql.md
./guides/source/active_record_querying.md
./guides/source/active_record_validations.md
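To see `-type` and `-maxdepth` side by side, here is a minimal sketch on a made-up tree where one matching file sits four levels deep (depth counts from `.`, so `./lib/active_record.rb` is at depth 2):

```shell
# Hypothetical tree: a matching directory, a shallow file and a deep file
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir -p "$dir/lib/active_record/deep"
touch "$dir/lib/active_record.rb" "$dir/lib/active_record/deep/active_record_ext.rb"
cd "$dir" || exit 1

# '-type d': only directories whose name matches
dirs=$(find . -type d -name '*active_record*')

# '-type f' plus '-maxdepth 2': files no more than two levels below '.'
shallow=$(find . -maxdepth 2 -type f -name '*active_record*')

# Without '-maxdepth', the deeper file is found too
all_files=$(find . -type f -name '*active_record*' | wc -l)

printf 'dirs=%s\nshallow=%s\nall=%s\n' "$dirs" "$shallow" "$all_files"
```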
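# '-size -N' to get files smaller than N units, '-size +N' for larger ones.
# No suffix means 512-byte blocks; add a unit: 'c' (bytes), 'k' (KiB), 'M' (MiB), e.g. -100c, +3M.
# GNU find rounds sizes up to the unit, so '-size -1k' matches only empty files.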
~/Documents/rails $ find . -type f -name '*.rb' -size -1k
./actionpack/test/controller/controller_fixtures/app/controllers/admin/user_controller.rb
./actionpack/test/controller/controller_fixtures/app/controllers/user_controller.rb
./actionpack/test/controller/controller_fixtures/vendor/plugins/bad_plugin/lib/plugin_controller.rb
./activejob/test/support/delayed_job/delayed/serialization/test.rb


~/Documents/rails $ find . -size +1M
./.git/objects/pack/pack-0d3c6964a83a501da27c9eb5ee55e077fa112064.pack
./.git/objects/pack/pack-0d3c6964a83a501da27c9eb5ee55e077fa112064.idx
./activesupport/lib/active_support/values/unicode_tables.dat
./guides/assets/images/getting_started/rails_welcome.png
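The rounding rule is easy to trip over, so here is a minimal sketch with three made-up files of 0, 100 and 2000 bytes, assuming GNU find (whose manual documents that sizes are rounded up to the unit):

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
: > empty.rb                          # 0 bytes
head -c 100  /dev/zero > small.rb     # 100 bytes
head -c 2000 /dev/zero > medium.rb    # 2000 bytes

# 100 bytes rounds up to 1k, so only the empty file is "less than 1k"
lt_1k=$(find . -type f -size -1k)

# 'c' means bytes: anything under 1024 bytes
lt_1024c=$(find . -type f -size -1024c | sort)

# 2000 bytes rounds up to 2k, which is "more than 1k"
gt_1k=$(find . -type f -size +1k)

printf 'lt_1k=%s\nlt_1024c=%s\ngt_1k=%s\n' "$lt_1k" "$lt_1024c" "$gt_1k"
```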

Do something with those files you found

# '{}' is a placeholder that gets replaced with each file 'find' has found
# Notice how we end the command: '\;' (the ';' is escaped so the shell passes it on to 'find')
~/Documents/rails $ find . -regextype posix-extended -regex '.*/associations/(has|belong).*\.rb$' \
> -exec rm -f {} \;

~/Documents/rails $ git status
# On branch master
# Changes not staged for commit:
#   (use "git add/rm <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#	deleted:    activerecord/lib/active_record/associations/belongs_to_association.rb
#	deleted:    activerecord/lib/active_record/associations/belongs_to_polymorphic_association.rb
#	deleted:    activerecord/lib/active_record/associations/has_many_association.rb
#	deleted:    activerecord/lib/active_record/associations/has_many_through_association.rb
#	deleted:    activerecord/lib/active_record/associations/has_one_association.rb
#	deleted:    activerecord/lib/active_record/associations/has_one_through_association.rb
#	deleted:    activerecord/test/cases/associations/belongs_to_associations_test.rb
#	deleted:    activerecord/test/cases/associations/has_and_belongs_to_many_associations_test.rb
#	deleted:    activerecord/test/cases/associations/has_many_associations_test.rb
#	deleted:    activerecord/test/cases/associations/has_many_through_associations_test.rb
#	deleted:    activerecord/test/cases/associations/has_one_associations_test.rb
#	deleted:    activerecord/test/cases/associations/has_one_through_associations_test.rb
#
no changes added to commit (use "git add" and/or "git commit -a")
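`-exec` also has a second terminator, `+`, which passes many file names to a single command instead of running it once per file. A minimal sketch on made-up files (using `echo` so the number of calls shows up as the number of output lines), ending with the slide's deletion replayed on the throwaway copies:

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
touch a.rb b.rb c.rb keep.md

# '\;' runs the command once per file: three echo calls, three lines
per_file=$(find . -name '*.rb' -exec echo {} \; | wc -l)

# '+' packs the file names into as few calls as possible: one echo, one line
batched=$(find . -name '*.rb' -exec echo {} + | wc -l)

# Delete the .rb files, one 'rm' per file
find . -name '*.rb' -exec rm -f {} \;
left=$(ls)

printf 'per_file=%s batched=%s left=%s\n' "$per_file" "$batched" "$left"
```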

Some grep to get exactly what you want

The simplest use case

# From 'active_support.rb', take those lines matching 'require'
~/Documents/rails $ grep require activesupport/lib/active_support.rb
require "securerandom"
require "active_support/dependencies/autoload"
require "active_support/version"
require "active_support/logger"
require "active_support/lazy_load_hooks"
require "active_support/core_ext/date_and_time/compatibility"


# '-E' to use extended regex syntax
~/Documents/rails $ grep -E 'require "[^/]+"$' activesupport/lib/active_support.rb
require "securerandom"
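To play with these two patterns offline, here is a minimal sketch against a small temporary file whose contents are made up (loosely modeled on the lines above):

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT
cat > "$f" <<'EOF'
require "securerandom"
require "active_support/version"

module ActiveSupport
  extend ActiveSupport::Autoload
end
EOF

# Plain pattern: count every line containing 'require'
all_requires=$(grep -c require "$f")

# ERE: a quoted name with no '/' inside, right before the end of the line
top_level=$(grep -E 'require "[^/]+"$' "$f")

printf 'all_requires=%s\ntop_level=%s\n' "$all_requires" "$top_level"
```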

Now, show me lines NOT matching the pattern ('-v')

~/Documents/rails $ grep -vE 'require "[^/]+"$' activesupport/lib/active_support.rb
# more output above...
  def self.halt_callback_chains_on_return_false
    Callbacks.halt_and_display_warning_on_return_false
  end

  def self.halt_callback_chains_on_return_false=(value)
    Callbacks.halt_and_display_warning_on_return_false = value
  end

  def self.to_time_preserves_timezone
    DateAndTime::Compatibility.preserve_timezone
  end

  def self.to_time_preserves_timezone=(value)
    DateAndTime::Compatibility.preserve_timezone = value
  end
end

autoload :I18n, "active_support/i18n"
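A quick way to convince yourself that `-v` is an exact inversion: the matching and non-matching line counts always add up to the total. A minimal sketch on a made-up five-line file (note that the blank line lands on the "not matching" side):

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' 'require "securerandom"' 'require "active_support/version"' '' 'module ActiveSupport' 'end' > "$f"

# '-c' counts matching lines; '-vc' counts the lines that do NOT match
matching=$(grep -cE 'require "[^/]+"$' "$f")
not_matching=$(grep -vcE 'require "[^/]+"$' "$f")
total=$(wc -l < "$f")

echo "matching=$matching not_matching=$not_matching total=$total"
```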

Let's match multiple expressions

~/Documents/rails $ grep -E -e '^require' -e 'activesupport' activesupport/lib/active_support.rb
require "securerandom"
require "active_support/dependencies/autoload"
require "active_support/version"
require "active_support/logger"
require "active_support/lazy_load_hooks"
require "active_support/core_ext/date_and_time/compatibility"

# Something looks odd here: no line matched 'activesupport'... let's try to fix that
# '-i' for case-insensitive matching
~/Documents/rails $ grep -iE -e '^require' -e 'activesupport' activesupport/lib/active_support.rb
require "securerandom"
require "active_support/dependencies/autoload"
require "active_support/version"
require "active_support/logger"
require "active_support/lazy_load_hooks"
require "active_support/core_ext/date_and_time/compatibility"
module ActiveSupport
  extend ActiveSupport::Autoload


# Now that's what I'm talking about! :)
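The same effect in a minimal sketch, on a made-up four-line file: each `-e` adds an alternative pattern, and `-i` is what lets `activesupport` match `ActiveSupport`:

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' 'require "securerandom"' 'module ActiveSupport' '  extend ActiveSupport::Autoload' '  autoload :I18n' > "$f"

# A line matches if ANY '-e' pattern matches; case matters by default
case_sensitive=$(grep -cE -e '^require' -e 'activesupport' "$f")

# '-i' ignores case, so 'activesupport' now hits both ActiveSupport lines
case_insensitive=$(grep -ciE -e '^require' -e 'activesupport' "$f")

echo "case_sensitive=$case_sensitive case_insensitive=$case_insensitive"
```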

Can I query multiple files?... Oh, you bet!

~/Documents/rails $ cd activerecord/lib/active_record/associations

lib/active_record/associations $ grep -E '^\s*class ' *.rb
alias_tracker.rb:    class AliasTracker # :nodoc:
association.rb:    class Association #:nodoc:
association_scope.rb:    class AssociationScope #:nodoc:
association_scope.rb:        class ReflectionProxy < SimpleDelegator # :nodoc:
belongs_to_association.rb:    class BelongsToAssociation < SingularAssociation #:nodoc:
belongs_to_polymorphic_association.rb:    class BelongsToPolymorphicAssociation < BelongsToAssociation #:nodoc:
collection_association.rb:    class CollectionAssociation < Association #:nodoc:
collection_proxy.rb:    class CollectionProxy < Relation
has_many_association.rb:    class HasManyAssociation < CollectionAssociation #:nodoc:
has_many_through_association.rb:    class HasManyThroughAssociation < HasManyAssociation #:nodoc:
has_one_association.rb:    class HasOneAssociation < SingularAssociation #:nodoc:
has_one_through_association.rb:    class HasOneThroughAssociation < HasOneAssociation #:nodoc:
join_dependency.rb:    class JoinDependency # :nodoc:
join_dependency.rb:      class Aliases # :nodoc:
join_dependency.rb:        class Table < Struct.new(:node, :columns) # :nodoc:
preloader.rb:    class Preloader #:nodoc:
preloader.rb:        class AlreadyLoaded # :nodoc:
preloader.rb:        class NullPreloader # :nodoc:
singular_association.rb:    class SingularAssociation < Association #:nodoc:
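When grep is given several files, it prefixes each match with the file name, as in the output above. A minimal sketch with two made-up files, also showing `-h`, which drops that prefix:

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
printf '%s\n' 'module A' '  class Foo' '  end' 'end' > foo.rb
printf '%s\n' 'module B' '  # no classes here' 'end' > bar.rb

# Several files on the command line: each match is prefixed with 'file:'
with_names=$(grep -E '^\s*class ' *.rb)

# '-h' hides the file name when only the lines matter
without_names=$(grep -hE '^\s*class ' *.rb)

printf 'with_names=%s\nwithout_names=%s\n' "$with_names" "$without_names"
```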

Show me some line numbers!

~/Documents/rails $ cd activerecord/lib/active_record/associations

lib/active_record/associations $ grep -nE '^\s*class ' *.rb
alias_tracker.rb:6:    class AliasTracker # :nodoc:
association.rb:18:    class Association #:nodoc:
association_scope.rb:3:    class AssociationScope #:nodoc:
association_scope.rb:96:        class ReflectionProxy < SimpleDelegator # :nodoc:
belongs_to_association.rb:4:    class BelongsToAssociation < SingularAssociation #:nodoc:
belongs_to_polymorphic_association.rb:4:    class BelongsToPolymorphicAssociation < BelongsToAssociation #:nodoc:
collection_association.rb:26:    class CollectionAssociation < Association #:nodoc:
collection_proxy.rb:30:    class CollectionProxy < Relation
has_many_association.rb:8:    class HasManyAssociation < CollectionAssociation #:nodoc:
has_many_through_association.rb:4:    class HasManyThroughAssociation < HasManyAssociation #:nodoc:
has_one_association.rb:4:    class HasOneAssociation < SingularAssociation #:nodoc:
has_one_through_association.rb:4:    class HasOneThroughAssociation < HasOneAssociation #:nodoc:
join_dependency.rb:3:    class JoinDependency # :nodoc:
join_dependency.rb:7:      class Aliases # :nodoc:
join_dependency.rb:35:        class Table < Struct.new(:node, :columns) # :nodoc:
preloader.rb:41:    class Preloader #:nodoc:
preloader.rb:168:        class AlreadyLoaded # :nodoc:
preloader.rb:183:        class NullPreloader # :nodoc:
singular_association.rb:3:    class SingularAssociation < Association #:nodoc:
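A minimal sketch of `-n` on made-up files. It also shows a detail that is easy to miss: with a single file, grep leaves the file name out, so you get just `line:text`:

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
printf '%s\n' 'module A' '  class Foo' '  end' '  class Bar' '  end' 'end' > foo.rb
printf '%s\n' 'module B' 'end' > bar.rb

# '-n' puts the 1-based line number between the file name and the line
numbered=$(grep -nE '^\s*class ' *.rb)

# With a single file there is no file name prefix
single=$(grep -nE '^\s*class ' foo.rb)

printf '%s\n---\n%s\n' "$numbered" "$single"
```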

Well, you know what? I only need to know which files have classes defined... ¯\_(ツ)_/¯

~/Documents/rails $ cd activerecord/lib/active_record/associations

lib/active_record/associations $ grep -lE '^\s*class ' *.rb
alias_tracker.rb
association.rb
association_scope.rb
belongs_to_association.rb
belongs_to_polymorphic_association.rb
collection_association.rb
collection_proxy.rb
has_many_association.rb
has_many_through_association.rb
has_one_association.rb
has_one_through_association.rb
join_dependency.rb
preloader.rb
singular_association.rb
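A minimal sketch of `-l` on three made-up files: a file is listed once no matter how many matches it has. Its counterpart `-L` lists the files with no match at all:

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
printf '%s\n' 'class Foo' 'end' 'class Bar' 'end' > foo.rb
printf '%s\n' 'module Helpers' 'end' > helpers.rb
printf '%s\n' 'class Baz' 'end' > baz.rb

# '-l': each file with at least one match, printed once
with_classes=$(grep -lE '^\s*class ' *.rb)

# '-L': each file with no match
without_classes=$(grep -LE '^\s*class ' *.rb)

printf 'with:\n%s\nwithout:\n%s\n' "$with_classes" "$without_classes"
```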

What about showing how many matching lines there are in each file?

~/Documents/rails $ cd activerecord/lib/active_record/associations

lib/active_record/associations $ grep -cE '^\s*class ' *.rb
alias_tracker.rb:1
association.rb:1
association_scope.rb:2
belongs_to_association.rb:1
belongs_to_polymorphic_association.rb:1
collection_association.rb:1
collection_proxy.rb:1
foreign_association.rb:0
has_many_association.rb:1
has_many_through_association.rb:1
has_one_association.rb:1
has_one_through_association.rb:1
join_dependency.rb:3
preloader.rb:3
singular_association.rb:1
through_association.rb:0
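Note that `-c` counts matching lines, not matches. A minimal sketch on a made-up file where one line defines two classes; `-o` (print each match on its own line) piped into `wc -l` counts the individual matches instead:

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' 'class A; end; class B; end' 'class C' 'end' > "$f"

# '-c' counts LINES: the first line holds two classes but counts once
lines=$(grep -c 'class ' "$f")

# '-o' prints every match separately, so 'wc -l' counts matches
matches=$(grep -o 'class ' "$f" | wc -l)

echo "lines=$lines matches=$matches"
```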

I probably won't remember (or know) '\bclass\b'... How can I match 'class' only as a whole word?

~/Documents/rails $ cd activerecord/lib/active_record/associations

lib/active_record/associations $ grep -wE class *.rb
alias_tracker.rb:    class AliasTracker # :nodoc:
association.rb:    # This is the root class of all associations ('+ Foo' signifies an included module Foo):
association.rb:    class Association #:nodoc:
association.rb:      # Returns the name of the table of the associated class:
association.rb:      # Returns the class of the target. belongs_to polymorphic overrides this to look at the
association.rb:        @reflection = @owner.class._reflect_on_association(reflection_name)
association.rb:              attributes[reflection.type] = owner.class.base_class.name
association.rb:        # the kind of the class of the associated objects. Meant to be used as
association.rb:                "got #{record.inspect} which is an instance of #{record.class}(##{record.class.object_id})"
association.rb:        # the association in the specific class of the record.
association_scope.rb:    class AssociationScope #:nodoc:
# more output below...
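A minimal sketch of what `-w` filters out, on a made-up three-line file: the match must be preceded and followed by a non-word character (or the start/end of the line), so `subclass_of` no longer counts while `.class.` still does:

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' 'class Foo' 'subclass_of(Foo)' '@owner.class.name' > "$f"

# Plain substring search: 'class' also hides inside 'subclass_of'
substring=$(grep -c class "$f")

# '-w' only accepts 'class' as a whole word
whole_word=$(grep -cw class "$f")

echo "substring=$substring whole_word=$whole_word"
```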

Joining forces... Getting Things Real

Let's recall the "simple" task we were given earlier:

Help me find out how many SLOC we've got in total... but wait, I'd like to see only Ruby code... Throw anything else away. How about I come back in 10 minutes?

You're now able to do it, right?

Let's figure it out!

Only Ruby code; no tests: find . -name '*.rb' -not -path '*/test/*'

Drop comments: grep -vE -e '^\s*#' <input stream(1)>

Drop blanks: grep -vE '^\s*$' <input stream>

Count lines (2): wc -l <input stream>
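Before scaling up, the comment, blank and count steps can be checked on a single made-up Ruby file. A minimal sketch; keep in mind that '^\s*#' only drops whole-line comments, so a trailing comment keeps its line, and `=begin`/`=end` blocks would still be counted:

```shell
f=$(mktemp)
trap 'rm -f "$f"' EXIT
cat > "$f" <<'EOF'
# A hypothetical Ruby file
require "json"

class Greeter
  # Says hello
  def hello
    puts "hi" # trailing comment: this line still has code
  end
end
EOF

# Drop whole-line comments and blank/whitespace-only lines, then count
sloc=$(grep -vE -e '^\s*#' -e '^\s*$' "$f" | wc -l)

echo "sloc=$sloc"
```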

Putting everything together...

~/Documents/rails $ find . -name '*.rb' -not -path '*/test/*' | grep -vE -e '^\s*#' -e '^\s*$' | wc -l
    1030

1030 SLOC? Really?

That doesn't seem right!

Let's see what we're actually doing

~/Documents/rails $ find . -name '*.rb' -not -path '*/test/*'
# more output above...
./railties/lib/rails/tasks.rb
./railties/lib/rails/test_help.rb
./railties/lib/rails/test_unit/line_filtering.rb
./railties/lib/rails/test_unit/minitest_plugin.rb
./railties/lib/rails/test_unit/railtie.rb
./railties/lib/rails/test_unit/reporter.rb
./railties/lib/rails/test_unit/test_requirer.rb
./railties/lib/rails/version.rb
./railties/lib/rails/welcome_controller.rb
./railties/lib/rails.rb
./tasks/release.rb
./tools/test.rb
./version.rb

That's what we're feeding grep with... Not quite what we want. Can you tell why?
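A minimal sketch that reproduces the problem on two made-up files: grep reads `find`'s output as plain text, so it filters the file names, not the files' contents, and the "SLOC" count is just the number of names:

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
printf '%s\n' 'class A' 'end' > a.rb
printf '%s\n' '# only a comment' '' 'module B' 'end' > b.rb

# What grep actually receives on stdin: the NAMES, one per line
names=$(find . -name '*.rb' | wc -l)

# So the pipeline counts names that don't look like comments or blanks
piped=$(find . -name '*.rb' | grep -vE -e '^\s*#' -e '^\s*$' | wc -l)

echo "names=$names piped=$piped"
```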

How to fix it?

xargs

Let's try...

# '-print0' and '-0' separate file names with NUL characters, so names
# containing spaces or newlines survive the trip from 'find' to 'xargs'
~/Documents/rails $ find . -name '*.rb' -not -path '*/test/*' -print0 | \
> xargs -0 grep -vE -e '^\s*#' -e '^\s*$' | wc -l
    65616

I don't know about you, but that seems a lot more reasonable to me.

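The way xargs works is pretty simple (at least for the use case we've just seen):

It reads items from its standard input (here, the NUL-separated file names coming from find) and appends them as arguments to the command you give it. So grep gets the file names as arguments and reads the files' contents, instead of filtering the list of names itself. xargs packs as many names as fit into each command line, so grep runs once (or a handful of times), not once per file.

Pretty much the same thing -exec does when used as an option in find; more precisely, its '-exec ... {} +' form, since '-exec ... {} \;' runs the command once per file.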
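A minimal sketch of the fixed pipeline on the same kind of made-up two-file tree (4 real code lines), plus a quick check of how xargs groups its arguments, assuming a GNU or BSD xargs (both support '-0' and '-n'):

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
printf '%s\n' 'class A' 'end' > a.rb
printf '%s\n' '# only a comment' '' 'module B' 'end' > b.rb

# xargs turns the names into arguments, so grep reads the files' contents
sloc=$(find . -name '*.rb' -print0 | xargs -0 grep -vE -e '^\s*#' -e '^\s*$' | wc -l)

# By default all three items land in ONE echo call...
one_call=$(printf '%s\0' a b c | xargs -0 echo | wc -l)

# ...while '-n 1' forces one call per item, like find's '-exec ... \;'
per_item=$(printf '%s\0' a b c | xargs -0 -n 1 echo | wc -l)

echo "sloc=$sloc one_call=$one_call per_item=$per_item"
```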

Nothing better than an example

~/Documents/rails $ find . -name '*.md' -print0 | xargs -0 rm

~/Documents/rails $ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	deleted:    .github/issue_template.md
	deleted:    .github/pull_request_template.md
	deleted:    CODE_OF_CONDUCT.md
	deleted:    CONTRIBUTING.md
	deleted:    README.md
	deleted:    RELEASING_RAILS.md
	deleted:    actioncable/CHANGELOG.md
	deleted:    actioncable/README.md
        # more output below...
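This is also where '-print0'/'-0' pay off. A minimal sketch with a made-up file whose name contains a space: plain `find | xargs` splits it into two bogus names, while the NUL-separated version hands `rm` exactly the files find found:

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
touch 'release notes.md' README.md keep.rb

# Without -0, xargs splits on blanks: './release' and 'notes.md' become separate items
split=$(find . -name '*.md' | xargs -n 1 echo | wc -l)

# NUL separators keep each name intact
find . -name '*.md' -print0 | xargs -0 rm
left=$(ls)

echo "split=$split left=$left"
```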

Some Links
